Data Science Assignment
Introduction
For this assignment, you need to create an R script file that works with the data set accompanying this assignment. The data can be downloaded as “data set 6.csv”. The data set, which records the monthly power consumption of 500 residential houses, contains the following variables:
- “Area”: the area of the house in square metres (sqm).
- “City”: the city in which the house is located.
- “P.Winter”: the average monthly power consumption of the house in winter, in kWh.
- “P.Summer”: the average monthly power consumption of the house in summer, in kWh.
Part 1 – Data Cleaning and Transformation
- A) Write a data cleaning function that prepares the data set for further analysis. This function may perform various data cleaning tasks, including but not limited to:
  - correcting possible typos;
  - removing irrelevant data (only houses in Auckland and Wellington are considered);
  - removing outliers, e.g. negative areas, negative power consumption, very large areas and very high power consumption.
  Note: You should not clean the data set manually. All data cleaning tasks should be carried out automatically by the data cleaning function.
- B) Write a function that calculates the annual average power consumption from “P.Winter” and “P.Summer” (add “P.Winter” and “P.Summer” and divide the result by two). Using this function, create a new variable named “P.Annual” and add it to the data set.
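A minimal sketch of Part 1 in R, under stated assumptions: `fix_city`, `upper_fence`, `clean_power_data` and `annual_average` are hypothetical names, the only typos handled are stray spaces and letter case in “City”, the “very high” cut-off is assumed to be Q3 + 1.5 × IQR (Tukey's common convention), and the toy rows are made up purely to exercise the code.

```r
# Hypothetical helper (assumption): repairs city names by trimming stray
# spaces and normalising letter case, e.g. "wellington " -> "Wellington".
fix_city <- function(city) {
  city <- trimws(as.character(city))
  paste0(toupper(substring(city, 1, 1)), tolower(substring(city, 2)))
}

# Hypothetical outlier rule (assumption): values above Q3 + k * IQR are "very high".
upper_fence <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  q[2] + k * (q[2] - q[1])
}

# Part 1 A: automatic cleaning of the raw data frame.
clean_power_data <- function(data) {
  data$City <- fix_city(data$City)
  # keep Auckland/Wellington rows with non-missing, non-negative values
  keep <- data$City %in% c("Auckland", "Wellington") &
    !is.na(data$Area) & !is.na(data$P.Winter) & !is.na(data$P.Summer) &
    data$Area > 0 & data$P.Winter >= 0 & data$P.Summer >= 0
  d <- data[keep, ]
  # drop very large areas and very high consumption values
  d <- d[d$Area <= upper_fence(d$Area) &
           d$P.Winter <= upper_fence(d$P.Winter) &
           d$P.Summer <= upper_fence(d$P.Summer), ]
  rownames(d) <- NULL
  d
}

# Part 1 B: annual average of the winter and summer monthly averages.
annual_average <- function(p_winter, p_summer) (p_winter + p_summer) / 2

# Made-up example rows (not the assignment data), for illustration only.
toy <- data.frame(
  Area = c(120, 150, -5, 90, 200, 110),
  City = c("Auckland ", "wellington", "Auckland", "Hamilton", "Wellington", "Auckland"),
  P.Winter = c(1200, 1500, 1100, 1000, 1800, 1300),
  P.Summer = c(800, 900, 700, 600, 1000, -10)
)
clean <- clean_power_data(toy)
clean$P.Annual <- annual_average(clean$P.Winter, clean$P.Summer)
print(clean)
```

Here the negative area, the Hamilton row and the negative summer value are removed, and “Auckland ” and “wellington” are normalised before filtering.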
Part 2 – Univariate Analysis
- A) Write R code that calculates the mean and standard deviation of the annual, winter and summer power consumption. Show the results in a table in your report.
- B) Write R code that plots the density functions of the annual, winter and summer power consumption. Use appropriate labels and the same scale for all plots. Add the plots to your report.
- C) Write R code that creates boxplots of the annual, winter and summer power consumption. Use appropriate labels and the same scale for all plots. Add the plots to your report.
- D) Write R code that divides the data set into two subsets based on the values of the “City” variable.
- E) Write R code that repeats tasks A, B and C for the two subsets.
- F) Compare the results obtained from the above tasks and comment on the power consumption of Auckland and Wellington residential houses during winter and summer.
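One possible way to avoid repeating tasks A to C by hand for each subset is to wrap them in functions and apply them to each city with `split()`. The helper names `power_summary` and `plot_city_densities` and the toy rows below are hypothetical; overlaying both cities' densities on one set of axes is an assumed presentation choice that may make the comparison in task F easier.

```r
# Hypothetical helper: mean and standard deviation of the three consumption
# columns, returned as a table with one row per variable.
power_summary <- function(d) {
  cols <- c("P.Winter", "P.Summer", "P.Annual")
  t(sapply(d[cols], function(x) c(Mean = mean(x), SD = sd(x))))
}

# Hypothetical helper: overlays the per-city density curves of one column on
# shared axes (drop = TRUE skips empty factor levels left over after cleaning).
plot_city_densities <- function(d, column, xlim = c(500, 2500)) {
  dens <- lapply(split(d[[column]], d$City, drop = TRUE), density)
  ymax <- max(sapply(dens, function(k) max(k$y)))
  plot(NULL, type = "n", xlim = xlim, ylim = c(0, ymax),
       xlab = column, ylab = "Density", main = paste("Density of", column, "by city"))
  for (i in seq_along(dens)) lines(dens[[i]], col = i, lty = i)
  legend("topright", legend = names(dens), col = seq_along(dens), lty = seq_along(dens))
}

# Made-up example rows (not the assignment data), for illustration only.
toy <- data.frame(
  City = rep(c("Auckland", "Wellington"), each = 3),
  P.Winter = c(1000, 1200, 1400, 1500, 1700, 1900),
  P.Summer = c(800, 900, 1000, 700, 800, 900)
)
toy$P.Annual <- (toy$P.Winter + toy$P.Summer) / 2

s <- power_summary(toy)                                   # whole data set
print(s)
by_city <- lapply(split(toy, toy$City, drop = TRUE), power_summary)
print(by_city)                                            # one table per city
plot_city_densities(toy, "P.Winter")
```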
Part 3 – Bivariate Analysis
- A) Write R code that creates a scatterplot of “P.Annual” against “Area”. Use appropriate labels and the same scale for all plots. Add the plots to your report.
- B) Write R code that fits a linear regression model of “P.Annual” on “Area”. Show the linear model in the scatterplot. Calculate the MSE (mean squared error) of the model.
- C) Write R code that fits a second order polynomial regression model of “P.Annual” on “Area”. Show the model in the scatterplot. Calculate the MSE of the model.
- D) Write R code that fits a third order polynomial regression model of “P.Annual” on “Area”. Show the model in the scatterplot. Calculate the MSE of the model.
- E) Comment on the three MSE values obtained in the previous tasks. Which regression model has the highest accuracy?
- F) Repeat tasks A-E for “P.Winter” and “P.Summer”.
- G) Repeat tasks A-F for the Auckland and Wellington subsets.
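For task E it may help to note that, on the data used for fitting, the MSE of nested polynomial models cannot increase with the degree, since each model contains the previous one; the cubic fit will therefore always look at least as accurate as the linear fit on its own training data. A hedged sketch comparing training MSE with a simple hold-out MSE follows; the 70/30 split, the seed, the helper names `poly_formula`, `train_mse` and `holdout_mse`, and the simulated data are all assumptions.

```r
# Builds "response ~ poly(predictor, k, raw = TRUE)", an ordinary polynomial of degree k.
poly_formula <- function(response, predictor, k) {
  as.formula(sprintf("%s ~ poly(%s, %d, raw = TRUE)", response, predictor, k))
}

# Training MSE of polynomial fits of degree 1..max_degree (what tasks B-D compute).
train_mse <- function(d, response, predictor = "Area", max_degree = 3) {
  sapply(seq_len(max_degree), function(k) {
    fit <- lm(poly_formula(response, predictor, k), data = d)
    mean(residuals(fit)^2)
  })
}

# Hold-out MSE: fit on a random training share, evaluate on the remaining rows.
# Note: calls set.seed(), so it resets the global random number stream.
holdout_mse <- function(d, response, predictor = "Area", max_degree = 3,
                        train_frac = 0.7, seed = 1) {
  set.seed(seed)
  idx <- sample(nrow(d), floor(train_frac * nrow(d)))
  train <- d[idx, ]
  test <- d[-idx, ]
  sapply(seq_len(max_degree), function(k) {
    fit <- lm(poly_formula(response, predictor, k), data = train)
    mean((test[[response]] - predict(fit, newdata = test))^2)
  })
}

# Simulated example data (not the assignment data), for illustration only.
set.seed(42)
toy <- data.frame(Area = runif(80, 40, 300))
toy$P.Annual <- 500 + 4 * toy$Area + rnorm(80, sd = 100)

mse_train <- train_mse(toy, "P.Annual")
mse_test <- holdout_mse(toy, "P.Annual")
mse_table <- rbind(train = mse_train, holdout = mse_test)
colnames(mse_table) <- c("linear", "quadratic", "cubic")
print(mse_table)
```

If the hold-out MSE stops improving (or gets worse) as the degree rises while the training MSE keeps falling, the extra terms are likely fitting noise rather than a real pattern.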
Solution
Rpro_28.10.2017_Keshav.R
# For this code to run, the data file
# "Data Set 6.csv" must be saved in your R working directory.
data <- read.csv("Data Set 6.csv") # importing the data set
data <- data.frame(data) # making sure the data set is a data frame
#Q1A
data_cleaning <- function(data) # function for data cleaning
{
  city <- trimws(as.character(data$City)) # ignoring stray spaces, e.g. "Auckland "
  d <- data[city %in% c("Wellington", "Auckland"), ] # keeping only Wellington and Auckland houses
  d <- d[which(d$Area > 0 & d$P.Winter >= 0 & d$P.Summer >= 0), ] # removing missing values, negative areas and negative power consumption values
  q.winter <- quantile(d$P.Winter, c(0.25, 0.75)) # 1st and 3rd quartiles of winter consumption
  q.summer <- quantile(d$P.Summer, c(0.25, 0.75)) # 1st and 3rd quartiles of summer consumption
  d <- d[d$P.Winter <= q.winter[2] + (q.winter[2] - q.winter[1]), ] # removing very high winter power consumption values (above Q3 + IQR)
  d <- d[d$P.Summer <= q.summer[2] + (q.summer[2] - q.summer[1]), ] # removing very high summer power consumption values (above Q3 + IQR)
  return(d)
}
d <- data_cleaning(data)
#Q1B
annual<- function(d) #a function that calculates the annual average power consumption
{
P.Annual<- (d$P.Winter + d$P.Summer)/2 #calculating annual power consumption
d <- cbind(d, P.Annual) # adding the new variable "P.Annual" to the data set
return(d)
}
d <- annual(d)
#Q2A
P.Winter<- c(mean(d$P.Winter), sd(d$P.Winter)) #mean and standard deviation of winter power consumption
P.Summer<- c(mean(d$P.Summer), sd(d$P.Summer)) #mean and standard deviation of summer power consumption
P.Annual<- c(mean(d$P.Annual), sd(d$P.Annual)) #mean and standard deviation of annual power consumption
result <- rbind(P.Winter, P.Summer, P.Annual)
colnames(result) <- c("Mean", "SD") # labelling the columns of the summary table
print(result)
#Q2B
plot(density(d$P.Annual), xlim = c(500, 2500), ylim = c(0, 0.0025),
     main = "Density function of annual power consumption",
     xlab = "Annual power consumption", ylab = "Density")
plot(density(d$P.Winter), xlim = c(500, 2500), ylim = c(0, 0.0025),
     main = "Density function of winter power consumption",
     xlab = "Winter power consumption", ylab = "Density")
plot(density(d$P.Summer), xlim = c(500, 2500), ylim = c(0, 0.0025),
     main = "Density function of summer power consumption",
     xlab = "Summer power consumption", ylab = "Density")
#Q2C
boxplot(d$P.Annual, main = "Boxplot of annual power consumption",
        ylab = "Power consumption (kWh)", ylim = c(500, 2500))
boxplot(d$P.Winter, main = "Boxplot of winter power consumption",
        ylab = "Power consumption (kWh)", ylim = c(500, 2500))
boxplot(d$P.Summer, main = "Boxplot of summer power consumption",
        ylab = "Power consumption (kWh)", ylim = c(500, 2500))
#Q2D
d1 <- subset(d, trimws(as.character(City)) == "Wellington") # subsetting the data set with city Wellington
d2 <- subset(d, trimws(as.character(City)) == "Auckland") # subsetting the data set with city Auckland
#Q2E
#repeating task A for the city Wellington
P.Winter<- c(mean(d1$P.Winter), sd(d1$P.Winter)) #mean and standard deviation of winter power consumption
P.Summer<- c(mean(d1$P.Summer), sd(d1$P.Summer)) #mean and standard deviation of summer power consumption
P.Annual<- c(mean(d1$P.Annual), sd(d1$P.Annual)) #mean and standard deviation of annual power consumption
result <- rbind(P.Winter, P.Summer, P.Annual)
colnames(result) <- c("Mean", "SD") # labelling the columns of the summary table
print(result)
#repeating task B for the city Wellington
plot(density(d1$P.Annual), xlim = c(500, 2500), ylim = c(0, 0.0025),
     main = "Density function of annual power consumption\nin Wellington",
     xlab = "Annual power consumption", ylab = "Density")
plot(density(d1$P.Winter), xlim = c(500, 2500), ylim = c(0, 0.0025),
     main = "Density function of winter power consumption\nin Wellington",
     xlab = "Winter power consumption", ylab = "Density")
plot(density(d1$P.Summer), xlim = c(500, 2500), ylim = c(0, 0.0025),
     main = "Density function of summer power consumption\nin Wellington",
     xlab = "Summer power consumption", ylab = "Density")
#repeating task C for the city Wellington
boxplot(d1$P.Annual, main = "Boxplot of annual power consumption in Wellington",
        ylab = "Power consumption (kWh)", ylim = c(500, 2500))
boxplot(d1$P.Winter, main = "Boxplot of winter power consumption in Wellington",
        ylab = "Power consumption (kWh)", ylim = c(500, 2500))
boxplot(d1$P.Summer, main = "Boxplot of summer power consumption in Wellington",
        ylab = "Power consumption (kWh)", ylim = c(500, 2500))
#repeating task A for the city Auckland
P.Winter<- c(mean(d2$P.Winter), sd(d2$P.Winter)) #mean and standard deviation of winter power consumption
P.Summer<- c(mean(d2$P.Summer), sd(d2$P.Summer)) #mean and standard deviation of summer power consumption
P.Annual<- c(mean(d2$P.Annual), sd(d2$P.Annual)) #mean and standard deviation of annual power consumption
result <- rbind(P.Winter, P.Summer, P.Annual)
colnames(result) <- c("Mean", "SD") # labelling the columns of the summary table
print(result)
#repeating task B for the city Auckland
plot(density(d2$P.Annual), xlim = c(500, 2500), ylim = c(0, 0.0025),
     main = "Density function of annual power consumption\nin Auckland",
     xlab = "Annual power consumption", ylab = "Density")
plot(density(d2$P.Winter), xlim = c(500, 2500), ylim = c(0, 0.0025),
     main = "Density function of winter power consumption\nin Auckland",
     xlab = "Winter power consumption", ylab = "Density")
plot(density(d2$P.Summer), xlim = c(500, 2500), ylim = c(0, 0.0025),
     main = "Density function of summer power consumption\nin Auckland",
     xlab = "Summer power consumption", ylab = "Density")
#repeating task C for the city Auckland
boxplot(d2$P.Annual, main = "Boxplot of annual power consumption in Auckland",
        ylab = "Power consumption (kWh)", ylim = c(500, 2500))
boxplot(d2$P.Winter, main = "Boxplot of winter power consumption in Auckland",
        ylab = "Power consumption (kWh)", ylim = c(500, 2500))
boxplot(d2$P.Summer, main = "Boxplot of summer power consumption in Auckland",
        ylab = "Power consumption (kWh)", ylim = c(500, 2500))
#Q3A
#creating the scatterplot:
plot(d$Area, d$P.Annual, main = "Scatterplot for annual power consumption", col = "green",
     xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Annual power consumption")
#Q3B
plot(d$Area, d$P.Annual, main = "Linear regression for annual power consumption", col = "green",
     xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Annual power consumption")
lfit <- lm(P.Annual ~ Area, data = d) # fitting the linear model
abline(lfit, col = "red") # plotting the linear model
print(summary(lfit))
mse <- mean(lfit$residuals^2) # calculating MSE
print(mse)
#Q3C
plot(d$Area, d$P.Annual, main = "Second order polynomial regression for\nannual power consumption",
     col = "green", xlim = c(40, 300), ylim = c(500, 2500),
     xlab = "Area", ylab = "Annual power consumption")
qfit <- lm(P.Annual ~ Area + I(Area^2), data = d) # fitting second order polynomial regression
print(summary(qfit))
mse <- mean(qfit$residuals^2) # calculating MSE
print(mse)
pol2 <- function(x) qfit$coefficients[3] * x^2 + qfit$coefficients[2] * x + qfit$coefficients[1]
curve(pol2, from = 40, to = 300, col = "red", add = TRUE) # plotting the model on the scatterplot
#Q3D
plot(d$Area, d$P.Annual, main = "Third order polynomial regression for\nannual power consumption",
     col = "green", xlim = c(40, 300), ylim = c(500, 2500),
     xlab = "Area", ylab = "Annual power consumption")
tfit <- lm(P.Annual ~ Area + I(Area^2) + I(Area^3), data = d) # fitting third order polynomial regression
print(summary(tfit))
mse <- mean(tfit$residuals^2) # calculating MSE
print(mse)
pol3 <- function(x) tfit$coefficients[4] * x^3 + tfit$coefficients[3] * x^2 + tfit$coefficients[2] * x + tfit$coefficients[1]
curve(pol3, from = 40, to = 300, col = "red", add = TRUE) # plotting the model on the scatterplot
#Q3F
#repeating task A for winter
#creating the scatterplot:
plot(d$Area, d$P.Winter, main = "Scatterplot for winter power consumption", col = "green",
     xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Winter power consumption")
#repeating task B for winter
plot(d$Area, d$P.Winter, main = "Linear regression for winter power consumption", col = "green",
     xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Winter power consumption")
lfit <- lm(P.Winter ~ Area, data = d) # fitting the linear model
abline(lfit, col = "red") # plotting the linear model
print(summary(lfit))
mse <- mean(lfit$residuals^2) # calculating MSE
print(mse)
#repeating task C for winter
plot(d$Area, d$P.Winter, main = "Second order polynomial regression for\nwinter power consumption",
     col = "green", xlim = c(40, 300), ylim = c(500, 2500),
     xlab = "Area", ylab = "Winter power consumption")
qfit <- lm(P.Winter ~ Area + I(Area^2), data = d) # fitting second order polynomial regression
print(summary(qfit))
mse <- mean(qfit$residuals^2) # calculating MSE
print(mse)
pol2 <- function(x) qfit$coefficients[3] * x^2 + qfit$coefficients[2] * x + qfit$coefficients[1]
curve(pol2, from = 40, to = 300, col = "red", add = TRUE) # plotting the model on the scatterplot
#repeating task D for winter
plot(d$Area, d$P.Winter, main = "Third order polynomial regression for\nwinter power consumption",
     col = "green", xlim = c(40, 300), ylim = c(500, 2500),
     xlab = "Area", ylab = "Winter power consumption")
tfit <- lm(P.Winter ~ Area + I(Area^2) + I(Area^3), data = d) # fitting third order polynomial regression
print(summary(tfit))
mse <- mean(tfit$residuals^2) # calculating MSE
print(mse)
pol3 <- function(x) tfit$coefficients[4] * x^3 + tfit$coefficients[3] * x^2 + tfit$coefficients[2] * x + tfit$coefficients[1]
curve(pol3, from = 40, to = 300, col = "red", add = TRUE) # plotting the model on the scatterplot
#repeating task A for summer
#creating the scatterplot:
plot(d$Area, d$P.Summer, main = "Scatterplot for summer power consumption", col = "green",
     xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Summer power consumption")
#repeating task B for summer
plot(d$Area, d$P.Summer, main = "Linear regression for summer power consumption", col = "green",
     xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Summer power consumption")
lfit <- lm(P.Summer ~ Area, data = d) # fitting the linear model
abline(lfit, col = "red") # plotting the linear model
print(summary(lfit))
mse <- mean(lfit$residuals^2) # calculating MSE
print(mse)
#repeating task C for summer
plot(d$Area, d$P.Summer, main = "Second order polynomial regression for\nsummer power consumption",
     col = "green", xlim = c(40, 300), ylim = c(500, 2500),
     xlab = "Area", ylab = "Summer power consumption")
qfit <- lm(P.Summer ~ Area + I(Area^2), data = d) # fitting second order polynomial regression
print(summary(qfit))
mse <- mean(qfit$residuals^2) # calculating MSE
print(mse)
pol2 <- function(x) qfit$coefficients[3] * x^2 + qfit$coefficients[2] * x + qfit$coefficients[1]
curve(pol2, from = 40, to = 300, col = "red", add = TRUE) # plotting the model on the scatterplot
#repeating task D for summer
plot(d$Area, d$P.Summer, main = "Third order polynomial regression for\nsummer power consumption",
     col = "green", xlim = c(40, 300), ylim = c(500, 2500),
     xlab = "Area", ylab = "Summer power consumption")
tfit <- lm(P.Summer ~ Area + I(Area^2) + I(Area^3), data = d) # fitting third order polynomial regression
print(summary(tfit))
mse <- mean(tfit$residuals^2) # calculating MSE
print(mse)
pol3 <- function(x) tfit$coefficients[4] * x^3 + tfit$coefficients[3] * x^2 + tfit$coefficients[2] * x + tfit$coefficients[1]
curve(pol3, from = 40, to = 300, col = "red", add = TRUE) # plotting the model on the scatterplot
#Q3G
#repeating task A for Wellington
#creating the scatterplot:
plot(d1$Area, d1$P.Annual, main = "Scatterplot for annual power consumption\nin Wellington", col = "green",
     xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Annual power consumption")
#repeating task B for Wellington
plot(d1$Area, d1$P.Annual, main = "Linear regression for annual power consumption\nin Wellington", col = "green",
     xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Annual power consumption")
lfit <- lm(P.Annual ~ Area, data = d1) # fitting the linear model
abline(lfit, col = "red") # plotting the linear model
print(summary(lfit))
mse <- mean(lfit$residuals^2) # calculating MSE
print(mse)
#repeating task C for Wellington
plot(d1$Area, d1$P.Annual, main = "Second order polynomial regression for\nannual power consumption in Wellington",
     col = "green", xlim = c(40, 300), ylim = c(500, 2500),
     xlab = "Area", ylab = "Annual power consumption")
qfit <- lm(P.Annual ~ Area + I(Area^2), data = d1) # fitting second order polynomial regression
print(summary(qfit))
mse <- mean(qfit$residuals^2) # calculating MSE
print(mse)
pol2 <- function(x) qfit$coefficients[3] * x^2 + qfit$coefficients[2] * x + qfit$coefficients[1]
curve(pol2, from = 40, to = 300, col = "red", add = TRUE) # plotting the model on the scatterplot
#repeating task D for Wellington
plot(d1$Area, d1$P.Annual, main = "Third order polynomial regression for\nannual power consumption in Wellington",
     col = "green", xlim = c(40, 300), ylim = c(500, 2500),
     xlab = "Area", ylab = "Annual power consumption")
tfit <- lm(P.Annual ~ Area + I(Area^2) + I(Area^3), data = d1) # fitting third order polynomial regression
print(summary(tfit))
mse <- mean(tfit$residuals^2) # calculating MSE
print(mse)
pol3 <- function(x) tfit$coefficients[4] * x^3 + tfit$coefficients[3] * x^2 + tfit$coefficients[2] * x + tfit$coefficients[1]
curve(pol3, from = 40, to = 300, col = "red", add = TRUE) # plotting the model on the scatterplot
#repeating task F for Wellington
#repeating task A for winter in Wellington
#creating the scatterplot:
plot(d1$Area, d1$P.Winter, main = "Scatterplot for winter power consumption\nin Wellington", col = "green",
     xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Winter power consumption")
#repeating task B for winter in Wellington
plot(d1$Area, d1$P.Winter, main = "Linear regression for winter power consumption\nin Wellington", col = "green",
     xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Winter power consumption")
lfit <- lm(P.Winter ~ Area, data = d1) # fitting the linear model
abline(lfit, col = "red") # plotting the linear model
print(summary(lfit))
mse <- mean(lfit$residuals^2) # calculating MSE
print(mse)
#repeating task C for winter in Wellington
plot(d1$Area, d1$P.Winter, main = "Second order polynomial regression for\nwinter power consumption in Wellington",
     col = "green", xlim = c(40, 300), ylim = c(500, 2500),
     xlab = "Area", ylab = "Winter power consumption")
qfit <- lm(P.Winter ~ Area + I(Area^2), data = d1) # fitting second order polynomial regression
print(summary(qfit))
mse <- mean(qfit$residuals^2) # calculating MSE
print(mse)
pol2 <- function(x) qfit$coefficients[3] * x^2 + qfit$coefficients[2] * x + qfit$coefficients[1]
curve(pol2, from = 40, to = 300, col = "red", add = TRUE) # plotting the model on the scatterplot
#repeating task D for winter in Wellington
plot(d1$Area, d1$P.Winter, main = "Third order polynomial regression for\nwinter power consumption in Wellington",
     col = "green", xlim = c(40, 300), ylim = c(500, 2500),
     xlab = "Area", ylab = "Winter power consumption")
tfit <- lm(P.Winter ~ Area + I(Area^2) + I(Area^3), data = d1) # fitting third order polynomial regression
print(summary(tfit))
mse <- mean(tfit$residuals^2) # calculating MSE
print(mse)
pol3 <- function(x) tfit$coefficients[4] * x^3 + tfit$coefficients[3] * x^2 + tfit$coefficients[2] * x + tfit$coefficients[1]
curve(pol3, from = 40, to = 300, col = "red", add = TRUE) # plotting the model on the scatterplot
#repeating task A for summer in Wellington
#creating the scatterplot:
plot(d1$Area, d1$P.Summer, main = "Scatterplot for summer power consumption\nin Wellington", col = "green",
     xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Summer power consumption")
#repeating task B for summer in Wellington
plot(d1$Area, d1$P.Summer, main = "Linear regression for summer power consumption\nin Wellington", col = "green",
     xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Summer power consumption")
lfit <- lm(P.Summer ~ Area, data = d1) # fitting the linear model
abline(lfit, col = "red") # plotting the linear model
print(summary(lfit))
mse <- mean(lfit$residuals^2) # calculating MSE
print(mse)
#repeating task C for summer in Wellington
plot(d1$Area, d1$P.Summer, main = "Second order polynomial regression for\nsummer power consumption in Wellington",
     col = "green", xlim = c(40, 300), ylim = c(500, 2500),
     xlab = "Area", ylab = "Summer power consumption")
qfit <- lm(P.Summer ~ Area + I(Area^2), data = d1) # fitting second order polynomial regression
print(summary(qfit))
mse <- mean(qfit$residuals^2) # calculating MSE
print(mse)
pol2 <- function(x) qfit$coefficients[3] * x^2 + qfit$coefficients[2] * x + qfit$coefficients[1]
curve(pol2, from = 40, to = 300, col = "red", add = TRUE) # plotting the model on the scatterplot
#repeating task D for summer in Wellington
plot(d1$Area, d1$P.Summer, main = "Third order polynomial regression for\nsummer power consumption in Wellington",
     col = "green", xlim = c(40, 300), ylim = c(500, 2500),
     xlab = "Area", ylab = "Summer power consumption")
tfit <- lm(P.Summer ~ Area + I(Area^2) + I(Area^3), data = d1) # fitting third order polynomial regression
print(summary(tfit))
mse <- mean(tfit$residuals^2) # calculating MSE
print(mse)
pol3 <- function(x) tfit$coefficients[4] * x^3 + tfit$coefficients[3] * x^2 + tfit$coefficients[2] * x + tfit$coefficients[1]
curve(pol3, from = 40, to = 300, col = "red", add = TRUE) # plotting the model on the scatterplot
#repeating task A for Auckland
#creating the scatterplot:
plot(d2$Area, d2$P.Annual, main = "Scatterplot for annual power consumption\nin Auckland", col = "green",
     xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Annual power consumption")
#repeating task B for Auckland
plot(d2$Area, d2$P.Annual, main = "Linear regression for annual power consumption\nin Auckland", col = "green",
     xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Annual power consumption")
lfit <- lm(P.Annual ~ Area, data = d2) # fitting the linear model
abline(lfit, col = "red") # plotting the linear model
print(summary(lfit))
mse <- mean(lfit$residuals^2) # calculating MSE
print(mse)
#repeating task C for Auckland
plot(d2$Area, d2$P.Annual, main = "Second order polynomial regression for\nannual power consumption in Auckland",
     col = "green", xlim = c(40, 300), ylim = c(500, 2500),
     xlab = "Area", ylab = "Annual power consumption")
qfit <- lm(P.Annual ~ Area + I(Area^2), data = d2) # fitting second order polynomial regression
print(summary(qfit))
mse <- mean(qfit$residuals^2) # calculating MSE
print(mse)
pol2 <- function(x) qfit$coefficients[3] * x^2 + qfit$coefficients[2] * x + qfit$coefficients[1]
curve(pol2, from = 40, to = 300, col = "red", add = TRUE) # plotting the model on the scatterplot
#repeating task D for Auckland
plot(d2$Area, d2$P.Annual, main = "Third order polynomial regression for\nannual power consumption in Auckland",
     col = "green", xlim = c(40, 300), ylim = c(500, 2500),
     xlab = "Area", ylab = "Annual power consumption")
tfit <- lm(P.Annual ~ Area + I(Area^2) + I(Area^3), data = d2) # fitting third order polynomial regression
print(summary(tfit))
mse <- mean(tfit$residuals^2) # calculating MSE
print(mse)
pol3 <- function(x) tfit$coefficients[4] * x^3 + tfit$coefficients[3] * x^2 + tfit$coefficients[2] * x + tfit$coefficients[1]
curve(pol3, from = 40, to = 300, col = "red", add = TRUE) # plotting the model on the scatterplot
#repeating task F for Auckland
#repeating task A for winter in Auckland
#creating the scatterplot:
plot(d2$Area, d2$P.Winter, main = "Scatterplot for winter power consumption\nin Auckland", col = "green",
     xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Winter power consumption")
#repeating task B for winter in Auckland
plot(d2$Area, d2$P.Winter, main = "Linear regression for winter power consumption\nin Auckland", col = "green",
     xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Winter power consumption")
lfit <- lm(P.Winter ~ Area, data = d2) # fitting the linear model
abline(lfit, col = "red") # plotting the linear model
print(summary(lfit))
mse <- mean(lfit$residuals^2) # calculating MSE
print(mse)
#repeating task C for winter in Auckland
plot(d2$Area, d2$P.Winter, main = "Second order polynomial regression for\nwinter power consumption in Auckland",
     col = "green", xlim = c(40, 300), ylim = c(500, 2500),
     xlab = "Area", ylab = "Winter power consumption")
qfit <- lm(P.Winter ~ Area + I(Area^2), data = d2) # fitting second order polynomial regression
print(summary(qfit))
mse <- mean(qfit$residuals^2) # calculating MSE
print(mse)
pol2 <- function(x) qfit$coefficients[3] * x^2 + qfit$coefficients[2] * x + qfit$coefficients[1]
curve(pol2, from = 40, to = 300, col = "red", add = TRUE) # plotting the model on the scatterplot
#repeating task D for winter in Auckland
plot(d2$Area, d2$P.Winter, main = "Third order polynomial regression for\nwinter power consumption in Auckland",
     col = "green", xlim = c(40, 300), ylim = c(500, 2500),
     xlab = "Area", ylab = "Winter power consumption")
tfit <- lm(P.Winter ~ Area + I(Area^2) + I(Area^3), data = d2) # fitting third order polynomial regression
print(summary(tfit))
mse <- mean(tfit$residuals^2) # calculating MSE
print(mse)
pol3 <- function(x) tfit$coefficients[4] * x^3 + tfit$coefficients[3] * x^2 + tfit$coefficients[2] * x + tfit$coefficients[1]
curve(pol3, from = 40, to = 300, col = "red", add = TRUE) # plotting the model on the scatterplot
#repeating task A for summer in Auckland
#creating the scatterplot:
plot(d2$Area, d2$P.Summer, main = "Scatterplot for summer power consumption\nin Auckland", col = "green",
     xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Summer power consumption")
#repeating task B for summer in Auckland
plot(d2$Area, d2$P.Summer, main = "Linear regression for summer power consumption\nin Auckland", col = "green",
     xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Summer power consumption")
lfit <- lm(P.Summer ~ Area, data = d2) # fitting the linear model
abline(lfit, col = "red") # plotting the linear model
print(summary(lfit))
mse <- mean(lfit$residuals^2) # calculating MSE
print(mse)
#repeating task C for summer in Auckland
plot(d2$Area, d2$P.Summer, main = "Second order polynomial regression for\nsummer power consumption in Auckland",
     col = "green", xlim = c(40, 300), ylim = c(500, 2500),
     xlab = "Area", ylab = "Summer power consumption")
qfit <- lm(P.Summer ~ Area + I(Area^2), data = d2) # fitting second order polynomial regression
print(summary(qfit))
mse <- mean(qfit$residuals^2) # calculating MSE
print(mse)
pol2 <- function(x) qfit$coefficients[3] * x^2 + qfit$coefficients[2] * x + qfit$coefficients[1]
curve(pol2, from = 40, to = 300, col = "red", add = TRUE) # plotting the model on the scatterplot
#repeating task D for summer in Auckland
plot(d2$Area, d2$P.Summer, main = "Third order polynomial regression for\nsummer power consumption in Auckland",
     col = "green", xlim = c(40, 300), ylim = c(500, 2500),
     xlab = "Area", ylab = "Summer power consumption")
tfit <- lm(P.Summer ~ Area + I(Area^2) + I(Area^3), data = d2) # fitting third order polynomial regression
print(summary(tfit))
mse <- mean(tfit$residuals^2) # calculating MSE
print(mse)
pol3 <- function(x) tfit$coefficients[4] * x^3 + tfit$coefficients[3] * x^2 + tfit$coefficients[2] * x + tfit$coefficients[1]
curve(pol3, from = 40, to = 300, col = "red", add = TRUE) # plotting the model on the scatterplot
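As an alternative to writing `pol2` and `pol3` out by hand from the coefficient vector, the fitted curve can be drawn by evaluating the model with `predict()` on a grid of areas. This is a sketch under assumptions: `fitted_curve` is a hypothetical helper, the data are simulated, and the model must be fitted with a `data =` argument (e.g. `lm(P.Annual ~ Area + I(Area^2), data = d)`), because with a formula written as `d$P.Annual ~ d$Area`, `predict()` cannot substitute the grid and falls back to the original rows.

```r
# Hypothetical helper: evaluates a fitted lm() on an evenly spaced grid of the
# predictor, giving points that lines() can draw as the fitted curve.
fitted_curve <- function(fit, predictor = "Area", from = 40, to = 300, n = 200) {
  grid <- data.frame(seq(from, to, length.out = n))
  names(grid) <- predictor
  data.frame(x = grid[[predictor]], y = unname(predict(fit, newdata = grid)))
}

# Simulated example data (not the assignment data), for illustration only.
set.seed(7)
toy <- data.frame(Area = runif(60, 40, 300))
toy$P.Annual <- 600 + 3 * toy$Area + 0.005 * toy$Area^2 + rnorm(60, sd = 80)

# Fitted with data = toy so that predict() can use the new grid.
tfit_toy <- lm(P.Annual ~ Area + I(Area^2) + I(Area^3), data = toy)
cv <- fitted_curve(tfit_toy)

plot(toy$Area, toy$P.Annual, col = "green", xlim = c(40, 300),
     xlab = "Area", ylab = "Annual power consumption",
     main = "Third order polynomial regression (toy data)")
lines(cv$x, cv$y, col = "red")
```

The same helper works unchanged for the linear and quadratic models, since it never indexes individual coefficients.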