# Data Science Assignment

**Introduction**


For this assignment, you need to create an R script file that interacts with the dataset accompanying this assignment. The file containing the data can be downloaded from “Data Set 6.csv”. The data set shows the monthly power consumption of 500 residential houses and contains the following variables.

1. “Area”: The area of the house in sqm
2. “City”: The city in which the house is located
3. “P.Winter”: The average monthly power consumption of the house in winter, in kWh
4. “P.Summer”: The average monthly power consumption of the house in summer, in kWh

**Part 1 – Data Cleaning and Transformation **

- A) Write a data cleaning function that makes the data set ready for further analysis. This function may perform various data cleaning tasks, including but not limited to:
  - Correcting possible typos
  - Removing irrelevant data (only houses in Auckland and Wellington are considered)
  - Removing outliers, e.g. negative area, negative power consumption, very high areas, very high power consumption

  Note: You should not clean the data set manually. All the data cleaning tasks should be carried out automatically by the data cleaning function.
- B) Write a function that calculates the annual average power consumption from “P.Winter” and “P.Summer” (add “P.Winter” and “P.Summer” and divide the result by two). Using this function, create a new variable named “P.Annual” and add it to the dataset.

**Part 2 – Univariate Analysis**

- A) Write R code that calculates the mean and standard deviation of the annual, winter and summer power consumption. Show the results in your report in a table.
- B) Write R code that plots the density function of the annual, winter and summer power consumption. Use appropriate labels for the plots. Use the same scale for the plots. Add the plots to your report.
- C) Write R code that creates boxplots for the annual, winter and summer power consumption. Use appropriate labels for the plots. Use the same scale for the plots. Add the plots to your report.
- D) Write R code that divides the data set into two subsets based on the values of the “City” variable.
- E) Write R code that repeats tasks A, B and C for the two subsets.
- F) Compare the results obtained from the above tasks and comment on the power consumption of Auckland and Wellington residential houses during winter and summer.

**Part 3 – Bivariate Analysis**

- A) Write R code that creates a scatterplot of the “P.Annual” and “Area” variables. Use appropriate labels for the plots. Use the same scale for the plots. Add the plots to your report.
- B) Write R code that fits a linear regression model for the “P.Annual” and “Area” variables. Show the linear model in the scatterplot. Calculate the MSE (mean squared error) of the model.
- C) Write R code that fits a second-order polynomial regression model for the “P.Annual” and “Area” variables. Show the model in the scatterplot. Calculate the MSE of the model.
- D) Write R code that fits a third-order polynomial regression model for the “P.Annual” and “Area” variables. Show the model in the scatterplot. Calculate the MSE of the model.
- E) Comment on the three MSE values obtained in the previous tasks. Which regression model has the highest accuracy?
- F) Repeat tasks A-E for “P.Winter” and “P.Summer”.
- G) Repeat tasks A-F for the Auckland and Wellington subsets.

**Solution**

**Rpro_28.10.2017_Keshav.R**

# For this code to run, the data file "Data Set 6.csv" must be saved in your R working directory.

data <- read.csv("Data Set 6.csv") #importing the dataset

data <- data.frame(data) #making sure the dataset is a data frame

#Q1A

data_cleaning <- function(data) { #function for data cleaning

  d1 <- data[data$City == "Wellington", ] #keeping data for Wellington city

  d2 <- data[data$City == "Auckland ", ] #keeping data for Auckland city (note the trailing space in "Auckland ")

  d <- rbind(d1, d2)

  d <- d[d$Area > 0 & d$P.Winter >= 0 & d$P.Summer >= 0, ] #removing negative area and negative power consumption values

  s.winter <- summary(d$P.Winter) #element 2 is the 1st quartile, element 5 is the 3rd quartile

  s.summer <- summary(d$P.Summer)

  d <- d[d$P.Winter <= s.winter[5] + (s.winter[5] - s.winter[2]), ] #removing very high winter power consumption values (above Q3 + IQR)

  d <- d[d$P.Summer <= s.summer[5] + (s.summer[5] - s.summer[2]), ] #removing very high summer power consumption values (above Q3 + IQR)

  return(d)

}

d <- data_cleaning(data)
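The cleaning rules above can be tried on a small made-up data frame to see which rows each rule drops. This is only a sketch under stated assumptions: the toy values, the helper name `clean_sketch`, and the use of `trimws()` to normalise city names are not part of the original script. The upper cut-off mirrors the Q3 + IQR rule used in `data_cleaning`, computed here with `quantile()`.

```r
# Toy data (hypothetical values, not taken from "Data Set 6.csv")
toy <- data.frame(
  Area     = c(120, 95, -40, 150, 110, 130, 100, 140),
  City     = c("Auckland ", "Wellington", "Auckland", "Hamilton",
               "Wellington", "Auckland", "Wellington", "Auckland"),
  P.Winter = c(1500, 1400, 1300, 1600, 1450, 9000, 1350, 1550),
  P.Summer = c(1000, 950, 900, 1100, 980, 1050, -5, 1020),
  stringsAsFactors = FALSE
)

# Hypothetical helper: same idea as data_cleaning(), written step by step
clean_sketch <- function(df) {
  df$City <- trimws(df$City)                            # "Auckland " becomes "Auckland"
  df <- df[df$City %in% c("Auckland", "Wellington"), ]  # keep the two cities only
  df <- df[df$Area > 0 & df$P.Winter >= 0 & df$P.Summer >= 0, ]  # drop negative values
  for (v in c("P.Winter", "P.Summer")) {
    q <- quantile(df[[v]], c(0.25, 0.75))
    upper <- unname(q[2] + (q[2] - q[1]))               # Q3 + IQR
    df <- df[df[[v]] <= upper, ]                        # drop very high values
  }
  df
}

cleaned <- clean_sketch(toy)
print(cleaned)
```

In this toy example the Hamilton row, the row with a negative area, the row with a negative summer value and the 9000 kWh winter row are removed, leaving four rows. Note that, like the original function, the sketch does not filter very high areas; the same loop could be extended to `Area` if that is wanted.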

#Q1B

annual <- function(d) { #a function that calculates the annual average power consumption

  P.Annual <- (d$P.Winter + d$P.Summer) / 2 #calculating annual power consumption

  d <- cbind(d, P.Annual) #adding the new variable "P.Annual" to the dataset

  return(d)

}

d <- annual(d)

#Q2A

P.Winter <- c(mean(d$P.Winter), sd(d$P.Winter)) #mean and standard deviation of winter power consumption

P.Summer <- c(mean(d$P.Summer), sd(d$P.Summer)) #mean and standard deviation of summer power consumption

P.Annual <- c(mean(d$P.Annual), sd(d$P.Annual)) #mean and standard deviation of annual power consumption

result <- rbind(P.Winter, P.Summer, P.Annual)

print(result)
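The matrix printed above has no column names, so a reader cannot tell the mean from the standard deviation. A possible way to build a labelled table for the report is sketched below; the toy values and the name `summary_table` are assumptions for illustration.

```r
# Toy stand-in for the cleaned data set (hypothetical values)
toy <- data.frame(
  P.Winter = c(1500, 1400, 1450, 1550),
  P.Summer = c(1000, 950, 980, 1020)
)
toy$P.Annual <- (toy$P.Winter + toy$P.Summer) / 2

vars <- c("P.Annual", "P.Winter", "P.Summer")

# One row per variable, one named column per statistic
summary_table <- t(sapply(toy[vars], function(x) c(Mean = mean(x), SD = sd(x))))
print(round(summary_table, 2))
```

The same call works unchanged on a city subset, which keeps the overall, Wellington and Auckland tables in the same layout.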

#Q2B

plot(density(d$P.Annual), xlim = c(500, 2500), ylim = c(0, 0.0025),
  main = "Density function of annual power consumption",
  xlab = "Annual power consumption", ylab = "Density")

plot(density(d$P.Winter), xlim = c(500, 2500), ylim = c(0, 0.0025),
  main = "Density function of winter power consumption",
  xlab = "Winter power consumption", ylab = "Density")

plot(density(d$P.Summer), xlim = c(500, 2500), ylim = c(0, 0.0025),
  main = "Density function of summer power consumption",
  xlab = "Summer power consumption", ylab = "Density")
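The axis limits in the density plots are hard-coded, which works only if the cleaned data happen to fall inside them. One possible alternative is to derive a common scale from the density estimates themselves, as in the sketch below. The synthetic data and the names `dens`, `x_range` and `y_max` are assumptions for illustration.

```r
set.seed(1)  # synthetic data, for illustration only
toy <- data.frame(
  P.Winter = rnorm(200, mean = 1600, sd = 250),
  P.Summer = rnorm(200, mean = 1100, sd = 200)
)
toy$P.Annual <- (toy$P.Winter + toy$P.Summer) / 2

vars <- c("P.Annual", "P.Winter", "P.Summer")
dens <- lapply(toy[vars], density)  # one kernel density estimate per variable

# Common axis limits that cover all three curves
x_range <- range(sapply(dens, function(d) range(d$x)))
y_max   <- max(sapply(dens, function(d) max(d$y)))

for (v in vars) {
  plot(dens[[v]], xlim = x_range, ylim = c(0, y_max),
       main = paste("Density of", v),
       xlab = "Power consumption (kWh)", ylab = "Density")
}
```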

#Q2C

boxplot(d$P.Annual, main = "Boxplot of annual power consumption", ylim = c(500, 2500))

boxplot(d$P.Winter, main = "Boxplot of winter power consumption", ylim = c(500, 2500))

boxplot(d$P.Summer, main = "Boxplot of summer power consumption", ylim = c(500, 2500))

#Q2D

d1 <- subset(d, d$City == "Wellington") #subsetting the dataset with city Wellington

d2 <- subset(d, d$City == "Auckland ") #subsetting the dataset with city Auckland (note the trailing space in "Auckland ")
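As a side note, base R's `split()` produces all city subsets in one call, which avoids typing each city name by hand. The sketch below uses a made-up three-row data frame; the values and the name `by_city` are assumptions.

```r
# Toy data (hypothetical values)
toy <- data.frame(
  City     = c("Auckland", "Wellington", "Auckland"),
  P.Annual = c(1200, 1400, 1300)
)

by_city <- split(toy, toy$City)  # named list with one data frame per city
city_means <- sapply(by_city, function(s) mean(s$P.Annual))
print(city_means)
```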

#Q2E

#repeating task A for the city Wellington

P.Winter <- c(mean(d1$P.Winter), sd(d1$P.Winter)) #mean and standard deviation of winter power consumption

P.Summer <- c(mean(d1$P.Summer), sd(d1$P.Summer)) #mean and standard deviation of summer power consumption

P.Annual <- c(mean(d1$P.Annual), sd(d1$P.Annual)) #mean and standard deviation of annual power consumption

result <- rbind(P.Winter, P.Summer, P.Annual)

print(result)

#repeating task B for the city Wellington

plot(density(d1$P.Annual), xlim = c(500, 2500), ylim = c(0, 0.0025),
  main = "Density function of annual power consumption\nin Wellington",
  xlab = "Annual power consumption", ylab = "Density")

plot(density(d1$P.Winter), xlim = c(500, 2500), ylim = c(0, 0.0025),
  main = "Density function of winter power consumption\nin Wellington",
  xlab = "Winter power consumption", ylab = "Density")

plot(density(d1$P.Summer), xlim = c(500, 2500), ylim = c(0, 0.0025),
  main = "Density function of summer power consumption\nin Wellington",
  xlab = "Summer power consumption", ylab = "Density")

#repeating task C for the city Wellington

boxplot(d1$P.Annual, main = "Boxplot of annual power consumption in Wellington", ylim = c(500, 2500))

boxplot(d1$P.Winter, main = "Boxplot of winter power consumption in Wellington", ylim = c(500, 2500))

boxplot(d1$P.Summer, main = "Boxplot of summer power consumption in Wellington", ylim = c(500, 2500))

#repeating task A for the city Auckland

P.Winter <- c(mean(d2$P.Winter), sd(d2$P.Winter)) #mean and standard deviation of winter power consumption

P.Summer <- c(mean(d2$P.Summer), sd(d2$P.Summer)) #mean and standard deviation of summer power consumption

P.Annual <- c(mean(d2$P.Annual), sd(d2$P.Annual)) #mean and standard deviation of annual power consumption

result <- rbind(P.Winter, P.Summer, P.Annual)

print(result)

#repeating task B for the city Auckland

plot(density(d2$P.Annual), xlim = c(500, 2500), ylim = c(0, 0.0025),
  main = "Density function of annual power consumption\nin Auckland",
  xlab = "Annual power consumption", ylab = "Density")

plot(density(d2$P.Winter), xlim = c(500, 2500), ylim = c(0, 0.0025),
  main = "Density function of winter power consumption\nin Auckland",
  xlab = "Winter power consumption", ylab = "Density")

plot(density(d2$P.Summer), xlim = c(500, 2500), ylim = c(0, 0.0025),
  main = "Density function of summer power consumption\nin Auckland",
  xlab = "Summer power consumption", ylab = "Density")

#repeating task C for the city Auckland

boxplot(d2$P.Annual, main = "Boxplot of annual power consumption in Auckland", ylim = c(500, 2500))

boxplot(d2$P.Winter, main = "Boxplot of winter power consumption in Auckland", ylim = c(500, 2500))

boxplot(d2$P.Summer, main = "Boxplot of summer power consumption in Auckland", ylim = c(500, 2500))
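For the city comparison in task F, it can help to put both cities in the same panel rather than in separate figures. The sketch below uses the formula interface of `boxplot()` on synthetic data; the values and the names `y_range`, `bp_winter` and `bp_summer` are assumptions for illustration.

```r
set.seed(2)  # synthetic data, for illustration only
toy <- data.frame(
  City     = rep(c("Auckland", "Wellington"), each = 50),
  P.Winter = c(rnorm(50, 1500, 200), rnorm(50, 1700, 200)),
  P.Summer = c(rnorm(50, 1100, 150), rnorm(50, 1000, 150))
)

# One panel per season, one box per city, same y-axis in both panels
y_range <- range(toy$P.Winter, toy$P.Summer)
op <- par(mfrow = c(1, 2))
bp_winter <- boxplot(P.Winter ~ City, data = toy, ylim = y_range,
                     main = "Winter", ylab = "Power consumption (kWh)")
bp_summer <- boxplot(P.Summer ~ City, data = toy, ylim = y_range,
                     main = "Summer", ylab = "Power consumption (kWh)")
par(op)  # restore the previous plotting layout
```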

#Q3A

#creating the scatterplot:

plot(d$Area, d$P.Annual, main = "Scatterplot for annual power consumption", col = "green",
  xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Annual power consumption")

#Q3B

plot(d$Area, d$P.Annual, main = "Linear regression for annual power consumption", col = "green",
  xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Annual power consumption")

lfit <- lm(d$P.Annual ~ d$Area) #fitting the linear model

abline(lfit, col = "red") #plotting the linear model

print(summary(lfit))

mse <- mean(lfit$residuals^2) #calculating MSE

print(mse)

#Q3C

plot(d$Area, d$P.Annual, main = "Second order polynomial regression for\nannual power consumption",
  col = "green", xlim = c(40, 300), ylim = c(500, 2500),
  xlab = "Area", ylab = "Annual power consumption")

qfit <- lm(formula = d$P.Annual ~ d$Area + I(d$Area ^ 2)) #fitting second order polynomial regression

print(summary(qfit))

mse <- mean(qfit$residuals^2) #calculating MSE

print(mse)

pol2 <- function(x) (qfit$coefficients[3])*(x^2) + (qfit$coefficients[2])*x + qfit$coefficients[1]

curve(pol2, from = 40, to = 300, col = "red", add = TRUE) #plotting the model on the scatterplot

#Q3D

plot(d$Area, d$P.Annual, main = "Third order polynomial regression for\nannual power consumption",
  col = "green", xlim = c(40, 300), ylim = c(500, 2500),
  xlab = "Area", ylab = "Annual power consumption")

tfit <- lm(formula = d$P.Annual ~ d$Area + I(d$Area ^ 2) + I(d$Area ^ 3)) #fitting third order polynomial regression

print(summary(tfit))

mse <- mean(tfit$residuals^2) #calculating MSE

print(mse)

pol3 <- function(x) (tfit$coefficients[4])*(x^3) + (tfit$coefficients[3])*(x^2) + (tfit$coefficients[2])*x + tfit$coefficients[1]

curve(pol3, from = 40, to = 300, col = "red", add = TRUE) #plotting the model on the scatterplot
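For task E, the three MSE values are easier to compare when they are collected side by side. The sketch below does this on synthetic data, using `poly(..., raw = TRUE)` and `predict()` in place of the hand-written polynomial functions. The data-generating formula and all names (`toy`, `fits`, `mse`, `grid`) are assumptions for illustration, not results from the assignment data.

```r
set.seed(3)  # synthetic data, for illustration only
area  <- runif(150, 40, 300)
power <- 600 + 5 * area + rnorm(150, sd = 120)  # assumed linear trend plus noise
toy   <- data.frame(Area = area, P.Annual = power)

# Degree 1, 2 and 3 fits; each model contains the previous one as a special case
fits <- list(
  lm(P.Annual ~ Area, data = toy),
  lm(P.Annual ~ poly(Area, 2, raw = TRUE), data = toy),
  lm(P.Annual ~ poly(Area, 3, raw = TRUE), data = toy)
)
mse <- sapply(fits, function(f) mean(residuals(f)^2))
names(mse) <- paste0("degree", 1:3)
print(mse)

# Draw the fitted curves with predict() on a grid of areas
grid <- data.frame(Area = seq(40, 300, length.out = 200))
plot(toy$Area, toy$P.Annual, xlab = "Area", ylab = "Annual power consumption")
for (k in 1:3) lines(grid$Area, predict(fits[[k]], newdata = grid), col = k + 1)
```

Because the models are nested, the MSE computed on the fitting data cannot increase as the degree goes up. The third-order model will therefore show the lowest (or equal) MSE here, which on its own does not show that it predicts new houses better; the size of the improvement is the more informative quantity.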

#Q3F

#repeating task A for winter

#creating the scatterplot:

plot(d$Area, d$P.Winter, main = "Scatterplot for winter power consumption", col = "green",
  xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Winter power consumption")

#repeating task B for winter

plot(d$Area, d$P.Winter, main = "Linear regression for winter power consumption", col = "green",
  xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Winter power consumption")

lfit <- lm(d$P.Winter ~ d$Area) #fitting the linear model

abline(lfit, col = "red") #plotting the linear model

print(summary(lfit))

mse <- mean(lfit$residuals^2) #calculating MSE

print(mse)

#repeating task C for winter

plot(d$Area, d$P.Winter, main = "Second order polynomial regression for\nwinter power consumption",
  col = "green", xlim = c(40, 300), ylim = c(500, 2500),
  xlab = "Area", ylab = "Winter power consumption")

qfit <- lm(formula = d$P.Winter ~ d$Area + I(d$Area ^ 2)) #fitting second order polynomial regression

print(summary(qfit))

mse <- mean(qfit$residuals^2) #calculating MSE

print(mse)

pol2 <- function(x) (qfit$coefficients[3])*(x^2) + (qfit$coefficients[2])*x + qfit$coefficients[1]

curve(pol2, from = 40, to = 300, col = "red", add = TRUE) #plotting the model on the scatterplot

#repeating task D for winter

plot(d$Area, d$P.Winter, main = "Third order polynomial regression for\nwinter power consumption",
  col = "green", xlim = c(40, 300), ylim = c(500, 2500),
  xlab = "Area", ylab = "Winter power consumption")

tfit <- lm(formula = d$P.Winter ~ d$Area + I(d$Area ^ 2) + I(d$Area ^ 3)) #fitting third order polynomial regression

print(summary(tfit))

mse <- mean(tfit$residuals^2) #calculating MSE

print(mse)

pol3 <- function(x) (tfit$coefficients[4])*(x^3) + (tfit$coefficients[3])*(x^2) + (tfit$coefficients[2])*x + tfit$coefficients[1]

curve(pol3, from = 40, to = 300, col = "red", add = TRUE) #plotting the model on the scatterplot

#repeating task A for summer

#creating the scatterplot:

plot(d$Area, d$P.Summer, main = "Scatterplot for summer power consumption", col = "green",
  xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Summer power consumption")

#repeating task B for summer

plot(d$Area, d$P.Summer, main = "Linear regression for summer power consumption", col = "green",
  xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Summer power consumption")

lfit <- lm(d$P.Summer ~ d$Area) #fitting the linear model

abline(lfit, col = "red") #plotting the linear model

print(summary(lfit))

mse <- mean(lfit$residuals^2) #calculating MSE

print(mse)

#repeating task C for summer

plot(d$Area, d$P.Summer, main = "Second order polynomial regression for\nsummer power consumption",
  col = "green", xlim = c(40, 300), ylim = c(500, 2500),
  xlab = "Area", ylab = "Summer power consumption")

qfit <- lm(formula = d$P.Summer ~ d$Area + I(d$Area ^ 2)) #fitting second order polynomial regression

print(summary(qfit))

mse <- mean(qfit$residuals^2) #calculating MSE

print(mse)

pol2 <- function(x) (qfit$coefficients[3])*(x^2) + (qfit$coefficients[2])*x + qfit$coefficients[1]

curve(pol2, from = 40, to = 300, col = "red", add = TRUE) #plotting the model on the scatterplot

#repeating task D for summer

plot(d$Area, d$P.Summer, main = "Third order polynomial regression for\nsummer power consumption",
  col = "green", xlim = c(40, 300), ylim = c(500, 2500),
  xlab = "Area", ylab = "Summer power consumption")

tfit <- lm(formula = d$P.Summer ~ d$Area + I(d$Area ^ 2) + I(d$Area ^ 3)) #fitting third order polynomial regression

print(summary(tfit))

mse <- mean(tfit$residuals^2) #calculating MSE

print(mse)

pol3 <- function(x) (tfit$coefficients[4])*(x^3) + (tfit$coefficients[3])*(x^2) + (tfit$coefficients[2])*x + tfit$coefficients[1]

curve(pol3, from = 40, to = 300, col = "red", add = TRUE) #plotting the model on the scatterplot

#Q3G

#repeating task A for Wellington

#creating the scatterplot:

plot(d1$Area, d1$P.Annual, main = "Scatterplot for annual power consumption\nin Wellington",
  col = "green", xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Annual power consumption")

#repeating task B for Wellington

plot(d1$Area, d1$P.Annual, main = "Linear regression for annual power consumption\nin Wellington",
  col = "green", xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Annual power consumption")

lfit <- lm(d1$P.Annual ~ d1$Area) #fitting the linear model

abline(lfit, col = "red") #plotting the linear model

print(summary(lfit))

mse <- mean(lfit$residuals^2) #calculating MSE

print(mse)

#repeating task C for Wellington

plot(d1$Area, d1$P.Annual, main = "Second order polynomial regression for\nannual power consumption in Wellington",
  col = "green", xlim = c(40, 300), ylim = c(500, 2500),
  xlab = "Area", ylab = "Annual power consumption")

qfit <- lm(formula = d1$P.Annual ~ d1$Area + I(d1$Area ^ 2)) #fitting second order polynomial regression

print(summary(qfit))

mse <- mean(qfit$residuals^2) #calculating MSE

print(mse)

pol2 <- function(x) (qfit$coefficients[3])*(x^2) + (qfit$coefficients[2])*x + qfit$coefficients[1]

curve(pol2, from = 40, to = 300, col = "red", add = TRUE) #plotting the model on the scatterplot

#repeating task D for Wellington

plot(d1$Area, d1$P.Annual, main = "Third order polynomial regression for\nannual power consumption in Wellington",
  col = "green", xlim = c(40, 300), ylim = c(500, 2500),
  xlab = "Area", ylab = "Annual power consumption")

tfit <- lm(formula = d1$P.Annual ~ d1$Area + I(d1$Area ^ 2) + I(d1$Area ^ 3)) #fitting third order polynomial regression

print(summary(tfit))

mse <- mean(tfit$residuals^2) #calculating MSE

print(mse)

pol3 <- function(x) (tfit$coefficients[4])*(x^3) + (tfit$coefficients[3])*(x^2) + (tfit$coefficients[2])*x + tfit$coefficients[1] #defining the cubic from this fit so the curve matches the model above

curve(pol3, from = 40, to = 300, col = "red", add = TRUE) #plotting the model on the scatterplot

#repeating task F for Wellington

#repeating task A for winter in Wellington

#creating the scatterplot:

plot(d1$Area, d1$P.Winter, main = "Scatterplot for winter power consumption\nin Wellington",
  col = "green", xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Winter power consumption")

#repeating task B for winter in Wellington

plot(d1$Area, d1$P.Winter, main = "Linear regression for winter power consumption\nin Wellington",
  col = "green", xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Winter power consumption")

lfit <- lm(d1$P.Winter ~ d1$Area) #fitting the linear model

abline(lfit, col = "red") #plotting the linear model

print(summary(lfit))

mse <- mean(lfit$residuals^2) #calculating MSE

print(mse)

#repeating task C for winter in Wellington

plot(d1$Area, d1$P.Winter, main = "Second order polynomial regression for\nwinter power consumption in Wellington",
  col = "green", xlim = c(40, 300), ylim = c(500, 2500),
  xlab = "Area", ylab = "Winter power consumption")

qfit <- lm(formula = d1$P.Winter ~ d1$Area + I(d1$Area ^ 2)) #fitting second order polynomial regression

print(summary(qfit))

mse <- mean(qfit$residuals^2) #calculating MSE

print(mse)

pol2 <- function(x) (qfit$coefficients[3])*(x^2) + (qfit$coefficients[2])*x + qfit$coefficients[1]

curve(pol2, from = 40, to = 300, col = "red", add = TRUE) #plotting the model on the scatterplot

#repeating task D for winter in Wellington

plot(d1$Area, d1$P.Winter, main = "Third order polynomial regression for\nwinter power consumption in Wellington",
  col = "green", xlim = c(40, 300), ylim = c(500, 2500),
  xlab = "Area", ylab = "Winter power consumption")

tfit <- lm(formula = d1$P.Winter ~ d1$Area + I(d1$Area ^ 2) + I(d1$Area ^ 3)) #fitting third order polynomial regression

print(summary(tfit))

mse <- mean(tfit$residuals^2) #calculating MSE

print(mse)

pol3 <- function(x) (tfit$coefficients[4])*(x^3) + (tfit$coefficients[3])*(x^2) + (tfit$coefficients[2])*x + tfit$coefficients[1] #defining the cubic from this fit so the curve matches the model above

curve(pol3, from = 40, to = 300, col = "red", add = TRUE) #plotting the model on the scatterplot

#repeating task A for summer in Wellington

#creating the scatterplot:

plot(d1$Area, d1$P.Summer, main = "Scatterplot for summer power consumption\nin Wellington",
  col = "green", xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Summer power consumption")

#repeating task B for summer in Wellington

plot(d1$Area, d1$P.Summer, main = "Linear regression for summer power consumption\nin Wellington",
  col = "green", xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Summer power consumption")

lfit <- lm(d1$P.Summer ~ d1$Area) #fitting the linear model

abline(lfit, col = "red") #plotting the linear model

print(summary(lfit))

mse <- mean(lfit$residuals^2) #calculating MSE

print(mse)

#repeating task C for summer in Wellington

plot(d1$Area, d1$P.Summer, main = "Second order polynomial regression for\nsummer power consumption in Wellington",
  col = "green", xlim = c(40, 300), ylim = c(500, 2500),
  xlab = "Area", ylab = "Summer power consumption")

qfit <- lm(formula = d1$P.Summer ~ d1$Area + I(d1$Area ^ 2)) #fitting second order polynomial regression

print(summary(qfit))

mse <- mean(qfit$residuals^2) #calculating MSE

print(mse)

pol2 <- function(x) (qfit$coefficients[3])*(x^2) + (qfit$coefficients[2])*x + qfit$coefficients[1]

curve(pol2, from = 40, to = 300, col = "red", add = TRUE) #plotting the model on the scatterplot

#repeating task D for summer in Wellington

plot(d1$Area, d1$P.Summer, main = "Third order polynomial regression for\nsummer power consumption in Wellington",
  col = "green", xlim = c(40, 300), ylim = c(500, 2500),
  xlab = "Area", ylab = "Summer power consumption")

tfit <- lm(formula = d1$P.Summer ~ d1$Area + I(d1$Area ^ 2) + I(d1$Area ^ 3)) #fitting third order polynomial regression

print(summary(tfit))

mse <- mean(tfit$residuals^2) #calculating MSE

print(mse)

pol3 <- function(x) (tfit$coefficients[4])*(x^3) + (tfit$coefficients[3])*(x^2) + (tfit$coefficients[2])*x + tfit$coefficients[1] #defining the cubic from this fit so the curve matches the model above

curve(pol3, from = 40, to = 300, col = "red", add = TRUE) #plotting the model on the scatterplot

#repeating task A for Auckland

#creating the scatterplot:

plot(d2$Area, d2$P.Annual, main = "Scatterplot for annual power consumption\nin Auckland",
  col = "green", xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Annual power consumption")

#repeating task B for Auckland

plot(d2$Area, d2$P.Annual, main = "Linear regression for annual power consumption\nin Auckland",
  col = "green", xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Annual power consumption")

lfit <- lm(d2$P.Annual ~ d2$Area) #fitting the linear model

abline(lfit, col = "red") #plotting the linear model

print(summary(lfit))

mse <- mean(lfit$residuals^2) #calculating MSE

print(mse)

#repeating task C for Auckland

plot(d2$Area, d2$P.Annual, main = "Second order polynomial regression for\nannual power consumption in Auckland",
  col = "green", xlim = c(40, 300), ylim = c(500, 2500),
  xlab = "Area", ylab = "Annual power consumption")

qfit <- lm(formula = d2$P.Annual ~ d2$Area + I(d2$Area ^ 2)) #fitting second order polynomial regression

print(summary(qfit))

mse <- mean(qfit$residuals^2) #calculating MSE

print(mse)

pol2 <- function(x) (qfit$coefficients[3])*(x^2) + (qfit$coefficients[2])*x + qfit$coefficients[1]

curve(pol2, from = 40, to = 300, col = "red", add = TRUE) #plotting the model on the scatterplot

#repeating task D for Auckland

plot(d2$Area, d2$P.Annual, main = "Third order polynomial regression for\nannual power consumption in Auckland",
  col = "green", xlim = c(40, 300), ylim = c(500, 2500),
  xlab = "Area", ylab = "Annual power consumption")

tfit <- lm(formula = d2$P.Annual ~ d2$Area + I(d2$Area ^ 2) + I(d2$Area ^ 3)) #fitting third order polynomial regression

print(summary(tfit))

mse <- mean(tfit$residuals^2) #calculating MSE

print(mse)

pol3 <- function(x) (tfit$coefficients[4])*(x^3) + (tfit$coefficients[3])*(x^2) + (tfit$coefficients[2])*x + tfit$coefficients[1] #defining the cubic from this fit so the curve matches the model above

curve(pol3, from = 40, to = 300, col = "red", add = TRUE) #plotting the model on the scatterplot

#repeating task F for Auckland

#repeating task A for winter in Auckland

#creating the scatterplot:

plot(d2$Area, d2$P.Winter, main = "Scatterplot for winter power consumption\nin Auckland",
  col = "green", xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Winter power consumption")

#repeating task B for winter in Auckland

plot(d2$Area, d2$P.Winter, main = "Linear regression for winter power consumption\nin Auckland",
  col = "green", xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Winter power consumption")

lfit <- lm(d2$P.Winter ~ d2$Area) #fitting the linear model

abline(lfit, col = "red") #plotting the linear model

print(summary(lfit))

mse <- mean(lfit$residuals^2) #calculating MSE

print(mse)

#repeating task C for winter in Auckland

plot(d2$Area, d2$P.Winter, main = "Second order polynomial regression for\nwinter power consumption in Auckland",
  col = "green", xlim = c(40, 300), ylim = c(500, 2500),
  xlab = "Area", ylab = "Winter power consumption")

qfit <- lm(formula = d2$P.Winter ~ d2$Area + I(d2$Area ^ 2)) #fitting second order polynomial regression

print(summary(qfit))

mse <- mean(qfit$residuals^2) #calculating MSE

print(mse)

pol2 <- function(x) (qfit$coefficients[3])*(x^2) + (qfit$coefficients[2])*x + qfit$coefficients[1]

curve(pol2, from = 40, to = 300, col = "red", add = TRUE) #plotting the model on the scatterplot

#repeating task D for winter in Auckland

plot(d2$Area, d2$P.Winter, main = "Third order polynomial regression for\nwinter power consumption in Auckland",
  col = "green", xlim = c(40, 300), ylim = c(500, 2500),
  xlab = "Area", ylab = "Winter power consumption")

tfit <- lm(formula = d2$P.Winter ~ d2$Area + I(d2$Area ^ 2) + I(d2$Area ^ 3)) #fitting third order polynomial regression

print(summary(tfit))

mse <- mean(tfit$residuals^2) #calculating MSE

print(mse)

pol3 <- function(x) (tfit$coefficients[4])*(x^3) + (tfit$coefficients[3])*(x^2) + (tfit$coefficients[2])*x + tfit$coefficients[1] #defining the cubic from this fit so the curve matches the model above

curve(pol3, from = 40, to = 300, col = "red", add = TRUE) #plotting the model on the scatterplot

#repeating task A for summer in Auckland

#creating the scatterplot:

plot(d2$Area, d2$P.Summer, main = "Scatterplot for summer power consumption\nin Auckland",
  col = "green", xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Summer power consumption")

#repeating task B for summer in Auckland

plot(d2$Area, d2$P.Summer, main = "Linear regression for summer power consumption\nin Auckland",
  col = "green", xlim = c(40, 300), ylim = c(500, 2500), xlab = "Area", ylab = "Summer power consumption")

lfit <- lm(d2$P.Summer ~ d2$Area) #fitting the linear model

abline(lfit, col = "red") #plotting the linear model

print(summary(lfit))

mse <- mean(lfit$residuals^2) #calculating MSE

print(mse)

#repeating task C for summer in Auckland

plot(d2$Area, d2$P.Summer, main = "Second order polynomial regression for\nsummer power consumption in Auckland",
  col = "green", xlim = c(40, 300), ylim = c(500, 2500),
  xlab = "Area", ylab = "Summer power consumption")

qfit <- lm(formula = d2$P.Summer ~ d2$Area + I(d2$Area ^ 2)) #fitting second order polynomial regression

print(summary(qfit))

mse <- mean(qfit$residuals^2) #calculating MSE

print(mse)

pol2 <- function(x) (qfit$coefficients[3])*(x^2) + (qfit$coefficients[2])*x + qfit$coefficients[1]

curve(pol2, from = 40, to = 300, col = "red", add = TRUE) #plotting the model on the scatterplot

#repeating task D for summer in Auckland

plot(d2$Area, d2$P.Summer, main = "Third order polynomial regression for\nsummer power consumption in Auckland",
  col = "green", xlim = c(40, 300), ylim = c(500, 2500),
  xlab = "Area", ylab = "Summer power consumption")

tfit <- lm(formula = d2$P.Summer ~ d2$Area + I(d2$Area ^ 2) + I(d2$Area ^ 3)) #fitting third order polynomial regression

print(summary(tfit))

mse <- mean(tfit$residuals^2) #calculating MSE

print(mse)

pol3 <- function(x) (tfit$coefficients[4])*(x^3) + (tfit$coefficients[3])*(x^2) + (tfit$coefficients[2])*x + tfit$coefficients[1] #defining the cubic from this fit so the curve matches the model above

curve(pol3, from = 40, to = 300, col = "red", add = TRUE) #plotting the model on the scatterplot
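The script repeats the same plot, fit and MSE steps for every combination of data set, response and degree, which makes it easy to draw a curve from the wrong model. One possible way to reduce the repetition is a small helper, sketched below on synthetic data. The helper name `fit_and_plot`, its arguments and the toy data are assumptions for illustration and are not part of the original script.

```r
# Hypothetical helper: fit a polynomial of the given degree, plot it, return the MSE
fit_and_plot <- function(df, response, degree, title) {
  form <- as.formula(paste0(response, " ~ poly(Area, ", degree, ", raw = TRUE)"))
  fit  <- lm(form, data = df)
  plot(df$Area, df[[response]], main = title, col = "green",
       xlab = "Area", ylab = response)
  # The curve always comes from the model fitted just above
  grid <- data.frame(Area = seq(min(df$Area), max(df$Area), length.out = 200))
  lines(grid$Area, predict(fit, newdata = grid), col = "red")
  mean(residuals(fit)^2)
}

set.seed(4)  # synthetic data, for illustration only
toy <- data.frame(Area = runif(120, 40, 300))
toy$P.Winter <- 800 + 6 * toy$Area + rnorm(120, sd = 150)
toy$P.Summer <- 500 + 4 * toy$Area + rnorm(120, sd = 100)

# One row per (response, degree) combination, with its MSE
results <- expand.grid(response = c("P.Winter", "P.Summer"), degree = 1:3,
                       stringsAsFactors = FALSE)
results$mse <- unname(mapply(
  function(r, k) fit_and_plot(toy, r, k, paste(r, "degree", k)),
  results$response, results$degree
))
print(results)
```

Calling the helper once per city subset (for example on `d1` and `d2`) would cover task G with the same few lines.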