Forecasting Instruction Sheet

In this assignment, you have two tasks. You will work independently on this assignment.

Task 1:

In the first task, you will analyse and forecast the amount of horizontal solar radiation reaching the ground at a particular location on the globe. To this end, you will work with the monthly average horizontal solar radiation and the monthly precipitation series measured at the same points between January 1960 and December 2014. Both series are given in the “data1.csv” file under the Bb shell via “Course webpage → Assignments → Assignment 2.”

Your task is to give the best 2-years-ahead forecasts in terms of MASE for the solar radiation series by using the time series regression methods (distributed lag models, dLagM package), dynamic linear models (dynlm package), and the exponential smoothing and corresponding state space models covered in Modules 3 – 7 of our course this semester. While working with exponential smoothing and state space models, you will use only the solar radiation series. Hint: Use the MASE() function from version 0.0.8 of the dLagM package to compute MASE for the time series regression methods when comparing models.

For the solar radiation forecasts, the required precipitation measurements (predictor series) for the months from January 2015 to December 2016 at the exact same locations are given in the “data.x.csv” file under the Bb shell via “Course webpage → Assignments → Assignment 2.” You will use these data to calculate the 2-years-ahead forecasts.

Task 2:

In the second task, you will analyse the correlation between the quarterly Residential Property Price Index (PPI) in Melbourne and the quarterly population change over the previous quarter in Victoria between September 2003 and December 2016. The quarterly PPI and population change series are available in the “data2.csv” file under the Bb shell via “Course webpage → Assignments → Assignment 2.”

In this task, the main goal of your analysis is to demonstrate whether the correlation between these two series is spurious or not.

In both tasks, you are expected to ask yourself what the elements of a suitable and successful data analysis are, and how you might go about presenting your results in a written report format. Please review the contents of the relevant modules and apply suitable approaches here. The rubric given below will guide you through my expectations in terms of reporting, R code, descriptive analysis, modelling, and diagnostic checking.

Collaboration vs. Collusion and Plagiarism:

You are free to discuss the main aspects of the assignment with your classmates. However, keep in mind that this is an individual assignment and you should demonstrate your own effort and understanding. Because assignments will be submitted through Turnitin, all the material you submit will be checked for plagiarism.

Solution 

Task 1:

First, we need to visualize the solar radiation time series between January 1960 and December 2014. To do so, we import the data from the CSV file data1.csv and create a time series object with R's ts() function.

data1 <- read.csv("data1.csv")

Solar <- ts(data1$solar, start=c(1960, 1), end=c(2014, 12), frequency=12)

plot(Solar)

The time series regression methods:

Now let us decompose the time series into its seasonal, trend, and remainder components:

plot(stl(Solar, s.window="periodic"))
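As a minimal sketch of what the decomposition returns, the following uses a synthetic monthly series (not the assignment data; the series `y` and all of its parameters are made up for illustration) and extracts the seasonal, trend, and remainder components from the stl object.

```r
# Synthetic monthly series: linear trend + annual cycle + noise (illustrative only)
set.seed(1)
n <- 120
tt <- seq_len(n)
y <- ts(10 + 0.05 * tt + 3 * sin(2 * pi * tt / 12) + rnorm(n, sd = 0.5),
        start = c(2000, 1), frequency = 12)

fit_stl <- stl(y, s.window = "periodic")

# The components are stored as columns of a multivariate ts
components <- fit_stl$time.series
seasonal  <- components[, "seasonal"]
trend     <- components[, "trend"]
remainder <- components[, "remainder"]

# The three components add back up to the original series
recon_error <- max(abs(as.numeric(seasonal + trend + remainder) - as.numeric(y)))
print(recon_error)

# With s.window = "periodic" the seasonal pattern is the same in every year
print(round(seasonal[1:12], 2))
```

Looking at the components separately makes it easier to judge whether the seasonal pattern or the trend dominates the series before choosing a model.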

Next we use the dLagM library. First, we use the finiteDLMauto() function to find the model with the smallest MASE value:

library(dLagM)

finiteDLMauto(x = data1$ppt, y = data1$solar, q.max = 21, k.order = 3, model.type = "poly", error.type = "MASE", trace = TRUE)
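To make the selection criterion concrete, here is a hand-rolled sketch of MASE in base R. The helper `mase_manual` is hypothetical (it is not the dLagM MASE() function), and it assumes the common definition in which the mean absolute error of the fitted values is scaled by the in-sample mean absolute error of the one-step naive forecast.

```r
# Hypothetical helper: MASE of fitted values, scaled by the mean absolute
# error of the one-step naive forecast (previous observation)
mase_manual <- function(actual, fitted_values) {
  stopifnot(length(actual) == length(fitted_values))
  naive_mae <- mean(abs(diff(actual)))
  mean(abs(actual - fitted_values)) / naive_mae
}

# Toy numbers, chosen only so the arithmetic is easy to follow
actual        <- c(10, 12, 11, 13, 15, 14)
fitted_values <- c(10.5, 11.5, 11.5, 13.5, 14.5, 14.5)

# Every absolute error is 0.5; the naive errors are 2, 1, 2, 2, 1 (mean 1.6)
mase_value <- mase_manual(actual, fitted_values)
print(mase_value)  # 0.3125
```

A value below 1 means the model's errors are, on average, smaller than those of the naive forecast, so smaller MASE is better when comparing candidate models.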

The optimal model is the one with q = 13 and k = 3. Let us fit this model:

model <- polyDlm(x = data1$ppt, y = data1$solar, q = 13, k = 3, show.beta = TRUE, show.summary = FALSE)
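The roles of q (lag length) and k (polynomial order) can be illustrated with a base-R sketch of a polynomial (Almon) distributed lag fit. This is not the polyDlm() implementation; it assumes the standard formulation in which the lag weights are constrained to lie on a polynomial in the lag index, and all data and names below are synthetic.

```r
# Polynomial distributed lag: y[t] = a + sum_s beta[s] * x[t - s] + e[t],
# with beta[s] = gamma0 + gamma1 * s + ... + gammak * s^k
set.seed(42)
n <- 300
q <- 4   # number of lags (illustrative; the text uses q = 13)
k <- 2   # polynomial order (illustrative; the text uses k = 3)
s <- 0:q
beta_true <- 1 + 0.5 * s - 0.2 * s^2   # true weights lie on a quadratic

x <- rnorm(n)
X_lag <- embed(x, q + 1)               # column s + 1 holds x[t - s]
y <- as.numeric(2 + X_lag %*% beta_true + rnorm(n - q, sd = 0.1))

# Transformed regressors: Z[, j + 1] = sum_s s^j * x[t - s]
P <- outer(s, 0:k, "^")                # (q + 1) x (k + 1) matrix of powers
Z <- X_lag %*% P

# Only k + 1 slope parameters are estimated instead of q + 1
fit_almon <- lm(y ~ Z)
gamma_hat <- coef(fit_almon)[-1]

# Map the polynomial coefficients back to the lag weights
beta_hat <- as.numeric(P %*% gamma_hat)
print(round(cbind(lag = s, beta_true, beta_hat), 3))
```

The design choice is a trade-off: a low k keeps the model parsimonious even when q is large, at the cost of forcing the lag weights onto a smooth curve.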

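To forecast, we use the polyDlmForecast() function. Because it needs future precipitation values, we first forecast the precipitation series 24 months ahead with auto.arima():

library(forecast)

x <- auto.arima(ts(data1$ppt))

x <- forecast(x, h = 24)

# pass all 24 forecast precipitation values, not only the first one
plot(polyDlmForecast(model, as.vector(x$mean), h = 24))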

results :

Dynamic linear models:

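library(dynlm)

library(forecast)

# regression of Solar on a linear trend and monthly seasonal dummies
Data2 <- dynlm(Solar ~ trend(Solar) + season(Solar))

summary(Data2)

# 24-months-ahead forecast of the Solar series
SL <- forecast(Solar, 24)

plot(SL)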
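To show how a trend-plus-season regression of this kind can itself produce 24-months-ahead forecasts, here is a base-R sketch using lm() and predict(). It is an analogue of the dynlm specification above under stated assumptions, not the dynlm API: the series is synthetic, and the names `train`, `newdata`, and `fc` are illustrative.

```r
# Synthetic monthly series ending in December 2014 (illustrative only)
set.seed(7)
n <- 120
tt <- seq_len(n)
y <- ts(20 + 0.03 * tt + 4 * cos(2 * pi * tt / 12) + rnorm(n, sd = 0.4),
        start = c(2005, 1), frequency = 12)

# Linear trend in calendar time plus one dummy per month
train <- data.frame(y = as.numeric(y),
                    trend = as.numeric(time(y)),
                    month = factor(as.integer(cycle(y)), levels = 1:12))
fit_lm <- lm(y ~ trend + month, data = train)

# Regressors for the next 24 months
h <- 24
future_time  <- tsp(y)[2] + seq_len(h) / frequency(y)
future_month <- ((as.integer(cycle(y))[n] + seq_len(h) - 1) %% 12) + 1
newdata <- data.frame(trend = future_time,
                      month = factor(future_month, levels = 1:12))

# Point forecasts as a monthly ts starting right after the sample
fc <- ts(predict(fit_lm, newdata = newdata),
         start = future_time[1], frequency = 12)
print(round(fc, 2))
```

Because the regressors are deterministic functions of time, no predictor forecasts are needed here, unlike the distributed lag model.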

Exponential smoothing

library(forecast)

# simple exponential – models level

fit <- HoltWinters(Solar, beta=FALSE, gamma=FALSE)

# predictive accuracy, computed from the forecast object

accuracy(forecast(fit, 24))

# double exponential – models level and trend

fit <- HoltWinters(Solar, gamma=FALSE)

# predictive accuracy, computed from the forecast object

accuracy(forecast(fit, 24))

# predict next 24 future values

forecast(fit, 24)

plot(forecast(fit, 24))

# triple exponential – models level, trend, and seasonal components

fit <- HoltWinters(Solar)

# predictive accuracy, computed from the forecast object

accuracy(forecast(fit, 24))

# predict next 24 future values

forecast(fit, 24)

plot(forecast(fit, 24))
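Since the forecasts are to be judged by MASE, one way to check a smoothing model is to hold out the last 24 months and score the forecasts against them. The sketch below does this in base R on a synthetic series; scaling by the in-sample seasonal naive error (lag 12) is an assumption made here for monthly data, and the names `train`, `test`, and `mase_hw` are illustrative.

```r
# Synthetic monthly series, January 2003 to December 2014 (illustrative only)
set.seed(123)
n <- 144
tt <- seq_len(n)
y <- ts(50 + 0.1 * tt + 8 * sin(2 * pi * tt / 12) + rnorm(n, sd = 1),
        start = c(2003, 1), frequency = 12)

# Hold out the final 24 months
train <- window(y, end = c(2012, 12))
test  <- window(y, start = c(2013, 1))

# Triple exponential smoothing fitted on the training part only
fit_hw <- HoltWinters(train)
fc_hw  <- predict(fit_hw, n.ahead = length(test))

# Out-of-sample MASE, scaled by the in-sample seasonal naive error
scale_mae <- mean(abs(diff(train, lag = 12)))
mase_hw <- mean(abs(as.numeric(test) - as.numeric(fc_hw))) / scale_mae
print(mase_hw)
```

The same hold-out comparison could be repeated for the simple and double exponential smoothing fits to see which variant gives the smallest MASE.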

Task 2:

We need to calculate and analyse the correlation between the quarterly Residential Property Price Index (PPI) in Melbourne and the quarterly population change over the previous quarter in Victoria between September 2003 and December 2016.

library(corrplot)

data2 <- read.csv("data2.csv")

corelation <- data.frame(data2$change, data2$price)

result <- cor(corelation)

corrplot(result, type="upper", order="hclust", tl.col="black", tl.srt=45)

[1] 0.6970439

The cross-correlation shows a strong correlation for lags -10 to 5, tapering in both directions. If I interpret it correctly, this suggests that it takes 5 to 10 quarters for the price to react to any change in the population change series.

While this result is very interesting, I feel it does not describe well all of the dynamics between the two curves, for example the high correlation limited to quarters 6 and 12. However, as seen above, the test that could capture this does not show any statistical significance. So I am somewhat puzzled as to which method is most appropriate for describing these data.
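One common way to probe whether a correlation between two trending series is spurious is to compare the correlation of the levels with the correlation of the differenced series. The sketch below illustrates the idea on two simulated random walks that are independent by construction; it does not use the assignment data, and the names `a`, `b`, and `bound` are illustrative.

```r
# Two independent random walks: no true relationship by construction
set.seed(2024)
n <- 200
a <- cumsum(rnorm(n))
b <- cumsum(rnorm(n))

# Correlation of the levels can look large purely because both series wander
cor_levels <- cor(a, b)

# Differencing removes the stochastic trends before correlating
cor_diffs <- cor(diff(a), diff(b))
print(c(levels = cor_levels, differences = cor_diffs))

# Cross-correlation of the differenced series at lags -12..12
cc <- ccf(diff(a), diff(b), lag.max = 12, plot = FALSE)

# Approximate 95% bound under no relationship
bound <- 2 / sqrt(n - 1)
n_outside <- sum(abs(cc$acf) > bound)
print(n_outside)
```

The same steps could be applied to data2$price and data2$change, with the order of differencing chosen after inspecting each series; if the correlation largely disappears after differencing, that points towards a spurious relationship in the levels.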
