Forecasting Right-Hand Side Variables
This is a forecasting problem which needs to be completed in R. The Progressive Stock file is attached. The question is:
Using the stock data and at least one associated external regressor (predictor), use a technique in Diebold for Forecasting the Right-Hand Side Variables. Discuss.
Solution
DISCUSSION:
Here, 16th October 2015 corresponds to $t = 0$, 17th October 2015 to $t = 1$, and so on.
Observe that data for many dates are missing.
Here $y_t$ represents the Adj. Close price at time $t$.
We first regressed $y_t$ on all the other variables in the dataset, including time (the fit model in the R code). Checking the p-values of the coefficients in the output, we determined that Close is the most significant variable in predicting $y_t$. That's why we chose Close as the external regressor.
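The p-value screening described above can be sketched as follows. This is only a minimal illustration on synthetic data, not the Progressive file: the column names `Time`, `Close`, `Volume`, `AdjClose` and the simulated coefficients are assumptions made for the example.

```r
# Synthetic stand-in for the stock data (hypothetical columns and values)
set.seed(1)
n <- 200
d <- data.frame(
  Time   = 0:(n - 1),
  Close  = 30 + cumsum(rnorm(n, sd = 0.3)),
  Volume = rnorm(n, mean = 1, sd = 0.1)   # volume in millions, unrelated to the target
)
d$AdjClose <- 0.95 * d$Close + 0.001 * d$Time + rnorm(n, sd = 0.2)

# Regress the target on every other column
fit_all <- lm(AdjClose ~ ., data = d)

# Coefficient table columns: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- summary(fit_all)$coefficients
p_values   <- coef_table[-1, "Pr(>|t|)"]  # drop the intercept row
ranked     <- sort(p_values)              # smallest p-value first
best       <- names(ranked)[1]            # candidate external regressor
```

Sorting the p-values only ranks the candidates. A very small p-value for a price series such as Close mainly reflects how closely it tracks the adjusted price.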
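Here, we are trying to fit the model
$y_t = \beta_0 + \beta_1 \mathrm{TIME}_t + \beta_2 x_t + \varepsilon_t$ ……(*)
where $x_t$ is $\mathrm{Close}_t$, an external regressor variable, $\mathrm{TIME}_t$ represents the trend, and $\varepsilon_t$ is the error term. Here, assume $\mathrm{TIME}_t = t$.
We do a least squares fit for the coefficients in model (*) (fit1 in the R code).
However, suppose I want to predict $y_{T+h}$ at a future time $T+h$, where $T$ is the last observed time. Then I need to know/forecast $x_{T+h}$ too.
Using $x_t$ as the Close values in the dataset (the data for the missing dates were interpolated with the zoo library in R), we build an ARIMA model (fit2) and forecast the next 100 values of Close. We then use these forecasts as $\hat{x}_{T+h}$ to predict $y_{T+h}$.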
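The two-step procedure can be written out as a plug-in forecast. The notation below is an assumption made for illustration: $y_t$ is Adj. Close, $x_t$ is Close, the trend is taken as $\mathrm{TIME}_t = t$, $T = 728$ is the last observed time, and hats denote estimates or forecasts.

```latex
% Step 1: least squares fit of the regression (fit1)
y_t = \beta_0 + \beta_1 t + \beta_2 x_t + \varepsilon_t, \qquad t = 0, \dots, T

% Step 2: ARIMA forecast of the right-hand-side variable (fit2)
\hat{x}_{T+h}, \qquad h = 1, \dots, 100

% Step 3: plug the regressor forecast into the fitted regression
\hat{y}_{T+h} = \hat{\beta}_0 + \hat{\beta}_1 (T+h) + \hat{\beta}_2 \hat{x}_{T+h}

% Forecast error, ignoring parameter estimation error
y_{T+h} - \hat{y}_{T+h} \approx \beta_2 \left( x_{T+h} - \hat{x}_{T+h} \right) + \varepsilon_{T+h}
```

The last line shows why forecasting the right-hand side variable matters: any error in the Close forecast is passed into the Adj. Close forecast, scaled by $\beta_2$, on top of the regression disturbance.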
Krista_17th Oct.R
x <- PGR  # PGR: data frame with the Progressive stock data, assumed to be loaded already
# Transform Date into the numeric variable TIME, taking 10/16/2015 as TIME = 0
x$Date <- as.numeric(as.Date(x$Date, "%m/%d/%Y") - as.Date("2015-10-16"))
# done
# Linear model fit
# Name the response by column so that `.` excludes it from the predictors
fit <- lm(`Adj Close` ~ ., data = x)  # regress Adj Close on the other variables as well as TIME
summary(fit)  # gives details of the coefficients and p-values
# We see that Close is the most important predictor, given its very low p-value
fit1 <- lm(`Adj Close` ~ Date + Close, data = x)  # since we have to choose one exogenous predictor, we choose Close
summary(fit1)
# Residual standard error: 0.2217 on 500 degrees of freedom
# Multiple R-squared: 0.9986, Adjusted R-squared: 0.9986, so the model is a good fit
# We'll use fit1 for further study
# We want to forecast the next 3 months based on the fitted model.
# To forecast the next 100 values of Adj Close, we need to forecast the regressor Close.
# But data for some dates are missing, so we use the zoo package to interpolate and fill them.
a <- rep(NA_real_, 729)   # one slot per calendar day, TIME = 0..728
a[x$Date + 1] <- x$Close  # a is the Close series with NA on the missing dates
library(zoo)
ts_close <- na.approx(a)  # data for all dates are filled by linear interpolation
library(forecast)
fit2 <- auto.arima(ts_close)  # fit an ARIMA model to forecast Close
CLOSE_futureforecast <- forecast(fit2, h = 100)$mean  # forecasts of Close for the next 100 days
# Forecast of the next 100 days for Adj Close: plug the Close forecasts into fit1
b <- unname(coef(fit1))  # intercept, TIME coefficient, Close coefficient
ADJ_CLOSE_FORECAST <- b[1] + b[2] * (728 + 1:100) + b[3] * as.numeric(CLOSE_futureforecast)  # forecasted values of Adj Close
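The whole procedure can be tried end to end without the data file. The sketch below is an assumption-laden stand-in, not the script above: it uses synthetic data with hypothetical column names, base R `approx()` in place of `zoo::na.approx()`, and a fixed random-walk `arima(order = c(0, 1, 0))` in place of `forecast::auto.arima()`.

```r
set.seed(42)

# Hypothetical daily series with gaps: about 70% of days 0..199 are observed
n_days     <- 200
time_all   <- 0:(n_days - 1)
close_true <- 30 + cumsum(rnorm(n_days, sd = 0.3))
observed   <- sort(unique(c(0, n_days - 1, sample(time_all, 140))))
d <- data.frame(Time = observed, Close = close_true[observed + 1])
d$AdjClose <- 0.5 + 0.002 * d$Time + 0.95 * d$Close + rnorm(nrow(d), sd = 0.1)

# Step 1: regress the target on the trend and the external regressor
fit_reg <- lm(AdjClose ~ Time + Close, data = d)

# Step 2: fill the calendar gaps by linear interpolation, then model Close
close_filled <- approx(d$Time, d$Close, xout = time_all)$y
fit_x <- arima(close_filled, order = c(0, 1, 0))  # random walk, assumed for the sketch

# Step 3: forecast the regressor h steps ahead and plug it into the regression
h <- 10
close_fc <- as.numeric(predict(fit_x, n.ahead = h)$pred)
new_data <- data.frame(Time = (n_days - 1) + 1:h, Close = close_fc)
adj_close_fc <- as.numeric(predict(fit_reg, newdata = new_data))
```

Using `predict()` with `newdata` gives the same numbers as multiplying the coefficients out by hand, provided the regression formula refers to columns by name rather than through `d$...`.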