Forecasting Right-Hand Side Variables

This is a forecasting problem which needs to be completed in R.  The Progressive Stock file is attached.  The question is:

Using the stock data and at least one associated external regressor predictor, use a technique in Diebold for Forecasting the Right-Hand Side Variables.  Discuss.



Here, 16th October 2015 corresponds to $t = 0$, 17th October 2015 to $t = 1$, and so on.

Note that data for many dates are missing.

Here $y_t$ denotes the adjusted close price (Adj Close) at time $t$.

We first regressed $y_t$ on all the other variables in the dataset (the model fit in the R code). Judging by the p-values of the coefficients in the output, Close is the most significant predictor of $y_t$, which is why we chose Close as the external regressor.

We then fit the model

$y_t = \beta_0 + \beta_1 \mathrm{TIME}_t + \beta_2 x_t + \varepsilon_t$ ……(*)

where $x_t$ is the Close price at time $t$, the external regressor, and $\mathrm{TIME}_t$ represents the trend. Here, assume $\mathrm{TIME}_t = t$.

We estimate the coefficients $\beta_0, \beta_1, \beta_2$ in model (*) by least squares (fit1 in the R code).
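As a rough illustration of this least squares step, the sketch below fits a trend-plus-regressor model with lm on simulated data. The simulated series, the column names (time, close, adj) and the object fit_sim are all hypothetical stand-ins, not the Progressive stock file, so the numbers it prints say nothing about the real data.

```r
# Simulated stand-in for the stock data (hypothetical values, not the PGR file)
set.seed(1)
n     <- 200
time  <- 0:(n - 1)                                       # trend variable, TIME_t = t
close <- 30 + 0.02 * time + cumsum(rnorm(n, sd = 0.3))   # regressor x_t
adj   <- 0.5 + 0.001 * time + 0.95 * close + rnorm(n, sd = 0.2)  # response y_t
sim   <- data.frame(time, close, adj)

# Least squares fit of y_t = b0 + b1 * TIME_t + b2 * x_t + e_t
fit_sim <- lm(adj ~ time + close, data = sim)

# Estimated intercept, trend coefficient and regressor coefficient
print(coef(fit_sim))
```

Naming the columns in the formula and passing the data frame through the data argument keeps the coefficient names short and lets predict() accept new data later.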

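However, suppose we want to predict $y_{T+h}$, the adjusted close price $h$ days after the last observation at time $T$. Then we also need to know or forecast the regressor value $x_{T+h}$.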
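Written out, the plug-in forecast we have in mind looks like the following. The notation is our own assumption: $T$ is the last observed time, $h$ the forecast horizon, hats mark estimated coefficients, and $\hat{x}_{T+h,T}$ is a forecast of the regressor made at time $T$.

```latex
\hat{y}_{T+h,T} = \hat{\beta}_0 + \hat{\beta}_1 \,\mathrm{TIME}_{T+h} + \hat{\beta}_2 \,\hat{x}_{T+h,T},
\qquad \mathrm{TIME}_{T+h} = T + h
```

The trend term is known in advance, so only the regressor has to be forecast.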

Taking $x_t$ to be the Close values in the dataset (with the data for the missing dates filled in by interpolation using the zoo library in R), we build an ARIMA model (fit2) and forecast the next 100 values of Close. We then use these forecasts as the future regressor values $x_{T+h}$ to predict $y_{T+h}$.
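The whole two-step procedure can be sketched on simulated data using base R only. This is a minimal sketch under stated assumptions, not the script used for the assignment: approx stands in for zoo's na.approx, a fixed-order stats::arima stands in for auto.arima, and the data, the ARIMA(1,1,0) order and all object names are hypothetical.

```r
# Simulated daily series (hypothetical values, not the PGR file)
set.seed(42)
n          <- 300
time       <- 0:(n - 1)
close_full <- 30 + cumsum(rnorm(n, mean = 0.01, sd = 0.3))
adj_full   <- 0.5 + 0.001 * time + 0.95 * close_full + rnorm(n, sd = 0.1)

# Mimic non-trading days: keep 5 days out of every 7, plus the first and last day
observed <- (time %% 7) < 5
observed[c(1, n)] <- TRUE
obs <- data.frame(time  = time[observed],
                  close = close_full[observed],
                  adj   = adj_full[observed])

# Step 1: regression of the response on trend and regressor, observed days only
fit_reg <- lm(adj ~ time + close, data = obs)

# Step 2: fill the missing days of the regressor by linear interpolation
close_daily <- approx(obs$time, obs$close, xout = time)$y

# Step 3: fit an ARIMA model to the daily regressor and forecast h steps ahead
# (the order is fixed by assumption here rather than selected automatically)
fit_x <- arima(close_daily, order = c(1, 1, 0), method = "ML")
h     <- 10
x_hat <- as.numeric(predict(fit_x, n.ahead = h)$pred)

# Step 4: plug the future trend values and the regressor forecasts into the regression
future <- data.frame(time = (n - 1) + 1:h, close = x_hat)
y_hat  <- predict(fit_reg, newdata = future)
print(round(y_hat, 2))
```

Note that the regression is fit on the observed days while the ARIMA model uses the interpolated daily series, which mirrors the split between fit1 and fit2 described above.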

Krista_17th Oct.R

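library(zoo)       # na.approx() for interpolating the missing dates

library(forecast)  # auto.arima() and forecast()

x <- PGR  # PGR is the Progressive stock data, already loaded as a data frame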

# Transform the date into the variable TIME, taking 10/16/2015 as TIME = 0

x$Date <- as.Date(x$Date, "%m/%d/%Y")  # parse the month/day/year strings

x$Date <- as.numeric(x$Date - as.Date("2015-10-16"))  # Date now holds TIME: days since 10/16/2015




#Linear model fit

fit <- lm(`Adj Close` ~ ., data = x)  # regress Adj Close on all the other variables, including TIME (stored in Date)

summary(fit)  # gives the coefficient estimates and their p-values

# CLOSE has by far the lowest p-value, so we treat it as the most important predictor

fit1 <- lm(`Adj Close` ~ Date + Close, data = x)  # since we have to choose one exogenous predictor, we choose CLOSE

summary(fit1)  # fit statistics for the chosen model


#Residual standard error: 0.2217 on 500 degrees of freedom

#Multiple R-squared:  0.9986, Adjusted R-squared:  0.9986 , so the model is a good fit

#we’ll use fit1 for further study

#we want to forecast next 3 months based on the model fitted

# Forecasting: to forecast the next 100 values of Adj Close, we need forecasts of the regressor CLOSE. But data for some dates are missing, so we use the zoo package to interpolate and fill in those missing values.

a <- rep(NA_real_, 729)  # one slot per calendar day, TIME = 0, ..., 728

a[x$Date + 1] <- x$Close  # x$Date holds TIME; a is the series of CLOSE with NA on the missing dates

ts_close <- na.approx(a)  # data for all dates are filled in by linear interpolation


fit2 <- auto.arima(ts_close)  # fit an ARIMA model to forecast CLOSE

CLOSE_futureforecast <- forecast(fit2, h = 100)$mean  # forecasts of CLOSE for the next 100 days

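# Forecast of the next 100 days of ADJ CLOSE

b <- coef(fit1)  # intercept, TIME coefficient, CLOSE coefficient (in that order)

ADJ_CLOSE_FORECAST <- numeric(100)

for (i in 1:100) {

  # assumes the sample ends at TIME = 728, so the i-th future day has TIME = 728 + i
  ADJ_CLOSE_FORECAST[i] <- b[1] + b[2] * (728 + i) + b[3] * CLOSE_futureforecast[i]

} # these are the forecasted values of Adj Close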