CSSS/POLS 512: Lab 4

Stationary Time Series: Model Estimation and Assessment

Tao Lin

April 22, 2022

Agenda

  • Estimation and interpretation of ARMA models
  • Cross-validation and model selection
  • Counterfactual forecasting

Model Estimation and Interpretation

Estimation and Interpretation of ARMA Models

Recall that we use the maximum likelihood approach to estimate the parameters of an ARMA model (although we also discussed using least squares).

This should be familiar:

  1. Express the joint probability of the data using the chosen probability distribution (i.e. the likelihood of data given parameters)

  2. Take a logarithm and transform the product of probabilities to the sum of log-probabilities (because \(\sum\) is easier for optimization than \(\prod\))

  3. Substitute the linear predictor \(\mu = X\beta\) (sometimes we call it “systematic component”)

Estimation and Interpretation

For an AR(1) process, we are left with the log likelihood function:

\[\mathcal{L}(\boldsymbol{\beta}, \phi_1 | \boldsymbol{y}, \boldsymbol{X}) = -\frac{1}{2}\text{log}\Bigg(\frac{\sigma^2}{1-\phi^2_1}\Bigg)-\frac{\Bigg(y_1-\frac{\boldsymbol{x_{1}\beta}}{1-\phi_1}\Bigg)^2}{\frac{2\sigma^2}{1-\phi^2_1}}-\frac{T-1}{2}\text{log}\sigma^2-\sum^T_{t=2}\frac{(y_t-\boldsymbol{x_t\beta}-\phi_1y_{t-1})^2}{2\sigma^2}\]

We can estimate the parameters (\(\boldsymbol{\beta}\), \(\phi_1\), and \(\sigma^2\)) that make the data most likely using numerical methods.

The arima() function in R will compute these for us.

Estimation and Interpretation

# Load data
# Number of deaths and serious injuries in UK road accidents each month.
# Jan 1969 - Dec 1984. Seatbelt law introduced in Feb 1983
# (indicator in second column). Source: Harvey, 1989, p.519ff.
# http://www.staff.city.ac.uk/~sc397/courses/3ts/datasets.html
#
# Variable names:  death law

ukdata <- read.csv("Lab4_data/ukdeaths.csv", header = TRUE)
attach(ukdata)
colnames(ukdata)
[1] "death" "law"   "year"  "mt"    "month"
## Model 1a:  AR(1) model of death as function of law
##

## Estimate an AR(1) using arima
xcovariates <- law # we want to use seatbelt law to explain the change in death number
arima.res1a <- arima(
  death, # dependent variable
  order = c(1, 0, 0), # AR(1) term
  xreg = xcovariates, # feed covariates - it should be a matrix in which each column is a time series of a covariate
  include.mean = TRUE # include intercept
)
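Before interpreting the output, it may help to see what arima() is doing under the hood. Below is a minimal sketch that maximizes the AR(1) log likelihood from the earlier slide directly with optim(); the function name ar1.loglik, the starting values, and the log-sigma^2 parameterization are illustrative choices rather than part of the lab, and because this parameterization handles the intercept differently from arima(), the estimates will be similar but not identical.

ar1.loglik <- function(par, y, X) {
  # par = (beta, phi, log(sigma^2)); y = outcome; X = covariate matrix (incl. intercept)
  k <- ncol(X)
  beta   <- par[1:k]
  phi    <- par[k + 1]
  sigma2 <- exp(par[k + 2])            # log-parameterized so sigma^2 stays positive
  if (abs(phi) >= 1) return(-1e10)     # enforce stationarity
  T <- length(y)
  xb <- as.vector(X %*% beta)
  -0.5 * log(sigma2 / (1 - phi^2)) -
    (y[1] - xb[1] / (1 - phi))^2 / (2 * sigma2 / (1 - phi^2)) -
    (T - 1) / 2 * log(sigma2) -
    sum((y[2:T] - xb[2:T] - phi * y[1:(T - 1)])^2) / (2 * sigma2)
}
X.ar1 <- cbind(1, law)                                   # intercept and seatbelt law
fit.ar1 <- optim(c(mean(death), 0, 0, log(var(death))),  # crude starting values
                 ar1.loglik, y = death, X = X.ar1,
                 control = list(fnscale = -1, maxit = 5000))  # fnscale = -1 maximizes
fit.ar1$par   # compare (loosely) with arima.res1a$coef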

Estimation and Interpretation

print(arima.res1a) # How do we interpret these parameter estimates?

Call:
arima(x = death, order = c(1, 0, 0), xreg = xcovariates, include.mean = TRUE)

Coefficients:
         ar1  intercept  xcovariates
      0.6439   1719.193    -377.4542
s.e.  0.0553     42.078     107.6520

sigma^2 estimated as 39289:  log likelihood = -1288.26,  aic = 2584.52
  • The numbers in the first row are the coefficient estimates, and the numbers in the second row are the standard errors of those coefficients.
    • The table does not show stars, but a convenient way to gauge statistical significance (at the 0.05 level) is to check whether |coef|/s.e. > 2.
  • Introducing the seatbelt law is associated on average with a decrease of roughly 377 deaths in the next period, all else constant.
  • We know that in an AR(1) process the effect accumulates over time and converges to a fixed mean: \[\mathbb{E}(y_t) = \frac{x_t \boldsymbol{\beta}}{1-\phi_1}\]
    • For the AR(1) model above, the estimated AR(1) coefficient is about 0.64, so we have \[\mathbb{E}(y_t) = \frac{x_t \boldsymbol{\beta}}{1-\phi_1} = \frac{1719.193}{1-0.644}\approx 4828\] (this arithmetic is reproduced in code after this list).
    • In general, we will forecast these expected values, and also predicted values, using simulation.
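As a quick check, we can reproduce that arithmetic from the fitted object (a small sketch; the component names follow the printed output above):

# Long-run (unconditional) mean implied by Model 1a, per the formula above
arima.res1a$coef["intercept"] / (1 - arima.res1a$coef["ar1"])  # approximately 4828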

Model Selection

How can we know the above AR(1) model is better than other candidate models?

  • We often assess and select models in three general ways:
    • In-sample fit
    • Out-of-sample fit: examine model performance on a new dataset
    • Split sample into training and test sets and perform cross-validation
  • What are common goodness-of-fit metrics?
    • log likelihood: the maximized value of the log-likelihood function (our optimization objective)
    • Akaike Information Criterion (AIC) / Bayesian Information Criterion (BIC)
    • mean squared error (MSE), i.e. the average of the squared residuals
    • root mean squared error (RMSE)
    • mean absolute error (MAE)
  • Suppose we don’t know whether an AR(1) model or an AR(2) model better captures the underlying process in the observed time series. We can compare them using their goodness-of-fit metrics.

What Performance Metrics Should We Choose?

  • There are a lot of GOF metrics, e.g. MSE, RMSE, MAE, etc. Do we have particular reasons to choose certain metrics?
  • Differences between MAE and MSE/RMSE:
    • Because the errors are squared before they are averaged, MSE/RMSE gives relatively high weight to large errors; it is therefore more sensitive to outliers than MAE.
    • A forecast method that minimizes the MAE will lead to forecasts of the median,[1] while minimizing the RMSE will lead to forecasts of the mean.
  • However, MAE, MSE, and RMSE are all scale-dependent.
    • We cannot use them to compare model performance across different time series, because different time series have different scales and units.
    • Some scholars therefore recommend scale-invariant metrics such as the mean absolute percentage error (MAPE) for comparing model performance across time series[2] (a quick sketch follows this list).
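As a quick sketch, MAPE for Model 1a could be computed from the residuals already in hand (MAPE is not otherwise used in this lab):

# Mean absolute percentage error for Model 1a, in percent
mean(abs(arima.res1a$resid / death)) * 100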

In-sample Fit

# Extract estimation results from the list object `arima.res1a`
pe.1a <- arima.res1a$coef                    # parameter estimates (betas)
se.1a <- sqrt(diag(arima.res1a$var.coef))    # standard errors of coefficients
ll.1a <- arima.res1a$loglik                  # log likelihood at its maximum
sigma2hat.1a <- arima.res1a$sigma2           # estimated innovation variance (sigma^2)
aic.1a <- arima.res1a$aic                    # Akaike Information Criterion
resid.1a <- arima.res1a$resid                # residuals

# We can manually calculate the AIC.
# Recall that the AIC is equal to the deviance (-2 * log likelihood at its maximum)
# of the model plus 2 * the dimension of the model
# (the number of free parameters of the model).
# Note: arima() also counts sigma^2 as a free parameter, which is why its
# reported AIC (2584.52) is 2 larger than the value computed below from the
# coefficients in pe.1a alone.
-2*ll.1a + 2*length(pe.1a)
[1] 2582.52
# And MSE is just the mean of the squared residuals
mean(resid.1a^2)
[1] 39289.43
# RMSE
sqrt(mean(resid.1a^2))
[1] 198.2156
# MAE
mean(abs(resid.1a))
[1] 156.7996

Cross Validation

  • We assume that the relationship between a subsample and the sample is very close to the relationship between the sample and the population. If this analogy applies, we should be able to split the data and accurately predict one half of the data using a model fit on the other half.
  • Using CV to calculate any GOF metric yields more reliable results than the in-sample version of that metric.
  • Types of cross-validation:
    • Hold-out cross-validation: split the sample into a training set and a test set. We train our model on the training set and validate it on the test set.
    • \(k\)-fold cross-validation: repeat for many different splits. This approach is fast, but biased and less robust.
    • Leave-one-out cross-validation (LOOCV): leave out just one case at each iteration (an extreme version of \(k\)-fold CV). This approach is slow (especially in a large dataset!) but unbiased and more robust.

Hold Out Cross-Validation

\(K\)-fold Cross-Validation

Cross Validation for Time Series

  • Conventional CV techniques assume the data are i.i.d. In time series, however, observations have temporal dependence, so we cannot shuffle the data randomly; otherwise we would be predicting the past from the future, which makes no sense. Instead, we cross-validate the model using a “rolling forecast window.”
    • The length of each test set is fixed, and the test set rolls forward at each iteration.
    • But we can have two variants of the training set:
      • expanding window: all observations prior to the test set constitute the training set
      • rolling/sliding window: the length of the training set is fixed; at each iteration we drop the oldest observations.
        • This idea is typically adopted when past data become outdated, which is common in non-stationary environments.[3]
    • The forecast performance is computed by averaging over the test sets.
  • In R, we can use tsCV() from the forecast package to perform time series CV.[4]
    • You can also use stretch_tsibble() in the fable package to generate data folds.[5]
    • A tidyverse-style workflow is presented at https://otexts.com/fpp3/tscv.html.
    • But for our homework, we need to perform time series CV from scratch. This is a way to check whether we really understand the concept.

How to perform time series CV from scratch? (sliding window scheme)

  • Hyper-parameters:
    • minimum length of training set \(K\)
    • number of forward periods \(P\) (\(P \geq 1\))
    • sampling scheme of training set: expanding window or sliding window?
  • What do we know?
    • time series object
      • starting point of time series: \(t\)
      • The total length of time series \(T\)
    • Model Setup: e.g. the order of AR process, etc.
  • What do we want to know? (😈the trickiest part): the training set and the test set at each iteration (a toy numerical illustration follows this list)
    • number of iterations: \(T - K\)
    • training set:
      • starting point: \(t + (i - 1)\)
      • end point: \(t + (i - 1) + (K - 1)\)
    • test set:
      • starting point: \(t + K + (i - 1)\)
      • end point: \(\min\{t + K + (i - 1) + (P - 1), \,\,\, T\}\)
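Below is a toy numerical illustration of this bookkeeping, assuming the series is indexed \(1, \dots, T\) (so \(t = 1\)); the values of \(T\), \(K\), and \(P\) are made up purely for display.

t <- 1; T <- 10; K <- 5; P <- 2 # toy values (note: assigning T here masks R's shortcut for TRUE)
for (i in 1:(T - K)) {
  train <- (t + (i - 1)):(t + (i - 1) + (K - 1))
  test  <- (t + K + (i - 1)):min(t + K + (i - 1) + (P - 1), T)
  cat("iteration", i, "| train:", min(train), "-", max(train),
      "| test:", min(test), "-", max(test), "\n")
}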

How to perform time series CV from scratch?

# ARIMA Cross-validation by rolling windows
# Adapted from Rob J Hyndman's code:
# http://robjhyndman.com/hyndsight/tscvexample/
#
# Could use further generalization, e.g. to seasonality
# Careful!  This can produce singularities using categorical covariates
arimaCV <- function(
    x, # time series object
    order, seasonal, xreg, include.mean, # options for ARMA model
    forward=NULL, minper=NULL # hyper-parameters for rolling-window cross-validation
  ) {
  require(forecast) # use package forecast
  if (!any(class(x)=="ts")) x <- ts(x) # automatically change inputs to ts object
  n <- length(x) # the length of time series
  mae <- matrix(NA, nrow=n-minper, ncol=forward) # pre-allocate a [(T-K) x P] matrix for the performance metric; here we use mean absolute error (MAE)
  st <- tsp(x)[1]+(minper-2) # end point of the first training window minus 1 (t + K - 2 in the notation of the previous slides); tsp() extracts 3 attributes of a ts object: start time, end time, and frequency
  for(i in 1:(n-minper)) {
    xshort <- window(x, start=st+(i-minper+1), end=st+i)
    xnext <- window(x, start=st+(i+1), end=min(n, st+(i+forward)))
    xregshort <- window(xreg, start=st+(i-minper+1), end=st+i)
    xregnext <- window(xreg, start=st+(i+1), end=min(n, st+(i+forward)))
    fit <- Arima(xshort, order=order, seasonal=seasonal, xreg=xregshort, include.mean=include.mean)
    fcast <- forecast(fit, h=length(xnext), xreg=xregnext)
    mae[i,1:length(xnext)] <- abs(fcast[['mean']]-xnext)
  } # calculating MAEs from the first period to kth period forward using all window lengths
  # e.g. 1~170, 2~171, 3~172, 4~173 , ..... per all kth period forward
  colMeans(mae, na.rm=TRUE) # averaging the MAE per each period forward
}
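The MAE values printed below are the cross-validation results for Model 1a; the call that produced them is not shown above, so here is a presumed reconstruction (forward = 12 and minper = 170 are assumptions, based on the 12 columns of output and the 170-period window used with tsCV() later; these objects are reused by the model comparisons below):

forward <- 12   # forecast up to 12 periods ahead
minper <- 170   # minimum length of the training window
xcovariates <- law
cv.1a <- arimaCV(ts(death), order = c(1,0,0), seasonal = NULL,
                 xreg = xcovariates, include.mean = TRUE,
                 forward = forward, minper = minper)
cv.1a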
 [1] 119.6679 136.2173 175.0493 182.9675 185.3571 187.4022 198.2450 188.4625
 [9] 183.4294 165.0588 164.3636 161.9931

Exercise: write a CV function for time series

See Lab4_exercise.Rmd in Lab4_replication.zip.

How to perform time series CV?

Time series CV can also be done with the tsCV() function from the forecast package in R.

farma <- function(y, h, xreg) {
  ncol <- NCOL(xreg)
  X <- matrix(xreg[seq_along(y), ], ncol = ncol)
  if (NROW(xreg) < length(y) + h) {
    stop("Not enough xreg data for forecasting")
  }
  newX <- matrix(xreg[length(y) + seq(h), ], ncol = ncol)
  fit <- auto.arima(y, xreg = X)
  forecast(fit, xreg = newX, h = h)
}
# This wrapper is written for use with auto.arima(); tsCV() requires covariate
# values that extend h periods beyond each training window, and the all-NaN
# result below indicates that requirement was not met here, so every forecast failed

xcovariates <- law
e <- tsCV(death, farma, h = 12, window = 170, xreg = xcovariates)# forecast errors 
colMeans(abs(e), na.rm = TRUE) # MAE of auto.arima() predicted values
 h=1  h=2  h=3  h=4  h=5  h=6  h=7  h=8  h=9 h=10 h=11 h=12 
 NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN 

Let’s make more time series models!

# It looks like there is seasonality in the time series, so let's try to control for each month or Q4
# Make some month variables (there are easier ways!)
jan <- as.numeric(month=="January")
feb <- as.numeric(month=="February")
mar <- as.numeric(month=="March")
apr <- as.numeric(month=="April")
may <- as.numeric(month=="May")
jun <- as.numeric(month=="June")
jul <- as.numeric(month=="July")
aug <- as.numeric(month=="August")
sep <- as.numeric(month=="September")
oct <- as.numeric(month=="October")
nov <- as.numeric(month=="November")
dec <- as.numeric(month=="December")
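# (Aside: an easier way is to build all twelve dummies at once, e.g. with
#  model.matrix(~ factor(month) - 1); the resulting column names differ from
#  the hand-made versions above.)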
# Make a fourth quarter indicator
q4 <- as.numeric(oct|nov|dec)
# Store all these variables in the dataframe
ukdata$jan <- jan
ukdata$feb <- feb
ukdata$mar <- mar
ukdata$apr <- apr
ukdata$may <- may
ukdata$jun <- jun
ukdata$jul <- jul
ukdata$aug <- aug
ukdata$sep <- sep
ukdata$oct <- oct
ukdata$nov <- nov
ukdata$dec <- dec
ukdata$q4 <- q4
## Model 1b:  AR(1) model of death as function of law & q4
xcovariates <- cbind(law, q4)
arima.res1b <- arima(death, order = c(1, 0, 0), xreg = xcovariates, include.mean = TRUE)
cv.1b <- arimaCV(death, order = c(1,0,0), forward = forward, seasonal = NULL,
                 xreg = xcovariates, include.mean = TRUE, minper = minper)
## Model 1c:  AR(1) model of death as function of law & months
xcovariates <- cbind(law, jan, feb, mar, apr, may, jun, aug, sep, oct, nov, dec)
arima.res1c <- arima(death, order = c(1,0,0), xreg = xcovariates, include.mean = TRUE)
cv.1c <- arimaCV(ts(death), order=c(1,0,0), forward=forward, seasonal = NULL,
                 xreg=xcovariates, include.mean=TRUE, minper=minper)
## Model 1d:  AR(1) model of death as function of law & select months
xcovariates <- cbind(law, jan, sep, oct, nov, dec)
arima.res1d <- arima(death, order = c(1,0,0), xreg = xcovariates, include.mean = TRUE)
cv.1d <- arimaCV(ts(death), order=c(1,0,0), forward=forward, seasonal = NULL,
                 xreg=xcovariates, include.mean=TRUE, minper=minper)
## Model 1e:  AR(1)AR(1)_12 model of death as function of law
xcovariates <- cbind(law)
arima.res1e <- arima(death, order = c(1,0,0),
                     seasonal = list(order = c(1,0,0), period = 12),
                     xreg = xcovariates, include.mean = TRUE)
cv.1e <- arimaCV(ts(death), order=c(1,0,0), forward=forward,
                 seasonal = list(order = c(1,0,0), period = 12),
                 xreg=xcovariates, include.mean=TRUE, minper=minper)
## Model 2a:  AR(2) model of death as function of law & months
xcovariates <- cbind(law, jan, feb, mar, apr, may, jun, aug, sep, oct, nov, dec)
arima.res2a <- arima(death, order = c(2,0,0), xreg = xcovariates, include.mean = TRUE)
cv.2a <- arimaCV(ts(death), order=c(2,0,0), forward=forward, seasonal = NULL,
                 xreg=xcovariates, include.mean=TRUE, minper=minper)
## Model 2b:  MA(1) model of death as function of law & months
xcovariates <- cbind(law, jan, feb, mar, apr, may, jun, aug, sep, oct, nov, dec)
arima.res2b <- arima(death, order = c(0,0,1), xreg = xcovariates, include.mean = TRUE)
cv.2b <- arimaCV(ts(death), order=c(0,0,1), forward=forward, seasonal = NULL,
                 xreg=xcovariates, include.mean=TRUE, minper=minper)
## Model 2c:  ARMA(1,1) model of death as function of law & months
xcovariates <- cbind(law, jan, feb, mar, apr, may, jun, aug, sep, oct, nov, dec)
arima.res2c <- arima(death, order = c(1,0,1), xreg = xcovariates, include.mean = TRUE)
cv.2c <- arimaCV(ts(death), order=c(1,0,1), forward=forward, seasonal = NULL,
                 xreg=xcovariates, include.mean=TRUE, minper=minper)
## Model 2d:  ARMA(2,1) model of death as function of law & months
xcovariates <- cbind(law, jan, feb, mar, apr, may, jun, aug, sep, oct, nov, dec)
arima.res2d <- arima(death, order = c(2,0,1), xreg = xcovariates, include.mean = TRUE)
cv.2d <- arimaCV(ts(death), order=c(2,0,1), forward=forward, seasonal = NULL,
                 xreg=xcovariates, include.mean=TRUE, minper=minper)
## Model 2e:  ARMA(1,2) model of death as function of law & months
xcovariates <- cbind(law, jan, feb, mar, apr, may, jun, aug, sep, oct, nov, dec)
arima.res2e <- arima(death, order = c(1,0,2), xreg = xcovariates, include.mean = TRUE)
cv.2e <- arimaCV(ts(death), order=c(1,0,2), forward=forward, seasonal = NULL,
                 xreg=xcovariates, include.mean=TRUE, minper=minper)
## Model 2f:  ARMA(2,2) model of death as function of law & months
xcovariates <- cbind(law, jan, feb, mar, apr, may, jun, aug, sep, oct, nov, dec)
arima.res2f <- arima(death, order = c(2,0,2), xreg = xcovariates, include.mean = TRUE)
cv.2f <- arimaCV(ts(death), order=c(2,0,2), forward=forward, seasonal = NULL,
                 xreg=xcovariates, include.mean=TRUE, minper=minper)
# Tired of manual search?  Auto.arima in the forecast package can automate the search,
# but be careful!
# auto.arima guessed that this time series was non-stationary,
# and without the additive seasonal effects manually added, tried to fit
# complex seasonal ARMA terms. Restricting to a stationary model and
# telling auto.arima to leave seasonality to the month dummies
# produced a simpler, better fitting model
# another warning: if you set seasonal = TRUE, also set approximation and stepwise to TRUE,
# or prepare to wait
xcovariates <- cbind(law, jan, feb, mar, apr, may, jun, aug, sep, oct, nov, dec)
arima.res3a <- auto.arima(death,
                          stationary=TRUE, seasonal=FALSE,
                          ic="aic", approximation=FALSE, stepwise=FALSE,
                          xreg = xcovariates)
print(arima.res3a) # same as model 2d
Series: death 
Regression with ARIMA(2,0,1) errors 

Coefficients:
         ar1      ar2      ma1  intercept        law      jan       feb
      1.1899  -0.2157  -0.7950  1626.1862  -321.2201  84.8843  -94.5311
s.e.  0.1071   0.0976   0.0724    68.6982    78.8301  41.3869   41.3010
           mar        apr       may       jun      aug      sep       oct
      -43.8782  -157.0544  -19.7871  -75.5646  14.8208  67.5749  206.8634
s.e.   41.0435    40.5352   39.3222   35.1484  35.1483  39.3216   40.5327
           nov       dec
      406.3691  522.9159
s.e.   41.0341   41.2487

sigma^2 = 15582:  log likelihood = -1191.33
AIC=2416.66   AICc=2420.18   BIC=2472.04
cv.3a <- arimaCV(ts(death), order=c(2,0,1), forward=forward, seasonal = NULL,
                 xreg=xcovariates, include.mean=TRUE, minper=minper)
library(RColorBrewer)
# Plot cross-validation results
allCV <- cbind(cv.1a, cv.1b, cv.1c, cv.1d, cv.1e, cv.2a, cv.2b, cv.2c, cv.2d, cv.2e, cv.2f)
labs <- c("1a", "1b", "1c", "1d", "1e", "2a", "2b", "2c", "2d", "2e", "2f")
col <- c(brewer.pal(7, "Reds")[3:7], brewer.pal(8, "Blues")[3:8])
matplot(allCV, type="l", col=col, lty=1, ylab="Mean Absolute Error", xlab="Periods Forward",
        main="Cross-validation of accident deaths models", xlim=c(0.75,12.75))
text(labs, x=rep(12.5,length(labs)), y=allCV[nrow(allCV),], col=col)
# Average cross-validation results
avgCV12 <- apply(allCV, 2, mean) |> sort()
avgCV12
    cv.2f     cv.2c     cv.2e     cv.2d     cv.1c     cv.2a     cv.2b     cv.1d 
 78.06823  79.45386  80.14037  80.29682  80.40383  81.38111  83.69651  93.57797 
    cv.1b     cv.1e     cv.1a 
101.54084 102.25939 170.68447 
# Based on AIC & cross-validation,
# let's select Model 2f to be our final model;
# there are other plausible models
arima.resF <- arima.res2f

Alternative way to estimate time series models

  • We can also use least squares to estimate a time series model.
  • Then we can use the Breusch-Godfrey test to check whether the linear regression model still has serial correlation.
  • Intuition of the Breusch-Godfrey test (a toy by-hand version is sketched after this list):
    • Suppose we have the original model \(y_t = x_t\beta + u_t\)
    • We then estimate an auxiliary model of the residuals: \(u_t = x_t\alpha + \sum_p \rho_p u_{t-p} + \varepsilon_t\)
    • The null hypothesis is no serial correlation, i.e. \(\rho_p = 0\) for all \(p\)
    • We perform the hypothesis test using \((T - p) R^2 \sim \chi_p^2\), where \(R^2\) comes from the auxiliary regression
  • In R, we use bgtest() from the lmtest package.
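Below is a toy by-hand version of this logic for order \(p = 1\), using simulated data; the object names and the simulated series are made up for illustration, and bgtest() (demonstrated next) automates the same test while handling the initial lagged residual slightly differently, so the numbers will not match it exactly.

set.seed(1)
x.sim <- rnorm(100)
y.sim <- 1 + 2 * x.sim + as.numeric(arima.sim(list(ar = 0.5), n = 100)) # errors with AR(1) serial correlation
ols.fit <- lm(y.sim ~ x.sim)                         # original model
u <- resid(ols.fit)                                  # OLS residuals
u.lag <- c(NA, u[-length(u)])                        # first lag of the residuals
aux <- lm(u ~ x.sim + u.lag)                         # auxiliary regression
bg.stat <- (length(u) - 1) * summary(aux)$r.squared  # (T - p) * R^2 with p = 1
pchisq(bg.stat, df = 1, lower.tail = FALSE)          # p-value under H0: no serial correlation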

Alternative way to estimate time series models

library(lmtest)
## What would happen if we used linear regression on a single lag of death?
## (Note: `lag()` here presumably resolves to dplyr::lag(), loaded with the
##  tidyverse, which shifts the vector and introduces an NA; stats::lag() on a
##  plain vector would not create the lagged values.)
lm.res1f <- lm(death ~ lag(death) + jan + feb + mar + apr + may + jun +
                 aug + sep + oct + nov + dec + law)
summary(lm.res1f)

Call:
lm(formula = death ~ lag(death) + jan + feb + mar + apr + may + 
    jun + aug + sep + oct + nov + dec + law)

Residuals:
    Min      1Q  Median      3Q     Max 
-323.58  -84.45   -3.80   80.97  404.88 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  635.11393   96.64706   6.571 5.38e-10 ***
lag(death)     0.64313    0.05787  11.114  < 2e-16 ***
jan         -302.58936   59.33982  -5.099 8.71e-07 ***
feb         -211.00947   48.46926  -4.353 2.26e-05 ***
mar          -31.82070   47.33602  -0.672 0.502314    
apr         -177.52653   47.35870  -3.749 0.000241 ***
may           32.58040   47.55810   0.685 0.494199    
jun         -111.47957   47.43316  -2.350 0.019863 *  
aug          -33.76181   47.52523  -0.710 0.478393    
sep            9.48411   47.61220   0.199 0.842339    
oct          114.89374   48.04444   2.391 0.017832 *  
nov          224.81981   50.07068   4.490 1.28e-05 ***
dec          213.09991   54.93824   3.879 0.000148 ***
law         -145.31036   37.36477  -3.889 0.000142 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 133.9 on 177 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.802, Adjusted R-squared:  0.7875 
F-statistic: 55.17 on 13 and 177 DF,  p-value: < 2.2e-16
# Check LS result for serial correlation in the first or second order
bgtest(lm.res1f, 1)

    Breusch-Godfrey test for serial correlation of order up to 1

data:  lm.res1f
LM test = 11.546, df = 1, p-value = 0.000679
bgtest(lm.res1f, 2)

    Breusch-Godfrey test for serial correlation of order up to 2

data:  lm.res1f
LM test = 11.984, df = 2, p-value = 0.002498
# Evidence of residual serial correlation is strong
# Rerun with two lags
lm.res1g <- lm(death ~ lag(death, 1) + lag(death, 2) + jan + feb + mar + apr + may + jun +
                 aug + sep + oct + nov + dec + law)
summary(lm.res1g)

Call:
lm(formula = death ~ lag(death, 1) + lag(death, 2) + jan + feb + 
    mar + apr + may + jun + aug + sep + oct + nov + dec + law)

Residuals:
    Min      1Q  Median      3Q     Max 
-378.22  -88.29   -5.04   89.71  308.44 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    475.12645  103.68324   4.582 8.71e-06 ***
lag(death, 1)    0.47250    0.07332   6.445 1.09e-09 ***
lag(death, 2)    0.26362    0.07284   3.619 0.000387 ***
jan           -311.45937   57.62112  -5.405 2.09e-07 ***
feb           -329.58156   57.96856  -5.686 5.37e-08 ***
mar            -68.08737   46.99905  -1.449 0.149212    
apr           -152.44095   46.46031  -3.281 0.001248 ** 
may             25.02334   46.18114   0.542 0.588610    
jun            -65.76811   47.71466  -1.378 0.169851    
aug             -6.16090   46.72852  -0.132 0.895259    
sep             19.68658   46.27238   0.425 0.671032    
oct            130.18618   46.79714   2.782 0.005997 ** 
nov            249.97112   49.06743   5.094 9.00e-07 ***
dec            235.55993   53.65766   4.390 1.96e-05 ***
law           -111.47166   37.45979  -2.976 0.003336 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 129.8 on 175 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared:  0.8155,    Adjusted R-squared:  0.8008 
F-statistic: 55.26 on 14 and 175 DF,  p-value: < 2.2e-16
# Check LS result for serial correlation in the first or second order
bgtest(lm.res1g,1)

    Breusch-Godfrey test for serial correlation of order up to 1

data:  lm.res1g
LM test = 0.6961, df = 1, p-value = 0.4041
bgtest(lm.res1g,2)

    Breusch-Godfrey test for serial correlation of order up to 2

data:  lm.res1g
LM test = 3.2256, df = 2, p-value = 0.1993
# Borderline evidence of serial correlation, but substantively different result.
# (Even small time series assumptions can have big implications for substance.)

Counterfactual Forecasting

Counterfactual Forecasting

  • Why do we perform counterfactual forecasting?
    • We care about “what-if” questions: e.g. what if we keep the seatbelt law in place for the next 12 months, holding other conditions unchanged? What if we repeal the law, ceteris paribus?
  • To answer a “what-if” question, we need two things:
    • a set of hypothetical scenarios: hold the other variables at fixed values and change only the variable we are interested in
    • a fitted model: gives point estimates and the uncertainty of the parameters

Forecasting Predicted Value vs. Expected Value

  • Predicted values: in most cases we want to predict the outcome in the future given the fitted model
    • In this case, the uncertainty of predicted outcomes is derived from the average distance between observed data and fitted regression line. We call this prediction interval.
  • Expected values: sometimes we want to forecast the expected outcome (i.e. the true population mean estimated by the model)
    • In this case, the uncertainty of expected outcome is derived from the sampling distribution of parameters. We call this confidence interval (i.e. the interval that have some chances of including the true population mean).
    • The procedure of producing confidence interval is tricky:
      • simulate sets of parameter samples from the multivariate normal distribution, whose mean and variance-covariance matrix are extracted from the coefficients estimated by our model.
      • each draw of parameters constitute a new model. We can plug hypothetical scenarios in that simulated model to predict expected outcomes.
      • Suppose that we have 10,000 draws of parameters, then we can compute the upper and lower bound of 95% confidence interval using quantile() in R.
  • In most cases, confidence interval of expected values should be narrower than prediction interval of predicted values.
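Below is a minimal sketch of that simulation logic, assuming we use Model 1a and the long-run-mean formula from earlier; the scenario value x.hyp and the object names are illustrative only, and the lab's actual counterfactuals use simcf::ldvsimpv() with the final ARMA(2,2) model on the following slides.

set.seed(512)
simparam <- MASS::mvrnorm(10000, arima.res1a$coef, arima.res1a$var.coef) # draws of (ar1, intercept, law coefficient)
x.hyp <- 1                                       # hypothetical scenario: seatbelt law in effect
ev <- (simparam[, "intercept"] + simparam[, "xcovariates"] * x.hyp) /
  (1 - simparam[, "ar1"])                        # expected deaths implied by each parameter draw
quantile(ev, probs = c(0.025, 0.5, 0.975))       # median and 95% confidence interval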

Prediction Interval of Predicted Values vs. Confidence Interval of Expected Values

Create Hypothetical Scenarios

## Now that we've selected a model, let's interpret it: ARMA(2,2)
## using counterfactuals iterated over time
##

## Predict out five years (60 monthly periods) under two scenarios: law repealed (law = 0) and law kept (law = 1)

# Make newdata data frame for prediction
xcovariates <- cbind(law, jan, feb, mar, apr, may,
                     jun, aug, sep, oct, nov, dec)
n.ahead <- 60
lawhyp0 <- rep(0, n.ahead)
lawhyp1 <- rep(1, n.ahead)
janhyp <- rep( c( 1,0,0, 0,0,0, 0,0,0, 0,0,0 ), 5)
febhyp <- rep( c( 0,1,0, 0,0,0, 0,0,0, 0,0,0 ), 5)
marhyp <- rep( c( 0,0,1, 0,0,0, 0,0,0, 0,0,0 ), 5)
aprhyp <- rep( c( 0,0,0, 1,0,0, 0,0,0, 0,0,0 ), 5)
mayhyp <- rep( c( 0,0,0, 0,1,0, 0,0,0, 0,0,0 ), 5)
junhyp <- rep( c( 0,0,0, 0,0,1, 0,0,0, 0,0,0 ), 5)
aughyp <- rep( c( 0,0,0, 0,0,0, 0,1,0, 0,0,0 ), 5)
sephyp <- rep( c( 0,0,0, 0,0,0, 0,0,1, 0,0,0 ), 5)
octhyp <- rep( c( 0,0,0, 0,0,0, 0,0,0, 1,0,0 ), 5)
novhyp <- rep( c( 0,0,0, 0,0,0, 0,0,0, 0,1,0 ), 5)
dechyp <- rep( c( 0,0,0, 0,0,0, 0,0,0, 0,0,1 ), 5)
newdata0 <- cbind(lawhyp0, janhyp, febhyp, marhyp,
                  aprhyp, mayhyp, junhyp, aughyp,
                  sephyp, octhyp, novhyp, dechyp) %>%
  as.data.frame() %>%
  `colnames<-`(c("law", "jan", "feb", "mar", "apr",
                 "may", "jun", "aug", "sep", "oct",
                 "nov", "dec"))
newdata1 <- cbind(lawhyp1, janhyp, febhyp, marhyp,
                  aprhyp, mayhyp, junhyp, aughyp,
                  sephyp, octhyp, novhyp, dechyp) %>%
  as.data.frame() %>%
  `colnames<-`(c("law", "jan", "feb", "mar", "apr",
                 "may", "jun", "aug", "sep", "oct",
                 "nov", "dec"))

Hypothetical Scenarios w/ Seatbelt Law Repealed

Hypothetical Scenarios w/ Seatbelt Law in Effect

Simulate Expected Values and Confidence Interval

# Run predict for both scenarios
ypred0 <- predict(arima.resF,
                  n.ahead = n.ahead, # 5 years of monthly forecasts
                  newxreg = newdata0) # covariates with the law repealed
ypred1 <- predict(arima.resF,
                  n.ahead = n.ahead,
                  newxreg = newdata1) # covariates with the law kept

# Simulate predicted values 
sims <- 10000 
simparam <- MASS::mvrnorm(sims, arima.resF$coef, arima.resF$var.coef) # multivariate normal for parameters
simphi <- simparam[,1:2] #ar1, ar2 coeffs
simrho <- simparam[,3:4] #ma1, ma2 coeffs
simbetas <- simparam[,5:ncol(simparam)] # intercept through covariates
lagY <- c(death[length(death)],death[length(death)-1]) # lagged observations
lagY <- as.vector(lagY)
lagEps <- c(arima.res2f$resid[length(death)], arima.res2f$resid[length(death)-1]) # lagged shocks
lagEps <- as.vector(lagEps)
sigma <- sqrt(arima.res2f$sigma2) # innovation standard deviation (sigma) for prediction

sim.ev2f <- simcf::ldvsimpv(newdata0,
                    simbetas,
                    ci=0.95,
                    constant=1,
                    phi=simphi,
                    lagY=lagY,
                    rho=simrho,
                    lagEps=lagEps,
                    sigma=sigma
)

Plot Predicted Values and Prediction Interval

plot.new()
par(usr = c(0, length(death) + n.ahead, 1000, 3000))
# make the x-axis
axis(1,
     at = seq(from = 10, to = 252, by = 12),
     labels = 1969:1989
)
axis(2)
title(xlab = "Time",
      ylab = "Deaths",
      main="Predicted effect of reversing seat belt law")
# Polygon of predictive interval for no law (optional)
x0 <- (length(death)+1):(length(death) + n.ahead)
y0 <- c(ypred0$pred - 2*ypred0$se, rev(ypred0$pred + 2*ypred0$se), (ypred0$pred - 2*ypred0$se)[1])
polygon(x = c(x0, rev(x0), x0[1]),
        y = y0,
        border=NA,
        col="#FFBFBFFF"
)
# Plot the actual data
lines(x = 1:length(death),
      y = death
)
# Add the predictions for no law
lines(x = length(death):(length(death)+n.ahead),
      y = c(death[length(death)],ypred0$pred),  # link up the actual data to the prediction
      col = "red"
)
# Add the lower predictive interval for no law
lines(x = length(death):(length(death) + n.ahead),
      y = c(death[length(death)], ypred0$pred - 2*ypred0$se),
      col = "red",
      lty="dashed")
# Add the upper predictive interval for no law
lines(x = length(death):(length(death) + n.ahead),
      y = c(death[length(death)], ypred0$pred + 2*ypred0$se),
      col = "red",
      lty = "dashed")
# Add the predictions for keeping law
lines(x = length(death):(length(death)+n.ahead),
      y = c(death[length(death)], ypred1$pred),  # link up the actual data to the prediction
      col = "blue")

Exercise - Draw a plot of prediction and confidence interval

See Lab4_exercise.Rmd in Lab4_replication.zip.

Footnotes

  1. A technical proof is given at https://en.wikipedia.org/wiki/Mean_absolute_error#Optimality_property, and an intuitive explanation is given at https://stats.stackexchange.com/questions/355538/why-does-minimizing-the-mae-lead-to-forecasting-the-median-and-not-the-mean.

  2. For more information, see https://otexts.com/fpp3/accuracy.html and https://towardsdatascience.com/time-series-forecast-error-metrics-you-should-know-cc88b8c67f27.

  3. See https://arxiv.org/pdf/1905.11744.pdf.

  4. See https://robjhyndman.com/hyndsight/tscv/.

  5. See https://robjhyndman.com/hyndsight/tscv-fable/.