Recall that we use a maximum likelihood approach to estimate the parameters of an ARMA model (although we also discussed using least squares).
This should be familiar:
Express the joint probability of the data using the chosen probability distribution (i.e. the likelihood of data given parameters)
Take a logarithm and transform the product of probabilities to the sum of log-probabilities (because \(\sum\) is easier for optimization than \(\prod\))
Substitute the linear predictor \(\mu = X\beta\) (sometimes we call it “systematic component”)
\[ \begin{aligned} \mathcal{L}(\boldsymbol{\beta}, \phi_1, \sigma^2 | \boldsymbol{y}, \boldsymbol{X}) & = -\frac{1}{2}\text{log}\Bigg(\frac{\sigma^2}{1-\phi^2_1}\Bigg)-\frac{\Bigg(y_1-\frac{\boldsymbol{x_{1}\beta}}{1-\phi_1}\Bigg)^2}{\frac{2\sigma^2}{1-\phi^2_1}} \\ & \qquad \qquad -\frac{T-1}{2}\text{log}\sigma^2-\sum^T_{t=2}\frac{(y_t-\boldsymbol{x_t\beta}-\phi_1y_{t-1})^2}{2\sigma^2} \end{aligned} \]
We can estimate the parameters \(\boldsymbol{\beta}\) and \(\phi_1\) (together with \(\sigma^2\)) that make the data most likely using numerical methods.
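To see what that optimization involves, here is a minimal sketch (not part of the original notes) that codes the log-likelihood above directly and hands it to optim(); the vectors y and X and the parameter ordering c(beta, phi1, log sigma^2) are assumptions made purely for illustration.
# Sketch: AR(1) regression log-likelihood, following the formula above
ar1.loglik <- function(par, y, X) {
  k      <- ncol(X)
  beta   <- par[1:k]
  phi1   <- par[k + 1]
  sigma2 <- exp(par[k + 2])   # optimize on the log scale so sigma^2 stays positive
  Tn     <- length(y)
  xb     <- as.vector(X %*% beta)
  # contribution of the first observation (its stationary distribution)
  ll1 <- -0.5 * log(sigma2 / (1 - phi1^2)) -
    (y[1] - xb[1] / (1 - phi1))^2 / (2 * sigma2 / (1 - phi1^2))
  # conditional contributions for t = 2, ..., T
  llt <- -(Tn - 1) / 2 * log(sigma2) -
    sum((y[2:Tn] - xb[2:Tn] - phi1 * y[1:(Tn - 1)])^2) / (2 * sigma2)
  ll1 + llt
}
# optim() minimizes by default, so pass the negative log-likelihood
# (starting values below are rough guesses):
# fit <- optim(c(rep(0, ncol(X)), 0, log(var(y))),
#              function(p) -ar1.loglik(p, y = y, X = X),
#              method = "BFGS")
In practice we rarely code this by hand.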
The arima() function in R will compute these for us.
# Load data
# Number of deaths and serious injuries in UK road accidents each month.
# Jan 1969 - Dec 1984. Seatbelt law introduced in Feb 1983
# (indicator in second column). Source: Harvey, 1989, p.519ff.
# http://www.staff.city.ac.uk/~sc397/courses/3ts/datasets.html
#
# Variable names: death law
ukdata <- read.csv("data/ukdeaths.csv", header = TRUE)
names(ukdata)
## [1] "death" "law" "year" "mt" "month"
attach(ukdata) # recall detach()
## Estimate an AR(1) using arima
xcovariates <- law # use the seat belt law indicator to explain changes in the number of deaths
arima.res1a <- arima(
death, # dependent variable
order = c(1, 0, 0), # AR(1) term
xreg = xcovariates, # covariates: a vector or a matrix with one column per covariate time series
include.mean = TRUE # include intercept
)
print(arima.res1a) # How do we interpret these parameter estimates?
##
## Call:
## arima(x = death, order = c(1, 0, 0), xreg = xcovariates, include.mean = TRUE)
##
## Coefficients:
## ar1 intercept xcovariates
## 0.6439 1719.193 -377.4542
## s.e. 0.0553 42.078 107.6521
##
## sigma^2 estimated as 39289: log likelihood = -1288.26, aic = 2584.52
A rough significance check: a coefficient is statistically distinguishable from zero when coef/se exceeds 2 in absolute value. How can we know whether the above AR(1) model is better than other candidate models?
# Extract estimation results from the list object `arima.res1a`
pe.1a <- arima.res1a$coef # parameter estimates (AR terms, intercept, and betas)
se.1a <- sqrt(diag(arima.res1a$var.coef)) # standard errors of the estimates
ll.1a <- arima.res1a$loglik # log likelihood at its maximum
sigma2hat.1a <- arima.res1a$sigma2 # estimated variance of the innovations
aic.1a <- arima.res1a$aic # Akaike Information Criterion
resid.1a <- arima.res1a$resid # residuals
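# A quick use of the quantities just extracted (a sketch, not in the original):
# approximate z-ratios for the coef/se > 2 rule of thumb mentioned above
pe.1a / se.1a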
# We can manually calculate AIC.
# Recall that the AIC equals the deviance (-2 * log likelihood at its maximum)
# plus 2 * the dimension of the model (its number of free parameters).
# Note: arima() also counts sigma^2 as a free parameter, so its reported AIC is larger by 2.
-2*ll.1a + 2*length(pe.1a)
## [1] 2582.52
# And the MSE is just the mean of the squared residuals
mean(resid.1a^2)
## [1] 39289.43
# RMSE
sqrt(mean(resid.1a^2))
## [1] 198.2156
# MAE
mean(abs(resid.1a))
## [1] 156.7996
In R, we can use tsCV() from the forecast package to perform time series CV.[^1] Alternatively, stretch_tsibble() in the fable package can generate the data folds.[^2]
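For reference, here is a minimal sketch of what a tsCV() call could look like (an illustration only, not part of the original code; it uses a plain AR(1) and ignores the law covariate for simplicity).
library(forecast)
# forecast function handed to tsCV(): refit an AR(1) and forecast h steps ahead
far1 <- function(x, h) forecast(Arima(x, order = c(1, 0, 0)), h = h)
# tsCV() returns a matrix of forecast errors: one row per forecast origin,
# one column per horizon; window = 170 mimics a 170-observation rolling window
e <- tsCV(ts(death), far1, h = 12, window = 170)
colMeans(abs(e), na.rm = TRUE)   # MAE by forecast horizon
The expanding-window and sliding-window fold schemes behind these tools are visualized next.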
expand_window <- matrix(
c(1, 1, 1, 1, 1, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0,
1, 1, 1, 1, 1, 1, 2, 2, 2, 0, 0, 0, 0, 0, 0,
1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 0, 0, 0, 0, 0,
1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 0, 0, 0, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 0, 0, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 0, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2),
ncol = 15,
byrow = TRUE,
dimnames = list(seq(1, 10), seq(1, 15))
) %>%
as.data.frame() %>%
# the dot "." refers to the data frame coming through the pipe
mutate(iteration = rownames(.)) %>%
pivot_longer(-iteration,
names_to = "period",
values_to = "value") %>%
mutate(iteration = as.integer(iteration),
period = as.integer(period),
value = factor(value))
expand_window %>%
ggplot(aes(x = period,
y = iteration,
fill = value)) +
geom_tile(color = "white") +
scale_x_continuous(breaks = seq(1, 15)) +
scale_y_reverse(breaks = seq(1, 10)) +
scale_fill_manual(labels = c("Dropped", "Train", "Test"),
values = c("gray", "blue", "red"),
name = "") +
theme_classic() +
labs(x = "Periods",
y = "Iterations",
title = "Expanding Window Scheme") +
theme(
axis.line.y = element_line(arrow = arrow(type='closed',
ends = "first",
length = unit(10,'pt'))),
axis.line.x = element_line(arrow = arrow(type='closed',
ends = "last",
length = unit(10,'pt')))
)
sliding_window <- matrix(
c(
1, 1, 1, 1, 1, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0,
0, 1, 1, 1, 1, 1, 2, 2, 2, 0, 0, 0, 0, 0, 0,
0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 0, 0, 0, 0, 0,
0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 0, 0, 0, 0,
0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 0, 0, 0,
0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 0,
0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2,
0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2
),
ncol = 15,
byrow = TRUE,
dimnames = list(seq(1, 10), seq(1, 15))) %>%
as.data.frame() %>%
mutate(iteration = rownames(.)) %>%
pivot_longer(-iteration,
names_to = "period",
values_to = "value") %>%
mutate(iteration = as.integer(iteration),
period = as.integer(period),
value = factor(value))
sliding_window %>%
ggplot(aes(x = period, y = iteration, fill = value)) +
geom_tile(color = "white") +
scale_x_continuous(breaks = seq(1, 15)) +
scale_y_reverse(breaks = seq(1, 10)) +
scale_fill_manual(labels = c("Dropped", "Train", "Test"),
values = c("gray", "blue", "red"),
name = "") +
theme_classic() +
labs(x = "Periods", y = "Iterations", title = "Sliding Window Scheme") +
theme(
axis.line.y = element_line(arrow = arrow(type='closed',
ends = "first",
length = unit(10,'pt'))),
axis.line.x = element_line(arrow = arrow(type='closed',
ends = "last",
length = unit(10,'pt')))
)
## Now we implement time series CV with a loop
# this is a for loop following the sliding-window scheme shown above
min_window <- 170
forward <- 12
# ts information
ts <- ts(death)
ts_start <- tsp(ts)[1]
ts_length <- length(ts)
# model setup
order <- c(1, 0, 0)
xcovariates <- law
include.mean <- TRUE
# for loop for CV procedure
n_iter <- ts_length - min_window # iterations
mae <- matrix(NA, # matrix to store GoF
nrow = n_iter,
ncol = forward)
for (i in 1:n_iter) {
# start and end of train set
train_start <- ts_start + (i - 1)
train_end <- ts_start + (i - 1) + (min_window - 1)
# start and end of test set
test_start <- ts_start + min_window + (i -1)
test_end <- min(ts_start + min_window + (i -1) + (forward - 1), ts_length)
# construct train and test set
train_set <- window(ts, start = train_start, end = train_end)
test_set <- window(ts, start = test_start, end = test_end)
x_train <- window(xcovariates, start = train_start, end = train_end)
x_test <- window(xcovariates, start = test_start, end = test_end)
# fit model
fit <- forecast::Arima(train_set,
order = order,
include.mean = include.mean,
xreg = x_train)
# predict outcomes in test set
pred <- forecast::forecast(fit,
h = forward,
xreg = x_test)
# performance metrics: MAE
mae[i, 1:length(test_set)] <- abs(pred$mean - test_set)
}
colMeans(mae, na.rm = TRUE)
## [1] 119.6679 136.2173 175.0493 182.9675 185.3571 187.4022 198.2450 188.4625
## [9] 183.4294 165.0588 164.3636 161.9931
# Note: GoF is out of sample; each column of mae averages errors at a given forecast horizon (1 to 12 periods ahead).
arimaCV
Look at the custom function arimaCV in the TS_code.R file. In the following code chunk, we break it down and explain line by line how the function is programmed.
# ARIMA Cross-validation by rolling windows
# Adapted from Rob J Hyndman's code:
# http://robjhyndman.com/hyndsight/tscvexample/
#
# Could use further generalization, e.g. to seasonality
# Careful! This can produce singularities using categorical covariates
arimaCV <- function(
x, # time series object
order, seasonal, xreg, include.mean, # options for ARMA model
# hyper-parameters for rolling-window cross-validation
forward=NULL,
minper=NULL
) {
require(forecast) # use package forecast
if (!any(class(x)=="ts")) x <- ts(x) # automatically change inputs to ts object
n <- length(x) # the length of time series
# pre-allocate an [(n - minper) x forward] matrix for performance metrics
mae <- matrix(NA, nrow=n-minper, ncol=forward) # here we store mean absolute error
st <- tsp(x)[1]+(minper-2) # index offset: iteration i trains on observations i to (i + minper - 1) and tests from observation (i + minper) forward
for(i in 1:(n-minper)) {
xshort <- window(x, start=st+(i-minper+1), end=st+i)
xnext <- window(x, start=st+(i+1), end=min(n, st+(i+forward)))
xregshort <- window(xreg, start=st+(i-minper+1), end=st+i)
xregnext <- window(xreg, start=st+(i+1), end=min(n, st+(i+forward)))
# fitting the model
fit <- Arima(xshort,
order=order,
seasonal=seasonal,
xreg=xregshort,
include.mean=include.mean)
# prediction of the model
fcast <- forecast(fit,
h=length(xnext),
xreg=xregnext)
mae[i,1:length(xnext)] <- abs(fcast[['mean']]-xnext)
}
colMeans(mae, na.rm=TRUE)
}
The next code chunk shows how to apply arimaCV.
# Set rolling window length and look ahead period for cross-validation
minper <- 170 # minimum training window: observation 170 is the first with the seat belt law in effect
forward <- 12 # forecast 1 to 12 periods ahead of each training window
# Attempt at rolling window cross-validation (see caveats)
cv.1a <- arimaCV(death,
order=c(1,0,0),
forward=forward,
seasonal = NULL,
xreg=xcovariates,
include.mean=TRUE,
minper=minper)
cv.1a
## [1] 119.6679 136.2173 175.0493 182.9675 185.3571 187.4022 198.2450 188.4625
## [9] 183.4294 165.0588 164.3636 161.9931
In the next section, we will construct a matrix of covariates and
evaluate multiple candidate models. We will employ arimaCV
to automate the calculation of out-of-sample goodness-of-fit metrics and
compare the performance of all the models.
# It looks like there is seasonality in the time series, so let's try to control for each month or for Q4
# Make some month variables (there are easier ways!)
jan <- as.numeric(month=="January")
feb <- as.numeric(month=="February")
mar <- as.numeric(month=="March")
apr <- as.numeric(month=="April")
may <- as.numeric(month=="May")
jun <- as.numeric(month=="June")
jul <- as.numeric(month=="July")
aug <- as.numeric(month=="August")
sep <- as.numeric(month=="September")
oct <- as.numeric(month=="October")
nov <- as.numeric(month=="November")
dec <- as.numeric(month=="December")
# Make a fourth quarter indicator
q4 <- as.numeric(oct|nov|dec)
# Store all these variables in the dataframe
ukdata$jan <- jan
ukdata$feb <- feb
ukdata$mar <- mar
ukdata$apr <- apr
ukdata$may <- may
ukdata$jun <- jun
ukdata$jul <- jul
ukdata$aug <- aug
ukdata$sep <- sep
ukdata$oct <- oct
ukdata$nov <- nov
ukdata$dec <- dec
ukdata$q4 <- q4
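# As the comment above notes, there are easier ways. One possible shortcut
# (a sketch, not the original approach): build all month dummies at once.
month_f <- factor(ukdata$month, levels = month.name)
month_dummies <- model.matrix(~ 0 + month_f)    # one indicator column per month
colnames(month_dummies) <- tolower(month.abb)   # jan, feb, ..., dec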
## Model estimation
# time series, outcome:
ts_death <- ts(death)
## Model 1b: AR(1) model of death as function of law & q4
xcovariates <- cbind(law, q4)
arima.res1b <- arima(ts_death,
order = c(1, 0, 0),
xreg = xcovariates,
include.mean = TRUE)
cv.1b <- arimaCV(ts_death,
order = c(1,0,0),
forward = forward,
seasonal = NULL,
xreg = xcovariates,
include.mean = TRUE,
minper = minper)
## Model 1c: AR(1) model of death as function of law & months
xcovariates <- cbind(law, jan, feb, mar, apr, may, jun, aug, sep, oct, nov, dec)
arima.res1c <- arima(ts_death,
order = c(1,0,0),
xreg = xcovariates,
include.mean = TRUE)
cv.1c <- arimaCV(ts_death,
order=c(1,0,0),
forward=forward,
seasonal = NULL,
xreg=xcovariates,
include.mean=TRUE,
minper=minper)
## Model 1d: AR(1) model of death as function of law & select months
xcovariates <- cbind(law, jan, sep, oct, nov, dec)
arima.res1d <- arima(ts_death,
order = c(1,0,0),
xreg = xcovariates,
include.mean = TRUE)
cv.1d <- arimaCV(ts_death,
order=c(1,0,0),
forward=forward,
seasonal = NULL,
xreg=xcovariates,
include.mean=TRUE,
minper=minper)
## Model 1e: AR(1) x seasonal AR(1)_12 model of death as function of law
xcovariates <- cbind(law)
arima.res1e <- arima(ts_death,
order = c(1,0,0),
seasonal = list(order = c(1,0,0),
period = 12),
xreg = xcovariates,
include.mean = TRUE)
cv.1e <- arimaCV(ts_death,
order=c(1,0,0),
forward=forward,
seasonal = list(order = c(1,0,0),
period = 12),
xreg=xcovariates,
include.mean=TRUE,
minper=minper)
## Model 2a: AR(2) model of death as function of law & months
xcovariates <- cbind(law, jan, feb, mar, apr, may, jun, aug, sep, oct, nov, dec)
arima.res2a <- arima(ts_death,
order = c(2,0,0),
xreg = xcovariates,
include.mean = TRUE)
cv.2a <- arimaCV(ts_death,
order=c(2,0,0),
forward=forward,
seasonal = NULL,
xreg=xcovariates,
include.mean=TRUE,
minper=minper)
## Model 2b: MA(1) model of death as function of law & months
xcovariates <- cbind(law, jan, feb, mar, apr, may, jun, aug, sep, oct, nov, dec)
arima.res2b <- arima(ts_death,
order = c(0,0,1),
xreg = xcovariates,
include.mean = TRUE)
cv.2b <- arimaCV(ts_death,
order=c(0,0,1),
forward=forward,
seasonal = NULL,
xreg=xcovariates,
include.mean=TRUE,
minper=minper)
## Model 2c: ARMA(1,1) model of death as function of law & months
xcovariates <- cbind(law, jan, feb, mar, apr, may, jun, aug, sep, oct, nov, dec)
arima.res2c <- arima(ts_death,
order = c(1,0,1),
xreg = xcovariates,
include.mean = TRUE)
cv.2c <- arimaCV(ts_death,
order=c(1,0,1),
forward=forward,
seasonal = NULL,
xreg=xcovariates,
include.mean=TRUE,
minper=minper)
## Model 2d: ARMA(2,1) model of death as function of law & months
xcovariates <- cbind(law, jan, feb, mar, apr, may, jun, aug, sep, oct, nov, dec)
arima.res2d <- arima(ts_death,
order = c(2,0,1),
xreg = xcovariates,
include.mean = TRUE)
cv.2d <- arimaCV(ts_death,
order=c(2,0,1),
forward=forward,
seasonal = NULL,
xreg=xcovariates,
include.mean=TRUE,
minper=minper)
## Model 2e: ARMA(1,2) model of death as function of law & months
xcovariates <- cbind(law, jan, feb, mar, apr, may, jun, aug, sep, oct, nov, dec)
arima.res2e <- arima(ts_death,
order = c(1,0,2),
xreg = xcovariates,
include.mean = TRUE)
cv.2e <- arimaCV(ts_death,
order=c(1,0,2),
forward=forward,
seasonal = NULL,
xreg=xcovariates,
include.mean=TRUE,
minper=minper)
## Model 2f: ARMA(2,2) model of death as function of law & months
xcovariates <- cbind(law, jan, feb, mar, apr, may, jun, aug, sep, oct, nov, dec)
arima.res2f <- arima(ts_death,
order = c(2,0,2),
xreg = xcovariates,
include.mean = TRUE)
cv.2f <- arimaCV(ts_death,
order=c(2,0,2),
forward=forward,
seasonal = NULL, # model 2f has no seasonal component
xreg=xcovariates,
include.mean=TRUE,
minper=minper)
We have fit several models. A good way to compare how well they capture the DGP is to visualize the cross-validation results:
library(RColorBrewer)
# Plot cross-validation results
allCV <- cbind(cv.1a, cv.1b, cv.1c, cv.1d, cv.1e, cv.2a, cv.2b, cv.2c, cv.2d, cv.2e, cv.2f)
labs <- c("1a", "1b", "1c", "1d", "1e", "2a", "2b", "2c", "2d", "2e", "2f")
col <- c(brewer.pal(7, "Reds")[3:7], brewer.pal(8, "Blues")[3:8])
matplot(allCV,
type="l",
col=col,
lty=1,
ylab="Mean Absolute Error",
xlab="Periods Forward",
main="Cross-validation of accident deaths models",
xlim=c(0.75,12.75))
text(labs, x=rep(12.5,length(labs)), y=allCV[nrow(allCV),], col=col)
# Average cross-validation results
avgCV12 <- apply(allCV, 2, mean) %>% sort()
avgCV12
## cv.2f cv.2c cv.2e cv.2d cv.1c cv.2a cv.2b cv.1d
## 78.07321 79.45386 80.14037 80.29682 80.40383 81.38111 83.69651 93.57797
## cv.1b cv.1e cv.1a
## 101.54084 102.25939 170.68447
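# The selection comment below also refers to AIC; one way (a sketch) to collect
# the in-sample AIC of every candidate for a side-by-side look:
sapply(list(`1a` = arima.res1a, `1b` = arima.res1b, `1c` = arima.res1c,
            `1d` = arima.res1d, `1e` = arima.res1e, `2a` = arima.res2a,
            `2b` = arima.res2b, `2c` = arima.res2c, `2d` = arima.res2d,
            `2e` = arima.res2e, `2f` = arima.res2f),
       function(m) m$aic)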
# Based on AIC & cross-validation,
# let's select Model 2f to be our final model;
# there are other plausible models
arima.resF <- arima.res2f
arima.res2f
##
## Call:
## arima(x = ts_death, order = c(2, 0, 2), xreg = xcovariates, include.mean = TRUE)
##
## Coefficients:
## ar1 ar2 ma1 ma2 intercept law jan feb
## 0.0526 0.8449 0.3497 -0.6503 1625.7793 -312.2308 86.0931 -91.7482
## s.e. 0.0538 0.0413 0.1006 0.0998 61.5565 81.8335 40.9421 38.1258
## mar apr may jun aug sep oct
## -43.7677 -154.3960 -19.6984 -72.8430 17.6629 67.3856 209.8757
## s.e. 40.4084 36.9053 38.9443 34.4385 34.4298 38.9431 36.8765
## nov dec
## 405.8869 526.1152
## s.e. 40.3991 38.0647
##
## sigma^2 estimated as 13794: log likelihood = -1189.2, aic = 2414.39
# Alas, is 2f the most parsimonious?
length(arima.res2f$coef)
## [1] 17
length(arima.res2c$coef) # <-- best model, in my opinion
## [1] 15
length(arima.res2e$coef)
## [1] 16
length(arima.res2d$coef)
## [1] 16
length(arima.res1c$coef) # <-- second best option?
## [1] 14
# Let's check whether the residuals behave like white noise:
Box.test(arima.res2f$residuals)
##
## Box-Pierce test
##
## data: arima.res2f$residuals
## X-squared = 0.44144, df = 1, p-value = 0.5064
Box.test(arima.res2c$residuals)
##
## Box-Pierce test
##
## data: arima.res2c$residuals
## X-squared = 0.86001, df = 1, p-value = 0.3537
Box.test(arima.res1c$residuals) # rejects the null of no serial correlation at the 5% level
##
## Box-Pierce test
##
## data: arima.res1c$residuals
## X-squared = 5.6678, df = 1, p-value = 0.01728
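# A possible robustness check (a sketch, not in the original): the Ljung-Box
# variant with more lags, adjusting df for the p + q = 2 fitted ARMA terms
Box.test(arima.res2c$residuals, lag = 12, type = "Ljung-Box", fitdf = 2)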
Now, let’s compare how much least squares via lm() differs from maximum likelihood via arima() when estimating similar models for this univariate time series.
## What would happen if we used linear regression on a single lag of death?
# (note: with dplyr loaded, lag() here is dplyr::lag(), which shifts the series
#  by one period; stats::lag() would not actually shift a plain vector)
lm.res1f <- lm(death ~ lag(death) + jan + feb + mar + apr + may + jun +
aug + sep + oct + nov + dec + law)
summary(lm.res1f)
##
## Call:
## lm(formula = death ~ lag(death) + jan + feb + mar + apr + may +
## jun + aug + sep + oct + nov + dec + law)
##
## Residuals:
## Min 1Q Median 3Q Max
## -323.58 -84.45 -3.80 80.97 404.88
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 635.11393 96.64706 6.571 5.38e-10 ***
## lag(death) 0.64313 0.05787 11.114 < 2e-16 ***
## jan -302.58936 59.33982 -5.099 8.71e-07 ***
## feb -211.00947 48.46926 -4.353 2.26e-05 ***
## mar -31.82070 47.33602 -0.672 0.502314
## apr -177.52653 47.35870 -3.749 0.000241 ***
## may 32.58040 47.55810 0.685 0.494199
## jun -111.47957 47.43316 -2.350 0.019863 *
## aug -33.76181 47.52523 -0.710 0.478393
## sep 9.48411 47.61220 0.199 0.842339
## oct 114.89374 48.04444 2.391 0.017832 *
## nov 224.81981 50.07068 4.490 1.28e-05 ***
## dec 213.09991 54.93824 3.879 0.000148 ***
## law -145.31036 37.36477 -3.889 0.000142 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 133.9 on 177 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.802, Adjusted R-squared: 0.7875
## F-statistic: 55.17 on 13 and 177 DF, p-value: < 2.2e-16
library(lmtest)
# Check LS result for serial correlation in the first or second order
# Breusch-Godfrey Test:
bgtest(lm.res1f, 1)
##
## Breusch-Godfrey test for serial correlation of order up to 1
##
## data: lm.res1f
## LM test = 11.546, df = 1, p-value = 0.000679
bgtest(lm.res1f, 2)
##
## Breusch-Godfrey test for serial correlation of order up to 2
##
## data: lm.res1f
## LM test = 11.984, df = 2, p-value = 0.002498
# Q-statistic test:
Box.test(lm.res1f$residuals)
##
## Box-Pierce test
##
## data: lm.res1f$residuals
## X-squared = 4.9828, df = 1, p-value = 0.0256
# Evidence of residual serial correlation is strong
# Rerun with two lags
lm.res1g <- lm(death ~ lag(death, 1) + lag(death, 2) + jan + feb + mar + apr + may + jun +
aug + sep + oct + nov + dec + law)
summary(lm.res1g)
##
## Call:
## lm(formula = death ~ lag(death, 1) + lag(death, 2) + jan + feb +
## mar + apr + may + jun + aug + sep + oct + nov + dec + law)
##
## Residuals:
## Min 1Q Median 3Q Max
## -378.22 -88.29 -5.04 89.71 308.44
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 475.12645 103.68324 4.582 8.71e-06 ***
## lag(death, 1) 0.47250 0.07332 6.445 1.09e-09 ***
## lag(death, 2) 0.26362 0.07284 3.619 0.000387 ***
## jan -311.45937 57.62112 -5.405 2.09e-07 ***
## feb -329.58156 57.96856 -5.686 5.37e-08 ***
## mar -68.08737 46.99905 -1.449 0.149212
## apr -152.44095 46.46031 -3.281 0.001248 **
## may 25.02334 46.18114 0.542 0.588610
## jun -65.76811 47.71466 -1.378 0.169851
## aug -6.16090 46.72852 -0.132 0.895259
## sep 19.68658 46.27238 0.425 0.671032
## oct 130.18618 46.79714 2.782 0.005997 **
## nov 249.97112 49.06743 5.094 9.00e-07 ***
## dec 235.55993 53.65766 4.390 1.96e-05 ***
## law -111.47166 37.45979 -2.976 0.003336 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 129.8 on 175 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.8155, Adjusted R-squared: 0.8008
## F-statistic: 55.26 on 14 and 175 DF, p-value: < 2.2e-16
# Check LS result for serial correlation in the first or second order
# Breusch-Godfrey Test:
bgtest(lm.res1g,1)
##
## Breusch-Godfrey test for serial correlation of order up to 1
##
## data: lm.res1g
## LM test = 0.6961, df = 1, p-value = 0.4041
bgtest(lm.res1g,2)
##
## Breusch-Godfrey test for serial correlation of order up to 2
##
## data: lm.res1g
## LM test = 3.2256, df = 2, p-value = 0.1993
# Q-statistic test:
Box.test(lm.res1g$residuals)
##
## Box-Pierce test
##
## data: lm.res1g$residuals
## X-squared = 0.053158, df = 1, p-value = 0.8177
# Borderline evidence of serial correlation, but substantively different result.
# (Even small time series assumptions can have big implications for substance.)
# Comparing best fitted models: LS vs arima:
arima.res2ls <- arima(ts_death,
order = c(2,0,0),
xreg = xcovariates,
include.mean = TRUE)
stargazer::stargazer(arima.res2c,lm.res1g,arima.res2ls,
type="text")
##
## ======================================================================
## Dependent variable:
## --------------------------------------------------
## ts_death death ts_death
## ARIMA OLS ARIMA
## (1) (2) (3)
## ----------------------------------------------------------------------
## ar1 0.935*** 0.470***
## (0.038) (0.069)
##
## ma1 -0.599***
## (0.108)
##
## ar2 0.271***
## (0.069)
##
## intercept 1,629.555*** 1,635.087***
## (58.679) (45.608)
##
## lag(death, 1) 0.473***
## (0.073)
##
## lag(death, 2) 0.264***
## (0.073)
##
## law -323.493*** -111.472*** -347.921***
## (83.208) (37.460) (80.568)
##
## jan 85.747** -311.459*** 83.747*
## (40.454) (57.621) (46.930)
##
## feb -94.092** -329.582*** -94.988**
## (40.235) (57.969) (46.515)
##
## mar -43.600 -68.087 -44.044
## (39.725) (46.999) (45.045)
##
## apr -156.861*** -152.441*** -157.232***
## (38.895) (46.460) (42.845)
##
## may -19.647 25.023 -19.838
## (37.723) (46.181) (37.972)
##
## jun -75.503** -65.768 -75.596**
## (36.167) (47.715) (35.063)
##
## aug 14.734 -6.161 14.806
## (36.167) (46.729) (35.062)
##
## sep 67.387* 19.687 67.505*
## (37.721) (46.272) (37.964)
##
## oct 206.592*** 130.186*** 206.736***
## (38.890) (46.797) (42.824)
##
## nov 405.957*** 249.971*** 406.057***
## (39.711) (49.067) (44.976)
##
## dec 522.374*** 235.560*** 522.460***
## (40.208) (53.658) (46.437)
##
## Constant 475.126***
## (103.683)
##
## ----------------------------------------------------------------------
## Observations 192 190 192
## R2 0.816
## Adjusted R2 0.801
## Log Likelihood -1,193.184 -1,196.649
## sigma2 14,567.900 15,117.850
## Akaike Inf. Crit. 2,418.368 2,425.299
## Residual Std. Error 129.842 (df = 175)
## F Statistic 55.261*** (df = 14; 175)
## ======================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
forecast::auto.arima()
Tired of the manual search? auto.arima() in the forecast package can automate it, but be careful! Left to its defaults, auto.arima guessed that this time series was non-stationary and, without the additive seasonal effects added manually, tried to fit complex seasonal ARMA terms. Restricting the search to a stationary model and telling auto.arima to leave seasonality to the month dummies produced a simpler, better-fitting model. Another warning: if you set seasonal = TRUE, also set approximation and stepwise to TRUE, or prepare to wait.
xcovariates <- cbind(law, jan, feb, mar, apr, may, jun, aug, sep, oct, nov, dec)
arima.res3a <- auto.arima(ts_death,
stationary=TRUE,
seasonal=FALSE,
ic="aic",
approximation=FALSE,
stepwise=FALSE,
xreg = xcovariates)
print(arima.res3a) # same as model 2d
## Series: ts_death
## Regression with ARIMA(2,0,1) errors
##
## Coefficients:
## ar1 ar2 ma1 intercept law jan feb
## 1.1899 -0.2157 -0.7950 1626.1862 -321.2201 84.8843 -94.5311
## s.e. 0.1071 0.0976 0.0724 68.6982 78.8301 41.3869 41.3010
## mar apr may jun aug sep oct
## -43.8782 -157.0544 -19.7871 -75.5646 14.8208 67.5749 206.8634
## s.e. 41.0435 40.5352 39.3222 35.1484 35.1483 39.3216 40.5327
## nov dec
## 406.3691 522.9159
## s.e. 41.0341 41.2487
##
## sigma^2 = 15582: log likelihood = -1191.33
## AIC=2416.66 AICc=2420.18 BIC=2472.04
print(arima.res2d)
##
## Call:
## arima(x = ts_death, order = c(2, 0, 1), xreg = xcovariates, include.mean = TRUE)
##
## Coefficients:
## ar1 ar2 ma1 intercept law jan feb
## 1.1899 -0.2157 -0.7950 1626.1862 -321.2201 84.8843 -94.5311
## s.e. 0.1071 0.0976 0.0724 68.6982 78.8301 41.3869 41.3010
## mar apr may jun aug sep oct
## -43.8782 -157.0544 -19.7871 -75.5646 14.8208 67.5749 206.8634
## s.e. 41.0435 40.5352 39.3222 35.1484 35.1483 39.3216 40.5327
## nov dec
## 406.3691 522.9159
## s.e. 41.0341 41.2487
##
## sigma^2 estimated as 14284: log likelihood = -1191.33, aic = 2416.66
cv.3a <- arimaCV(ts_death,
order=c(2,0,1),
forward=forward,
seasonal = NULL,
xreg=xcovariates,
include.mean=TRUE,
minper=minper)
Now that we’ve selected a model, let’s interpret it. We use the ARMA(2,2) model (2f) and construct counterfactual predictions iterated over time.
## Predict out five years (60 months) under two scenarios: the law kept vs. the law repealed
# Make newdata data frame for prediction
(xcovariates <- cbind(law, jan, feb, mar, apr, may,
jun, aug, sep, oct, nov, dec))
## law jan feb mar apr may jun aug sep oct nov dec
## [1,] 0 1 0 0 0 0 0 0 0 0 0 0
## [2,] 0 0 1 0 0 0 0 0 0 0 0 0
## [3,] 0 0 0 1 0 0 0 0 0 0 0 0
## [4,] 0 0 0 0 1 0 0 0 0 0 0 0
## [5,] 0 0 0 0 0 1 0 0 0 0 0 0
## [6,] 0 0 0 0 0 0 1 0 0 0 0 0
## [7,] 0 0 0 0 0 0 0 0 0 0 0 0
## [8,] 0 0 0 0 0 0 0 1 0 0 0 0
## [9,] 0 0 0 0 0 0 0 0 1 0 0 0
## [10,] 0 0 0 0 0 0 0 0 0 1 0 0
## [11,] 0 0 0 0 0 0 0 0 0 0 1 0
## [12,] 0 0 0 0 0 0 0 0 0 0 0 1
## [13,] 0 1 0 0 0 0 0 0 0 0 0 0
## [14,] 0 0 1 0 0 0 0 0 0 0 0 0
## [15,] 0 0 0 1 0 0 0 0 0 0 0 0
## [16,] 0 0 0 0 1 0 0 0 0 0 0 0
## [17,] 0 0 0 0 0 1 0 0 0 0 0 0
## [18,] 0 0 0 0 0 0 1 0 0 0 0 0
## [19,] 0 0 0 0 0 0 0 0 0 0 0 0
## [20,] 0 0 0 0 0 0 0 1 0 0 0 0
## [21,] 0 0 0 0 0 0 0 0 1 0 0 0
## [22,] 0 0 0 0 0 0 0 0 0 1 0 0
## [23,] 0 0 0 0 0 0 0 0 0 0 1 0
## [24,] 0 0 0 0 0 0 0 0 0 0 0 1
## [25,] 0 1 0 0 0 0 0 0 0 0 0 0
## [26,] 0 0 1 0 0 0 0 0 0 0 0 0
## [27,] 0 0 0 1 0 0 0 0 0 0 0 0
## [28,] 0 0 0 0 1 0 0 0 0 0 0 0
## [29,] 0 0 0 0 0 1 0 0 0 0 0 0
## [30,] 0 0 0 0 0 0 1 0 0 0 0 0
## [31,] 0 0 0 0 0 0 0 0 0 0 0 0
## [32,] 0 0 0 0 0 0 0 1 0 0 0 0
## [33,] 0 0 0 0 0 0 0 0 1 0 0 0
## [34,] 0 0 0 0 0 0 0 0 0 1 0 0
## [35,] 0 0 0 0 0 0 0 0 0 0 1 0
## [36,] 0 0 0 0 0 0 0 0 0 0 0 1
## [37,] 0 1 0 0 0 0 0 0 0 0 0 0
## [38,] 0 0 1 0 0 0 0 0 0 0 0 0
## [39,] 0 0 0 1 0 0 0 0 0 0 0 0
## [40,] 0 0 0 0 1 0 0 0 0 0 0 0
## [41,] 0 0 0 0 0 1 0 0 0 0 0 0
## [42,] 0 0 0 0 0 0 1 0 0 0 0 0
## [43,] 0 0 0 0 0 0 0 0 0 0 0 0
## [44,] 0 0 0 0 0 0 0 1 0 0 0 0
## [45,] 0 0 0 0 0 0 0 0 1 0 0 0
## [46,] 0 0 0 0 0 0 0 0 0 1 0 0
## [47,] 0 0 0 0 0 0 0 0 0 0 1 0
## [48,] 0 0 0 0 0 0 0 0 0 0 0 1
## [49,] 0 1 0 0 0 0 0 0 0 0 0 0
## [50,] 0 0 1 0 0 0 0 0 0 0 0 0
## [51,] 0 0 0 1 0 0 0 0 0 0 0 0
## [52,] 0 0 0 0 1 0 0 0 0 0 0 0
## [53,] 0 0 0 0 0 1 0 0 0 0 0 0
## [54,] 0 0 0 0 0 0 1 0 0 0 0 0
## [55,] 0 0 0 0 0 0 0 0 0 0 0 0
## [56,] 0 0 0 0 0 0 0 1 0 0 0 0
## [57,] 0 0 0 0 0 0 0 0 1 0 0 0
## [58,] 0 0 0 0 0 0 0 0 0 1 0 0
## [59,] 0 0 0 0 0 0 0 0 0 0 1 0
## [60,] 0 0 0 0 0 0 0 0 0 0 0 1
## [61,] 0 1 0 0 0 0 0 0 0 0 0 0
## [62,] 0 0 1 0 0 0 0 0 0 0 0 0
## [63,] 0 0 0 1 0 0 0 0 0 0 0 0
## [64,] 0 0 0 0 1 0 0 0 0 0 0 0
## [65,] 0 0 0 0 0 1 0 0 0 0 0 0
## [66,] 0 0 0 0 0 0 1 0 0 0 0 0
## [67,] 0 0 0 0 0 0 0 0 0 0 0 0
## [68,] 0 0 0 0 0 0 0 1 0 0 0 0
## [69,] 0 0 0 0 0 0 0 0 1 0 0 0
## [70,] 0 0 0 0 0 0 0 0 0 1 0 0
## [71,] 0 0 0 0 0 0 0 0 0 0 1 0
## [72,] 0 0 0 0 0 0 0 0 0 0 0 1
## [73,] 0 1 0 0 0 0 0 0 0 0 0 0
## [74,] 0 0 1 0 0 0 0 0 0 0 0 0
## [75,] 0 0 0 1 0 0 0 0 0 0 0 0
## [76,] 0 0 0 0 1 0 0 0 0 0 0 0
## [77,] 0 0 0 0 0 1 0 0 0 0 0 0
## [78,] 0 0 0 0 0 0 1 0 0 0 0 0
## [79,] 0 0 0 0 0 0 0 0 0 0 0 0
## [80,] 0 0 0 0 0 0 0 1 0 0 0 0
## [81,] 0 0 0 0 0 0 0 0 1 0 0 0
## [82,] 0 0 0 0 0 0 0 0 0 1 0 0
## [83,] 0 0 0 0 0 0 0 0 0 0 1 0
## [84,] 0 0 0 0 0 0 0 0 0 0 0 1
## [85,] 0 1 0 0 0 0 0 0 0 0 0 0
## [86,] 0 0 1 0 0 0 0 0 0 0 0 0
## [87,] 0 0 0 1 0 0 0 0 0 0 0 0
## [88,] 0 0 0 0 1 0 0 0 0 0 0 0
## [89,] 0 0 0 0 0 1 0 0 0 0 0 0
## [90,] 0 0 0 0 0 0 1 0 0 0 0 0
## [91,] 0 0 0 0 0 0 0 0 0 0 0 0
## [92,] 0 0 0 0 0 0 0 1 0 0 0 0
## [93,] 0 0 0 0 0 0 0 0 1 0 0 0
## [94,] 0 0 0 0 0 0 0 0 0 1 0 0
## [95,] 0 0 0 0 0 0 0 0 0 0 1 0
## [96,] 0 0 0 0 0 0 0 0 0 0 0 1
## [97,] 0 1 0 0 0 0 0 0 0 0 0 0
## [98,] 0 0 1 0 0 0 0 0 0 0 0 0
## [99,] 0 0 0 1 0 0 0 0 0 0 0 0
## [100,] 0 0 0 0 1 0 0 0 0 0 0 0
## [101,] 0 0 0 0 0 1 0 0 0 0 0 0
## [102,] 0 0 0 0 0 0 1 0 0 0 0 0
## [103,] 0 0 0 0 0 0 0 0 0 0 0 0
## [104,] 0 0 0 0 0 0 0 1 0 0 0 0
## [105,] 0 0 0 0 0 0 0 0 1 0 0 0
## [106,] 0 0 0 0 0 0 0 0 0 1 0 0
## [107,] 0 0 0 0 0 0 0 0 0 0 1 0
## [108,] 0 0 0 0 0 0 0 0 0 0 0 1
## [109,] 0 1 0 0 0 0 0 0 0 0 0 0
## [110,] 0 0 1 0 0 0 0 0 0 0 0 0
## [111,] 0 0 0 1 0 0 0 0 0 0 0 0
## [112,] 0 0 0 0 1 0 0 0 0 0 0 0
## [113,] 0 0 0 0 0 1 0 0 0 0 0 0
## [114,] 0 0 0 0 0 0 1 0 0 0 0 0
## [115,] 0 0 0 0 0 0 0 0 0 0 0 0
## [116,] 0 0 0 0 0 0 0 1 0 0 0 0
## [117,] 0 0 0 0 0 0 0 0 1 0 0 0
## [118,] 0 0 0 0 0 0 0 0 0 1 0 0
## [119,] 0 0 0 0 0 0 0 0 0 0 1 0
## [120,] 0 0 0 0 0 0 0 0 0 0 0 1
## [121,] 0 1 0 0 0 0 0 0 0 0 0 0
## [122,] 0 0 1 0 0 0 0 0 0 0 0 0
## [123,] 0 0 0 1 0 0 0 0 0 0 0 0
## [124,] 0 0 0 0 1 0 0 0 0 0 0 0
## [125,] 0 0 0 0 0 1 0 0 0 0 0 0
## [126,] 0 0 0 0 0 0 1 0 0 0 0 0
## [127,] 0 0 0 0 0 0 0 0 0 0 0 0
## [128,] 0 0 0 0 0 0 0 1 0 0 0 0
## [129,] 0 0 0 0 0 0 0 0 1 0 0 0
## [130,] 0 0 0 0 0 0 0 0 0 1 0 0
## [131,] 0 0 0 0 0 0 0 0 0 0 1 0
## [132,] 0 0 0 0 0 0 0 0 0 0 0 1
## [133,] 0 1 0 0 0 0 0 0 0 0 0 0
## [134,] 0 0 1 0 0 0 0 0 0 0 0 0
## [135,] 0 0 0 1 0 0 0 0 0 0 0 0
## [136,] 0 0 0 0 1 0 0 0 0 0 0 0
## [137,] 0 0 0 0 0 1 0 0 0 0 0 0
## [138,] 0 0 0 0 0 0 1 0 0 0 0 0
## [139,] 0 0 0 0 0 0 0 0 0 0 0 0
## [140,] 0 0 0 0 0 0 0 1 0 0 0 0
## [141,] 0 0 0 0 0 0 0 0 1 0 0 0
## [142,] 0 0 0 0 0 0 0 0 0 1 0 0
## [143,] 0 0 0 0 0 0 0 0 0 0 1 0
## [144,] 0 0 0 0 0 0 0 0 0 0 0 1
## [145,] 0 1 0 0 0 0 0 0 0 0 0 0
## [146,] 0 0 1 0 0 0 0 0 0 0 0 0
## [147,] 0 0 0 1 0 0 0 0 0 0 0 0
## [148,] 0 0 0 0 1 0 0 0 0 0 0 0
## [149,] 0 0 0 0 0 1 0 0 0 0 0 0
## [150,] 0 0 0 0 0 0 1 0 0 0 0 0
## [151,] 0 0 0 0 0 0 0 0 0 0 0 0
## [152,] 0 0 0 0 0 0 0 1 0 0 0 0
## [153,] 0 0 0 0 0 0 0 0 1 0 0 0
## [154,] 0 0 0 0 0 0 0 0 0 1 0 0
## [155,] 0 0 0 0 0 0 0 0 0 0 1 0
## [156,] 0 0 0 0 0 0 0 0 0 0 0 1
## [157,] 0 1 0 0 0 0 0 0 0 0 0 0
## [158,] 0 0 1 0 0 0 0 0 0 0 0 0
## [159,] 0 0 0 1 0 0 0 0 0 0 0 0
## [160,] 0 0 0 0 1 0 0 0 0 0 0 0
## [161,] 0 0 0 0 0 1 0 0 0 0 0 0
## [162,] 0 0 0 0 0 0 1 0 0 0 0 0
## [163,] 0 0 0 0 0 0 0 0 0 0 0 0
## [164,] 0 0 0 0 0 0 0 1 0 0 0 0
## [165,] 0 0 0 0 0 0 0 0 1 0 0 0
## [166,] 0 0 0 0 0 0 0 0 0 1 0 0
## [167,] 0 0 0 0 0 0 0 0 0 0 1 0
## [168,] 0 0 0 0 0 0 0 0 0 0 0 1
## [169,] 0 1 0 0 0 0 0 0 0 0 0 0
## [170,] 1 0 1 0 0 0 0 0 0 0 0 0
## [171,] 1 0 0 1 0 0 0 0 0 0 0 0
## [172,] 1 0 0 0 1 0 0 0 0 0 0 0
## [173,] 1 0 0 0 0 1 0 0 0 0 0 0
## [174,] 1 0 0 0 0 0 1 0 0 0 0 0
## [175,] 1 0 0 0 0 0 0 0 0 0 0 0
## [176,] 1 0 0 0 0 0 0 1 0 0 0 0
## [177,] 1 0 0 0 0 0 0 0 1 0 0 0
## [178,] 1 0 0 0 0 0 0 0 0 1 0 0
## [179,] 1 0 0 0 0 0 0 0 0 0 1 0
## [180,] 1 0 0 0 0 0 0 0 0 0 0 1
## [181,] 1 1 0 0 0 0 0 0 0 0 0 0
## [182,] 1 0 1 0 0 0 0 0 0 0 0 0
## [183,] 1 0 0 1 0 0 0 0 0 0 0 0
## [184,] 1 0 0 0 1 0 0 0 0 0 0 0
## [185,] 1 0 0 0 0 1 0 0 0 0 0 0
## [186,] 1 0 0 0 0 0 1 0 0 0 0 0
## [187,] 1 0 0 0 0 0 0 0 0 0 0 0
## [188,] 1 0 0 0 0 0 0 1 0 0 0 0
## [189,] 1 0 0 0 0 0 0 0 1 0 0 0
## [190,] 1 0 0 0 0 0 0 0 0 1 0 0
## [191,] 1 0 0 0 0 0 0 0 0 0 1 0
## [192,] 1 0 0 0 0 0 0 0 0 0 0 1
(n.ahead <- 60)
## [1] 60
(lawhyp0 <- rep(0, n.ahead))
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [39] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
(lawhyp1 <- rep(1, n.ahead))
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [39] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
janhyp <- rep( c( 1,0,0, 0,0,0, 0,0,0, 0,0,0 ), 5)
febhyp <- rep( c( 0,1,0, 0,0,0, 0,0,0, 0,0,0 ), 5)
marhyp <- rep( c( 0,0,1, 0,0,0, 0,0,0, 0,0,0 ), 5)
aprhyp <- rep( c( 0,0,0, 1,0,0, 0,0,0, 0,0,0 ), 5)
mayhyp <- rep( c( 0,0,0, 0,1,0, 0,0,0, 0,0,0 ), 5)
junhyp <- rep( c( 0,0,0, 0,0,1, 0,0,0, 0,0,0 ), 5)
aughyp <- rep( c( 0,0,0, 0,0,0, 0,1,0, 0,0,0 ), 5)
sephyp <- rep( c( 0,0,0, 0,0,0, 0,0,1, 0,0,0 ), 5)
octhyp <- rep( c( 0,0,0, 0,0,0, 0,0,0, 1,0,0 ), 5)
novhyp <- rep( c( 0,0,0, 0,0,0, 0,0,0, 0,1,0 ), 5)
dechyp <- rep( c( 0,0,0, 0,0,0, 0,0,0, 0,0,1 ), 5)
## Set data for scenario law == 0 (absence of law)
newdata0 <- cbind(lawhyp0, janhyp, febhyp, marhyp,
aprhyp, mayhyp, junhyp, aughyp,
sephyp, octhyp, novhyp, dechyp) %>%
as.data.frame() %>%
`colnames<-`(c("law", "jan", "feb", "mar", "apr",
"may", "jun", "aug", "sep", "oct",
"nov", "dec"))
# Run predict
ypred0 <- predict(arima.resF,
n.ahead = n.ahead, # number of periods (months) to predict ahead
newxreg = newdata0) # covariates
## Set data for scenario law == 1 (implementation of law)
newdata1 <- cbind(lawhyp1, janhyp, febhyp, marhyp,
aprhyp, mayhyp, junhyp, aughyp,
sephyp, octhyp, novhyp, dechyp) %>%
as.data.frame() %>%
`colnames<-`(c("law", "jan", "feb", "mar", "apr",
"may", "jun", "aug", "sep", "oct",
"nov", "dec"))
# Run predict
ypred <- predict(arima.resF,
n.ahead = n.ahead,
newxreg = newdata1)
plot.new()
par(usr = c(0, length(death) + n.ahead, 1000, 3000))
# make the x-axis
axis(1,
at = seq(from = 10, to = 252, by = 12),
labels = 1969:1989)
# make the y-axis
axis(2)
title(xlab = "Time",
ylab = "Deaths",
main="Predicted effect of reversing seat belt law")
# Polygon of predictive interval for no law (optional)
(x0 <- (length(death)+1):(length(death) + n.ahead))
## [1] 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211
## [20] 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230
## [39] 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249
## [58] 250 251 252
(y0 <- c(ypred0$pred - 2*ypred0$se,
rev(ypred0$pred + 2*ypred0$se),
(ypred0$pred - 2*ypred0$se)[1]))
## [1] 1448.306 1272.419 1299.268 1191.965 1309.842 1260.350 1319.782 1340.968
## [9] 1379.994 1525.688 1713.170 1836.218 1389.400 1213.993 1256.565 1148.011
## [17] 1278.410 1227.019 1296.450 1315.588 1362.608 1506.334 1700.209 1821.471
## [25] 1379.764 1202.785 1249.441 1139.533 1273.192 1220.649 1292.680 1310.847
## [33] 1359.937 1502.849 1698.371 1818.953 1378.553 1201.010 1248.702 1138.324
## [41] 1272.803 1219.870 1292.548 1310.391 1359.991 1502.634 1698.556 1818.914
## [49] 1378.828 1201.098 1249.035 1138.502 1273.170 1220.107 1292.933 1310.666
## [57] 1360.379 1502.931 1698.939 1819.220 2483.233 2362.926 2166.890 2024.306
## [65] 1974.558 1956.783 1883.915 1936.926 1802.203 1912.672 1864.668 2042.318
## [73] 2482.320 2361.860 2165.834 2023.064 1973.334 1955.331 1882.492 1935.223
## [81] 1800.545 1910.669 1862.729 2039.954 2480.046 2359.062 2163.159 2019.740
## [89] 1970.177 1951.370 1878.755 1930.487 1796.106 1904.986 1857.439 2033.111
## [97] 2473.720 2350.794 2155.567 2009.715 1961.033 1939.170 1867.700 1915.583
## [105] 1782.686 1886.706 1841.077 2010.593 2453.674 2322.916 2130.871 1975.007
## [113] 1930.409 1895.664 1829.420 1860.581 1734.343 1816.380 1779.178 1919.210
## [121] 1448.306
polygon(x = c(x0, rev(x0), x0[1]),
y = y0,
border=NA,
col="#FFBFBFFF")
# Plot the actual data
lines(x = 1:length(death),
y = death)
# Add the predictions for no law
lines(x = length(death):(length(death)+n.ahead),
y = c(death[length(death)],ypred0$pred), # link up the actual data to the prediction
col = "red")
# Add the lower predictive interval for no law
lines(x = length(death):(length(death) + n.ahead),
y = c(death[length(death)], ypred0$pred - 2*ypred0$se),
col = "red",
lty="dashed")
# Add the upper predictive interval for no law
lines(x = length(death):(length(death) + n.ahead),
y = c(death[length(death)], ypred0$pred + 2*ypred0$se),
col = "red",
lty = "dashed")
# Add the predictions for keeping law (the law == 1 scenario)
lines(x = length(death):(length(death)+n.ahead),
y = c(death[length(death)], ypred$pred), # link up the actual data to the prediction
col = "blue")
A technical proof is given at https://en.wikipedia.org/wiki/Mean_absolute_error#Optimality_property, and an intuitive explanation is given at https://stats.stackexchange.com/questions/355538/why-does-minimizing-the-mae-lead-to-forecasting-the-median-and-not-the-mean.↩︎
For more information, see https://otexts.com/fpp3/accuracy.html and https://towardsdatascience.com/time-series-forecast-error-metrics-you-should-know-cc88b8c67f27.↩︎