---
title: "CSSS/POLS 512 - Time Series and Panel Data Methods"
subtitle: "Lab 2: Time Series Diagnostics"
author: "Ramses Lllobet"
fontfamily: accanthis
linkcolor: blue
output:
  pdf_document:
    toc: true
    toc_depth: 3
editor_options:
  chunk_output_type: console
fontsize: 11pt
header-includes:
  - \usepackage{float}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, error=FALSE, warning=FALSE, message = FALSE)
# clean environment
rm(list=ls())
# load packages
library(tseries)
library(MASS)
library(tidyverse)
# load source
source("source/simTS.R") # Chris's TS simulation code
```
# Box-Jenkins Method
The Box-Jenkins method assumes that time series are composed of multiple temporal processes, including temporal trends ($t$), autoregressive ($\phi$), moving average ($\theta$), and cycles or seasonality (which can follow AR, MA, or both processes simultaneously). The model below represents these processes that define the dynamic functional form of a variable $y$.
$$y_t = \beta_0 + \underbrace{\beta_1}_\text{trend}t + \overbrace{\phi_1}^\text{AR(1)}y_{t-1} + \underbrace{\phi_{12}}_\text{cycle} y_{t-12} + \overbrace{\theta_1}^\text{MA(1)}\varepsilon_{t-1} + \underbrace{\varepsilon_t}_\text{white noise}$$
The Box-Jenkins method is an iterative process for identifying the dynamic form of a data generating process (DGP). Once identified, this dynamic specification can be used for predictions and to filter out biases from dynamic nuances. A dynamic model specification is correct if the estimated errors $\hat{e}$ exhibit the characteristics of white noise $\varepsilon$: normally distributed with mean zero, constant variance, and no serial correlation over time.
The method proceeds in four steps:
- Step 1: Study generic forms and properties.
  - Use `plot()` and `decompose()` for time series visualization, and `acf()` and `pacf()` for correlograms to identify patterns.
- Step 2: Estimate and compare potential candidate models based on fit.
  - You can use `lm()` or `arima()`.
- Step 3: Assess the estimated residuals $\hat{e}$ of the best-fitting model. Are these errors consistent with white noise $\varepsilon$?
  - If not, return to Step 1.
- Step 4: Perform a meta-analysis to determine the best model specification.
  - Simulate the final model and compare it with your sample data to assess its adherence to the underlying data generating process (DGP).
- In short, the method iterates between diagnostics and estimation, comparing the observed series with generic forms to decide which processes occur in the data.
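As a brief illustration, one pass of this loop might look as follows in R. This is a sketch on simulated data; the AR coefficient, seed, and series length are illustrative assumptions, not taken from any dataset in this lab.

```r
# One Box-Jenkins pass on simulated data
set.seed(98105)
y <- arima.sim(model = list(ar = 0.6), n = 200)

# Step 1: inspect generic forms
acf(y)   # tails off -> suggests an autoregressive component
pacf(y)  # single spike at lag 1 -> suggests AR(1)

# Step 2: estimate a candidate model
fit <- arima(y, order = c(1, 0, 0))

# Step 3: are the residuals consistent with white noise?
Box.test(resid(fit), lag = 10, type = "Ljung-Box")
```

A large p-value in the Ljung-Box test means we fail to reject white-noise residuals, so the candidate specification is retained; otherwise we return to Step 1.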
# Time Series Diagnostics
## Deterministic Trend
Perhaps the simplest dynamic process is one defined by a deterministic linear trend, such as:
$$y_t = \beta_{0} + \beta_{1}t + \varepsilon_t, \quad \text{where} \quad \varepsilon_{t} \sim \mathcal{N}(0, \sigma^{2})$$
In the population model above, a one-period increase in $t$ leads to an expected increase of $\beta_1$ in $\mathbb{E}(y_t)$. By estimating the parameters $\beta_0$ and $\beta_1$ of this model, we can remove the linear trend effect $\beta_1$ from $y$. After detrending the time series, $y_t - t\hat{\beta}_1$, the estimated errors should follow *white noise*.
$$y_t - t\hat{\beta_1} = \hat{\beta_0} + \hat{e_t} , \quad \text{where} \quad \hat{e_t} \sim \mathcal{N}(0, \hat{\sigma}^2)$$
Simulate a deterministic trend with noise, de-trend the data, and plot the time series.
```{r, fig.align='center', fig.height=3.5, fig.width=5}
# Set the slope of the trend
b1 <- 3/2
#Set the intercept
b0 <- 2
#Set the number of periods
n <- 50
t <- seq(0, n) # create a sequence of t from 0 to 50
y <- rep(NA, length(t)) # pre-allocate the series (note t has n + 1 elements)
# simulate the data
set.seed(98105)
for (i in 1:length(t)){
y[i] <- b1 * t[i] + b0 + rnorm(1, 0, 15)
#The rnorm gives us the noise with mean = 0 and sd = 15
}
plot(y, type = "l", col = "red",
ylab = "y", xlab = "Time",
main = "Simulated Deterministic Trend, y=2+3/2t + Noise")
abline(a = 2, b = 3/2, lty = "dashed") # use slope and intercept to plot the trend
#Now de-trend the time series
y.minus.tbeta <- rep(NA, length(t))
for (i in 1:length(t)){
y.minus.tbeta[i] <- y[i] - b1*t[i]
}
#Plot and take a minute to inspect the residuals
plot(y.minus.tbeta,type="l", col="red",ylab="y",xlab="Time",
main=expression(paste("Detrended Time Series"))); abline(a=2,b=0,lty="dashed")
# Find the least squares estimate of the slope
slope1 <- lm(y ~ t)
slope1
# Why is the estimated slope different from the true slope?
plot(y, type="l", col="red",
ylab="y",xlab="Time",
main = "Simulated Deterministic Trend y=2+3/2t + Noise")
abline(a = 2, b = 3/2, lty = "dashed")
abline(a = slope1$coefficients[1], b = slope1$coefficients[2], lty = "dashed", col = "green")
```
As we learned in lecture, this is caused by the high leverage of the starting and ending points of a time series. It can also reflect selection bias: we observe only a particular sampling interval, not the full historical (population) time series.
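To see that the discrepancy reflects sampling variation rather than bias, we can repeat the simulation many times and look at the distribution of the OLS slope estimates. This is a quick sketch reusing the parameters from the chunk above:

```r
# Sampling distribution of the OLS slope under the deterministic-trend DGP
set.seed(98105)
b0 <- 2; b1 <- 3/2; n <- 50
t <- seq(0, n)
slopes <- replicate(1000, {
  y <- b0 + b1 * t + rnorm(length(t), 0, 15)
  coef(lm(y ~ t))[2]
})
mean(slopes)  # centers on the true slope, 1.5
sd(slopes)    # but any single draw can land noticeably above or below it
```

On average the estimator recovers the true slope; any one sample (like ours) can miss it by a fair margin.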
## Deterministic Trend and Serial Correlation
Assume the following DGP:
$$y_t = \beta_0 + \phi_1 y_{t-1} + \beta_1 t + \varepsilon_t$$
In the above process, $\phi_1$ represents an autoregressive parameter. In particular, this is an AR$(1)$ process, as we use one lag, $y_{t-1}$, to estimate the present level $y_t$. Simulate new data with a deterministic trend and serial correlation.
```{r, fig.align='center', fig.height=3.5, fig.width=5}
#Set the slope
b1 <- 3/2
#Set the intercept
b0 <- 2
#Set phi
phi <- 0.33
#Set the number of periods
n <- 50
t <- seq(0,n)
y <- rep(0, length(t))
for (i in 2:length(t)){
y[i] <- y[i-1]*phi + b1*t[i] + b0 + rnorm(1,0,15)
}
#Plot the data
plot(y,type="l", col="red",
ylab="y",xlab="Time",
main=expression(paste("Simulated Deterministic Trend + Noise + Serial Correlation")));
abline(a=2,b=3/2,lty="dashed")
# De-trend the time series
y.minus.tbeta2 <- rep(0, length(t))
for (i in 1:length(t)){
y.minus.tbeta2[i] <- y[i] - b1*t[i]
}
#Plot the data and take a minute to inspect the residuals again
plot(y.minus.tbeta2,
type="l", col="red",
ylab="y",xlab="Time",
main=expression(paste("Detrended Time Series + Noise + Serial Correlation")));
abline(a=2,b=0,lty="dashed")
```
## Autoregressive Processes
We define an autoregressive process AR$(1)$, as follows:
$$y_t = \phi_{1}y_{t-1}+\epsilon_t$$
where past realizations, $y_{t-k}$, influence current levels of $y$. In the AR(1) case, each new realization of $y_t$ incorporates the last period's realization, $y_{t-1}$,
$$y_t= \sum^{\infty}_{j=0}\epsilon_{t-j}\phi^{j}$$
If $y_t$ is AR(1), then $y_t$ includes the effects of every random shock back to the beginning of time. However, as long as the process is stationary, $|\phi_1| < 1$, each shock's effect decays exponentially. Hence, $y_t$ can also be thought of as a sum of exponentially weighted random shocks: although previous observations matter for predicting the current one, their influence diminishes as one moves further back in time.
You can use the function `arima.sim()` to simulate univariate time series. Simulate an AR(1) process with $\phi_1 = 0.5$. Plot the data and examine the ACF and PACF. Then, use the functions `acf()` and `pacf()` to look for significant lags.
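One way to carry this out is sketched below; the seed and series length are arbitrary choices.

```r
# Simulate an AR(1) with phi_1 = 0.5 and inspect its correlograms
set.seed(98105)
ar1 <- arima.sim(model = list(ar = 0.5), n = 200)
plot(ar1, main = "Simulated AR(1), phi = 0.5")
acf(ar1)   # tails off roughly geometrically (about 0.5^k)
pacf(ar1)  # one significant spike at lag 1, then cuts off
```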
### Autocorrelation
After de-trending and removing seasonality from the time series, we can check whether the series exhibits some form of serial correlation. For today's section, we will focus only on how to diagnose autoregressive processes.
Remember what the population correlation formula looks like:
$$
corr(x,y) = \frac{cov(x,y)}{\sigma_x\sigma_y} = \frac{\overbrace{E[(x-\mu_x)(y-\mu_y)]}^\text{Covariance: measures how much x and y covary}}{\underbrace{\sigma_x\sigma_y}_\text{Standard Deviation: normalizes the degree of covarying}}
$$
Remember what the (sample) correlation formula looks like:
$$
\hat{corr}(x,y) = \frac{\sum_{i=1}^n{(x_i-\overline{x})(y_i-\overline{y}})}{(n-1) s_x s_y}
$$
To see why there should be $n-1$, see [Pearson correlation coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient#For_a_sample) on Wikipedia.
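A quick numerical check, with made-up data, that `cor()` matches the sample formula above:

```r
# Manual sample correlation vs. R's cor()
x <- c(1, 3, 2, 5, 4)
y <- c(2, 4, 3, 6, 7)
n <- length(x)
r.manual <- sum((x - mean(x)) * (y - mean(y))) / ((n - 1) * sd(x) * sd(y))
all.equal(r.manual, cor(x, y))  # TRUE
```

The $n-1$ in the denominator cancels against the $n-1$ hidden in `sd()`, which is why `cor()` and the formula agree exactly.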
### Population Autocorrelation
Suppose we want to investigate the population of a time series; that is, we assume the number of periods is infinite. The population autocorrelation $\rho_k$ is then $corr(y_t, y_{t-k})$:
$$
\rho_k = \frac{cov(y_t, y_{t-k})}{\sigma_y^2} = \frac{\gamma_k}{\gamma_0}
$$
Plainly, it measures how much each observation is correlated with the observation lagged by $k$ periods.
Equivalently, it is how much each observation covaries with the observation lagged by $k$ (the autocovariance, which we denote $\gamma_k$) relative to the total variance of $y$.
Viewed as a function of $k$, $\rho_k$ is called the autocorrelation function, or *ACF*.
### Sample Autocorrelation
Since we use sampling intervals in our usual time-series data, we have to estimate the population autocorrelation with our sample autocorrelation.
First, the sample autocovariation, $c_k$, is calculated as:
$$
c_k = \frac{1}{n-k}\sum_{t=k+1}^{n}(y_t-\overline{y})(y_{t-k}-\overline{y})
$$
Note that $c_0$ is just the sample variance of $y$ (up to the $n$ versus $n-1$ denominator). Therefore, the sample autocorrelation, $r_k$, is:
$$
r_k = \frac{c_k}{c_0} = \frac{\sum_{t=k+1}^{n}(y_t-\overline{y})(y_{t-k}-\overline{y})}{\sum_{t=1}^{n}(y_t-\overline{y})^2} \cdot \frac{n}{n-k}
$$
- As $n$ increases, the $n/(n-k)$ term converges to $1$. (R's `acf()` in fact divides both $c_k$ and $c_0$ by $n$, so the factor drops out entirely.)
- Also, $r_k = r_{-k}$. That is, whether one computes the statistic from leads or lags, one should arrive at identical estimates.
- As with the ordinary correlation coefficient, we can test hypotheses about $\rho_k$ and construct corresponding confidence intervals.
- This is essentially how R calculates the ACF. To see why, check the next subsection.
### Calculating ACF in R
Let's calculate $r_1$ with some hypothetical data. A manual check shows that `acf()` indeed uses the sample autocorrelation formula to calculate $r_k$.
```{r}
set.seed(98105)
y <- rnorm(5, 10, 1) # simulate a time series with 5 periods
var.y <- var(y) # (sigma_y)^2
mean <- mean(y)
# manually calculate sample autocorrelation
c_1 <- (
(y[1]-mean)*(y[2]-mean) + (y[2]-mean)*(y[3]-mean) +
(y[3]-mean)*(y[4]-mean) + (y[4]-mean)*(y[5]-mean)
)/(5-1)
r_1 <- c_1/var.y
r_1
acf(y, plot = FALSE) # acf() uses the same formula!
```
### ACF function in R
```{r}
wave <- read.csv("data/wave.csv") %>% ts()
plot(wave)
# a compact version of plotting the trend in plot()
abline(reg = lm(wave ~ time(wave)))
acf(wave, plot = FALSE)
# 0, 1, 2, ... means lag, which is k
# 1.000, 0.470, -0.263, ... means acf per each lag (k)
acf(wave) # so this is simply the plot of all acfs per k=1, 2, 3, ..., 25
```
### How can we use ACF to identify the functional form of time series?
- A time series with a trend or seasonal variation has high autocorrelation, which may bias our sample ACF upward. Therefore, we generally de-trend a series before examining its sample ACF.
- Suppose that we already remove trend and seasonality from time series.
- If it is a white noise, then there will be no significant autocorrelation for any lags.
- If it is an autoregressive process AR($p$), we expect the ACF to tail off (decay gradually).
- If it is a moving average process MA($q$), we expect the ACF to cut off after lag $q$.
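These patterns are easy to see in simulation. The coefficients below are arbitrary illustrative choices, and MA processes are covered in detail next week:

```r
# Tail-off vs. cut-off in the ACF on simulated data
set.seed(2022)
ar2 <- arima.sim(model = list(ar = c(0.5, 0.3)), n = 500)  # AR(2)
ma1 <- arima.sim(model = list(ma = 0.8), n = 500)          # MA(1)
acf(ar2)  # tails off gradually over many lags
acf(ma1)  # cuts off after lag 1
```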
## Seasonality
Besides using the ACF and PACF to detect seasonality, we can use `decompose()` or `stl()` to extract seasonality from a time series. This is also known as STL decomposition (**S**easonal and **T**rend decomposition using **L**oess^[This is the abbreviation of LOcally WEighted Scatterplot Smoothing, or locally weighted smoothing.]). It assumes that any time series can be decomposed additively or multiplicatively:
- Additive model: $y_t = S_t + T_t + R_t$
- Multiplicative model: $y_t = S_t \times T_t \times R_t$, so that $\log y_t = \log S_t + \log T_t + \log R_t$
where $S$, $T$, and $R$ denote the seasonal, trend, and remainder components.
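For instance, R's built-in `AirPassengers` series has seasonal swings that grow with the level of the series, a multiplicative pattern; taking logs makes the decomposition approximately additive. A quick sketch:

```r
# Multiplicative pattern made additive via logs
AP <- AirPassengers
plot(AP)       # seasonal amplitude grows with the trend: multiplicative
plot(log(AP))  # roughly constant amplitude: additive on the log scale
plot(decompose(log(AP), type = "additive"))
```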
Now let's compare and re-examine two previous examples: Australia Beer consumption and Airline Passenger Numbers.
```{r}
# Electricity (millions of kWh), beer (Ml), and chocolate production (tonnes) in Australia from January 1958 to December 1990
cbe.ts <- read.csv("data/cbe.csv") %>% ts(start = 1958, freq = 12)
head(cbe.ts) # this example of multiple time series shows that each column is a single time series
aus.beer <- cbe.ts[, 2] # Beer consumption in Australia
plot(aus.beer)
```
Let's check the beer example.
- Note that if we don't specify the total period of a cycle, `decompose()` and `stl()` will automatically use the pre-specified frequency information to extract seasonality. In the beer example the frequency is 12, so `decompose()` will assume a 12-month cycle and extract seasonality accordingly.
- If we think there should be longer cycle in time series, we need to tell `decompose()` or `stl()` to extract seasonality based on a new frequency.
```{r}
decompose(aus.beer, type = "additive") %>% plot()
# or
stl(aus.beer, s.window = 12) %>% plot()
# or you can extract seasonality from scratch - this is what decompose() or stl() actually does to extract seasonality
beer.detrend <- lm(aus.beer ~ time(aus.beer)) %>% resid()
beer.detrend %>%
# create a matrix in which each column denotes a month
matrix(ncol = 12, byrow = TRUE) %>%
# calculate the average value of each month over years
colMeans() %>%
# repeat the cycle - there are 33 years in the time series
rep(33) %>%
# transform back into ts class
as.ts() %>%
plot() # we get the same seasonal pattern in decompose() and stl()
```
We can use `lm()` to extract the deterministic trend and then use `decompose()` or `stl()` to extract the seasonal pattern. After de-trending and removing seasonality, we can use `acf()` and `pacf()` to determine whether there is an autoregressive process in the remaining part of the series.
- We could also pass the "random" (remainder) part from `decompose()` directly to the ACF and PACF plots. However, because `decompose()` fits the trend with a moving average (and `stl()` with LOESS), it is difficult to recover the slope of the deterministic trend in our hypothesized functional form.
- Hence, we recommend using `lm()` to de-trend the series.
```{r}
# PACF directly from stl, not linear trend
stl(aus.beer, s.window = 12)$time.series[, "remainder"] %>% pacf()
# PACF after de-trending using lm()
# 1st, detrend the data and save the de-trended residuals
beer.detrend <- lm(aus.beer ~ time(aus.beer)) %>% resid() %>% ts(freq = 12)
# 2nd, use either decompose() or stl() to extract seasonality
beer.stl <- stl(beer.detrend, s.window = 12)
head(beer.stl$time.series)
pacf(beer.detrend - beer.stl$time.series[, "seasonal"])
```
The two PACF plots differ because the first code block fits a LOESS curve to extract the trend, whereas the second fits a straight regression line. Again, if we want to recover the functional form of the time series, the linear assumption is more convenient. If the series has a multiplicative pattern, we take logs first and then remove trend and seasonality (equivalently, remove them by division).
### ACF Examples
Let's see how ACF behaves when a time series has trend, seasonality, and AR(1) process.
```{r}
AP <- AirPassengers
plot(AP) # it has trend, seasonality, and potentially AR process
acf(AP) # before de-trending and removing seasonality, it has high autocorrelation and fluctuation
# step-by-step using base R
# use lm() to de-trend the data
AP.trend <- lm(AP ~ seq_along(AP)) # alternative to time()
summary(AP.trend)
AP.detrend <- AP.trend$residuals
acf(AP.detrend) # now we only have fluctuation
# acf after removing seasonality
AP.remainder <- AP.detrend - decompose(AP)$seasonal
acf(AP.remainder) # less fluctuation now
```
The ACF diminishes as the lag increases because of the trend, and the fluctuation is due to the seasonal variation.
### ACF of white noises
```{r}
set.seed(2022)
whitenoise <- ts(rnorm(50))
acf(whitenoise)
```
As true white noise simply varies randomly through time, its ACFs should be approximately zero (no correlation between lagged values). More precisely, we expect about 5% of the sample ACFs to appear significantly different from zero at the 95% confidence level.
As this white noise has $T=50$, the blue dashed bounds fall at $\pm2/\sqrt{50}\approx\pm0.28$. Keep in mind that a single significant value may be due to sampling variation.
## Partial ACF
It is often hard to determine the components of a series based on the ACF alone, especially when we have trends, seasonal variation, or mixed ARMA processes.
Therefore, we also use *partial* ACF, PACF, which is a measure of correlation between time series observations that are $k$ units apart, *after the correlation at intermediate lags has been controlled for or "partialled" out*. In other words, the PACF($k$) is the correlation between $y_t$ and $y_{t-k}$ after removing the effect of the intermediate $y$'s.
We simply use function `pacf()` to see the correlogram of PACF.
The PACF at lag $j$ can be written as:
$$
PACF(j) = corr(y_{t-j}, \hat{e}_t)
$$
where $\hat{e}_t$ is the residual from projecting $y_t$ on the intermediate lags $y_{t-1}, \dots, y_{t-j+1}$ (equivalently, one can correlate the residuals from projecting both $y_t$ and $y_{t-j}$ on the intermediate lags).
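As a check on this definition, we can compute the lag-2 partial autocorrelation by hand on simulated AR(1) data: partial the intermediate lag out of both $y_t$ and $y_{t-2}$, then correlate the residuals. Small end-effect differences aside, this matches `pacf()`:

```r
# Manual PACF(2) via regression residuals
set.seed(98105)
y <- as.numeric(arima.sim(model = list(ar = 0.5), n = 2000))
n <- length(y)
y.t  <- y[3:n]        # y_t
y.l1 <- y[2:(n - 1)]  # y_{t-1}, the intermediate lag
y.l2 <- y[1:(n - 2)]  # y_{t-2}
e.t  <- resid(lm(y.t ~ y.l1))   # y_t with y_{t-1} partialled out
e.l2 <- resid(lm(y.l2 ~ y.l1))  # y_{t-2} with y_{t-1} partialled out
pacf2.manual <- cor(e.t, e.l2)
pacf2.r <- pacf(y, plot = FALSE)$acf[2]
c(pacf2.manual, pacf2.r)  # nearly identical, and near zero for an AR(1)
```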
### How can we use PACF to identify the functional form of time series?
- A time-series data with trend doesn't create much upward bias in pacf, because some of this information is already **partialled out**.
- Seasonal variation becomes clearer in the PACF. Suppose there is a 12-month cycle; then we would expect a spike at lag 12 in the PACF plot.
- Suppose that we already remove trend and seasonality from time series.
- If it is a white noise, then there will be no significant autocorrelation for any lags.
- If it is an autoregressive process AR($p$), we expect the PACF to cut off after lag $p$.
- If it is a moving average process MA($q$), we expect the PACF to tail off gradually. (We will learn about MA processes next week.)
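To see the cut-off, simulate an AR(2) and inspect its PACF. The coefficients here are arbitrary illustrative choices:

```r
# The PACF of a simulated AR(2) should cut off after lag 2
set.seed(2022)
ar2 <- arima.sim(model = list(ar = c(0.5, 0.3)), n = 500)
pacf(ar2)  # spikes at lags 1 and 2, then nothing systematic
```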
### PACF Examples
Let's see how the PACF behaves for white noise, and compare the ACF and PACF of `AirPassengers`.
```{r, fig.width=7, fig.height=4}
pacf(whitenoise)
acf(AP)
pacf(AP)
```
# Identifying Unknown Time Series Processes
## Coding Exercise: Identifying Unknown Time Series Processes
Please load the file `df_practice.csv`. It contains three time series, $y$, $z$, and $w$, each with monthly frequency (12 observations per year) over 10 years, i.e., 120 observations. Analyze their dynamics:
1. Identify the *deterministic trend* in the time series.
2. Identify the *seasonality* in the time series.
3. Remove the deterministic trend and seasonality from the time series.
4. Plot the pattern of ACF and PACF. What functional form do you think is reasonable in this time series?
5. Use `arima.sim()` to verify your guess.
6. Use `Box.test()` to test for white noise.
```{r}
# load data
# Turn into ts object
# 1. Identify the deterministic trend in the time series.
# 2. use lm() to de-trend data
# 3. remove seasonality
# 4. What do the ACF and PACF suggest?
# 5. test some models
# 6. test for white noise
# conclusion?
```