CSSS/POLS 512: Lab 3

Time Series Diagnostics

Tao Lin

April 15, 2022

Agenda

  • We will change the format of the lab section for the rest of this quarter: in each section, I will write the exercise code step by step. This format has already received quite positive feedback.
  • Today’s topic - Box-Jenkins Method: Determine the Functional Form of Time Series

Homework 1

We want to understand what kind of temporal dependence underlies the observed time series.

  1. Utilize our generic knowledge about the data (e.g. monthly data \(\Rightarrow\) freq=12)

  2. Draw and observe the original time series plot and ACF/PACF plots

  3. If a trend is suspected:

  • Regress the observed values on time (e.g. y ~ time(y))
  4. If seasonality is suspected:
  • Three ways to determine seasonality:

    • look at the plot of observed time series
    • look at the seasonal part of time series after decomposition
    • Splicing the time series and overplotting each cycle
  • Decide whether it is additive / multiplicative

    • additive seasonality tends to show fluctuations of roughly constant size over time
    • multiplicative seasonality tends to show fluctuations that grow over time
  • If additive, find \(\kappa\) for each month through regression

  • If multiplicative, find \(\phi_k\) in the PACF plot where \(k\) indicates the frequency of seasonality

  5. If autocorrelation is suspected:
  • Focus on ACF/PACF plots

  • Gradual decline in the ACF: likely to be AR

  • Sharp cutoff in the ACF: likely to be MA

  • Spikes in the ACF and PACF plots that match: likely to be our parameter estimates (\(\phi\) for AR terms and \(\rho\) for MA terms)

  6. De-trend and de-season whenever possible and draw the ACF/PACF plots of the result
  • Subtract the predicted values (based on our assumed model) from the observed values
  • ex) De-trend: \(y_t - \beta t\), where \(\beta\) is the coefficient we get from the linear regression on time
  • ex) De-season (additive): subtract the estimated \(\kappa\) for each month and plot the result
  7. We can use decompose() or stl(), but be aware that both rely on explicit assumptions that we have to specify (otherwise, they will decompose for you based on their default assumptions)
  • e.g. decompose() and stl() will automatically extract seasonality based on pre-specified frequency in time series. If you think the time series has a larger cycle, you need to specify it using stl(..., s.window = ...).
  • decompose() and stl() will automatically extract seasonality even if your time series does not really have one. In this case, human judgement could be more reliable.
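The steps above can be sketched in R. This is a minimal illustration on a simulated monthly series; all object names and parameter values here are made up for the example:

```r
set.seed(512)
# Simulated monthly series: linear trend + additive seasonality + AR(1) noise
n <- 120
trend  <- 0.05 * (1:n)
season <- rep(sin(2 * pi * (1:12) / 12), length.out = n)
noise  <- arima.sim(list(ar = 0.5), n = n)
y <- ts(trend + season + noise, frequency = 12)

# Step 3: check for a trend by regressing y on time
trend_fit <- lm(y ~ time(y))
summary(trend_fit)            # a significant slope suggests a trend

# Step 6: de-trend (observed minus fitted) and inspect ACF/PACF
y_detrended <- residuals(trend_fit)
acf(y_detrended)
pacf(y_detrended)

# Step 7: decompose with stl(); s.window must be specified
fit_stl <- stl(y, s.window = "periodic")
plot(fit_stl)
```

With s.window = "periodic", stl() treats the seasonal pattern as identical across years; a numeric s.window lets it evolve over time.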

Practical Rules for Tentative Identification¹

“Ideal” shapes in ACF and PACF plots:

|      | AR(\(p\)) | MA(\(q\)) | ARMA(\(p,q\)) |
|------|-----------|-----------|---------------|
| ACF  | Tails off | Cuts off after lag \(q\) | Tails off |
| PACF | Cuts off after lag \(p\); PACF(\(p\)) \(= \phi_p\) | Tails off (potentially with oscillations) | Tails off |

Other “irregular” shapes in ACF plot:

  • Exponential decay to zero: AR model (can use the PACF plot to identify the order \(p\) for AR model)
  • Damped oscillations decaying (exponentially) to zero: AR model
  • One or more spikes, the rest is essentially zero: MA model (order \(q\) can be identified by where ACF plot becomes zero)
  • Exponential decay starting after a few lags: Mixed ARMA model
  • No significant autocorrelations (zero or close to zero): White noise
  • High values at fixed intervals: Include seasonal AR terms
  • No decay to zero or very slow decay: Non-stationarity or long-memory effects
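These shapes can be checked against simulated series where the true order is known. A quick sketch (the AR and MA coefficients below are arbitrary but chosen to keep the series stationary):

```r
set.seed(512)

# AR(2): ACF should tail off; PACF should cut off after lag 2
y_ar <- arima.sim(list(ar = c(0.6, 0.2)), n = 500)
acf(y_ar)
pacf(y_ar)

# MA(1): ACF should cut off after lag 1; PACF should tail off
y_ma <- arima.sim(list(ma = 0.7), n = 500)
acf(y_ma)
pacf(y_ma)
```

Comparing these plots with the table above is a good way to build intuition before looking at real data, where the patterns are rarely this clean.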

Unit Root Tests (before we move to non-stationary series…)

Intuition: if time series is stationary, then regressing \(y_{t}-y_{t-1}\) on \(y_{t-1}\) should produce a negative coefficient. Why?

In a stationary series, knowing the past value of the series helps to predict the next period’s change. Positive shifts should be followed by negative shifts (mean reversion).

\[y_t = \rho y_{t-1} + \epsilon_{t}\] \[y_t - y_{t-1}= \rho y_{t-1} - y_{t-1} + \epsilon_{t}\] \[\Delta y_{t} = \gamma y_{t-1} + \epsilon_{t}\text{, where } \gamma=(\rho - 1)\]
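This intuition can be checked directly by running the regression above on a simulated stationary AR(1) series (a sketch; the series and the coefficient value are made up for illustration):

```r
set.seed(512)
y <- as.numeric(arima.sim(list(ar = 0.5), n = 500))  # stationary AR(1), rho = 0.5

dy    <- diff(y)              # y_t - y_{t-1}
y_lag <- y[-length(y)]        # y_{t-1}

df_fit <- lm(dy ~ y_lag - 1)  # no intercept, as in the equation above
coef(df_fit)                  # gamma = rho - 1, so it should be near -0.5
```

For a random walk (rho = 1), the same regression would give an estimate of \(\gamma\) near zero, which is exactly what the unit root tests below formalize.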

Augmented Dickey-Fuller (ADF) test: the null hypothesis is a unit root. That is, if we fail to reject the null (\(p > 0.05\)), we cannot rule out that the time series is non-stationary.

The Phillips-Perron test has the same null hypothesis, but differs in how the AR(\(p\)) time series is modeled: it corrects for serial correlation and heteroskedasticity in the errors rather than adding lagged differences to the regression.

In R, we can use pp.test() and adf.test() from the tseries package to perform unit root tests.
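A usage sketch, comparing a stationary series with a random walk (requires the tseries package; the series are simulated for illustration):

```r
library(tseries)
set.seed(512)

y_stat <- as.numeric(arima.sim(list(ar = 0.5), n = 500))  # stationary AR(1)
y_rw   <- cumsum(rnorm(500))                              # random walk (unit root)

adf.test(y_stat)  # should reject the null of a unit root (small p-value)
adf.test(y_rw)    # should fail to reject: consistent with non-stationarity
pp.test(y_stat)
pp.test(y_rw)
```

Note that both tests clamp reported p-values to the range of their lookup tables, so R may warn that the true p-value is smaller (or greater) than the one printed.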

Exercise - Identify Simulated Time Series in R

In this exercise, we will work with three time series: two simulated and one real-world example. We will use the tools above to identify the temporal dependence in each series.

For more information, see Lab3_exercise.Rmd in Lab3_replication.zip.

Next Week

  • Model Estimation
  • Model Assessment and Cross-Validation
  • Model Interpretation and Counterfactual Forecasting

Footnotes

  1. See https://math.unice.fr/~frapetti/CorsoP/Chapitre_5_IMEA_1.pdf.