class: center, middle, inverse, title-slide .title[ # CSSS/POLS 512 ] .subtitle[ ## Lab 2: Time Series Diagnostics — ACF, PACF, and the Box-Jenkins Method ] .author[ ### Ramses Llobet ] .date[ ### Spring 2026 ] --- # Today's Plan .pull-left[ **Part 1: Building Up a Time Series** - What components can a time series have? - Building the composite equation step by step - Simulations for each component **Part 2: ACF/PACF & "Guess the Process"** - ACF/PACF as diagnostic tools - The identification table - Interactive: identify 9 unknown series ] .pull-right[ **Part 3: Stationarity Tests & Residual Diagnostics** - ADF, KPSS: what they test and when to use them - Ljung-Box and Jarque-Bera - Estimation and model comparison **Part 4: Practice** - Diagnose 6 mystery series using Box-Jenkins - ~20 minutes hands-on ] --- class: inverse, center, middle # Part 1: Building Up a Time Series --- # The Big Picture Most time series can be decomposed into a combination of recognizable components: `$$y_t = \underbrace{\beta_0}_\text{level} + \underbrace{\beta_1 t}_\text{trend} + \underbrace{S_t}_\text{seasonal} + \underbrace{\phi_1 y_{t-1}}_\text{AR(1)} + \underbrace{\theta_1 \varepsilon_{t-1}}_\text{MA(1)} + \underbrace{\varepsilon_t}_\text{white noise}$$` -- Not every series has all of these. The goal of **Box-Jenkins diagnostics** is to determine which components are present and how strong they are. -- The diagnostic order matters: **first** address trend and seasonality, **then** identify AR and MA. Let's build this equation **one component at a time**. --- # Step 1: Level + White Noise `$$y_t = \beta_0 + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma^2)$$` The simplest possible time series: a constant mean plus random noise. No memory, no patterns. This is our **null model** — what "no structure" looks like. 
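In R, simulating this null model takes one line — a minimal sketch with assumed values `\(\beta_0 = 2\)` and `\(\sigma = 1\)` (illustrative, not necessarily the values behind the figure):

```r
set.seed(123)                                # reproducible draws
beta0 <- 2                                   # assumed level (illustrative)
wn <- ts(beta0 + rnorm(200, sd = 1))         # constant mean + Gaussian noise
Box.test(wn, lag = 10, type = "Ljung-Box")   # white noise: expect a large p-value
```

Every diagnostic in this lab should come back clean on a series like this.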
-- <img src="Lab2_slides_files/figure-html/build-step1-1.svg" width="864" style="display: block; margin: auto;" /> .footnote[*Code: `Lab2.Rmd` → Section 1.1 (ts objects) and Section 1.4 (white noise, ACF/PACF, Ljung-Box)*] --- # Step 2: Add a Trend `$$y_t = \beta_0 + \color{firebrick}{\beta_1 t} + \varepsilon_t$$` Now the mean drifts over time. Each period, the expected value increases by `\(\beta_1\)`. The series fluctuates around a **line**, not a constant. -- <img src="Lab2_slides_files/figure-html/build-step2-1.svg" width="864" style="display: block; margin: auto;" /> .footnote[*Code: `Lab2.Rmd` → Section 2.1 (deterministic trends, detrending with `lm()`)*] --- # Step 3: Add Seasonality `$$y_t = \beta_0 + \beta_1 t + \color{firebrick}{S_t} + \varepsilon_t$$` A repeating pattern at a fixed period (e.g., 12 months). The series oscillates with a predictable rhythm layered on top of trend and noise. -- <img src="Lab2_slides_files/figure-html/build-step3-1.svg" width="864" style="display: block; margin: auto;" /> -- Trend and seasonality are the **first things to address** — they dominate the ACF and mask the AR/MA structure underneath. .footnote[*Code: `Lab2.Rmd` → Section 1.3 (STL preview) and Section 2.3 (deseasonalization: STL vs. `lm()` + seasonal means)*] --- # Step 4: Add Autoregressive Dependence (AR) `$$y_t = \color{firebrick}{\phi_1 y_{t-1}} + \varepsilon_t, \quad |\phi_1| < 1$$` Now each observation depends on the **previous value**. The series develops **persistence** — smooth, wandering patterns. High values tend to be followed by high values. -- <img src="Lab2_slides_files/figure-html/build-step4-1.svg" width="864" style="display: block; margin: auto;" /> -- Compare to white noise: the AR(1) series is **smoother** — it "remembers" where it has been. 
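To generate this kind of persistence yourself, `stats::arima.sim()` is the standard tool — a sketch with an assumed `\(\phi_1 = 0.85\)` (illustrative, not necessarily the value in the figure):

```r
set.seed(123)
# phi = 0.85 is an illustrative value for a strongly persistent AR(1)
ar1 <- arima.sim(model = list(ar = 0.85), n = 200)
acf(ar1)    # tails off: roughly geometric decay
pacf(ar1)   # cuts off: one dominant spike at lag 1
```

Re-run with `ar = 0.3` and `ar = 0.95` to see how `\(\phi_1\)` controls the smoothness.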
.footnote[*Code: `Lab2.Rmd` → Section 1.5 (AR processes, varying φ, AR(2))*] --- # Step 5: Moving Average Shocks (MA) `$$y_t = \varepsilon_t + \color{firebrick}{\theta_1 \varepsilon_{t-1}}$$` Now the current value depends on **past shocks**, not past values. A shock moves the series when it hits and for exactly one period after, then disappears — **finite memory**. -- <img src="Lab2_slides_files/figure-html/build-step5-1.svg" width="864" style="display: block; margin: auto;" /> -- Compare to AR(1): the MA(1) series is **rougher** — less persistence, quicker return to the mean. .footnote[*Code: `Lab2.Rmd` → Section 1.6 (MA processes) and Section 1.7 (ARMA)*] --- # The Full Composite Putting it all together — and in **diagnostic order**: `$$y_t = \underbrace{\beta_0}_\text{level} + \underbrace{\beta_1 t}_\text{trend} + \underbrace{S_t}_\text{seasonal} + \underbrace{\phi_1 y_{t-1}}_\text{AR} + \underbrace{\theta_1 \varepsilon_{t-1}}_\text{MA} + \underbrace{\varepsilon_t}_\text{noise}$$` -- The **Box-Jenkins method**: address each layer in order, then identify what remains. | Step | What you do | What you remove | |:-----|:------------|:----------------| | 1. **Visual inspection** | Plot the series | — | | 2. **Detrend** | Regress on `\(t\)`, keep residuals | Trend | | 3. **Deseasonalize** | Subtract seasonal means or STL | Seasonality | | 4. **Identify AR/MA** | Examine ACF/PACF of the cleaned series | — | | 5. **Estimate & diagnose** | Fit model, check residuals for white noise | AR/MA structure | -- If residuals still show structure → **revise and repeat**. --- class: inverse, center, middle # Part 2: ACF/PACF & "Guess the Process" --- # ACF and PACF: The Diagnostic Tools **ACF** — correlation between `\(y_t\)` and `\(y_{t-k}\)` (includes indirect paths through intermediate lags). **PACF** — correlation between `\(y_t\)` and `\(y_{t-k}\)` **after removing** the linear effect of lags `\(1, \dots, k-1\)`.
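Both are one-liners in R; `plot = FALSE` returns the numeric values instead of the chart (the AR(1) input below is an assumed example):

```r
set.seed(1)
x <- arima.sim(model = list(ar = 0.7), n = 500)   # assumed AR(1) input
a <- acf(x,  lag.max = 10, plot = FALSE)
p <- pacf(x, lag.max = 10, plot = FALSE)
a$acf[2]   # lag-1 ACF (a$acf[1] is lag 0, which is always 1)
p$acf[1]   # lag-1 PACF (pacf() starts at lag 1; equals the lag-1 ACF there)
qnorm(0.975) / sqrt(length(x))   # the +/- 1.96/sqrt(n) confidence band
```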
-- ### The Identification Table | Process | ACF | PACF | |:--------|:----|:-----| | **AR(p)** | Tails off (decays) | **Cuts off** after lag `\(p\)` | | **MA(q)** | **Cuts off** after lag `\(q\)` | Tails off (decays) | | **ARMA(p,q)** | Tails off | Tails off | -- "Cuts off" = drops to zero (within confidence bands) after a specific lag. "Tails off" = decays gradually (exponential, sinusoidal, or both). The blue dashed **confidence bands** are `\(\pm 1.96/\sqrt{n}\)` — they get narrower with more data. A single spike barely crossing the band is likely noise (5% will cross by chance). **Important:** This table only works on **stationary** data — detrend and deseasonalize first! --- # How Each Process Looks in ACF/PACF <img src="Lab2_slides_files/figure-html/acf-pacf-grid-1.svg" width="1008" style="display: block; margin: auto;" /> .footnote[*Code: `Lab2.Rmd` → Section 1.8 (identification table)*] --- # The Ljung-Box Test: "Are My Residuals Clean?" The **Ljung-Box test** is the formal answer to: *"Is there any autocorrelation left?"* - `\(H_0\)`: No autocorrelation (white noise) - `\(H_1\)`: Serial correlation exists -- ### Why this matters for regression .pull-left[ **Ljung-Box fails to reject** `\((p > 0.05)\)`: Residuals are white noise → your OLS standard errors, `\(t\)`-tests, and confidence intervals are **valid** → regular regression is fine. ] .pull-right[ **Ljung-Box rejects** `\((p \leq 0.05)\)`: Autocorrelation in residuals → your standard errors are **too small** → you're over-rejecting `\(H_0\)` → you need to model the dynamics (ARMA) or use robust SEs (Newey-West). ] -- **Bottom line:** This test is the bridge between time series diagnostics and the regression you already know. If Ljung-Box passes, you're done. If it fails, you need the tools from this lab. .footnote[`Box.test(residuals(fit), lag = 10, type = "Ljung-Box")`] --- # Now Let's Practice — Guess the Process! For each series I will show you: 1. 
First, the **time series plot** — look at it and form a hypothesis 2. Then, the **ACF and PACF** — match to the identification table -- Series A–D are **clean** (stationary, no trend or seasonality). Series E–G have **complications** (trend, seasonality, unit root). The .Rmd covers additional processes (negative AR, AR(2)) that we skip here for time — work through them in the lab document. --- # Guess the Process: Series A <img src="Lab2_slides_files/figure-html/series-a-ts-1.svg" width="864" style="display: block; margin: auto;" /> --- # Series A: ACF and PACF <img src="Lab2_slides_files/figure-html/series-a-acf-1.svg" width="864" style="display: block; margin: auto;" /> -- **Answer: White Noise.** No significant spikes in either ACF or PACF — no autocorrelation structure. --- # Guess the Process: Series B <img src="Lab2_slides_files/figure-html/series-b-ts-1.svg" width="864" style="display: block; margin: auto;" /> --- # Series B: ACF and PACF <img src="Lab2_slides_files/figure-html/series-b-acf-1.svg" width="864" style="display: block; margin: auto;" /> -- **Answer: AR(1), `\(\phi = 0.85\)`.** ACF decays slowly; PACF has a single significant spike at lag 1. --- # Guess the Process: Series C <img src="Lab2_slides_files/figure-html/series-c-ts-1.svg" width="864" style="display: block; margin: auto;" /> --- # Series C: ACF and PACF <img src="Lab2_slides_files/figure-html/series-c-acf-1.svg" width="864" style="display: block; margin: auto;" /> -- **Answer: MA(1), `\(\theta = 0.9\)`.** ACF cuts off sharply after lag 1; PACF tails off. The **mirror image** of the AR(1) pattern. 
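The cutoff is easy to confirm numerically — for an MA(1), the theoretical ACF is `\(\theta/(1+\theta^2)\)` at lag 1 and zero beyond (a simulation sketch using the same `\(\theta = 0.9\)`):

```r
set.seed(123)
ma1 <- arima.sim(model = list(ma = 0.9), n = 500)
r <- acf(ma1, lag.max = 5, plot = FALSE)$acf
r[2]      # lag 1: in theory 0.9 / (1 + 0.9^2), about 0.5
r[3:6]    # lags 2-5: should sit inside the confidence bands
```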
--- # Guess the Process: Series D <img src="Lab2_slides_files/figure-html/series-d-ts-1.svg" width="864" style="display: block; margin: auto;" /> --- # Series D: ACF and PACF <img src="Lab2_slides_files/figure-html/series-d-acf-1.svg" width="864" style="display: block; margin: auto;" /> -- **Answer: ARMA(1,1), `\(\phi = 0.7, \theta = 0.4\)`.** Both ACF and PACF tail off — neither cuts off cleanly. When this happens → compare candidate models with AIC. --- class: inverse, center, middle # Now the Trickier Ones: Complications --- # Series E: What's going on here? <img src="Lab2_slides_files/figure-html/series-e-ts-1.svg" width="864" style="display: block; margin: auto;" /> --- # Series E: Reveal <img src="Lab2_slides_files/figure-html/series-e-acf-1.svg" width="864" style="display: block; margin: auto;" /> -- **AR(1) with `\(\phi = 0.6\)` + deterministic trend.** The ACF decays very slowly — this is the trend's signature, not genuine persistence. **Fix: detrend first** (regress on time), then re-examine ACF/PACF. --- # Series F: What's going on here? <img src="Lab2_slides_files/figure-html/series-f-ts-1.svg" width="864" style="display: block; margin: auto;" /> --- # Series F: Reveal <img src="Lab2_slides_files/figure-html/series-f-acf-1.svg" width="864" style="display: block; margin: auto;" /> -- **AR(1) + seasonal component (period = 12).** ACF shows peaks at lags 12, 24, 36. PACF: spike at lag 1 (AR) and lag 12 (seasonal). **Fix: remove seasonality first** (STL or seasonal means), then diagnose the remainder. --- # Series G: What's going on here? <img src="Lab2_slides_files/figure-html/series-g-ts-1.svg" width="864" style="display: block; margin: auto;" /> --- # Series G: Reveal <img src="Lab2_slides_files/figure-html/series-g-acf-1.svg" width="864" style="display: block; margin: auto;" /> -- **Random walk** ( `\(y_t = y_{t-1} + \varepsilon_t\)` ). ACF barely decays — stays near 1.0 across all lags. This is **non-stationary**. 
**Fix: first-difference** ( `\(\Delta y_t\)` ) to recover white noise. --- # Summary: Complications and Their Signatures | Complication | What you see | ACF signature | What to do | |:-------------|:-------------|:--------------|:-----------| | **Deterministic trend** | Upward/downward drift | Very slow decay | Detrend with `lm()` | | **Seasonality** | Repeating periodic pattern | Spikes at period multiples | STL or seasonal means | | **Unit root** | Wandering, no fixed mean | Barely decays from 1.0 | Difference ( `\(\Delta y\)` ) | -- **Key lesson:** Handle trends, seasonality, and non-stationarity *before* reading the identification table. The ACF/PACF rules only work on **stationary** data. .footnote[*Code: `Lab2.Rmd` → Section 2.1 (trends), Section 2.2 (unit roots, ADF/KPSS), Section 2.3 (seasonality comparison)*] --- class: inverse, center, middle # Part 3: Stationarity Tests & Residual Diagnostics --- # Stationarity Tests: ADF and KPSS Two tests with **opposite null hypotheses** — use both together: -- .pull-left[ ### ADF (Augmented Dickey-Fuller) - `\(H_0\)`: **Unit root** (non-stationary) - `\(H_1\)`: Stationary - Small `\(p\)` → reject → **evidence for stationarity** - `tseries::adf.test(x)` ] .pull-right[ ### KPSS - `\(H_0\)`: **Stationary** - `\(H_1\)`: Unit root - Small `\(p\)` → reject → **evidence for non-stationarity** - `tseries::kpss.test(x)` ] -- ### The Decision Table | ADF result | KPSS result | Interpretation | |:-----------|:------------|:---------------| | Reject | Fail to reject | Both agree: **stationary** | | Fail to reject | Reject | Both agree: **non-stationary** → difference | | Fail to reject | Fail to reject | Ambiguous — need more data or alternative tests | | Reject | Reject | Contradictory — possible structural break | .footnote[*Code: `Lab2.Rmd` → Section 2.2 (stationarity tests, differencing, diagnostic toolkit table)*] --- # Why Use Both Tests? Hypothesis tests can only provide evidence **against** the null, not **for** it. 
-- - If ADF **fails to reject** the unit root null, that could mean: - (a) There really is a unit root, **or** - (b) ADF simply lacks **power** (common when `\(\phi \approx 0.95\)` or small `\(n\)`) -- - KPSS flips the null → a KPSS failure to reject is at least *consistent* with stationarity, and paired with an ADF rejection it amounts to **positive evidence for stationarity**, not just absence of evidence. -- - The **Phillips-Perron test** (`PP.test()`) is an alternative to ADF with the same null. It handles serial correlation differently (nonparametric correction). Useful when ADF and KPSS disagree. --- # Residual Diagnostics: The Finish Line After fitting an ARMA model, check whether residuals `\(\hat{e}_t\)` look like **white noise**: -- .pull-left[ ### Ljung-Box test - `\(H_0\)`: No autocorrelation (white noise) - `\(H_1\)`: Serial correlation exists - **Pass:** `\(p > 0.05\)` → residuals are clean - `Box.test(resid, type = "Ljung-Box")` ### Jarque-Bera test - `\(H_0\)`: Residuals are normally distributed - `\(H_1\)`: Non-normal (excess skew/kurtosis) - `tseries::jarque.bera.test(resid)` ] .pull-right[ ### All-in-one: `checkresiduals()` ```r fit <- Arima(y, order = c(1, 0, 1)) checkresiduals(fit) ``` Produces: 1. Residual time plot 2. Residual ACF 3. Histogram 4. Ljung-Box `\(p\)`-value If residuals **fail** → revise model → re-estimate → check again. ] .footnote[*Code: `Lab2.Rmd` → Section 3.1 (estimation), Section 3.2 (residual checks, Jarque-Bera), Section 3.3 (AIC comparison)*] --- # Diagnostic Toolkit Summary | Test | R function | `\(H_0\)` | Use when...
| |:-----|:-----------|:-------|:------------| | **Ljung-Box** | `Box.test(x, type="Ljung-Box")` | White noise | Testing for remaining autocorrelation | | **Jarque-Bera** | `jarque.bera.test(x)` | Normal distribution | Checking residual normality | | **ADF** | `adf.test(x)` | Unit root | Checking if differencing is needed | | **KPSS** | `kpss.test(x)` | Stationary | Complementing ADF | | **Phillips-Perron** | `PP.test(x)` | Unit root | Alternative to ADF | -- **A note on `auto.arima()`:** Useful as a sanity check, but not a substitute for understanding the Box-Jenkins procedure. It can miss the correct specification or propose complex models that are hard to interpret. Always verify with `checkresiduals()`. --- class: inverse, center, middle # Part 4: Practice — Diagnose Unknown Series --- # Practice Instructions Open **`Lab2.Rmd`**, Part 4. You will find **6 mystery time series**. For each one: 1. **Plot** the series — look for trends, seasonality, stationarity 2. **Test stationarity** — run ADF and KPSS 3. **Examine** the ACF and PACF (after detrending/deseasonalizing if needed) 4. **Identify** a candidate model using the identification table 5. **Estimate** the model with `Arima()` 6. **Diagnose** the residuals — are they white noise? -- .pull-left[ | ACF | PACF | → Model | |:----|:-----|:--------| | Tails off | Cuts off at `\(p\)` | AR$(p)$ | | Cuts off at `\(q\)` | Tails off | MA$(q)$ | | Tails off | Tails off | ARMA$(p,q)$ — compare with AIC | ] .pull-right[ **Tips:** - Series B has `frequency = 12` - If ACF decays very slowly → check for trend or unit root *first* - If both ACF/PACF tail off → try ARMA(1,1) as a starting point - When in doubt, let `auto.arima()` give you a second opinion ] You have **~20 minutes**. We will debrief together. 
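The whole loop, as a template on a series whose answer is known in advance (an assumed ARMA(1,1); requires the `tseries` and `forecast` packages — substitute each mystery series for `y`):

```r
library(tseries)    # adf.test(), kpss.test()
library(forecast)   # Arima(), checkresiduals()

set.seed(42)
y <- arima.sim(model = list(ar = 0.7, ma = 0.4), n = 300)  # known ARMA(1,1)

plot(y)                                  # 1. visual inspection
adf.test(y); kpss.test(y)                # 2. stationarity (opposite nulls)
acf(y); pacf(y)                          # 3. both tail off -> try ARMA
fit <- Arima(y, order = c(1, 0, 1))      # 4-5. candidate model
checkresiduals(fit)                      # 6. residuals ~ white noise?
AIC(fit, Arima(y, order = c(2, 0, 0)))   # tie-breaker between candidates
```

If step 6 fails, go back to step 4 with the residual ACF in hand.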
--- # Wrap-Up **Today we covered:** - The composite equation: level, trend, seasonality, AR, MA, noise — in diagnostic order - ACF and PACF as fingerprints for AR, MA, and ARMA processes - Common complications: trends, seasonality, unit roots — handle these first - Stationarity tests: ADF + KPSS (opposite nulls, use together) - Residual diagnostics: Ljung-Box + Jarque-Bera -- **Coming up next:** - Estimation and interpretation of ARMA models with covariates - Model selection: AIC, cross-validation - Forecasting with ARIMA -- **Self-study pointers:** - Lab2.Rmd Appendices A and B (Box-Jenkins flowchart + mathematical details) - Shumway & Stoffer, Ch. 3 (ARIMA models) - Practice: simulate your own ARMA processes, try to identify them --- class: inverse, center, middle # Questions?