CSSS/POLS 512

class: center, middle, inverse, title-slide

.title[
# CSSS/POLS 512
]
.subtitle[
## Lab 6: Nickell Bias and GMM Dynamic Panel Estimators
]
.author[
### Ramses Llobet
]
.date[
### Spring 2026
]

---

# Preview

.pull-left[
**Dynamic panel estimators**

- Nickell bias and where it comes from
- The IV/GMM fix, by hand
- `pgmm()`: formula syntax and arguments
- The five diagnostic tests on every printout
]

.pull-right[
**Application — taxes and cigarettes**

- Diff-GMM on a 48-state × 11-year panel
- Counterfactual: a 60-cent tax hike
- Three quantities of interest: **EV**, **FD**, **RR**
]

---
class: inverse, center, middle

# Nickell bias

---

# Nickell bias — the mechanism

The simplest dynamic panel:

`$$y_{it} = \alpha_i + \phi y_{i,t-1} + \beta x_{it} + \varepsilon_{it}, \quad \varepsilon_{it} \sim \text{iid}$$`

The **within transformation** subtracts unit means: `$\tilde y_{it} = y_{it} - \bar y_i$`.

The trouble: `$\tilde y_{i,t-1}$` contains `$\bar y_{i,-1}$`, the mean of the lagged outcomes, which depends on `$\varepsilon_{i,1}, \ldots, \varepsilon_{i,T-1}$`. The within error `$\tilde \varepsilon_{it}$` contains `$\bar \varepsilon_i$`, which depends on `$\varepsilon_{i,1}, \ldots, \varepsilon_{i,T}$`. **They share terms**, so the demeaned regressor is correlated with the demeaned error.

Leading-order result for large `$T$` (Nickell 1981):

`$$\boxed{\;\text{plim}(\hat\phi_{\text{FE}} - \phi) \;\approx\; -\frac{1+\phi}{T-1}\;}$$`

- **Downward** on `$\phi$` — FE understates persistence.
- **Order `$1/T$`** — small `$T$` is bad.
- **`$N \to \infty$` does not save you**: only more time periods do.
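A quick Monte Carlo makes the formula concrete. The base-R sketch below simulates a pure AR(1) panel with unit effects (the settings `N = 500`, `T_obs = 10`, `phi = 0.5` and a 50-period burn-in are our own choices, not the lab's), applies the within transformation by hand, and compares the FE bias with `$-(1+\phi)/(T-1)$`. Because the formula is a leading-order approximation, expect only rough agreement.

```r
# Sketch: Nickell bias in a simulated AR(1) panel with unit effects
set.seed(512)
N <- 500; T_obs <- 10; phi <- 0.5; burn <- 50
alpha <- rnorm(N)                                  # unit effects alpha_i

y <- matrix(NA_real_, N, burn + T_obs + 1)
y[, 1] <- alpha / (1 - phi)                        # start at each unit's long-run mean
for (tt in 2:ncol(y)) {
  y[, tt] <- alpha + phi * y[, tt - 1] + rnorm(N)  # iid errors
}
y <- y[, (burn + 1):ncol(y)]                       # keep y_0, ..., y_T

y_now <- y[, -1]                                   # y_it,      t = 1..T
y_lag <- y[, -ncol(y)]                             # y_{i,t-1}, t = 1..T

# Within transformation: subtract each unit's own mean
y_now_w <- y_now - rowMeans(y_now)
y_lag_w <- y_lag - rowMeans(y_lag)

phi_fe         <- sum(y_lag_w * y_now_w) / sum(y_lag_w^2)  # FE slope on the LDV
nickell_approx <- -(1 + phi) / (T_obs - 1)
round(c(fe_bias = phi_fe - phi, nickell = nickell_approx), 3)
```

Raising `N` shrinks the noise around the FE estimate but leaves its bias in place; raising `T_obs` is what moves it toward zero.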

---

# The fix — first differences plus an instrument

Difference both sides; `$\alpha_i$` vanishes (`$\Delta \alpha_i = 0$`):

`$$\Delta y_{it} = \phi \Delta y_{i,t-1} + \beta \Delta x_{it} + \Delta \varepsilon_{it}$$`

But `$\Delta y_{i,t-1}$` and `$\Delta \varepsilon_{it}$` both contain `$\varepsilon_{i,t-1}$` — **need an instrument**.

**`$y_{i,t-2}$`** is the natural choice (Anderson-Hsiao 1981):

- **Relevant**: `$\Delta y_{i,t-1} = y_{i,t-1} - y_{i,t-2}$` contains `$y_{i,t-2}$`. ✓
- **Exogenous**: `$y_{i,t-2}$` depends on `$\varepsilon$` only up to `$t-2$`, so it is uncorrelated with `$\Delta \varepsilon_{it} = \varepsilon_{it} - \varepsilon_{i,t-1}$`. ✓

Closed-form IV (= one-step GMM) estimator:

`$$\hat\theta = \big(Z'X\big)^{-1} Z' \Delta y$$`

with `$Z = [y_{i,t-2}, \Delta x_{it}]$` and `$X = [\Delta y_{i,t-1}, \Delta x_{it}]$`. **One matrix solve** — the lab works through this from scratch.
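A minimal sketch of the one-matrix-solve idea on simulated data. All parameter values and object names (`dy`, `dylag`, `ylag2`, ...) are our own, not the lab's: the differenced equation is stacked over the periods where `$y_{i,t-2}$` exists, and `$\hat\theta = (Z'X)^{-1}Z'\Delta y$` is computed with `solve()`. Naive OLS on the differenced equation is shown alongside for contrast.

```r
# Sketch: Anderson-Hsiao IV by hand on a simulated dynamic panel
set.seed(1981)
N <- 2000; T_obs <- 8; phi <- 0.5; beta <- 1
alpha <- rnorm(N)                                # unit effects alpha_i
x <- matrix(rnorm(N * (T_obs + 1)), N)           # strictly exogenous x_it
y <- matrix(0, N, T_obs + 1)                     # columns are t = 0, ..., T
y[, 1] <- alpha / (1 - phi) + rnorm(N)
for (tt in 2:(T_obs + 1)) {
  y[, tt] <- alpha + phi * y[, tt - 1] + beta * x[, tt] + rnorm(N)
}

# Stack the differenced equation for the periods where y_{i,t-2} exists
cols  <- 3:(T_obs + 1)
dy    <- as.vector(y[, cols] - y[, cols - 1])      # Delta y_it
dylag <- as.vector(y[, cols - 1] - y[, cols - 2])  # Delta y_{i,t-1}
dx    <- as.vector(x[, cols] - x[, cols - 1])      # Delta x_it
ylag2 <- as.vector(y[, cols - 2])                  # instrument y_{i,t-2}

X <- cbind(lag_dy = dylag, dx = dx)                # regressors
Z <- cbind(ylag2 = ylag2, dx = dx)                 # instruments (dx instruments itself)

theta_ah  <- solve(crossprod(Z, X), crossprod(Z, dy))  # (Z'X)^{-1} Z' dy
theta_ols <- solve(crossprod(X), crossprod(X, dy))     # naive OLS on differences
round(cbind(AH = theta_ah[, 1], OLS = theta_ols[, 1], truth = c(phi, beta)), 3)
```

Expect the AH column to land near the truth and OLS on differences to understate `$\phi$`, because `$\Delta y_{i,t-1}$` and `$\Delta\varepsilon_{it}$` are negatively correlated through `$\varepsilon_{i,t-1}$`.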

---

# The family of GMM estimators

.pull-left[
| Estimator | Instruments for `$\Delta y_{i,t-1}$` |
|:---|:---|
| **Anderson-Hsiao (1981)** | just `$y_{i,t-2}$` |
| **Arellano-Bond (1991)**<br>*Difference GMM* | full sequence `$y_{i,t-2}, y_{i,t-3}, \ldots$` |
| **Blundell-Bond (1998)**<br>*System GMM* | the above, plus lagged differences `$\Delta y_{i,t-1}$` as instruments for an added equation in levels |
]

.pull-right[
**Plus a likelihood outsider:**

| Estimator | Idea |
|:---|:---|
| **OPM** (Pickup et al. 2017) | Marginalize `$\alpha_i$` via orthogonal reparameterization — no instruments. |

**When to reach for what:**

- Diff-GMM: standard default.
- Sys-GMM: `$\phi$` near 1, or time-invariant controls matter.
- OPM: small `$T$` (< 8) where GMM is unreliable.
]

---
class: inverse, center, middle

# `pgmm()` and its diagnostic battery

---

# Anatomy of a `pgmm()` formula

The formula uses up to three blocks separated by `|`:

```
y ~ <regressors> | <GMM-style instruments> | <regular instruments (optional)>
```

- **Left of the first `|`**: the regression equation in *levels*. `lag(y, 1)` is the lagged DV; `lag(x, 0:1)` adds both the contemporaneous and lag-1 versions of `x`.
- **First `|` block**: GMM-style instruments — `lag(y, 2:99)` stacks **one moment per period × lag**. This is where over-identification lives.
- **Second `|` block** (optional): regular IV-style instruments — one stacked moment per variable, no per-period expansion.

**Key arguments:**

| Argument | Choices | What it does |
|:---|:---|:---|
| `data` | a `pdata.frame` | Panel with `index = c(unit, time)` |
| `effect` | `"individual"`, `"twoways"` | Unit effects only (removed by the transformation), or unit effects plus period dummies |
| `model` | `"onestep"`, `"twosteps"` | One-step vs two-step weighting; two-step SEs need the Windmeijer (2005) correction |
| `transformation` | `"d"`, `"ld"` | Difference GMM vs system GMM (levels + differences) |

---

# A canonical Diff-GMM fit on `EmplUK`

```r
library(plm)
data("EmplUK", package = "plm")   # UK firm-level employment panel

fit_demo <- pgmm(
  log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1) +
             log(capital)       + lag(log(output), 0:1) |
             lag(log(emp), 2:4),                     # GMM-style instruments
  data           = EmplUK,
  effect         = "twoways",                        # unit + time FE
  model          = "twosteps",
  transformation = "d"
)
summary(fit_demo, robust = TRUE)                     # Windmeijer SEs
```

**`summary.pgmm` prints, in order:**

1. The coefficient table (with Windmeijer-corrected SEs if `robust = TRUE`).
2. **Sargan test**: overidentification.
3. **AR(1) and AR(2) tests** (Arellano-Bond) on the differenced residuals.
4. **Wald test for coefficients**: joint significance of the regressors.
5. **Wald test for time dummies** (only with `effect = "twoways"`).

---

# The five-row diagnostic checklist

| Test | `$H_0$` | What we want | Note |
|:---|:---|:---|:---|
| **Sargan / Hansen** | `$E[Z'\varepsilon] = 0$` | **Fail to reject** (`$p > 0.10$`) | `$p \approx 1.0$` is **bad** (too many instruments) |
| **AR(1) on `$\Delta\hat\varepsilon$`** | no first-order serial corr | **Reject** (`$p < 0.05$`) | Mechanical from differencing |
| **AR(2) on `$\Delta\hat\varepsilon$`** | no second-order serial corr | **Fail to reject** (`$p > 0.05$`) | Rejection invalidates lag-2 IV |
| **Wald on `$\tau_t$`** | all year dummies = 0 | Reject → keep year FE | Only with `effect = "twoways"` |
| **# instruments** | (rule of thumb) | `$<\,N$` (ideally `$\ll N$`) | Roodman 2009 |
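Why AR(1) is expected but AR(2) is not can be checked by simulation. The sketch below draws iid levels errors, differences them, and computes pooled lag-1 and lag-2 correlations; in theory these are `$-0.5$` and `$0$`, since `$\Delta\varepsilon_{it}$` and `$\Delta\varepsilon_{i,t-1}$` share `$\varepsilon_{i,t-1}$`. This is a plain correlation on simulated errors, not the Arellano-Bond statistic that `pgmm()` reports.

```r
# Sketch: serial correlation created by first-differencing iid errors
set.seed(2009)
N <- 2000; T_obs <- 10
eps  <- matrix(rnorm(N * T_obs), N, T_obs)   # iid levels errors
deps <- eps[, -1] - eps[, -T_obs]            # Delta eps_t, t = 2..T

# Pair each Delta eps_t with its own lag 1 and lag 2 (within unit)
k   <- ncol(deps)
ar1 <- cor(as.vector(deps[, -1]),     as.vector(deps[, -k]))
ar2 <- cor(as.vector(deps[, -(1:2)]), as.vector(deps[, 1:(k - 2)]))
round(c(ar1 = ar1, ar2 = ar2), 3)
```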

**What to do if a test fails:**

- Sargan rejects → start the GMM instruments at a deeper lag (e.g. `lag(y, 3:99)`) or add more LDV lags.
- AR(2) rejects → same remedies; the levels error is serially correlated, so `$y_{i,t-2}$` is not a valid instrument.
- Too many instruments → cap the lag range (`lag(y, 2:4)` instead of `2:99`).

---
class: inverse, center, middle

# Application — taxes and cigarettes

---

# The cigarette panel

48 U.S. states, 1985–1995 (balanced, `$T = 11$`). Variables we need:

| Variable | Description |
|:---|:---|
| `packpc` | packs/capita (outcome) |
| `income`, `pop` | for per-capita income |
| `tax`, `taxs`, `avgprs` | excise tax, total tax, average price (cents/pack) |
| `cpi` | for inflation-adjusting to 1995 dollars |

**The substantive question.** By how much would a tax hike reduce cigarette consumption — and how does the effect unfold over time?

We fit Diff-GMM on the dynamic specification

`$$\text{packpc}_{it} = \alpha_i + \phi \, \text{packpc}_{i,t-1} + \beta_1 \, \text{income95pc}_{it} + \beta_2 \, \text{avgprs95}_{it} + \varepsilon_{it}$$`

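and forecast a 3-year trajectory under a 60-cent tax shock, entered as a sustained 60-cent increase in `avgprs95` (this assumes the tax passes through fully to the price).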

---

# The Diff-GMM fit on cigarette consumption

```r
library(plm)

fit_cig <- pgmm(
  packpc ~ lag(packpc, 1) + income95pc + avgprs95 |
           lag(packpc, 2:99),
  data           = pdat_cig,        # pdata.frame with index c("state","year")
  effect         = "individual",    # state FE; no year dummies (small T)
  model          = "twosteps",      # optimal two-step weighting
  transformation = "d"              # difference GMM
)
```

Table: Diff-GMM on packpc with Windmeijer-corrected SEs.

|term           | Estimate| Std. Error| z-value| Pr(>&#124;z&#124;)|
|:--------------|--------:|----------:|-------:|------------------:|
|lag(packpc, 1) |    0.639|      0.055|  11.552|              0.000|
|income95pc     |   -0.479|      0.496|  -0.965|              0.334|
|avgprs95       |   -0.180|      0.028|  -6.404|              0.000|

**Read off**: persistence `$\hat\phi = 0.64$` is positive and below 1; **higher price → fewer packs** (`$\hat\beta_{\text{avgprs}} = -0.18$`, `$p < 0.001$`); the income coefficient is imprecise and not distinguishable from zero (`$p = 0.33$`).

---

# Diagnostics for `fit_cig`

Table: Diagnostic battery on the cigarette Diff-GMM fit.

|test              | statistic| df or N| p-value|decision                       |
|:-----------------|---------:|-------:|-------:|:------------------------------|
|Sargan            |    47.099|      44|   0.347|want fail to reject (p > 0.10) |
|AR(1) on Δresid   |    -3.444|      NA|   0.001|want reject (p < 0.05)         |
|AR(2) on Δresid   |    -0.537|      NA|   0.592|want fail to reject (p > 0.05) |
|# instruments / N |    47.000|      48|      NA|want ratio < 1                 |

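**Reading the table.** Sargan (`$p = 0.347$`) fails to reject, AR(1) rejects as expected, and AR(2) does not reject, so the lag-2 instruments look valid. The warning sign is the instrument count: the full `lag(packpc, 2:99)` set yields 47 instruments for only `$N = 48$` states, right at the ceiling where the **Roodman pathology** sets in (§2.5 in the lab). With that many instruments the overidentification test loses power, so its pass is weak evidence; as a robustness check, cap at `lag(packpc, 2:4)`.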
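The 47 can be reconstructed by counting. The hypothetical helper `count_gmm_instruments()` below assumes a balanced panel with periods `1, ..., T_obs`, a differenced equation for `t = 3, ..., T_obs`, and, for each such `t`, every level `$y_{i,t-\ell}$` with `$\ell$` in the requested lag range that falls inside the sample. Under that assumption `lag(packpc, 2:99)` with `$T = 11$` gives 45 GMM-style instruments; adding `income95pc` and `avgprs95` as their own instruments gives 47, matching the table. That split is our reading; the printout reports only the total.

```r
# Hypothetical helper: count Diff-GMM "GMM-style" instruments per period x lag
count_gmm_instruments <- function(T_obs, min_lag = 2, max_lag = 99) {
  lags <- min_lag:max_lag
  per_period <- vapply(3:T_obs, function(t) {
    sum(t - lags >= 1)      # lagged levels y_{t - lag} that exist in the sample
  }, integer(1))
  sum(per_period)
}

n_exog <- 2                                    # income95pc, avgprs95 as own instruments
full   <- count_gmm_instruments(11) + n_exog   # lag(packpc, 2:99)
capped <- count_gmm_instruments(11, 2, 4) + n_exog  # lag(packpc, 2:4)
c(full = full, capped = capped, N = 48)
```

By the same count, `lag(packpc, 2:4)` would give 24 + 2 = 26 instruments, about half of `$N$`.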

---

# Counterfactual: a 60-cent tax shock

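```r
library(MASS)    # mvrnorm()
library(simcf)   # cfMake(), cfChange(), ldvsimev(), ldvsimfd(), ldvsimrr()

periods_out <- 3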
# Sample 1000 parameter draws from MVN(coef, Windmeijer vcov)
simparam <- mvrnorm(n = 1000, mu = coefficients(fit_cig),
                                Sigma = vcovHC(fit_cig))
simphi  <- simparam[, 1]
simbeta <- simparam[, -1, drop = FALSE]

# Treatment: +60 cent change in avgprs95 at period 1, then sustained
xhyp <- cfMake(packpc ~ income95pc + avgprs95 - 1, data = pdat_cig,
               nscen = periods_out)
xhyp$x <- 0 * xhyp$x; xhyp$xpre <- 0 * xhyp$xpre
xhyp <- cfChange(xhyp, "avgprs95", x = 60, scen = 1)

# Baseline: zero change
xbase <- xhyp; xbase$x <- xbase$xpre

# Three simulators: EV (level), FD (treat - base), RR (% change)
sev_treat <- ldvsimev(xhyp,  b = simbeta, phi = simphi, lagY = lagY,
                      transform = "diff", initialY = initialY, ...)
sev_base  <- ldvsimev(xbase, ...)
sfd       <- ldvsimfd(xhyp,  ...)
srr       <- ldvsimrr(xhyp,  ...)
```
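The simulation logic can be illustrated without `simcf`. The base-R sketch below is not a re-implementation of `ldvsimfd()`: it assumes the levels model above with a sustained +60 change in `avgprs95`, ignores the `transform = "diff"` setting, and draws `$\phi$` and `$\beta_{\text{avgprs}}$` independently from normals built from the table's estimates and SEs, because the table does not report their covariance (the slide's code uses the full `vcovHC()` matrix instead).

```r
# Sketch only: independent normal draws stand in for MVN(coef, vcov),
# because the phi/beta covariance is not in the coefficient table.
set.seed(512)
n_draws     <- 1000
periods_out <- 3
delta_price <- 60                                         # +60 cents, sustained

phi_draws  <- rnorm(n_draws, mean = 0.639,  sd = 0.055)   # lag(packpc, 1)
beta_draws <- rnorm(n_draws, mean = -0.180, sd = 0.028)   # avgprs95

# With a sustained change, the FD in year k is
#   beta * delta * (1 + phi + ... + phi^(k - 1))
fd <- sapply(seq_len(periods_out), function(k) {
  beta_draws * delta_price * rowSums(outer(phi_draws, 0:(k - 1), "^"))
})

# Median and 95% interval of the first difference, by year
fd_summary <- apply(fd, 2, quantile, probs = c(0.025, 0.5, 0.975))
colnames(fd_summary) <- paste0("year_", seq_len(periods_out))
round(fd_summary, 2)
```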

---

# The forecast: 3 panels

<div class="figure" style="text-align: center">
<img src="Lab6_slides_files/figure-html/p3-plot-1.svg" alt="Counterfactual forecast of a 60-cent tax hike. EV: predicted packs/capita under hike (solid) vs baseline (dashed). FD: absolute difference. RR: percent change. Bands are 95% intervals from 1000 draws." width="864" />
<p class="caption">Counterfactual forecast of a 60-cent tax hike. EV: predicted packs/capita under hike (solid) vs baseline (dashed). FD: absolute difference. RR: percent change. Bands are 95% intervals from 1000 draws.</p>
</div>

**Reading the panels.** Higher price → fewer packs. The effect builds via the LDV: year 1 is the immediate response; later years carry forward through `$\hat\phi$`.
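The "carry forward through `$\hat\phi$`" logic has a closed form at the point estimates. Assuming the levels model with a sustained price change `$\Delta x$`, the year-`$k$` effect is `$\hat\beta\,\Delta x\,(1-\hat\phi^k)/(1-\hat\phi)$`, which approaches the long-run effect `$\hat\beta\,\Delta x/(1-\hat\phi)$`. The sketch below plugs in the table's estimates; it ignores estimation uncertainty, so it will not match the simulated bands exactly.

```r
# Point-estimate dynamics of a sustained +60 cent price change
phi_hat  <- 0.639     # lag(packpc, 1) from the Diff-GMM table
beta_hat <- -0.180    # avgprs95 from the Diff-GMM table
delta    <- 60        # sustained +60 cents

impact   <- beta_hat * delta                   # year-1 effect on packs/capita
long_run <- beta_hat * delta / (1 - phi_hat)   # limit as years grow
years    <- 1:5
path     <- long_run * (1 - phi_hat^years)     # effect in year k
share    <- 1 - phi_hat^years                  # share of long-run effect reached
round(data.frame(year = years, effect = path, share = share), 3)
```

Under this model about 36% (`$1 - \hat\phi$`) of the long-run response arrives in year 1, which is why the FD keeps growing in magnitude after impact.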

---

# Takeaways

.pull-left[
**Methodologically**

1. FE-LDV is **biased downward** when `$T$` is small — Nickell formula gives the leading-order term.
2. The IV/GMM family removes `$\alpha_i$` by differencing and uses lagged levels as instruments for the endogeneity that differencing creates.
3. **`pgmm()`** packages this for you — but check the **five-row diagnostic table** every time.
4. **Too many instruments** ruins Sargan and inflates precision (Roodman 2009).
]

.pull-right[
**Substantively (cigarettes)**

1. A 60-cent tax hike produces a **measurable, persistent drop** in packs/capita.
2. The dynamic structure delivers the effect over **multiple periods**, not just at impact.
3. `simcf` lets us express the same effect as a **level forecast**, a **first difference**, or a **percent change** — pick the metric the audience needs.
]

---

# References

- Anderson & Hsiao (1981) — IV in dynamic models with error components
- Arellano & Bond (1991) — Difference GMM
- Blundell & Bond (1998) — System GMM
- Nickell (1981) — bias derivation
- Roodman (2009a, b) — practical guide; "Too Many Instruments"
- Pickup et al. (2017); Pickup & Hopkins (2022) — orthogonal-panel model
- Windmeijer (2005) — finite-sample SE correction

---
class: inverse, center, middle

# Let's get started!

Open `Lab6.Rmd`.

`rllobet@uw.edu`