class: center, middle, inverse, title-slide

.title[
# CSSS/POLS 512
]
.subtitle[
## Lab 5: Panel Data (Dynamics, FE/RE, Mundlak, IRF)
]
.author[
### Ramses Llobet
]
.date[
### Spring 2026
]

---

# Today's Plan

.pull-left[

**Part 1: Theory + Data**

- DAG: `reg`, `edt` (moderator), `oil`
- The Przeworski 1950–1990 panel
- Within-vs-between framing

**Part 2: Dynamics workflow**

- ACF/PACF on detrended series
- Per-country ADF/PP scan
- Panel unit-root **constellation** (CD, MW/IPS, Hadri, CIPS)
- Verdict: I(1)-like; choose Path A

]

.pull-right[

**Part 3: Panel models family**

- Pooled / FE / TWFE / RE / Mundlak (REWB)
- Static vs dynamic (AR(1), ARDL(1,1))
- corARMA(1) for RE / Mundlak
- Hausman, PCSE, Driscoll–Kraay

**Part 4: Counterfactual via IRF + LRE**

- FE-ARDL(1,1) closed form
- Permanent democratization
- Long-run effect by **education**

]

---

class: inverse, center, middle

# Part 1: Theory and Data

---

# A Toy Theory of Politics in Panel Data

We study **regime → growth** with three covariate roles:

.pull-left[

1. **Within-rich treatment**: `reg` (1 = democracy; flips on coups, pacts, transitions)
2. **Slow-moving covariate AND moderator**: `edt` (cumulative education), which moderates the regime-growth slope
3. **Time-invariant control**: `oil` (major oil exporter, 1/0)

**Outcome**: `\(\ln\)`(GDP per worker)

**Hypothesis**: more-educated workforces benefit more (or less) from democratization.

]

.pull-right[

<img src="output/panel_dag.png" alt="" width="100%" style="display: block; margin: auto;" />

]

---

# Within vs Between: Which `\(\beta\)` Are You Estimating?


| Specification | What `\(\hat\beta\)` identifies | Identifying assumption |
|:---|:---|:---|
| **Pooled OLS** | mixture of within + between | `\(u_{it}\perp X_{it}\)`; `\(\alpha_i\)` common |
| **Fixed effects (within)** | within only (`\(\beta_W\)`) | `\(\varepsilon_{it}\perp X_{it}\)` given `\(\alpha_i\)`; allows `\(\text{Cov}(\alpha_i, X)\neq 0\)` |
| **Random effects** | matrix-weighted average of within and between | `\(\alpha_i\perp X_i\)` |
| **Mundlak / REWB** | `\(\beta_W\)` and `\(\beta_B\)` separately | RE on residuals; `\(\bar x_i\)` absorbs cross-level confounding |

--

**Decision logic**:

- *"Does democratization, **for a given country**, raise GDP?"* → `\(\beta_W\)`
- *"Are democracies, **on average across countries**, richer?"* → `\(\beta_B\)`
- *"Both, and do they differ?"* → Mundlak / REWB returns both, and `\(\beta_W = \beta_B\)` is testable

.footnote[*Bell & Jones (2015), Kropko & Kubinec (2020).*]

---

# The Przeworski Democracy Panel

.pull-left[

**Dimensions**: `\(N = 135\)` countries, 1950–1990, **unbalanced** (`\(T_i \in [10, 40]\)`).

**Variables**: `ln_gdpw`, `reg`, `edt`, `oil`, `region`.

`pvar()` confirms `oil` is between-only; `reg` and `edt` vary both ways.

**Spaghetti plot →** trends, drift, country-specific slopes, coloured by region.

]

.pull-right[

<img src="Lab5_slides_files/figure-html/p1-spaghetti-region-1.svg" alt="" width="432" style="display: block; margin: auto;" />

]

---

# Unbalanced Panel: `\(T_i\)` Distribution

<img src="Lab5_slides_files/figure-html/p1-T-hist-1.svg" alt="" width="720" style="display: block; margin: auto;" />

Most countries have `\(T_i \approx 30\)`–40; African and Asian post-colonial states cluster at the lower end. Short `\(T\)` matters for unit-root power and Nickell bias.

---

class: inverse, center, middle

# Part 2: Dynamics Workflow

---

# Why Test for Stationarity Before Estimating?

A near-unit-root outcome creates two problems:

1. **Spurious regression**: coefficients can look significant even when there is no real relationship.
2. **Near-unit AR(1) coefficient**: in a level regression, `\(\hat\phi\)` on `ln_gdpw_lag` lands near 1, so the long-run multiplier `\(\hat\beta/(1-\hat\phi)\)` blows up.

Workflow (per Lab 5):

1. **Visualize** all series: spaghetti by region.
2. **Detrend** country-by-country (OLS); inspect ACF/PACF on residuals.
3. **Per-series ADF + PP** → histogram of unit-level `\(p\)`-values.
4. **Panel unit-root tests**: read a *constellation*, not a single verdict.
5. **Pick a path**: levels with a lagged DV, ARDL(1,1) (Path A), or first differences (Path B).

---

# Per-Series Unit-Root Scan

After removing each country's linear trend, run ADF and PP on the residuals; plot the distribution of `\(p\)`-values.

<img src="Lab5_slides_files/figure-html/p2-unit-root-hist-1.svg" alt="" width="720" style="display: block; margin: auto;" />

Most countries land at `\(p > 0.05\)`. With low power at `\(T \le 30\)`, we cannot reject the unit-root null country by country, which is not the same as evidence for a unit root.

---

# Panel Unit-Root Constellation

We read **multiple tests jointly** because no single one is decisive at this `\(T\)`.

| Step | Test | `\(H_0\)` | Reading |
|:----|:----|:----|:----|
| (a) | **Pesaran CD** | cross-sectional independence | If rejected → 1st-gen tests biased |
| (b) | **Maddala-Wu/Fisher**, **IPS** | unit root in *all* panels | 1st-gen, assume CSI |
| (c) | **Hadri** | stationarity in *all* panels | KPSS-flip; over-rejects under CSD (Hlouskova & Wagner 2006) |
| (d) | **Pesaran CIPS** | unit root, common-factor corrected | 2nd-gen, robust to CSD |

--

**Verdict for `ln_gdpw`**: CSD is present (a). Fisher rejects but IPS does not (b); Fisher is sensitive to a few low-`\(p\)` countries. Hadri rejects (c), but we discount it under CSD. **CIPS does not reject** (d). → **strong persistence consistent with I(1)-type behavior**.

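
To see why a Fisher-type test can reject on the strength of a few countries, it helps to compute the Maddala-Wu statistic by hand: `\(P = -2\sum_i \ln p_i\)`, which is `\(\chi^2\)` with `\(2N\)` degrees of freedom under the null if the country-level tests are independent. The sketch below is a minimal Python illustration using only the standard library. The `\(p\)`-values are invented for the example and are not the Przeworski results, and the function names are hypothetical helpers rather than any package's API.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square with an even number of df.

    For df = 2m the survival function has the closed form
    exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0          # k = 0 term
    for k in range(1, df // 2):     # k = 1, ..., m - 1
        term *= half / k
        total += term
    return math.exp(-half) * total

def fisher_mw(p_values):
    """Maddala-Wu / Fisher statistic: P = -2 * sum(log p_i), chi2 with 2N df.

    Assumes the unit-level tests are independent across countries.
    """
    stat = -2.0 * sum(math.log(p) for p in p_values)
    df = 2 * len(p_values)
    return stat, df, chi2_sf_even_df(stat, df)

# Hypothetical p-values: 20 countries, none close to rejecting.
all_typical = [0.40] * 20
# Same panel, but two countries now have very small p-values.
with_outliers = [0.40] * 18 + [0.0001, 0.0005]

stat_a, df_a, p_a = fisher_mw(all_typical)
stat_b, df_b, p_b = fisher_mw(with_outliers)
print(f"all typical : P = {stat_a:.2f}, df = {df_a}, p = {p_a:.4f}")
print(f"two outliers: P = {stat_b:.2f}, df = {df_b}, p = {p_b:.4f}")
```

Changing only two of the twenty inputs is enough to move the combined test from far from rejection to rejection at the 1% level, which is the sensitivity the verdict refers to. The independence assumption is also exactly what the CD test in step (a) calls into question.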
---

# Two Paths: We Pick Path A

| Path | Outcome | Dynamics absorbed by | Trade-off |
|:--:|:--|:--|:--|
| **A** | `\(\ln\)`-level | Lagged DV in levels (ARDL(1,1)) | Coefficients are level effects; recommended by Beck & Katz (2011); **Nickell bias** for short `\(T\)` |
| **B** | `\(\Delta\ln\)` | Lagged differences | Outcome is a growth rate; Mundlak split is awkward; forecasts must be un-differenced |

--

**One important exception**: pooled OLS with AR(1) on the level shows `\(\hat\phi \approx 0.99\)` (the I(1) signature). For the pooled model only, we use **Path B (FD + ARDL(1,1))**: outcome `\(\Delta y\)`, lagged `\(\Delta y\)`, and lagged regressors. The growth-rate equation has a well-behaved `\(\hat\phi\)`. For FE / TWFE / RE / Mundlak, we stay in levels (Path A).

---

class: inverse, center, middle

# Part 3: Estimating Panel Models

---

# One Equation, Five Estimators

$$ y_{it} \;=\; \alpha + \beta_1\,\text{reg}_{it} + \beta_2\,\text{edt}_{it} + \beta_3\,(\text{reg}_{it}\!\cdot\!\text{edt}_{it}) + \beta_4\,\text{oil}_i + u_{it} $$

| Estimator | Static | AR(1) (lagged DV) | ARDL(1,1) (+ lagged `reg`, `edt`) | corARMA(1) |
|:---|:---:|:---:|:---:|:---:|
| Pooled OLS | ✓ | ✓ | ✓ | – |
| FE (within) | ✓ | ✓ | ✓ | – |
| TWFE | ✓ | ✓ | ✓ | – |
| RE | ✓ | – | – | ✓ |
| Mundlak / REWB | ✓ | – | – | ✓ |

--

**Why no LDV in RE / Mundlak?** RE-LDV faces the Wooldridge (2005) initial-conditions problem; Mundlak-LDV breaks the within-between decomposition. corARMA absorbs persistence in the residuals instead.

---

# Fitting the Family

**Pooled OLS**: level AR(1) shows the I(1) signature (`\(\hat\phi \approx 0.99\)`) → switch outcome to first differences. Pick **FD + ARDL(1,1)**.

**FE / TWFE**: stay in levels with **ARDL(1,1)** (lagged DV + lagged `reg` + lagged `edt` + lagged interaction).

**RE**: **corARMA(1)**, i.e. AR(1) on residuals; no LDV (Wooldridge initial conditions).

**Mundlak / REWB**: **corARMA(1)** + within/between split on `reg`, `edt`, and `reg × edt` (precomputed product).

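
The "precomputed product" detail matters because the mean of a product is not the product of the means: the interaction has to be built first and then split like any other regressor. A minimal sketch of that split in plain Python follows. The toy panel is invented for illustration, `within_between` is a hypothetical helper, and the column names simply mirror the `*_within` / `*_between` labels used in the Mundlak table later in these slides.

```python
from collections import defaultdict

def within_between(rows, unit_key, var):
    """Return (within, between) lists for `var`, aligned with `rows`.

    between_i  = unit mean of var over that unit's observed years
    within_it  = var_it - between_i
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for r in rows:
        sums[r[unit_key]] += r[var]
        counts[r[unit_key]] += 1
    means = {u: sums[u] / counts[u] for u in sums}
    between = [means[r[unit_key]] for r in rows]
    within = [r[var] - b for r, b in zip(rows, between)]
    return within, between

# Toy unbalanced panel; all values are made up for illustration.
panel = [
    {"country": "A", "reg": 0, "edt": 2.0},
    {"country": "A", "reg": 1, "edt": 3.0},
    {"country": "A", "reg": 1, "edt": 4.0},
    {"country": "B", "reg": 0, "edt": 6.0},
    {"country": "B", "reg": 0, "edt": 8.0},
]

# Step 1: precompute the interaction as its own column.
for r in panel:
    r["reg_x_edt"] = r["reg"] * r["edt"]

# Step 2: split every regressor, including the product, into within and between.
for var in ("reg", "edt", "reg_x_edt"):
    w, b = within_between(panel, "country", var)
    for r, wi, bi in zip(panel, w, b):
        r[f"{var}_within"] = wi
        r[f"{var}_between"] = bi

for r in panel:
    print(r["country"], round(r["reg_x_edt_within"], 3), round(r["reg_x_edt_between"], 3))
```

For country A, `reg_x_edt_between` is the mean of the three products, which differs from `reg_between * edt_between`. Multiplying the already-split components would therefore give a different regressor from the one this split produces.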
---

# Pooled OLS: Where the Headline Lives

<table>
 <thead>
  <tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> </tr>
 </thead>
 <tbody>
  <tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> 7.277 </td> <td style="text-align:right;"> 0.028 </td> <td style="text-align:right;"> 264.597 </td> <td style="text-align:right;"> 0.000 </td> </tr>
  <tr> <td style="text-align:left;"> reg </td> <td style="text-align:right;"> 0.633 </td> <td style="text-align:right;"> 0.059 </td> <td style="text-align:right;"> 10.664 </td> <td style="text-align:right;"> 0.000 </td> </tr>
  <tr> <td style="text-align:left;"> edt </td> <td style="text-align:right;"> 0.233 </td> <td style="text-align:right;"> 0.006 </td> <td style="text-align:right;"> 36.984 </td> <td style="text-align:right;"> 0.000 </td> </tr>
  <tr> <td style="text-align:left;"> oil </td> <td style="text-align:right;"> 0.752 </td> <td style="text-align:right;"> 0.037 </td> <td style="text-align:right;"> 20.483 </td> <td style="text-align:right;"> 0.000 </td> </tr>
  <tr> <td style="text-align:left;"> reg:edt </td> <td style="text-align:right;"> -0.027 </td> <td style="text-align:right;"> 0.009 </td> <td style="text-align:right;"> -2.878 </td> <td style="text-align:right;"> 0.004 </td> </tr>
 </tbody>
</table>

- `\(\hat\beta\)` on `reg` and `reg × edt` is **identified mostly off cross-country variation**: more-educated countries differ from less-educated countries in their average regime-growth slope.
- Pooled OLS in levels is also the spec where the I(1) signature appears: AR(1) on the level gives `\(\hat\phi \approx 0.99\)`. Differencing the outcome restores stationarity (FD + ARDL(1,1) chosen for §2.8).

---

# Fixed Effects: Does the Moderation Survive Within?

<table>
 <thead>
  <tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> </tr>
 </thead>
 <tbody>
  <tr> <td style="text-align:left;"> reg </td> <td style="text-align:right;"> 0.1172 </td> <td style="text-align:right;"> 0.0888 </td> <td style="text-align:right;"> 1.3209 </td> <td style="text-align:right;"> 0.1867 </td> </tr>
  <tr> <td style="text-align:left;"> edt </td> <td style="text-align:right;"> 0.1802 </td> <td style="text-align:right;"> 0.0175 </td> <td style="text-align:right;"> 10.2960 </td> <td style="text-align:right;"> 0.0000 </td> </tr>
  <tr> <td style="text-align:left;"> reg:edt </td> <td style="text-align:right;"> -0.0286 </td> <td style="text-align:right;"> 0.0186 </td> <td style="text-align:right;"> -1.5329 </td> <td style="text-align:right;"> 0.1254 </td> </tr>
 </tbody>
</table>

--

- `oil` is absorbed (time-invariant).
- The within `reg × edt` interaction tells us whether the moderation is a *within-country* phenomenon: *as a country's education rises over time, does the regime effect change?*
- If the FE interaction is small while pooled OLS's was large → between-country variation was doing the talking. Here the two point estimates are close (-0.027 pooled, -0.029 FE); what changes is the precision, with the FE standard error about twice as large.


→ **Same data, different stories under different identifying assumptions.**

---

# Mundlak / REWB: The Diagnostic

<table>
 <thead>
  <tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> Value </th> <th style="text-align:right;"> Std.Error </th> <th style="text-align:right;"> DF </th> <th style="text-align:right;"> t-value </th> <th style="text-align:right;"> p-value </th> </tr>
 </thead>
 <tbody>
  <tr> <td style="text-align:left;"> reg_within </td> <td style="text-align:right;"> 0.055 </td> <td style="text-align:right;"> 0.019 </td> <td style="text-align:right;"> 2784 </td> <td style="text-align:right;"> 2.842 </td> <td style="text-align:right;"> 0.005 </td> </tr>
  <tr> <td style="text-align:left;"> reg_between </td> <td style="text-align:right;"> 0.907 </td> <td style="text-align:right;"> 0.334 </td> <td style="text-align:right;"> 108 </td> <td style="text-align:right;"> 2.712 </td> <td style="text-align:right;"> 0.008 </td> </tr>
  <tr> <td style="text-align:left;"> edt_within </td> <td style="text-align:right;"> 0.051 </td> <td style="text-align:right;"> 0.006 </td> <td style="text-align:right;"> 2784 </td> <td style="text-align:right;"> 8.028 </td> <td style="text-align:right;"> 0.000 </td> </tr>
  <tr> <td style="text-align:left;"> edt_between </td> <td style="text-align:right;"> 0.226 </td> <td style="text-align:right;"> 0.031 </td> <td style="text-align:right;"> 108 </td> <td style="text-align:right;"> 7.338 </td> <td style="text-align:right;"> 0.000 </td> </tr>
  <tr> <td style="text-align:left;"> reg_x_edt_within </td> <td style="text-align:right;"> -0.012 </td> <td style="text-align:right;"> 0.004 </td> <td style="text-align:right;"> 2784 </td> <td style="text-align:right;"> -2.779 </td> <td style="text-align:right;"> 0.005 </td> </tr>
  <tr> <td style="text-align:left;"> reg_x_edt_between </td> <td style="text-align:right;"> -0.037 </td> <td style="text-align:right;"> 0.049 </td> <td style="text-align:right;"> 108 </td> <td style="text-align:right;"> -0.753 </td> <td style="text-align:right;"> 0.453 </td> </tr>
  <tr> <td style="text-align:left;"> oil </td> <td style="text-align:right;"> 0.652 </td> <td style="text-align:right;"> 0.175 </td> <td style="text-align:right;"> 108 </td> <td style="text-align:right;"> 3.717 </td> <td style="text-align:right;"> 0.000 </td> </tr>
 </tbody>
</table>

--

- `reg_within` / `reg_between`: within-country regime effect vs cross-country slope.
- `reg_x_edt_within` / `reg_x_edt_between`: does the moderation by education operate *within* (rising education within a country changes the regime effect) or *between* (more-educated countries have a different regime-growth slope)?
- `oil` keeps its main effect: RE-type models can estimate time-invariant covariates, FE cannot.

**This is the same diagnostic the Hausman test gives, with covariate-by-covariate detail.**

---

# Hausman Test

$$ H = (\hat\beta_{FE} - \hat\beta_{RE})^\top [V_{FE} - V_{RE}]^{-1} (\hat\beta_{FE} - \hat\beta_{RE}) \;\sim\; \chi^2_k $$

- `\(H_0\)`: `\(\text{Cov}(\alpha_i, X_{it}) = 0\)`, so RE is consistent and efficient.
- `\(H_1\)`: RE inconsistent; only FE consistent.
- The variance of the difference collapses to `\(V_{FE} - V_{RE}\)` because RE is *efficient* under `\(H_0\)` (Hausman 1978).

```
## [1] "chisq = 61.75, df = 3, p = 0.0000"
```

Here the test rejects `\(H_0\)` decisively.

--

**Bell & Jones (2015) caveat**: do not use Hausman as a model-selection switch. Even when it rejects, REWB is often the better choice because it returns *both* within and between estimates, so you can report the relevant comparison.

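
The quadratic form in the Hausman statistic is simple to compute once the two coefficient vectors and their covariance matrices are available. The sketch below is a minimal, standard-library Python illustration. Every number in it is hypothetical (two coefficients, diagonal covariances) and does not reproduce the lab's FE or RE estimates, and `solve` and `hausman` are illustrative helpers rather than any package's functions.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def hausman(b_fe, b_re, v_fe, v_re):
    """H = d' (V_FE - V_RE)^{-1} d with d = b_FE - b_RE; returns (H, df)."""
    d = [f - r for f, r in zip(b_fe, b_re)]
    v_diff = [[vf - vr for vf, vr in zip(rf, rr)] for rf, rr in zip(v_fe, v_re)]
    x = solve(v_diff, d)            # x = (V_FE - V_RE)^{-1} d
    return sum(di * xi for di, xi in zip(d, x)), len(d)

# Hypothetical inputs, NOT the lab's estimates.
b_fe = [0.10, 0.20]
b_re = [0.16, 0.17]
v_fe = [[0.0009, 0.0], [0.0, 0.0004]]
v_re = [[0.0005, 0.0], [0.0, 0.0003]]

h_stat, df = hausman(b_fe, b_re, v_fe, v_re)
# With exactly 2 df the chi-square upper tail is exp(-H / 2).
p_value = math.exp(-h_stat / 2.0)
print(f"H = {h_stat:.2f}, df = {df}, p = {p_value:.5f}")
```

In finite samples `\(V_{FE} - V_{RE}\)` need not be positive definite, in which case the statistic can come out negative or the solve can fail. That fragility is one more reason to treat the test as a diagnostic, not a switch.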
---

# Panel Standard Errors

| SE type | Allows for | Use when |
|:---|:---|:---|
| **Cluster-robust by unit** | Within-country correlation | Many clusters; residuals roughly independent across countries |
| **PCSE (Beck & Katz 1995)** | Contemporaneous *cross-sectional* dependence | TSCS panels, where common shocks hit many countries the same year |
| **Driscoll–Kraay (1998)** | Cross-sectional **+** serial correlation | Moderate-to-large `\(T\)`; safest default |

<table>
 <thead>
  <tr> <th style="text-align:left;"> coefficient </th> <th style="text-align:right;"> cluster </th> <th style="text-align:right;"> pcse </th> <th style="text-align:right;"> driscoll </th> </tr>
 </thead>
 <tbody>
  <tr> <td style="text-align:left;"> reg </td> <td style="text-align:right;"> 0.0888 </td> <td style="text-align:right;"> 0.0821 </td> <td style="text-align:right;"> 0.0628 </td> </tr>
  <tr> <td style="text-align:left;"> edt </td> <td style="text-align:right;"> 0.0175 </td> <td style="text-align:right;"> 0.0163 </td> <td style="text-align:right;"> 0.0231 </td> </tr>
  <tr> <td style="text-align:left;"> reg:edt </td> <td style="text-align:right;"> 0.0186 </td> <td style="text-align:right;"> 0.0175 </td> <td style="text-align:right;"> 0.0081 </td> </tr>
 </tbody>
</table>

PCSE is the TSCS default; Driscoll–Kraay is the safer choice when residual autocorrelation persists after the lagged DV.

---

class: inverse, center, middle

# Part 4: Counterfactual via IRF and LRE

---

# The Forecast Question

> *If a country democratized **permanently**, what would the trajectory of `\(\ln\)` GDP per worker look like over the next 20 years, and how does it depend on the workforce's education level?*

--

We use the **FE-ARDL(1,1)** model: country fixed effects + lagged DV + contemporaneous and lagged `reg`, `edt`, and `reg × edt`. The output is the **impulse response function (IRF)**, the cumulative effect at horizon `\(h\)`, and its asymptote, the **long-run equilibrium effect (LRE)**.

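
The step response and its asymptote can be traced by iterating the ARDL(1,1) recursion directly. The following is a minimal Python sketch, assuming a permanent 0 → 1 switch in `reg` at `\(h = 0\)` with `edt` held fixed. Here `a` stands for the contemporaneous effect of `reg` at that education level and `b` for the effect through its first lag. All coefficient values are hypothetical and are not the lab's estimates; only the two education levels (2.21 and 7.08, the P25 and P75 of `edt`) come from the lab.

```python
def step_response(phi, a, b, horizon):
    """Response of y to a permanent 0 -> 1 switch in reg at h = 0.

    Recursion: r_h = phi * r_{h-1} + a + b * [h >= 1], with r_{-1} = 0.
    The lagged term only kicks in from h = 1, once reg_lag is also 1.
    """
    path, prev = [], 0.0
    for h in range(horizon + 1):
        prev = phi * prev + a + (b if h >= 1 else 0.0)
        path.append(prev)
    return path

def long_run_effect(phi, a, b):
    """Asymptote of the step response; only defined for |phi| < 1."""
    if abs(phi) >= 1:
        raise ValueError("no finite long-run effect when |phi| >= 1")
    return (a + b) / (1.0 - phi)

# Hypothetical coefficients, chosen only to illustrate the mechanics.
phi = 0.9                      # coefficient on the lagged DV
beta0, gamma0 = 0.02, 0.001    # reg and reg x edt, contemporaneous
beta1, gamma1 = -0.01, 0.0005  # reg and reg x edt, first lag

for edt in (2.21, 7.08):       # P25 and P75 education levels
    a = beta0 + gamma0 * edt
    b = beta1 + gamma1 * edt
    path = step_response(phi, a, b, horizon=20)
    lre = long_run_effect(phi, a, b)
    print(f"edt = {edt}: impact = {path[0]:.4f}, h = 20: {path[20]:.4f}, LRE = {lre:.4f}")
```

With `phi` this close to 1, the 20-year response is still visibly short of the LRE, which is why the horizon and the asymptote are worth reporting separately.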
This IRF is the panel analog of Lab 4's IRF, with the moderation by education made explicit.

---

# Closed-Form IRF

For the dynamic FE-ARDL(1,1) model under permanent democratization at `\(t = 0\)`, with `edt` held at level `\(e\)`:

$$ \mathrm{IRF}(h \mid e) \;=\; A(e) \cdot \frac{1 - \phi^{h+1}}{1 - \phi} \;+\; B(e) \cdot \frac{1 - \phi^{h}}{1 - \phi} $$

where

- `\(A(e) = \beta_0 + \gamma_0\,e\)` is the **impact** effect (year of democratization).
- `\(B(e) = \beta_1 + \gamma_1\,e\)` is the **delayed transmission** through `reg_lag` and `reg_lag × edt_lag`.

--

**Long-run equilibrium effect** (asymptote as `\(h \to \infty\)`, `\(|\phi| < 1\)`):

$$ \mathrm{LRE}(e) \;=\; \frac{A(e) + B(e)}{1 - \phi} \;=\; \frac{\beta_0 + \beta_1 + (\gamma_0 + \gamma_1)\,e}{1 - \phi} $$

Parametric simulation (King–Tomz–Wittenberg): draw `\(S = 1{,}000\)` coefficient vectors from the estimated multivariate normal (MVN) sampling distribution, evaluate the IRF at each draw and horizon, and take quantile bands.

---

# IRF: Permanent Democratization at Two Education Levels

<img src="Lab5_slides_files/figure-html/p4-irf-plot-1.svg" alt="" width="792" style="display: block; margin: auto;" />

---

# Long-Run Equilibrium Effect (LRE)

<table>
 <caption>LRE: long-run effect of permanent democratization, FE-ARDL(1,1).</caption>
 <thead>
  <tr> <th style="text-align:left;"> scenario </th> <th style="text-align:right;"> pe </th> <th style="text-align:right;"> lo </th> <th style="text-align:right;"> hi </th> </tr>
 </thead>
 <tbody>
  <tr> <td style="text-align:left;"> Low education (edt = 2.21, P25) </td> <td style="text-align:right;"> -0.222 </td> <td style="text-align:right;"> -0.556 </td> <td style="text-align:right;"> 0.083 </td> </tr>
  <tr> <td style="text-align:left;"> High education (edt = 7.08, P75) </td> <td style="text-align:right;"> -0.052 </td> <td style="text-align:right;"> -0.370 </td> <td style="text-align:right;"> 0.269 </td> </tr>
 </tbody>
</table>

--

**How to read it.** A larger LRE for the high-`edt` scenario says democratization's long-run payoff rises with the workforce's education: a within-country moderation by human capital. Here the point estimate is less negative at P75 (-0.052) than at P25 (-0.222), but both intervals include zero, so the difference is suggestive at best. The intervals are wide because the LRE divides by `\(1-\hat\phi\)`: with `\(\hat\phi\)` close to 1, small changes in `\(\hat\phi\)`, including Nickell bias, move the LRE a lot.

---

class: inverse, center, middle

# Wrap-Up

---

# What We Learned Today

1. **Within vs between is the substantive question**, not a methodological footnote. Mundlak / REWB returns both; Bell & Jones (2015) argue for making it the default.
2. **Test before you estimate**. The panel unit-root *constellation* (CD + Fisher-MW + IPS + Hadri + CIPS) gave us a defensible I(1)-like verdict where no single test was decisive at `\(T \le 30\)`.
3. **Path A (ARDL) for FE / TWFE; Path B (FD) for pooled OLS**. The level-AR(1) `\(\hat\phi \approx 0.99\)` in pooled OLS is the I(1) signature.
4. **No LDV in RE / Mundlak**. corARMA(1) is the natural dynamic version: it absorbs persistence in the residuals without breaking the within-between split (REWB) or running into the Wooldridge initial-conditions problem (RE).
5. **The IRF + LRE generalizes Lab 4's forecast to panels**: closed-form propagation, parametric simulation, moderator made visible at the horizon scale.

--

**Next week**: dynamic-panel GMM (Arellano–Bond, Arellano–Bover) to address the Nickell bias.

---

# References

- Beck, N., & Katz, J. N. (1995). What to do (and not to do) with TSCS data. *APSR*.
- Beck, N., & Katz, J. N. (2011). Modeling dynamics in TSCS political-economy data. *Annu. Rev. Polit. Sci.*
- Bell, A., & Jones, K. (2015). Explaining fixed effects: Random effects modeling of TSCS and panel data. *PSRM*.
- Driscoll, J. C., & Kraay, A. C. (1998). Consistent covariance matrix estimation with spatially dependent panel data. *Rev. Econ. Stat.*
- Hadri, K. (2000). Testing for stationarity in heterogeneous panel data. *EJ*.
- Hlouskova, J., & Wagner, M. (2006). The performance of panel unit root and stationarity tests. *Econ. Rev.*
- Im, K. S., Pesaran, M. H., & Shin, Y. (2003). Testing for unit roots in heterogeneous panels. *J. Econometrics*.
- King, G., Tomz, M., & Wittenberg, J. (2000). Making the most of statistical analyses. *AJPS*.
- Kropko, J., & Kubinec, R. (2020). Interpretation and identification of within-unit and cross-sectional variation. *PLOS ONE*.
- Maddala, G. S., & Wu, S. (1999). A comparative study of unit root tests with panel data. *OBES*.
- Mundlak, Y. (1978). On the pooling of time series and cross section data. *Econometrica*.
- Nickell, S. (1981). Biases in dynamic models with fixed effects. *Econometrica*.
- Pesaran, M. H. (2007). A simple panel unit root test in the presence of cross-section dependence. *J. Appl. Econ.*

---

class: inverse, center, middle

# Questions?

`rllobet@uw.edu`