Disclaimer

These materials have been revised and updated by successive teaching assistants over recent years.

Revision history:

Packages and Functions Used in This Lab

Package Purpose Key functions
plm Panel models, panel unit-root tests, panel SEs plm(), pdata.frame(), pvar(), purtest(), vcovHC(), vcovBK(), vcovSCC(), phtest()
fixest Fast multi-way fixed effects, native cluster SEs feols(), fixef()
nlme Mixed models with structured error correlations lme(), corARMA()
lme4 Mixed models (no AR error structure) lmer()
parameters Within/between decomposition for Mundlak demean()
sandwich Robust and clustered covariance matrices vcovCL(), vcovHC()
lmtest Coefficient tests on adjusted SEs coeftest()
tseries Single-series unit-root tests adf.test(), pp.test()
MASS Multivariate normal draws for parametric simulation mvrnorm()
simcf King-Tomz-Wittenberg counterfactual simulators cfMake(), cfChange(), ldvsimev(), ldvsimpv()
knitr Include the compiled TikZ DAG image include_graphics()
texreg Cross-format regression tables htmlreg(), texreg(), screenreg()
tidyverse Data manipulation and plotting ggplot(), tibble(), pipes (|>)

Today’s roadmap

Part Topic
1 Panel data: theory, the Przeworski democracy panel, and a dynamics workflow ending in a panel unit-root constellation
2 A family of panel specifications (Pooled / FE / TWFE / RE / Mundlak-REWB) and their within-vs-between identification
3 Counterfactual prediction under dynamic fixed effects: one-country forecast and average marginal expectations

Lab 4 introduced counterfactual forecasting for a single time series. Lab 5 moves the same dynamic logic into panel data. The pedagogical thread, given the short-\(T\) ambiguity in §1.3.4, is interpretation before estimation: identify what variation each model uses, read whether the substantive conclusion is stable, and use fixed-effects counterfactuals only for quantities those models can actually support.


Part 1 — A panel of countries: theory, data, dynamics

1.1 A toy theory of politics in panel data

Before any estimator, write the theory. We study the link between regime type (reg, 1 = democracy) and economic growth (ln_gdpw) across countries. The simplified DAG treats oil exporter status (oil) as a time-invariant confounder and education (edt) as a slow-moving mediator on one pathway from democracy to growth.

Three covariate roles:

  1. Within-varying treatment: reg — flips on coups, pacts, or constitutional transitions; supplies the within-unit identifying variation.
  2. Mediator / slow-moving covariate: edt — democracy can expand education, and education can raise growth levels. Conditioning on edt therefore moves us from a total regime-growth association toward a more direct association.
  3. Time-invariant confounder: oil — binary indicator for major oil exporters; oil can shape both regime type and GDP levels. It is absorbed by FE/TWFE and estimable in Pooled / RE / Mundlak.
Simplified DAG. Oil confounds democracy and growth; education mediates part of democracy's association with growth.

Simplified DAG. Oil confounds democracy and growth; education mediates part of democracy’s association with growth.

What the panel structure buys us:

  • Pooled OLS mixes within and between variation. Biased if unobserved country-level confounders (geography, colonial institutions, deep history) are correlated with reg or edt. Pooling conflates within-country dynamics with between-country differences.
  • Fixed effects sweep out all time-invariant unit-level confounders, including unobserved ones. Time-invariant oil is absorbed. If we control for edt, we are conditioning on a mediator, so the reg coefficient should be read as closer to a direct association than a total association.
  • Random effects assume the country random intercept is uncorrelated with the regressors — suspect when colonial institutions, deep development history, or initial endowments plausibly correlate with both reg and edt.
  • Mundlak / REWB keeps RE’s flexibility (we can estimate the time-invariant oil) but tests RE’s assumption by splitting every time-varying regressor — including the interaction reg × edt — into within and between components. If the between components add nothing, RE is fine; if they do, the within components carry the FE-style identification.

Empirical foreshadow. We will fit \(\ln \text{gdpw} = \alpha + \beta_1\,\text{reg} + \beta_2\,\text{edt} + \beta_3\,(\text{reg}\!\times\!\text{edt}) + \beta_4\,\text{oil} + \dots\) in pooled OLS and then compare the interaction \(\beta_3\) to its FE counterpart in §2.7–§2.8. The DAG motivates edt as a mediator; the interaction adds a descriptive heterogeneity question: does the regime-growth association differ at different education levels, and does that moderation survive once we strip out cross-country variation?

1.2 The Przeworski democracy panel

# load data
democracy <- read_csv("data/democracy.csv") |>
  # lower case
  rename_with(tolower) |>
  # transform variables
  mutate(ln_gdpw   = log(gdpw),
         country_f = factor(country),     # numeric ID → factor for plm
         year_f    = factor(year))

n_distinct(democracy$country) # N = 135
## [1] 135
n_distinct(democracy$year)    # T = 40
## [1] 40

The dataset has two unit identifiers: country (numeric ID) and ctyname (the actual country name). The panel is unbalanced: there are 40 unique years in the dataset, but not every country is observed in every year.

panel_T <- democracy |>
  group_by(country, ctyname) |>
  summarise(T_i = sum(!is.na(ln_gdpw)), .groups = "drop")

range(panel_T$T_i)
## [1]  6 40
ggplot(panel_T, aes(T_i)) +
  geom_histogram(binwidth = 2, fill = "steelblue", alpha = 0.7,
                 boundary = 0) +
  labs(x = "Usable observations per country",
       y = "# countries") +
  theme_minimal(base_size = 11)
Distribution of usable time periods per country. The panel contains 40 unique years, but many countries have shorter histories.

Distribution of usable time periods per country. The panel contains 40 unique years, but many countries have shorter histories.

For the simplified specification, we will use four variables:

  • ln_gdpw is the outcome: log real GDP per worker.
  • reg is the treatment: 1 for democracy and 0 for dictatorship in this file’s coding.
  • edt is the slow-moving covariate and the moderator: cumulative average years of labor-force education. The interaction reg × edt lets the regime effect vary with the workforce’s education level.
  • oil is a time-invariant control: 1 if the country was a major oil exporter (per the Przeworski codebook), 0 otherwise.

A quick sanity check that oil, britcol, elf60 really are unit-invariant in the data, and reg/edt really do move within units:

pdata <- pdata.frame(democracy, index = c("country_f", "year_f"))
pvar(pdata)
## no time variation:       country ctyname region britcol cath elf60 moslem newc oil country_f 
## no individual variation: year edt elf60 year_f 
## all NA in time dimension for at least one individual:  civlib edt elf60 pollib 
## all NA in ind. dimension for at least one time period: civlib edt elf60 pollib

The pvar() summary confirms what the DAG drew: the time-invariants vary only between, and reg/edt vary both ways.

1.3 A systematic dynamics workflow

We need to settle the dynamics before fitting any FE/RE/Mundlak model — otherwise we are just running OLS on a non-stationary outcome and inviting spurious results.

1.3.1 Look at the series

ggplot(democracy, aes(year, ln_gdpw, group = country, colour = region)) +
  geom_line(alpha = 0.5) +
  scale_colour_brewer(palette = "Dark2") +
  labs(x = NULL, y = "log GDP per worker", colour = NULL) +
  theme_minimal(base_size = 11) +
  theme(legend.position = "bottom")
Per-country trajectories of log GDP per worker, coloured by region. Most series trend upward with country-specific intercepts and slopes — non-stationary in levels.

Per-country trajectories of log GDP per worker, coloured by region. Most series trend upward with country-specific intercepts and slopes — non-stationary in levels.

Most trajectories trend up; some are flat or volatile. This is the classic mix you will see in macro-panels and a strong prior toward unit roots in levels.

1.3.2 Detrend each panel and export unit-level diagnostics

The level series are visibly trended, so raw ACF/PACF plots mostly rediscover the trend. A cleaner diagnostic is to remove a country-specific linear trend first, then inspect the residual serial dependence.

The loop below does two things. First, it saves the de-trended series in democracy_detrended. Second, it exports a small diagnostic archive:

output/dynamics/detrended_series/
output/dynamics/acf/
output/dynamics/pacf/
# create subdirectories (if these are not already there)
dir.create("output/dynamics/detrended_series",
           recursive = TRUE, showWarnings = FALSE)
dir.create("output/dynamics/acf",
           recursive = TRUE, showWarnings = FALSE)
dir.create("output/dynamics/pacf",
           recursive = TRUE, showWarnings = FALSE)

# generate vector of unique countries
countries <- sort(unique(democracy$ctyname))
max_lag <- 8  # number of lags for cycles

# generate objects to store
democracy_detrended <- tibble()
acf_all <- tibble()
pacf_all <- tibble()

for (c0 in countries) {
  
  # 1. filter country candidate
  df_c <- democracy |>
    filter(ctyname == c0) |>
    arrange(year)
  
  # 2. Detrend
  trend_fit <- lm(ln_gdpw ~ year, data = df_c, na.action = na.exclude)
  df_c$ln_gdpw_trend <- fitted(trend_fit)
  df_c$ln_gdpw_detrended <- residuals(trend_fit)

  democracy_detrended <- bind_rows(democracy_detrended, df_c)
  
  # remove NAs in the trend (if any)
  df_c <- df_c |>
    filter(!is.na(ln_gdpw_detrended))
  
  # clean the strings, to name the files
  file_stub <- c0 |>
    str_to_lower() |>
    str_replace_all("[^a-z0-9]+", "_") |>
    str_replace_all("^_|_$", "")
  
  # 3. plot the series
  p_series <- 
    ggplot(df_c, aes(year, ln_gdpw_detrended)) +
    geom_line(colour = "grey30") +
    geom_hline(yintercept = 0, linetype = "dashed") +
    labs(title = c0,
         x = NULL,
         y = "De-trended log GDP per worker") +
    theme_minimal(base_size = 11)

  # export for inspection
  ggsave(file.path("output/dynamics/detrended_series",
                   paste0(file_stub, ".pdf")),
         plot = p_series, width = 7, height = 4)

  # 4. plot ACF
  p_acf <- ggAcf(df_c$ln_gdpw_detrended, lag.max = max_lag) +
    labs(title = paste0(c0, ": ACF after de-trending"),
         x = "Lag", y = "ACF") +
    theme_minimal(base_size = 11)

  # export for inspection
  ggsave(file.path("output/dynamics/acf",
                   paste0(file_stub, ".pdf")),
         plot = p_acf, width = 7, height = 4)
  
  # 5. plot PACF
  p_pacf <- ggPacf(df_c$ln_gdpw_detrended, lag.max = max_lag) +
    labs(title = paste0(c0, ": PACF after de-trending"),
         x = "Lag", y = "PACF") +
    theme_minimal(base_size = 11)

  # export for inspection
  ggsave(file.path("output/dynamics/pacf",
                   paste0(file_stub, ".pdf")),
         plot = p_pacf, width = 7, height = 4)

  # store all acf and pacf values
  acf_values <- acf(df_c$ln_gdpw_detrended,
                    lag.max = max_lag, plot = FALSE)$acf[-1]
  pacf_values <- pacf(df_c$ln_gdpw_detrended,
                      lag.max = max_lag, plot = FALSE)$acf

  acf_all <- bind_rows(
    acf_all,
    tibble(ctyname = c0,
           lag = seq_along(acf_values),
           value = as.numeric(acf_values))
  )

  pacf_all <- bind_rows(
    pacf_all,
    tibble(ctyname = c0,
           lag = seq_along(pacf_values),
           value = as.numeric(pacf_values))
  )
}

# You can look at particular subsets
democracy_detrended |>
  filter(ctyname %in% c("USA", "Argentina", "Mexico", "Egypt")) |>
  ggplot(aes(year, ln_gdpw_detrended)) +
  geom_line(colour = "grey30") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  facet_wrap(~ ctyname, scales = "free_y") +
  labs(x = NULL, y = "De-trended log GDP per worker") +
  theme_minimal(base_size = 11)
Illustrative de-trended log GDP per worker series. Each line is the residual from a country-specific linear trend.

Illustrative de-trended log GDP per worker series. Each line is the residual from a country-specific linear trend.

The exported files are not meant to be read one by one during lab. They are a reproducible diagnostic archive. In applied work, the pooled summary is useful, but individual units still need inspection because a few unusual panels can drive the average.

1.3.3 Pooled ACF/PACF and per-series unit-root tests

We can summarize the ACF/PACF diagnostics across countries by averaging the country-level correlations at each lag. This is descriptive rather than a formal estimator, but it gives a compact view of the typical short-run dynamics after de-trending.

To help read the figure, we add the usual approximate white-noise bounds:

\[ \pm \frac{1.96}{\sqrt{T}}. \]

The logic is that, if a series is white noise, the sample autocorrelation at any fixed lag is approximately normally distributed around zero with standard error \(1/\sqrt{T}\). Multiplying by 1.96 gives the familiar approximate 95% threshold. Here we use the average usable country-level time-series length after de-trending, so the bounds are a visual rule of thumb rather than a formal pooled significance test.

acf_summary <- acf_all |>
  group_by(lag) |>
  summarise(mean = mean(value, na.rm = TRUE),
            lo = quantile(value, 0.25, na.rm = TRUE),
            hi = quantile(value, 0.75, na.rm = TRUE),
            .groups = "drop")

pacf_summary <- pacf_all |>
  group_by(lag) |>
  summarise(mean = mean(value, na.rm = TRUE),
            lo = quantile(value, 0.25, na.rm = TRUE),
            hi = quantile(value, 0.75, na.rm = TRUE),
            .groups = "drop")

avg_T <- democracy_detrended |>
  filter(!is.na(ln_gdpw_detrended)) |>
  count(ctyname) |>
  summarise(avg_T = mean(n)) |>
  pull(avg_T)

wn_bound <- 1.96 / sqrt(avg_T)

p_acf_pool <- ggplot(acf_summary, aes(lag, mean)) +
  geom_hline(yintercept = 0, colour = "grey70") +
  geom_hline(yintercept = c(-wn_bound, wn_bound),
             linetype = "dashed", colour = "grey45") +
  geom_linerange(aes(ymin = lo, ymax = hi), colour = "grey55") +
  geom_point(colour = "steelblue", size = 2) +
  geom_line(colour = "steelblue") +
  scale_x_continuous(breaks = 1:max_lag) +
  labs(title = "Average ACF",
       subtitle = "Dashed lines: approximate white-noise bounds",
       x = "Lag", y = "Correlation") +
  theme_minimal(base_size = 11)

p_pacf_pool <- ggplot(pacf_summary, aes(lag, mean)) +
  geom_hline(yintercept = 0, colour = "grey70") +
  geom_hline(yintercept = c(-wn_bound, wn_bound),
             linetype = "dashed", colour = "grey45") +
  geom_linerange(aes(ymin = lo, ymax = hi), colour = "grey55") +
  geom_point(colour = "firebrick", size = 2) +
  geom_line(colour = "firebrick") +
  scale_x_continuous(breaks = 1:max_lag) +
  labs(title = "Average PACF",
       subtitle = "Dashed lines: approximate white-noise bounds",
       x = "Lag", y = "Partial correlation") +
  theme_minimal(base_size = 11)

p_acf_pool + p_pacf_pool
Average ACF and PACF after removing country-specific linear trends. Points show the mean across countries; vertical bars show the interquartile range.

Average ACF and PACF after removing country-specific linear trends. Points show the mean across countries; vertical bars show the interquartile range.

Finally, we loop over countries and run ADF / PP tests on the de-trended series. In the tseries implementations we use here, both tests have the same null:

Test Null hypothesis Default alternative Reading a large p-value
ADF (adf.test) unit root / I(1) stationary / I(0) fail to reject unit root
PP (pp.test) unit root / I(1) stationary / I(0) fail to reject unit root

The difference is not the null hypothesis. The difference is how the tests handle serial correlation and heteroskedasticity in the errors: ADF adds lagged differences, while PP uses a nonparametric correction to the Dickey-Fuller statistic. Both are power-sensitive. With short country series, a highly persistent but stationary process can easily look too close to a unit root, so failure to reject should be read as no strong evidence against a unit root, not proof that the unit root is true.

The setup below is still cleaner than testing visibly trended levels without first removing the deterministic trend.

# generate object to store test p-values
unit_root_tests <- tibble()

for (c0 in countries) {
  # 1. subset each country time series
  y <- democracy_detrended |>
    filter(ctyname == c0) |>
    arrange(year) |>
    pull(ln_gdpw_detrended) |>
    na.omit()

  # 2. test ADF and PP
  adf_p <- adf.test(y)$p.value
  pp_p  <- pp.test(y)$p.value

  # 3. save results
  unit_root_tests <- bind_rows(
    unit_root_tests,
    tibble(ctyname = c0,
           adf = adf_p,
           pp = pp_p)
  )
}

# Visualize in a histogram the distribution of -pvalues
unit_root_tests |>
  pivot_longer(c(adf, pp),
               names_to = "test",
               values_to = "p") |>
  ggplot(aes(p, fill = test)) +
  geom_histogram(bins = 25,
                 alpha = 0.65,
                 position = "identity") +
  geom_vline(xintercept = 0.05, linetype = "dashed") +
  facet_wrap(~ test, ncol = 2) +
  labs(title = "De-trended log GDP per worker",
       x = "p-value", y = "# countries") +
  theme_minimal(base_size = 10) +
  theme(legend.position = "none")
ADF and PP p-values after country-specific linear de-trending.

ADF and PP p-values after country-specific linear de-trending.

Most countries still have ADF / PP p-values above 0.05 after removing country-specific linear trends. Because both tests have a unit-root null, this means most country series do not provide strong evidence against unit-root-like persistence. Given the short time dimension, the result is also consistent with near-unit-root stationary dynamics.

1.3.4 Panel unit-root tests: read several tests together

The de-trended ACF/PACF plots in §1.3.2 and the per-series ADF/PP scan in §1.3.3 are descriptive: each country’s deterministic trend was removed by OLS before the test, so the residuals were clean by construction. They tell us about residual short-run dependence — not whether the level series is itself I(1) or trend-stationary.

The formal panel tests below answer that harder question: are the level series better described as a random walk (with drift), or as orbiting a deterministic trend? We feed purtest the level series of ln_gdpw and let it handle the trend internally (exo = "trend").

The two competing models. With exo = "trend", the test pits these two models against each other:

  • \(H_0\)unit root with drift: increments are stationary with mean \(\delta_i\), but the level wanders indefinitely. \[y_{it} = y_{i,t-1} + \delta_i + \varepsilon_{it} \quad\Longleftrightarrow\quad \Delta y_{it} = \delta_i + \varepsilon_{it}\]

  • \(H_1\)trend-stationary: the level orbits around a country-specific linear trend; deviations are mean-reverting. \[y_{it} = \mu_i + \delta_i\,t + u_{it}, \qquad u_{it}\text{ stationary I(0)}\]

In \(H_0\), \(\delta_i\) is the per-period drift (the average step the random walk takes each year). In \(H_1\), \(\mu_i\) is the country intercept and \(\delta_i\) is the slope of the deterministic linear trend. The two models can both produce upward-sloping trajectories — the difference is whether shocks fade away (\(H_1\)) or accumulate forever (\(H_0\)). This is the same TS-vs-I(1)-with-drift distinction we drew in Lab 4 §1.2: removing a deterministic trend by OLS does not adjudicate between the two; the unit-root test with trend does.

Why we read a battery of tests, not just one. The panel is unbalanced, macro shocks (oil crises, financial cycles, the Volcker disinflation) hit many countries in the same year, and individual unit-root tests have low power at \(T_i \le 30\). No single test is decisive — we run four tests with different strengths and read them together. Two recurring ideas to keep in mind:

  • Cross-sectional dependence (CSD). If countries’ residuals move together because of common shocks, the standard panel tests are biased. We need a pretest for CSD and at least one test that is robust to it.
  • Direction of the null. Most tests have a unit-root null (rejection \(\Rightarrow\) stationary). Hadri flips it (rejection \(\Rightarrow\) unit root). Comparing both directions is a useful sanity check.

The four tests, in plain language:

Step Test Null hypothesis Plain-language description
(a) Pesaran CD pretest Cross-sectional independence Looks at residuals across countries to see whether they move together. Rejection means common shocks contaminate the residuals — the tests in (b) become biased and we lean more on (d).
(b) Maddala-Wu/Fisher & IPS Unit root in every panel Run an ADF on each country, then combine. Fisher pools unit-level \(p\)-values via \(-2\sum\log p_i\); IPS averages unit-level \(t\)-statistics. Both assume countries are independent — work well when (a) does not reject.
(c) Hadri Stationarity in every panel Flips the null. KPSS-style: tests whether all panels are stationary against the alternative of any unit root. Needs a balanced rectangle. Hlouskova & Wagner (2006) document that Hadri over-rejects under CSD — its rejection deserves less weight when (a) flags dependence.
(d) Pesaran CIPS Unit root, allowing common shocks A unit-root test built to be robust to the CSD detected in (a). Augments each country’s ADF with cross-sectional means of \(y\) and \(\Delta y\) — those means absorb the common factor before the country-level test runs.

We keep countries with at least 20 usable observations for tests (a), (b), (d). For Hadri (c) we need a balanced rectangle — every retained country observed in every year of a fixed window — so we take the last 20 years of the panel and keep countries with full coverage there.

# ---- (i) Build the two analysis panels ---------------------------------------
# Unbalanced panel for steps (a), (b), (d): all countries with T_i >= 20.
db_levels <- democracy |>
  filter(!is.na(ln_gdpw)) |>
  group_by(country) |>
  filter(n() >= 20, sd(ln_gdpw) > 0) |>
  ungroup()

# Balanced rectangle for Hadri (step c): last 20 years of the panel,
# keeping only countries observed in every one of those years.
T_target     <- 20
window_start <- max(democracy$year) - T_target + 1
window_end   <- max(democracy$year)

db_levels_balanced <- democracy |>
  filter(year >= window_start, year <= window_end, !is.na(ln_gdpw)) |>
  group_by(country) |>
  filter(n() == T_target, sd(ln_gdpw) > 0) |>
  ungroup()

pdata_levels <- pdata.frame(db_levels, index = c("country", "year"))

# Quote eligibility so students see how much of the panel is being tested.
sprintf("Unbalanced panel for (a)/(b)/(d): %d countries, %d country-years.",
        n_distinct(db_levels$country), nrow(db_levels))
## [1] "Unbalanced panel for (a)/(b)/(d): 116 countries, 3897 country-years."
sprintf("Balanced subset for Hadri (c): %d countries x %d years (%d-%d).",
        n_distinct(db_levels_balanced$country),
        T_target, window_start, window_end)
## [1] "Balanced subset for Hadri (c): 99 countries x 20 years (1971-1990)."
# ---- (a) Pesaran CD: is there cross-sectional dependence? --------------------
# Tests CSD in residuals of a within model that absorbs country FE and a global
# linear time trend. Rejection says common shocks beyond the global trend
# contaminate the residuals — first-gen tests are then biased. (`pcdtest` with
# a within model needs at least one regressor; `year` plays that role.)
cd_levels <- pcdtest(ln_gdpw ~ year, data = pdata_levels,
                     model = "within", test = "cd")

# ---- (b) First-generation tests: H0 = unit root in all panels ----------------
# `exo = "trend"` includes constant + linear trend in each unit's
# Dickey-Fuller regression, so H0 = I(1)-with-drift, H1 = trend-stationary.
# Lag length picked by SIC up to pmax = 4.
mw_levels <- purtest(ln_gdpw ~ trend,
                     data = db_levels,
                     index = c("country", "year"),
                     test = "madwu",
                     lags = "SIC", pmax = 4,
                     exo = "trend")

ips_levels <- purtest(ln_gdpw ~ trend,
                      data = db_levels,
                      index = c("country", "year"),
                      test = "ips",
                      lags = "SIC", pmax = 4,
                      exo = "trend")

# ---- (c) Hadri: H0 = stationarity in all panels (KPSS-style) -----------------
# Hadri uses a long-run-variance correction (not ADF lags), so we leave the
# bandwidth at its default and just signal `exo = "trend"` for detrending.
hadri_levels <- purtest(ln_gdpw ~ trend,
                        data = db_levels_balanced,
                        index = c("country", "year"),
                        test = "hadri",
                        exo = "trend")

# ---- (d) Pesaran CIPS: second-generation, robust to CSD ----------------------
# Augments each unit's ADF regression with cross-sectional means of y and
# Delta y to absorb the common factor. `type = "trend"` includes a trend.
cips_levels <- cipstest(pdata_levels$ln_gdpw,
                        lags = 2,
                        type = "trend")

# ---- Tabulate the constellation. The `direction` column makes explicit ------
# what a rejection means: rejecting H0 in (b)/(d) favors I(0); rejecting H0
# in (c) — Hadri — favors I(1). High p-values mean the opposite.
panel_tests <- tibble(
  step = c("(a)", "(b)", "(b)", "(c)", "(d)"),
  test = c("Pesaran CD", "Maddala-Wu/Fisher", "IPS", "Hadri", "Pesaran CIPS"),
  null = c("Cross-sectional independence",
           "Unit root in all panels",
           "Unit root in all panels",
           "Stationarity in all panels",
           "Unit root with common-factor correction"),
  direction = c("CSD pretest",
                "Reject => favors I(0)",
                "Reject => favors I(0)",
                "Reject => favors I(1)",
                "Reject => favors I(0)"),
  p_value = c(cd_levels$p.value,
              mw_levels$statistic$p.value,
              ips_levels$statistic$p.value,
              hadri_levels$statistic$p.value,
              cips_levels$p.value)
) |>
  mutate(decision = if_else(p_value < 0.05, "Reject null", "Do not reject null"))

panel_tests |>
  mutate(p_value = scales::pvalue(p_value)) |>
  kable(caption = "Panel diagnostic constellation for log GDP per worker in levels.")
Panel diagnostic constellation for log GDP per worker in levels.
step test null direction p_value decision
(a) Pesaran CD Cross-sectional independence CSD pretest <0.001 Reject null
(b) Maddala-Wu/Fisher Unit root in all panels Reject => favors I(0) 0.040 Reject null
(b) IPS Unit root in all panels Reject => favors I(0) >0.999 Do not reject null
(c) Hadri Stationarity in all panels Reject => favors I(1) <0.001 Reject null
(d) Pesaran CIPS Unit root with common-factor correction Reject => favors I(0) 0.100 Do not reject null

Reading the four tests together. Step (a) tells us whether the standard panel tests in (b) are reliable: if it rejects, common shocks are contaminating the residuals and we should weight (d) — designed for this case — more heavily.

Within step (b), Maddala-Wu/Fisher and IPS combine the same per-country ADFs in different ways. A split between them is informative, not a contradiction: Fisher (Choi 2001) sums \(-2\log p_i\), so a few countries with very small \(p\)-values can drag the statistic across the threshold; IPS averages normalized \(t\)-statistics, which is more robust to outliers. So when Fisher rejects but IPS does not, the rejection comes from a small number of fast-mean-reverting countries (small open economies, post-shock recoveries) — not from panel-wide stationarity.

Hadri (c) flips the null toward stationarity, and under CSD it tends to over-reject — so its rejection here gets discounted.

The test designed for our regime — CIPS (d), robust to the CSD found in (a) — fails to reject the unit-root null. Combining everything: the honest diagnosis is strong persistence consistent with I(1)-type behavior in levels, with the standard caveat that no test fully discriminates near-unit-root stationarity from a true unit root at \(T \le 30\).

1.3.5 If we had chosen Path B: first differences

If we chose to treat the outcome as difference-stationary, the natural diagnostic is to repeat §1.3.3’s per-country ADF/PP scan, but on the first-differenced series. This is not the path we take below, but it makes the tradeoff explicit: differencing should remove most of the persistence, at the cost of changing the estimand from level effects to growth-rate effects.

# Per-country ADF and PP on the first-differenced series.
# Mirrors the §1.3.3 scan, but on Δ ln_gdpw instead of detrended ln_gdpw.
fd_unit_root_tests <- tibble()

for (c0 in countries) {
  # 1. compute first differences for this country
  y_diff <- democracy |>
    filter(ctyname == c0) |>
    arrange(year) |>
    mutate(d_ln_gdpw = ln_gdpw - lag(ln_gdpw)) |>
    pull(d_ln_gdpw) |>
    na.omit()

  if (length(y_diff) < 8) next   # skip very short series

  # 2. run ADF and PP on the differences
  adf_p <- adf.test(y_diff)$p.value
  pp_p  <- pp.test(y_diff)$p.value

  # 3. save results
  fd_unit_root_tests <- bind_rows(
    fd_unit_root_tests,
    tibble(ctyname = c0, adf = adf_p, pp = pp_p)
  )
}

fd_unit_root_tests |>
  pivot_longer(c(adf, pp), names_to = "test", values_to = "p") |>
  ggplot(aes(p, fill = test)) +
  geom_histogram(bins = 25, alpha = 0.65, position = "identity") +
  geom_vline(xintercept = 0.05, linetype = "dashed") +
  facet_wrap(~ test, ncol = 2) +
  labs(title = "First-differenced log GDP per worker",
       subtitle = "Per-country ADF and PP p-values; dashed line marks p = 0.05",
       x = "p-value", y = "# countries") +
  theme_minimal(base_size = 10) +
  theme(legend.position = "none")
ADF and PP p-values on first-differenced log GDP per worker, computed country by country.

ADF and PP p-values on first-differenced log GDP per worker, computed country by country.

Compared with §1.3.3, the bulk of the country-level \(p\)-values now sits below 0.05: differencing flips most countries from “cannot reject unit root” to “rejects unit root in favor of stationarity.” That is the picture we expected if ln_gdpw is I(1).

The pooled Maddala-Wu/Fisher confirms the picture at the panel level:

db_diff <- db_levels |>
  group_by(country) |>
  mutate(d_ln_gdpw = ln_gdpw - lag(ln_gdpw)) |>
  ungroup() |>
  drop_na(d_ln_gdpw)

mw_diff <- purtest(d_ln_gdpw ~ 1,
                   data = db_diff,
                   index = c("country", "year"),
                   test = "madwu",
                   lags = "SIC",
                   pmax = 4,
                   exo = "intercept")

mw_diff_unit_p <- imap_dfr(
  mw_diff$idres,
  ~ tibble(country = as.numeric(.y),
           p = as.numeric(.x$p.trho),
           lags = .x$lags,
           T = .x$T)
)

mw_diff_readout <- tibble(
  test = "Maddala-Wu / Fisher",
  statistic = as.numeric(mw_diff$statistic$statistic),
  df = as.numeric(mw_diff$statistic$parameter),
  p_value = as.numeric(mw_diff$statistic$p.value)
)

mw_diff_readout |>
  mutate(p_value = scales::pvalue(p_value)) |>
  kable(caption = "Path B diagnostic: Maddala-Wu/Fisher test on first-differenced log GDP per worker.")
Path B diagnostic: Maddala-Wu/Fisher test on first-differenced log GDP per worker.
test statistic df p_value
Maddala-Wu / Fisher 2548.104 232 <0.001
ggplot(mw_diff_unit_p, aes(p)) +
  geom_histogram(aes(y = after_stat(density)),
                 bins = 25,
                 fill = "steelblue",
                 alpha = 0.65,
                 colour = "white") +
  geom_density(colour = "firebrick", linewidth = 0.8) +
  geom_vline(xintercept = 0.05, linetype = "dashed") +
  labs(title = "Path B diagnostic: unit-level p-values after differencing",
       subtitle = "First-differenced log GDP per worker; dashed line marks p = 0.05",
       x = "Unit-level ADF p-value", y = "Density") +
  theme_minimal(base_size = 11)

The first-difference diagnostic is much cleaner. That is exactly why differencing is tempting. The cost is substantive: the outcome becomes growth in log GDP per worker, not the level of log GDP per worker.

1.3.6 Distribution of unit-level AR(1) coefficients on the differences

ar1_one <- function(y) {
  y <- na.omit(y)
  if (length(y) < 10) return(NA_real_)
  unname(coef(arima(y, order = c(1, 0, 0)))[1])
}
phi_diff <- democracy |>
  group_by(country) |>
  arrange(year) |>
  summarise(phi = ar1_one(ln_gdpw - lag(ln_gdpw)), .groups = "drop")

ggplot(phi_diff, aes(phi)) +
  geom_histogram(bins = 25, fill = "steelblue", alpha = 0.7) +
  geom_vline(xintercept = mean(phi_diff$phi, na.rm = TRUE),
             colour = "firebrick", linetype = "dashed") +
  labs(x = expression(hat(phi)[i]), y = "# countries",
       subtitle = "Mean across countries marked in red") +
  theme_minimal(base_size = 11)
Estimated AR(1) coefficients on first-differenced log GDP per worker, fit one country at a time. A descriptive summary of short-run persistence after differencing — read as the AR(1) part of an ARIMA(1,1,0) only if we accept the I(1) verdict from §1.3.4.

Estimated AR(1) coefficients on first-differenced log GDP per worker, fit one country at a time. A descriptive summary of short-run persistence after differencing — read as the AR(1) part of an ARIMA(1,1,0) only if we accept the I(1) verdict from §1.3.4.

1.3.7 The verdict — and a methodological choice

Two specifications carry the dynamics:

Path Outcome Dynamics absorbed by Trade-offs
A \(\ln\)-level Lagged DV in levels (ARDL(1)) Coefficients are interpretable as level effects; Beck & Katz (2011) recommend this as the TSCS default; simcf::ldvsimev propagates it natively. Nickell bias in dynamic FE for short \(T\) (Lab 6 will fix).
B \(\Delta\ln\) Lagged differences Outcome becomes a growth rate; Mundlak’s within/between split is awkward on differences; forecasts must be un-differenced.

We take Path A for the rest of the lab. This is a defensible modeling choice, not a triumphant test verdict — and that is exactly why Part 2 below estimates a family of specifications (pooled, FE, RE, Mundlak/REWB) and reads whether the substantive conclusion is robust across them, rather than picking one estimator and defending it. We revisit the Nickell-bias correction next week with Arellano–Bond and dynamic panel GMM.


Part 2 — Estimating panel models

§1.3.4 closed in honest ambiguity: at \(T \le 30\) with cross-sectional dependence, the data cannot tell us with confidence whether ln_gdpw is I(1) or near-unit-root stationary. The pedagogical response is not to pick one estimator and defend it, but to estimate a family of specifications that identify the regime effect under different assumptions, and read whether the conclusion is robust across them.

2.0 Specifications and what they identify

Each estimator below is defined by which variance it uses. Following Kropko & Kubinec (2020) and Bell & Jones (2015), every time-varying regressor decomposes into a within and a between component:

\[ x_{it} \;=\; \underbrace{(x_{it} - \bar x_i)}_{x^{W}_{it}\ (\text{within})} \;+\; \underbrace{\bar x_i}_{x^{B}_i\ (\text{between})} \]

Different estimators recover different combinations of \(x^W\) and \(x^B\):

Specification What \(\hat\beta\) identifies Identifying assumption
Pooled OLS mixture of within + between \(u_{it}\perp X_{it}\); common intercept \(\alpha\) for all units
Fixed effects (within) within only (\(\beta_W\)) \(\varepsilon_{it}\perp x_{it}\) given \(\alpha_i\); allows \(\text{Cov}(\alpha_i, X)\neq 0\)
Two-way fixed effects deviations from country and year means removes country levels and common year shocks, but can be hard to interpret outside a clear design
Random effects matrix-weighted average of \(\beta_W\) and \(\beta_B\) \(\alpha_i\perp X_i\) — country intercepts uncorrelated with regressors
Mundlak / REWB \(\beta_W\) and \(\beta_B\) separately RE on residuals; \(\bar x_i\) absorbs cross-level confounding

The right specification depends on the research question:

  • “Does democratization, for a given country, raise GDP?” → within (\(\beta_W\)); use FE or Mundlak’s within block.
  • “Are democracies, on average across countries, richer?” → between (\(\beta_B\)); use Mundlak’s between block.
  • “Both — and do they differ?” → Mundlak / REWB returns both, and \(\beta_W = \beta_B\) is itself testable.

Two pedagogical reminders we will use in §2.7:

  1. Bell & Jones (2015) on REWB. When researchers default to FE “to be safe,” they discard the between variation that often carries the substantive signal. REWB (random effects within-between, equivalent to Mundlak with random intercepts) returns both coefficients without losing the FE-style identification on the within block. The difference \(\beta_W - \beta_B\) is itself the diagnostic.
  2. Kropko & Kubinec (2020) on FE. One-way country FE has a clear interpretation: it uses within-country changes over time. Two-way FE is less transparent: after removing both country and year means, the coefficient describes deviations from both dimensions. That can be exactly right in a difference-in-differences design; otherwise it is easy to report a coefficient whose comparison is hard to state in words.

The five candidates all start from the same equation:

\[ y_{it} \;=\; \alpha + \beta_1\,\text{reg}_{it} + \beta_2\,\text{edt}_{it} + \beta_3\,(\text{reg}_{it}\!\cdot\!\text{edt}_{it}) + \beta_4\,\text{oil}_i + u_{it} \]

§1.3.7 chose Path A: ARDL(1) in levels — a lagged DV \(y_{i,t-1}\) to absorb the I(1)-like persistence. For pedagogical clarity, each subsection below fits the static version and the dynamic version(s) side-by-side, so you can see directly what dynamics buy you. The full menu by subsection:

Subsection Static AR(1) (lagged DV) ARDL(1,1) (+ lagged reg) corARMA(1) (AR(1) errors)
2.1 Pooled OLS
2.2 FE (within)
2.3 TWFE
2.4 RE
2.5 Mundlak / REWB

Label note. “AR(1)” in this lab means an AR(1) on the outcome — i.e., a lagged dependent variable in the regression. corARMA(1) is an AR(1) on the errors. The two are observationally close but interpret differently — the §2.4 RE block walks through both side by side.

Two methodological notes built into this menu:

  • Why no lagged DV in Mundlak/REWB. Adding \(y_{i,t-1}\) to the within-between decomposition either pollutes the decomposition (treating the lag as a single regressor imports a between component into an estimator whose whole point is to keep within and between separate) or causes mechanical collinearity (decomposing the lag makes \(\bar y_i\) the empirical analog of the random intercept \(u_i\)). corARMA is the defensible dynamic version for REWB — it absorbs persistence in the residuals without disturbing the within-between split.
  • Why corARMA shows up only for RE and Mundlak/REWB. It is a maximum-likelihood feature of nlme::lme. feols and plm cannot fit it; cluster-robust / Driscoll–Kraay SEs play the same role for those engines.
# Build the dynamic data once for all of Part 2 (and Part 3):
#   ln_gdpw_lag    — lagged outcome (for AR(1) and ARDL specs in levels)
#   reg_lag        — lagged treatment (for ARDL(1,1))
#   edt_lag        — lagged moderator (so the lagged interaction reg_lag × edt_lag
#                    is the t-1 version of the contemporaneous reg × edt)
#   d_ln_gdpw      — first difference of the outcome (for §2.1 FD specs)
#   d_ln_gdpw_lag  — lag of the differenced outcome (for FD-AR(1) and FD-ARDL(1,1))
democracy_dyn <- democracy |>
  group_by(country) |>
  arrange(year) |>
  mutate(ln_gdpw_lag   = lag(ln_gdpw),
         reg_lag       = lag(reg),
         edt_lag       = lag(edt),
         d_ln_gdpw     = ln_gdpw - ln_gdpw_lag,
         d_ln_gdpw_lag = lag(d_ln_gdpw)) |>
  ungroup()

2.1 Pooled OLS

We start with pooled OLS — every country-year exchangeable, one common intercept. Pooled OLS cannot recover the within or between effect cleanly, but it gives a useful baseline and a place to confront the nonstationarity flagged by §1.3.4 directly. The level AR(1) here will show \(\hat\phi\) close to 1, the unmistakable signature of an I(1) outcome; differencing the outcome restores stationarity.

Four specifications.

Col Spec Outcome Right-hand side
(1) Static \(y_{it}\) \(\beta X_{it}\)
(2) AR(1) on level \(y_{it}\) \(\phi\,y_{i,t-1} + \beta X_{it}\)
(3) FD + AR(1) \(\Delta y_{it}\) \(\phi\,\Delta y_{i,t-1} + \beta X_{it}\)
(4) FD + ARDL(1,1) \(\Delta y_{it}\) \(\phi\,\Delta y_{i,t-1} + \beta_0\,\text{reg}_{it} + \beta_1\,\text{reg}_{i,t-1} + \gamma_0\,\text{reg}_{it}\!\cdot\!\text{edt}_{it} + \gamma_1\,\text{reg}_{i,t-1}\!\cdot\!\text{edt}_{i,t-1} + \dots\)

Logic. A level AR(1) with \(\hat\phi \approx 1\) is the I(1) signature: the lagged DV absorbs essentially all variance in \(y_{it}\) and the implied long-run multiplier \(\hat\beta/(1-\hat\phi)\) is unstable. Differencing the outcome flips the model to a growth-rate equation — a stationary quantity — and the lagged \(\Delta y_{i,t-1}\) then carries growth-rate dynamics, which mean-revert quickly. The ARDL(1,1) version of FD additionally lets the regime effect (and its moderation by education) spread over a one-year window.

# Drop rows with missing variables once, so all four fits use the same N.
db_pool <- democracy_dyn |>
  drop_na(ln_gdpw, ln_gdpw_lag, d_ln_gdpw, d_ln_gdpw_lag,
          reg, reg_lag, edt, edt_lag, oil)

# (1) Static pooled OLS in levels: no dynamics.
fit_pool          <- lm(ln_gdpw ~ reg * edt + oil,
                        data = db_pool)

# (2) AR(1) on level: add the lagged outcome. Expect phi near 1 (I(1) signature).
fit_pool_dyn      <- lm(ln_gdpw ~ ln_gdpw_lag + reg * edt + oil,
                        data = db_pool)

# (3) FD + AR(1): outcome is Delta y; add lagged Delta y to absorb growth-rate dynamics.
fit_pool_fd_dyn   <- lm(d_ln_gdpw ~ d_ln_gdpw_lag + reg * edt + oil,
                        data = db_pool)

# (4) FD + ARDL(1,1): also lag the treatment and its interaction with edt.
fit_pool_fd_ardl11 <- lm(d_ln_gdpw ~ d_ln_gdpw_lag +
                                     reg + reg_lag +
                                     edt + edt_lag +
                                     reg:edt + reg_lag:edt_lag +
                                     oil,
                         data = db_pool)
show_reg(
  list(fit_pool, fit_pool_dyn, fit_pool_fd_dyn, fit_pool_fd_ardl11),
  custom.model.names = c("(1) Static", "(2) AR(1) level",
                         "(3) FD + AR(1)", "(4) FD + ARDL(1,1)"),
  custom.coef.map = list(
    "(Intercept)"     = "(Intercept)",
    "ln_gdpw_lag"     = "y_lag (φ, level)",
    "d_ln_gdpw_lag"   = "Δy_lag (φ, FD)",
    "reg"             = "reg",
    "reg_lag"         = "reg_lag",
    "edt"             = "edt",
    "edt_lag"         = "edt_lag",
    "reg:edt"         = "reg × edt",
    "reg_lag:edt_lag" = "reg_lag × edt_lag",
    "oil"             = "oil"
  ),
  digits = 3
)
Statistical models
 
  1. Static
  1. AR(1) level
  1. FD + AR(1)
  1. FD + ARDL(1,1)
(Intercept) 7.277*** 0.031* 0.016*** 0.016***
  (0.028) (0.014) (0.003) (0.003)
y_lag (φ, level)   0.998***    
    (0.002)    
Δy_lag (φ, FD)     0.151*** 0.151***
      (0.019) (0.019)
reg 0.633*** 0.005 0.003 0.012
  (0.059) (0.006) (0.006) (0.019)
reg_lag       -0.010
        (0.019)
edt 0.233*** 0.001 0.000 0.002
  (0.006) (0.001) (0.001) (0.007)
edt_lag       -0.002
        (0.007)
reg × edt -0.027** -0.001 -0.000 -0.001
  (0.009) (0.001) (0.001) (0.004)
reg_lag × edt_lag       0.001
        (0.004)
oil 0.752*** -0.000 -0.002 -0.002
  (0.037) (0.004) (0.004) (0.004)
R2 0.656 0.996 0.023 0.023
Adj. R2 0.656 0.996 0.021 0.020
Num. obs. 2734 2734 2734 2734
***p < 0.001; **p < 0.01; *p < 0.05

Reading.

  • (1) Static: the moderation reg × edt and the main effects on reg and edt are identified mostly off cross-country variation.
  • (2) AR(1) level: \(\hat\phi\) on ln_gdpw_lag is close to 1 — the lagged DV absorbs essentially all variance in \(y_{it}\), contemporaneous coefficients shrink toward zero, and \(\hat\beta/(1-\hat\phi)\) is unstable. This is the nonstationarity signature §1.3.4 warned about.
  • (3) FD + AR(1): outcome is now \(\Delta y\). \(\hat\phi\) on \(\Delta y_{i,t-1}\) is well below 1 — growth rates mean-revert. Coefficients are interpretable as effects on the growth rate.
  • (4) FD + ARDL(1,1): generalizes (3). The lagged reg and lagged reg × edt terms show whether democratization’s growth-rate effect — and its moderation by education — spreads over time.

The cleanest specification is column (4): FD + ARDL(1,1). We pick this as the best pooled model going forward.

2.1.1 Residual diagnostic for the chosen model

We check for residual autocorrelation in the FD + ARDL(1,1) pooled model with a panel Breusch-Godfrey test (plm::pbgtest), which respects the country structure of the residuals.

# Refit as a plm pooling model so pbgtest and the panel SE corrections work.
fit_pool_fd_ardl11_plm <- plm(
  d_ln_gdpw ~ d_ln_gdpw_lag +
              reg + reg_lag +
              edt + edt_lag +
              reg:edt + reg_lag:edt_lag +
              oil,
  data  = db_pool,
  index = c("country_f", "year_f"),
  model = "pooling"
)

pbgtest(fit_pool_fd_ardl11_plm, order = 2)
## 
##  Breusch-Godfrey/Wooldridge test for serial correlation in panel models
## 
## data:  d_ln_gdpw ~ d_ln_gdpw_lag + reg + reg_lag + edt + edt_lag + reg:edt +  ...
## chisq = 7.4933, df = 2, p-value = 0.0236
## alternative hypothesis: serial correlation in idiosyncratic errors

If the test rejects, residuals still carry serial correlation after the ARDL(1,1) parameterization — common in macro-panels because the dynamics in (4) are a one-equation approximation, not the full DGP. The fix is to leave the point estimates alone and adjust the standard errors.

2.1.2 Standard errors — choice and justification

OLS point estimates are unbiased even with autocorrelated and cross-sectionally dependent errors; the SEs are not. Three options for the panel SE, all available via plm:

ses_pool_fd <- list(
  cluster  = vcovHC(fit_pool_fd_ardl11_plm, method = "arellano"),
  pcse     = vcovBK(fit_pool_fd_ardl11_plm),
  driscoll = vcovSCC(fit_pool_fd_ardl11_plm)
)

target_pool <- c("d_ln_gdpw_lag", "reg", "reg_lag",
                 "reg:edt", "reg_lag:edt_lag")

ses_pool_tab <- map(ses_pool_fd, function(V)
                       coeftest(fit_pool_fd_ardl11_plm, vcov. = V)[target_pool, 2])

bind_cols(coefficient = target_pool, as_tibble(ses_pool_tab)) |>
  kable(digits = 4,
        caption = "SE corrections on the FD + ARDL(1,1) pooled model.") |>
  kable_styling(full_width = FALSE)
SE corrections on the FD + ARDL(1,1) pooled model.
coefficient cluster pcse driscoll
d_ln_gdpw_lag 0.0356 0.0198 0.0411
reg 0.0210 0.0183 0.0236
reg_lag 0.0206 0.0177 0.0243
reg:edt 0.0040 0.0039 0.0043
reg_lag:edt_lag 0.0040 0.0039 0.0045

We adopt PCSE (Beck & Katz 1995) as the default for pooled OLS, with two reasons:

  1. Cluster-robust by country assumes independence across countries — too strong for a TSCS panel where macro shocks (oil crises, financial cycles, the Volcker disinflation) hit many countries in the same year. Cluster SEs would be artificially small.
  2. PCSE corrects for that contemporaneous cross-country dependence, which is the dominant pathology in TSCS data.

If the §2.1.1 BG test rejects strongly, we switch to Driscoll-Kraay instead — it layers serial correlation on top of cross-country dependence and is the most conservative choice. See §2.7 for the comprehensive SE menu and the same comparison applied to the FE specifications.

2.2 Fixed effects (within)

Equation (static). Each country gets its own intercept \(\alpha_i\), sweeping out time-invariant unit-level confounders:

\[y_{it} = \alpha_i + \beta_1\,\text{reg}_{it} + \beta_2\,\text{edt}_{it} + \beta_3\,(\text{reg}_{it}\!\cdot\!\text{edt}_{it}) + \varepsilon_{it}\]

Algebraically equivalent to the within transformation — demeaning every variable by its country mean:

\[\tilde y_{it} = y_{it} - \bar y_i,\qquad \tilde x_{it} = x_{it} - \bar x_i,\qquad \tilde y_{it} = \beta\,\tilde x_{it} + \tilde\varepsilon_{it}\]

Equation (dynamic — AR(1)). Country FE plus a lagged DV:

\[y_{it} = \alpha_i + \phi\,y_{i,t-1} + \beta_1\,\text{reg}_{it} + \beta_2\,\text{edt}_{it} + \beta_3\,(\text{reg}_{it}\!\cdot\!\text{edt}_{it}) + \varepsilon_{it}\]

Equation (dynamic — ARDL(1,1)). Add lagged reg, lagged edt, and the lagged interaction. With one lag of the treatment and the moderator, students can see whether the regime effect shows up contemporaneously, with a one-period delay, or both:

\[y_{it} = \alpha_i + \phi\,y_{i,t-1} + \beta_0\,\text{reg}_{it} + \beta_1\,\text{reg}_{i,t-1} + \beta_2\,\text{edt}_{it} + \beta_3\,\text{edt}_{i,t-1} + \gamma_0\,(\text{reg}_{it}\!\cdot\!\text{edt}_{it}) + \gamma_1\,(\text{reg}_{i,t-1}\!\cdot\!\text{edt}_{i,t-1}) + \varepsilon_{it}\]

Logic. Identifies \(\beta\) from within-country variation only — changes over time for the same country. Time-invariant oil is absorbed by \(\alpha_i\). The interaction reg × edt survives because both reg and edt are time-varying. Three quantities to read in the ARDL(1,1) column at education level \(\bar e\): the impact effect \(\hat\beta_0 + \hat\gamma_0\bar e\), the cumulative one-period effect \((\hat\beta_0 + \hat\beta_1) + (\hat\gamma_0 + \hat\gamma_1)\bar e\), and the long-run effect \(\big[(\hat\beta_0 + \hat\beta_1) + (\hat\gamma_0 + \hat\gamma_1)\bar e\big]/(1-\hat\phi)\). Nickell (1981) bias caveat: with country FE and short \(T\), \(\hat\phi\) is biased downward by \(\approx -(1+\phi)/(T-1)\) (Lab 6 will fix this with dynamic-panel GMM).

# Static FE: feols is our default — fast on large panels, native cluster SEs.
# `oil` is time-invariant in the panel and is silently absorbed by country FE.
fit_fe       <- feols(ln_gdpw ~ reg * edt + oil | country,
                      data = democracy, cluster = ~ country)

# Dynamic FE — AR(1): add the lagged outcome.
fit_fe_dyn   <- feols(ln_gdpw ~ ln_gdpw_lag + reg * edt + oil | country,
                      data = democracy_dyn, cluster = ~ country)

# Dynamic FE — ARDL(1,1): also add the lagged treatment, lagged moderator,
# and lagged interaction.
fit_fe_ardl11 <- feols(
  ln_gdpw ~ ln_gdpw_lag +
            reg + reg_lag +
            edt + edt_lag +
            reg:edt + reg_lag:edt_lag +
            oil |
            country,
  data = democracy_dyn, cluster = ~ country
)

# We will need the per-country intercepts from the FE-ARDL(1,1) in Part 3 (IRF).
fe_intercepts <- fixef(fit_fe_ardl11)$country
range(fe_intercepts)
## [1] 0.3793449 0.6484020
show_reg(
  list(fit_fe, fit_fe_dyn, fit_fe_ardl11),
  custom.model.names = c("Static", "AR(1)", "ARDL(1,1)"),
  custom.coef.map = list(
    "ln_gdpw_lag"     = "y_lag (φ)",
    "reg"             = "reg",
    "reg_lag"         = "reg_lag",
    "edt"             = "edt",
    "edt_lag"         = "edt_lag",
    "reg:edt"         = "reg × edt",
    "reg_lag:edt_lag" = "reg_lag × edt_lag"
  ),
  digits = 3
)
Statistical models
  Static AR(1) ARDL(1,1)
y_lag (φ)   0.944*** 0.943***
    (0.008) (0.008)
reg 0.117 -0.008 0.017
  (0.089) (0.012) (0.022)
reg_lag     -0.033
      (0.023)
edt 0.180*** -0.004 0.002
  (0.018) (0.003) (0.006)
edt_lag     -0.007
      (0.006)
reg × edt -0.029 0.001 -0.002
  (0.019) (0.002) (0.004)
reg_lag × edt_lag     0.004
      (0.004)
Num. obs. 2900 2847 2787
Num. groups: country 113 113 113
R2 (full model) 0.971 0.997 0.997
R2 (proj model) 0.429 0.938 0.933
Adj. R2 (full model) 0.970 0.997 0.997
Adj. R2 (proj model) 0.428 0.938 0.933
***p < 0.001; **p < 0.01; *p < 0.05

The same point estimates are also available via plm(model = "within") and via LSDV (lm(... + factor(country))); we use feols because it is faster and exposes cluster SEs natively. We re-fit with plm in §2.7 because vcovBK (PCSE) and vcovSCC (Driscoll–Kraay) require a plm object.

Reading. The static within fit shows whether the moderation reg × edt survives once between-country variation is stripped out. Adding the LDV concentrates variance on \(y_{i,t-1}\) and shrinks the contemporaneous coefficients into short-run effects. In the ARDL(1,1) fit, comparing the impact (\(\hat\beta_0\), \(\hat\gamma_0\)) to the one-period cumulative (\(\hat\beta_0 + \hat\beta_1\), \(\hat\gamma_0 + \hat\gamma_1\)) tells us whether democratization’s effect — and its moderation by education — concentrates at impact or spreads over the next year.

2.3 Two-way fixed effects

Equation (static). Country and year each get their own intercept:

\[ y_{it} = \alpha_i + \lambda_t + \beta_1\,\text{reg}_{it} + \beta_2\,\text{edt}_{it} + \beta_3\,(\text{reg}_{it}\!\cdot\!\text{edt}_{it}) + \varepsilon_{it} \]

Equation (dynamic — AR(1)). Add the lagged DV:

\[ y_{it} = \alpha_i + \lambda_t + \phi\,y_{i,t-1} + \beta_1\,\text{reg}_{it} + \beta_2\,\text{edt}_{it} + \beta_3\,(\text{reg}_{it}\!\cdot\!\text{edt}_{it}) + \varepsilon_{it} \]

Equation (dynamic — ARDL(1,1)). Also lag the treatment, the moderator, and the interaction:

\[ y_{it} = \alpha_i + \lambda_t + \phi\,y_{i,t-1} + \beta_0\,\text{reg}_{it} + \beta_1\,\text{reg}_{i,t-1} + \beta_2\,\text{edt}_{it} + \beta_3\,\text{edt}_{i,t-1} + \gamma_0\,(\text{reg}_{it}\!\cdot\!\text{edt}_{it}) + \gamma_1\,(\text{reg}_{i,t-1}\!\cdot\!\text{edt}_{i,t-1}) + \varepsilon_{it} \]

Logic. Country FE removes time-invariant country differences; year FE removes shocks common to all countries in the same year. Useful when common global events affect the outcome, but Kropko & Kubinec’s warning applies: the TWFE coefficient is not simply “the effect of \(x\) controlling for country and year.” It is the slope after removing both country and year means — use it only when that comparison matches the research question.

# Static TWFE: country + year intercepts. `oil` is time-invariant and absorbed.
fit_twfe        <- feols(ln_gdpw ~ reg * edt + oil |
                                   country + year,
                         data = democracy, cluster = ~ country)

# Dynamic TWFE — AR(1): add the lagged outcome.
fit_twfe_dyn    <- feols(ln_gdpw ~ ln_gdpw_lag + reg * edt + oil |
                                   country + year,
                         data = democracy_dyn, cluster = ~ country)

# Dynamic TWFE — ARDL(1,1): also add lagged reg, lagged edt, and lagged interaction.
fit_twfe_ardl11 <- feols(
  ln_gdpw ~ ln_gdpw_lag +
            reg + reg_lag +
            edt + edt_lag +
            reg:edt + reg_lag:edt_lag +
            oil |
            country + year,
  data = democracy_dyn, cluster = ~ country
)
show_reg(
  list(fit_twfe, fit_twfe_dyn, fit_twfe_ardl11),
  custom.model.names = c("Static", "AR(1)", "ARDL(1,1)"),
  custom.coef.map = list(
    "ln_gdpw_lag"     = "y_lag (φ)",
    "reg"             = "reg",
    "reg_lag"         = "reg_lag",
    "edt"             = "edt",
    "edt_lag"         = "edt_lag",
    "reg:edt"         = "reg × edt",
    "reg_lag:edt_lag" = "reg_lag × edt_lag"
  ),
  digits = 3
)
Statistical models
  Static AR(1) ARDL(1,1)
y_lag (φ)   0.927*** 0.926***
    (0.011) (0.011)
reg 0.096 0.001 0.016
  (0.073) (0.013) (0.022)
reg_lag     -0.020
      (0.022)
edt 0.030 -0.003 -0.002
  (0.022) (0.003) (0.006)
edt_lag     -0.002
      (0.006)
reg × edt -0.015 0.000 -0.001
  (0.016) (0.003) (0.004)
reg_lag × edt_lag     0.002
      (0.004)
Num. obs. 2900 2847 2787
Num. groups: country 113 113 113
Num. groups: year 28 28 27
R2 (full model) 0.980 0.997 0.997
R2 (proj model) 0.010 0.847 0.844
Adj. R2 (full model) 0.979 0.997 0.997
Adj. R2 (proj model) 0.009 0.847 0.844
***p < 0.001; **p < 0.01; *p < 0.05

Reading. Compared to country-only FE, TWFE absorbs additional variance through the year intercepts \(\lambda_t\) — these capture global shocks (oil crises, financial cycles) that affect every country at once. The dynamic columns show the same shrinkage of contemporaneous coefficients we saw in §2.2 once \(y_{i,t-1}\) enters. The ARDL(1,1) row gives a cleaner read on the timing: near-zero reg_lag and reg_lag × edt_lag say democratization’s effect (and its moderation) are contemporaneous; nonzero values say they spread over the next year.

2.4 Random effects

Equation (static). Country intercepts are random draws from a common population:

\[ y_{it} = \alpha + \beta_1\,\text{reg}_{it} + \beta_2\,\text{edt}_{it} + \beta_3\,(\text{reg}_{it}\!\cdot\!\text{edt}_{it}) + \beta_4\,\text{oil}_i + (u_i + \varepsilon_{it}),\qquad u_i \sim \mathcal{N}(0,\sigma_\alpha^2) \]

Logic. \(\hat\beta\) is a matrix-weighted average of within and between estimators, with weight

\[ \theta = 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\,\sigma_\alpha^2}} \]

— so \(\theta=0\) collapses to pooled OLS and \(\theta=1\) collapses to FE. RE is partial pooling toward the population mean. Two key features: RE estimates the coefficient on oil (FE cannot, since oil is time-invariant), at the cost of needing \(u_i \perp X_i\) — which the §1.1 DAG says is suspect.

We fit two versions:

  • Static: \(y_{it} = \alpha + X_{it}\beta + (u_i + \varepsilon_{it})\). No dynamics in the conditional mean.
  • corARMA(1): \(y_{it} = \alpha + X_{it}\beta + u_{it}\) with \(u_{it} = \rho u_{i,t-1} + \varepsilon_{it}\). Dynamic coefficient on the residuals. \(\beta\) is already long-run by construction — the conditional mean isn’t lagged.

Why no LDV in RE here. A lagged DV in an RE model carries the Wooldridge (2005) initial-conditions issue: \(y_{i,0}\) is correlated with the random intercept \(u_i\), so the lagged DV is not strictly exogenous in the RE framework. The methodologically clean fix is Wooldridge’s correlated-random-effects setup or Arellano-Bond GMM (Lab 6). For RE here, corARMA(1) handles the persistence in the residuals without disturbing the random-intercept structure — the natural dynamic version, just like for Mundlak/REWB in §2.5.

# Static RE.
fit_re         <- lme(
  fixed     = ln_gdpw ~ reg * edt + oil,
  random    = ~ 1 | country,
  data      = democracy,
  na.action = na.omit,
  method    = "REML"
)

# RE — corARMA(1): AR(1) on the residuals, no lagged DV.
fit_re_corar   <- lme(
  fixed       = ln_gdpw ~ reg * edt + oil,
  random      = ~ 1 | country,
  correlation = corARMA(form = ~ year | country, p = 1, q = 0),
  data        = democracy,
  na.action   = na.omit,
  method      = "REML"
)
show_reg(
  list(fit_re, fit_re_corar),
  custom.model.names = c("Static", "corARMA(1)"),
  custom.coef.map = list(
    "(Intercept)" = "(Intercept)",
    "reg"         = "reg",
    "edt"         = "edt",
    "reg:edt"     = "reg × edt",
    "oil"         = "oil"
  ),
  digits = 3
)
Statistical models
  Static corARMA(1)
(Intercept) 7.703*** 8.238***
  (0.071) (0.095)
reg 0.119*** 0.059**
  (0.030) (0.019)
edt 0.183*** 0.058***
  (0.004) (0.006)
reg × edt -0.027*** -0.012**
  (0.006) (0.004)
oil 0.539** 0.257
  (0.201) (0.267)
AIC -1009.712 -7024.916
BIC -967.916 -6977.150
Log Likelihood 511.856 3520.458
Num. obs. 2900 2900
Num. groups: country 113 113
***p < 0.001; **p < 0.01; *p < 0.05

Reading. Compare the reg row across the two columns. In the static fit, \(\hat\beta_{\text{reg}}\) is the partially-pooled level effect with no dynamic correction; the residuals are assumed iid which is implausible here (§1.3.4 found I(1)-like persistence). In corARMA(1), \(\hat\beta_{\text{reg}}\) is the partially-pooled level effect with an AR(1) error structure — it is already long-run by construction. Adding corARMA(1) typically widens SEs (the residuals are now allowed to be persistent rather than iid), but should not move the within-between mixture much. The §3.5 IRF below gets the dynamic story from the FE-ARDL(1,1), where the lagged-DV form gives a clean closed-form impulse response.

2.5 Mundlak / REWB — within and between, separately

Equation (static). Decompose every time-varying regressor — including the interaction reg × edt — into within and between components, and put both in the model with separate coefficients:

\[ y_{it} = \alpha + \beta_W^{\text{reg}}\,\text{reg}^{W}_{it} + \beta_B^{\text{reg}}\,\overline{\text{reg}}_i + \beta_W^{\text{edt}}\,\text{edt}^{W}_{it} + \beta_B^{\text{edt}}\,\overline{\text{edt}}_i + \gamma_W\,(\text{reg}\!\cdot\!\text{edt})^{W}_{it} + \gamma_B\,\overline{(\text{reg}\!\cdot\!\text{edt})}_i + \beta_4\,\text{oil}_i + (u_i + \varepsilon_{it}) \]

where \(x^{W}_{it} = x_{it} - \bar x_i\) and we treat \(\text{reg}\cdot\text{edt}\) as a single variable to be demeaned, then split.

Equation (dynamic — corARMA(1)). Same equation; AR(1) on the residuals:

\[ \dots + u_{it}, \qquad u_{it} = \rho u_{i,t-1} + \varepsilon_{it} \]

Logic. This is Mundlak (1978), repackaged by Bell & Jones (2015) as REWB (random effects within-between). The within-mean coefficient \(\beta_W\) picks up the FE-style identification; the between-mean coefficient \(\beta_B\) picks up the cross-sectional contrast that pure RE was using implicitly. We see both, and \(\beta_W = \beta_B\) is testable (Hausman / Mundlak diagnostic in §2.6). Bell & Jones argue REWB should be the default — it returns more information than FE or RE alone.

Why no LDV here. Adding \(y_{i,t-1}\) to the within-between decomposition either (a) imports a between component into a model whose whole point is the within-between split, or (b) creates mechanical collinearity if you decompose the lag itself (\(\bar y_i\) is the empirical analog of \(u_i\)). corARMA absorbs persistence in the residuals without disturbing the decomposition, so it is the natural dynamic version for REWB.

# parameters::demean splits each time-varying regressor into _within and _between.
# Because `reg × edt` involves two time-varying regressors, we precompute the
# product `reg_x_edt` at the country-year level, then demean it as a single
# variable. `oil` is time-invariant — no within variation — so it enters as a
# single between-only term.
mund <- democracy |>
  drop_na(ln_gdpw, reg, edt, oil) |>
  mutate(reg_x_edt = reg * edt) |>
  parameters::demean(select = c("reg", "edt", "reg_x_edt"),
                     by = "country", verbose = FALSE)

# Static REWB: within + between coefficients estimated separately,
# RE intercept retained.
fit_mundlak       <- lme(
  fixed     = ln_gdpw ~ reg_within       + reg_between +
                       edt_within       + edt_between +
                       reg_x_edt_within + reg_x_edt_between +
                       oil,
  random    = ~ 1 | country,
  data      = mund,
  na.action = na.omit,
  method    = "REML"
)

# Dynamic REWB via corARMA(1). Persistence absorbed in the residuals,
# within/between decomposition preserved.
fit_mundlak_corar <- lme(
  fixed       = ln_gdpw ~ reg_within       + reg_between +
                         edt_within       + edt_between +
                         reg_x_edt_within + reg_x_edt_between +
                         oil,
  random      = ~ 1 | country,
  correlation = corARMA(form = ~ year | country, p = 1, q = 0),
  data        = mund,
  na.action   = na.omit,
  method      = "REML"
)
show_reg(
  list(fit_mundlak, fit_mundlak_corar),
  custom.model.names = c("Static", "+ corARMA(1)"),
  custom.coef.map = list(
    "(Intercept)"        = "(Intercept)",
    "reg_within"         = "reg (within)",
    "reg_between"        = "reg (between)",
    "edt_within"         = "edt (within)",
    "edt_between"        = "edt (between)",
    "reg_x_edt_within"   = "reg × edt (within)",
    "reg_x_edt_between"  = "reg × edt (between)",
    "oil"                = "oil"
  ),
  digits = 3
)
Statistical models
  Static
  • corARMA(1)
(Intercept) 7.216*** 7.141***
  (0.130) (0.134)
reg (within) 0.117*** 0.055**
  (0.030) (0.019)
reg (between) 0.927** 0.907**
  (0.324) (0.334)
edt (within) 0.180*** 0.051***
  (0.004) (0.006)
edt (between) 0.228*** 0.226***
  (0.030) (0.031)
reg × edt (within) -0.029*** -0.012**
  (0.006) (0.004)
reg × edt (between) -0.044 -0.037
  (0.047) (0.049)
oil 0.718*** 0.652***
  (0.170) (0.175)
AIC -1035.523 -7109.690
BIC -975.826 -7044.023
Log Likelihood 527.762 3565.845
Num. obs. 2900 2900
Num. groups: country 113 113
***p < 0.001; **p < 0.01; *p < 0.05

Reading. Six coefficients on the regime/education block, instead of three:

  • reg_within / reg_between — within-country effect of becoming democratic vs the cross-country slope.
  • edt_within / edt_between — within-country effect of rising education vs the cross-country slope.
  • reg_x_edt_within / reg_x_edt_between — does the moderation by education operate within countries (rising education changes the regime effect over time within the same country) or between (more-educated countries have a different regime-growth slope on average)?

Time-invariant oil retains its main effect (RE’s flexibility, FE’s shortcoming). For our research question — the within effect of regime change and its moderation by educationreg_within and reg_x_edt_within are the relevant estimates. Adding corARMA(1) typically widens SEs but should not move the within-between split materially.

2.6 Diagnostic tests — does within agree with between?

Two formal tests targeting the same hypothesis:

# Mundlak Wald: jointly tests reg_within = reg_between, and similarly for the
# moderation and edt. The contrast matrix L picks out (within - between) for
# each regressor; W = (Lb)' (LVL')^{-1} (Lb) is chi-square under H0.
b <- fixef(fit_mundlak)
V <- vcov(fit_mundlak)

contrasts_named <- list(
  c("reg_within"        = 1, "reg_between"        = -1),
  c("edt_within"        = 1, "edt_between"        = -1),
  c("reg_x_edt_within"  = 1, "reg_x_edt_between"  = -1)
)
L <- t(sapply(contrasts_named, function(cn) {
  row <- setNames(rep(0, length(b)), names(b))
  row[names(cn)] <- cn
  row
}))

delta <- L %*% b
W     <- t(delta) %*% solve(L %*% V %*% t(L)) %*% delta
sprintf("Mundlak Wald = %.2f, df = %d, p = %.3f",
        W, nrow(L), 1 - pchisq(W, df = nrow(L)))
## [1] "Mundlak Wald = 52.04, df = 3, p = 0.000"
# Classical Hausman: compares within and RE coefficients on time-varying
# regressors only. Coarser than the Mundlak Wald above, but the two should
# agree in direction. (`oil` is time-invariant; FE drops it, RE estimates it.)
fit_fe_plm <- plm(ln_gdpw ~ reg * edt + oil,
                  data = democracy, index = c("country_f", "year_f"),
                  model = "within")
fit_re_plm <- plm(ln_gdpw ~ reg * edt + oil,
                  data = democracy, index = c("country_f", "year_f"),
                  model = "random")
phtest(fit_fe_plm, fit_re_plm)
## 
##  Hausman Test
## 
## data:  ln_gdpw ~ reg * edt + oil
## chisq = 61.75, df = 3, p-value = 2.485e-13
## alternative hypothesis: one model is inconsistent

The Mundlak Wald is more granular — it tests each within-vs-between contrast individually. Hausman is the classical aggregate version. They should agree in direction.

Important caveat (Bell & Jones 2015). Do not use Hausman as a gatekeeper that “picks” FE vs RE. Even when it rejects, REWB is often the better choice because it returns both within and between estimates. Treat the test as a diagnostic — does my within story differ from my between story? — not a model-selection switch.

2.7 Standard errors when residuals are not iid

OLS / pooled / FE point estimates are unbiased under exogeneity even if errors are autocorrelated and cross-sectionally dependent. The standard errors are not. Lab 1 §3 covered cluster-robust SEs in detail; this section adds the two panel-specific corrections we need here.

When to use which panel SE.
SE Allows When
Cluster-robust by unit arbitrary within-country correlation many clusters (≥30-50), residuals roughly independent across countries
PCSE (Beck & Katz 1995) contemporaneous cross-sectional correlation many countries, TSCS data, residuals correlated across countries in the same year
Driscoll–Kraay (1998) cross-sectional + serial correlation, any pattern moderate-to-large T; the safest default when both pathologies are present

Cluster-robust SEs assume independence across clusters, which is fine for classrooms and survey strata but will under-state uncertainty in a TSCS panel where macroeconomic shocks hit many countries in the same year. PCSE corrects for that contemporaneous cross-country dependence; Driscoll–Kraay then layers serial correlation on top.

# vcovBK and vcovSCC require a plm object; use fit_fe_plm fitted in §2.6.
ses <- list(
  cluster  = vcovHC(fit_fe_plm, method = "arellano"),   # cluster by unit
  pcse     = vcovBK(fit_fe_plm),                         # Beck-Katz PCSE
  driscoll = vcovSCC(fit_fe_plm)                         # Driscoll-Kraay
)
target <- c("reg", "edt", "reg:edt")
ses_tab <- map(ses, function(V)
                coeftest(fit_fe_plm, vcov. = V)[target, 2])
bind_cols(coefficient = target, as_tibble(ses_tab)) |>
  kable(digits = 4,
        caption = "Standard errors on the within-FE estimate of (reg, edt, reg:edt) under three corrections.") |>
  kable_styling(full_width = FALSE)
Standard errors on the within-FE estimate of (reg, edt, reg:edt) under three corrections.
coefficient cluster pcse driscoll
reg 0.0888 0.0821 0.0628
edt 0.0175 0.0163 0.0231
reg:edt 0.0186 0.0175 0.0081

PCSE and Driscoll–Kraay are typically a notch larger than the cluster-robust SE here, exactly because they admit cross-country dependence that cluster-by-unit assumes away.

2.8 Robustness — does the conclusion travel across specifications?

We close Part 2 with two side-by-side comparison tables:

  • Table 1 — static specs. What the §1.1 within-vs-between framing looks like before any dynamic correction. The pure cross-country contrast (Pooled OLS) versus the pure within contrast (FE) versus the partial-pooling (RE) versus the explicit decomposition (Mundlak / REWB).
  • Table 2 — dynamic specs (best fit / most plausible). ARDL(1,1) for Pooled / FE / TWFE — these include both the lagged DV and the lagged treatment, so each contemporaneous coefficient is a clean short-run effect with a long-run reconstruction \((\hat\beta_0 + \hat\beta_1)/(1-\hat\phi)\). RE and Mundlak / REWB use corARMA(1) instead — LDV in RE faces the Wooldridge initial-conditions problem, and LDV in REWB breaks the within-between decomposition.
show_reg(
  list(fit_pool, fit_fe, fit_twfe, fit_re, fit_mundlak),
  custom.model.names = c("Pooled OLS", "FE (within)", "TWFE", "RE", "Mundlak / REWB"),
  custom.coef.map    = list(
    reg                  = "reg",
    edt                  = "edt",
    "reg:edt"            = "reg x edt",
    oil                  = "oil",
    reg_within           = "reg (within)",
    reg_between          = "reg (between)",
    edt_within           = "edt (within)",
    edt_between          = "edt (between)",
    reg_x_edt_within     = "reg x edt (within)",
    reg_x_edt_between    = "reg x edt (between)"
  ),
  digits = 3,
  caption = "Table 1: static specifications."
)
Table 1: static specifications.
  Pooled OLS FE (within) TWFE RE Mundlak / REWB
reg 0.633*** 0.117 0.096 0.119***  
  (0.059) (0.089) (0.073) (0.030)  
edt 0.233*** 0.180*** 0.030 0.183***  
  (0.006) (0.018) (0.022) (0.004)  
reg x edt -0.027** -0.029 -0.015 -0.027***  
  (0.009) (0.019) (0.016) (0.006)  
oil 0.752***     0.539** 0.718***
  (0.037)     (0.201) (0.170)
reg (within)         0.117***
          (0.030)
reg (between)         0.927**
          (0.324)
edt (within)         0.180***
          (0.004)
edt (between)         0.228***
          (0.030)
reg x edt (within)         -0.029***
          (0.006)
reg x edt (between)         -0.044
          (0.047)
R2 0.656        
Adj. R2 0.656        
Num. obs. 2734 2900 2900 2900 2900
Num. groups: country   113 113 113 113
R2 (full model)   0.971 0.980    
R2 (proj model)   0.429 0.010    
Adj. R2 (full model)   0.970 0.979    
Adj. R2 (proj model)   0.428 0.009    
Num. groups: year     28    
AIC       -1009.712 -1035.523
BIC       -967.916 -975.826
Log Likelihood       511.856 527.762
***p < 0.001; **p < 0.01; *p < 0.05
# Pooled OLS uses the FD + ARDL(1,1) model from §2.1 (the level ARDL(1,1) was
# nonstationary). FE/TWFE/RE use the level ARDL(1,1). Mundlak uses corARMA(1)
# since LDV breaks the within/between decomposition.
show_reg(
  list(fit_pool_fd_ardl11, fit_fe_ardl11, fit_twfe_ardl11, fit_re_corar,
       fit_mundlak_corar),
  custom.model.names = c("Pooled — FD ARDL(1,1)", "FE — ARDL(1,1)",
                         "TWFE — ARDL(1,1)", "RE — corARMA(1)",
                         "Mundlak — corARMA(1)"),
  custom.coef.map    = list(
    "ln_gdpw_lag"        = "y_lag (φ, level)",
    "d_ln_gdpw_lag"      = "Δy_lag (φ, FD)",
    reg                  = "reg",
    reg_lag              = "reg_lag",
    edt                  = "edt",
    edt_lag              = "edt_lag",
    "reg:edt"            = "reg x edt",
    "reg_lag:edt_lag"    = "reg_lag x edt_lag",
    oil                  = "oil",
    reg_within           = "reg (within)",
    reg_between          = "reg (between)",
    edt_within           = "edt (within)",
    edt_between          = "edt (between)",
    reg_x_edt_within     = "reg x edt (within)",
    reg_x_edt_between    = "reg x edt (between)"
  ),
  digits = 3,
  caption = "Table 2: dynamic specifications. Pooled OLS uses the FD + ARDL(1,1) picked in §2.1 (its outcome is Δy, hence the 'φ, FD' row). FE/TWFE use the level ARDL(1,1). RE and Mundlak use corARMA(1) — LDV in RE faces the Wooldridge initial-conditions issue; LDV in REWB breaks the within-between decomposition."
)
Table 2: dynamic specifications. Pooled OLS uses the FD + ARDL(1,1) picked in §2.1 (its outcome is Δy, hence the ‘φ, FD’ row). FE/TWFE use the level ARDL(1,1). RE and Mundlak use corARMA(1) — LDV in RE faces the Wooldridge initial-conditions issue; LDV in REWB breaks the within-between decomposition.
  Pooled — FD ARDL(1,1) FE — ARDL(1,1) TWFE — ARDL(1,1) RE — corARMA(1) Mundlak — corARMA(1)
y_lag (φ, level)   0.943*** 0.926***    
    (0.008) (0.011)    
Δy_lag (φ, FD) 0.151***        
  (0.019)        
reg 0.012 0.017 0.016 0.059**  
  (0.019) (0.022) (0.022) (0.019)  
reg_lag -0.010 -0.033 -0.020    
  (0.019) (0.023) (0.022)    
edt 0.002 0.002 -0.002 0.058***  
  (0.007) (0.006) (0.006) (0.006)  
edt_lag -0.002 -0.007 -0.002    
  (0.007) (0.006) (0.006)    
reg x edt -0.001 -0.002 -0.001 -0.012**  
  (0.004) (0.004) (0.004) (0.004)  
reg_lag x edt_lag 0.001 0.004 0.002    
  (0.004) (0.004) (0.004)    
oil -0.002     0.257 0.652***
  (0.004)     (0.267) (0.175)
reg (within)         0.055**
          (0.019)
reg (between)         0.907**
          (0.334)
edt (within)         0.051***
          (0.006)
edt (between)         0.226***
          (0.031)
reg x edt (within)         -0.012**
          (0.004)
reg x edt (between)         -0.037
          (0.049)
R2 0.023        
Adj. R2 0.020        
Num. obs. 2734 2787 2787 2900 2900
Num. groups: country   113 113 113 113
R2 (full model)   0.997 0.997    
R2 (proj model)   0.933 0.844    
Adj. R2 (full model)   0.997 0.997    
Adj. R2 (proj model)   0.933 0.844    
Num. groups: year     27    
AIC       -7024.916 -7109.690
BIC       -6977.150 -7044.023
Log Likelihood       3520.458 3565.845
***p < 0.001; **p < 0.01; *p < 0.05

Headline reading.

  • Static table. Compare the pooled-OLS reg × edt to its FE counterpart: if the moderation collapses under FE, it was driven by between-country variation (more-educated countries differ from less-educated countries in their average regime-growth slope). If it survives, the moderation is a within-country phenomenon (rising education within a country changes the regime effect). The Mundlak / REWB split (reg × edt (within) vs reg × edt (between)) makes which story is doing the work explicit.
  • Dynamic table. Once a lagged DV is included, every contemporaneous coefficient becomes a short-run effect; long-run reconstruction multiplies by \(1/(1-\hat\phi)\). The within-vs-between pattern from the static table should survive. The lagged-treatment columns (reg_lag, reg_lag × edt_lag) tell us whether democracy’s effect — and its moderation — is contemporaneous or spreads over a year.
  • For our research questionthe within effect of democratization, conditional on education — the within-block estimates are the relevant ones, dynamic or static. The §3.5 IRF below computes the long-run trajectory at two values of education to make the moderation visible.

This robustness pattern — “the within story is consistent across FE/TWFE/REWB; the between story is visible in pooled OLS and REWB’s between block” — is exactly why Kropko & Kubinec (2020) tell us to state the comparison our coefficient is making, rather than picking one estimator and pretending the others were wrong.


Part 3 — Counterfactual prediction with fixed effects

Lab 4 introduced parametric simulation for a single time series. Part 3 extends the same logic to panels, but we keep the estimand fixed-effects-centered:

  1. A one-country dynamic FE forecast asks what happens to a particular country’s expected path if it democratizes after the observed sample.
  2. A TWFE average marginal expectation asks what the average one-step expected difference is if observed country-years are set to democracy versus autocracy, holding country and year effects fixed.

These are different quantities. The first is a future forecast. The second is an average within-sample contrast. TWFE is much easier to defend for the second quantity because the year fixed effects are already observed; future year effects are not.

3.1 Reuse the dynamic fits from Part 2

Part 2 already fit the dynamic country-FE and TWFE models we need here. We just pull out the persistence coefficients for reference:

phi_fe   <- coef(fit_fe_dyn)["ln_gdpw_lag"]
phi_twfe <- coef(fit_twfe_dyn)["ln_gdpw_lag"]
sprintf("phi_fe = %.3f,  phi_twfe = %.3f", phi_fe, phi_twfe)
## [1] "phi_fe = 0.944,  phi_twfe = 0.927"

Nickell (1981) bias caveat. With small \(T\) and country FE, the within-FE estimate of \(\hat\phi\) is biased downward by approximately \(-(1+\phi)/(T-1)\). For our \(T_i \in [10, 30]\), the bias is not trivial. Lab 6 will introduce dynamic-panel GMM to address it. For today, the goal is interpretation and simulation, not a final causal estimate.

3.2 One-country forecast scenario

The first counterfactual question is:

If one historically authoritarian country democratized permanently after the observed sample, what would its expected log-GDP-per-worker path look like?

We choose an always-autocratic country with the longest observed history. That gives us a clean baseline and a well-defined last observed value for the dynamic recursion.

last_year <- max(democracy$year)
maxT      <- max(table(democracy$country))

ever_aut <- democracy |>
  group_by(country, ctyname) |>
  summarise(always_aut = all(reg == 0, na.rm = TRUE),
            n_obs       = n(),
            n_yreg      = sum(!is.na(ln_gdpw) & !is.na(reg)),
            last_obs    = max(year), .groups = "drop") |>
  filter(always_aut, n_obs == maxT, n_yreg == maxT, last_obs == last_year)

focus_country <- ever_aut$ctyname[1]
focus_id      <- ever_aut$country[1]

sprintf("Focus country: %s (id = %d)", focus_country, focus_id)
## [1] "Focus country: Egypt (id = 14)"
H              <- 20                                # forecast horizon (years)
focus_df       <- democracy |> filter(country == focus_id) |> arrange(year)
y_anchor       <- tail(focus_df$ln_gdpw, 1)         # last observed log GDP
edt_hold       <- tail(na.omit(focus_df$edt), 1)    # last non-NA education

sprintf("y_anchor = %.3f, edt_hold = %.3f", y_anchor, edt_hold)
## [1] "y_anchor = 8.838, edt_hold = 5.390"

3.3 Manual FE forecast with mvrnorm

The dynamic country-FE recursion is:

\[ y_t = \hat\alpha_i + \hat\phi y_{t-1} + \hat\beta_{\text{reg}}\cdot 1 + \hat\beta_{\text{edt}}\cdot \text{edt}_T + \hat\beta_{\text{int}}\cdot 1\cdot \text{edt}_T. \]

The counterfactual fixes reg = 1 for every forecast year, holds education at the last observed value, and initializes the recursion at the last observed log GDP.

set.seed(2025)
S        <- 1000
alpha_fe <- fixef(fit_fe_dyn)$country[as.character(focus_id)]
draws_fe <- mvrnorm(n = S, mu = coef(fit_fe_dyn), Sigma = vcov(fit_fe_dyn))

# Store one simulated forecast path in each row.
sim_fe_manual <- matrix(NA_real_, nrow = S, ncol = H)

for (s in seq_len(S)) {
  # Extract one draw of the dynamic FE coefficients.
  b <- draws_fe[s, ]

  # Start the recursion at the last observed outcome.
  y_lag <- y_anchor

  for (h in seq_len(H)) {
    # Expected value in forecast year h under permanent democratization.
    # The stochastic innovation has expectation zero, so this is an expected
    # counterfactual path rather than a predicted value path.
    y_now <- alpha_fe +
      b["ln_gdpw_lag"] * y_lag +
      b["reg"]         * 1 +
      b["edt"]         * edt_hold +
      b["reg:edt"]     * 1 * edt_hold

    # Save and update the lag for the next forecast year.
    sim_fe_manual[s, h] <- y_now
    y_lag <- y_now
  }
}

fe_manual_band <- tibble(
  year   = (last_year + 1):(last_year + H),
  pe     = apply(sim_fe_manual, 2, mean),
  lo     = apply(sim_fe_manual, 2, quantile, 0.025),
  hi     = apply(sim_fe_manual, 2, quantile, 0.975),
  method = "Manual FE"
)

We hold \(\hat\alpha_i\) fixed across simulations. That keeps the example focused on coefficient uncertainty and dynamic propagation. A more advanced version could also simulate uncertainty in the estimated country intercept.

3.4 The same FE forecast with simcf

ldvsimev() can reproduce the same expected path if we pass the country FE as the model constant. The lagged dependent variable coefficient is passed separately as phi, so it should not appear in the simulated coefficient matrix b.

# Build the hypothetical-X matrix from the non-lagged covariates.
formula_fe_static <- ln_gdpw ~ reg * edt
xhyp_fe <- cfMake(formula_fe_static, data = democracy_dyn, nscen = H)

# Set the same permanent-democracy scenario for each forecast year.
for (h in seq_len(H)) {
  xhyp_fe <- cfChange(xhyp_fe, "reg", x = 1,        scen = h)
  xhyp_fe <- cfChange(xhyp_fe, "edt", x = edt_hold, scen = h)
}

# Put the country intercept in the first column of b. This makes it the
# "constant" for this particular country-specific forecast.
coef_fe       <- coef(fit_fe_dyn)
coef_fe_x     <- coef_fe[names(coef_fe) != "ln_gdpw_lag"]
draws_fe_x    <- draws_fe[, names(coef_fe_x), drop = FALSE]
phi_fe_draws  <- draws_fe[, "ln_gdpw_lag", drop = FALSE]
simbetas_fe   <- cbind(alpha_fe = rep(alpha_fe, S), draws_fe_x)

sim_fe_obj <- ldvsimev(
  x        = xhyp_fe,
  b        = simbetas_fe,
  ci       = 0.95,
  constant = 1,
  phi      = phi_fe_draws,
  lagY     = y_anchor
)

fe_simcf_band <- tibble(
  year   = (last_year + 1):(last_year + H),
  pe     = sim_fe_obj$pe,
  lo     = sim_fe_obj$lower,
  hi     = sim_fe_obj$upper,
  method = "simcf FE"
)
hist_df <- focus_df |>
  transmute(year, observed = ln_gdpw)

forecast_bands <- bind_rows(fe_manual_band, fe_simcf_band)

ggplot() +
  geom_line(data = hist_df, aes(year, observed), colour = "grey20") +
  geom_ribbon(data = forecast_bands,
              aes(year, ymin = lo, ymax = hi, fill = method),
              alpha = 0.22) +
  geom_line(data = forecast_bands,
            aes(year, pe, colour = method), linewidth = 0.75) +
  geom_vline(xintercept = last_year, linetype = "dashed") +
  labs(x = NULL, y = "log GDP per worker",
       title = paste0("Counterfactual democratization, ", focus_country),
       subtitle = "Dashed line: end of observed sample. Scenario sets reg = 1 for every forecast year.",
       colour = NULL, fill = NULL) +
  theme_minimal(base_size = 11) +
  theme(legend.position = "bottom")
One-country counterfactual forecast under dynamic country fixed effects. The manual simulation and simcf reproduce the same expected-value logic.

One-country counterfactual forecast under dynamic country fixed effects. The manual simulation and simcf reproduce the same expected-value logic.

3.5 Impulse response function and long-run effect

The §3.3–3.4 forecast plotted the level of \(y\) under permanent democratization. The same dynamic, expressed as a difference vs a no-change counterfactual baseline, is an impulse response function (IRF) — the panel analog of Lab 4’s IRF. We use the FE-ARDL(1,1) model from §2.2 because its lagged-treatment terms reg_lag and reg_lag × edt_lag are the natural channel to display moderation by education over time, not just at impact.

Closed form. The ARDL(1,1) recursion under permanent democratization at \(t = 0\), with education held at value \(e\), gives

\[ \mathrm{IRF}(h \mid e) \;=\; A(e) \cdot \frac{1 - \phi^{h+1}}{1 - \phi} \;+\; B(e) \cdot \frac{1 - \phi^{h}}{1 - \phi}, \]

where \(A(e) = \beta_0 + \gamma_0\, e\) is the impact effect and \(B(e) = \beta_1 + \gamma_1\, e\) is the delayed transmission through reg_lag and reg_lag × edt_lag. (\(\beta_0\) is the coefficient on reg, \(\beta_1\) on reg_lag, \(\gamma_0\) on reg × edt, \(\gamma_1\) on reg_lag × edt_lag, \(\phi\) on ln_gdpw_lag. Holding edt constant means \(\text{edt}_t = \text{edt}_{t-1} = e\) throughout the counterfactual.)

Long-run equilibrium effect (LRE). As \(h \to \infty\), assuming \(|\phi| < 1\):

\[ \mathrm{LRE}(e) \;=\; \frac{A(e) + B(e)}{1 - \phi} \;=\; \frac{\beta_0 + \beta_1 + (\gamma_0 + \gamma_1)\,e}{1 - \phi}. \]

Implementation. Parametric simulation: draw \(S = 1{,}000\) coefficient vectors from MVN(coef, vcov), evaluate the closed-form IRF at each draw and each horizon, take the 95% band over draws. We compute IRF and LRE at two education levels — the 25th and 75th percentile of edt in the panel — so the moderation by education is visible.

# Closed-form IRF for one parameter draw, one edt value, horizons 0..H.
# IRF(h | e) = A(e) * (1 - phi^(h+1))/(1 - phi) + B(e) * (1 - phi^h)/(1 - phi)
#   with B's term zero at h = 0 (no lagged-treatment contribution at impact).
irf_path <- function(e, phi, beta0, beta1, gamma0, gamma1, H = 20) {
  A <- beta0 + gamma0 * e
  B <- beta1 + gamma1 * e
  h_seq         <- 0:H
  geom_h_plus_1 <- (1 - phi^(h_seq + 1)) / (1 - phi)
  geom_h        <- ifelse(h_seq == 0, 0, (1 - phi^h_seq) / (1 - phi))
  A * geom_h_plus_1 + B * geom_h
}

# Two education levels: 25th and 75th percentiles of observed edt.
edt_lo <- quantile(democracy$edt, 0.25, na.rm = TRUE)
edt_hi <- quantile(democracy$edt, 0.75, na.rm = TRUE)
sprintf("edt low (P25) = %.2f,  edt high (P75) = %.2f", edt_lo, edt_hi)
## [1] "edt low (P25) = 2.21,  edt high (P75) = 7.08"
# Parametric simulation from FE-ARDL(1,1) joint coefficient distribution.
set.seed(2025)
S       <- 1000
H_irf   <- 20
draws_ardl <- mvrnorm(n = S,
                      mu    = coef(fit_fe_ardl11),
                      Sigma = vcov(fit_fe_ardl11))

# Evaluate IRF for each draw, at edt_lo and edt_hi.
irf_lo <- matrix(NA_real_, S, H_irf + 1)
irf_hi <- matrix(NA_real_, S, H_irf + 1)

for (s in seq_len(S)) {
  d <- draws_ardl[s, ]
  irf_lo[s, ] <- irf_path(e      = edt_lo,
                          phi    = d["ln_gdpw_lag"],
                          beta0  = d["reg"],
                          beta1  = d["reg_lag"],
                          gamma0 = d["reg:edt"],
                          gamma1 = d["reg_lag:edt_lag"],
                          H      = H_irf)
  irf_hi[s, ] <- irf_path(e      = edt_hi,
                          phi    = d["ln_gdpw_lag"],
                          beta0  = d["reg"],
                          beta1  = d["reg_lag"],
                          gamma0 = d["reg:edt"],
                          gamma1 = d["reg_lag:edt_lag"],
                          H      = H_irf)
}

irf_band <- bind_rows(
  tibble(h = 0:H_irf,
         pe = colMeans(irf_lo),
         lo = apply(irf_lo, 2, quantile, 0.025),
         hi = apply(irf_lo, 2, quantile, 0.975),
         scenario = sprintf("Low education (edt = %.2f, P25)", edt_lo)),
  tibble(h = 0:H_irf,
         pe = colMeans(irf_hi),
         lo = apply(irf_hi, 2, quantile, 0.025),
         hi = apply(irf_hi, 2, quantile, 0.975),
         scenario = sprintf("High education (edt = %.2f, P75)", edt_hi))
)
ggplot(irf_band, aes(h, pe, colour = scenario, fill = scenario)) +
  geom_ribbon(aes(ymin = lo, ymax = hi), alpha = 0.22, colour = NA) +
  geom_line(linewidth = 0.75) +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "grey45") +
  labs(x = "Horizon h (years after democratization)",
       y = "Cumulative effect on log GDP per worker",
       title = "Impulse response: permanent democratization, FE-ARDL(1,1)",
       colour = NULL, fill = NULL) +
  theme_minimal(base_size = 11) +
  theme(legend.position = "bottom")
Impulse response function: cumulative effect of permanent democratization on log GDP per worker, from the FE-ARDL(1,1) model. 95% parametric simulation bands. The asymptote of each curve is the long-run equilibrium effect (LRE).

Impulse response function: cumulative effect of permanent democratization on log GDP per worker, from the FE-ARDL(1,1) model. 95% parametric simulation bands. The asymptote of each curve is the long-run equilibrium effect (LRE).

# Long-run equilibrium effect (LRE) for each edt value, with simulation band.
# LRE(e) = (beta_0 + beta_1 + (gamma_0 + gamma_1) * e) / (1 - phi).
lre_lo_draws <- (draws_ardl[, "reg"]            + draws_ardl[, "reg_lag"] +
                 (draws_ardl[, "reg:edt"]       + draws_ardl[, "reg_lag:edt_lag"]) * edt_lo) /
                (1 - draws_ardl[, "ln_gdpw_lag"])

lre_hi_draws <- (draws_ardl[, "reg"]            + draws_ardl[, "reg_lag"] +
                 (draws_ardl[, "reg:edt"]       + draws_ardl[, "reg_lag:edt_lag"]) * edt_hi) /
                (1 - draws_ardl[, "ln_gdpw_lag"])

lre_summary <- tibble(
  scenario = c(sprintf("Low education (edt = %.2f, P25)",  edt_lo),
               sprintf("High education (edt = %.2f, P75)", edt_hi)),
  pe = c(mean(lre_lo_draws),                mean(lre_hi_draws)),
  lo = c(quantile(lre_lo_draws, 0.025),     quantile(lre_hi_draws, 0.025)),
  hi = c(quantile(lre_lo_draws, 0.975),     quantile(lre_hi_draws, 0.975))
)

lre_summary |>
  kable(digits = 3,
        caption = "Long-run equilibrium effect of permanent democratization, FE-ARDL(1,1). 95% parametric simulation interval.") |>
  kable_styling(full_width = FALSE)
Long-run equilibrium effect of permanent democratization, FE-ARDL(1,1). 95% parametric simulation interval.
scenario pe lo hi
Low education (edt = 2.21, P25) -0.222 -0.556 0.083
High education (edt = 7.08, P75) -0.052 -0.370 0.269

Reading.

  • The IRF at \(h = 0\) is the impact effect \(A(e) = \beta_0 + \gamma_0 e\) — the contemporaneous response in the year of democratization, evaluated at education level \(e\).
  • The IRF at \(h = 1\) adds the delayed transmission \(B(e) = \beta_1 + \gamma_1 e\) through reg_lag and reg_lag × edt_lag, plus one round of \(\phi\)-driven persistence on the impact effect.
  • For \(h \geq 2\), the IRF accumulates further \(\phi\)-driven persistence and asymptotes to the LRE. The closer \(\hat\phi\) is to 1, the slower the convergence; with \(\hat\phi\) near 1 (Nickell-biased downward in this lab), the LRE point estimate may understate the true long-run.
  • The two curves separate by education level. If the high-edt curve sits above the low-edt curve, more-educated workforces benefit more from democratization in the long run — a substantively meaningful within-country moderation by human capital. If they overlap, the moderation visible in the static reg × edt coefficient is a between-country phenomenon that disappears once we model dynamics within country.
  • This is the panel analog of Lab 4’s IRF — same parametric-simulation logic, same closed-form propagation, with the moderation by education made explicit at the horizon scale that policymakers care about.

References