Disclaimer
Packages and Functions Used in This Lab
Today’s roadmap
Part 1 — A panel of countries: theory, data, dynamics
Part 2 — Estimating panel models
Part 3 — Counterfactual prediction with fixed effects
References

Disclaimer

These materials have been revised and updated by successive teaching assistants over recent years.

Revision history:

Ramses Llobet (Spring 2026); rllobet@uw.edu

Packages and Functions Used in This Lab

Package	Purpose	Key functions
`plm`	Panel models, panel unit-root tests, panel SEs	`plm()`, `pdata.frame()`, `pvar()`, `purtest()`, `vcovHC()`, `vcovBK()`, `vcovSCC()`, `phtest()`
`fixest`	Fast multi-way fixed effects, native cluster SEs	`feols()`, `fixef()`
`nlme`	Mixed models with structured error correlations	`lme()`, `corARMA()`
`lme4`	Mixed models (no AR error structure)	`lmer()`
`parameters`	Within/between decomposition for Mundlak	`demean()`
`sandwich`	Robust and clustered covariance matrices	`vcovCL()`, `vcovHC()`
`lmtest`	Coefficient tests on adjusted SEs	`coeftest()`
`tseries`	Single-series unit-root tests	`adf.test()`, `pp.test()`
`MASS`	Multivariate normal draws for parametric simulation	`mvrnorm()`
`simcf`	King-Tomz-Wittenberg counterfactual simulators	`cfMake()`, `cfChange()`, `ldvsimev()`, `ldvsimpv()`
`knitr`	Include the compiled TikZ DAG image	`include_graphics()`
`texreg`	HTML regression tables	`htmlreg()`
`tidyverse`	Data manipulation and plotting	`ggplot()`, `tibble()`, pipes (`\|>`)

Today’s roadmap

Part	Topic
1	Panel data: theory, the Przeworski democracy panel, and a dynamics workflow ending in a panel unit-root constellation
2	A family of panel specifications (Pooled / FE / TWFE / RE / Mundlak-REWB) and their within-vs-between identification
3	Counterfactual prediction under dynamic fixed effects: one-country forecast and average marginal expectations

Lab 4 introduced counterfactual forecasting for a single time series. Lab 5 moves the same dynamic logic into panel data. The pedagogical thread, given the short-$T$ ambiguity in §1.3.4, is interpretation before estimation: identify what variation each model uses, read whether the substantive conclusion is stable, and use fixed-effects counterfactuals only for quantities those models can actually support.

Part 1 — A panel of countries: theory, data, dynamics

1.1 A toy theory of politics in panel data

Before any estimator, write the theory. We study the link between democracy (dem, 1 = democracy) and economic growth (ln_gdpw) across countries. The simplified DAG treats oil exporter status (oil) as a time-invariant confounder and education (edt) as a slow-moving mediator on one pathway from democracy to growth.

Three covariate roles:

Within-varying treatment: dem — flips on coups, pacts, or constitutional transitions; supplies the within-unit identifying variation.
Mediator / slow-moving covariate: edt — democracy can expand education, and education can raise growth levels. Conditioning on edt therefore moves us from a total regime-growth association toward a more direct association.
Time-invariant confounder: oil — binary indicator for major oil exporters; oil can shape both regime type and GDP levels. It is absorbed by FE/TWFE and estimable in Pooled / RE / Mundlak.

Simplified DAG. Oil confounds democracy and growth; education mediates part of democracy’s association with growth.

What the panel structure buys us:

Pooled OLS mixes within and between variation. Biased if unobserved country-level confounders (geography, colonial institutions, deep history) are correlated with dem or edt. Pooling conflates within-country dynamics with between-country differences.
Fixed effects sweep out all time-invariant unit-level confounders, including unobserved ones. Time-invariant oil is absorbed. If we control for edt, we are conditioning on a mediator, so the dem coefficient should be read as closer to a direct association than a total association.
Random effects assume the country random intercept is uncorrelated with the regressors — suspect when colonial institutions, deep development history, or initial endowments plausibly correlate with both dem and edt.
Mundlak / REWB keeps RE’s flexibility (we can estimate the time-invariant oil) but tests RE’s assumption by splitting every time-varying regressor — including the interaction dem × edt — into within and between components. If the between components add nothing, RE is fine; if they do, the within components carry the FE-style identification.

Empirical foreshadow. We will fit $\ln \text{gdpw} = \alpha + \beta_1\,\text{dem} + \beta_2\,\text{edt} + \beta_3\,(\text{dem}\!\times\!\text{edt}) + \beta_4\,\text{oil} + \dots$ in pooled OLS and then compare the interaction $\beta_3$ to its FE counterpart in §2.7–§2.8. The DAG motivates edt as a mediator; the interaction adds a descriptive heterogeneity question: does the democracy-growth association differ at different education levels, and does that moderation survive once we strip out cross-country variation?

1.2 The Przeworski democracy panel

The data come from Przeworski, Alvarez, Cheibub, and Limongi’s Democracy and Development: Political Institutions and Well-Being in the World, 1950-1990 (Cambridge University Press, 2000). The book studies how political regimes relate to development, regime survival, and material well-being across countries over time. The local codebook used in earlier versions of this lab describes annual country-level measures of political regime type, log GDP, education, civil and political liberties, resource dependence, religion, colonial history, ethnic fractionalization, and regime-transition history.

The file democracy.csv is an unbalanced panel of $N = 135$ countries observed over $T = 40$ years (1951-1990), 4,126 country-year rows. The codebook below lists every variable, its role in the analysis, and its coverage. Coverage matters here: two of the most tempting “controls,” civlib and pollib, are missing in roughly 42% of rows, and edt is missing in about 30%. Because regression uses listwise deletion, the variables you put on the right-hand side silently determine your sample size.¹

Variable	Type / range	Coverage	Role	Definition
`country`	ID, 1-135	100%	unit id	Numeric country identifier.
`ctyname`	string	100%	unit id	Country name (134 distinct labels).
`region`	factor, 6 levels	100%	grouping	World region.
`year`	1951-1990	100%	time id	Calendar year; $T = 40$.
`gdpw`	480-37,903	100%	outcome	Real GDP per worker, 1985 international prices. Used as `ln_gdpw = log(gdpw)`.
`reg`	0/1	100%	treatment	Political regime. In this file `1` = democracy, `0` = dictatorship — recoded relative to the original codebook (where `1` = dictatorship). The lab relabels it to `dem` to avoid confusion.
`edt`	0.03-12.8	70.3%	within covariate / moderator	Cumulative years of education of the average labor-force member. The main time-varying continuous predictor.
`civlib`	1 (worst)-7 (best)	58.1%	bad control	Civil liberties. Near-definitional to democracy; high missingness.
`pollib`	1 (worst)-7 (best)	58.1%	bad control	Political liberties. Near-definitional to democracy; high missingness.
`elf60`	0-0.93	86.9%	between control	Ethnolinguistic fractionalization index, 1960 (time-invariant).
`cath`	0-99 (%)	100%	between control	Percent Catholic in the population.
`moslem`	0-99.9 (%)	100%	between control	Percent Muslim in the population.
`britcol`	0/1	100%	between control	Coded `1` for every year of countries that were British colonies any time after 1918.
`newc`	0/1	100%	between control	Coded `1` for countries that became independent after 1945.
`oil`	0/1	100%	between control	Coded `1` if fuel exports exceeded 50% of total exports on average in 1984-86.
`stra`	0-5	100%	regime history	Cumulative count of past transitions to authoritarianism. Downstream of regime dynamics — descriptive, not a clean control.

A practical reading of the table: gdpw is the outcome and reg/dem is the treatment. The only covariates that move within a country over time are dem, edt, civlib, pollib, and stra — and of those, civlib/pollib are bad controls and stra is a mechanical regime tally. So the honest within-country predictor set is thin: essentially dem and edt. Everything else (oil, britcol, newc, elf60, cath, moslem) is fixed within a country and contributes only between-country variation, which fixed effects discard but Pooled OLS, RE, and Mundlak can use. This tension — predictors that fit GDP levels well are exactly the ones FE throws away — is a recurring theme of the lab.

One source of confusion is the regime coding. The original codebook notes that REG was coded 1 for dictatorships and 0 for democracies. The democracy.csv file used in this lab has already been recoded in the opposite direction: reg = 1 is democracy and reg = 0 is dictatorship. To make the teaching code clearer, we create a new variable called dem and use the labels explicitly:

dem = 1: democracy.
dem = 0: dictatorship.

# load data
democracy <- read_csv("data/democracy.csv") |>
  # lower case
  rename_with(tolower) |>
  # transform variables
  mutate(dem       = reg,               # local file: 1 = democracy, 0 = dictatorship
         dictatorship = 1 - dem,
         regime_type = factor(dem,
                              levels = c(0, 1),
                              labels = c("Dictatorship", "Democracy")),
         ln_gdpw   = log(gdpw),
         # centre education so the dem main effect reads at mean schooling (~4.85 yrs)
         edt_c     = edt - mean(edt, na.rm = TRUE),
         # common global-shock period dummies (see the macro_shocks windows in 1.3.2).
         # These are time-varying, so they survive country FE but are absorbed by TWFE.
         oil_shock1 = as.integer(year %in% 1973:1975),  # first oil shock + recession
         oil_shock2 = as.integer(year %in% 1979:1982),  # second oil shock + Volcker
         region    = factor(region),     # 6 world regions; between-country control
         country_f = factor(country),     # numeric ID → factor for plm
         year_f    = factor(year))

n_distinct(democracy$country) # N = 135

## [1] 135

n_distinct(democracy$year)    # T = 40

## [1] 40

democracy |>
  count(regime_type)

## # A tibble: 2 × 2
##   regime_type      n
##   <fct>        <int>
## 1 Dictatorship  2481
## 2 Democracy     1645

The dataset has two unit identifiers: country (numeric ID) and ctyname (the actual country name). The panel is unbalanced: there are 40 unique years in the dataset, but not every country is observed in every year.

panel_T <- democracy |>
  group_by(country, ctyname) |>
  summarise(T_i = sum(!is.na(ln_gdpw)), .groups = "drop")

range(panel_T$T_i)

## [1]  6 40

ggplot(panel_T, aes(T_i)) +
  geom_histogram(binwidth = 2, fill = "steelblue", alpha = 0.7,
                 boundary = 0) +
  labs(x = "Usable observations per country",
       y = "# countries") +
  theme_minimal(base_size = 11)

Distribution of usable time periods per country. The panel contains 40 unique years, but many countries have shorter histories.

For the simplified specification, we will use four variables:

ln_gdpw is the outcome: log GDP.
dem is the treatment: 1 for democracy and 0 for dictatorship. This is the clearer version of the local reg variable.
edt is the slow-moving covariate and the moderator: cumulative average years of labor-force education. The interaction dem × edt lets the democracy association vary with the workforce’s education level.
oil is a time-invariant control: 1 if the country was a major oil exporter (per the Przeworski codebook), 0 otherwise.

The Przeworski et al. codebook gives many other candidate variables. For this lab, we separate them by their role in the causal story:

Reasonable background controls: oil, britcol, newc, elf60, cath, and moslem are mostly time-invariant country characteristics. They can proxy resource endowments, colonial history, state formation, social structure, and religious composition. FE/TWFE absorb them; pooled, RE, and Mundlak can estimate them.
Substantively important, but not a neutral control: edt is education. It may affect development, but democracy may also affect education. Including edt means we are no longer estimating the total democracy-growth association; we are estimating the association conditional on education. We keep it because it is useful for teaching slow-moving covariates and moderation.
Bad controls for this question: pollib and civlib are political and civil liberties. They are close to the definition and mechanisms of democracy, so controlling for them would partially control for what democracy is or what democracy does.
Regime-history variable: stra records past transitions to authoritarianism. It is useful descriptively, but it is also downstream of earlier regime dynamics. We do not include it in the main model.

So the main lab model is deliberately parsimonious: democracy, education, their interaction, dynamics, and oil where the estimator can use time-invariant covariates. This is not the only possible model; it is a teaching model that avoids obvious bad controls.

A quick sanity check that oil, britcol, elf60 really are unit-invariant in the data, and dem/edt really do move within units:

pdata <- pdata.frame(democracy, index = c("country_f", "year_f"))
pvar(pdata)

## no time variation:       country ctyname region britcol cath elf60 moslem newc oil country_f 
## no individual variation: year edt elf60 edt_c oil_shock1 oil_shock2 year_f 
## all NA in time dimension for at least one individual:  civlib edt elf60 pollib edt_c 
## all NA in ind. dimension for at least one time period: civlib edt elf60 pollib edt_c

The pvar() summary confirms what the DAG drew: the time-invariants vary only between, and dem/edt vary both ways.

1.3 A systematic dynamics workflow

We need to settle the dynamics before fitting any FE/RE/Mundlak model — otherwise we are just running OLS on a non-stationary outcome and inviting spurious results.

1.3.1 Look at the series

ggplot(democracy, aes(year, ln_gdpw, group = country, colour = region)) +
  geom_line(alpha = 0.5) +
  scale_colour_brewer(palette = "Dark2") +
  labs(x = NULL, y = "log GDP", colour = NULL) +
  theme_minimal(base_size = 11) +
  theme(legend.position = "bottom")

Per-country trajectories of log GDP, coloured by region. Most series trend upward with country-specific intercepts and slopes — non-stationary in levels.

Most trajectories trend up; some are flat or volatile. This is the classic mix you will see in macro-panels and a strong prior toward unit roots in levels.

1.3.2 Detrend each panel and export unit-level diagnostics

The level series are visibly trended, so raw ACF/PACF plots mostly rediscover the trend. A cleaner diagnostic is to remove a country-specific linear trend first, then inspect the residual serial dependence.

The loop below does two things. First, it saves the de-trended series in democracy_detrended. Second, it exports a small diagnostic archive:

output/dynamics/detrended_series/
output/dynamics/acf/
output/dynamics/pacf/

When we inspect the de-trended series, we should also remember that countries are not independent little worlds. During 1960-1990, several global shocks hit many countries at roughly the same time:

1973-1975: first oil shock and global recession.
1979-1982: second oil shock, inflation, and the Volcker monetary tightening.
1982-1984: Latin American debt crisis and broader developing-country debt stress.
1986: oil-price collapse.
1990: Gulf War oil shock.

These events do not need to affect every country in the same direction. The point is that common timing can induce cross-sectional dependence: residual spikes across many panels in the same years. We shade those windows in the de-trended plots as a visual check.

# create subdirectories (if these are not already there)
dir.create("output/dynamics/detrended_series",
           recursive = TRUE, showWarnings = FALSE)
dir.create("output/dynamics/acf",
           recursive = TRUE, showWarnings = FALSE)
dir.create("output/dynamics/pacf",
           recursive = TRUE, showWarnings = FALSE)

# generate vector of unique countries
countries <- sort(unique(democracy$ctyname))
max_lag <- 8  # number of lags for cycles

# Broad global shock windows visible at annual frequency.
macro_shocks <- tribble(
  ~shock, ~start, ~end,
  "First oil shock / global recession", 1973, 1975,
  "Second oil shock / Volcker tightening", 1979, 1982,
  "Debt crisis", 1982, 1984,
  "Oil price collapse", 1986, 1986,
  "Gulf War oil shock", 1990, 1990
) |>
  mutate(xmin = start - 0.5,
         xmax = end + 0.5)

# Before looping over every country, walk through one example. Mexico is useful
# here: it is a Latin American country affected by the 1982 debt crisis and it
# was also exposed to oil-price shocks.
df_mexico <- democracy |>
  filter(ctyname == "Mexico") |>
  arrange(year)

trend_mexico <- lm(ln_gdpw ~ year, data = df_mexico, na.action = na.exclude)
df_mexico$ln_gdpw_detrended <- residuals(trend_mexico)

ggplot(df_mexico, aes(year, ln_gdpw_detrended)) +
  # geom_rect() adds the shaded areas. The rectangle data come from
  # macro_shocks, not from df_mexico, so inherit.aes = FALSE.
  geom_rect(data = macro_shocks,
            aes(xmin = xmin, xmax = xmax, ymin = -Inf, ymax = Inf),
            inherit.aes = FALSE,
            fill = "grey70", alpha = 0.18) +
  geom_line(colour = "grey30") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(title = "Mexico",
       subtitle = "De-trended log GDP with global shock windows shaded",
       x = NULL,
       y = "De-trended log GDP") +
  theme_minimal(base_size = 11)

Illustrative de-trended log GDP series. Each line is the residual from a country-specific linear trend.

# generate objects to store
democracy_detrended <- tibble()
acf_all <- tibble()
pacf_all <- tibble()

for (c0 in countries) {
  
  # 1. filter country candidate
  df_c <- democracy |>
    filter(ctyname == c0) |>
    arrange(year)
  
  # 2. Detrend
  trend_fit <- lm(ln_gdpw ~ year, data = df_c, na.action = na.exclude)
  df_c$ln_gdpw_trend <- fitted(trend_fit)
  df_c$ln_gdpw_detrended <- residuals(trend_fit)

  democracy_detrended <- bind_rows(democracy_detrended, df_c)
  
  # remove NAs in the trend (if any)
  df_c <- df_c |>
    filter(!is.na(ln_gdpw_detrended))
  
  # clean the strings, to name the files
  file_stub <- c0 |>
    str_to_lower() |>
    str_replace_all("[^a-z0-9]+", "_") |>
    str_replace_all("^_|_$", "")
  
  # 3. plot the series
  p_series <- 
    ggplot(df_c, aes(year, ln_gdpw_detrended)) +
    geom_rect(data = macro_shocks,
              aes(xmin = xmin, xmax = xmax, ymin = -Inf, ymax = Inf),
              inherit.aes = FALSE,
              fill = "grey70", alpha = 0.18) +
    geom_line(colour = "grey30") +
    geom_hline(yintercept = 0, linetype = "dashed") +
    labs(title = c0,
         x = NULL,
         y = "De-trended log GDP") +
    theme_minimal(base_size = 11)

  # export for inspection
  ggsave(file.path("output/dynamics/detrended_series",
                   paste0(file_stub, ".pdf")),
         plot = p_series, width = 7, height = 4)

  # 4. plot ACF
  p_acf <- ggAcf(df_c$ln_gdpw_detrended, lag.max = max_lag) +
    labs(title = paste0(c0, ": ACF after de-trending"),
         x = "Lag", y = "ACF") +
    theme_minimal(base_size = 11)

  # export for inspection
  ggsave(file.path("output/dynamics/acf",
                   paste0(file_stub, ".pdf")),
         plot = p_acf, width = 7, height = 4)
  
  # 5. plot PACF
  p_pacf <- ggPacf(df_c$ln_gdpw_detrended, lag.max = max_lag) +
    labs(title = paste0(c0, ": PACF after de-trending"),
         x = "Lag", y = "PACF") +
    theme_minimal(base_size = 11)

  # export for inspection
  ggsave(file.path("output/dynamics/pacf",
                   paste0(file_stub, ".pdf")),
         plot = p_pacf, width = 7, height = 4)

  # store all acf and pacf values
  acf_values <- acf(df_c$ln_gdpw_detrended,
                    lag.max = max_lag, plot = FALSE)$acf[-1]
  pacf_values <- pacf(df_c$ln_gdpw_detrended,
                      lag.max = max_lag, plot = FALSE)$acf

  acf_all <- bind_rows(
    acf_all,
    tibble(ctyname = c0,
           lag = seq_along(acf_values),
           value = as.numeric(acf_values))
  )

  pacf_all <- bind_rows(
    pacf_all,
    tibble(ctyname = c0,
           lag = seq_along(pacf_values),
           value = as.numeric(pacf_values))
  )
}

# You can look at particular subsets
democracy_detrended |>
  filter(ctyname %in% c("USA", "Argentina", "Mexico", "Egypt")) |>
  ggplot(aes(year, ln_gdpw_detrended)) +
  geom_rect(data = macro_shocks,
            aes(xmin = xmin, xmax = xmax, ymin = -Inf, ymax = Inf),
            inherit.aes = FALSE,
            fill = "grey70", alpha = 0.18) +
  geom_line(colour = "grey30") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  facet_wrap(~ ctyname, scales = "free_y") +
  labs(x = NULL, y = "De-trended log GDP") +
  theme_minimal(base_size = 11)

Illustrative de-trended log GDP series. Each line is the residual from a country-specific linear trend.

The exported files are not meant to be read one by one during lab. They are a reproducible diagnostic archive. In applied work, the pooled summary is useful, but individual units still need inspection because a few unusual panels can drive the average.

1.3.3 Pooled ACF/PACF and per-series unit-root tests

We can summarize the ACF/PACF diagnostics across countries by averaging the country-level correlations at each lag. This is descriptive rather than a formal estimator, but it gives a compact view of the typical short-run dynamics after de-trending.

To help read the figure, we add the usual approximate white-noise bounds:

\[ \pm \frac{1.96}{\sqrt{T}}. \]

The logic is that, if a series is white noise, the sample autocorrelation at any fixed lag is approximately normally distributed around zero with standard error $1/\sqrt{T}$. Multiplying by 1.96 gives the familiar approximate 95% threshold. Here we use the average usable country-level time-series length after de-trending, so the bounds are a visual rule of thumb rather than a formal pooled significance test.

acf_summary <- acf_all |>
  group_by(lag) |>
  summarise(mean = mean(value, na.rm = TRUE),
            lo = quantile(value, 0.25, na.rm = TRUE),
            hi = quantile(value, 0.75, na.rm = TRUE),
            .groups = "drop")

pacf_summary <- pacf_all |>
  group_by(lag) |>
  summarise(mean = mean(value, na.rm = TRUE),
            lo = quantile(value, 0.25, na.rm = TRUE),
            hi = quantile(value, 0.75, na.rm = TRUE),
            .groups = "drop")

avg_T <- democracy_detrended |>
  filter(!is.na(ln_gdpw_detrended)) |>
  count(ctyname) |>
  summarise(avg_T = mean(n)) |>
  pull(avg_T)

wn_bound <- 1.96 / sqrt(avg_T)

p_acf_pool <- ggplot(acf_summary, aes(lag, mean)) +
  geom_hline(yintercept = 0, colour = "grey70") +
  geom_hline(yintercept = c(-wn_bound, wn_bound),
             linetype = "dashed", colour = "grey45") +
  geom_linerange(aes(ymin = lo, ymax = hi), colour = "grey55") +
  geom_point(colour = "steelblue", size = 2) +
  geom_line(colour = "steelblue") +
  scale_x_continuous(breaks = 1:max_lag) +
  labs(title = "Average ACF",
       subtitle = "Dashed lines: approximate white-noise bounds",
       x = "Lag", y = "Correlation") +
  theme_minimal(base_size = 11)

p_pacf_pool <- ggplot(pacf_summary, aes(lag, mean)) +
  geom_hline(yintercept = 0, colour = "grey70") +
  geom_hline(yintercept = c(-wn_bound, wn_bound),
             linetype = "dashed", colour = "grey45") +
  geom_linerange(aes(ymin = lo, ymax = hi), colour = "grey55") +
  geom_point(colour = "firebrick", size = 2) +
  geom_line(colour = "firebrick") +
  scale_x_continuous(breaks = 1:max_lag) +
  labs(title = "Average PACF",
       subtitle = "Dashed lines: approximate white-noise bounds",
       x = "Lag", y = "Partial correlation") +
  theme_minimal(base_size = 11)

p_acf_pool + p_pacf_pool

Average ACF and PACF after removing country-specific linear trends. Points show the mean across countries; vertical bars show the interquartile range.

Finally, we loop over countries and run ADF / PP tests on three versions of each country series. In the tseries implementations we use here, both tests have the same null:

Test	Null hypothesis	Default alternative	Reading a large p-value
ADF (`adf.test`)	unit root / I(1)	stationary / I(0)	fail to reject unit root
PP (`pp.test`)	unit root / I(1)	stationary / I(0)	fail to reject unit root

The difference is not the null hypothesis. The difference is how the tests handle serial correlation and heteroskedasticity in the errors: ADF adds lagged differences, while PP uses a nonparametric correction to the Dickey-Fuller statistic. Both are power-sensitive. With short country series, a highly persistent but stationary process can easily look too close to a unit root, so failure to reject should be read as no strong evidence against a unit root, not proof that the unit root is true.

The setup below keeps three modeling stories separate:

Transformation	Object tested	Story
Detrended residuals	$y_{it} - \widehat\alpha_i - \widehat\delta_i t$	deterministic trend removed, persistence still visible
Trend + AR(1) innovations	residuals from `Arima(y, order = c(1,0,0), xreg = time_id)`	deterministic trend plus stationary AR(1) errors
First differences	$\Delta y_{it}$	difference-stationary / I(1) with average growth

# Before looping over all countries, walk through one country so the objects
# are visible. We again use Mexico.
df_y_mexico <- democracy |>
  filter(ctyname == "Mexico") |>
  arrange(year) |>
  select(country, ctyname, year, ln_gdpw) |>
  drop_na(ln_gdpw)

# Treatment 1: de-trended residuals.
df_mexico_detrended <- democracy_detrended |>
  filter(ctyname == "Mexico") |>
  arrange(year) |>
  select(country, ctyname, year, value = ln_gdpw_detrended) |>
  drop_na(value)

adf_mexico_detrended <- adf.test(df_mexico_detrended$value)
pp_mexico_detrended  <- pp.test(df_mexico_detrended$value)

# Treatment 2: residuals from a linear trend plus AR(1) errors.
time_id_mexico <- seq_len(nrow(df_y_mexico))
fit_mexico_ar1_trend <- Arima(df_y_mexico$ln_gdpw,
                              order = c(1, 0, 0),
                              xreg = time_id_mexico)

df_mexico_ar1 <- df_y_mexico |>
  mutate(value = as.numeric(residuals(fit_mexico_ar1_trend))) |>
  select(country, ctyname, year, value) |>
  drop_na(value)

adf_mexico_ar1 <- adf.test(df_mexico_ar1$value)
pp_mexico_ar1  <- pp.test(df_mexico_ar1$value)

# Treatment 3: first differences of the original log GDP series.
df_mexico_diff <- df_y_mexico |>
  mutate(value = ln_gdpw - lag(ln_gdpw)) |>
  select(country, ctyname, year, value) |>
  drop_na(value)

adf_mexico_diff <- adf.test(df_mexico_diff$value)
pp_mexico_diff  <- pp.test(df_mexico_diff$value)

# Put the Mexico p-values into a small table before repeating the steps for
# every country.
tibble(
  treatment = c("Detrended residuals",
                "Trend + AR(1) innovations",
                "First differences"),
  adf_p = c(adf_mexico_detrended$p.value,
            adf_mexico_ar1$p.value,
            adf_mexico_diff$p.value),
  pp_p = c(pp_mexico_detrended$p.value,
           pp_mexico_ar1$p.value,
           pp_mexico_diff$p.value)
) |>
  kable(digits = 3,
        caption = "Mexico example: ADF and PP p-values after each transformation.") |>
  kable_styling(full_width = FALSE)

Mexico example: ADF and PP p-values after each transformation.
treatment	adf_p	pp_p
Detrended residuals	0.952	0.978
Trend + AR(1) innovations	0.205	0.020
First differences	0.052	0.015

# Store the transformed country-level series. We reuse this object below for
# the panel unit-root comparison.
unit_root_series <- tibble()

# Store country-level ADF / PP p-values.
unit_root_tests <- tibble()

for (c0 in countries) {
  # 1. Pull the original country time series.
  df_y <- democracy |>
    filter(ctyname == c0) |>
    arrange(year) |>
    select(country, ctyname, year, ln_gdpw) |>
    drop_na(ln_gdpw)

  if (nrow(df_y) < 8 || sd(df_y$ln_gdpw) == 0) next

  # 2. Treatment 1: country-specific detrended residuals.
  df_detrended <- democracy_detrended |>
    filter(ctyname == c0) |>
    arrange(year) |>
    select(country, ctyname, year, value = ln_gdpw_detrended) |>
    drop_na(value)

  if (nrow(df_detrended) >= 8 && sd(df_detrended$value) > 0) {
    adf_obj <- try(adf.test(df_detrended$value), silent = TRUE)
    pp_obj  <- try(pp.test(df_detrended$value), silent = TRUE)

    unit_root_tests <- bind_rows(
      unit_root_tests,
      tibble(ctyname = c0,
             treatment = "Detrended residuals",
             adf = if (inherits(adf_obj, "try-error")) NA_real_ else adf_obj$p.value,
             pp  = if (inherits(pp_obj, "try-error")) NA_real_ else pp_obj$p.value)
    )

    unit_root_series <- bind_rows(
      unit_root_series,
      df_detrended |> mutate(treatment = "Detrended residuals")
    )
  }

  # 3. Treatment 2: innovations after fitting trend + AR(1) errors.
  # This estimates y_it = alpha_i + delta_i t + u_it, u_it = phi_i u_i,t-1 + e_it.
  time_id <- seq_len(nrow(df_y))
  fit_ar1_trend <- try(
    Arima(df_y$ln_gdpw, order = c(1, 0, 0), xreg = time_id),
    silent = TRUE
  )

  if (!inherits(fit_ar1_trend, "try-error")) {
    df_ar1 <- df_y |>
      mutate(value = as.numeric(residuals(fit_ar1_trend))) |>
      select(country, ctyname, year, value) |>
      drop_na(value)

    if (nrow(df_ar1) >= 8 && sd(df_ar1$value) > 0) {
      adf_obj <- try(adf.test(df_ar1$value), silent = TRUE)
      pp_obj  <- try(pp.test(df_ar1$value), silent = TRUE)

      unit_root_tests <- bind_rows(
        unit_root_tests,
        tibble(ctyname = c0,
               treatment = "Trend + AR(1) innovations",
               adf = if (inherits(adf_obj, "try-error")) NA_real_ else adf_obj$p.value,
               pp  = if (inherits(pp_obj, "try-error")) NA_real_ else pp_obj$p.value)
      )

      unit_root_series <- bind_rows(
        unit_root_series,
        df_ar1 |> mutate(treatment = "Trend + AR(1) innovations")
      )
    }
  }

  # 4. Treatment 3: first differences of the original, non-detrended outcome.
  df_diff <- df_y |>
    mutate(value = ln_gdpw - lag(ln_gdpw)) |>
    select(country, ctyname, year, value) |>
    drop_na(value)

  if (nrow(df_diff) >= 8 && sd(df_diff$value) > 0) {
    adf_obj <- try(adf.test(df_diff$value), silent = TRUE)
    pp_obj  <- try(pp.test(df_diff$value), silent = TRUE)

    unit_root_tests <- bind_rows(
      unit_root_tests,
      tibble(ctyname = c0,
             treatment = "First differences",
             adf = if (inherits(adf_obj, "try-error")) NA_real_ else adf_obj$p.value,
             pp  = if (inherits(pp_obj, "try-error")) NA_real_ else pp_obj$p.value)
    )

    unit_root_series <- bind_rows(
      unit_root_series,
      df_diff |> mutate(treatment = "First differences")
    )
  }
}

unit_root_tests |>
  pivot_longer(c(adf, pp),
               names_to = "test",
               values_to = "p") |>
  mutate(treatment = factor(treatment,
                            levels = c("Detrended residuals",
                                       "Trend + AR(1) innovations",
                                       "First differences")),
         test = toupper(test)) |>
  ggplot(aes(p, fill = test)) +
  geom_histogram(bins = 25,
                 alpha = 0.65,
                 position = "identity") +
  geom_vline(xintercept = 0.05, linetype = "dashed") +
  facet_grid(treatment ~ test) +
  labs(title = "Country-level unit-root diagnostics",
       subtitle = "ADF and PP null: unit root. Dashed line marks p = 0.05.",
       x = "p-value", y = "# countries") +
  theme_minimal(base_size = 10) +
  theme(legend.position = "none")

ADF and PP p-values under three country-level transformations: detrending, trend + AR(1), and first differencing.

The main thing to compare is how the distribution of p-values changes across the three rows. If detrending alone leaves many countries with large p-values, then removing a linear trend is probably not enough. If the trend + AR(1) innovations move much more clearly below 0.05, then an AR(1) error process is doing useful work. If the first-differenced series looks similarly clean, then the I(1) / growth-rate specification is also plausible. These histograms are not final proof, but they tell us which transformations make the persistence less severe.

You may also notice that ADF and PP do not look exactly the same, even though both tests have a unit-root null. The difference is how they handle serial correlation. ADF handles it by adding lagged differences of the series being tested:

\[ \Delta z_t = \alpha + \rho z_{t-1} + \sum_{\ell = 1}^{p}\gamma_\ell \Delta z_{t-\ell} + e_t. \]

Those extra lagged differences cost degrees of freedom, which matters here because each country has a short time series. PP tests the same null, but it corrects the statistic using a nonparametric long-run-variance adjustment instead of estimating those extra lag terms directly. This is why the PP histograms often look cleaner after the trend + AR(1) transformation and after first differencing.

This matters most for the trend + AR(1) innovations. At that point, we have already removed a country-specific linear trend and one layer of AR(1) persistence. Running ADF afterward is a conservative check: it asks whether the remaining innovations reject a unit root even after ADF uses some of the sample to correct for serial correlation. PP is less costly in this short-$T$ setting, so its p-values often concentrate more sharply below 0.05.

So I would not read the figure as “ADF says no, PP says yes.” I would read it as follows: detrending alone is the weakest transformation, while both trend + AR(1) filtering and first differencing produce much cleaner diagnostics. The remaining choice is substantive: do we want to keep the outcome in levels and model persistence explicitly, or do we want to difference the outcome and model growth rates?

1.3.4 Panel unit-root tests: compare the same three treatments

The country-level histograms in §1.3.3 are useful, but they still treat each country as a separate short time series. The panel tests below pool information across countries. To keep the comparison consistent, we run the same tests on the same three objects:

Detrended residuals — asks whether removing a country-specific deterministic trend is enough.
Trend + AR(1) innovations — asks whether a deterministic trend plus stationary AR(1) errors is enough.
First differences — asks whether the I(1) / growth-rate specification is the cleanest.

For the first two objects, the trend has already been removed or modeled. For the third object, differencing has removed the stochastic trend. Therefore the panel tests below use exo = "intercept" rather than exo = "trend": the transformed series should not still need a deterministic linear trend.

Why more than one test? The panel is unbalanced, macro shocks (oil crises, financial cycles, the Volcker disinflation) hit many countries in the same year, and individual unit-root tests have low power at $T_i \le 30$. No single test is decisive, so we run several tests and read them together. Two recurring ideas to keep in mind:

Cross-sectional dependence (CSD). If countries’ residuals move together because of common shocks, the standard panel tests are biased. We need a pretest for CSD and at least one test that is robust to it.
Direction of the null. Most tests have a unit-root null (rejection $\Rightarrow$ stationary). Hadri flips it (rejection $\Rightarrow$ unit root). Comparing both directions is a useful sanity check.

The four tests, in plain language:

Step	Test	Null hypothesis	Plain-language description
(a)	Pesaran CD pretest	Cross-sectional independence	Looks at residuals across countries to see whether they move together. Rejection means common shocks contaminate the residuals — the tests in (b) become biased and we lean more on (d).
(b)	Maddala-Wu/Fisher & IPS	Unit root in every panel	Run an ADF on each country, then combine. Fisher pools unit-level $p$-values via $-2\sum\log p_i$; IPS averages unit-level $t$-statistics. Both assume countries are independent — work well when (a) does not reject.
(c)	Hadri	Stationarity in every panel	Flips the null. KPSS-style: tests whether all panels are stationary against the alternative of any unit root. Needs a balanced rectangle. Hlouskova & Wagner (2006) document that Hadri over-rejects under CSD — its rejection deserves less weight when (a) flags dependence.
(d)	Pesaran CIPS	Unit root, allowing common shocks	A unit-root test built to be robust to the CSD detected in (a). Augments each country’s ADF with cross-sectional means of $y$ and $\Delta y$ — those means absorb the common factor before the country-level test runs.

We keep countries with at least 20 usable observations for tests (a), (b), (d). For Hadri (c) we need a balanced rectangle — every retained country observed in every year of a fixed window — so we take the last 20 years available in each transformed object and keep countries with full coverage there.

# unit_root_series was created in the previous chunk. It has one row per
# country-year-treatment, where value is the transformed log GDP series.

# ---- 1. Keep the three transformed outcomes in separate objects -------------
# The country-level tests need enough observations inside each country.
# I keep countries with at least 20 usable observations and non-zero variation.

db_detrended <- unit_root_series |>
  filter(treatment == "Detrended residuals", !is.na(value)) |>
  group_by(country) |>
  filter(n() >= 20, sd(value) > 0) |>
  ungroup()

db_ar1 <- unit_root_series |>
  filter(treatment == "Trend + AR(1) innovations", !is.na(value)) |>
  group_by(country) |>
  filter(n() >= 20, sd(value) > 0) |>
  ungroup()

db_diff <- unit_root_series |>
  filter(treatment == "First differences", !is.na(value)) |>
  group_by(country) |>
  filter(n() >= 20, sd(value) > 0) |>
  ungroup()

# pdata.frame() tells plm which variable is the country id and which variable
# is time. This does not estimate a model. It just stores the panel structure.

pdata_detrended <- pdata.frame(db_detrended, index = c("country", "year"))
pdata_ar1       <- pdata.frame(db_ar1,       index = c("country", "year"))
pdata_diff      <- pdata.frame(db_diff,      index = c("country", "year"))

# ---- 2. Build balanced rectangles for the Hadri test ------------------------
# Hadri needs a balanced panel: the same countries observed in the same years.
# To make the three columns comparable, we use the same last 20-year window.

T_target <- 20
window_end <- max(unit_root_series$year, na.rm = TRUE)
window_start <- window_end - T_target + 1

db_detrended_balanced <- db_detrended |>
  filter(year >= window_start, year <= window_end) |>
  group_by(country) |>
  filter(n() == T_target, sd(value) > 0) |>
  ungroup()

db_ar1_balanced <- db_ar1 |>
  filter(year >= window_start, year <= window_end) |>
  group_by(country) |>
  filter(n() == T_target, sd(value) > 0) |>
  ungroup()

db_diff_balanced <- db_diff |>
  filter(year >= window_start, year <= window_end) |>
  group_by(country) |>
  filter(n() == T_target, sd(value) > 0) |>
  ungroup()

panel_eligibility <- tibble(
  treatment = c("Detrended residuals",
                "Trend + AR(1) innovations",
                "First differences"),
  unbalanced_countries = c(n_distinct(db_detrended$country),
                           n_distinct(db_ar1$country),
                           n_distinct(db_diff$country)),
  unbalanced_country_years = c(nrow(db_detrended),
                               nrow(db_ar1),
                               nrow(db_diff)),
  balanced_countries = c(n_distinct(db_detrended_balanced$country),
                         n_distinct(db_ar1_balanced$country),
                         n_distinct(db_diff_balanced$country)),
  balanced_years = T_target,
  balanced_window = paste0(window_start, "-", window_end)
)

panel_eligibility |>
  kable(caption = "Panel eligibility by transformed outcome.") |>
  kable_styling(full_width = FALSE)

Panel eligibility by transformed outcome.
treatment	unbalanced_countries	unbalanced_country_years	balanced_countries	balanced_years	balanced_window
Detrended residuals	116	3897	99	20	1971-1990
Trend + AR(1) innovations	116	3897	99	20	1971-1990
First differences	111	3687	95	20	1971-1990

# ---- 3. Pesaran CD pretest ---------------------------------------------------
# H0: cross-sectional independence. If we reject, countries still move together
# after the transformation, probably because of common macro shocks.

cd_detrended <- pcdtest(value ~ 1,
                        data = pdata_detrended,
                        model = "pooling",
                        test = "cd")

cd_ar1 <- pcdtest(value ~ 1,
                  data = pdata_ar1,
                  model = "pooling",
                  test = "cd")

cd_diff <- pcdtest(value ~ 1,
                   data = pdata_diff,
                   model = "pooling",
                   test = "cd")

# ---- 4. Maddala-Wu/Fisher test ---------------------------------------------
# H0: every country has a unit root. Rejection favors stationarity.
# purtest() runs country-level unit-root tests and combines the evidence.

mw_detrended <- purtest(value ~ 1,
                        data = db_detrended,
                        index = c("country", "year"),
                        test = "madwu",
                        lags = "SIC", pmax = 4,
                        exo = "intercept")

mw_ar1 <- purtest(value ~ 1,
                  data = db_ar1,
                  index = c("country", "year"),
                  test = "madwu",
                  lags = "SIC", pmax = 4,
                  exo = "intercept")

mw_diff <- purtest(value ~ 1,
                   data = db_diff,
                   index = c("country", "year"),
                   test = "madwu",
                   lags = "SIC", pmax = 4,
                   exo = "intercept")

# ---- 5. IPS test -------------------------------------------------------------
# H0: every country has a unit root. Rejection favors stationarity.
# IPS averages country-level ADF statistics instead of combining p-values.

ips_detrended <- purtest(value ~ 1,
                         data = db_detrended,
                         index = c("country", "year"),
                         test = "ips",
                         lags = "SIC", pmax = 4,
                         exo = "intercept")

ips_ar1 <- purtest(value ~ 1,
                   data = db_ar1,
                   index = c("country", "year"),
                   test = "ips",
                   lags = "SIC", pmax = 4,
                   exo = "intercept")

ips_diff <- purtest(value ~ 1,
                    data = db_diff,
                    index = c("country", "year"),
                    test = "ips",
                    lags = "SIC", pmax = 4,
                    exo = "intercept")

# ---- 6. Hadri test -----------------------------------------------------------
# H0: every country is stationary. Rejection favors a unit root.
# This test requires the balanced panels we created above.

hadri_detrended <- purtest(value ~ 1,
                           data = db_detrended_balanced,
                           index = c("country", "year"),
                           test = "hadri",
                           exo = "intercept")

hadri_ar1 <- purtest(value ~ 1,
                     data = db_ar1_balanced,
                     index = c("country", "year"),
                     test = "hadri",
                     exo = "intercept")

hadri_diff <- purtest(value ~ 1,
                      data = db_diff_balanced,
                      index = c("country", "year"),
                      test = "hadri",
                      exo = "intercept")

# ---- 7. Pesaran CIPS test ----------------------------------------------------
# H0: unit root. Rejection favors stationarity.
# CIPS is useful here because it allows a common factor across countries.

cips_detrended <- cipstest(pdata_detrended$value, lags = 2, type = "drift")
cips_ar1       <- cipstest(pdata_ar1$value,       lags = 2, type = "drift")
cips_diff      <- cipstest(pdata_diff$value,      lags = 2, type = "drift")

# ---- 8. Put the p-values into one comparison table --------------------------

panel_tests_wide <- tibble(
  step = c("(a)", "(b)", "(b)", "(c)", "(d)"),
  test = c("Pesaran CD", "Maddala-Wu/Fisher", "IPS", "Hadri", "Pesaran CIPS"),
  null = c("Cross-sectional independence",
           "Unit root in all panels",
           "Unit root in all panels",
           "Stationarity in all panels",
           "Unit root with common-factor correction"),
  direction = c("CSD pretest",
                "Reject => favors I(0)",
                "Reject => favors I(0)",
                "Reject => favors I(1)",
                "Reject => favors I(0)"),
  `Detrended residuals` = c(cd_detrended$p.value,
                            mw_detrended$statistic$p.value,
                            ips_detrended$statistic$p.value,
                            hadri_detrended$statistic$p.value,
                            cips_detrended$p.value),
  `Trend + AR(1) innovations` = c(cd_ar1$p.value,
                                  mw_ar1$statistic$p.value,
                                  ips_ar1$statistic$p.value,
                                  hadri_ar1$statistic$p.value,
                                  cips_ar1$p.value),
  `First differences` = c(cd_diff$p.value,
                          mw_diff$statistic$p.value,
                          ips_diff$statistic$p.value,
                          hadri_diff$statistic$p.value,
                          cips_diff$p.value)
) |>
  mutate(across(c(`Detrended residuals`,
                  `Trend + AR(1) innovations`,
                  `First differences`),
                ~ scales::pvalue(.x, accuracy = 0.001)))

panel_tests_wide |>
  kable(caption = "Panel unit-root diagnostics by transformation.") |>
  kable_styling(full_width = FALSE)

Panel unit-root diagnostics by transformation.
test	null	direction	Detrended residuals	Trend + AR(1) innovations	First differences
Pesaran CD	Cross-sectional independence	CSD pretest	<0.001	<0.001	<0.001
Maddala-Wu/Fisher	Unit root in all panels	Reject => favors I(0)	<0.001	<0.001	<0.001
IPS	Unit root in all panels	Reject => favors I(0)	<0.001	<0.001	<0.001
Hadri	Stationarity in all panels	Reject => favors I(1)	<0.001	<0.001	<0.001
Pesaran CIPS	Unit root with common-factor correction	Reject => favors I(0)	0.010	0.010	0.010

Reading the panel table. Step (a) tells us whether cross-sectional dependence is still present after each transformation. If CD rejects, countries are still moving together in ways that look like common shocks. In that case, the first-generation tests in (b) are less reliable and CIPS in (d) deserves more weight.

Within step (b), Maddala-Wu/Fisher and IPS combine country-level evidence differently. Fisher is sensitive to a subset of very small country-level p-values; IPS averages unit-level test statistics. A split between them means some countries look stationary, not necessarily the whole panel.

Hadri (c) flips the null toward stationarity, so its rejection points toward nonstationarity. Under cross-sectional dependence it often over-rejects, so read it as a warning rather than a final verdict.

The comparison across columns is useful, but the first row matters. In this application, Pesaran CD rejects for all three transformed objects. That means the panel still contains shared movement across countries, even after detrending, AR(1) filtering, or first differencing. This is plausible for macro data: oil shocks, debt crises, and global interest-rate changes can move many economies in the same years.

Given that cross-sectional dependence remains, we should not rely only on Maddala-Wu/Fisher or IPS. Both are first-generation panel unit-root tests, so their rejections may partly reflect the common shocks. Hadri also rejects stationarity in every column, but Hadri is known to over-reject when cross-sectional dependence is present. For this reason, the most useful row here is CIPS, because it is designed to handle a common factor across countries. Its p-values are reported at the lower bound of the printed approximation, so read them as strong rejection rather than as exact probabilities.

The practical conclusion is not “one test picked one model.” The table says that raw trended levels are not appropriate, and that both levels with explicit persistence and first differences are defensible ways to proceed. The choice now depends on interpretation: do we want to keep level effects and model persistence directly, or do we want to change the outcome to growth rates?

Two routes remain:

Path A: levels with explicit persistence, such as ARDL(1) / lagged-DV models or AR(1)-error models. This keeps level interpretations.
Path B: first differences, which gives cleaner stationarity diagnostics but changes the outcome to log GDP growth.

The rest of the lab takes Path A for interpretation, while keeping Path B in view as the alternative specification.

1.3.5 Do growth rates still have AR dynamics?

First differencing changes the outcome from a level to a growth rate:

\[ \Delta y_{it} = y_{it} - y_{i,t-1}. \]

This often removes the strongest nonstationary trend. But differencing does not guarantee that the transformed series is independent over time. A country’s growth this year may still be related to its growth last year. Before moving into panel models, we therefore ask a simple diagnostic question:

After differencing log GDP, is there still short-run AR(1) persistence within countries?

For each country, we fit:

\[ \Delta y_{it} = \alpha_i + \phi_i \Delta y_{i,t-1} + e_{it}. \]

This figure is useful because it prepares two modeling choices in Part 2. If the $\hat\phi_i$’s are meaningfully positive, then a growth-rate model may still need an explicit lagged dependent variable. Alternatively, we might leave the conditional mean static and handle remaining serial correlation through the variance-covariance matrix, for example with clustered or panel-corrected standard errors. The point is that first differencing solves the trend problem, but it does not automatically solve every dynamic problem.

# One-country example first: estimate AR(1) persistence in Mexico's growth
# series, where growth is the first difference of log GDP.
df_mexico_growth <- democracy |>
  filter(ctyname == "Mexico") |>
  arrange(year) |>
  mutate(d_ln_gdpw = ln_gdpw - lag(ln_gdpw)) |>
  drop_na(d_ln_gdpw)

fit_mexico_growth_ar1 <- arima(df_mexico_growth$d_ln_gdpw,
                               order = c(1, 0, 0))

coef(fit_mexico_growth_ar1)["ar1"]

##       ar1 
## 0.3550756

# Now repeat the same AR(1) fit country by country.
phi_diff <- tibble()

for (c0 in countries) {
  df_c <- democracy |>
    filter(ctyname == c0) |>
    arrange(year) |>
    mutate(d_ln_gdpw = ln_gdpw - lag(ln_gdpw)) |>
    drop_na(d_ln_gdpw)

  if (nrow(df_c) < 10) next

  fit_c <- arima(df_c$d_ln_gdpw, order = c(1, 0, 0))

  phi_diff <- bind_rows(
    phi_diff,
    tibble(ctyname = c0,
           phi = as.numeric(coef(fit_c)["ar1"]))
  )
}

ggplot(phi_diff, aes(phi)) +
  geom_histogram(bins = 25, fill = "steelblue", alpha = 0.7) +
  geom_vline(xintercept = mean(phi_diff$phi, na.rm = TRUE),
             colour = "firebrick", linetype = "dashed") +
  labs(x = expression(hat(phi)[i]), y = "# countries",
       subtitle = "Mean across countries marked in red") +
  theme_minimal(base_size = 11)

Estimated AR(1) coefficients on first-differenced log GDP, fit one country at a time. A descriptive summary of short-run persistence after differencing — read as the AR(1) part of an ARIMA(1,1,0) only if we accept the I(1) verdict from §1.3.4.

1.3.6 The verdict — and the modeling menu

The diagnostics do not justify raw trended levels. They do justify two cleaner ways to proceed:

Path	Outcome	What the diagnostics support	What the coefficients mean
A	$\ln y_{it}$	Keep the level outcome, but model trend/persistence explicitly.	Effects on the level of log GDP.
B	$\Delta \ln y_{it}$	Difference the outcome first, then check whether growth still has short-run dynamics.	Effects on log GDP growth.

Within either path, we still have to decide where to put the remaining AR(1) structure:

Dynamic choice	Example	Interpretation
Lagged dependent variable (LDV)	$y_{it} = \alpha_i + \phi y_{i,t-1} + X_{it}\beta + \varepsilon_{it}$	Persistence is part of the conditional mean. Coefficients are short-run effects; long-run effects require dividing by $1-\phi$.
AR(1) errors	$y_{it} = \alpha_i + X_{it}\beta + u_{it}$, with $u_{it} = \rho u_{i,t-1} + \varepsilon_{it}$	Persistence is in the residual process. The mean model stays static; the error model handles serial correlation.
No explicit AR term, corrected SEs	$y_{it} = \alpha_i + X_{it}\beta + \varepsilon_{it}$, with clustered / PCSE / Driscoll-Kraay SEs	Point estimates come from a simpler mean model; the variance-covariance correction protects inference from serial and cross-sectional dependence.

These are different assumptions, not just different software options. LDV models are useful for forecasting and impulse responses because the lagged outcome tells us how shocks propagate. AR-error models are useful when we believe the conditional mean is static but the residuals are persistent. Robust variance-covariance corrections are useful when the mean model is the focus and we mainly want standard errors that do not pretend the residuals are iid.

Part 2 uses this menu deliberately. We will compare level and growth-rate specifications, and we will distinguish models that put AR(1) dynamics in the outcome from models that put AR(1) dynamics in the errors. We revisit the Nickell-bias correction next week with Arellano-Bond and dynamic panel GMM.

Part 2 — Estimating panel models

§1.3.4 closed in honest ambiguity: at $T \le 30$ with cross-sectional dependence, the data cannot tell us with confidence whether ln_gdpw is I(1) or near-unit-root stationary. The pedagogical response is not to pick one estimator and defend it, but to estimate a family of specifications that identify the regime effect under different assumptions, and read whether the conclusion is robust across them.

2.0 Specifications and what they identify

Each estimator below is defined by which variance it uses. Following Kropko & Kubinec (2020) and Bell & Jones (2015), every time-varying regressor decomposes into a within and a between component:

\[ x_{it} \;=\; \underbrace{(x_{it} - \bar x_i)}_{x^{W}_{it}\ (\text{within})} \;+\; \underbrace{\bar x_i}_{x^{B}_i\ (\text{between})} \]

The right specification depends on the research question:

“Does democratization, for a given country, raise log GDP?” → within ($\beta_W$); use FE or Mundlak’s within block.
“Are democracies, on average across countries, higher in log GDP?” → between ($\beta_B$); use Mundlak’s between block.
“Both — and do they differ?” → Mundlak / REWB returns both, and $\beta_W = \beta_B$ is itself testable.

2.0.1 Seeing country-level outcome differences

Panel models start from a simple fact: countries have very different average outcome levels. The plot below shows the full country-year distribution of log GDP for each country, ordered by each country’s mean. The red dot marks the country mean. The box color marks whether that country’s growth volatility is low, middle, or high relative to the other countries in the sample.

# Keep one country-year row per observation with the variables used in the plot.
db_country_levels <- democracy |>
  select(country, ctyname, year, ln_gdpw) |>
  drop_na(ln_gdpw)

# Compute each country's mean log GDP. The mean orders the countries and draws
# the red dots.
country_level_means <- db_country_levels |>
  group_by(ctyname) |>
  summarise(mean_ln_gdpw = mean(ln_gdpw),
            .groups = "drop") |>
  arrange(mean_ln_gdpw)

# Compute each country's growth volatility: the standard deviation of the first
# difference of log GDP. This removes the level trend and summarizes how noisy
# the growth process is within each country.
country_growth_volatility <- db_country_levels |>
  group_by(ctyname) |>
  arrange(year) |>
  mutate(d_ln_gdpw = ln_gdpw - lag(ln_gdpw)) |>
  summarise(sd_d_ln_gdpw = sd(d_ln_gdpw, na.rm = TRUE),
            .groups = "drop")

# Use tertiles to label countries as low, middle, or high growth volatility.
# These labels are relative to this sample, not universal categories.
sd_cut_low <- quantile(country_growth_volatility$sd_d_ln_gdpw,
                       1/3, na.rm = TRUE)
sd_cut_high <- quantile(country_growth_volatility$sd_d_ln_gdpw,
                        2/3, na.rm = TRUE)

country_growth_volatility <- country_growth_volatility |>
  mutate(growth_volatility = case_when(
    sd_d_ln_gdpw <= sd_cut_low ~ "Low growth volatility",
    sd_d_ln_gdpw <= sd_cut_high ~ "Middle growth volatility",
    TRUE ~ "High growth volatility"
  ))

# Compute the overall sample mean. This is the dashed vertical reference line.
overall_mean_ln_gdpw <- mean(db_country_levels$ln_gdpw)

# Reorder country labels so that countries appear from low to high average log GDP.
db_country_levels <- db_country_levels |>
  left_join(country_growth_volatility |> select(ctyname, growth_volatility),
            by = "ctyname") |>
  mutate(ctyname_ordered = factor(ctyname,
                                  levels = country_level_means$ctyname))

# Apply the same country ordering to the country-mean data.
country_level_means <- country_level_means |>
  mutate(ctyname_ordered = factor(ctyname,
                                  levels = country_level_means$ctyname))

# Boxplots show the within-country distribution over time. Fill color shows
# whether the country's first-differenced log GDP is low, middle, or high in
# volatility. Red dots show country means. The dashed line shows the overall
# sample mean.
ggplot(db_country_levels,
       aes(x = ln_gdpw, y = ctyname_ordered, fill = growth_volatility)) +
  geom_vline(xintercept = overall_mean_ln_gdpw,
             linetype = "dashed",
             colour = "black") +
  geom_boxplot(colour = "grey45",
               outlier.size = 0.35,
               linewidth = 0.25) +
  geom_point(data = country_level_means,
             aes(x = mean_ln_gdpw, y = ctyname_ordered),
             inherit.aes = FALSE,
             colour = "firebrick",
             size = 0.9) +
  scale_fill_manual(values = c("Low growth volatility" = "grey88",
                               "Middle growth volatility" = "skyblue3",
                               "High growth volatility" = "goldenrod2")) +
  labs(x = "log GDP",
       y = NULL,
       fill = NULL,
       subtitle = "Countries are ordered by their panel mean; highest averages appear at the top.") +
  theme_minimal(base_size = 9) +
  theme(axis.text.y = element_text(size = 5),
        legend.position = "bottom",
        panel.grid.major.y = element_blank())

Country-level distributions of log GDP, ordered by each country’s mean. Red dots mark country means; the dashed black line marks the overall sample mean. Box color marks the SD of first-differenced log GDP.

This figure is a visual reminder of the main bias problem in pooled panel models. If countries with high average education, democracy, or oil status also have persistently high outcome levels for historical reasons, pooled OLS can mistake cross-country level differences for within-country change.

2.0.2 Seeing pooled, between, and within variation

Before fitting models, it helps to see the three comparisons with a variable that has both within- and between-country variation. We use edt, cumulative years of labor-force education, because it varies a lot across countries and also moves over time within many countries.

The pooled scatter ignores the panel structure. Every country-year is just one point:

# Keep only complete country-years for the two variables in the plot.
db_edt <- democracy |>
  select(country, ctyname, year, ln_gdpw, edt) |>
  drop_na(ln_gdpw, edt)

# The pooled correlation ignores the country structure.
pooled_cor <- cor(db_edt$edt, db_edt$ln_gdpw)

# Each point is a country-year. The regression line is the pooled bivariate fit.
ggplot(db_edt, aes(edt, ln_gdpw)) +
  geom_point(alpha = 0.25, size = 1) +
  geom_smooth(method = "lm", se = FALSE, colour = "firebrick") +
  labs(x = "Education, country-year level",
       y = "log GDP",
       subtitle = paste0("Pooled correlation = ",
                         round(pooled_cor, 2))) +
  theme_minimal(base_size = 11)

Pooled association between education and log GDP. Each point is a country-year.

The between scatter collapses each country to its panel mean. Now the question is different: do countries with higher average education also have higher average log GDP?

# Collapse the panel to one row per country.
between_edt <- db_edt |>
  group_by(country, ctyname) |>
  summarise(edt_mean = mean(edt),
            ln_gdpw_mean = mean(ln_gdpw),
            .groups = "drop")

# This correlation uses the country means, not the country-year observations.
between_cor <- cor(between_edt$edt_mean,
                   between_edt$ln_gdpw_mean)

# Each point is a country mean.
ggplot(between_edt, aes(edt_mean, ln_gdpw_mean)) +
  geom_point(alpha = 0.7, size = 1.8) +
  geom_smooth(method = "lm", se = FALSE, colour = "firebrick") +
  labs(x = "Country mean education",
       y = "Country mean log GDP",
       subtitle = paste0("Between correlation = ",
                         round(between_cor, 2))) +
  theme_minimal(base_size = 11)

Between-country association: each point is one country’s panel mean.

The within scatter subtracts each country’s own mean from both variables. A positive within relationship means that, for the same country, years with above-usual education also tend to be years with above-usual log GDP. The facets below show this country by country for a small subset.

# Demean education and log GDP within each country.
within_edt <- db_edt |>
  group_by(country, ctyname) |>
  mutate(edt_within = edt - mean(edt),
         ln_gdpw_within = ln_gdpw - mean(ln_gdpw)) |>
  ungroup()

# This is the overall within-country correlation after demeaning.
within_cor <- cor(within_edt$edt_within,
                  within_edt$ln_gdpw_within)

# Pick a small set of countries for a readable facet plot.
focus_countries <- c("Argentina", "Brazil", "Chile", "India",
                     "Korea, South", "Mexico", "Spain", "Turkey")

# The axes are within-country deviations, not raw levels.
within_edt |>
  filter(ctyname %in% focus_countries) |>
  ggplot(aes(edt_within, ln_gdpw_within)) +
  geom_hline(yintercept = 0, colour = "grey75") +
  geom_vline(xintercept = 0, colour = "grey75") +
  geom_point(alpha = 0.7, size = 1.3) +
  geom_smooth(method = "lm", se = FALSE, colour = "firebrick") +
  facet_wrap(~ ctyname, scales = "free") +
  labs(x = "Education minus country mean",
       y = "log GDP minus country mean",
       subtitle = paste0("Overall within correlation = ",
                         round(within_cor, 2))) +
  theme_minimal(base_size = 10)

Within-country association for selected countries. Both axes subtract each country’s own mean.

The same data can therefore tell three different stories:

# Put the three correlations in one table so students can compare them directly.
tibble(
  comparison = c("Pooled country-years",
                 "Between country means",
                 "Within-country deviations"),
  x_variable = c("edt",
                 "mean(edt) by country",
                 "edt - mean(edt) by country"),
  y_variable = c("ln_gdpw",
                 "mean(ln_gdpw) by country",
                 "ln_gdpw - mean(ln_gdpw) by country"),
  correlation = c(pooled_cor, between_cor, within_cor)
) |>
  kable(digits = 3,
        caption = "Pooled, between, and within correlations between education and log GDP.") |>
  kable_styling(full_width = FALSE)

Pooled, between, and within correlations between education and log GDP.
comparison	x_variable	y_variable	correlation
Pooled country-years	edt	ln_gdpw	0.757
Between country means	mean(edt) by country	mean(ln_gdpw) by country	0.767
Within-country deviations	edt - mean(edt) by country	ln_gdpw - mean(ln_gdpw) by country	0.651

This is why panel models are not just different standard errors. They answer different questions. Pooled OLS mixes all three patterns. FE uses the within deviations. Mundlak / REWB estimates the within and between pieces separately.

2.0.3 The same decomposition for every candidate predictor

The edt picture above is not special. Every covariate splits into a between part (cross-country differences in its mean) and a within part (movement around each country’s own mean), and the two can correlate with ln_gdpw very differently. Before choosing a specification, we compute all three correlations — overall (pooled), between (country means), within (country-demeaned) — for every candidate predictor, plus the within-variance share: the fraction of each variable’s variance that lives within countries. A variable with a within share near zero is effectively time-invariant and contributes nothing to a fixed-effects estimate.

# For each predictor we want three correlations with log GDP: overall (all
# country-years), between (country means), and within (deviations from each
# country's own mean). Instead of repeating the same code for all 12 variables,
# we stack the predictors into long format and group by variable.

# Step 1: name the predictors, and build the interaction we also want to see.
wb_vars <- c("dem", "edt", "dem_x_edt", "stra", "civlib", "pollib",
             "oil", "britcol", "newc", "elf60", "cath", "moslem")
wb_df <- democracy |>
  mutate(dem_x_edt = dem * edt)

# Step 2: stack every predictor into two columns (`var`, `value`), keeping the
# outcome and the country id beside each row. Now one row = one country-year-var.
wb_long <- wb_df |>
  select(country, ln_gdpw, all_of(wb_vars)) |>
  pivot_longer(cols = all_of(wb_vars),
               names_to = "var", values_to = "value")

# Step 3: OVERALL correlation. Group by variable, correlate value with ln_gdpw
# using every country-year.
overall <- wb_long |>
  group_by(var) |>
  summarise(overall = cor(value, ln_gdpw, use = "pairwise.complete.obs"),
            .groups = "drop")

# Step 4: BETWEEN correlation. First collapse to one mean per country, then
# correlate those country means with each other.
between <- wb_long |>
  group_by(var, country) |>
  summarise(value_mean   = mean(value,   na.rm = TRUE),
            ln_gdpw_mean = mean(ln_gdpw, na.rm = TRUE),
            .groups = "drop") |>
  group_by(var) |>
  summarise(between = cor(value_mean, ln_gdpw_mean, use = "pairwise.complete.obs"),
            .groups = "drop")

# Step 5: WITHIN correlation. Subtract each country's mean from both the
# predictor and the outcome, then correlate the deviations.
within <- wb_long |>
  group_by(var, country) |>
  mutate(value_dev   = value   - mean(value,   na.rm = TRUE),
         ln_gdpw_dev = ln_gdpw - mean(ln_gdpw, na.rm = TRUE)) |>
  ungroup() |>
  group_by(var) |>
  summarise(within = cor(value_dev, ln_gdpw_dev, use = "pairwise.complete.obs"),
            .groups = "drop")

# Step 6: WITHIN-VARIANCE SHARE. Reuse the within deviation, then divide its
# variance by the total variance. A share near 0 means the variable barely
# moves within a country (effectively time-invariant).
within_share <- wb_long |>
  group_by(var, country) |>
  mutate(value_dev = value - mean(value, na.rm = TRUE)) |>
  ungroup() |>
  group_by(var) |>
  summarise(within_share = var(value_dev, na.rm = TRUE) / var(value, na.rm = TRUE),
            .groups = "drop")

# Step 7: join the four pieces, sort by the strength of the overall correlation.
wb_tab <- overall |>
  left_join(between,      by = "var") |>
  left_join(within,       by = "var") |>
  left_join(within_share, by = "var") |>
  arrange(desc(abs(overall)))

wb_tab |>
  kable(digits = 3,
        col.names = c("Variable", "Overall", "Between", "Within",
                      "Within var. share"),
        caption = "Correlation with log GDP, decomposed into overall / between / within, with each variable's within-variance share. NA within = no within variation (time-invariant).") |>
  kable_styling(full_width = FALSE)

Correlation with log GDP, decomposed into overall / between / within, with each variable’s within-variance share. NA within = no within variation (time-invariant).
Variable	Overall	Between	Within	Within var. share
edt	0.757	0.772	0.626	0.084
dem_x_edt	0.668	0.691	0.176	0.106
pollib	0.653	0.690	0.085	0.156
civlib	-0.647	-0.679	-0.047	0.137
dem	0.560	0.632	0.053	0.247
newc	-0.530	-0.506	NA	0.000
elf60	-0.473	-0.500	NA	0.000
moslem	-0.257	-0.252	NA	0.000
cath	0.251	0.255	NA	0.000
britcol	-0.153	-0.079	NA	0.000
stra	0.113	0.101	0.289	0.190
oil	0.091	0.087	NA	0.000

A dumbbell plot makes the gap between the between and within correlation visible at a glance:

wb_tab |>
  filter(var != "dem_x_edt") |>
  mutate(var = fct_reorder(var, between)) |>
  ggplot(aes(y = var)) +
  geom_segment(aes(x = between, xend = within, y = var, yend = var),
               colour = "grey70") +
  geom_point(aes(x = between, colour = "Between"), size = 2.6) +
  geom_point(aes(x = within, colour = "Within"), size = 2.6) +
  geom_vline(xintercept = 0, linetype = "dashed", colour = "grey50") +
  scale_colour_manual(values = c("Between" = "firebrick", "Within" = "steelblue")) +
  labs(x = "Correlation with log GDP", y = NULL, colour = NULL) +
  theme_minimal(base_size = 11) +
  theme(legend.position = "bottom")

Between vs. within correlation with log GDP for each candidate predictor. A long segment means the variable tells a very different story across countries than it does within a country. Time-invariant variables have no within point.

Three lessons set up the rest of Part 2:

edt is the only predictor that survives the within transformation strongly (within $r \approx 0.63$). Education tracks GDP both across countries and within a country over time, so it is the genuine fixed-effects workhorse.
dem collapses from a strong between correlation ($\approx 0.63$) to almost nothing within ($\approx 0.05$). The cross-country democracy-GDP association is overwhelmingly between countries — rich countries tend to be democracies — not a within-country relationship. This is exactly the gap fixed effects will expose.
The time-invariant controls (oil, britcol, newc, elf60, cath, moslem) have no within variation at all (within share = 0, within correlation undefined). They can only ever contribute between-country information, so FE/TWFE drop them and only Pooled, RE, and Mundlak can estimate their coefficients. Note also that civlib/pollib correlate strongly between but near-zero within, on top of being 42% missing and near-definitional to democracy — three reasons they stay out of the main model.

Two pedagogical reminders we will use in §2.7:

Bell & Jones (2015) on REWB. When researchers default to FE “to be safe,” they discard the between variation that often carries the substantive signal. REWB (random effects within-between, equivalent to Mundlak with random intercepts) returns both coefficients without losing the FE-style identification on the within block. The difference $\beta_W - \beta_B$ is itself the diagnostic.
Kropko & Kubinec (2020) on FE. One-way country FE has a clear interpretation: it uses within-country changes over time. Two-way FE is less transparent: after removing both country and year means, the coefficient describes deviations from both dimensions. That can be exactly right in a difference-in-differences design; otherwise it is easy to report a coefficient whose comparison is hard to state in words.

The five candidates all start from the same equation:

\[ y_{it} \;=\; \alpha + \beta_1\,\text{dem}_{it} + \beta_2\,\text{edt}_{it} + \beta_3\,(\text{dem}_{it}\!\cdot\!\text{edt}_{it}) + \beta_4\,\text{oil}_i + u_{it} \]

§1.3 left two defensible routes open, and Part 2 shows both for each estimator. This is the organizing spine of the section: either keep the outcome in levels and correct the persistence (a lagged DV, or AR(1) errors), or difference the outcome and let cluster-robust SEs handle the milder residual autocorrelation. Each subsection fits a static version and its dynamic variant(s) side-by-side, so you can see what each device adds:

Subsection	Levels route	Differenced route
2.1 Pooled OLS / GLS	levels + trend; GLS AR(1) errors	growth; ECM (lagged level); growth + lagged Δy
2.2 FE (within)	static within; AR(1) lagged DV	growth (Δy, country FE); FD estimator
2.3 TWFE	static; AR(1) lagged DV	growth (Δy, two-way FE); FD + year effects
2.4 RE	static; corARMA(1) AR(1) errors	— (differencing breaks the RE structure)
2.5 Mundlak / REWB	static; corARMA(1)	—

Label note. “AR(1)” in this lab means an AR(1) on the outcome — i.e., a lagged dependent variable in the regression. corARMA(1) is an AR(1) on the errors. The two are observationally close but interpret differently — the §2.4 RE block walks through both side by side. The differenced route never needs an explicit AR term: differencing removes most of the persistence, and what remains is absorbed by cluster-robust SEs.

Two methodological notes built into this menu:

Why no lagged DV in Mundlak/REWB. Adding $y_{i,t-1}$ to the within-between decomposition either pollutes the decomposition (treating the lag as a single regressor imports a between component into an estimator whose whole point is to keep within and between separate) or causes mechanical collinearity (decomposing the lag makes $\bar y_i$ the empirical analog of the random intercept $u_i$). corARMA is the defensible dynamic version for REWB — it absorbs persistence in the residuals without disturbing the within-between split.
Why corARMA shows up only for RE and Mundlak/REWB. It is a maximum-likelihood feature of nlme::lme. feols and plm cannot fit it; cluster-robust / Driscoll–Kraay SEs play the same role for those engines.

The table below is best read as a bias map, not as a list of automatically causal estimators. Every estimator helps with some problems and leaves others behind.

Specification	Comparison emphasized	Bias it helps with	What can still go wrong
Pooled OLS	All country-years together; mixes within and between variation	Useful baseline; no correction for country-level heterogeneity	Biased if stable country traits affect both the regressors and the outcome; conflates cross-country differences with within-country change; SEs can be wrong under serial or cross-sectional dependence
Fixed effects (within)	Within-country deviations from each country’s own mean	Removes all time-invariant country-level confounders, observed or unobserved	Cannot estimate time-invariant covariates; still biased by time-varying confounders, reverse causality, measurement error, and Nickell bias in dynamic FE
Two-way fixed effects	Within-country deviations after also removing common year shocks	Removes time-invariant country differences and shocks common to all countries in a year	Coefficients can be hard to interpret outside a clear design; time-varying confounding remains; year effects can absorb substantively important common variation
Random effects	Partial-pooling / quasi-demeaned mixture of within and between variation	More efficient than FE if the country intercept is unrelated to regressors; estimates time-invariant covariates	Biased if unobserved country traits are correlated with the regressors; mixes within and between stories unless modeled explicitly
Mundlak / REWB	Within and between components estimated separately	Makes the within/between split explicit; relaxes the strongest RE assumption by adding country means of time-varying regressors	Still needs no omitted time-varying confounders for a causal reading; dynamic dependence and cross-sectional dependence still need to be modeled or corrected

# Build the dynamic data once for all of Part 2 (and Part 3):
#   time_id        — numeric time trend for pooled trend models
#   dem_x_edt      — contemporaneous interaction, stored so we can difference it
#   ln_gdpw_lag    — lagged outcome (for AR(1) and ARDL specs in levels)
#   dem_lag        — lagged treatment (for the pooled growth / distributed-lag models)
#   edt_lag        — lagged moderator (so the lagged interaction dem_lag × edt_lag
#                    is the t-1 version of the contemporaneous dem × edt)
#   d_ln_gdpw      — first difference of the outcome (log-GDP growth)
#   d_ln_gdpw_lag  — lag of the differenced outcome (dynamic growth model)
#   d_dem, d_edt, d_dem_x_edt — first differences of the covariates
democracy_dyn <- democracy |>
  group_by(country) |>
  arrange(country, year) |>
  mutate(time_id            = year - min(year),
         dem_x_edt          = dem * edt,
         ln_gdpw_lag        = lag(ln_gdpw),
         dem_lag            = lag(dem),
         edt_lag            = lag(edt),
         dem_x_edt_lag      = lag(dem_x_edt),
         d_ln_gdpw          = ln_gdpw - ln_gdpw_lag,
         d_ln_gdpw_lag      = lag(d_ln_gdpw),
         d_dem              = dem - dem_lag,
         d_dem_lag          = lag(d_dem),
         d_edt              = edt - edt_lag,
         d_edt_lag          = lag(d_edt),
         d_dem_x_edt        = dem_x_edt - dem_x_edt_lag,
         d_dem_x_edt_lag    = lag(d_dem_x_edt)) |>
  ungroup()

2.1 Pooled OLS and GLS

We start with pooled models — every country-year exchangeable, one common intercept. These models are not our final answer, because they ignore country-specific baselines. They are useful because they let us compare four ways to handle trends and persistence before we add fixed or random effects:

include a linear trend in the pooled levels model;
put AR(1) persistence in the errors with GLS;
difference the outcome and model log-GDP growth;
add the lagged level of the outcome to the growth model — the error-correction / convergence form.

2.1.1 Creating lags in panel data

Before fitting dynamic models, we need lags. The key point is that lags must be taken within country, not down the whole dataset.

# Pick a small example so students can see what a panel lag does.
lag_demo <- democracy |>
  filter(ctyname %in% c("Mexico", "Argentina")) |>
  arrange(ctyname, year) |>
  select(country, ctyname, year, ln_gdpw, dem, edt)

# Method 1: tidyverse. group_by(country) tells R to restart the lag for each
# country, so Mexico's first row does not use Argentina's last row as its lag.
lag_demo_tidy <- lag_demo |>
  group_by(country) |>
  arrange(country, year) |>
  mutate(ln_gdpw_lag_tidy = lag(ln_gdpw),
         dem_lag_tidy = lag(dem)) |>
  ungroup()

# Method 2: plm. pdata.frame() stores the panel index, and plm::lag() respects
# that index when creating lags.
lag_demo_plm <- pdata.frame(lag_demo,
                            index = c("country", "year"))
lag_demo_plm$ln_gdpw_lag_plm <- plm::lag(lag_demo_plm$ln_gdpw, 1)
lag_demo_plm$dem_lag_plm <- plm::lag(lag_demo_plm$dem, 1)

# Put the two versions side by side. They should match.
tibble(
  ctyname = lag_demo_tidy$ctyname,
  year = lag_demo_tidy$year,
  ln_gdpw = lag_demo_tidy$ln_gdpw,
  lag_tidy = lag_demo_tidy$ln_gdpw_lag_tidy,
  lag_plm = as.numeric(lag_demo_plm$ln_gdpw_lag_plm),
  dem = lag_demo_tidy$dem,
  dem_lag_tidy = lag_demo_tidy$dem_lag_tidy,
  dem_lag_plm = as.numeric(lag_demo_plm$dem_lag_plm)
) |>
  group_by(ctyname) |>
  slice_head(n = 6) |>
  ungroup() |>
  kable(digits = 3,
        caption = "Two ways to create panel lags: tidyverse group_by() and plm::lag().") |>
  kable_styling(full_width = FALSE)

Two ways to create panel lags: tidyverse group_by() and plm::lag().
ctyname	year	ln_gdpw	lag_tidy	lag_plm	dem	dem_lag_tidy	dem_lag_plm
Argentina	1951	9.219	NA	NA	1	NA	NA
Argentina	1952	9.138	9.219	9.219	1	1	1
Argentina	1953	9.148	9.138	9.138	1	1	1
Argentina	1954	9.188	9.148	9.148	1	1	1
Argentina	1955	9.261	9.188	9.188	0	1	1
Argentina	1956	9.251	9.261	9.261	0	0	0
Mexico	1951	8.898	NA	NA	0	NA	NA
Mexico	1952	8.904	8.898	8.898	0	0	0
Mexico	1953	8.869	8.904	8.904	0	0	0
Mexico	1954	8.945	8.869	8.869	0	0	0
Mexico	1955	9.001	8.945	8.945	0	0	0
Mexico	1956	9.038	9.001	9.001	0	0	0

2.1.2 Five pooled specifications

We fit five pooled models that share one coherent control set — dem, lagged democracy dem_lag, centred education edt_c, the dem × edt_c interaction, the structural between-country controls newc, elf60, region, and oil. The first two keep the level outcome with a linear trend (the levels route, where persistence is corrected through AR(1) errors); the last three move to log-GDP growth (the differenced-outcome route) and differ only in how they treat dynamics. The common-shock period dummies (oil_shock1, oil_shock2) enter every column. In the growth columns they read cleanly as deviations in the growth rate (e.g. the 1979-82 slowdown); in the levels columns — alongside the linear trend — they capture level deviations during the crisis windows.

Col	Spec	Outcome	Dynamic idea
(1)	Pooled levels + trend	$y_{it}$	Descriptive baseline with a linear time trend.
(2)	GLS AR(1) errors + trend	$y_{it}$	Same mean as (1); persistence enters the residual process.
(3)	Growth	$\Delta y_{it}$	Growth on covariate levels (+ lagged democracy `dem_lag`).
(4)	Growth / ECM	$\Delta y_{it}$	(3) + lagged level $y_{i,t-1}$ — error-correction / convergence.
(5)	Growth + lagged $\Delta y$	$\Delta y_{it}$	(3) + lagged growth $\Delta y_{i,t-1}$ — momentum / AR(1) in growth.

Columns (3)-(5) are the differenced-outcome route: growth is close to stationary (§1.3.5), so we do not need an explicit level-persistence correction, and cluster-robust SEs (§2.1.4) handle the mild residual autocorrelation. We add lagged democracy (dem_lag) to all three so a delayed regime effect — institutional reforms biting a year later — can show up separately from the contemporaneous one. Columns (4) and (5) then ask two different follow-up questions about dynamics.

Column (4): the error-correction model (ECM), in plain words. Adding the lagged level of the outcome to a growth regression gives the single-equation ECM:

\[ \Delta y_{it} \;=\; \phi\, y_{i,t-1} + X_{it}\beta + \varepsilon_{it}. \]

Read it as a self-correction mechanism. Think of $X_{it}\beta$ as the level of GDP a country’s fundamentals (education, regime, structure) can sustain — its long-run equilibrium. The term $\phi\,y_{i,t-1}$ asks where the country sat last year relative to that target. If $\phi < 0$, a country sitting above its fundamentals-implied level grows more slowly next period (it falls back toward equilibrium), and one sitting below grows faster (it catches up). So $\phi$ is the speed of adjustment — the fraction of the gap closed each year — and the long-run effect of a covariate is $-\beta_X/\phi$. A $\phi$ near zero means no self-correction: the level wanders like a random walk and never returns (the pure unit-root case).

When to use an ECM. It is the natural model when (i) the outcome and predictors are each non-stationary but believed to move together in the long run (cointegration), and (ii) you want to separate the short-run effect ($\beta$, the immediate response in growth) from the long-run effect ($-\beta_X/\phi$, the eventual level shift). It is the workhorse for convergence questions (“do poorer countries catch up?”) and for any setting where a shock has both an impact effect and a gradual equilibrium adjustment. The catch: a stable equilibrium requires $\phi < 0$ (and $|1+\phi|<1$). We chose the lagged level over the lagged difference precisely because it speaks to the I(1)-vs-stationary question from §1.3.4 — and as we will see, $\hat\phi \approx 0$ here (no pooled error-correction), which is itself the finding that justifies modelling growth.

Column (5): the lagged-difference (growth-momentum) model. Replacing the lagged level with the lagged growth rate gives

\[ \Delta y_{it} \;=\; \psi\, \Delta y_{i,t-1} + X_{it}\beta + \varepsilon_{it}. \]

Here $\psi$ is the autocorrelation of the growth rate itself: does a fast-growing year tend to be followed by another? A positive $\psi$ is momentum — booms and recessions persist beyond a single year. Unlike the ECM, this model says nothing about the level or about convergence; it is simply the AR(1) representation of the growth process. Use it when growth rates are serially correlated and you want to model that persistence directly (or absorb it), while staying agnostic about any long-run level equilibrium. In this panel $\hat\psi \approx 0.12$ (significant) while $\hat\phi \approx 0$: growth has mild momentum, but levels show no error-correction — so the differenced outcome is the cleaner description, and cluster SEs would handle the $\psi$ persistence even if we left it unmodelled.

# Levels models: level outcome + trend + the coherent control set. One common
# complete-case dataset for columns (1)-(2).
db_pool_levels <- democracy_dyn |>
  drop_na(ln_gdpw, time_id, dem, edt_c, newc, elf60, oil)

# Growth models first-difference the outcome only; covariates remain in levels,
# so `dem` means democratic status, not a regime transition. The same sample
# serves the static growth model (3) and its ECM extension (4): d_ln_gdpw is
# missing in each country's first year — exactly where ln_gdpw_lag is too.
db_pool_growth <- democracy_dyn |>
  drop_na(d_ln_gdpw, d_ln_gdpw_lag, ln_gdpw_lag, dem, dem_lag, edt_c, newc, elf60, oil)

# (1) Pooled levels + trend. A common time trend keeps the intercept from
# having to absorb the long-run upward drift in log GDP.
fit_pool <- lm(ln_gdpw ~ time_id + dem * edt_c + newc + elf60 + region + oil +
                 oil_shock1 + oil_shock2,
               data = db_pool_levels)

# (2) GLS with AR(1) errors + trend: same mean equation as (1), but residuals
# follow an AR(1) within country. corAR1(form = ~ year | country) orders the
# serial correlation by year inside each country.
fit_pool_gls_ar1 <- gls(ln_gdpw ~ time_id + dem * edt_c + newc + elf60 + region + oil +
                          oil_shock1 + oil_shock2,
                        data = db_pool_levels,
                        correlation = corAR1(form = ~ year | country),
                        method = "ML")

# (3) Growth model: outcome is log-GDP growth, covariates in levels, plus a
# lagged democracy term so a delayed regime effect can show up separately from
# the contemporaneous one. The common-shock dummies are interpretable here —
# oil_shock2 (1979-82) should pick up the Volcker-era growth slowdown.
fit_pool_growth <- lm(d_ln_gdpw ~ dem * edt_c + dem_lag + newc + elf60 + region + oil +
                        oil_shock1 + oil_shock2,
                      data = db_pool_growth)

# (4) Growth / ECM: add the lagged LEVEL of the outcome — the error-correction
# term. A negative coefficient is conditional convergence; near zero is the
# near-unit-root case from 1.3.4.
fit_pool_ecm <- lm(d_ln_gdpw ~ ln_gdpw_lag + dem * edt_c + dem_lag + newc + elf60 + region + oil +
                     oil_shock1 + oil_shock2,
                   data = db_pool_growth)

# (5) Growth + lagged difference: add the lagged GROWTH rate — momentum / AR(1)
# in the growth process. Says nothing about the level; cluster SEs (2.1.4) would
# also absorb this serial correlation without modelling it explicitly.
fit_pool_lagdiff <- lm(d_ln_gdpw ~ d_ln_gdpw_lag + dem * edt_c + dem_lag + newc + elf60 + region + oil +
                         oil_shock1 + oil_shock2,
                       data = db_pool_growth)

texreg::htmlreg(
  list(fit_pool, fit_pool_gls_ar1, fit_pool_growth, fit_pool_ecm, fit_pool_lagdiff),
  custom.model.names = c("(1) Levels",
                         "(2) GLS AR(1)",
                         "(3) Growth",
                         "(4) ECM",
                         "(5) Momentum"),
  custom.coef.map = list(
    "(Intercept)"     = "(Intercept)",
    "ln_gdpw_lag"     = "GDP lag (φ, ECM)",
    "d_ln_gdpw_lag"   = "Growth lag (ψ)",
    "time_id"         = "Trend",
    "dem"             = "Democracy",
    "dem_lag"         = "Democracy lag",
    "edt_c"           = "Education",
    "dem:edt_c"       = "Democ.×Educ.",
    "newc"            = "New state",
    "elf60"           = "Ethnic frac.",
    "regionAsia"      = "Region: Asia",
    "regionEurope"    = "Region: Europe",
    "regionNorth and Central America" = "Region: N&C America",
    "regionOceania"   = "Region: Oceania",
    "regionSouth America" = "Region: S. America",
    "oil"             = "Oil exporter",
    "oil_shock1"      = "1st oil shock",
    "oil_shock2"      = "2nd oil shock"
  ),
  digits = 3,
  doctype = FALSE
)

Statistical models
	Levels	GLS AR(1)	Growth	ECM	Momentum
(Intercept)	8.362^***	7.626^***	0.026^***	0.052^**	0.023^***
	(0.052)	(0.240)	(0.005)	(0.020)	(0.005)
GDP lag (φ, ECM)				-0.003
				(0.002)
Growth lag (ψ)					0.119^***
					(0.020)
Trend	0.005^***	0.020^***
	(0.001)	(0.001)
Democracy	0.126^***	0.004	0.006	0.007	0.006
	(0.029)	(0.008)	(0.008)	(0.008)	(0.008)
Democracy lag			-0.008	-0.008	-0.008
			(0.008)	(0.008)	(0.008)
Education	0.205^***	0.013	-0.001	-0.001	-0.001
	(0.008)	(0.007)	(0.001)	(0.001)	(0.001)
Democ.×Educ.	-0.032^***	-0.003	0.000	0.000	0.000
	(0.009)	(0.004)	(0.001)	(0.001)	(0.001)
New state	-0.098^**	-0.064	-0.006	-0.007	-0.006
	(0.031)	(0.182)	(0.004)	(0.004)	(0.004)
Ethnic frac.	-0.345^***	-0.401	-0.013^**	-0.014^**	-0.011^*
	(0.041)	(0.260)	(0.005)	(0.005)	(0.005)
Region: Asia	0.171^***	0.653^***	0.023^***	0.024^***	0.021^***
	(0.035)	(0.191)	(0.004)	(0.004)	(0.004)
Region: Europe	0.688^***	1.631^***	0.013^*	0.015^*	0.012
	(0.050)	(0.251)	(0.006)	(0.006)	(0.006)
Region: N&C America	0.418^***	1.007^***	-0.005	-0.004	-0.004
	(0.044)	(0.240)	(0.005)	(0.005)	(0.005)
Region: Oceania	0.516^***	1.651^***	0.000	0.002	0.001
	(0.076)	(0.384)	(0.009)	(0.009)	(0.009)
Region: S. America	0.491^***	0.963^***	-0.007	-0.006	-0.006
	(0.048)	(0.264)	(0.006)	(0.006)	(0.006)
Oil exporter	0.585^***	0.544^**	0.010^*	0.012^**	0.008^*
	(0.033)	(0.202)	(0.004)	(0.004)	(0.004)
1st oil shock	0.083^**	-0.002	0.001	0.001	-0.000
	(0.031)	(0.004)	(0.004)	(0.004)	(0.004)
2nd oil shock	0.023	0.013^**	-0.016^***	-0.016^***	-0.015^***
	(0.029)	(0.004)	(0.003)	(0.003)	(0.003)
R²	0.781		0.044	0.044	0.057
Adj. R²	0.780		0.038	0.039	0.052
Num. obs.	2649	2649	2561	2561	2561
AIC		-6930.995
BIC		-6831.002
Log Likelihood		3482.498
^*p < 0.001; ^p < 0.01; ^*p < 0.05

# Extract the AR(1) residual-correlation parameter from the GLS model.
# This is not a lagged-outcome coefficient. It is the estimated correlation
# between residuals one year apart within the same country.
phi_gls_ar1 <- coef(fit_pool_gls_ar1$modelStruct$corStruct,
                    unconstrained = FALSE)

tibble(parameter = "GLS AR(1) residual phi",
       estimate = as.numeric(phi_gls_ar1)) |>
  kable(digits = 3,
        caption = "Estimated AR(1) residual correlation from the GLS model.") |>
  kable_styling(full_width = FALSE)

Estimated AR(1) residual correlation from the GLS model.
parameter	estimate
GLS AR(1) residual phi	0.995

Reading.

(1) Levels + trend: the structural controls do the heavy lifting — newc, elf60, and the region dummies explain most of the cross-country level differences, and dem/edt_c are identified mostly off between-country variation. High $R^2$ (around 0.75).
(2) GLS AR(1) errors + trend: same mean equation as (1), but residuals follow an AR(1) within country. The reported $\hat\phi$ belongs to the residual process, not the conditional mean.
(3) Growth: outcome is now $\Delta y$, and the picture changes completely. dem, dem_lag, edt_c, and dem × edt all lose significance — democracy and education predict the level of GDP, not its growth rate, and there is no delayed regime effect either (dem_lag ≈ 0). The covariate that clearly moves growth is oil_shock2 (the 1979-82 recession), sharply negative. $R^2$ collapses to around 0.05.
(4) Growth / ECM: adds the lagged level $y_{i,t-1}$. Its coefficient $\hat\phi$ is small and statistically insignificant — there is no pooled error-correction, consistent with the near-unit-root reading from §1.3.4. The long-run multiplier $-\beta_X/\phi$ is therefore not well identified, which is itself the point: in pooled data the level does not visibly revert.
(5) Growth + lagged $\Delta y$: adds the lagged growth rate. Its coefficient $\hat\psi \approx 0.12$ is positive and significant — growth has mild momentum (a fast year tends to be followed by another). This is the one dynamic term that does real work, but note it speaks only to short-run persistence in growth, not to the level. The covariate coefficients barely move from (3), confirming the dynamics live in the outcome process, not in the regressors.

A word of caution on the apparent loss of significance from levels to growth. Moving from columns (1)–(2) to (3)–(5) narrows the identification and shrinks the signal:

Columns (1)–(2) compare country levels — abundant between-country variation, $R^2 \approx 0.75$.
Columns (3)–(5) move to within-country growth — a much noisier left-hand side, $R^2 \approx 0.05$. The cross-country level differences that made dem and edt_c look strong are differenced away.

The drop in stars is not a bug — it is the cost of changing the question from “are democracies richer?” to “do democracies grow faster?” The data answer yes to the first and no to the second. The same level-vs-change contrast reappears in §2.2 (fixed effects), which likewise discards the between-country variation.

2.1.3 Residual diagnostics

After fitting dynamic models, check whether the residuals still show serial correlation. For the lm-style pooled models, we refit the same formulas with plm(model = "pooling") and then use pbgtest(), a panel Breusch-Godfrey test.

# Refit the OLS-style pooled models as plm objects so pbgtest can read the
# country-year panel structure.
fit_pool_plm <- plm(
  ln_gdpw ~ time_id + dem * edt_c + newc + elf60 + region + oil +
            oil_shock1 + oil_shock2,
  data = db_pool_levels,
  index = c("country_f", "year_f"),
  model = "pooling"
)

fit_pool_growth_plm <- plm(
  d_ln_gdpw ~ dem * edt_c + dem_lag + newc + elf60 + region + oil +
              oil_shock1 + oil_shock2,
  data = db_pool_growth,
  index = c("country_f", "year_f"),
  model = "pooling"
)

fit_pool_ecm_plm <- plm(
  d_ln_gdpw ~ ln_gdpw_lag + dem * edt_c + dem_lag + newc + elf60 + region + oil +
              oil_shock1 + oil_shock2,
  data = db_pool_growth,
  index = c("country_f", "year_f"),
  model = "pooling"
)

fit_pool_lagdiff_plm <- plm(
  d_ln_gdpw ~ d_ln_gdpw_lag + dem * edt_c + dem_lag + newc + elf60 + region + oil +
              oil_shock1 + oil_shock2,
  data = db_pool_growth,
  index = c("country_f", "year_f"),
  model = "pooling"
)

# Run the same serial-correlation diagnostic on each OLS-style model.
bg_pool         <- pbgtest(fit_pool_plm,         order = 2)
bg_pool_growth  <- pbgtest(fit_pool_growth_plm,  order = 2)
bg_pool_ecm     <- pbgtest(fit_pool_ecm_plm,     order = 2)
bg_pool_lagdiff <- pbgtest(fit_pool_lagdiff_plm, order = 2)

tibble(
  model = c("Levels",
            "Growth",
            "ECM",
            "Momentum"),
  bg_statistic = c(as.numeric(bg_pool$statistic),
                   as.numeric(bg_pool_growth$statistic),
                   as.numeric(bg_pool_ecm$statistic),
                   as.numeric(bg_pool_lagdiff$statistic)),
  p_value = c(bg_pool$p.value,
              bg_pool_growth$p.value,
              bg_pool_ecm$p.value,
              bg_pool_lagdiff$p.value)
) |>
  kable(digits = 3,
        caption = "Panel Breusch-Godfrey residual serial-correlation tests. Null: no serial correlation up to order 2.") |>
  kable_styling(full_width = FALSE)

Panel Breusch-Godfrey residual serial-correlation tests. Null: no serial correlation up to order 2.
model	bg_statistic	p_value
Levels	2375.994	0.000
Growth	38.374	0.000
ECM	40.630	0.000
Momentum	5.166	0.076

# For the GLS AR(1)-error model, inspect the normalized residual ACF. If the
# AR(1) error model did its job, the remaining normalized residual ACF should
# be much smaller at low lags.
ACF(fit_pool_gls_ar1, resType = "normalized") |>
  as_tibble() |>
  filter(lag <= 5) |>
  kable(digits = 3,
        caption = "GLS AR(1)-error model: normalized residual ACF, first five lags.") |>
  kable_styling(full_width = FALSE)

GLS AR(1)-error model: normalized residual ACF, first five lags.
lag	ACF
0	1.000
1	0.144
2	0.104
3	0.010
4	0.079
5	0.049

If the test rejects, residuals still carry serial correlation. The fix is not to change the point estimates mechanically; the first fix is to report standard errors that are honest about the remaining dependence. Clustered SEs by country handle arbitrary within-country serial correlation. In this TSCS setting, we also worry about cross-sectional dependence, so PCSE and Driscoll-Kraay remain useful comparisons.

2.1.4 Standard errors — choice and justification

OLS point estimates can be consistent under strict exogeneity even with autocorrelated and cross-sectionally dependent errors; the SEs are not. Three options for the panel SE, all available via plm:

ses_pool_growth <- list(
  cluster  = vcovHC(fit_pool_ecm_plm, method = "arellano"),
  pcse     = vcovBK(fit_pool_ecm_plm),
  driscoll = vcovSCC(fit_pool_ecm_plm)
)

target_pool <- c("ln_gdpw_lag", "dem", "dem_lag", "edt_c", "dem:edt_c", "oil")

se_pool_cluster <- coeftest(fit_pool_ecm_plm,
                            vcov. = ses_pool_growth$cluster)[target_pool, 2]
se_pool_pcse <- coeftest(fit_pool_ecm_plm,
                         vcov. = ses_pool_growth$pcse)[target_pool, 2]
se_pool_driscoll <- coeftest(fit_pool_ecm_plm,
                             vcov. = ses_pool_growth$driscoll)[target_pool, 2]

tibble(coefficient = target_pool,
       cluster = se_pool_cluster,
       pcse = se_pool_pcse,
       driscoll = se_pool_driscoll) |>
  kable(digits = 4,
        caption = "SE corrections on the dynamic growth pooled model.") |>
  kable_styling(full_width = FALSE)

SE corrections on the dynamic growth pooled model.
coefficient	cluster	pcse	driscoll
ln_gdpw_lag	0.0027	0.0026	0.0026
dem	0.0066	0.0076	0.0061
dem_lag	0.0059	0.0075	0.0060
edt_c	0.0013	0.0012	0.0021
dem:edt_c	0.0013	0.0013	0.0021
oil	0.0044	0.0046	0.0059

The choice of standard error should follow the residual diagnostics:

If the problem is mostly serial correlation within countries, cluster-robust SEs by country are the first correction.
If the problem is mostly contemporaneous correlation across countries, PCSE is useful because it allows country residuals to move together in the same year.
If we see evidence of both serial correlation and cross-sectional dependence, Driscoll-Kraay is the most conservative correction in this menu.

This is why we report the three columns side by side here. In applied work, the point estimate and the uncertainty estimate are separate decisions: the model defines the conditional mean, while the variance-covariance correction decides how much we trust the precision of that estimate. See §2.7 for the same comparison applied to the FE specifications.

2.2 Fixed effects (within)

Tool box — `fixest::feols()`

We use feols() from the fixest package for every FE/TWFE fit in §2.2–§2.3. It is the fastest implementation of multi-way fixed effects in R, and it exposes cluster-robust SEs as a first-class argument. Anatomy of a call:

feols(y ~ regressors | <fixed effects> | <IV spec>,
      data    = panel_df,
      cluster = ~ unit)

Left of the first | — the regression equation (no FE, no IV).
First | block — fixed effects, one variable per FE. | country is unit FE; | country + year is two-way FE. feols demeans the data internally and never reports the unit dummies.
Second | block (optional) — IV specification. We do not use it here; Lab 6 does for pgmm.
cluster = ~ unit — applies cluster-robust SEs at the unit level (the vcovHC(method = "arellano") convention from §2.1.4). Replaces the old “fit then coeftest(., vcov. = vcovHC(...))” pattern with a single argument.

After fitting, fixef(fit)$country returns the per-country intercepts — useful in Part 3 when we anchor an IRF at a specific country. coef(fit), vcov(fit), summary(fit) work as usual.

Why feols over plm(model = "within"). Both produce identical point estimates, but feols is roughly 10× faster on this panel, supports multi-way FE natively (so TWFE is one extra + year in the FE block, not a separate transformation), and reports cluster SEs without an external call. We re-fit a couple of specs as plm in §2.7 only because vcovBK (PCSE) and vcovSCC (Driscoll–Kraay) require a plm object.

Specs we fit

We fit four within models, all carrying the same dem × edt interaction so the moderation is comparable with the pooled / RE / Mundlak specs. Two stay in levels (the levels route); two use a differenced outcome (the differenced route).

(1) Static. Each country gets its own intercept $\alpha_i$, sweeping out time-invariant confounders:

\[y_{it} = \alpha_i + \beta_1\,\text{dem}_{it} + \beta_2\,\text{edt}_{it} + \beta_3\,(\text{dem}\!\cdot\!\text{edt})_{it} + \varepsilon_{it}\]

This is the within transformation — demeaning every variable by its country mean, $\tilde y_{it} = \beta\,\tilde x_{it} + \tilde\varepsilon_{it}$ with $\tilde x_{it} = x_{it} - \bar x_i$.

(2) AR(1) — levels with a lagged DV. Add $y_{i,t-1}$ to correct the persistence in levels:

\[y_{it} = \alpha_i + \phi\,y_{i,t-1} + \beta_1\,\text{dem}_{it} + \beta_2\,\text{edt}_{it} + \beta_3\,(\text{dem}\!\cdot\!\text{edt})_{it} + \varepsilon_{it}\]

(3) Growth — only the outcome differenced. Difference the outcome, keep the covariates in levels and the country FE, and use no lagged DV:

\[\Delta y_{it} = \alpha_i + \beta_1\,\text{dem}_{it} + \beta_2\,\text{edt}_{it} + \beta_3\,(\text{dem}\!\cdot\!\text{edt})_{it} + \varepsilon_{it}\]

Now $\alpha_i$ is a country-specific growth intercept (a linear trend in the level). The residual autocorrelation is handled by cluster-robust SEs, not a lagged DV.

(4) FD — difference the whole equation. Differencing eliminates $\alpha_i$ entirely:

\[\Delta y_{it} = \beta_1\,\Delta\text{dem}_{it} + \beta_2\,\Delta\text{edt}_{it} + \beta_3\,\Delta(\text{dem}\!\cdot\!\text{edt})_{it} + \Delta\varepsilon_{it}\]

The estimand changes: $\Delta\text{dem}$ is a regime transition, and the moderation enters as the differenced product $\Delta(\text{dem}\cdot\text{edt})$ (not $\Delta\text{dem}\cdot\Delta\text{edt}$). No lagged DV → no Nickell bias; the MA(1) in $\Delta\varepsilon$ is absorbed by cluster SEs.

Logic. All four identify $\beta$ from within-country variation only. (1)–(2) stay in levels and remove $\alpha_i$ by demeaning; (3)–(4) use the differenced outcome — (3) keeps the country FE on the growth rate, (4) removes $\alpha_i$ by differencing and so re-defines the estimand as transitions. Within-FE and FD coincide at $T=2$ and diverge at $T>2$: FE is more efficient when $\varepsilon_{it}$ is iid, FD when $\varepsilon_{it}$ is closer to a random walk (Wooldridge 2010). The AR(1) column’s contemporaneous coefficient is a short-run effect, long-run $\hat\beta/(1-\hat\phi)$ — but Nickell (1981) bias biases $\hat\phi$ downward by $\approx -(1+\phi)/(T-1)$ at short $T$ (Lab 6 fixes this with dynamic-panel GMM). Time-invariant oil, newc, elf60, and region are absorbed in every column; only the time-varying oil_shock dummies survive — and even those drop out of the FD column (the year-FE version in §2.3 is where common shocks re-enter).

# Time-invariant controls (oil, newc, elf60, region) are absorbed by country FE,
# so only the time-varying oil-shock dummies enter. The dem × edt_c interaction
# is kept in every column, matching the pooled / RE / Mundlak specs.

# (1) Static within.
fit_fe        <- feols(ln_gdpw ~ dem * edt_c + oil_shock1 + oil_shock2 | country,
                       data = democracy_dyn, cluster = ~ country)

# (2) AR(1) — levels with a lagged outcome.
fit_fe_dyn    <- feols(ln_gdpw ~ ln_gdpw_lag + dem * edt_c + oil_shock1 + oil_shock2 | country,
                       data = democracy_dyn, cluster = ~ country)

# (3) Growth — only the outcome differenced, covariates (incl. interaction) in
# levels, country FE retained, no lagged DV. Cluster SEs handle residual serial
# correlation. Here the country FE is a country-specific growth intercept.
fit_fe_growth <- feols(d_ln_gdpw ~ dem * edt_c + oil_shock1 + oil_shock2 | country,
                       data = democracy_dyn, cluster = ~ country)

# (4) FD — difference the whole equation (cancels alpha_i). The interaction is the
# differenced product d_dem_x_edt, not d_dem × d_edt. No lagged DV → no Nickell bias.
fit_fe_fd     <- feols(d_ln_gdpw ~ d_dem + d_edt + d_dem_x_edt,
                       data = democracy_dyn, cluster = ~ country)

# Per-country intercepts from the AR(1)-LDV fit, used in Part 3 (one-country forecast).
fe_intercepts <- fixef(fit_fe_dyn)$country
range(fe_intercepts)

## [1] 0.3630358 0.6206405

texreg::htmlreg(
  list(fit_fe, fit_fe_dyn, fit_fe_growth, fit_fe_fd),
  custom.model.names = c("Static", "AR(1)", "Growth", "FD"),
  custom.coef.map = list(
    "(Intercept)" = "Trend (avg. growth)",
    "ln_gdpw_lag" = "GDP lag (φ)",
    "dem"         = "Democracy",
    "edt_c"       = "Education",
    "dem:edt_c"   = "Democ.×Educ.",
    "d_dem"       = "Δ Democracy",
    "d_edt"       = "Δ Education",
    "d_dem_x_edt" = "Δ (Democ.×Educ.)",
    "oil_shock1"  = "1st oil shock",
    "oil_shock2"  = "2nd oil shock"
  ),
  digits = 3,
  doctype = FALSE
)

Statistical models
	Static	AR(1)	Growth	FD
Trend (avg. growth)				0.020^***
				(0.002)
GDP lag (φ)		0.943^***
		(0.008)
Democracy	-0.013	-0.003	-0.003
	(0.036)	(0.006)	(0.006)
Education	0.168^***	-0.003	-0.014^***
	(0.017)	(0.003)	(0.002)
Democ.×Educ.	-0.026	0.001	0.002
	(0.019)	(0.002)	(0.002)
Δ Democracy				0.017
				(0.020)
Δ Education				0.004
				(0.006)
Δ (Democ.×Educ.)				-0.003
				(0.004)
1st oil shock	0.085^***	0.007^*	0.003
	(0.010)	(0.003)	(0.003)
2nd oil shock	0.093^***	-0.003	-0.009^*
	(0.010)	(0.004)	(0.004)
Num. obs.	2900	2847	2847	2787
Num. groups: country	113	113	113
R² (full model)	0.972	0.997	0.111	0.000
R² (proj model)	0.456	0.938	0.047
Adj. R² (full model)	0.971	0.997	0.072	-0.001
Adj. R² (proj model)	0.455	0.938	0.045
^*p < 0.001; ^p < 0.01; ^*p < 0.05

The level fits’ point estimates are also available via plm(model = "within"), and the FD fit via plm(model = "fd") — here is the parallel block for completeness. Both packages produce the same $\hat\beta$ values; we use feols for the main fits because it is faster and exposes cluster SEs natively, and plm whenever we need a tool that requires a plm object (the SE menu in §2.7, the Hausman test in §2.6, GMM in Lab 6).

# pdata_dyn: panel-data view of democracy_dyn for plm-family functions.
pdata_dyn <- pdata.frame(democracy_dyn, index = c("country", "year"))

# Static FE — plm version (matches fit_fe to numerical precision). Same spec as
# the feols fit, so the coef vector lines up for the all.equal check and the RE
# fit in the §2.6 Hausman test.
fit_fe_plm        <- plm(ln_gdpw ~ dem * edt_c + oil_shock1 + oil_shock2,
                         data    = pdata_dyn,
                         model   = "within",
                         effect  = "individual")

# Dynamic FE — AR(1).
fit_fe_dyn_plm    <- plm(ln_gdpw ~ ln_gdpw_lag + dem * edt_c + oil_shock1 + oil_shock2,
                         data    = pdata_dyn,
                         model   = "within",
                         effect  = "individual")

# Differenced FE — FD estimator. plm(model = "fd") differences the whole
# equation (incl. the precomputed product dem_x_edt) and reports the intercept
# as the average growth (trend).
fit_fe_fd_plm     <- plm(ln_gdpw ~ dem + edt_c + dem_x_edt,
                         data    = pdata_dyn,
                         model   = "fd")

# Verify the static FE matches feols.  All.equal returns TRUE if within tolerance.
all.equal(unname(coef(fit_fe)),
          unname(coef(fit_fe_plm)))

## [1] TRUE

To get cluster-robust SEs from a plm fit, pass vcovHC(fit, method = "arellano") to coeftest (the same convention as the §2.1.4 SE menu):

coeftest(fit_fe_plm, vcov. = vcovHC(fit_fe_plm, method = "arellano"))

Reading. Across the four columns the within story is stable. Static: dem $\approx 0$, edt_c a solid $0.17$, and the moderation dem × edt $\approx -0.03$ but insignificant — the moderation we saw in pooled OLS does not survive once between-country variation is gone, so it is a between-country pattern (confirmed in §2.5). AR(1) concentrates variance on $y_{i,t-1}$ ($\hat\phi \approx 0.94$) and shrinks the contemporaneous coefficients into short-run effects. Growth (differenced outcome, country FE) and FD (differences everything) both ask differenced-outcome questions and both put democracy — and a regime transition ($\Delta\text{dem}$) — near zero. One quirk to flag: in the Growth column edt_c turns slightly negative, an artifact of putting country FE on a differenced outcome (the country growth-intercepts absorb the secular education trend). The robust message across all four: the within-country correlate of GDP is education, not regime, and the moderation is between-country.

2.3 Two-way fixed effects

The four TWFE specs mirror §2.2, adding year intercepts $\lambda_t$ and keeping the dem × edt interaction throughout.

(1) Static. Country and year each get their own intercept:

\[ y_{it} = \alpha_i + \lambda_t + \beta_1\,\text{dem}_{it} + \beta_2\,\text{edt}_{it} + \beta_3\,(\text{dem}\!\cdot\!\text{edt})_{it} + \varepsilon_{it} \]

(2) AR(1). Add the lagged DV: $\;y_{it} = \alpha_i + \lambda_t + \phi\,y_{i,t-1} + \beta_1\text{dem}_{it} + \beta_2\text{edt}_{it} + \beta_3(\text{dem}\!\cdot\!\text{edt})_{it} + \varepsilon_{it}$.

(3) Growth. Difference the outcome, keep covariates in levels and both FE, no lagged DV: $\;\Delta y_{it} = \alpha_i + \lambda_t + \beta_1\text{dem}_{it} + \beta_2\text{edt}_{it} + \beta_3(\text{dem}\!\cdot\!\text{edt})_{it} + \varepsilon_{it}$.

(4) FD + year. First-difference (removing $\alpha_i$) and add year dummies $\delta_t$ for common shocks:

\[ \Delta y_{it} = \delta_t + \beta_1\,\Delta\text{dem}_{it} + \beta_2\,\Delta\text{edt}_{it} + \beta_3\,\Delta(\text{dem}\!\cdot\!\text{edt})_{it} + \Delta\varepsilon_{it} \]

(plm has no two-way FD, so we fit (4) with feols(... | year).)

Logic. Country FE removes time-invariant country differences; year FE removes shocks common to all countries in the same year. Kropko & Kubinec’s warning applies: the TWFE coefficient is the slope after removing both country and year means — use it only when that comparison matches the research question. As in §2.2, the moderation dem × edt is carried in every column; §2.5 then splits it into within and between parts.

# Country + year FE absorb the time-invariant controls AND the oil-shock dummies
# (collinear with the year FE), so only dem * edt_c remains. Interaction kept.

# (1) Static.
fit_twfe        <- feols(ln_gdpw ~ dem * edt_c | country + year,
                         data = democracy_dyn, cluster = ~ country)

# (2) AR(1): add the lagged outcome.
fit_twfe_dyn    <- feols(ln_gdpw ~ ln_gdpw_lag + dem * edt_c | country + year,
                         data = democracy_dyn, cluster = ~ country)

# (3) Growth — only the outcome differenced, covariates in levels, both FE, no LDV.
fit_twfe_growth <- feols(d_ln_gdpw ~ dem * edt_c | country + year,
                         data = democracy_dyn, cluster = ~ country)

# (4) FD + year — difference the whole equation (removes country FE), year dummies
# absorb common annual shocks. d_dem is a regime transition.
fit_twfe_fd     <- feols(d_ln_gdpw ~ d_dem + d_edt + d_dem_x_edt | year,
                         data = democracy_dyn, cluster = ~ country)

texreg::htmlreg(
  list(fit_twfe, fit_twfe_dyn, fit_twfe_growth, fit_twfe_fd),
  custom.model.names = c("Static", "AR(1)", "Growth", "FD"),
  custom.coef.map = list(
    "ln_gdpw_lag" = "GDP lag (φ)",
    "dem"         = "Democracy",
    "edt_c"       = "Education",
    "dem:edt_c"   = "Democ.×Educ.",
    "d_dem"       = "Δ Democracy",
    "d_edt"       = "Δ Education",
    "d_dem_x_edt" = "Δ (Democ.×Educ.)"
  ),
  digits = 3,
  doctype = FALSE
)

Statistical models
	Static	AR(1)	Growth	FD
GDP lag (φ)		0.927^***
		(0.011)
Democracy	0.024	0.001	-0.001
	(0.030)	(0.006)	(0.006)
Education	0.030	-0.003	-0.006^*
	(0.022)	(0.003)	(0.003)
Democ.×Educ.	-0.015	0.000	0.001
	(0.016)	(0.003)	(0.002)
Δ Democracy				0.021
				(0.020)
Δ Education				0.003
				(0.006)
Δ (Democ.×Educ.)				-0.002
				(0.004)
Num. obs.	2900	2847	2847	2787
Num. groups: country	113	113	113
Num. groups: year	28	28	28	27
R² (full model)	0.980	0.997	0.137	0.068
R² (proj model)	0.010	0.847	0.002	0.001
Adj. R² (full model)	0.979	0.997	0.092	0.058
Adj. R² (proj model)	0.009	0.847	0.001	-0.000
^*p < 0.001; ^p < 0.01; ^*p < 0.05

Reading. The year intercepts $\lambda_t$ absorb global shocks (oil crises, financial cycles) on top of the country FE, so the static TWFE coefficients are the most heavily conditioned in the lab — dem, edt, and their interaction all come out small and mostly insignificant. The AR(1) column adds the lagged DV; Growth and FD repeat the exercise on a differenced outcome (with the year dummies standing in for the oil-shock controls used elsewhere). The takeaway matches §2.2: once both country and year variation are removed, neither democracy nor its moderation by education moves GDP within countries.

plm parallel for TWFE. The two level specs in plm form — useful again whenever you need the plm-only tools.

fit_twfe_plm        <- plm(ln_gdpw ~ dem * edt_c,
                           data    = pdata_dyn,
                           model   = "within",
                           effect  = "twoways")

fit_twfe_dyn_plm    <- plm(ln_gdpw ~ ln_gdpw_lag + dem * edt_c,
                           data    = pdata_dyn,
                           model   = "within",
                           effect  = "twoways")

# Note: there is no two-way first-difference in plm (effect = "twoways" is
# undefined for model = "fd"), so the FD + year spec is fit only with feols above.

# Verify the static TWFE matches feols.
all.equal(unname(coef(fit_twfe)),
          unname(coef(fit_twfe_plm)))

## [1] TRUE

2.4 Random effects

Tool box — `nlme::lme()` and `corARMA()`

For RE (here) and Mundlak/REWB (§2.5) we use nlme::lme() rather than lme4::lmer() because nlme accepts a correlation = argument and lme4 does not. We need that argument to put an AR(1) structure on the residuals (the corARMA(1) column).

lme(
  fixed       = y ~ regressors + time-invariant-controls,
  random      = ~ 1 | unit,                          # random intercept per unit
  correlation = corARMA(form = ~ time | unit,
                        p = 1, q = 0),               # AR(1) on residuals
  data        = panel_df,
  na.action   = na.omit,
  method      = "REML"                                # "REML" or "ML"
)

fixed = — the population-level mean equation. All fixed effects (covariates we want coefficients for) go here. Time-invariant controls like oil are estimable in RE/REWB because we don’t sweep out the unit mean.
random = ~ 1 | unit — one random intercept per unit. ~ 1 means intercept only; for random slopes you’d write e.g. ~ 1 + treatment | unit. We only need the intercept here.
correlation = corARMA(form = ~ time | unit, p = 1, q = 0) — applies an ARMA(p, q) structure to the residuals (not the conditional mean). p = 1, q = 0 is AR(1); p = 0, q = 1 would be MA(1). form = ~ time | unit tells nlme to sort by time within each unit when building the AR(1) — critical for unbalanced panels. Omit correlation entirely for the static column.
method = "REML" — Restricted Maximum Likelihood (the default). REML gives less biased variance components for small samples; ML is asymptotically equivalent and needed if you plan to do likelihood-ratio tests on fixed-effects structures (we don’t here).

After fitting, summary(fit) prints the fixed-effect table, the residual variance, and the AR(1) coefficient on the residuals. Extract the AR(1) coefficient with coef(fit$modelStruct$corStruct, unconstrained = FALSE). Note this $\hat\phi$ is NOT a coefficient on $y_{t-1}$ in the mean equation — it is correlation in the residual process. The mean structure of an RE-corARMA(1) fit is static; the dynamics live in $u_{it}$.

Partial pooling — RE as regularized FE

Before the main fits, a visual to fix intuition: what does RE actually do differently from FE? The cleanest way to see this is to fit a null version of each (no covariates — just intercepts) and compare the per-country intercept each estimator gives.

Null FE is $y_{it} = \alpha_i + \varepsilon_{it}$. The estimator is $\hat\alpha_i^{\text{FE}} = \bar y_i$ — the country’s own sample mean. Every country is judged on its own data only; the cross-country prior is implicit and uniform.
Null RE is $y_{it} = \mu + u_i + \varepsilon_{it}$, $u_i \sim N(0,\sigma_\alpha^2)$. The estimator returns the population mean $\hat\mu$ and a BLUP (best linear unbiased predictor) $\hat u_i$ for each country. The country’s RE intercept is $\hat\alpha_i^{\text{RE}} = \hat\mu + \hat u_i$, which by construction lies between $\bar y_i$ and $\hat\mu$ — partially pooled toward the population mean.

The amount of pooling depends on the within-country precision: countries with more observations (or less within-country noise) get pulled less; countries with little data get pulled more. This is regularization — exactly the same idea as ridge or LASSO, applied to group means.

Rather than plot 135 tiny intercepts, we summarize the amount of pooling with a single number — the partial-pooling weight:

\[ \theta \;=\; 1 - \sqrt{\frac{\sigma^2_\varepsilon}{\sigma^2_\varepsilon + T\,\sigma^2_\alpha}}, \]

where $\sigma^2_\alpha$ is the variance of the country random intercepts and $\sigma^2_\varepsilon$ the residual variance. RE is GLS on the quasi-demeaned outcome $y_{it} - \theta\,\bar y_i$, so $\theta$ reads directly as how far toward FE the random-effects fit sits:

$\theta \to 0$: the between-country variance is small relative to the noise ($\sigma^2_\alpha$ small, or $T$ short) — RE barely pools and behaves like pooled OLS.
$\theta \to 1$: country effects dominate ($\sigma^2_\alpha$ large, long panels) — RE pools heavily and behaves like fixed effects.

We compute $\theta$ from each fitted model’s variance components (walked through once below) and report it in every RE / Mundlak table. Watch it move when we switch from levels to a differenced outcome: in levels the country intercepts carry enormous variance, so $\theta \approx 1$ (RE $\approx$ FE); differencing removes that between-country level variance, collapsing $\theta$ toward 0 (RE $\approx$ pooled). The same data can put RE at either end of the FE-to-pooled spectrum, depending only on the outcome.

When is RE-style regularization preferable to FE?

Many groups, few obs per group ($N$ large, $T$ small). Each country’s mean is noisy; pooling toward the population mean reduces variance more than it introduces bias. Standard win for RE BLUPs.
You believe country effects are exchangeable draws from a population. This is the RE identifying assumption: $u_i$ uncorrelated with $X_{it}$. When it holds, RE is more efficient than FE.
You want estimates for individual groups, not just $\beta$. RE’s BLUPs are optimal under squared-error loss when the model is correct. FE country dummies are unbiased but high-variance.
You need to estimate time-invariant covariates (here, oil). RE/Mundlak can; FE cannot.

When is FE preferable?

You suspect $u_i$ is correlated with $X_{it}$. This is the central FE motivation: drop the RE assumption, eat the variance cost. The §1.1 DAG on this panel says $\text{oil}$ and deep historical institutions plausibly correlate with both dem and edt, so FE is the conservative default for $\beta$.
You want fixed-sample inference about specific groups, not population-level inference. FE country dummies give you each country’s intercept without a distributional prior.
Your group count is small (RE shrinkage requires enough groups for $\sigma_\alpha^2$ to be well-estimated).

The middle path. Mundlak (1978) / Bell-Jones (2015) REWB in §2.5 returns BOTH the within slope (FE-style identification) and the between slope (RE-style partial pooling), with a testable equality — best of both worlds.

Specs we fit

Equation (static). Country intercepts are random draws from a common population:

\[ y_{it} = \alpha + \beta_1\,\text{dem}_{it} + \beta_2\,\text{edt}_{it} + \beta_3\,(\text{dem}_{it}\!\cdot\!\text{edt}_{it}) + \beta_4\,\text{oil}_i + (u_i + \varepsilon_{it}),\qquad u_i \sim \mathcal{N}(0,\sigma_\alpha^2) \]

Logic. $\hat\beta$ is a matrix-weighted average of within and between estimators, with weight

\[ \theta = 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\,\sigma_\alpha^2}} \]

— so $\theta=0$ collapses to pooled OLS and $\theta=1$ collapses to FE. RE is partial pooling toward the population mean. Two key features: RE estimates the coefficient on oil (FE cannot, since oil is time-invariant), at the cost of needing $u_i \perp X_i$ — which the §1.1 DAG says is suspect.

We fit three versions:

Static (levels): $y_{it} = \alpha + X_{it}\beta + (u_i + \varepsilon_{it})$. No dynamics in the conditional mean.
AR(1) errors (levels): $y_{it} = \alpha + X_{it}\beta + u_{it}$ with $u_{it} = \rho u_{i,t-1} + \varepsilon_{it}$ (corARMA(1)). Dynamic coefficient on the residuals; $\beta$ is already long-run by construction.
Growth: difference the outcome, $\Delta y_{it} = \alpha + X_{it}\beta + u_{it}$, always with AR(1) errors. This is the differenced route for RE — but since lme has no cluster-SE option, we carry corARMA(1) here to absorb the residual serial correlation that cluster SEs handle in the FE / pooled growth models.

Why no LDV in RE here. A lagged DV in an RE model carries the Wooldridge (2005) initial-conditions issue: $y_{i,0}$ is correlated with the random intercept $u_i$, so the lagged DV is not strictly exogenous in the RE framework. The methodologically clean fix is Wooldridge’s correlated-random-effects setup or Arellano-Bond GMM (Lab 6). For RE here, corARMA(1) handles the persistence in the residuals without disturbing the random-intercept structure — the natural dynamic version, just like for Mundlak/REWB in §2.5.

# Static RE.  We include time_id (a linear trend) so the population mean is
# not asked to absorb the long-run upward drift in log GDP; otherwise the
# country random intercepts and the residuals end up doing that job, and
# `corARMA(1)` below picks up trend remnants rather than genuine persistence.
fit_re         <- lme(
  fixed     = ln_gdpw ~ time_id + dem * edt_c + newc + elf60 + region +
                        oil + oil_shock1 + oil_shock2,
  random    = ~ 1 | country,
  data      = democracy_dyn,
  na.action = na.omit,
  method    = "REML"
)

# RE — AR(1) errors: AR(1) on the residuals, no lagged DV. Same linear trend.
fit_re_corar   <- lme(
  fixed       = ln_gdpw ~ time_id + dem * edt_c + newc + elf60 + region +
                          oil + oil_shock1 + oil_shock2,
  random      = ~ 1 | country,
  correlation = corARMA(form = ~ year | country, p = 1, q = 0),
  data        = democracy_dyn,
  na.action   = na.omit,
  method      = "REML"
)

# RE — growth: differenced outcome, always with AR(1) errors (lme has no
# cluster-SE option). No trend term — differencing already removes it.
fit_re_growth  <- lme(
  fixed       = d_ln_gdpw ~ dem * edt_c + newc + elf60 + region +
                            oil + oil_shock1 + oil_shock2,
  random      = ~ 1 | country,
  correlation = corARMA(form = ~ year | country, p = 1, q = 0),
  data        = democracy_dyn,
  na.action   = na.omit,
  method      = "REML"
)

We compute the partial-pooling weight $\theta$ once here, then report it for every column. VarCorr() returns a character matrix for lme objects, so each variance is coerced with as.numeric().

# Step 1: average panel length (unbalanced, so use the mean number of years).
T_bar <- democracy_dyn |>
  drop_na(ln_gdpw) |>
  count(country) |>
  pull(n) |>
  mean()

# Step 2: pull the two variance components from the static RE fit.
vc           <- VarCorr(fit_re)
sigma2_alpha <- as.numeric(vc["(Intercept)", "Variance"])  # between-country variance
sigma2_eps   <- as.numeric(vc["Residual",    "Variance"])  # residual variance

# Step 3: plug into theta = 1 - sqrt( sigma2_eps / (sigma2_eps + T * sigma2_alpha) ).
theta_static <- 1 - sqrt(sigma2_eps / (sigma2_eps + T_bar * sigma2_alpha))

sprintf("sigma2_alpha = %.3f,  sigma2_eps = %.3f,  T_bar = %.1f  ->  theta = %.3f",
        sigma2_alpha, sigma2_eps, T_bar, theta_static)

## [1] "sigma2_alpha = 0.369,  sigma2_eps = 0.024,  T_bar = 30.6  ->  theta = 0.954"

# A small helper repeats those three steps for the other columns.
pp_weight <- function(fit) {
  vc  <- VarCorr(fit)
  s2a <- as.numeric(vc["(Intercept)", "Variance"])
  s2e <- as.numeric(vc["Residual",    "Variance"])
  round(1 - sqrt(s2e / (s2e + T_bar * s2a)), 3)
}

texreg::htmlreg(
  list(fit_re, fit_re_corar, fit_re_growth),
  custom.model.names = c("Static", "AR(1) errors", "Growth"),
  custom.gof.rows = list("Pooling weight θ" =
                           c(pp_weight(fit_re), pp_weight(fit_re_corar),
                             pp_weight(fit_re_growth))),
  custom.coef.map = list(
    "(Intercept)"  = "(Intercept)",
    "time_id"      = "Trend",
    "dem"          = "Democracy",
    "edt_c"        = "Education",
    "dem:edt_c"    = "Democ.×Educ.",
    "newc"         = "New state",
    "elf60"        = "Ethnic frac.",
    "regionAsia"   = "Region: Asia",
    "regionEurope" = "Region: Europe",
    "regionNorth and Central America" = "Region: N&C America",
    "regionOceania" = "Region: Oceania",
    "regionSouth America" = "Region: S. America",
    "oil"          = "Oil exporter",
    "oil_shock1"   = "1st oil shock",
    "oil_shock2"   = "2nd oil shock"
  ),
  digits = 3,
  doctype = FALSE
)

Statistical models
	Static	AR(1) errors	Growth
(Intercept)	7.747^***	7.623^***	0.024^***
	(0.240)	(0.251)	(0.006)
Trend	0.018^***	0.020^***
	(0.001)	(0.001)
Democracy	-0.007	0.004	-0.001
	(0.014)	(0.008)	(0.004)
Education	0.039^***	0.012	-0.002
	(0.007)	(0.007)	(0.001)
Democ.×Educ.	-0.012^*	-0.003	0.001
	(0.005)	(0.004)	(0.001)
New state	-0.054	-0.064	-0.006
	(0.182)	(0.190)	(0.004)
Ethnic frac.	-0.381	-0.401	-0.013^*
	(0.260)	(0.272)	(0.006)
Region: Asia	0.580^**	0.654^**	0.025^***
	(0.192)	(0.200)	(0.005)
Region: Europe	1.556^***	1.634^***	0.015^*
	(0.252)	(0.262)	(0.007)
Region: N&C America	0.988^***	1.009^***	-0.003
	(0.241)	(0.251)	(0.006)
Region: Oceania	1.475^***	1.653^***	0.001
	(0.384)	(0.402)	(0.011)
Region: S. America	0.930^***	0.965^***	-0.005
	(0.265)	(0.276)	(0.007)
Oil exporter	0.569^**	0.544^*	0.009^*
	(0.203)	(0.212)	(0.005)
1st oil shock	0.074^***	-0.002	-0.000
	(0.010)	(0.004)	(0.004)
2nd oil shock	0.055^***	0.013^**	-0.015^***
	(0.009)	(0.004)	(0.003)
Pooling weight θ	0.954	0.000	0.106
AIC	-1637.624	-6859.624	-7298.686
BIC	-1537.728	-6753.851	-7199.069
Log Likelihood	835.812	3447.812	3666.343
Num. obs.	2649	2649	2605
Num. groups: country	101	101	101
^*p < 0.001; ^p < 0.01; ^*p < 0.05

Reading. The new Pooling weight θ row is the headline. In the two levels columns $\theta \approx 0.97$ — the country intercepts carry so much variance that RE is almost identical to FE. In the Growth column $\theta$ collapses to $\approx 0.19$: differencing removes the between-country level variance, leaving little for the random intercept to pool, so RE there behaves much more like pooled OLS. Same data, opposite ends of the FE-to-pooled spectrum — depending only on whether the outcome is a level or a difference. The coefficients echo the FE story (dem near zero, the moderation insignificant within), but now you can read how much of each fit is within-driven straight off the table. The AR(1)-errors column widens SEs without moving the within-between mixture much; the growth column carries AR(1) errors by construction. The §3.6 IRF still draws its dynamic story from the AR(1)-LDV FE model.

2.5 Mundlak / REWB — within and between, separately

Tool box — `parameters::demean()`

Mundlak/REWB requires every time-varying regressor to be split into a within component (the country-year deviation from the country mean) and a between component (the country mean itself). Doing this by hand for each variable is tedious; parameters::demean() does it in one call:

df_split <- df |>
  parameters::demean(
    select = c("dem", "edt", "dem_x_edt"),   # variables to split
    by     = "country",                       # grouping for the means
    verbose = FALSE
  )

For each variable in select, demean() adds two columns:

<var>_within = $x_{it} - \bar x_i$ (country-year deviation from the country mean)
<var>_between = $\bar x_i$ (country mean, repeated for each year of that country)

You then drop the original <var> from your formula and use the two new columns in its place. Country means are repeated across all rows of the same country, so <var>_between is constant within unit (effectively a “Mundlak addendum” to the standard FE specification).

Why we precompute dem_x_edt first. Interactions in panel data must be demeaned as a single product, not assembled from demeaned parts:

\[ \overline{\text{dem}\cdot\text{edt}}_i \;\ne\; \overline{\text{dem}}_i \cdot \overline{\text{edt}}_i \]

The two are equal only when dem and edt are uncorrelated within unit — which they are not. So we add dem_x_edt <- dem * edt to the frame first, then pass it to demean() alongside dem and edt. This produces dem_x_edt_within and dem_x_edt_between that correctly represent the within and between parts of the moderator.

Time-invariant controls. oil does not need to be split — it has no within variation. We add it to the formula as is. It will be identified off the between dimension only, which is exactly the Mundlak/REWB convention (and also why FE cannot estimate the oil coefficient).

After the split, we pass the new columns to nlme::lme() using the same syntax as §2.4 — the only difference is more covariates in the fixed = ... formula.

Specs we fit

Equation (static). Decompose every time-varying regressor — including the interaction dem × edt — into within and between components, and put both in the model with separate coefficients:

\[ y_{it} = \alpha + \beta_W^{\text{dem}}\,\text{dem}^{W}_{it} + \beta_B^{\text{dem}}\,\overline{\text{dem}}_i + \beta_W^{\text{edt}}\,\text{edt}^{W}_{it} + \beta_B^{\text{edt}}\,\overline{\text{edt}}_i + \gamma_W\,(\text{dem}\!\cdot\!\text{edt})^{W}_{it} + \gamma_B\,\overline{(\text{dem}\!\cdot\!\text{edt})}_i + \beta_4\,\text{oil}_i + (u_i + \varepsilon_{it}) \]

where $x^{W}_{it} = x_{it} - \bar x_i$ and we treat $\text{dem}\cdot\text{edt}$ as a single variable to be demeaned, then split.

Equation (dynamic — corARMA(1)). Same equation; AR(1) on the residuals, $\dots + u_{it}$ with $u_{it} = \rho u_{i,t-1} + \varepsilon_{it}$. We also fit a growth column — the same within/between decomposition with a differenced outcome $\Delta y_{it}$, again carrying corARMA(1) for the residual serial correlation.

Logic. This is Mundlak (1978), repackaged by Bell & Jones (2015) as REWB (random effects within-between). The within-mean coefficient $\beta_W$ picks up the FE-style identification; the between-mean coefficient $\beta_B$ picks up the cross-sectional contrast that pure RE was using implicitly. We see both, and $\beta_W = \beta_B$ is testable (Hausman / Mundlak diagnostic in §2.6). Bell & Jones argue REWB should be the default — it returns more information than FE or RE alone.

Why no LDV here. Adding $y_{i,t-1}$ to the within-between decomposition either (a) imports a between component into a model whose whole point is the within-between split, or (b) creates mechanical collinearity if you decompose the lag itself ($\bar y_i$ is the empirical analog of $u_i$). corARMA absorbs persistence in the residuals without disturbing the decomposition, so it is the natural dynamic version for REWB.

# parameters::demean splits each time-varying regressor into _within and _between.
# Because `dem × edt` involves two time-varying regressors, we precompute the
# product `dem_x_edt` at the country-year level, then demean it as a single
# variable. `oil` is time-invariant — no within variation — so it enters as a
# single between-only term.
# We build from democracy_dyn so the differenced outcome d_ln_gdpw is available
# for the growth column; the demean of dem / edt / dem_x_edt is unchanged.
mund <- democracy_dyn |>
  drop_na(ln_gdpw, dem, edt, oil, newc, elf60) |>
  mutate(dem_x_edt = dem * edt) |>
  parameters::demean(select = c("dem", "edt", "dem_x_edt"),
                     by = "country", verbose = FALSE)

# Static REWB: within + between coefficients estimated separately,
# RE intercept retained.
fit_mundlak       <- lme(
  fixed     = ln_gdpw ~ dem_within       + dem_between +
                       edt_within       + edt_between +
                       dem_x_edt_within + dem_x_edt_between +
                       newc + elf60 + region +
                       oil + oil_shock1 + oil_shock2,
  random    = ~ 1 | country,
  data      = mund,
  na.action = na.omit,
  method    = "REML"
)

# Dynamic REWB via corARMA(1). Persistence absorbed in the residuals,
# within/between decomposition preserved.
fit_mundlak_corar <- lme(
  fixed       = ln_gdpw ~ dem_within       + dem_between +
                         edt_within       + edt_between +
                         dem_x_edt_within + dem_x_edt_between +
                         newc + elf60 + region +
                         oil + oil_shock1 + oil_shock2,
  random      = ~ 1 | country,
  correlation = corARMA(form = ~ year | country, p = 1, q = 0),
  data        = mund,
  na.action   = na.omit,
  method      = "REML"
)

# Growth REWB: differenced outcome, within/between split preserved, always with
# AR(1) errors (lme has no cluster-SE option).
fit_mundlak_growth <- lme(
  fixed       = d_ln_gdpw ~ dem_within       + dem_between +
                           edt_within       + edt_between +
                           dem_x_edt_within + dem_x_edt_between +
                           newc + elf60 + region +
                           oil + oil_shock1 + oil_shock2,
  random      = ~ 1 | country,
  correlation = corARMA(form = ~ year | country, p = 1, q = 0),
  data        = mund,
  na.action   = na.omit,
  method      = "REML"
)

texreg::htmlreg(
  list(fit_mundlak, fit_mundlak_corar, fit_mundlak_growth),
  custom.model.names = c("Static", "AR(1) errors", "Growth"),
  custom.gof.rows = list("Pooling weight θ" =
                           c(pp_weight(fit_mundlak), pp_weight(fit_mundlak_corar),
                             pp_weight(fit_mundlak_growth))),
  custom.coef.map = list(
    "(Intercept)"        = "(Intercept)",
    "dem_within"         = "Democracy (within)",
    "dem_between"        = "Democracy (between)",
    "edt_within"         = "Education (within)",
    "edt_between"        = "Education (between)",
    "dem_x_edt_within"   = "Democ.×Educ. (within)",
    "dem_x_edt_between"  = "Democ.×Educ. (between)",
    "newc"               = "New state",
    "elf60"              = "Ethnic frac.",
    "regionAsia"         = "Region: Asia",
    "regionEurope"       = "Region: Europe",
    "regionNorth and Central America" = "Region: N&C America",
    "regionOceania"      = "Region: Oceania",
    "regionSouth America" = "Region: S. America",
    "oil"                = "Oil exporter",
    "oil_shock1"         = "1st oil shock",
    "oil_shock2"         = "2nd oil shock"
  ),
  digits = 3,
  doctype = FALSE
)

Statistical models
	Static	AR(1) errors	Growth
(Intercept)	7.404^***	7.391^***	0.021^***
	(0.220)	(0.239)	(0.006)
Democracy (within)	0.116^***	0.051^**	-0.014
	(0.030)	(0.019)	(0.011)
Democracy (between)	0.516	0.469	0.005
	(0.360)	(0.392)	(0.010)
Education (within)	0.171^***	0.050^***	-0.013^***
	(0.005)	(0.006)	(0.002)
Education (between)	0.226^***	0.214^***	0.003^*
	(0.044)	(0.048)	(0.001)
Democ.×Educ. (within)	-0.027^***	-0.011^**	0.003
	(0.006)	(0.004)	(0.002)
Democ.×Educ. (between)	-0.056	-0.037	-0.002
	(0.057)	(0.062)	(0.001)
New state	-0.166	-0.179	-0.005
	(0.149)	(0.162)	(0.004)
Ethnic frac.	-0.311	-0.318	-0.013^*
	(0.211)	(0.229)	(0.005)
Region: Asia	0.140	0.172	0.015^**
	(0.184)	(0.200)	(0.005)
Region: Europe	0.548^*	0.510	0.004
	(0.274)	(0.298)	(0.007)
Region: N&C America	0.312	0.266	-0.011
	(0.233)	(0.253)	(0.006)
Region: Oceania	0.343	0.408	-0.013
	(0.366)	(0.399)	(0.010)
Region: S. America	0.385	0.381	-0.015^*
	(0.260)	(0.282)	(0.007)
Oil exporter	0.566^***	0.538^**	0.009^*
	(0.164)	(0.178)	(0.004)
1st oil shock	0.087^***	-0.002	0.000
	(0.011)	(0.004)	(0.004)
2nd oil shock	0.094^***	0.012^**	-0.007^*
	(0.010)	(0.004)	(0.003)
Pooling weight θ	0.935	0.000	0.030
AIC	-1079.748	-6660.811	-7334.468
BIC	-968.114	-6543.301	-7217.295
Log Likelihood	558.874	3350.406	3687.234
Num. obs.	2649	2649	2605
Num. groups: country	101	101	101
^*p < 0.001; ^p < 0.01; ^*p < 0.05

Reading. Six coefficients on the regime/education block, instead of three:

dem_within / dem_between — within-country association of becoming democratic vs the cross-country slope.
edt_within / edt_between — within-country effect of rising education vs the cross-country slope.
dem_x_edt_within / dem_x_edt_between — does the moderation by education operate within countries (rising education changes the democracy association over time within the same country) or between (more-educated countries have a different democracy-growth slope on average)?

Time-invariant oil retains its main effect (RE’s flexibility, FE’s shortcoming). For our research question — the within association of democratization and its moderation by education — dem_within and dem_x_edt_within are the relevant estimates. The Pooling weight θ row tells the same story as §2.4: near 1 for the level columns (REWB sits close to FE on the within block), far lower in the growth column once differencing strips the between-country level variance. Adding corARMA(1) widens SEs but should not move the within-between split materially.

2.6 Diagnostic tests — does within agree with between?

Two formal tests targeting the same hypothesis:

# Mundlak Wald: jointly tests dem_within = dem_between, and similarly for the
# moderation and edt. The contrast matrix L picks out (within - between) for
# each regressor; W = (Lb)' (LVL')^{-1} (Lb) is chi-square under H0.
b <- fixef(fit_mundlak)
V <- vcov(fit_mundlak)

# Build the contrast matrix row by row. Each row subtracts a between
# coefficient from the matching within coefficient.
L <- matrix(0, nrow = 3, ncol = length(b))
colnames(L) <- names(b)
rownames(L) <- c("dem: within - between",
                 "edt: within - between",
                 "dem_x_edt: within - between")

L["dem: within - between", "dem_within"] <- 1
L["dem: within - between", "dem_between"] <- -1

L["edt: within - between", "edt_within"] <- 1
L["edt: within - between", "edt_between"] <- -1

L["dem_x_edt: within - between", "dem_x_edt_within"] <- 1
L["dem_x_edt: within - between", "dem_x_edt_between"] <- -1

delta <- L %*% b
W     <- t(delta) %*% solve(L %*% V %*% t(L)) %*% delta
sprintf("Mundlak Wald = %.2f, df = %d, p = %.3f",
        W, nrow(L), 1 - pchisq(W, df = nrow(L)))

## [1] "Mundlak Wald = 6.25, df = 3, p = 0.100"

# Classical Hausman: compares within and RE coefficients on the time-varying
# regressors. fit_re_plm matches the FE spec (dem * edt_c + oil-shock dummies),
# so the test compares like-with-like. We reuse fit_fe_plm from §2.2 and define
# a matching random-effects fit here for the test.
fit_re_plm <- plm(ln_gdpw ~ dem * edt_c + oil_shock1 + oil_shock2,
                  data   = pdata_dyn,
                  model  = "random")
phtest(fit_fe_plm, fit_re_plm)

## 
##  Hausman Test
## 
## data:  ln_gdpw ~ dem * edt_c + oil_shock1 + oil_shock2
## chisq = 50.512, df = 5, p-value = 1.089e-09
## alternative hypothesis: one model is inconsistent

The Mundlak Wald is more granular — it tests each within-vs-between contrast individually. Hausman is the classical aggregate version. They should agree in direction.

Important caveat (Bell & Jones 2015). Do not use Hausman as a gatekeeper that “picks” FE vs RE. Even when it rejects, REWB is often the better choice because it returns both within and between estimates. Treat the test as a diagnostic — does my within story differ from my between story? — not a model-selection switch.

2.7 Standard errors when residuals are not iid

OLS / pooled / FE point estimates are unbiased under exogeneity even if errors are autocorrelated and cross-sectionally dependent. The standard errors are not. Lab 1 §3 covered cluster-robust SEs in detail; this section adds the two panel-specific corrections we need here.

When to use which panel SE.
SE	Allows	When
Cluster-robust by unit	arbitrary within-country correlation	many clusters (≥30-50), residuals roughly independent across countries
PCSE (Beck & Katz 1995)	contemporaneous cross-sectional correlation	many countries, TSCS data, residuals correlated across countries in the same year
Driscoll–Kraay (1998)	cross-sectional + serial correlation, any pattern	moderate-to-large T; the safest default when both pathologies are present

Cluster-robust SEs assume independence across clusters, which is fine for classrooms and survey strata but will under-state uncertainty in a TSCS panel where macroeconomic shocks hit many countries in the same year. PCSE corrects for that contemporaneous cross-country dependence; Driscoll–Kraay then layers serial correlation on top.

# vcovBK and vcovSCC require a plm object; use fit_fe_plm fitted in §2.2.
ses <- list(
  cluster  = vcovHC(fit_fe_plm, method = "arellano"),   # cluster by unit
  pcse     = vcovBK(fit_fe_plm),                         # Beck-Katz PCSE
  driscoll = vcovSCC(fit_fe_plm)                         # Driscoll-Kraay
)
target <- c("dem", "edt_c", "dem:edt_c")

se_cluster  <- coeftest(fit_fe_plm, vcov. = ses$cluster )[target, 2]
se_pcse     <- coeftest(fit_fe_plm, vcov. = ses$pcse    )[target, 2]
se_driscoll <- coeftest(fit_fe_plm, vcov. = ses$driscoll)[target, 2]

tibble(coefficient = target,
       cluster     = se_cluster,
       pcse        = se_pcse,
       driscoll    = se_driscoll) |>
  kable(digits = 4,
        caption = "Standard errors on the within-FE estimates of democracy, education, and their interaction under three corrections.") |>
  kable_styling(full_width = FALSE)

Standard errors on the within-FE estimates of democracy, education, and their interaction under three corrections.
coefficient	cluster	pcse	driscoll
dem	0.0360	0.0388	0.0244
edt_c	0.0173	0.0160	0.0182
dem:edt_c	0.0185	0.0170	0.0074

PCSE and Driscoll–Kraay are typically a notch larger than the cluster-robust SE here, exactly because they admit cross-country dependence that cluster-by-unit assumes away.

2.8 Robustness — does the conclusion travel across specifications?

We close Part 2 with a comparison table of the static specifications:

Static specs. What the §1.1 within-vs-between framing looks like before any dynamic correction. The pure cross-country contrast (Pooled OLS) versus the pure within contrast (FE) versus the partial-pooling (RE) versus the explicit decomposition (Mundlak / REWB).

texreg::htmlreg(
  list(fit_pool, fit_fe, fit_twfe, fit_re, fit_mundlak),
  custom.model.names = c("Pooled", "FE", "TWFE", "RE", "Mundlak"),
  custom.coef.map    = list(
    dem                  = "Democracy",
    edt_c                = "Education",
    "dem:edt_c"          = "Democ.×Educ.",
    dem_within           = "Democracy (within)",
    dem_between          = "Democracy (between)",
    edt_within           = "Education (within)",
    edt_between          = "Education (between)",
    dem_x_edt_within     = "Democ.×Educ. (within)",
    dem_x_edt_between    = "Democ.×Educ. (between)",
    newc                 = "New state",
    elf60                = "Ethnic frac.",
    oil                  = "Oil exporter",
    oil_shock1           = "1st oil shock",
    oil_shock2           = "2nd oil shock"
  ),
  digits = 3,
  doctype = FALSE,
  caption = "Static specifications. Region dummies and trend estimated but omitted from the display. FE/TWFE absorb all time-invariant controls."
)

Static specifications. Region dummies and trend estimated but omitted from the display. FE/TWFE absorb all time-invariant controls.
	Pooled	FE	TWFE	RE	Mundlak
Democracy	0.126^***	-0.013	0.024	-0.007
	(0.029)	(0.036)	(0.030)	(0.014)
Education	0.205^***	0.168^***	0.030	0.039^***
	(0.008)	(0.017)	(0.022)	(0.007)
Democ.×Educ.	-0.032^***	-0.026	-0.015	-0.012^*
	(0.009)	(0.019)	(0.016)	(0.005)
Democracy (within)					0.116^***
					(0.030)
Democracy (between)					0.516
					(0.360)
Education (within)					0.171^***
					(0.005)
Education (between)					0.226^***
					(0.044)
Democ.×Educ. (within)					-0.027^***
					(0.006)
Democ.×Educ. (between)					-0.056
					(0.057)
New state	-0.098^**			-0.054	-0.166
	(0.031)			(0.182)	(0.149)
Ethnic frac.	-0.345^***			-0.381	-0.311
	(0.041)			(0.260)	(0.211)
Oil exporter	0.585^***			0.569^**	0.566^***
	(0.033)			(0.203)	(0.164)
1st oil shock	0.083^**	0.085^***		0.074^***	0.087^***
	(0.031)	(0.010)		(0.010)	(0.011)
2nd oil shock	0.023	0.093^***		0.055^***	0.094^***
	(0.029)	(0.010)		(0.009)	(0.010)
R²	0.781
Adj. R²	0.780
Num. obs.	2649	2900	2900	2649	2649
Num. groups: country		113	113	101	101
R² (full model)		0.972	0.980
R² (proj model)		0.456	0.010
Adj. R² (full model)		0.971	0.979
Adj. R² (proj model)		0.455	0.009
Num. groups: year			28
AIC				-1637.624	-1079.748
BIC				-1537.728	-968.114
Log Likelihood				835.812	558.874
^*p < 0.001; ^p < 0.01; ^*p < 0.05

Headline reading.

Static table. Compare the pooled-OLS dem × edt to its FE counterpart: if the moderation collapses under FE, it was driven by between-country variation (more-educated countries differ from less-educated countries in their average democracy-growth slope). If it survives, the moderation is a within-country phenomenon (rising education within a country changes the democracy association). The Mundlak / REWB split (dem × edt (within) vs dem × edt (between)) makes which story is doing the work explicit.
For our research question — the within effect of democratization, conditional on education — the within-block estimates are the relevant ones. The §3.5 prediction below reads the within-education effect off the Mundlak model separately for a democracy and a dictatorship, making the moderation visible.

This robustness pattern — “the within story is consistent across FE/TWFE/REWB; the between story is visible in pooled OLS and REWB’s between block” — is exactly why Kropko & Kubinec (2020) tell us to state the comparison our coefficient is making, rather than picking one estimator and pretending the others were wrong.

Part 3 — Counterfactual prediction with fixed effects

Lab 4 introduced parametric simulation for a single time series. Part 3 carries the same King, Tomz, and Wittenberg (2000) logic into the panel, with three quantities of interest that exercise the estimators from Part 2.

The first is a forecast from the FE growth model (§3.2). Here simcf::ldvsimev() with transform = "diff" turns predicted growth into a level path, so the analysis can project a country’s GDP under a permanent rise in education. The second is a forecast from the RE growth model with AR(1) errors (§3.3), which shows how residual dynamics enter ldvsimev() through the common-factor representation. The third averages those forecasts across the countries with complete forecast anchors (§3.4). The fourth is a within-education prediction from the Mundlak model (§3.5), which isolates the within education effect and reads it off separately for a democracy and a dictatorship. The fifth is an impulse response function from the AR(1) fixed-effects model (§3.6), which traces how a permanent regime change propagates over time through the lagged outcome.

The first pairing is deliberate. The FE growth forecast inherits the small, negative within-education coefficient from §2.2, the country-trend artifact, and so understates education’s payoff. The RE growth forecast keeps the same differenced-outcome idea but adds an AR(1) process in the residuals. The Mundlak decomposition then recovers the credible positive within effect, which is part of why Bell and Jones (2015) recommend REWB over a bare growth-FE.

Each quantity follows the same recipe: build a counterfactual or shock, draw coefficient vectors from the fitted model, propagate each draw through the model’s prediction formula, and summarize the simulated distribution into a point estimate and an interval. The steps below are written out one chunk at a time so each object is visible before it is used.

3.1 Setup: a focus country and its baseline

Both predictions anchor on a concrete country. The analysis picks an always-autocratic country with the longest observed history, which gives a clean baseline (dem = 0 throughout), a well-defined last observed log GDP, and a known education level to perturb.

# Reference points for the search: the last calendar year in the data, and the
# length of the longest country panel.
last_year <- max(democracy$year)              # final year observed in the panel
maxT      <- max(table(democracy$country))    # number of years in the longest panel

# Keep only countries that (i) are autocratic in every year, (ii) are observed
# for the full length maxT, (iii) have no missing ln_gdpw or dem, and (iv) are
# still present in the last year. One row per qualifying country.
ever_aut <- democracy |>
  group_by(country, ctyname) |>
  summarise(always_aut = all(dem == 0, na.rm = TRUE),               # never democratic
            n_obs       = n(),                                       # rows for this country
            n_ydem      = sum(!is.na(ln_gdpw) & !is.na(dem)),        # complete rows
            last_obs    = max(year), .groups = "drop") |>
  filter(always_aut, n_obs == maxT, n_ydem == maxT, last_obs == last_year)

# Take the first qualifying country as the focus unit.
focus_country <- ever_aut$ctyname[1]   # its name (for labels)
focus_id      <- ever_aut$country[1]   # its numeric id (for subsetting)

The focus country is Egypt (country id 14): autocratic across the whole sample and observed for the full 40 years.

# Fix the forecast horizon and pull the focus country's own series, in time order.
H          <- 20                                          # forecast horizon, in years
focus_df   <- democracy |> filter(country == focus_id) |> arrange(year)

# Three anchor quantities the predictions need:
y_anchor   <- tail(focus_df$ln_gdpw, 1)         # last observed log GDP (the level we start from)
edt_c_hold <- tail(na.omit(focus_df$edt_c), 1)  # last observed education, on the centred scale
sd_edt     <- sd(democracy$edt, na.rm = TRUE)   # one cross-country SD of education = the bump size

The forecast runs 20 years past the sample. It anchors on the country’s last observed log GDP (8.84) and its last observed centred education (0.54); the counterfactual bump is one cross-country standard deviation of education, about 3.12 years.

3.2 Forecasting an education increase with the FE growth model (`simcf`)

The FE growth model from §2.2 has a differenced outcome, $\Delta y_{it} = \alpha_i + \beta_1\text{dem}_{it} + \beta_2\,\widetilde{\text{edt}}_{it} + \dots$. The simulator ldvsimev() with transform = "diff" simulates that differenced outcome under a counterfactual and cumulates it back into a level path, anchored at the last observed log GDP. Because the growth model carries no lagged dependent variable, the autoregressive parameter is phi = 0: each forecast year’s growth is just $X\hat\beta$, and the level path is its running sum.

The scenario holds democracy at the country’s actual value (dem = 0 for this autocratic focus country) and compares two education paths: education held at its last observed level, against education permanently one standard deviation higher. The next five chunks build the forecast one step at a time.

Step 1: draw coefficients. Each row of draws_g is one plausible parameter vector from the model’s sampling distribution. The country’s own growth intercept (its fixed effect) is the constant the simulator adds each year, so it is pulled out and bound to the front of the matrix the simulator expects.

set.seed(2025)
S <- 1000   # number of simulation draws

# Draw S coefficient vectors from N(coef, vcov): the estimated parameters and
# their uncertainty. This is the King-Tomz-Wittenberg step.
draws_g <- mvrnorm(S, coef(fit_fe_growth), vcov(fit_fe_growth))

# The focus country's growth intercept (its fixed effect). ldvsimev treats this
# as the "constant" added to each forecast year.
alpha_g <- fixef(fit_fe_growth)$country[as.character(focus_id)]

# Assemble the matrix ldvsimev wants: the country intercept, then the dem and
# edt_c slopes. dem = 0 throughout, so in practice only edt_c will move the path.
simbetas_g <- cbind(alpha = rep(alpha_g, S), draws_g[, c("dem", "edt_c")])

Step 2: build the two counterfactual paths. cfMake() creates one scenario per forecast year; cfChange() sets the covariate values. Passing scen = 1:H sets every year at once, so no loop is needed, and the value is constant across years, giving a flat counterfactual path.

# Scenario A: education frozen at its last observed (centred) level, dem = 0.
xhyp_base <- cfMake(d_ln_gdpw ~ dem + edt_c, data = democracy_dyn, nscen = H)
xhyp_base <- cfChange(xhyp_base, "dem",   x = 0,          scen = 1:H)  # autocratic every year
xhyp_base <- cfChange(xhyp_base, "edt_c", x = edt_c_hold, scen = 1:H)  # education held fixed

# Scenario B: education permanently one SD higher, dem still 0.
xhyp_edu <- cfMake(d_ln_gdpw ~ dem + edt_c, data = democracy_dyn, nscen = H)
xhyp_edu <- cfChange(xhyp_edu, "dem",   x = 0,                   scen = 1:H)
xhyp_edu <- cfChange(xhyp_edu, "edt_c", x = edt_c_hold + sd_edt, scen = 1:H)

Step 3: simulate the level path. For each draw, ldvsimev() predicts the differenced outcome in every scenario year and cumulates it from the anchor, returning the point estimate and 95% interval of the resulting log-GDP level.

# transform = "diff" cumulates the simulated growth into a level path starting at
# y_anchor. phi = 0 and lagY = 0 because the growth model has no lagged outcome.
sim_base <- ldvsimev(xhyp_base, simbetas_g, ci = 0.95, constant = 1,
                     phi = 0, lagY = 0, transform = "diff", initialY = y_anchor)
sim_edu  <- ldvsimev(xhyp_edu,  simbetas_g, ci = 0.95, constant = 1,
                     phi = 0, lagY = 0, transform = "diff", initialY = y_anchor)

Step 4: assemble the bands. The two simulated paths are stacked into one tidy table, one row per forecast year and scenario, ready for ggplot.

# Forecast calendar years: from the year after the sample out to H years.
fcast_years <- (last_year + 1):(last_year + H)

# One row per (year, scenario) with point estimate (pe) and 95% bounds (lo, hi).
growth_bands <- bind_rows(
  tibble(year = fcast_years,
         pe = sim_base$pe, lo = sim_base$lower, hi = sim_base$upper,
         scenario = "Education held at baseline"),
  tibble(year = fcast_years,
         pe = sim_edu$pe,  lo = sim_edu$lower,  hi = sim_edu$upper,
         scenario = "Education +1 SD")
)

Step 5: plot the observed history against the two forecast bands.

# The observed series, for the left half of the plot.
hist_df <- focus_df |> transmute(year, observed = ln_gdpw)

ggplot() +
  geom_line(data = hist_df, aes(year, observed), colour = "grey20") +       # observed history
  geom_ribbon(data = growth_bands,
              aes(year, ymin = lo, ymax = hi, fill = scenario), alpha = 0.20) +  # 95% bands
  geom_line(data = growth_bands,
            aes(year, pe, colour = scenario), linewidth = 0.8) +            # point-estimate paths
  geom_vline(xintercept = last_year, linetype = "dashed") +                 # end of sample
  labs(x = NULL, y = "log GDP",
       title = paste0("FE growth-model education forecast, ", focus_country),
       subtitle = "Dashed line: end of observed sample.",
       colour = NULL, fill = NULL) +
  theme_minimal(base_size = 11) +
  theme(legend.position = "bottom")

FE growth-model forecast of the focus country’s log-GDP level under two education paths, via ldvsimev(transform = ‘diff’). The +1 SD path sits below baseline, a symptom of the negative within-education coefficient (the §2.2 country-trend artifact), not real harm from education.

Reading. The point is both mechanical and substantive. Mechanically, this is how a level is forecast from a growth model: ldvsimev(transform = "diff") cumulates the predicted year-on-year growth, with phi = 0 because there is no lagged outcome. Substantively, the one-SD education path drifts below baseline, which is not a claim that education harms growth. It reflects the within-education coefficient from §2.2 (about $-0.014$), the country-trend artifact: with country fixed effects on a differenced outcome, the secular rise in education is swept into each country’s growth intercept, leaving a misleadingly negative within slope. A permanent education increase is a between-type shift, which a within-growth coefficient cannot represent. The remedy is to separate within from between, which is exactly what the Mundlak prediction does next.

3.3 Forecasting with RE growth and AR(1) errors (`simcf`)

The random-effects model in §2.4 gives two possible dynamic specifications. The levels model with AR(1) errors, fit_re_corar, is useful diagnostically, but its residual persistence is extremely high: 0.996. A value this close to one says the levels mean equation still leaves a near-unit-root residual process. That is not a good first forecasting example.

The cleaner forecast uses the RE growth model, fit_re_growth. The outcome is already differenced, and the AR(1) error absorbs remaining serial correlation in growth shocks:

\[ \Delta y_{it} = \alpha + a_i + X_{it}\beta + u_{it}, \qquad u_{it} = \rho u_{i,t-1} + \varepsilon_{it}, \]

where $a_i$ is the country random intercept. To use ldvsimev(), rewrite the AR(1)-error model as a recursive model for growth:

\[ \Delta y_{it} = (\alpha + a_i)(1-\rho) + \rho\,\Delta y_{i,t-1} + X_{it}\beta - \rho X_{i,t-1}\beta + \varepsilon_{it}. \]

This is the same common-factor logic from Lab 4, now applied to a panel model. The lagged growth term appears because the lagged AR(1) error contains last period’s growth residual. It does not turn the model into a substantive lagged-dependent-variable model. The lagged covariate term, $-\rho X_{i,t-1}\beta$, is what keeps this as an AR-error specification.

The section below produces a country-specific RE forecast for the same focus country as §3.2. The random intercept BLUP, $\hat a_i$, is added to the intercept and treated as fixed. The simulation interval therefore reflects fixed-effect uncertainty, while holding the AR(1) parameter and the country random intercept at their point estimates.

Step 1: build current and lagged covariate paths. The model contains factor variables such as region, so model.matrix() is used to create the exact columns used by fit_re_growth.

# Pull the last complete row for the focus country from the dynamic data.
# This row anchors the lagged covariates and last observed growth.
focus_re_last <- democracy_dyn |>
  filter(country == focus_id) |>
  arrange(year) |>
  drop_na(d_ln_gdpw, ln_gdpw, dem, edt_c, newc, elf60, region,
          oil, oil_shock1, oil_shock2) |>
  slice_tail(n = 1)

# The future scenarios hold every covariate at the last observed value, except
# for the education counterfactual in the second scenario.
future_re_base <- focus_re_last[rep(1, H), ]
future_re_edu  <- focus_re_last[rep(1, H), ]

# Give the forecast rows future calendar years, mostly for readability.
future_re_base$year <- fcast_years
future_re_edu$year  <- fcast_years

# Scenario A: education held at the last observed centred value.
future_re_base$edt_c <- edt_c_hold

# Scenario B: education is permanently one SD higher.
future_re_edu$edt_c <- edt_c_hold + sd_edt

# Preserve the original region categories before making the model matrix.
future_re_base$region <- factor(future_re_base$region,
                                levels = levels(factor(democracy_dyn$region)))
future_re_edu$region  <- factor(future_re_edu$region,
                                levels = levels(factor(democracy_dyn$region)))
focus_re_last$region  <- factor(focus_re_last$region,
                                levels = levels(factor(democracy_dyn$region)))

# The fixed-effect formula from fit_re_growth, without the outcome.
re_growth_rhs <- ~ dem * edt_c + newc + elf60 + region +
                   oil + oil_shock1 + oil_shock2

# Current-period covariate matrices for the two future scenarios.
x_re_base <- model.matrix(re_growth_rhs, data = future_re_base)
x_re_edu  <- model.matrix(re_growth_rhs, data = future_re_edu)

# Last observed covariates. These anchor the first lagged-covariate row.
x_re_last <- model.matrix(re_growth_rhs, data = focus_re_last)

# Drop the intercept column because ldvsimev() receives the intercept separately.
x_re_base <- x_re_base[, -1, drop = FALSE]
x_re_edu  <- x_re_edu[,  -1, drop = FALSE]
x_re_last <- x_re_last[, -1, drop = FALSE]

# Put the covariate columns in the same order as the fitted fixed effects.
re_beta_names <- names(fixef(fit_re_growth))[-1]
x_re_base <- x_re_base[, re_beta_names, drop = FALSE]
x_re_edu  <- x_re_edu[,  re_beta_names, drop = FALSE]
x_re_last <- x_re_last[, re_beta_names, drop = FALSE]

# For h = 1, lagged covariates are the last observed covariates. For later
# horizons, lagged covariates are the previous period's hypothetical covariates.
x_re_base_lag <- rbind(x_re_last, x_re_base[-H, , drop = FALSE])
x_re_edu_lag  <- rbind(x_re_last, x_re_edu[-H,  , drop = FALSE])

colnames(x_re_base_lag) <- paste0(colnames(x_re_base), "_lag")
colnames(x_re_edu_lag)  <- paste0(colnames(x_re_edu),  "_lag")

# ldvsimev() receives current and lagged covariates in one matrix.
x_re_base_ldv <- cbind(x_re_base, x_re_base_lag)
x_re_edu_ldv  <- cbind(x_re_edu,  x_re_edu_lag)

Step 2: draw fixed-effect coefficients and transform them into the recursive representation. The AR(1) parameter is extracted from fit_re_growth and held fixed.

set.seed(2025)
S <- 1000   # number of simulation draws

# Fixed-effect draws from the RE growth model.
draws_re_g <- mvrnorm(S, fixef(fit_re_growth), vcov(fit_re_growth))

# Country-specific random intercept. This is the BLUP for the focus country.
a_focus <- ranef(fit_re_growth)[as.character(focus_id), "(Intercept)"]

# AR(1) parameter on the growth-model residuals.
rho_re <- unname(coef(fit_re_growth$modelStruct$corStruct,
                      unconstrained = FALSE))

# Simulated country-specific intercepts: fixed intercept draw + random intercept.
alpha_re <- draws_re_g[, "(Intercept)"] + a_focus

# Simulated coefficients on the current covariates.
beta_re <- draws_re_g[, re_beta_names, drop = FALSE]

# Recursive representation:
#   alpha becomes alpha * (1 - rho)
#   current covariate coefficients stay beta
#   lagged covariate coefficients become -rho * beta
alpha_re_ldv <- alpha_re * (1 - rho_re)
beta_re_lag  <- -rho_re * beta_re
colnames(beta_re_lag) <- paste0(colnames(beta_re), "_lag")

# Coefficients in the order expected by ldvsimev():
# intercept, current covariates, lagged covariates.
simbetas_re_growth <- cbind(
  intercept = alpha_re_ldv,
  beta_re,
  beta_re_lag
)

# ldvsimev() receives the AR coefficient separately through phi.
simphi_re_growth <- matrix(rho_re, nrow = S, ncol = 1)

Step 3: simulate expected values. lagY is the last observed growth rate, while initialY is the last observed log-GDP level. With transform = "diff", ldvsimev() forecasts growth and cumulates it back to a log-GDP path.

# Last observed growth and last observed log-GDP level for the focus country.
lag_growth_re <- focus_re_last$d_ln_gdpw
initial_y_re  <- focus_re_last$ln_gdpw

# Baseline path: education held at the last observed value.
sim_re_base <- ldvsimev(
  x        = x_re_base_ldv,
  b        = simbetas_re_growth,
  ci       = 0.95,
  constant = 1,
  phi      = simphi_re_growth,
  lagY     = lag_growth_re,
  transform = "diff",
  initialY = initial_y_re
)

# Counterfactual path: education permanently one SD higher.
sim_re_edu <- ldvsimev(
  x        = x_re_edu_ldv,
  b        = simbetas_re_growth,
  ci       = 0.95,
  constant = 1,
  phi      = simphi_re_growth,
  lagY     = lag_growth_re,
  transform = "diff",
  initialY = initial_y_re
)

Step 4: assemble and plot the RE-growth forecast. This is the same education scenario as §3.2, but the forecast now includes AR(1) dynamics in the growth residual.

# One row per (year, scenario) with point estimate and 95% interval.
re_growth_bands <- bind_rows(
  tibble(year = fcast_years,
         pe = sim_re_base$pe, lo = sim_re_base$lower, hi = sim_re_base$upper,
         scenario = "Education held at baseline"),
  tibble(year = fcast_years,
         pe = sim_re_edu$pe, lo = sim_re_edu$lower, hi = sim_re_edu$upper,
         scenario = "Education +1 SD")
)

ggplot() +
  geom_line(data = hist_df, aes(year, observed), colour = "grey20") +
  geom_ribbon(data = re_growth_bands,
              aes(year, ymin = lo, ymax = hi, fill = scenario), alpha = 0.20) +
  geom_line(data = re_growth_bands,
            aes(year, pe, colour = scenario), linewidth = 0.8) +
  geom_vline(xintercept = last_year, linetype = "dashed") +
  labs(x = NULL, y = "log GDP",
       title = paste0("RE growth-model education forecast, ", focus_country),
       subtitle = paste0("AR(1) residual parameter in growth model: rho = ",
                         round(rho_re, 3)),
       colour = NULL, fill = NULL) +
  theme_minimal(base_size = 11) +
  theme(legend.position = "bottom")

RE growth-model forecast of the focus country’s log-GDP level under two education paths. The model uses AR(1) errors, translated into the recursive representation required by ldvsimev().

Reading. The difference from §3.2 is not the counterfactual scenario; it is the dynamics. The FE growth forecast used phi = 0, so every forecast year’s growth depended only on the covariates and the country intercept. The RE growth forecast uses the estimated AR(1) residual parameter, so the last observed growth residual affects the first forecast years and then fades if $|\hat\rho| < 1$. Because the model is still a growth model, transform = "diff" cumulates the simulated growth path back into log GDP.

If the residual process had an MA component, for example

\[ u_{it} = \rho u_{i,t-1} + \varepsilon_{it} + \theta\varepsilon_{i,t-1}, \]

the forecast would also need the last estimated innovation, $\hat\varepsilon_{i,T}$. The AR term propagates past residual states; the MA term propagates past shocks. For expected values, the MA(1) term mainly affects the first forecast period:

\[ \mathbb{E}(u_{i,T+1}) = \rho \hat u_{i,T} + \theta \hat\varepsilon_{i,T}, \]

and later periods continue through the AR recursion because future innovations have expectation zero. For predicted values, simcf::ldvsimpv() is the more natural packaged function because it has explicit arguments for the MA component: rho = for the MA coefficient, lagEps = for the last innovation, and sigma = for the innovation standard deviation. The main lab keeps the example to AR(1) errors because the recursive representation is already enough to show how error dynamics enter a panel forecast.

3.4 Population-average FE and RE growth forecasts

The previous forecasts condition on one country. A population-average forecast answers a different question: what is the average predicted path across the countries with enough observed data to anchor a forecast?

The sample-average forecast below is not a forecast for a fictional “average country.” Instead, each country starts from its own last complete observation, its own last observed covariates, and its own country intercept. The simulated paths are then averaged across countries. This keeps the panel structure visible while summarizing the expected path for the observed population. Because the panel is unbalanced, the plot uses forecast horizon rather than calendar year.

The interval is computed by averaging the raw simulated paths across countries first, then taking quantiles across simulations. This means the uncertainty band describes uncertainty in the average predicted path, not the spread of country-specific outcomes.

Step 1: define the forecast population. The forecast uses each country’s last complete row for the variables needed by both the FE and RE growth models.

# Each country's last complete row for both the FE growth and RE growth forecasts.
pop_anchor <- democracy_dyn |>
  drop_na(ln_gdpw, d_ln_gdpw, dem, edt_c, newc, elf60, region,
          oil, oil_shock1, oil_shock2) |>
  filter(as.character(country) %in% names(fixef(fit_fe_growth)$country),
         as.character(country) %in% rownames(ranef(fit_re_growth))) |>
  group_by(country, ctyname) |>
  arrange(year, .by_group = TRUE) |>
  slice_tail(n = 1) |>
  ungroup() |>
  arrange(country)

n_pop <- nrow(pop_anchor)
n_pop

## [1] 101

# Population-average forecasts are aligned by forecast horizon, not by calendar
# year, because countries can have different last complete observations.
pop_horizons <- 1:H

Step 2: average the FE growth forecasts. The FE growth model has no lagged outcome and no AR(1) residual process in the simulator, so phi = 0. The education scenario is the same as before: each country receives a permanent one-SD increase relative to its own last observed education level. Future oil-shock indicators are set to zero.

# Store one simulated average path per row. Rows are simulation draws; columns
# are forecast years. Start at zero and add each country's simulated path.
fe_pop_base_sims <- matrix(0, nrow = S, ncol = H)
fe_pop_edu_sims  <- matrix(0, nrow = S, ncol = H)

# FE growth uses these three slope coefficients in the forecast. The oil-shock
# dummies are held at zero in the future, so they do not enter x.
fe_growth_names <- c("dem", "edt_c", "dem:edt_c")

for (i in 1:n_pop) {
  # Pull one country's final observed row.
  row_i <- pop_anchor[i, ]
  
  # Country-specific growth intercept from the FE growth model.
  alpha_i <- fixef(fit_fe_growth)$country[as.character(row_i$country)]
  
  # Baseline covariates: democracy and education held at last observed values.
  x_fe_base_i <- cbind(
    dem = rep(row_i$dem, H),
    edt_c = rep(row_i$edt_c, H),
    `dem:edt_c` = rep(row_i$dem * row_i$edt_c, H)
  )
  
  # Education counterfactual: same democracy status, education + 1 SD.
  x_fe_edu_i <- cbind(
    dem = rep(row_i$dem, H),
    edt_c = rep(row_i$edt_c + sd_edt, H),
    `dem:edt_c` = rep(row_i$dem * (row_i$edt_c + sd_edt), H)
  )
  
  # Coefficient matrix for this country: its FE intercept plus common slopes.
  simbetas_fe_i <- cbind(
    alpha = rep(alpha_i, S),
    draws_g[, fe_growth_names, drop = FALSE]
  )
  
  # Baseline forecast for this country.
  sim_fe_base_i <- ldvsimev(
    x        = x_fe_base_i,
    b        = simbetas_fe_i,
    ci       = 0.95,
    constant = 1,
    phi      = 0,
    lagY     = 0,
    transform = "diff",
    initialY = row_i$ln_gdpw,
    simulates = TRUE
  )
  
  # Education counterfactual forecast for this country.
  sim_fe_edu_i <- ldvsimev(
    x        = x_fe_edu_i,
    b        = simbetas_fe_i,
    ci       = 0.95,
    constant = 1,
    phi      = 0,
    lagY     = 0,
    transform = "diff",
    initialY = row_i$ln_gdpw,
    simulates = TRUE
  )
  
  # Add this country's simulated paths to the running total.
  fe_pop_base_sims <- fe_pop_base_sims + sim_fe_base_i$simulates
  fe_pop_edu_sims  <- fe_pop_edu_sims  + sim_fe_edu_i$simulates
}

# Divide by the number of countries to get the sample-average simulated path.
fe_pop_base_sims <- fe_pop_base_sims / n_pop
fe_pop_edu_sims  <- fe_pop_edu_sims  / n_pop

# Summarize the average paths by forecast year.
fe_pop_bands <- bind_rows(
  tibble(horizon = pop_horizons,
         pe = apply(fe_pop_base_sims, 2, mean),
         lo = apply(fe_pop_base_sims, 2, quantile, probs = 0.025),
         hi = apply(fe_pop_base_sims, 2, quantile, probs = 0.975),
         scenario = "Education held at baseline",
         model = "FE growth"),
  tibble(horizon = pop_horizons,
         pe = apply(fe_pop_edu_sims, 2, mean),
         lo = apply(fe_pop_edu_sims, 2, quantile, probs = 0.025),
         hi = apply(fe_pop_edu_sims, 2, quantile, probs = 0.975),
         scenario = "Education +1 SD",
         model = "FE growth")
)

Step 3: average the RE growth forecasts. The RE version uses the same countries and the same education scenario. The difference is that each country gets its random-intercept BLUP, and the AR(1) residual process is translated into the recursive representation from §3.3.

# Store simulated sample-average RE paths.
re_pop_base_sims <- matrix(0, nrow = S, ncol = H)
re_pop_edu_sims  <- matrix(0, nrow = S, ncol = H)

for (i in 1:n_pop) {
  # Pull one country's final observed row.
  row_i <- pop_anchor[i, ]
  
  # Make H-row future data frames for baseline and education counterfactual.
  future_base_i <- row_i[rep(1, H), ]
  future_edu_i  <- row_i[rep(1, H), ]
  
  # Future years are used only to keep the rows ordered within country.
  future_base_i$year <- row_i$year + pop_horizons
  future_edu_i$year  <- row_i$year + pop_horizons
  
  # Baseline: last observed education. Counterfactual: education + 1 SD.
  future_base_i$edt_c <- row_i$edt_c
  future_edu_i$edt_c  <- row_i$edt_c + sd_edt
  
  # Preserve the region factor levels used by model.matrix().
  future_base_i$region <- factor(future_base_i$region,
                                 levels = levels(factor(democracy_dyn$region)))
  future_edu_i$region  <- factor(future_edu_i$region,
                                 levels = levels(factor(democracy_dyn$region)))
  row_i$region         <- factor(row_i$region,
                                 levels = levels(factor(democracy_dyn$region)))
  
  # Current covariate matrices for this country's two scenarios.
  x_base_i <- model.matrix(re_growth_rhs, data = future_base_i)
  x_edu_i  <- model.matrix(re_growth_rhs, data = future_edu_i)
  x_last_i <- model.matrix(re_growth_rhs, data = row_i)
  
  # Drop intercept and match the fitted coefficient order.
  x_base_i <- x_base_i[, -1, drop = FALSE]
  x_edu_i  <- x_edu_i[,  -1, drop = FALSE]
  x_last_i <- x_last_i[, -1, drop = FALSE]
  
  x_base_i <- x_base_i[, re_beta_names, drop = FALSE]
  x_edu_i  <- x_edu_i[,  re_beta_names, drop = FALSE]
  x_last_i <- x_last_i[, re_beta_names, drop = FALSE]
  
  # Lagged covariates: last observed row for h = 1, then previous hypothetical row.
  x_base_lag_i <- rbind(x_last_i, x_base_i[-H, , drop = FALSE])
  x_edu_lag_i  <- rbind(x_last_i, x_edu_i[-H,  , drop = FALSE])
  
  colnames(x_base_lag_i) <- paste0(colnames(x_base_i), "_lag")
  colnames(x_edu_lag_i)  <- paste0(colnames(x_edu_i),  "_lag")
  
  x_base_ldv_i <- cbind(x_base_i, x_base_lag_i)
  x_edu_ldv_i  <- cbind(x_edu_i,  x_edu_lag_i)
  
  # Country-specific RE intercept: fixed intercept draw + BLUP random intercept.
  a_i <- ranef(fit_re_growth)[as.character(row_i$country), "(Intercept)"]
  alpha_i <- draws_re_g[, "(Intercept)"] + a_i
  
  # Transform the coefficients into the recursive AR-error representation.
  alpha_i_ldv <- alpha_i * (1 - rho_re)
  beta_i_lag  <- -rho_re * beta_re
  colnames(beta_i_lag) <- paste0(colnames(beta_re), "_lag")
  
  simbetas_re_i <- cbind(
    intercept = alpha_i_ldv,
    beta_re,
    beta_i_lag
  )
  
  # Baseline forecast for this country.
  sim_re_base_i <- ldvsimev(
    x        = x_base_ldv_i,
    b        = simbetas_re_i,
    ci       = 0.95,
    constant = 1,
    phi      = simphi_re_growth,
    lagY     = row_i$d_ln_gdpw,
    transform = "diff",
    initialY = row_i$ln_gdpw,
    simulates = TRUE
  )
  
  # Education counterfactual forecast for this country.
  sim_re_edu_i <- ldvsimev(
    x        = x_edu_ldv_i,
    b        = simbetas_re_i,
    ci       = 0.95,
    constant = 1,
    phi      = simphi_re_growth,
    lagY     = row_i$d_ln_gdpw,
    transform = "diff",
    initialY = row_i$ln_gdpw,
    simulates = TRUE
  )
  
  # Add this country's simulated paths to the running total.
  re_pop_base_sims <- re_pop_base_sims + sim_re_base_i$simulates
  re_pop_edu_sims  <- re_pop_edu_sims  + sim_re_edu_i$simulates
}

# Divide by the number of countries to get sample-average simulated paths.
re_pop_base_sims <- re_pop_base_sims / n_pop
re_pop_edu_sims  <- re_pop_edu_sims  / n_pop

# Summarize the average paths by forecast year.
re_pop_bands <- bind_rows(
  tibble(horizon = pop_horizons,
         pe = apply(re_pop_base_sims, 2, mean),
         lo = apply(re_pop_base_sims, 2, quantile, probs = 0.025),
         hi = apply(re_pop_base_sims, 2, quantile, probs = 0.975),
         scenario = "Education held at baseline",
         model = "RE growth + AR(1) errors"),
  tibble(horizon = pop_horizons,
         pe = apply(re_pop_edu_sims, 2, mean),
         lo = apply(re_pop_edu_sims, 2, quantile, probs = 0.025),
         hi = apply(re_pop_edu_sims, 2, quantile, probs = 0.975),
         scenario = "Education +1 SD",
         model = "RE growth + AR(1) errors")
)

Step 4: compare the two population-average forecasts. The plot shows the average predicted log-GDP path across 101 countries, each anchored at its own last complete observation.

pop_forecast_bands <- bind_rows(fe_pop_bands, re_pop_bands)

ggplot(pop_forecast_bands,
       aes(horizon, pe, colour = scenario, fill = scenario)) +
  geom_ribbon(aes(ymin = lo, ymax = hi), alpha = 0.18, colour = NA) +
  geom_line(linewidth = 0.85) +
  facet_wrap(~ model, ncol = 1) +
  labs(x = "Years after each country's last complete observation",
       y = "Average predicted log GDP",
       title = "Population-average education forecasts",
       subtitle = paste0("Average across ", n_pop,
                         " countries with complete forecast anchors"),
       colour = NULL, fill = NULL) +
  theme_minimal(base_size = 11) +
  theme(legend.position = "bottom")

Population-average forecasts from the FE growth model and the RE growth model with AR(1) errors. Each country starts from its own last complete log GDP and covariates; paths are averaged across countries before intervals are computed.

Reading. The one-country forecasts answer, “what is the predicted path for this country?” The population-average forecasts answer, “what is the average predicted path across the observed panel?” The FE version averages country-specific fixed-effect forecasts. The RE version averages country-specific random-intercept forecasts and also carries the AR(1) residual dynamics. A different RE target would be a new-country forecast, where the random intercept is set to zero. That target is useful for predicting a generic country drawn from the model population, but it is not the same as averaging over the countries actually observed in the data.

3.5 Within-education prediction from the Mundlak model, by regime

The growth-FE forecast above was contaminated by the country-trend artifact. The Mundlak / REWB model fixes this by splitting education into a within part (the deviation from the country mean) and a between part (the country mean), each with its own coefficient. Because the dem × edt interaction is split the same way, the within-education effect can be read separately for a democracy and a dictatorship.

For a within-country rise in education of $e$ years above the country mean, holding the regime at dem = d, the predicted within contribution to log GDP is

\[ \Delta \widehat{\ln\text{gdpw}} \;=\; e \times \big(\hat\beta^{W}_{\text{edt}} + d\,\hat\beta^{W}_{\text{dem}\times\text{edt}}\big), \]

so the within-education slope is $\hat\beta^{W}_{\text{edt}}$ under dictatorship ($d = 0$) and $\hat\beta^{W}_{\text{edt}} + \hat\beta^{W}_{\text{dem}\times\text{edt}}$ under democracy ($d = 1$). This quantity is linear in the coefficients, so it needs no simcf machinery: the analysis draws from the Mundlak fixed effects with mvrnorm and evaluates the slope directly. The steps mirror §3.2 through §3.4.

Step 1: draw coefficients from the static Mundlak fixed-effect estimates and their covariance.

set.seed(2025)
S <- 1000   # number of simulation draws

# Draw S coefficient vectors from N(coef, vcov) of the Mundlak fixed effects.
draws_m <- mvrnorm(S, fixef(fit_mundlak), vcov(fit_mundlak))

Step 2: form the within-education slope implied by each draw, separately by regime. Dictatorship uses the within education coefficient alone; democracy adds the within interaction.

slope_dict <- draws_m[, "edt_within"]                                 # d = 0: slope is edt_within
slope_dem  <- draws_m[, "edt_within"] + draws_m[, "dem_x_edt_within"] # d = 1: add the within interaction

Step 3: evaluate the contribution over a grid of education increases. The within contribution is the education increase times the slope. Because the increase is a fixed constant, the interval is just that increase times the 2.5 and 97.5 percentiles of the simulated slope.

edu_grid <- seq(0, 3, by = 0.25)   # within-country education increases to evaluate, in years

# One row per (education increase, regime). pe uses the mean slope; lo / hi use the
# 2.5 / 97.5 percentiles of the simulated slope, each scaled by the education increase.
within_pred <- bind_rows(
  tibble(edu = edu_grid, regime = "Dictatorship (dem = 0)",
         pe = edu_grid * mean(slope_dict),
         lo = edu_grid * quantile(slope_dict, 0.025),
         hi = edu_grid * quantile(slope_dict, 0.975)),
  tibble(edu = edu_grid, regime = "Democracy (dem = 1)",
         pe = edu_grid * mean(slope_dem),
         lo = edu_grid * quantile(slope_dem, 0.025),
         hi = edu_grid * quantile(slope_dem, 0.975))
)

Step 4: plot the predicted within contribution and its band, by regime.

ggplot(within_pred, aes(edu, pe, colour = regime, fill = regime)) +
  geom_ribbon(aes(ymin = lo, ymax = hi), alpha = 0.18, colour = NA) +   # 95% band
  geom_line(linewidth = 0.9) +                                          # point estimate
  scale_colour_manual(values = c("Dictatorship (dem = 0)" = "#d95f02",
                                 "Democracy (dem = 1)"    = "#1b9e77")) +
  scale_fill_manual(values   = c("Dictatorship (dem = 0)" = "#d95f02",
                                 "Democracy (dem = 1)"    = "#1b9e77")) +
  labs(x = "Within-country rise in education (years above the country mean)",
       y = "Predicted change in log GDP",
       colour = NULL, fill = NULL,
       title = "Within-education effect by regime (Mundlak)") +
  theme_minimal(base_size = 11) +
  theme(legend.position = "bottom")

Within-country education effect on log GDP from the Mundlak model, by regime. Lines are expected within contributions; ribbons are 95% parametric-simulation intervals.

Reading. The within effect of education is now clearly positive under both regimes: a within-country rise of one year of average schooling is associated with roughly 15 to 18 percent higher log GDP. The slope is slightly flatter under democracy (the small negative within interaction dem_x_edt_within), but the headline is the contrast with §3.2 through §3.4. The same within-education idea gives a negative artifact in the differenced growth models and a credible positive effect in the Mundlak levels decomposition. The difference is entirely which variation each model uses, the lesson of §2.5 turned into a prediction. Lab 6 revisits dynamic panels with Arellano-Bond GMM, where the lagged-DV bias flagged in §2.2 is corrected directly.

3.6 Impulse response from the FE AR(1) model

The two predictions above came from models without a lagged outcome. The AR(1) fixed-effects model from §2.2 (column 2) is different: it carries $y_{i,t-1}$ on the right-hand side, so a one-time change in a covariate does not deliver its full effect at once. Part of the effect is passed forward through the lagged term, period after period. The impulse response function (IRF) traces that propagation, separating the immediate short-run effect from the accumulated long-run effect.

Write the AR(1) model for a single focus covariate $x$ as

\[ y_{it} = \alpha_i + \phi\,y_{i,t-1} + \beta\,x_{it} + \varepsilon_{it}. \]

Consider a permanent one-unit increase in $x$ from horizon $0$ onward, for example a country that democratizes and stays democratic. Measured against the unchanged baseline, the cumulative effect on $y$ after $h$ periods is

\[ \text{IRF}(h) \;=\; \beta\,\big(1 + \phi + \phi^2 + \dots + \phi^h\big) \;=\; \beta\,\frac{1 - \phi^{\,h+1}}{1 - \phi}. \]

At $h = 0$ the response is just $\beta$, the short-run effect. As $h \to \infty$ it converges to the long-run multiplier $\beta / (1 - \phi)$. A purely transitory shock, with $x$ raised for one period only, would instead give $\text{IRF}(h) = \beta\,\phi^{h}$, decaying geometrically back to zero; the permanent case is used below because a regime change is persistent.

The focus covariate is democracy. Because the fitted model includes the dem × edt_c interaction, the marginal democracy effect depends on education, $\beta_{\text{dem}} + \beta_{\text{dem}\times\text{edt}}\,\text{edt}_c$. Evaluating at the sample-mean education (edt_c = 0, since education is centred) makes the short-run effect simply the dem coefficient, so $\beta = \hat\beta_{\text{dem}}$ here. The estimated persistence is $\hat\phi \approx$ 0.943, so the long-run multiplier amplifies the short-run effect about 17.6-fold. The computation follows the same draw-and-summarize recipe as §3.2 through §3.5.

Step 1: draw coefficients from the AR(1) fit, then pull out the persistence and the short-run democracy effect.

set.seed(2025)
S <- 1000   # number of simulation draws

# Draw S coefficient vectors from N(coef, vcov) of the AR(1) FE fit. feols carries
# the clustered vcov, so the draws already reflect cluster-robust uncertainty.
draws_ar <- mvrnorm(S, coef(fit_fe_dyn), vcov(fit_fe_dyn))

# The two parameters the IRF formula needs:
phi_draws  <- draws_ar[, "ln_gdpw_lag"]   # persistence phi
beta_draws <- draws_ar[, "dem"]           # short-run democracy effect at mean education = IRF(0)

Step 2: apply the closed-form IRF to every draw and horizon at once. The matrix phi_pow holds $\phi^{h+1}$ for each draw (row) and horizon (column), so the cumulative-response formula becomes a single vectorized expression rather than a recursion.

H_irf    <- 20            # longest horizon to trace
horizons <- 0:H_irf       # h = 0, 1, ..., 20

# phi^(h+1) for every draw (row) and horizon (column): column j holds phi^j, which
# is phi^(h+1) for horizon h = j - 1. outer() avoids a loop over horizons.
phi_pow <- outer(phi_draws, horizons + 1, "^")   # S × (H_irf + 1)

# Cumulative response to a PERMANENT unit democratization: beta * (1 - phi^(h+1)) / (1 - phi).
# beta_draws and (1 - phi_draws) are length S and recycle down each column (one value per draw).
irf_mat <- beta_draws * (1 - phi_pow) / (1 - phi_draws)

Step 3: summarize each horizon into a point estimate and a 95% interval across the draws, and simulate the long-run multiplier the same way for a reference line.

# Each column of irf_mat is one horizon; summarize down the S draws.
irf_bands <- tibble(
  h  = horizons,
  pe = apply(irf_mat, 2, mean),                     # mean response at each horizon
  lo = apply(irf_mat, 2, quantile, probs = 0.025),  # 2.5 percentile
  hi = apply(irf_mat, 2, quantile, probs = 0.975)   # 97.5 percentile
)

# Long-run multiplier beta / (1 - phi), one value per draw, summarized to a point estimate.
lr_draws <- beta_draws / (1 - phi_draws)
lr_pe    <- mean(lr_draws)

Step 4: plot the impulse response with its band and the long-run multiplier.

ggplot(irf_bands, aes(h, pe)) +
  geom_ribbon(aes(ymin = lo, ymax = hi), alpha = 0.18, fill = "#1b9e77") +   # 95% band
  geom_line(linewidth = 0.9, colour = "#1b9e77") +                           # response path
  geom_hline(yintercept = lr_pe, linetype = "dashed", colour = "grey40") +   # long-run multiplier
  geom_hline(yintercept = 0, colour = "grey80") +                            # baseline (no effect)
  labs(x = "Years since permanent democratization (h)",
       y = "Cumulative change in log GDP",
       title = "Impulse response to a permanent democratization (FE AR(1))",
       subtitle = "Dashed line: long-run multiplier beta / (1 - phi).") +
  theme_minimal(base_size = 11)

Impulse response of log GDP to a permanent democratization, from the AR(1) fixed-effects model, evaluated at mean education. The line is the cumulative response, the ribbon is the 95% parametric-simulation interval, and the dashed line is the simulated long-run multiplier beta / (1 - phi).

Reading. The path begins at the short-run democracy effect, about -0.003, and accumulates toward the long-run multiplier, about -0.069. The approach is slow because $\hat\phi \approx 0.94$ is close to one: a permanent regime change keeps feeding through the lagged outcome for many years, so most of the long-run effect arrives well after the initial shock. Two cautions from earlier sections carry over. First, the band is wide and straddles zero throughout, consistent with the §2.2 finding that democracy’s within-country association with GDP is weak; the IRF makes the shape of any effect explicit but cannot manufacture precision the data do not contain. Second, $\hat\phi$ is biased downward by Nickell (1981) at this panel length, so the true persistence, and with it the long-run multiplier, is probably larger than shown. The same IRF computed on the bias-corrected GMM estimates in Lab 6 is the natural next step. For those who prefer the packaged route, simcf::ldvsimev() produces the same trajectory by simulation rather than closed form: it is the §3.2 call with phi set to the estimated persistence instead of 0, and a step in dem instead of edt_c.

References

Beck, N., & Katz, J. N. (1995). What to do (and not to do) with time-series cross-section data. American Political Science Review, 89(3), 634-647.
Beck, N., & Katz, J. N. (2011). Modeling dynamics in time-series-cross-section political economy data. Annual Review of Political Science, 14, 331-352.
Bell, A., & Jones, K. (2015). Explaining fixed effects: Random effects modeling of time-series cross-sectional and panel data. Political Science Research and Methods, 3(1), 133-153.
Bell, A., Fairbrother, M., & Jones, K. (2019). Fixed and random effects models: making an informed choice. Quality & Quantity, 53, 1051-1074.
Choi, I. (2001). Unit root tests for panel data. Journal of International Money and Finance, 20(2), 249-272.
Driscoll, J. C., & Kraay, A. C. (1998). Consistent covariance matrix estimation with spatially dependent panel data. Review of Economics and Statistics, 80(4), 549-560.
Hadri, K. (2000). Testing for stationarity in heterogeneous panel data. Econometrics Journal, 3(2), 148-161.
Hlouskova, J., & Wagner, M. (2006). The performance of panel unit root and stationarity tests: Results from a large scale simulation study. Econometric Reviews, 25(1), 85-116.
Im, K. S., Pesaran, M. H., & Shin, Y. (2003). Testing for unit roots in heterogeneous panels. Journal of Econometrics, 115(1), 53-74.
King, G., Tomz, M., & Wittenberg, J. (2000). Making the most of statistical analyses: Improving interpretation and presentation. American Journal of Political Science, 44(2), 347-361.
Kropko, J., & Kubinec, R. (2020). Interpretation and identification of within-unit and cross-sectional variation in panel data models. PLOS ONE, 15(4), e0231349.
Maddala, G. S., & Wu, S. (1999). A comparative study of unit root tests with panel data and a new simple test. Oxford Bulletin of Economics and Statistics, 61(S1), 631-652.
Mundlak, Y. (1978). On the pooling of time series and cross section data. Econometrica, 46(1), 69-85.
Nickell, S. (1981). Biases in dynamic models with fixed effects. Econometrica, 49(6), 1417-1426.
Pesaran, M. H. (2007). A simple panel unit root test in the presence of cross-section dependence. Journal of Applied Econometrics, 22(2), 265-312.
Przeworski, A., Alvarez, M. E., Cheibub, J. A., & Limongi, F. (2000). Democracy and Development: Political Institutions and Well-Being in the World, 1950-1990. Cambridge University Press.

See the Przeworski et al. codebook at http://www.u.arizona.edu/~mishler/PrzeworskiCodebook.PDF.↩︎

Transformation	Object tested	Story
Detrended residuals	\(y_{it} - \widehat\alpha_i - \widehat\delta_i t\)	deterministic trend removed, persistence still visible
Trend + AR(1) innovations	residuals from `Arima(y, order = c(1,0,0), xreg = time_id)`	deterministic trend plus stationary AR(1) errors
First differences	\(\Delta y_{it}\)	difference-stationary / I(1) with average growth

Col	Spec	Outcome	Dynamic idea
(1)	Pooled levels + trend	\(y_{it}\)	Descriptive baseline with a linear time trend.
(2)	GLS AR(1) errors + trend	\(y_{it}\)	Same mean as (1); persistence enters the residual process.
(3)	Growth	\(\Delta y_{it}\)	Growth on covariate levels (+ lagged democracy `dem_lag`).
(4)	Growth / ECM	\(\Delta y_{it}\)	(3) + lagged level \(y_{i,t-1}\) — error-correction / convergence.
(5)	Growth + lagged \(\Delta y\)	\(\Delta y_{it}\)	(3) + lagged growth \(\Delta y_{i,t-1}\) — momentum / AR(1) in growth.

CSSS/POLS 512 - Time Series and Panel Data Methods

Lab 5: Panel Data — Dynamics, Fixed/Random Effects, Mundlak, Forecasting

Ramses Llobet

Spring 2026

Disclaimer

Packages and Functions Used in This Lab

Today’s roadmap

Part 1 — A panel of countries: theory, data, dynamics

1.1 A toy theory of politics in panel data

1.2 The Przeworski democracy panel

1.3 A systematic dynamics workflow

1.3.1 Look at the series

1.3.2 Detrend each panel and export unit-level diagnostics

1.3.3 Pooled ACF/PACF and per-series unit-root tests

1.3.4 Panel unit-root tests: compare the same three treatments

1.3.5 Do growth rates still have AR dynamics?

1.3.6 The verdict — and the modeling menu

Part 2 — Estimating panel models

2.0 Specifications and what they identify

2.0.1 Seeing country-level outcome differences

2.0.2 Seeing pooled, between, and within variation

2.0.3 The same decomposition for every candidate predictor

2.1 Pooled OLS and GLS

2.1.1 Creating lags in panel data

2.1.2 Five pooled specifications

2.1.3 Residual diagnostics

2.1.4 Standard errors — choice and justification

2.2 Fixed effects (within)

Tool box — fixest::feols()

Specs we fit

2.3 Two-way fixed effects

2.4 Random effects

Tool box — nlme::lme() and corARMA()

Partial pooling — RE as regularized FE

Specs we fit

2.5 Mundlak / REWB — within and between, separately

Tool box — parameters::demean()

Specs we fit

2.6 Diagnostic tests — does within agree with between?

2.7 Standard errors when residuals are not iid

2.8 Robustness — does the conclusion travel across specifications?

Part 3 — Counterfactual prediction with fixed effects

3.1 Setup: a focus country and its baseline

3.2 Forecasting an education increase with the FE growth model (simcf)

3.3 Forecasting with RE growth and AR(1) errors (simcf)

3.4 Population-average FE and RE growth forecasts

3.5 Within-education prediction from the Mundlak model, by regime

3.6 Impulse response from the FE AR(1) model

References

Tool box — `fixest::feols()`

Tool box — `nlme::lme()` and `corARMA()`

Tool box — `parameters::demean()`

3.2 Forecasting an education increase with the FE growth model (`simcf`)

3.3 Forecasting with RE growth and AR(1) errors (`simcf`)