pgmm() and the diagnostic battery for dynamic-panel GMM
These materials have been revised and updated by successive teaching assistants over recent years.
| Package | Purpose | Key functions |
|---|---|---|
| plm | Panel models, dynamic-panel GMM | plm(), pgmm(), pdata.frame() |
| lmtest / sandwich | Test stats and robust SEs | coeftest(), vcovHC() |
| MASS | MVN draws for parametric simulation | mvrnorm() |
| simcf | King-Tomz-Wittenberg counterfactual simulators (Adolph’s package) | cfMake(), cfChange(), ldvsimev(), ldvsimfd(), ldvsimrr() |
| texreg | Cross-format regression tables | screenreg(), htmlreg(), texreg() |
| tidyverse | Data manipulation, plotting | ggplot(), tibble(), pipes (\|>) |
| patchwork | Side-by-side ggplot panels | +, / |
We also use EmplUK from
data(EmplUK, package = "plm") — the Arellano-Bond (1991)
labour-demand panel — for the canonical demonstration of difference and
system GMM in Part 3.
| Part | Topic |
|---|---|
| 1 | Nickell bias and a hand-built GMM fix. Derivation + a small simulated panel where we reproduce the bias, build Anderson-Hsiao IV and one-step Arellano-Bond GMM from raw matrix algebra, and verify against pgmm(). |
| 2 | pgmm() and the diagnostic battery. Walk through the function’s formula syntax and arguments, then the five diagnostics on a worked example (Sargan/Hansen, AR(1), AR(2), Wald on time dummies, instrument count). |
| 3 | The IV/GMM family demonstrated on EmplUK. Anderson-Hsiao → Arellano-Bond → Blundell-Bond. One-step vs two-step (Windmeijer). Collapsed instruments (Roodman 2009). |
| 4 | Applied workflow on the Przeworski democracy panel. FE-ARDL(1,1) → AH → Diff-GMM → Sys-GMM, including time-invariant oil so students see when sys-GMM helps. Misspecified examples included. |
| 5 | Cigarette tax counterfactual. Diff-GMM fit on the 1985–1995 cigarette panel + a +60-cent tax-hike scenario, with EV / FD / RR forecasts via simcf. |
Lab 5 closed by fitting FE-ARDL(1,1) on the democracy panel and flagged the Nickell bias on the lagged DV as a residual concern. Today we tackle that concern head-on. The pedagogical thread is continuity: same data, same outcome — but now interpreted through three bias-corrected estimators.
Take the simplest dynamic panel:
\[ y_{it} = \alpha_i + \phi\, y_{i,t-1} + \beta\, x_{it} + \varepsilon_{it}, \qquad |\phi| < 1, \quad \varepsilon_{it} \sim \text{iid}(0, \sigma^2) \]
The within transformation subtracts unit means: \(\tilde y_{it} = y_{it} - \bar y_i\). This sweeps out \(\alpha_i\) — the whole reason we use FE. But it introduces a new problem. The within-transformed lag \(\tilde y_{i,t-1}\) contains \(\bar y_{i,-1}\), which depends on \(\varepsilon_{i,1}, \ldots, \varepsilon_{i,T-1}\). The within-transformed error \(\tilde\varepsilon_{it}\) also contains \(\bar\varepsilon_i\), which depends on \(\varepsilon_{i,1}, \ldots, \varepsilon_{i,T}\). They share terms.
Therefore \(\tilde y_{i,t-1}\) and \(\tilde\varepsilon_{it}\) are correlated, OLS on the within-transformed data is no longer consistent, and Nickell (1981) showed that as \(N \to \infty\) with \(T\) fixed,
\[ \boxed{\;\text{plim}(\hat\phi_{\text{FE}} - \phi) \approx -\,\frac{1+\phi}{T-1}\;} \]
Three consequences worth flagging: (i) the bias is negative, so FE systematically understates persistence; (ii) it shrinks at rate \(1/T\), not \(1/N\) — adding units does not help, only longer panels do (at \(T = 8\) the predicted bias is \(-1.7/7 \approx -0.24\), far from negligible); (iii) the bias contaminates the other coefficients too, because the mismeasured lag is correlated with the remaining regressors.
The rest of Part 1 builds that “change estimator” path by hand, on a small simulated panel. We will (i) generate data with known parameters, (ii) reproduce the Nickell bias on an FE-LDV fit, (iii) difference to kill \(\alpha_i\) and watch the residual endogeneity appear, (iv) construct the Anderson-Hsiao instrumental-variable estimator from raw matrix algebra, and (v) extend to an over-identified one-step Arellano-Bond GMM. The pedagogical goal is to demystify GMM: it is just a weighted moment-condition system with a closed-form solution.
We simulate a small panel from the DGP
\[ y_{it} = \alpha_i + \phi\, y_{i,t-1} + \beta\, x_{it} + \varepsilon_{it} \]
with \(\phi = 0.7\), \(\beta = 0.5\), \(\alpha_i \sim N(0, 1)\), \(\varepsilon_{it} \sim N(0, 1)\), \(x_{it} \sim N(0, 1)\). We keep \(N = 50\) and \(T = 8\) — small enough that the matrix algebra fits in your head, large enough that estimates are well-defined. A burn-in period lets the series escape its arbitrary initial value before we collect observations.
We keep the true \(\varepsilon_{it}\) alongside the observed data — students never have this in real life, but here it lets us see the endogeneity in §1.4.
set.seed(20260515)
# DGP parameters
phi_true <- 0.7 # true persistence
beta_true <- 0.5 # true effect of x on y
N <- 50 # number of units
T_obs <- 8 # observed periods per unit
burn <- 50 # burn-in length
# Unit fixed effects, drawn once
alpha <- rnorm(N, mean = 0, sd = 1)
# Generate one unit at a time and stack
toy_list <- vector("list", N)
for (i in seq_len(N)) {
Tlong <- T_obs + burn
x_i <- rnorm(Tlong, 0, 1)
eps_i <- rnorm(Tlong, 0, 1)
y_i <- numeric(Tlong)
# initial value from the stationary distribution of an AR(1) with intercept alpha_i
y_i[1] <- alpha[i] / (1 - phi_true) +
rnorm(1, sd = 1 / sqrt(1 - phi_true^2))
for (tt in 2:Tlong) {
y_i[tt] <- alpha[i] + phi_true * y_i[tt - 1] +
beta_true * x_i[tt] + eps_i[tt]
}
# Pre-compute the slices BEFORE building the tibble — tibble evaluates
# columns sequentially, so y_lag = y_i[...] inside tibble() would shadow the
# outer 'y_i' with the just-created 'y' column. Pre-computing avoids that.
#
# Lags are constructed from the OBSERVED window only — NA where the lag
# would reach into pre-window data. This mirrors what pgmm sees on a real
# panel, so the manual fits in §1.5–§1.6 use the same sample pgmm does.
y_obs <- y_i [(burn + 1):Tlong]
y_lag1 <- c(NA, y_obs[1:(T_obs - 1)])
y_lag2 <- c(NA, NA, y_obs[1:(T_obs - 2)])
y_lag3 <- c(NA, NA, NA, y_obs[1:(T_obs - 3)])
y_lag4 <- c(NA, NA, NA, NA, y_obs[1:(T_obs - 4)])
x_obs <- x_i [(burn + 1):Tlong]
eps_obs <- eps_i[(burn + 1):Tlong]
toy_list[[i]] <- tibble(
unit = i,
time = seq_len(T_obs),
y = y_obs,
y_lag1 = y_lag1,
y_lag2 = y_lag2,
y_lag3 = y_lag3,
y_lag4 = y_lag4,
x = x_obs,
eps = eps_obs
)
}
toy <- bind_rows(toy_list)
glimpse(toy)
## Rows: 400
## Columns: 9
## $ unit <int> 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, …
## $ time <int> 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, …
## $ y <dbl> -4.5417604, -6.1090964, -7.9833272, -6.8805058, -6.9750735, -6.…
## $ y_lag1 <dbl> NA, -4.5417604, -6.1090964, -7.9833272, -6.8805058, -6.9750735,…
## $ y_lag2 <dbl> NA, NA, -4.5417604, -6.1090964, -7.9833272, -6.8805058, -6.9750…
## $ y_lag3 <dbl> NA, NA, NA, -4.5417604, -6.1090964, -7.9833272, -6.8805058, -6.…
## $ y_lag4 <dbl> NA, NA, NA, NA, -4.5417604, -6.1090964, -7.9833272, -6.8805058,…
## $ x <dbl> 1.38427471, 1.02050701, -0.61394292, -0.03089128, 0.77359679, 0…
## $ eps <dbl> -0.255258495, -1.821209281, -1.781079872, 0.342177224, -0.92660…
The columns we will use downstream:

- y — outcome at time \(t\).
- y_lag1, y_lag2, y_lag3, y_lag4 — lagged outcomes at \(t-1, t-2, t-3, t-4\); NA for the first \(k\) rows of each unit, since those lags would reach pre-window. This is exactly what pgmm() sees internally, so downstream drop_na() steps will yield the same effective sample as pgmm().
- x — covariate at time \(t\).
- eps — the true error \(\varepsilon_{it}\) (visible to us, not to a real researcher).

Within-demean every variable inside its unit, then OLS the demeaned outcome on the demeaned lag and demeaned \(x\). No intercept (it’s absorbed by the demeaning). This is exactly what plm(model = "within") does internally.
# Listwise-delete the t = 1 rows where y_lag1 is unobservable (pre-window),
# then within-demean over the remaining 7 rows per unit. This mirrors what
# plm(model = "within") does internally.
toy_dm <- toy |>
drop_na(y, y_lag1, x) |>
group_by(unit) |>
mutate(y_dm = y - mean(y),
y_lag_dm = y_lag1 - mean(y_lag1),
x_dm = x - mean(x)) |>
ungroup()
# OLS on the within-demeaned variables, no intercept
fit_fe_manual <- lm(y_dm ~ y_lag_dm + x_dm - 1, data = toy_dm)
phi_fe <- unname(coef(fit_fe_manual)["y_lag_dm"])
beta_fe <- unname(coef(fit_fe_manual)["x_dm"])
tibble(parameter = c("phi", "beta"),
true = c(phi_true, beta_true),
FE_hat = c(phi_fe, beta_fe),
bias = FE_hat - true) |>
knitr::kable(digits = 3,
caption = "FE-LDV on the toy panel. True phi = 0.7; FE estimate is biased downward by Nickell. True beta = 0.5; the bias is mild in this DGP but you can see it travelling.")
| parameter | true | FE_hat | bias |
|---|---|---|---|
| phi | 0.7 | 0.511 | -0.189 |
| beta | 0.5 | 0.427 | -0.073 |
\(\hat\phi_{FE}\) comes in below 0.7 — the Nickell bias is visible. The next steps build an estimator that closes that gap.
One-realization caveat. Nickell’s formula predicts the expected bias under repeated sampling, \(\text{plim}(\hat\phi_{FE} - \phi) \approx -(1+\phi)/(T-1) \approx -0.243\) here. The number in the bias column above is just one realization around that expectation, scattered by Monte-Carlo noise. Re-run with a different seed and the magnitude moves; the sign and rough scale stay. Read the table as evidence the bias is real, not as a precise estimate of its size.
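The \(1/T\) decay of the predicted bias is easy to see numerically — a quick base-R check of the Nickell formula at a few panel lengths (a sketch, not part of the lab code):

```r
# Predicted Nickell bias -(1 + phi) / (T - 1), holding phi at the DGP value 0.7.
phi <- 0.7
T_vals <- c(5, 8, 20, 50)
round(setNames(-(1 + phi) / (T_vals - 1), paste0("T=", T_vals)), 3)
# T=8 gives -0.243, the figure quoted above; by T=50 the bias is an order
# of magnitude smaller.
```

Note that \(N\) appears nowhere in the formula: a wider panel does not dilute the bias at all.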
Take first differences within each unit. The \(\alpha_i\) vanishes (\(\Delta \alpha_i = 0\)), but the differenced lag \(\Delta y_{i,t-1}\) correlates with the differenced error \(\Delta \varepsilon_{it}\) — both contain \(\varepsilon_{i,t-1}\). Let’s see this correlation directly.
# Compute differences within each unit. drop_na() removes rows whose lags
# would reach into the pre-window — t = 1 (where y_lag1 is NA) and t = 2
# (where y_lag2 is NA). The remaining sample is t = 3..8 per unit, exactly
# the rows pgmm uses internally for transformation = "d" with lag(y, 2)
# as the instrument.
#
# We force a final unit-then-time row ordering with arrange(unit, time) so
# downstream matrix construction (Part 1.7) lines up rows correctly.
toy_d <- toy |>
arrange(unit, time) |>
group_by(unit) |>
mutate(dy = y - y_lag1,
dy_lag = y_lag1 - y_lag2, # = Δy_{t-1}
dx = x - dplyr::lag(x),
deps = eps - dplyr::lag(eps)) |>
ungroup() |>
drop_na(dy, dy_lag, dx, deps) |>
arrange(unit, time)
# The endogeneity in the differenced model:
# Δy_{t-1} and Δε_{t} should be correlated (both contain ε_{t-1});
# Δx and Δε should not be (x is exogenous in our DGP).
tibble(
pair = c("Δy_lag vs Δε (endogeneity in the differenced equation)",
"Δx vs Δε (sanity check — x is exogenous in the DGP)"),
correlation = c(cor(toy_d$dy_lag, toy_d$deps),
cor(toy_d$dx, toy_d$deps))
) |>
knitr::kable(digits = 3,
caption = "Correlations among differenced variables and the differenced error in the simulated data.")
| pair | correlation |
|---|---|
| Δy_lag vs Δε (endogeneity in the differenced equation) | -0.575 |
| Δx vs Δε (sanity check — x is exogenous in the DGP) | -0.043 |
The first row is sizeable and clearly non-zero — that’s exactly the endogeneity problem an instrumental variable will fix. The second is near zero, confirming that \(\Delta x\) is exogenous to \(\Delta \varepsilon\) in this DGP and does not itself need an instrument.
Why \(y_{i,t-2}\) is the natural instrument. It depends on \(\varepsilon_{i,1}, \ldots, \varepsilon_{i,t-2}\) but neither \(\varepsilon_{i,t-1}\) nor \(\varepsilon_{i,t}\). So \(y_{i,t-2}\) is correlated with \(\Delta y_{i,t-1} = y_{i,t-1} - y_{i,t-2}\) (the relevance condition) but uncorrelated with \(\Delta \varepsilon_{it}\) under the assumption that \(\varepsilon\) is iid (the exclusion condition). In the simulated data we can check both directly:
tibble(
condition = c("Relevance: cor(y_lag2, Δy_lag) should be far from 0",
"Exclusion: cor(y_lag2, Δε) should be near 0"),
correlation = c(cor(toy_d$y_lag2, toy_d$dy_lag),
cor(toy_d$y_lag2, toy_d$deps))
) |>
knitr::kable(digits = 3,
caption = "Relevance and exclusion checks for the lag-2 instrument.")
| condition | correlation |
|---|---|
| Relevance: cor(y_lag2, Δy_lag) should be far from 0 | -0.134 |
| Exclusion: cor(y_lag2, Δε) should be near 0 | -0.009 |
Both conditions hold in the simulated data, by construction.
We have one endogenous regressor (\(\Delta y_{i,t-1}\)) and one valid instrument (\(y_{i,t-2}\)). \(\Delta x\) instruments itself. The system is just-identified.
The closed-form IV (= 2SLS = GMM with any weighting matrix) estimator is
\[ \hat \theta_{IV} \;=\; (Z' X)^{-1} Z' \Delta y \]
where \(\Delta y\) is the stacked outcome vector, \(X\) is the regressor matrix with columns \([\Delta y_{i,t-1}, \Delta x_{it}]\), and \(Z\) is the instrument matrix with columns \([y_{i,t-2}, \Delta x_{it}]\). Let’s build each piece explicitly.
We have 300 rows (50 units × 6 periods after dropping \(t=1, 2\) where the lags reach pre-window), 2 columns in \(X\), and 2 columns in \(Z\) — a 2×2 system to solve.
# Stack the data as plain vectors / matrices. drop_na in the previous chunk
# already kept only rows where every needed lag is observed (t >= 3 here).
dy_vec <- toy_d$dy # outcome vector
X_mat <- cbind(dy_lag = toy_d$dy_lag, # endogenous regressor
dx = toy_d$dx) # exogenous regressor
Z_mat <- cbind(y_lag2 = toy_d$y_lag2, # instrument for dy_lag
dx = toy_d$dx) # dx instruments itself
# Build Z'X (a 2x2 matrix) and Z'dy (a 2-vector)
ZpX <- crossprod(Z_mat, X_mat)
Zpdy <- crossprod(Z_mat, dy_vec)
# Closed-form IV estimator
theta_AH <- solve(ZpX, Zpdy)
rownames(theta_AH) <- c("phi (dy_lag)", "beta (dx)")
The two small matrices we just built:
knitr::kable(ZpX, digits = 2, caption = "Z'X")
| dy_lag | dx | |
|---|---|---|
| y_lag2 | -214.58 | 26.73 |
| dx | -151.49 | 660.48 |
knitr::kable(Zpdy, digits = 2, caption = "Z'Δy")
| y_lag2 | -156.21 |
| dx | 197.32 |
And the resulting IV estimates:
knitr::kable(theta_AH, digits = 3,
caption = "Anderson-Hsiao IV estimates from the closed-form solve.")
| phi (dy_lag) | 0.788 |
| beta (dx) | 0.479 |
Read off \(\hat\phi_{AH}\) — it should be close to the true value 0.7. The Nickell bias has been corrected by replacing the endogenous \(\Delta y_{i,t-1}\) with its instrumented counterpart.
Comparison: FE-LDV vs Anderson-Hsiao on the same panel.
tibble(parameter = c("phi", "beta"),
true = c(phi_true, beta_true),
FE = c(phi_fe, beta_fe),
AH = c(theta_AH[1, 1], theta_AH[2, 1])) |>
knitr::kable(digits = 3,
caption = "FE-LDV is biased; Anderson-Hsiao IV recovers the truth (up to sampling error).")
| parameter | true | FE | AH |
|---|---|---|---|
| phi | 0.7 | 0.511 | 0.788 |
| beta | 0.5 | 0.427 | 0.479 |
What if we use two lags as instruments — say \(y_{i,t-2}\) and \(y_{i,t-3}\)? Each is exogenous under the iid-\(\varepsilon\) assumption, and each is correlated with \(\Delta y_{i,t-1}\). Adding the second moment makes the system over-identified (more moments than parameters), and we have to choose a weighting matrix to combine them.
The (one-step) GMM estimator with weighting matrix \(W\) is
\[ \hat\theta_{GMM} \;=\; \big( X' Z\, W\, Z' X \big)^{-1} \, X' Z\, W\, Z' \Delta y \]
For one-step GMM, the standard choice is \(W = (Z'Z)^{-1}\). (Two-step would use the residual-based optimal \(W\) — see §3.4. We stop at one-step here to keep the algebra clean.)
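One useful algebraic fact before we add the second moment: when the system is just-identified (\(m = k\)), the weighting matrix drops out entirely, because \(Z'X\) is square and invertible, so \((X'ZWZ'X)^{-1}X'ZWZ'y = (Z'X)^{-1}Z'y\) for any positive-definite \(W\). A self-contained numerical check (the matrices here are arbitrary, for illustration only):

```r
set.seed(1)
n <- 100
Z <- cbind(rnorm(n), rnorm(n))                         # 2 instruments
X <- Z %*% matrix(c(1, .3, .2, 1), 2) +                # 2 correlated regressors
     0.5 * matrix(rnorm(2 * n), n)
y <- X %*% c(0.7, 0.5) + rnorm(n)

ZtX <- crossprod(Z, X)                                 # Z'X, square (2x2)
W   <- solve(crossprod(Z))                             # one choice of W
gmm <- solve(t(ZtX) %*% W %*% ZtX,                     # GMM normal equation
             t(ZtX) %*% W %*% crossprod(Z, y))
iv  <- solve(ZtX, crossprod(Z, y))                     # plain IV solve
max(abs(gmm - iv))                                     # zero up to rounding
```

This is why the AH estimator in §1.5 needed no weighting matrix at all; \(W\) only starts to matter once \(m > k\).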
To use lag-3 as an instrument we need \(y_{i,t-3}\) observable, which means restricting the equation to \(t \ge 4\) — we lose one period of usable rows.
Heads-up on the verification step. Even on the same sample, the just-identified AH fit from §1.5 will not match pgmm(..., | lag(y, 2)) exactly — and neither will the over-identified GMM below (\(y_{i,t-2}\) and \(y_{i,t-3}\) both as instruments) match pgmm(..., | lag(y, 2:3)). Why? We stack each lag as one ordinary IV column that applies at every row; pgmm uses a richer per-period moment structure (a separate moment for each lag at each period) — the convention used in the Arellano-Bond literature — so even its “just-identified” spec is over-identified internally. Same sample, different moments, slightly different estimates. §1.7 rebuilds pgmm’s moment structure by hand and recovers its numbers to numerical precision.
The instrument matrix now has 3 columns (two lags of \(y\) plus the exogenous \(\Delta x\)) against only 2 parameters in \(X\) — the system is over-identified by one moment. Because \(y_{i,t-3}\) must come from the observed window, we further drop the \(t=3\) row, leaving 250 rows (5 periods \(\times\) 50 units).
# Adding y_lag3 as an instrument means we need y_lag3 to be observed, which
# requires t >= 4 (the t = 3 row in toy_d would have y_lag3 reaching pre-
# window data). Drop those rows so every instrument in Z is genuinely
# available from the observed sample — same logic as listwise deletion.
toy_g <- toy_d |> drop_na(y_lag3)
dy_vec <- toy_g$dy
X_mat <- cbind(dy_lag = toy_g$dy_lag,
dx = toy_g$dx)
# Wider instrument matrix: two lags of y plus the exogenous dx
Z_mat <- cbind(y_lag2 = toy_g$y_lag2,
y_lag3 = toy_g$y_lag3,
dx = toy_g$dx)
# One-step weighting matrix W = (Z'Z)^{-1}
ZpZ <- crossprod(Z_mat) # 3x3
W_one <- solve(ZpZ)
# Z'X and Z'dy
ZpX <- crossprod(Z_mat, X_mat) # 3x2
Zpdy <- crossprod(Z_mat, dy_vec) # 3x1
# GMM normal equation: θ = (X'Z W Z'X)^{-1} X'Z W Z'Δy
XpZ_W <- t(ZpX) %*% W_one
A_mat <- XpZ_W %*% ZpX # 2x2 — the bread
b_vec <- XpZ_W %*% Zpdy # 2x1 — the meat
theta_GMM <- solve(A_mat, b_vec)
rownames(theta_GMM) <- c("phi (dy_lag)", "beta (dx)")
The one-step weighting matrix:
knitr::kable(W_one, digits = 4,
caption = "W = (Z'Z)^{-1}, the one-step GMM weighting matrix (3 × 3).")
| y_lag2 | y_lag3 | dx | |
|---|---|---|---|
| y_lag2 | 0.0027 | -0.0026 | 0.0000 |
| y_lag3 | -0.0026 | 0.0027 | 0.0000 |
| dx | 0.0000 | 0.0000 | 0.0018 |
The GMM estimates:
knitr::kable(theta_GMM, digits = 3,
caption = "One-step Arellano-Bond GMM estimates from the closed-form solve.")
| phi (dy_lag) | 0.563 |
| beta (dx) | 0.444 |
A side-by-side of the three estimators on the same data:
tibble(parameter = c("phi", "beta"),
true = c(phi_true, beta_true),
FE = c(phi_fe, beta_fe),
AH = c(theta_AH[1, 1], theta_AH[2, 1]),
GMM = c(theta_GMM[1, 1], theta_GMM[2, 1])) |>
knitr::kable(digits = 3,
caption = "FE inconsistent; AH consistent (just-identified); GMM consistent and asymptotically more efficient (over-identified). With only one extra moment the efficiency gain is small, but the principle generalizes to the larger moment sets used in pgmm.")
| parameter | true | FE | AH | GMM |
|---|---|---|---|---|
| phi | 0.7 | 0.511 | 0.788 | 0.563 |
| beta | 0.5 | 0.427 | 0.479 | 0.444 |
The GMM point estimate does not coincide with the AH estimate — it uses a smaller sample (\(t \ge 4\)) and an extra moment, so in any single realization the two can land on opposite sides of the truth, as they do here. The conceptual payoff is the machinery: we constructed the moment conditions \(Z' (\Delta y - X \theta) = 0\), picked a weighting matrix, and solved the over-identified system in one closed-form line. Every estimator in Parts 3 and 4 is just this template scaled up.
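The template fits in a few lines of code. A generic one-step solver — an illustrative sketch, and gmm_onestep is our own name, not a plm function:

```r
# One-step linear GMM: theta = (X'Z W Z'X)^{-1} X'Z W Z'y with W = (Z'Z)^{-1}.
# When ncol(Z) == ncol(X) this collapses to plain IV; when Z == X, to OLS.
gmm_onestep <- function(y, X, Z) {
  W   <- solve(crossprod(Z))      # one-step weighting matrix (Z'Z)^{-1}
  ZtX <- crossprod(Z, X)          # Z'X
  Zty <- crossprod(Z, y)          # Z'y
  solve(t(ZtX) %*% W %*% ZtX, t(ZtX) %*% W %*% Zty)
}
```

Calling gmm_onestep(dy_vec, X_mat, Z_mat) on the objects from the chunk above should reproduce theta_GMM, since it applies exactly the same normal equation.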
Verifying against pgmm()

Run the just-identified pgmm AH fit on the same sample and put the coefficients side by side with our manual result:
pdat_toy <- pdata.frame(toy, index = c("unit", "time"))
ah_pgmm <- pgmm(y ~ lag(y, 1) + x | lag(y, 2),
data = pdat_toy,
effect = "individual",
model = "onestep",
transformation = "d")
tibble(parameter = c("phi", "beta"),
manual_literal_AH = c(theta_AH[1, 1], theta_AH[2, 1]),
pgmm_AH = c(unname(coef(ah_pgmm)["lag(y, 1)"]),
unname(coef(ah_pgmm)["x"]))) |>
knitr::kable(digits = 6,
caption = "Our literal Anderson-Hsiao (one stacked moment for y_{t-2}) vs pgmm with the same instrument specification.")
| parameter | manual_literal_AH | pgmm_AH |
|---|---|---|
| phi | 0.787692 | 0.675650 |
| beta | 0.479428 | 0.456828 |
The numbers are close but do not match exactly. The
reason is not the sample — both fits use the same 300 rows. It is the
moment structure. Look at the instrument matrix
pgmm builds for the first unit:
# pgmm's per-unit instrument block: rows are periods (t = 3..8),
# columns are: 6 period-specific copies of y_{t-2}, plus 1 column for x.
ah_pgmm$W[[1]] |>
round(3) |>
as.data.frame() |>
knitr::kable(caption = "pgmm's instrument matrix for the first unit (6 × 7). The block-diagonal structure of the first six columns reveals that pgmm splits the lag-2 instrument into one moment **per period**, not a single stacked moment.")
| V1 | V2 | V3 | V4 | V5 | V6 | x | |
|---|---|---|---|---|---|---|---|
| 3 | -4.542 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | -1.634 |
| 4 | 0.000 | -6.109 | 0.000 | 0.000 | 0.000 | 0.000 | 0.583 |
| 5 | 0.000 | 0.000 | -7.983 | 0.000 | 0.000 | 0.000 | 0.804 |
| 6 | 0.000 | 0.000 | 0.000 | -6.881 | 0.000 | 0.000 | 0.001 |
| 7 | 0.000 | 0.000 | 0.000 | 0.000 | -6.975 | 0.000 | -0.859 |
| 8 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | -6.133 | -0.364 |
What our literal AH did with one y-instrument
moment, pgmm does with six — one per
period \(t = 3, \ldots, 8\). Each
column of the block-diagonal section is “\(y_{i,t-2}\) at period \(t\), zero elsewhere”. So
pgmm’s “just-identified” AH spec is actually
over-identified internally, with \(T - 2 =
6\) moments for y plus 1 moment for x. Different moment system,
slightly different estimates in finite samples.
To replicate pgmm exactly by hand, we need two things: (i) per-unit instrument blocks \(Z_i\) with the lag-2 instrument split into one column per period (plus a column for \(\Delta x\)), and (ii) the Arellano-Bond one-step weighting matrix \(W = \big(\sum_i Z_i' H Z_i\big)^{-1}\) built from the tridiagonal \(H\). The chunk below constructs both.
# Rebuild X and y from toy_d (the AH-sample, 300 rows) — §1.6 overwrote
# X_mat and dy_vec with the smaller §1.6 GMM sample (toy_g, 250 rows),
# so we can't reuse those here.
dy_vec_AH <- toy_d$dy
X_mat_AH <- cbind(dy_lag = toy_d$dy_lag, dx = toy_d$dx)
unit_ids <- sort(unique(toy_d$unit))
n_per <- length(unique(toy_d$time)) # 6 differenced periods (t = 3..8)
period_vals <- sort(unique(toy_d$time))
# Build per-unit Z blocks: rows = periods kept for unit i, cols = (6 y instrs + 1 x instr).
Z_blocks <- lapply(unit_ids, function(i) {
d_i <- toy_d |> filter(unit == i) |> arrange(time)
Z_i <- matrix(0, nrow = nrow(d_i), ncol = n_per)
for (k in seq_len(nrow(d_i))) {
p_idx <- match(d_i$time[k], period_vals)
Z_i[k, p_idx] <- d_i$y_lag2[k]
}
cbind(Z_i, dx = d_i$dx)
})
# Arellano-Bond H matrix: tridiagonal with 2 on the diagonal, -1 off
H_AB <- matrix(0, n_per, n_per)
diag(H_AB) <- 2
for (k in seq_len(n_per - 1)) {
H_AB[k, k + 1] <- -1
H_AB[k + 1, k] <- -1
}
# One-step W = (sum_i Z_i' H Z_i)^{-1}
ZHZ <- Reduce("+", lapply(Z_blocks, function(Z_i) t(Z_i) %*% H_AB %*% Z_i))
W_AB <- solve(ZHZ)
# Stack Z, X, y across units and solve the GMM normal equation
Z_all <- do.call(rbind, Z_blocks)
ZtX <- crossprod(Z_all, X_mat_AH)
Zty <- crossprod(Z_all, dy_vec_AH)
theta_AH_pgmm_style <- solve(t(ZtX) %*% W_AB %*% ZtX,
t(ZtX) %*% W_AB %*% Zty)
rownames(theta_AH_pgmm_style) <- c("phi (dy_lag)", "beta (dx)")
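Why the tridiagonal 2 / −1 pattern in \(H\)? For iid levels errors with unit variance, \(\text{Var}(\Delta\varepsilon_{it}) = 2\), \(\text{Cov}(\Delta\varepsilon_{it}, \Delta\varepsilon_{i,t-1}) = -1\) (the shared \(\varepsilon_{i,t-1}\) term), and differences two or more periods apart share nothing. A quick self-contained simulation confirms the shape (illustrative, not part of the lab code):

```r
set.seed(2)
# Many short iid levels-error series, differenced: the covariance matrix of
# the differences should approach the Arellano-Bond H pattern (2 / -1 / 0).
eps  <- matrix(rnorm(2e5 * 4), ncol = 4)   # 4 iid levels errors per draw
deps <- eps[, 2:4] - eps[, 1:3]            # 3 consecutive first differences
round(cov(deps), 2)
```

So the one-step \(W\) is the efficient GMM weighting under exactly the iid-error assumption the AR tests in Part 2 will probe.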
Side by side with pgmm:
tibble(parameter = c("phi", "beta"),
manual_pgmm_style_AH = c(theta_AH_pgmm_style[1, 1],
theta_AH_pgmm_style[2, 1]),
pgmm_AH = c(unname(coef(ah_pgmm)["lag(y, 1)"]),
unname(coef(ah_pgmm)["x"]))) |>
knitr::kable(digits = 10,
caption = "Manual GMM-style AH (period-stacked Z + Arellano-Bond H weighting) versus pgmm. The two now agree to many decimal places.")
| parameter | manual_pgmm_style_AH | pgmm_AH |
|---|---|---|
| phi | 0.6756498 | 0.6756498 |
| beta | 0.4568281 | 0.4568281 |
Exact match. Three pedagogical takeaways:

- The textbook Anderson-Hsiao estimator uses one stacked IV moment; production software — pgmm, Stata’s xtabond / xtabond2 — implements a richer period-stacked version with the Arellano-Bond H-weighting. Both are consistent; they give slightly different point estimates in finite samples.
- A spec that looks just-identified in the formula can be over-identified in the moment system, because each GMM-style instrument is expanded into one moment per period.
- There is no black box: GMM is a weighted moment-condition system with a closed-form solve, with pgmm packaging exactly this structure.

pgmm() and the diagnostic battery for dynamic-panel GMM

Part 1 built the mechanics of one-step GMM from scratch. The real
workhorse for routine work is plm::pgmm(), which packages
the same machinery (Anderson-Hsiao, Arellano-Bond, Blundell-Bond) with
built-in diagnostics. Part 2 explains the function and walks through the
five tests summary.pgmm prints by default, demonstrating
each on a small worked example.
GMM is not a magic correction for endogeneity. It is a
moment-based inference technique that delivers consistent
estimates only if its identifying assumptions hold. Four of those
assumptions get a formal test on every summary.pgmm
printout; a fifth (the instrument count) is a structural check.
The five-row checklist (the order in which to read the printout):
| Assumption | Test | \(H_0\) | What we want to see |
|---|---|---|---|
| Instrument validity (overid) | Sargan / Hansen | \(E[Z'\varepsilon] = 0\) | Fail to reject (\(p > 0.10\)) |
| Error process is AR(1), not AR(2)+ | AR(1) on \(\Delta\hat\varepsilon\) | no first-order serial correlation in \(\Delta\hat\varepsilon\) | Reject (\(p < 0.05\) — mechanical) |
| Same | AR(2) on \(\Delta\hat\varepsilon\) | no second-order serial correlation in \(\Delta\hat\varepsilon\) | Fail to reject (\(p > 0.05\)) |
| Time effects matter | Wald on \(\tau_t\) | all year dummies \(= 0\) | Reject → keep year FE; fail to reject → safe to drop |
| Instruments not too many | # instruments \(< N\) | (rule of thumb) | Check by hand |
The pgmm() function — formula syntax and arguments

pgmm() is the workhorse for difference and system GMM in
R. Its call has a few moving parts that deserve close reading before any
fit.
Anatomy of a pgmm formula. The formula
uses up to three blocks separated by |:
y ~ <regressors> | <GMM-style instruments> | <regular instruments (optional)>
- Before the first | — the regression equation, written in levels. Use lag(y, 1) for the lagged outcome; pgmm will internally difference it (under transformation = "d") or stack it with the level equation (under transformation = "ld"). For other regressors, write the variable name for the contemporaneous value; lag(x, 0:1) adds both the contemporaneous and the lag-1 term.
- First | block — GMM-style instruments. Write lag(y, 2:99) to use lags 2, 3, …, up to whatever the panel supports, each one stacked per period (one moment for each lag × period combination — see the demonstration in §1.7). To include lagged levels of other regressors as instruments, list them too: lag(y, 2:99) + lag(x, 2:99). This is where the over-identification lives.
- Second | block (optional) — regular instruments. Variables that enter as one stacked moment (standard IV-style), no per-period expansion. Use this for strictly exogenous regressors that you do not want spread across periods. If you put x here, pgmm treats it as its own instrument.

The five main arguments:
| Argument | Choices | What it controls |
|---|---|---|
| data | a pdata.frame | Panel data with index = c(unit, time). Construct with pdata.frame(df, index = c("country", "year")). |
| effect | "individual", "twoways", "time" | "individual" adds only unit FE; "twoways" adds period dummies too (and a Wald test on them); "time" only period FE. Default: "twoways". |
| model | "onestep", "twosteps" | One-step uses the Arellano-Bond H-weighted weighting matrix (§1.7). Two-step re-estimates \(W\) using one-step residuals — asymptotically efficient, but two-step SEs need the Windmeijer correction (§3.4). |
| transformation | "d", "ld" | "d" = difference GMM (one differenced equation). "ld" = levels-and-differences = system GMM. System GMM adds the level equation, instrumented by lagged differences. |
| collapse | TRUE, FALSE | Collapse the per-period instruments into a single column per lag (cuts the moment count; see Roodman 2009 and §2.5). Available in newer plm versions. |
A worked example. Fit a small Diff-GMM on
EmplUK and print the summary so we have something to point
at when we walk through each test below.
data("EmplUK", package = "plm")
# A modest Diff-GMM specification on the canonical AB-91 panel.
# Read the formula in three pieces:
# * Left of |: the regression equation. lag(log(emp), 1:2) puts both AR(1)
# and AR(2) lags on the RHS; lag(log(wage), 0:1) puts the
# contemporaneous wage and its lag; log(capital) and
# lag(log(output), 0:1) are exogenous regressors.
# * Between the
# two |s: GMM-style instruments. lag(log(emp), 2:4) uses lags 2, 3,
# and 4 of log(emp), each stacked per period (one moment per
# (lag, period) pair — see §1.7). Capping at 2:4 keeps the
# instrument count moderate (avoiding §2.5's pathologies).
# * After second |: there is no third block here. We could add regular IV
# style instruments by writing "| ... | log(capital)" but
# for this demo we let log(capital) be a regressor only.
fit_demo <- pgmm(
log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1) +
log(capital) + lag(log(output), 0:1) |
lag(log(emp), 2:4),
data = EmplUK,
effect = "twoways", # unit FE + time FE → Wald-on-tau_t will print
model = "twosteps", # use optimal two-step weighting
transformation = "d" # difference GMM
)
# Capture summary once — the diagnostics below pull components out of this object.
sum_demo <- summary(fit_demo, robust = TRUE) # robust = TRUE → Windmeijer SEs
# Print the full summary so you have it on screen for the §2.2-§2.5 walkthrough
sum_demo
## Twoways effects Two-steps model Difference GMM
##
## Call:
## pgmm(formula = log(emp) ~ lag(log(emp), 1:2) + lag(log(wage),
## 0:1) + log(capital) + lag(log(output), 0:1) | lag(log(emp),
## 2:4), data = EmplUK, effect = "twoways", model = "twosteps",
## transformation = "d")
##
## Unbalanced Panel: n = 140, T = 7-9, N = 1031
##
## Number of Observations Used: 611
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.587153 -0.021510 0.000000 0.001498 0.035743 0.702491
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## lag(log(emp), 1:2)1 0.0331317 0.2429704 0.1364 0.89154
## lag(log(emp), 1:2)2 0.0042604 0.0578536 0.0736 0.94130
## lag(log(wage), 0:1)0 -0.3289821 0.1460541 -2.2525 0.02429 *
## lag(log(wage), 0:1)1 0.0123661 0.1050457 0.1177 0.90629
## log(capital) 0.3786318 0.0603133 6.2777 3.435e-10 ***
## lag(log(output), 0:1)0 0.4403456 0.1786435 2.4649 0.01370 *
## lag(log(output), 0:1)1 -0.0313526 0.1760058 -0.1781 0.85862
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sargan test: chisq(15) = 15.4708 (p-value = 0.41807)
## Autocorrelation test (1): normal = 0.1924172 (p-value = 0.84742)
## Autocorrelation test (2): normal = -0.4885348 (p-value = 0.62517)
## Wald test for coefficients: chisq(7) = 108.6394 (p-value = < 2.22e-16)
## Wald test for time dummies: chisq(6) = 10.85012 (p-value = 0.093121)
The rest of Part 2 walks through each of the five diagnostic rows on this single fit, then transitions to the family of estimators in Part 3.
The first concern with GMM is: are our instruments actually exogenous? With more moment conditions than parameters, we can ask whether they are jointly consistent with the data.
The null hypothesis is
\[H_0: \quad E[Z' \varepsilon] = 0 \quad (\text{the instrument set is jointly exogenous}).\]
The test statistic. GMM minimizes the quadratic form \(\hat g(\theta)' W \hat g(\theta)\) where \(\hat g\) is the sample-moment vector. At the GMM optimum this quadratic form is asymptotically \(\chi^2_{m-k}\) under \(H_0\), where \(m\) is the number of moment conditions and \(k\) the number of estimated parameters. So the value of the GMM objective at the optimum serves as the test statistic.
Two variants:
| Name | Built on | Robust to heteroscedasticity? |
|---|---|---|
| Sargan | one-step weighting matrix | No (assumes spherical errors) |
| Hansen-J | two-step (optimal) weighting matrix | Yes |
summary.pgmm reports Sargan by default; Stata’s
xtabond2 also reports Hansen. They share the same null and
the same \(\chi^2\) distribution, but
Hansen is more robust to heteroscedastic / serially correlated
errors.
Decision rule.

- Do not reject (\(p > 0.10\)): instruments pass the joint exogeneity test — cautious go-ahead.
- Reject (\(p < 0.05\)): at least one moment condition is invalid. Diagnose by tightening lag depth (start instruments at lag 3 instead of lag 2) or by adding more lags of the DV.
Two caveats that every student gets wrong on first read. First, failing to reject is not affirmative proof of validity — the test has low power, especially when all instruments are mildly contaminated in the same direction. Second, a p-value near 1.00 is a warning sign, not a triumph: with too many instruments the test is weakened to the point of detecting nothing (the instrument-proliferation problem taken up in §2.5).
On fit_demo. Pull the Sargan slot
directly out of the summary object — summary.pgmm stores it
as a list with $statistic, $parameter (the
degrees of freedom \(m - k\)), and
$p.value.
# The Sargan slot inside the summary object
sarg <- sum_demo$sargan
tibble(
statistic = sarg$statistic, # the value of the GMM objective at the optimum
df = sarg$parameter, # m - k = (# instruments) - (# parameters)
p_value = sarg$p.value, # P(chi^2_df > statistic)
decision = ifelse(sarg$p.value > 0.10,
"Fail to reject — instruments pass (cautious)",
"Reject — instrument validity in doubt")
) |>
knitr::kable(digits = 3, caption = "Sargan test extracted from sum_demo$sargan.")
| statistic | df | p_value | decision |
|---|---|---|---|
| 15.471 | 15 | 0.418 | Fail to reject — instruments pass (cautious) |
The Arellano-Bond identification strategy assumes the levels error \(\varepsilon_{it}\) is iid — in particular, no serial correlation. If that fails, the instrument \(y_{i,t-2}\) (which depends on \(\varepsilon_{i,t-2}\)) becomes correlated with \(\Delta \varepsilon_{it} = \varepsilon_{it} - \varepsilon_{i,t-1}\), breaking exogeneity.
We can’t test the levels error directly — we never observe it — but we can test its differenced counterpart \(\Delta\hat\varepsilon_{it}\), and translate the results.
The mechanical fact. If \(\varepsilon_{it}\) is iid, then \(\Delta\varepsilon_{it}\) has an MA(1) structure because consecutive differences share an \(\varepsilon_{i,t-1}\) term. So: first-order correlation in \(\Delta\hat\varepsilon_{it}\) is expected by construction, while second-order correlation should be absent.
Decision rules.
| Test | \(H_0\) | What you want |
|---|---|---|
| AR(1) | \(\text{Corr}(\Delta\hat\varepsilon_{it}, \Delta\hat\varepsilon_{i,t-1}) = 0\) | Reject (\(p < 0.05\)). If you fail to reject, your model is not actually dynamic — recheck the LDV. |
| AR(2) | \(\text{Corr}(\Delta\hat\varepsilon_{it}, \Delta\hat\varepsilon_{i,t-2}) = 0\) | Fail to reject (\(p > 0.05\)). Rejecting means levels error is not iid — lag-2 instrument is invalid. |
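The MA(1) fact is easy to verify by simulation — with iid levels errors, differencing induces a lag-1 autocorrelation of exactly \(-0.5\) and nothing at lag 2 (a toy check, not the lab data):

```r
# iid levels error -> differenced error is MA(1):
# Corr(Δe_t, Δe_{t-1}) = -0.5, Corr(Δe_t, Δe_{t-2}) = 0.
set.seed(1)
eps  <- rnorm(1e5)                         # iid levels error
deps <- diff(eps)                          # differenced error
m    <- length(deps)
r1 <- cor(deps[-1], deps[-m])              # lag-1 autocorrelation
r2 <- cor(deps[-(1:2)], deps[1:(m - 2)])   # lag-2 autocorrelation
round(c(r1, r2), 3)
```

This is exactly why the AR(1) test is *supposed* to reject and the AR(2) test is not.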
What to do if AR(2) rejects. Two options, in order of cost:

- Deepen the instrument start: change lag(y, 2:99) to lag(y, 3:99). Loses one lag’s worth of moments.
- Enrich the dynamics: put lag(y, 1:2) on the RHS instead of just lag(y, 1). The extra lag soaks up the residual autocorrelation.

On fit_demo. The summary object stores AR(1) and AR(2) as $m1 and $m2, each a list with $statistic (a \(z\)-statistic) and $p.value.
# AR(1) and AR(2) slots from the summary
ar1 <- sum_demo$m1 # first-order serial correlation in Δ residuals
ar2 <- sum_demo$m2 # second-order serial correlation in Δ residuals
tibble(
test = c("AR(1) on Δresid", "AR(2) on Δresid"),
z_stat = c(ar1$statistic, ar2$statistic),
p_value = c(ar1$p.value, ar2$p.value),
decision = c(
ifelse(ar1$p.value < 0.05,
"Reject (expected — mechanical from differencing)",
"Fail to reject (concern: model may not be dynamic)"),
ifelse(ar2$p.value > 0.05,
"Fail to reject (levels error looks iid — lag-2 instrument OK)",
"Reject (levels error is NOT iid — lag-2 instrument suspect)")
)
) |>
knitr::kable(digits = 3, caption = "Arellano-Bond AR tests on the differenced residuals.")
| test | z_stat | p_value | decision |
|---|---|---|---|
| AR(1) on Δresid | 0.192 | 0.847 | Fail to reject (concern: model may not be dynamic) |
| AR(2) on Δresid | -0.489 | 0.625 | Fail to reject (levels error looks iid — lag-2 instrument OK) |
When pgmm is fit with effect = "twoways",
the summary prints a Wald-type test on the joint significance of the
year dummies.
In our Part 4 democracy fits, we use
effect = "individual" for numerical reasons (see §4.1), so
this test does not print. But on shorter panels where two-way effects
are feasible, this is a routine check.
On fit_demo. Because we set
effect = "twoways" above, the summary stores a Wald slot at
$wald.td (Wald-on-time-dummies). It’s a list with
$statistic (a \(\chi^2\)
value), $parameter (df = number of period dummies), and
$p.value.
wd <- sum_demo$wald.td
tibble(
test = "Wald on time dummies (H0: all tau_t = 0)",
chi_sq = wd$statistic,
df = wd$parameter,
p_value = wd$p.value,
decision = ifelse(wd$p.value < 0.05,
"Reject — keep year FE in the spec",
"Fail to reject — safe to drop year FE")
) |>
knitr::kable(digits = 3, caption = "Wald test on the period dummies.")
| test | chi_sq | df | p_value | decision |
|---|---|---|---|---|
| Wald on time dummies (H0: all tau_t = 0) | 10.85 | 6 | 0.093 | Fail to reject — safe to drop year FE |
This is the only diagnostic in the battery that is not a statistical test. It’s a structural property of your specification, and getting it wrong can sabotage every other test in the printout.
The mechanics. Difference GMM with
lag(y, 2:99) uses one moment for each (period, lag)
combination available in the panel. The instrument count on the LDV
alone grows roughly as
\[\frac{(T-1)(T-2)}{2}\]
— quadratic in \(T\). For \(T = 9\) (EmplUK) this gives 28 moments per regressor; for \(T = 20\) (typical macro panel) it gives 171; for \(T = 40\) (our democracy panel) it can exceed 700. System GMM roughly doubles these because the level equation contributes its own moments on top of the differenced ones.
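The quadratic growth is worth computing once. A one-line check of the per-regressor moment count at the three panel lengths mentioned above:

```r
# Difference-GMM moment count per instrumented regressor: (T-1)(T-2)/2
n_moments <- function(T) (T - 1) * (T - 2) / 2
n_moments(c(9, 20, 40))   # EmplUK, a typical macro panel, the democracy panel
# → 28 171 741
```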
Three pathologies of too-many-instruments. When the moment count grows large relative to \(N\) (the number of panel units):

- The instruments overfit the endogenous regressors, pulling the GMM estimate back toward the biased non-instrumented one.
- The Sargan/Hansen test loses power — p-values drift toward 1, and the over-identification check becomes uninformative.
- The estimated weighting matrix becomes singular: pgmm falls back to a generalized inverse and emits general inverse is used warnings. The two-step estimator is still consistent but loses its precision guarantee.

These pathologies often coexist. A fit with 700 instruments on \(N = 130\) countries will pass every diagnostic on the printout and be a poor approximation to the truth. The instrument count is the cheapest sanity check.
Two remedies for too many instruments.

- Limit the lag depth. Replace lag(y, 2:99) with lag(y, 2:k) for a small \(k\). Common choices: \(k = 3\), \(4\), or \(5\). The moment count drops from quadratic in \(T\) to linear, killing most of the over-identification problem.
- Collapse the instrument matrix. Stata’s xtabond2 has a collapse option that column-sums the per-period moments into a single moment per (regressor, lag). plm::pgmm doesn’t expose collapse directly, but limiting lag depth gives a similar reduction in moment count and is the practical equivalent in R.

Rule of thumb (Roodman 2009b). Keep the moment count below \(N\), ideally well below. EmplUK has \(N = 140\) firms, so the full AB spec (~30 moments) sits comfortably within bounds. The democracy panel has \(N \approx 130\) countries with \(T\) up to 40, so the full lag(y, 2:99) spec will balloon past \(N\) — exactly the regime where the pathologies bite hardest. We will demonstrate this in Part 4.
The practical workflow (applied in Part 4): fit the same specification at several lag depths — full (2:99), moderate (2:4 or 2:5), and aggressively truncated (2:3) — then compare estimates and diagnostics across them.

On fit_demo. fit_demo$W is
a list with one entry per unit; each entry is the per-unit instrument
matrix. ncol(fit_demo$W[[1]]) gives the total moment count
(same across units).
# Instrument count and units, from the fitted object directly
n_inst <- length(fit_demo$W[[1]][1, ]) # # columns in the per-unit instrument matrix
n_units <- length(fit_demo$W) # # of unit-level blocks
tibble(
metric = c("# instruments", "# units (N)", "ratio (# inst / N)"),
value = c(n_inst, n_units, round(n_inst / n_units, 2)),
decision = c(NA, NA,
ifelse(n_inst < n_units,
"Passes rule of thumb (# inst < N)",
"Fails — too many instruments"))
) |>
knitr::kable(caption = "Instrument count and the Roodman rule-of-thumb check.")
| metric | value | decision |
|---|---|---|
| # instruments | 28.0 | NA |
| # units (N) | 140.0 | NA |
| ratio (# inst / N) | 0.2 | Passes rule of thumb (# inst < N) |
When GMM is fragile (small \(T\), near-unit-root outcome, weak instruments), an alternative path is transformed-likelihood estimation. The idea: instead of instrumenting your way past the \(\alpha_i\) problem, marginalize \(\alpha_i\) out of the likelihood through an information-orthogonal reparameterization.
Pickup et al. (2017) introduced this for political-science panels;
Pickup & Hopkins (2022) compared it to GMM in a Monte Carlo and
concluded that the orthogonal-panel model
(OrthoPanels::opm()) is more robust for \(T \in [5, 20]\). It also avoids the
instrument-count headache entirely.
We will not fit OPM in this lab — it is a different inferential paradigm (Bayesian, posterior samples) and deserves its own treatment. If you find yourself in the small-\(T\) regime where Diff-GMM and Sys-GMM disagree, OPM is the next thing to try.
Bridge from Part 1. Part 1 built Anderson-Hsiao and one-step GMM by hand on a toy panel — explicit \(X\), \(Z\), \(W\) matrices and a single solve() for the closed form. Part 3 does the same logic on real data via pgmm(), which packages the matrix algebra for us and uses a richer per-period moment structure than our manual stacking (compare §1.6 and §1.7). The machinery is identical; the conveniences are notational.
For Part 3 we use the canonical AB/BB demonstration data,
EmplUK. It is shipped with plm and is the
panel that Arellano & Bond (1991) used in the original paper: \(N = 140\) UK firms, \(T = 7\)–\(9\) years (1976–1984), outcome
emp (employment).
data("EmplUK", package = "plm")
glimpse(EmplUK)
## Rows: 1,031
## Columns: 7
## $ firm <dbl> 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3,…
## $ year <dbl> 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1977, 1978, 1979, 19…
## $ sector <dbl> 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,…
## $ emp <dbl> 5.041, 5.600, 5.015, 4.715, 4.093, 3.166, 2.936, 71.319, 70.64…
## $ wage <dbl> 13.1516, 12.3018, 12.8395, 13.8039, 14.2897, 14.8681, 13.7784,…
## $ capital <dbl> 0.5894, 0.6318, 0.6771, 0.6171, 0.5076, 0.4229, 0.3920, 16.936…
## $ output <dbl> 95.7072, 97.3569, 99.6083, 100.5501, 99.5581, 98.6151, 100.030…
length(unique(EmplUK$firm))
## [1] 140
range(EmplUK$year)
## [1] 1976 1984
EmplUK is in exactly the regime where Nickell bias is
severe — small \(T\), persistent
outcome. Perfect classroom data.
Difference the dynamic panel equation:
\[ \Delta y_{it} = \phi\, \Delta y_{i,t-1} + \beta\, \Delta x_{it} + \Delta \varepsilon_{it} \]
The unit fixed effects \(\alpha_i\) vanish — they are constant in \(t\), so \(\Delta \alpha_i = 0\). Differencing achieves what within-demeaning achieves, but without mixing future \(\varepsilon\) into the lag through \(\bar y_{i,-1}\). This is the core idea behind every estimator in the GMM family.
Of course, the differenced lag \(\Delta y_{i,t-1}\) is still correlated with \(\Delta\varepsilon_{it}\) — both contain \(\varepsilon_{i,t-1}\) mechanically. We need an instrument.
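A quick simulation makes both halves of this argument visible — the differenced lag is contaminated, while the twice-lagged level is not (toy AR(1) with \(\phi = 0.5\) and standard-normal errors; purely illustrative):

```r
# In y_t = phi * y_{t-1} + e_t, check the two correlations behind the AH logic.
set.seed(7)
phi <- 0.5
n   <- 1e5
eps <- rnorm(n)
y   <- as.numeric(stats::filter(eps, phi, method = "recursive"))  # AR(1) path
dy   <- diff(y)      # dy[t] = y[t+1] - y[t]
deps <- diff(eps)
c_endog <- cor(dy[1:(n - 2)], deps[2:(n - 1)])  # Corr(Δy_{t-1}, Δe_t): negative
c_iv    <- cor(y[1:(n - 2)],  deps[2:(n - 1)])  # Corr(y_{t-2},  Δe_t): ~ 0
```

The first correlation is the endogeneity problem; the second (approximately zero) is the exclusion restriction the next estimator exploits.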
Anderson and Hsiao proposed using \(y_{i,t-2}\) as IV for \(\Delta y_{i,t-1}\). The exclusion logic is the same as in §1.4: \(y_{i,t-2}\) is correlated with \(\Delta y_{i,t-1}\) (relevance) but, under iid \(\varepsilon\), uncorrelated with \(\Delta\varepsilon_{it}\) (exclusion — precisely the assumption tested by the AR(2) diagnostic in §2.2).
The pgmm call below implements that logic. Read the formula in two halves:

- Before the | — the regression equation in level form. lag(log(emp), 1) is the lagged outcome; pgmm internally differences it.
- After the | — the GMM instrument block. lag(log(emp), 2) says use only the second lag of log(emp) as an instrument — that is the Anderson-Hsiao restriction.

ah_fit <- pgmm(log(emp) ~ lag(log(emp), 1) + log(wage) + log(capital) |
                 lag(log(emp), 2),
               data = EmplUK,
               effect = "individual", # only unit FE (no time dummies)
               model = "onestep", # one-step is enough for just-identified IV
               transformation = "d") # "d" = difference GMM
summary(ah_fit)
## Oneway (individual) effect One-step model Difference GMM
##
## Call:
## pgmm(formula = log(emp) ~ lag(log(emp), 1) + log(wage) + log(capital) |
## lag(log(emp), 2), data = EmplUK, effect = "individual", model = "onestep",
## transformation = "d")
##
## Unbalanced Panel: n = 140, T = 7-9, N = 1031
##
## Number of Observations Used: 751
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.903849 -0.045858 0.000000 -0.001085 0.041154 0.844502
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## lag(log(emp), 1) 0.801824 0.157098 5.1040 3.326e-07 ***
## log(wage) -0.631281 0.195599 -3.2274 0.001249 **
## log(capital) 0.241204 0.056267 4.2868 1.813e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sargan test: chisq(6) = 34.79026 (p-value = 4.732e-06)
## Autocorrelation test (1): normal = -3.923134 (p-value = 8.7404e-05)
## Autocorrelation test (2): normal = -1.10812 (p-value = 0.26781)
## Wald test for coefficients: chisq(3) = 605.8932 (p-value = < 2.22e-16)
summary(ah_fit) prints the coefficients, the Sargan test, and the Arellano-Bond AR(1)/AR(2) tests on the differenced residuals. Note that textbook AH is just-identified (\(m - k = 0\), so Sargan would not apply), but pgmm expands lag(log(emp), 2) into one moment per period — which is why the printout shows a Sargan test with 6 degrees of freedom, and here it rejects strongly. AH is rarely used in practice (it leaves valid moments on the table) but is the right conceptual stepping stone between “no IV” and “many IVs”.
The key insight from Arellano & Bond: under the AH exogeneity assumption, all lags \(y_{i,t-2}, y_{i,t-3}, \ldots\) are valid instruments for \(\Delta y_{i,t-1}\). With each successive period \(t\) we gain one more usable lag. Stacking them all up turns AH into an over-identified GMM problem with a corresponding efficiency gain.
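To see where the per-period moment count comes from, here is a sketch of the Arellano-Bond instrument matrix for a single unit with \(T = 5\) (toy numbers; pgmm builds the analogous block structure internally):

```r
# One unit's AB instrument matrix: row t of the differenced equation
# (t = 3..T) gets the levels y_{t-2}, ..., y_1 as instruments.
y  <- c(1.0, 1.2, 0.9, 1.1, 1.3)   # toy outcome series for one unit
Tn <- length(y)
Z  <- matrix(0, nrow = Tn - 2, ncol = (Tn - 1) * (Tn - 2) / 2)
j  <- 1
for (t in 3:Tn) {
  lags <- y[1:(t - 2)]                        # usable levels for period t
  Z[t - 2, j:(j + length(lags) - 1)] <- lags
  j <- j + length(lags)
}
Z   # 3 x 6: one block of columns per period, zeros elsewhere
```

Each row gains one more usable column than the last — summing 1 + 2 + ... + (T − 2) gives the \((T-1)(T-2)/2\) count from §2.4.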
The standard Arellano-Bond difference GMM specification on EmplUK. Pay attention to the formula syntax:

- lag(log(emp), 1:2) — both AR(1) and AR(2) terms enter the equation. This is heavier dynamics than what we’ll fit on the democracy panel, but it matches the published AB specification.
- lag(log(wage), 0:1) — both contemporaneous and lagged wage. The 0 means “the contemporaneous value”, 1 means “the first lag”.
- | lag(log(emp), 2:99) — instrument block. 2:99 means “use every lag from 2 up to 99 (or as many as the panel allows)”. This is the full Arellano-Bond moment set.

ab_fit <- pgmm(log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1) +
log(capital) + lag(log(output), 0:1) |
lag(log(emp), 2:99),
data = EmplUK,
effect = "twoways", # unit + time fixed effects (AB-91 spec)
model = "twosteps", # use the optimal two-step weighting matrix
transformation = "d") # difference GMM
summary(ab_fit, robust = TRUE) # robust = TRUE → Windmeijer-corrected SEs
## Twoways effects Two-steps model Difference GMM
##
## Call:
## pgmm(formula = log(emp) ~ lag(log(emp), 1:2) + lag(log(wage),
## 0:1) + log(capital) + lag(log(output), 0:1) | lag(log(emp),
## 2:99), data = EmplUK, effect = "twoways", model = "twosteps",
## transformation = "d")
##
## Unbalanced Panel: n = 140, T = 7-9, N = 1031
##
## Number of Observations Used: 611
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.6190677 -0.0255683 0.0000000 -0.0001339 0.0332013 0.6410272
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## lag(log(emp), 1:2)1 0.474151 0.185398 2.5575 0.0105437 *
## lag(log(emp), 1:2)2 -0.052967 0.051749 -1.0235 0.3060506
## lag(log(wage), 0:1)0 -0.513205 0.145565 -3.5256 0.0004225 ***
## lag(log(wage), 0:1)1 0.224640 0.141950 1.5825 0.1135279
## log(capital) 0.292723 0.062627 4.6741 2.953e-06 ***
## lag(log(output), 0:1)0 0.609775 0.156263 3.9022 9.530e-05 ***
## lag(log(output), 0:1)1 -0.446373 0.217302 -2.0542 0.0399605 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sargan test: chisq(25) = 30.11247 (p-value = 0.22011)
## Autocorrelation test (1): normal = -1.53845 (p-value = 0.12394)
## Autocorrelation test (2): normal = -0.2796829 (p-value = 0.77972)
## Wald test for coefficients: chisq(7) = 142.0353 (p-value = < 2.22e-16)
## Wald test for time dummies: chisq(6) = 16.97046 (p-value = 0.0093924)
What to read off the printout, applying the §2 checklist top-to-bottom:

- Coefficients. \(\hat\phi_1 \approx 0.47\) on lag(log(emp), 1), \(\hat\phi_2 \approx -0.05\) on lag(log(emp), 2) — modest persistence with slight overshooting. Standard errors are the Windmeijer-corrected two-step SEs because we set robust = TRUE.
- Sargan. chisq(25) = 30.11, \(p = 0.22\) — fail to reject; the instrument set passes, and 25 df is moderate enough for the test to be informative.
- AR(1)/AR(2). AR(2) fails to reject (\(p = 0.78\)) — the lag-2 instruments look valid. (AR(1) here also fails to reject, \(p = 0.12\), which is mildly unusual but not disqualifying.)
- Wald on time dummies. chisq(6) = 16.97, \(p = 0.009\) — reject; the year dummies belong in the specification, which is why we fit with effect = "twoways".

When the outcome is highly persistent (\(\phi\) near 1), the moment \(E[y_{i,t-2} \cdot \Delta y_{i,t-1}]\) shrinks toward zero — lagged levels become weak instruments for the differenced equation. In macro panels this is the typical regime, not the exception.
Blundell and Bond’s solution: stack a level equation alongside the differenced equation, and instrument the level equation with lagged differences.
\[ \begin{aligned} \text{Differenced eq:} &\quad \Delta y_{it} = \phi\, \Delta y_{i,t-1} + \beta\, \Delta x_{it} + \Delta\varepsilon_{it} \quad \text{(IVs: } y_{i,t-2}, y_{i,t-3}, \ldots\text{)} \\ \text{Level eq:} &\quad y_{it} = \alpha_i + \phi\, y_{i,t-1} + \beta\, x_{it} + \varepsilon_{it} \quad \text{(IVs: } \Delta y_{i,t-1}, \Delta x_{i,t-1}\text{)} \end{aligned} \]
The new exogeneity assumption: lagged differences are uncorrelated with \(\alpha_i\). Equivalent to a stationarity condition on the initial conditions; reasonable in steady-state panels, less so when the panel is observed mid-transition.
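The stationarity condition can be checked in a toy panel: draw initial conditions from the steady state and the lagged difference is uncorrelated with \(\alpha_i\); start every unit at zero (mid-transition) and the correlation reappears. A sketch under those assumptions (\(\phi = 0.8\), normal errors; illustrative only):

```r
# Blundell-Bond level-equation moment: E[Δy_{i,t-1} (alpha_i + e_it)] = 0
# holds under stationary initial conditions, fails when units start mid-transition.
set.seed(3)
N <- 2e4; phi <- 0.8
alpha <- rnorm(N)

# Stationary start: deviation from the unit's long-run mean alpha / (1 - phi)
y0 <- alpha / (1 - phi) + rnorm(N, sd = sqrt(1 / (1 - phi^2)))
y1 <- alpha + phi * y0 + rnorm(N)
y2 <- alpha + phi * y1 + rnorm(N)
eps3 <- rnorm(N)
c_ok <- cor(y2 - y1, alpha + eps3)      # ~ 0: the BB instrument is valid

# Mid-transition start: every unit begins at y0 = 0
y1b <- alpha + rnorm(N)
y2b <- alpha + phi * y1b + rnorm(N)
c_bad <- cor(y2b - y1b, alpha + eps3)   # clearly positive: the moment fails
```

This is the sense in which system GMM buys efficiency with an extra assumption rather than for free.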
In pgmm we get system GMM by setting
transformation = "ld" (levels
and differences):
bb_fit <- pgmm(log(emp) ~ lag(log(emp), 1) + lag(log(wage), 0:1) +
lag(log(capital), 0:1) |
lag(log(emp), 2:99) + lag(log(wage), 2:99) +
lag(log(capital), 2:99),
data = EmplUK,
effect = "twoways",
model = "onestep",
transformation = "ld") # "ld" = levels + differences (system GMM)
summary(bb_fit, robust = TRUE)
## Twoways effects One-step model System GMM
##
## Call:
## pgmm(formula = log(emp) ~ lag(log(emp), 1) + lag(log(wage), 0:1) +
## lag(log(capital), 0:1) | lag(log(emp), 2:99) + lag(log(wage),
## 2:99) + lag(log(capital), 2:99), data = EmplUK, effect = "twoways",
## model = "onestep", transformation = "ld")
##
## Unbalanced Panel: n = 140, T = 7-9, N = 1031
##
## Number of Observations Used: 1642
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.7530341 -0.0369030 0.0000000 0.0002882 0.0466069 0.6001503
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## lag(log(emp), 1) 0.935605 0.026295 35.5810 < 2.2e-16 ***
## lag(log(wage), 0:1)0 -0.630976 0.118054 -5.3448 9.050e-08 ***
## lag(log(wage), 0:1)1 0.482620 0.136887 3.5257 0.0004224 ***
## lag(log(capital), 0:1)0 0.483930 0.053867 8.9838 < 2.2e-16 ***
## lag(log(capital), 0:1)1 -0.424393 0.058479 -7.2572 3.952e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sargan test: chisq(100) = 118.763 (p-value = 0.097096)
## Autocorrelation test (1): normal = -4.808434 (p-value = 1.5212e-06)
## Autocorrelation test (2): normal = -0.2800133 (p-value = 0.77947)
## Wald test for coefficients: chisq(5) = 11174.82 (p-value = < 2.22e-16)
## Wald test for time dummies: chisq(7) = 14.71138 (p-value = 0.039882)
Two things change compared with diff-GMM:

- transformation = "ld" stacks the level equations on top of the differenced ones, so the number of observations used roughly doubles (1642 here versus 611 in the AB fit) and the level-equation moments enter alongside the differenced ones.
- Every regressor now gets its own instrument block after the |, not just the lagged DV. Each block contributes its own moment conditions.

GMM minimizes a quadratic form \(\hat g(\theta)' \, W \, \hat g(\theta)\) where \(\hat g\) is the sample moment vector and \(W\) is a weighting matrix. Two strategies:

- One-step: fix \(W\) in advance from an assumed error structure (homoscedastic levels errors imply a known covariance for the differenced errors).
- Two-step: estimate the optimal \(W\) from the first-step residuals, then re-minimize. Asymptotically efficient, but the standard errors need a finite-sample correction.
In pgmm we picked model = "twosteps" for AB
above. The catch: two-step standard errors are downward
biased in finite samples — sometimes severely. Windmeijer
(2005) derived a correction. Invoke it via robust = TRUE in
summary.pgmm:
# Two-step AB fit, plain SEs
summary(ab_fit, robust = FALSE)$coefficients[, 2] |> head(3)
## lag(log(emp), 1:2)1 lag(log(emp), 1:2)2 lag(log(wage), 0:1)0
## 0.08530307 0.02728433 0.04934539
# Two-step AB fit, Windmeijer-style robust SEs
summary(ab_fit, robust = TRUE)$coefficients[, 2] |> head(3)
## lag(log(emp), 1:2)1 lag(log(emp), 1:2)2 lag(log(wage), 0:1)0
## 0.1853985 0.0517491 0.1455653
The robust SEs in the second call are larger — that’s the Windmeijer correction in action. Always report the robust two-step SEs; the plain ones are not safe in finite samples.
We argued in §2.4 that too many instruments degrade both the weighting matrix and the Sargan diagnostic. Here we demonstrate the remedy: refit the same AB specification but cap the lag depth at 2:4 so the moment count stays moderate.
ab_fit_collapsed <- pgmm(log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1) +
log(capital) + lag(log(output), 0:1) |
lag(log(emp), 2:4),
data = EmplUK,
effect = "twoways",
model = "twosteps",
transformation = "d")
And the instrument-count table across the EmplUK fits from this part:
# Helper functions to pull instrument count and N from a pgmm object.
# fit$W is a list of unit-level instrument matrices; ncol() of one block
# gives the moment count (same for every unit).
n_inst <- function(fit) length(fit$W[[1]][1, ])
n_units <- function(fit) length(fit$W)
tibble(model = c("AH (just-id)", "AB (full)", "AB (lag 2:4)", "BB (full)"),
n_instruments = c(n_inst(ah_fit), n_inst(ab_fit),
n_inst(ab_fit_collapsed), n_inst(bb_fit)),
n_units = c(n_units(ah_fit), n_units(ab_fit),
n_units(ab_fit_collapsed), n_units(bb_fit))) |>
mutate(ratio = round(n_instruments / n_units, 2),
passes_rule = n_instruments < n_units) |>
knitr::kable(caption = "Instrument count vs. sample size N. The Roodman rule of thumb (§2.4) wants ratio < 1.")
| model | n_instruments | n_units | ratio | passes_rule |
|---|---|---|---|---|
| AH (just-id) | 9 | 140 | 0.06 | TRUE |
| AB (full) | 38 | 140 | 0.27 | TRUE |
| AB (lag 2:4) | 28 | 140 | 0.20 | TRUE |
| BB (full) | 113 | 140 | 0.81 | TRUE |
Limiting from 2:99 to 2:4 typically cuts
the moment count by 60–80%. The point estimates usually move only a
little; the diagnostic power of Sargan recovers. On EmplUK with \(N = 140\), every spec above passes the rule
of thumb — but Part 4 will show what happens on a panel where the full
spec blows past \(N\).
We now bring the full estimator menu back to the same panel Lab 5
used: the Przeworski democracy data with outcome ln_gdpw
and treatment reg, moderator edt, and
time-invariant control oil.
We lean on two GMM fits most: full Diff-GMM and full Sys-GMM. We
print the full Diff-GMM summary so you see what a
pgmm printout looks like end-to-end; for Sys-GMM we extract
the coefficient table and the four diagnostic statistics into compact
tables so the prose flow is not buried under a wall of console
output.
# Canonical full printout — read it once to know the layout
summary(fit_diff_full, robust = TRUE)
## Oneway (individual) effect Two-steps model Difference GMM
##
## Call:
## pgmm(formula = ln_gdpw ~ lag(ln_gdpw, 1) + reg + reg_lag + edt +
## reg_x_edt + reg_x_edt_lag | lag(ln_gdpw, 2:99), data = pdat,
## effect = "individual", model = "twosteps", transformation = "d")
##
## Unbalanced Panel: n = 135, T = 6-40, N = 4126
##
## Number of Observations Used: 2674
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.7504327 -0.0020965 0.0000000 0.0002401 0.0023728 0.5862290
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## lag(ln_gdpw, 1) 0.9507663 0.0148429 64.0551 < 2e-16 ***
## reg -0.0150909 0.0272604 -0.5536 0.57987
## reg_lag -0.0018121 0.0280968 -0.0645 0.94858
## edt -0.0080669 0.0039462 -2.0442 0.04093 *
## reg_x_edt 0.0045252 0.0048995 0.9236 0.35570
## reg_x_edt_lag 0.0008659 0.0047661 0.1817 0.85583
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sargan test: chisq(740) = 111.2742 (p-value = 1)
## Autocorrelation test (1): normal = -5.493184 (p-value = 3.9475e-08)
## Autocorrelation test (2): normal = -0.06936944 (p-value = 0.9447)
## Wald test for coefficients: chisq(6) = 12971.02 (p-value = < 2.22e-16)
Sys-GMM (full), focused excerpt. Coefficients with Windmeijer-corrected standard errors, followed by the four §2 diagnostic statistics in a compact table:
s_sys <- summary(fit_sys_full, robust = TRUE)
# Coefficients
as.data.frame(s_sys$coefficients) |>
tibble::rownames_to_column("term") |>
knitr::kable(digits = 3,
caption = "Sys-GMM (full) — coefficients with Windmeijer-corrected SEs.")
| term | Estimate | Std. Error | z-value | Pr(>|z|) |
|---|---|---|---|---|
| lag(ln_gdpw, 1) | 1.010 | 0.002 | 591.855 | 0.000 |
| reg | -0.020 | 0.020 | -0.993 | 0.321 |
| reg_lag | -0.030 | 0.019 | -1.570 | 0.116 |
| edt | -0.012 | 0.003 | -4.144 | 0.000 |
| reg_x_edt | 0.004 | 0.004 | 1.160 | 0.246 |
| reg_x_edt_lag | 0.005 | 0.004 | 1.234 | 0.217 |
| oil | -0.026 | 0.010 | -2.515 | 0.012 |
# Diagnostics
tibble(
test = c("Sargan", "AR(1) on Δresid", "AR(2) on Δresid", "# instruments / N"),
statistic = c(s_sys$sargan$statistic,
s_sys$m1$statistic,
s_sys$m2$statistic,
length(fit_sys_full$W[[1]][1, ])),
parameter_df_N = c(s_sys$sargan$parameter,
NA_real_,
NA_real_,
length(fit_sys_full$W)),
p_value = c(s_sys$sargan$p.value,
s_sys$m1$p.value,
s_sys$m2$p.value,
NA_real_),
decision = c("want fail to reject (p > 0.10)",
"want reject (p < 0.05)",
"want fail to reject (p > 0.05)",
"want ratio < 1")
) |>
knitr::kable(digits = 3,
caption = "Sys-GMM (full) — the four §2 diagnostics on one line each.")
| test | statistic | parameter_df_N | p_value | decision |
|---|---|---|---|---|
| Sargan | 111.643 | 784 | 1.000 | want fail to reject (p > 0.10) |
| AR(1) on Δresid | -5.497 | NA | 0.000 | want reject (p < 0.05) |
| AR(2) on Δresid | -0.084 | NA | 0.933 | want fail to reject (p > 0.05) |
| # instruments / N | 791.000 | 135 | NA | want ratio < 1 |
Reading the printouts against the §2 checklist:

- Sargan: \(p = 1.000\) in both full fits — on 740 and 784 df respectively, this is the §2.4 instrument-count pathology, not evidence of validity.
- AR(1) rejects and AR(2) fails to reject in both — the serial-correlation battery passes.
- Instrument count: 740+ moments against \(N = 135\) countries — both full fits blow far past the rule of thumb. The collapsed fits below are the credible ones.

Side-by-side coefficient table.
fits <- list(`(1) FE-ARDL` = fit_fe,
`(2) AH` = fit_ah,
`(3) Diff-GMM (full)` = fit_diff_full,
`(4) Diff-GMM (lag 2:4)` = fit_diff_coll,
`(5) Sys-GMM (full)` = fit_sys_full,
`(6) Sys-GMM (lag 2:4)` = fit_sys_coll)
show_reg(fits,
custom.model.names = names(fits),
caption = "Six estimators on the democracy panel: ARDL(1,1) form throughout. Coefficient names follow plm/pgmm conventions; lag(ln_gdpw, 1) and ln_gdpw_lag refer to the same lagged outcome.",
label = "tab:p4-compare",
single.row = FALSE,
digits = 3,
stars = c(0.01, 0.05, 0.1))
|  | (1) FE-ARDL | (2) AH | (3) Diff-GMM (full) | (4) Diff-GMM (lag 2:4) | (5) Sys-GMM (full) | (6) Sys-GMM (lag 2:4) |
|---|---|---|---|---|---|---|
| ln_gdpw_lag | 0.925*** | |||||
| (0.008) | ||||||
| reg | 0.013 | -0.050 | -0.015 | -0.057 | -0.020 | -0.044* |
| (0.018) | (0.034) | (0.027) | (0.037) | (0.020) | (0.023) | |
| reg_lag | -0.018 | 0.013 | -0.002 | 0.009 | -0.030 | -0.031 |
| (0.017) | (0.033) | (0.028) | (0.033) | (0.019) | (0.020) | |
| edt | -0.004 | -0.028*** | -0.008** | -0.024*** | -0.012*** | -0.016*** |
| (0.003) | (0.005) | (0.004) | (0.006) | (0.003) | (0.003) | |
| reg_x_edt | -0.001 | 0.012** | 0.005 | 0.013* | 0.004 | 0.009** |
| (0.004) | (0.006) | (0.005) | (0.007) | (0.004) | (0.004) | |
| reg_x_edt_lag | 0.001 | -0.001 | 0.001 | -0.001 | 0.005 | 0.004 |
| (0.004) | (0.005) | (0.005) | (0.006) | (0.004) | (0.004) | |
| lag(ln_gdpw, 1) | 1.031*** | 0.951*** | 1.013*** | 1.010*** | 1.013*** | |
| (0.022) | (0.015) | (0.021) | (0.002) | (0.002) | ||
| oil | -0.026** | -0.036*** | ||||
| (0.010) | (0.014) | |||||
| R2 | 0.844 | |||||
| Adj. R2 | 0.836 | |||||
| Num. obs. | 2787 | 4126 | 4126 | 4126 | 4126 | 4126 |
| n | 135 | 135 | 135 | 135 | 135 | |
| T | 40 | 40 | 40 | 40 | 40 | |
| Num. obs. used | 2674 | 2674 | 2674 | 5461 | 5461 | |
| Sargan Test: chisq | 52.624 | 111.274 | 97.629 | 111.643 | 105.778 | |
| Sargan Test: df | 37.000 | 740.000 | 110.000 | 784.000 | 154.000 | |
| Sargan Test: p-value | 0.046 | 1.000 | 0.794 | 1.000 | 0.999 | |
| Wald Test Coefficients: chisq | 6666.721 | 12971.017 | 10251.020 | 25896930.378 | 17727569.930 | |
| Wald Test Coefficients: df | 6 | 6 | 6 | 7 | 7 | |
| Wald Test Coefficients: p-value | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |
| ***p < 0.01; **p < 0.05; *p < 0.1 | ||||||
Custom diagnostic table for a cleaner read of the §2 checklist across all six fits:
extract_diags <- function(fit, label, fit_class) {
if (fit_class == "plm") {
return(tibble(model = label,
phi_hat = unname(coef(fit)["ln_gdpw_lag"]),
sargan_p = NA_real_, ar1_p = NA_real_, ar2_p = NA_real_,
n_inst = NA_integer_,
n_obs = nobs(fit)))
}
s <- summary(fit, robust = TRUE)
tibble(model = label,
phi_hat = unname(coef(fit)["lag(ln_gdpw, 1)"]),
sargan_p = if (!is.null(s$sargan)) s$sargan$p.value else NA_real_,
ar1_p = if (!is.null(s$m1)) s$m1$p.value else NA_real_,
ar2_p = if (!is.null(s$m2)) s$m2$p.value else NA_real_,
n_inst = length(fit$W[[1]][1, ]),
n_obs = sum(sapply(fit$residuals, length)))
}
diag_tbl <- bind_rows(
extract_diags(fit_fe, "(1) FE-ARDL", "plm"),
extract_diags(fit_ah, "(2) AH", "pgmm"),
extract_diags(fit_diff_full, "(3) Diff-GMM (full)", "pgmm"),
extract_diags(fit_diff_coll, "(4) Diff-GMM (lag 2:4)", "pgmm"),
extract_diags(fit_sys_full, "(5) Sys-GMM (full)", "pgmm"),
extract_diags(fit_sys_coll, "(6) Sys-GMM (lag 2:4)", "pgmm")
)
diag_tbl |>
mutate(across(c(phi_hat, sargan_p, ar1_p, ar2_p), ~ round(.x, 3))) |>
knitr::kable(caption = "Diagnostics across estimators. phi_hat is the lagged-DV coefficient; sargan_p is the overid test p-value; ar1_p / ar2_p are the Arellano-Bond differenced-residual tests; n_inst is the instrument count.")
| model | phi_hat | sargan_p | ar1_p | ar2_p | n_inst | n_obs |
|---|---|---|---|---|---|---|
| (1) FE-ARDL | 0.925 | NA | NA | NA | NA | 2787 |
| (2) AH | 1.031 | 0.046 | 0 | 0.924 | 43 | 5130 |
| (3) Diff-GMM (full) | 0.951 | 1.000 | 0 | 0.945 | 746 | 5130 |
| (4) Diff-GMM (lag 2:4) | 1.013 | 0.794 | 0 | 0.923 | 116 | 5130 |
| (5) Sys-GMM (full) | 1.010 | 1.000 | 0 | 0.933 | 791 | 10395 |
| (6) Sys-GMM (lag 2:4) | 1.013 | 0.999 | 0 | 0.920 | 161 | 10395 |
Two patterns leap out of this table:

- \(\hat\phi\) sits at or above 1 in every GMM fit — the outcome is near-unit-root, exactly the weak-instrument regime that motivated system GMM — while FE-ARDL’s 0.925 reflects the downward Nickell bias.
- The full-lag fits carry 746 and 791 instruments with Sargan p-values pinned at 1.000; the lag(2:4) versions drop the count by an order of magnitude, and that’s when Sargan becomes informative.

Before drawing substantive conclusions, it is worth seeing what a failed specification looks like. Diff-GMM with
failed specification looks like. Diff-GMM with
lag(y, 2:99) uses all available lags. What happens if we
choke off the instrument set to just lag 2 and lag 3? Or to lags 2
through 5?
# Misspec 1: only lag-2 and lag-3 as instruments — the tightest usable set
# (pgmm still expands these into per-period moments).
fit_miss23 <- pgmm(ln_gdpw ~ lag(ln_gdpw, 1) + reg + reg_lag + edt +
reg_x_edt + reg_x_edt_lag |
lag(ln_gdpw, 2:3),
data = pdat, effect = "individual",
model = "twosteps", transformation = "d")
# Misspec 2: lag-2 through lag-5 — still tight, but a bit more identification.
fit_miss25 <- pgmm(ln_gdpw ~ lag(ln_gdpw, 1) + reg + reg_lag + edt +
reg_x_edt + reg_x_edt_lag |
lag(ln_gdpw, 2:5),
data = pdat, effect = "individual",
model = "twosteps", transformation = "d")
# Compare the two against the full Diff-GMM in (3).
diag_misspec <- bind_rows(
extract_diags(fit_miss23, "Misspec (2:3)", "pgmm"),
extract_diags(fit_miss25, "Misspec (2:5)", "pgmm"),
extract_diags(fit_diff_full, "Diff-GMM (full)", "pgmm")
)
diag_misspec |>
mutate(across(c(phi_hat, sargan_p, ar1_p, ar2_p), ~ round(.x, 3))) |>
knitr::kable(caption = "Misspecified Diff-GMM fits with truncated instrument sets, compared to the full specification. Watch what happens to $\\hat\\phi$ and the diagnostics as the instrument set shrinks.")
| model | phi_hat | sargan_p | ar1_p | ar2_p | n_inst | n_obs |
|---|---|---|---|---|---|---|
| Misspec (2:3) | 1.029 | 0.228 | 0 | 0.918 | 80 | 5130 |
| Misspec (2:5) | 1.003 | 0.997 | 0 | 0.923 | 151 | 5130 |
| Diff-GMM (full) | 0.951 | 1.000 | 0 | 0.945 | 746 | 5130 |
Reading this table against the §2 checklist:

- \(\hat\phi\) drifts above 1 under the truncated sets (1.029 at 2:3, 1.003 at 2:5); only the full 2:99 spec stabilizes it.
- Sargan is informative at 2:3 (\(p = 0.228\)) but already drifting toward 1 at 2:5 — the pathology sets in quickly on a long panel.
- The AR tests pass in all three fits, so serial correlation is not the discriminating diagnostic here.

The teaching point. There is no single “correct”
instrument depth. The Sargan test is informative only when the moment
count is moderate. Standard practice (Roodman 2009): try multiple lag
depths, report all, flag if estimates move substantially. We do this
above with 2:99 (full), 2:4 (collapsed), and
2:3 / 2:5 (heavily truncated). If \(\hat\phi\) is reasonably stable across
them, you have evidence of robustness.
Three things to track across the comparison table in §4.2:

- The lagged-DV coefficient. It ranges from 0.925 (FE-ARDL, Nickell-biased downward) to roughly 1.01 across the GMM fits.
- The interaction terms. Sign, size, and significance shift across estimators in the reg_x_edt and reg_x_edt_lag rows.
- The time-invariant control oil. Diff-GMM kills it (differencing); Sys-GMM identifies it through the level equation. If you actually need to estimate the oil effect, you need sys-GMM.

The diagnostic table is the second-order read: are the GMM fits
credible? Sargan trivially passes in the full-lag-depth columns (Roodman
pathology), but the collapsed columns give us informative tests. If you
wanted to publish from this panel, the lag(2:4)
specifications are the ones to report.
For a second applied example with a sharper substantive story — and an actual policy counterfactual to interpret — we turn to the cigarette consumption panel that the old version of this lab used (and that you saw in lecture). The question is simple: by how much would a tax hike reduce cigarette consumption?
The data are US state-level cigarette sales, prices, taxes, and income from 1985–1995. 48 states × 11 years = 528 observations, a balanced panel — short enough that Nickell bias matters and long enough that Diff-GMM is well-identified. The variables we need:
| Variable | Description |
|---|---|
| packpc | packs of cigarettes per capita (the outcome) |
| income | state personal income (total) |
| pop | state population |
| tax | average federal + state + local excise tax (cents/pack) |
| taxs | excise taxes including sales taxes (cents/pack) |
| avgprs | average price including sales taxes (cents/pack) |
| cpi | consumer price index (for inflation adjustment) |
We need dollar amounts in a constant unit. The trick: every dollar variable gets multiplied by the 1995 CPI and divided by the CPI in the observed year — bringing the figure to “1995 dollars”. We do this inline (no helper function) so the operation is fully transparent.
# Load the panel. 48 states × 11 years (1985–1995) = 528 rows.
cig <- read_csv("data/cigarette.csv", show_col_types = FALSE)
# Quick panel check
length(unique(cig$state)) # number of states
## [1] 48
length(unique(cig$year)) # number of years
## [1] 11
# Inflation-adjust. The CPI is the same across states within a given year,
# so we just pull the 1995 value once and rescale.
cpi_base <- unique(cig$cpi[cig$year == 1995]) # 1995 CPI value (a scalar)
cig <- cig |>
mutate(
income95 = cpi_base * income / cpi, # state income, 1995 dollars
tax95 = cpi_base * tax / cpi, # avg excise tax, 1995 cents/pack
taxs95 = cpi_base * taxs / cpi, # excise + sales tax
avgprs95 = cpi_base * avgprs / cpi, # avg price (incl. sales tax)
income95pc = income95 / pop, # per-capita income (1995 dollars)
pretax95 = avgprs95 - taxs95 # price net of tax
)
# Quick look at the post-adjustment numbers
cig |>
summarise(across(c(packpc, income95pc, avgprs95, taxs95),
list(mean = mean, sd = sd))) |>
knitr::kable(digits = 2,
caption = "Means and standard deviations of the key variables after inflation adjustment.")
| packpc_mean | packpc_sd | income95pc_mean | income95pc_sd | avgprs95_mean | avgprs95_sd | taxs95_mean | taxs95_sd |
|---|---|---|---|---|---|---|---|
| 106.45 | 23.12 | 21.25 | 3.26 | 175.14 | 23.61 | 53.84 | 13.86 |
The Diff-GMM spec — packs per capita as a dynamic function of per-capita income and the (tax-inclusive) average price, with lag-2 instruments stacked GMM-style up to whatever the panel supports:
# Convert to plm's pdata.frame: tells plm which columns identify the
# unit-time grid. Order matters: c("unit", "time").
pdat_cig <- pdata.frame(cig, index = c("state", "year"))
# Diff-GMM (Arellano-Bond) fit on cigarette consumption.
# Reading the formula:
# * Left of |: packpc is the outcome; lag(packpc, 1) is the lagged DV
# (pgmm will internally difference everything). income95pc
# and avgprs95 enter as contemporaneous regressors.
# * First |: lag(packpc, 2:99) is the GMM-style instrument block —
# every available lag from 2 onward, stacked per period.
# Arguments:
# * effect = "individual": only state FE. We do NOT use "twoways" here
# because pgmm needs at least one regressor to be moved by year, and
# with this panel the time effects would absorb the avgprs95 variation
# we want to identify.
# * model = "twosteps": Arellano-Bond two-step efficient weighting matrix.
# summary(..., robust = TRUE) below applies the Windmeijer correction.
# * transformation = "d": difference GMM (not system).
fit_cig <- pgmm(
packpc ~ lag(packpc, 1) + income95pc + avgprs95 |
lag(packpc, 2:99),
data = pdat_cig,
effect = "individual",
model = "twosteps",
transformation = "d"
)
# Capture the summary once — the §2 diagnostic battery hangs off this object
sum_cig <- summary(fit_cig, robust = TRUE)
sum_cig
## Oneway (individual) effect Two-steps model Difference GMM
##
## Call:
## pgmm(formula = packpc ~ lag(packpc, 1) + income95pc + avgprs95 |
## lag(packpc, 2:99), data = pdat_cig, effect = "individual",
## model = "twosteps", transformation = "d")
##
## Balanced Panel: n = 48, T = 11, N = 528
##
## Number of Observations Used: 432
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -25.4405 -2.4992 0.1460 0.1232 2.7499 25.6071
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## lag(packpc, 1) 0.639465 0.055354 11.5523 < 2.2e-16 ***
## income95pc -0.479040 0.496258 -0.9653 0.3344
## avgprs95 -0.179869 0.028086 -6.4042 1.511e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sargan test: chisq(44) = 47.09887 (p-value = 0.34693)
## Autocorrelation test (1): normal = -3.443569 (p-value = 0.00057409)
## Autocorrelation test (2): normal = -0.5365189 (p-value = 0.5916)
## Wald test for coefficients: chisq(3) = 2446.946 (p-value = < 2.22e-16)
Read off (against the §2 checklist):

- `lag(packpc, 1)` should be positive and below 1 (sticky consumption from year to year).
- `avgprs95` should be negative (higher price → fewer packs sold).
- `income95pc` should be small and positive (richer states buy slightly more).
- The `lag(packpc, 2:99)` spec will overshoot the \(N\) rule, so a Sargan p-value \(\approx 1\) (the Roodman pathology) is the risk to watch; here it comes back at a moderate 0.35. We accept this trade-off because the §2.5 instrument-limited spec gives qualitatively the same coefficients on the substantive variables.

This is the policy question: what if the average state raised its tax by 60 cents per pack, starting in 1996? Sixty cents is roughly 3 standard deviations of the 1995 tax distribution — a deliberately large shock to make the response visible. We compute the trajectory of `packpc` 3 years out (1996, 1997, 1998) and report three quantities of interest: the expected level of consumption (EV), the difference from the no-hike baseline (FD), and the percent change relative to baseline (RR).
# Sample statistics that motivate the +60 cent scenario:
# the average tax in 1995 is about 60 cents and 1 sd is ~18 cents,
# so +60 is roughly a 3-sd intervention — large but informative.
summary(pdat_cig$taxs95[pdat_cig$year == 1995])
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 34.44 48.75 59.84 61.87 74.78 112.63
sd(pdat_cig$taxs95[pdat_cig$year == 1995])
## [1] 18.47741
# Forecast horizon
periods_out <- 3
# Number of parameter draws for the Monte-Carlo simulation
sims <- 1000
# We anchor the forecast at the last observed year (1995). lagY is the most
# recent ΔY for each state — we average across states for the average-state
# forecast. initialY is the level of Y in 1995, same averaging.
last_year <- pdat_cig[pdat_cig$year == 1995, ]
prev_year <- pdat_cig[pdat_cig$year == 1994, ]
# Match states between the two slices (panel is balanced so this is trivial)
initialY <- mean(last_year$packpc, na.rm = TRUE)
lagY <- mean(last_year$packpc - prev_year$packpc, na.rm = TRUE)
tibble(
constant = c("periods_out", "initialY (avg packpc, 1995)", "lagY (avg ΔY, 1995)"),
value = c(periods_out, initialY, lagY)
) |>
knitr::kable(digits = 2,
caption = "Anchor constants for the cigarette counterfactual.")
| constant | value |
|---|---|
| periods_out | 3.00 |
| initialY (avg packpc, 1995) | 96.33 |
| lagY (avg ΔY, 1995) | 1.30 |
Parameter draws. Sample 1000 parameter vectors from \(\text{MVN}(\hat\theta, \hat V_{\text{Windmeijer}})\) — the joint distribution of \((\phi, \beta_{\text{income}}, \beta_{\text{price}})\) implied by the fit’s Windmeijer-corrected variance. The first column of the resulting matrix is \(\phi\); the remaining two are the betas on income and price.
set.seed(20260520)
# Joint MVN draws from the fit. vcovHC() gives the Windmeijer-corrected
# variance for two-step GMM (this matches what summary(..., robust = TRUE)
# uses for SEs).
simparam <- mvrnorm(n = sims,
mu = coefficients(fit_cig),
Sigma = vcovHC(fit_cig))
# Split: phi is the LDV coef (column 1); betas are the rest (columns 2 onward)
simphi <- simparam[, 1] # length-1000 vector
simbeta <- simparam[, 2:ncol(simparam), drop = FALSE] # 1000 × (p-1) matrix
# Sanity check
dim(simbeta)
## [1] 1000 2
head(simphi)
## [1] 0.5722195 0.6450104 0.5157122 0.5672364 0.5571531 0.6867599
Build the counterfactual matrices. Two `cfMake` objects:

- `xhyp` represents the treatment trajectory (the 60-cent tax hike happens in 1996 and persists).
- `xbase` represents the baseline (nothing changes).

`cfMake` returns an object whose `$x` field holds the change in each covariate at each forecast period (because `transform = "diff"` below will undo the differencing). We zero out everything by default, then `cfChange()` the price variable in period 1 only — the sustained higher price thereafter is encoded by the LDV propagating it forward.
# Covariate-only formula for cfMake (LDV is fed in separately via phi)
formula_cf <- packpc ~ income95pc + avgprs95 - 1
# Treatment: +60 cent change in avgprs95 at period 1, no other changes
xhyp <- cfMake(formula = formula_cf, data = pdat_cig, nscen = periods_out)
xhyp$x <- 0 * xhyp$x # zero out every period's change
xhyp$xpre <- 0 * xhyp$xpre # zero out the "previous" matrix too
xhyp <- cfChange(xhyp, covname = "avgprs95", x = 60, scen = 1)
# Baseline: zero change in everything
xbase <- xhyp
xbase$x <- xbase$xpre
# Inspect: the 60-cent shock appears in row 1 of xhyp$x, zeros elsewhere
xhyp$x
## packpc income95pc avgprs95
## 1 0 0 60
## 2 0 0 0
## 3 0 0 0
xbase$x
## packpc income95pc avgprs95
## 1 0 0 0
## 2 0 0 0
## 3 0 0 0
Three simcf calls — each takes the parameter draws
(simbeta, simphi), the counterfactual matrix
(xhyp or xbase), the initial conditions
(initialY, lagY), and
transform = "diff" to say “the fit is in first differences;
undo that when forecasting in levels”.
# Expected value of Y under the +60 cent shock (level scale)
sev_treat <- ldvsimev(
xhyp,
b = simbeta,
ci = 0.95,
constant = NA, # NA = no intercept term (FE absorbed)
phi = simphi,
lagY = lagY,
transform = "diff",
initialY = initialY
)
# Expected value under the no-shock baseline
sev_base <- ldvsimev(
xbase,
b = simbeta,
ci = 0.95,
constant = NA,
phi = simphi,
lagY = lagY,
transform = "diff",
initialY = initialY
)
# First difference: treatment level - baseline level, with CIs
sfd <- ldvsimfd(
xhyp,
b = simbeta,
ci = 0.95,
constant = NA,
phi = simphi,
lagY = lagY,
transform = "diff"
)
# Relative risk: (treatment - baseline) / baseline × 100, with CIs
srr <- ldvsimrr(
xhyp,
b = simbeta,
ci = 0.95,
constant = NA,
phi = simphi,
lagY = lagY,
transform = "diff",
initialY = initialY
)
# Stack the three quantities into one tidy frame
plot_data <- bind_rows(
tibble(quantity = "EV (packs/capita)",
h = seq_len(periods_out),
pe = sev_treat$pe,
lo = sev_treat$lower[, 1],
hi = sev_treat$upper[, 1],
baseline_pe = sev_base$pe),
tibble(quantity = "FD (treat − baseline)",
h = seq_len(periods_out),
pe = sfd$pe,
lo = sfd$lower[, 1],
hi = sfd$upper[, 1],
baseline_pe = NA_real_),
tibble(quantity = "RR (% change)",
h = seq_len(periods_out),
pe = srr$pe,
lo = srr$lower[, 1],
hi = srr$upper[, 1],
baseline_pe = NA_real_)
) |>
mutate(quantity = factor(quantity,
levels = c("EV (packs/capita)",
"FD (treat − baseline)",
"RR (% change)")))
ggplot(plot_data, aes(h, pe)) +
geom_ribbon(aes(ymin = lo, ymax = hi), alpha = 0.25, fill = "steelblue") +
geom_line(linewidth = 1, colour = "steelblue") +
geom_line(aes(y = baseline_pe), linetype = "dashed", colour = "grey40", linewidth = 0.8) +
geom_hline(yintercept = 0, linetype = "dotted", colour = "grey60") +
facet_wrap(~ quantity, scales = "free_y") +
scale_x_continuous(breaks = seq_len(periods_out),
labels = paste0("199", 5 + seq_len(periods_out))) +
labs(x = NULL, y = NULL) +
theme_minimal(base_size = 11) +
theme(strip.text = element_text(size = 11, face = "bold"),
panel.spacing.x = unit(1, "lines"))
Counterfactual forecast of a 60-cent tax hike on cigarette consumption. EV: predicted packs/capita under the hike (solid) and under baseline (dashed). FD: absolute difference (treat - baseline). RR: percent change. Horizon = 1996-1998. Bands are 95% intervals from 1000 parametric draws.
Read the three panels. EV tracks predicted packs per capita under the hike (solid) against the no-hike baseline (dashed); FD is the absolute gap between the two trajectories; RR expresses that gap as a percent change from baseline.

Caveats to flag in class.

- We averaged `initialY` and `lagY` across states, so this is an average-state forecast. A state-specific forecast would let each state start at its own 1995 level; this is the natural extension and is straightforward to code.
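A sketch of that state-specific extension, reusing the objects built above (`simbeta`, `simphi`, `xhyp`, `last_year`, `prev_year`); it is not self-contained and the per-state loop is illustrative, not a prescribed workflow.

```r
# Anchor each state at its own 1995 level and its own 1994-95 change,
# then rerun the same ldvsimev() call once per state.
initY_by_state <- last_year$packpc
lagY_by_state  <- last_year$packpc - prev_year$packpc

sev_by_state <- lapply(seq_along(initY_by_state), function(i) {
  ldvsimev(xhyp,
           b = simbeta, ci = 0.95, constant = NA,
           phi = simphi,
           lagY = lagY_by_state[i],
           transform = "diff",
           initialY = initY_by_state[i])
})

# Point estimates at the first forecast horizon (1996), one per state
sapply(sev_by_state, function(s) s$pe[1])
```

Plotting these 48 trajectories as thin lines behind the average-state forecast is a nice way to show how much the common anchor hides.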