EmplUK
These materials have been revised and updated by successive teaching assistants over recent years.
Revision history:
| Package | Purpose | Key functions |
|---|---|---|
plm |
Panel models, dynamic-panel GMM | plm(), pgmm(),
pdata.frame() |
sandwich |
Robust variance-covariance matrices | vcovHC() |
MASS |
MVN draws for parametric simulation | mvrnorm() |
simcf |
King-Tomz-Wittenberg counterfactual simulators (Adolph’s package) | cfMake(), cfChange(),
ldvsimev(), ldvsimfd(),
ldvsimrr() |
texreg |
Side-by-side model tables | htmlreg() |
tidyverse |
Data manipulation, plotting | ggplot(), tibble(), pipes
(|>) |
The lab uses EmplUK from
data(EmplUK, package = "plm") — the Arellano-Bond (1991)
labor-demand panel — for the main demonstration of fixed-effects LDV,
Anderson-Hsiao, Difference GMM, and System GMM in Part 2.
| Part | Topic |
|---|---|
| 1 | Why dynamic-panel GMM? Nickell bias, the estimator menu, assumptions, and diagnostic tests. |
| 2 | Dynamic panel estimators in practice with
EmplUK. FE-LDV -> Anderson-Hsiao ->
Difference GMM -> System GMM, with diagnostics and forecasts. |
| 3 | Cigarette price counterfactual.
Diff-GMM fit on the 1985-1995 cigarette panel + a +60-cent price
scenario, with EV / FD / RR forecasts via simcf. |
| Appendix A | Self-study. Nickell bias and hand-built GMM from raw matrix algebra. |
Lab 5 ended with dynamic panel models. Once a lagged dependent variable enters a fixed-effects model, the within transformation creates Nickell bias: the transformed lagged outcome is mechanically correlated with the transformed error. This lab shows why that happens, how the IV/GMM estimators respond to it, and how to check whether the GMM specification is credible.
The detailed matrix algebra is now in Appendix A. The main lab focuses on the applied workflow: choose the estimator, inspect the moment conditions, read the diagnostics, and simulate interpretable quantities of interest.
Start from the dynamic panel model:
\[ y_{it} = \alpha_i + \phi y_{i,t-1} + x_{it}\beta + \varepsilon_{it}. \]
The fixed effect \(\alpha_i\) captures stable differences across units. The problem is that the within transformation subtracts unit means from every variable. The transformed lagged outcome contains past shocks through its unit mean, and the transformed error also contains those same shocks through its unit mean. As a result, \(\tilde y_{i,t-1}\) and \(\tilde\varepsilon_{it}\) are correlated even when the original \(\varepsilon_{it}\) is serially independent.
Nickell’s leading approximation is:
\[ \text{plim}(\hat\phi_{FE} - \phi) \approx -\frac{1 + \phi}{T - 1}. \]
The main implication is practical: adding more units does not solve the problem when \(T\) is fixed. The bias shrinks as the number of time periods grows. With many units and short panels, a different estimator is needed.
Read this table before fitting any GMM model. It tells us what each estimator assumes, what it is trying to fix, and how the diagnostics should look if the specification is credible.
| Estimator | Core idea | Main assumptions | What it identifies | Diagnostics / tests | Target result | Red flags |
|---|---|---|---|---|---|---|
| FE-LDV | Estimate the dynamic model with unit FE and \(y_{i,t-1}\) on the RHS. | Strict exogeneity apart from the lagged outcome; large enough \(T\) for Nickell bias to be small. | Within-unit short-run effects; long-run effects use \(\beta/(1-\phi)\). | Residual serial-correlation checks; compare \(\hat\phi\) to GMM estimates. | Useful baseline when \(T\) is moderate/large. | Small \(T\); \(\hat\phi\) much lower than GMM; persistent residuals. |
| Anderson-Hsiao IV | First-difference to remove \(\alpha_i\); instrument \(\Delta y_{i,t-1}\) with \(y_{i,t-2}\). | No serial correlation in level errors; lag 2 is relevant and exogenous. | Dynamic effect after removing unit FE. Conceptually, the textbook AH estimator is just-identified. | In pgmm(), the lag instrument is
period-stacked, so an overidentification test may still print. Inspect
AR tests too. |
Coefficients move away from FE-LDV in the expected direction. | Very noisy estimates; weak lag instruments. |
| Difference GMM / Arellano-Bond | Difference the equation and use deeper lags as GMM-style instruments. | No AR(2) in differenced residuals; instruments valid; enough within variation after differencing. | Dynamic effects in the differenced equation, interpreted back in the level model. | Sargan/Hansen; AR(1); AR(2); instrument count. | Sargan/Hansen fail to reject; AR(1) rejects; AR(2) fails to reject; instruments < units. | AR(2) rejects; Sargan/Hansen rejects; p-values near 1 with too many instruments. |
| System GMM / Blundell-Bond | Add a level equation instrumented by lagged differences. | Difference-GMM assumptions plus stationarity/initial-condition restrictions for the level equation. | Dynamic effects with better information when series are persistent. Can retain time-invariant regressors. | Same tests as Diff-GMM, plus sensitivity to Diff-GMM results. | Similar substantive conclusions to Diff-GMM, often more precise. | Big disagreement with Diff-GMM; too many instruments; questionable stationarity restrictions. |
| Two-step GMM + Windmeijer | Re-estimate the weighting matrix using one-step residuals. | Same moments as the chosen GMM estimator; finite-sample SE correction needed. | More efficient GMM estimates. | Compare robust vs non-robust two-step SEs. | Report Windmeijer-corrected robust SEs. | Plain two-step SEs are too small. |
| Limited / collapsed instruments | Restrict lag depth or collapse instruments to avoid instrument proliferation. | Same moment assumptions, but fewer moments. | More stable diagnostics and less overfitting of endogenous variables. | Instrument count and estimate stability across lag windows. | Instruments fewer than units; estimates stable across reasonable lag limits. | Estimates change dramatically when lag depth changes. |
For a credible Difference-GMM or System-GMM fit, the diagnostic pattern is usually:
The rest of the lab applies this checklist. Appendix A shows the matrix algebra behind the estimators and explains where the moment conditions come from.
EmplUKThe real workhorse for routine dynamic-panel GMM in R is
plm::pgmm(), which packages the Anderson-Hsiao,
Arellano-Bond, and Blundell-Bond logic with built-in diagnostics. The
applied workflow begins with a dynamic fixed-effects model, explains why
that model is limited in a short panel, and then moves estimator by
estimator through the IV/GMM family.
Part 2 uses one dataset throughout: EmplUK, the classic
Arellano-Bond labor-demand panel. The substantive question is:
how does firm employment respond to wages, capital, and output
when employment is persistent over time?
GMM is not a magic correction for endogeneity. It is a
moment-based inference technique that delivers consistent
estimates only if its identifying assumptions hold. Four of those
assumptions get a formal test on every summary.pgmm
printout; a fifth (the instrument count) is a structural check.
The five-row checklist (the order in which to read the printout):
| Assumption | Test | \(H_0\) | Target result |
|---|---|---|---|
| Instrument validity (overid) | Sargan / Hansen | \(E[Z'\varepsilon] = 0\) | Fail to reject (\(p > 0.10\)) |
| Error process is AR(1), not AR(2)+ | AR(1) on \(\Delta\hat\varepsilon\) | no first-order serial correlation in \(\Delta\hat\varepsilon\) | Reject (\(p < 0.05\) — mechanical) |
| Same | AR(2) on \(\Delta\hat\varepsilon\) | no second-order serial correlation in \(\Delta\hat\varepsilon\) | Fail to reject (\(p > 0.05\)) |
| Time effects matter | Wald on \(\tau_t\) | all year dummies \(= 0\) | Reject -> keep year FE; fail to reject -> consider dropping only if theory allows |
| Instruments not too many | # instruments \(< N\) | (rule of thumb) | Check by hand |
EmplUK contains a short panel of UK firms from 1976 to
1984. The variables are economic quantities measured at the firm-year
level. Logs are used so the coefficients can be read approximately as
elasticities.
| Variable | Meaning | Role in this example |
|---|---|---|
emp |
firm employment | outcome, modeled as log(emp) |
wage |
firm wage measure | labor cost, modeled as log(wage) |
capital |
firm capital stock | production input, modeled as
log(capital) |
output |
firm output | demand / production scale, modeled as
log(output) |
firm, year |
panel identifiers | unit and time indexes |
# Load the classic Arellano-Bond labor-demand panel.
data("EmplUK", package = "plm")
# Count observations per firm to show the short-panel structure.
firm_T <- table(EmplUK$firm)
# Summarize the panel dimensions in a small table.
tibble(
quantity = c("Firms", "Firm-years", "First year", "Last year",
"Minimum T per firm", "Maximum T per firm"),
value = c(length(unique(EmplUK$firm)),
nrow(EmplUK),
min(EmplUK$year),
max(EmplUK$year),
min(firm_T),
max(firm_T))
) |>
knitr::kable(caption = "Panel structure of the EmplUK labor-demand data.")
| quantity | value |
|---|---|
| Firms | 140 |
| Firm-years | 1031 |
| First year | 1976 |
| Last year | 1984 |
| Minimum T per firm | 7 |
| Maximum T per firm | 9 |
# Informally check balance
table(table(EmplUK$firm))
##
## 7 8 9
## 103 23 14
Before fitting a dynamic panel model, it helps to see where the variation in the variables lives. In panel data, the same pooled association can be decomposed into:
This distinction matters because fixed effects use within-firm variation, while pooled and random-effects estimators can also use between-firm differences.
# Build logged variables because the labor-demand models use logs.
# The lag is created within firm so that the first observation for each firm is
# missing, as it should be.
empl_wb_df <- EmplUK |>
arrange(firm, year) |>
group_by(firm) |>
mutate(log_emp = log(emp),
log_emp_lag1 = dplyr::lag(log_emp, 1),
log_wage = log(wage),
log_capital = log(capital),
log_output = log(output)) |>
ungroup()
# These are the predictors in the dynamic labor-demand equation.
empl_wb_vars <- c("log_emp_lag1", "log_wage", "log_capital", "log_output")
# Stack the predictors into long format so the same operations can be repeated
# for each variable with group_by().
empl_wb_long <- empl_wb_df |>
select(firm, log_emp, all_of(empl_wb_vars)) |>
pivot_longer(cols = all_of(empl_wb_vars),
names_to = "variable",
values_to = "value")
# Overall correlation: all firm-years are pooled together.
empl_overall <- empl_wb_long |>
group_by(variable) |>
summarise(overall = cor(value, log_emp, use = "pairwise.complete.obs"),
.groups = "drop")
# Between correlation: first take each firm's mean, then correlate firm means.
empl_between <- empl_wb_long |>
group_by(variable, firm) |>
summarise(value_mean = mean(value, na.rm = TRUE),
log_emp_mean = mean(log_emp, na.rm = TRUE),
.groups = "drop") |>
group_by(variable) |>
summarise(between = cor(value_mean, log_emp_mean, use = "pairwise.complete.obs"),
.groups = "drop")
# Within correlation: subtract each firm's mean from both variables, then
# correlate the deviations.
empl_within <- empl_wb_long |>
group_by(variable, firm) |>
mutate(value_dev = value - mean(value, na.rm = TRUE),
log_emp_dev = log_emp - mean(log_emp, na.rm = TRUE)) |>
ungroup() |>
group_by(variable) |>
summarise(within = cor(value_dev, log_emp_dev, use = "pairwise.complete.obs"),
.groups = "drop")
# Within-variance share: how much of each predictor's variance is movement
# within firms rather than differences between firms.
empl_within_share <- empl_wb_long |>
group_by(variable, firm) |>
mutate(value_dev = value - mean(value, na.rm = TRUE)) |>
ungroup() |>
group_by(variable) |>
summarise(within_share = var(value_dev, na.rm = TRUE) / var(value, na.rm = TRUE),
.groups = "drop")
# Join the four summaries into one table.
empl_wb_tab <- empl_overall |>
left_join(empl_between, by = "variable") |>
left_join(empl_within, by = "variable") |>
left_join(empl_within_share, by = "variable") |>
arrange(desc(abs(overall)))
empl_wb_tab |>
knitr::kable(digits = 3,
col.names = c("Variable", "Overall", "Between", "Within",
"Within var. share"),
caption = "Correlation with log employment, decomposed into overall, between-firm, and within-firm correlations.")
| Variable | Overall | Between | Within | Within var. share |
|---|---|---|---|---|
| log_emp_lag1 | 0.995 | 1.000 | 0.754 | 0.015 |
| log_capital | 0.911 | 0.917 | 0.739 | 0.020 |
| log_output | 0.089 | 0.037 | 0.533 | 0.828 |
| log_wage | -0.014 | -0.011 | -0.286 | 0.100 |
empl_wb_tab |>
mutate(variable = fct_reorder(variable, between)) |>
ggplot(aes(y = variable)) +
geom_segment(aes(x = between, xend = within, yend = variable),
colour = "grey70") +
geom_point(aes(x = between, colour = "Between"), size = 2.6) +
geom_point(aes(x = within, colour = "Within"), size = 2.6) +
geom_vline(xintercept = 0, linetype = "dashed", colour = "grey50") +
scale_colour_manual(values = c("Between" = "firebrick",
"Within" = "steelblue")) +
labs(x = "Correlation with log employment", y = NULL, colour = NULL) +
theme_minimal(base_size = 11) +
theme(legend.position = "bottom")
Between- and within-firm correlations with log employment in EmplUK.
Two details matter for the models below. First, lagged employment is strongly correlated with current employment, which is why the dynamic term belongs in the model. Second, wages, capital, and output can tell different stories across firms than within the same firm over time. That is exactly why it is useful to inspect the between and within pieces before deciding what a coefficient is identifying.
The natural first model is a fixed-effects dynamic regression:
\[ \log(emp)_{it} = \alpha_i + \tau_t + \phi_1 \log(emp)_{i,t-1} + \phi_2 \log(emp)_{i,t-2} + x_{it}\beta + \varepsilon_{it}. \]
This model is useful because it matches the substantive story: employment is persistent, and firms differ in stable ways. The problem is that this is a short panel. With only 7 to 9 observations per firm, the lagged dependent variable is correlated with the transformed error after firm fixed effects are removed. This is the Nickell-bias problem from Part 1.
# Convert the data to a pdata.frame so plm knows the firm-year structure.
empl_pdat <- pdata.frame(EmplUK, index = c("firm", "year"))
# FE-LDV baseline. This mirrors the Arellano-Bond labor-demand specification
# but estimates it with the within transformation rather than GMM instruments.
fe_ldv_fit <- plm(
log(emp) ~ lag(log(emp), 1:2) +
lag(log(wage), 0:1) +
log(capital) +
lag(log(output), 0:1),
data = empl_pdat,
effect = "twoways", # firm FE and year FE
model = "within" # fixed-effects / within estimator
)
summary(fe_ldv_fit)
## Twoways effects Within Model
##
## Call:
## plm(formula = log(emp) ~ lag(log(emp), 1:2) + lag(log(wage),
## 0:1) + log(capital) + lag(log(output), 0:1), data = empl_pdat,
## effect = "twoways", model = "within")
##
## Unbalanced Panel: n = 140, T = 5-7, N = 751
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -0.474827 -0.043536 0.001307 0.044602 0.451831
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## lag(log(emp), 1:2)1 0.700274 0.037557 18.6458 < 2.2e-16 ***
## lag(log(emp), 1:2)2 -0.169027 0.035413 -4.7731 2.284e-06 ***
## lag(log(wage), 0:1)0 -0.559251 0.056941 -9.8215 < 2.2e-16 ***
## lag(log(wage), 0:1)1 0.292637 0.059360 4.9299 1.067e-06 ***
## log(capital) 0.348173 0.026848 12.9682 < 2.2e-16 ***
## lag(log(output), 0:1)0 0.490632 0.123499 3.9728 7.971e-05 ***
## lag(log(output), 0:1)1 -0.607329 0.122101 -4.9740 8.584e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 15.748
## Residual Sum of Squares: 5.3411
## R-Squared: 0.66085
## Adj. R-Squared: 0.57465
## F-statistic: 166.463 on 7 and 598 DF, p-value: < 2.22e-16
This model is fit first because it is the most transparent specification, not because it is the best estimator here. In a short dynamic panel, the FE-LDV estimate gives a baseline for comparison against the IV/GMM estimators.
Before moving on, check whether the model residuals look independent. Lab 5 introduced three variance-covariance corrections for panel models:
| Correction | Allows | Main use |
|---|---|---|
| Cluster-robust by unit | arbitrary heteroskedasticity and serial correlation within firm | default correction when residual dependence is mainly within units |
| Beck-Katz PCSE | contemporaneous correlation across units | TSCS panels where many units move together in the same year |
| Driscoll-Kraay | serial correlation and cross-sectional dependence | longer panels where both problems are present |
EmplUK has a very short time dimension, so the main
comparison table below uses firm-clustered standard errors for the
FE-LDV and FD-LDV baselines. The diagnostics still matter because they
show why conventional standard errors are not credible.
# Serial-correlation test for FE-LDV residuals.
# H0: no serial correlation in the idiosyncratic errors.
fe_serial <- pbgtest(fe_ldv_fit)
# Pesaran CD test for cross-sectional dependence.
# H0: residuals are cross-sectionally independent across firms.
fe_csd <- pcdtest(fe_ldv_fit, test = "cd")
tibble(
diagnostic = c("Serial correlation: pbgtest",
"Cross-sectional dependence: Pesaran CD"),
null = c("No serial correlation within firms",
"Cross-sectional independence across firms"),
p_value = c(fe_serial$p.value, fe_csd$p.value)
) |>
knitr::kable(digits = 3,
caption = "Post-estimation dependence checks for the FE-LDV baseline.")
| diagnostic | null | p_value |
|---|---|---|
| Serial correlation: pbgtest | No serial correlation within firms | 0.000 |
| Cross-sectional dependence: Pesaran CD | Cross-sectional independence across firms | 0.274 |
# Firm-clustered covariance matrix for the FE-LDV baseline.
# method = "arellano" is the plm convention for a covariance matrix robust to
# general heteroskedasticity and serial correlation within firm.
fe_vcov_cluster <- vcovHC(fe_ldv_fit,
method = "arellano",
type = "HC1",
cluster = "group")
# Coefficient table with firm-clustered standard errors.
fe_clustered <- lmtest::coeftest(fe_ldv_fit, vcov = fe_vcov_cluster)
fe_clustered
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## lag(log(emp), 1:2)1 0.700274 0.062394 11.2234 < 2.2e-16 ***
## lag(log(emp), 1:2)2 -0.169027 0.072151 -2.3427 0.0194731 *
## lag(log(wage), 0:1)0 -0.559251 0.151552 -3.6901 0.0002446 ***
## lag(log(wage), 0:1)1 0.292637 0.134858 2.1700 0.0304026 *
## log(capital) 0.348173 0.048716 7.1470 2.593e-12 ***
## lag(log(output), 0:1)0 0.490632 0.169683 2.8915 0.0039738 **
## lag(log(output), 0:1)1 -0.607329 0.177043 -3.4304 0.0006442 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The next step keeps the dynamic model but changes the estimation strategy.
Because the GMM estimators below work with first-differenced equations, it is useful to fit the first-differenced model directly before introducing instruments:
Start from the same dynamic equation in levels and subtract the previous period from the current period:
\[ \begin{aligned} &\left[ \log(emp)_{it} - \log(emp)_{i,t-1} \right] \\ &\quad = \phi_1 \left[ \log(emp)_{i,t-1} - \log(emp)_{i,t-2} \right] + \phi_2 \left[ \log(emp)_{i,t-2} - \log(emp)_{i,t-3} \right] \\ &\qquad + \left(x_{it} - x_{i,t-1}\right)\beta + \delta_t + \left(\varepsilon_{it} - \varepsilon_{i,t-1}\right). \end{aligned} \]
The unit fixed effect drops out because \(\alpha_i - \alpha_i = 0\). The term \(\delta_t\) is a period effect for changes: it captures shocks that move all firms between two adjacent years. But the lagged dependent-variable problem remains visible: the first lagged change contains \(\log(emp)_{i,t-1}\), which contains \(\varepsilon_{i,t-1}\), and the differenced error also contains \(-\varepsilon_{i,t-1}\).
Using compact notation:
\[ \Delta \log(emp)_{it} = \delta_t + \phi_1 \Delta \log(emp)_{i,t-1} + \phi_2 \Delta \log(emp)_{i,t-2} + \Delta x_{it}\beta + \Delta\varepsilon_{it}. \]
This model removes firm fixed effects by subtracting adjacent years instead of subtracting firm means. The period effects are now effects on changes between years, not level differences across years. The model is useful as a bridge because it has the same differenced structure as Difference GMM, but it is still not a valid final estimator for a short dynamic panel. The reason is the same as before: \(\Delta \log(emp)_{i,t-1}\) shares \(\varepsilon_{i,t-1}\) with \(\Delta\varepsilon_{it}\).
The code below fits the model with lm() and reports
firm-clustered standard errors. Clustering is an inference correction:
it allows residuals from the same firm to be correlated. It does not
solve the dynamic endogeneity problem.
# Build the first-differenced variables one firm at a time.
# The d_ prefix means "change from the previous year".
empl_fd <- EmplUK |>
as_tibble() |>
arrange(firm, year) |>
group_by(firm) |>
mutate(log_emp = log(emp),
log_wage = log(wage),
log_capital = log(capital),
log_output = log(output),
d_log_emp = log_emp - dplyr::lag(log_emp),
d_log_wage = log_wage - dplyr::lag(log_wage),
d_log_capital = log_capital - dplyr::lag(log_capital),
d_log_output = log_output - dplyr::lag(log_output),
d_log_emp_lag1 = dplyr::lag(d_log_emp, 1),
d_log_emp_lag2 = dplyr::lag(d_log_emp, 2),
d_log_wage_lag1 = dplyr::lag(d_log_wage, 1),
d_log_output_lag1 = dplyr::lag(d_log_output, 1)) |>
ungroup()
# Keep only rows where every variable in the FD model is observed.
# This makes the sample explicit before fitting lm().
empl_fd_model <- empl_fd |>
drop_na(d_log_emp, d_log_emp_lag1, d_log_emp_lag2,
d_log_wage, d_log_wage_lag1,
d_log_capital,
d_log_output, d_log_output_lag1,
firm, year)
# Naive first-differenced LDV model with period effects.
# factor(year) captures common shocks to annual changes.
fd_ldv_fit <- lm(
d_log_emp ~ d_log_emp_lag1 + d_log_emp_lag2 +
d_log_wage + d_log_wage_lag1 +
d_log_capital +
d_log_output + d_log_output_lag1 +
factor(year),
data = empl_fd_model
)
# Firm-clustered covariance matrix for the FD-LDV bridge model.
# The cluster vector is the firm ID in the exact sample used by lm().
fd_vcov_cluster <- sandwich::vcovCL(fd_ldv_fit,
cluster = empl_fd_model$firm,
type = "HC1")
# Coefficient table with firm-clustered standard errors.
fd_clustered <- lmtest::coeftest(fd_ldv_fit, vcov = fd_vcov_cluster)
fd_clustered
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0082133 0.0085202 0.9640 0.3354451
## d_log_emp_lag1 0.1991876 0.0420364 4.7385 2.693e-06 ***
## d_log_emp_lag2 -0.0171433 0.0312393 -0.5488 0.5833651
## d_log_wage -0.5650252 0.1683421 -3.3564 0.0008397 ***
## d_log_wage_lag1 0.1527075 0.0767036 1.9909 0.0469491 *
## d_log_capital 0.4023066 0.0585539 6.8707 1.608e-11 ***
## d_log_output 0.6001617 0.1727582 3.4740 0.0005499 ***
## d_log_output_lag1 -0.3690729 0.1596989 -2.3111 0.0211690 *
## factor(year)1980 0.0060845 0.0144073 0.4223 0.6729404
## factor(year)1981 -0.0342071 0.0187360 -1.8257 0.0683872 .
## factor(year)1982 -0.0382944 0.0164787 -2.3239 0.0204661 *
## factor(year)1983 -0.0278807 0.0189011 -1.4751 0.1407156
## factor(year)1984 -0.0125293 0.0168219 -0.7448 0.4566727
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# For diagnostics, refit the same already-differenced equation with plm.
# model = "pooling" means OLS on the differenced variables. The pdata.frame
# structure tells pbgtest() and pcdtest() which observations belong to the same
# firm and which observations share the same year.
empl_fd_pdat <- pdata.frame(empl_fd_model, index = c("firm", "year"))
fd_ldv_plm <- plm(
d_log_emp ~ d_log_emp_lag1 + d_log_emp_lag2 +
d_log_wage + d_log_wage_lag1 +
d_log_capital +
d_log_output + d_log_output_lag1 +
factor(year),
data = empl_fd_pdat,
model = "pooling"
)
# The key question for SEs: is there still residual dependence after fitting
# the first-differenced model? If yes, conventional SEs are too optimistic.
fd_serial <- pbgtest(fd_ldv_plm)
fd_csd <- pcdtest(fd_ldv_plm, test = "cd")
tibble(
diagnostic = c("Serial correlation: pbgtest",
"Cross-sectional dependence: Pesaran CD"),
null = c("No serial correlation within firms",
"Cross-sectional independence across firms"),
p_value = c(fd_serial$p.value, fd_csd$p.value)
) |>
knitr::kable(digits = 3,
caption = "Post-estimation dependence checks for the FD-LDV bridge model.")
| diagnostic | null | p_value |
|---|---|---|
| Serial correlation: pbgtest | No serial correlation within firms | 0.005 |
| Cross-sectional dependence: Pesaran CD | Cross-sectional independence across firms | 0.605 |
The rest of this part keeps the same research question and the same dataset, but changes the estimator. The sequence moves from the familiar FE-LDV baseline, to a first-differenced bridge model, to Anderson-Hsiao, Difference GMM, limited-instrument Difference GMM, and System GMM. For each estimator, the logic is the same: state the identifying assumption, fit the model, read the diagnostics, and decide what the estimate is buying relative to the previous specification.
Before the first pgmm() call, read its formula as two
pieces:
y ~ <regressors> | <GMM-style instruments>
The left side of | is the dynamic equation written in
levels. The right side of | is the instrument set. Under
transformation = "d", pgmm() differences the
equation internally; under transformation = "ld", it stacks
the differenced equation and the level equation for System GMM. This is
why the formula is written in levels even when the estimator is
Difference GMM.
The GMM logic is simple: after differencing, valid instruments should be uncorrelated with the differenced model residuals. For a differenced dynamic panel,
\[ \Delta y_{it} = \phi\,\Delta y_{i,t-1} + \Delta x_{it}\beta + \Delta\varepsilon_{it}. \]
If \(z_{it}\) is a valid instrument, the population moment condition is:
\[ E\left[z_{it}\Delta\varepsilon_{it}\right] = 0. \]
In sample form, GMM chooses the coefficients that make the instrument-residual moments as close to zero as possible:
\[ \hat g(\theta) = \frac{1}{N}\sum_{i=1}^{N} Z_i' \left(\Delta y_i - \Delta X_i\theta\right), \qquad \hat\theta_{GMM} = \arg\min_{\theta} \hat g(\theta)'W\hat g(\theta). \]
The matrix \(W\) tells the estimator how to weight the different moment conditions. One-step GMM uses a simple initial weighting matrix; two-step GMM first estimates the model, uses the residuals to estimate a better weighting matrix, and then estimates the same structural coefficients again. This is different from the two stages in 2SLS, where the first stage predicts the endogenous regressor.
The instrument strategy only fixes the variables that are actually treated as endogenous or predetermined and supplied with instruments. In the main Difference-GMM example below, the GMM-style instruments are aimed at the lagged outcome. They correct the mechanical endogeneity of \(\Delta y_{i,t-1}\) after differencing. The other covariates in \(X_{it}\) are not automatically “purged” by the lagged-outcome instruments. Their coefficients are credible only under the maintained assumption that those covariates are exogenous or predetermined in the way the model specifies. If a covariate is also endogenous, it needs its own instrument block.
The DAG below shows the main logic. The lagged outcome is endogenous because it shares the previous shock with the differenced error. Deeper lags of the outcome are useful instruments because they predict the lagged outcome but should not be correlated with the current differenced error if the no-serial-correlation assumption is right.
Dynamic-panel IV/GMM logic. The lag instruments target the endogenous lagged outcome; covariates in \(X\) require their own exogeneity assumption or their own instruments.
The weighting matrix \(W\) does not make invalid instruments valid. The validity of the lag instruments comes from an assumption about the level errors. For example, \(y_{i,t-2}\) is useful as an instrument for \(\Delta y_{i,t-1}\) only if it predicts the lagged outcome but does not predict the current differenced error:
\[ E[y_{i,t-2}\Delta\varepsilon_{it}] = 0, \qquad \Delta\varepsilon_{it} = \varepsilon_{it} - \varepsilon_{i,t-1}. \]
This requires \(y_{i,t-2}\) to be uncorrelated with both \(\varepsilon_{it}\) and \(\varepsilon_{i,t-1}\). Since \(y_{i,t-2}\) contains older shocks such as \(\varepsilon_{i,t-2}\), this assumption fails if the level errors are serially correlated. In that case, the path from old shocks to later shocks remains open, and lagged outcomes inherit the error contamination.
This is why the Arellano-Bond serial-correlation tests matter. Some AR(1) correlation in the differenced residuals is expected mechanically because \(\Delta\varepsilon_{it}\) and \(\Delta\varepsilon_{i,t-1}\) both contain \(\varepsilon_{i,t-1}\). The more important diagnostic is AR(2): if AR(2) is present in the differenced residuals, that signals serial correlation in the level errors, which makes lag-2 instruments suspect. The preferred diagnostic pattern is therefore: AR(1) rejects, AR(2) does not reject.
One important implication follows. If the lagged-outcome coefficient \(\hat\phi\) is badly biased, the other coefficients can also be distorted because the model is misspecifying the dynamic part of the outcome. Instrumenting the lagged outcome can therefore change the estimates on \(X_{it}\) indirectly. But those \(X\) coefficients are not made unbiased by the lag instruments alone. They still require the relevant exogeneity assumption for \(X\), or separate instruments if \(X\) is endogenous.
Difference the dynamic panel equation:
\[ \Delta y_{it} = \phi\, \Delta y_{i,t-1} + \beta\, \Delta x_{it} + \Delta \varepsilon_{it} \]
The unit fixed effects \(\alpha_i\) vanish — they are constant in \(t\), so \(\Delta \alpha_i = 0\). Differencing achieves what within-demeaning achieves, but without mixing future \(\varepsilon\) into the lag through \(\bar y_{i,-1}\). This is the core idea behind every estimator in the GMM family.
Of course, the differenced lag \(\Delta y_{i,t-1}\) is still correlated with \(\Delta\varepsilon_{it}\) — both contain \(\varepsilon_{i,t-1}\) mechanically. An instrument is needed.
Anderson and Hsiao proposed using \(y_{i,t-2}\) as IV for \(\Delta y_{i,t-1}\). The exclusion logic is the same as in Appendix A: \(y_{i,t-2}\) is correlated with \(\Delta y_{i,t-1}\) (relevance) but, under iid \(\varepsilon\), uncorrelated with \(\Delta\varepsilon_{it}\) (exclusion — precisely the assumption tested by the AR(2) diagnostic in §2.4).
This is an IV estimator, not OLS. OLS on the differenced equation would regress \(\Delta y_{it}\) directly on \(\Delta y_{i,t-1}\), but \(\Delta y_{i,t-1}\) contains \(\varepsilon_{i,t-1}\) and \(\Delta\varepsilon_{it}\) also contains \(-\varepsilon_{i,t-1}\). That mechanical overlap makes OLS inconsistent. Anderson-Hsiao fixes this by replacing the endogenous differenced lag with variation predicted by a valid lagged-level instrument.
Why use pgmm() here, then? Because IV is a special case
of GMM. The moment condition is:
\[ E\left[y_{i,t-2}\left(\Delta y_{it} - \phi\Delta y_{i,t-1} - \Delta x_{it}\beta\right)\right] = 0. \]
If there is exactly one valid instrument for one endogenous
regressor, a just-identified IV/2SLS estimate and a GMM estimate give
the same coefficient, because the GMM weighting matrix cannot change a
just-identified solution. In pgmm(), however, the
instrument is stacked by period, so even the AH-style call can create
multiple moment conditions. That is why the software estimates it as a
restricted one-step GMM model and may print an overidentification
test.
The pgmm call below implements that logic. Read the
formula in two halves:
| — the regression equation in
level form. lag(log(emp), 1) is the lagged
outcome; pgmm internally differences it.| — the GMM instrument block.
lag(log(emp), 2) says use only the second lag of
log(emp) as an instrument — that is the Anderson-Hsiao
restriction.ah_fit <- pgmm(log(emp) ~ lag(log(emp), 1) + log(wage) + log(capital) |
lag(log(emp), 2),
data = EmplUK,
effect = "individual", # only unit FE (no time dummies)
model = "onestep", # one-step keeps the AH-style example simple
transformation = "d") # "d" = difference GMM
sum_ah <- summary(ah_fit, robust = TRUE)
sum_ah
## Oneway (individual) effect One-step model Difference GMM
##
## Call:
## pgmm(formula = log(emp) ~ lag(log(emp), 1) + log(wage) + log(capital) |
## lag(log(emp), 2), data = EmplUK, effect = "individual", model = "onestep",
## transformation = "d")
##
## Unbalanced Panel: n = 140, T = 7-9, N = 1031
##
## Number of Observations Used: 751
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.903849 -0.045858 0.000000 -0.001085 0.041154 0.844502
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## lag(log(emp), 1) 0.801824 0.157098 5.1040 3.326e-07 ***
## log(wage) -0.631281 0.195599 -3.2274 0.001249 **
## log(capital) 0.241204 0.056267 4.2868 1.813e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sargan test: chisq(6) = 34.79026 (p-value = 4.732e-06)
## Autocorrelation test (1): normal = -3.923134 (p-value = 8.7404e-05)
## Autocorrelation test (2): normal = -1.10812 (p-value = 0.26781)
## Wald test for coefficients: chisq(3) = 605.8932 (p-value = < 2.22e-16)
In this fit, the AH-style model is best read as a bridge, not as the final specification. The AR(2) test does not reject, which is good for the lag-2 exclusion logic. But the Sargan test rejects, meaning the period-stacked instruments are not jointly credible under this simple specification. That is exactly why the lab moves next to richer Difference-GMM specifications: AH shows the IV idea clearly, but it often leaves useful valid moments out of the estimator.
Conceptually, the textbook AH estimator uses one lag as the
instrument for one endogenous differenced lag. In the simplest
just-identified IV case, there is no Sargan test because there are no
extra instruments to test. pgmm() implements the same idea
with period-stacked moments: even lag(log(emp), 2) can
create more than one moment condition because the lag-2 instrument is
available in several periods. That is why the software may still print a
Sargan overidentification test for this AH-style fit. If the model is
overidentified, read the Sargan test the same way as in the later GMM
models: the null is that the instruments are jointly valid, so the
preferred result is to fail to reject.
The lag range is the pedagogical bridge from AH to Difference GMM:
lag(y, 2) keeps the clean Anderson-Hsiao idea: one
deeper lag is used to break the mechanical correlation between \(\Delta y_{i,t-1}\) and \(\Delta\varepsilon_{it}\).lag(y, 2:3) or lag(y, 2:6) adds more valid
lagged levels. This can improve efficiency if those lags are relevant
instruments.lag(y, 2:99) means “use every available lag from 2
onward.” In a short panel like EmplUK, the panel only has 7
to 9 years per firm, so 99 is not literally 99 lags. It
just tells pgmm() to use as many eligible lags as the data
allow.The trade-off is instrument proliferation. More lags can increase
efficiency, but they also increase the number of moment conditions, can
overfit the endogenous regressor, and can make the Sargan/Hansen test
weak. In EmplUK, lag(y, 2:99) is tolerable
mainly because the realized number of instruments remains below the
number of firms (\(N = 140\)). That is
not guaranteed in other panels. With longer \(T\) or more endogenous regressors,
2:99 can quickly create more instruments than units, and
then a restricted range such as 2:6 is usually safer.
AH is rarely used as the final model because it leaves valid moments on the table, but it is the right conceptual stepping stone between OLS on differences, one-instrument IV, and many-instrument Difference GMM.
The key insight from Arellano & Bond: under the AH exogeneity assumption, all lags \(y_{i,t-2}, y_{i,t-3}, \ldots\) are valid instruments for \(\Delta y_{i,t-1}\). With each successive period \(t\), one more usable lag becomes available. Stacking them all up turns AH into an over-identified GMM problem with a corresponding efficiency gain.
The standard Arellano-Bond difference GMM specification on
EmplUK. Pay attention to the formula syntax:
lag(log(emp), 1:2) — both AR(1) and AR(2) terms enter
the equation. This is heavier dynamics than the cigarette model in Part
3, but it matches the published AB specification.lag(log(wage), 0:1) — both contemporaneous wage
and lagged wage. The 0 means “the contemporaneous
value”, 1 means “the first lag”.| lag(log(emp), 2:99) — instrument block.
2:99 means “use every lag from 2 up to 99 (or as many as
the panel allows)”. This is the full Arellano-Bond moment set.ab_fit <- pgmm(log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1) +
log(capital) + lag(log(output), 0:1) |
lag(log(emp), 2:99),
data = EmplUK,
effect = "twoways", # unit + time fixed effects (AB-91 spec)
model = "twosteps", # use the optimal two-step weighting matrix
transformation = "d") # difference GMM
sum_ab <- summary(ab_fit, robust = TRUE) # robust = TRUE -> Windmeijer-corrected SEs
sum_ab
## Twoways effects Two-steps model Difference GMM
##
## Call:
## pgmm(formula = log(emp) ~ lag(log(emp), 1:2) + lag(log(wage),
## 0:1) + log(capital) + lag(log(output), 0:1) | lag(log(emp),
## 2:99), data = EmplUK, effect = "twoways", model = "twosteps",
## transformation = "d")
##
## Unbalanced Panel: n = 140, T = 7-9, N = 1031
##
## Number of Observations Used: 611
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.6190677 -0.0255683 0.0000000 -0.0001339 0.0332013 0.6410272
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## lag(log(emp), 1:2)1 0.474151 0.185398 2.5575 0.0105437 *
## lag(log(emp), 1:2)2 -0.052967 0.051749 -1.0235 0.3060506
## lag(log(wage), 0:1)0 -0.513205 0.145565 -3.5256 0.0004225 ***
## lag(log(wage), 0:1)1 0.224640 0.141950 1.5825 0.1135279
## log(capital) 0.292723 0.062627 4.6741 2.953e-06 ***
## lag(log(output), 0:1)0 0.609775 0.156263 3.9022 9.530e-05 ***
## lag(log(output), 0:1)1 -0.446373 0.217302 -2.0542 0.0399605 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sargan test: chisq(25) = 30.11247 (p-value = 0.22011)
## Autocorrelation test (1): normal = -1.53845 (p-value = 0.12394)
## Autocorrelation test (2): normal = -0.2796829 (p-value = 0.77972)
## Wald test for coefficients: chisq(7) = 142.0353 (p-value = < 2.22e-16)
## Wald test for time dummies: chisq(6) = 16.97046 (p-value = 0.0093924)
What to read off the printout, applying the §2 checklist top-to-bottom:
lag(log(emp), 1), \(\hat\phi_2
\approx -0.05\) on lag(log(emp), 2) — modest
persistence with slight overshooting. Standard errors are the
Windmeijer-corrected two-step SEs because
robust = TRUE.effect = "twoways".Overall, the Difference-GMM fit is a credible candidate model in this application. The Sargan test does not reject, so there is no strong evidence against the overidentifying restrictions. The AR(2) test also does not reject, so the lag-2 instruments remain defensible. The AR(1) test is weaker than the textbook pattern, but this is not the decisive diagnostic; the key threat to lag-2 instruments is second-order serial correlation in the differenced residuals.
When the outcome is highly persistent (\(\phi\) near 1), the moment \(E[y_{i,t-2} \cdot \Delta y_{i,t-1}]\) shrinks toward zero. Lagged levels then become weak instruments for the differenced equation. In macro panels this is common: variables such as employment, income, democracy, or consumption often move slowly over time.
Blundell and Bond’s solution is to estimate a system of two equations. The first equation is the familiar differenced equation from Arellano-Bond. The second equation is the original level equation. The level equation is not estimated by ordinary fixed effects; it is instrumented with lagged differences.
\[ \begin{aligned} \text{Differenced eq:} &\quad \Delta y_{it} = \phi\, \Delta y_{i,t-1} + \beta\, \Delta x_{it} + \Delta\varepsilon_{it} \quad \text{(IVs: } y_{i,t-2}, y_{i,t-3}, \ldots\text{)} \\ \text{Level eq:} &\quad y_{it} = \alpha_i + \phi\, y_{i,t-1} + \beta\, x_{it} + \varepsilon_{it} \quad \text{(IVs: } \Delta y_{i,t-1}, \Delta x_{i,t-1}\text{)} \end{aligned} \]
The extra equation increases the amount of information used by the estimator. It does not literally create more firms or years, but it adds level-equation moments to the difference-equation moments. In that sense, it increases the effective information available for estimating the dynamic coefficients. This can be especially helpful when the differenced equation has weak instruments.
The cost is a stronger assumption. The new moment conditions require lagged differences to be uncorrelated with the unit effect \(\alpha_i\). This is often described as a restriction on the initial conditions: units may have different levels, but their recent changes should not be systematically related to their fixed effects. That assumption is more plausible when the panel is close to a stable long-run process. It is less plausible when units are observed during a transition, a reform period, a crisis, or any setting where units with high fixed effects are also systematically growing faster or slower.
Use System GMM when the outcome is persistent and Difference GMM appears weak or imprecise, but only if the level-equation assumption is substantively defensible and the instrument count remains controlled. Do not use it mechanically. Be cautious when the series have strong trends, when the panel begins far from steady state, when Difference GMM and System GMM tell very different stories, or when the number of instruments approaches the number of units.
In pgmm, system GMM is fit by setting
transformation = "ld" (levels + differences):
bb_fit <- pgmm(log(emp) ~ lag(log(emp), 1) + lag(log(wage), 0:1) +
lag(log(capital), 0:1) |
lag(log(emp), 2:99) + lag(log(wage), 2:99) +
lag(log(capital), 2:99),
data = EmplUK,
effect = "twoways",
model = "onestep",
transformation = "ld") # "ld" = levels + differences (system GMM)
sum_bb <- summary(bb_fit, robust = TRUE)
sum_bb
## Twoways effects One-step model System GMM
##
## Call:
## pgmm(formula = log(emp) ~ lag(log(emp), 1) + lag(log(wage), 0:1) +
## lag(log(capital), 0:1) | lag(log(emp), 2:99) + lag(log(wage),
## 2:99) + lag(log(capital), 2:99), data = EmplUK, effect = "twoways",
## model = "onestep", transformation = "ld")
##
## Unbalanced Panel: n = 140, T = 7-9, N = 1031
##
## Number of Observations Used: 1642
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.7530341 -0.0369030 0.0000000 0.0002882 0.0466069 0.6001503
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## lag(log(emp), 1) 0.935605 0.026295 35.5810 < 2.2e-16 ***
## lag(log(wage), 0:1)0 -0.630976 0.118054 -5.3448 9.050e-08 ***
## lag(log(wage), 0:1)1 0.482620 0.136887 3.5257 0.0004224 ***
## lag(log(capital), 0:1)0 0.483930 0.053867 8.9838 < 2.2e-16 ***
## lag(log(capital), 0:1)1 -0.424393 0.058479 -7.2572 3.952e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sargan test: chisq(100) = 118.763 (p-value = 0.097096)
## Autocorrelation test (1): normal = -4.808434 (p-value = 1.5212e-06)
## Autocorrelation test (2): normal = -0.2800133 (p-value = 0.77947)
## Wald test for coefficients: chisq(5) = 11174.82 (p-value = < 2.22e-16)
## Wald test for time dummies: chisq(7) = 14.71138 (p-value = 0.039882)
Two things change compared with Difference GMM:
|, not just the lagged DV. Each block contributes its own
moment conditions.The System-GMM output looks different from Difference GMM because it uses additional level-equation moments. The lagged employment coefficient is much larger, which implies stronger persistence. The AR(1) and AR(2) pattern is textbook: AR(1) rejects and AR(2) does not. The Sargan p-value is acceptable but closer to the margin, and the instrument count is larger. This makes System GMM useful as a comparison, but not automatically preferable. It buys precision and persistence by adding stronger assumptions.
GMM minimizes a quadratic form \(\hat g(\theta)' \, W \, \hat g(\theta)\) where \(\hat g\) is the sample moment vector and \(W\) is a weighting matrix. Two strategies:
This “two-step” language can be confusing because it is different from 2SLS. In 2SLS, the first stage predicts the endogenous regressor with the instruments, and the second stage regresses the outcome on that fitted value. In two-step GMM, both steps estimate the same structural coefficients from the same moment conditions; the second step only changes the weighting matrix used to combine the moments.
In pgmm, model = "twosteps" was selected
for AB above. The catch: two-step standard errors are
downward biased in finite samples — sometimes severely.
Windmeijer (2005) derived a correction. Invoke it via
robust = TRUE in summary.pgmm:
# Two-step AB fit, plain SEs
summary(ab_fit, robust = FALSE)$coefficients[, 2] |> head(3)
## lag(log(emp), 1:2)1 lag(log(emp), 1:2)2 lag(log(wage), 0:1)0
## 0.08530307 0.02728433 0.04934539
# Two-step AB fit, Windmeijer-style robust SEs
summary(ab_fit, robust = TRUE)$coefficients[, 2] |> head(3)
## lag(log(emp), 1:2)1 lag(log(emp), 1:2)2 lag(log(wage), 0:1)0
## 0.1853985 0.0517491 0.1455653
The robust SEs in the second call are larger — that’s the Windmeijer correction in action. Always report the robust two-step SEs; the plain ones are not safe in finite samples.
Section 2.6 argued that too many instruments degrades both the
weighting matrix and the Sargan diagnostic. Here the same AB
specification is refit with lag depth capped at 2:6 so the
moment count stays moderate.
Why 2:6? Lag 2 is the first valid lag under the
Arellano-Bond logic: it is far enough back to avoid the mechanical
correlation between \(\Delta
y_{i,t-1}\) and \(\Delta\varepsilon_{it}\). Lags 3 through 6
add more moment conditions, but still keep the instrument set smaller
than the full 2:99 specification. In this dataset, a very
narrow window such as 2:4 is so conservative that it
removes much of the identifying variation for the lagged outcome. The
2:6 window is a better sensitivity check: it reduces the
instrument count while still preserving a plausible amount of employment
persistence.
The important point is not that 2:6 is a universal rule.
The lag window must use valid instruments. If the AR(2)
test rejected, lag 2 would be suspect and the model would need to be
changed. If the Sargan test rejected, the instrument set would also be
questionable. Limiting instruments helps only when the remaining
instruments are valid and relevant; it does not make bad instruments
good.
ab_fit_lag26 <- pgmm(log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1) +
log(capital) + lag(log(output), 0:1) |
lag(log(emp), 2:6),
data = EmplUK,
effect = "twoways",
model = "twosteps",
transformation = "d")
sum_ab_lag26 <- summary(ab_fit_lag26, robust = TRUE)
sum_ab_lag26
## Twoways effects Two-steps model Difference GMM
##
## Call:
## pgmm(formula = log(emp) ~ lag(log(emp), 1:2) + lag(log(wage),
## 0:1) + log(capital) + lag(log(output), 0:1) | lag(log(emp),
## 2:6), data = EmplUK, effect = "twoways", model = "twosteps",
## transformation = "d")
##
## Unbalanced Panel: n = 140, T = 7-9, N = 1031
##
## Number of Observations Used: 611
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.6339252 -0.0233820 0.0000000 -0.0002934 0.0294973 0.6617657
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## lag(log(emp), 1:2)1 0.354649 0.214933 1.6500 0.098933 .
## lag(log(emp), 1:2)2 -0.044811 0.055247 -0.8111 0.417303
## lag(log(wage), 0:1)0 -0.436421 0.141283 -3.0890 0.002008 **
## lag(log(wage), 0:1)1 0.153272 0.125533 1.2210 0.222097
## log(capital) 0.309765 0.068780 4.5037 6.677e-06 ***
## lag(log(output), 0:1)0 0.569246 0.152034 3.7442 0.000181 ***
## lag(log(output), 0:1)1 -0.297321 0.199777 -1.4883 0.136683
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sargan test: chisq(22) = 27.24218 (p-value = 0.20217)
## Autocorrelation test (1): normal = -1.009434 (p-value = 0.31277)
## Autocorrelation test (2): normal = -0.1774997 (p-value = 0.85912)
## Wald test for coefficients: chisq(7) = 109.2295 (p-value = < 2.22e-16)
## Wald test for time dummies: chisq(6) = 15.96688 (p-value = 0.013933)
And the diagnostic table across the EmplUK fits from this part:
# fit$W is a list of unit-level instrument matrices.
# length(fit$W[[1]][1, ]) gives the number of instruments for the first unit.
# length(fit$W) gives the number of units used by pgmm.
gmm_diagnostics <- tibble(
model = c("AH-style", "Diff-GMM", "Diff-GMM lag 2:6", "System GMM"),
sargan_p = c(sum_ah$sargan$p.value,
sum_ab$sargan$p.value,
sum_ab_lag26$sargan$p.value,
sum_bb$sargan$p.value),
ar1_p = c(sum_ah$m1$p.value,
sum_ab$m1$p.value,
sum_ab_lag26$m1$p.value,
sum_bb$m1$p.value),
ar2_p = c(sum_ah$m2$p.value,
sum_ab$m2$p.value,
sum_ab_lag26$m2$p.value,
sum_bb$m2$p.value),
n_instruments = c(length(ah_fit$W[[1]][1, ]),
length(ab_fit$W[[1]][1, ]),
length(ab_fit_lag26$W[[1]][1, ]),
length(bb_fit$W[[1]][1, ])),
n_units = c(length(ah_fit$W),
length(ab_fit$W),
length(ab_fit_lag26$W),
length(bb_fit$W))
) |>
mutate(instrument_ratio = round(n_instruments / n_units, 2),
instruments_lt_N = n_instruments < n_units)
gmm_diagnostics |>
knitr::kable(digits = 3,
caption = "Model diagnostics. Preferred pattern: Sargan not rejected, AR(2) not rejected, and instrument count below N.")
| model | sargan_p | ar1_p | ar2_p | n_instruments | n_units | instrument_ratio | instruments_lt_N |
|---|---|---|---|---|---|---|---|
| AH-style | 0.000 | 0.000 | 0.268 | 9 | 140 | 0.06 | TRUE |
| Diff-GMM | 0.220 | 0.124 | 0.780 | 38 | 140 | 0.27 | TRUE |
| Diff-GMM lag 2:6 | 0.202 | 0.313 | 0.859 | 35 | 140 | 0.25 | TRUE |
| System GMM | 0.097 | 0.000 | 0.779 | 113 | 140 | 0.81 | TRUE |
Limiting from 2:99 to 2:6 reduces the
moment count while keeping the same substantive equation. In longer
panels this kind of restriction can be a large reduction; in EmplUK it
is more modest but still illustrates the diagnostic logic. On EmplUK
with \(N = 140\), every spec above
passes the rule of thumb — but the instrument-count logic becomes even
more important when the full instrument set approaches or exceeds \(N\).
The limited model should be read as a sensitivity check. In this
application, 2:6 keeps the Sargan and AR(2) diagnostics in
the acceptable range and recovers a more plausible dynamic coefficient
than a very narrow 2:4 window. That comparison is useful:
it shows that limiting instruments is not only about reducing a number
in the diagnostic table. It can also change the information used to
estimate persistence.
Before putting the models side by side, return to the standard-error problem. The FE-LDV and FD-LDV diagnostics above check for serial correlation within firms and cross-sectional dependence across firms. For these OLS-style baselines, residual dependence is an inference problem: the coefficient may be biased because of dynamic endogeneity, and conventional standard errors are also too optimistic. The table therefore reports firm-clustered standard errors for both baseline columns.
For GMM, the logic is different. The Arellano-Bond AR tests are not merely a prompt to “cluster the standard errors.” The AR(2) test is part of the instrument-validity argument: if second-order serial correlation remains in the differenced residuals, then lag-2 instruments are suspect. Robust GMM standard errors help with heteroskedasticity and finite-sample inference, but they do not rescue invalid moment conditions. If AR(2) rejects, the specification needs to be changed.
Now put the baseline and the GMM estimators side by side. This is the table to read slowly in class:
The choice is not mechanical. FE-LDV is the natural baseline and can be reasonable when \(T\) is not very small. FD-LDV is a useful bridge because it uses the same differenced equation as Difference GMM, but without instruments. Anderson-Hsiao is useful for seeing the IV logic clearly, but it is usually inefficient. Difference GMM is the standard starting point for many short dynamic panels because it removes fixed effects by differencing and uses deeper lags as instruments. If the instrument count grows quickly or estimates change sharply with lag depth, restrict the instrument set. System GMM is most attractive when the outcome is highly persistent and lagged levels are weak instruments in the differenced equation, but it requires the stronger level-equation assumption and can generate many instruments quickly.
# Keep only substantively meaningful coefficients and drop the many time-dummy
# coefficients that pgmm prints for two-way models.
coef_order <- c("lag(log(emp), 1)",
"lag(log(emp), 1:2)1",
"lag(log(emp), 1:2)2",
"d_log_emp_lag1",
"d_log_emp_lag2",
"log(wage)",
"lag(log(wage), 0:1)0",
"lag(log(wage), 0:1)1",
"d_log_wage",
"d_log_wage_lag1",
"log(capital)",
"lag(log(capital), 0:1)0",
"lag(log(capital), 0:1)1",
"d_log_capital",
"lag(log(output), 0:1)0",
"lag(log(output), 0:1)1",
"d_log_output",
"d_log_output_lag1")
coef_labels <- c("lag(log(emp), 1)" = "log(emp) lag 1",
"lag(log(emp), 1:2)1" = "log(emp) lag 1",
"lag(log(emp), 1:2)2" = "log(emp) lag 2",
"d_log_emp_lag1" = "log(emp) lag 1",
"d_log_emp_lag2" = "log(emp) lag 2",
"log(wage)" = "log(wage)",
"lag(log(wage), 0:1)0" = "log(wage)",
"lag(log(wage), 0:1)1" = "log(wage) lag 1",
"d_log_wage" = "log(wage)",
"d_log_wage_lag1" = "log(wage) lag 1",
"log(capital)" = "log(capital)",
"lag(log(capital), 0:1)0" = "log(capital)",
"lag(log(capital), 0:1)1" = "log(capital) lag 1",
"d_log_capital" = "log(capital)",
"lag(log(output), 0:1)0" = "log(output)",
"lag(log(output), 0:1)1" = "log(output) lag 1",
"d_log_output" = "log(output)",
"d_log_output_lag1" = "log(output) lag 1")
# Extract FE-LDV coefficients with firm-clustered standard errors.
fe_names <- intersect(coef_order, rownames(fe_clustered))
fe_tab <- fe_clustered[fe_names, , drop = FALSE]
# Extract FD-LDV coefficients with firm-clustered standard errors.
fd_names <- intersect(coef_order, rownames(fd_clustered))
fd_tab <- fd_clustered[fd_names, , drop = FALSE]
# Extract GMM coefficients from the robust pgmm summaries.
ah_names <- intersect(coef_order, rownames(sum_ah$coefficients))
ab_names <- intersect(coef_order, rownames(sum_ab$coefficients))
ab26_names <- intersect(coef_order, rownames(sum_ab_lag26$coefficients))
bb_names <- intersect(coef_order, rownames(sum_bb$coefficients))
ah_tab <- sum_ah$coefficients[ah_names, , drop = FALSE]
ab_tab <- sum_ab$coefficients[ab_names, , drop = FALSE]
ab26_tab <- sum_ab_lag26$coefficients[ab26_names, , drop = FALSE]
bb_tab <- sum_bb$coefficients[bb_names, , drop = FALSE]
# Create clean texreg objects manually. This avoids default GOF rows such as
# Wald chi-square statistics that are less useful here than the GMM diagnostics.
tr_fe <- texreg::createTexreg(
coef.names = unname(coef_labels[fe_names]),
coef = fe_tab[, 1],
se = fe_tab[, 2],
pvalues = fe_tab[, 4],
gof.names = c("Serial corr. p", "Cross-sec dep. p", "Instruments", "Instruments/N",
"Sargan p", "AR(1) p", "AR(2) p"),
gof = c(fe_serial$p.value, fe_csd$p.value, NA, NA, NA, NA, NA),
gof.decimal = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE),
model.name = "FE-LDV"
)
tr_fd <- texreg::createTexreg(
coef.names = unname(coef_labels[fd_names]),
coef = fd_tab[, 1],
se = fd_tab[, 2],
pvalues = fd_tab[, 4],
gof.names = c("Serial corr. p", "Cross-sec dep. p", "Instruments", "Instruments/N",
"Sargan p", "AR(1) p", "AR(2) p"),
gof = c(fd_serial$p.value, fd_csd$p.value, NA, NA, NA, NA, NA),
gof.decimal = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE),
model.name = "FD-LDV"
)
tr_ah <- texreg::createTexreg(
coef.names = unname(coef_labels[ah_names]),
coef = ah_tab[, 1],
se = ah_tab[, 2],
pvalues = ah_tab[, 4],
gof.names = c("Serial corr. p", "Cross-sec dep. p", "Instruments", "Instruments/N",
"Sargan p", "AR(1) p", "AR(2) p"),
gof = c(NA,
NA,
length(ah_fit$W[[1]][1, ]),
length(ah_fit$W[[1]][1, ]) / length(ah_fit$W),
sum_ah$sargan$p.value,
sum_ah$m1$p.value,
sum_ah$m2$p.value),
gof.decimal = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE),
model.name = "AH-style"
)
tr_ab <- texreg::createTexreg(
coef.names = unname(coef_labels[ab_names]),
coef = ab_tab[, 1],
se = ab_tab[, 2],
pvalues = ab_tab[, 4],
gof.names = c("Serial corr. p", "Cross-sec dep. p", "Instruments", "Instruments/N",
"Sargan p", "AR(1) p", "AR(2) p"),
gof = c(NA,
NA,
length(ab_fit$W[[1]][1, ]),
length(ab_fit$W[[1]][1, ]) / length(ab_fit$W),
sum_ab$sargan$p.value,
sum_ab$m1$p.value,
sum_ab$m2$p.value),
gof.decimal = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE),
model.name = "Diff-GMM"
)
tr_ab26 <- texreg::createTexreg(
coef.names = unname(coef_labels[ab26_names]),
coef = ab26_tab[, 1],
se = ab26_tab[, 2],
pvalues = ab26_tab[, 4],
gof.names = c("Serial corr. p", "Cross-sec dep. p", "Instruments", "Instruments/N",
"Sargan p", "AR(1) p", "AR(2) p"),
gof = c(NA,
NA,
length(ab_fit_lag26$W[[1]][1, ]),
length(ab_fit_lag26$W[[1]][1, ]) / length(ab_fit_lag26$W),
sum_ab_lag26$sargan$p.value,
sum_ab_lag26$m1$p.value,
sum_ab_lag26$m2$p.value),
gof.decimal = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE),
model.name = "Diff-GMM lag 2:6"
)
tr_bb <- texreg::createTexreg(
coef.names = unname(coef_labels[bb_names]),
coef = bb_tab[, 1],
se = bb_tab[, 2],
pvalues = bb_tab[, 4],
gof.names = c("Serial corr. p", "Cross-sec dep. p", "Instruments", "Instruments/N",
"Sargan p", "AR(1) p", "AR(2) p"),
gof = c(NA,
NA,
length(bb_fit$W[[1]][1, ]),
length(bb_fit$W[[1]][1, ]) / length(bb_fit$W),
sum_bb$sargan$p.value,
sum_bb$m1$p.value,
sum_bb$m2$p.value),
gof.decimal = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE),
model.name = "System GMM"
)
model_list <- list(tr_fe, tr_fd, tr_ah, tr_ab, tr_ab26, tr_bb)
if (knitr::is_html_output()) {
texreg::htmlreg(
model_list,
caption = "Dynamic labor-demand models on EmplUK. FE-LDV and FD-LDV report firm-clustered SEs; GMM columns report robust/Windmeijer-corrected SEs where applicable.",
digits = 3,
doctype = FALSE
)
} else {
texreg::texreg(
model_list,
caption = "Dynamic labor-demand models on EmplUK. FE-LDV and FD-LDV report firm-clustered SEs; GMM columns report robust/Windmeijer-corrected SEs where applicable.",
digits = 3,
table = FALSE
)
}
| FE-LDV | FD-LDV | AH-style | Diff-GMM | Diff-GMM lag 2:6 | System GMM | |
|---|---|---|---|---|---|---|
| log(emp) lag 1 | 0.700*** | 0.199*** | 0.802*** | 0.474* | 0.355 | 0.936*** |
| (0.062) | (0.042) | (0.157) | (0.185) | (0.215) | (0.026) | |
| log(emp) lag 2 | -0.169* | -0.017 | -0.053 | -0.045 | ||
| (0.072) | (0.031) | (0.052) | (0.055) | |||
| log(wage) | -0.559*** | -0.565*** | -0.631** | -0.513*** | -0.436** | -0.631*** |
| (0.152) | (0.168) | (0.196) | (0.146) | (0.141) | (0.118) | |
| log(wage) lag 1 | 0.293* | 0.153* | 0.225 | 0.153 | 0.483*** | |
| (0.135) | (0.077) | (0.142) | (0.126) | (0.137) | ||
| log(capital) | 0.348*** | 0.402*** | 0.241*** | 0.293*** | 0.310*** | 0.484*** |
| (0.049) | (0.059) | (0.056) | (0.063) | (0.069) | (0.054) | |
| log(output) | 0.491** | 0.600*** | 0.610*** | 0.569*** | ||
| (0.170) | (0.173) | (0.156) | (0.152) | |||
| log(output) lag 1 | -0.607*** | -0.369* | -0.446* | -0.297 | ||
| (0.177) | (0.160) | (0.217) | (0.200) | |||
| log(capital) lag 1 | -0.424*** | |||||
| (0.058) | ||||||
| Serial corr. p | 0.000 | 0.005 | ||||
| Cross-sec dep. p | 0.274 | 0.605 | ||||
| Instruments | 9 | 38 | 35 | 113 | ||
| Instruments/N | 0.064 | 0.271 | 0.250 | 0.807 | ||
| Sargan p | 0.000 | 0.220 | 0.202 | 0.097 | ||
| AR(1) p | 0.000 | 0.124 | 0.313 | 0.000 | ||
| AR(2) p | 0.268 | 0.780 | 0.859 | 0.779 | ||
| ***p < 0.001; **p < 0.01; *p < 0.05 | ||||||
The degrees of freedom from Wald or chi-square tests are not shown here because they mainly summarize omnibus model tests. They are mathematically interpretable — they count the restrictions being tested — but they are not the diagnostic quantities needed for this workflow. The relevant checks are serial correlation and cross-sectional dependence for the OLS-style baselines, and Sargan/AR(1)/AR(2)/instrument count for the GMM columns.
The coefficients are not expected to be identical. The Anderson-Hsiao column is intentionally simpler than the published AB-style specification, and the System-GMM column uses a closely related but not identical equation because system GMM is more sensitive to instrument growth in this short panel. The point of the table is to ask whether the substantive story travels across estimators: employment is persistent, wages tend to reduce employment, capital and output tend to raise employment, and the GMM estimators give us alternatives to the Nickell-biased FE-LDV benchmark.
After the estimator comparison, the last step is to translate coefficients into quantities of interest. The forecast below asks what the fitted dynamics imply for employment under a concrete wage scenario. The exercise uses the two main GMM estimators from the table: Difference GMM and System GMM.
There are several useful ways to turn a fitted dynamic panel model into interpretable quantities:
simcf.
This is the main applied tool. It uses the estimated dynamic equation,
simulated coefficients, initial outcome values, and a hypothetical
covariate path to generate expected outcomes, first differences, and
relative risks.The scenario throughout this section is:
What would expected employment look like over the next six forecast periods if firm wages increased by 10%, holding the other included covariates fixed?
Wages are the clearest labor-demand shock in this dataset. A wage increase raises the cost of labor, so the expected sign is negative: firms should employ fewer workers, all else equal.
This section compares two estimators from the table above:
ab_fit) estimates the
differenced equation. It uses lagged levels of employment as instruments
for the differenced lagged outcome. In this lab it has two employment
lags, current and lagged wages, capital, and current and lagged
output.bb_fit) stacks the
differenced equation with a level equation. It can be more efficient
when employment is persistent, but it adds the stronger assumption that
lagged differences are valid instruments for the level equation. In this
lab it has one employment lag, current and lagged wages, and current and
lagged capital.So the predictions below compare both the estimator and the dynamic equation. A difference in forecasts should not be read as only “Diff-GMM vs System-GMM”; it also reflects AR(2) versus AR(1) dynamics and a slightly different control set.
This matters for interpretation. Difference GMM often has wider intervals in persistent panels because lagged levels can be weak instruments for changes. That uncertainty enters the coefficient draws and then compounds as the dynamic model is iterated forward. System GMM can produce tighter intervals because it adds level-equation moments, increasing the effective information used by the estimator. That precision comes from stronger assumptions, especially the validity of lagged differences as instruments for the level equation.
The shapes can also differ. In this lab, Difference GMM uses two employment lags, and the second lag partly offsets the first. That makes the response look relatively flat after the first few periods. System GMM uses one employment lag with stronger persistence, so the forecast adjusts more gradually over time.
simcf forecast: 10% wage increaseThe fitted models are in logs, so a 10% increase in wages is approximately:
\[ \log(1.10). \]
The forecast horizon should be read relative to the empirical panel.
EmplUK observes firms for only 7 to 9 years, so a
six-period forecast is already a fairly long extrapolation from a short
time window. There is no universal rule for the maximum valid horizon,
but the farther the forecast moves beyond the observed panel, the more
it depends on the model’s dynamic assumptions rather than on observed
data. For that reason, long horizons should be treated as model-implied
scenarios, not as precise forecasts.
set.seed(20260522)
# Forecast horizon: six future years after the last observed year, 1984.
# This is long enough to see the dynamic adjustment, but still short enough
# that the forecast is not entirely driven by the model's long-run extrapolation.
empl_periods_out <- 6
# Number of coefficient simulations.
empl_sims <- 1000
# Convert the data to a tibble and anchor the forecast at the last two years.
empl_df <- as_tibble(EmplUK)
empl_last <- empl_df |> filter(year == 1984)
empl_prev <- empl_df |> filter(year == 1983)
# A 10% increase in wages on the log scale.
wage_shock <- log(1.10)
# Last observed average log employment levels.
# Difference GMM has two lags of employment; System GMM has one.
empl_lagY_ab <- c(mean(log(empl_last$emp), na.rm = TRUE),
mean(log(empl_prev$emp), na.rm = TRUE))
empl_lagY_bb <- mean(log(empl_last$emp), na.rm = TRUE)
Now set up the Difference-GMM forecast. simcf takes the
lagged-outcome coefficients through phi = and the other
coefficients through b =.
# Simulate coefficient uncertainty from the Windmeijer-corrected covariance
# matrix of the full Difference-GMM model.
empl_draws_ab <- mvrnorm(n = empl_sims,
mu = coef(ab_fit),
Sigma = vcovHC(ab_fit))
# Difference GMM has two lagged-outcome coefficients.
empl_phi_ab <- empl_draws_ab[, c("lag(log(emp), 1:2)1",
"lag(log(emp), 1:2)2")]
# Keep only the substantive covariates used in the forecast.
# The year dummies are omitted, so future year effects are set to zero.
empl_beta_names_ab <- c("lag(log(wage), 0:1)0",
"lag(log(wage), 0:1)1",
"log(capital)",
"lag(log(output), 0:1)0",
"lag(log(output), 0:1)1")
empl_beta_ab <- empl_draws_ab[, empl_beta_names_ab]
# Baseline path: hold wages, capital, and output at their 1984 averages.
empl_xbase_ab <- matrix(NA,
nrow = empl_periods_out,
ncol = length(empl_beta_names_ab))
colnames(empl_xbase_ab) <- empl_beta_names_ab
empl_xbase_ab[, "lag(log(wage), 0:1)0"] <- mean(log(empl_last$wage), na.rm = TRUE)
empl_xbase_ab[, "lag(log(wage), 0:1)1"] <- mean(log(empl_last$wage), na.rm = TRUE)
empl_xbase_ab[, "log(capital)"] <- mean(log(empl_last$capital), na.rm = TRUE)
empl_xbase_ab[, "lag(log(output), 0:1)0"] <- mean(log(empl_last$output), na.rm = TRUE)
empl_xbase_ab[, "lag(log(output), 0:1)1"] <- mean(log(empl_last$output), na.rm = TRUE)
# Treatment path: wages are 10% higher in every future year.
# The lagged-wage column changes only from horizon 2 onward because the first
# forecast year still lags the observed 1984 wage level.
empl_xhyp_ab <- empl_xbase_ab
empl_xhyp_ab[, "lag(log(wage), 0:1)0"] <- empl_xhyp_ab[, "lag(log(wage), 0:1)0"] +
wage_shock
empl_xhyp_ab[2:empl_periods_out, "lag(log(wage), 0:1)1"] <-
empl_xhyp_ab[2:empl_periods_out, "lag(log(wage), 0:1)1"] + wage_shock
empl_xbase_ab
## lag(log(wage), 0:1)0 lag(log(wage), 0:1)1 log(capital)
## [1,] 3.107475 3.107475 -1.192489
## [2,] 3.107475 3.107475 -1.192489
## [3,] 3.107475 3.107475 -1.192489
## [4,] 3.107475 3.107475 -1.192489
## [5,] 3.107475 3.107475 -1.192489
## [6,] 3.107475 3.107475 -1.192489
## lag(log(output), 0:1)0 lag(log(output), 0:1)1
## [1,] 4.58738 4.58738
## [2,] 4.58738 4.58738
## [3,] 4.58738 4.58738
## [4,] 4.58738 4.58738
## [5,] 4.58738 4.58738
## [6,] 4.58738 4.58738
empl_xhyp_ab
## lag(log(wage), 0:1)0 lag(log(wage), 0:1)1 log(capital)
## [1,] 3.202785 3.107475 -1.192489
## [2,] 3.202785 3.202785 -1.192489
## [3,] 3.202785 3.202785 -1.192489
## [4,] 3.202785 3.202785 -1.192489
## [5,] 3.202785 3.202785 -1.192489
## [6,] 3.202785 3.202785 -1.192489
## lag(log(output), 0:1)0 lag(log(output), 0:1)1
## [1,] 4.58738 4.58738
## [2,] 4.58738 4.58738
## [3,] 4.58738 4.58738
## [4,] 4.58738 4.58738
## [5,] 4.58738 4.58738
## [6,] 4.58738 4.58738
Now set up the System-GMM forecast. This model has one lag of employment and uses current and lagged wages and capital.
# Simulate coefficient uncertainty from the robust covariance matrix of the
# System-GMM model.
empl_draws_bb <- mvrnorm(n = empl_sims,
mu = coef(bb_fit),
Sigma = vcovHC(bb_fit))
# System GMM has one lagged-outcome coefficient in this lab specification.
empl_phi_bb <- matrix(empl_draws_bb[, "lag(log(emp), 1)"],
ncol = 1)
colnames(empl_phi_bb) <- "lag(log(emp), 1)"
# Keep the substantive covariates in the System-GMM equation.
empl_beta_names_bb <- c("lag(log(wage), 0:1)0",
"lag(log(wage), 0:1)1",
"lag(log(capital), 0:1)0",
"lag(log(capital), 0:1)1")
empl_beta_bb <- empl_draws_bb[, empl_beta_names_bb]
# Baseline path: hold wages and capital at their 1984 averages.
empl_xbase_bb <- matrix(NA,
nrow = empl_periods_out,
ncol = length(empl_beta_names_bb))
colnames(empl_xbase_bb) <- empl_beta_names_bb
empl_xbase_bb[, "lag(log(wage), 0:1)0"] <- mean(log(empl_last$wage), na.rm = TRUE)
empl_xbase_bb[, "lag(log(wage), 0:1)1"] <- mean(log(empl_last$wage), na.rm = TRUE)
empl_xbase_bb[, "lag(log(capital), 0:1)0"] <- mean(log(empl_last$capital), na.rm = TRUE)
empl_xbase_bb[, "lag(log(capital), 0:1)1"] <- mean(log(empl_last$capital), na.rm = TRUE)
# Treatment path: wages are 10% higher in every future year.
empl_xhyp_bb <- empl_xbase_bb
empl_xhyp_bb[, "lag(log(wage), 0:1)0"] <- empl_xhyp_bb[, "lag(log(wage), 0:1)0"] +
wage_shock
empl_xhyp_bb[2:empl_periods_out, "lag(log(wage), 0:1)1"] <-
empl_xhyp_bb[2:empl_periods_out, "lag(log(wage), 0:1)1"] + wage_shock
empl_xbase_bb
## lag(log(wage), 0:1)0 lag(log(wage), 0:1)1 lag(log(capital), 0:1)0
## [1,] 3.107475 3.107475 -1.192489
## [2,] 3.107475 3.107475 -1.192489
## [3,] 3.107475 3.107475 -1.192489
## [4,] 3.107475 3.107475 -1.192489
## [5,] 3.107475 3.107475 -1.192489
## [6,] 3.107475 3.107475 -1.192489
## lag(log(capital), 0:1)1
## [1,] -1.192489
## [2,] -1.192489
## [3,] -1.192489
## [4,] -1.192489
## [5,] -1.192489
## [6,] -1.192489
empl_xhyp_bb
## lag(log(wage), 0:1)0 lag(log(wage), 0:1)1 lag(log(capital), 0:1)0
## [1,] 3.202785 3.107475 -1.192489
## [2,] 3.202785 3.202785 -1.192489
## [3,] 3.202785 3.202785 -1.192489
## [4,] 3.202785 3.202785 -1.192489
## [5,] 3.202785 3.202785 -1.192489
## [6,] 3.202785 3.202785 -1.192489
## lag(log(capital), 0:1)1
## [1,] -1.192489
## [2,] -1.192489
## [3,] -1.192489
## [4,] -1.192489
## [5,] -1.192489
## [6,] -1.192489
Now simulate expected values, first differences, and relative risks.
Because the outcome is logged employment, transform = "log"
returns expected values on the employment scale rather than the
log-employment scale.
# Difference-GMM expected values under the wage shock and baseline.
empl_ev_treat_ab <- ldvsimev(empl_xhyp_ab, b = empl_beta_ab, ci = 0.95,
constant = NA, phi = empl_phi_ab,
lagY = empl_lagY_ab, transform = "log")
empl_ev_base_ab <- ldvsimev(empl_xbase_ab, b = empl_beta_ab, ci = 0.95,
constant = NA, phi = empl_phi_ab,
lagY = empl_lagY_ab, transform = "log")
# Difference-GMM first difference and relative risk.
empl_fd_ab <- ldvsimfd(empl_xhyp_ab, b = empl_beta_ab, ci = 0.95,
constant = NA, xpre = empl_xbase_ab,
phi = empl_phi_ab, lagY = empl_lagY_ab,
transform = "log")
empl_rr_ab <- ldvsimrr(empl_xhyp_ab, b = empl_beta_ab, ci = 0.95,
constant = NA, xpre = empl_xbase_ab,
phi = empl_phi_ab, lagY = empl_lagY_ab,
transform = "log")
# System-GMM expected values under the wage shock and baseline.
empl_ev_treat_bb <- ldvsimev(empl_xhyp_bb, b = empl_beta_bb, ci = 0.95,
constant = NA, phi = empl_phi_bb,
lagY = empl_lagY_bb, transform = "log")
empl_ev_base_bb <- ldvsimev(empl_xbase_bb, b = empl_beta_bb, ci = 0.95,
constant = NA, phi = empl_phi_bb,
lagY = empl_lagY_bb, transform = "log")
# System-GMM first difference and relative risk.
empl_fd_bb <- ldvsimfd(empl_xhyp_bb, b = empl_beta_bb, ci = 0.95,
constant = NA, xpre = empl_xbase_bb,
phi = empl_phi_bb, lagY = empl_lagY_bb,
transform = "log")
empl_rr_bb <- ldvsimrr(empl_xhyp_bb, b = empl_beta_bb, ci = 0.95,
constant = NA, xpre = empl_xbase_bb,
phi = empl_phi_bb, lagY = empl_lagY_bb,
transform = "log")
# Put the main quantities of interest in one table.
empl_qoi <- bind_rows(
tibble(model = "Diff-GMM",
quantity = "EV: wage +10%",
horizon = seq_len(empl_periods_out),
pe = as.numeric(empl_ev_treat_ab$pe),
lower = as.numeric(empl_ev_treat_ab$lower[, 1]),
upper = as.numeric(empl_ev_treat_ab$upper[, 1])),
tibble(model = "Diff-GMM",
quantity = "FD: difference in employment",
horizon = seq_len(empl_periods_out),
pe = as.numeric(empl_fd_ab$pe),
lower = as.numeric(empl_fd_ab$lower[, 1]),
upper = as.numeric(empl_fd_ab$upper[, 1])),
tibble(model = "Diff-GMM",
quantity = "RR: percent change",
horizon = seq_len(empl_periods_out),
pe = 100 * (as.numeric(empl_rr_ab$pe) - 1),
lower = 100 * (as.numeric(empl_rr_ab$lower[, 1]) - 1),
upper = 100 * (as.numeric(empl_rr_ab$upper[, 1]) - 1)),
tibble(model = "System GMM",
quantity = "EV: wage +10%",
horizon = seq_len(empl_periods_out),
pe = as.numeric(empl_ev_treat_bb$pe),
lower = as.numeric(empl_ev_treat_bb$lower[, 1]),
upper = as.numeric(empl_ev_treat_bb$upper[, 1])),
tibble(model = "System GMM",
quantity = "FD: difference in employment",
horizon = seq_len(empl_periods_out),
pe = as.numeric(empl_fd_bb$pe),
lower = as.numeric(empl_fd_bb$lower[, 1]),
upper = as.numeric(empl_fd_bb$upper[, 1])),
tibble(model = "System GMM",
quantity = "RR: percent change",
horizon = seq_len(empl_periods_out),
pe = 100 * (as.numeric(empl_rr_bb$pe) - 1),
lower = 100 * (as.numeric(empl_rr_bb$lower[, 1]) - 1),
upper = 100 * (as.numeric(empl_rr_bb$upper[, 1]) - 1))
)
# The full empl_qoi object is used below for plotting. To keep the printed
# table readable, show only the first five horizons for the percent-change
# quantity from each estimator.
empl_qoi |>
filter(quantity == "RR: percent change") |>
group_by(model) |>
slice_head(n = 5) |>
ungroup() |>
knitr::kable(digits = 3,
caption = "Preview of the counterfactual percent-change response to a sustained 10% wage increase: first five horizons from each estimator.")
| model | quantity | horizon | pe | lower | upper |
|---|---|---|---|---|---|
| Diff-GMM | RR: percent change | 1 | -4.782 | -7.387 | -2.211 |
| Diff-GMM | RR: percent change | 2 | -4.928 | -7.265 | -2.555 |
| Diff-GMM | RR: percent change | 3 | -4.796 | -7.258 | -2.155 |
| Diff-GMM | RR: percent change | 4 | -4.699 | -7.298 | -1.845 |
| Diff-GMM | RR: percent change | 5 | -4.654 | -7.364 | -1.584 |
| System GMM | RR: percent change | 1 | -5.842 | -7.850 | -3.726 |
| System GMM | RR: percent change | 2 | -6.805 | -8.977 | -4.761 |
| System GMM | RR: percent change | 3 | -7.697 | -10.436 | -5.051 |
| System GMM | RR: percent change | 4 | -8.525 | -11.887 | -5.161 |
| System GMM | RR: percent change | 5 | -9.296 | -13.423 | -5.334 |
The same objects can be plotted directly. The first row shows expected employment under the baseline and the 10% wage scenario. The second row shows the first difference: treatment employment minus baseline employment.
# Expected-value paths for both models.
empl_ev_plot <- bind_rows(
tibble(model = "Diff-GMM", quantity = "Baseline",
horizon = seq_len(empl_periods_out),
pe = as.numeric(empl_ev_base_ab$pe),
lower = as.numeric(empl_ev_base_ab$lower[, 1]),
upper = as.numeric(empl_ev_base_ab$upper[, 1])),
tibble(model = "Diff-GMM", quantity = "Wage +10%",
horizon = seq_len(empl_periods_out),
pe = as.numeric(empl_ev_treat_ab$pe),
lower = as.numeric(empl_ev_treat_ab$lower[, 1]),
upper = as.numeric(empl_ev_treat_ab$upper[, 1])),
tibble(model = "System GMM", quantity = "Baseline",
horizon = seq_len(empl_periods_out),
pe = as.numeric(empl_ev_base_bb$pe),
lower = as.numeric(empl_ev_base_bb$lower[, 1]),
upper = as.numeric(empl_ev_base_bb$upper[, 1])),
tibble(model = "System GMM", quantity = "Wage +10%",
horizon = seq_len(empl_periods_out),
pe = as.numeric(empl_ev_treat_bb$pe),
lower = as.numeric(empl_ev_treat_bb$lower[, 1]),
upper = as.numeric(empl_ev_treat_bb$upper[, 1]))
)
# First differences for both models.
empl_fd_plot <- bind_rows(
tibble(model = "Diff-GMM", quantity = "Difference",
horizon = seq_len(empl_periods_out),
pe = as.numeric(empl_fd_ab$pe),
lower = as.numeric(empl_fd_ab$lower[, 1]),
upper = as.numeric(empl_fd_ab$upper[, 1])),
tibble(model = "System GMM", quantity = "Difference",
horizon = seq_len(empl_periods_out),
pe = as.numeric(empl_fd_bb$pe),
lower = as.numeric(empl_fd_bb$lower[, 1]),
upper = as.numeric(empl_fd_bb$upper[, 1]))
)
# Combine expected values and differences into one faceted figure.
empl_simcf_plot <- bind_rows(
empl_ev_plot |>
mutate(panel = "Expected employment"),
empl_fd_plot |>
mutate(panel = "Employment difference")
)
# Use the same estimator colors in every Part 2 estimator plot.
gmm_colors <- c("Diff-GMM" = "steelblue",
"System GMM" = "firebrick")
ggplot(empl_simcf_plot,
aes(x = horizon, y = pe,
colour = model, fill = model,
linetype = quantity,
group = interaction(model, quantity))) +
geom_hline(data = tibble(panel = "Employment difference"),
aes(yintercept = 0), inherit.aes = FALSE,
linetype = "dashed", colour = "grey50") +
geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.12,
colour = NA) +
geom_line(linewidth = 0.9) +
geom_point(size = 2) +
facet_grid(panel ~ model, scales = "free_y") +
scale_x_continuous(breaks = seq_len(empl_periods_out)) +
scale_colour_manual(values = gmm_colors) +
scale_fill_manual(values = gmm_colors) +
scale_linetype_manual(values = c("Baseline" = "dashed",
"Wage +10%" = "solid",
"Difference" = "solid")) +
labs(x = "Forecast horizon", y = NULL,
colour = "Estimator", fill = "Estimator",
linetype = "Quantity") +
theme_minimal(base_size = 11) +
theme(legend.position = "bottom")
Simulated employment paths and counterfactual difference from a sustained 10% wage increase.
The first visual difference between the two estimators is the adjustment pattern. The Difference-GMM forecast reaches most of its implied effect quickly and then changes only modestly after the first few horizons. The System-GMM forecast is more gradual because more of the previous-period response is carried forward.
The comparison is useful because the models rest on different identifying assumptions. Difference GMM is less demanding, but in persistent panels lagged levels may be weak instruments for differences. System GMM adds more information by using a level equation, but only if lagged differences are valid instruments for that level equation. If the two forecasts differ, read that difference together with the diagnostics, the instrument count, and the substantive plausibility of the wage effect.
This subsection is optional and shows what simcf is
doing under the hood. The calculation is just the recursive forecast
from the estimated dynamic model. The Difference-GMM model is AR(2):
\[ y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + x_t\beta, \]
while the System-GMM model used here is AR(1):
\[ y_t = \phi_1 y_{t-1} + x_t\beta. \]
The manual forecast below uses the same simulated coefficients, the same baseline scenario, and the same 10% wage scenario.
# Number of coefficient draws and forecast periods.
n_sims <- empl_sims
H <- empl_periods_out
# Empty matrices to store Difference-GMM forecast paths.
manual_log_base_ab <- matrix(NA, nrow = n_sims, ncol = H)
manual_log_treat_ab <- matrix(NA, nrow = n_sims, ncol = H)
# Empty matrices to store System-GMM forecast paths.
manual_log_base_bb <- matrix(NA, nrow = n_sims, ncol = H)
manual_log_treat_bb <- matrix(NA, nrow = n_sims, ncol = H)
# Difference-GMM lag coefficients.
phi1_ab <- empl_phi_ab[, "lag(log(emp), 1:2)1"]
phi2_ab <- empl_phi_ab[, "lag(log(emp), 1:2)2"]
# System-GMM lag coefficient.
phi1_bb <- as.numeric(empl_phi_bb[, "lag(log(emp), 1)"])
# Forecast each horizon recursively. The loop is useful because horizon h
# depends on previous forecasted outcomes.
for (h in seq_len(H)) {
# Difference-GMM lag 1.
if (h == 1) {
lag1_base_ab <- empl_lagY_ab[1]
lag1_treat_ab <- empl_lagY_ab[1]
} else {
lag1_base_ab <- manual_log_base_ab[, h - 1]
lag1_treat_ab <- manual_log_treat_ab[, h - 1]
}
# Difference-GMM lag 2.
if (h == 1) {
lag2_base_ab <- empl_lagY_ab[2]
lag2_treat_ab <- empl_lagY_ab[2]
} else if (h == 2) {
lag2_base_ab <- empl_lagY_ab[1]
lag2_treat_ab <- empl_lagY_ab[1]
} else {
lag2_base_ab <- manual_log_base_ab[, h - 2]
lag2_treat_ab <- manual_log_treat_ab[, h - 2]
}
# Difference-GMM x_t beta.
xb_base_ab_h <- as.numeric(empl_beta_ab %*% empl_xbase_ab[h, ])
xb_treat_ab_h <- as.numeric(empl_beta_ab %*% empl_xhyp_ab[h, ])
# Difference-GMM recursive forecast on the log scale.
manual_log_base_ab[, h] <-
phi1_ab * lag1_base_ab + phi2_ab * lag2_base_ab + xb_base_ab_h
manual_log_treat_ab[, h] <-
phi1_ab * lag1_treat_ab + phi2_ab * lag2_treat_ab + xb_treat_ab_h
# System-GMM lag 1.
if (h == 1) {
lag1_base_bb <- empl_lagY_bb
lag1_treat_bb <- empl_lagY_bb
} else {
lag1_base_bb <- manual_log_base_bb[, h - 1]
lag1_treat_bb <- manual_log_treat_bb[, h - 1]
}
# System-GMM x_t beta.
xb_base_bb_h <- as.numeric(empl_beta_bb %*% empl_xbase_bb[h, ])
xb_treat_bb_h <- as.numeric(empl_beta_bb %*% empl_xhyp_bb[h, ])
# System-GMM recursive forecast on the log scale.
manual_log_base_bb[, h] <- phi1_bb * lag1_base_bb + xb_base_bb_h
manual_log_treat_bb[, h] <- phi1_bb * lag1_treat_bb + xb_treat_bb_h
}
# Convert from log employment to employment.
manual_base_ab <- exp(manual_log_base_ab)
manual_treat_ab <- exp(manual_log_treat_ab)
manual_fd_ab <- manual_treat_ab - manual_base_ab
manual_rr_ab <- manual_treat_ab / manual_base_ab
manual_base_bb <- exp(manual_log_base_bb)
manual_treat_bb <- exp(manual_log_treat_bb)
manual_fd_bb <- manual_treat_bb - manual_base_bb
manual_rr_bb <- manual_treat_bb / manual_base_bb
# Summarize Difference-GMM simulated paths by horizon.
manual_fd_tab_ab <- tibble(
model = "Diff-GMM",
quantity = "FD: difference in employment",
horizon = seq_len(H),
pe = colMeans(manual_fd_ab),
lower = apply(manual_fd_ab, 2, quantile, probs = 0.025),
upper = apply(manual_fd_ab, 2, quantile, probs = 0.975)
)
manual_rr_tab_ab <- tibble(
model = "Diff-GMM",
quantity = "RR: percent change",
horizon = seq_len(H),
pe = 100 * (colMeans(manual_rr_ab) - 1),
lower = 100 * (apply(manual_rr_ab, 2, quantile, probs = 0.025) - 1),
upper = 100 * (apply(manual_rr_ab, 2, quantile, probs = 0.975) - 1)
)
# Summarize System-GMM simulated paths by horizon.
manual_fd_tab_bb <- tibble(
model = "System GMM",
quantity = "FD: difference in employment",
horizon = seq_len(H),
pe = colMeans(manual_fd_bb),
lower = apply(manual_fd_bb, 2, quantile, probs = 0.025),
upper = apply(manual_fd_bb, 2, quantile, probs = 0.975)
)
manual_rr_tab_bb <- tibble(
model = "System GMM",
quantity = "RR: percent change",
horizon = seq_len(H),
pe = 100 * (colMeans(manual_rr_bb) - 1),
lower = 100 * (apply(manual_rr_bb, 2, quantile, probs = 0.025) - 1),
upper = 100 * (apply(manual_rr_bb, 2, quantile, probs = 0.975) - 1)
)
manual_qoi <- bind_rows(manual_fd_tab_ab, manual_rr_tab_ab,
manual_fd_tab_bb, manual_rr_tab_bb)
# Again, keep the full manual_qoi object for plotting, but print a compact
# preview: five horizons from each estimator for the percent-change quantity.
manual_qoi |>
filter(quantity == "RR: percent change") |>
group_by(model) |>
slice_head(n = 5) |>
ungroup() |>
knitr::kable(digits = 3,
caption = "Preview of the manual recursive percent-change forecast for a sustained 10% wage increase: first five horizons from each estimator.")
| model | quantity | horizon | pe | lower | upper |
|---|---|---|---|---|---|
| Diff-GMM | RR: percent change | 1 | -4.782 | -7.387 | -2.211 |
| Diff-GMM | RR: percent change | 2 | -4.928 | -7.265 | -2.555 |
| Diff-GMM | RR: percent change | 3 | -4.796 | -7.258 | -2.155 |
| Diff-GMM | RR: percent change | 4 | -4.699 | -7.298 | -1.845 |
| Diff-GMM | RR: percent change | 5 | -4.654 | -7.364 | -1.584 |
| System GMM | RR: percent change | 1 | -5.842 | -7.850 | -3.726 |
| System GMM | RR: percent change | 2 | -6.805 | -8.977 | -4.761 |
| System GMM | RR: percent change | 3 | -7.697 | -10.436 | -5.051 |
| System GMM | RR: percent change | 4 | -8.525 | -11.887 | -5.161 |
| System GMM | RR: percent change | 5 | -9.296 | -13.423 | -5.334 |
The manual calculation follows the same logic as
ldvsimev(): draw coefficients, set a counterfactual path
for \(x_t\), anchor the forecast with
the last observed outcomes, and iterate the dynamic equation forward. At
horizon 1, only the contemporaneous wage coefficient enters the shock.
From horizon 2 onward, lagged wages and lagged employment both transmit
the change.
bind_rows(manual_fd_tab_ab, manual_fd_tab_bb) |>
ggplot(aes(x = horizon, y = pe, colour = model, fill = model)) +
geom_hline(yintercept = 0, linetype = "dashed", colour = "grey50") +
geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.10,
colour = NA) +
geom_line(linewidth = 0.9) +
geom_point(size = 2) +
facet_wrap(~ model, nrow = 1, ncol = 2, scales = "free_y") +
scale_x_continuous(breaks = seq_len(H)) +
scale_colour_manual(values = gmm_colors) +
scale_fill_manual(values = gmm_colors) +
labs(x = "Forecast horizon", y = "Employment difference",
colour = NULL, fill = NULL) +
theme_minimal(base_size = 11) +
theme(legend.position = "none")
Manual recursive forecast of the employment difference from a sustained 10% wage increase.
The forecast above compared employment under one baseline path and one treatment path. A related question is the dynamic multiplier: how does a sustained 10% wage increase move employment toward its new equilibrium?
For Difference GMM, the response follows the AR(2) equation:
\[ \Delta y_h = \phi_1 \Delta y_{h-1} + \phi_2 \Delta y_{h-2} + \beta_{wage,0}\log(1.10) + \beta_{wage,1}\log(1.10), \]
after the first horizon. For System GMM, the response follows the AR(1) equation:
\[ \Delta y_h = \phi_1 \Delta y_{h-1} + \beta_{wage,0}\log(1.10) + \beta_{wage,1}\log(1.10). \]
The long-run multipliers therefore differ:
\[ \text{Diff-GMM: } \frac{(\beta_{wage,0}+\beta_{wage,1})\log(1.10)} {1-\phi_1-\phi_2} \qquad \text{System GMM: } \frac{(\beta_{wage,0}+\beta_{wage,1})\log(1.10)} {1-\phi_1}. \]
# Use the same parametric coefficient draws created in Section 2.9.
# Each row is one simulated set of coefficients from the estimated sampling
# distribution of the model.
phi1_sim_ab <- empl_draws_ab[, "lag(log(emp), 1:2)1"]
phi2_sim_ab <- empl_draws_ab[, "lag(log(emp), 1:2)2"]
b_wage0_sim_ab <- empl_draws_ab[, "lag(log(wage), 0:1)0"]
b_wage1_sim_ab <- empl_draws_ab[, "lag(log(wage), 0:1)1"]
phi1_sim_bb <- empl_draws_bb[, "lag(log(emp), 1)"]
b_wage0_sim_bb <- empl_draws_bb[, "lag(log(wage), 0:1)0"]
b_wage1_sim_bb <- empl_draws_bb[, "lag(log(wage), 0:1)1"]
# Trace the response for twelve forecast periods.
irf_H <- 12
# Each row is a simulated coefficient draw; each column is a forecast horizon.
irf_log_ab <- matrix(NA, nrow = empl_sims, ncol = irf_H)
irf_log_bb <- matrix(NA, nrow = empl_sims, ncol = irf_H)
for (h in seq_len(irf_H)) {
# Difference-GMM previous responses. Before the shock, responses are zero.
if (h == 1) {
irf_lag1_ab <- 0
irf_lag2_ab <- 0
} else if (h == 2) {
irf_lag1_ab <- irf_log_ab[, h - 1]
irf_lag2_ab <- 0
} else {
irf_lag1_ab <- irf_log_ab[, h - 1]
irf_lag2_ab <- irf_log_ab[, h - 2]
}
# System-GMM previous response.
if (h == 1) {
irf_lag1_bb <- 0
} else {
irf_lag1_bb <- irf_log_bb[, h - 1]
}
# Current wage enters immediately. Lagged wage enters from horizon 2 onward.
wage_effect_ab <- b_wage0_sim_ab * wage_shock
wage_effect_bb <- b_wage0_sim_bb * wage_shock
if (h >= 2) {
wage_effect_ab <- wage_effect_ab + b_wage1_sim_ab * wage_shock
wage_effect_bb <- wage_effect_bb + b_wage1_sim_bb * wage_shock
}
# Dynamic propagation for every simulated coefficient draw.
irf_log_ab[, h] <- phi1_sim_ab * irf_lag1_ab +
phi2_sim_ab * irf_lag2_ab +
wage_effect_ab
irf_log_bb[, h] <- phi1_sim_bb * irf_lag1_bb +
wage_effect_bb
}
# Convert the simulated log responses into percent changes in employment.
irf_pct_ab <- 100 * (exp(irf_log_ab) - 1)
irf_pct_bb <- 100 * (exp(irf_log_bb) - 1)
# Long-run multipliers, one for each simulated coefficient draw.
# This is a nonlinear quantity because the denominator includes the dynamic
# coefficient(s). If a few simulated draws put the denominator close to zero,
# the mean can be dominated by extreme values. The median is more stable and
# more readable for the dashed long-run reference line.
irf_long_run_pct_ab <- 100 * (exp(((b_wage0_sim_ab + b_wage1_sim_ab) * wage_shock) /
(1 - phi1_sim_ab - phi2_sim_ab)) - 1)
irf_long_run_pct_bb <- 100 * (exp(((b_wage0_sim_bb + b_wage1_sim_bb) * wage_shock) /
(1 - phi1_sim_bb)) - 1)
irf_tab <- bind_rows(
tibble(model = "Diff-GMM",
horizon = seq_len(irf_H),
percent_response = colMeans(irf_pct_ab),
lower = apply(irf_pct_ab, 2, quantile, probs = 0.025),
upper = apply(irf_pct_ab, 2, quantile, probs = 0.975),
long_run_percent = median(irf_long_run_pct_ab),
long_run_lower = quantile(irf_long_run_pct_ab, probs = 0.025),
long_run_upper = quantile(irf_long_run_pct_ab, probs = 0.975)),
tibble(model = "System GMM",
horizon = seq_len(irf_H),
percent_response = colMeans(irf_pct_bb),
lower = apply(irf_pct_bb, 2, quantile, probs = 0.025),
upper = apply(irf_pct_bb, 2, quantile, probs = 0.975),
long_run_percent = median(irf_long_run_pct_bb),
long_run_lower = quantile(irf_long_run_pct_bb, probs = 0.025),
long_run_upper = quantile(irf_long_run_pct_bb, probs = 0.975))
)
# Print a compact preview of the impulse-response table. The full irf_tab
# object is used in the plot below.
irf_tab |>
group_by(model) |>
slice_head(n = 5) |>
ungroup() |>
knitr::kable(digits = 3,
caption = "Preview of the dynamic multiplier from a sustained 10% wage increase: first five horizons from each estimator.")
| model | horizon | percent_response | lower | upper | long_run_percent | long_run_lower | long_run_upper |
|---|---|---|---|---|---|---|---|
| Diff-GMM | 1 | -4.782 | -7.387 | -2.211 | -4.668 | -7.433 | -1.340 |
| Diff-GMM | 2 | -4.928 | -7.265 | -2.555 | -4.668 | -7.433 | -1.340 |
| Diff-GMM | 3 | -4.796 | -7.258 | -2.155 | -4.668 | -7.433 | -1.340 |
| Diff-GMM | 4 | -4.699 | -7.298 | -1.845 | -4.668 | -7.433 | -1.340 |
| Diff-GMM | 5 | -4.654 | -7.364 | -1.584 | -4.668 | -7.433 | -1.340 |
| System GMM | 1 | -5.842 | -7.850 | -3.726 | -19.546 | -65.474 | -2.717 |
| System GMM | 2 | -6.805 | -8.977 | -4.761 | -19.546 | -65.474 | -2.717 |
| System GMM | 3 | -7.697 | -10.436 | -5.051 | -19.546 | -65.474 | -2.717 |
| System GMM | 4 | -8.525 | -11.887 | -5.161 | -19.546 | -65.474 | -2.717 |
| System GMM | 5 | -9.296 | -13.423 | -5.334 | -19.546 | -65.474 | -2.717 |
ggplot(irf_tab,
aes(x = horizon, y = percent_response, colour = model, fill = model)) +
geom_hline(aes(yintercept = long_run_percent, colour = model),
linetype = "dashed") +
geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.12,
colour = NA) +
geom_line(linewidth = 0.9) +
geom_point(size = 2) +
scale_x_continuous(breaks = seq(1, irf_H, by = 1)) +
scale_colour_manual(values = gmm_colors) +
scale_fill_manual(values = gmm_colors) +
labs(x = "Forecast horizon",
y = "Employment response (%)",
colour = NULL, fill = NULL,
caption = "Ribbons = 95% parametric-simulation intervals. Dashed lines = median long-run multipliers.") +
theme_minimal(base_size = 11) +
theme(legend.position = "bottom")
Dynamic multiplier from a sustained 10% wage increase.
This plot strips the forecast down to the response itself. Difference GMM and System GMM can imply different adjustment paths because they use different moment conditions and different dynamic equations. The long-run multiplier is especially sensitive because it divides by \(1-\phi_1-\phi_2\) or \(1-\phi_1\). If that denominator is close to zero for some simulated draws, the long-run distribution becomes very skewed. For that reason, the dashed line uses the median simulated long-run multiplier, while the table still reports the 95% simulation interval.
For a second applied example with a sharper substantive story — and an actual policy counterfactual to interpret — the lab turns to the cigarette consumption panel used in the old version of this lab and covered in lecture. The question is simple: by how much would a tax-induced price increase reduce cigarette consumption?
The data are US state-level cigarette sales, prices, taxes, and income from 1985–1995. 48 states × 11 years = 528 observations, a balanced panel — short enough that Nickell bias matters and long enough that Diff-GMM is well-identified. The variables needed here are:
| Variable | Description |
|---|---|
packpc |
packs of cigarettes per capita (the outcome) |
income |
state personal income (total) |
pop |
state population |
tax |
average federal + state + local excise tax (cents/pack) |
taxs |
excise taxes including sales taxes (cents/pack) |
avgprs |
average price including sales taxes (cents/pack) |
cpi |
consumer price index (for inflation adjustment) |
Dollar amounts need to be in a constant unit. The trick: every dollar variable gets multiplied by the 1995 CPI and divided by the CPI in the observed year — bringing the figure to “1995 dollars”. This is done inline (no helper function) so the operation is fully transparent.
# Load the panel. 48 states × 11 years (1985–1995) = 528 rows.
cig <- read_csv("data/cigarette.csv", show_col_types = FALSE)
# Quick panel check
length(unique(cig$state)) # number of states
## [1] 48
length(unique(cig$year)) # number of years
## [1] 11
# Inflation-adjust. The CPI is the same across states within a given year,
# so pull the 1995 value once and rescale.
cpi_base <- unique(cig$cpi[cig$year == 1995]) # 1995 CPI value (a scalar)
cig <- cig |>
mutate(
income95 = cpi_base * income / cpi, # state income, 1995 dollars
tax95 = cpi_base * tax / cpi, # avg excise tax, 1995 cents/pack
taxs95 = cpi_base * taxs / cpi, # excise + sales tax
avgprs95 = cpi_base * avgprs / cpi, # avg price (incl. sales tax)
income95pc = income95 / pop, # per-capita income (1995 dollars)
pretax95 = avgprs95 - taxs95 # price net of tax
)
# Quick look at the post-adjustment numbers
cig |>
summarise(across(c(packpc, income95pc, avgprs95, taxs95),
list(mean = mean, sd = sd))) |>
knitr::kable(digits = 2,
caption = "Means and standard deviations of the key variables after inflation adjustment.")
| packpc_mean | packpc_sd | income95pc_mean | income95pc_sd | avgprs95_mean | avgprs95_sd | taxs95_mean | taxs95_sd |
|---|---|---|---|---|---|---|---|
| 106.45 | 23.12 | 21.25 | 3.26 | 175.14 | 23.61 | 53.84 | 13.86 |
The cigarette panel is balanced, but the same pooled/between/within distinction still matters. A high pooled correlation may come from stable differences across states, while a within correlation asks whether changes inside the same state over time move with changes in consumption.
# Create the lagged outcome within state. The first year for each state has no
# lagged value and is therefore missing.
cig_wb_df <- cig |>
arrange(state, year) |>
group_by(state) |>
mutate(packpc_lag1 = dplyr::lag(packpc, 1)) |>
ungroup()
# These are the variables used in the model or policy scenario.
cig_wb_vars <- c("packpc_lag1", "income95pc", "avgprs95",
"tax95", "taxs95", "pretax95")
# Stack predictors into long format: one row is one state-year-variable.
cig_wb_long <- cig_wb_df |>
select(state, packpc, all_of(cig_wb_vars)) |>
pivot_longer(cols = all_of(cig_wb_vars),
names_to = "variable",
values_to = "value")
# Overall correlation: all state-years pooled together.
cig_overall <- cig_wb_long |>
group_by(variable) |>
summarise(overall = cor(value, packpc, use = "pairwise.complete.obs"),
.groups = "drop")
# Between correlation: collapse to state means, then correlate those means.
cig_between <- cig_wb_long |>
group_by(variable, state) |>
summarise(value_mean = mean(value, na.rm = TRUE),
packpc_mean = mean(packpc, na.rm = TRUE),
.groups = "drop") |>
group_by(variable) |>
summarise(between = cor(value_mean, packpc_mean, use = "pairwise.complete.obs"),
.groups = "drop")
# Within correlation: remove each state's mean from both variables, then
# correlate the state-year deviations.
cig_within <- cig_wb_long |>
group_by(variable, state) |>
mutate(value_dev = value - mean(value, na.rm = TRUE),
packpc_dev = packpc - mean(packpc, na.rm = TRUE)) |>
ungroup() |>
group_by(variable) |>
summarise(within = cor(value_dev, packpc_dev, use = "pairwise.complete.obs"),
.groups = "drop")
# Within-variance share: the fraction of each predictor's variance that comes
# from movement over time within states.
cig_within_share <- cig_wb_long |>
group_by(variable, state) |>
mutate(value_dev = value - mean(value, na.rm = TRUE)) |>
ungroup() |>
group_by(variable) |>
summarise(within_share = var(value_dev, na.rm = TRUE) / var(value, na.rm = TRUE),
.groups = "drop")
# Join the pieces into one table.
cig_wb_tab <- cig_overall |>
left_join(cig_between, by = "variable") |>
left_join(cig_within, by = "variable") |>
left_join(cig_within_share, by = "variable") |>
arrange(desc(abs(overall)))
cig_wb_tab |>
knitr::kable(digits = 3,
col.names = c("Variable", "Overall", "Between", "Within",
"Within var. share"),
caption = "Correlation with cigarette packs per capita, decomposed into overall, between-state, and within-state correlations.")
| Variable | Overall | Between | Within | Within var. share |
|---|---|---|---|---|
| packpc_lag1 | 0.979 | 0.999 | 0.902 | 0.207 |
| avgprs95 | -0.614 | -0.544 | -0.843 | 0.542 |
| taxs95 | -0.539 | -0.496 | -0.678 | 0.269 |
| tax95 | -0.497 | -0.472 | -0.581 | 0.248 |
| pretax95 | -0.467 | -0.477 | -0.706 | 0.847 |
| income95pc | -0.129 | -0.047 | -0.681 | 0.080 |
cig_wb_tab |>
mutate(variable = fct_reorder(variable, between)) |>
ggplot(aes(y = variable)) +
geom_segment(aes(x = between, xend = within, yend = variable),
colour = "grey70") +
geom_point(aes(x = between, colour = "Between"), size = 2.6) +
geom_point(aes(x = within, colour = "Within"), size = 2.6) +
geom_vline(xintercept = 0, linetype = "dashed", colour = "grey50") +
scale_colour_manual(values = c("Between" = "firebrick",
"Within" = "steelblue")) +
labs(x = "Correlation with packs per capita", y = NULL, colour = NULL) +
theme_minimal(base_size = 11) +
theme(legend.position = "bottom")
Between- and within-state correlations with cigarette packs per capita.
The lagged outcome captures persistence in cigarette consumption. The tax and price variables are also informative within states over time, which is the relevant variation for the counterfactual price increase. The between correlations should not be read as the causal effect of price: states with high prices may differ from low-price states in many stable ways that are not in the model.
The Diff-GMM spec — packs per capita as a dynamic function of per-capita income and the (tax-inclusive) average price, with lag-2 instruments stacked GMM-style up to whatever the panel supports:
The model is:
\[ \text{packpc}_{it} = \alpha_i + \phi\,\text{packpc}_{i,t-1} + \beta_1\,\text{income95pc}_{it} + \beta_2\,\text{avgprs95}_{it} + \varepsilon_{it}. \]
pgmm(..., transformation = "d") estimates the
differenced equation internally:
\[ \Delta \text{packpc}_{it} = \phi\,\Delta \text{packpc}_{i,t-1} + \beta_1\,\Delta \text{income95pc}_{it} + \beta_2\,\Delta \text{avgprs95}_{it} + \Delta\varepsilon_{it}. \]
The lagged difference \(\Delta \text{packpc}_{i,t-1}\) is endogenous because it shares \(\varepsilon_{i,t-1}\) with \(\Delta\varepsilon_{it}\). Difference GMM instruments it with deeper lags of the level outcome, starting at \(t-2\).
The other covariates are not automatically instrumented by this
specification. The model treats changes in income95pc and
avgprs95 as exogenous or predetermined enough to enter
directly. That assumption is part of the research design: if price
changes respond to unobserved shocks in cigarette consumption, then
price would need its own instrument block or a different identification
strategy.
# Convert to plm's pdata.frame: tells plm which columns identify the
# unit-time grid. Order matters: c("unit", "time").
pdat_cig <- pdata.frame(cig, index = c("state", "year"))
# Diff-GMM (Arellano-Bond) fit on cigarette consumption.
# Reading the formula:
# * Left of |: packpc is the outcome; lag(packpc, 1) is the lagged DV
# (pgmm will internally difference everything). income95pc
# and avgprs95 enter as contemporaneous regressors.
# * First |: lag(packpc, 2:99) is the GMM-style instrument block —
# every available lag from 2 onward, stacked per period.
# Arguments:
# * effect = "individual": only state FE. This keeps the example simple
# and preserves a direct price counterfactual. A two-way version
# is a useful robustness check, but it shifts identification toward
# within-year cross-state price differences and can produce weighting-
# matrix warnings in short panels.
# * model = "twosteps": Arellano-Bond two-step efficient weighting matrix.
# summary(..., robust = TRUE) below applies the Windmeijer correction.
# * transformation = "d": difference GMM (not system).
fit_cig <- pgmm(
packpc ~ lag(packpc, 1) + income95pc + avgprs95 |
lag(packpc, 2:99),
data = pdat_cig,
effect = "individual",
model = "twosteps",
transformation = "d"
)
# Capture the summary once — the Part 2 diagnostic battery hangs off this object
sum_cig <- summary(fit_cig, robust = TRUE)
sum_cig
## Oneway (individual) effect Two-steps model Difference GMM
##
## Call:
## pgmm(formula = packpc ~ lag(packpc, 1) + income95pc + avgprs95 |
## lag(packpc, 2:99), data = pdat_cig, effect = "individual",
## model = "twosteps", transformation = "d")
##
## Balanced Panel: n = 48, T = 11, N = 528
##
## Number of Observations Used: 432
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -25.4405 -2.4992 0.1460 0.1232 2.7499 25.6071
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## lag(packpc, 1) 0.639465 0.055354 11.5523 < 2.2e-16 ***
## income95pc -0.479040 0.496258 -0.9653 0.3344
## avgprs95 -0.179869 0.028086 -6.4042 1.511e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sargan test: chisq(44) = 47.09887 (p-value = 0.34693)
## Autocorrelation test (1): normal = -3.443569 (p-value = 0.00057409)
## Autocorrelation test (2): normal = -0.5365189 (p-value = 0.5916)
## Wald test for coefficients: chisq(3) = 2446.946 (p-value = < 2.22e-16)
Read off (against the Part 2 checklist):
lag(packpc, 1) should be positive and below 1 (sticky
consumption from year to year).avgprs95 should be negative (higher price
→ fewer packs sold).income95pc is an empirical control, not the main object of
interest. Its sign can be sensitive to the short panel, the price
measure, and the dynamic specification.lag(packpc, 2:99) spec
can get close to the \(N\) rule
quickly. Read the Sargan test together with the instrument count, not by
itself.# Put the core diagnostic statistics in one small table.
# These are the same objects printed at the bottom of summary(fit_cig).
cig_n_inst <- length(fit_cig$W[[1]][1, ])
cig_n_units <- length(fit_cig$W)
tibble(
diagnostic = c("Sargan overidentification test",
"AR(1) on differenced residuals",
"AR(2) on differenced residuals",
"Instrument count / N"),
statistic = c(as.numeric(sum_cig$sargan$statistic),
as.numeric(sum_cig$m1$statistic),
as.numeric(sum_cig$m2$statistic),
cig_n_inst),
p_value = c(sum_cig$sargan$p.value,
sum_cig$m1$p.value,
sum_cig$m2$p.value,
NA_real_),
desired_pattern = c("fail to reject",
"reject",
"fail to reject",
"ratio below 1"),
decision_rule = c("p > 0.10",
"p < 0.05",
"p > 0.05",
paste0(cig_n_inst, " / ", cig_n_units,
" = ", round(cig_n_inst / cig_n_units, 2)))
) |>
knitr::kable(digits = 3,
caption = "Cigarette Diff-GMM diagnostics, read against the opening checklist.")
| diagnostic | statistic | p_value | desired_pattern | decision_rule |
|---|---|---|---|---|
| Sargan overidentification test | 47.099 | 0.347 | fail to reject | p > 0.10 |
| AR(1) on differenced residuals | -3.444 | 0.001 | reject | p < 0.05 |
| AR(2) on differenced residuals | -0.537 | 0.592 | fail to reject | p > 0.05 |
| Instrument count / N | 47.000 | NA | ratio below 1 | 47 / 48 = 0.98 |
If the AR(2) test rejected, the analysis would stop: lag-2 instruments would not be valid. If the Sargan test rejected, a more conservative instrument set would be needed. If the instrument count were much larger than \(N\), a high Sargan p-value would be suspicious rather than reassuring.
This is the policy question: what if the average state’s
tax-inclusive price increased by 60 cents per pack, starting in
1996? This can be interpreted as a tax-induced price scenario
only under a full pass-through assumption: a 60-cent tax increase
becomes a 60-cent increase in the price paid by consumers. Sixty cents
is roughly 3 standard deviations of the 1995 tax distribution — a
deliberately large shock to make the response visible. The forecast
computes the trajectory of packpc 3 years out (1996, 1997,
1998) and reports three quantities of interest:
There are three ingredients in any dynamic counterfactual:
fit_cig,
the Diff-GMM model from §3.2.Because the model is estimated in differences, a one-period change in
the covariate path means something specific. Setting
avgprs95 to increase by 60 in the first forecast period
means price jumps by 60 cents in 1996. Setting later changes to zero
means the new higher price level persists in 1997 and 1998; it does
not mean the price returns to its old level.
# Sample statistics that motivate the +60 cent scenario:
# the average tax in 1995 is about 60 cents and 1 sd is ~18 cents,
# so +60 is roughly a 3-sd intervention — large but informative.
summary(pdat_cig$taxs95[pdat_cig$year == 1995])
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 34.44 48.75 59.84 61.87 74.78 112.63
sd(pdat_cig$taxs95[pdat_cig$year == 1995])
## [1] 18.47741
# Forecast horizon
periods_out <- 3
# Number of parameter draws for the Monte-Carlo simulation
sims <- 1000
# Anchor the forecast at the last observed year (1995).
# initialY is the average level of packpc in 1995.
# lagY is the average change in packpc from 1994 to 1995.
# Because this is an "average-state" forecast, both are averaged
# across states.
last_year <- pdat_cig[pdat_cig$year == 1995, ]
prev_year <- pdat_cig[pdat_cig$year == 1994, ]
# Match states between the two slices (panel is balanced so this is trivial)
initialY <- mean(last_year$packpc, na.rm = TRUE)
lagY <- mean(last_year$packpc - prev_year$packpc, na.rm = TRUE)
tibble(
constant = c("periods_out", "initialY (avg packpc, 1995)", "lagY (avg ΔY, 1995)"),
value = c(periods_out, initialY, lagY)
) |>
knitr::kable(digits = 2,
caption = "Anchor constants for the cigarette counterfactual.")
| constant | value |
|---|---|
| periods_out | 3.00 |
| initialY (avg packpc, 1995) | 96.33 |
| lagY (avg ΔY, 1995) | 1.30 |
Parameter draws. Model uncertainty is simulated
through draws from the coefficient distribution. Each row of
simparam is one possible parameter vector drawn from
\[ \tilde\theta^{(s)} \sim \mathcal{N}(\hat\theta,\widehat{V}_{Windmeijer}). \]
The lagged-outcome coefficient is \(\phi\). The remaining columns are the
coefficients on income and price. They are separated because
simcf asks for the LDV coefficient through
phi = and the non-LDV coefficients through
b =.
set.seed(20260520)
# Joint MVN draws from the fit. vcovHC() gives the Windmeijer-corrected
# variance for two-step GMM (this matches what summary(..., robust = TRUE)
# uses for SEs).
simparam <- mvrnorm(n = sims,
mu = coefficients(fit_cig),
Sigma = vcovHC(fit_cig))
# Split by coefficient names so the code stays correct if the formula changes.
# phi is the LDV coefficient; betas are the substantive covariates.
simphi <- simparam[, "lag(packpc, 1)"] # length-1000 vector
simbeta <- simparam[, c("income95pc", "avgprs95"),
drop = FALSE] # 1000 x 2 matrix
# Sanity check
dim(simbeta)
## [1] 1000 2
head(simphi)
## [1] 0.5722195 0.6450104 0.5157122 0.5672364 0.5571531 0.6867599
Build the counterfactual matrices. Two
cfMake objects:
xhyp represents the treatment trajectory (60-cent price
increase happens in 1996, persists).xbase represents the baseline (nothing changes).cfMake returns an object whose $x field
holds the change in each covariate at each forecast
period. Because the model was estimated in first differences, this is
exactly the needed input: a row for the 1996 change, a row for the 1997
change, and a row for the 1998 change.
Every covariate change is set to zero first. Then
avgprs95 changes by 60 in the first forecast period. The
lagged dependent variable propagates that initial shock forward through
the dynamic model.
# Covariate-only formula for cfMake (LDV is fed in separately via phi)
formula_cf <- packpc ~ income95pc + avgprs95 - 1
# Treatment: +60 cent change in avgprs95 at period 1, no other changes
xhyp <- cfMake(formula = formula_cf, data = pdat_cig, nscen = periods_out)
xhyp$x <- 0 * xhyp$x # zero out every period's change
xhyp$xpre <- 0 * xhyp$xpre # zero out the "previous" matrix too
xhyp <- cfChange(xhyp, covname = "avgprs95", x = 60, scen = 1)
# Baseline: zero change in everything
xbase <- xhyp
xbase$x <- xbase$xpre
# Inspect: the 60-cent shock appears in row 1 of xhyp$x, zeros elsewhere
xhyp$x
## packpc income95pc avgprs95
## 1 0 0 60
## 2 0 0 0
## 3 0 0 0
xbase$x
## packpc income95pc avgprs95
## 1 0 0 0
## 2 0 0 0
## 3 0 0 0
simcf can report three different quantities. The inputs
are the same each time:
x is the counterfactual covariate path.b = simbeta passes the simulated non-LDV
coefficients.phi = simphi passes the simulated lagged-outcome
coefficient.lagY = lagY gives the most recent observed change in
the outcome.transform = "diff" tells simcf that the
model was estimated in differences.initialY = initialY lets simcf convert the
differenced forecast back to the level scale.The outputs differ because EV, FD, and RR answer different questions.
# EV under treatment:
# What is expected packpc if the 60-cent price increase happens?
# Because transform = "diff" and initialY is provided, the result is returned
# on the original level scale of packs per capita.
sev_treat <- ldvsimev(
xhyp,
b = simbeta,
ci = 0.95,
constant = NA, # NA = no intercept term (FE absorbed)
phi = simphi,
lagY = lagY,
transform = "diff",
initialY = initialY
)
# EV under baseline:
# What is expected packpc if there is no price increase?
# Compute this separately so the dashed baseline path can be drawn in the EV
# panel and so the FD comparison is explicit.
sev_base <- ldvsimev(
xbase,
b = simbeta,
ci = 0.95,
constant = NA,
phi = simphi,
lagY = lagY,
transform = "diff",
initialY = initialY
)
# FD:
# What is the absolute effect of the treatment path?
# ldvsimfd() compares xhyp$x to xhyp$xpre. Since xhyp$xpre is all zeros,
# this is "60-cent shock" minus "no shock" at each forecast horizon.
sfd <- ldvsimfd(
xhyp,
b = simbeta,
ci = 0.95,
constant = NA,
phi = simphi,
lagY = lagY,
transform = "diff"
)
# RR:
# What is the treatment path divided by the baseline path?
# This is a ratio. Below it is converted to percent change as 100 * (RR - 1).
srr <- ldvsimrr(
xhyp,
b = simbeta,
ci = 0.95,
constant = NA,
phi = simphi,
lagY = lagY,
transform = "diff",
initialY = initialY
)
# Stack the three quantities into one tidy frame for plotting.
# Each simcf object returns:
# pe = point estimate
# lower = lower confidence interval
# upper = upper confidence interval
plot_data <- bind_rows(
tibble(quantity = "EV (packs/capita)",
h = seq_len(periods_out),
pe = sev_treat$pe,
lo = sev_treat$lower[, 1],
hi = sev_treat$upper[, 1],
baseline_pe = sev_base$pe),
tibble(quantity = "FD (treat − baseline)",
h = seq_len(periods_out),
pe = sfd$pe,
lo = sfd$lower[, 1],
hi = sfd$upper[, 1],
baseline_pe = NA_real_),
tibble(quantity = "RR (% change)",
h = seq_len(periods_out),
pe = 100 * (srr$pe - 1),
lo = 100 * (srr$lower[, 1] - 1),
hi = 100 * (srr$upper[, 1] - 1),
baseline_pe = NA_real_)
) |>
mutate(quantity = factor(quantity,
levels = c("EV (packs/capita)",
"FD (treat − baseline)",
"RR (% change)")))
ggplot(plot_data, aes(h, pe)) +
geom_ribbon(aes(ymin = lo, ymax = hi), alpha = 0.25, fill = "steelblue") +
geom_line(linewidth = 1, colour = "steelblue") +
geom_line(aes(y = baseline_pe), linetype = "dashed", colour = "grey40", linewidth = 0.8) +
geom_hline(yintercept = 0, linetype = "dotted", colour = "grey60") +
facet_wrap(~ quantity, scales = "free_y") +
scale_x_continuous(breaks = seq_len(periods_out),
labels = paste0("199", 5 + seq_len(periods_out))) +
labs(x = NULL, y = NULL) +
theme_minimal(base_size = 11) +
theme(strip.text = element_text(size = 11, face = "bold"),
panel.spacing.x = unit(1, "lines"))
Counterfactual forecast of a 60-cent price increase on cigarette consumption. EV: predicted packs/capita under the increase (solid) and under baseline (dashed). FD: absolute difference (treat - baseline). RR: percent change. Horizon = 1996-1998. Bands are 95% intervals from 1000 parametric draws.
Read the three panels.
Caveats to flag in class.
initialY and
lagY are averaged across states. A state-specific forecast
would let each state start at its own 1995 level; this is the natural
extension and is straightforward to code.This appendix is optional self-study. It intentionally uses matrix algebra and a few explicit loops because the goal is to show how GMM is constructed under the hood. These are not required coding patterns for the applied sections above; they are included to make the mechanics visible one step at a time.
Take the simplest dynamic panel:
\[ y_{it} = \alpha_i + \phi\, y_{i,t-1} + \beta\, x_{it} + \varepsilon_{it}, \qquad |\phi| < 1, \quad \varepsilon_{it} \sim \text{iid}(0, \sigma^2) \]
The within transformation subtracts unit means: \(\tilde y_{it} = y_{it} - \bar y_i\). This sweeps out \(\alpha_i\) — the whole reason FE is used. But it introduces a new problem. The within-transformed lag \(\tilde y_{i,t-1}\) contains \(\bar y_{i,-1}\), which depends on \(\varepsilon_{i,1}, \ldots, \varepsilon_{i,T-1}\). The within-transformed error \(\tilde\varepsilon_{it}\) also contains \(\bar\varepsilon_i\), which depends on \(\varepsilon_{i,1}, \ldots, \varepsilon_{i,T}\). They share terms.
Therefore \(\tilde y_{i,t-1}\) and \(\tilde\varepsilon_{it}\) are correlated, OLS on the within-transformed data is no longer consistent, and Nickell (1981) showed that as \(N \to \infty\) with \(T\) fixed,
\[ \boxed{\;\text{plim}(\hat\phi_{\text{FE}} - \phi) \approx -\,\frac{1+\phi}{T-1}\;} \]
Three consequences worth flagging:
This self-study appendix builds that “change estimator” path by hand, on a small simulated panel. The sequence is: (i) generate data with known parameters, (ii) reproduce the Nickell bias on an FE-LDV fit, (iii) difference to kill \(\alpha_i\) and watch the residual endogeneity appear, (iv) construct the Anderson-Hsiao instrumental-variable estimator from raw matrix algebra, and (v) extend to an over-identified one-step Arellano-Bond GMM. The goal is to demystify GMM: it is just a weighted moment-condition system with a closed-form solution.
A small panel is simulated from the DGP
\[ y_{it} = \alpha_i + \phi\, y_{i,t-1} + \beta\, x_{it} + \varepsilon_{it} \]
with \(\phi = 0.7\), \(\beta = 0.5\), \(\alpha_i \sim N(0, 1)\), \(\varepsilon_{it} \sim N(0, 1)\), \(x_{it} \sim N(0, 1)\). The design uses \(N = 50\) and \(T = 8\) — small enough for the matrix algebra to stay visible, large enough that estimates are well-defined. A burn-in period lets the series escape its arbitrary initial value before observations are collected.
The true \(\varepsilon_{it}\) is kept alongside the observed data. This is not available in real applications, but it makes the endogeneity in §A.4 visible.
set.seed(20260515)
# DGP parameters
phi_true <- 0.7 # true persistence
beta_true <- 0.5 # true effect of x on y
N <- 50 # number of units
T_obs <- 8 # observed periods per unit
burn <- 50 # burn-in length
# Unit fixed effects, drawn once
alpha <- rnorm(N, mean = 0, sd = 1)
# Generate one unit at a time and stack
toy_list <- vector("list", N)
for (i in seq_len(N)) {
Tlong <- T_obs + burn
x_i <- rnorm(Tlong, 0, 1)
eps_i <- rnorm(Tlong, 0, 1)
y_i <- numeric(Tlong)
# initial value from the stationary distribution of an AR(1) with intercept alpha_i
y_i[1] <- alpha[i] / (1 - phi_true) +
rnorm(1, sd = 1 / sqrt(1 - phi_true^2))
for (tt in 2:Tlong) {
y_i[tt] <- alpha[i] + phi_true * y_i[tt - 1] +
beta_true * x_i[tt] + eps_i[tt]
}
# Pre-compute the slices BEFORE building the tibble — tibble evaluates
# columns sequentially, so y_lag = y_i[...] inside tibble() would shadow the
# outer 'y_i' with the just-created 'y' column. Pre-computing avoids that.
#
# Lags are constructed from the OBSERVED window only — NA where the lag
# would reach into pre-window data. This mirrors what pgmm sees on a real
# panel, so the manual fits in §A.5–§A.6 use the same sample pgmm does.
y_obs <- y_i [(burn + 1):Tlong]
y_lag1 <- c(NA, y_obs[1:(T_obs - 1)])
y_lag2 <- c(NA, NA, y_obs[1:(T_obs - 2)])
y_lag3 <- c(NA, NA, NA, y_obs[1:(T_obs - 3)])
y_lag4 <- c(NA, NA, NA, NA, y_obs[1:(T_obs - 4)])
x_obs <- x_i [(burn + 1):Tlong]
eps_obs <- eps_i[(burn + 1):Tlong]
toy_list[[i]] <- tibble(
unit = i,
time = seq_len(T_obs),
y = y_obs,
y_lag1 = y_lag1,
y_lag2 = y_lag2,
y_lag3 = y_lag3,
y_lag4 = y_lag4,
x = x_obs,
eps = eps_obs
)
}
toy <- bind_rows(toy_list)
glimpse(toy)
## Rows: 400
## Columns: 9
## $ unit <int> 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, …
## $ time <int> 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, …
## $ y <dbl> -4.5417604, -6.1090964, -7.9833272, -6.8805058, -6.9750735, -6.…
## $ y_lag1 <dbl> NA, -4.5417604, -6.1090964, -7.9833272, -6.8805058, -6.9750735,…
## $ y_lag2 <dbl> NA, NA, -4.5417604, -6.1090964, -7.9833272, -6.8805058, -6.9750…
## $ y_lag3 <dbl> NA, NA, NA, -4.5417604, -6.1090964, -7.9833272, -6.8805058, -6.…
## $ y_lag4 <dbl> NA, NA, NA, NA, -4.5417604, -6.1090964, -7.9833272, -6.8805058,…
## $ x <dbl> 1.38427471, 1.02050701, -0.61394292, -0.03089128, 0.77359679, 0…
## $ eps <dbl> -0.255258495, -1.821209281, -1.781079872, 0.342177224, -0.92660…
The columns used downstream:
y — outcome at time \(t\).y_lag1, y_lag2, y_lag3,
y_lag4 — lagged outcomes at \(t-1, t-2, t-3, t-4\). NA for the first
\(k\) rows of each unit since those
lags would reach pre-window. This is exactly what pgmm()
sees internally, so downstream drop_na() steps will yield
the same effective sample as pgmm().x — covariate at time \(t\).eps — the true error \(\varepsilon_{it}\) (visible to us, not to a
real researcher).Within-demean every variable inside its unit, then OLS the demeaned
outcome on the demeaned lag and demeaned \(x\). No intercept (it’s absorbed by the
demeaning). This is exactly what plm(model = "within") does
internally.
# Listwise-delete the t = 1 rows where y_lag1 is unobservable (pre-window),
# then within-demean over the remaining 7 rows per unit. This mirrors what
# plm(model = "within") does internally.
toy_dm <- toy |>
drop_na(y, y_lag1, x) |>
group_by(unit) |>
mutate(y_dm = y - mean(y),
y_lag_dm = y_lag1 - mean(y_lag1),
x_dm = x - mean(x)) |>
ungroup()
# OLS on the within-demeaned variables, no intercept
fit_fe_manual <- lm(y_dm ~ y_lag_dm + x_dm - 1, data = toy_dm)
phi_fe <- unname(coef(fit_fe_manual)["y_lag_dm"])
beta_fe <- unname(coef(fit_fe_manual)["x_dm"])
tibble(parameter = c("phi", "beta"),
true = c(phi_true, beta_true),
FE_hat = c(phi_fe, beta_fe),
bias = FE_hat - true) |>
knitr::kable(digits = 3,
caption = "FE-LDV on the toy panel. True phi = 0.7; FE estimate is biased downward by Nickell. True beta = 0.5; the bias is mild in this DGP but you can see it travelling.")
| parameter | true | FE_hat | bias |
|---|---|---|---|
| phi | 0.7 | 0.511 | -0.189 |
| beta | 0.5 | 0.427 | -0.073 |
\(\hat\phi_{FE}\) comes in below 0.7 — the Nickell bias is visible. The next steps build an estimator that closes that gap.
One-realization caveat. Nickell’s formula predicts the expected bias under repeated sampling, \(\text{plim}(\hat\phi_{FE} - \phi) \approx -(1+\phi)/(T-1) \approx -0.243\) here. The number in the
biascolumn above is just one realization around that expectation, scattered by Monte-Carlo noise. Re-run with a different seed and the magnitude moves; the sign and rough scale stay. Read the table as evidence the bias is real, not as a precise estimate of its size.
Take first differences within each unit. The \(\alpha_i\) vanishes (\(\Delta \alpha_i = 0\)), but the differenced lag \(\Delta y_{i,t-1}\) correlates with the differenced error \(\Delta \varepsilon_{it}\) — both contain \(\varepsilon_{i,t-1}\). Let’s see this correlation directly.
# Compute differences within each unit. drop_na() removes rows whose lags
# would reach into the pre-window — t = 1 (where y_lag1 is NA) and t = 2
# (where y_lag2 is NA). The remaining sample is t = 3..8 per unit, exactly
# the rows pgmm uses internally for transformation = "d" with lag(y, 2)
# as the instrument.
#
# Force a final unit-then-time row ordering with arrange(unit, time) so
# downstream matrix construction (Appendix A.7) lines up rows correctly.
toy_d <- toy |>
arrange(unit, time) |>
group_by(unit) |>
mutate(dy = y - y_lag1,
dy_lag = y_lag1 - y_lag2, # = Δy_{t-1}
dx = x - dplyr::lag(x),
deps = eps - dplyr::lag(eps)) |>
ungroup() |>
drop_na(dy, dy_lag, dx, deps) |>
arrange(unit, time)
# The endogeneity in the differenced model:
# Δy_{t-1} and Δε_{t} should be correlated (both contain ε_{t-1});
# Δx and Δε should not be correlated because x is exogenous in the DGP.
tibble(
pair = c("Δy_lag vs Δε (endogeneity in the differenced equation)",
"Δx vs Δε (sanity check — x is exogenous in the DGP)"),
correlation = c(cor(toy_d$dy_lag, toy_d$deps),
cor(toy_d$dx, toy_d$deps))
) |>
knitr::kable(digits = 3,
caption = "Correlations among differenced variables and the differenced error in the simulated data.")
| pair | correlation |
|---|---|
| Δy_lag vs Δε (endogeneity in the differenced equation) | -0.575 |
| Δx vs Δε (sanity check — x is exogenous in the DGP) | -0.043 |
The first row is sizeable and clearly non-zero — that’s exactly the endogeneity problem an instrumental variable will fix. The second is near zero, confirming that \(\Delta x\) is exogenous to \(\Delta \varepsilon\) in this DGP and does not itself need an instrument.
Why \(y_{i,t-2}\) is the natural instrument. It depends on \(\varepsilon_{i,1}, \ldots, \varepsilon_{i,t-2}\) but neither \(\varepsilon_{i,t-1}\) nor \(\varepsilon_{i,t}\). So \(y_{i,t-2}\) is correlated with \(\Delta y_{i,t-1} = y_{i,t-1} - y_{i,t-2}\) (the relevance condition) but uncorrelated with \(\Delta \varepsilon_{it}\) under the assumption that \(\varepsilon\) is iid (the exclusion condition). In the simulated data, both can be checked directly:
tibble(
condition = c("Relevance: cor(y_lag2, Δy_lag) should be far from 0",
"Exclusion: cor(y_lag2, Δε) should be near 0"),
correlation = c(cor(toy_d$y_lag2, toy_d$dy_lag),
cor(toy_d$y_lag2, toy_d$deps))
) |>
knitr::kable(digits = 3,
caption = "Relevance and exclusion checks for the lag-2 instrument.")
| condition | correlation |
|---|---|
| Relevance: cor(y_lag2, Δy_lag) should be far from 0 | -0.134 |
| Exclusion: cor(y_lag2, Δε) should be near 0 | -0.009 |
Both conditions hold in the simulated data, by construction.
There is one endogenous regressor (\(\Delta y_{i,t-1}\)) and one valid instrument (\(y_{i,t-2}\)). \(\Delta x\) instruments itself. The system is just-identified.
The closed-form IV (= 2SLS = GMM with any weighting matrix) estimator is
\[ \hat \theta_{IV} \;=\; (Z' X)^{-1} Z' \Delta y \]
where \(\Delta y\) is the stacked outcome vector, \(X\) is the regressor matrix with columns \([\Delta y_{i,t-1}, \Delta x_{it}]\), and \(Z\) is the instrument matrix with columns \([y_{i,t-2}, \Delta x_{it}]\). Let’s build each piece explicitly.
The data have 300 rows (50 units × 6 periods after dropping \(t=1, 2\) where the lags reach pre-window), 2 columns in \(X\), and 2 columns in \(Z\) — a 2×2 system to solve.
# Stack the data as plain vectors / matrices. drop_na in the previous chunk
# already kept only rows where every needed lag is observed (t >= 2 here).
dy_vec <- toy_d$dy # outcome vector
X_mat <- cbind(dy_lag = toy_d$dy_lag, # endogenous regressor
dx = toy_d$dx) # exogenous regressor
Z_mat <- cbind(y_lag2 = toy_d$y_lag2, # instrument for dy_lag
dx = toy_d$dx) # dx instruments itself
# Build Z'X (a 2x2 matrix) and Z'dy (a 2-vector)
ZpX <- crossprod(Z_mat, X_mat)
Zpdy <- crossprod(Z_mat, dy_vec)
# Closed-form IV estimator
theta_AH <- solve(ZpX, Zpdy)
rownames(theta_AH) <- c("phi (dy_lag)", "beta (dx)")
The two small matrices just built:
knitr::kable(ZpX, digits = 2, caption = "Z'X")
| dy_lag | dx | |
|---|---|---|
| y_lag2 | -214.58 | 26.73 |
| dx | -151.49 | 660.48 |
knitr::kable(Zpdy, digits = 2, caption = "Z'Δy")
| y_lag2 | -156.21 |
| dx | 197.32 |
And the resulting IV estimates:
knitr::kable(theta_AH, digits = 3,
caption = "Anderson-Hsiao IV estimates from the closed-form solve.")
| phi (dy_lag) | 0.788 |
| beta (dx) | 0.479 |
Read off \(\hat\phi_{AH}\) — it should be close to the true value 0.7. The Nickell bias has been corrected by replacing the endogenous \(\Delta y_{i,t-1}\) with its instrumented counterpart.
Comparison: FE-LDV vs Anderson-Hsiao on the same panel.
tibble(parameter = c("phi", "beta"),
true = c(phi_true, beta_true),
FE = c(phi_fe, beta_fe),
AH = c(theta_AH[1, 1], theta_AH[2, 1])) |>
knitr::kable(digits = 3,
caption = "FE-LDV is biased; Anderson-Hsiao IV recovers the truth (up to sampling error).")
| parameter | true | FE | AH |
|---|---|---|---|
| phi | 0.7 | 0.511 | 0.788 |
| beta | 0.5 | 0.427 | 0.479 |
What if two lags are used as instruments — say \(y_{i,t-2}\) and \(y_{i,t-3}\)? Each is exogenous under the iid-\(\varepsilon\) assumption, and each is correlated with \(\Delta y_{i,t-1}\). Adding the second moment makes the system over-identified (more moments than parameters), and a weighting matrix is needed to combine them.
The (one-step) GMM estimator with weighting matrix \(W\) is
\[ \hat\theta_{GMM} \;=\; \big( X' Z\, W\, Z' X \big)^{-1} \, X' Z\, W\, Z' \Delta y \]
For one-step GMM, the standard choice is \(W = (Z'Z)^{-1}\). (Two-step would use the residual-based optimal \(W\) — see §2.6. This appendix stops at one-step to keep the algebra clean.)
To use lag-3 as an instrument, \(y_{i,t-3}\) must be observable, which means restricting the equation to \(t \ge 3\) — one period of usable rows is lost.
Heads-up on the verification step. The literal AH estimator in §A.5 uses one stacked lag-2 instrument column for the lagged outcome.
pgmm(..., | lag(y, 2))uses the same basic lag-2 logic, but it period-stacks the instrument into separate moments. The estimates should be close, but they need not match exactly in finite samples. The over-identified GMM below (\(y_{i,t-2}\) and \(y_{i,t-3}\) both as instruments) can also differ slightly frompgmm(..., | lag(y, 2:3))for the same reason: same sample, different moment structure.
The instrument matrix now has 3 columns (two lags of \(y\) plus the exogenous \(\Delta x\)) against only 2 parameters in \(X\) — the system is over-identified by one moment. Because \(y_{i,t-3}\) must come from the observed window, the \(t=3\) row is also dropped, leaving 250 rows (5 periods \(\times\) 50 units).
# Adding y_lag3 as an instrument means y_lag3 must be observed, which
# requires t >= 4 (the t = 3 row in toy_d would have y_lag3 reaching pre-
# window data). Drop those rows so every instrument in Z is genuinely
# available from the observed sample — same logic as listwise deletion.
toy_g <- toy_d |> drop_na(y_lag3)
dy_vec <- toy_g$dy
X_mat <- cbind(dy_lag = toy_g$dy_lag,
dx = toy_g$dx)
# Wider instrument matrix: two lags of y plus the exogenous dx
Z_mat <- cbind(y_lag2 = toy_g$y_lag2,
y_lag3 = toy_g$y_lag3,
dx = toy_g$dx)
# One-step weighting matrix W = (Z'Z)^{-1}
ZpZ <- crossprod(Z_mat) # 3x3
W_one <- solve(ZpZ)
# Z'X and Z'dy
ZpX <- crossprod(Z_mat, X_mat) # 3x2
Zpdy <- crossprod(Z_mat, dy_vec) # 3x1
# GMM normal equation: θ = (X'Z W Z'X)^{-1} X'Z W Z'Δy
XpZ_W <- t(ZpX) %*% W_one
A_mat <- XpZ_W %*% ZpX # 2x2 — the bread
b_vec <- XpZ_W %*% Zpdy # 2x1 — the meat
theta_GMM <- solve(A_mat, b_vec)
rownames(theta_GMM) <- c("phi (dy_lag)", "beta (dx)")
The one-step weighting matrix:
knitr::kable(W_one, digits = 4,
caption = "W = (Z'Z)^{-1}, the one-step GMM weighting matrix (3 × 3).")
| y_lag2 | y_lag3 | dx | |
|---|---|---|---|
| y_lag2 | 0.0027 | -0.0026 | 0.0000 |
| y_lag3 | -0.0026 | 0.0027 | 0.0000 |
| dx | 0.0000 | 0.0000 | 0.0018 |
The GMM estimates:
knitr::kable(theta_GMM, digits = 3,
caption = "One-step Arellano-Bond GMM estimates from the closed-form solve.")
| phi (dy_lag) | 0.563 |
| beta (dx) | 0.444 |
A side-by-side of the three estimators on the same data:
tibble(parameter = c("phi", "beta"),
true = c(phi_true, beta_true),
FE = c(phi_fe, beta_fe),
AH = c(theta_AH[1, 1], theta_AH[2, 1]),
GMM = c(theta_GMM[1, 1], theta_GMM[2, 1])) |>
knitr::kable(digits = 3,
caption = "FE biased, literal AH unbiased (just-identified), GMM unbiased and more efficient (over-identified). With only one extra moment the GMM gain is small, but the principle generalizes to the larger moment sets used in pgmm.")
| parameter | true | FE | AH | GMM |
|---|---|---|---|---|
| phi | 0.7 | 0.511 | 0.788 | 0.563 |
| beta | 0.5 | 0.427 | 0.479 | 0.444 |
The GMM estimate sits very close to the AH estimate — that is the expected behavior with only one extra moment. The conceptual payoff is the machinery: construct the moment conditions \(Z' (\Delta y - X \theta) = 0\), pick a weighting matrix, and solve the over-identified system in one closed-form line. The estimators in the main lab are this same template scaled up.
pgmm()This final verification is the most technical part of the appendix.
It is mainly for readers who want to understand why a literal hand-built
Anderson-Hsiao estimator does not exactly match the object returned by
pgmm(). The difference is not a coding error; it comes from
the way pgmm() period-stacks the instrument moments.
Run the AH-style pgmm fit on the same sample and put the
coefficients side by side with the manual result:
pdat_toy <- pdata.frame(toy, index = c("unit", "time"))
ah_pgmm <- pgmm(y ~ lag(y, 1) + x | lag(y, 2),
data = pdat_toy,
effect = "individual",
model = "onestep",
transformation = "d")
tibble(parameter = c("phi", "beta"),
manual_literal_AH = c(theta_AH[1, 1], theta_AH[2, 1]),
pgmm_AH = c(unname(coef(ah_pgmm)["lag(y, 1)"]),
unname(coef(ah_pgmm)["x"]))) |>
knitr::kable(digits = 6,
caption = "Literal Anderson-Hsiao (one stacked moment for y_{t-2}) vs pgmm with the same instrument specification.")
| parameter | manual_literal_AH | pgmm_AH |
|---|---|---|
| phi | 0.787692 | 0.675650 |
| beta | 0.479428 | 0.456828 |
The numbers are close but do not match exactly. The
reason is not the sample — both fits use the same 300 rows. It is the
moment structure. Look at the instrument matrix
pgmm builds for the first unit:
# pgmm's per-unit instrument block: rows are periods (t = 3..8),
# columns are: 6 period-specific copies of y_{t-2}, plus 1 column for x.
ah_pgmm$W[[1]] |>
round(3) |>
as.data.frame() |>
knitr::kable(caption = "pgmm's instrument matrix for the first unit (6 × 7). The block-diagonal structure of the first six columns reveals that pgmm splits the lag-2 instrument into one moment **per period**, not a single stacked moment.")
| V1 | V2 | V3 | V4 | V5 | V6 | x | |
|---|---|---|---|---|---|---|---|
| 3 | -4.542 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | -1.634 |
| 4 | 0.000 | -6.109 | 0.000 | 0.000 | 0.000 | 0.000 | 0.583 |
| 5 | 0.000 | 0.000 | -7.983 | 0.000 | 0.000 | 0.000 | 0.804 |
| 6 | 0.000 | 0.000 | 0.000 | -6.881 | 0.000 | 0.000 | 0.001 |
| 7 | 0.000 | 0.000 | 0.000 | 0.000 | -6.975 | 0.000 | -0.859 |
| 8 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | -6.133 | -0.364 |
What literal AH did with one y-instrument moment,
pgmm does with six — one per period \(t = 3, \ldots, 8\). Each column of the
block-diagonal section is “\(y_{i,t-2}\) at period \(t\), zero elsewhere”. So the AH-style
pgmm spec is over-identified internally, with \(T - 2 = 6\) moments for y plus 1 moment for
x. Different moment system, slightly different estimates in finite
samples.
To replicate pgmm exactly by hand, two
things are needed:
# Rebuild X and y from toy_d (the AH-sample, 300 rows) — §A.6 overwrote
# X_mat and dy_vec with the smaller §A.6 GMM sample (toy_g, 250 rows),
# so those objects cannot be reused here.
dy_vec_AH <- toy_d$dy
X_mat_AH <- cbind(dy_lag = toy_d$dy_lag, dx = toy_d$dx)
unit_ids <- sort(unique(toy_d$unit))
n_per <- length(unique(toy_d$time)) # 6 differenced periods (t = 3..8)
period_vals <- sort(unique(toy_d$time))
# Build per-unit Z blocks.
# Rows are periods kept for unit i.
# Columns are the 6 period-specific y instruments plus 1 column for dx.
Z_blocks <- vector("list", length(unit_ids))
for (j in seq_along(unit_ids)) {
# Work one unit at a time.
d_i <- toy_d |>
filter(unit == unit_ids[j]) |>
arrange(time)
# Start with zeros, then place y_{t-2} in the column for its period.
Z_i <- matrix(0, nrow = nrow(d_i), ncol = n_per)
for (k in seq_len(nrow(d_i))) {
p_idx <- match(d_i$time[k], period_vals)
Z_i[k, p_idx] <- d_i$y_lag2[k]
}
# Add dx as the final ordinary instrument column.
Z_blocks[[j]] <- cbind(Z_i, dx = d_i$dx)
}
# Arellano-Bond H matrix: tridiagonal with 2 on the diagonal, -1 off
H_AB <- matrix(0, n_per, n_per)
diag(H_AB) <- 2
for (k in seq_len(n_per - 1)) {
H_AB[k, k + 1] <- -1
H_AB[k + 1, k] <- -1
}
# One-step W = (sum_i Z_i' H Z_i)^{-1}.
# Build the sum one unit at a time so the matrix formula is visible.
ZHZ <- matrix(0, nrow = ncol(Z_blocks[[1]]), ncol = ncol(Z_blocks[[1]]))
for (j in seq_along(Z_blocks)) {
Z_i <- Z_blocks[[j]]
ZHZ <- ZHZ + t(Z_i) %*% H_AB %*% Z_i
}
W_AB <- solve(ZHZ)
# Stack Z across units.
Z_all <- Z_blocks[[1]]
for (j in 2:length(Z_blocks)) {
Z_all <- rbind(Z_all, Z_blocks[[j]])
}
# Solve the GMM normal equation.
ZtX <- crossprod(Z_all, X_mat_AH)
Zty <- crossprod(Z_all, dy_vec_AH)
theta_AH_pgmm_style <- solve(t(ZtX) %*% W_AB %*% ZtX,
t(ZtX) %*% W_AB %*% Zty)
rownames(theta_AH_pgmm_style) <- c("phi (dy_lag)", "beta (dx)")
Side by side with pgmm:
tibble(parameter = c("phi", "beta"),
manual_pgmm_style_AH = c(theta_AH_pgmm_style[1, 1],
theta_AH_pgmm_style[2, 1]),
pgmm_AH = c(unname(coef(ah_pgmm)["lag(y, 1)"]),
unname(coef(ah_pgmm)["x"]))) |>
knitr::kable(digits = 10,
caption = "Manual GMM-style AH (period-stacked Z + Arellano-Bond H weighting) versus pgmm. The two now agree to many decimal places.")
| parameter | manual_pgmm_style_AH | pgmm_AH |
|---|---|---|
| phi | 0.6756498 | 0.6756498 |
| beta | 0.4568281 | 0.4568281 |
Exact match. Three pedagogical takeaways:
pgmm, Stata’s
xtabond/xtabond2 — implements a richer
period-stacked version with the Arellano-Bond
H-weighting. Both are consistent; they give slightly different point
estimates in finite samples.pgmm packaging exactly this
structure.