---
title: "CSSS/POLS 512 - Time Series and Panel Data Methods"
subtitle: "Lab 5: Fixed and Random Effects in Panel Data Analysis"
author: "Ramses Llobet"
fontfamily: accanthis
linkcolor: blue
output:
html_document:
toc: true
toc_depth: 4
editor_options:
chunk_output_type: console
fontsize: 11pt
header-includes:
\usepackage{float}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, error=FALSE, warning=FALSE, message = FALSE)
# clean environment
rm(list=ls())
# Load packages
library(nlme) # Estimation of mixed effects models
library(lme4) # Alternative package for mixed effects models
library(plm) # Econometrics package for linear panel models
library(fixest) # Alternative package for fixed effect models
library(arm) # Gelman & Hill code for mixed effects simulation
library(pcse) # Calculate PCSEs for LS models (Beck & Katz)
library(tseries) # For ADF unit root test
library(simcf) # For panel functions and simulators
library(lmtest) # Print results after correcting standard errors
library(stargazer) # For output tables
library(RColorBrewer) # For nice colors
library(MASS) # For mvrnorm()
library(tidyverse) # For data wrangling
# load source
source("source/TS_code.R") # Chris's TS custom functions
```
# Democracy data
In today's lab, we will be working with the following dataset on political regimes, resource endowments, and economic growth around the world.
|Variable|Description^[See [http://www.u.arizona.edu/~mishler/PrzeworskiCodebook.PDF](http://www.u.arizona.edu/~mishler/PrzeworskiCodebook.PDF).]|
|---|---|
| `britcol` | Dummy variable coded 1 for every year in countries that were British colonies any time after 1918, 0 otherwise. |
| `cath` | Percentage of Catholics in the population. |
| `civlib` | Civil liberty. This variable can take values from 1 (least free) to 7 (most free). |
| `edt` | Cumulative years of education of the average member of the labor force. |
| `elf60` | Index of ethnolinguistic fractionalization, 1960. |
| **`gdpw`** | Real GDP per worker, 1985 international prices. |
| `moslem` | Percentage of Moslems in the population. |
| `newc` | Dummy variable coded 1 for every year in countries that became independent after 1945, 0 otherwise. |
| `oil` | Dummy variable coded 1 if the average ratio of fuel exports to total exports in 1984-86 exceeded 50%, 0 otherwise. |
| `pollib` | Political liberty. This variable can take values from 1 (least free) to 7 (most free). |
| **`reg`** | Dummy variable coded 1 for dictatorships and 0 for democracies. Transition years are coded as the regime that emerges in that year. For instance, there was a transition from democracy to dictatorship in Argentina in 1955. In that year, `reg`=1. |
| `stra` | The sum of past transitions to authoritarianism (as defined by `reg`) in a country. If a country experienced one or more transitions to authoritarianism before 1950, `stra` was coded 1 in 1950. |
```{r}
# Load Democracy data
# Variable names:
# COUNTRY YEAR BRITCOL CATH
# CIVLIB EDT ELF60 GDPW MOSLEM
# NEWC OIL POLLIB REG STRA
#
# data <- read.csv("data/democracy.csv",
# header = TRUE,
# na.strings = ".")
data <- read.csv("data/democracy.csv",
header = TRUE)
colnames(data)
colnames(data) <- colnames(data) %>% tolower() # change variable names to lower case
colnames(data)
# check out missing variables
library(questionr)
freq.na(data)
# Check sample's N and T
data$country %>% unique() %>% length()
data$year %>% unique() %>% length()
# so this is a panel with N = 135 and T = 40.
## some data wrangling
data <- data %>%
mutate(
ln_gdpw = log(gdpw),
country_f = as.factor(country),
year_f = as.factor(year)
)
# Remember to perform panel exploratory analysis before fitting models! Refer to lab 1.
```
## An Overview of Panel Data Models
The full flexibility model:
$$\Delta^{d_i}y_{it} = \alpha_i + X_{it}\beta_i + \sum^{P_i}_{p=1}\phi_{i,p}\,\Delta^{d_i}y_{i,t-p} + \sum^{Q_i}_{q=1}\rho_{i,q}\,\epsilon_{i,t-q} + \epsilon_{it}$$
where the intercept $\alpha_i$, slopes $\beta_i$, ARIMA orders $(P_i, d_i, Q_i)$, and their coefficients $\phi_{i,p}$, $\rho_{i,q}$ all vary by unit, and panel heteroskedasticity $\sigma^2_i$ is allowed. We can also put a unit-specific trend $t\theta_i$ and seasonal terms $\kappa_{k,i}$ inside this model.
# Fixed Effects
## What is Fixed Effects?
In simple terms, "fixed effects" refer to unit-specific intercepts. In other words, each unit or case $i$ has its own intercept $\alpha_i$ instead of pooling all cases into the same intercept. In practice, these case-specific intercepts introduce adjustments or controls for the mean outcome at the group level. This is akin to removing the mean from your data at the group level and conducting a simple linear regression.
The most common rationale for employing fixed effects estimators, however, is their ability to eliminate **time-invariant group-level unobservables**, which follows from controlling for each group's mean level of the outcome variable. It is essential to recognize that the underlying assumption is that these unobservables truly never change, not only in their substance but also in how they operate.
For instance, is Trump's MAGA conservatism perceived the same way as George Bush's conservatism? If the answer is strictly no, then conservatism over the last 15 years is actually time-variant, and no fixed effect will shield you from **omitted variable bias** or **measurement error**. The more aggregated your unit of analysis (individuals < neighborhoods < cities < regions < countries) *or* the longer your time window $T$, the less plausible the assumption of time-invariant unobservables becomes.
## Group-level heterogeneity
Suppose there is unobserved group-level or individual-level heterogeneity $u_i$ in the true data generating process (DGP), i.e., heterogeneity that does not change over time within a unit:
$$
y_{it} = \alpha + \beta^\prime x_{it} + \color{red}{\delta^\prime u_{i}} + \varepsilon_{it}
$$
Because it is unobserved, we cannot measure it directly. But if we define $\alpha_i = \alpha + \delta^\prime u_i$, we can absorb this unobservable group-level heterogeneity into unit-specific intercepts and rewrite the above equation as follows:
$$
y_{it} = \alpha_i + \beta^\prime x_{it} + \varepsilon_{it}
$$
The FE model can control for omitted unobservable variables if we assume that these unobservables are constant over time.
- If we add $\alpha_i$: we control for time-invariant unit-level heterogeneity, such as some features of a country's culture.
    - A model with only unit fixed effects is equivalent to the *within estimator*.
- If we add $\tau_t$: we control for common shocks in each period, such as global economic conditions in a given year.
    - A model with only period fixed effects applies the same demeaning in the time dimension, absorbing period-specific shocks.
- Of course, we can also control for omitted unobservables at other levels, such as group, cohort, etc.
However, FE models come at a large cost.
- We cannot add time-invariant variables into FE models, otherwise it will cause perfect collinearity.
- FE estimates can be inefficient and exacerbate **small sample bias**.
- Incidental parameter problem: if we include unit-level FE $\alpha_i$, their estimates are consistent as $T \rightarrow \infty$, but not as $N \rightarrow \infty$ with $T$ fixed.
There are 3 ways to estimate this FE model:
1. Within Estimator
2. Least Square Dummy Variable (LSDV) Estimator
3. First Difference (FD) Estimator
### How to Check Time- or Unit-invariant Variables?
```{r}
pdata <- pdata.frame(data, index = c("country_f", "year_f"))
head(pdata) # it will create an id for each observation
pvar(pdata)
# or
pvar(data, index = c("country", "year"))
```
### Within Estimator
We take each observation's deviation from its group mean (the average over $t$ within each group/unit $i$). Because the unobservable unit-level heterogeneity is time-invariant, its value does not change when we average over time within each unit, so demeaning removes the varying intercepts without our having to estimate them.
$$
\begin{aligned}
\tilde{y}_{it} =& y_{it} - \bar{y}_{i} \\
=& \color{red}{(\alpha_i - \bar{\alpha}_i)} + \beta^\prime \underbrace{(x_{it} - \bar{x}_{i})}_{\tilde{x}_{it}} + \underbrace{(\varepsilon_{it} - \bar{\varepsilon}_i)}_{\tilde{\varepsilon}_{it}} \\
=& \beta^\prime \tilde{x}_{it} + \tilde{\varepsilon}_{it}
\end{aligned}
$$
As with OLS, if no Gauss-Markov assumption is violated, the within estimator $\hat{\beta}_W$ is asymptotically normal and efficient:
$$
\widehat{\beta}_{W} \stackrel{\text { approx. }}{\sim} \mathcal{N}\left(\beta, \sigma^{2}\left(\tilde{\mathbf{X}}^{\top} \tilde{\mathbf{X}}\right)^{-1}\right)
$$
However, the estimation of standard error is a little tricky. When we demean all the variables, we already used $N$ degrees of freedom for $N$ units.
$$
\widehat{\sigma}^{2}=\frac{1}{N T-N-K} \sum_{i=1}^{N} \sum_{t=1}^{T}{\widehat{\tilde{\varepsilon}}}_{it}^{2}
$$
Hence, if we use a canned OLS function for $\hat{\beta}_W$, the standard error is not correct, and we need to adjust it manually.
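As a sketch of this adjustment: demean by hand, fit OLS, then rescale the standard errors with the correct degrees of freedom. The `*_dm` names below are ours, and the corrected values should agree with `summary(plm_within)` fitted later (assuming the `data` object created above).

```{r}
# Sketch: demean within country, run OLS, then fix the degrees of freedom
dm <- data %>%
  drop_na(ln_gdpw, reg, edt) %>%
  group_by(country_f) %>%
  mutate(ln_gdpw_dm = ln_gdpw - mean(ln_gdpw),
         reg_dm     = reg     - mean(reg),
         edt_dm     = edt     - mean(edt)) %>%
  ungroup()
ols_dm <- lm(ln_gdpw_dm ~ 0 + reg_dm + edt_dm, data = dm)
# lm() believes df = NT - K; the within estimator needs df = NT - N - K
N <- n_distinct(dm$country_f)
se_naive     <- sqrt(diag(vcov(ols_dm)))
se_corrected <- se_naive * sqrt(df.residual(ols_dm) / (df.residual(ols_dm) - N))
cbind(se_naive, se_corrected)
```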
### Within-FE Model in R
```{r}
# two-way within estimator using `plm`
plm_within <- plm(
#1st, the formula. if we include `oil`, plm() will automatically drop it. Why?
ln_gdpw ~ reg + edt,
data = data,
  #2nd, your N and T indices
index = c("country_f","year_f"),
# what model specification? See help documentation: ?plm
model = "within",
# what estimator?
effect = "individual"
)
summary(plm_within) # note that the standard error does not take serial correlation into account
## Adjusting uncertainty (standard errors)
# Arellano (1987), these are also commonly known as "cluster standard errors"
coeftest(plm_within, vcov. = vcovHC(plm_within, method = "arellano"))
# Beck and Katz (1995) panel corrected standard errors
coeftest(plm_within, vcov. = vcovBK(plm_within))
# Driscoll and Kraay panel corrected standard errors
coeftest(plm_within, vcov. = vcovSCC(plm_within))
# an alternative way is to use `fixest`; it is faster when we have a large number of FEs
fixest_model <- feols(
ln_gdpw ~ reg + edt | country + year,
data = data,
cluster = ~ country + year
)
summary(fixest_model)
```
Although the within estimator sweeps out the varying intercepts, we can recover them from the group means:
$$
\hat{\alpha}_i = \bar{y}_{i} - \hat{\beta}^\prime \bar{x}_{i}
$$
Sometimes it is useful for us to look at the variation between groups.
```{r}
plm::fixef(plm_within)
```
## LSDV Estimator
Another option is to estimate the group-specific intercepts directly with the least squares dummy variable (LSDV) estimator, i.e., OLS with $N$ group dummies and no global intercept.
The group dummies are often colloquially called "fixed effects." The slope estimates are identical to those of the within estimator, but LSDV is not a good idea for very small $T$: the estimates of $\alpha_i$ will be poor because each rests on only $T$ observations.
```{r}
# LSDV estimator using `plm` (a pooling model with country dummies)
plm_lsdv <- plm(
  ln_gdpw ~ 0 + reg + edt + country_f, # add 0 (or - 1) to drop the global intercept
index = c("country_f", "year_f"),
effect = "individual",
model = "pooling",
data = data
)
summary(plm_lsdv)
coeftest(plm_lsdv, vcov. = vcovHC(plm_lsdv, method = "arellano"))[1:2,]
# you can also use lm() to implement the LSDV estimator, though adjusting its standard errors takes more work
# the same model with lm() (drop the intercept so all country dummies are estimated):
summary(lm(ln_gdpw ~ 0 + country_f + reg + edt, data=data))
# coeftest() combined with a sandwich-style vcov (e.g., from the `sandwich` package) can adjust lm standard errors.
```
### Within vs. LSDV
Within and LSDV estimators are equivalent: they give exactly the same slope estimates.
We favor one over the other for the following reasons:
- Their reported standard errors can differ slightly. We don't need to adjust the degrees of freedom for $\sigma^2$ manually in the LSDV estimator because it directly estimates the fixed effects of the $N$ groups. By contrast, standard errors from vanilla OLS on demeaned data will be slightly off due to incorrect degrees of freedom (smart software such as `plm()` in `R` and `areg` in `Stata` corrects this).
- Compared to the within estimator, the LSDV approach can be computationally demanding: the model matrix is very large to invert when $N$ is large.
- Neither within nor LSDV estimates the intercepts very well. The estimates of $\alpha_i$ are inconsistent as $N \rightarrow \infty$ with $T$ fixed (the incidental parameters problem). But there is no need to estimate $\alpha_i$ unless the quantity of interest is a function of it, as in some non-linear models.
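A quick check of the equivalence claimed above, using the `plm_within` and `plm_lsdv` objects fitted earlier; the slope estimates should match to machine precision.

```{r}
coef(plm_within)[c("reg", "edt")]
coef(plm_lsdv)[c("reg", "edt")]
all.equal(coef(plm_within)[c("reg", "edt")],
          coef(plm_lsdv)[c("reg", "edt")])
```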
## FD Estimator
Since the group-level time-invariant unobservables are constant across time, first differences are an alternative way to purge them out. The first-difference model is the following:
$$
\begin{aligned}
\Delta y_{it} =& y_{it} - y_{i,t-1} \\
=& \color{red}{(\alpha_i - \alpha_i)} + \beta^\prime \underbrace{(x_{it} - x_{i,t-1})}_{\Delta x_{it}} + \underbrace{(\varepsilon_{it} - \varepsilon_{i,t-1})}_{\Delta \varepsilon_{it}} \\
=& \beta^\prime \Delta x_{it} + \Delta \varepsilon_{it}
\end{aligned}
$$
If the $\Delta \varepsilon_{it}$ are homoskedastic and serially uncorrelated, the usual OLS standard errors work just fine. Otherwise we need GLS-type corrections to estimate the uncertainty.
The FD estimator in `plm()` automatically includes an intercept. The intercept in an FD model is actually the coefficient on a linear time trend: differencing a trend $\theta t$ leaves the constant $\theta\,(t - (t-1)) = \theta$.
### FE vs. FD Estimators
- FD and FE will give the same estimates when $T = 2$, but for $T > 2$, the two methods are different.
- Note that FD does not extend naturally to two-way fixed effects: differencing across units, $y_{it} - y_{i-1,t}$, has no meaningful ordering.
- Both FE and FD estimators are consistent under the assumption of **strict exogeneity**.
- However, the primary tradeoff between FE and FD estimators lies in the *i.i.d.* assumption: if the original errors $\varepsilon_{it}$ are serially uncorrelated, the FE estimator is more efficient than the FD estimator.
- But if there is substantial serial dependence in the original errors, the first differences of the errors, $\Delta \varepsilon_{it}$, may be serially uncorrelated, in which case the FD estimator is more efficient than the FE estimator.
- We can always try both: if results are not very sensitive, good!
```{r}
# FD estimator using `plm`
plm_fd <- plm(
ln_gdpw ~ reg + edt - 1,
index = c("country", "year"),
model = "fd", # FD estimator
data = data
)
summary(plm_fd)
```
## Two-way Fixed Effects
A very common model specification is two-way fixed effects (TWFE). Note that TWFE does not by itself provide causal identification: at a minimum, pre-treatment parallel trends must hold, and covariate imbalance between treatment and control groups makes that assumption less plausible. Even then, recent econometric research has found many [shortcomings](https://www.sciencedirect.com/science/article/pii/S0304407621001445?casa_token=pntpS6an_3wAAAAA:iT-BYE3zAfuTRaHs32NRoM8-gidl4rvNwtEuuXycRzK07R-_RwAjhhnx_qIq9qHzLQ5ycNGCpw) of this model specification for the causal estimation of treatment effects.
$$ y_{it} = \alpha_i + \tau_t + \beta^\prime x_{it} + \varepsilon_{it}$$
Where we incorporate a dummy variable for each period $\tau_t$. However, some [researchers](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0231349) argue that this model may be inappropriate to answer some research questions.
```{r}
# two-way fixed effects
plm_twfe <- plm(ln_gdpw ~ reg + edt,
data = data,
index = c("country_f", "year_f"),
model = "within",
effect = "twoways")
summary(plm_twfe)
# You can estimate the same model using lm, as follows:
summary(lm(ln_gdpw ~ country_f + year_f + reg + edt, data=data))
# Correcting for uncertainty with clustered standard errors:
coeftest(plm_twfe, vcov. = vcovHC(plm_twfe, method = "arellano"))
```
# Random Effects / Mixed Effects
## What is Random Effects?
The RE model assumes the individuals come from a common population: each individual's unobservable heterogeneous effect on the outcome follows a normal distribution with unknown variance $\sigma_\alpha^2$. We then have the following model setup:
$$
\begin{aligned}
y_{i t} &=\beta_{0}+\beta_{1} x_{i t}+\alpha_{i}+\varepsilon_{i t} \\
\alpha_{i} & \sim \mathcal{N}\left(0, \sigma_{\alpha}^{2}\right) \\
\varepsilon_{i t} & \sim \mathcal{N}\left(0, \sigma_\varepsilon^{2}\right)
\end{aligned}
$$
## Estimate an RE model in R
- Three packages in R: `plm` (feasible generalized least squares by default), `nlme` (maximum likelihood by default), and `lme4` (restricted maximum likelihood by default)
- Pros and Cons among different estimation strategies
- FGLS^[See [https://cran.r-project.org/web/packages/plm/vignettes/A_plmPackage.html#nlme](https://cran.r-project.org/web/packages/plm/vignettes/A_plmPackage.html#nlme).]
    - preferred by some econometricians because it makes fewer distributional assumptions.
    - sometimes FGLS is computationally demanding because it needs to invert a large matrix.
  - ML: biased estimates of the variance parameters, but the bias shrinks as the sample size grows
- REML: unbiased estimation of variance parameters
- If you prefer a Bayesian approach for more flexible modeling, you can use `brms` or even manually specify the model in `JAGS` and `stan`.
- Pros and cons among different packages, besides estimation strategies
- `plm` cannot handle mixed effects, i.e., models where the mean of $\alpha_i$ varies with covariates, $\alpha_i \sim \mathcal{N}(\gamma^\prime z_{i}, \sigma_\alpha^2)$; `nlme` and `lme4` can.
- `nlme` can take ARMA structure into account, but `plm` and `lme4` can only consider AR terms.
- In `nlme` it is difficult to specify a [crossed (non-nested) random effect](https://errickson.net/stats-notes/vizrandomeffects.html) structure (i.e., multiple separate random effects), while `lme4` makes this easy.
- In terms of the mixed effects model in `nlme` and `lme4`, you may be confused by their different use of "fixed effects" and "random effects."^[See [https://statmodeling.stat.columbia.edu/2005/01/25/why_i_dont_use/](https://statmodeling.stat.columbia.edu/2005/01/25/why_i_dont_use/).]
- fixed effect: coefficients that do not vary by group; they are fixed, not random.
- random effect: group-specific coefficients; these vary across groups.
A typical mixed effect model is as follows:
$$
y_{it} = \underbrace{\beta^\prime x_{it}}_\text{"fixed effect"} + \underbrace{\gamma^\prime z_i}_\text{"random effects"} + \varepsilon_{it}
$$
or can be specified as follows:
$$
\begin{aligned}
y_{i t} &=\beta^\prime x_{i t}+\alpha_{i}+\varepsilon_{i t} \\
\alpha_{i} & \sim \mathcal{N}\left(\gamma^\prime z_i, \sigma_{\alpha}^{2}\right) \\
\varepsilon_{i t} & \sim \mathcal{N}\left(0, \sigma_\varepsilon^{2}\right)
\end{aligned}
$$
```{r}
# RE model using `plm`
plm_re <- plm(
ln_gdpw ~ reg + edt + oil,
data = data,
index = c("country", "year"),
model = "random",
effect = "individual"
)
summary(plm_re)
# RE model using `nlme`
lme_re <- lme(
fixed = ln_gdpw ~ reg + edt + oil,
random = ~ 1 | country,
data = data,
na.action = na.omit # need to specify how we deal with NA values, otherwise throws an error
)
summary(lme_re)
# RE model using `lme4`
lmer_re <- lmer(
ln_gdpw ~ reg + edt + oil + (1 | country),
data = data
)
summary(lmer_re)
```
## FE vs. RE
- Different assumption about the relationship between group heterogeneity $u_i$ and covariates $x_{it}$.
- In the FE model, we assume conditional ignorability, $E[\varepsilon_{it}|x_{it}, u_i] = 0$: conditional on the unobservable confounder $u_i$, $\varepsilon_{it}$ and $x_{it}$ are uncorrelated. $x_{it}$ and $u_i$ may be correlated.
- In the RE model, we assume there is no unmeasured confounder, so ignorability holds without conditioning on $u_i$: $E[\varepsilon_{it} | x_{it}] = E[u_i] = 0$. $x_{it}$ and $u_i$ are uncorrelated.
- RE estimator is a matrix-weighted average of within estimator and between estimator.
- An intuitive way to think about it: the FE model **rules out** between-group variation, while the RE model preserves the information in between-group variation by modeling $\sigma_\alpha^2$.
- Showing this mathematically requires working through the variance-covariance matrix of the random effects and some understanding of Gaussian processes. If you want to learn how to model random effects, consider taking CS&SS 560 and CS&SS 592.
We can think of the RE model as performing **quasi-demeaning** or **partial pooling**:
$$
y_{it} - \theta\bar{y}_i = \beta^\prime (x_{it} - \theta \bar{x}_i) + (\varepsilon_{it} - \theta\bar{\varepsilon}_i)
$$
where $\theta = 0$ implies pooled OLS and $\theta = 1$ implies fixed effects. Doing some math shows that
$$
\theta = 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T \cdot \sigma_\alpha^2}}
$$
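We can inspect the estimated variance components and the implied $\theta$ from the `plm_re` model fitted above. This is a sketch: with an unbalanced panel, `plm` reports unit-specific values of $\theta$ rather than a single number.

```{r}
# ercomp() returns the estimated variance components and theta
ercomp(plm_re)
# summary(plm_re) prints the same quantities in its "Effects" section
```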
# Dynamic TS Structure in Panel Data Model
## Detect Time Series Structure in Panel Data
For countries that sustain steady growth, `gdpw` may be a non-stationary (trending) time series, whereas for countries that experienced economic fluctuations, `gdpw` may be mean-reverting or behave even more erratically in the long run.
```{r}
# note: the dataset has no `region` variable, so no color aesthetic here
ggplot(pdata, aes(x = year, y = gdpw, group = country)) +
  geom_line(alpha = 0.5) +
  theme_bw()
ggplot(pdata, aes(x = year, y = ln_gdpw, group = country)) +
  geom_line(alpha = 0.5) +
  theme_bw()
```
### Formal Panel Unit Root Tests
We can use statistical tests to formally detect stationarity of time series in panel data.
- `purtest()` in `plm` package
- `punitroots` package: the author no longer maintains this package
- apply `adf.test()` and `pp.test()` to each time series, and plot the distribution of p values.
```{r}
pseries <- split(pdata$gdpw, pdata$country_f)
# Illustration: PP test for a single panel/time series
pseries[[1]]
pp.test(pseries[[1]])
pp.test(pseries[[1]])[["p.value"]]
# Now we use sapply() function to loop this analysis in all the panels
pp_pval <- sapply(pseries, function(x){pp.test(x)[["p.value"]]})
adf_pval <- sapply(pseries, function(x){adf.test(x)[["p.value"]]})
# Look at the distribution of the statistical tests to assess stationarity
hist(pp_pval, breaks = 100)
hist(adf_pval, breaks = 100)
```
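The `purtest()` option listed above can be sketched as follows. The test choice and lag settings here are illustrative assumptions, and missing values or an unbalanced panel may require pre-processing before the test runs cleanly.

```{r}
# Im-Pesaran-Shin panel unit root test via plm
# (null hypothesis: all series contain a unit root)
purtest(pdata$gdpw, test = "ips", exo = "intercept", lags = "AIC", pmax = 4)
```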
We detect that many countries have non-stationary time series in the level of the dependent variable, but what about the differenced dependent variable?
```{r}
pdata$gdpw_diff <- diff(pdata$gdpw) # because we already have a pdata.frame, we can safely use plm::diff() to transform it
ggplot(pdata, aes(x = year, y = gdpw_diff, group = country)) +
  geom_line(alpha = 0.5) +
  theme_bw() # much more stationary!
pseries <- split(pdata$gdpw_diff, pdata$country)
# the first observation in each panel is NA after differencing,
# so we drop NAs with na.omit() before running the tests
pp_pval <- sapply(pseries, function(x){pp.test(na.omit(x))[["p.value"]]})
adf_pval <- sapply(pseries, function(x){adf.test(na.omit(x))[["p.value"]]})
hist(pp_pval, breaks = 100)
hist(adf_pval, breaks = 100)
```
## Dynamic TS structure in FE model
We will create lags using the `plm::lag` method on the `pdata.frame` object. Notice that you can perform the same operation with `dplyr::lag()` on a tibble grouped with `group_by()`.
```{r}
# create lags using pdata class
pdata$gdpw_diff_lag1 <- lag(pdata$gdpw_diff)
pdata$gdpw_diff_lag2 <- lag(pdata$gdpw_diff, 2)
# fixed effect ARIMA(1, 1, 0); cannot incorporate MA structure
fe_ar1 <- plm(
gdpw_diff ~ gdpw_diff_lag1 + reg + edt,
data = pdata,
model = "within",
effect = "individual"
)
# fixed effect ARIMA(2, 1, 0)
fe_ar2 <- plm(
gdpw_diff ~ gdpw_diff_lag1 + gdpw_diff_lag2 + reg + edt,
data = pdata,
model = "within",
effect = "individual"
)
# Breusch–Godfrey Test for Panel Models
# null means no serial correlation
pbgtest(fe_ar1) # there is remaining serial correlation
pbgtest(fe_ar2) # there is no serial correlation for AR(2) model
```
## Dynamic TS structure in RE model
In contrast to `plm`, `lme` estimates the panel model by `ML` or `REML`, and it allows the specification of moving average processes.
```{r}
# random effects ARIMA(1, 1, 0)
# note: the time variable must be an integer, not a character or factor.
re_arima <- lme(
fixed = gdpw_diff ~ reg + edt + oil,
random = ~ 1 | country,
correlation = corARMA(
# year nested in country
form = ~ year | country,
p = 1, # AR(p)
q = 0 # MA(q)
),
data = pdata,
na.action = na.omit
)
summary(re_arima)
```
### FE vs. lagged dependent variable (LDV)
There is a tradeoff when we incorporate both FE and a lagged dependent variable in one model.
The estimate of $\phi$ is biased downward in a dynamic FE model, though the bias vanishes as $T$ increases. This is known as *Nickell bias*.
To obtain an unbiased estimate with small $T$, we may choose a dynamic panel data estimator, or prefer either the FE or the LDV model. If we care about causal inference and want to use a difference-in-differences design (suppose it is a simple $2\times 2$ DID) to estimate a treatment effect, then we need to think about the underlying causal identification assumption:
- FE: conditioning on **time-invariant** unobservable individual-level confounder $\alpha_i$, $x_{it}$ and $y_{it}$ are independent.
- LDV: conditioning on **time-varying** unobservable individual-level confounder $y_{i,t-1}$, $x_{it}$ and $y_{it}$ are independent.
FE and LDV have a very useful bracketing property (Angrist and Pischke 2009; [Ding and Li 2019](https://www.cambridge.org/core/journals/political-analysis/article/bracketing-relationship-between-differenceindifferences-and-laggeddependentvariable-adjustment/2FDDA72EBC5979561F1D0414329F003E)):
- If LDV is correct, but we mistakenly use FE, then the estimate of treatment effect will be biased upward.
- If FE is correct, but we mistakenly use LDV, then the estimate of treatment effect will be biased downward.
- We can think of FE and LDV as bounding the causal effect of interest.
- A better way is to always check the robustness of our findings using alternative identifying assumptions.
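A minimal sketch of the bracketing idea, refitting static FE and LDV versions of the growth model. The lagged-level variable name is ours, and this is purely illustrative (no causal design is claimed):

```{r}
# FE specification: condition on time-invariant unit heterogeneity
fe_spec <- plm(ln_gdpw ~ reg + edt, data = pdata,
               model = "within", effect = "individual")
# LDV specification: condition on the lagged outcome instead
pdata$ln_gdpw_lag1 <- lag(pdata$ln_gdpw) # plm::lag on a pdata.frame
ldv_spec <- plm(ln_gdpw ~ ln_gdpw_lag1 + reg + edt, data = pdata,
                model = "pooling")
# Under the bracketing property, the two `reg` estimates bound the effect
c(FE = coef(fe_spec)["reg"], LDV = coef(ldv_spec)["reg"])
```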
Recent literature attempts to go beyond the additive assumption on unobservable confounders: allowing the unit-level and time effects to interact lets us account for time-invariant and time-varying unit-level confounders at the same time, for instance by decomposing the matrix of error terms into several latent factors (e.g., via PCA) (Bai 2009; Xu 2017).
# Counterfactual Forecasting
## Counterfactual Forecasting with RE Models
- Extract model information for simulation, such as coefficients and variance-covariance matrix.
- In the `nlme` package, we use `fixed.effects()` or `fixef()` to extract coefficients that are fixed across all groups. (Note that "fixed effect" and "random effect" carry a different meaning here.)
- You can extract group-varying coefficients or intercepts using `random.effects()` or `ranef()`.
- Simply using `coefficients()` or `coef()` will combine the "fixed effect" and "random effect" parts for each group.
- Simulate parameters from multivariate Normal distribution
- Create hypothetical scenarios
- Use simulated parameters and hypothetical scenarios to calculate quantity of interest, such as expected values, first difference, relative risk, etc.
```{r}
sims <- 1000
simbetas <- mvrnorm(sims,
fixef(re_arima),
vcov(re_arima))
# Make matrix of hypothetical x's
formula <- gdpw_diff ~ reg + edt + oil
periods.out <- 50
xhyp <- cfMake(formula, pdata, periods.out)
for (i in 1:periods.out)
xhyp <- cfChange(xhyp, "edt", x = mean(pdata$edt, na.rm=TRUE) + sd(pdata$edt, na.rm=TRUE), scen=i) # we assume a permanent increase in education (+ 1 SD) over the next 50 years
phi <- 0.25 # approximate AR(1) coefficient from summary(re_arima)
lagY <- mean(pdata$gdpw_diff, na.rm = TRUE) # Hypothetical previous change in Y for simulation
initialY <- mean(pdata$gdpw, na.rm = TRUE) # Hypothetical initial level of Y for simulation
```
```{r}
# Simulate expected values of Y (on original level scale)
# out to periods.out given hypothetical future values of X,
# initial lags of the change in Y, and an initial level of Y
sim_evre <- ldvsimev(xhyp, # The matrix of hypothetical x's
simbetas, # The matrix of simulated betas
ci=0.95, # Desired confidence interval
constant=1, # Column containing the constant
# set to NA for no constant
phi=phi, # estimated AR parameters; length must match lagY
lagY=lagY, # lags of y, most recent last
transform="diff", # "log" to undo log transformation,
                                       # "diff" to undo first differencing
# "difflog" to do both
initialY=initialY # for differenced models, the lag of the level of y
)
# Simulate first differences in Y (on original level scale)
# out to periods.out given hypothetical future values of X, X0,
# and initial lags of the change in Y
sim_fdre <- ldvsimfd(xhyp, # The matrix of hypothetical x's
simbetas, # The matrix of simulated betas
ci=0.95, # Desired confidence interval
constant=1, # Column containing the constant
# set to NA for no constant
phi=phi, # estimated AR parameters; length must match lagY
lagY=lagY, # lags of y, most recent last
transform="diff" # "log" to undo log transformation,
                                       # "diff" to undo first differencing
# "difflog" to do both
)
```
Now plot with base R.
```{r}
plot.new()
par(usr=c(1,50,-15000,15000))
axis(1,at=seq(1,50,10))
axis(2,at=seq(-15000,15000,5000))
title(xlab="Time",ylab="Expected change in real GDP per worker (1985 intl $)",main="Random effects ARIMA(1,1,0)")
# Make the x-coord of a confidence envelope polygon
xpoly <- c(1:periods.out,
rev(1:periods.out),
1)
# Make the y-coord of a confidence envelope polygon
ypoly <- c(sim_fdre$lower,
rev(sim_fdre$upper),
sim_fdre$lower[1])
# Choose the color of the polygon
col <- "lightblue"
# Plot the polygon first, before the points & lines
polygon(x=xpoly,
y=ypoly,
col=col,
border=FALSE
)
# Plot the fitted line
lines(x=1:periods.out,y=sim_fdre$pe,col="darkblue")
lines(x=c(0,50),y=c(0,0),lty="solid")
```
## Counterfactual Forecasting with FE Model
- The general steps are similar to counterfactual forecasting with RE model.
- We can use `fixef()` to extract group-specific intercepts.
- Sometimes we may prefer using panel-corrected standard error as variance-covariance matrix for parameter simulation.
```{r}
sims <- 1000
simparam <- mvrnorm(sims, coef(fe_ar1), vcovBK(fe_ar1))
# Pull off the simulated lag coefficient
simphi <- simparam[,1]
# Put together the "constant" term (avg of the FEs, or a specific FE if you like)
# with the rest of the regressors
simbetas <- cbind(rep(mean(fixef(fe_ar1)), sims), simparam[,2:ncol(simparam)])
# Make matrix of hypothetical x's: drop OIL!
# and now we need hypothetical countries!
# and no intercept!
# Note the formula below is a little different; we've removed the lag
# and will let the simulator handle lag structure
formula <- gdpw_diff ~ reg + edt
periods.out <- 50
xhyp <- cfMake(formula, pdata, periods.out)
for (i in 1:periods.out)
xhyp <- cfChange(xhyp, "edt", x=mean(pdata$edt,na.rm=TRUE)+sd(pdata$edt,na.rm=TRUE), scen=i)
phi <- mean(simphi)
lagY <- mean(pdata$gdpw_diff, na.rm = TRUE) # Hypothetical previous change in Y for simulation
initialY <- mean(pdata$gdpw, na.rm = TRUE) # Hypothetical initial level of Y for simulation
```
```{r}
# Simulate expected values of Y (on original level scale)
# out to periods.out given hypothetical future values of X,
# initial lags of the change in Y, and an initial level of Y
sim_evfe <- ldvsimev(xhyp, # The matrix of hypothetical x's
simbetas, # The matrix of simulated betas
ci=0.95, # Desired confidence interval
constant=NA, # NA indicates no constant!
phi=phi, # estimated AR parameters; length must match lagY
lagY=lagY, # lags of y, most recent last
transform="diff", # "log" to undo log transformation,
                                       # "diff" to undo first differencing
# "difflog" to do both
initialY=initialY # for differenced models, the lag of the level of y
)
# Simulate first differences in Y (on original level scale)
# out to periods.out given hypothetical future values of X, X0,
# and initial lags of the change in Y
sim_fdfe <- ldvsimfd(xhyp, # The matrix of hypothetical x's
simbetas, # The matrix of simulated betas
ci=0.95, # Desired confidence interval
constant=NA, # Column containing the constant
# set to NA for no constant
phi=phi, # estimated AR parameters; length must match lagY
lagY=lagY, # lags of y, most recent last
transform="diff" # "log" to undo log transformation,
                                       # "diff" to undo first differencing
# "difflog" to do both
)
```
Now plot with base R.
```{r}
plot.new()
par(usr=c(1,50,-15000,15000))
axis(1,at=seq(1,50,10))
axis(2,at=seq(-15000,15000,5000))
title(xlab="Time",ylab="Expected change in real GDP per worker (1985 intl $)",main="Fixed effects ARIMA(1,1,0), pcse")
# Make the x-coord of a confidence envelope polygon
xpoly <- c(1:periods.out,
rev(1:periods.out),
1)
# Make the y-coord of a confidence envelope polygon
ypoly <- c(sim_fdfe$lower,
rev(sim_fdfe$upper),
sim_fdfe$lower[1])
# Choose the color of the polygon
col <- "lightgreen"
# Plot the polygon first, before the points & lines
polygon(x=xpoly,
y=ypoly,
col=col,
border=FALSE
)
# Plot the fitted line
lines(x=1:periods.out,y=sim_fdfe$pe,col="darkgreen")
lines(x=c(0,50),y=c(0,0),lty="solid")
```