Democracy data

In today’s lab, we will be working with the following dataset on political regimes, resources endowments and economic growth around the world.

Variable Description1
britcol Dummy variable coded 1 for every year in countries that were British colonies any time after 1918, 0 otherwise.
cath Percentage of Catholics in the population.
civilib Civil liberty. This variable can take values from 1 (least free) to 7 (most free).
edt Cumulative years of education of the average member of the labor force.
elf60 Index of ethnolinguistic fractionalization, 1960.
gdpw Real GDP per worker, 1985 international prices.
moslem Percentage of Moslems in the population.
newc Dummy variable coded 1 for every year in countries that became independent after 1945, 0 otherwise.
oil Dummy variable coded 1 if the average ratio of fuel exports to total exports in 1984-86 exceeded 50%, 0 otherwise.
pollib Political liberty. This variable can take values from 1 (least free) to 7 (most free).
reg Dummy variable coded 1 for dictatorships and 0 for democracies. Transition years are coded as the regime that emerges in that year. For instance, there was a transition from democracy to dictatorship in Argentina in 1955. In that year, reg=1.
stra The sum of past transitions to authoritarianism (as defined by reg) in a country. If a country experienced one or more transitions to authoritarianism before 1950, stra was coded 1 in 1950.
# Load Democracy data
# Variable names:
# COUNTRY   YEAR    BRITCOL     CATH
# CIVLIB    EDT ELF60   GDPW    MOSLEM
# NEWC      OIL POLLIB  REG STRA
# 
# data <- read.csv("data/democracy.csv", 
#                  header = TRUE, 
#                  na.strings = ".")

data <- read.csv("data/democracy.csv", 
                 header = TRUE)

colnames(data)
##  [1] "COUNTRY" "CTYNAME" "REGION"  "YEAR"    "BRITCOL" "CATH"    "CIVLIB" 
##  [8] "EDT"     "ELF60"   "GDPW"    "MOSLEM"  "NEWC"    "OIL"     "POLLIB" 
## [15] "REG"     "STRA"
colnames(data) <- colnames(data)  %>%  tolower() # change variable names to lower case
colnames(data)
##  [1] "country" "ctyname" "region"  "year"    "britcol" "cath"    "civlib" 
##  [8] "edt"     "elf60"   "gdpw"    "moslem"  "newc"    "oil"     "pollib" 
## [15] "reg"     "stra"
# check out missing variables

library(questionr)

freq.na(data)
##         missing  %
## civlib     1727 42
## pollib     1727 42
## edt        1226 30
## elf60       542 13
## country       0  0
## ctyname       0  0
## region        0  0
## year          0  0
## britcol       0  0
## cath          0  0
## gdpw          0  0
## moslem        0  0
## newc          0  0
## oil           0  0
## reg           0  0
## stra          0  0
# Check sample's N and T
data$country %>% unique()  %>%  length()
## [1] 135
data$year  %>%  unique() %>% length()
## [1] 40
# so it is an N=135, T=40 panel data.


## some data wrangling

data <- data %>% 
  mutate(
    ln_gdpw = log(gdpw),
    country_f = as.factor(country),
    year_f = as.factor(year)
  )


# Remember to perform panel exploratory analysis before fitting models! Refer to lab 1.

An Overview of Panel Data Models

The full flexibility model:

\[\Delta^{di}y_{it} = \alpha_i + X_{it}\beta_i + \sum^{P_i}_{p=1}y_{i,t-p}\phi_i + \sum^{Q_i}_{q=1}y_{i,t-q}\rho_i + \epsilon_{it}\]

where intercept \(\alpha_i\), slope \(\beta_i\), ARIMA order \(P_i, D_i, Q_i\) and its coefficients \(\phi_i\), \(\rho_i\) vary and allows panel heteroskedasticity \(\sigma^2_i\). We can put unit-specific trend \(t\theta_i\) and seasonality \(\kappa_{k,i}\) inside this model too.

Fixed Effects

What is Fixed Effects?

In simple terms, “fixed effects” refer to unit-specific intercepts. In other words, each unit or case \(i\) has its own intercept \(\alpha_i\) instead of pooling all cases into the same intercept. In practice, these case-specific intercepts introduce adjustments or controls for the mean outcome at the group level. This is akin to removing the mean from your data at the group level and conducting a simple linear regression.

However, the most common rationale for employing fixed effects estimators is their ability to eliminate time-invariant group-level unobservables. This arises from controlling for each group’s mean level of the outcome variable. However, it is essential to recognize that an underlying assumption is that these time-invariant unobservables truly never change, not only in their substance but also in their functional interpretation.

For instance, is Trump’s MAGA conservatism perceived the same as George Bush’s conservatism? If the answer is strictly no, then conservatism in the last 15 years is actually time-variant, and no fixed effects will shield you from omitted variable bias or measurement error. The more aggregate is your unit of analysis (individuals < neighborhoods < cities < regions < countries) or the longer is your time window sample \(T\), the more implausible is the assumption of time-invariant unobservables.

Group-level heterogeneity

Suppose there is unobserved group-level or individual-level heterogeneity \(u_i\) in the true data generating process (DGP) (such group-level or unit-level heterogeneity that does not change over individual or time periods):

\[ y_{it} = \alpha + \beta^\prime x_{it} + \color{red}{\delta^\prime u_{i}} + \varepsilon_{it} \]

Because it is unobserved, we cannot measure it. Now set \(\alpha_i = \alpha_i^* = \alpha + \delta^\prime u_i\), we can remove this unobservable group-level heterogeneity and rewrite the above equation as follows:

\[ y_{it} = \alpha_i + \beta^\prime x_{it} + \varepsilon_{it} \]

The FE model can control for omitted unobservable variables if we assume that these unobservables are constant over time.

  • if we add \(\alpha_i\): control for time-invariant unit-level heterogeneity, such as some features of a country’s culture.
    • A model with only unit fixed effects, will be equivalent a within-estimator.
  • if we add \(\tau_t\): control for common shocks at each period, such as global economic conditions each year.
    • A model with only period fixed effects, will be equivalent a between-estimator.
  • Of course we can also control for omitted unobservables at other levels, such as group, cohort, etc.

However, FE models come at a large cost.

  • We cannot add time-invariant variables into FE models, otherwise it will cause perfect collinearity.
  • FE estimates can be inefficient and exacerbate small sample bias.
  • Incidental parameter problem: suppose we include individual-level FE \(\alpha_i\), the estimates is consistent as \(T \rightarrow \infty\), but not as \(N \rightarrow \infty\).

There are 3 ways to estimate this FE model: 1. Within Estimator 2. Least Square Dummy Variable (LSDV) Estimator 3. First Difference (FD) Estimator

How to Check Time- or Unit-invariant Variables?

pdata <- pdata.frame(data, index = c("country_f", "year_f"))
head(pdata) # it will create an id for each observation
##        country ctyname region year britcol cath civlib   edt elf60 gdpw moslem
## 1-1962       1 Algeria Africa 1962       0  0.5     NA 1.160  0.43 5012   99.1
## 1-1963       1 Algeria Africa 1963       0  0.5     NA 1.250  0.43 6083   99.1
## 1-1964       1 Algeria Africa 1964       0  0.5     NA 1.345  0.43 6502   99.1
## 1-1965       1 Algeria Africa 1965       0  0.5     NA 1.450  0.43 6620   99.1
## 1-1966       1 Algeria Africa 1966       0  0.5     NA 1.560  0.43 6612   99.1
## 1-1967       1 Algeria Africa 1967       0  0.5     NA 1.675  0.43 6982   99.1
##        newc oil pollib reg stra  ln_gdpw country_f year_f
## 1-1962    1   1     NA   0    0 8.519590         1   1962
## 1-1963    1   1     NA   0    0 8.713253         1   1963
## 1-1964    1   1     NA   0    0 8.779865         1   1964
## 1-1965    1   1     NA   0    0 8.797851         1   1965
## 1-1966    1   1     NA   0    0 8.796641         1   1966
## 1-1967    1   1     NA   0    0 8.851091         1   1967
pvar(pdata)
## no time variation:       country ctyname region britcol cath elf60 moslem newc oil country_f 
## no individual variation: year edt elf60 year_f 
## all NA in time dimension for at least one individual:  civlib edt elf60 pollib 
## all NA in ind. dimension for at least one time period: civlib edt elf60 pollib
# or
pvar(data, index = c("country", "year"))
## no time variation:       country ctyname region britcol cath elf60 moslem newc oil country_f 
## no individual variation: year edt elf60 year_f 
## all NA in time dimension for at least one individual:  civlib edt elf60 pollib 
## all NA in ind. dimension for at least one time period: civlib edt elf60 pollib

Within Estimator

We take each individuals’ deviation from group means (average over \(t\) within each group/unit \(i\)). Because the observable unit-level heterogeneity is time-invariant, its value should not changed after averaging over time within each unit. Thus, we can avoid estimating the varying intercepts.

\[ \begin{aligned} \tilde{y}_{it} =& y_{it} - \bar{y}_{i} \\ =& \color{red}{(\alpha_i - \bar{\alpha}_i)} + \beta^\prime \underbrace{(x_{it} - \bar{x}_{i})}_{\tilde{x}_{it}} + \underbrace{(\varepsilon_{it} - \bar{\varepsilon}_i)}_{\tilde{\varepsilon}_{it}} \\ =& \beta^\prime \tilde{x}_{it} + \tilde{\varepsilon}_{it} \end{aligned} \]

Like OLS, if not GM assumption is violated, the within estimator \(\hat{\beta}_W\) is asymptotically normal and efficient:

\[ \widehat{\beta}_{W} \stackrel{\text { approx. }}{\sim} \mathcal{N}\left(\beta, \sigma^{2}\left(\tilde{\mathbf{X}}^{\top} \tilde{\mathbf{X}}\right)^{-1}\right) \]

However, the estimation of standard error is a little tricky. When we demean all the variables, we already used \(N\) degrees of freedom for \(N\) units.

\[ \widehat{\sigma}^{2}=\frac{1}{N T-N-K} \sum_{i=1}^{N} \sum_{t=1}^{T}{\widehat{\tilde{\varepsilon}}}_{i}^{2} \]

Hence, if we use a canned OLS function for \(\hat{\beta}_W\), the standard error is not correct, and we need to adjust it manually.

Within-FE Model in R

# two-way within estimator using `plm`

plm_within <- plm(
  #1st, the formula. if we include `oil`, plm() will automatically drop it. Why?
  ln_gdpw ~ reg + edt,
  data = data,
  #2nd, you Ns and Ts indices
  index = c("country_f","year_f"),
  # what model specification? See help documentation: ?plm
  model = "within",
  # what estimator?
  effect = "individual"
)

summary(plm_within) # note that the standard error does not take serial correlation into account
## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = ln_gdpw ~ reg + edt, data = data, effect = "individual", 
##     model = "within", index = c("country_f", "year_f"))
## 
## Unbalanced Panel: n = 113, T = 5-28, N = 2900
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -0.832562 -0.095557  0.015056  0.111956  0.714304 
## 
## Coefficients:
##      Estimate Std. Error t-value Pr(>|t|)    
## reg -0.014014   0.015048 -0.9313   0.3518    
## edt  0.167823   0.003712 45.2107   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    158.01
## Residual Sum of Squares: 91.097
## R-Squared:      0.42346
## Adj. R-Squared: 0.39986
## F-statistic: 1022.77 on 2 and 2785 DF, p-value: < 2.22e-16
## Adjusting uncertainty (standard errors)

# Arellano (1987), these are also commonly known as "cluster standard errors"
coeftest(plm_within, vcov. = vcovHC(plm_within, method = "arellano"))
## 
## t test of coefficients:
## 
##      Estimate Std. Error t value Pr(>|t|)    
## reg -0.014014   0.033891 -0.4135   0.6793    
## edt  0.167823   0.016088 10.4313   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Beck and Katz (1995) panel corrected standard errors
coeftest(plm_within, vcov. = vcovBK(plm_within))
## 
## t test of coefficients:
## 
##      Estimate Std. Error t value Pr(>|t|)    
## reg -0.014014   0.038906 -0.3602   0.7187    
## edt  0.167823   0.014056 11.9398   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Driscoll and Kraay panel corrected standard errors
coeftest(plm_within, vcov. = vcovSCC(plm_within))
## 
## t test of coefficients:
## 
##      Estimate Std. Error t value  Pr(>|t|)    
## reg -0.014014   0.026186 -0.5352    0.5926    
## edt  0.167823   0.024322  6.9001 6.408e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# an alternative way is to use `fixest`; it is faster when we have a large number of FEs
fixest_model <- feols(
  ln_gdpw ~ reg + edt | country + year,
  data = data,
  cluster = ~ country + year
)

summary(fixest_model)
## OLS estimation, Dep. Var.: ln_gdpw
## Observations: 2,900
## Fixed-effects: country: 113,  year: 28
## Standard-errors: Clustered (country & year) 
##     Estimate Std. Error  t value Pr(>|t|) 
## reg 0.027458   0.028661 0.958037  0.34654 
## edt 0.021767   0.020583 1.057552  0.29963 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 0.146007     Adj. R2: 0.979069
##                  Within R2: 0.006949

Although within estimator ignores varying intercepts, we can actually recover them by using:

\[ \hat{\alpha}_i = \hat{y}_{it} - \hat{\beta}^\prime x_{it} \]

Sometimes it is useful for us to look at the variation between groups.

plm::fixef(plm_within)
##      1      2      3      4      5      6      7      9     12     14     15 
## 8.6922 7.1792 7.4219 7.6883 6.5901 6.5233 7.2324 6.9027 7.6231 7.8300 6.4095 
##     16     17     18     19     21     22     23     24     25     26     27 
## 8.4965 7.3238 7.2507 6.9872 7.9429 7.0042 6.4524 7.4450 7.1212 6.4010 7.1359 
##     28     29     30     31     33     34     35     37     39     40     41 
## 7.7960 7.7022 8.2290 7.0245 7.4692 6.6891 7.6043 7.6612 8.1048 7.6142 7.9845 
##     42     43     44     45     46     47     48     50     52     53     54 
## 6.5103 6.7621 8.4530 6.8853 6.5790 7.4898 7.3069 7.7311 8.3731 8.2562 7.9466 
##     55     57     58     59     60     61     62     63     64     65     66 
## 8.0132 8.4557 7.1631 7.8058 7.4805 8.8477 8.2020 7.9713 8.8951 8.4059 8.6039 
##     67     68     69     70     71     72     73     74     76     77     78 
## 7.9171 8.5114 8.2335 8.3645 8.0146 7.5932 7.5457 8.1486 8.1386 9.3620 7.6592 
##     79     80     81     82     83     84     85     86     87     89     91 
## 6.6462 7.1602 7.2470 9.2366 9.4700 8.2150 7.8672 8.5831 7.5581 7.6918 6.6110 
##     92     93     94     95     96     97     98     99    101    102    103 
## 7.1852 7.7833 7.1381 8.3698 6.9976 8.6090 7.8404 7.1839 8.7655 8.4318 7.6233 
##    104    105    106    107    108    110    111    112    113    114    116 
## 6.8035 8.5153 8.0884 8.8824 8.3497 8.3088 7.9080 8.4210 8.1295 8.8111 7.9799 
##    117    118    119    120    122    123    124    125    126    128    129 
## 8.6501 8.3967 7.4098 8.2413 8.7539 8.4339 8.8509 8.0024 8.2790 8.0166 8.3293 
##    130    131    132 
## 7.9694 8.1688 7.9530

LSDV Estimator

Another option to directly estimate group-specific intercepts is to use the least squares dummy variables (LSDV) estimator, i.e., OLS with \(N\) group dummies excluding the global intercept.

The group dummies are often colloquially called “fixed effects.” For large \(T\), it should be very similar to FE estimator. But it is not a good idea for very small \(T\): estimates of \(\alpha_i\) will be poor due to small \(T\) within each group.

# LSDV estimator using `lm`
plm_lsdv <- plm(
  ln_gdpw ~ 0 + reg + edt + country_f, # optional: subtract 1 or set 0 to drop global intercept
  index = c("country_f", "year_f"),
  effect = "individual",
  model = "pooling",
  data = data
)

summary(plm_lsdv)
## Pooling Model
## 
## Call:
## plm(formula = ln_gdpw ~ 0 + reg + edt + country_f, data = data, 
##     effect = "individual", model = "pooling", index = c("country_f", 
##         "year_f"))
## 
## Unbalanced Panel: n = 113, T = 5-28, N = 2900
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -0.832562 -0.095557  0.015056  0.111956  0.714304 
## 
## Coefficients:
##               Estimate Std. Error  t-value Pr(>|t|)    
## reg          -0.014014   0.015048  -0.9313   0.3518    
## edt           0.167823   0.003712  45.2107   <2e-16 ***
## country_f1    8.692165   0.037078 234.4285   <2e-16 ***
## country_f2    7.179217   0.050311 142.6968   <2e-16 ***
## country_f3    7.421886   0.034538 214.8875   <2e-16 ***
## country_f4    7.688276   0.041769 184.0689   <2e-16 ***
## country_f5    6.590053   0.034200 192.6892   <2e-16 ***
## country_f6    6.523307   0.035696 182.7472   <2e-16 ***
## country_f7    7.232441   0.036072 200.5009   <2e-16 ***
## country_f9    6.902693   0.035198 196.1126   <2e-16 ***
## country_f12   7.623119   0.037553 202.9983   <2e-16 ***
## country_f14   7.830041   0.036720 213.2343   <2e-16 ***
## country_f15   6.409459   0.034835 183.9944   <2e-16 ***
## country_f16   8.496536   0.036453 233.0786   <2e-16 ***
## country_f17   7.323790   0.039538 185.2365   <2e-16 ***
## country_f18   7.250721   0.035904 201.9445   <2e-16 ***
## country_f19   6.987238   0.040469 172.6561   <2e-16 ***
## country_f21   7.942900   0.035070 226.4880   <2e-16 ***
## country_f22   7.004193   0.037813 185.2344   <2e-16 ***
## country_f23   6.452393   0.044311 145.6147   <2e-16 ***
## country_f24   7.445021   0.036058 206.4743   <2e-16 ***
## country_f25   7.121235   0.036557 194.7974   <2e-16 ***
## country_f26   6.401030   0.038870 164.6782   <2e-16 ***
## country_f27   7.135904   0.034896 204.4897   <2e-16 ***
## country_f28   7.795955   0.034840 223.7661   <2e-16 ***
## country_f29   7.702185   0.049563 155.4013   <2e-16 ***
## country_f30   8.229040   0.034960 235.3836   <2e-16 ***
## country_f31   7.024504   0.050963 137.8366   <2e-16 ***
## country_f33   7.469226   0.034981 213.5221   <2e-16 ***
## country_f34   6.689090   0.036526 183.1319   <2e-16 ***
## country_f35   7.604345   0.035057 216.9155   <2e-16 ***
## country_f37   7.661224   0.036000 212.8123   <2e-16 ***
## country_f39   8.104824   0.041534 195.1383   <2e-16 ***
## country_f40   7.614208   0.044193 172.2949   <2e-16 ***
## country_f41   7.984547   0.045110 177.0034   <2e-16 ***
## country_f42   6.510265   0.035187 185.0183   <2e-16 ***
## country_f43   6.762131   0.035903 188.3450   <2e-16 ***
## country_f44   8.453048   0.035486 238.2109   <2e-16 ***
## country_f45   6.885345   0.036245 189.9670   <2e-16 ***
## country_f46   6.579019   0.036460 180.4430   <2e-16 ***
## country_f47   7.489796   0.038509 194.4947   <2e-16 ***
## country_f48   7.306941   0.040315 181.2472   <2e-16 ***
## country_f50   7.731116   0.056431 137.0017   <2e-16 ***
## country_f52   8.373147   0.054087 154.8088   <2e-16 ***
## country_f53   8.256219   0.041741 197.7957   <2e-16 ***
## country_f54   7.946584   0.041003 193.8045   <2e-16 ***
## country_f55   8.013156   0.036821 217.6245   <2e-16 ***
## country_f57   8.455746   0.036999 228.5413   <2e-16 ***
## country_f58   7.163093   0.035763 200.2909   <2e-16 ***
## country_f59   7.805819   0.036596 213.2976   <2e-16 ***
## country_f60   7.480513   0.046400 161.2162   <2e-16 ***
## country_f61   8.847664   0.037756 234.3349   <2e-16 ***
## country_f62   8.202005   0.037018 221.5674   <2e-16 ***
## country_f63   7.971304   0.040845 195.1576   <2e-16 ***
## country_f64   8.895075   0.047209 188.4191   <2e-16 ***
## country_f65   8.405851   0.056110 149.8097   <2e-16 ***
## country_f66   8.603877   0.040941 210.1516   <2e-16 ***
## country_f67   7.917066   0.036889 214.6206   <2e-16 ***
## country_f68   8.511364   0.036764 231.5150   <2e-16 ***
## country_f69   8.233487   0.041333 199.2007   <2e-16 ***
## country_f70   8.364510   0.039341 212.6174   <2e-16 ***
## country_f71   8.014597   0.039677 201.9955   <2e-16 ***
## country_f72   7.593192   0.046809 162.2158   <2e-16 ***
## country_f73   7.545740   0.040401 186.7694   <2e-16 ***
## country_f74   8.148604   0.040606 200.6740   <2e-16 ***
## country_f76   8.138626   0.043015 189.2046   <2e-16 ***
## country_f77   9.361968   0.040101 233.4590   <2e-16 ***
## country_f78   7.659247   0.045408 168.6744   <2e-16 ***
## country_f79   6.646220   0.037217 178.5787   <2e-16 ***
## country_f80   7.160237   0.038725 184.8993   <2e-16 ***
## country_f81   7.246958   0.037038 195.6620   <2e-16 ***
## country_f82   9.236566   0.034915 264.5418   <2e-16 ***
## country_f83   9.470011   0.034855 271.6991   <2e-16 ***
## country_f84   8.214967   0.049984 164.3510   <2e-16 ***
## country_f85   7.867225   0.049690 158.3271   <2e-16 ***
## country_f86   8.583122   0.036291 236.5051   <2e-16 ***
## country_f87   7.558124   0.041461 182.2941   <2e-16 ***
## country_f89   7.691750   0.042721 180.0478   <2e-16 ***
## country_f91   6.611022   0.034733 190.3388   <2e-16 ***
## country_f92   7.185249   0.036577 196.4415   <2e-16 ***
## country_f93   7.783283   0.034854 223.3100   <2e-16 ***
## country_f94   7.138141   0.043016 165.9400   <2e-16 ***
## country_f95   8.369775   0.044039 190.0538   <2e-16 ***
## country_f96   6.997565   0.045182 154.8742   <2e-16 ***
## country_f97   8.609031   0.038564 223.2399   <2e-16 ***
## country_f98   7.840400   0.041052 190.9855   <2e-16 ***
## country_f99   7.183878   0.039623 181.3068   <2e-16 ***
## country_f101  8.765478   0.043786 200.1867   <2e-16 ***
## country_f102  8.431750   0.050976 165.4069   <2e-16 ***
## country_f103  7.623339   0.087233  87.3910   <2e-16 ***
## country_f104  6.803535   0.053491 127.1901   <2e-16 ***
## country_f105  8.515306   0.048098 177.0409   <2e-16 ***
## country_f106  8.088411   0.052565 153.8752   <2e-16 ***
## country_f107  8.882387   0.044449 199.8338   <2e-16 ***
## country_f108  8.349650   0.051757 161.3238   <2e-16 ***
## country_f110  8.308826   0.042299 196.4332   <2e-16 ***
## country_f111  7.908045   0.051903 152.3607   <2e-16 ***
## country_f112  8.420992   0.047897 175.8148   <2e-16 ***
## country_f113  8.129540   0.048559 167.4171   <2e-16 ***
## country_f114  8.811148   0.044203 199.3319   <2e-16 ***
## country_f116  7.979905   0.047038 169.6487   <2e-16 ***
## country_f117  8.650079   0.049233 175.6950   <2e-16 ***
## country_f118  8.396685   0.050589 165.9773   <2e-16 ***
## country_f119  7.409756   0.058229 127.2524   <2e-16 ***
## country_f120  8.241283   0.039397 209.1863   <2e-16 ***
## country_f122  8.753945   0.040785 214.6389   <2e-16 ***
## country_f123  8.433890   0.051287 164.4455   <2e-16 ***
## country_f124  8.850903   0.047364 186.8717   <2e-16 ***
## country_f125  8.002405   0.038647 207.0622   <2e-16 ***
## country_f126  8.279025   0.050960 162.4606   <2e-16 ***
## country_f128  8.016642   0.041468 193.3208   <2e-16 ***
## country_f129  8.329265   0.054070 154.0458   <2e-16 ***
## country_f130  7.969381   0.053386 149.2776   <2e-16 ***
## country_f131  8.168812   0.056465 144.6705   <2e-16 ***
## country_f132  7.952992   0.056730 140.1899   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    3104.6
## Residual Sum of Squares: 91.097
## R-Squared:      0.97066
## Adj. R-Squared: 0.96946
## F-statistic: 58478.9 on 115 and 2785 DF, p-value: < 2.22e-16
coeftest(plm_lsdv, vcov. = vcovHC(plm_lsdv, method = "arellano"))[1:2,] 
##        Estimate Std. Error    t value     Pr(>|t|)
## reg -0.01401365 0.03389072 -0.4134951 6.792757e-01
## edt  0.16782325 0.01608840 10.4313214 5.123485e-25
# you can also use lm() to implement LSDV estimator, but it is difficult to adjust for standard errors

# You can estimate the same model using lm, as follows:
summary(lm(ln_gdpw ~ country_f + year_f + reg + edt, data=data))
## 
## Call:
## lm(formula = ln_gdpw ~ country_f + year_f + reg + edt, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.64057 -0.07997  0.00780  0.07941  0.59769 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   8.730613   0.036399 239.857  < 2e-16 ***
## country_f2   -1.917900   0.053018 -36.175  < 2e-16 ***
## country_f3   -1.473490   0.041683 -35.350  < 2e-16 ***
## country_f4   -1.056680   0.044576 -23.705  < 2e-16 ***
## country_f5   -2.453336   0.043279 -56.687  < 2e-16 ***
## country_f6   -2.435972   0.042857 -56.839  < 2e-16 ***
## country_f7   -1.499153   0.041183 -36.402  < 2e-16 ***
## country_f9   -1.995622   0.041999 -47.516  < 2e-16 ***
## country_f12  -0.929167   0.041505 -22.387  < 2e-16 ***
## country_f14  -0.732719   0.041046 -17.851  < 2e-16 ***
## country_f15  -2.619545   0.043429 -60.318  < 2e-16 ***
## country_f16  -0.181423   0.041144  -4.409 1.08e-05 ***
## country_f17  -1.726617   0.045976 -37.555  < 2e-16 ***
## country_f18  -1.417699   0.040828 -34.724  < 2e-16 ***
## country_f19  -2.109640   0.047064 -44.825  < 2e-16 ***
## country_f21  -0.992496   0.042319 -23.453  < 2e-16 ***
## country_f22  -1.691828   0.041942 -40.337  < 2e-16 ***
## country_f23  -1.990957   0.045851 -43.422  < 2e-16 ***
## country_f24  -1.398031   0.041994 -33.291  < 2e-16 ***
## country_f25  -1.543200   0.041152 -37.500  < 2e-16 ***
## country_f26  -2.262115   0.042419 -53.328  < 2e-16 ***
## country_f27  -1.869869   0.043059 -43.426  < 2e-16 ***
## country_f28  -1.248226   0.043533 -28.673  < 2e-16 ***
## country_f29  -0.539974   0.050557 -10.681  < 2e-16 ***
## country_f30  -0.572730   0.041078 -13.943  < 2e-16 ***
## country_f31  -1.871067   0.051378 -36.418  < 2e-16 ***
## country_f33  -1.423227   0.041805 -34.044  < 2e-16 ***
## country_f34  -2.084951   0.041651 -50.057  < 2e-16 ***
## country_f35  -1.335325   0.042359 -31.524  < 2e-16 ***
## country_f37  -1.244580   0.042403 -29.351  < 2e-16 ***
## country_f39  -0.122955   0.045250  -2.717 0.006625 ** 
## country_f40  -1.407871   0.048167 -29.229  < 2e-16 ***
## country_f41  -0.616258   0.046216 -13.334  < 2e-16 ***
## country_f42  -2.390835   0.042021 -56.896  < 2e-16 ***
## country_f43  -1.995645   0.041242 -48.389  < 2e-16 ***
## country_f44  -0.379319   0.041552  -9.129  < 2e-16 ***
## country_f45  -1.963885   0.042025 -46.732  < 2e-16 ***
## country_f46  -2.012405   0.040934 -49.162  < 2e-16 ***
## country_f47  -1.220878   0.042391 -28.800  < 2e-16 ***
## country_f48  -1.285795   0.043111 -29.825  < 2e-16 ***
## country_f50  -0.006703   0.061292  -0.109 0.912929    
## country_f52   0.808902   0.062404  12.962  < 2e-16 ***
## country_f53  -0.113515   0.044777  -2.535 0.011296 *  
## country_f54  -0.472017   0.044150 -10.691  < 2e-16 ***
## country_f55  -0.547391   0.041114 -13.314  < 2e-16 ***
## country_f57  -0.307334   0.041925  -7.331 3.00e-13 ***
## country_f58  -1.617781   0.041315 -39.157  < 2e-16 ***
## country_f59  -0.820488   0.041106 -19.960  < 2e-16 ***
## country_f60  -0.632048   0.049739 -12.707  < 2e-16 ***
## country_f61   0.387952   0.041704   9.303  < 2e-16 ***
## country_f62  -0.337781   0.041220  -8.195 3.80e-16 ***
## country_f63  -0.259760   0.044779  -5.801 7.35e-09 ***
## country_f64   0.771826   0.050320  15.338  < 2e-16 ***
## country_f65   0.949886   0.065606  14.479  < 2e-16 ***
## country_f66   0.350150   0.044747   7.825 7.17e-15 ***
## country_f67  -0.651497   0.041172 -15.824  < 2e-16 ***
## country_f68  -0.128995   0.041251  -3.127 0.001784 ** 
## country_f69   0.008814   0.045210   0.195 0.845440    
## country_f70  -0.251850   0.042843  -5.878 4.64e-09 ***
## country_f71  -0.330624   0.043386  -7.621 3.45e-14 ***
## country_f72  -0.635309   0.048574 -13.079  < 2e-16 ***
## country_f73  -0.697559   0.044301 -15.746  < 2e-16 ***
## country_f74  -0.146948   0.044286  -3.318 0.000918 ***
## country_f76   0.010711   0.047175   0.227 0.820411    
## country_f77   0.833581   0.043290  19.256  < 2e-16 ***
## country_f78  -1.105945   0.046788 -23.637  < 2e-16 ***
## country_f79  -1.939535   0.041334 -46.923  < 2e-16 ***
## country_f80  -1.539557   0.042681 -36.072  < 2e-16 ***
## country_f81  -1.359055   0.041265 -32.935  < 2e-16 ***
## country_f82   0.426294   0.041119  10.367  < 2e-16 ***
## country_f83   0.647830   0.041182  15.731  < 2e-16 ***
## country_f84   0.418555   0.056035   7.470 1.07e-13 ***
## country_f85   0.053313   0.055587   0.959 0.337601    
## country_f86  -0.027722   0.040876  -0.678 0.497704    
## country_f87  -0.610045   0.045569 -13.387  < 2e-16 ***
## country_f89  -0.390746   0.047160  -8.286  < 2e-16 ***
## country_f91  -2.241439   0.041358 -54.195  < 2e-16 ***
## country_f92  -1.693048   0.042632 -39.713  < 2e-16 ***
## country_f93  -1.062980   0.041347 -25.709  < 2e-16 ***
## country_f94  -0.939023   0.047529 -19.757  < 2e-16 ***
## country_f95   0.111158   0.046421   2.395 0.016707 *  
## country_f96  -0.995379   0.050057 -19.885  < 2e-16 ***
## country_f97   0.158108   0.042187   3.748 0.000182 ***
## country_f98  -0.355784   0.045064  -7.895 4.15e-15 ***
## country_f99  -1.131414   0.043437 -26.047  < 2e-16 ***
## country_f101  0.561476   0.047187  11.899  < 2e-16 ***
## country_f102  0.693318   0.057553  12.047  < 2e-16 ***
## country_f103 -0.350231   0.078815  -4.444 9.19e-06 ***
## country_f104 -0.737881   0.061880 -11.924  < 2e-16 ***
## country_f105  0.604345   0.053201  11.360  < 2e-16 ***
## country_f106  0.440257   0.060017   7.336 2.89e-13 ***
## country_f107  0.727325   0.048048  15.138  < 2e-16 ***
## country_f108  0.655984   0.058760  11.164  < 2e-16 ***
## country_f110  0.082554   0.045937   1.797 0.072424 .  
## country_f111 -0.220533   0.052461  -4.204 2.71e-05 ***
## country_f112  0.497444   0.052904   9.403  < 2e-16 ***
## country_f113  0.247127   0.053885   4.586 4.72e-06 ***
## country_f114  0.638162   0.047725  13.372  < 2e-16 ***
## country_f116 -0.192216   0.049565  -3.878 0.000108 ***
## country_f117  0.808797   0.054897  14.733  < 2e-16 ***
## country_f118  0.635817   0.056960  11.163  < 2e-16 ***
## country_f119 -0.436869   0.060082  -7.271 4.62e-13 ***
## country_f120 -0.127130   0.043102  -2.950 0.003209 ** 
## country_f122  0.502084   0.044633  11.249  < 2e-16 ***
## country_f123  0.713371   0.058033  12.293  < 2e-16 ***
## country_f124  0.893549   0.052122  17.143  < 2e-16 ***
## country_f125 -0.602866   0.042373 -14.228  < 2e-16 ***
## country_f126  0.539690   0.057529   9.381  < 2e-16 ***
## country_f128 -0.200672   0.045153  -4.444 9.17e-06 ***
## country_f129  0.764097   0.062377  12.250  < 2e-16 ***
## country_f130 -0.118661   0.054084  -2.194 0.028318 *  
## country_f131  0.731500   0.066171  11.055  < 2e-16 ***
## country_f132 -1.137835   0.056866 -20.009  < 2e-16 ***
## year_f1961    0.035883   0.025082   1.431 0.152652    
## year_f1962    0.079099   0.024685   3.204 0.001369 ** 
## year_f1963    0.105967   0.024648   4.299 1.77e-05 ***
## year_f1964    0.141043   0.024512   5.754 9.67e-09 ***
## year_f1965    0.164482   0.024398   6.742 1.90e-11 ***
## year_f1966    0.186794   0.024221   7.712 1.72e-14 ***
## year_f1967    0.207793   0.024290   8.555  < 2e-16 ***
## year_f1968    0.241314   0.024288   9.936  < 2e-16 ***
## year_f1969    0.282157   0.024376  11.575  < 2e-16 ***
## year_f1970    0.319016   0.024447  13.049  < 2e-16 ***
## year_f1971    0.355129   0.024406  14.551  < 2e-16 ***
## year_f1972    0.376303   0.024535  15.337  < 2e-16 ***
## year_f1973    0.404055   0.024680  16.371  < 2e-16 ***
## year_f1974    0.440244   0.024825  17.734  < 2e-16 ***
## year_f1975    0.452843   0.024904  18.183  < 2e-16 ***
## year_f1976    0.482772   0.025110  19.226  < 2e-16 ***
## year_f1977    0.504968   0.025336  19.930  < 2e-16 ***
## year_f1978    0.526284   0.025574  20.579  < 2e-16 ***
## year_f1979    0.540096   0.025798  20.936  < 2e-16 ***
## year_f1980    0.544420   0.026076  20.878  < 2e-16 ***
## year_f1981    0.547954   0.026301  20.834  < 2e-16 ***
## year_f1982    0.533165   0.026557  20.077  < 2e-16 ***
## year_f1983    0.520768   0.026820  19.417  < 2e-16 ***
## year_f1984    0.517421   0.027102  19.092  < 2e-16 ***
## year_f1985    0.523403   0.027429  19.082  < 2e-16 ***
## year_f1986    0.527585   0.027774  18.996  < 2e-16 ***
## year_f1987    0.527344   0.028255  18.664  < 2e-16 ***
## reg           0.027458   0.012644   2.172 0.029974 *  
## edt           0.021767   0.005799   3.754 0.000178 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1497 on 2758 degrees of freedom
##   (1226 observations deleted due to missingness)
## Multiple R-squared:  0.9801, Adjusted R-squared:  0.9791 
## F-statistic: 962.7 on 141 and 2758 DF,  p-value: < 2.2e-16
# You will need other packages to adjust the standard errors in lm model objects. Check out in the internet but coeftest surely have options!

Within vs. LSDV

Within and LSDV estimators are equivalent to each other. They give exactly the same estimates.

But we favor one over another based on the following reasons:

  • Their standard errors will differ slightly. We don’t need to adjust degrees of freedom for \(\sigma^2\) manually in LSDV estimator because it directly estimate the fixed effects of N groups. By contrast, standard errors from vanilla OLS on the within estimator will be slightly off due to incorrect degrees of freedom (smart software such as plm() in R and areg in Stata will correct it).
  • Compared to within estimator, LSDV approach can be computationally demanding. The model matrix is a very large matrix to invert when \(N\) is large.
  • However, within and LSDV cannot estimate the intercept very well. The estimates of \(\alpha_i\) is inconsistent if \(N \rightarrow \infty\) and \(T\) fixed (i.e. incidental parameters problem). But there is no need to estimate \(\alpha_i\) unless the quantity of interest is a function of it, such as some non-linear models.

FD Estimator

Since the group-level time-invariant unobservables are constant across time, first-differences are an alter- native to purge out it. The first difference model is the following: \[ \begin{aligned} \Delta y_{it} =& y_{it} - y_{i,t-1} \\ =& \color{red}{(\alpha_i - \alpha_i)} + \beta^\prime \underbrace{(x_{it} - x_{i,t-1})}_{\Delta x_{it}} + \underbrace{(\varepsilon_{it} - \varepsilon_{i,t-1})}_{\Delta \varepsilon_{it}} \\ =& \beta^\prime \Delta x_{it} + \Delta \varepsilon_{it} \end{aligned} \]

If \(\Delta \varepsilon_{it}\) are homoskedastic and without serial correlation, then usual OLS standard errors work just fine. Otherwise we need to use GLS to estimate uncertainties.

FD estimator in plm() will automatically include an intercept. The intercept in FD model is actually an coefficient for linear time trend (in the second step of the above equation, it is \(\theta (t+1 - t)\)).

FE vs. FD Estimators

  • FD and FE will give the same estimates when \(T = 2\), but for \(T > 2\), the two methods are different.
  • Note that FD often cannot account for two-way fixed effects. It is not intuitive to have \(y_{it} - y_{i-1,t}\).
  • Both FE and FD estimators are consistent under the assumption of strict exogeneity.
  • However, the primary tradeoff between FE and FD estimators lies in the i.i.d. assumption: if original errors, \(\varepsilon_{it}\), are serially uncorrelated, then FE estimator will be more efficient than FD estimator.
  • But if there are substantial serial dependence in the original errors, we may consider FD estimator to make the first-differences of errors, \(\Delta \varepsilon_{it}\), serially uncorrelated. In the latter case, FD estimator is more efficient than FE estimator.
  • We can always try both: if results are not very sensitive, good!
# FD estimator using `plm`
plm_fd <- plm(
  ln_gdpw ~ reg + edt - 1,
  index = c("country", "year"),
  model = "fd", # FD estimator
  data = data
)
summary(plm_fd)
## Oneway (individual) effect First-Difference Model
## 
## Call:
## plm(formula = ln_gdpw ~ reg + edt - 1, data = data, model = "fd", 
##     index = c("country", "year"))
## 
## Unbalanced Panel: n = 113, T = 5-28, N = 2900
## Observations used in estimation: 2787
## 
## Residuals:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -0.6344 -0.0102  0.0190  0.0168  0.0469  0.5230 
## 
## Coefficients:
##      Estimate Std. Error t-value  Pr(>|t|)    
## reg 0.0061632  0.0082067  0.7510    0.4527    
## edt 0.0395749  0.0055039  7.1904 8.264e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    10.679
## Residual Sum of Squares: 11.628
## R-Squared:      5.8796e-05
## Adj. R-Squared: -0.00030025
## F-statistic: 26.1857 on 2 and 2785 DF, p-value: 5.4114e-12

Two-way Fixed Effects

A very common model specificaiton is the two-way fixed effects (TWFE). Note that the TWFE does not provide any causal identification unless pre-treatment parallel trends hold and we assume covariate imbalance between control and treatment group. Even in this case, recent econometric research has found many shortcoming of this model specification for the causal estimation of treatment effects.

\[ y_{it} = \alpha_i + \tau_t + \beta^\prime x_{it} + \varepsilon_{it}\] Where we incorporate a dummy variable for each period \(\tau_t\). However, some researchers argue that this model may be inappropriate to answer some research questions.

# two-way fixed effects

plm_twfe <- plm(ln_gdpw ~ reg + edt, 
                data = data, 
                index = c("country_f", "year_f"), 
                model = "within", 
                effect = "twoways")

summary(plm_twfe)
## Twoways effects Within Model
## 
## Call:
## plm(formula = ln_gdpw ~ reg + edt, data = data, effect = "twoways", 
##     model = "within", index = c("country_f", "year_f"))
## 
## Unbalanced Panel: n = 113, T = 5-28, N = 2900
## 
## Residuals:
##       Min.    1st Qu.     Median    3rd Qu.       Max. 
## -0.6405721 -0.0799739  0.0078023  0.0794105  0.5976897 
## 
## Coefficients:
##      Estimate Std. Error t-value  Pr(>|t|)    
## reg 0.0274582  0.0126445  2.1716 0.0299740 *  
## edt 0.0217672  0.0057988  3.7537 0.0001778 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    62.255
## Residual Sum of Squares: 61.822
## R-Squared:      0.0069488
## Adj. R-Squared: -0.04382
## F-statistic: 9.64946 on 2 and 2758 DF, p-value: 6.6663e-05
# You can estimate the same model using lm, as follows:
summary(lm(ln_gdpw ~ country_f + year_f + reg + edt, data=data))
## 
## Call:
## lm(formula = ln_gdpw ~ country_f + year_f + reg + edt, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.64057 -0.07997  0.00780  0.07941  0.59769 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   8.730613   0.036399 239.857  < 2e-16 ***
## country_f2   -1.917900   0.053018 -36.175  < 2e-16 ***
## country_f3   -1.473490   0.041683 -35.350  < 2e-16 ***
## country_f4   -1.056680   0.044576 -23.705  < 2e-16 ***
## country_f5   -2.453336   0.043279 -56.687  < 2e-16 ***
## country_f6   -2.435972   0.042857 -56.839  < 2e-16 ***
## country_f7   -1.499153   0.041183 -36.402  < 2e-16 ***
## country_f9   -1.995622   0.041999 -47.516  < 2e-16 ***
## country_f12  -0.929167   0.041505 -22.387  < 2e-16 ***
## country_f14  -0.732719   0.041046 -17.851  < 2e-16 ***
## country_f15  -2.619545   0.043429 -60.318  < 2e-16 ***
## country_f16  -0.181423   0.041144  -4.409 1.08e-05 ***
## country_f17  -1.726617   0.045976 -37.555  < 2e-16 ***
## country_f18  -1.417699   0.040828 -34.724  < 2e-16 ***
## country_f19  -2.109640   0.047064 -44.825  < 2e-16 ***
## country_f21  -0.992496   0.042319 -23.453  < 2e-16 ***
## country_f22  -1.691828   0.041942 -40.337  < 2e-16 ***
## country_f23  -1.990957   0.045851 -43.422  < 2e-16 ***
## country_f24  -1.398031   0.041994 -33.291  < 2e-16 ***
## country_f25  -1.543200   0.041152 -37.500  < 2e-16 ***
## country_f26  -2.262115   0.042419 -53.328  < 2e-16 ***
## country_f27  -1.869869   0.043059 -43.426  < 2e-16 ***
## country_f28  -1.248226   0.043533 -28.673  < 2e-16 ***
## country_f29  -0.539974   0.050557 -10.681  < 2e-16 ***
## country_f30  -0.572730   0.041078 -13.943  < 2e-16 ***
## country_f31  -1.871067   0.051378 -36.418  < 2e-16 ***
## country_f33  -1.423227   0.041805 -34.044  < 2e-16 ***
## country_f34  -2.084951   0.041651 -50.057  < 2e-16 ***
## country_f35  -1.335325   0.042359 -31.524  < 2e-16 ***
## country_f37  -1.244580   0.042403 -29.351  < 2e-16 ***
## country_f39  -0.122955   0.045250  -2.717 0.006625 ** 
## country_f40  -1.407871   0.048167 -29.229  < 2e-16 ***
## country_f41  -0.616258   0.046216 -13.334  < 2e-16 ***
## country_f42  -2.390835   0.042021 -56.896  < 2e-16 ***
## country_f43  -1.995645   0.041242 -48.389  < 2e-16 ***
## country_f44  -0.379319   0.041552  -9.129  < 2e-16 ***
## country_f45  -1.963885   0.042025 -46.732  < 2e-16 ***
## country_f46  -2.012405   0.040934 -49.162  < 2e-16 ***
## country_f47  -1.220878   0.042391 -28.800  < 2e-16 ***
## country_f48  -1.285795   0.043111 -29.825  < 2e-16 ***
## country_f50  -0.006703   0.061292  -0.109 0.912929    
## country_f52   0.808902   0.062404  12.962  < 2e-16 ***
## country_f53  -0.113515   0.044777  -2.535 0.011296 *  
## country_f54  -0.472017   0.044150 -10.691  < 2e-16 ***
## country_f55  -0.547391   0.041114 -13.314  < 2e-16 ***
## country_f57  -0.307334   0.041925  -7.331 3.00e-13 ***
## country_f58  -1.617781   0.041315 -39.157  < 2e-16 ***
## country_f59  -0.820488   0.041106 -19.960  < 2e-16 ***
## country_f60  -0.632048   0.049739 -12.707  < 2e-16 ***
## country_f61   0.387952   0.041704   9.303  < 2e-16 ***
## country_f62  -0.337781   0.041220  -8.195 3.80e-16 ***
## country_f63  -0.259760   0.044779  -5.801 7.35e-09 ***
## country_f64   0.771826   0.050320  15.338  < 2e-16 ***
## country_f65   0.949886   0.065606  14.479  < 2e-16 ***
## country_f66   0.350150   0.044747   7.825 7.17e-15 ***
## country_f67  -0.651497   0.041172 -15.824  < 2e-16 ***
## country_f68  -0.128995   0.041251  -3.127 0.001784 ** 
## country_f69   0.008814   0.045210   0.195 0.845440    
## country_f70  -0.251850   0.042843  -5.878 4.64e-09 ***
## country_f71  -0.330624   0.043386  -7.621 3.45e-14 ***
## country_f72  -0.635309   0.048574 -13.079  < 2e-16 ***
## country_f73  -0.697559   0.044301 -15.746  < 2e-16 ***
## country_f74  -0.146948   0.044286  -3.318 0.000918 ***
## country_f76   0.010711   0.047175   0.227 0.820411    
## country_f77   0.833581   0.043290  19.256  < 2e-16 ***
## country_f78  -1.105945   0.046788 -23.637  < 2e-16 ***
## country_f79  -1.939535   0.041334 -46.923  < 2e-16 ***
## country_f80  -1.539557   0.042681 -36.072  < 2e-16 ***
## country_f81  -1.359055   0.041265 -32.935  < 2e-16 ***
## country_f82   0.426294   0.041119  10.367  < 2e-16 ***
## country_f83   0.647830   0.041182  15.731  < 2e-16 ***
## country_f84   0.418555   0.056035   7.470 1.07e-13 ***
## country_f85   0.053313   0.055587   0.959 0.337601    
## country_f86  -0.027722   0.040876  -0.678 0.497704    
## country_f87  -0.610045   0.045569 -13.387  < 2e-16 ***
## country_f89  -0.390746   0.047160  -8.286  < 2e-16 ***
## country_f91  -2.241439   0.041358 -54.195  < 2e-16 ***
## country_f92  -1.693048   0.042632 -39.713  < 2e-16 ***
## country_f93  -1.062980   0.041347 -25.709  < 2e-16 ***
## country_f94  -0.939023   0.047529 -19.757  < 2e-16 ***
## country_f95   0.111158   0.046421   2.395 0.016707 *  
## country_f96  -0.995379   0.050057 -19.885  < 2e-16 ***
## country_f97   0.158108   0.042187   3.748 0.000182 ***
## country_f98  -0.355784   0.045064  -7.895 4.15e-15 ***
## country_f99  -1.131414   0.043437 -26.047  < 2e-16 ***
## country_f101  0.561476   0.047187  11.899  < 2e-16 ***
## country_f102  0.693318   0.057553  12.047  < 2e-16 ***
## country_f103 -0.350231   0.078815  -4.444 9.19e-06 ***
## country_f104 -0.737881   0.061880 -11.924  < 2e-16 ***
## country_f105  0.604345   0.053201  11.360  < 2e-16 ***
## country_f106  0.440257   0.060017   7.336 2.89e-13 ***
## country_f107  0.727325   0.048048  15.138  < 2e-16 ***
## country_f108  0.655984   0.058760  11.164  < 2e-16 ***
## country_f110  0.082554   0.045937   1.797 0.072424 .  
## country_f111 -0.220533   0.052461  -4.204 2.71e-05 ***
## country_f112  0.497444   0.052904   9.403  < 2e-16 ***
## country_f113  0.247127   0.053885   4.586 4.72e-06 ***
## country_f114  0.638162   0.047725  13.372  < 2e-16 ***
## country_f116 -0.192216   0.049565  -3.878 0.000108 ***
## country_f117  0.808797   0.054897  14.733  < 2e-16 ***
## country_f118  0.635817   0.056960  11.163  < 2e-16 ***
## country_f119 -0.436869   0.060082  -7.271 4.62e-13 ***
## country_f120 -0.127130   0.043102  -2.950 0.003209 ** 
## country_f122  0.502084   0.044633  11.249  < 2e-16 ***
## country_f123  0.713371   0.058033  12.293  < 2e-16 ***
## country_f124  0.893549   0.052122  17.143  < 2e-16 ***
## country_f125 -0.602866   0.042373 -14.228  < 2e-16 ***
## country_f126  0.539690   0.057529   9.381  < 2e-16 ***
## country_f128 -0.200672   0.045153  -4.444 9.17e-06 ***
## country_f129  0.764097   0.062377  12.250  < 2e-16 ***
## country_f130 -0.118661   0.054084  -2.194 0.028318 *  
## country_f131  0.731500   0.066171  11.055  < 2e-16 ***
## country_f132 -1.137835   0.056866 -20.009  < 2e-16 ***
## year_f1961    0.035883   0.025082   1.431 0.152652    
## year_f1962    0.079099   0.024685   3.204 0.001369 ** 
## year_f1963    0.105967   0.024648   4.299 1.77e-05 ***
## year_f1964    0.141043   0.024512   5.754 9.67e-09 ***
## year_f1965    0.164482   0.024398   6.742 1.90e-11 ***
## year_f1966    0.186794   0.024221   7.712 1.72e-14 ***
## year_f1967    0.207793   0.024290   8.555  < 2e-16 ***
## year_f1968    0.241314   0.024288   9.936  < 2e-16 ***
## year_f1969    0.282157   0.024376  11.575  < 2e-16 ***
## year_f1970    0.319016   0.024447  13.049  < 2e-16 ***
## year_f1971    0.355129   0.024406  14.551  < 2e-16 ***
## year_f1972    0.376303   0.024535  15.337  < 2e-16 ***
## year_f1973    0.404055   0.024680  16.371  < 2e-16 ***
## year_f1974    0.440244   0.024825  17.734  < 2e-16 ***
## year_f1975    0.452843   0.024904  18.183  < 2e-16 ***
## year_f1976    0.482772   0.025110  19.226  < 2e-16 ***
## year_f1977    0.504968   0.025336  19.930  < 2e-16 ***
## year_f1978    0.526284   0.025574  20.579  < 2e-16 ***
## year_f1979    0.540096   0.025798  20.936  < 2e-16 ***
## year_f1980    0.544420   0.026076  20.878  < 2e-16 ***
## year_f1981    0.547954   0.026301  20.834  < 2e-16 ***
## year_f1982    0.533165   0.026557  20.077  < 2e-16 ***
## year_f1983    0.520768   0.026820  19.417  < 2e-16 ***
## year_f1984    0.517421   0.027102  19.092  < 2e-16 ***
## year_f1985    0.523403   0.027429  19.082  < 2e-16 ***
## year_f1986    0.527585   0.027774  18.996  < 2e-16 ***
## year_f1987    0.527344   0.028255  18.664  < 2e-16 ***
## reg           0.027458   0.012644   2.172 0.029974 *  
## edt           0.021767   0.005799   3.754 0.000178 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1497 on 2758 degrees of freedom
##   (1226 observations deleted due to missingness)
## Multiple R-squared:  0.9801, Adjusted R-squared:  0.9791 
## F-statistic: 962.7 on 141 and 2758 DF,  p-value: < 2.2e-16
#  Correcting for uncertainty with clustered standard errors:
coeftest(plm_twfe, vcov. = vcovHC(plm_twfe, method = "arellano"))
## 
## t test of coefficients:
## 
##     Estimate Std. Error t value Pr(>|t|)
## reg 0.027458   0.027678  0.9921   0.3213
## edt 0.021767   0.020064  1.0849   0.2781

Random Effects / Mixed Effects

What is Random Effects?

Assumes the individuals come from a common population. Each individual’s unobservable heterogeneous effect on the outcome follows a normal distribution with an unknown variance \(\sigma_\alpha^2\). Then we may have the following model setup:

\[ \begin{aligned} y_{i t} &=\beta_{0}+\beta_{1} x_{i t}+\alpha_{i}+\varepsilon_{i t} \\ \alpha_{i} & \sim \mathcal{N}\left(0, \sigma_{\alpha}^{2}\right) \\ \varepsilon_{i t} & \sim \mathcal{N}\left(0, \sigma_\varepsilon^{2}\right) \end{aligned} \]

Estimate an RE model in R

  • Three packages in R: plm (feasible generalized least squares by defaulty), nlme (maximum likelihood by default), and lme4 (restricted maximum likelihood by default)
  • Pros and Cons among different estimation strategies
    • FGLS2
      • preferred by some econometricians because it has fewer distributional assumptions.
      • sometimes FGLS is computationally complicated because it need to invert a large matrix.
    • ML: biased estimation of variance parameters; but such bias gets smaller for larger sample size
    • REML: unbiased estimation of variance parameters
    • If you prefer a Bayesian approach for more flexible modeling, you can use brms or even manually specify the model in JAGS and stan.
  • Pros and cons among different packages, besides estimation strategies
    • plm cannot deal with mixed effects, i.e., the mean of \(\alpha_i\) varies due to certain covariates, i.e. \(\alpha_i \sim \mathcal{N}(\gamma^\prime z_{it}, \sigma_\alpha^2)\), but nlme and lme4 can do so.
    • nlme can take ARMA structure into account, but plm and lme4 can only consider AR terms.
    • nlme is difficult to specify crossed (non-nested) random effect structure (i.e. multiple separated random effects), while lme4 is easy to do so.
  • In terms of the mixed effects model in nlme and lme4, you may be confused by their different use of “fixed effects” and “random effects.”3
    • fixed effect: coefficients that do not vary by group, such that they are fixed, not random.
    • random effect: group-specific effect; these coefficients vary across groups.

A typical mixed effect model is as follows:

\[ y_{it} = \underbrace{\beta^\prime x_{it}}_\text{"fixed effect"} + \underbrace{\gamma^\prime z_i}_\text{"random effects"} + \varepsilon_{it} \]

or can be specified as follows: \[ \begin{aligned} y_{i t} &=\beta^\prime x_{i t}+\alpha_{i}+\varepsilon_{i t} \\ \alpha_{i} & \sim \mathcal{N}\left(\gamma^\prime z_i, \sigma_{\alpha}^{2}\right) \\ \varepsilon_{i t} & \sim \mathcal{N}\left(0, \sigma_\varepsilon^{2}\right) \end{aligned} \]

# RE model using `plm`
plm_re <- plm(
  ln_gdpw ~ reg + edt + oil,
  data = data,
  index = c("country", "year"),
  model = "random",
  effect = "individual"
)

summary(plm_re)
## Oneway (individual) effect Random Effect Model 
##    (Swamy-Arora's transformation)
## 
## Call:
## plm(formula = ln_gdpw ~ reg + edt + oil, data = data, effect = "individual", 
##     model = "random", index = c("country", "year"))
## 
## Unbalanced Panel: n = 113, T = 5-28, N = 2900
## 
## Effects:
##                   var std.dev share
## idiosyncratic 0.03271 0.18086 0.091
## individual    0.32784 0.57258 0.909
## theta:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.8601  0.9382  0.9404  0.9381  0.9404  0.9404 
## 
## Residuals:
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -0.82664 -0.10254  0.01505  0.00081  0.11712  0.75912 
## 
## Coefficients:
##               Estimate Std. Error  z-value  Pr(>|z|)    
## (Intercept)  7.7290183  0.0608325 127.0540 < 2.2e-16 ***
## reg         -0.0030963  0.0150650  -0.2055  0.837159    
## edt          0.1717169  0.0036682  46.8124 < 2.2e-16 ***
## oil          0.5458284  0.1705342   3.2007  0.001371 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    173.89
## Residual Sum of Squares: 96.136
## R-Squared:      0.44728
## Adj. R-Squared: 0.44671
## Chisq: 2200.57 on 3 DF, p-value: < 2.22e-16
# RE model using `nlme`
lme_re <- lme(
  fixed = ln_gdpw ~ reg + edt + oil,
  random = ~ 1 | country,
  data = data,
  na.action = na.omit # need to specify how we deal with NA values, otherwise throws an error
)

summary(lme_re)
## Linear mixed-effects model fit by REML
##   Data: data 
##         AIC       BIC   logLik
##   -996.8293 -961.0027 504.4146
## 
## Random effects:
##  Formula: ~1 | country
##         (Intercept)  Residual
## StdDev:   0.6733608 0.1808866
## 
## Fixed effects:  ln_gdpw ~ reg + edt + oil 
##                 Value  Std.Error   DF   t-value p-value
## (Intercept)  7.735312 0.07004634 2785 110.43135  0.0000
## reg         -0.006041 0.01498196 2785  -0.40321  0.6868
## edt          0.170680 0.00366084 2785  46.62312  0.0000
## oil          0.543484 0.19892569  111   2.73210  0.0073
##  Correlation: 
##     (Intr) reg    edt   
## reg -0.066              
## edt -0.258 -0.058       
## oil -0.335  0.010  0.032
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -4.59951054 -0.52967154  0.08334205  0.61991189  4.00624582 
## 
## Number of Observations: 2900
## Number of Groups: 113
# RE model using `lme4`
lmer_re <- lmer(
  ln_gdpw ~ reg + edt + oil + (1 | country),
  data = data
)

summary(lmer_re)
## Linear mixed model fit by REML ['lmerMod']
## Formula: ln_gdpw ~ reg + edt + oil + (1 | country)
##    Data: data
## 
## REML criterion at convergence: -1008.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.5995 -0.5297  0.0833  0.6199  4.0062 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  country  (Intercept) 0.45341  0.6734  
##  Residual             0.03272  0.1809  
## Number of obs: 2900, groups:  country, 113
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept)  7.735312   0.070046 110.431
## reg         -0.006041   0.014982  -0.403
## edt          0.170680   0.003661  46.623
## oil          0.543484   0.198926   2.732
## 
## Correlation of Fixed Effects:
##     (Intr) reg    edt   
## reg -0.066              
## edt -0.258 -0.058       
## oil -0.335  0.010  0.032

FE vs. RE

  • Different assumption about the relationship between group heterogeneity \(u_i\) and covariates \(x_{it}\).
    • In FE model, we assume conditional ignorability \(E[\varepsilon_{it}|x_{it}, u_i] = 0\). That is, we assume that conditional on unobservable confounder \(u_i\), \(\varepsilon_{it}\) and \(x_{it}\) is uncorrelated. \(x_{it}\) and \(u_i\) have some correlation.

    • In RE model, we assume there is no unmeasured confounder, so ignorability holds without conditioning on \(u_i\): \(E[\varepsilon_{it} | x_{it}] = E[u_i] = 0\). \(x_{it}\) and \(u_i\) have no correlation.

  • RE estimator is a matrix-weighted average of within estimator and between estimator.
    • But an intuitive way to think about it is that FE model rule out between-group variation; but in RE model, we preserve the information of between-group variation by modeling \(\sigma_\alpha^2\).

    • It is difficult to show it mathematically, because it requires to go through the variance-covariance matrix of the random effects and requires some understanding what Gaussian processes are. If you want to learn how to model random effects, you should consider taking CS&SS 560 and CS&SS 592.

We can think of RE model doing quasi-demeaning or partial-pooling:

\[ y_{it} - \theta\bar{y}_i = \beta^\prime (x_{it} - \theta \bar{x}_i) + (\varepsilon_{it} - \theta\bar{\varepsilon}_i) \] where \(\theta = 0\) implies pooled OLS and \(\theta = 1\) implies fixed effects. Doing some math shows that

\[ \theta = 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T \cdot \sigma_\alpha^2}} \]

Dynamic TS Structure in Panel Data Model

Detect Time Series Structure in Panel Data

We know that for some countries which are good at growth maintenance, gdpw may be a non-stationary time series; whereas for some countries experienced economic fluctuation, their gdpw may have mean-reverting or even worse behavior in the long run.

ggplot(pdata, aes(x = year, y = gdpw, group = country, color = region)) +
  geom_line(alpha = 0.5) +
  theme_bw()

ggplot(pdata, aes(x = year, y = ln_gdpw, group = country, color = region)) +
  geom_line(alpha = 0.5) +
  theme_bw()

Detect Time Series Structure in Panel Data

We can use statistical tests to formally detect stationarity of time series in panel data.

  • purtest() in plm package
  • punitroots package: the author no longer maintain this package
  • apply adf.test() and pp.test() to each time series, and plot the distribution of p values.
pseries <- split(pdata$gdpw, pdata$country_f)

# Illustration: pp test for a single panel/ts

pseries[1]
## $`1`
## 1-1962 1-1963 1-1964 1-1965 1-1966 1-1967 1-1968 1-1969 1-1970 1-1971 1-1972 
##   5012   6083   6502   6620   6612   6982   7848   8378   8536   7816   9372 
## 1-1973 1-1974 1-1975 1-1976 1-1977 1-1978 1-1979 1-1980 1-1981 1-1982 1-1983 
##   9361  10480  10870  11381  11653  11971  12956  12710  12660  12876  13177 
## 1-1984 1-1985 1-1986 1-1987 1-1988 1-1989 1-1990 
##  13381  13434  13278  12962  12266  12530  12176
pp.test(pseries$`1`)
## 
##  Phillips-Perron Unit Root Test
## 
## data:  pseries$`1`
## Dickey-Fuller Z(alpha) = -0.14626, Truncation lag parameter = 2,
## p-value = 0.99
## alternative hypothesis: stationary
pp.test(pseries$`1`)[["p.value"]]
## [1] 0.99
# Now we use sapply() function to loop this analysis in all the panels

pp_pval <- sapply(pseries, function(x){pp.test(x)[["p.value"]]})

adf_pval <- sapply(pseries, function(x){adf.test(x)[["p.value"]]})

# Look at the distribution of the statistical tests to assess stationary

hist(pp_pval, breaks = 100)

hist(adf_pval, breaks = 100)

We detect many countries have non-stationary time series for the level dependent variable, but what about a differences dependent variable?

pdata$gdpw_diff <- diff(pdata$gdpw) # because we already have a pdata.frame, we can safely use plm::diff() to transform it
ggplot(pdata, aes(x = year, y = gdpw_diff, group = country, color = region)) +
  geom_line(alpha = 0.5) +
  theme_bw() # much more stationary!

pseries <- split(pdata$gdpw_diff, pdata$country) 

# have NA values in the first observation, need to remove them in statistical tests
# sapply will not include NAs

pp_pval <- sapply(pseries, function(x){pp.test(na.omit(x))[["p.value"]]})
adf_pval <- sapply(pseries, function(x){adf.test(na.omit(x))[["p.value"]]})

hist(pp_pval, breaks = 100)

hist(adf_pval, breaks = 100)

Dynamic TS structure in FE model

We will create lags using the plm::lag function into pdata class object. However, notice that you can perform the same operation using dplyr lag in a tidy class object with group_by() specified.

# create lags using pdata class

pdata$gdpw_diff_lag1 <- lag(pdata$gdpw_diff)
pdata$gdpw_diff_lag2 <- lag(pdata$gdpw_diff, 2)

# fixed effect ARIMA(1, 1, 0); cannot incorporate MA structure

fe_ar1 <- plm( 
  gdpw_diff ~ gdpw_diff_lag1 + reg + edt,
  data = pdata, 
  model = "within", 
  effect = "individual"
)

# fixed effect ARIMA(2, 1, 0)

fe_ar2 <- plm( 
  gdpw_diff ~ gdpw_diff_lag1 + gdpw_diff_lag2 + reg + edt,
  data = pdata, 
  model = "within", 
  effect = "individual"
)

# Breusch–Godfrey Test for Panel Models

# null means no serial correlation

pbgtest(fe_ar1) # there is remaining serial correlation
## 
##  Breusch-Godfrey/Wooldridge test for serial correlation in panel models
## 
## data:  gdpw_diff ~ gdpw_diff_lag1 + reg + edt
## chisq = 37.209, df = 3, p-value = 4.155e-08
## alternative hypothesis: serial correlation in idiosyncratic errors
pbgtest(fe_ar2) # there is no serial correlation for AR(2) model
## 
##  Breusch-Godfrey/Wooldridge test for serial correlation in panel models
## 
## data:  gdpw_diff ~ gdpw_diff_lag1 + gdpw_diff_lag2 + reg + edt
## chisq = 4.0656, df = 2, p-value = 0.131
## alternative hypothesis: serial correlation in idiosyncratic errors

Dynamic TS structure in RE model

In contrast to plm, lme estimates the panel using ML or REML, and it allows the specificaiton of moving average processes.

# random effects ARIMA(1, 1, 0)

# note: the time varibale must be an integer, not a character or factor.

re_arima <- lme(
  fixed = gdpw_diff ~ reg + edt + oil,
  random = ~ 1 | country,
  correlation = corARMA(
    # year nested in country
    form = ~ year | country,
    p = 1, # AR(p)
    q = 0  # MA(q)
  ),
  data = pdata,
  na.action = na.omit
)

summary(re_arima)
## Linear mixed-effects model fit by REML
##   Data: pdata 
##        AIC     BIC    logLik
##   44165.63 44207.3 -22075.81
## 
## Random effects:
##  Formula: ~1 | country
##         (Intercept) Residual
## StdDev:   0.2261146  581.998
## 
## Correlation Structure: AR(1)
##  Formula: ~year | country 
##  Parameter estimate(s):
##       Phi 
## 0.2268233 
## Fixed effects:  gdpw_diff ~ reg + edt + oil 
##                 Value Std.Error   DF   t-value p-value
## (Intercept)  27.43342  26.89294 2732  1.020097  0.3078
## reg         107.53138  33.46420 2732  3.213326  0.0013
## edt          24.95241   5.35870 2732  4.656433  0.0000
## oil         -14.76787  43.07012  111 -0.342880  0.7323
##  Correlation: 
##     (Intr) reg    edt   
## reg  0.071              
## edt -0.735 -0.561       
## oil -0.328  0.018  0.137
## 
## Standardized Within-Group Residuals:
##          Min           Q1          Med           Q3          Max 
## -16.05507865  -0.28772677  -0.03423844   0.33234239  13.38099374 
## 
## Number of Observations: 2847
## Number of Groups: 113

FE vs. lagged dependent variable (LDV)

There is a tradeoff when we both incorporate FE and lagged dependent variable in one model.

The estimate of \(\phi\) will always be underestimated in dynamic FE model, but the bias vanishes as \(T\) increases. This is called “Nickell bias.”

To obtain an unbiased estimate with small \(T\), we may choose dynamic panel data model, or prefer either FE or LDV model. If we care about causal inference and want to use a different-in-difference design (suppose it is a simple \(2\times 2\) DID) to estimate treatment effect, then we may need to think about the underlying causal identification assumption:

  • FE: conditioning on time-invariant unobservable individual-level confounder \(\alpha_i\), \(x_{it}\) and \(y_{it}\) are independent.
  • LDV: conditioning on time-varying unobservable individual-level confounder \(y_{i,t-1}\), \(x_{it}\) and \(y_{it}\) are independent.

FE and LDV have a very useful bracketing property (Angrist and Pischke 2009; Ding and Li 2019):

  • If LDV is correct, but we mistakenly use FE, then the estimate of treatment effect will be biased upward.
  • If FE is correct, but we mistakenly use LDV, then the estimate of treatment effect will be biased downward.
  • We can think of FE and LDV as bounding the causal effect of interest.
  • A better way is to always check the robustness of our findings using alternative identifying assumptions.

Recent literature attempts to go beyond the additive assumption of unobservable confounders. We can allow unit-level effect and time effect interact with each other so that we can account for time-invariant and time-varying unit-level confounders at the same time. Recent literature consider decomposing the matrix of error terms to several latent factors (e.g. use PCA) to account for this interactive structure (Bai 2009; Xu 2017).

Counterfactual Forecasting

Counterfactual Forecasting with RE Models

  • Extract model information for simulation, such as coefficients and variance-covariance matrix.
    • In nlme package, we use fixed.effects() or fixef() to extract coefficients that are fixed across all groups. (note that here we have different meaning of “fixed effect” and “random effect.”)
    • You can extract group-varying coefficients or intercepts using random.effects() or ranef().
    • Simply using coefficients() or coef() will aggregate “fixed effect” and “random effect” for each group.
  • Simulate parameters from multivariate Normal distribution
  • Create hypothetical scenarios
  • Use simulated parameters and hypothetical scenarios to calculate quantity of interest, such as expected values, first difference, relative risk, etc.
sims <- 1000
simbetas <- mvrnorm(sims, 
                    fixef(re_arima), 
                    vcov(re_arima))

# Make matrix of hypothetical x's
formula <- gdpw_diff ~ reg + edt + oil
periods.out <- 50
xhyp <- cfMake(formula, pdata, periods.out)
for (i in 1:periods.out) 
  xhyp <- cfChange(xhyp, "edt", x = mean(pdata$edt, na.rm=TRUE) + sd(pdata$edt, na.rm=TRUE), scen=i) # we assume a permanent increase in education (+ 1 SD) over the next 50 years

phi <- 0.25 # from model summary()
lagY <- mean(pdata$gdpw_diff, na.rm = TRUE) # Hypothetical previous change in Y for simulation
initialY <- mean(pdata$gdpw, na.rm = TRUE) # Hypothetical initial level of Y for simulation
# Simulate expected values of Y (on original level scale)
# out to periods.out given hypothetical future values of X,
# initial lags of the change in Y, and an initial level of Y
sim_evre <- ldvsimev(xhyp,               # The matrix of hypothetical x's
                    simbetas,           # The matrix of simulated betas
                    ci=0.95,            # Desired confidence interval
                    constant=1,         # Column containing the constant
                                        # set to NA for no constant
                    phi=phi,            # estimated AR parameters; length must match lagY 
                    lagY=lagY,          # lags of y, most recent last
                    transform="diff",   # "log" to undo log transformation,
                                        # "diff" to under first differencing
                                        # "difflog" to do both
                    initialY=initialY   # for differenced models, the lag of the level of y
                    )

# Simulate first differences in Y (on original level scale)
# out to periods.out given hypothetical future values of X, X0,
# and initial lags of the change in Y
sim_fdre <- ldvsimfd(xhyp,               # The matrix of hypothetical x's
                    simbetas,           # The matrix of simulated betas
                    ci=0.95,            # Desired confidence interval
                    constant=1,         # Column containing the constant
                                        # set to NA for no constant
                    phi=phi,            # estimated AR parameters; length must match lagY 
                    lagY=lagY,          # lags of y, most recent last
                    transform="diff"    # "log" to undo log transformation,
                                        # "diff" to under first differencing
                                        # "difflog" to do both
                    )

Now plot with base R.

plot.new()
par(usr=c(1,50,-15000,15000))
axis(1,at=seq(1,50,10))
axis(2,at=seq(-15000,15000,5000))
title(xlab="Time",ylab="Expected change in constant GDP ($ pc)",main="Random effects ARIMA(1,1,0)") 

# Make the x-coord of a confidence envelope polygon
xpoly <- c(1:periods.out,
           rev(1:periods.out),
           1)

# Make the y-coord of a confidence envelope polygon
ypoly <- c(sim_fdre$lower,
           rev(sim_fdre$upper),
           sim_fdre$lower[1])

# Choose the color of the polygon 
col <- "lightblue"

# Plot the polygon first, before the points & lines
polygon(x=xpoly,
        y=ypoly,
        col=col,
        border=FALSE
        )

# Plot the fitted line
lines(x=1:periods.out,y=sim_fdre$pe,col="darkblue")
lines(x=c(0,50),y=c(0,0),lty="solid")

Counterfactual Forecasting with FE Model

  • The general steps are similar to counterfactual forecasting with RE model.
  • We can use fixef() to extract group-specific intercepts.
  • Sometimes we may prefer using panel-corrected standard error as variance-covariance matrix for parameter simulation.
sims <- 1000
simparam <- mvrnorm(sims, coef(fe_ar1), vcovBK(fe_ar1))
# Pull off the simulated lag coefficient
simphi <- simparam[,1]
# Put together the "constant" term (avg of the FEs, or a specific FE if you like)
# with the rest of the regressors
simbetas <- cbind(rep(mean(fixef(fe_ar1)), sims), simparam[,2:ncol(simparam)])

# Make matrix of hypothetical x's: drop OIL!
#                                  and now we need hypothetical countries!
#                                  and no intercept!

# Make matrix of hypothetical x's
# Note the formula below is a little different; we've removed the lag
# and will let the simulator handle lag structure
formula <- gdpw_diff ~ reg + edt

periods.out <- 50
xhyp <- cfMake(formula, pdata, periods.out)
for (i in 1:periods.out) 
  xhyp <- cfChange(xhyp, "edt", x=mean(pdata$edt,na.rm=TRUE)+sd(pdata$edt,na.rm=TRUE), scen=i)

phi <- mean(simphi) 
lagY <- mean(pdata$gdpw_diff, na.rm = TRUE) # Hypothetical previous change in Y for simulation
initialY <- mean(pdata$gdpw, na.rm = TRUE) # Hypothetical initial level of Y for simulation
# Simulate expected values of Y (on original level scale)
# out to periods.out given hypothetical future values of X,
# initial lags of the change in Y, and an initial level of Y
sim_evfe <- ldvsimev(xhyp,             # The matrix of hypothetical x's
                    simbetas,           # The matrix of simulated betas
                    ci=0.95,            # Desired confidence interval
                    constant=NA,        # NA indicates no constant!
                    phi=phi,            # estimated AR parameters; length must match lagY 
                    lagY=lagY,          # lags of y, most recent last
                    transform="diff",   # "log" to undo log transformation,
                                        # "diff" to under first differencing
                                        # "difflog" to do both
                    initialY=initialY   # for differenced models, the lag of the level of y
                    )


# Simulate first differences in Y (on original level scale)
# out to periods.out given hypothetical future values of X, X0,
# and initial lags of the change in Y
sim_fdfe <- ldvsimfd(xhyp,               # The matrix of hypothetical x's
                      simbetas,           # The matrix of simulated betas
                      ci=0.95,            # Desired confidence interval
                      constant=NA,         # Column containing the constant
                                          # set to NA for no constant
                      phi=phi,            # estimated AR parameters; length must match lagY 
                      lagY=lagY,          # lags of y, most recent last
                      transform="diff"    # "log" to undo log transformation,
                                          # "diff" to under first differencing
                                          # "difflog" to do both
                      )

Now plot with base R.

plot.new()
par(usr=c(1,50,-15000,15000))
axis(1,at=seq(1,50,10))
axis(2,at=seq(-15000,15000,5000))
title(xlab="Time",ylab="Expected change in constant GDP ($ pc)",main="Fixed effects ARIMA(1,1,0), pcse") 

# Make the x-coord of a confidence envelope polygon
xpoly <- c(1:periods.out,
           rev(1:periods.out),
           1)
# Make the y-coord of a confidence envelope polygon
ypoly <- c(sim_fdfe$lower,
           rev(sim_fdfe$upper),
           sim_fdfe$lower[1])
# Choose the color of the polygon 
col <- "lightgreen"
# Plot the polygon first, before the points & lines
polygon(x=xpoly,
        y=ypoly,
        col=col,
        border=FALSE
        )
# Plot the fitted line
lines(x=1:periods.out,y=sim_fdfe$pe,col="darkgreen")
lines(x=c(0,50),y=c(0,0),lty="solid")


  1. See http://www.u.arizona.edu/~mishler/PrzeworskiCodebook.PDF.↩︎

  2. See https://cran.r-project.org/web/packages/plm/vignettes/A_plmPackage.html#nlme.↩︎

  3. See https://statmodeling.stat.columbia.edu/2005/01/25/why_i_dont_use/.↩︎