Democracy data

In today’s lab, we will be working with the following dataset on political regimes, resources endowments and economic growth around the world.

Variable Description1
britcol Dummy variable coded 1 for every year in countries that were British colonies any time after 1918, 0 otherwise.
cath Percentage of Catholics in the population.
civilib Civil liberty. This variable can take values from 1 (least free) to 7 (most free).
edt Cumulative years of education of the average member of the labor force.
elf60 Index of ethnolinguistic fractionalization, 1960.
gdpw Real GDP per worker, 1985 international prices.
moslem Percentage of Moslems in the population.
newc Dummy variable coded 1 for every year in countries that became independent after 1945, 0 otherwise.
oil Dummy variable coded 1 if the average ratio of fuel exports to total exports in 1984-86 exceeded 50%, 0 otherwise.
pollib Political liberty. This variable can take values from 1 (least free) to 7 (most free).
reg Dummy variable coded 1 for dictatorships and 0 for democracies. Transition years are coded as the regime that emerges in that year. For instance, there was a transition from democracy to dictatorship in Argentina in 1955. In that year, reg=1.
stra The sum of past transitions to authoritarianism (as defined by reg) in a country. If a country experienced one or more transitions to authoritarianism before 1950, stra was coded 1 in 1950.
# Load Democracy data
# Variable names:
# COUNTRY   YEAR    BRITCOL     CATH
# CIVLIB    EDT ELF60   GDPW    MOSLEM
# NEWC      OIL POLLIB  REG STRA
# 
# data <- read.csv("data/democracy.csv", 
#                  header = TRUE, 
#                  na.strings = ".")

data <- read.csv("data/democracy.csv", 
                 header = TRUE)

colnames(data)
##  [1] "COUNTRY" "CTYNAME" "REGION"  "YEAR"    "BRITCOL" "CATH"    "CIVLIB" 
##  [8] "EDT"     "ELF60"   "GDPW"    "MOSLEM"  "NEWC"    "OIL"     "POLLIB" 
## [15] "REG"     "STRA"
colnames(data) <- colnames(data)  %>%  tolower() # change variable names to lower case
colnames(data)
##  [1] "country" "ctyname" "region"  "year"    "britcol" "cath"    "civlib" 
##  [8] "edt"     "elf60"   "gdpw"    "moslem"  "newc"    "oil"     "pollib" 
## [15] "reg"     "stra"
# check out missing variables

library(questionr)

freq.na(data)
##         missing  %
## civlib     1727 42
## pollib     1727 42
## edt        1226 30
## elf60       542 13
## country       0  0
## ctyname       0  0
## region        0  0
## year          0  0
## britcol       0  0
## cath          0  0
## gdpw          0  0
## moslem        0  0
## newc          0  0
## oil           0  0
## reg           0  0
## stra          0  0
# Check sample's N and T
data$country %>% unique()  %>%  length()
## [1] 135
data$year  %>%  unique() %>% length()
## [1] 40
# so it is an N=135, T=40 panel data.


## some data wrangling

data <- data %>% 
  mutate(
    ln_gdpw = log(gdpw),
    country_f = as.factor(country),
    year_f = as.factor(year)
  )


# Remember to perform panel exploratory analysis before fitting models! Refer to lab 1.

An Overview of Panel Data Models

The full flexibility model:

\[\Delta^{di}y_{it} = \alpha_i + X_{it}\beta_i + \sum^{P_i}_{p=1}y_{i,t-p}\phi_i + \sum^{Q_i}_{q=1}y_{i,t-q}\rho_i + \epsilon_{it}\]

where intercept \(\alpha_i\), slope \(\beta_i\), ARIMA order \(P_i, D_i, Q_i\) and its coefficients \(\phi_i\), \(\rho_i\) vary and allows panel heteroskedasticity \(\sigma^2_i\). We can put unit-specific trend \(t\theta_i\) and seasonality \(\kappa_{k,i}\) inside this model too.

Fixed Effects

What is Fixed Effects?

In simple terms, “fixed effects” refer to unit-specific intercepts. In other words, each unit or case \(i\) has its own intercept \(\alpha_i\) instead of pooling all cases into the same intercept. In practice, these case-specific intercepts introduce adjustments or controls for the mean outcome at the group level. This is akin to removing the mean from your data at the group level and conducting a simple linear regression.

However, the most common rationale for employing fixed effects estimators is their ability to eliminate time-invariant group-level unobservables. This arises from controlling for each group’s mean level of the outcome variable. However, it is essential to recognize that an underlying assumption is that these time-invariant unobservables truly never change, not only in their substance but also in their functional interpretation.

For instance, is Trump’s MAGA conservatism perceived the same as George Bush’s conservatism? If the answer is strictly no, then conservatism in the last 15 years is actually time-variant, and no fixed effects will shield you from omitted variable bias or measurement error. The more aggregate is your unit of analysis (individuals < neighborhoods < cities < regions < countries) or the longer is your time window sample \(T\), the more implausible is the assumption of time-invariant unobservables.

Group-level heterogeneity

Suppose there is unobserved group-level or individual-level heterogeneity \(u_i\) in the true data generating process (DGP) (such group-level or unit-level heterogeneity that does not change over individual or time periods):

\[ y_{it} = \alpha + \beta^\prime x_{it} + \color{red}{\delta^\prime u_{i}} + \varepsilon_{it} \]

Because it is unobserved, we cannot measure it. Now set \(\alpha_i = \alpha_i^* = \alpha + \delta^\prime u_i\), we can remove this unobservable group-level heterogeneity and rewrite the above equation as follows:

\[ y_{it} = \alpha_i + \beta^\prime x_{it} + \varepsilon_{it} \]

The FE model can control for omitted unobservable variables if we assume that these unobservables are constant over time.

  • if we add \(\alpha_i\): control for time-invariant unit-level heterogeneity, such as some features of a country’s culture.
    • A model with only unit fixed effects, will be equivalent a within-estimator.
  • if we add \(\tau_t\): control for common shocks at each period, such as global economic conditions each year.
    • A model with only period fixed effects, will be equivalent a between-estimator.
  • Of course we can also control for omitted unobservables at other levels, such as group, cohort, etc.

However, FE models come at a large cost.

  • We cannot add time-invariant variables into FE models, otherwise it will cause perfect collinearity.
  • FE estimates can be inefficient and exacerbate small sample bias.
  • Incidental parameter problem: suppose we include individual-level FE \(\alpha_i\), the estimates is consistent as \(T \rightarrow \infty\), but not as \(N \rightarrow \infty\).

There are 3 ways to estimate this FE model: 1. Within Estimator 2. Least Square Dummy Variable (LSDV) Estimator 3. First Difference (FD) Estimator

How to Check Time- or Unit-invariant Variables?

pdata <- pdata.frame(data, index = c("country_f", "year_f"))
head(pdata) # it will create an id for each observation
##        country ctyname region year britcol cath civlib   edt elf60 gdpw moslem
## 1-1962       1 Algeria Africa 1962       0  0.5     NA 1.160  0.43 5012   99.1
## 1-1963       1 Algeria Africa 1963       0  0.5     NA 1.250  0.43 6083   99.1
## 1-1964       1 Algeria Africa 1964       0  0.5     NA 1.345  0.43 6502   99.1
## 1-1965       1 Algeria Africa 1965       0  0.5     NA 1.450  0.43 6620   99.1
## 1-1966       1 Algeria Africa 1966       0  0.5     NA 1.560  0.43 6612   99.1
## 1-1967       1 Algeria Africa 1967       0  0.5     NA 1.675  0.43 6982   99.1
##        newc oil pollib reg stra  ln_gdpw country_f year_f
## 1-1962    1   1     NA   0    0 8.519590         1   1962
## 1-1963    1   1     NA   0    0 8.713253         1   1963
## 1-1964    1   1     NA   0    0 8.779865         1   1964
## 1-1965    1   1     NA   0    0 8.797851         1   1965
## 1-1966    1   1     NA   0    0 8.796641         1   1966
## 1-1967    1   1     NA   0    0 8.851091         1   1967
pvar(pdata)
## no time variation:       country ctyname region britcol cath elf60 moslem newc oil country_f 
## no individual variation: year edt elf60 year_f 
## all NA in time dimension for at least one individual:  civlib edt elf60 pollib 
## all NA in ind. dimension for at least one time period: civlib edt elf60 pollib
# or
pvar(data, index = c("country", "year"))
## no time variation:       country ctyname region britcol cath elf60 moslem newc oil country_f 
## no individual variation: year edt elf60 year_f 
## all NA in time dimension for at least one individual:  civlib edt elf60 pollib 
## all NA in ind. dimension for at least one time period: civlib edt elf60 pollib

Within Estimator

We take each individuals’ deviation from group means (average over \(t\) within each group/unit \(i\)). Because the observable unit-level heterogeneity is time-invariant, its value should not changed after averaging over time within each unit. Thus, we can avoid estimating the varying intercepts.

\[ \begin{aligned} \tilde{y}_{it} =& y_{it} - \bar{y}_{i} \\ =& \color{red}{(\alpha_i - \bar{\alpha}_i)} + \beta^\prime \underbrace{(x_{it} - \bar{x}_{i})}_{\tilde{x}_{it}} + \underbrace{(\varepsilon_{it} - \bar{\varepsilon}_i)}_{\tilde{\varepsilon}_{it}} \\ =& \beta^\prime \tilde{x}_{it} + \tilde{\varepsilon}_{it} \end{aligned} \]

Like OLS, if not GM assumption is violated, the within estimator \(\hat{\beta}_W\) is asymptotically normal and efficient:

\[ \widehat{\beta}_{W} \stackrel{\text { approx. }}{\sim} \mathcal{N}\left(\beta, \sigma^{2}\left(\tilde{\mathbf{X}}^{\top} \tilde{\mathbf{X}}\right)^{-1}\right) \]

However, the estimation of standard error is a little tricky. When we demean all the variables, we already used \(N\) degrees of freedom for \(N\) units.

\[ \widehat{\sigma}^{2}=\frac{1}{N T-N-K} \sum_{i=1}^{N} \sum_{t=1}^{T}{\widehat{\tilde{\varepsilon}}}_{i}^{2} \]

Hence, if we use a canned OLS function for \(\hat{\beta}_W\), the standard error is not correct, and we need to adjust it manually.

Within-FE Model in R

# two-way within estimator using `plm`

plm_within <- plm(
  #1st, the formula. if we include `oil`, plm() will automatically drop it. Why?
  ln_gdpw ~ reg + edt,
  data = data,
  #2nd, you Ns and Ts indices
  index = c("country_f","year_f"),
  # what model specification? See help documentation: ?plm
  model = "within",
  # what estimator?
  effect = "individual"
)

summary(plm_within) # note that the standard error does not take serial correlation into account
## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = ln_gdpw ~ reg + edt, data = data, effect = "individual", 
##     model = "within", index = c("country_f", "year_f"))
## 
## Unbalanced Panel: n = 113, T = 5-28, N = 2900
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -0.832562 -0.095557  0.015056  0.111956  0.714304 
## 
## Coefficients:
##      Estimate Std. Error t-value Pr(>|t|)    
## reg -0.014014   0.015048 -0.9313   0.3518    
## edt  0.167823   0.003712 45.2107   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    158.01
## Residual Sum of Squares: 91.097
## R-Squared:      0.42346
## Adj. R-Squared: 0.39986
## F-statistic: 1022.77 on 2 and 2785 DF, p-value: < 2.22e-16
## Adjusting uncertainty (standard errors)

# Arellano (1987), these are also commonly known as "cluster standard errors"
coeftest(plm_within, vcov. = vcovHC(plm_within, method = "arellano"))
## 
## t test of coefficients:
## 
##      Estimate Std. Error t value Pr(>|t|)    
## reg -0.014014   0.033891 -0.4135   0.6793    
## edt  0.167823   0.016088 10.4313   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Beck and Katz (1995) panel corrected standard errors
coeftest(plm_within, vcov. = vcovBK(plm_within))
## 
## t test of coefficients:
## 
##      Estimate Std. Error t value Pr(>|t|)    
## reg -0.014014   0.038906 -0.3602   0.7187    
## edt  0.167823   0.014056 11.9398   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Driscoll and Kraay panel corrected standard errors
coeftest(plm_within, vcov. = vcovSCC(plm_within))
## 
## t test of coefficients:
## 
##      Estimate Std. Error t value  Pr(>|t|)    
## reg -0.014014   0.026186 -0.5352    0.5926    
## edt  0.167823   0.024322  6.9001 6.408e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# an alternative way is to use `fixest`; it is faster when we have a large number of FEs
fixest_model <- feols(
  ln_gdpw ~ reg + edt | country + year,
  data = data,
  cluster = ~ country + year
)

summary(fixest_model)
## OLS estimation, Dep. Var.: ln_gdpw
## Observations: 2,900
## Fixed-effects: country: 113,  year: 28
## Standard-errors: Clustered (country & year) 
##     Estimate Std. Error  t value Pr(>|t|) 
## reg 0.027458   0.028661 0.958037  0.34654 
## edt 0.021767   0.020583 1.057552  0.29963 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 0.146007     Adj. R2: 0.979069
##                  Within R2: 0.006949

Although within estimator ignores varying intercepts, we can actually recover them by using:

\[ \hat{\alpha}_i = \hat{y}_{it} - \hat{\beta}^\prime x_{it} \]

Sometimes it is useful for us to look at the variation between groups.

plm::fixef(plm_within)
##      1      2      3      4      5      6      7      9     12     14     15 
## 8.6922 7.1792 7.4219 7.6883 6.5901 6.5233 7.2324 6.9027 7.6231 7.8300 6.4095 
##     16     17     18     19     21     22     23     24     25     26     27 
## 8.4965 7.3238 7.2507 6.9872 7.9429 7.0042 6.4524 7.4450 7.1212 6.4010 7.1359 
##     28     29     30     31     33     34     35     37     39     40     41 
## 7.7960 7.7022 8.2290 7.0245 7.4692 6.6891 7.6043 7.6612 8.1048 7.6142 7.9845 
##     42     43     44     45     46     47     48     50     52     53     54 
## 6.5103 6.7621 8.4530 6.8853 6.5790 7.4898 7.3069 7.7311 8.3731 8.2562 7.9466 
##     55     57     58     59     60     61     62     63     64     65     66 
## 8.0132 8.4557 7.1631 7.8058 7.4805 8.8477 8.2020 7.9713 8.8951 8.4059 8.6039 
##     67     68     69     70     71     72     73     74     76     77     78 
## 7.9171 8.5114 8.2335 8.3645 8.0146 7.5932 7.5457 8.1486 8.1386 9.3620 7.6592 
##     79     80     81     82     83     84     85     86     87     89     91 
## 6.6462 7.1602 7.2470 9.2366 9.4700 8.2150 7.8672 8.5831 7.5581 7.6918 6.6110 
##     92     93     94     95     96     97     98     99    101    102    103 
## 7.1852 7.7833 7.1381 8.3698 6.9976 8.6090 7.8404 7.1839 8.7655 8.4318 7.6233 
##    104    105    106    107    108    110    111    112    113    114    116 
## 6.8035 8.5153 8.0884 8.8824 8.3497 8.3088 7.9080 8.4210 8.1295 8.8111 7.9799 
##    117    118    119    120    122    123    124    125    126    128    129 
## 8.6501 8.3967 7.4098 8.2413 8.7539 8.4339 8.8509 8.0024 8.2790 8.0166 8.3293 
##    130    131    132 
## 7.9694 8.1688 7.9530

LSDV Estimator

Another option to directly estimate group-specific intercepts is to use the least squares dummy variables (LSDV) estimator, i.e., OLS with \(N\) group dummies excluding the global intercept.

The group dummies are often colloquially called “fixed effects.” For large \(T\), it should be very similar to FE estimator. But it is not a good idea for very small \(T\): estimates of \(\alpha_i\) will be poor due to small \(T\) within each group.

# LSDV estimator using `lm`
plm_lsdv <- plm(
  ln_gdpw ~ 0 + reg + edt + country_f, # optional: subtract 1 or set 0 to drop global intercept
  index = c("country_f", "year_f"),
  effect = "individual",
  model = "pooling",
  data = data
)

summary(plm_lsdv)
## Pooling Model
## 
## Call:
## plm(formula = ln_gdpw ~ 0 + reg + edt + country_f, data = data, 
##     effect = "individual", model = "pooling", index = c("country_f", 
##         "year_f"))
## 
## Unbalanced Panel: n = 113, T = 5-28, N = 2900
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -0.832562 -0.095557  0.015056  0.111956  0.714304 
## 
## Coefficients:
##               Estimate Std. Error  t-value Pr(>|t|)    
## reg          -0.014014   0.015048  -0.9313   0.3518    
## edt           0.167823   0.003712  45.2107   <2e-16 ***
## country_f1    8.692165   0.037078 234.4285   <2e-16 ***
## country_f2    7.179217   0.050311 142.6968   <2e-16 ***
## country_f3    7.421886   0.034538 214.8875   <2e-16 ***
## country_f4    7.688276   0.041769 184.0689   <2e-16 ***
## country_f5    6.590053   0.034200 192.6892   <2e-16 ***
## country_f6    6.523307   0.035696 182.7472   <2e-16 ***
## country_f7    7.232441   0.036072 200.5009   <2e-16 ***
## country_f9    6.902693   0.035198 196.1126   <2e-16 ***
## country_f12   7.623119   0.037553 202.9983   <2e-16 ***
## country_f14   7.830041   0.036720 213.2343   <2e-16 ***
## country_f15   6.409459   0.034835 183.9944   <2e-16 ***
## country_f16   8.496536   0.036453 233.0786   <2e-16 ***
## country_f17   7.323790   0.039538 185.2365   <2e-16 ***
## country_f18   7.250721   0.035904 201.9445   <2e-16 ***
## country_f19   6.987238   0.040469 172.6561   <2e-16 ***
## country_f21   7.942900   0.035070 226.4880   <2e-16 ***
## country_f22   7.004193   0.037813 185.2344   <2e-16 ***
## country_f23   6.452393   0.044311 145.6147   <2e-16 ***
## country_f24   7.445021   0.036058 206.4743   <2e-16 ***
## country_f25   7.121235   0.036557 194.7974   <2e-16 ***
## country_f26   6.401030   0.038870 164.6782   <2e-16 ***
## country_f27   7.135904   0.034896 204.4897   <2e-16 ***
## country_f28   7.795955   0.034840 223.7661   <2e-16 ***
## country_f29   7.702185   0.049563 155.4013   <2e-16 ***
## country_f30   8.229040   0.034960 235.3836   <2e-16 ***
## country_f31   7.024504   0.050963 137.8366   <2e-16 ***
## country_f33   7.469226   0.034981 213.5221   <2e-16 ***
## country_f34   6.689090   0.036526 183.1319   <2e-16 ***
## country_f35   7.604345   0.035057 216.9155   <2e-16 ***
## country_f37   7.661224   0.036000 212.8123   <2e-16 ***
## country_f39   8.104824   0.041534 195.1383   <2e-16 ***
## country_f40   7.614208   0.044193 172.2949   <2e-16 ***
## country_f41   7.984547   0.045110 177.0034   <2e-16 ***
## country_f42   6.510265   0.035187 185.0183   <2e-16 ***
## country_f43   6.762131   0.035903 188.3450   <2e-16 ***
## country_f44   8.453048   0.035486 238.2109   <2e-16 ***
## country_f45   6.885345   0.036245 189.9670   <2e-16 ***
## country_f46   6.579019   0.036460 180.4430   <2e-16 ***
## country_f47   7.489796   0.038509 194.4947   <2e-16 ***
## country_f48   7.306941   0.040315 181.2472   <2e-16 ***
## country_f50   7.731116   0.056431 137.0017   <2e-16 ***
## country_f52   8.373147   0.054087 154.8088   <2e-16 ***
## country_f53   8.256219   0.041741 197.7957   <2e-16 ***
## country_f54   7.946584   0.041003 193.8045   <2e-16 ***
## country_f55   8.013156   0.036821 217.6245   <2e-16 ***
## country_f57   8.455746   0.036999 228.5413   <2e-16 ***
## country_f58   7.163093   0.035763 200.2909   <2e-16 ***
## country_f59   7.805819   0.036596 213.2976   <2e-16 ***
## country_f60   7.480513   0.046400 161.2162   <2e-16 ***
## country_f61   8.847664   0.037756 234.3349   <2e-16 ***
## country_f62   8.202005   0.037018 221.5674   <2e-16 ***
## country_f63   7.971304   0.040845 195.1576   <2e-16 ***
## country_f64   8.895075   0.047209 188.4191   <2e-16 ***
## country_f65   8.405851   0.056110 149.8097   <2e-16 ***
## country_f66   8.603877   0.040941 210.1516   <2e-16 ***
## country_f67   7.917066   0.036889 214.6206   <2e-16 ***
## country_f68   8.511364   0.036764 231.5150   <2e-16 ***
## country_f69   8.233487   0.041333 199.2007   <2e-16 ***
## country_f70   8.364510   0.039341 212.6174   <2e-16 ***
## country_f71   8.014597   0.039677 201.9955   <2e-16 ***
## country_f72   7.593192   0.046809 162.2158   <2e-16 ***
## country_f73   7.545740   0.040401 186.7694   <2e-16 ***
## country_f74   8.148604   0.040606 200.6740   <2e-16 ***
## country_f76   8.138626   0.043015 189.2046   <2e-16 ***
## country_f77   9.361968   0.040101 233.4590   <2e-16 ***
## country_f78   7.659247   0.045408 168.6744   <2e-16 ***
## country_f79   6.646220   0.037217 178.5787   <2e-16 ***
## country_f80   7.160237   0.038725 184.8993   <2e-16 ***
## country_f81   7.246958   0.037038 195.6620   <2e-16 ***
## country_f82   9.236566   0.034915 264.5418   <2e-16 ***
## country_f83   9.470011   0.034855 271.6991   <2e-16 ***
## country_f84   8.214967   0.049984 164.3510   <2e-16 ***
## country_f85   7.867225   0.049690 158.3271   <2e-16 ***
## country_f86   8.583122   0.036291 236.5051   <2e-16 ***
## country_f87   7.558124   0.041461 182.2941   <2e-16 ***
## country_f89   7.691750   0.042721 180.0478   <2e-16 ***
## country_f91   6.611022   0.034733 190.3388   <2e-16 ***
## country_f92   7.185249   0.036577 196.4415   <2e-16 ***
## country_f93   7.783283   0.034854 223.3100   <2e-16 ***
## country_f94   7.138141   0.043016 165.9400   <2e-16 ***
## country_f95   8.369775   0.044039 190.0538   <2e-16 ***
## country_f96   6.997565   0.045182 154.8742   <2e-16 ***
## country_f97   8.609031   0.038564 223.2399   <2e-16 ***
## country_f98   7.840400   0.041052 190.9855   <2e-16 ***
## country_f99   7.183878   0.039623 181.3068   <2e-16 ***
## country_f101  8.765478   0.043786 200.1867   <2e-16 ***
## country_f102  8.431750   0.050976 165.4069   <2e-16 ***
## country_f103  7.623339   0.087233  87.3910   <2e-16 ***
## country_f104  6.803535   0.053491 127.1901   <2e-16 ***
## country_f105  8.515306   0.048098 177.0409   <2e-16 ***
## country_f106  8.088411   0.052565 153.8752   <2e-16 ***
## country_f107  8.882387   0.044449 199.8338   <2e-16 ***
## country_f108  8.349650   0.051757 161.3238   <2e-16 ***
## country_f110  8.308826   0.042299 196.4332   <2e-16 ***
## country_f111  7.908045   0.051903 152.3607   <2e-16 ***
## country_f112  8.420992   0.047897 175.8148   <2e-16 ***
## country_f113  8.129540   0.048559 167.4171   <2e-16 ***
## country_f114  8.811148   0.044203 199.3319   <2e-16 ***
## country_f116  7.979905   0.047038 169.6487   <2e-16 ***
## country_f117  8.650079   0.049233 175.6950   <2e-16 ***
## country_f118  8.396685   0.050589 165.9773   <2e-16 ***
## country_f119  7.409756   0.058229 127.2524   <2e-16 ***
## country_f120  8.241283   0.039397 209.1863   <2e-16 ***
## country_f122  8.753945   0.040785 214.6389   <2e-16 ***
## country_f123  8.433890   0.051287 164.4455   <2e-16 ***
## country_f124  8.850903   0.047364 186.8717   <2e-16 ***
## country_f125  8.002405   0.038647 207.0622   <2e-16 ***
## country_f126  8.279025   0.050960 162.4606   <2e-16 ***
## country_f128  8.016642   0.041468 193.3208   <2e-16 ***
## country_f129  8.329265   0.054070 154.0458   <2e-16 ***
## country_f130  7.969381   0.053386 149.2776   <2e-16 ***
## country_f131  8.168812   0.056465 144.6705   <2e-16 ***
## country_f132  7.952992   0.056730 140.1899   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    3104.6
## Residual Sum of Squares: 91.097
## R-Squared:      0.97066
## Adj. R-Squared: 0.96946
## F-statistic: 58478.9 on 115 and 2785 DF, p-value: < 2.22e-16
coeftest(plm_lsdv, vcov. = vcovHC(plm_lsdv, method = "arellano"))[1:2,] 
##        Estimate Std. Error    t value     Pr(>|t|)
## reg -0.01401365 0.03389072 -0.4134951 6.792757e-01
## edt  0.16782325 0.01608840 10.4313214 5.123485e-25
# you can also use lm() to implement LSDV estimator, but it is difficult to adjust for standard errors

# You can estimate the same model using lm, as follows:
summary(lm(ln_gdpw ~ country_f + year_f + reg + edt, data=data))
## 
## Call:
## lm(formula = ln_gdpw ~ country_f + year_f + reg + edt, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.64057 -0.07997  0.00780  0.07941  0.59769 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   8.730613   0.036399 239.857  < 2e-16 ***
## country_f2   -1.917900   0.053018 -36.175  < 2e-16 ***
## country_f3   -1.473490   0.041683 -35.350  < 2e-16 ***
## country_f4   -1.056680   0.044576 -23.705  < 2e-16 ***
## country_f5   -2.453336   0.043279 -56.687  < 2e-16 ***
## country_f6   -2.435972   0.042857 -56.839  < 2e-16 ***
## country_f7   -1.499153   0.041183 -36.402  < 2e-16 ***
## country_f9   -1.995622   0.041999 -47.516  < 2e-16 ***
## country_f12  -0.929167   0.041505 -22.387  < 2e-16 ***
## country_f14  -0.732719   0.041046 -17.851  < 2e-16 ***
## country_f15  -2.619545   0.043429 -60.318  < 2e-16 ***
## country_f16  -0.181423   0.041144  -4.409 1.08e-05 ***
## country_f17  -1.726617   0.045976 -37.555  < 2e-16 ***
## country_f18  -1.417699   0.040828 -34.724  < 2e-16 ***
## country_f19  -2.109640   0.047064 -44.825  < 2e-16 ***
## country_f21  -0.992496   0.042319 -23.453  < 2e-16 ***
## country_f22  -1.691828   0.041942 -40.337  < 2e-16 ***
## country_f23  -1.990957   0.045851 -43.422  < 2e-16 ***
## country_f24  -1.398031   0.041994 -33.291  < 2e-16 ***
## country_f25  -1.543200   0.041152 -37.500  < 2e-16 ***
## country_f26  -2.262115   0.042419 -53.328  < 2e-16 ***
## country_f27  -1.869869   0.043059 -43.426  < 2e-16 ***
## country_f28  -1.248226   0.043533 -28.673  < 2e-16 ***
## country_f29  -0.539974   0.050557 -10.681  < 2e-16 ***
## country_f30  -0.572730   0.041078 -13.943  < 2e-16 ***
## country_f31  -1.871067   0.051378 -36.418  < 2e-16 ***
## country_f33  -1.423227   0.041805 -34.044  < 2e-16 ***
## country_f34  -2.084951   0.041651 -50.057  < 2e-16 ***
## country_f35  -1.335325   0.042359 -31.524  < 2e-16 ***
## country_f37  -1.244580   0.042403 -29.351  < 2e-16 ***
## country_f39  -0.122955   0.045250  -2.717 0.006625 ** 
## country_f40  -1.407871   0.048167 -29.229  < 2e-16 ***
## country_f41  -0.616258   0.046216 -13.334  < 2e-16 ***
## country_f42  -2.390835   0.042021 -56.896  < 2e-16 ***
## country_f43  -1.995645   0.041242 -48.389  < 2e-16 ***
## country_f44  -0.379319   0.041552  -9.129  < 2e-16 ***
## country_f45  -1.963885   0.042025 -46.732  < 2e-16 ***
## country_f46  -2.012405   0.040934 -49.162  < 2e-16 ***
## country_f47  -1.220878   0.042391 -28.800  < 2e-16 ***
## country_f48  -1.285795   0.043111 -29.825  < 2e-16 ***
## country_f50  -0.006703   0.061292  -0.109 0.912929    
## country_f52   0.808902   0.062404  12.962  < 2e-16 ***
## country_f53  -0.113515   0.044777  -2.535 0.011296 *  
## country_f54  -0.472017   0.044150 -10.691  < 2e-16 ***
## country_f55  -0.547391   0.041114 -13.314  < 2e-16 ***
## country_f57  -0.307334   0.041925  -7.331 3.00e-13 ***
## country_f58  -1.617781   0.041315 -39.157  < 2e-16 ***
## country_f59  -0.820488   0.041106 -19.960  < 2e-16 ***
## country_f60  -0.632048   0.049739 -12.707  < 2e-16 ***
## country_f61   0.387952   0.041704   9.303  < 2e-16 ***
## country_f62  -0.337781   0.041220  -8.195 3.80e-16 ***
## country_f63  -0.259760   0.044779  -5.801 7.35e-09 ***
## country_f64   0.771826   0.050320  15.338  < 2e-16 ***
## country_f65   0.949886   0.065606  14.479  < 2e-16 ***
## country_f66   0.350150   0.044747   7.825 7.17e-15 ***
## country_f67  -0.651497   0.041172 -15.824  < 2e-16 ***
## country_f68  -0.128995   0.041251  -3.127 0.001784 ** 
## country_f69   0.008814   0.045210   0.195 0.845440    
## country_f70  -0.251850   0.042843  -5.878 4.64e-09 ***
## country_f71  -0.330624   0.043386  -7.621 3.45e-14 ***
## country_f72  -0.635309   0.048574 -13.079  < 2e-16 ***
## country_f73  -0.697559   0.044301 -15.746  < 2e-16 ***
## country_f74  -0.146948   0.044286  -3.318 0.000918 ***
## country_f76   0.010711   0.047175   0.227 0.820411    
## country_f77   0.833581   0.043290  19.256  < 2e-16 ***
## country_f78  -1.105945   0.046788 -23.637  < 2e-16 ***
## country_f79  -1.939535   0.041334 -46.923  < 2e-16 ***
## country_f80  -1.539557   0.042681 -36.072  < 2e-16 ***
## country_f81  -1.359055   0.041265 -32.935  < 2e-16 ***
## country_f82   0.426294   0.041119  10.367  < 2e-16 ***
## country_f83   0.647830   0.041182  15.731  < 2e-16 ***
## country_f84   0.418555   0.056035   7.470 1.07e-13 ***
## country_f85   0.053313   0.055587   0.959 0.337601    
## country_f86  -0.027722   0.040876  -0.678 0.497704    
## country_f87  -0.610045   0.045569 -13.387  < 2e-16 ***
## country_f89  -0.390746   0.047160  -8.286  < 2e-16 ***
## country_f91  -2.241439   0.041358 -54.195  < 2e-16 ***
## country_f92  -1.693048   0.042632 -39.713  < 2e-16 ***
## country_f93  -1.062980   0.041347 -25.709  < 2e-16 ***
## country_f94  -0.939023   0.047529 -19.757  < 2e-16 ***
## country_f95   0.111158   0.046421   2.395 0.016707 *  
## country_f96  -0.995379   0.050057 -19.885  < 2e-16 ***
## country_f97   0.158108   0.042187   3.748 0.000182 ***
## country_f98  -0.355784   0.045064  -7.895 4.15e-15 ***
## country_f99  -1.131414   0.043437 -26.047  < 2e-16 ***
## country_f101  0.561476   0.047187  11.899  < 2e-16 ***
## country_f102  0.693318   0.057553  12.047  < 2e-16 ***
## country_f103 -0.350231   0.078815  -4.444 9.19e-06 ***
## country_f104 -0.737881   0.061880 -11.924  < 2e-16 ***
## country_f105  0.604345   0.053201  11.360  < 2e-16 ***
## country_f106  0.440257   0.060017   7.336 2.89e-13 ***
## country_f107  0.727325   0.048048  15.138  < 2e-16 ***
## country_f108  0.655984   0.058760  11.164  < 2e-16 ***
## country_f110  0.082554   0.045937   1.797 0.072424 .  
## country_f111 -0.220533   0.052461  -4.204 2.71e-05 ***
## country_f112  0.497444   0.052904   9.403  < 2e-16 ***
## country_f113  0.247127   0.053885   4.586 4.72e-06 ***
## country_f114  0.638162   0.047725  13.372  < 2e-16 ***
## country_f116 -0.192216   0.049565  -3.878 0.000108 ***
## country_f117  0.808797   0.054897  14.733  < 2e-16 ***
## country_f118  0.635817   0.056960  11.163  < 2e-16 ***
## country_f119 -0.436869   0.060082  -7.271 4.62e-13 ***
## country_f120 -0.127130   0.043102  -2.950 0.003209 ** 
## country_f122  0.502084   0.044633  11.249  < 2e-16 ***
## country_f123  0.713371   0.058033  12.293  < 2e-16 ***
## country_f124  0.893549   0.052122  17.143  < 2e-16 ***
## country_f125 -0.602866   0.042373 -14.228  < 2e-16 ***
## country_f126  0.539690   0.057529   9.381  < 2e-16 ***
## country_f128 -0.200672   0.045153  -4.444 9.17e-06 ***
## country_f129  0.764097   0.062377  12.250  < 2e-16 ***
## country_f130 -0.118661   0.054084  -2.194 0.028318 *  
## country_f131  0.731500   0.066171  11.055  < 2e-16 ***
## country_f132 -1.137835   0.056866 -20.009  < 2e-16 ***
## year_f1961    0.035883   0.025082   1.431 0.152652    
## year_f1962    0.079099   0.024685   3.204 0.001369 ** 
## year_f1963    0.105967   0.024648   4.299 1.77e-05 ***
## year_f1964    0.141043   0.024512   5.754 9.67e-09 ***
## year_f1965    0.164482   0.024398   6.742 1.90e-11 ***
## year_f1966    0.186794   0.024221   7.712 1.72e-14 ***
## year_f1967    0.207793   0.024290   8.555  < 2e-16 ***
## year_f1968    0.241314   0.024288   9.936  < 2e-16 ***
## year_f1969    0.282157   0.024376  11.575  < 2e-16 ***
## year_f1970    0.319016   0.024447  13.049  < 2e-16 ***
## year_f1971    0.355129   0.024406  14.551  < 2e-16 ***
## year_f1972    0.376303   0.024535  15.337  < 2e-16 ***
## year_f1973    0.404055   0.024680  16.371  < 2e-16 ***
## year_f1974    0.440244   0.024825  17.734  < 2e-16 ***
## year_f1975    0.452843   0.024904  18.183  < 2e-16 ***
## year_f1976    0.482772   0.025110  19.226  < 2e-16 ***
## year_f1977    0.504968   0.025336  19.930  < 2e-16 ***
## year_f1978    0.526284   0.025574  20.579  < 2e-16 ***
## year_f1979    0.540096   0.025798  20.936  < 2e-16 ***
## year_f1980    0.544420   0.026076  20.878  < 2e-16 ***
## year_f1981    0.547954   0.026301  20.834  < 2e-16 ***
## year_f1982    0.533165   0.026557  20.077  < 2e-16 ***
## year_f1983    0.520768   0.026820  19.417  < 2e-16 ***
## year_f1984    0.517421   0.027102  19.092  < 2e-16 ***
## year_f1985    0.523403   0.027429  19.082  < 2e-16 ***
## year_f1986    0.527585   0.027774  18.996  < 2e-16 ***
## year_f1987    0.527344   0.028255  18.664  < 2e-16 ***
## reg           0.027458   0.012644   2.172 0.029974 *  
## edt           0.021767   0.005799   3.754 0.000178 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1497 on 2758 degrees of freedom
##   (1226 observations deleted due to missingness)
## Multiple R-squared:  0.9801, Adjusted R-squared:  0.9791 
## F-statistic: 962.7 on 141 and 2758 DF,  p-value: < 2.2e-16
# You will need other packages to adjust the standard errors in lm model objects. Check out in the internet but coeftest surely have options!

Within vs. LSDV

Within and LSDV estimators are equivalent to each other. They give exactly the same estimates.

But we favor one over another based on the following reasons:

  • Their standard errors will differ slightly. We don’t need to adjust degrees of freedom for \(\sigma^2\) manually in LSDV estimator because it directly estimate the fixed effects of N groups. By contrast, standard errors from vanilla OLS on the within estimator will be slightly off due to incorrect degrees of freedom (smart software such as plm() in R and areg in Stata will correct it).
  • Compared to within estimator, LSDV approach can be computationally demanding. The model matrix is a very large matrix to invert when \(N\) is large.
  • However, within and LSDV cannot estimate the intercept very well. The estimates of \(\alpha_i\) is inconsistent if \(N \rightarrow \infty\) and \(T\) fixed (i.e. incidental parameters problem). But there is no need to estimate \(\alpha_i\) unless the quantity of interest is a function of it, such as some non-linear models.

FD Estimator

Since the group-level time-invariant unobservables are constant across time, first-differences are an alter- native to purge out it. The first difference model is the following: \[ \begin{aligned} \Delta y_{it} =& y_{it} - y_{i,t-1} \\ =& \color{red}{(\alpha_i - \alpha_i)} + \beta^\prime \underbrace{(x_{it} - x_{i,t-1})}_{\Delta x_{it}} + \underbrace{(\varepsilon_{it} - \varepsilon_{i,t-1})}_{\Delta \varepsilon_{it}} \\ =& \beta^\prime \Delta x_{it} + \Delta \varepsilon_{it} \end{aligned} \]

If \(\Delta \varepsilon_{it}\) are homoskedastic and without serial correlation, then usual OLS standard errors work just fine. Otherwise we need to use GLS to estimate uncertainties.

FD estimator in plm() will automatically include an intercept. The intercept in FD model is actually an coefficient for linear time trend (in the second step of the above equation, it is \(\theta (t+1 - t)\)).

FE vs. FD Estimators

  • FD and FE will give the same estimates when \(T = 2\), but for \(T > 2\), the two methods are different.
  • Note that FD often cannot account for two-way fixed effects. It is not intuitive to have \(y_{it} - y_{i-1,t}\).
  • Both FE and FD estimators are consistent under the assumption of strict exogeneity.
  • However, the primary tradeoff between FE and FD estimators lies in the i.i.d. assumption: if original errors, \(\varepsilon_{it}\), are serially uncorrelated, then FE estimator will be more efficient than FD estimator.
  • But if there are substantial serial dependence in the original errors, we may consider FD estimator to make the first-differences of errors, \(\Delta \varepsilon_{it}\), serially uncorrelated. In the latter case, FD estimator is more efficient than FE estimator.
  • We can always try both: if results are not very sensitive, good!
# FD estimator using `plm`
plm_fd <- plm(
  ln_gdpw ~ reg + edt - 1,
  index = c("country", "year"),
  model = "fd", # FD estimator
  data = data
)
summary(plm_fd)
## Oneway (individual) effect First-Difference Model
## 
## Call:
## plm(formula = ln_gdpw ~ reg + edt - 1, data = data, model = "fd", 
##     index = c("country", "year"))
## 
## Unbalanced Panel: n = 113, T = 5-28, N = 2900
## Observations used in estimation: 2787
## 
## Residuals:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -0.6344 -0.0102  0.0190  0.0168  0.0469  0.5230 
## 
## Coefficients:
##      Estimate Std. Error t-value  Pr(>|t|)    
## reg 0.0061632  0.0082067  0.7510    0.4527    
## edt 0.0395749  0.0055039  7.1904 8.264e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    10.679
## Residual Sum of Squares: 11.628
## R-Squared:      5.8796e-05
## Adj. R-Squared: -0.00030025
## F-statistic: 26.1857 on 2 and 2785 DF, p-value: 5.4114e-12

Two-way Fixed Effects

A very common model specificaiton is the two-way fixed effects (TWFE). Note that the TWFE does not provide any causal identification unless pre-treatment parallel trends hold and we assume covariate imbalance between control and treatment group. Even in this case, recent econometric research has found many shortcoming of this model specification for the causal estimation of treatment effects.

\[ y_{it} = \alpha_i + \tau_t + \beta^\prime x_{it} + \varepsilon_{it}\] Where we incorporate a dummy variable for each period \(\tau_t\). However, some researchers argue that this model may be inappropriate to answer some research questions.

# two-way fixed effects

plm_twfe <- plm(ln_gdpw ~ reg + edt, 
                data = data, 
                index = c("country_f", "year_f"), 
                model = "within", 
                effect = "twoways")

summary(plm_twfe)
## Twoways effects Within Model
## 
## Call:
## plm(formula = ln_gdpw ~ reg + edt, data = data, effect = "twoways", 
##     model = "within", index = c("country_f", "year_f"))
## 
## Unbalanced Panel: n = 113, T = 5-28, N = 2900
## 
## Residuals:
##       Min.    1st Qu.     Median    3rd Qu.       Max. 
## -0.6405721 -0.0799739  0.0078023  0.0794105  0.5976897 
## 
## Coefficients:
##      Estimate Std. Error t-value  Pr(>|t|)    
## reg 0.0274582  0.0126445  2.1716 0.0299740 *  
## edt 0.0217672  0.0057988  3.7537 0.0001778 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    62.255
## Residual Sum of Squares: 61.822
## R-Squared:      0.0069488
## Adj. R-Squared: -0.04382
## F-statistic: 9.64946 on 2 and 2758 DF, p-value: 6.6663e-05
# You can estimate the same model using lm, as follows:
summary(lm(ln_gdpw ~ country_f + year_f + reg + edt, data=data))
## 
## Call:
## lm(formula = ln_gdpw ~ country_f + year_f + reg + edt, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.64057 -0.07997  0.00780  0.07941  0.59769 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   8.730613   0.036399 239.857  < 2e-16 ***
## country_f2   -1.917900   0.053018 -36.175  < 2e-16 ***
## country_f3   -1.473490   0.041683 -35.350  < 2e-16 ***
## country_f4   -1.056680   0.044576 -23.705  < 2e-16 ***
## country_f5   -2.453336   0.043279 -56.687  < 2e-16 ***
## country_f6   -2.435972   0.042857 -56.839  < 2e-16 ***
## country_f7   -1.499153   0.041183 -36.402  < 2e-16 ***
## country_f9   -1.995622   0.041999 -47.516  < 2e-16 ***
## country_f12  -0.929167   0.041505 -22.387  < 2e-16 ***
## country_f14  -0.732719   0.041046 -17.851  < 2e-16 ***
## country_f15  -2.619545   0.043429 -60.318  < 2e-16 ***
## country_f16  -0.181423   0.041144  -4.409 1.08e-05 ***
## country_f17  -1.726617   0.045976 -37.555  < 2e-16 ***
## country_f18  -1.417699   0.040828 -34.724  < 2e-16 ***
## country_f19  -2.109640   0.047064 -44.825  < 2e-16 ***
## country_f21  -0.992496   0.042319 -23.453  < 2e-16 ***
## country_f22  -1.691828   0.041942 -40.337  < 2e-16 ***
## country_f23  -1.990957   0.045851 -43.422  < 2e-16 ***
## country_f24  -1.398031   0.041994 -33.291  < 2e-16 ***
## country_f25  -1.543200   0.041152 -37.500  < 2e-16 ***
## country_f26  -2.262115   0.042419 -53.328  < 2e-16 ***
## country_f27  -1.869869   0.043059 -43.426  < 2e-16 ***
## country_f28  -1.248226   0.043533 -28.673  < 2e-16 ***
## country_f29  -0.539974   0.050557 -10.681  < 2e-16 ***
## country_f30  -0.572730   0.041078 -13.943  < 2e-16 ***
## country_f31  -1.871067   0.051378 -36.418  < 2e-16 ***
## country_f33  -1.423227   0.041805 -34.044  < 2e-16 ***
## country_f34  -2.084951   0.041651 -50.057  < 2e-16 ***
## country_f35  -1.335325   0.042359 -31.524  < 2e-16 ***
## country_f37  -1.244580   0.042403 -29.351  < 2e-16 ***
## country_f39  -0.122955   0.045250  -2.717 0.006625 ** 
## country_f40  -1.407871   0.048167 -29.229  < 2e-16 ***
## country_f41  -0.616258   0.046216 -13.334  < 2e-16 ***
## country_f42  -2.390835   0.042021 -56.896  < 2e-16 ***
## country_f43  -1.995645   0.041242 -48.389  < 2e-16 ***
## country_f44  -0.379319   0.041552  -9.129  < 2e-16 ***
## country_f45  -1.963885   0.042025 -46.732  < 2e-16 ***
## country_f46  -2.012405   0.040934 -49.162  < 2e-16 ***
## country_f47  -1.220878   0.042391 -28.800  < 2e-16 ***
## country_f48  -1.285795   0.043111 -29.825  < 2e-16 ***
## country_f50  -0.006703   0.061292  -0.109 0.912929    
## country_f52   0.808902   0.062404  12.962  < 2e-16 ***
## country_f53  -0.113515   0.044777  -2.535 0.011296 *  
## country_f54  -0.472017   0.044150 -10.691  < 2e-16 ***
## country_f55  -0.547391   0.041114 -13.314  < 2e-16 ***
## country_f57  -0.307334   0.041925  -7.331 3.00e-13 ***
## country_f58  -1.617781   0.041315 -39.157  < 2e-16 ***
## country_f59  -0.820488   0.041106 -19.960  < 2e-16 ***
## country_f60  -0.632048   0.049739 -12.707  < 2e-16 ***
## country_f61   0.387952   0.041704   9.303  < 2e-16 ***
## country_f62  -0.337781   0.041220  -8.195 3.80e-16 ***
## country_f63  -0.259760   0.044779  -5.801 7.35e-09 ***
## country_f64   0.771826   0.050320  15.338  < 2e-16 ***
## country_f65   0.949886   0.065606  14.479  < 2e-16 ***
## country_f66   0.350150   0.044747   7.825 7.17e-15 ***
## country_f67  -0.651497   0.041172 -15.824  < 2e-16 ***
## country_f68  -0.128995   0.041251  -3.127 0.001784 ** 
## country_f69   0.008814   0.045210   0.195 0.845440    
## country_f70  -0.251850   0.042843  -5.878 4.64e-09 ***
## country_f71  -0.330624   0.043386  -7.621 3.45e-14 ***
## country_f72  -0.635309   0.048574 -13.079  < 2e-16 ***
## country_f73  -0.697559   0.044301 -15.746  < 2e-16 ***
## country_f74  -0.146948   0.044286  -3.318 0.000918 ***
## country_f76   0.010711   0.047175   0.227 0.820411    
## country_f77   0.833581   0.043290  19.256  < 2e-16 ***
## country_f78  -1.105945   0.046788 -23.637  < 2e-16 ***
## country_f79  -1.939535   0.041334 -46.923  < 2e-16 ***
## country_f80  -1.539557   0.042681 -36.072  < 2e-16 ***
## country_f81  -1.359055   0.041265 -32.935  < 2e-16 ***
## country_f82   0.426294   0.041119  10.367  < 2e-16 ***
## country_f83   0.647830   0.041182  15.731  < 2e-16 ***
## country_f84   0.418555   0.056035   7.470 1.07e-13 ***
## country_f85   0.053313   0.055587   0.959 0.337601    
## country_f86  -0.027722   0.040876  -0.678 0.497704    
## country_f87  -0.610045   0.045569 -13.387  < 2e-16 ***
## country_f89  -0.390746   0.047160  -8.286  < 2e-16 ***
## country_f91  -2.241439   0.041358 -54.195  < 2e-16 ***
## country_f92  -1.693048   0.042632 -39.713  < 2e-16 ***
## country_f93  -1.062980   0.041347 -25.709  < 2e-16 ***
## country_f94  -0.939023   0.047529 -19.757  < 2e-16 ***
## country_f95   0.111158   0.046421   2.395 0.016707 *  
## country_f96  -0.995379   0.050057 -19.885  < 2e-16 ***
## country_f97   0.158108   0.042187   3.748 0.000182 ***
## country_f98  -0.355784   0.045064  -7.895 4.15e-15 ***
## country_f99  -1.131414   0.043437 -26.047  < 2e-16 ***
## country_f101  0.561476   0.047187  11.899  < 2e-16 ***
## country_f102  0.693318   0.057553  12.047  < 2e-16 ***
## country_f103 -0.350231   0.078815  -4.444 9.19e-06 ***
## country_f104 -0.737881   0.061880 -11.924  < 2e-16 ***
## country_f105  0.604345   0.053201  11.360  < 2e-16 ***
## country_f106  0.440257   0.060017   7.336 2.89e-13 ***
## country_f107  0.727325   0.048048  15.138  < 2e-16 ***
## country_f108  0.655984   0.058760  11.164  < 2e-16 ***
## country_f110  0.082554   0.045937   1.797 0.072424 .  
## country_f111 -0.220533   0.052461  -4.204 2.71e-05 ***
## country_f112  0.497444   0.052904   9.403  < 2e-16 ***
## country_f113  0.247127   0.053885   4.586 4.72e-06 ***
## country_f114  0.638162   0.047725  13.372  < 2e-16 ***
## country_f116 -0.192216   0.049565  -3.878 0.000108 ***
## country_f117  0.808797   0.054897  14.733  < 2e-16 ***
## country_f118  0.635817   0.056960  11.163  < 2e-16 ***
## country_f119 -0.436869   0.060082  -7.271 4.62e-13 ***
## country_f120 -0.127130   0.043102  -2.950 0.003209 ** 
## country_f122  0.502084   0.044633  11.249  < 2e-16 ***
## country_f123  0.713371   0.058033  12.293  < 2e-16 ***
## country_f124  0.893549   0.052122  17.143  < 2e-16 ***
## country_f125 -0.602866   0.042373 -14.228  < 2e-16 ***
## country_f126  0.539690   0.057529   9.381  < 2e-16 ***
## country_f128 -0.200672   0.045153  -4.444 9.17e-06 ***
## country_f129  0.764097   0.062377  12.250  < 2e-16 ***
## country_f130 -0.118661   0.054084  -2.194 0.028318 *  
## country_f131  0.731500   0.066171  11.055  < 2e-16 ***
## country_f132 -1.137835   0.056866 -20.009  < 2e-16 ***
## year_f1961    0.035883   0.025082   1.431 0.152652    
## year_f1962    0.079099   0.024685   3.204 0.001369 ** 
## year_f1963    0.105967   0.024648   4.299 1.77e-05 ***
## year_f1964    0.141043   0.024512   5.754 9.67e-09 ***
## year_f1965    0.164482   0.024398   6.742 1.90e-11 ***
## year_f1966    0.186794   0.024221   7.712 1.72e-14 ***
## year_f1967    0.207793   0.024290   8.555  < 2e-16 ***
## year_f1968    0.241314   0.024288   9.936  < 2e-16 ***
## year_f1969    0.282157   0.024376  11.575  < 2e-16 ***
## year_f1970    0.319016   0.024447  13.049  < 2e-16 ***
## year_f1971    0.355129   0.024406  14.551  < 2e-16 ***
## year_f1972    0.376303   0.024535  15.337  < 2e-16 ***
## year_f1973    0.404055   0.024680  16.371  < 2e-16 ***
## year_f1974    0.440244   0.024825  17.734  < 2e-16 ***
## year_f1975    0.452843   0.024904  18.183  < 2e-16 ***
## year_f1976    0.482772   0.025110  19.226  < 2e-16 ***
## year_f1977    0.504968   0.025336  19.930  < 2e-16 ***
## year_f1978    0.526284   0.025574  20.579  < 2e-16 ***
## year_f1979    0.540096   0.025798  20.936  < 2e-16 ***
## year_f1980    0.544420   0.026076  20.878  < 2e-16 ***
## year_f1981    0.547954   0.026301  20.834  < 2e-16 ***
## year_f1982    0.533165   0.026557  20.077  < 2e-16 ***
## year_f1983    0.520768   0.026820  19.417  < 2e-16 ***
## year_f1984    0.517421   0.027102  19.092  < 2e-16 ***
## year_f1985    0.523403   0.027429  19.082  < 2e-16 ***
## year_f1986    0.527585   0.027774  18.996  < 2e-16 ***
## year_f1987    0.527344   0.028255  18.664  < 2e-16 ***
## reg           0.027458   0.012644   2.172 0.029974 *  
## edt           0.021767   0.005799   3.754 0.000178 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1497 on 2758 degrees of freedom
##   (1226 observations deleted due to missingness)
## Multiple R-squared:  0.9801, Adjusted R-squared:  0.9791 
## F-statistic: 962.7 on 141 and 2758 DF,  p-value: < 2.2e-16
#  Correcting for uncertainty with clustered standard errors:
coeftest(plm_twfe, vcov. = vcovHC(plm_twfe, method = "arellano"))
## 
## t test of coefficients:
## 
##     Estimate Std. Error t value Pr(>|t|)
## reg 0.027458   0.027678  0.9921   0.3213
## edt 0.021767   0.020064  1.0849   0.2781

Random Effects / Mixed Effects

What is Random Effects?

Assumes the individuals come from a common population. Each individual’s unobservable heterogeneous effect on the outcome follows a normal distribution with an unknown variance \(\sigma_\alpha^2\). Then we may have the following model setup:

\[ \begin{aligned} y_{i t} &=\beta_{0}+\beta_{1} x_{i t}+\alpha_{i}+\varepsilon_{i t} \\ \alpha_{i} & \sim \mathcal{N}\left(0, \sigma_{\alpha}^{2}\right) \\ \varepsilon_{i t} & \sim \mathcal{N}\left(0, \sigma_\varepsilon^{2}\right) \end{aligned} \]

Estimate an RE model in R

  • Three packages in R: plm (feasible generalized least squares by defaulty), nlme (maximum likelihood by default), and lme4 (restricted maximum likelihood by default)
  • Pros and Cons among different estimation strategies
    • FGLS2
      • preferred by some econometricians because it has fewer distributional assumptions.
      • sometimes FGLS is computationally complicated because it need to invert a large matrix.
    • ML: biased estimation of variance parameters; but such bias gets smaller for larger sample size
    • REML: unbiased estimation of variance parameters
    • If you prefer a Bayesian approach for more flexible modeling, you can use brms or even manually specify the model in JAGS and stan.
  • Pros and cons among different packages, besides estimation strategies
    • plm cannot deal with mixed effects, i.e., the mean of \(\alpha_i\) varies due to certain covariates, i.e. \(\alpha_i \sim \mathcal{N}(\gamma^\prime z_{it}, \sigma_\alpha^2)\), but nlme and lme4 can do so.
    • nlme can take ARMA structure into account, but plm and lme4 can only consider AR terms.
    • nlme is difficult to specify crossed (non-nested) random effect structure (i.e. multiple separated random effects), while lme4 is easy to do so.
  • In terms of the mixed effects model in nlme and lme4, you may be confused by their different use of “fixed effects” and “random effects.”3
    • fixed effect: coefficients that do not vary by group, such that they are fixed, not random.
    • random effect: group-specific effect; these coefficients vary across groups.

A typical mixed effect model is as follows:

\[ y_{it} = \underbrace{\beta^\prime x_{it}}_\text{"fixed effect"} + \underbrace{\gamma^\prime z_i}_\text{"random effects"} + \varepsilon_{it} \]

or can be specified as follows: \[ \begin{aligned} y_{i t} &=\beta^\prime x_{i t}+\alpha_{i}+\varepsilon_{i t} \\ \alpha_{i} & \sim \mathcal{N}\left(\gamma^\prime z_i, \sigma_{\alpha}^{2}\right) \\ \varepsilon_{i t} & \sim \mathcal{N}\left(0, \sigma_\varepsilon^{2}\right) \end{aligned} \]

# RE model using `plm`
plm_re <- plm(
  ln_gdpw ~ reg + edt + oil,
  data = data,
  index = c("country", "year"),
  model = "random",
  effect = "individual"
)

summary(plm_re)
## Oneway (individual) effect Random Effect Model 
##    (Swamy-Arora's transformation)
## 
## Call:
## plm(formula = ln_gdpw ~ reg + edt + oil, data = data, effect = "individual", 
##     model = "random", index = c("country", "year"))
## 
## Unbalanced Panel: n = 113, T = 5-28, N = 2900
## 
## Effects:
##                   var std.dev share
## idiosyncratic 0.03271 0.18086 0.091
## individual    0.32784 0.57258 0.909
## theta:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.8601  0.9382  0.9404  0.9381  0.9404  0.9404 
## 
## Residuals:
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -0.82664 -0.10254  0.01505  0.00081  0.11712  0.75912 
## 
## Coefficients:
##               Estimate Std. Error  z-value  Pr(>|z|)    
## (Intercept)  7.7290183  0.0608325 127.0540 < 2.2e-16 ***
## reg         -0.0030963  0.0150650  -0.2055  0.837159    
## edt          0.1717169  0.0036682  46.8124 < 2.2e-16 ***
## oil          0.5458284  0.1705342   3.2007  0.001371 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    173.89
## Residual Sum of Squares: 96.136
## R-Squared:      0.44728
## Adj. R-Squared: 0.44671
## Chisq: 2200.57 on 3 DF, p-value: < 2.22e-16
# RE model using `nlme`
lme_re <- lme(
  fixed = ln_gdpw ~ reg + edt + oil,
  random = ~ 1 | country,
  data = data,
  na.action = na.omit # need to specify how we deal with NA values, otherwise throws an error
)

summary(lme_re)
## Linear mixed-effects model fit by REML
##   Data: data 
##         AIC       BIC   logLik
##   -996.8293 -961.0027 504.4146
## 
## Random effects:
##  Formula: ~1 | country
##         (Intercept)  Residual
## StdDev:   0.6733608 0.1808866
## 
## Fixed effects:  ln_gdpw ~ reg + edt + oil 
##                 Value  Std.Error   DF   t-value p-value
## (Intercept)  7.735312 0.07004634 2785 110.43135  0.0000
## reg         -0.006041 0.01498196 2785  -0.40321  0.6868
## edt          0.170680 0.00366084 2785  46.62312  0.0000
## oil          0.543484 0.19892569  111   2.73210  0.0073
##  Correlation: 
##     (Intr) reg    edt   
## reg -0.066              
## edt -0.258 -0.058       
## oil -0.335  0.010  0.032
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -4.59951054 -0.52967154  0.08334205  0.61991189  4.00624582 
## 
## Number of Observations: 2900
## Number of Groups: 113
# RE model using `lme4`
lmer_re <- lmer(
  ln_gdpw ~ reg + edt + oil + (1 | country),
  data = data
)

summary(lmer_re)
## Linear mixed model fit by REML ['lmerMod']
## Formula: ln_gdpw ~ reg + edt + oil + (1 | country)
##    Data: data
## 
## REML criterion at convergence: -1008.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.5995 -0.5297  0.0833  0.6199  4.0062 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  country  (Intercept) 0.45341  0.6734  
##  Residual             0.03272  0.1809  
## Number of obs: 2900, groups:  country, 113
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept)  7.735312   0.070046 110.431
## reg         -0.006041   0.014982  -0.403
## edt          0.170680   0.003661  46.623
## oil          0.543484   0.198926   2.732
## 
## Correlation of Fixed Effects:
##     (Intr) reg    edt   
## reg -0.066              
## edt -0.258 -0.058       
## oil -0.335  0.010  0.032

FE vs. RE

  • Different assumption about the relationship between group heterogeneity \(u_i\) and covariates \(x_{it}\).
    • In FE model, we assume conditional ignorability \(E[\varepsilon_{it}|x_{it}, u_i] = 0\). That is, we assume that conditional on unobservable confounder \(u_i\), \(\varepsilon_{it}\) and \(x_{it}\) is uncorrelated. \(x_{it}\) and \(u_i\) have some correlation.

    • In RE model, we assume there is no unmeasured confounder, so ignorability holds without conditioning on \(u_i\): \(E[\varepsilon_{it} | x_{it}] = E[u_i] = 0\). \(x_{it}\) and \(u_i\) have no correlation.

  • RE estimator is a matrix-weighted average of within estimator and between estimator.
    • But an intuitive way to think about it is that FE model rule out between-group variation; but in RE model, we preserve the information of between-group variation by modeling \(\sigma_\alpha^2\).

    • It is difficult to show it mathematically, because it requires to go through the variance-covariance matrix of the random effects and requires some understanding what Gaussian processes are. If you want to learn how to model random effects, you should consider taking CS&SS 560 and CS&SS 592.

We can think of RE model doing quasi-demeaning or partial-pooling:

\[ y_{it} - \theta\bar{y}_i = \beta^\prime (x_{it} - \theta \bar{x}_i) + (\varepsilon_{it} - \theta\bar{\varepsilon}_i) \] where \(\theta = 0\) implies pooled OLS and \(\theta = 1\) implies fixed effects. Doing some math shows that

\[ \theta = 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T \cdot \sigma_\alpha^2}} \]

Dynamic TS Structure in Panel Data Model

Detect Time Series Structure in Panel Data

We know that for some countries which are good at growth maintenance, gdpw may be a non-stationary time series; whereas for some countries experienced economic fluctuation, their gdpw may have mean-reverting or even worse behavior in the long run.

ggplot(pdata, aes(x = year, y = gdpw, group = country, color = region)) +
  geom_line(alpha = 0.5) +
  theme_bw()

ggplot(pdata, aes(x = year, y = ln_gdpw, group = country, color = region)) +
  geom_line(alpha = 0.5) +
  theme_bw()