In today’s lab, we will be working with the following dataset on political regimes, resources endowments and economic growth around the world.
Variable | Description1 |
---|---|
britcol |
Dummy variable coded 1 for every year in countries that were British colonies any time after 1918, 0 otherwise. |
cath |
Percentage of Catholics in the population. |
civilib |
Civil liberty. This variable can take values from 1 (least free) to 7 (most free). |
edt |
Cumulative years of education of the average member of the labor force. |
elf60 |
Index of ethnolinguistic fractionalization, 1960. |
gdpw |
Real GDP per worker, 1985 international prices. |
moslem |
Percentage of Moslems in the population. |
newc |
Dummy variable coded 1 for every year in countries that became independent after 1945, 0 otherwise. |
oil |
Dummy variable coded 1 if the average ratio of fuel exports to total exports in 1984-86 exceeded 50%, 0 otherwise. |
pollib |
Political liberty. This variable can take values from 1 (least free) to 7 (most free). |
reg |
Dummy variable coded 1 for dictatorships and 0 for democracies.
Transition years are coded as the regime that emerges in that year. For
instance, there was a transition from democracy to dictatorship in
Argentina in 1955. In that year, reg =1. |
stra |
The sum of past transitions to authoritarianism (as defined by
reg ) in a country. If a country experienced one or more
transitions to authoritarianism before 1950, stra was coded
1 in 1950. |
# Load Democracy data
# Variable names:
# COUNTRY YEAR BRITCOL CATH
# CIVLIB EDT ELF60 GDPW MOSLEM
# NEWC OIL POLLIB REG STRA
#
# data <- read.csv("data/democracy.csv",
# header = TRUE,
# na.strings = ".")
data <- read.csv("data/democracy.csv",
header = TRUE)
colnames(data)
## [1] "COUNTRY" "CTYNAME" "REGION" "YEAR" "BRITCOL" "CATH" "CIVLIB"
## [8] "EDT" "ELF60" "GDPW" "MOSLEM" "NEWC" "OIL" "POLLIB"
## [15] "REG" "STRA"
colnames(data) <- colnames(data) %>% tolower() # change variable names to lower case
colnames(data)
## [1] "country" "ctyname" "region" "year" "britcol" "cath" "civlib"
## [8] "edt" "elf60" "gdpw" "moslem" "newc" "oil" "pollib"
## [15] "reg" "stra"
# check out missing variables
library(questionr)
freq.na(data)
## missing %
## civlib 1727 42
## pollib 1727 42
## edt 1226 30
## elf60 542 13
## country 0 0
## ctyname 0 0
## region 0 0
## year 0 0
## britcol 0 0
## cath 0 0
## gdpw 0 0
## moslem 0 0
## newc 0 0
## oil 0 0
## reg 0 0
## stra 0 0
# Check sample's N and T
data$country %>% unique() %>% length()
## [1] 135
data$year %>% unique() %>% length()
## [1] 40
# so it is an N=135, T=40 panel data.
## some data wrangling
data <- data %>%
mutate(
ln_gdpw = log(gdpw),
country_f = as.factor(country),
year_f = as.factor(year)
)
# Remember to perform panel exploratory analysis before fitting models! Refer to lab 1.
The full flexibility model:
\[\Delta^{di}y_{it} = \alpha_i + X_{it}\beta_i + \sum^{P_i}_{p=1}y_{i,t-p}\phi_i + \sum^{Q_i}_{q=1}y_{i,t-q}\rho_i + \epsilon_{it}\]
where intercept \(\alpha_i\), slope \(\beta_i\), ARIMA order \(P_i, D_i, Q_i\) and its coefficients \(\phi_i\), \(\rho_i\) vary and allows panel heteroskedasticity \(\sigma^2_i\). We can put unit-specific trend \(t\theta_i\) and seasonality \(\kappa_{k,i}\) inside this model too.
In simple terms, “fixed effects” refer to unit-specific intercepts. In other words, each unit or case \(i\) has its own intercept \(\alpha_i\) instead of pooling all cases into the same intercept. In practice, these case-specific intercepts introduce adjustments or controls for the mean outcome at the group level. This is akin to removing the mean from your data at the group level and conducting a simple linear regression.
However, the most common rationale for employing fixed effects estimators is their ability to eliminate time-invariant group-level unobservables. This arises from controlling for each group’s mean level of the outcome variable. However, it is essential to recognize that an underlying assumption is that these time-invariant unobservables truly never change, not only in their substance but also in their functional interpretation.
For instance, is Trump’s MAGA conservatism perceived the same as George Bush’s conservatism? If the answer is strictly no, then conservatism in the last 15 years is actually time-variant, and no fixed effects will shield you from omitted variable bias or measurement error. The more aggregate is your unit of analysis (individuals < neighborhoods < cities < regions < countries) or the longer is your time window sample \(T\), the more implausible is the assumption of time-invariant unobservables.
Suppose there is unobserved group-level or individual-level heterogeneity \(u_i\) in the true data generating process (DGP) (such group-level or unit-level heterogeneity that does not change over individual or time periods):
\[ y_{it} = \alpha + \beta^\prime x_{it} + \color{red}{\delta^\prime u_{i}} + \varepsilon_{it} \]
Because it is unobserved, we cannot measure it. Now set \(\alpha_i = \alpha_i^* = \alpha + \delta^\prime u_i\), we can remove this unobservable group-level heterogeneity and rewrite the above equation as follows:
\[ y_{it} = \alpha_i + \beta^\prime x_{it} + \varepsilon_{it} \]
The FE model can control for omitted unobservable variables if we assume that these unobservables are constant over time.
However, FE models come at a large cost.
There are 3 ways to estimate this FE model: 1. Within Estimator 2. Least Square Dummy Variable (LSDV) Estimator 3. First Difference (FD) Estimator
pdata <- pdata.frame(data, index = c("country_f", "year_f"))
head(pdata) # it will create an id for each observation
## country ctyname region year britcol cath civlib edt elf60 gdpw moslem
## 1-1962 1 Algeria Africa 1962 0 0.5 NA 1.160 0.43 5012 99.1
## 1-1963 1 Algeria Africa 1963 0 0.5 NA 1.250 0.43 6083 99.1
## 1-1964 1 Algeria Africa 1964 0 0.5 NA 1.345 0.43 6502 99.1
## 1-1965 1 Algeria Africa 1965 0 0.5 NA 1.450 0.43 6620 99.1
## 1-1966 1 Algeria Africa 1966 0 0.5 NA 1.560 0.43 6612 99.1
## 1-1967 1 Algeria Africa 1967 0 0.5 NA 1.675 0.43 6982 99.1
## newc oil pollib reg stra ln_gdpw country_f year_f
## 1-1962 1 1 NA 0 0 8.519590 1 1962
## 1-1963 1 1 NA 0 0 8.713253 1 1963
## 1-1964 1 1 NA 0 0 8.779865 1 1964
## 1-1965 1 1 NA 0 0 8.797851 1 1965
## 1-1966 1 1 NA 0 0 8.796641 1 1966
## 1-1967 1 1 NA 0 0 8.851091 1 1967
pvar(pdata)
## no time variation: country ctyname region britcol cath elf60 moslem newc oil country_f
## no individual variation: year edt elf60 year_f
## all NA in time dimension for at least one individual: civlib edt elf60 pollib
## all NA in ind. dimension for at least one time period: civlib edt elf60 pollib
# or
pvar(data, index = c("country", "year"))
## no time variation: country ctyname region britcol cath elf60 moslem newc oil country_f
## no individual variation: year edt elf60 year_f
## all NA in time dimension for at least one individual: civlib edt elf60 pollib
## all NA in ind. dimension for at least one time period: civlib edt elf60 pollib
We take each individuals’ deviation from group means (average over \(t\) within each group/unit \(i\)). Because the observable unit-level heterogeneity is time-invariant, its value should not changed after averaging over time within each unit. Thus, we can avoid estimating the varying intercepts.
\[ \begin{aligned} \tilde{y}_{it} =& y_{it} - \bar{y}_{i} \\ =& \color{red}{(\alpha_i - \bar{\alpha}_i)} + \beta^\prime \underbrace{(x_{it} - \bar{x}_{i})}_{\tilde{x}_{it}} + \underbrace{(\varepsilon_{it} - \bar{\varepsilon}_i)}_{\tilde{\varepsilon}_{it}} \\ =& \beta^\prime \tilde{x}_{it} + \tilde{\varepsilon}_{it} \end{aligned} \]
Like OLS, if not GM assumption is violated, the within estimator \(\hat{\beta}_W\) is asymptotically normal and efficient:
\[ \widehat{\beta}_{W} \stackrel{\text { approx. }}{\sim} \mathcal{N}\left(\beta, \sigma^{2}\left(\tilde{\mathbf{X}}^{\top} \tilde{\mathbf{X}}\right)^{-1}\right) \]
However, the estimation of standard error is a little tricky. When we demean all the variables, we already used \(N\) degrees of freedom for \(N\) units.
\[ \widehat{\sigma}^{2}=\frac{1}{N T-N-K} \sum_{i=1}^{N} \sum_{t=1}^{T}{\widehat{\tilde{\varepsilon}}}_{i}^{2} \]
Hence, if we use a canned OLS function for \(\hat{\beta}_W\), the standard error is not correct, and we need to adjust it manually.
# two-way within estimator using `plm`
plm_within <- plm(
#1st, the formula. if we include `oil`, plm() will automatically drop it. Why?
ln_gdpw ~ reg + edt,
data = data,
#2nd, you Ns and Ts indices
index = c("country_f","year_f"),
# what model specification? See help documentation: ?plm
model = "within",
# what estimator?
effect = "individual"
)
summary(plm_within) # note that the standard error does not take serial correlation into account
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = ln_gdpw ~ reg + edt, data = data, effect = "individual",
## model = "within", index = c("country_f", "year_f"))
##
## Unbalanced Panel: n = 113, T = 5-28, N = 2900
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -0.832562 -0.095557 0.015056 0.111956 0.714304
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## reg -0.014014 0.015048 -0.9313 0.3518
## edt 0.167823 0.003712 45.2107 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 158.01
## Residual Sum of Squares: 91.097
## R-Squared: 0.42346
## Adj. R-Squared: 0.39986
## F-statistic: 1022.77 on 2 and 2785 DF, p-value: < 2.22e-16
## Adjusting uncertainty (standard errors)
# Arellano (1987), these are also commonly known as "cluster standard errors"
coeftest(plm_within, vcov. = vcovHC(plm_within, method = "arellano"))
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## reg -0.014014 0.033891 -0.4135 0.6793
## edt 0.167823 0.016088 10.4313 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Beck and Katz (1995) panel corrected standard errors
coeftest(plm_within, vcov. = vcovBK(plm_within))
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## reg -0.014014 0.038906 -0.3602 0.7187
## edt 0.167823 0.014056 11.9398 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Driscoll and Kraay panel corrected standard errors
coeftest(plm_within, vcov. = vcovSCC(plm_within))
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## reg -0.014014 0.026186 -0.5352 0.5926
## edt 0.167823 0.024322 6.9001 6.408e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# an alternative way is to use `fixest`; it is faster when we have a large number of FEs
fixest_model <- feols(
ln_gdpw ~ reg + edt | country + year,
data = data,
cluster = ~ country + year
)
summary(fixest_model)
## OLS estimation, Dep. Var.: ln_gdpw
## Observations: 2,900
## Fixed-effects: country: 113, year: 28
## Standard-errors: Clustered (country & year)
## Estimate Std. Error t value Pr(>|t|)
## reg 0.027458 0.028661 0.958037 0.34654
## edt 0.021767 0.020583 1.057552 0.29963
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 0.146007 Adj. R2: 0.979069
## Within R2: 0.006949
Although within estimator ignores varying intercepts, we can actually recover them by using:
\[ \hat{\alpha}_i = \hat{y}_{it} - \hat{\beta}^\prime x_{it} \]
Sometimes it is useful for us to look at the variation between groups.
plm::fixef(plm_within)
## 1 2 3 4 5 6 7 9 12 14 15
## 8.6922 7.1792 7.4219 7.6883 6.5901 6.5233 7.2324 6.9027 7.6231 7.8300 6.4095
## 16 17 18 19 21 22 23 24 25 26 27
## 8.4965 7.3238 7.2507 6.9872 7.9429 7.0042 6.4524 7.4450 7.1212 6.4010 7.1359
## 28 29 30 31 33 34 35 37 39 40 41
## 7.7960 7.7022 8.2290 7.0245 7.4692 6.6891 7.6043 7.6612 8.1048 7.6142 7.9845
## 42 43 44 45 46 47 48 50 52 53 54
## 6.5103 6.7621 8.4530 6.8853 6.5790 7.4898 7.3069 7.7311 8.3731 8.2562 7.9466
## 55 57 58 59 60 61 62 63 64 65 66
## 8.0132 8.4557 7.1631 7.8058 7.4805 8.8477 8.2020 7.9713 8.8951 8.4059 8.6039
## 67 68 69 70 71 72 73 74 76 77 78
## 7.9171 8.5114 8.2335 8.3645 8.0146 7.5932 7.5457 8.1486 8.1386 9.3620 7.6592
## 79 80 81 82 83 84 85 86 87 89 91
## 6.6462 7.1602 7.2470 9.2366 9.4700 8.2150 7.8672 8.5831 7.5581 7.6918 6.6110
## 92 93 94 95 96 97 98 99 101 102 103
## 7.1852 7.7833 7.1381 8.3698 6.9976 8.6090 7.8404 7.1839 8.7655 8.4318 7.6233
## 104 105 106 107 108 110 111 112 113 114 116
## 6.8035 8.5153 8.0884 8.8824 8.3497 8.3088 7.9080 8.4210 8.1295 8.8111 7.9799
## 117 118 119 120 122 123 124 125 126 128 129
## 8.6501 8.3967 7.4098 8.2413 8.7539 8.4339 8.8509 8.0024 8.2790 8.0166 8.3293
## 130 131 132
## 7.9694 8.1688 7.9530
Another option to directly estimate group-specific intercepts is to use the least squares dummy variables (LSDV) estimator, i.e., OLS with \(N\) group dummies excluding the global intercept.
The group dummies are often colloquially called “fixed effects.” For large \(T\), it should be very similar to FE estimator. But it is not a good idea for very small \(T\): estimates of \(\alpha_i\) will be poor due to small \(T\) within each group.
# LSDV estimator using `lm`
plm_lsdv <- plm(
ln_gdpw ~ 0 + reg + edt + country_f, # optional: subtract 1 or set 0 to drop global intercept
index = c("country_f", "year_f"),
effect = "individual",
model = "pooling",
data = data
)
summary(plm_lsdv)
## Pooling Model
##
## Call:
## plm(formula = ln_gdpw ~ 0 + reg + edt + country_f, data = data,
## effect = "individual", model = "pooling", index = c("country_f",
## "year_f"))
##
## Unbalanced Panel: n = 113, T = 5-28, N = 2900
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -0.832562 -0.095557 0.015056 0.111956 0.714304
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## reg -0.014014 0.015048 -0.9313 0.3518
## edt 0.167823 0.003712 45.2107 <2e-16 ***
## country_f1 8.692165 0.037078 234.4285 <2e-16 ***
## country_f2 7.179217 0.050311 142.6968 <2e-16 ***
## country_f3 7.421886 0.034538 214.8875 <2e-16 ***
## country_f4 7.688276 0.041769 184.0689 <2e-16 ***
## country_f5 6.590053 0.034200 192.6892 <2e-16 ***
## country_f6 6.523307 0.035696 182.7472 <2e-16 ***
## country_f7 7.232441 0.036072 200.5009 <2e-16 ***
## country_f9 6.902693 0.035198 196.1126 <2e-16 ***
## country_f12 7.623119 0.037553 202.9983 <2e-16 ***
## country_f14 7.830041 0.036720 213.2343 <2e-16 ***
## country_f15 6.409459 0.034835 183.9944 <2e-16 ***
## country_f16 8.496536 0.036453 233.0786 <2e-16 ***
## country_f17 7.323790 0.039538 185.2365 <2e-16 ***
## country_f18 7.250721 0.035904 201.9445 <2e-16 ***
## country_f19 6.987238 0.040469 172.6561 <2e-16 ***
## country_f21 7.942900 0.035070 226.4880 <2e-16 ***
## country_f22 7.004193 0.037813 185.2344 <2e-16 ***
## country_f23 6.452393 0.044311 145.6147 <2e-16 ***
## country_f24 7.445021 0.036058 206.4743 <2e-16 ***
## country_f25 7.121235 0.036557 194.7974 <2e-16 ***
## country_f26 6.401030 0.038870 164.6782 <2e-16 ***
## country_f27 7.135904 0.034896 204.4897 <2e-16 ***
## country_f28 7.795955 0.034840 223.7661 <2e-16 ***
## country_f29 7.702185 0.049563 155.4013 <2e-16 ***
## country_f30 8.229040 0.034960 235.3836 <2e-16 ***
## country_f31 7.024504 0.050963 137.8366 <2e-16 ***
## country_f33 7.469226 0.034981 213.5221 <2e-16 ***
## country_f34 6.689090 0.036526 183.1319 <2e-16 ***
## country_f35 7.604345 0.035057 216.9155 <2e-16 ***
## country_f37 7.661224 0.036000 212.8123 <2e-16 ***
## country_f39 8.104824 0.041534 195.1383 <2e-16 ***
## country_f40 7.614208 0.044193 172.2949 <2e-16 ***
## country_f41 7.984547 0.045110 177.0034 <2e-16 ***
## country_f42 6.510265 0.035187 185.0183 <2e-16 ***
## country_f43 6.762131 0.035903 188.3450 <2e-16 ***
## country_f44 8.453048 0.035486 238.2109 <2e-16 ***
## country_f45 6.885345 0.036245 189.9670 <2e-16 ***
## country_f46 6.579019 0.036460 180.4430 <2e-16 ***
## country_f47 7.489796 0.038509 194.4947 <2e-16 ***
## country_f48 7.306941 0.040315 181.2472 <2e-16 ***
## country_f50 7.731116 0.056431 137.0017 <2e-16 ***
## country_f52 8.373147 0.054087 154.8088 <2e-16 ***
## country_f53 8.256219 0.041741 197.7957 <2e-16 ***
## country_f54 7.946584 0.041003 193.8045 <2e-16 ***
## country_f55 8.013156 0.036821 217.6245 <2e-16 ***
## country_f57 8.455746 0.036999 228.5413 <2e-16 ***
## country_f58 7.163093 0.035763 200.2909 <2e-16 ***
## country_f59 7.805819 0.036596 213.2976 <2e-16 ***
## country_f60 7.480513 0.046400 161.2162 <2e-16 ***
## country_f61 8.847664 0.037756 234.3349 <2e-16 ***
## country_f62 8.202005 0.037018 221.5674 <2e-16 ***
## country_f63 7.971304 0.040845 195.1576 <2e-16 ***
## country_f64 8.895075 0.047209 188.4191 <2e-16 ***
## country_f65 8.405851 0.056110 149.8097 <2e-16 ***
## country_f66 8.603877 0.040941 210.1516 <2e-16 ***
## country_f67 7.917066 0.036889 214.6206 <2e-16 ***
## country_f68 8.511364 0.036764 231.5150 <2e-16 ***
## country_f69 8.233487 0.041333 199.2007 <2e-16 ***
## country_f70 8.364510 0.039341 212.6174 <2e-16 ***
## country_f71 8.014597 0.039677 201.9955 <2e-16 ***
## country_f72 7.593192 0.046809 162.2158 <2e-16 ***
## country_f73 7.545740 0.040401 186.7694 <2e-16 ***
## country_f74 8.148604 0.040606 200.6740 <2e-16 ***
## country_f76 8.138626 0.043015 189.2046 <2e-16 ***
## country_f77 9.361968 0.040101 233.4590 <2e-16 ***
## country_f78 7.659247 0.045408 168.6744 <2e-16 ***
## country_f79 6.646220 0.037217 178.5787 <2e-16 ***
## country_f80 7.160237 0.038725 184.8993 <2e-16 ***
## country_f81 7.246958 0.037038 195.6620 <2e-16 ***
## country_f82 9.236566 0.034915 264.5418 <2e-16 ***
## country_f83 9.470011 0.034855 271.6991 <2e-16 ***
## country_f84 8.214967 0.049984 164.3510 <2e-16 ***
## country_f85 7.867225 0.049690 158.3271 <2e-16 ***
## country_f86 8.583122 0.036291 236.5051 <2e-16 ***
## country_f87 7.558124 0.041461 182.2941 <2e-16 ***
## country_f89 7.691750 0.042721 180.0478 <2e-16 ***
## country_f91 6.611022 0.034733 190.3388 <2e-16 ***
## country_f92 7.185249 0.036577 196.4415 <2e-16 ***
## country_f93 7.783283 0.034854 223.3100 <2e-16 ***
## country_f94 7.138141 0.043016 165.9400 <2e-16 ***
## country_f95 8.369775 0.044039 190.0538 <2e-16 ***
## country_f96 6.997565 0.045182 154.8742 <2e-16 ***
## country_f97 8.609031 0.038564 223.2399 <2e-16 ***
## country_f98 7.840400 0.041052 190.9855 <2e-16 ***
## country_f99 7.183878 0.039623 181.3068 <2e-16 ***
## country_f101 8.765478 0.043786 200.1867 <2e-16 ***
## country_f102 8.431750 0.050976 165.4069 <2e-16 ***
## country_f103 7.623339 0.087233 87.3910 <2e-16 ***
## country_f104 6.803535 0.053491 127.1901 <2e-16 ***
## country_f105 8.515306 0.048098 177.0409 <2e-16 ***
## country_f106 8.088411 0.052565 153.8752 <2e-16 ***
## country_f107 8.882387 0.044449 199.8338 <2e-16 ***
## country_f108 8.349650 0.051757 161.3238 <2e-16 ***
## country_f110 8.308826 0.042299 196.4332 <2e-16 ***
## country_f111 7.908045 0.051903 152.3607 <2e-16 ***
## country_f112 8.420992 0.047897 175.8148 <2e-16 ***
## country_f113 8.129540 0.048559 167.4171 <2e-16 ***
## country_f114 8.811148 0.044203 199.3319 <2e-16 ***
## country_f116 7.979905 0.047038 169.6487 <2e-16 ***
## country_f117 8.650079 0.049233 175.6950 <2e-16 ***
## country_f118 8.396685 0.050589 165.9773 <2e-16 ***
## country_f119 7.409756 0.058229 127.2524 <2e-16 ***
## country_f120 8.241283 0.039397 209.1863 <2e-16 ***
## country_f122 8.753945 0.040785 214.6389 <2e-16 ***
## country_f123 8.433890 0.051287 164.4455 <2e-16 ***
## country_f124 8.850903 0.047364 186.8717 <2e-16 ***
## country_f125 8.002405 0.038647 207.0622 <2e-16 ***
## country_f126 8.279025 0.050960 162.4606 <2e-16 ***
## country_f128 8.016642 0.041468 193.3208 <2e-16 ***
## country_f129 8.329265 0.054070 154.0458 <2e-16 ***
## country_f130 7.969381 0.053386 149.2776 <2e-16 ***
## country_f131 8.168812 0.056465 144.6705 <2e-16 ***
## country_f132 7.952992 0.056730 140.1899 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 3104.6
## Residual Sum of Squares: 91.097
## R-Squared: 0.97066
## Adj. R-Squared: 0.96946
## F-statistic: 58478.9 on 115 and 2785 DF, p-value: < 2.22e-16
coeftest(plm_lsdv, vcov. = vcovHC(plm_lsdv, method = "arellano"))[1:2,]
## Estimate Std. Error t value Pr(>|t|)
## reg -0.01401365 0.03389072 -0.4134951 6.792757e-01
## edt 0.16782325 0.01608840 10.4313214 5.123485e-25
# you can also use lm() to implement LSDV estimator, but it is difficult to adjust for standard errors
# You can estimate the same model using lm, as follows:
summary(lm(ln_gdpw ~ country_f + year_f + reg + edt, data=data))
##
## Call:
## lm(formula = ln_gdpw ~ country_f + year_f + reg + edt, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.64057 -0.07997 0.00780 0.07941 0.59769
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.730613 0.036399 239.857 < 2e-16 ***
## country_f2 -1.917900 0.053018 -36.175 < 2e-16 ***
## country_f3 -1.473490 0.041683 -35.350 < 2e-16 ***
## country_f4 -1.056680 0.044576 -23.705 < 2e-16 ***
## country_f5 -2.453336 0.043279 -56.687 < 2e-16 ***
## country_f6 -2.435972 0.042857 -56.839 < 2e-16 ***
## country_f7 -1.499153 0.041183 -36.402 < 2e-16 ***
## country_f9 -1.995622 0.041999 -47.516 < 2e-16 ***
## country_f12 -0.929167 0.041505 -22.387 < 2e-16 ***
## country_f14 -0.732719 0.041046 -17.851 < 2e-16 ***
## country_f15 -2.619545 0.043429 -60.318 < 2e-16 ***
## country_f16 -0.181423 0.041144 -4.409 1.08e-05 ***
## country_f17 -1.726617 0.045976 -37.555 < 2e-16 ***
## country_f18 -1.417699 0.040828 -34.724 < 2e-16 ***
## country_f19 -2.109640 0.047064 -44.825 < 2e-16 ***
## country_f21 -0.992496 0.042319 -23.453 < 2e-16 ***
## country_f22 -1.691828 0.041942 -40.337 < 2e-16 ***
## country_f23 -1.990957 0.045851 -43.422 < 2e-16 ***
## country_f24 -1.398031 0.041994 -33.291 < 2e-16 ***
## country_f25 -1.543200 0.041152 -37.500 < 2e-16 ***
## country_f26 -2.262115 0.042419 -53.328 < 2e-16 ***
## country_f27 -1.869869 0.043059 -43.426 < 2e-16 ***
## country_f28 -1.248226 0.043533 -28.673 < 2e-16 ***
## country_f29 -0.539974 0.050557 -10.681 < 2e-16 ***
## country_f30 -0.572730 0.041078 -13.943 < 2e-16 ***
## country_f31 -1.871067 0.051378 -36.418 < 2e-16 ***
## country_f33 -1.423227 0.041805 -34.044 < 2e-16 ***
## country_f34 -2.084951 0.041651 -50.057 < 2e-16 ***
## country_f35 -1.335325 0.042359 -31.524 < 2e-16 ***
## country_f37 -1.244580 0.042403 -29.351 < 2e-16 ***
## country_f39 -0.122955 0.045250 -2.717 0.006625 **
## country_f40 -1.407871 0.048167 -29.229 < 2e-16 ***
## country_f41 -0.616258 0.046216 -13.334 < 2e-16 ***
## country_f42 -2.390835 0.042021 -56.896 < 2e-16 ***
## country_f43 -1.995645 0.041242 -48.389 < 2e-16 ***
## country_f44 -0.379319 0.041552 -9.129 < 2e-16 ***
## country_f45 -1.963885 0.042025 -46.732 < 2e-16 ***
## country_f46 -2.012405 0.040934 -49.162 < 2e-16 ***
## country_f47 -1.220878 0.042391 -28.800 < 2e-16 ***
## country_f48 -1.285795 0.043111 -29.825 < 2e-16 ***
## country_f50 -0.006703 0.061292 -0.109 0.912929
## country_f52 0.808902 0.062404 12.962 < 2e-16 ***
## country_f53 -0.113515 0.044777 -2.535 0.011296 *
## country_f54 -0.472017 0.044150 -10.691 < 2e-16 ***
## country_f55 -0.547391 0.041114 -13.314 < 2e-16 ***
## country_f57 -0.307334 0.041925 -7.331 3.00e-13 ***
## country_f58 -1.617781 0.041315 -39.157 < 2e-16 ***
## country_f59 -0.820488 0.041106 -19.960 < 2e-16 ***
## country_f60 -0.632048 0.049739 -12.707 < 2e-16 ***
## country_f61 0.387952 0.041704 9.303 < 2e-16 ***
## country_f62 -0.337781 0.041220 -8.195 3.80e-16 ***
## country_f63 -0.259760 0.044779 -5.801 7.35e-09 ***
## country_f64 0.771826 0.050320 15.338 < 2e-16 ***
## country_f65 0.949886 0.065606 14.479 < 2e-16 ***
## country_f66 0.350150 0.044747 7.825 7.17e-15 ***
## country_f67 -0.651497 0.041172 -15.824 < 2e-16 ***
## country_f68 -0.128995 0.041251 -3.127 0.001784 **
## country_f69 0.008814 0.045210 0.195 0.845440
## country_f70 -0.251850 0.042843 -5.878 4.64e-09 ***
## country_f71 -0.330624 0.043386 -7.621 3.45e-14 ***
## country_f72 -0.635309 0.048574 -13.079 < 2e-16 ***
## country_f73 -0.697559 0.044301 -15.746 < 2e-16 ***
## country_f74 -0.146948 0.044286 -3.318 0.000918 ***
## country_f76 0.010711 0.047175 0.227 0.820411
## country_f77 0.833581 0.043290 19.256 < 2e-16 ***
## country_f78 -1.105945 0.046788 -23.637 < 2e-16 ***
## country_f79 -1.939535 0.041334 -46.923 < 2e-16 ***
## country_f80 -1.539557 0.042681 -36.072 < 2e-16 ***
## country_f81 -1.359055 0.041265 -32.935 < 2e-16 ***
## country_f82 0.426294 0.041119 10.367 < 2e-16 ***
## country_f83 0.647830 0.041182 15.731 < 2e-16 ***
## country_f84 0.418555 0.056035 7.470 1.07e-13 ***
## country_f85 0.053313 0.055587 0.959 0.337601
## country_f86 -0.027722 0.040876 -0.678 0.497704
## country_f87 -0.610045 0.045569 -13.387 < 2e-16 ***
## country_f89 -0.390746 0.047160 -8.286 < 2e-16 ***
## country_f91 -2.241439 0.041358 -54.195 < 2e-16 ***
## country_f92 -1.693048 0.042632 -39.713 < 2e-16 ***
## country_f93 -1.062980 0.041347 -25.709 < 2e-16 ***
## country_f94 -0.939023 0.047529 -19.757 < 2e-16 ***
## country_f95 0.111158 0.046421 2.395 0.016707 *
## country_f96 -0.995379 0.050057 -19.885 < 2e-16 ***
## country_f97 0.158108 0.042187 3.748 0.000182 ***
## country_f98 -0.355784 0.045064 -7.895 4.15e-15 ***
## country_f99 -1.131414 0.043437 -26.047 < 2e-16 ***
## country_f101 0.561476 0.047187 11.899 < 2e-16 ***
## country_f102 0.693318 0.057553 12.047 < 2e-16 ***
## country_f103 -0.350231 0.078815 -4.444 9.19e-06 ***
## country_f104 -0.737881 0.061880 -11.924 < 2e-16 ***
## country_f105 0.604345 0.053201 11.360 < 2e-16 ***
## country_f106 0.440257 0.060017 7.336 2.89e-13 ***
## country_f107 0.727325 0.048048 15.138 < 2e-16 ***
## country_f108 0.655984 0.058760 11.164 < 2e-16 ***
## country_f110 0.082554 0.045937 1.797 0.072424 .
## country_f111 -0.220533 0.052461 -4.204 2.71e-05 ***
## country_f112 0.497444 0.052904 9.403 < 2e-16 ***
## country_f113 0.247127 0.053885 4.586 4.72e-06 ***
## country_f114 0.638162 0.047725 13.372 < 2e-16 ***
## country_f116 -0.192216 0.049565 -3.878 0.000108 ***
## country_f117 0.808797 0.054897 14.733 < 2e-16 ***
## country_f118 0.635817 0.056960 11.163 < 2e-16 ***
## country_f119 -0.436869 0.060082 -7.271 4.62e-13 ***
## country_f120 -0.127130 0.043102 -2.950 0.003209 **
## country_f122 0.502084 0.044633 11.249 < 2e-16 ***
## country_f123 0.713371 0.058033 12.293 < 2e-16 ***
## country_f124 0.893549 0.052122 17.143 < 2e-16 ***
## country_f125 -0.602866 0.042373 -14.228 < 2e-16 ***
## country_f126 0.539690 0.057529 9.381 < 2e-16 ***
## country_f128 -0.200672 0.045153 -4.444 9.17e-06 ***
## country_f129 0.764097 0.062377 12.250 < 2e-16 ***
## country_f130 -0.118661 0.054084 -2.194 0.028318 *
## country_f131 0.731500 0.066171 11.055 < 2e-16 ***
## country_f132 -1.137835 0.056866 -20.009 < 2e-16 ***
## year_f1961 0.035883 0.025082 1.431 0.152652
## year_f1962 0.079099 0.024685 3.204 0.001369 **
## year_f1963 0.105967 0.024648 4.299 1.77e-05 ***
## year_f1964 0.141043 0.024512 5.754 9.67e-09 ***
## year_f1965 0.164482 0.024398 6.742 1.90e-11 ***
## year_f1966 0.186794 0.024221 7.712 1.72e-14 ***
## year_f1967 0.207793 0.024290 8.555 < 2e-16 ***
## year_f1968 0.241314 0.024288 9.936 < 2e-16 ***
## year_f1969 0.282157 0.024376 11.575 < 2e-16 ***
## year_f1970 0.319016 0.024447 13.049 < 2e-16 ***
## year_f1971 0.355129 0.024406 14.551 < 2e-16 ***
## year_f1972 0.376303 0.024535 15.337 < 2e-16 ***
## year_f1973 0.404055 0.024680 16.371 < 2e-16 ***
## year_f1974 0.440244 0.024825 17.734 < 2e-16 ***
## year_f1975 0.452843 0.024904 18.183 < 2e-16 ***
## year_f1976 0.482772 0.025110 19.226 < 2e-16 ***
## year_f1977 0.504968 0.025336 19.930 < 2e-16 ***
## year_f1978 0.526284 0.025574 20.579 < 2e-16 ***
## year_f1979 0.540096 0.025798 20.936 < 2e-16 ***
## year_f1980 0.544420 0.026076 20.878 < 2e-16 ***
## year_f1981 0.547954 0.026301 20.834 < 2e-16 ***
## year_f1982 0.533165 0.026557 20.077 < 2e-16 ***
## year_f1983 0.520768 0.026820 19.417 < 2e-16 ***
## year_f1984 0.517421 0.027102 19.092 < 2e-16 ***
## year_f1985 0.523403 0.027429 19.082 < 2e-16 ***
## year_f1986 0.527585 0.027774 18.996 < 2e-16 ***
## year_f1987 0.527344 0.028255 18.664 < 2e-16 ***
## reg 0.027458 0.012644 2.172 0.029974 *
## edt 0.021767 0.005799 3.754 0.000178 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1497 on 2758 degrees of freedom
## (1226 observations deleted due to missingness)
## Multiple R-squared: 0.9801, Adjusted R-squared: 0.9791
## F-statistic: 962.7 on 141 and 2758 DF, p-value: < 2.2e-16
# You will need other packages to adjust the standard errors in lm model objects. Check out in the internet but coeftest surely have options!
Within and LSDV estimators are equivalent to each other. They give exactly the same estimates.
But we favor one over another based on the following reasons:
plm()
in R
and
areg
in Stata
will correct it).Since the group-level time-invariant unobservables are constant across time, first-differences are an alter- native to purge out it. The first difference model is the following: \[ \begin{aligned} \Delta y_{it} =& y_{it} - y_{i,t-1} \\ =& \color{red}{(\alpha_i - \alpha_i)} + \beta^\prime \underbrace{(x_{it} - x_{i,t-1})}_{\Delta x_{it}} + \underbrace{(\varepsilon_{it} - \varepsilon_{i,t-1})}_{\Delta \varepsilon_{it}} \\ =& \beta^\prime \Delta x_{it} + \Delta \varepsilon_{it} \end{aligned} \]
If \(\Delta \varepsilon_{it}\) are homoskedastic and without serial correlation, then usual OLS standard errors work just fine. Otherwise we need to use GLS to estimate uncertainties.
FD estimator in plm()
will automatically include an
intercept. The intercept in FD model is actually an coefficient for
linear time trend (in the second step of the above equation, it is \(\theta (t+1 - t)\)).
# FD estimator using `plm`
plm_fd <- plm(
ln_gdpw ~ reg + edt - 1,
index = c("country", "year"),
model = "fd", # FD estimator
data = data
)
summary(plm_fd)
## Oneway (individual) effect First-Difference Model
##
## Call:
## plm(formula = ln_gdpw ~ reg + edt - 1, data = data, model = "fd",
## index = c("country", "year"))
##
## Unbalanced Panel: n = 113, T = 5-28, N = 2900
## Observations used in estimation: 2787
##
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.6344 -0.0102 0.0190 0.0168 0.0469 0.5230
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## reg 0.0061632 0.0082067 0.7510 0.4527
## edt 0.0395749 0.0055039 7.1904 8.264e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 10.679
## Residual Sum of Squares: 11.628
## R-Squared: 5.8796e-05
## Adj. R-Squared: -0.00030025
## F-statistic: 26.1857 on 2 and 2785 DF, p-value: 5.4114e-12
A very common model specificaiton is the two-way fixed effects (TWFE). Note that the TWFE does not provide any causal identification unless pre-treatment parallel trends hold and we assume covariate imbalance between control and treatment group. Even in this case, recent econometric research has found many shortcoming of this model specification for the causal estimation of treatment effects.
\[ y_{it} = \alpha_i + \tau_t + \beta^\prime x_{it} + \varepsilon_{it}\] Where we incorporate a dummy variable for each period \(\tau_t\). However, some researchers argue that this model may be inappropriate to answer some research questions.
# two-way fixed effects
plm_twfe <- plm(ln_gdpw ~ reg + edt,
data = data,
index = c("country_f", "year_f"),
model = "within",
effect = "twoways")
summary(plm_twfe)
## Twoways effects Within Model
##
## Call:
## plm(formula = ln_gdpw ~ reg + edt, data = data, effect = "twoways",
## model = "within", index = c("country_f", "year_f"))
##
## Unbalanced Panel: n = 113, T = 5-28, N = 2900
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -0.6405721 -0.0799739 0.0078023 0.0794105 0.5976897
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## reg 0.0274582 0.0126445 2.1716 0.0299740 *
## edt 0.0217672 0.0057988 3.7537 0.0001778 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 62.255
## Residual Sum of Squares: 61.822
## R-Squared: 0.0069488
## Adj. R-Squared: -0.04382
## F-statistic: 9.64946 on 2 and 2758 DF, p-value: 6.6663e-05
# You can estimate the same model using lm, as follows:
summary(lm(ln_gdpw ~ country_f + year_f + reg + edt, data=data))
##
## Call:
## lm(formula = ln_gdpw ~ country_f + year_f + reg + edt, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.64057 -0.07997 0.00780 0.07941 0.59769
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.730613 0.036399 239.857 < 2e-16 ***
## country_f2 -1.917900 0.053018 -36.175 < 2e-16 ***
## country_f3 -1.473490 0.041683 -35.350 < 2e-16 ***
## country_f4 -1.056680 0.044576 -23.705 < 2e-16 ***
## country_f5 -2.453336 0.043279 -56.687 < 2e-16 ***
## country_f6 -2.435972 0.042857 -56.839 < 2e-16 ***
## country_f7 -1.499153 0.041183 -36.402 < 2e-16 ***
## country_f9 -1.995622 0.041999 -47.516 < 2e-16 ***
## country_f12 -0.929167 0.041505 -22.387 < 2e-16 ***
## country_f14 -0.732719 0.041046 -17.851 < 2e-16 ***
## country_f15 -2.619545 0.043429 -60.318 < 2e-16 ***
## country_f16 -0.181423 0.041144 -4.409 1.08e-05 ***
## country_f17 -1.726617 0.045976 -37.555 < 2e-16 ***
## country_f18 -1.417699 0.040828 -34.724 < 2e-16 ***
## country_f19 -2.109640 0.047064 -44.825 < 2e-16 ***
## country_f21 -0.992496 0.042319 -23.453 < 2e-16 ***
## country_f22 -1.691828 0.041942 -40.337 < 2e-16 ***
## country_f23 -1.990957 0.045851 -43.422 < 2e-16 ***
## country_f24 -1.398031 0.041994 -33.291 < 2e-16 ***
## country_f25 -1.543200 0.041152 -37.500 < 2e-16 ***
## country_f26 -2.262115 0.042419 -53.328 < 2e-16 ***
## country_f27 -1.869869 0.043059 -43.426 < 2e-16 ***
## country_f28 -1.248226 0.043533 -28.673 < 2e-16 ***
## country_f29 -0.539974 0.050557 -10.681 < 2e-16 ***
## country_f30 -0.572730 0.041078 -13.943 < 2e-16 ***
## country_f31 -1.871067 0.051378 -36.418 < 2e-16 ***
## country_f33 -1.423227 0.041805 -34.044 < 2e-16 ***
## country_f34 -2.084951 0.041651 -50.057 < 2e-16 ***
## country_f35 -1.335325 0.042359 -31.524 < 2e-16 ***
## country_f37 -1.244580 0.042403 -29.351 < 2e-16 ***
## country_f39 -0.122955 0.045250 -2.717 0.006625 **
## country_f40 -1.407871 0.048167 -29.229 < 2e-16 ***
## country_f41 -0.616258 0.046216 -13.334 < 2e-16 ***
## country_f42 -2.390835 0.042021 -56.896 < 2e-16 ***
## country_f43 -1.995645 0.041242 -48.389 < 2e-16 ***
## country_f44 -0.379319 0.041552 -9.129 < 2e-16 ***
## country_f45 -1.963885 0.042025 -46.732 < 2e-16 ***
## country_f46 -2.012405 0.040934 -49.162 < 2e-16 ***
## country_f47 -1.220878 0.042391 -28.800 < 2e-16 ***
## country_f48 -1.285795 0.043111 -29.825 < 2e-16 ***
## country_f50 -0.006703 0.061292 -0.109 0.912929
## country_f52 0.808902 0.062404 12.962 < 2e-16 ***
## country_f53 -0.113515 0.044777 -2.535 0.011296 *
## country_f54 -0.472017 0.044150 -10.691 < 2e-16 ***
## country_f55 -0.547391 0.041114 -13.314 < 2e-16 ***
## country_f57 -0.307334 0.041925 -7.331 3.00e-13 ***
## country_f58 -1.617781 0.041315 -39.157 < 2e-16 ***
## country_f59 -0.820488 0.041106 -19.960 < 2e-16 ***
## country_f60 -0.632048 0.049739 -12.707 < 2e-16 ***
## country_f61 0.387952 0.041704 9.303 < 2e-16 ***
## country_f62 -0.337781 0.041220 -8.195 3.80e-16 ***
## country_f63 -0.259760 0.044779 -5.801 7.35e-09 ***
## country_f64 0.771826 0.050320 15.338 < 2e-16 ***
## country_f65 0.949886 0.065606 14.479 < 2e-16 ***
## country_f66 0.350150 0.044747 7.825 7.17e-15 ***
## country_f67 -0.651497 0.041172 -15.824 < 2e-16 ***
## country_f68 -0.128995 0.041251 -3.127 0.001784 **
## country_f69 0.008814 0.045210 0.195 0.845440
## country_f70 -0.251850 0.042843 -5.878 4.64e-09 ***
## country_f71 -0.330624 0.043386 -7.621 3.45e-14 ***
## country_f72 -0.635309 0.048574 -13.079 < 2e-16 ***
## country_f73 -0.697559 0.044301 -15.746 < 2e-16 ***
## country_f74 -0.146948 0.044286 -3.318 0.000918 ***
## country_f76 0.010711 0.047175 0.227 0.820411
## country_f77 0.833581 0.043290 19.256 < 2e-16 ***
## country_f78 -1.105945 0.046788 -23.637 < 2e-16 ***
## country_f79 -1.939535 0.041334 -46.923 < 2e-16 ***
## country_f80 -1.539557 0.042681 -36.072 < 2e-16 ***
## country_f81 -1.359055 0.041265 -32.935 < 2e-16 ***
## country_f82 0.426294 0.041119 10.367 < 2e-16 ***
## country_f83 0.647830 0.041182 15.731 < 2e-16 ***
## country_f84 0.418555 0.056035 7.470 1.07e-13 ***
## country_f85 0.053313 0.055587 0.959 0.337601
## country_f86 -0.027722 0.040876 -0.678 0.497704
## country_f87 -0.610045 0.045569 -13.387 < 2e-16 ***
## country_f89 -0.390746 0.047160 -8.286 < 2e-16 ***
## country_f91 -2.241439 0.041358 -54.195 < 2e-16 ***
## country_f92 -1.693048 0.042632 -39.713 < 2e-16 ***
## country_f93 -1.062980 0.041347 -25.709 < 2e-16 ***
## country_f94 -0.939023 0.047529 -19.757 < 2e-16 ***
## country_f95 0.111158 0.046421 2.395 0.016707 *
## country_f96 -0.995379 0.050057 -19.885 < 2e-16 ***
## country_f97 0.158108 0.042187 3.748 0.000182 ***
## country_f98 -0.355784 0.045064 -7.895 4.15e-15 ***
## country_f99 -1.131414 0.043437 -26.047 < 2e-16 ***
## country_f101 0.561476 0.047187 11.899 < 2e-16 ***
## country_f102 0.693318 0.057553 12.047 < 2e-16 ***
## country_f103 -0.350231 0.078815 -4.444 9.19e-06 ***
## country_f104 -0.737881 0.061880 -11.924 < 2e-16 ***
## country_f105 0.604345 0.053201 11.360 < 2e-16 ***
## country_f106 0.440257 0.060017 7.336 2.89e-13 ***
## country_f107 0.727325 0.048048 15.138 < 2e-16 ***
## country_f108 0.655984 0.058760 11.164 < 2e-16 ***
## country_f110 0.082554 0.045937 1.797 0.072424 .
## country_f111 -0.220533 0.052461 -4.204 2.71e-05 ***
## country_f112 0.497444 0.052904 9.403 < 2e-16 ***
## country_f113 0.247127 0.053885 4.586 4.72e-06 ***
## country_f114 0.638162 0.047725 13.372 < 2e-16 ***
## country_f116 -0.192216 0.049565 -3.878 0.000108 ***
## country_f117 0.808797 0.054897 14.733 < 2e-16 ***
## country_f118 0.635817 0.056960 11.163 < 2e-16 ***
## country_f119 -0.436869 0.060082 -7.271 4.62e-13 ***
## country_f120 -0.127130 0.043102 -2.950 0.003209 **
## country_f122 0.502084 0.044633 11.249 < 2e-16 ***
## country_f123 0.713371 0.058033 12.293 < 2e-16 ***
## country_f124 0.893549 0.052122 17.143 < 2e-16 ***
## country_f125 -0.602866 0.042373 -14.228 < 2e-16 ***
## country_f126 0.539690 0.057529 9.381 < 2e-16 ***
## country_f128 -0.200672 0.045153 -4.444 9.17e-06 ***
## country_f129 0.764097 0.062377 12.250 < 2e-16 ***
## country_f130 -0.118661 0.054084 -2.194 0.028318 *
## country_f131 0.731500 0.066171 11.055 < 2e-16 ***
## country_f132 -1.137835 0.056866 -20.009 < 2e-16 ***
## year_f1961 0.035883 0.025082 1.431 0.152652
## year_f1962 0.079099 0.024685 3.204 0.001369 **
## year_f1963 0.105967 0.024648 4.299 1.77e-05 ***
## year_f1964 0.141043 0.024512 5.754 9.67e-09 ***
## year_f1965 0.164482 0.024398 6.742 1.90e-11 ***
## year_f1966 0.186794 0.024221 7.712 1.72e-14 ***
## year_f1967 0.207793 0.024290 8.555 < 2e-16 ***
## year_f1968 0.241314 0.024288 9.936 < 2e-16 ***
## year_f1969 0.282157 0.024376 11.575 < 2e-16 ***
## year_f1970 0.319016 0.024447 13.049 < 2e-16 ***
## year_f1971 0.355129 0.024406 14.551 < 2e-16 ***
## year_f1972 0.376303 0.024535 15.337 < 2e-16 ***
## year_f1973 0.404055 0.024680 16.371 < 2e-16 ***
## year_f1974 0.440244 0.024825 17.734 < 2e-16 ***
## year_f1975 0.452843 0.024904 18.183 < 2e-16 ***
## year_f1976 0.482772 0.025110 19.226 < 2e-16 ***
## year_f1977 0.504968 0.025336 19.930 < 2e-16 ***
## year_f1978 0.526284 0.025574 20.579 < 2e-16 ***
## year_f1979 0.540096 0.025798 20.936 < 2e-16 ***
## year_f1980 0.544420 0.026076 20.878 < 2e-16 ***
## year_f1981 0.547954 0.026301 20.834 < 2e-16 ***
## year_f1982 0.533165 0.026557 20.077 < 2e-16 ***
## year_f1983 0.520768 0.026820 19.417 < 2e-16 ***
## year_f1984 0.517421 0.027102 19.092 < 2e-16 ***
## year_f1985 0.523403 0.027429 19.082 < 2e-16 ***
## year_f1986 0.527585 0.027774 18.996 < 2e-16 ***
## year_f1987 0.527344 0.028255 18.664 < 2e-16 ***
## reg 0.027458 0.012644 2.172 0.029974 *
## edt 0.021767 0.005799 3.754 0.000178 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1497 on 2758 degrees of freedom
## (1226 observations deleted due to missingness)
## Multiple R-squared: 0.9801, Adjusted R-squared: 0.9791
## F-statistic: 962.7 on 141 and 2758 DF, p-value: < 2.2e-16
# Correcting for uncertainty with clustered standard errors:
coeftest(plm_twfe, vcov. = vcovHC(plm_twfe, method = "arellano"))
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## reg 0.027458 0.027678 0.9921 0.3213
## edt 0.021767 0.020064 1.0849 0.2781
Assumes the individuals come from a common population. Each individual’s unobservable heterogeneous effect on the outcome follows a normal distribution with an unknown variance \(\sigma_\alpha^2\). Then we may have the following model setup:
\[ \begin{aligned} y_{i t} &=\beta_{0}+\beta_{1} x_{i t}+\alpha_{i}+\varepsilon_{i t} \\ \alpha_{i} & \sim \mathcal{N}\left(0, \sigma_{\alpha}^{2}\right) \\ \varepsilon_{i t} & \sim \mathcal{N}\left(0, \sigma_\varepsilon^{2}\right) \end{aligned} \]
plm
(feasible generalized least
squares by defaulty), nlme
(maximum likelihood by default),
and lme4
(restricted maximum likelihood by default)brms
or even manually specify the model in
JAGS
and stan
.plm
cannot deal with mixed effects, i.e., the mean of
\(\alpha_i\) varies due to certain
covariates, i.e. \(\alpha_i \sim
\mathcal{N}(\gamma^\prime z_{it}, \sigma_\alpha^2)\), but
nlme
and lme4
can do so.nlme
can take ARMA structure into account, but
plm
and lme4
can only consider AR terms.nlme
is difficult to specify crossed
(non-nested) random effect structure (i.e. multiple separated random
effects), while lme4
is easy to do so.nlme
and
lme4
, you may be confused by their different use of “fixed
effects” and “random effects.”3
A typical mixed effect model is as follows:
\[ y_{it} = \underbrace{\beta^\prime x_{it}}_\text{"fixed effect"} + \underbrace{\gamma^\prime z_i}_\text{"random effects"} + \varepsilon_{it} \]
or can be specified as follows: \[ \begin{aligned} y_{i t} &=\beta^\prime x_{i t}+\alpha_{i}+\varepsilon_{i t} \\ \alpha_{i} & \sim \mathcal{N}\left(\gamma^\prime z_i, \sigma_{\alpha}^{2}\right) \\ \varepsilon_{i t} & \sim \mathcal{N}\left(0, \sigma_\varepsilon^{2}\right) \end{aligned} \]
# RE model using `plm`
plm_re <- plm(
ln_gdpw ~ reg + edt + oil,
data = data,
index = c("country", "year"),
model = "random",
effect = "individual"
)
summary(plm_re)
## Oneway (individual) effect Random Effect Model
## (Swamy-Arora's transformation)
##
## Call:
## plm(formula = ln_gdpw ~ reg + edt + oil, data = data, effect = "individual",
## model = "random", index = c("country", "year"))
##
## Unbalanced Panel: n = 113, T = 5-28, N = 2900
##
## Effects:
## var std.dev share
## idiosyncratic 0.03271 0.18086 0.091
## individual 0.32784 0.57258 0.909
## theta:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.8601 0.9382 0.9404 0.9381 0.9404 0.9404
##
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.82664 -0.10254 0.01505 0.00081 0.11712 0.75912
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## (Intercept) 7.7290183 0.0608325 127.0540 < 2.2e-16 ***
## reg -0.0030963 0.0150650 -0.2055 0.837159
## edt 0.1717169 0.0036682 46.8124 < 2.2e-16 ***
## oil 0.5458284 0.1705342 3.2007 0.001371 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 173.89
## Residual Sum of Squares: 96.136
## R-Squared: 0.44728
## Adj. R-Squared: 0.44671
## Chisq: 2200.57 on 3 DF, p-value: < 2.22e-16
# RE model using `nlme`
lme_re <- lme(
fixed = ln_gdpw ~ reg + edt + oil,
random = ~ 1 | country,
data = data,
na.action = na.omit # need to specify how we deal with NA values, otherwise throws an error
)
summary(lme_re)
## Linear mixed-effects model fit by REML
## Data: data
## AIC BIC logLik
## -996.8293 -961.0027 504.4146
##
## Random effects:
## Formula: ~1 | country
## (Intercept) Residual
## StdDev: 0.6733608 0.1808866
##
## Fixed effects: ln_gdpw ~ reg + edt + oil
## Value Std.Error DF t-value p-value
## (Intercept) 7.735312 0.07004634 2785 110.43135 0.0000
## reg -0.006041 0.01498196 2785 -0.40321 0.6868
## edt 0.170680 0.00366084 2785 46.62312 0.0000
## oil 0.543484 0.19892569 111 2.73210 0.0073
## Correlation:
## (Intr) reg edt
## reg -0.066
## edt -0.258 -0.058
## oil -0.335 0.010 0.032
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -4.59951054 -0.52967154 0.08334205 0.61991189 4.00624582
##
## Number of Observations: 2900
## Number of Groups: 113
# RE model using `lme4`
lmer_re <- lmer(
ln_gdpw ~ reg + edt + oil + (1 | country),
data = data
)
summary(lmer_re)
## Linear mixed model fit by REML ['lmerMod']
## Formula: ln_gdpw ~ reg + edt + oil + (1 | country)
## Data: data
##
## REML criterion at convergence: -1008.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.5995 -0.5297 0.0833 0.6199 4.0062
##
## Random effects:
## Groups Name Variance Std.Dev.
## country (Intercept) 0.45341 0.6734
## Residual 0.03272 0.1809
## Number of obs: 2900, groups: country, 113
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 7.735312 0.070046 110.431
## reg -0.006041 0.014982 -0.403
## edt 0.170680 0.003661 46.623
## oil 0.543484 0.198926 2.732
##
## Correlation of Fixed Effects:
## (Intr) reg edt
## reg -0.066
## edt -0.258 -0.058
## oil -0.335 0.010 0.032
In FE model, we assume conditional ignorability \(E[\varepsilon_{it}|x_{it}, u_i] = 0\). That is, we assume that conditional on unobservable confounder \(u_i\), \(\varepsilon_{it}\) and \(x_{it}\) is uncorrelated. \(x_{it}\) and \(u_i\) have some correlation.
In RE model, we assume there is no unmeasured confounder, so ignorability holds without conditioning on \(u_i\): \(E[\varepsilon_{it} | x_{it}] = E[u_i] = 0\). \(x_{it}\) and \(u_i\) have no correlation.
But an intuitive way to think about it is that FE model rule out between-group variation; but in RE model, we preserve the information of between-group variation by modeling \(\sigma_\alpha^2\).
It is difficult to show it mathematically, because it requires to go through the variance-covariance matrix of the random effects and requires some understanding what Gaussian processes are. If you want to learn how to model random effects, you should consider taking CS&SS 560 and CS&SS 592.
We can think of RE model doing quasi-demeaning or partial-pooling:
\[ y_{it} - \theta\bar{y}_i = \beta^\prime (x_{it} - \theta \bar{x}_i) + (\varepsilon_{it} - \theta\bar{\varepsilon}_i) \] where \(\theta = 0\) implies pooled OLS and \(\theta = 1\) implies fixed effects. Doing some math shows that
\[ \theta = 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T \cdot \sigma_\alpha^2}} \]
We know that for some countries which are good at growth maintenance,
gdpw
may be a non-stationary time series; whereas for some
countries experienced economic fluctuation, their gdpw
may
have mean-reverting or even worse behavior in the long run.
ggplot(pdata, aes(x = year, y = gdpw, group = country, color = region)) +
geom_line(alpha = 0.5) +
theme_bw()
ggplot(pdata, aes(x = year, y = ln_gdpw, group = country, color = region)) +
geom_line(alpha = 0.5) +
theme_bw()
We can use statistical tests to formally detect stationarity of time series in panel data.
purtest()
in plm
packagepunitroots
package: the author no longer maintain this
packageadf.test()
and pp.test()
to each
time series, and plot the distribution of p values.pseries <- split(pdata$gdpw, pdata$country_f)
# Illustration: pp test for a single panel/ts
pseries[1]
## $`1`
## 1-1962 1-1963 1-1964 1-1965 1-1966 1-1967 1-1968 1-1969 1-1970 1-1971 1-1972
## 5012 6083 6502 6620 6612 6982 7848 8378 8536 7816 9372
## 1-1973 1-1974 1-1975 1-1976 1-1977 1-1978 1-1979 1-1980 1-1981 1-1982 1-1983
## 9361 10480 10870 11381 11653 11971 12956 12710 12660 12876 13177
## 1-1984 1-1985 1-1986 1-1987 1-1988 1-1989 1-1990
## 13381 13434 13278 12962 12266 12530 12176
pp.test(pseries$`1`)
##
## Phillips-Perron Unit Root Test
##
## data: pseries$`1`
## Dickey-Fuller Z(alpha) = -0.14626, Truncation lag parameter = 2,
## p-value = 0.99
## alternative hypothesis: stationary
pp.test(pseries$`1`)[["p.value"]]
## [1] 0.99
# Now we use sapply() function to loop this analysis in all the panels
pp_pval <- sapply(pseries, function(x){pp.test(x)[["p.value"]]})
adf_pval <- sapply(pseries, function(x){adf.test(x)[["p.value"]]})
# Look at the distribution of the statistical tests to assess stationary
hist(pp_pval, breaks = 100)
hist(adf_pval, breaks = 100)
We detect many countries have non-stationary time series for the level dependent variable, but what about a differences dependent variable?
pdata$gdpw_diff <- diff(pdata$gdpw) # because we already have a pdata.frame, we can safely use plm::diff() to transform it
ggplot(pdata, aes(x = year, y = gdpw_diff, group = country, color = region)) +
geom_line(alpha = 0.5) +
theme_bw() # much more stationary!
pseries <- split(pdata$gdpw_diff, pdata$country)
# have NA values in the first observation, need to remove them in statistical tests
# sapply will not include NAs
pp_pval <- sapply(pseries, function(x){pp.test(na.omit(x))[["p.value"]]})
adf_pval <- sapply(pseries, function(x){adf.test(na.omit(x))[["p.value"]]})
hist(pp_pval, breaks = 100)
hist(adf_pval, breaks = 100)
We will create lags using the plm::lag
function into
pdata
class object. However, notice that you can perform
the same operation using dplyr lag
in a tidy class object
with group_by()
specified.
# create lags using pdata class
pdata$gdpw_diff_lag1 <- lag(pdata$gdpw_diff)
pdata$gdpw_diff_lag2 <- lag(pdata$gdpw_diff, 2)
# fixed effect ARIMA(1, 1, 0); cannot incorporate MA structure
fe_ar1 <- plm(
gdpw_diff ~ gdpw_diff_lag1 + reg + edt,
data = pdata,
model = "within",
effect = "individual"
)
# fixed effect ARIMA(2, 1, 0)
fe_ar2 <- plm(
gdpw_diff ~ gdpw_diff_lag1 + gdpw_diff_lag2 + reg + edt,
data = pdata,
model = "within",
effect = "individual"
)
# Breusch–Godfrey Test for Panel Models
# null means no serial correlation
pbgtest(fe_ar1) # there is remaining serial correlation
##
## Breusch-Godfrey/Wooldridge test for serial correlation in panel models
##
## data: gdpw_diff ~ gdpw_diff_lag1 + reg + edt
## chisq = 37.209, df = 3, p-value = 4.155e-08
## alternative hypothesis: serial correlation in idiosyncratic errors
pbgtest(fe_ar2) # there is no serial correlation for AR(2) model
##
## Breusch-Godfrey/Wooldridge test for serial correlation in panel models
##
## data: gdpw_diff ~ gdpw_diff_lag1 + gdpw_diff_lag2 + reg + edt
## chisq = 4.0656, df = 2, p-value = 0.131
## alternative hypothesis: serial correlation in idiosyncratic errors
In contrast to plm
, lme
estimates the panel
using ML
or REML
, and it allows the
specificaiton of moving average processes.
# random effects ARIMA(1, 1, 0)
# note: the time varibale must be an integer, not a character or factor.
re_arima <- lme(
fixed = gdpw_diff ~ reg + edt + oil,
random = ~ 1 | country,
correlation = corARMA(
# year nested in country
form = ~ year | country,
p = 1, # AR(p)
q = 0 # MA(q)
),
data = pdata,
na.action = na.omit
)
summary(re_arima)
## Linear mixed-effects model fit by REML
## Data: pdata
## AIC BIC logLik
## 44165.63 44207.3 -22075.81
##
## Random effects:
## Formula: ~1 | country
## (Intercept) Residual
## StdDev: 0.2261146 581.998
##
## Correlation Structure: AR(1)
## Formula: ~year | country
## Parameter estimate(s):
## Phi
## 0.2268233
## Fixed effects: gdpw_diff ~ reg + edt + oil
## Value Std.Error DF t-value p-value
## (Intercept) 27.43342 26.89294 2732 1.020097 0.3078
## reg 107.53138 33.46420 2732 3.213326 0.0013
## edt 24.95241 5.35870 2732 4.656433 0.0000
## oil -14.76787 43.07012 111 -0.342880 0.7323
## Correlation:
## (Intr) reg edt
## reg 0.071
## edt -0.735 -0.561
## oil -0.328 0.018 0.137
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -16.05507865 -0.28772677 -0.03423844 0.33234239 13.38099374
##
## Number of Observations: 2847
## Number of Groups: 113
There is a tradeoff when we both incorporate FE and lagged dependent variable in one model.
The estimate of \(\phi\) will always be underestimated in dynamic FE model, but the bias vanishes as \(T\) increases. This is called “Nickell bias.”
To obtain an unbiased estimate with small \(T\), we may choose dynamic panel data model, or prefer either FE or LDV model. If we care about causal inference and want to use a different-in-difference design (suppose it is a simple \(2\times 2\) DID) to estimate treatment effect, then we may need to think about the underlying causal identification assumption:
FE and LDV have a very useful bracketing property (Angrist and Pischke 2009; Ding and Li 2019):
Recent literature attempts to go beyond the additive assumption of unobservable confounders. We can allow unit-level effect and time effect interact with each other so that we can account for time-invariant and time-varying unit-level confounders at the same time. Recent literature consider decomposing the matrix of error terms to several latent factors (e.g. use PCA) to account for this interactive structure (Bai 2009; Xu 2017).
nlme
package, we use fixed.effects()
or
fixef()
to extract coefficients that are fixed across all
groups. (note that here we have different meaning of “fixed effect” and
“random effect.”)random.effects()
or ranef()
.coefficients()
or coef()
will
aggregate “fixed effect” and “random effect” for each group.sims <- 1000
simbetas <- mvrnorm(sims,
fixef(re_arima),
vcov(re_arima))
# Make matrix of hypothetical x's
formula <- gdpw_diff ~ reg + edt + oil
periods.out <- 50
xhyp <- cfMake(formula, pdata, periods.out)
for (i in 1:periods.out)
xhyp <- cfChange(xhyp, "edt", x = mean(pdata$edt, na.rm=TRUE) + sd(pdata$edt, na.rm=TRUE), scen=i) # we assume a permanent increase in education (+ 1 SD) over the next 50 years
phi <- 0.25 # from model summary()
lagY <- mean(pdata$gdpw_diff, na.rm = TRUE) # Hypothetical previous change in Y for simulation
initialY <- mean(pdata$gdpw, na.rm = TRUE) # Hypothetical initial level of Y for simulation
# Simulate expected values of Y (on original level scale)
# out to periods.out given hypothetical future values of X,
# initial lags of the change in Y, and an initial level of Y
sim_evre <- ldvsimev(xhyp, # The matrix of hypothetical x's
simbetas, # The matrix of simulated betas
ci=0.95, # Desired confidence interval
constant=1, # Column containing the constant
# set to NA for no constant
phi=phi, # estimated AR parameters; length must match lagY
lagY=lagY, # lags of y, most recent last
transform="diff", # "log" to undo log transformation,
# "diff" to under first differencing
# "difflog" to do both
initialY=initialY # for differenced models, the lag of the level of y
)
# Simulate first differences in Y (on original level scale)
# out to periods.out given hypothetical future values of X, X0,
# and initial lags of the change in Y
sim_fdre <- ldvsimfd(xhyp, # The matrix of hypothetical x's
simbetas, # The matrix of simulated betas
ci=0.95, # Desired confidence interval
constant=1, # Column containing the constant
# set to NA for no constant
phi=phi, # estimated AR parameters; length must match lagY
lagY=lagY, # lags of y, most recent last
transform="diff" # "log" to undo log transformation,
# "diff" to under first differencing
# "difflog" to do both
)
Now plot with base R.
plot.new()
par(usr=c(1,50,-15000,15000))
axis(1,at=seq(1,50,10))
axis(2,at=seq(-15000,15000,5000))
title(xlab="Time",ylab="Expected change in constant GDP ($ pc)",main="Random effects ARIMA(1,1,0)")
# Make the x-coord of a confidence envelope polygon
xpoly <- c(1:periods.out,
rev(1:periods.out),
1)
# Make the y-coord of a confidence envelope polygon
ypoly <- c(sim_fdre$lower,
rev(sim_fdre$upper),
sim_fdre$lower[1])
# Choose the color of the polygon
col <- "lightblue"
# Plot the polygon first, before the points & lines
polygon(x=xpoly,
y=ypoly,
col=col,
border=FALSE
)
# Plot the fitted line
lines(x=1:periods.out,y=sim_fdre$pe,col="darkblue")
lines(x=c(0,50),y=c(0,0),lty="solid")
fixef()
to extract group-specific
intercepts.sims <- 1000
simparam <- mvrnorm(sims, coef(fe_ar1), vcovBK(fe_ar1))
# Pull off the simulated lag coefficient
simphi <- simparam[,1]
# Put together the "constant" term (avg of the FEs, or a specific FE if you like)
# with the rest of the regressors
simbetas <- cbind(rep(mean(fixef(fe_ar1)), sims), simparam[,2:ncol(simparam)])
# Make matrix of hypothetical x's: drop OIL!
# and now we need hypothetical countries!
# and no intercept!
# Make matrix of hypothetical x's
# Note the formula below is a little different; we've removed the lag
# and will let the simulator handle lag structure
formula <- gdpw_diff ~ reg + edt
periods.out <- 50
xhyp <- cfMake(formula, pdata, periods.out)
for (i in 1:periods.out)
xhyp <- cfChange(xhyp, "edt", x=mean(pdata$edt,na.rm=TRUE)+sd(pdata$edt,na.rm=TRUE), scen=i)
phi <- mean(simphi)
lagY <- mean(pdata$gdpw_diff, na.rm = TRUE) # Hypothetical previous change in Y for simulation
initialY <- mean(pdata$gdpw, na.rm = TRUE) # Hypothetical initial level of Y for simulation
# Simulate expected values of Y (on original level scale)
# out to periods.out given hypothetical future values of X,
# initial lags of the change in Y, and an initial level of Y
sim_evfe <- ldvsimev(xhyp, # The matrix of hypothetical x's
simbetas, # The matrix of simulated betas
ci=0.95, # Desired confidence interval
constant=NA, # NA indicates no constant!
phi=phi, # estimated AR parameters; length must match lagY
lagY=lagY, # lags of y, most recent last
transform="diff", # "log" to undo log transformation,
# "diff" to under first differencing
# "difflog" to do both
initialY=initialY # for differenced models, the lag of the level of y
)
# Simulate first differences in Y (on original level scale)
# out to periods.out given hypothetical future values of X, X0,
# and initial lags of the change in Y
sim_fdfe <- ldvsimfd(xhyp, # The matrix of hypothetical x's
simbetas, # The matrix of simulated betas
ci=0.95, # Desired confidence interval
constant=NA, # Column containing the constant
# set to NA for no constant
phi=phi, # estimated AR parameters; length must match lagY
lagY=lagY, # lags of y, most recent last
transform="diff" # "log" to undo log transformation,
# "diff" to under first differencing
# "difflog" to do both
)
Now plot with base R.
plot.new()
par(usr=c(1,50,-15000,15000))
axis(1,at=seq(1,50,10))
axis(2,at=seq(-15000,15000,5000))
title(xlab="Time",ylab="Expected change in constant GDP ($ pc)",main="Fixed effects ARIMA(1,1,0), pcse")
# Make the x-coord of a confidence envelope polygon
xpoly <- c(1:periods.out,
rev(1:periods.out),
1)
# Make the y-coord of a confidence envelope polygon
ypoly <- c(sim_fdfe$lower,
rev(sim_fdfe$upper),
sim_fdfe$lower[1])
# Choose the color of the polygon
col <- "lightgreen"
# Plot the polygon first, before the points & lines
polygon(x=xpoly,
y=ypoly,
col=col,
border=FALSE
)
# Plot the fitted line
lines(x=1:periods.out,y=sim_fdfe$pe,col="darkgreen")
lines(x=c(0,50),y=c(0,0),lty="solid")