Recall that we can remove fixed effects by differencing:
\[ \begin{aligned} y_{it} & = \phi y_{it-1} + \alpha_i + \epsilon_{it} \\ y_{it} - \bar{y_i} & = \phi(y_{it-1} - \bar{y_i})+(x_{it}-\bar{x}_{it})\beta+(\epsilon_{it}-\bar{\epsilon}_{it}) \end{aligned} \]
Alternatively, we can include dummy variables for each group. This is the “within” estimator or fixed effects model. However, we introduce bias when we difference the model in this way.
This is because \(\bar{y}_i\) is computed using the past \(y\)’s. This is correlated with \(\bar{\epsilon}_{it}\), which is computed using the past \(\epsilon\)’s. Or:
\[ \bar{y_i} = \frac{\bar{X_i}\beta + \bar{\epsilon_i}}{1-\phi} \]
Specifically, this creates bias in the lagged dependent variable (LDV). If the other regressors are correlated with the LDV, then their coefficients may also be seriously biased.
The degree of bias is order \(1/T\), so it is big for small \(T\).
Furthermore,
The bias increases as \(\beta\) decreases
The bias increases as \(\phi\) increases
Small \(N\) is not the problem. Small \(T\) is the problem
Increasing \(N\) does not mitigate the problem. Purging serial correlation in the errors or getting the specification right (including other regressors) also doesn’t solve the problem.
We therefore turn to instrumental variables.
Recall that an instrumental variable must fulfill two conditions:
Relevance: \(z\) is correlated with \(x\);
Exogeneity: \(z\) is uncorrelated with \(\epsilon\). It must influence \(y\) only through \(x\).
\[ \begin{aligned} y_{it} - \bar{y_i} & = \phi(y_{it-1} - \bar{y_i})+(x_{it}-\bar{x}_{it})\beta+(\epsilon_{it}-\bar{\epsilon}_{it}) \\ \Delta y_{it} & = \phi \Delta y_{it-1} + \Delta x_{it}\beta + \Delta \epsilon_{it} \end{aligned} \]
We use lagged levels and lagged differences of \(\Delta y_{it}\) as instruments for the LDV. These help to predict \(\Delta y_{it}\) but not \(\Delta \epsilon_{it}\) if the errors are iid (see lecture slides for why).
Note: This is different than instrumenting \(x_{it}\). We are not addressing the endogeneity that may exist there. One should not be conflated with the other.
Recall what is a moment in statistics. Moments are used to describe features of a distribution or a dataset, and provide information about the central tendency, the shape, and many other distributional properties. For example, the mean and the variance are the first and second central moments of a distribution. The motivation behind GMM estimation is that the usual least squares IV approach cannot handle multiple equations.
To give a brief overview of the intuition behind GMM, consider a linear model:
\[y_t = \boldsymbol{x}_t \boldsymbol{\beta} + e_t\]
Recall that the exogeneity assumption (aka no misspsecification) holds under Gauss-Markov.
\[E[\boldsymbol{x}_t\epsilon_t]=0\]
Assume there exists some combination instrumental variables \(\boldsymbol{z}_t\) that implies the following:
\[ \begin{aligned} E[\boldsymbol{z}_t\epsilon_t] & =0 \\ E[\boldsymbol{z}_t\epsilon_t] & = E[\boldsymbol{z}_t(y_t-\boldsymbol{x}_t\boldsymbol{\beta})]=0 \end{aligned} \]
Intuition: GMM estimates a General Weighting Matrix \(W\) that returns uncorrelated covariances between the set of instruments \(Z_t\) and the predicted errors \(\hat{e_t}\) from the sample.
Recall the distinction between DGP/population and sample in statistical inference:
\[E[\boldsymbol{z}_t(y_t-\boldsymbol{x}_t\boldsymbol{\beta})]=0\]
\[\frac{1}{n}\sum^n_{t=1}\boldsymbol{z}_t(y-\boldsymbol{x_t}\boldsymbol{\beta})\]
\[\substack{\frac{1}{n}\sum^n_{t=1}z_{1t}(y-\boldsymbol{x}_t \boldsymbol{\beta})\\ \vdots\\ \frac{1}{n}\sum^n_{t=1}z_{Kt}(y-\boldsymbol{x}_t \boldsymbol{\beta}) }\]
Therefore, a generalized method of moments (GMM) esitmator of \(\beta\) is a vector of \(\hat{\beta}\) that solves the problem:
\[ \hat{\beta}_{\text{gmm}} = \underset{b}{\text{arg min}} \left[ \sum Z_i'(y_i - X_ib) \right]' \hat{W} \left[ \sum Z_i'(y_i - X_ib) \right] \]
We can compute the above analytically or numerically, where weighting matrix \(\hat{W}\) provides a solution that minimizes the weighted sum fo squares of variances between instruments and residuals. Thus,
\[\boldsymbol{S}_{zy}-\boldsymbol{S}_{zx}\boldsymbol{\beta}=0 \;\;\; \text{where}\]
\[\boldsymbol{S}_{zy}=n^{-1}\sum^n_{t=1}\boldsymbol{z}_t y_{t} \;\;\; \text{and} \;\;\; \boldsymbol{S}_{zx}=n^{-1}\sum^n_{t=1}\boldsymbol{z}_t x_{t}\]
We solve for
\[\hat{\boldsymbol{\beta}}=\boldsymbol{S}^{-1}_{zx}\boldsymbol{S}_{xy}\]
\[\hat{\boldsymbol{\beta}}=\boldsymbol{S}^{-1}_{zx}\boldsymbol{S}_{xy}\]
Important: GMM is not magical solution to endogeneity or misspecification. GMM is only and inference technique, such as least squares or maximum likelihood. Rather, is the estimator specification of some authors that tries to address Nickell bias from the lagged panel models.
Anderson-Hsiao estimator: uses the second and third lagged levels as instruments.
Arellano-Bond Difference GMM: uses \(\Delta y\) as the outcome and all available lagged levels as instruments in each period.
Arellano-Bover/Blundell-Bond System GMM: adds the available lagged differences as instruments.
When using panel GMM, we need to verify some modeling assumptions:
Variable | Description1 |
---|---|
cpi |
consumer price index |
pop |
state population |
packpc |
number of packs per capita |
income95 |
state personal income (total) |
income95pc |
state personal income (per capita) |
tax95 |
average state, federal, and average local excise taxes for fiscal year |
taxs95 |
average excise taxes for fiscal year, including sales taxes |
avgprs95 |
average price during fiscal year, including sales taxes (cents) |
pretax95 |
pretax95 = avgprs95 -
taxs95 |
# Load cigarette consumption data (Jonathan Gruber, MIT)
# Variables (see codebook):
# state year cpi pop packpc income tax avgprs taxs
data <- read.csv("data/cigarette.csv")
colnames(data)
## [1] "state" "year" "cpi" "pop" "packpc" "income" "tax" "avgprs"
## [9] "taxs"
data$state %>% unique() %>% length() # N = 48
## [1] 48
data$year %>% unique() %>% length() # T = 11
## [1] 11
# Quick inflation adjustment to 1995 dollars
inflAdjust <- function(x,cpi,year,target) {
unique(cpi[year==target])*x/cpi
#Multiply x with cpi in target year then divide by cpi in observed year
}
data <-
data %>%
mutate(
# Make adjustments to state personal income
income95 = inflAdjust(income, cpi, year, 1995),
# Average state, federal, and average local excise taxes
tax95 = inflAdjust(tax, cpi, year, 1995),
# Average price, including sales taxes
avgprs95 = inflAdjust(avgprs, cpi, year, 1995),
# Average excise taxes, including sales taxes
taxs95 = inflAdjust(taxs, cpi, year, 1995),
# Create per capita income (in k)
income95pc = income95/pop,
# Create pretax price, 1995 dollars
pretax95 = avgprs95 - taxs95
)
attach(data)
# Look at the time series
data %>%
ggplot(aes(x = year,
y = packpc,
group = state,
color = state)) +
geom_line() +
scale_x_continuous(breaks = seq(1985, 1995))
# Look at the ACF and PACF
acf_data <-
split(data$packpc, data$state) %>%
map(function(x) acf(x, plot = FALSE)$acf) %>%
map(as.vector) %>%
bind_cols() %>%
mutate(lag = 0:10) %>%
pivot_longer(!lag, names_to = "state", values_to = "acf") %>%
arrange(state, lag)
acf_data %>%
ggplot(aes(x = lag, y = acf)) +
geom_bar(stat = "identity", width = 0.1) +
geom_hline(yintercept = 0.2, color="red",
linetype="dashed", linewidth=0.2) +
geom_hline(yintercept = -0.2, color="red",
linetype="dashed", linewidth=0.2) +
facet_wrap(state~.) +
labs(title = "ACF of US states")
pacf_data <-
split(data$packpc, data$state) %>%
map(function(x) pacf(x, plot = FALSE)$acf) %>%
map(as.vector) %>%
bind_cols() %>%
mutate(lag = 1:10) %>%
pivot_longer(!lag, names_to = "state", values_to = "pacf") %>%
arrange(state, lag)
pacf_data %>%
ggplot(aes(x = lag, y = pacf)) +
geom_bar(stat = "identity", width = 0.1) +
geom_hline(yintercept = 0.2, color="red",
linetype="dashed", linewidth=0.2) +
geom_hline(yintercept = -0.2, color="red",
linetype="dashed", linewidth=0.2) +
facet_wrap(state~.) +
labs(title = "PACF of US states")
# Are the series stationary?
pseries <- split(data$packpc, data$state)
pp_pval <- sapply(pseries, function(x){pp.test(x)[["p.value"]]})
adf_pval <- sapply(pseries, function(x){adf.test(x)[["p.value"]]})
hist(pp_pval, breaks = 10)
hist(adf_pval, breaks = 10)
# Check for time invariant variables:
pvar(data, index = c("state", "year"))
## no time variation: state
## no individual variation: year cpi
# "within" option tells plm to do fixed effects
# Note that if you want to add year fixed effects then set effect="time" and for both state
# and year fixed effects set effect effect="twoway"
fe_res <- plm(packpc ~ income95pc + pretax95 + taxs95, data = data, model="within", effect="twoway")
coeftest(fe_res, vcov. = vcovHC, method = "arellano")
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## income95pc 0.969966 0.643947 1.5063 0.13267
## pretax95 -0.188551 0.077797 -2.4236 0.01575 *
## taxs95 -0.481852 0.044020 -10.9462 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Some tests for serial correlation of errors (needed because we have a linear regression
# with lags of the dependent variable on the RHS
# the standard LM test (note we could specify order)
pbgtest(fe_res)
##
## Breusch-Godfrey/Wooldridge test for serial correlation in panel models
##
## data: packpc ~ income95pc + pretax95 + taxs95
## chisq = 129.6, df = 11, p-value < 2.2e-16
## alternative hypothesis: serial correlation in idiosyncratic errors
# Estimate a random effects AR(I)MA(p,q) model using lme (Restricted ML)
re_res <- lme(# A formula object including the response,
# the fixed covariates, and any grouping variables
fixed = packpc ~ income95pc + pretax95 + taxs95,
# i.e. response variable and explanatory variables
# The random effects component
random = ~ 1 | state,
# 1 indicates the intercept and state indicates the grouping
# The TS dynamics: specify the time & group variables,
# and the order of the ARMA(p,q) process
correlation = corARMA(form = ~ year | state,
p = 1, # AR(p) order
q = 0 # MA(q) order
)
)
summary(re_res)
## Linear mixed-effects model fit by REML
## Data: NULL
## AIC BIC logLik
## 3253.21 3283.04 -1619.605
##
## Random effects:
## Formula: ~1 | state
## (Intercept) Residual
## StdDev: 0.01123977 20.92621
##
## Correlation Structure: AR(1)
## Formula: ~year | state
## Parameter estimate(s):
## Phi
## 0.9764735
## Fixed effects: packpc ~ income95pc + pretax95 + taxs95
## Value Std.Error DF t-value p-value
## (Intercept) 173.08136 8.574965 477 20.184498 0.0000
## income95pc -1.05746 0.387602 477 -2.728198 0.0066
## pretax95 -0.14537 0.024800 477 -5.861684 0.0000
## taxs95 -0.46630 0.040769 477 -11.437827 0.0000
## Correlation:
## (Intr) incm95 prtx95
## income95pc -0.856
## pretax95 -0.099 -0.223
## taxs95 -0.160 -0.097 -0.035
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.79963471 -0.57137010 -0.08122771 0.45749547 3.97887773
##
## Number of Observations: 528
## Number of Groups: 48
First, you will need to create an object data with class
pdata.frame
.
# Panel based diagnostics available in the plm library
# (This package recently expanded to contain many many panel data tests
# for serial correlation, fixed effects, and unit roots)
# First, create a plm data frame (special data frame that "knows" the
# unit variable and time variable
pdata <- pdata.frame(data, index=c("state", "year"))
head(pdata)
## state year cpi pop packpc income tax avgprs taxs
## AL-1985 AL 1985 1.076 3973000 116.4863 46014968 32.5 102.1817 33.34834
## AL-1986 AL 1986 1.096 3992000 117.1593 48703940 32.5 107.9892 33.40584
## AL-1987 AL 1987 1.136 4016000 115.8367 51846312 32.5 113.5273 33.46067
## AL-1988 AL 1988 1.183 4024000 115.2584 55698852 32.5 120.0334 33.52509
## AL-1989 AL 1989 1.240 4030000 109.2060 60044480 32.5 133.2560 33.65600
## AL-1990 AL 1990 1.307 4048508 111.7449 64094948 32.5 143.4486 33.75692
## income95 tax95 avgprs95 taxs95 income95pc pretax95
## AL-1985 65173615 46.03160 144.7257 47.23314 16.40413 97.49257
## AL-1986 67723361 45.19161 150.1601 46.45118 16.96477 103.70893
## AL-1987 69554378 43.60035 152.3025 44.88914 17.31932 107.41336
## AL-1988 71754049 41.86813 154.6331 43.18869 17.83152 111.44437
## AL-1989 73796599 39.94355 163.7759 41.36431 18.31181 122.41162
## AL-1990 74736574 37.89595 167.2652 39.36155 18.46028 127.90366
# Do an panel unit root test on the undifferenced cigarette data;
# there are many options; see ?purtest
# Note: this purtest() function isn't working due to unbalanced panel
#purtest(packpc~1, data=pdata, test="ips")
To estimate a dynamic panel model, we use the function
plm::pgmm
. To estimate Arellano-Bond with GMM for fixed
effects with lagged DV, pgmm needs formulas in a specific format:
# Estimate Arellano-Bond GMM for fixed effects with lagged DV
# note that pgmm formulas construct lag() properly for pdata.frame,
# if the data is not pdata.frame, lag() will create a wrong lag structure
formula_linear <-
packpc ~
lag(packpc, 1) + income95pc + avgprs95 | lag(packpc, 2:99)
# if you don't want to include too many instruments, you can just set the second part as `lag(packpc, 2:...)`
pgmm_1a <- pgmm(
formula_linear,
data = pdata,
effect = "individual", # should consider two-way (default) for small T; can also set `individual` FE
transformation = "d" # should do `ld` if T=3, `d` for difference GMM and `ld` for system GMM; default is `d` (difference GMM)
)
Sargan test has a null of the instruments as a group being exogenous- higher p is good. The residuals of the differences equations should exhibit AR(1) but not AR(2) behavior higher p for AR(2) test is good
summary(pgmm_1a)
## Oneway (individual) effect One-step model Difference GMM
##
## Call:
## pgmm(formula = formula_linear, data = pdata, effect = "individual",
## transformation = "d")
##
## Balanced Panel: n = 48, T = 11, N = 528
##
## Number of Observations Used: 432
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -25.4308 -2.5080 0.1463 0.1238 2.7384 25.6025
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## lag(packpc, 1) 0.638987 0.055342 11.5462 < 2.2e-16 ***
## income95pc -0.475568 0.486760 -0.9770 0.3286
## avgprs95 -0.180791 0.027799 -6.5035 7.848e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sargan test: chisq(44) = 47.99763 (p-value = 0.31401)
## Autocorrelation test (1): normal = -3.948861 (p-value = 7.8524e-05)
## Autocorrelation test (2): normal = -0.5688819 (p-value = 0.56944)
## Wald test for coefficients: chisq(3) = 2496.07 (p-value = < 2.22e-16)
# Good Sargan test, Good AR(2) test
Some examples of Mis-specified models.
# only include lag 2 and lag 3 terms
pgmm(
packpc ~ lag(packpc, 1) + income95pc + avgprs95 | lag(packpc, 2:3),
data = pdata, effect = "individual", transformation = "d"
) %>% summary()
## Oneway (individual) effect One-step model Difference GMM
##
## Call:
## pgmm(formula = packpc ~ lag(packpc, 1) + income95pc + avgprs95 |
## lag(packpc, 2:3), data = pdata, effect = "individual", transformation = "d")
##
## Balanced Panel: n = 48, T = 11, N = 528
##
## Number of Observations Used: 432
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -25.9207 -2.4789 0.1232 0.1159 2.7106 25.5997
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## lag(packpc, 1) 0.660475 0.052854 12.4963 < 2.2e-16 ***
## income95pc -0.258571 0.456052 -0.5670 0.5707
## avgprs95 -0.175368 0.026891 -6.5215 6.96e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sargan test: chisq(16) = 40.04482 (p-value = 0.00076694)
## Autocorrelation test (1): normal = -3.831746 (p-value = 0.00012724)
## Autocorrelation test (2): normal = -0.4941905 (p-value = 0.62117)
## Wald test for coefficients: chisq(3) = 2405.391 (p-value = < 2.22e-16)
# only include lag 2-5 terms
pgmm(
packpc ~ lag(packpc, 1) + income95pc + avgprs95 | lag(packpc, 2:5),
data = pdata, effect = "individual", transformation = "d"
) %>% summary()
## Oneway (individual) effect One-step model Difference GMM
##
## Call:
## pgmm(formula = packpc ~ lag(packpc, 1) + income95pc + avgprs95 |
## lag(packpc, 2:5), data = pdata, effect = "individual", transformation = "d")
##
## Balanced Panel: n = 48, T = 11, N = 528
##
## Number of Observations Used: 432
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -25.6948 -2.5357 0.1483 0.1217 2.6744 25.5813
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## lag(packpc, 1) 0.650350 0.055545 11.7086 < 2.2e-16 ***
## income95pc -0.325994 0.497744 -0.6549 0.5125
## avgprs95 -0.181297 0.027242 -6.6550 2.834e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sargan test: chisq(29) = 42.46279 (p-value = 0.050998)
## Autocorrelation test (1): normal = -3.907354 (p-value = 9.3312e-05)
## Autocorrelation test (2): normal = -0.5460146 (p-value = 0.58506)
## Wald test for coefficients: chisq(3) = 2503.149 (p-value = < 2.22e-16)
Let’s try different model specifications:
# model 1a: linear + difference GMM + individual FE
summary(pgmm_1a)
## Oneway (individual) effect One-step model Difference GMM
##
## Call:
## pgmm(formula = formula_linear, data = pdata, effect = "individual",
## transformation = "d")
##
## Balanced Panel: n = 48, T = 11, N = 528
##
## Number of Observations Used: 432
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -25.4308 -2.5080 0.1463 0.1238 2.7384 25.6025
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## lag(packpc, 1) 0.638987 0.055342 11.5462 < 2.2e-16 ***
## income95pc -0.475568 0.486760 -0.9770 0.3286
## avgprs95 -0.180791 0.027799 -6.5035 7.848e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sargan test: chisq(44) = 47.99763 (p-value = 0.31401)
## Autocorrelation test (1): normal = -3.948861 (p-value = 7.8524e-05)
## Autocorrelation test (2): normal = -0.5688819 (p-value = 0.56944)
## Wald test for coefficients: chisq(3) = 2496.07 (p-value = < 2.22e-16)
# model 1b: linear + system GMM + individual FE
pgmm_1b <- pgmm(
packpc ~ lag(packpc, 1) + income95pc + avgprs95 | lag(packpc, 2:99),
data = pdata, effect = "individual", transformation = "ld"
)
summary(pgmm_1b)
## Oneway (individual) effect One-step model System GMM
##
## Call:
## pgmm(formula = packpc ~ lag(packpc, 1) + income95pc + avgprs95 |
## lag(packpc, 2:99), data = pdata, effect = "individual", transformation = "ld")
##
## Balanced Panel: n = 48, T = 11, N = 528
##
## Number of Observations Used: 912
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -35.0188 -2.5248 0.1318 0.2344 2.8275 29.0107
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## lag(packpc, 1) 0.9372190 0.0149757 62.5826 <2e-16 ***
## income95pc 0.2033459 0.1245084 1.6332 0.1024
## avgprs95 -0.0021312 0.0098748 -0.2158 0.8291
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sargan test: chisq(55) = 47.79775 (p-value = 0.74372)
## Autocorrelation test (1): normal = -3.451448 (p-value = 0.00055759)
## Autocorrelation test (2): normal = 0.5944316 (p-value = 0.55222)
## Wald test for coefficients: chisq(3) = 109661.9 (p-value = < 2.22e-16)
# model 1c: linear + difference GMM + two-way FE
pgmm_1c <- pgmm(
packpc ~ lag(packpc, 1) + income95pc + avgprs95 | lag(packpc, 2:99),
data = pdata, effect = "twoways", transformation = "d"
)
summary(pgmm_1c)
## Twoways effects One-step model Difference GMM
##
## Call:
## pgmm(formula = packpc ~ lag(packpc, 1) + income95pc + avgprs95 |
## lag(packpc, 2:99), data = pdata, effect = "twoways", transformation = "d")
##
## Balanced Panel: n = 48, T = 11, N = 528
##
## Number of Observations Used: 432
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -18.939 -1.890 -0.259 0.000 1.824 20.425
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## lag(packpc, 1) 0.252415 0.117744 2.1438 0.03205 *
## income95pc 1.062384 0.674055 1.5761 0.11500
## avgprs95 -0.285703 0.060572 -4.7168 2.396e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sargan test: chisq(44) = 45.92782 (p-value = 0.39225)
## Autocorrelation test (1): normal = -3.571845 (p-value = 0.00035448)
## Autocorrelation test (2): normal = 0.02846648 (p-value = 0.97729)
## Wald test for coefficients: chisq(3) = 35.65439 (p-value = 8.8603e-08)
## Wald test for time dummies: chisq(9) = 70.99223 (p-value = 9.7256e-12)
# model 1d: linear + system GMM + two-way FE
pgmm_1d <- pgmm(
packpc ~ lag(packpc, 1) + income95pc + avgprs95 | lag(packpc, 2:99),
data = pdata, effect = "twoways", transformation = "ld"
)
summary(pgmm_1d)
## Twoways effects One-step model System GMM
##
## Call:
## pgmm(formula = packpc ~ lag(packpc, 1) + income95pc + avgprs95 |
## lag(packpc, 2:99), data = pdata, effect = "twoways", transformation = "ld")
##
## Balanced Panel: n = 48, T = 11, N = 528
##
## Number of Observations Used: 912
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -31.81880 -2.37490 -0.03745 0.00000 2.18303 27.94124
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## lag(packpc, 1) 0.912506 0.035086 26.0074 < 2.2e-16 ***
## income95pc -0.016144 0.110832 -0.1457 0.884186
## avgprs95 -0.101336 0.032190 -3.1481 0.001644 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sargan test: chisq(55) = 45.80834 (p-value = 0.80677)
## Autocorrelation test (1): normal = -3.133862 (p-value = 0.0017252)
## Autocorrelation test (2): normal = 0.7322591 (p-value = 0.46401)
## Wald test for coefficients: chisq(3) = 2942.746 (p-value = < 2.22e-16)
## Wald test for time dummies: chisq(9) = 87.03407 (p-value = 6.3969e-15)
stargazer(pgmm_1a,pgmm_1b,pgmm_1c,pgmm_1d,
type="text")
##
## =====================================================
## Dependent variable:
## --------------------------------------
## packpc
## (1) (2) (3) (4)
## -----------------------------------------------------
## lag(packpc, 1) 0.639*** 0.937*** 0.252** 0.913***
## (0.055) (0.015) (0.118) (0.035)
##
## income95pc -0.476 0.203 1.062 -0.016
## (0.487) (0.125) (0.674) (0.111)
##
## avgprs95 -0.181*** -0.002 -0.286*** -0.101***
## (0.028) (0.010) (0.061) (0.032)
##
## -----------------------------------------------------
## Observations 48 48 48 48
## =====================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
Now, we will fit the log-log version:
formula_loglog <-
log(packpc) ~ lag(log(packpc), 1) + log(income95pc) + log(avgprs95) | lag(log(packpc), 2:99)
# model 2a: log-log + difference GMM + individual FE
pgmm_2a <- pgmm(formula_loglog, data = pdata, effect = "individual", transformation = "d")
summary(pgmm_2a)
## Oneway (individual) effect One-step model Difference GMM
##
## Call:
## pgmm(formula = formula_loglog, data = pdata, effect = "individual",
## transformation = "d")
##
## Balanced Panel: n = 48, T = 11, N = 528
##
## Number of Observations Used: 432
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.263020 -0.024090 0.002736 0.001092 0.027682 0.216378
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## lag(log(packpc), 1) 0.674051 0.061660 10.9317 < 2.2e-16 ***
## log(income95pc) -0.066044 0.111880 -0.5903 0.555
## log(avgprs95) -0.305656 0.043073 -7.0963 1.281e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sargan test: chisq(44) = 46.02872 (p-value = 0.38824)
## Autocorrelation test (1): normal = -4.050332 (p-value = 5.1145e-05)
## Autocorrelation test (2): normal = 0.05678179 (p-value = 0.95472)
## Wald test for coefficients: chisq(3) = 2140.806 (p-value = < 2.22e-16)
# model 2b: log-log + system GMM + individual FE
pgmm_2b <- pgmm(formula_loglog, data = pdata, effect = "individual", transformation = "ld")
summary(pgmm_2b)
## Oneway (individual) effect One-step model System GMM
##
## Call:
## pgmm(formula = formula_loglog, data = pdata, effect = "individual",
## transformation = "ld")
##
## Balanced Panel: n = 48, T = 11, N = 528
##
## Number of Observations Used: 912
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.366760 -0.022702 0.002000 0.001797 0.025148 0.278323
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## lag(log(packpc), 1) 0.9860119 0.0094198 104.6742 <2e-16 ***
## log(income95pc) -0.0057655 0.0154370 -0.3735 0.7088
## log(avgprs95) 0.0110743 0.0077600 1.4271 0.1535
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sargan test: chisq(55) = 47.59902 (p-value = 0.75038)
## Autocorrelation test (1): normal = -3.352045 (p-value = 0.00080217)
## Autocorrelation test (2): normal = 0.982078 (p-value = 0.32606)
## Wald test for coefficients: chisq(3) = 4737095 (p-value = < 2.22e-16)
# model 2c: log-log + difference GMM + two-way FE
pgmm_2c <- pgmm(formula_loglog, data = pdata, effect = "twoways", transformation = "d")
summary(pgmm_2c)
## Twoways effects One-step model Difference GMM
##
## Call:
## pgmm(formula = formula_loglog, data = pdata, effect = "twoways",
## transformation = "d")
##
## Balanced Panel: n = 48, T = 11, N = 528
##
## Number of Observations Used: 432
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.181013 -0.019492 -0.001678 0.000000 0.017812 0.204343
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## lag(log(packpc), 1) 0.33315 0.14500 2.2976 0.02158 *
## log(income95pc) 0.18131 0.18166 0.9981 0.31825
## log(avgprs95) -0.62271 0.11127 -5.5965 2.188e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sargan test: chisq(44) = 43.30027 (p-value = 0.5015)
## Autocorrelation test (1): normal = -3.620059 (p-value = 0.00029454)
## Autocorrelation test (2): normal = 0.5219654 (p-value = 0.60169)
## Wald test for coefficients: chisq(3) = 45.27327 (p-value = 8.0945e-10)
## Wald test for time dummies: chisq(9) = 71.0946 (p-value = 9.2856e-12)
# model 2d: log-log + system GMM + two-way FE
pgmm_2d <- pgmm(formula_loglog, data = pdata, effect = "twoways", transformation = "ld")
summary(pgmm_2d)
## Twoways effects One-step model System GMM
##
## Call:
## pgmm(formula = formula_loglog, data = pdata, effect = "twoways",
## transformation = "ld")
##
## Balanced Panel: n = 48, T = 11, N = 528
##
## Number of Observations Used: 912
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.3421969 -0.0233093 0.0006081 0.0000000 0.0224292 0.2716041
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## lag(log(packpc), 1) 0.9454089 0.0286753 32.9694 < 2.2e-16 ***
## log(income95pc) -0.0072777 0.0214627 -0.3391 0.734544
## log(avgprs95) -0.1650673 0.0494761 -3.3363 0.000849 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sargan test: chisq(55) = 44.19932 (p-value = 0.85117)
## Autocorrelation test (1): normal = -3.066312 (p-value = 0.0021672)
## Autocorrelation test (2): normal = 1.1093 (p-value = 0.2673)
## Wald test for coefficients: chisq(3) = 4820.892 (p-value = < 2.22e-16)
## Wald test for time dummies: chisq(9) = 90.44993 (p-value = 1.3224e-15)
Unfortunately, plm
package does not have
predict
method for plm()
and
pgmm()
functions. So we have to manually construct
predictions from new data using simcf
package (literally
the name of package means “simulate counterfactual”).
To simulate counterfactual outcomes, we need the following things:
packpc
in 1995)Simulations with and without year FE dummies are a bit different.
Here we take pgmm_1a
and pgmm_1c
as two
examples.
When deciding what could be a plausible scenario, first look at the sample statistics.
# How big a change in price to simulate?
# How about "double" the average tax in the most recent year?
summary(pdata$taxs95[pdata$year==1995])
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 34.44 48.75 59.84 61.87 74.78 112.63
# The average (and median) tax is about 60 cents/pack
sd(pdata$taxs95[pdata$year==1995])
## [1] 18.47741
If the mean is around 61, and the standard deviation is 18.47, a +60 cents increase would also be about 3 sd’s, and raise the tax to a bit more than the max observed. Other possibilities:
Now, let’s create a matrix with our year dummies. Here we are using
he function simcf::makeFEdummies
. Once we have created this
matrix, we cbind
it with the data.
# Construct the year dummies
# it is equivalent to `model.matrix(~ -1 + year, data = pdata)`
yearfe <- makeFEdummies(pdata$year)
# Why drop first 2 col's? (the first 2 years are missing due to instrumental variables y_{t-2})
yearfe <- yearfe[, 3:ncol(yearfe)]
# List all the years
yearlist <- unique(pdata$year)
# List the years and exclude the first two
yearlist <- yearlist[3:length(yearlist)]
# Create names for the year dummies
colnames(yearfe) <- paste0("y", yearlist)
head(yearfe)
## y1987 y1988 y1989 y1990 y1991 y1992 y1993 y1994 y1995
## 1 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0
## 3 1 0 0 0 0 0 0 0 0
## 4 0 1 0 0 0 0 0 0 0
## 5 0 0 1 0 0 0 0 0 0
## 6 0 0 0 1 0 0 0 0 0
# concatenate year FE dummies to the pdata.frame
datayearfe <- cbind(pdata, yearfe)
head(datayearfe)
## state year cpi pop packpc income tax avgprs taxs income95
## 1 AL 1985 1.076 3973000 116.4863 46014968 32.5 102.1817 33.34834 65173615
## 2 AL 1986 1.096 3992000 117.1593 48703940 32.5 107.9892 33.40584 67723361
## 3 AL 1987 1.136 4016000 115.8367 51846312 32.5 113.5273 33.46067 69554378
## 4 AL 1988 1.183 4024000 115.2584 55698852 32.5 120.0334 33.52509 71754049
## 5 AL 1989 1.240 4030000 109.2060 60044480 32.5 133.2560 33.65600 73796599
## 6 AL 1990 1.307 4048508 111.7449 64094948 32.5 143.4486 33.75692 74736574
## tax95 avgprs95 taxs95 income95pc pretax95 y1987 y1988 y1989 y1990 y1991
## 1 46.03160 144.7257 47.23314 16.40413 97.49257 0 0 0 0 0
## 2 45.19161 150.1601 46.45118 16.96477 103.70893 0 0 0 0 0
## 3 43.60035 152.3025 44.88914 17.31932 107.41336 1 0 0 0 0
## 4 41.86813 154.6331 43.18869 17.83152 111.44437 0 1 0 0 0
## 5 39.94355 163.7759 41.36431 18.31181 122.41162 0 0 1 0 0
## 6 37.89595 167.2652 39.36155 18.46028 127.90366 0 0 0 1 0
## y1992 y1993 y1994 y1995
## 1 0 0 0 0
## 2 0 0 0 0
## 3 0 0 0 0
## 4 0 0 0 0
## 5 0 0 0 0
## 6 0 0 0 0
Now, we specify the formulas of each model, and save the specificaiton in an object.
# Construct formulas -- linear + difference GMM + individual FE (without year FE dummies)
formula_1a <- packpc ~ income95pc + avgprs95 - 1 #with Income and Price as covariates
# Construct formulas -- linear + system GMM + individual FE (without year dummies but with intercept)
formula_1b <- packpc ~ income95pc + avgprs95
# Construct formulas -- linear + difference GMM + two-way FE (with year FE dummies)
formula_1c <- update(
formula_1a,
paste(c("~ .", colnames(yearfe)), collapse = "+") # add year FE dummies to the formula
)
formula_1c
## packpc ~ income95pc + avgprs95 + y1987 + y1988 + y1989 + y1990 +
## y1991 + y1992 + y1993 + y1994 + y1995 - 1
# Construct formulas -- linear + system GMM + two-way FE (with year dummies and with intercept)
formula_1d <- update(formula_1b, paste(c("~ .", colnames(yearfe)), collapse = "+"))
formula_1d
## packpc ~ income95pc + avgprs95 + y1987 + y1988 + y1989 + y1990 +
## y1991 + y1992 + y1993 + y1994 + y1995
Simulation of parameters.
# Number of simulations
sims <- 1000
# Recall model 1a:Difference GMM with state fixed effects
#packpc ~ lag(packpc, 1) + income95pc + avgprs95 | lag(packpc, 2:99)
# Simulate parameters from an multivariate normal distribution
simparam_1a <- mvrnorm(sims, coefficients(pgmm_1a), vcovHC(pgmm_1a))
simphi_1a <- simparam_1a[, 1] # Extract the simulated phis
head(simphi_1a)
## [1] 0.6305317 0.6135699 0.6745606 0.5531332 0.6445569 0.5076610
simbeta_1a <- simparam_1a[, 2:ncol(simparam_1a)] # Extract the simulated betas
head(simbeta_1a)
## income95pc avgprs95
## [1,] -0.7780931 -0.1853170
## [2,] -0.4285905 -0.1875193
## [3,] -0.2197715 -0.1790330
## [4,] -0.8106588 -0.2266810
## [5,] -0.6892746 -0.1731899
## [6,] -1.1943628 -0.2060910
Now, create matrix/data with hypothetical predictors. In this example:
Moreover, pgmm
uses covariates in differenced form.
Therefore, in your xhyp data object with counterfactuals, we
must set all the covariates to 0 (no change) in the matrix
x
(which refers to the present). Exceptions:
At this moment, we can ignore the state fixed effects for now and add them later because model is linear.
# Forecast for 3 years from 1996 to 1998
periods.out <- 3
# Make matrix of hypothetical x's: covariates
xhyp_1a <- cfMake(formula=formula_1a,
data=datayearfe,
nscen=periods.out)
#With mean packpc, income, and price for the forecast period
# make every element in 'x' matrix be zero
xhyp_1a$x <- xhyp_1a$xpre <- 0 * xhyp_1a$x
# change the value in the predictor of interest (again, 'x' matrix)
xhyp_1a <- cfChange(xhyp_1a,
covname="avgprs95",
x=60, # only change avgprs to 60 (cents)
scen=1)
# Create baseline scenario
xbase_1a <- xhyp_1a
xbase_1a$x <- xbase_1a$xpre
# We need a lag of the price per pack
lagY_1a <- NULL
# Hypothetical previous change in Y for simulation
for (i in 1:length(pgmm_1a$model)){ #For 1 to 48
lagY_1a <- c(lagY_1a, as.data.frame(pgmm_1a$model[[i]])["1995",]$packpc)
}
#Hypothetical change in packpc for each state in 1995
#Find the mean of these hypothetical previous changes
lagY_1a <- mean(lagY_1a, na.rm=TRUE)
# Hypothetical initial level of Y for simulation
initialY <- mean(pdata$packpc[pdata$year==1995], na.rm=TRUE) #The mean of packpc in 1995
Now that we have everything that we need for the simulation, we can
use simcf::ldvsimev
, simcf::ldvsimfd
and
simcf::ldvsimrr
to estimate the quantities of interest.
# Simulate expected values of Y (on original level scale)
# out to periods.out given hypothetical future values of X,
# initial lags of the change in Y, and an initial level of Y
sim_ev1a <- ldvsimev(xhyp_1a, # The matrix of hypothetical x's
simbeta_1a, # The matrix of simulated betas
ci=0.95, # Desired confidence interval
constant=NA, # NA indicates no constant!
phi=simphi_1a, # estimated AR parameters; length must match lagY
lagY=lagY_1a, # lags of y, most recent last
transform="diff", # "log" to undo log transformation,
# "diff" to under first differencing
# "difflog" to do both
initialY=initialY # for differenced models, the lag of the level of y
)
# Simulate expected values of Y given no change in covariates
sim_base1a <- ldvsimev(xbase_1a, # The matrix of hypothetical x's
simbeta_1a, # The matrix of simulated betas
ci=0.95, # Desired confidence interval
constant=NA, # NA indicates no constant!
phi=simphi_1a, # estimated AR parameters; length must match lagY
lagY=lagY_1a, # lags of y, most recent last
transform="diff", # "log" to undo log transformation,
# "diff" to under first differencing
# "difflog" to do both
initialY=initialY # for differenced models, the lag of the level of y
)
# Simulate first differences in y
# out to periods.out given hypothetical future values of x, xpre,
# and initial lags of the change in y
sim_fd1a <- ldvsimfd(xhyp_1a, # The matrix of hypothetical x's
simbeta_1a, # The matrix of simulated betas
ci=0.95, # Desired confidence interval
constant=NA, # Column containing the constant
# set to NA for no constant
phi=simphi_1a, # estimated AR parameters; length must match lagY
lagY=lagY_1a, # lags of y, most recent last
transform="diff" # Model is differenced
#initialY=initialY # Redundant in this case (fd of linear differenced Y)
)
# Simulate relative risks in y
# out to periods.out given hypothetical future values of x, xpre,
# and initial lags of the change in y
sim_rr1a <- ldvsimrr(xhyp_1a, # The matrix of hypothetical x's
simbeta_1a, # The matrix of simulated betas
ci=0.95, # Desired confidence interval
constant=NA, # Column containing the constant
# set to NA for no constant
phi=simphi_1a, # estimated AR parameters; length must match lagY
lagY=lagY_1a, # lags of y, most recent last
transform="diff", # Model is differenced
initialY=initialY # Required for differenced Y in ldvsimrr
)
Now we are going to plot the simulations using Chris’s
tile
package and the custom function
cigLineplots
. The latter is already saved in the
TS_code.R script, in the source folder.
# Make plots of expected values, first differences, and percent changes
# using custom tile code in helperCigs.R
# Hypothetical initial level of Y for simulation
initialY <- mean(pdata$packpc[pdata$year==1995], na.rm=TRUE)
# axis limits
limitsEV <- c(50, 105)
limitsFD <- c(-40, 5)
limitsRR <- c(-40, 5)
# Model 1a Lineplot: Packs
cigLineplots(sim_ev1a, sim_base1a, sim_fd1a, sim_rr1a,
limitsEV, limitsFD, limitsRR, initialY,
main="Cigarette Taxes & Consumption: 1a. Linear Difference GMM, Individual Effects",
file=NULL)
# Save pdf output
cigLineplots(sim_ev1a, sim_base1a, sim_fd1a, sim_rr1a,
limitsEV, limitsFD, limitsRR, initialY,
main="Cigarette Taxes & Consumption: 1a. Linear Difference GMM, Individual Effects",
file="output/m1aEVFDRR")
# Recall model 1g: packpc ~ lag(packpc, 1) + income95pc + avgprs95 | lag(packpc, 2:99)
# Difference GMM with state and year fixed effects
# Simulate parameters
simparam_1c <- mvrnorm(sims, coefficients(pgmm_1c), vcovHC(pgmm_1c)) #Sample parameters
simphi_1c <- simparam_1c[,1, drop = FALSE] #Extract the phis
head(simphi_1c)
## lag(packpc, 1)
## [1,] 0.3161586
## [2,] 0.2152047
## [3,] 0.2021806
## [4,] 0.2948359
## [5,] 0.4313263
## [6,] 0.1955249
simbeta_1c <- simparam_1c[,2:ncol(simparam_1c)] #Extract the betas
head(simbeta_1c)
## income95pc avgprs95 1987 1988 1989 1990
## [1,] 0.4719479 -0.2614730 -1.0492861 -1.2055932 -3.0057420 -4.289267
## [2,] 0.7890498 -0.3076465 -0.4239592 -0.3645147 -2.4363030 -4.908887
## [3,] 0.8743731 -0.4148937 0.4281356 0.9074744 0.6340861 -1.022933
## [4,] 1.7670041 -0.2568263 -0.3245827 -1.2810467 -3.3943722 -5.475072
## [5,] 0.4427081 -0.2250346 0.9184665 -0.2594482 -1.1895247 -2.637772
## [6,] 1.3613348 -0.3162667 -0.8361219 -1.3616656 -3.0899072 -5.009428
## 1991 1992 1993 1994 1995
## [1,] -4.2187835 -3.555403 -8.189917 -10.131184 -10.158173
## [2,] -4.6489377 -3.368394 -9.584629 -12.749927 -9.765093
## [3,] 0.3525874 1.733858 -3.660171 -7.518644 -5.889305
## [4,] -4.8518883 -4.150860 -9.539685 -12.481546 -11.085336
## [5,] -2.1694373 -1.276903 -5.036891 -6.637915 -5.844811
## [6,] -4.7459816 -3.601002 -8.752850 -12.792358 -10.104429
# Make matrix of hypothetical x's:
# Assume an average state raised taxes 60 cents starting 1996
#
# Issues -- we need to somehow include the state and year FEs:
# Let's set the state to be an "average" state in 1995,
# and year to be like the last year (1995)
# Make matrix of hypothetical x's: covariates
xhyp_1c <- cfMake(formula=formula_1c,
data=datayearfe,
nscen=periods.out) #Including the year fixed effects
# pgmm uses covariates in differenced form
# so we want most of them to be 0 (no change)
# exceptions:
# (1) changes in covariates of interest
# (2) differenced time dummies require special care
xhyp_1c$x <- xhyp_1c$xpre <- 0*xhyp_1c$x
xhyp_1c <- cfChange(xhyp_1c, "avgprs95", x=60, scen=1) #Assume tax is raised 60 cents in 1996
xhyp_1c
## $x
## packpc income95pc avgprs95 y1987 y1988 y1989 y1990 y1991 y1992 y1993 y1994
## 1 0 0 60 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0
## y1995
## 1 0
## 2 0
## 3 0
##
## $xpre
## packpc income95pc avgprs95 y1987 y1988 y1989 y1990 y1991 y1992 y1993 y1994
## 1 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0
## y1995
## 1 0
## 2 0
## 3 0
##
## $model
## packpc ~ income95pc + avgprs95 + y1987 + y1988 + y1989 + y1990 +
## y1991 + y1992 + y1993 + y1994 + y1995 - 1
##
## attr(,"class")
## [1] "list" "counterfactual"
# We can "ignore" the state fixed effects for now and add them later
# because model is total linear
# Create baseline scenario
xbase_1c <- xhyp_1c
xbase_1c$x <- xbase_1c$xpre
xbase_1c
## $x
## packpc income95pc avgprs95 y1987 y1988 y1989 y1990 y1991 y1992 y1993 y1994
## 1 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0
## y1995
## 1 0
## 2 0
## 3 0
##
## $xpre
## packpc income95pc avgprs95 y1987 y1988 y1989 y1990 y1991 y1992 y1993 y1994
## 1 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0
## y1995
## 1 0
## 2 0
## 3 0
##
## $model
## packpc ~ income95pc + avgprs95 + y1987 + y1988 + y1989 + y1990 +
## y1991 + y1992 + y1993 + y1994 + y1995 - 1
##
## attr(,"class")
## [1] "list" "counterfactual"
# We need a lag of the price per pack
lagY_1c <- NULL # Hypothetical previous change in Y for simulation
for (i in 1:length(pgmm_1c$model)){
lagY_1c <- c(lagY_1c, as.data.frame(pgmm_1c$model[[i]])["1995",]$packpc)
}
#Store change in packpc 1995 for each state
lagY_1c <- mean(lagY_1c, na.rm=TRUE)
#Find the mean for all packpc changes in 1995
# Hypothetical initial level of Y for simulation
pdata$packpc[pdata$year==1995]
## AL-1995 AR-1995 AZ-1995 CA-1995 CO-1995 CT-1995 DE-1995 FL-1995
## 101.08543 111.04297 71.95417 56.85931 82.58292 79.47219 124.46660 93.07455
## GA-1995 IA-1995 ID-1995 IL-1995 IN-1995 KS-1995 KY-1995 LA-1995
## 97.47462 92.40160 74.84978 83.26508 134.25835 88.75344 172.64778 105.17613
## MA-1995 MD-1995 ME-1995 MI-1995 MN-1995 MO-1995 MS-1995 MT-1995
## 76.62064 77.47355 102.46978 81.38825 82.94530 122.45028 105.58245 87.15957
## NC-1995 ND-1995 NE-1995 NH-1995 NJ-1995 NM-1995 NV-1995 NY-1995
## 121.53806 79.80697 87.27071 156.33675 80.37137 64.66887 93.52612 70.81732
## OH-1995 OK-1995 OR-1995 PA-1995 RI-1995 SC-1995 SD-1995 TN-1995
## 111.38010 108.68011 92.15575 95.64309 92.59980 108.08275 97.21923 122.32005
## TX-1995 UT-1995 VA-1995 VT-1995 WA-1995 WI-1995 WV-1995 WY-1995
## 73.07931 49.27220 105.38687 122.33475 65.53092 92.46635 115.56883 112.23814
initialY <- mean(pdata$packpc[pdata$year==1995], na.rm=TRUE)
#Set the initial mean value of pack in 1995 across states
# Simulate expected values of Y (on original level scale)
# out to periods.out given hypothetical future values of X,
# initial lags of the change in Y, and an initial level of Y
sim_ev1c <- ldvsimev(xhyp_1c, # The matrix of hypothetical x's
simbeta_1c, # The matrix of simulated betas
ci=0.95, # Desired confidence interval
constant=NA, # NA indicates no constant!
phi=simphi_1c, # estimated AR parameters; length must match lagY
lagY=lagY_1c, # lags of y, most recent last
transform="diff", # "log" to undo log transformation,
# "diff" to under first differencing
# "difflog" to do both
initialY=initialY # for differenced models, the lag of the level of y
)
# Simulate expected values of Y given no change in covariates
sim_base1c <- ldvsimev(xbase_1c, # The matrix of hypothetical x's
simbeta_1c, # The matrix of simulated betas
ci=0.95, # Desired confidence interval
constant=NA, # NA indicates no constant!
phi=simphi_1c, # estimated AR parameters; length must match lagY
lagY=lagY_1c, # lags of y, most recent last
transform="diff", # "log" to undo log transformation,
# "diff" to under first differencing
# "difflog" to do both
initialY=initialY # for differenced models, the lag of the level of y
)
# Simulate first differences in y
# out to periods.out given hypothetical future values of x, xpre,
# and initial lags of the change in y
sim_fd1c <- ldvsimfd(xhyp_1c, # The matrix of hypothetical x's
simbeta_1c, # The matrix of simulated betas
ci=0.95, # Desired confidence interval
constant=NA, # Column containing the constant
# set to NA for no constant
phi=simphi_1c, # estimated AR parameters; length must match lagY
lagY=lagY_1c, # lags of y, most recent last
transform="diff", # Model is differenced
#initialY=initialY # Redundant in this case (fd of linear differenced Y)
)
# Simulate relative risks in y
# out to periods.out given hypothetical future values of x, xpre,
# and initial lags of the change in y
sim_rr1c <- ldvsimrr(xhyp_1c, # The matrix of hypothetical x's
simbeta_1c, # The matrix of simulated betas
ci=0.95, # Desired confidence interval
constant=NA, # Column containing the constant
# set to NA for no constant
phi=simphi_1c, # estimated AR parameters; length must match lagY
lagY=lagY_1c, # lags of y, most recent last
transform="diff", # Model is differenced
initialY=initialY # Required for differenced Y in ldvsimrr
)
# Model xhyp_1c Lineplot: Packs
cigLineplots(sim_ev1c, sim_base1c, sim_fd1c, sim_rr1c,
limitsEV, limitsFD, limitsRR, initialY,
main="Cigarette Taxes & Consumption: 1c. Linear Difference GMM, Two-way Effects",
file=NULL
)
# save output
cigLineplots(sim_ev1c, sim_base1c, sim_fd1c, sim_rr1c,
limitsEV, limitsFD, limitsRR, initialY,
main="Cigarette Taxes & Consumption: 1c. Linear Difference GMM, Two-way Effects",
file="output/m1cEVFDRR"
)