This is because \(\bar{y}_i\) is computed using the past \(y\)’s. This is correlated with \(\bar{\epsilon}_{it}\), which is computed using the past \(\epsilon\)’s. Or:
Specifically, this creates bias in the lagged dependent variable (LDV). If the other regressors are correlated with the LDV, then their coefficients may also be seriously biased.
Nickell bias
The degree of bias is order \(1/T\), so it is big for small \(T\).
Furthermore,
The bias increases as \(\beta\) decreases
The bias increases as \(\phi\) increases
Small \(N\) is not the problem. Small \(T\) is the problem
Increasing \(N\) does not mitigate the problem. Purging serial correlation in the errors or getting the specification right (including other regressors) also doesn’t solve the problem.
Instrumental variables
We therefore turn to instrumental variables.
Recall that an instrumental variable must fulfill two conditions:
Relevance: \(z\) is correlated with \(x\);
Exogeneity: \(z\) is uncorrelated with \(\epsilon\). It must influence \(y\) only through \(x\).
We use lagged levels and lagged differences of \(\Delta y_{it}\) as instruments for the LDV. These help to predict \(\Delta y_{it}\) but not \(\Delta \epsilon_{it}\) if the errors are iid (see lecture slides for why).
Note: This is different than instrumenting \(x_{it}\). We are not addressing the endogeneity that may exist there. One should not be conflated with the other.
Generalized Method of Moments
Estimation is done using GMM. The usual IV approach cannot handle the number of instruments.
To give a brief overview of the intuition behind GMM, consider a linear model: \[y_t = \boldsymbol{x}_t \boldsymbol{\beta} + e_t\] Recall that the following condition holds under Gauss-Markov. \[E[\boldsymbol{x}_t\epsilon_t]=0\] Assume there exists some combination instrumental variables \(\boldsymbol{z}_t\) that implies the following: \[E[\boldsymbol{z}_t\epsilon_t] = E[\boldsymbol{z}_t(y_t-\boldsymbol{x}_t\boldsymbol{\beta})]=0\] Intuition: GMM attempts to find the \(\boldsymbol{\beta}\) that makes this true.
Generalized method of moments
Population moment: \[E[\boldsymbol{z}_t(y_t-\boldsymbol{x}_t\boldsymbol{\beta})]=0\] Sample moments: \[\frac{1}{n}\sum^n_{t=1}\boldsymbol{z}_t(y-\boldsymbol{x_t}\boldsymbol{\beta})\]\[\substack{\frac{1}{n}\sum^n_{t=1}z_{1t}(y-\boldsymbol{x}_t \boldsymbol{\beta})\\
\vdots\\
\frac{1}{n}\sum^n_{t=1}z_{Kt}(y-\boldsymbol{x}_t \boldsymbol{\beta})
}\] We therefore have \[\boldsymbol{S}_{zy}-\boldsymbol{S}_{zx}\boldsymbol{\beta}=0 \;\;\; \text{where}\]\[\boldsymbol{S}_{zy}=n^{-1}\sum^n_{t=1}\boldsymbol{z}_t y_{t} \;\;\; \text{and} \;\;\; \boldsymbol{S}_{zx}=n^{-1}\sum^n_{t=1}\boldsymbol{z}_t x_{t}\] We solve for \[\hat{\boldsymbol{\beta}}=\boldsymbol{S}^{-1}_{zx}\boldsymbol{S}_{xy}\]
Can be solved analytically. But we can also use an iterative search.
Anderson-Hsiao estimator: uses the second and third lagged levels as instruments.
Arellano-Bond Difference GMM: uses \(\Delta y\) as the outcome and all available lagged levels as instruments in each period.
Arellano-Bover/Blundell-Bond System GMM: adds the available lagged differences as instruments.
Dynamic Panel Model: Verify Key Assumptions
When using panel GMM, we need to verify some modeling assumptions:
The number or the weakness of instruments
Sargan test: if \(\mathbb{E}[\hat{\varepsilon} Z] = 0\), then the combination of instruments is strong enough (over-identifying restrictions).
If we don’t reject the null (p > 0.05 or 0.1), then the instruments are strong enough.
Serial correlation of error terms – should be AR(1) only, otherwise, for example, the lag 2 term becomes a weak instrument.
Arellano-Bond’s test for serial correlation
Difference or system GMM (whether to use lagged differences as IV)
Use of year fixed effects
Wald Chi-squared test on year FE dummies: to see whether the inclusion of FE dummies is statistically significant
Use of linear or log-log specification \(\rightarrow\) interpretation of results will be different
linear: one unit increase in \(x\) will induce \(\beta\) increase in \(y\)
log-log: 1% increase in \(x\) will induce \(100\times\beta\)% increase in \(y\). (elasticity)
Case Study: How Does Tax Affect Cigarrete Consumption?
# Load librarieslibrary(plm) # Econometrics package for linear panel modelslibrary(nlme) # Estimation of mixed effects modelslibrary(lme4) # Alternative package for mixed effects modelslibrary(tseries) # For ADF unit root testlibrary(simcf) # For panel functions and simulators# devtools::install_github("chrisadolph/tile-simcf/tile")library(tile) # For visualization of model inferencelibrary(RColorBrewer) # For nice colorslibrary(MASS) # For mvrnorm()library(lmtest)source("Lab7_data/helperCigs.R") # For graphics functions# Load cigarette consumption data (Jonathan Gruber, MIT)# Variables (see codebook):# state year cpi pop packpc income tax avgprs taxsdata <-read.csv("Lab7_data/cigarette.csv")colnames(data)
# Quick inflation adjustment to 1995 dollarsinflAdjust <-function(x,cpi,year,target) {unique(cpi[year==target])*x/cpi #Multiply x with cpi in target year then divide by cpi in observed year}#Make adjustments to state personal incomedata$income95 <-with(data, inflAdjust(income, cpi, year, 1995))#Average state, federal, and average local excise taxesdata$tax95 <-with(data, inflAdjust(tax, cpi, year, 1995)) #Average price, including sales taxesdata$avgprs95 <-with(data, inflAdjust(avgprs, cpi, year, 1995)) #Average excise taxes, including sales taxesdata$taxs95 <-with(data, inflAdjust(taxs, cpi, year, 1995)) # Create per capita income (in k)data$income95pc <- data$income95/data$pop# Create pretax price, 1995 dollarsdata$pretax95 <- data$avgprs95 - data$taxs95attach(data)
# Check for time invariant variables:pvar(data, index =c("state", "year"))
no time variation: state
no individual variation: year cpi
# "within" option tells plm to do fixed effects# Note that if you want to add year fixed effects then set effect="time" and for both state# and year fixed effects set effect effect="twoway"fe_res <-plm(packpc ~ income95pc + pretax95 + taxs95, data = data, model="within", effect="twoway")coeftest(fe_res, vcov. = vcovHC, method ="arellano")
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
income95pc 0.969966 0.643947 1.5063 0.13267
pretax95 -0.188551 0.077797 -2.4236 0.01575 *
taxs95 -0.481852 0.044020 -10.9462 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Some tests for serial correlation of errors (needed because we have a linear regression# with lags of the dependent variable on the RHS# the standard LM test (note we could specify order)pbgtest(fe_res)
Breusch-Godfrey/Wooldridge test for serial correlation in panel models
data: packpc ~ income95pc + pretax95 + taxs95
chisq = 129.6, df = 11, p-value < 2.2e-16
alternative hypothesis: serial correlation in idiosyncratic errors
(Dynamic) Ramdom Effect Model
# Estimate a random effects AR(I)MA(p,q) model using lme (Restricted ML)re_res <-lme(# A formula object including the response,# the fixed covariates, and any grouping variablesfixed = packpc ~ income95pc + pretax95 + taxs95, # i.e. response variable and explanatory variables # The random effects componentrandom =~1| state, # 1 indicates the intercept and state indicates the grouping# The TS dynamics: specify the time & group variables,# and the order of the ARMA(p,q) processcorrelation =corARMA(form =~ year | state,p =1, # AR(p) orderq =0# MA(q) order ) )summary(re_res)
Linear mixed-effects model fit by REML
Data: NULL
AIC BIC logLik
3253.21 3283.04 -1619.605
Random effects:
Formula: ~1 | state
(Intercept) Residual
StdDev: 0.01127294 20.92621
Correlation Structure: AR(1)
Formula: ~year | state
Parameter estimate(s):
Phi
0.9764735
Fixed effects: packpc ~ income95pc + pretax95 + taxs95
Value Std.Error DF t-value p-value
(Intercept) 173.08136 8.574965 477 20.184499 0.0000
income95pc -1.05746 0.387602 477 -2.728198 0.0066
pretax95 -0.14537 0.024800 477 -5.861684 0.0000
taxs95 -0.46630 0.040769 477 -11.437827 0.0000
Correlation:
(Intr) incm95 prtx95
income95pc -0.856
pretax95 -0.099 -0.223
taxs95 -0.160 -0.097 -0.035
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.79963472 -0.57137010 -0.08122771 0.45749547 3.97887775
Number of Observations: 528
Number of Groups: 48
Dynamic Panel Model
# Panel based diagnostics available in the plm library# (This package recently expanded to contain many many panel data tests# for serial correlation, fixed effects, and unit roots)# First, create a plm data frame (special data frame that "knows" the# unit variable and time variablepdata <-pdata.frame(data, index=c("state", "year"))head(pdata)
# Do an panel unit root test on the undifferenced cigarette data;# there are many options; see ?purtest# Note: this purtest() function isn't working due to unbalanced panel#purtest(packpc~1, data=pdata, test="ips")
Dynamic Panel Model
# Estimate Arellano-Bond GMM for fixed effects with lagged DV## pgmm needs formulas in a specific format:# 1. in the first part of the RHS, include lags of DV and covariates, as shown# 2. in the second part, include the panel data instruments (99 here means use up to the 99th lag of the difference as an instrument)# 3. in an optional (not shown) third part of the RHS, include any other instruments## note that pgmm formulas construct lag() properly for pdata.frame,# if the data is not pdata.frame, lag() will create a wrong lag structureformula_linear <- packpc ~lag(packpc, 1) + income95pc + avgprs95 |lag(packpc, 2:99) # if you don't want to include too many instruments, you can just set the second part as `lag(packpc, 2:...)`pgmm_1a <-pgmm( formula_linear,data = pdata,effect ="individual", # should consider two-way (default) for small T; can also set `individual` FEtransformation ="d"# should do `ld` if T=3, `d` for difference GMM and `ld` for system GMM; default is `d` (difference GMM))
Dynamic Panel Model: Investigate AR(1)/AR(2) Autocorrelation and Instrument Weakness
summary(pgmm_1a)
Oneway (individual) effect One-step model Difference GMM
Call:
pgmm(formula = formula_linear, data = pdata, effect = "individual",
transformation = "d")
Balanced Panel: n = 48, T = 11, N = 528
Number of Observations Used: 432
Residuals:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-25.4308 -2.5080 0.1463 0.1238 2.7384 25.6025
Coefficients:
Estimate Std. Error z-value Pr(>|z|)
lag(packpc, 1) 0.638987 0.055342 11.5462 < 2.2e-16 ***
income95pc -0.475568 0.486760 -0.9770 0.3286
avgprs95 -0.180791 0.027799 -6.5035 7.848e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sargan test: chisq(44) = 47.99763 (p-value = 0.31401)
Autocorrelation test (1): normal = -3.948861 (p-value = 7.8524e-05)
Autocorrelation test (2): normal = -0.5688819 (p-value = 0.56944)
Wald test for coefficients: chisq(3) = 2496.07 (p-value = < 2.22e-16)
# Good Sargan test, Good AR(2) test# Sargan test has a null of the instruments as a group being exogenous- higher p is good# The residuals of the differenced equations should exhibit AR(1) but not AR(2) behavior# higher p for AR(2) test is good
Some examples of Mis-specified models
# only include lag 2 and lag 3 termspgmm( packpc ~lag(packpc, 1) + income95pc + avgprs95 |lag(packpc, 2:3),data = pdata, effect ="individual", transformation ="d") |>summary()
Oneway (individual) effect One-step model Difference GMM
Call:
pgmm(formula = packpc ~ lag(packpc, 1) + income95pc + avgprs95 |
lag(packpc, 2:3), data = pdata, effect = "individual", transformation = "d")
Balanced Panel: n = 48, T = 11, N = 528
Number of Observations Used: 432
Residuals:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-25.9207 -2.4789 0.1232 0.1159 2.7106 25.5997
Coefficients:
Estimate Std. Error z-value Pr(>|z|)
lag(packpc, 1) 0.660475 0.052854 12.4963 < 2.2e-16 ***
income95pc -0.258571 0.456052 -0.5670 0.5707
avgprs95 -0.175368 0.026891 -6.5215 6.96e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sargan test: chisq(16) = 40.04482 (p-value = 0.00076694)
Autocorrelation test (1): normal = -3.831746 (p-value = 0.00012724)
Autocorrelation test (2): normal = -0.4941905 (p-value = 0.62117)
Wald test for coefficients: chisq(3) = 2405.391 (p-value = < 2.22e-16)
# only include lag 2-5 termspgmm( packpc ~lag(packpc, 1) + income95pc + avgprs95 |lag(packpc, 2:5),data = pdata, effect ="individual", transformation ="d") |>summary()
Oneway (individual) effect One-step model Difference GMM
Call:
pgmm(formula = packpc ~ lag(packpc, 1) + income95pc + avgprs95 |
lag(packpc, 2:5), data = pdata, effect = "individual", transformation = "d")
Balanced Panel: n = 48, T = 11, N = 528
Number of Observations Used: 432
Residuals:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-25.6948 -2.5357 0.1483 0.1217 2.6744 25.5813
Coefficients:
Estimate Std. Error z-value Pr(>|z|)
lag(packpc, 1) 0.650350 0.055545 11.7086 < 2.2e-16 ***
income95pc -0.325994 0.497744 -0.6549 0.5125
avgprs95 -0.181297 0.027242 -6.6550 2.834e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sargan test: chisq(29) = 42.46279 (p-value = 0.050998)
Autocorrelation test (1): normal = -3.907354 (p-value = 9.3312e-05)
Autocorrelation test (2): normal = -0.5460146 (p-value = 0.58506)
Wald test for coefficients: chisq(3) = 2503.149 (p-value = < 2.22e-16)
Dynamic Panel Model
# Let's try different model specifications:# model 1a: linear + difference GMM + individual FE# model 1b: linear + system GMM + individual FE# model 1c: linear + difference GMM + two-way FE# model 1b: linear + system GMM + two-way FEpgmm_1b <-pgmm( packpc ~lag(packpc, 1) + income95pc + avgprs95 |lag(packpc, 2:99),data = pdata, effect ="individual", transformation ="ld")summary(pgmm_1b)
Oneway (individual) effect One-step model System GMM
Call:
pgmm(formula = packpc ~ lag(packpc, 1) + income95pc + avgprs95 |
lag(packpc, 2:99), data = pdata, effect = "individual", transformation = "ld")
Balanced Panel: n = 48, T = 11, N = 528
Number of Observations Used: 912
Residuals:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-35.0188 -2.5248 0.1318 0.2344 2.8275 29.0107
Coefficients:
Estimate Std. Error z-value Pr(>|z|)
lag(packpc, 1) 0.9372190 0.0149757 62.5826 <2e-16 ***
income95pc 0.2033459 0.1245084 1.6332 0.1024
avgprs95 -0.0021312 0.0098748 -0.2158 0.8291
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sargan test: chisq(55) = 47.79775 (p-value = 0.74372)
Autocorrelation test (1): normal = -3.451448 (p-value = 0.00055759)
Autocorrelation test (2): normal = 0.5944316 (p-value = 0.55222)
Wald test for coefficients: chisq(3) = 109661.9 (p-value = < 2.22e-16)
Twoways effects One-step model System GMM
Call:
pgmm(formula = packpc ~ lag(packpc, 1) + income95pc + avgprs95 |
lag(packpc, 2:99), data = pdata, effect = "twoways", transformation = "ld")
Balanced Panel: n = 48, T = 11, N = 528
Number of Observations Used: 912
Residuals:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-31.81880 -2.37490 -0.03745 0.00000 2.18303 27.94124
Coefficients:
Estimate Std. Error z-value Pr(>|z|)
lag(packpc, 1) 0.912506 0.035086 26.0074 < 2.2e-16 ***
income95pc -0.016144 0.110832 -0.1457 0.884186
avgprs95 -0.101336 0.032190 -3.1481 0.001644 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sargan test: chisq(55) = 45.80834 (p-value = 0.80677)
Autocorrelation test (1): normal = -3.133862 (p-value = 0.0017252)
Autocorrelation test (2): normal = 0.7322591 (p-value = 0.46401)
Wald test for coefficients: chisq(3) = 2942.747 (p-value = < 2.22e-16)
Wald test for time dummies: chisq(9) = 87.03407 (p-value = 6.3969e-15)
Dynamic Panel Model
# Let's try different model specifications:# model 2a: log-log + difference GMM + individual FE# model 2b: log-log + system GMM + individual FE# model 2c: log-log + difference GMM + two-way FE# model 2b: log-log + system GMM + two-way FEformula_loglog <-log(packpc) ~lag(log(packpc), 1) +log(income95pc) +log(avgprs95) |lag(log(packpc), 2:99)pgmm_2a <-pgmm(formula_loglog, data = pdata, effect ="individual", transformation ="d")summary(pgmm_2a)
Oneway (individual) effect One-step model Difference GMM
Call:
pgmm(formula = formula_loglog, data = pdata, effect = "individual",
transformation = "d")
Balanced Panel: n = 48, T = 11, N = 528
Number of Observations Used: 432
Residuals:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.263020 -0.024090 0.002736 0.001092 0.027682 0.216378
Coefficients:
Estimate Std. Error z-value Pr(>|z|)
lag(log(packpc), 1) 0.674051 0.061660 10.9317 < 2.2e-16 ***
log(income95pc) -0.066044 0.111880 -0.5903 0.555
log(avgprs95) -0.305656 0.043073 -7.0963 1.281e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sargan test: chisq(44) = 46.02872 (p-value = 0.38824)
Autocorrelation test (1): normal = -4.050332 (p-value = 5.1145e-05)
Autocorrelation test (2): normal = 0.05678179 (p-value = 0.95472)
Wald test for coefficients: chisq(3) = 2140.806 (p-value = < 2.22e-16)
pgmm_2b <-pgmm(formula_loglog, data = pdata, effect ="individual", transformation ="ld")summary(pgmm_2b)
Oneway (individual) effect One-step model System GMM
Call:
pgmm(formula = formula_loglog, data = pdata, effect = "individual",
transformation = "ld")
Balanced Panel: n = 48, T = 11, N = 528
Number of Observations Used: 912
Residuals:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.366760 -0.022702 0.002000 0.001797 0.025148 0.278323
Coefficients:
Estimate Std. Error z-value Pr(>|z|)
lag(log(packpc), 1) 0.9860119 0.0094198 104.6742 <2e-16 ***
log(income95pc) -0.0057655 0.0154370 -0.3735 0.7088
log(avgprs95) 0.0110743 0.0077600 1.4271 0.1535
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sargan test: chisq(55) = 47.59902 (p-value = 0.75038)
Autocorrelation test (1): normal = -3.352045 (p-value = 0.00080217)
Autocorrelation test (2): normal = 0.982078 (p-value = 0.32606)
Wald test for coefficients: chisq(3) = 4737094 (p-value = < 2.22e-16)
pgmm_2c <-pgmm(formula_loglog, data = pdata, effect ="twoways", transformation ="d")summary(pgmm_2c)
Twoways effects One-step model Difference GMM
Call:
pgmm(formula = formula_loglog, data = pdata, effect = "twoways",
transformation = "d")
Balanced Panel: n = 48, T = 11, N = 528
Number of Observations Used: 432
Residuals:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.181013 -0.019492 -0.001678 0.000000 0.017812 0.204343
Coefficients:
Estimate Std. Error z-value Pr(>|z|)
lag(log(packpc), 1) 0.33315 0.14500 2.2976 0.02158 *
log(income95pc) 0.18131 0.18166 0.9981 0.31825
log(avgprs95) -0.62271 0.11127 -5.5965 2.188e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sargan test: chisq(44) = 43.30027 (p-value = 0.5015)
Autocorrelation test (1): normal = -3.620058 (p-value = 0.00029454)
Autocorrelation test (2): normal = 0.5219654 (p-value = 0.60169)
Wald test for coefficients: chisq(3) = 45.27325 (p-value = 8.0946e-10)
Wald test for time dummies: chisq(9) = 71.0946 (p-value = 9.2856e-12)
pgmm_2d <-pgmm(formula_loglog, data = pdata, effect ="twoways", transformation ="ld")summary(pgmm_2d)
Twoways effects One-step model System GMM
Call:
pgmm(formula = formula_loglog, data = pdata, effect = "twoways",
transformation = "ld")
Balanced Panel: n = 48, T = 11, N = 528
Number of Observations Used: 912
Residuals:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.3421969 -0.0233093 0.0006081 0.0000000 0.0224292 0.2716041
Coefficients:
Estimate Std. Error z-value Pr(>|z|)
lag(log(packpc), 1) 0.9454089 0.0286753 32.9694 < 2.2e-16 ***
log(income95pc) -0.0072777 0.0214627 -0.3391 0.734544
log(avgprs95) -0.1650673 0.0494761 -3.3363 0.000849 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sargan test: chisq(55) = 44.19932 (p-value = 0.85117)
Autocorrelation test (1): normal = -3.066312 (p-value = 0.0021672)
Autocorrelation test (2): normal = 1.1093 (p-value = 0.2673)
Wald test for coefficients: chisq(3) = 4820.892 (p-value = < 2.22e-16)
Wald test for time dummies: chisq(9) = 90.44993 (p-value = 1.3225e-15)
Simulate Conditional Forecasting
Unfortunately, plm package does not have predict method for plm() and pgmm() functions. So we have to manually construct predictions from new data using simcf package (literrally the name of package means “simulate counterfactual”).
To simulate counterfactual outcomes, we need the following things:
Simulated parameters
Hypothetical Scenarios
Lagged outcome \(y_{t-1}\)
Initial outcome \(y_0\) (average packpc in 1995)
Simulations with and without year FE dummies are a bit different. Here we take pgmm_1a and pgmm_1c as two examples.
Simulate Conditional Forecasting
# Forecast for 3 years from 1996 to 1998periods.out <-3sims <-1000# How big a change in price to simulate?# How about "double" the average tax in the most recent year?summary(pdata$taxs95[pdata$year==1995])
Min. 1st Qu. Median Mean 3rd Qu. Max.
34.44 48.75 59.84 61.87 74.78 112.63
# The average (and median) tax is about 60 cents/packsd(pdata$taxs95[pdata$year==1995])
[1] 18.47741
Simulate Conditional Forecasting
# A 60 cent increase would also be about 3 sd's,# and raise the tax to a bit more than the max observed# Other possibilities:# (2) A 10 cent increase# (3) Raise every state to the max observed for any state in 1995 (112.60 cents)# Construct the year dummiesyearfe <-makeFEdummies(pdata$year) # Construct the dummies for each year; it is equivalent to `model.matrix(~ -1 + year, data = pdata)`yearfe <- yearfe[, 3:ncol(yearfe)] # Why drop first 2 col's? (the first 2 years are missing due to instrumental variables y_{t-2})yearlist <-unique(pdata$year) # List all the yearsyearlist <- yearlist[3:length(yearlist)] # List the years and exclude the first twocolnames(yearfe) <-paste0("y", yearlist) # Create names for the year dummieshead(yearfe)
# Construct formulas -- linear + difference GMM + individual FE (without year FE dummies)formula_1a <- packpc ~ income95pc + avgprs95 -1#with Income and Price as covariates# Construct formulas -- linear + system GMM + individual FE (without year dummies but with intercept)formula_1b <- packpc ~ income95pc + avgprs95# Construct formulas -- linear + difference GMM + two-way FE (with year FE dummies)formula_1c <-update( formula_1a,paste(c("~ .", colnames(yearfe)), collapse ="+") # add year FE dummies to the formula)formula_1c
# Construct formulas -- linear + system GMM + two-way FE (with year dummies and with intercept)formula_1d <-update(formula_1b, paste(c("~ .", colnames(yearfe)), collapse ="+"))formula_1d
# Make matrix of hypothetical x's:# Assume an average state raised taxes 60 cents starting 1996## Make matrix of hypothetical x's: covariatesxhyp_1a <-cfMake(formula_1a, datayearfe, periods.out) #With mean packpc, income, and price for the forecast period# pgmm uses covariates in differenced form# so we want most of them to be 0 (no change)# exceptions:# (1) changes in covariates of interest# (2) time dummies aren't differencedxhyp_1a$x <- xhyp_1a$xpre <-0* xhyp_1a$x # make every element in matrix be zeroxhyp_1a <-cfChange(xhyp_1a, "avgprs95", x=60, scen=1) # only change avgprs to 60 (cents)# We can "ignore" the state fixed effects for now and add them later# because model is total linear
# Create baseline scenarioxbase_1a <- xhyp_1axbase_1a$x <- xbase_1a$xpre# We need a lag of the price per packlagY_1a <-NULL# Hypothetical previous change in Y for simulationfor (i in1:length(pgmm_1a$model)){ #For 1 to 48 lagY_1a <-c(lagY_1a, as.data.frame(pgmm_1a$model[[i]])["1995",]$packpc)}#Hypothetical change in packpc for each state in 1995lagY_1a <-mean(lagY_1a, na.rm=TRUE) #Find the mean of these hypothetical previous changes# Hypothetical initial level of Y for simulationinitialY <-mean(pdata$packpc[pdata$year==1995], na.rm=TRUE) #The mean of packpc in 1995
# Simulate expected values of Y (on original level scale)# out to periods.out given hypothetical future values of X,# initial lags of the change in Y, and an initial level of Ysim_ev1a <-ldvsimev(xhyp_1a, # The matrix of hypothetical x's simbeta_1a, # The matrix of simulated betasci=0.95, # Desired confidence intervalconstant=NA, # NA indicates no constant!phi=simphi_1a, # estimated AR parameters; length must match lagY lagY=lagY_1a, # lags of y, most recent lasttransform="diff", # "log" to undo log transformation,# "diff" to under first differencing# "difflog" to do bothinitialY=initialY # for differenced models, the lag of the level of y )# Simulate expected values of Y given no change in covariatessim_base1a <-ldvsimev(xbase_1a, # The matrix of hypothetical x's simbeta_1a, # The matrix of simulated betasci=0.95, # Desired confidence intervalconstant=NA, # NA indicates no constant!phi=simphi_1a, # estimated AR parameters; length must match lagY lagY=lagY_1a, # lags of y, most recent lasttransform="diff", # "log" to undo log transformation,# "diff" to under first differencing# "difflog" to do bothinitialY=initialY # for differenced models, the lag of the level of y ) # Simulate first differences in y # out to periods.out given hypothetical future values of x, xpre,# and initial lags of the change in ysim_fd1a <-ldvsimfd(xhyp_1a, # The matrix of hypothetical x's simbeta_1a, # The matrix of simulated betasci=0.95, # Desired confidence intervalconstant=NA, # Column containing the constant# set to NA for no constantphi=simphi_1a, # estimated AR parameters; length must match lagY lagY=lagY_1a, # lags of y, most recent lasttransform="diff", # Model is differenced#initialY=initialY # Redundant in this case (fd of linear differenced Y) )# Simulate relative risks in y# out to periods.out given hypothetical future values of x, xpre,# and initial lags of the change in ysim_rr1a <-ldvsimrr(xhyp_1a, # The matrix of hypothetical x's simbeta_1a, # The matrix of simulated betasci=0.95, # Desired confidence intervalconstant=NA, # Column containing the constant# set to NA for no constantphi=simphi_1a, # estimated AR parameters; length must match lagY lagY=lagY_1a, # lags of y, most recent lasttransform="diff", # Model is differencedinitialY=initialY # Required for differenced Y in ldvsimrr )
# Make plots of expected values, first differences, and percent changes# using custom tile code in helperCigs.R# Hypothetical initial level of Y for simulationinitialY <-mean(pdata$packpc[pdata$year==1995], na.rm=TRUE)# axis limitslimitsEV <-c(50, 105)limitsFD <-c(-40, 5)limitsRR <-c(-40, 5)# Model 1a Lineplot: PackscigLineplots(sim_ev1a, sim_base1a, sim_fd1a, sim_rr1a, limitsEV, limitsFD, limitsRR, initialY,"Cigarette Taxes & Consumption: 1a. Linear Difference GMM, Individual Effects","Lab7_attachments/m1aEVFDRR" )
[1] "Lab7_attachments/m1aEVFDRR.png"
What if we include year FE dummies?
# Recall model 1g: packpc ~ lag(packpc, 1) + income95pc + avgprs95 | lag(packpc, 2:99)# Difference GMM with state and year fixed effects# Simulate parameterssimparam_1c <-mvrnorm(sims, coefficients(pgmm_1c), vcovHC(pgmm_1c)) #Sample parameterssimphi_1c <- simparam_1c[,1, drop =FALSE] #Extract the phishead(simphi_1c)
# Make matrix of hypothetical x's:# Assume an average state raised taxes 60 cents starting 1996## Issues -- we need to somehow include the state and year FEs:# Let's set the state to be an "average" state in 1995,# and year to be like the last year (1995)# Make matrix of hypothetical x's: covariatesxhyp_1c <-cfMake(formula_1c, datayearfe, periods.out) #Including the year fixed effects# pgmm uses covariates in differenced form# so we want most of them to be 0 (no change)# exceptions:# (1) changes in covariates of interest# (2) differenced time dummies require special carexhyp_1c$x <- xhyp_1c$xpre <-0*xhyp_1c$xxhyp_1c <-cfChange(xhyp_1c, "avgprs95", x=60, scen=1) #Assume tax is raised 60 cents in 1996xhyp_1c
# We can "ignore" the state fixed effects for now and add them later# because model is total linear# Create baseline scenarioxbase_1c <- xhyp_1cxbase_1c$x <- xbase_1c$xprexbase_1c
# We need a lag of the price per packlagY_1c <-NULL# Hypothetical previous change in Y for simulationfor (i in1:length(pgmm_1c$model)) lagY_1c <-c(lagY_1c, as.data.frame(pgmm_1c$model[[i]])["1995",]$packpc) #Store change in packpc 1995 for each statelagY_1c <-mean(lagY_1c, na.rm=TRUE) #Find the mean for all packpc changes in 1995# Hypothetical initial level of Y for simulationpdata$packpc[pdata$year==1995]
initialY <-mean(pdata$packpc[pdata$year==1995], na.rm=TRUE)#Set the initial mean value of pack in 1995 across states
# Simulate expected values of Y (on original level scale)# out to periods.out given hypothetical future values of X,# initial lags of the change in Y, and an initial level of Ysim_ev1c <-ldvsimev(xhyp_1c, # The matrix of hypothetical x's simbeta_1c, # The matrix of simulated betasci=0.95, # Desired confidence intervalconstant=NA, # NA indicates no constant!phi=simphi_1c, # estimated AR parameters; length must match lagY lagY=lagY_1c, # lags of y, most recent lasttransform="diff", # "log" to undo log transformation,# "diff" to under first differencing# "difflog" to do bothinitialY=initialY # for differenced models, the lag of the level of y )# Simulate expected values of Y given no change in covariatessim_base1c <-ldvsimev(xbase_1c, # The matrix of hypothetical x's simbeta_1c, # The matrix of simulated betasci=0.95, # Desired confidence intervalconstant=NA, # NA indicates no constant!phi=simphi_1c, # estimated AR parameters; length must match lagY lagY=lagY_1c, # lags of y, most recent lasttransform="diff", # "log" to undo log transformation,# "diff" to under first differencing# "difflog" to do bothinitialY=initialY # for differenced models, the lag of the level of y ) # Simulate first differences in y # out to periods.out given hypothetical future values of x, xpre,# and initial lags of the change in ysim_fd1c <-ldvsimfd(xhyp_1c, # The matrix of hypothetical x's simbeta_1c, # The matrix of simulated betasci=0.95, # Desired confidence intervalconstant=NA, # Column containing the constant# set to NA for no constantphi=simphi_1c, # estimated AR parameters; length must match lagY lagY=lagY_1c, # lags of y, most recent lasttransform="diff", # Model is differenced#initialY=initialY # Redundant in this case (fd of linear differenced Y) )# Simulate relative risks in y# out to periods.out given hypothetical future values of x, xpre,# and initial lags of the change in ysim_rr1c <-ldvsimrr(xhyp_1c, # The matrix of hypothetical x's simbeta_1c, # The matrix of simulated betasci=0.95, # Desired confidence intervalconstant=NA, # Column containing the constant# set to NA for no constantphi=simphi_1c, # estimated AR parameters; length must match lagY lagY=lagY_1c, # lags of y, most recent lasttransform="diff", # Model is differencedinitialY=initialY # Required for differenced Y in ldvsimrr )
The topic 9 lecture (in-sample simulation) showed the difference in forecasting results between simulation with average covariates (\(\bar{x}\)) and simulation with individual covariates (\(x_i\)).
We will replicate these differences today, including drawing plots with forecasting results.
In-sample simulation answers the same “what-if” question, but with different conditions: rather than “holding other conditions constant at their average values (\(h = \bar{h}\)),” in-sample simulation cares about counterfactuls “holding other conditions at their observed values (\(h = h_i\)).”
In the case of cigarette tax, we care about the policy implications of cigarette tax in the real world. A hypothetical “average state” does not exist in reality.
Steps for In-sample Simulation:
Calculate quantity of interest (e.g. expected values, first difference and relative risk) at each iteration for each unit
Calculate the point estimate of quantity of interest and its confidence interval across iterations for each unit
Average the obtained quantity of interest across units.
We can make this as a weighted average. For example, weighted by each state’s population.
If you have year FE in your model, we have two choices:
To generate simulations within sample periods (e.g. 1993 is in the range of 1985-1995), we need to set FE dummies in a give year to 1, and others to 0.
To generate simulations out of sample periods (e.g. 1996 is out of the range of 1985-1995), we need to set all FE dummies to 0.
If our model is a differenced model and we want to generate simulations in 1993, we need to let the FE in 1993 subtract the FE in 1992.
Model 2c: Elasticity Specification + Difference GMM + Two-way FE
simparam_2c <-mvrnorm(n = sims, coefficients(pgmm_2c), vcovHC(pgmm_2c))## for some reason, vcovHC() produces a generalized inversesimphis_2c <- simparam_2c[, 1]simbetas_2c <- simparam_2c[,2:ncol(simparam_2c)]
Let us assume that an “average state” raised taxes by 60 cents in 1993.
# Make matrix of hypothetical x's: covariates# Still use the 1g formula (no logs) -- we will handle logging manually# to get the differences of logs rightxhyp_2c <- simcf::cfMake(formula_1c, datayearfe, periods.out)# pgmm uses covariates in differenced form# so we want most of them to be 0 (no change)# exceptions:# (1) changes in covariates of interest# (2) time dummies aren't differencedchangeTax <-c(60,0,0) # create hypothetical change in cigarette tax in t = 1, 2, 3
Create a Logged Differences of Key Covariates
# Need log version of differenced key covariate (doubling tax in avg state)meanPrice <-mean(pdata$avgprs, na.rm=TRUE)meanIncome <-mean(pdata$income95pc, na.rm=TRUE)xhyp_2c <-cfChange(xhyp_2c, "avgprs95",x=log(meanPrice+changeTax[1]) -log(meanPrice),xpre=log(meanPrice) -log(meanPrice),scen=1)xhyp_2c <-cfChange(xhyp_2c, "avgprs95",x=log(meanPrice+changeTax[2]) -log(meanPrice),xpre=log(meanPrice) -log(meanPrice),scen=2)xhyp_2c <-cfChange(xhyp_2c, "avgprs95",x=log(meanPrice+changeTax[3]) -log(meanPrice),xpre=log(meanPrice) -log(meanPrice),scen=3)xhyp_2c <-cfChange(xhyp_2c, "income95pc",x=log(meanIncome) -log(meanIncome),xpre=log(meanIncome) -log(meanIncome),scen=1:3)
Specify the Year Effects
for (iyear in1985:1991) xhyp_2c <-cfChange(xhyp_2c, paste0("y",iyear), x=0, xpre=0, scen=1:3)xhyp_2c <-cfChange(xhyp_2c, "y1992", x=-1, xpre=-1, scen=1) # most tricky part: why set 1992 to -1?xhyp_2c <-cfChange(xhyp_2c, "y1992", x=0, xpre=0, scen=2)xhyp_2c <-cfChange(xhyp_2c, "y1992", x=0, xpre=0, scen=3)xhyp_2c <-cfChange(xhyp_2c, "y1993", x=1, xpre=1, scen=1)xhyp_2c <-cfChange(xhyp_2c, "y1993", x=-1, xpre=-1, scen=2)xhyp_2c <-cfChange(xhyp_2c, "y1993", x=0, xpre=0, scen=3)xhyp_2c <-cfChange(xhyp_2c, "y1994", x=0, xpre=0, scen=1)xhyp_2c <-cfChange(xhyp_2c, "y1994", x=1, xpre=1, scen=2)xhyp_2c <-cfChange(xhyp_2c, "y1994", x=-1, xpre=-1, scen=3)xhyp_2c <-cfChange(xhyp_2c, "y1995", x=0, xpre=0, scen=1)xhyp_2c <-cfChange(xhyp_2c, "y1995", x=0, xpre=0, scen=2)xhyp_2c <-cfChange(xhyp_2c, "y1995", x=1, xpre=1, scen=3)# We can "ignore" the state fixed effects for now and add them later# because model is total linear
Create Baseline Senario for Comparison
# Create baseline scenario: we want to use it to generate first difference and relative risk laterxbase_2c <- xhyp_2cxbase_2c$x <- xbase_2c$xpre# We need a lag of the price per packlagY_2c <-NULL# Hypothetical previous change in Y for simulationfor (iunit in1:length(pgmm_2c$model)) lagY_2c <-c(lagY_2c, as.data.frame(pgmm_2c$model[[iunit]])["1995", 1])lagY_2c <-mean(lagY_2c, na.rm=TRUE)initialY <-mean(pdata$packpc[pdata$year==1993], na.rm=TRUE)# Hypothetical initial level of Y for simulation
Forecast: Model 2C, Tax +60, “Average (Representative) State”
# Simulate expected values of Y (on original level scale)# out to periods.out given hypothetical future values of X,# initial lags of the change in Y, and an initial level of Ysim_ev2c <- simcf::ldvsimev(xhyp_2c, # The matrix of hypothetical x's simbetas_2c, # The matrix of simulated betasci=0.95, # Desired confidence intervalconstant=NA, # NA indicates no constant!phi=simphis_2c, # estimated AR parameters; length must match lagY lagY=lagY_2c, # lags of y, most recent lasttransform="difflog", # "log" to undo log transformation,# "diff" to under first differencing# "difflog" to do bothinitialY=initialY # for differenced models, the lag of the level of y )# Simulate expected values of Y given no change in covariatessim_base2c <-ldvsimev(xbase_2c, # The matrix of hypothetical x's simbetas_2c, # The matrix of simulated betasci=0.95, # Desired confidence intervalconstant=NA, # NA indicates no constant!phi=simphis_2c, # estimated AR parameters; length must match lagY lagY=lagY_2c, # lags of y, most recent lasttransform="difflog", # "log" to undo log transformation,# "diff" to under first differencing# "difflog" to do bothinitialY=initialY # for differenced models, the lag of the level of y ) # Simulate first differences in y (as cumulated changes)# out to periods.out given hypothetical future values of x, xpre,# and initial lags of the change in ysim_fd2c <- simcf::ldvsimfd(xhyp_2c, # The matrix of hypothetical x's simbetas_2c, # The matrix of simulated betasci=0.95, # Desired confidence intervalconstant=NA, # Column containing the constant# set to NA for no constantphi=simphis_2c, # estimated AR parameters; length must match lagY lagY=lagY_2c, # lags of y, most recent lasttransform="difflog",# Model is differenced logsinitialY=initialY # Required in this case (fd of differenced log Y) )# Simulate relative risks in y# out to periods.out given hypothetical future values of x, xpre,# and initial lags of the change in ysim_rr2c <-ldvsimrr(xhyp_2c, # The matrix of hypothetical x's simbetas_2c, # The matrix of simulated betasci=0.95, # Desired confidence intervalconstant=NA, # Column containing the constant# set to NA for no constantphi=simphis_2c, # estimated AR parameters; length must match lagY lagY=lagY_2c, # lags of y, most recent lasttransform="difflog",# Model is differenced logsinitialY=initialY # Required for differenced Y in ldvsimrr )
Now we perform an in-sample simulation procedure. We need to generate simulations of quantity of interests for each state.
# Forecasts: avg state, by state, and weighted avg of all states# for +.6 and raise to max scenarios# outcomes of interest: packs FD, packs RR/%change,# revenue pc FD, revenue mils FD, revenue %GSP FD# Visuals: line plots. 1x2 and 1x3 for two diff scenarios (x2)# List for resultssimEV_2c_scen1 <- simBASE_2c_scen1 <- simFD_2c_scen1 <- simRR_2c_scen1 <-list()# All states modeledallstates <-names(pgmm_2c$model)# Storage vector for weightspopweight <-rep(NA, length(allstates))
# Loop over states in modelfor (i in1:length(pgmm_2c$model)) { curstate <- allstates[i]# Grab population weight for later popweight[i] <- pdata$pop[(pdata$year==1995)&(pdata$state==curstate)]# Make matrix of hypothetical x's: covariates# Still use the 1g formula (no logs) -- we will handle logging manually# to get the differences of logs right xhyp_2c <-cfMake(formula_1c, datayearfe, periods.out)# pgmm uses covariates in differenced form# so we want most of them to be 0 (no change)# exceptions:# (1) changes in covariates of interest# (2) time dummies aren't differenced#xhyp_2c$x <- xhyp_2c$xpre <- 0*xhyp_2c$x# Need log version of differenced key covariate (doubling tax in avg state) curPrice92 <- pdata$avgprs95[(pdata$state==curstate)&(pdata$year==1992)] curPrice93 <- pdata$avgprs95[(pdata$state==curstate)&(pdata$year==1993)] curPrice94 <- pdata$avgprs95[(pdata$state==curstate)&(pdata$year==1994)] curPrice95 <- pdata$avgprs95[(pdata$state==curstate)&(pdata$year==1995)] curIncome92 <- pdata$income95pc[(pdata$state==curstate)&(pdata$year==1992)] curIncome93 <- pdata$income95pc[(pdata$state==curstate)&(pdata$year==1993)] curIncome94 <- pdata$income95pc[(pdata$state==curstate)&(pdata$year==1994)] curIncome95 <- pdata$income95pc[(pdata$state==curstate)&(pdata$year==1995)] changeTax <-c(60, 0, 0) xhyp_2c <-cfChange(xhyp_2c, "avgprs95",x=log(curPrice93+changeTax[1]) -log(curPrice92),xpre=log(curPrice93) -log(curPrice92),scen=1) xhyp_2c <-cfChange(xhyp_2c, "avgprs95",x=log(curPrice94+changeTax[2]) -log(curPrice93),xpre=log(curPrice94) -log(curPrice93),scen=2) xhyp_2c <-cfChange(xhyp_2c, "avgprs95",x=log(curPrice95+changeTax[3]) -log(curPrice94),xpre=log(curPrice95) -log(curPrice94),scen=3) xhyp_2c <-cfChange(xhyp_2c, "income95pc",x=log(curIncome93) -log(curIncome92),xpre=log(curIncome93) -log(curIncome92),scen=1) xhyp_2c <-cfChange(xhyp_2c, "income95pc",x=log(curIncome94) -log(curIncome93),xpre=log(curIncome94) -log(curIncome93),scen=2) xhyp_2c <-cfChange(xhyp_2c, "income95pc",x=log(curIncome95) -log(curIncome94),xpre=log(curIncome95) -log(curIncome94),scen=3)for (iyear in1985:1991) xhyp_2c <-cfChange(xhyp_2c, paste0("y",iyear), x=0, xpre=0, scen=1:3) xhyp_2c <-cfChange(xhyp_2c, "y1992", x=-1, xpre=-1, scen=1) xhyp_2c <-cfChange(xhyp_2c, "y1992", x=0, xpre=0, scen=2) xhyp_2c <-cfChange(xhyp_2c, "y1992", x=0, xpre=0, scen=3) xhyp_2c <-cfChange(xhyp_2c, "y1993", x=1, xpre=1, scen=1) xhyp_2c <-cfChange(xhyp_2c, "y1993", x=-1, xpre=-1, scen=2) xhyp_2c <-cfChange(xhyp_2c, "y1993", x=0, xpre=0, scen=3) xhyp_2c <-cfChange(xhyp_2c, "y1994", x=0, xpre=0, scen=1) xhyp_2c <-cfChange(xhyp_2c, "y1994", x=1, xpre=1, scen=2) xhyp_2c <-cfChange(xhyp_2c, "y1994", x=-1, xpre=-1, scen=3) xhyp_2c <-cfChange(xhyp_2c, "y1995", x=0, xpre=0, scen=1) xhyp_2c <-cfChange(xhyp_2c, "y1995", x=0, xpre=0, scen=2) xhyp_2c <-cfChange(xhyp_2c, "y1995", x=1, xpre=1, scen=3)# Baseline scenario xbase_2c <- xhyp_2c xbase_2c$x <- xbase_2c$xpre# We need a lag of the price per pack lagY_2c <-as.data.frame(pgmm_2c$model[[i]])["1992",1]# This restores the fixed effect by state initialY <- pdata$packpc[(pdata$state==curstate)&(pdata$year==1992)]# Simulate expected values of Y (on original level scale)# out to periods.out given hypothetical future values of X,# initial lags of the change in Y, and an initial level of Y simState_ev2c <-ldvsimev(xhyp_2c, # The matrix of hypothetical x's simbetas_2c, # The matrix of simulated betasci=0.95, # Desired confidence intervalconstant=NA, # NA indicates no constant!phi=simphis_2c, # estimated AR parameters; length must match lagY lagY=lagY_2c, # lags of y, most recent lasttransform="difflog", # "log" to undo log transformation,# "diff" to under first differencing# "difflog" to do bothinitialY=initialY, # for differenced models, the lag of the level of ysimulates=TRUE# store raw simulations )# Simulate expected values of Y given no change in covariates simState_base2c <-ldvsimev(xbase_2c, # The matrix of hypothetical x's simbetas_2c, # The matrix of simulated betasci=0.95, # Desired confidence intervalconstant=NA, # NA indicates no constant!phi=simphis_2c, # estimated AR parameters; length must match lagY lagY=lagY_2c, # lags of y, most recent lasttransform="difflog", # "log" to undo log transformation,# "diff" to under first differencing# "difflog" to do bothinitialY=initialY, # for differenced models, the lag of the level of ysimulates=TRUE ) # Simulate first differences in y (as cumulated changes)# out to periods.out given hypothetical future values of x, xpre,# and initial lags of the change in y simState_fd2c <-ldvsimfd(xhyp_2c, # The matrix of hypothetical x's simbetas_2c, # The matrix of simulated betasci=0.95, # Desired confidence intervalconstant=NA, # Column containing the constant# set to NA for no constantphi=simphis_2c, # estimated AR parameters; length must match lagY lagY=lagY_2c, # lags of y, most recent lasttransform="difflog",# Model is differenced logsinitialY=initialY, # Required in this case (fd of differenced log Y)simulates=TRUE#for FD or RR, it will store the simulations of y1, y0 and y1 - y0 )# Simulate relative risks in y# out to periods.out given hypothetical future values of x, xpre,# and initial lags of the change in y simState_rr2c <-ldvsimrr(xhyp_2c, # The matrix of hypothetical x's simbetas_2c, # The matrix of simulated betasci=0.95, # Desired confidence intervalconstant=NA, # Column containing the constant# set to NA for no constantphi=simphis_2c, # estimated AR parameters; length must match lagY lagY=lagY_2c, # lags of y, most recent lasttransform="difflog",# Model is differenced logsinitialY=initialY, # Required for differenced Y in ldvsimrrsimulates=TRUE )# Collect results (only works for 1 CI) simEV_2c_scen1 <-collectUnitSims(simEV_2c_scen1, simState_ev2c) simBASE_2c_scen1 <-collectUnitSims(simBASE_2c_scen1, simState_base2c) simFD_2c_scen1 <-collectUnitSims(simFD_2c_scen1, simState_fd2c) simRR_2c_scen1 <-collectUnitSims(simRR_2c_scen1, simState_rr2c)}# Compute weighted average across statessimEV_2c_scen1 <-aggregateUnitSims(simEV_2c_scen1, weighted.mean, w=popweight)simBASE_2c_scen1 <-aggregateUnitSims(simBASE_2c_scen1, weighted.mean, w=popweight)simFD_2c_scen1 <-aggregateUnitSims(simFD_2c_scen1, weighted.mean, w=popweight)simRR_2c_scen1 <-aggregateUnitSims(simRR_2c_scen1, weighted.mean, w=popweight)
Plot the Individual Forecasting Results
############## Plot the FD and RR results using a custom tile-graphics function# Nice colorscol <-brewer.pal(4,"Dark2")[3:4]# Other settings common to all plotsavg <-"weighted.mean"periods <-1993:1995limits <-c(1991.9, 1995.25, -43, 2)xat <-c(1992, 1993, 1994, 1995)plottitles <-c("Change, Packs pc after +$.60 tax","Percent Change, Packs pc after +$.60 tax")xtitles <-c("Forecast Year", "Forecast Year")ytitles <-c("","")# Plot the results by statecigUnitLineplots(FD=simFD_2c_scen1, RR=simRR_2c_scen1,periods=periods, units=allstates,showUnits=TRUE, showMean=FALSE,avg=avg, avgLab="US",unitLabX=1995.125, labSet="tb", labK=1,initialZero=TRUE, CI=TRUE,limitsFD=limits, limitsRR=limits, xat=xat,col=col, dots=TRUE, RRasPD=TRUE,plottitles=plottitles,xtitles=xtitles, ytitles=ytitles,file="Lab7_attachments/m2cUnitsFDRR")
[1] "Lab7_attachments/m2cUnitsFDRR.png"
# Plot the results by state with weighted meancigUnitLineplots(FD=simFD_2c_scen1, RR=simRR_2c_scen1,periods=periods, units=allstates,showUnits=TRUE, showMean=TRUE,avg=avg, avgLab="US",unitLabX=1995.125, labSet="tb", labK=1,initialZero=TRUE, CI=FALSE,limitsFD=limits, limitsRR=limits, xat=xat,col=col, dots=TRUE, RRasPD=TRUE,plottitles=plottitles,xtitles=xtitles, ytitles=ytitles,file="Lab7_attachments/m2cUnitsMeanFDRR")
[1] "Lab7_attachments/m2cUnitsMeanFDRR.png"
# Plot the results by state with weighted meancigUnitLineplots(FD=simFD_2c_scen1, RR=simRR_2c_scen1,periods=periods, units=allstates,showUnits=FALSE, showMean=TRUE,avg=avg, avgLab="US",unitLabX=1995.125, labSet="tb", labK=1,initialZero=TRUE, CI=TRUE,limitsFD=limits, limitsRR=limits, xat=xat,col=col, dots=TRUE, RRasPD=TRUE,plottitles=plottitles,xtitles=xtitles, ytitles=ytitles,file="Lab7_attachments/m2cMeanFDRR")
# Plot the comparison of aggregate results and representative resultscigUnitLineplots(FD=simFD_2c_scen1, RR=simRR_2c_scen1,compFD=sim_fd2c, compRR=sim_rr2c,periods=periods, units=allstates,showUnits=FALSE, showMean=TRUE,avg=avg, avgLab="Wgt", compLab="Rep",unitLabX=1995.15, labSet="tb", labK=1,initialZero=TRUE, CI=TRUE, compCI=TRUE,limitsFD=limits, limitsRR=limits, xat=xat,col=col, dots=TRUE, RRasPD=TRUE,plottitles=plottitles,xtitles=xtitles, ytitles=ytitles,file="Lab7_attachments/m2cMeanCompFDRR")
[1] "Lab7_attachments/m2cMeanCompFDRR.png"
state_full <-sapply(allstates, function(x) {state.name[grep(x, state.abb)]}) # both state.name and state.abb are built-in vector of state names# Plot dotplot of results for final period by state: all statescigUnitDotplot(x=simFD_2c_scen1$units,units=state_full, period=3,showRank=TRUE,vertmark=0,limits=c(-62, 5),labspace=0.35,at=seq(-60,0,10),col=col[1],xtitle="Change, Packs pc 3 years after +$.60 tax",toptitle="Change, Packs pc 3 years after +$.60 tax",file="Lab7_attachments/m2cUnitFDdotplot")