CSSS/POLS 512: Lab 7

Panel Data Analysis with Small \(T\)

Tao Lin

May 13, 2022

Agenda

Dynamic panel data models
Counterfactual Forecasting

Nickell Bias

Recall that we can remove fixed effects by differencing:

\[y_{it}= \phi y_{it-1} + \alpha_i + \epsilon_{it}\] \[y_{it} - \bar{y_i} = \phi(y_{it-1} - \bar{y_i})+(x_{it}-\bar{x}_{it})\beta+(\epsilon_{it}-\bar{\epsilon}_{it})\]

This is the “within” estimator or fixed effects model.

Alternatively, we can include dummy variables for each group.

Nickell bias

However, we introduce bias when we difference the model in this way.

\[y_{it} - \bar{y_i} = \phi(y_{it-1} - \bar{y_i})+(x_{it}-\bar{x}_{it})\beta+(\epsilon_{it}-\bar{\epsilon}_{it})\]

This is because \(\bar{y}_i\) is computed using the past \(y\)’s. This is correlated with \(\bar{\epsilon}_{it}\), which is computed using the past \(\epsilon\)’s. Or:

\[ \bar{y_i} = \frac{\bar{X_i}\beta + \bar{\epsilon_i}}{1-\phi} \]

Specifically, this creates bias in the lagged dependent variable (LDV). If the other regressors are correlated with the LDV, then their coefficients may also be seriously biased.

Nickell bias

The degree of bias is order \(1/T\), so it is big for small \(T\).

Furthermore,

The bias increases as \(\beta\) decreases
The bias increases as \(\phi\) increases
Small \(N\) is not the problem. Small \(T\) is the problem

Increasing \(N\) does not mitigate the problem. Purging serial correlation in the errors or getting the specification right (including other regressors) also doesn’t solve the problem.

Instrumental variables

We therefore turn to instrumental variables.

Recall that an instrumental variable must fulfill two conditions:

Relevance: \(z\) is correlated with \(x\);
Exogeneity: \(z\) is uncorrelated with \(\epsilon\). It must influence \(y\) only through \(x\).

\[ y_{it} - \bar{y_i} = \phi(y_{it-1} - \bar{y_i})+(x_{it}-\bar{x}_{it})\beta+(\epsilon_{it}-\bar{\epsilon}_{it}) \]

\[ \Delta y_{it} = \phi \Delta y_{it-1} + \Delta x_{it}\beta + \Delta \epsilon_{it} \]

We use lagged levels and lagged differences of \(\Delta y_{it}\) as instruments for the LDV. These help to predict \(\Delta y_{it}\) but not \(\Delta \epsilon_{it}\) if the errors are iid (see lecture slides for why).

Note: This is different than instrumenting \(x_{it}\). We are not addressing the endogeneity that may exist there. One should not be conflated with the other.

Generalized Method of Moments

Estimation is done using GMM. The usual IV approach cannot handle the number of instruments.

To give a brief overview of the intuition behind GMM, consider a linear model: \[y_t = \boldsymbol{x}_t \boldsymbol{\beta} + e_t\] Recall that the following condition holds under Gauss-Markov. \[E[\boldsymbol{x}_t\epsilon_t]=0\] Assume there exists some combination instrumental variables \(\boldsymbol{z}_t\) that implies the following: \[E[\boldsymbol{z}_t\epsilon_t] = E[\boldsymbol{z}_t(y_t-\boldsymbol{x}_t\boldsymbol{\beta})]=0\] Intuition: GMM attempts to find the \(\boldsymbol{\beta}\) that makes this true.

Generalized method of moments

Population moment: \[E[\boldsymbol{z}_t(y_t-\boldsymbol{x}_t\boldsymbol{\beta})]=0\] Sample moments: \[\frac{1}{n}\sum^n_{t=1}\boldsymbol{z}_t(y-\boldsymbol{x_t}\boldsymbol{\beta})\] \[\substack{\frac{1}{n}\sum^n_{t=1}z_{1t}(y-\boldsymbol{x}_t \boldsymbol{\beta})\\ \vdots\\ \frac{1}{n}\sum^n_{t=1}z_{Kt}(y-\boldsymbol{x}_t \boldsymbol{\beta}) }\] We therefore have \[\boldsymbol{S}_{zy}-\boldsymbol{S}_{zx}\boldsymbol{\beta}=0 \;\;\; \text{where}\] \[\boldsymbol{S}_{zy}=n^{-1}\sum^n_{t=1}\boldsymbol{z}_t y_{t} \;\;\; \text{and} \;\;\; \boldsymbol{S}_{zx}=n^{-1}\sum^n_{t=1}\boldsymbol{z}_t x_{t}\] We solve for \[\hat{\boldsymbol{\beta}}=\boldsymbol{S}^{-1}_{zx}\boldsymbol{S}_{xy}\]

Generalized method of moments

\[\hat{\boldsymbol{\beta}}=\boldsymbol{S}^{-1}_{zx}\boldsymbol{S}_{xy}\]

Can be solved analytically. But we can also use an iterative search.

Anderson-Hsiao estimator: uses the second and third lagged levels as instruments.
Arellano-Bond Difference GMM: uses \(\Delta y\) as the outcome and all available lagged levels as instruments in each period.
Arellano-Bover/Blundell-Bond System GMM: adds the available lagged differences as instruments.

Dynamic Panel Model: Verify Key Assumptions

When using panel GMM, we need to verify some modeling assumptions:

The number or the weakness of instruments
- Sargan test: if \(\mathbb{E}[\hat{\varepsilon} Z] = 0\), then the combination of instruments is strong enough (over-identifying restrictions).
- If we don’t reject the null (p > 0.05 or 0.1), then the instruments are strong enough.
Serial correlation of error terms – should be AR(1) only, otherwise, for example, the lag 2 term becomes a weak instrument.
- Arellano-Bond’s test for serial correlation
Difference or system GMM (whether to use lagged differences as IV)
Use of year fixed effects
- Wald Chi-squared test on year FE dummies: to see whether the inclusion of FE dummies is statistically significant
Use of linear or log-log specification \(\rightarrow\) interpretation of results will be different
- linear: one unit increase in \(x\) will induce \(\beta\) increase in \(y\)
- log-log: 1% increase in \(x\) will induce \(100\times\beta\)% increase in \(y\). (elasticity)

Case Study: How Does Tax Affect Cigarrete Consumption?

# Load libraries
library(plm)            # Econometrics package for linear panel models
library(nlme)           # Estimation of mixed effects models
library(lme4)           # Alternative package for mixed effects models
library(tseries)        # For ADF unit root test
library(simcf)          # For panel functions and simulators
# devtools::install_github("chrisadolph/tile-simcf/tile")
library(tile)           # For visualization of model inference
library(RColorBrewer)   # For nice colors
library(MASS)           # For mvrnorm()
library(lmtest)
source("Lab7_data/helperCigs.R")  # For graphics functions

# Load cigarette consumption data (Jonathan Gruber, MIT)
# Variables (see codebook):
# state year    cpi pop packpc  income  tax avgprs  taxs
data <- read.csv("Lab7_data/cigarette.csv")
colnames(data)

[1] "state"  "year"   "cpi"    "pop"    "packpc" "income" "tax"    "avgprs"
[9] "taxs"

data$state |> unique() |> length() # N = 48

[1] 48

data$year |> unique() |> length() # T = 11

[1] 11

Examining the time series

# Quick inflation adjustment to 1995 dollars
inflAdjust <- function(x,cpi,year,target) {
    unique(cpi[year==target])*x/cpi     
  #Multiply x with cpi in target year then divide by cpi in observed year
}
#Make adjustments to state personal income
data$income95 <- with(data, inflAdjust(income, cpi, year, 1995))
#Average state, federal, and average local excise taxes
data$tax95 <- with(data, inflAdjust(tax, cpi, year, 1995))          
#Average price, including sales taxes
data$avgprs95 <- with(data, inflAdjust(avgprs, cpi, year, 1995))    
#Average excise taxes, including sales taxes
data$taxs95 <- with(data, inflAdjust(taxs, cpi, year, 1995))        
# Create per capita income (in k)
data$income95pc <- data$income95/data$pop
# Create pretax price, 1995 dollars
data$pretax95 <- data$avgprs95 - data$taxs95

attach(data)

Cigarette Consumption Data

Variable	Description¹
`cpi`	consumer price index
`pop`	state population
`packpc`	number of packs per capita
`income95`	state personal income (total)
`income95pc`	state personal income (per capita)
`tax95`	average state, federal, and average local excise taxes for fiscal year
`taxs95`	average excise taxes for fiscal year, including sales taxes
`avgprs95`	average price during fiscal year, including sales taxes (cents)
`pretax95`	`pretax95` = `avgprs95` - `taxs95`

Examining the time series

pseries <- split(data$packpc, data$state)
pp_pval <- sapply(pseries, function(x){pp.test(x)[["p.value"]]})
adf_pval <- sapply(pseries, function(x){adf.test(x)[["p.value"]]})
hist(pp_pval, breaks = 10)

hist(adf_pval, breaks = 10)

Fixed Effect Model

# Check for time invariant variables:
pvar(data, index = c("state", "year"))

no time variation:       state 
no individual variation: year cpi

# "within" option tells plm to do fixed effects
# Note that if you want to add year fixed effects then set effect="time" and for both state
# and year fixed effects set effect effect="twoway"
fe_res <- plm(packpc ~ income95pc + pretax95 + taxs95, data = data, model="within", effect="twoway")
coeftest(fe_res, vcov. = vcovHC, method = "arellano")


t test of coefficients:

            Estimate Std. Error  t value Pr(>|t|)    
income95pc  0.969966   0.643947   1.5063  0.13267    
pretax95   -0.188551   0.077797  -2.4236  0.01575 *  
taxs95     -0.481852   0.044020 -10.9462  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Some tests for serial correlation of errors (needed because we have a linear regression
# with lags of the dependent variable on the RHS
# the standard LM test (note we could specify order)
pbgtest(fe_res)


    Breusch-Godfrey/Wooldridge test for serial correlation in panel models

data:  packpc ~ income95pc + pretax95 + taxs95
chisq = 129.6, df = 11, p-value < 2.2e-16
alternative hypothesis: serial correlation in idiosyncratic errors

(Dynamic) Ramdom Effect Model

# Estimate a random effects AR(I)MA(p,q) model using lme (Restricted ML)
re_res <- lme(# A formula object including the response,
              # the fixed covariates, and any grouping variables
              fixed = packpc ~ income95pc + pretax95 + taxs95,  
              # i.e. response variable and explanatory variables 

              # The random effects component
              random = ~ 1 | state,                     
              # 1 indicates the intercept and state indicates the grouping

              # The TS dynamics: specify the time & group variables,
              # and the order of the ARMA(p,q) process
              correlation = corARMA(form = ~ year | state,
                                    p = 1,  # AR(p) order
                                    q = 0   # MA(q) order
                                    ) 
              )

summary(re_res)

Linear mixed-effects model fit by REML
  Data: NULL 
      AIC     BIC    logLik
  3253.21 3283.04 -1619.605

Random effects:
 Formula: ~1 | state
        (Intercept) Residual
StdDev:  0.01127294 20.92621

Correlation Structure: AR(1)
 Formula: ~year | state 
 Parameter estimate(s):
      Phi 
0.9764735 
Fixed effects:  packpc ~ income95pc + pretax95 + taxs95 
                Value Std.Error  DF    t-value p-value
(Intercept) 173.08136  8.574965 477  20.184499  0.0000
income95pc   -1.05746  0.387602 477  -2.728198  0.0066
pretax95     -0.14537  0.024800 477  -5.861684  0.0000
taxs95       -0.46630  0.040769 477 -11.437827  0.0000
 Correlation: 
           (Intr) incm95 prtx95
income95pc -0.856              
pretax95   -0.099 -0.223       
taxs95     -0.160 -0.097 -0.035

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-2.79963472 -0.57137010 -0.08122771  0.45749547  3.97887775 

Number of Observations: 528
Number of Groups: 48

Dynamic Panel Model

# Panel based diagnostics available in the plm library
# (This package recently expanded to contain many many panel data tests
#  for serial correlation, fixed effects, and unit roots)

# First, create a plm data frame (special data frame that "knows" the
# unit variable and time variable
pdata <- pdata.frame(data, index=c("state", "year"))
head(pdata)

        state year   cpi     pop   packpc   income  tax   avgprs     taxs
AL-1985    AL 1985 1.076 3973000 116.4863 46014968 32.5 102.1817 33.34834
AL-1986    AL 1986 1.096 3992000 117.1593 48703940 32.5 107.9892 33.40584
AL-1987    AL 1987 1.136 4016000 115.8367 51846312 32.5 113.5273 33.46067
AL-1988    AL 1988 1.183 4024000 115.2584 55698852 32.5 120.0334 33.52509
AL-1989    AL 1989 1.240 4030000 109.2060 60044480 32.5 133.2560 33.65600
AL-1990    AL 1990 1.307 4048508 111.7449 64094948 32.5 143.4486 33.75692
        income95    tax95 avgprs95   taxs95 income95pc  pretax95
AL-1985 65173615 46.03160 144.7257 47.23314   16.40413  97.49257
AL-1986 67723361 45.19161 150.1601 46.45118   16.96477 103.70893
AL-1987 69554378 43.60035 152.3025 44.88914   17.31932 107.41336
AL-1988 71754049 41.86813 154.6331 43.18869   17.83152 111.44437
AL-1989 73796599 39.94355 163.7759 41.36431   18.31181 122.41162
AL-1990 74736574 37.89595 167.2652 39.36155   18.46028 127.90366

# Do an panel unit root test on the undifferenced cigarette data;
# there are many options; see ?purtest

# Note: this purtest() function isn't working due to unbalanced panel
#purtest(packpc~1, data=pdata, test="ips")

Dynamic Panel Model

# Estimate Arellano-Bond GMM for fixed effects with lagged DV
#
# pgmm needs formulas in a specific format:
# 1. in the first part of the RHS, include lags of DV and covariates, as shown
# 2. in the second part, include the panel data instruments (99 here means use up to the 99th lag of the difference as an instrument)
# 3. in an optional (not shown) third part of the RHS, include any other instruments
#
# note that pgmm formulas construct lag() properly for pdata.frame,
# if the data is not pdata.frame, lag() will create a wrong lag structure
formula_linear <- packpc ~ lag(packpc, 1) + income95pc + avgprs95 | lag(packpc, 2:99) # if you don't want to include too many instruments, you can just set the second part as `lag(packpc, 2:...)`
pgmm_1a <- pgmm(
  formula_linear,
  data = pdata,
  effect = "individual", # should consider two-way (default) for small T; can also set `individual` FE
  transformation = "d" # should do `ld` if T=3, `d` for difference GMM and `ld` for system GMM; default is `d` (difference GMM)
)

Dynamic Panel Model: Investigate AR(1)/AR(2) Autocorrelation and Instrument Weakness

summary(pgmm_1a)

Oneway (individual) effect One-step model Difference GMM 

Call:
pgmm(formula = formula_linear, data = pdata, effect = "individual", 
    transformation = "d")

Balanced Panel: n = 48, T = 11, N = 528

Number of Observations Used: 432
Residuals:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-25.4308  -2.5080   0.1463   0.1238   2.7384  25.6025 

Coefficients:
                Estimate Std. Error z-value  Pr(>|z|)    
lag(packpc, 1)  0.638987   0.055342 11.5462 < 2.2e-16 ***
income95pc     -0.475568   0.486760 -0.9770    0.3286    
avgprs95       -0.180791   0.027799 -6.5035 7.848e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sargan test: chisq(44) = 47.99763 (p-value = 0.31401)
Autocorrelation test (1): normal = -3.948861 (p-value = 7.8524e-05)
Autocorrelation test (2): normal = -0.5688819 (p-value = 0.56944)
Wald test for coefficients: chisq(3) = 2496.07 (p-value = < 2.22e-16)

# Good Sargan test, Good AR(2) test
# Sargan test has a null of the instruments as a group being exogenous- higher p is good
# The residuals of the differenced equations should exhibit AR(1) but not AR(2) behavior
# higher p for AR(2) test is good

Some examples of Mis-specified models

# only include lag 2 and lag 3 terms
pgmm(
  packpc ~ lag(packpc, 1) + income95pc + avgprs95 | lag(packpc, 2:3),
  data = pdata, effect = "individual", transformation = "d"
) |> summary()

Oneway (individual) effect One-step model Difference GMM 

Call:
pgmm(formula = packpc ~ lag(packpc, 1) + income95pc + avgprs95 | 
    lag(packpc, 2:3), data = pdata, effect = "individual", transformation = "d")

Balanced Panel: n = 48, T = 11, N = 528

Number of Observations Used: 432
Residuals:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-25.9207  -2.4789   0.1232   0.1159   2.7106  25.5997 

Coefficients:
                Estimate Std. Error z-value  Pr(>|z|)    
lag(packpc, 1)  0.660475   0.052854 12.4963 < 2.2e-16 ***
income95pc     -0.258571   0.456052 -0.5670    0.5707    
avgprs95       -0.175368   0.026891 -6.5215  6.96e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sargan test: chisq(16) = 40.04482 (p-value = 0.00076694)
Autocorrelation test (1): normal = -3.831746 (p-value = 0.00012724)
Autocorrelation test (2): normal = -0.4941905 (p-value = 0.62117)
Wald test for coefficients: chisq(3) = 2405.391 (p-value = < 2.22e-16)

# only include lag 2-5 terms
pgmm(
  packpc ~ lag(packpc, 1) + income95pc + avgprs95 | lag(packpc, 2:5),
  data = pdata, effect = "individual", transformation = "d"
) |> summary()

Oneway (individual) effect One-step model Difference GMM 

Call:
pgmm(formula = packpc ~ lag(packpc, 1) + income95pc + avgprs95 | 
    lag(packpc, 2:5), data = pdata, effect = "individual", transformation = "d")

Balanced Panel: n = 48, T = 11, N = 528

Number of Observations Used: 432
Residuals:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-25.6948  -2.5357   0.1483   0.1217   2.6744  25.5813 

Coefficients:
                Estimate Std. Error z-value  Pr(>|z|)    
lag(packpc, 1)  0.650350   0.055545 11.7086 < 2.2e-16 ***
income95pc     -0.325994   0.497744 -0.6549    0.5125    
avgprs95       -0.181297   0.027242 -6.6550 2.834e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sargan test: chisq(29) = 42.46279 (p-value = 0.050998)
Autocorrelation test (1): normal = -3.907354 (p-value = 9.3312e-05)
Autocorrelation test (2): normal = -0.5460146 (p-value = 0.58506)
Wald test for coefficients: chisq(3) = 2503.149 (p-value = < 2.22e-16)

Dynamic Panel Model

# Let's try different model specifications:
# model 1a: linear + difference GMM + individual FE
# model 1b: linear + system GMM + individual FE
# model 1c: linear + difference GMM + two-way FE
# model 1b: linear + system GMM + two-way FE
pgmm_1b <- pgmm(
  packpc ~ lag(packpc, 1) + income95pc + avgprs95 | lag(packpc, 2:99),
  data = pdata, effect = "individual", transformation = "ld"
)
summary(pgmm_1b)

Oneway (individual) effect One-step model System GMM 

Call:
pgmm(formula = packpc ~ lag(packpc, 1) + income95pc + avgprs95 | 
    lag(packpc, 2:99), data = pdata, effect = "individual", transformation = "ld")

Balanced Panel: n = 48, T = 11, N = 528

Number of Observations Used: 912
Residuals:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-35.0188  -2.5248   0.1318   0.2344   2.8275  29.0107 

Coefficients:
                 Estimate Std. Error z-value Pr(>|z|)    
lag(packpc, 1)  0.9372190  0.0149757 62.5826   <2e-16 ***
income95pc      0.2033459  0.1245084  1.6332   0.1024    
avgprs95       -0.0021312  0.0098748 -0.2158   0.8291    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sargan test: chisq(55) = 47.79775 (p-value = 0.74372)
Autocorrelation test (1): normal = -3.451448 (p-value = 0.00055759)
Autocorrelation test (2): normal = 0.5944316 (p-value = 0.55222)
Wald test for coefficients: chisq(3) = 109661.9 (p-value = < 2.22e-16)

pgmm_1c <- pgmm(
  packpc ~ lag(packpc, 1) + income95pc + avgprs95 | lag(packpc, 2:99),
  data = pdata, effect = "twoways", transformation = "d"
)
summary(pgmm_1c)

Twoways effects One-step model Difference GMM 

Call:
pgmm(formula = packpc ~ lag(packpc, 1) + income95pc + avgprs95 | 
    lag(packpc, 2:99), data = pdata, effect = "twoways", transformation = "d")

Balanced Panel: n = 48, T = 11, N = 528

Number of Observations Used: 432
Residuals:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-18.939  -1.890  -0.259   0.000   1.824  20.425 

Coefficients:
                Estimate Std. Error z-value  Pr(>|z|)    
lag(packpc, 1)  0.252415   0.117744  2.1438   0.03205 *  
income95pc      1.062384   0.674055  1.5761   0.11500    
avgprs95       -0.285703   0.060572 -4.7168 2.396e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sargan test: chisq(44) = 45.92782 (p-value = 0.39225)
Autocorrelation test (1): normal = -3.571843 (p-value = 0.00035448)
Autocorrelation test (2): normal = 0.02846648 (p-value = 0.97729)
Wald test for coefficients: chisq(3) = 35.6544 (p-value = 8.8603e-08)
Wald test for time dummies: chisq(9) = 70.99219 (p-value = 9.7258e-12)

pgmm_1d <- pgmm(
  packpc ~ lag(packpc, 1) + income95pc + avgprs95 | lag(packpc, 2:99),
  data = pdata, effect = "twoways", transformation = "ld"
)
summary(pgmm_1d)

Twoways effects One-step model System GMM 

Call:
pgmm(formula = packpc ~ lag(packpc, 1) + income95pc + avgprs95 | 
    lag(packpc, 2:99), data = pdata, effect = "twoways", transformation = "ld")

Balanced Panel: n = 48, T = 11, N = 528

Number of Observations Used: 912
Residuals:
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-31.81880  -2.37490  -0.03745   0.00000   2.18303  27.94124 

Coefficients:
                Estimate Std. Error z-value  Pr(>|z|)    
lag(packpc, 1)  0.912506   0.035086 26.0074 < 2.2e-16 ***
income95pc     -0.016144   0.110832 -0.1457  0.884186    
avgprs95       -0.101336   0.032190 -3.1481  0.001644 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sargan test: chisq(55) = 45.80834 (p-value = 0.80677)
Autocorrelation test (1): normal = -3.133862 (p-value = 0.0017252)
Autocorrelation test (2): normal = 0.7322591 (p-value = 0.46401)
Wald test for coefficients: chisq(3) = 2942.747 (p-value = < 2.22e-16)
Wald test for time dummies: chisq(9) = 87.03407 (p-value = 6.3969e-15)

Dynamic Panel Model

# Let's try different model specifications:
# model 2a: log-log + difference GMM + individual FE
# model 2b: log-log + system GMM + individual FE
# model 2c: log-log + difference GMM + two-way FE
# model 2b: log-log + system GMM + two-way FE
formula_loglog <- log(packpc) ~ lag(log(packpc), 1) + log(income95pc) + log(avgprs95) | lag(log(packpc), 2:99)
pgmm_2a <- pgmm(formula_loglog, data = pdata, effect = "individual", transformation = "d")
summary(pgmm_2a)

Oneway (individual) effect One-step model Difference GMM 

Call:
pgmm(formula = formula_loglog, data = pdata, effect = "individual", 
    transformation = "d")

Balanced Panel: n = 48, T = 11, N = 528

Number of Observations Used: 432
Residuals:
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-0.263020 -0.024090  0.002736  0.001092  0.027682  0.216378 

Coefficients:
                     Estimate Std. Error z-value  Pr(>|z|)    
lag(log(packpc), 1)  0.674051   0.061660 10.9317 < 2.2e-16 ***
log(income95pc)     -0.066044   0.111880 -0.5903     0.555    
log(avgprs95)       -0.305656   0.043073 -7.0963 1.281e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sargan test: chisq(44) = 46.02872 (p-value = 0.38824)
Autocorrelation test (1): normal = -4.050332 (p-value = 5.1145e-05)
Autocorrelation test (2): normal = 0.05678179 (p-value = 0.95472)
Wald test for coefficients: chisq(3) = 2140.806 (p-value = < 2.22e-16)

pgmm_2b <- pgmm(formula_loglog, data = pdata, effect = "individual", transformation = "ld")
summary(pgmm_2b)

Oneway (individual) effect One-step model System GMM 

Call:
pgmm(formula = formula_loglog, data = pdata, effect = "individual", 
    transformation = "ld")

Balanced Panel: n = 48, T = 11, N = 528

Number of Observations Used: 912
Residuals:
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-0.366760 -0.022702  0.002000  0.001797  0.025148  0.278323 

Coefficients:
                      Estimate Std. Error  z-value Pr(>|z|)    
lag(log(packpc), 1)  0.9860119  0.0094198 104.6742   <2e-16 ***
log(income95pc)     -0.0057655  0.0154370  -0.3735   0.7088    
log(avgprs95)        0.0110743  0.0077600   1.4271   0.1535    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sargan test: chisq(55) = 47.59902 (p-value = 0.75038)
Autocorrelation test (1): normal = -3.352045 (p-value = 0.00080217)
Autocorrelation test (2): normal = 0.982078 (p-value = 0.32606)
Wald test for coefficients: chisq(3) = 4737094 (p-value = < 2.22e-16)

pgmm_2c <- pgmm(formula_loglog, data = pdata, effect = "twoways", transformation = "d")
summary(pgmm_2c)

Twoways effects One-step model Difference GMM 

Call:
pgmm(formula = formula_loglog, data = pdata, effect = "twoways", 
    transformation = "d")

Balanced Panel: n = 48, T = 11, N = 528

Number of Observations Used: 432
Residuals:
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-0.181013 -0.019492 -0.001678  0.000000  0.017812  0.204343 

Coefficients:
                    Estimate Std. Error z-value  Pr(>|z|)    
lag(log(packpc), 1)  0.33315    0.14500  2.2976   0.02158 *  
log(income95pc)      0.18131    0.18166  0.9981   0.31825    
log(avgprs95)       -0.62271    0.11127 -5.5965 2.188e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sargan test: chisq(44) = 43.30027 (p-value = 0.5015)
Autocorrelation test (1): normal = -3.620058 (p-value = 0.00029454)
Autocorrelation test (2): normal = 0.5219654 (p-value = 0.60169)
Wald test for coefficients: chisq(3) = 45.27325 (p-value = 8.0946e-10)
Wald test for time dummies: chisq(9) = 71.0946 (p-value = 9.2856e-12)

pgmm_2d <- pgmm(formula_loglog, data = pdata, effect = "twoways", transformation = "ld")
summary(pgmm_2d)

Twoways effects One-step model System GMM 

Call:
pgmm(formula = formula_loglog, data = pdata, effect = "twoways", 
    transformation = "ld")

Balanced Panel: n = 48, T = 11, N = 528

Number of Observations Used: 912
Residuals:
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-0.3421969 -0.0233093  0.0006081  0.0000000  0.0224292  0.2716041 

Coefficients:
                      Estimate Std. Error z-value  Pr(>|z|)    
lag(log(packpc), 1)  0.9454089  0.0286753 32.9694 < 2.2e-16 ***
log(income95pc)     -0.0072777  0.0214627 -0.3391  0.734544    
log(avgprs95)       -0.1650673  0.0494761 -3.3363  0.000849 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sargan test: chisq(55) = 44.19932 (p-value = 0.85117)
Autocorrelation test (1): normal = -3.066312 (p-value = 0.0021672)
Autocorrelation test (2): normal = 1.1093 (p-value = 0.2673)
Wald test for coefficients: chisq(3) = 4820.892 (p-value = < 2.22e-16)
Wald test for time dummies: chisq(9) = 90.44993 (p-value = 1.3225e-15)

Simulate Conditional Forecasting

Unfortunately, plm package does not have predict method for plm() and pgmm() functions. So we have to manually construct predictions from new data using simcf package (literrally the name of package means “simulate counterfactual”).
To simulate counterfactual outcomes, we need the following things:
- Simulated parameters
- Hypothetical Scenarios
- Lagged outcome \(y_{t-1}\)
- Initial outcome \(y_0\) (average packpc in 1995)
Simulations with and without year FE dummies are a bit different. Here we take pgmm_1a and pgmm_1c as two examples.

Simulate Conditional Forecasting

# Forecast for 3 years from 1996 to 1998
periods.out <- 3
sims <- 1000

# How big a change in price to simulate?
# How about "double" the average tax in the most recent year?
summary(pdata$taxs95[pdata$year==1995])

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  34.44   48.75   59.84   61.87   74.78  112.63

# The average (and median) tax is about 60 cents/pack
sd(pdata$taxs95[pdata$year==1995])

[1] 18.47741

Simulate Conditional Forecasting

# A 60 cent increase would also be about 3 sd's,
# and raise the tax to a bit more than the max observed

# Other possibilities:
# (2) A 10 cent increase
# (3) Raise every state to the max observed for any state in 1995 (112.60 cents)

# Construct the year dummies
yearfe <- makeFEdummies(pdata$year) # Construct the dummies for each year; it is equivalent to `model.matrix(~ -1 + year, data = pdata)`
yearfe <- yearfe[, 3:ncol(yearfe)] # Why drop first 2 col's? (the first 2 years are missing due to instrumental variables y_{t-2})
yearlist <- unique(pdata$year) # List all the years
yearlist <- yearlist[3:length(yearlist)]    # List the years and exclude the first two
colnames(yearfe) <- paste0("y", yearlist)   # Create names for the year dummies
head(yearfe)

  y1987 y1988 y1989 y1990 y1991 y1992 y1993 y1994 y1995
1     0     0     0     0     0     0     0     0     0
2     0     0     0     0     0     0     0     0     0
3     1     0     0     0     0     0     0     0     0
4     0     1     0     0     0     0     0     0     0
5     0     0     1     0     0     0     0     0     0
6     0     0     0     1     0     0     0     0     0

# concatenate year FE dummies to the pdata.frame
datayearfe <- cbind(pdata, yearfe)
head(datayearfe)

  state year   cpi     pop   packpc   income  tax   avgprs     taxs income95
1    AL 1985 1.076 3973000 116.4863 46014968 32.5 102.1817 33.34834 65173615
2    AL 1986 1.096 3992000 117.1593 48703940 32.5 107.9892 33.40584 67723361
3    AL 1987 1.136 4016000 115.8367 51846312 32.5 113.5273 33.46067 69554378
4    AL 1988 1.183 4024000 115.2584 55698852 32.5 120.0334 33.52509 71754049
5    AL 1989 1.240 4030000 109.2060 60044480 32.5 133.2560 33.65600 73796599
6    AL 1990 1.307 4048508 111.7449 64094948 32.5 143.4486 33.75692 74736574
     tax95 avgprs95   taxs95 income95pc  pretax95 y1987 y1988 y1989 y1990 y1991
1 46.03160 144.7257 47.23314   16.40413  97.49257     0     0     0     0     0
2 45.19161 150.1601 46.45118   16.96477 103.70893     0     0     0     0     0
3 43.60035 152.3025 44.88914   17.31932 107.41336     1     0     0     0     0
4 41.86813 154.6331 43.18869   17.83152 111.44437     0     1     0     0     0
5 39.94355 163.7759 41.36431   18.31181 122.41162     0     0     1     0     0
6 37.89595 167.2652 39.36155   18.46028 127.90366     0     0     0     1     0
  y1992 y1993 y1994 y1995
1     0     0     0     0
2     0     0     0     0
3     0     0     0     0
4     0     0     0     0
5     0     0     0     0
6     0     0     0     0

# Construct formulas -- linear + difference GMM + individual FE (without year FE dummies)
formula_1a <- packpc ~ income95pc + avgprs95 - 1 #with Income and Price as covariates
# Construct formulas -- linear + system GMM + individual FE (without year dummies but with intercept)
formula_1b <- packpc ~ income95pc + avgprs95
# Construct formulas -- linear + difference GMM + two-way FE (with year FE dummies)
formula_1c <- update(
  formula_1a,
  paste(c("~ .", colnames(yearfe)), collapse = "+") # add year FE dummies to the formula
)
formula_1c

packpc ~ income95pc + avgprs95 + y1987 + y1988 + y1989 + y1990 + 
    y1991 + y1992 + y1993 + y1994 + y1995 - 1

# Construct formulas -- linear + system GMM + two-way FE (with year dummies and with intercept)
formula_1d <- update(formula_1b, paste(c("~ .", colnames(yearfe)), collapse = "+"))
formula_1d

packpc ~ income95pc + avgprs95 + y1987 + y1988 + y1989 + y1990 + 
    y1991 + y1992 + y1993 + y1994 + y1995

# Recall model 1a: packpc ~ lag(packpc, 1) + income95pc + avgprs95 | lag(packpc, 2:99) # Difference GMM with state fixed effects
# Simulate parameters from an multivariate normal distribution
simparam_1a <- mvrnorm(sims, coefficients(pgmm_1a), vcovHC(pgmm_1a))
simphi_1a <- simparam_1a[, 1, drop = FALSE] # Extract the simulated phis
head(simphi_1a)

     lag(packpc, 1)
[1,]      0.5416002
[2,]      0.6030392
[3,]      0.6241619
[4,]      0.7102634
[5,]      0.6687738
[6,]      0.6624475

simbeta_1a <- simparam_1a[, 2:ncol(simparam_1a)] # Extract the simulated betas
head(simbeta_1a)

      income95pc   avgprs95
[1,] -1.52934504 -0.2003619
[2,] -1.27610642 -0.1904990
[3,] -0.94904496 -0.1894847
[4,] -0.05079141 -0.1546419
[5,] -0.03481707 -0.1958935
[6,] -0.06886777 -0.1625038

# Make matrix of hypothetical x's:
# Assume an average state raised taxes 60 cents starting 1996
#

# Make matrix of hypothetical x's: covariates
xhyp_1a <- cfMake(formula_1a, datayearfe, periods.out)  
#With mean packpc, income, and price for the forecast period

# pgmm uses covariates in differenced form
# so we want most of them to be 0 (no change)
# exceptions:
# (1) changes in covariates of interest
# (2) time dummies aren't differenced
xhyp_1a$x <- xhyp_1a$xpre <- 0 * xhyp_1a$x # make every element in matrix be zero
xhyp_1a <- cfChange(xhyp_1a, "avgprs95", x=60, scen=1) # only change avgprs to 60 (cents)

# We can "ignore" the state fixed effects for now and add them later
# because model is total linear

# Create baseline scenario
xbase_1a <- xhyp_1a
xbase_1a$x <- xbase_1a$xpre

# We need a lag of the price per pack
lagY_1a <- NULL # Hypothetical previous change in Y for simulation
for (i in 1:length(pgmm_1a$model)){     #For 1 to 48
    lagY_1a <- c(lagY_1a, as.data.frame(pgmm_1a$model[[i]])["1995",]$packpc)
}
#Hypothetical change in packpc for each state in 1995
lagY_1a <- mean(lagY_1a, na.rm=TRUE)    #Find the mean of these hypothetical previous changes

# Hypothetical initial level of Y for simulation
initialY <- mean(pdata$packpc[pdata$year==1995], na.rm=TRUE)    #The mean of packpc in 1995

# Simulate expected values of Y (on original level scale)
# out to periods.out given hypothetical future values of X,
# initial lags of the change in Y, and an initial level of Y
sim_ev1a <- ldvsimev(xhyp_1a,               # The matrix of hypothetical x's
                     simbeta_1a,           # The matrix of simulated betas
                     ci=0.95,            # Desired confidence interval
                     constant=NA,        # NA indicates no constant!
                     phi=simphi_1a,            # estimated AR parameters; length must match lagY 
                     lagY=lagY_1a,       # lags of y, most recent last
                     transform="diff",   # "log" to undo log transformation,
                                         # "diff" to under first differencing
                                         # "difflog" to do both
                     initialY=initialY   # for differenced models, the lag of the level of y
                     )

# Simulate expected values of Y given no change in covariates
sim_base1a <- ldvsimev(xbase_1a,               # The matrix of hypothetical x's
                       simbeta_1a,           # The matrix of simulated betas
                       ci=0.95,            # Desired confidence interval
                       constant=NA,        # NA indicates no constant!
                       phi=simphi_1a,            # estimated AR parameters; length must match lagY 
                       lagY=lagY_1a,          # lags of y, most recent last
                       transform="diff",   # "log" to undo log transformation,
                                         # "diff" to under first differencing
                                         # "difflog" to do both
                       initialY=initialY   # for differenced models, the lag of the level of y
                       )    

# Simulate first differences in y 
# out to periods.out given hypothetical future values of x, xpre,
# and initial lags of the change in y
sim_fd1a <- ldvsimfd(xhyp_1a,            # The matrix of hypothetical x's
                     simbeta_1a,        # The matrix of simulated betas
                     ci=0.95,            # Desired confidence interval
                     constant=NA,        # Column containing the constant
                                         # set to NA for no constant
                     phi=simphi_1a,     # estimated AR parameters; length must match lagY 
                     lagY=lagY_1a,       # lags of y, most recent last
                     transform="diff",   # Model is differenced
                     #initialY=initialY   # Redundant in this case (fd of linear differenced Y)
                     )

# Simulate relative risks in y
# out to periods.out given hypothetical future values of x, xpre,
# and initial lags of the change in y
sim_rr1a <- ldvsimrr(xhyp_1a,            # The matrix of hypothetical x's
                     simbeta_1a,        # The matrix of simulated betas
                     ci=0.95,            # Desired confidence interval
                     constant=NA,        # Column containing the constant
                                         # set to NA for no constant
                     phi=simphi_1a,     # estimated AR parameters; length must match lagY 
                     lagY=lagY_1a,       # lags of y, most recent last
                     transform="diff",   # Model is differenced
                     initialY=initialY   # Required for differenced Y in ldvsimrr
                     )

# Make plots of expected values, first differences, and percent changes
# using custom tile code in helperCigs.R

# Hypothetical initial level of Y for simulation
initialY <- mean(pdata$packpc[pdata$year==1995], na.rm=TRUE)

# axis limits
limitsEV <- c(50, 105)
limitsFD <- c(-40, 5)
limitsRR <- c(-40, 5)

# Model 1a Lineplot: Packs
cigLineplots(sim_ev1a, sim_base1a, sim_fd1a, sim_rr1a,
             limitsEV, limitsFD, limitsRR, initialY,
             "Cigarette Taxes & Consumption: 1a. Linear Difference GMM, Individual Effects",
             "Lab7_attachments/m1aEVFDRR"
             )

[1] "Lab7_attachments/m1aEVFDRR.png"

What if we include year FE dummies?

# Recall model 1g: packpc ~ lag(packpc, 1) + income95pc + avgprs95 | lag(packpc, 2:99)
# Difference GMM with state and year fixed effects

# Simulate parameters
simparam_1c <- mvrnorm(sims, coefficients(pgmm_1c), vcovHC(pgmm_1c))        #Sample parameters
simphi_1c <- simparam_1c[,1, drop = FALSE]  #Extract the phis
head(simphi_1c)

     lag(packpc, 1)
[1,]      0.2580608
[2,]      0.3679568
[3,]      0.3494057
[4,]      0.1388827
[5,]      0.1354802
[6,]      0.1497302

simbeta_1c <- simparam_1c[,2:ncol(simparam_1c)] #Extract the betas
head(simbeta_1c)

     income95pc   avgprs95       1987       1988      1989      1990      1991
[1,]  0.7092844 -0.2665089 -1.3755206 -1.4020538 -4.207497 -7.473687 -6.242708
[2,]  1.1209489 -0.3111138 -0.4324139 -0.2739949 -1.427281 -3.702081 -2.322482
[3,]  1.0245717 -0.1755754 -1.6365644 -2.4005438 -5.115813 -9.296309 -7.704801
[4,]  2.2529545 -0.2239747 -1.1385864 -2.7439790 -5.855614 -8.855478 -8.623209
[5,]  0.2272713 -0.2646165 -1.5440783 -1.4155598 -4.176758 -6.957275 -6.643816
[6,]  0.8928155 -0.2028516 -0.6088584 -1.8232832 -4.466801 -7.959842 -8.249410
          1992      1993       1994      1995
[1,] -4.757071 -10.04655 -12.094746 -11.08552
[2,] -0.974558  -5.33021  -8.214082  -6.61522
[3,] -7.429799 -10.45009 -14.028626 -11.00619
[4,] -8.845390 -13.52683 -17.925828 -16.41499
[5,] -5.968225 -11.13828 -13.702666 -12.24326
[6,] -8.581993 -12.50323 -16.145075 -13.87948

# Make matrix of hypothetical x's:
# Assume an average state raised taxes 60 cents starting 1996
#
# Issues -- we need to somehow include the state and year FEs:
#           Let's set the state to be an "average" state in 1995,
#           and year to be like the last year (1995)

# Make matrix of hypothetical x's: covariates
xhyp_1c <- cfMake(formula_1c, datayearfe, periods.out)      #Including the year fixed effects

# pgmm uses covariates in differenced form
# so we want most of them to be 0 (no change)
# exceptions:
# (1) changes in covariates of interest
# (2) differenced time dummies require special care
xhyp_1c$x <- xhyp_1c$xpre <- 0*xhyp_1c$x
xhyp_1c <- cfChange(xhyp_1c, "avgprs95", x=60, scen=1)      #Assume tax is raised 60 cents in 1996
xhyp_1c

$x
  packpc income95pc avgprs95 y1987 y1988 y1989 y1990 y1991 y1992 y1993 y1994
1      0          0       60     0     0     0     0     0     0     0     0
2      0          0        0     0     0     0     0     0     0     0     0
3      0          0        0     0     0     0     0     0     0     0     0
  y1995
1     0
2     0
3     0

$xpre
  packpc income95pc avgprs95 y1987 y1988 y1989 y1990 y1991 y1992 y1993 y1994
1      0          0        0     0     0     0     0     0     0     0     0
2      0          0        0     0     0     0     0     0     0     0     0
3      0          0        0     0     0     0     0     0     0     0     0
  y1995
1     0
2     0
3     0

$model
packpc ~ income95pc + avgprs95 + y1987 + y1988 + y1989 + y1990 + 
    y1991 + y1992 + y1993 + y1994 + y1995 - 1

attr(,"class")
[1] "list"           "counterfactual"

# We can "ignore" the state fixed effects for now and add them later
# because model is total linear
# Create baseline scenario
xbase_1c <- xhyp_1c
xbase_1c$x <- xbase_1c$xpre
xbase_1c

$x
  packpc income95pc avgprs95 y1987 y1988 y1989 y1990 y1991 y1992 y1993 y1994
1      0          0        0     0     0     0     0     0     0     0     0
2      0          0        0     0     0     0     0     0     0     0     0
3      0          0        0     0     0     0     0     0     0     0     0
  y1995
1     0
2     0
3     0

$xpre
  packpc income95pc avgprs95 y1987 y1988 y1989 y1990 y1991 y1992 y1993 y1994
1      0          0        0     0     0     0     0     0     0     0     0
2      0          0        0     0     0     0     0     0     0     0     0
3      0          0        0     0     0     0     0     0     0     0     0
  y1995
1     0
2     0
3     0

$model
packpc ~ income95pc + avgprs95 + y1987 + y1988 + y1989 + y1990 + 
    y1991 + y1992 + y1993 + y1994 + y1995 - 1

attr(,"class")
[1] "list"           "counterfactual"

# We need a lag of the price per pack
lagY_1c <- NULL # Hypothetical previous change in Y for simulation
for (i in 1:length(pgmm_1c$model))
    lagY_1c <- c(lagY_1c, as.data.frame(pgmm_1c$model[[i]])["1995",]$packpc) 
#Store change in packpc 1995 for each state

lagY_1c <- mean(lagY_1c, na.rm=TRUE)    
#Find the mean for all packpc changes in 1995

# Hypothetical initial level of Y for simulation
pdata$packpc[pdata$year==1995]

  AL-1995   AR-1995   AZ-1995   CA-1995   CO-1995   CT-1995   DE-1995   FL-1995 
101.08543 111.04297  71.95417  56.85931  82.58292  79.47219 124.46660  93.07455 
  GA-1995   IA-1995   ID-1995   IL-1995   IN-1995   KS-1995   KY-1995   LA-1995 
 97.47462  92.40160  74.84978  83.26508 134.25835  88.75344 172.64778 105.17613 
  MA-1995   MD-1995   ME-1995   MI-1995   MN-1995   MO-1995   MS-1995   MT-1995 
 76.62064  77.47355 102.46978  81.38825  82.94530 122.45028 105.58245  87.15957 
  NC-1995   ND-1995   NE-1995   NH-1995   NJ-1995   NM-1995   NV-1995   NY-1995 
121.53806  79.80697  87.27071 156.33675  80.37137  64.66887  93.52612  70.81732 
  OH-1995   OK-1995   OR-1995   PA-1995   RI-1995   SC-1995   SD-1995   TN-1995 
111.38010 108.68011  92.15575  95.64309  92.59980 108.08275  97.21923 122.32005 
  TX-1995   UT-1995   VA-1995   VT-1995   WA-1995   WI-1995   WV-1995   WY-1995 
 73.07931  49.27220 105.38687 122.33475  65.53092  92.46635 115.56883 112.23814

initialY <- mean(pdata$packpc[pdata$year==1995], na.rm=TRUE)
#Set the initial mean value of pack in 1995 across states

# Simulate expected values of Y (on original level scale)
# out to periods.out given hypothetical future values of X,
# initial lags of the change in Y, and an initial level of Y
sim_ev1c <- ldvsimev(xhyp_1c,               # The matrix of hypothetical x's
                     simbeta_1c,           # The matrix of simulated betas
                     ci=0.95,            # Desired confidence interval
                     constant=NA,        # NA indicates no constant!
                     phi=simphi_1c,            # estimated AR parameters; length must match lagY 
                     lagY=lagY_1c,       # lags of y, most recent last
                     transform="diff",   # "log" to undo log transformation,
                                         # "diff" to under first differencing
                                         # "difflog" to do both
                     initialY=initialY   # for differenced models, the lag of the level of y
                     )

# Simulate expected values of Y given no change in covariates
sim_base1c <- ldvsimev(xbase_1c,               # The matrix of hypothetical x's
                       simbeta_1c,           # The matrix of simulated betas
                       ci=0.95,            # Desired confidence interval
                       constant=NA,        # NA indicates no constant!
                       phi=simphi_1c,            # estimated AR parameters; length must match lagY 
                       lagY=lagY_1c,          # lags of y, most recent last
                       transform="diff",   # "log" to undo log transformation,
                                         # "diff" to under first differencing
                                         # "difflog" to do both
                       initialY=initialY   # for differenced models, the lag of the level of y
                       )    

# Simulate first differences in y 
# out to periods.out given hypothetical future values of x, xpre,
# and initial lags of the change in y
sim_fd1c <- ldvsimfd(xhyp_1c,            # The matrix of hypothetical x's
                     simbeta_1c,        # The matrix of simulated betas
                     ci=0.95,            # Desired confidence interval
                     constant=NA,        # Column containing the constant
                                         # set to NA for no constant
                     phi=simphi_1c,     # estimated AR parameters; length must match lagY 
                     lagY=lagY_1c,       # lags of y, most recent last
                     transform="diff",   # Model is differenced
                     #initialY=initialY   # Redundant in this case (fd of linear differenced Y)
                     )

# Simulate relative risks in y
# out to periods.out given hypothetical future values of x, xpre,
# and initial lags of the change in y
sim_rr1c <- ldvsimrr(xhyp_1c,            # The matrix of hypothetical x's
                     simbeta_1c,        # The matrix of simulated betas
                     ci=0.95,            # Desired confidence interval
                     constant=NA,        # Column containing the constant
                                         # set to NA for no constant
                     phi=simphi_1c,     # estimated AR parameters; length must match lagY 
                     lagY=lagY_1c,       # lags of y, most recent last
                     transform="diff",   # Model is differenced
                     initialY=initialY   # Required for differenced Y in ldvsimrr
                     )

# Model xhyp_1c Lineplot: Packs
cigLineplots(sim_ev1c, sim_base1c, sim_fd1c, sim_rr1c,
             limitsEV, limitsFD, limitsRR, initialY,
             "Cigarette Taxes & Consumption: 1c. Linear Difference GMM, Two-way Effects",
             "Lab7_attachments/m1cEVFDRR"
             )

[1] "Lab7_attachments/m1cEVFDRR.png"

In-Sample Simulation with Panel Data Models

The topic 9 lecture (in-sample simulation) showed the difference in forecasting results between simulation with average covariates (\(\bar{x}\)) and simulation with individual covariates (\(x_i\)).

We will replicate these differences today, including drawing plots with forecasting results.

In-sample simulation answers the same “what-if” question, but with different conditions: rather than “holding other conditions constant at their average values (\(h = \bar{h}\)),” in-sample simulation cares about counterfactuls “holding other conditions at their observed values (\(h = h_i\)).”

In the case of cigarette tax, we care about the policy implications of cigarette tax in the real world. A hypothetical “average state” does not exist in reality.

Steps for In-sample Simulation:

Calculate quantity of interest (e.g. expected values, first difference and relative risk) at each iteration for each unit
Calculate the point estimate of quantity of interest and its confidence interval across iterations for each unit
Average the obtained quantity of interest across units.
- We can make this as a weighted average. For example, weighted by each state’s population.
If you have year FE in your model, we have two choices:
- To generate simulations within sample periods (e.g. 1993 is in the range of 1985-1995), we need to set FE dummies in a give year to 1, and others to 0.
- To generate simulations out of sample periods (e.g. 1996 is out of the range of 1985-1995), we need to set all FE dummies to 0.
  - If our model is a differenced model and we want to generate simulations in 1993, we need to let the FE in 1993 subtract the FE in 1992.

Model 2c: Elasticity Specification + Difference GMM + Two-way FE

simparam_2c <- mvrnorm(n = sims, coefficients(pgmm_2c), vcovHC(pgmm_2c))
## for some reason, vcovHC() produces a generalized inverse
simphis_2c <- simparam_2c[, 1]
simbetas_2c <- simparam_2c[,2:ncol(simparam_2c)]

Let us assume that an “average state” raised taxes by 60 cents in 1993.

# Make matrix of hypothetical x's: covariates
# Still use the 1g formula (no logs) -- we will handle logging manually
# to get the differences of logs right
xhyp_2c <- simcf::cfMake(formula_1c, datayearfe, periods.out)

# pgmm uses covariates in differenced form
# so we want most of them to be 0 (no change)
# exceptions:
# (1) changes in covariates of interest
# (2) time dummies aren't differenced
changeTax <- c(60,0,0) # create hypothetical change in cigarette tax in t = 1, 2, 3

Create a Logged Differences of Key Covariates

# Need log version of differenced key covariate (doubling tax in avg state)
meanPrice <- mean(pdata$avgprs, na.rm=TRUE)
meanIncome <- mean(pdata$income95pc, na.rm=TRUE)

xhyp_2c <- cfChange(xhyp_2c, "avgprs95",
                    x=log(meanPrice+changeTax[1]) - log(meanPrice),
                    xpre=log(meanPrice) - log(meanPrice),
                    scen=1)

xhyp_2c <- cfChange(xhyp_2c, "avgprs95",
                    x=log(meanPrice+changeTax[2]) - log(meanPrice),
                    xpre=log(meanPrice) - log(meanPrice),
                    scen=2)

xhyp_2c <- cfChange(xhyp_2c, "avgprs95",
                    x=log(meanPrice+changeTax[3]) - log(meanPrice),
                    xpre=log(meanPrice) - log(meanPrice),
                    scen=3)

xhyp_2c <- cfChange(xhyp_2c, "income95pc",
                    x=log(meanIncome) - log(meanIncome),
                    xpre=log(meanIncome) - log(meanIncome),
                    scen=1:3)

Specify the Year Effects

for (iyear in 1985:1991)
    xhyp_2c <- cfChange(xhyp_2c, paste0("y",iyear), x=0, xpre=0, scen=1:3)


xhyp_2c <- cfChange(xhyp_2c, "y1992", x=-1, xpre=-1, scen=1)   # most tricky part: why set 1992 to -1?
xhyp_2c <- cfChange(xhyp_2c, "y1992", x=0, xpre=0, scen=2)
xhyp_2c <- cfChange(xhyp_2c, "y1992", x=0, xpre=0, scen=3)

xhyp_2c <- cfChange(xhyp_2c, "y1993", x=1, xpre=1, scen=1)
xhyp_2c <- cfChange(xhyp_2c, "y1993", x=-1, xpre=-1, scen=2)
xhyp_2c <- cfChange(xhyp_2c, "y1993", x=0, xpre=0, scen=3)

xhyp_2c <- cfChange(xhyp_2c, "y1994", x=0, xpre=0, scen=1)
xhyp_2c <- cfChange(xhyp_2c, "y1994", x=1, xpre=1, scen=2)
xhyp_2c <- cfChange(xhyp_2c, "y1994", x=-1, xpre=-1, scen=3)

xhyp_2c <- cfChange(xhyp_2c, "y1995", x=0, xpre=0, scen=1)
xhyp_2c <- cfChange(xhyp_2c, "y1995", x=0, xpre=0, scen=2)
xhyp_2c <- cfChange(xhyp_2c, "y1995", x=1, xpre=1, scen=3)

# We can "ignore" the state fixed effects for now and add them later
# because model is total linear

Create Baseline Senario for Comparison

# Create baseline scenario: we want to use it to generate first difference and relative risk later
xbase_2c <- xhyp_2c
xbase_2c$x <- xbase_2c$xpre

# We need a lag of the price per pack
lagY_2c <- NULL # Hypothetical previous change in Y for simulation
for (iunit in 1:length(pgmm_2c$model))
    lagY_2c <- c(lagY_2c, as.data.frame(pgmm_2c$model[[iunit]])["1995", 1])
lagY_2c <- mean(lagY_2c, na.rm=TRUE)

initialY <- mean(pdata$packpc[pdata$year==1993], na.rm=TRUE)
               # Hypothetical initial level of Y for simulation

Forecast: Model 2C, Tax +60, “Average (Representative) State”

# Simulate expected values of Y (on original level scale)
# out to periods.out given hypothetical future values of X,
# initial lags of the change in Y, and an initial level of Y
sim_ev2c <- simcf::ldvsimev(xhyp_2c,               # The matrix of hypothetical x's
                     simbetas_2c,           # The matrix of simulated betas
                     ci=0.95,            # Desired confidence interval
                     constant=NA,        # NA indicates no constant!
                     phi=simphis_2c,            # estimated AR parameters; length must match lagY 
                     lagY=lagY_2c,          # lags of y, most recent last
                     transform="difflog",   # "log" to undo log transformation,
                                         # "diff" to under first differencing
                                         # "difflog" to do both
                     initialY=initialY   # for differenced models, the lag of the level of y
                     )

# Simulate expected values of Y given no change in covariates
sim_base2c <- ldvsimev(xbase_2c,               # The matrix of hypothetical x's
                       simbetas_2c,           # The matrix of simulated betas
                       ci=0.95,            # Desired confidence interval
                       constant=NA,        # NA indicates no constant!
                       phi=simphis_2c,            # estimated AR parameters; length must match lagY 
                       lagY=lagY_2c,          # lags of y, most recent last
                       transform="difflog",   # "log" to undo log transformation,
                                        # "diff" to under first differencing
                                        # "difflog" to do both
                       initialY=initialY   # for differenced models, the lag of the level of y
                       ) 

# Simulate first differences in y (as cumulated changes)
# out to periods.out given hypothetical future values of x, xpre,
# and initial lags of the change in y
sim_fd2c <- simcf::ldvsimfd(xhyp_2c,            # The matrix of hypothetical x's
                     simbetas_2c,        # The matrix of simulated betas
                     ci=0.95,            # Desired confidence interval
                     constant=NA,        # Column containing the constant
                                         # set to NA for no constant
                     phi=simphis_2c,     # estimated AR parameters; length must match lagY 
                     lagY=lagY_2c,       # lags of y, most recent last
                     transform="difflog",# Model is differenced logs
                     initialY=initialY   # Required in this case (fd of differenced log Y)
                     )

# Simulate relative risks in y
# out to periods.out given hypothetical future values of x, xpre,
# and initial lags of the change in y
sim_rr2c <- ldvsimrr(xhyp_2c,            # The matrix of hypothetical x's
                     simbetas_2c,        # The matrix of simulated betas
                     ci=0.95,            # Desired confidence interval
                     constant=NA,        # Column containing the constant
                                         # set to NA for no constant
                     phi=simphis_2c,     # estimated AR parameters; length must match lagY 
                     lagY=lagY_2c,       # lags of y, most recent last
                     transform="difflog",# Model is differenced logs
                     initialY=initialY   # Required for differenced Y in ldvsimrr
                     )

Plotting forecasting results

# Model 3g Lineplot: Packs
cigLineplots(sim_ev2c, sim_base2c, sim_fd2c, sim_rr2c,
             limitsEV, limitsFD, limitsRR, initialY,
             "Cigarette Taxes & Consumption: 3g. Log-Log Difference GMM, Two-Way Effects",
             "Lab7_attachments/m2cEVFDRR"
             )

[1] "Lab7_attachments/m2cEVFDRR.png"

Forecast: Model 2C, Tax +60, “Individual States”

Now we perform an in-sample simulation procedure. We need to generate simulations of quantity of interests for each state.

# Forecasts: avg state, by state, and weighted avg of all states
#         for +.6 and raise to max scenarios
#         outcomes of interest: packs FD, packs RR/%change,
#                               revenue pc FD, revenue mils FD, revenue %GSP FD

# Visuals: line plots.  1x2 and 1x3 for two diff scenarios (x2)

# List for results
simEV_2c_scen1 <- simBASE_2c_scen1 <- simFD_2c_scen1 <- simRR_2c_scen1 <- list()

# All states modeled
allstates <- names(pgmm_2c$model)

# Storage vector for weights
popweight <- rep(NA, length(allstates))

# Loop over states in model
for (i in 1:length(pgmm_2c$model)) {

    curstate <- allstates[i]

    # Grab population weight for later
    popweight[i] <- pdata$pop[(pdata$year==1995)&(pdata$state==curstate)]
    
    # Make matrix of hypothetical x's: covariates
    # Still use the 1g formula (no logs) -- we will handle logging manually
    #  to get the differences of logs right
    xhyp_2c <- cfMake(formula_1c, datayearfe, periods.out)

    # pgmm uses covariates in differenced form
    # so we want most of them to be 0 (no change)
    # exceptions:
    # (1) changes in covariates of interest
    # (2) time dummies aren't differenced
    #xhyp_2c$x <- xhyp_2c$xpre <- 0*xhyp_2c$x

    # Need log version of differenced key covariate (doubling tax in avg state)
    curPrice92 <- pdata$avgprs95[(pdata$state==curstate)&(pdata$year==1992)]
    curPrice93 <- pdata$avgprs95[(pdata$state==curstate)&(pdata$year==1993)]
    curPrice94 <- pdata$avgprs95[(pdata$state==curstate)&(pdata$year==1994)]
    curPrice95 <- pdata$avgprs95[(pdata$state==curstate)&(pdata$year==1995)]
    curIncome92 <-  pdata$income95pc[(pdata$state==curstate)&(pdata$year==1992)]
    curIncome93 <-  pdata$income95pc[(pdata$state==curstate)&(pdata$year==1993)]
    curIncome94 <-  pdata$income95pc[(pdata$state==curstate)&(pdata$year==1994)]
    curIncome95 <-  pdata$income95pc[(pdata$state==curstate)&(pdata$year==1995)]

    changeTax <- c(60, 0, 0)
    
    xhyp_2c <- cfChange(xhyp_2c, "avgprs95",
                        x=log(curPrice93+changeTax[1]) - log(curPrice92),
                        xpre=log(curPrice93) - log(curPrice92),
                        scen=1)

    xhyp_2c <- cfChange(xhyp_2c, "avgprs95",
                        x=log(curPrice94+changeTax[2]) - log(curPrice93),
                        xpre=log(curPrice94) - log(curPrice93),
                        scen=2)
    
    xhyp_2c <- cfChange(xhyp_2c, "avgprs95",
                        x=log(curPrice95+changeTax[3]) - log(curPrice94),
                        xpre=log(curPrice95) - log(curPrice94),
                        scen=3)

    xhyp_2c <- cfChange(xhyp_2c, "income95pc",
                        x=log(curIncome93) - log(curIncome92),
                        xpre=log(curIncome93) - log(curIncome92),
                        scen=1)
    
    xhyp_2c <- cfChange(xhyp_2c, "income95pc",
                        x=log(curIncome94) - log(curIncome93),
                        xpre=log(curIncome94) - log(curIncome93),
                        scen=2)
    
    xhyp_2c <- cfChange(xhyp_2c, "income95pc",
                        x=log(curIncome95) - log(curIncome94),
                        xpre=log(curIncome95) - log(curIncome94),
                        scen=3)
    
    for (iyear in 1985:1991)
        xhyp_2c <- cfChange(xhyp_2c, paste0("y",iyear), x=0, xpre=0, scen=1:3)
    
    xhyp_2c <- cfChange(xhyp_2c, "y1992", x=-1, xpre=-1, scen=1)
    xhyp_2c <- cfChange(xhyp_2c, "y1992", x=0, xpre=0, scen=2)
    xhyp_2c <- cfChange(xhyp_2c, "y1992", x=0, xpre=0, scen=3)
    
    xhyp_2c <- cfChange(xhyp_2c, "y1993", x=1, xpre=1, scen=1)
    xhyp_2c <- cfChange(xhyp_2c, "y1993", x=-1, xpre=-1, scen=2)
    xhyp_2c <- cfChange(xhyp_2c, "y1993", x=0, xpre=0, scen=3)
    
    xhyp_2c <- cfChange(xhyp_2c, "y1994", x=0, xpre=0, scen=1)
    xhyp_2c <- cfChange(xhyp_2c, "y1994", x=1, xpre=1, scen=2)
    xhyp_2c <- cfChange(xhyp_2c, "y1994", x=-1, xpre=-1, scen=3)
    
    xhyp_2c <- cfChange(xhyp_2c, "y1995", x=0, xpre=0, scen=1)
    xhyp_2c <- cfChange(xhyp_2c, "y1995", x=0, xpre=0, scen=2)
    xhyp_2c <- cfChange(xhyp_2c, "y1995", x=1, xpre=1, scen=3)

    # Baseline scenario
    xbase_2c <- xhyp_2c
    xbase_2c$x <- xbase_2c$xpre
    
    # We need a lag of the price per pack
    lagY_2c <- as.data.frame(pgmm_2c$model[[i]])["1992",1]

    # This restores the fixed effect by state
    initialY <- pdata$packpc[(pdata$state==curstate)&(pdata$year==1992)]

    
    # Simulate expected values of Y (on original level scale)
    # out to periods.out given hypothetical future values of X,
    # initial lags of the change in Y, and an initial level of Y
    simState_ev2c <- ldvsimev(xhyp_2c,               # The matrix of hypothetical x's
                              simbetas_2c,           # The matrix of simulated betas
                              ci=0.95,            # Desired confidence interval
                              constant=NA,        # NA indicates no constant!
                              phi=simphis_2c,            # estimated AR parameters; length must match lagY 
                              lagY=lagY_2c,          # lags of y, most recent last
                              transform="difflog",   # "log" to undo log transformation,
                                        # "diff" to under first differencing
                                        # "difflog" to do both
                              initialY=initialY,   # for differenced models, the lag of the level of y
                              simulates=TRUE # store raw simulations
                              )

    # Simulate expected values of Y given no change in covariates
    simState_base2c <- ldvsimev(xbase_2c,               # The matrix of hypothetical x's
                                simbetas_2c,           # The matrix of simulated betas
                                ci=0.95,            # Desired confidence interval
                                constant=NA,        # NA indicates no constant!
                                phi=simphis_2c,            # estimated AR parameters; length must match lagY 
                                lagY=lagY_2c,          # lags of y, most recent last
                                transform="difflog",   # "log" to undo log transformation,
                                        # "diff" to under first differencing
                                        # "difflog" to do both
                                initialY=initialY,   # for differenced models, the lag of the level of y
                                simulates=TRUE
                                ) 
    
    # Simulate first differences in y (as cumulated changes)
    # out to periods.out given hypothetical future values of x, xpre,
    # and initial lags of the change in y
    simState_fd2c <- ldvsimfd(xhyp_2c,            # The matrix of hypothetical x's
                              simbetas_2c,        # The matrix of simulated betas
                              ci=0.95,            # Desired confidence interval
                              constant=NA,        # Column containing the constant
                                        # set to NA for no constant
                              phi=simphis_2c,     # estimated AR parameters; length must match lagY 
                              lagY=lagY_2c,       # lags of y, most recent last
                              transform="difflog",# Model is differenced logs
                              initialY=initialY,  # Required in this case (fd of differenced log Y)
                              simulates=TRUE #for FD or RR, it will store the simulations of y1, y0 and y1 - y0
                              )

    # Simulate relative risks in y
    # out to periods.out given hypothetical future values of x, xpre,
    # and initial lags of the change in y
    simState_rr2c <- ldvsimrr(xhyp_2c,            # The matrix of hypothetical x's
                              simbetas_2c,        # The matrix of simulated betas
                              ci=0.95,            # Desired confidence interval
                              constant=NA,        # Column containing the constant
                                        # set to NA for no constant
                              phi=simphis_2c,     # estimated AR parameters; length must match lagY 
                              lagY=lagY_2c,       # lags of y, most recent last
                              transform="difflog",# Model is differenced logs
                              initialY=initialY,  # Required for differenced Y in ldvsimrr
                              simulates=TRUE
                              )

    # Collect results (only works for 1 CI)
    simEV_2c_scen1 <- collectUnitSims(simEV_2c_scen1, simState_ev2c)
    simBASE_2c_scen1 <- collectUnitSims(simBASE_2c_scen1, simState_base2c)
    simFD_2c_scen1 <- collectUnitSims(simFD_2c_scen1, simState_fd2c)
    simRR_2c_scen1 <- collectUnitSims(simRR_2c_scen1, simState_rr2c)
}

# Compute weighted average across states
simEV_2c_scen1 <- aggregateUnitSims(simEV_2c_scen1, weighted.mean, w=popweight)
simBASE_2c_scen1 <- aggregateUnitSims(simBASE_2c_scen1, weighted.mean, w=popweight)
simFD_2c_scen1 <- aggregateUnitSims(simFD_2c_scen1, weighted.mean, w=popweight)
simRR_2c_scen1 <- aggregateUnitSims(simRR_2c_scen1, weighted.mean, w=popweight)

Plot the Individual Forecasting Results

#############
# Plot the FD and RR results using a custom tile-graphics function

# Nice colors
col <- brewer.pal(4,"Dark2")[3:4]

# Other settings common to all plots
avg <- "weighted.mean"
periods <- 1993:1995
limits <- c(1991.9, 1995.25, -43, 2)
xat <- c(1992, 1993, 1994, 1995)
plottitles <- c("Change, Packs pc after +$.60 tax",
                "Percent Change, Packs pc after +$.60 tax")
xtitles <- c("Forecast Year", "Forecast Year")
ytitles <- c("","")

# Plot the results by state
cigUnitLineplots(FD=simFD_2c_scen1, RR=simRR_2c_scen1,
                 periods=periods, units=allstates,
                 showUnits=TRUE, showMean=FALSE,
                 avg=avg, avgLab="US",
                 unitLabX=1995.125, labSet="tb", labK=1,
                 initialZero=TRUE, CI=TRUE,
                 limitsFD=limits, limitsRR=limits, xat=xat,
                 col=col, dots=TRUE, RRasPD=TRUE,
                 plottitles=plottitles,
                 xtitles=xtitles, ytitles=ytitles,
                 file="Lab7_attachments/m2cUnitsFDRR")

[1] "Lab7_attachments/m2cUnitsFDRR.png"

# Plot the results by state with weighted mean
cigUnitLineplots(FD=simFD_2c_scen1, RR=simRR_2c_scen1,
                 periods=periods, units=allstates,
                 showUnits=TRUE, showMean=TRUE,
                 avg=avg, avgLab="US",
                 unitLabX=1995.125, labSet="tb", labK=1,
                 initialZero=TRUE, CI=FALSE,
                 limitsFD=limits, limitsRR=limits, xat=xat,
                 col=col, dots=TRUE, RRasPD=TRUE,
                 plottitles=plottitles,
                 xtitles=xtitles, ytitles=ytitles,
                 file="Lab7_attachments/m2cUnitsMeanFDRR")

[1] "Lab7_attachments/m2cUnitsMeanFDRR.png"

# Plot the results by state with weighted mean
cigUnitLineplots(FD=simFD_2c_scen1, RR=simRR_2c_scen1,
                 periods=periods, units=allstates,
                 showUnits=FALSE, showMean=TRUE,
                 avg=avg, avgLab="US",
                 unitLabX=1995.125, labSet="tb", labK=1,
                 initialZero=TRUE, CI=TRUE,
                 limitsFD=limits, limitsRR=limits, xat=xat,
                 col=col, dots=TRUE, RRasPD=TRUE,
                 plottitles=plottitles,
                 xtitles=xtitles, ytitles=ytitles,
                 file="Lab7_attachments/m2cMeanFDRR")

[1] "Lab7_attachments/m2cMeanFDRR.png"

# Plot the representative state results
cigUnitLineplots(FD=simFD_2c_scen1, RR=simRR_2c_scen1,
                 compFD=sim_fd2c, compRR=sim_rr2c,
                 periods=periods, units=allstates,
                 showUnits=FALSE, showMean=FALSE,
                 avg=avg, avgLab="Wgt", compLab="Rep",
                 unitLabX=1995.15, labSet="tb", labK=1,
                 initialZero=TRUE, CI=TRUE, compCI=TRUE,
                 limitsFD=limits, limitsRR=limits, xat=xat,
                 col=col, dots=TRUE, RRasPD=TRUE,
                 plottitles=plottitles,
                 xtitles=xtitles, ytitles=ytitles,
                 file="Lab7_attachments/m2cCompFDRR")

[1] "Lab7_attachments/m2cCompFDRR.png"

# Plot the comparison of aggregate results and representative results
cigUnitLineplots(FD=simFD_2c_scen1, RR=simRR_2c_scen1,
                 compFD=sim_fd2c, compRR=sim_rr2c,
                 periods=periods, units=allstates,
                 showUnits=FALSE, showMean=TRUE,
                 avg=avg, avgLab="Wgt", compLab="Rep",
                 unitLabX=1995.15, labSet="tb", labK=1,
                 initialZero=TRUE, CI=TRUE, compCI=TRUE,
                 limitsFD=limits, limitsRR=limits, xat=xat,
                 col=col, dots=TRUE, RRasPD=TRUE,
                 plottitles=plottitles,
                 xtitles=xtitles, ytitles=ytitles,
                 file="Lab7_attachments/m2cMeanCompFDRR")

[1] "Lab7_attachments/m2cMeanCompFDRR.png"

state_full <- sapply(allstates, function(x) {state.name[grep(x, state.abb)]}) # both state.name and state.abb are built-in vector of state names

# Plot dotplot of results for final period by state: all states
cigUnitDotplot(x=simFD_2c_scen1$units,
               units=state_full, period=3,
               showRank=TRUE,
               vertmark=0,
               limits=c(-62, 5),
               labspace=0.35,
               at=seq(-60,0,10),
               col=col[1],
               xtitle="Change, Packs pc 3 years after +$.60 tax",
               toptitle="Change, Packs pc 3 years after +$.60 tax",
               file="Lab7_attachments/m2cUnitFDdotplot")

[1] "Lab7_attachments/m2cUnitFDdotplot.png"

Footnotes

See https://vincentarelbundock.github.io/Rdatasets/doc/Ecdat/Cigarette.html.