---
title: "CSSS/POLS 512 - Time Series and Panel Data Methods"
subtitle: "Lab 6: Nickell Bias and GMM Dynamic Panel Estimators"
author: "Ramses Llobet"
fontfamily: accanthis
linkcolor: blue
output:
  html_document:
    toc: true
    toc_depth: 4
editor_options:
  chunk_output_type: console
fontsize: 11pt
header-includes:
  - \usepackage{float}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, error=FALSE, warning=FALSE, message = FALSE)
# clean environment
rm(list=ls())
# Load packages
library(nlme) # Estimation of mixed effects models
library(lme4) # Alternative package for mixed effects models
library(plm) # Econometrics package for linear panel models
library(fixest) # Alternative package for fixed effect models
library(arm) # Gelman & Hill code for mixed effects simulation
library(pcse) # Calculate PCSEs for LS models (Beck & Katz)
library(tseries) # For ADF unit root test
library(simcf) # For panel functions and simulators
library(tile) # To plot using tile
library(lmtest) # Print results after correcting standard errors
library(stargazer) # For output tables
library(RColorBrewer) # For nice colors
library(MASS) # For mvrnorm()
library(tidyverse) # For data wrangling
# load source
source("source/TS_code.R") # Chris's TS custom functions
```
# Dynamic Panel Estimators
## Nickell Bias
Recall that we can remove fixed effects by demeaning each group's data (the within transformation):
$$
\begin{aligned}
y_{it} & = \phi y_{it-1} + x_{it}\beta + \alpha_i + \epsilon_{it} \\
y_{it} - \bar{y}_i & = \phi(y_{it-1} - \bar{y}_i)+(x_{it}-\bar{x}_i)\beta+(\epsilon_{it}-\bar{\epsilon}_i)
\end{aligned}
$$
Alternatively, we can include dummy variables for each group. Either way, this is the "within" estimator or fixed effects model. However, this transformation introduces bias when the model contains a lagged dependent variable.
This is because $\bar{y}_i$ is computed using the past $y$'s, and so is correlated with $\bar{\epsilon}_i$, which is computed using the past $\epsilon$'s. Or:
$$
\bar{y}_i = \frac{\bar{X}_i\beta + \bar{\epsilon}_i}{1-\phi}
$$
Specifically, this creates bias in the lagged dependent variable (LDV). If the other regressors are correlated with the LDV, then their coefficients may also be seriously biased.
The degree of bias is order $1/T$, so it is big for small $T$.
Furthermore,

+ The bias increases as $\beta$ decreases.
+ The bias increases as $\phi$ increases.
+ Small $N$ is not the problem; small $T$ is.

Increasing $N$ does not mitigate the problem. Purging serial correlation in the errors or getting the specification right (including other regressors) does not solve it either.
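The argument above can be checked directly. Below is a minimal Monte Carlo sketch (simulated data; the function and its settings are ours, not from the lab): generate a dynamic panel with unit fixed effects and apply the within (demeaned) estimator to the lagged dependent variable. With $T = 5$, $\hat{\phi}$ lands well below the true $\phi = 0.5$.

```{r}
# Simulate y_it = phi * y_it-1 + alpha_i + e_it, then estimate phi
# with the within (demeaned) estimator; the estimate is biased downward
set.seed(512)
nickell_once <- function(N = 200, Tt = 5, phi = 0.5) {
  alpha <- rnorm(N)                            # unit fixed effects
  y <- matrix(0, N, Tt + 1)
  for (t in 2:(Tt + 1)) {
    y[, t] <- phi * y[, t - 1] + alpha + rnorm(N)
  }
  y_curr <- y[, 2:(Tt + 1)]                    # y_it
  y_lag  <- y[, 1:Tt]                          # y_it-1
  # Within transformation: demean each unit's series
  y_curr_dm <- y_curr - rowMeans(y_curr)
  y_lag_dm  <- y_lag  - rowMeans(y_lag)
  sum(y_lag_dm * y_curr_dm) / sum(y_lag_dm^2)  # pooled OLS on demeaned data
}
phi_hats <- replicate(500, nickell_once())
mean(phi_hats)   # well below 0.5: bias of order 1/T
```

Increasing `N` in this sketch does not shrink the gap; increasing `Tt` does.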
## Instrumental variables
We therefore turn to instrumental variables.
Recall that an instrumental variable must fulfill two conditions:
1) Relevance: $z$ is correlated with $x$;
2) Exogeneity: $z$ is uncorrelated with $\epsilon$. It must influence $y$ only through $x$.
$$
\begin{aligned}
y_{it} - \bar{y}_i & = \phi(y_{it-1} - \bar{y}_i)+(x_{it}-\bar{x}_i)\beta+(\epsilon_{it}-\bar{\epsilon}_i) \\
\Delta y_{it} & = \phi \Delta y_{it-1} + \Delta x_{it}\beta + \Delta \epsilon_{it}
\end{aligned}
$$
We use lagged levels and lagged differences of $y_{it}$ as instruments for the differenced LDV, $\Delta y_{it-1}$. These help to predict $\Delta y_{it-1}$ but not $\Delta \epsilon_{it}$ if the errors are iid (see lecture slides for why).
Note: this is different from instrumenting $x_{it}$. We are not addressing any endogeneity that may exist there; the two should not be conflated.
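For intuition, here is a hedged sketch of the Anderson-Hsiao version of this idea on simulated data (not the cigarette panel; all names below are ours): first-difference the model to remove $\alpha_i$, then instrument the endogenous $\Delta y_{it-1}$ with the second lagged level $y_{it-2}$. The two `lm()` calls implement two-stage least squares by hand; this recovers the coefficient, but not correct standard errors.

```{r}
set.seed(512)
N <- 2000; Tt <- 8; phi <- 0.6
alpha <- rnorm(N)                     # unit fixed effects
y <- matrix(0, N, Tt)
for (t in 2:Tt) y[, t] <- phi * y[, t - 1] + alpha + rnorm(N)
# Stack Delta y_t = phi * Delta y_t-1 + Delta e_t for t = 4, ..., Tt
Dy     <- c(y[, 4:Tt]       - y[, 3:(Tt - 1)])  # Delta y_t
Dy_lag <- c(y[, 3:(Tt - 1)] - y[, 2:(Tt - 2)])  # Delta y_t-1 (endogenous)
z      <- c(y[, 2:(Tt - 2)])                    # level y_t-2: the instrument
stage1  <- lm(Dy_lag ~ z)                       # first stage
phi_iv  <- unname(coef(lm(Dy ~ fitted(stage1)))[2])  # 2SLS by hand
phi_ols <- unname(coef(lm(Dy ~ Dy_lag))[2])     # naive OLS: biased downward
c(iv = phi_iv, ols = phi_ols)
```

The IV estimate comes out near the true $\phi = 0.6$, while the naive OLS estimate on the differenced data is far below it, because $\Delta \epsilon_{it}$ is correlated with $\Delta y_{it-1}$ through $\epsilon_{it-1}$.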
## Generalized Method of Moments
Recall what a moment is in statistics. Moments describe features of a distribution or dataset, providing information about central tendency, shape, and other distributional properties. For example, the **mean** is the first moment and the **variance** is the second central moment of a distribution. The motivation behind GMM estimation is that the usual least squares IV approach cannot handle multiple moment equations at once.
To give a brief overview of the intuition behind GMM, consider a linear model:
$$y_t = \boldsymbol{x}_t \boldsymbol{\beta} + \epsilon_t$$
Recall that the exogeneity assumption (aka no misspecification) holds under Gauss-Markov:
$$E[\boldsymbol{x}_t\epsilon_t]=0$$
Assume there exists some combination of instrumental variables $\boldsymbol{z}_t$ that implies the following:
$$
\begin{aligned}
E[\boldsymbol{z}_t\epsilon_t] & =0 \\
E[\boldsymbol{z}_t\epsilon_t] & = E[\boldsymbol{z}_t(y_t-\boldsymbol{x}_t\boldsymbol{\beta})]=0
\end{aligned}
$$
Intuition: GMM chooses a weighting matrix $W$ and finds the $\boldsymbol{\beta}$ that makes the sample covariances between the set of instruments $\boldsymbol{z}_t$ and the predicted errors $\hat{\epsilon}_t$ as close to zero as possible.
Recall the distinction between DGP/population and sample in statistical inference:
- Population moment:
$$E[\boldsymbol{z}_t(y_t-\boldsymbol{x}_t\boldsymbol{\beta})]=0$$
- Sample moments:
$$\frac{1}{n}\sum^n_{t=1}\boldsymbol{z}_t(y_t-\boldsymbol{x}_t\boldsymbol{\beta}) =
\begin{bmatrix}
\frac{1}{n}\sum^n_{t=1}z_{1t}(y_t-\boldsymbol{x}_t \boldsymbol{\beta})\\
\vdots\\
\frac{1}{n}\sum^n_{t=1}z_{Kt}(y_t-\boldsymbol{x}_t \boldsymbol{\beta})
\end{bmatrix}$$
Therefore, a generalized method of moments (GMM) estimator of $\boldsymbol{\beta}$ is the vector $\hat{\boldsymbol{\beta}}$ that solves the problem:
$$
\hat{\beta}_{\text{gmm}} = \underset{b}{\text{arg min}} \left[ \sum Z_i'(y_i - X_ib) \right]' \hat{W} \left[ \sum Z_i'(y_i - X_ib) \right]
$$
We can compute the above analytically or numerically, where the weighting matrix $\hat{W}$ provides a solution that minimizes the weighted sum of squares of the sample covariances between instruments and residuals. Thus,
$$\boldsymbol{S}_{zy}-\boldsymbol{S}_{zx}\boldsymbol{\beta}=0 \;\;\; \text{where}$$
$$\boldsymbol{S}_{zy}=n^{-1}\sum^n_{t=1}\boldsymbol{z}_t y_{t} \;\;\; \text{and} \;\;\; \boldsymbol{S}_{zx}=n^{-1}\sum^n_{t=1}\boldsymbol{z}_t x_{t}$$
We solve for
$$\hat{\boldsymbol{\beta}}=\boldsymbol{S}^{-1}_{zx}\boldsymbol{S}_{zy}$$
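As a small numeric sketch of the solution above (simulated data, scalar just-identified case; the coefficient values are ours): the moment estimator $\boldsymbol{S}^{-1}_{zx}\boldsymbol{S}_{zy}$ recovers the true coefficient even when the regressor is endogenous, while OLS does not.

```{r}
set.seed(512)
n <- 5000
z <- rnorm(n)                        # instrument: relevant and exogenous
u <- rnorm(n)                        # structural error
x <- 0.8 * z + 0.5 * u + rnorm(n)    # regressor, endogenous through u
y <- 2 * x + u                       # true beta = 2
S_zx <- mean(z * x)                  # sample moment n^-1 sum z_t x_t
S_zy <- mean(z * y)                  # sample moment n^-1 sum z_t y_t
beta_gmm <- S_zy / S_zx              # scalar case of S_zx^{-1} S_zy
beta_ols <- sum(x * y) / sum(x^2)    # OLS (no intercept): biased upward
c(gmm = beta_gmm, ols = beta_ols)    # gmm near 2; ols inflated by cov(x, u)
```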
**Important**: GMM is not a magical solution to endogeneity or misspecification. GMM is just an inference technique, like least squares or maximum likelihood. What follows are estimator specifications proposed by particular authors to address Nickell bias in dynamic panel models:
- **Anderson-Hsiao estimator**: uses the second and third lagged levels as instruments.
- **Arellano-Bond Difference GMM**: uses $\Delta y$ as the outcome and all available lagged levels as instruments in each period.
- **Arellano-Bover/Blundell-Bond System GMM**: adds the available lagged differences as instruments.
## Dynamic Panel Model: Verify Key Assumptions
When using panel GMM, we need to verify some modeling assumptions:
- The number and validity of instruments
  - Sargan test: the null is that the over-identifying restrictions are valid, i.e. $\mathbb{E}[\hat{\varepsilon} Z] = 0$ for the instruments as a group.
  - If we fail to reject the null (p > 0.05 or 0.1), the instruments as a group pass the over-identification test.
- Serial correlation of the error terms – the differenced residuals should be AR(1) only; otherwise, for example, the lag-2 level becomes an invalid instrument.
- Arellano-Bond's test for serial correlation
- Difference or system GMM (whether to use lagged differences as IV)
- Use of year fixed effects
- Wald Chi-squared test on year FE dummies: to see whether the inclusion of FE dummies is statistically significant
- Use of linear or log-log specification $\rightarrow$ the interpretation of results differs
  - linear: a one-unit increase in $x$ induces a $\beta$-unit increase in $y$
  - log-log: a 1% increase in $x$ induces a $\beta$% increase in $y$ (an elasticity)
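A quick arithmetic illustration of the two interpretations, using hypothetical coefficient values (not model output):

```{r}
beta_lin <- -2.5   # hypothetical linear coefficient: packs per 1-cent price increase
beta_ll  <- -0.3   # hypothetical log-log coefficient: an elasticity
# linear: a one-unit (1 cent) increase in x shifts y by beta units (packs)
dy_linear <- beta_lin * 1
# log-log: a 1% increase in x shifts y by roughly beta percent
x0 <- 100; x1 <- 1.01 * x0
dy_pct <- beta_ll * (log(x1) - log(x0)) * 100   # approximately -0.3 percent
c(linear = dy_linear, loglog_pct = dy_pct)
```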
# Application: How Does Tax Affect Cigarette Consumption?
## Cigarette Consumption Data
|Variable|Description^[See [https://vincentarelbundock.github.io/Rdatasets/doc/Ecdat/Cigarette.html](https://vincentarelbundock.github.io/Rdatasets/doc/Ecdat/Cigarette.html).]|
|---|---|
|`cpi`|consumer price index|
|`pop`|state population|
|**`packpc`**|number of packs per capita|
|`income95`|state personal income (total)|
|**`income95pc`**|state personal income (per capita)|
|`tax95`|average state, federal, and average local excise taxes for fiscal year|
|**`taxs95`**|average excise taxes for fiscal year, including sales taxes|
|**`avgprs95`**|average price during fiscal year, including sales taxes (cents)|
|**`pretax95`**|`pretax95` = `avgprs95` - `taxs95`|
```{r}
# Load cigarette consumption data (Jonathan Gruber, MIT)
# Variables (see codebook):
# state year cpi pop packpc income tax avgprs taxs
data <- read.csv("data/cigarette.csv")
colnames(data)
data$state %>% unique() %>% length() # N = 48
data$year %>% unique() %>% length() # T = 11
# Quick inflation adjustment to 1995 dollars
inflAdjust <- function(x,cpi,year,target) {
unique(cpi[year==target])*x/cpi
#Multiply x with cpi in target year then divide by cpi in observed year
}
data <-
data %>%
mutate(
# Make adjustments to state personal income
income95 = inflAdjust(income, cpi, year, 1995),
# Average state, federal, and average local excise taxes
tax95 = inflAdjust(tax, cpi, year, 1995),
# Average price, including sales taxes
avgprs95 = inflAdjust(avgprs, cpi, year, 1995),
# Average excise taxes, including sales taxes
taxs95 = inflAdjust(taxs, cpi, year, 1995),
# Create per capita income (in k)
income95pc = income95/pop,
# Create pretax price, 1995 dollars
pretax95 = avgprs95 - taxs95
)
attach(data)
```
## Examining the time series
```{r}
# Look at the time series
data %>%
ggplot(aes(x = year,
y = packpc,
group = state,
color = state)) +
geom_line() +
scale_x_continuous(breaks = seq(1985, 1995))
# Look at the ACF and PACF
acf_data <-
split(data$packpc, data$state) %>%
map(function(x) acf(x, plot = FALSE)$acf) %>%
map(as.vector) %>%
bind_cols() %>%
mutate(lag = 0:10) %>%
pivot_longer(!lag, names_to = "state", values_to = "acf") %>%
arrange(state, lag)
acf_data %>%
ggplot(aes(x = lag, y = acf)) +
geom_bar(stat = "identity", width = 0.1) +
geom_hline(yintercept = 0.2, color="red",
linetype="dashed", linewidth=0.2) +
geom_hline(yintercept = -0.2, color="red",
linetype="dashed", linewidth=0.2) +
facet_wrap(state~.) +
labs(title = "ACF of US states")
pacf_data <-
split(data$packpc, data$state) %>%
map(function(x) pacf(x, plot = FALSE)$acf) %>%
map(as.vector) %>%
bind_cols() %>%
mutate(lag = 1:10) %>%
pivot_longer(!lag, names_to = "state", values_to = "pacf") %>%
arrange(state, lag)
pacf_data %>%
ggplot(aes(x = lag, y = pacf)) +
geom_bar(stat = "identity", width = 0.1) +
geom_hline(yintercept = 0.2, color="red",
linetype="dashed", linewidth=0.2) +
geom_hline(yintercept = -0.2, color="red",
linetype="dashed", linewidth=0.2) +
facet_wrap(state~.) +
labs(title = "PACF of US states")
# Are the series stationary?
pseries <- split(data$packpc, data$state)
pp_pval <- sapply(pseries, function(x){pp.test(x)[["p.value"]]})
adf_pval <- sapply(pseries, function(x){adf.test(x)[["p.value"]]})
hist(pp_pval, breaks = 10)
hist(adf_pval, breaks = 10)
```
## Fixed Effect Model
```{r}
# Check for time invariant variables:
pvar(data, index = c("state", "year"))
# "within" option tells plm to do fixed effects
# Note that if you want to add year fixed effects then set effect="time"; for both state
# and year fixed effects set effect="twoways"
fe_res <- plm(packpc ~ income95pc + pretax95 + taxs95, data = data, model="within", effect="twoways")
coeftest(fe_res, vcov. = vcovHC, method = "arellano")
# Some tests for serial correlation of errors (needed because we have a linear
# regression with lags of the dependent variable on the RHS)
# the standard LM test (note we could specify order)
pbgtest(fe_res)
```
## Random Effect Model
```{r}
# Estimate a random effects model with ARMA(p,q) errors using lme (restricted ML)
re_res <- lme(# A formula object including the response,
# the fixed covariates, and any grouping variables
fixed = packpc ~ income95pc + pretax95 + taxs95,
# i.e. response variable and explanatory variables
# The random effects component
random = ~ 1 | state,
# 1 indicates the intercept and state indicates the grouping
# The TS dynamics: specify the time & group variables,
# and the order of the ARMA(p,q) process
correlation = corARMA(form = ~ year | state,
p = 1, # AR(p) order
q = 0 # MA(q) order
)
)
summary(re_res)
```
## Dynamic Panel Model
First, you will need to create a data object of class `pdata.frame`.
```{r}
# Panel based diagnostics available in the plm library
# (This package recently expanded to contain many many panel data tests
# for serial correlation, fixed effects, and unit roots)
# First, create a plm data frame (special data frame that "knows" the
# unit variable and time variable
pdata <- pdata.frame(data, index=c("state", "year"))
head(pdata)
# Do a panel unit root test on the undifferenced cigarette data;
# there are many options; see ?purtest
# Note: this purtest() function isn't working due to unbalanced panel
#purtest(packpc~1, data=pdata, test="ips")
```
To estimate a dynamic panel model, we use the function `plm::pgmm`. To estimate Arellano-Bond with GMM for fixed effects with lagged DV, pgmm needs formulas in a specific format:
1. in the first part of the RHS, include lags of DV and covariates, as shown below.
2. in the second part, include the panel data instruments (99 here means use up to the 99th lag of the difference as an instrument).
3. in an optional (not shown) third part of the RHS, include any other instruments.
```{r}
# Estimate Arellano-Bond GMM for fixed effects with lagged DV
# note that pgmm formulas construct lag() properly for pdata.frame,
# if the data is not pdata.frame, lag() will create a wrong lag structure
formula_linear <-
packpc ~
lag(packpc, 1) + income95pc + avgprs95 | lag(packpc, 2:99)
# if you don't want to include too many instruments, you can just set the second part as `lag(packpc, 2:...)`
pgmm_1a <- pgmm(
formula_linear,
data = pdata,
  effect = "individual", # "twoways" (the default) would add year FE; here individual FE only
  transformation = "d" # "d" for difference GMM (the default); "ld" for system GMM (useful when T is very small, e.g. T=3)
)
```
### Dynamic Panel Model: Investigate AR(1)/AR(2) Autocorrelation and Instrument Weakness
The Sargan test has a null that the instruments as a group are exogenous, so a higher p-value is good. The residuals of the differenced equations should exhibit AR(1) but not AR(2) behavior, so a higher p-value on the AR(2) test is also good.
```{r}
summary(pgmm_1a)
# Good Sargan test, Good AR(2) test
```
Some examples of mis-specified models:
```{r}
# only include lag 2 and lag 3 terms
pgmm(
packpc ~ lag(packpc, 1) + income95pc + avgprs95 | lag(packpc, 2:3),
data = pdata, effect = "individual", transformation = "d"
) %>% summary()
# only include lag 2-5 terms
pgmm(
packpc ~ lag(packpc, 1) + income95pc + avgprs95 | lag(packpc, 2:5),
data = pdata, effect = "individual", transformation = "d"
) %>% summary()
```
Let's try different model specifications:
- model 1a: linear + difference GMM + individual FE
- model 1b: linear + system GMM + individual FE
- model 1c: linear + difference GMM + two-way FE
- model 1d: linear + system GMM + two-way FE
```{r}
# model 1a: linear + difference GMM + individual FE
summary(pgmm_1a)
# model 1b: linear + system GMM + individual FE
pgmm_1b <- pgmm(
packpc ~ lag(packpc, 1) + income95pc + avgprs95 | lag(packpc, 2:99),
data = pdata, effect = "individual", transformation = "ld"
)
summary(pgmm_1b)
# model 1c: linear + difference GMM + two-way FE
pgmm_1c <- pgmm(
packpc ~ lag(packpc, 1) + income95pc + avgprs95 | lag(packpc, 2:99),
data = pdata, effect = "twoways", transformation = "d"
)
summary(pgmm_1c)
# model 1d: linear + system GMM + two-way FE
pgmm_1d <- pgmm(
packpc ~ lag(packpc, 1) + income95pc + avgprs95 | lag(packpc, 2:99),
data = pdata, effect = "twoways", transformation = "ld"
)
summary(pgmm_1d)
stargazer(pgmm_1a,pgmm_1b,pgmm_1c,pgmm_1d,
type="text")
```
Now, we will fit the log-log version:
- model 2a: log-log + difference GMM + individual FE
- model 2b: log-log + system GMM + individual FE
- model 2c: log-log + difference GMM + two-way FE
- model 2d: log-log + system GMM + two-way FE
```{r}
formula_loglog <-
log(packpc) ~ lag(log(packpc), 1) + log(income95pc) + log(avgprs95) | lag(log(packpc), 2:99)
# model 2a: log-log + difference GMM + individual FE
pgmm_2a <- pgmm(formula_loglog, data = pdata, effect = "individual", transformation = "d")
summary(pgmm_2a)
# model 2b: log-log + system GMM + individual FE
pgmm_2b <- pgmm(formula_loglog, data = pdata, effect = "individual", transformation = "ld")
summary(pgmm_2b)
# model 2c: log-log + difference GMM + two-way FE
pgmm_2c <- pgmm(formula_loglog, data = pdata, effect = "twoways", transformation = "d")
summary(pgmm_2c)
# model 2d: log-log + system GMM + two-way FE
pgmm_2d <- pgmm(formula_loglog, data = pdata, effect = "twoways", transformation = "ld")
summary(pgmm_2d)
```
### Simulate Conditional Forecasting
- Unfortunately, the `plm` package does not provide a `predict` method for `plm()` and `pgmm()` objects, so we have to construct predictions from new data manually using the `simcf` package (the name literally means "simulate counterfactuals").
- To simulate counterfactual outcomes, we need the following things:
- Simulated parameters
- Hypothetical Scenarios
- Lagged outcome $y_{t-1}$
- Initial outcome $y_0$ (average `packpc` in 1995)
- Simulations with and without year FE dummies are a bit different. Here we take `pgmm_1a` and `pgmm_1c` as two examples.
When deciding what could be a plausible scenario, first look at the sample statistics.
```{r}
# How big a change in price to simulate?
# How about "double" the average tax in the most recent year?
summary(pdata$taxs95[pdata$year==1995])
# The average (and median) tax is about 60 cents/pack
sd(pdata$taxs95[pdata$year==1995])
```
If the mean is around 61 cents and the standard deviation is 18.47, a +60 cent increase is also about 3 sd's and would raise the tax to a bit more than the maximum observed. Other possibilities:

- A standard deviation increase.
- The interquartile change: an increase from the first quartile (48.75) to the third (74.78).
- Raise every state to the maximum observed for any state in 1995 (112.60 cents).
Now, let's create a matrix with our year dummies, using the function `simcf::makeFEdummies`. Once we have created this matrix, we `cbind` it to the data.
```{r}
# Construct the year dummies
# it is equivalent to `model.matrix(~ -1 + year, data = pdata)`
yearfe <- makeFEdummies(pdata$year)
# Why drop first 2 col's? (the first 2 years are missing due to instrumental variables y_{t-2})
yearfe <- yearfe[, 3:ncol(yearfe)]
# List all the years
yearlist <- unique(pdata$year)
# List the years and exclude the first two
yearlist <- yearlist[3:length(yearlist)]
# Create names for the year dummies
colnames(yearfe) <- paste0("y", yearlist)
head(yearfe)
# concatenate year FE dummies to the pdata.frame
datayearfe <- cbind(pdata, yearfe)
head(datayearfe)
```
Now, we specify the formula for each model and save the specification in an object.
```{r}
# Construct formulas -- linear + difference GMM + individual FE (without year FE dummies)
formula_1a <- packpc ~ income95pc + avgprs95 - 1 #with Income and Price as covariates
# Construct formulas -- linear + system GMM + individual FE (without year dummies but with intercept)
formula_1b <- packpc ~ income95pc + avgprs95
# Construct formulas -- linear + difference GMM + two-way FE (with year FE dummies)
formula_1c <- update(
formula_1a,
paste(c("~ .", colnames(yearfe)), collapse = "+") # add year FE dummies to the formula
)
formula_1c
# Construct formulas -- linear + system GMM + two-way FE (with year dummies and with intercept)
formula_1d <- update(formula_1b, paste(c("~ .", colnames(yearfe)), collapse = "+"))
formula_1d
```
Simulation of parameters.
```{r}
# Number of simulations
sims <- 1000
# Recall model 1a: Difference GMM with state fixed effects
# packpc ~ lag(packpc, 1) + income95pc + avgprs95 | lag(packpc, 2:99)
# Simulate parameters from a multivariate normal distribution
simparam_1a <- mvrnorm(sims, coefficients(pgmm_1a), vcovHC(pgmm_1a))
simphi_1a <- simparam_1a[, 1] # Extract the simulated phis
head(simphi_1a)
simbeta_1a <- simparam_1a[, 2:ncol(simparam_1a)] # Extract the simulated betas
head(simbeta_1a)
```
Now, create a matrix of hypothetical predictors. In this example:

- Assume an average state raised taxes 60 cents starting in 1996.

Moreover, `pgmm` uses covariates in differenced form. Therefore, in the *xhyp* object holding the counterfactuals, we must set all covariates to 0 (no change) in the matrix `x` (which refers to the present). Exceptions:

1. changes in the covariates of interest;
2. time dummies, which aren't differenced.

We can ignore the state fixed effects for now and add them later, because the model is linear.
```{r}
# Forecast for 3 years from 1996 to 1998
periods.out <- 3
# Make matrix of hypothetical x's: covariates
xhyp_1a <- cfMake(formula=formula_1a,
data=datayearfe,
nscen=periods.out)
#With mean packpc, income, and price for the forecast period
# make every element in 'x' matrix be zero
xhyp_1a$x <- xhyp_1a$xpre <- 0 * xhyp_1a$x
# change the value in the predictor of interest (again, 'x' matrix)
xhyp_1a <- cfChange(xhyp_1a,
covname="avgprs95",
x=60, # only change avgprs to 60 (cents)
scen=1)
# Create baseline scenario
xbase_1a <- xhyp_1a
xbase_1a$x <- xbase_1a$xpre
# We need the lagged change in the outcome (packs per capita)
lagY_1a <- NULL
# Hypothetical previous change in Y for simulation
for (i in 1:length(pgmm_1a$model)){ #For 1 to 48
lagY_1a <- c(lagY_1a, as.data.frame(pgmm_1a$model[[i]])["1995",]$packpc)
}
#Hypothetical change in packpc for each state in 1995
#Find the mean of these hypothetical previous changes
lagY_1a <- mean(lagY_1a, na.rm=TRUE)
# Hypothetical initial level of Y for simulation
initialY <- mean(pdata$packpc[pdata$year==1995], na.rm=TRUE) #The mean of packpc in 1995
```
Now that we have everything that we need for the simulation, we can use `simcf::ldvsimev`, `simcf::ldvsimfd` and `simcf::ldvsimrr` to estimate the quantities of interest.
```{r}
# Simulate expected values of Y (on original level scale)
# out to periods.out given hypothetical future values of X,
# initial lags of the change in Y, and an initial level of Y
sim_ev1a <- ldvsimev(xhyp_1a, # The matrix of hypothetical x's
simbeta_1a, # The matrix of simulated betas
ci=0.95, # Desired confidence interval
constant=NA, # NA indicates no constant!
phi=simphi_1a, # estimated AR parameters; length must match lagY
lagY=lagY_1a, # lags of y, most recent last
transform="diff", # "log" to undo log transformation,
# "diff" to undo first differencing
# "difflog" to do both
initialY=initialY # for differenced models, the lag of the level of y
)
# Simulate expected values of Y given no change in covariates
sim_base1a <- ldvsimev(xbase_1a, # The matrix of hypothetical x's
simbeta_1a, # The matrix of simulated betas
ci=0.95, # Desired confidence interval
constant=NA, # NA indicates no constant!
phi=simphi_1a, # estimated AR parameters; length must match lagY
lagY=lagY_1a, # lags of y, most recent last
transform="diff", # "log" to undo log transformation,
# "diff" to undo first differencing
# "difflog" to do both
initialY=initialY # for differenced models, the lag of the level of y
)
# Simulate first differences in y
# out to periods.out given hypothetical future values of x, xpre,
# and initial lags of the change in y
sim_fd1a <- ldvsimfd(xhyp_1a, # The matrix of hypothetical x's
simbeta_1a, # The matrix of simulated betas
ci=0.95, # Desired confidence interval
constant=NA, # Column containing the constant
# set to NA for no constant
phi=simphi_1a, # estimated AR parameters; length must match lagY
lagY=lagY_1a, # lags of y, most recent last
transform="diff" # Model is differenced
#initialY=initialY # Redundant in this case (fd of linear differenced Y)
)
# Simulate relative risks in y
# out to periods.out given hypothetical future values of x, xpre,
# and initial lags of the change in y
sim_rr1a <- ldvsimrr(xhyp_1a, # The matrix of hypothetical x's
simbeta_1a, # The matrix of simulated betas
ci=0.95, # Desired confidence interval
constant=NA, # Column containing the constant
# set to NA for no constant
phi=simphi_1a, # estimated AR parameters; length must match lagY
lagY=lagY_1a, # lags of y, most recent last
transform="diff", # Model is differenced
initialY=initialY # Required for differenced Y in ldvsimrr
)
```
Now we are going to plot the simulations using Chris's `tile` package and the custom function `cigLineplots`. The latter is already saved in the **TS_code.R** script, in the source folder.
```{r}
# Make plots of expected values, first differences, and percent changes
# using custom tile code in helperCigs.R
# Hypothetical initial level of Y for simulation
initialY <- mean(pdata$packpc[pdata$year==1995], na.rm=TRUE)
# axis limits
limitsEV <- c(50, 105)
limitsFD <- c(-40, 5)
limitsRR <- c(-40, 5)
# Model 1a Lineplot: Packs
cigLineplots(sim_ev1a, sim_base1a, sim_fd1a, sim_rr1a,
limitsEV, limitsFD, limitsRR, initialY,
main="Cigarette Taxes & Consumption: 1a. Linear Difference GMM, Individual Effects",
file=NULL)
# Save pdf output
cigLineplots(sim_ev1a, sim_base1a, sim_fd1a, sim_rr1a,
limitsEV, limitsFD, limitsRR, initialY,
main="Cigarette Taxes & Consumption: 1a. Linear Difference GMM, Individual Effects",
file="output/m1aEVFDRR")
```
### What if we include year FE dummies?
```{r}
# Recall model 1c: packpc ~ lag(packpc, 1) + income95pc + avgprs95 | lag(packpc, 2:99)
# Difference GMM with state and year fixed effects
# Simulate parameters
simparam_1c <- mvrnorm(sims, coefficients(pgmm_1c), vcovHC(pgmm_1c)) #Sample parameters
simphi_1c <- simparam_1c[,1, drop = FALSE] #Extract the phis
head(simphi_1c)
simbeta_1c <- simparam_1c[,2:ncol(simparam_1c)] #Extract the betas
head(simbeta_1c)
# Make matrix of hypothetical x's:
# Assume an average state raised taxes 60 cents starting 1996
#
# Issues -- we need to somehow include the state and year FEs:
# Let's set the state to be an "average" state in 1995,
# and year to be like the last year (1995)
# Make matrix of hypothetical x's: covariates
xhyp_1c <- cfMake(formula=formula_1c,
data=datayearfe,
nscen=periods.out) #Including the year fixed effects
# pgmm uses covariates in differenced form
# so we want most of them to be 0 (no change)
# exceptions:
# (1) changes in covariates of interest
# (2) differenced time dummies require special care
xhyp_1c$x <- xhyp_1c$xpre <- 0*xhyp_1c$x
xhyp_1c <- cfChange(xhyp_1c, "avgprs95", x=60, scen=1) #Assume tax is raised 60 cents in 1996
xhyp_1c
# We can "ignore" the state fixed effects for now and add them later
# because the model is linear
# Create baseline scenario
xbase_1c <- xhyp_1c
xbase_1c$x <- xbase_1c$xpre
xbase_1c
```
```{r}
# We need the lagged change in the outcome (packs per capita)
lagY_1c <- NULL # Hypothetical previous change in Y for simulation
for (i in 1:length(pgmm_1c$model)){
lagY_1c <- c(lagY_1c, as.data.frame(pgmm_1c$model[[i]])["1995",]$packpc)
}
#Store change in packpc 1995 for each state
lagY_1c <- mean(lagY_1c, na.rm=TRUE)
#Find the mean for all packpc changes in 1995
# Hypothetical initial level of Y for simulation
initialY <- mean(pdata$packpc[pdata$year==1995], na.rm=TRUE)
# Set the initial mean value of packpc in 1995 across states
# Simulate expected values of Y (on original level scale)
# out to periods.out given hypothetical future values of X,
# initial lags of the change in Y, and an initial level of Y
sim_ev1c <- ldvsimev(xhyp_1c, # The matrix of hypothetical x's
simbeta_1c, # The matrix of simulated betas
ci=0.95, # Desired confidence interval
constant=NA, # NA indicates no constant!
phi=simphi_1c, # estimated AR parameters; length must match lagY
lagY=lagY_1c, # lags of y, most recent last
transform="diff", # "log" to undo log transformation,
# "diff" to undo first differencing
# "difflog" to do both
initialY=initialY # for differenced models, the lag of the level of y
)
# Simulate expected values of Y given no change in covariates
sim_base1c <- ldvsimev(xbase_1c, # The matrix of hypothetical x's
simbeta_1c, # The matrix of simulated betas
ci=0.95, # Desired confidence interval
constant=NA, # NA indicates no constant!
phi=simphi_1c, # estimated AR parameters; length must match lagY
lagY=lagY_1c, # lags of y, most recent last
transform="diff", # "log" to undo log transformation,
# "diff" to undo first differencing
# "difflog" to do both
initialY=initialY # for differenced models, the lag of the level of y
)
# Simulate first differences in y
# out to periods.out given hypothetical future values of x, xpre,
# and initial lags of the change in y
sim_fd1c <- ldvsimfd(xhyp_1c, # The matrix of hypothetical x's
simbeta_1c, # The matrix of simulated betas
ci=0.95, # Desired confidence interval
constant=NA, # Column containing the constant
# set to NA for no constant
phi=simphi_1c, # estimated AR parameters; length must match lagY
lagY=lagY_1c, # lags of y, most recent last
transform="diff", # Model is differenced
#initialY=initialY # Redundant in this case (fd of linear differenced Y)
)
# Simulate relative risks in y
# out to periods.out given hypothetical future values of x, xpre,
# and initial lags of the change in y
sim_rr1c <- ldvsimrr(xhyp_1c, # The matrix of hypothetical x's
simbeta_1c, # The matrix of simulated betas
ci=0.95, # Desired confidence interval
constant=NA, # Column containing the constant
# set to NA for no constant
phi=simphi_1c, # estimated AR parameters; length must match lagY
lagY=lagY_1c, # lags of y, most recent last
transform="diff", # Model is differenced
initialY=initialY # Required for differenced Y in ldvsimrr
)
# Model xhyp_1c Lineplot: Packs
cigLineplots(sim_ev1c, sim_base1c, sim_fd1c, sim_rr1c,
limitsEV, limitsFD, limitsRR, initialY,
main="Cigarette Taxes & Consumption: 1c. Linear Difference GMM, Two-way Effects",
file=NULL
)
# save output
cigLineplots(sim_ev1c, sim_base1c, sim_fd1c, sim_rr1c,
limitsEV, limitsFD, limitsRR, initialY,
main="Cigarette Taxes & Consumption: 1c. Linear Difference GMM, Two-way Effects",
file="output/m1cEVFDRR"
)
```