Review Problem Set 1

Simulate to understand

If you are uncertain about what a specific dynamic process looks like, simulate it.

Here, we use Chris’s makeTS() function to simulate various dynamic processes for exploration. I wrap its output in ts() and pass it to plot_time(), a custom function I wrote that plots the time series together with its autocorrelation function (ACF) and partial autocorrelation function (PACF), and reports the Q-statistic test for white noise.

It is important to note that in the Q-test (Box-Pierce):

  • the null hypothesis \(H_0\) posits that the time series adheres to a white noise process.

  • the alternative hypothesis \(H_1\) posits that your time series has autocorrelation.

Therefore, if we reject the null hypothesis in favor of the alternative (\(Q\) test \(p\text{-value} \leq 0.10\)), we have evidence that the time series (or its residuals) still contains dynamics left to model. In contrast, if we fail to reject the null hypothesis (\(Q\) test \(p\text{-value} > 0.10\)), we have evidence that the series follows white noise and no remaining dynamics should confound our analyses.

In your simulations, remember to call set.seed() so you can replicate the same draws (below, I am not setting any seed).
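For instance, here is a minimal check of the decision rule above, using base R’s rnorm(), arima.sim(), and Box.test() instead of makeTS() (the series and the seed are purely illustrative):

set.seed(123)                                       # fix the seed so the draws are reproducible
wn  <- ts(rnorm(100))                               # pure white noise
ar1 <- arima.sim(model = list(ar = 0.6), n = 100)   # AR(1) with phi = 0.6

Box.test(wn,  lag = 1, type = "Box-Pierce")         # large p-value: do not reject H0 (white noise)
Box.test(ar1, lag = 1, type = "Box-Pierce")         # small p-value: reject H0 (autocorrelation)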

# What does a time trend process look like?

plot_time(ts(makeTS(n=100, trend=0.02)))

## 
##  Box-Pierce test
## 
## data:  df
## X-squared = 7.6741, df = 1, p-value = 0.005602
# What does an AR process look like?

plot_time(ts(makeTS(n=100, ar=0.6)))

## 
##  Box-Pierce test
## 
## data:  df
## X-squared = 50.122, df = 1, p-value = 1.445e-12
# What does an MA process look like?

plot_time(ts(makeTS(n=100, ma=0.8)))

## 
##  Box-Pierce test
## 
## data:  df
## X-squared = 30.442, df = 1, p-value = 3.44e-08
# What about additive seasonality?

plot_time(ts(makeTS(n=100,
             seasons=c(0,4,0, 
                       0,0,6,
                       0,0,0,
                       -3,0,0), 
             adjust="level")))

## 
##  Box-Pierce test
## 
## data:  df
## X-squared = 0.13482, df = 1, p-value = 0.7135
# What does an ARMA process look like?

plot_time(ts(makeTS(n=100, ar=0.6, ma=0.8)))

## 
##  Box-Pierce test
## 
## data:  df
## X-squared = 58.217, df = 1, p-value = 2.343e-14

Problem Set 1

For the initial twelve time series (A-L), each time series’ data-generating process consists of at most one dynamic component. Hence, you can initially infer the process by visually inspecting the plot, autocorrelation function (ACF), and partial autocorrelation function (PACF).

If you are unsure of what is going on, you can employ the Box-Jenkins diagnostic method to systematically identify and isolate the dynamic components until the residual series exhibits the characteristics of white noise. Remember (a rough code sketch of this workflow follows the list):

1- Visualize the time series, using plot() or decompose().

2- If a deterministic trend or additive seasonality seems to be present, you can verify it by fitting time() or decompose()$seasonal in lm(), and then check the residuals.

3- If neither a trend nor additive seasonality seems to be present, check for AR or MA processes; but be sure to first adjust for any trend and/or seasonality, otherwise it may be difficult to identify what is going on.

4- If you are still unsure, narrow it down to two or three potential candidates and estimate them using arima(). Select the model with the lowest AIC.

5- If at any of the above steps the Q-statistic from Box.test() provides evidence of white noise, then very likely you have found the best candidate model of the underlying DGP.

6- If you have more than one final candidate, use the most parsimonious specification, that is, the one that estimates the fewest parameters while still returning white-noise residuals.
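Here is a rough sketch of steps 2-5, using a placeholder series y (the exact calls you need will depend on the series at hand):

# y is a hypothetical series; this only sketches the workflow in steps 2-5
y <- ts(makeTS(n = 100, ar = 0.5), frequency = 12)

# step 2: regress on time() and inspect the residuals for remaining dynamics
fit_trend <- lm(y ~ time(y))
plot_time(ts(resid(fit_trend), frequency = 12))

# step 4: estimate a few candidate specifications and compare their AIC
AIC(arima(y, order = c(1, 0, 0)),
    arima(y, order = c(0, 0, 1)))

# step 5: Q-test on the residuals of the preferred candidate
Box.test(residuals(arima(y, order = c(1, 0, 0))))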

# load data

df <- read_csv(file="data/mysterytsUW.csv")


# time series: C

df_1c <- df %>% select(c) %>% ts(frequency = 12)

plot_time(df_1c)

## 
##  Box-Pierce test
## 
## data:  df
## X-squared = 11.868, df = 1, p-value = 0.0005712
# time series: D

df_1d <- df %>% select(d) %>% ts(frequency = 12)

plot_time(df_1d)

## 
##  Box-Pierce test
## 
## data:  df
## X-squared = 25.41, df = 1, p-value = 4.634e-07
# time series: G

df_1g <- df %>% select(g) %>% ts(frequency = 12)

plot_time(df_1g)

## 
##  Box-Pierce test
## 
## data:  df
## X-squared = 59.936, df = 1, p-value = 9.77e-15
# time series: H

df_1h <- df %>% select(h) %>% ts(frequency = 12)

plot_time(df_1h)

## 
##  Box-Pierce test
## 
## data:  df
## X-squared = 21.993, df = 1, p-value = 2.736e-06
# time series: I

df_1i <- df %>% select(i) %>% ts(frequency = 12)

plot_time(df_1i)

## 
##  Box-Pierce test
## 
## data:  df
## X-squared = 3.7496, df = 1, p-value = 0.05282
# in this last one, the Q-test is spurious. Look at the visualizations.

The six remaining time series (M-R) could have up to two components each. These are more difficult to identify by eyeballing alone.

# time series: Q

df_1q <- df %>% select(q) %>% ts(frequency = 12)

plot_time(df_1q)

## 
##  Box-Pierce test
## 
## data:  df
## X-squared = 10.661, df = 1, p-value = 0.001094
# could it be AR(2)? or MA(2)? or ARMA(1,1)?

res1 <- arima(df_1q, order = c(2,0,0))
res2 <- arima(df_1q, order = c(0,0,2))
res3 <- arima(df_1q, order = c(1,0,1))

stargazer(res1,res2,res3,
          type = "text")
## 
## ===============================================
##                        Dependent variable:     
##                   -----------------------------
##                               df_1q            
##                      (1)       (2)       (3)   
## -----------------------------------------------
## ar1               0.257***            0.763*** 
##                    (0.097)             (0.127) 
##                                                
## ar2               0.294***                     
##                    (0.099)                     
##                                                
## ma1                           0.165   -0.459***
##                              (0.103)   (0.163) 
##                                                
## ma2                         0.352***           
##                              (0.100)           
##                                                
## intercept           0.211     0.222     0.213  
##                    (0.210)   (0.146)   (0.217) 
##                                                
## -----------------------------------------------
## Observations         100       100       100   
## Log Likelihood    -137.752  -139.042  -139.119 
## sigma2              0.918     0.942     0.944  
## Akaike Inf. Crit.  283.505   286.083   286.238 
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01
plot_time(res1$residuals)

## 
##  Box-Pierce test
## 
## data:  df
## X-squared = 0.092565, df = 1, p-value = 0.7609

For the second part, we know that all the series follow an AR(p) process. However, some of these series may not be stationary.

If you were uncertain, you could test for a unit root with the Phillips-Perron test, PP.test(), where the null hypothesis \(H_0\) states that the time series has a unit root.

As a rough check, if the sum of your estimated AR(p) coefficients is well below 1, you have evidence of stationarity; if their sum is close to or above 1, the series likely has a unit root and is non-stationary.
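For instance, after fitting an AR(2) with arima() (as done for series U below), you can add up the estimated AR coefficients as a quick, informal check; the formal tests remain PP.test() and adf.test():

# y stands in for one of the series in the second data set
fit <- arima(y, order = c(2, 0, 0))
sum(coef(fit)[c("ar1", "ar2")])   # well below 1: stationary; near or above 1: likely a unit root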

# load data

df2 <- read_csv(file="data/mysterytsUW2.csv")


# time series: S

df_2s <- df2 %>% select(s) %>% ts(frequency = 12)

plot_time(df_2s)

## 
##  Box-Pierce test
## 
## data:  df
## X-squared = 984.31, df = 1, p-value < 2.2e-16
adf.test(df_2s)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  df_2s
## Dickey-Fuller = -1.9746, Lag order = 9, p-value = 0.5891
## alternative hypothesis: stationary
PP.test(df_2s)
## 
##  Phillips-Perron Unit Root Test
## 
## data:  df_2s
## Dickey-Fuller = -2.0696, Truncation lag parameter = 7, p-value = 0.5489
# time series: U

df_2u <- df2 %>% select(u) %>% ts(frequency = 12)

plot_time(df_2u)

## 
##  Box-Pierce test
## 
## data:  df
## X-squared = 910.15, df = 1, p-value < 2.2e-16
(res <- arima(df_2u, order = c(2,0,0)))
## 
## Call:
## arima(x = df_2u, order = c(2, 0, 0))
## 
## Coefficients:
##          ar1      ar2  intercept
##       1.1517  -0.2078     0.4623
## s.e.  0.0309   0.0309     0.5734
## 
## sigma^2 estimated as 1.062:  log likelihood = -1450.33,  aic = 2908.66
plot_time(res$residuals)

## 
##  Box-Pierce test
## 
## data:  df
## X-squared = 0.002517, df = 1, p-value = 0.96

Non-Stationary Time Series

For Problem Set 2, the most important techniques are unit root testing, model estimation, model selection, and counterfactual forecasting.

Up to this point, our focus has primarily been on autoregressive \(AR(p)\) and moving average \(MA(q)\) processes, since the data-generating processes (DGPs) we have encountered have either been stationary or possessed unit roots. More generally, a nonstationary time series may be integrated of order \(d\), written \(I(d)\), meaning it must be differenced \(d\) times to become stationary. Adding the possibility of multiplicative seasonality, we arrive at the comprehensive dynamic specification of the Autoregressive Integrated Moving Average (ARIMA) model:

\[\text{ARIMA}(p,d,q)(P,D,Q)_m \]
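In base R, this full specification maps onto the order and seasonal arguments of arima(); the orders below (and the series y) are placeholders, not a recommendation:

# ARIMA(p,d,q)(P,D,Q)_m with base R's arima(); y and the orders shown are illustrative only
arima(y,
      order    = c(1, 1, 1),                             # non-seasonal (p, d, q)
      seasonal = list(order = c(0, 1, 1), period = 12))  # seasonal (P, D, Q) with m = 12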

Recall that stationary time series have three properties:

  1. Mean stationarity
    • mean does not depend on \(t\), constant over time
    • variance also does not depend on \(t\), constant over time
  2. Covariance stationarity
    • the covariance of \(y_t\) and \(y_{t-1}\) does not depend on \(t\)
    • does not depend on the actual time the covariance is computed
  3. Ergodicity
    • sample moments converge in probability to the population moments
    • the sample mean and variance of a subsample tend toward those of the entire series

Nonstationary processes lack these properties. Note that we are abstracting from trends and other covariates, which means that the social processes remain a black box in the time dimension.
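One quick way to see the difference is to compare the sample means of the two halves of a stationary AR(1) and of a random walk (a simulated sketch, not part of the problem set):

set.seed(1)
ar1 <- arima.sim(model = list(ar = 0.5), n = 400)   # stationary AR(1)
rw  <- cumsum(rnorm(400))                           # random walk (unit root)

c(mean(ar1[1:200]), mean(ar1[201:400]))   # roughly equal: the mean does not depend on t
c(mean(rw[1:200]),  mean(rw[201:400]))    # usually far apart: the mean drifts with t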

Why do nonstationary processes matter?

  1. The ACF and PACF are not well defined, since the covariances in the numerator depend on \(t\)

  2. Spurious regression: we may detect strong correlations between nonstationary processes even though they are conditionally (mean) independent (see the sketch after this list).

  3. Long-run forecasts are infeasible, since the process does not revert to its mean.
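The spurious-regression point is easy to see by simulation: regress two independent random walks on each other and the slope will often look highly “significant” (again, just a sketch):

set.seed(42)
x <- cumsum(rnorm(200))   # random walk 1
z <- cumsum(rnorm(200))   # random walk 2, generated independently of x

summary(lm(z ~ x))        # the t-statistic on x is frequently "significant" despite independence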

How can we treat nonstationary processes?

  1. Analyze the nonstationary process using ARIMA (differencing), as sketched after this list
    • effective at capturing short run changes.
    • outcome is transformed to a difference.
    • long-run predictions not feasible.
  2. Analyze nonstationary process using cointegration
    • effective at capturing long run relationships between nonstationary processes.
    • outcome is left as a level.
    • short-run and long-run predictions feasible.
    • appropriate for analyzing multiple time series.
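A minimal sketch of the differencing route (option 1 above), using a simulated random walk:

set.seed(7)
rw <- ts(cumsum(rnorm(200)))

plot_time(diff(rw))               # the first difference should look like white noise
arima(rw, order = c(0, 1, 0))     # equivalently, let arima() do the differencing with d = 1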

We will explore non-stationary modeling using the example discussed in Topic 4 on presidential approval.

# clean environment
rm(list = ls())
dev.off()
## null device 
##           1
# load source
source("source/TS_code.R")    # Chris's TS custom functions


# load data
approval <-  read_csv(file="data/approval.csv")

attach(approval)
colnames(approval)
## [1] "month"         "year"          "approve"       "disapprove"   
## [5] "unsure"        "sept.oct.2001" "iraq.war"      "avg.price"

As always, the first step is to plot the time series and try to make sense of its underlying DGP.

# plot presidential approval
plot(approve,
     type="l",
     ylab="Percent Approving",
     xlab="Time",
     main = "US Presidential Approval")

# red vertical line marking 9/11
lines(x=c(8,8),
      y=c(-1000,1000),
      col="red")

# blue vertical line marking the start of the Iraq War
lines(x=c(26,26),
      y=c(-1000,1000),
      col="blue")

text("9/11",
     x = 10, y = 40, 
     col="red",cex=0.7)

text("Iraq \n War",
     x = 28, y = 40, 
     col="blue",cex=0.7)