If you are uncertain about the appearance of a specific dynamic process, it is advisable to simulate it.

Here, we use Chris’s `makeTS()` function to simulate various dynamic processes for exploration. We wrap its output in `ts()` and pass it to the custom function `plot_time()` I have created to directly visualize the time series plot, its **autocorrelation function** (ACF) and **partial autocorrelation function** (PACF), along with the Q-statistic test for **white noise**.

It is important to note that in the **Q-test** (Box-Pierce), the null hypothesis \(H_0\) posits that the time series follows a **white noise** process, while the alternative hypothesis \(H_1\) posits that the time series **has autocorrelation**.

Therefore, if we reject the null hypothesis in favor of the alternative, \(Q: p\text{-value}\leq 0.10\), then we have evidence that the time series or its residuals still have dynamic processes left to model. In contrast, if we fail to reject the null hypothesis, \(Q: p\text{-value} > 0.10\), then we have evidence that our time series follows white noise and no remaining dynamics should confound our analyses.
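As a sketch of how the test behaves, we can apply base R's `Box.test()` to two simulated series (the seed and the AR coefficient below are arbitrary choices for illustration):

```
# Q-test behavior: white noise vs. an autocorrelated AR(1) series
set.seed(123)  # for reproducibility
wn  <- rnorm(100)                                  # pure white noise
ar1 <- arima.sim(model = list(ar = 0.6), n = 100)  # AR(1) process

# H0: white noise; a small p-value rejects H0 (autocorrelation present)
Box.test(wn,  lag = 1, type = "Box-Pierce")  # p-value typically large
Box.test(ar1, lag = 1, type = "Box-Pierce")  # p-value typically tiny
```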

In your simulations, remember to use `set.seed()` to replicate the same simulations (below, I am not setting any seed).
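For instance, a minimal illustration of why setting a seed matters (the seed value is arbitrary):

```
# The same seed reproduces exactly the same random draws
set.seed(42)
x1 <- rnorm(5)
set.seed(42)
x2 <- rnorm(5)
identical(x1, x2)  # TRUE: the two simulations are identical
```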

```
# What does a time trend process look like?
plot_time(ts(makeTS(n=100, trend=0.02)))
```

```
##
## Box-Pierce test
##
## data: df
## X-squared = 7.6741, df = 1, p-value = 0.005602
```

```
# What does an AR process look like?
plot_time(ts(makeTS(n=100, ar=0.6)))
```

```
##
## Box-Pierce test
##
## data: df
## X-squared = 50.122, df = 1, p-value = 1.445e-12
```

```
# What does an MA process look like?
plot_time(ts(makeTS(n=100, ma=0.8)))
```

```
##
## Box-Pierce test
##
## data: df
## X-squared = 30.442, df = 1, p-value = 3.44e-08
```

```
# What about additive seasonality?
plot_time(ts(makeTS(n = 100,
                    seasons = c(0, 4, 0,
                                0, 0, 6,
                                0, 0, 0,
                                -3, 0, 0),
                    adjust = "level")))
```

```
##
## Box-Pierce test
##
## data: df
## X-squared = 0.13482, df = 1, p-value = 0.7135
```

```
# What does an ARMA process look like?
plot_time(ts(makeTS(n=100, ar=0.6, ma=0.8)))
```

```
##
## Box-Pierce test
##
## data: df
## X-squared = 58.217, df = 1, p-value = 2.343e-14
```

For the initial twelve time series (A-L), each time series’ data-generating process consists of at most one dynamic component. Hence, you can initially infer the process by visually inspecting the plot, autocorrelation function (ACF), and partial autocorrelation function (PACF).

If you are unsure of what is going on, you can employ the
**Box-Jenkins diagnostics** method to systematically
identify and isolate the dynamic components until the residual series
exhibit characteristics of white noise. Remember:

1- Visualize the time series, using `plot()` or `decompose()`.

2- If a deterministic trend or additive seasonality seems to be present, you can verify it by fitting `time()` or `decompose()$seasonal` in `lm()`, and checking the residuals.

3- If neither a trend nor additive seasonality seems to be present, check for AR or MA processes, but be sure to first adjust for trends and/or seasonality; otherwise it may be difficult to identify what’s going on.

4- If you are still unsure, narrow down two or three potential candidates and estimate them using `arima()`. Select the models that provide the lowest **AIC**.

5- If, in any of the above steps, the Q-statistic in `Box.test()` provides evidence of **white noise**, then very likely you have found the best candidate model of the underlying DGP.

6- If you have more than one final candidate, then use the most
**parsimonious** specification. That is, the specification
that estimates fewer parameters but returns white noise.
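The first steps of this workflow can be sketched on a simulated series (the trend slope and AR coefficient below are arbitrary choices for illustration):

```
# Steps 1-2: remove a deterministic trend, then inspect the residuals
set.seed(1)
y <- ts(0.05 * (1:100) + arima.sim(model = list(ar = 0.5), n = 100))

trend_fit <- lm(y ~ time(y))          # step 2: regress the series on time
res       <- ts(residuals(trend_fit))

# Steps 3 and 5: the detrended residuals still show dynamics (an AR component)
Box.test(res, lag = 1, type = "Box-Pierce")

# Step 4: compare a few candidate specifications by AIC
fits <- list(ar1 = arima(res, order = c(1, 0, 0)),
             ma1 = arima(res, order = c(0, 0, 1)))
sapply(fits, AIC)                     # pick the model with the lowest AIC
```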

```
# load data
df <- read_csv(file="data/mysterytsUW.csv")
# time series: C
df_1c <- df %>% select(c) %>% ts(frequency = 12)
plot_time(df_1c)
```

```
##
## Box-Pierce test
##
## data: df
## X-squared = 11.868, df = 1, p-value = 0.0005712
```

```
# time series: D
df_1d <- df %>% select(d) %>% ts(frequency = 12)
plot_time(df_1d)
```

```
##
## Box-Pierce test
##
## data: df
## X-squared = 25.41, df = 1, p-value = 4.634e-07
```

```
# time series: G
df_1g <- df %>% select(g) %>% ts(frequency = 12)
plot_time(df_1g)
```

```
##
## Box-Pierce test
##
## data: df
## X-squared = 59.936, df = 1, p-value = 9.77e-15
```

```
# time series: H
df_1h <- df %>% select(h) %>% ts(frequency = 12)
plot_time(df_1h)
```

```
##
## Box-Pierce test
##
## data: df
## X-squared = 21.993, df = 1, p-value = 2.736e-06
```

```
# time series: I
df_1i <- df %>% select(i) %>% ts(frequency = 12)
plot_time(df_1i)
```

```
##
## Box-Pierce test
##
## data: df
## X-squared = 3.7496, df = 1, p-value = 0.05282
```

`# in this last one, the Q-test is spurious. Look at the visualizations.`

The six remaining time series, M-R, could have up to two components each. These were more difficult to identify by eyeballing alone.

```
# time series: Q
df_1q <- df %>% select(q) %>% ts(frequency = 12)
plot_time(df_1q)
```

```
##
## Box-Pierce test
##
## data: df
## X-squared = 10.661, df = 1, p-value = 0.001094
```

```
# could it be AR(2)? or MA(2)? or ARMA(1,1)?
res1 <- arima(df_1q, order = c(2,0,0))
res2 <- arima(df_1q, order = c(0,0,2))
res3 <- arima(df_1q, order = c(1,0,1))
stargazer(res1, res2, res3,
          type = "text")
```

```
##
## ===============================================
## Dependent variable:
## -----------------------------
## df_1q
## (1) (2) (3)
## -----------------------------------------------
## ar1 0.257*** 0.763***
## (0.097) (0.127)
##
## ar2 0.294***
## (0.099)
##
## ma1 0.165 -0.459***
## (0.103) (0.163)
##
## ma2 0.352***
## (0.100)
##
## intercept 0.211 0.222 0.213
## (0.210) (0.146) (0.217)
##
## -----------------------------------------------
## Observations 100 100 100
## Log Likelihood -137.752 -139.042 -139.119
## sigma2 0.918 0.942 0.944
## Akaike Inf. Crit. 283.505 286.083 286.238
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
```

`plot_time(res1$residuals)`

```
##
## Box-Pierce test
##
## data: df
## X-squared = 0.092565, df = 1, p-value = 0.7609
```

For the second part, we know that all the series follow an `AR(p)` process. However, some of these series may not be stationary.

If you were uncertain, you could test for a unit root using the Phillips-Perron test via the function `PP.test()`, where the null \(H_0\) states that the time series **has** a unit root.

If the sum of your `AR(p)` estimates is below 1, then you have evidence of stationarity, but if their sum is at or above 1, then your series is non-stationary.
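As a sketch, we can check both criteria on a simulated AR(2) whose true coefficients sum to 0.8 (an arbitrary stationary choice):

```
# Estimate an AR(2) and inspect the sum of the AR coefficients
set.seed(7)
y <- arima.sim(model = list(ar = c(0.5, 0.3)), n = 500)  # 0.5 + 0.3 < 1

fit    <- arima(y, order = c(2, 0, 0))
ar_hat <- fit$coef[c("ar1", "ar2")]

sum(ar_hat)   # below 1 -> evidence of stationarity
PP.test(y)    # H0: unit root; a small p-value rejects H0 (stationary)
```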

```
# load data
df2 <- read_csv(file="data/mysterytsUW2.csv")
# time series: S
df_2s <- df2 %>% select(s) %>% ts(frequency = 12)
plot_time(df_2s)
```

```
##
## Box-Pierce test
##
## data: df
## X-squared = 984.31, df = 1, p-value < 2.2e-16
```

`adf.test(df_2s)`

```
##
## Augmented Dickey-Fuller Test
##
## data: df_2s
## Dickey-Fuller = -1.9746, Lag order = 9, p-value = 0.5891
## alternative hypothesis: stationary
```

`PP.test(df_2s)`

```
##
## Phillips-Perron Unit Root Test
##
## data: df_2s
## Dickey-Fuller = -2.0696, Truncation lag parameter = 7, p-value = 0.5489
```

```
# time series: U
df_2u <- df2 %>% select(u) %>% ts(frequency = 12)
plot_time(df_2u)
```

```
##
## Box-Pierce test
##
## data: df
## X-squared = 910.15, df = 1, p-value < 2.2e-16
```

`(res <- arima(df_2u, order = c(2,0,0)))`

```
##
## Call:
## arima(x = df_2u, order = c(2, 0, 0))
##
## Coefficients:
## ar1 ar2 intercept
## 1.1517 -0.2078 0.4623
## s.e. 0.0309 0.0309 0.5734
##
## sigma^2 estimated as 1.062: log likelihood = -1450.33, aic = 2908.66
```

`plot_time(res$residuals)`

```
##
## Box-Pierce test
##
## data: df
## X-squared = 0.002517, df = 1, p-value = 0.96
```

For Problem Set 2, the most important techniques are unit root test, model estimation, model selection, and counterfactual forecasting.

Up to this point, our focus has primarily been on
**autoregressive** \(AR(p)\) and **moving average**
\(MA(q)\) processes, as the
data-generating processes (DGPs) we’ve encountered have either been
**stationary** or possess **unit roots**.
However, in the case of nonstationary time series, the stationary
process may be integrated \(I(d)\).
Additionally, when faced with the possibility of multiplicative
seasonality, we arrive at a comprehensive definition of dynamic
specification with the *Autoregressive Integrated Moving Average
(ARIMA)* model.

\[\text{ARIMA}(p,d,q)(P,D,Q)_m \]
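In `arima()`, the seasonal part is supplied through the `seasonal` argument. A minimal sketch on simulated monthly data (the orders and seed here are arbitrary choices for illustration):

```
# ARIMA(1,1,0)(0,0,1)_12: nonseasonal (p,d,q) plus seasonal (P,D,Q) with m = 12
set.seed(99)
y <- ts(cumsum(rnorm(120)), frequency = 12)   # a monthly random walk: I(1)

fit <- arima(y,
             order    = c(1, 1, 0),                             # (p, d, q)
             seasonal = list(order = c(0, 0, 1), period = 12))  # (P, D, Q)_m
fit$aic
```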

Recall that stationary time series have three properties:

- **Mean stationarity**: the mean does not depend on \(t\); it is constant over time. The variance also does not depend on \(t\) and is constant over time.
- **Covariance stationarity**: the covariance of \(y_t\) and \(y_{t-1}\) does not depend on \(t\); it does not depend on the actual time at which the covariance is computed.
- **Ergodicity**: sample moments converge in probability to the population moments; the sample mean and variance tend to be the same as those of the entire series.

Nonstationary processes lack these properties. Note that we are abstracting from trends and other covariates, which means that the social processes are a black box in the time dimension.

Why do nonstationary processes matter?

- The ACF and PACF are **not defined**, since the covariances in the numerator depend on \(t\).
- **Spurious regression**: we may detect strong correlations between nonstationary processes although they could be conditionally (mean) independent.
- Long-run forecasts are **unfeasible**, since the process does not revert to its mean.
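The spurious-regression point can be illustrated with two independent random walks (a sketch; the seed and sample size are arbitrary):

```
# Two independent random walks often look strongly "related" in a regression
set.seed(11)
x <- cumsum(rnorm(300))   # random walk 1
y <- cumsum(rnorm(300))   # random walk 2, generated independently of x

summary(lm(y ~ x))$coefficients  # x frequently appears "significant" anyway
```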

How can we treat nonstationary processes?

- Analyze the nonstationary process using **ARIMA** (differencing)
  - effective at capturing short-run changes.
  - the outcome is transformed to a difference.
  - long-run predictions are not feasible.

- Analyze the nonstationary process using cointegration
  - effective at capturing long-run relationships between nonstationary processes.
  - the outcome is left as a level.
  - short-run and long-run predictions are feasible.
  - appropriate for analyzing multiple time series.
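A sketch of the ARIMA (differencing) route: a first difference removes the unit root of an I(1) series (seed and length arbitrary):

```
# Differencing an I(1) random walk yields a stationary series
set.seed(5)
rw   <- ts(cumsum(rnorm(200)))  # nonstationary: contains a unit root
d_rw <- diff(rw)                # first difference

PP.test(rw)    # H0: unit root; p-value typically large here (fail to reject)
PP.test(d_rw)  # p-value tiny: the differenced series is stationary
```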

We will explore non-stationary modeling using the example on presidential approval discussed in topic 4.

```
# clean environment
rm(list = ls())
dev.off()
```

```
## null device
## 1
```

```
# load source
source("source/TS_code.R") # Chris's TS custom functions
# load data
approval <- read_csv(file="data/approval.csv")
attach(approval)
colnames(approval)
```

```
## [1] "month" "year" "approve" "disapprove"
## [5] "unsure" "sept.oct.2001" "iraq.war" "avg.price"
```

As always, the first step is to plot the time series and try to make sense of its underlying DGP.

```
# plot presidential approval
plot(approve,
     type = "l",
     ylab = "Percent Approving",
     xlab = "Time",
     main = "US Presidential Approval")
lines(x = c(8, 8),
      y = c(-1000, 1000),
      col = "red")
lines(x = c(26, 26),
      y = c(-1000, 1000),
      col = "blue")
text("9/11",
     x = 10, y = 40,
     col = "red", cex = 0.7)
text("Iraq \n War",
     x = 28, y = 40,
     col = "blue", cex = 0.7)
```