Time Series & Linear Regression Review
Tao Lin
April 8, 2022
- Use rstandard() to extract standardized residuals; alternatively, plot(model, which = 3) draws the scale-location plot of standardized residuals directly.
- Use vif() in the car package to calculate the VIF for each coefficient.
- Use geom_errorbar to visualize point estimates with confidence intervals.
- Use vcov() to fetch the variance-covariance matrix in the first place.
- Use mvrnorm() to jointly simulate coefficients from models.
- Use ggplot2 to plot both point estimates and confidence intervals. In geom_errorbar, set y = ev, ymax = upper, and ymin = lower.

Exercise: simulate the expected value of mpg when disp varies, holding the other covariates at their medians.

- Fetch the variance-covariance matrix with vcov, and use MASS::mvrnorm to simulate coefficients 1e4 times. Now we have 1e4 simulated regression lines.
- Compute expected values as disp varies from 70 to 500.
- Summarize over the simulations at each value of disp, and plot the regression line and confidence interval using geom_ribbon in ggplot2.
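The steps above can be sketched as follows. This is a minimal illustration on the built-in mtcars data; the model mpg ~ disp + wt + hp and all object names here are assumptions for illustration, not the section's original code.

```r
library(MASS)

model <- lm(mpg ~ disp + wt + hp, data = mtcars)

# draw 1e4 coefficient vectors from the estimated sampling distribution
set.seed(98105)
sim.coefs <- MASS::mvrnorm(1e4, mu = coef(model), Sigma = vcov(model))

# expected mpg as disp varies from 70 to 500, holding wt and hp at their medians
disp.seq <- seq(70, 500, length.out = 100)
X <- cbind(1, disp.seq, median(mtcars$wt), median(mtcars$hp))
ev <- X %*% t(sim.coefs) # 100 x 1e4 matrix of simulated expected values

plot.df <- data.frame(
  disp  = disp.seq,
  ev    = rowMeans(ev),
  lower = apply(ev, 1, quantile, 0.025),
  upper = apply(ev, 1, quantile, 0.975)
)
head(plot.df)
```

plot.df can then be fed to ggplot2 with geom_ribbon(aes(ymin = lower, ymax = upper)) plus geom_line(aes(y = ev)).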
- Use ts() to coerce a vector or matrix into a time-series (ts) object. Besides the data itself, a ts object includes its time attributes: the start time, the end time, and the frequency (the number of observations per cycle).
- Each univariate ts object corresponds to a column in a matrix. To see why, we can look at the following two examples:

# Monthly unemployment in Maine from January 1996 to August 2006
maine.month.ts <- read.table("Lab2_data/Maine.dat", header = TRUE) |>
ts(start = c(1996, 1), freq = 12)
maine.month.ts # it seems that we transform it to an object with multiple columns
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1996 6.7 6.7 6.4 5.9 5.2 4.8 4.8 4.0 4.2 4.4 5.0 5.0
1997 6.4 6.5 6.3 5.9 4.9 4.8 4.5 4.0 4.1 4.3 4.8 5.0
1998 6.2 5.7 5.6 4.6 4.0 4.2 4.1 3.6 3.7 4.1 4.3 4.0
1999 4.9 5.0 4.6 4.3 3.9 4.0 3.6 3.3 3.1 3.3 3.7 3.7
2000 4.4 4.4 4.1 3.5 3.1 3.0 2.8 2.5 2.6 2.8 3.1 3.0
2001 3.9 4.2 4.0 4.1 3.5 3.5 3.4 3.1 3.4 3.7 4.0 4.0
2002 5.0 4.9 5.0 4.7 4.0 4.2 4.0 3.6 3.7 3.9 4.5 4.6
2003 5.6 5.8 5.6 5.5 4.8 4.9 4.8 4.3 4.5 4.6 4.8 4.7
2004 5.6 5.6 5.5 4.8 4.2 4.3 4.2 3.8 4.0 4.2 4.6 4.6
2005 5.5 5.8 5.5 5.2 4.7 4.6 4.5 4.1 4.4 4.4 4.8 4.6
2006 5.3 5.6 4.9 4.6 4.2 4.4 4.4 3.9
as.matrix(maine.month.ts) |> as.data.frame() # but it is still one column if we transform it back to a data.frame
unemploy
1 6.7
2 6.7
3 6.4
4 5.9
5 5.2
6 4.8
7 4.8
8 4.0
9 4.2
10 4.4
11 5.0
12 5.0
... (rows 13-128 omitted; all 128 monthly values sit in a single unemploy column)
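To see the time attributes a ts object stores, here is a quick sketch. Because this block rebuilds a stand-in series with the same start and frequency as maine.month.ts (rather than reading the data file), the object x is an assumption for illustration:

```r
# a stand-in with the same shape as maine.month.ts: 128 months from Jan 1996
x <- ts(seq_len(128), start = c(1996, 1), frequency = 12)
start(x)     # 1996 1
end(x)       # 2006 8
frequency(x) # 12
```

These three attributes are what ts(start = ..., freq = ...) sets, and what decompose() and stl() later rely on.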
# Electricity (millions of kWh), beer (Ml), and chocolate production (tonnes) in Australia from January 1958 to December 1990
cbe.ts <- read.table("Lab2_data/cbe.dat", header = TRUE) |>
ts(start = 1958, freq = 12)
cbe.ts # this example of multiple time series shows that each column is a single time series
choc beer elec
Jan 1958 1451 96.3 1497
Feb 1958 2037 84.4 1463
Mar 1958 2477 91.2 1648
Apr 1958 2785 81.9 1595
May 1958 2994 80.5 1777
Jun 1958 2681 70.4 1824
Jul 1958 3098 74.8 1994
Aug 1958 2708 75.9 1835
Sep 1958 2517 86.3 1787
Oct 1958 2445 98.7 1699
Nov 1958 2087 100.9 1633
Dec 1958 1801 113.8 1645
... (rows for Jan 1959 through Dec 1990 omitted)
\[y_t = \beta_{0} + t\beta_{1} + e_t\] \[e_{t} \sim \mathcal{N}(0, \sigma^{2}) \]
Each period entails another \(\beta_1\) increase in \(\mathbb{E}(y_t)\)
Time has a purely systematic relationship with y
Once the time series is detrended, it is simply white noise
\[y_t - t\hat{\beta_1} = \hat{\beta_0} + \hat{e_t}\] \[\hat{e_t} \sim \mathcal{N}(0, \hat{\sigma}^2)\]
Simulate a deterministic trend with noise, de-trend the data, and plot the time series.
# Set the slope of the trend
b1 <- 3/2
# Set the intercept
b0 <- 2
# Set the number of periods
n <- 50
t <- seq(0, n) # a sequence of time points from 0 to 50 (n + 1 values)
y <- rep(NA, length(t)) # pre-allocate the time series (one element per time point)
# simulate the data
set.seed(98105)
for (i in seq_along(t)) {
y[i] <- b1 * t[i] + b0 + rnorm(1, 0, 15) # rnorm() adds noise with mean 0 and sd 15
}
y
[1] -38.484116 19.457351 -8.496254 7.344132 35.161226 21.884259
[7] 28.489241 32.197319 30.639978 2.229333 20.555835 34.128072
[13] 30.880365 27.779679 25.442973 31.197895 46.376686 49.433384
[19] 42.117503 44.285260 11.395613 37.742546 25.849821 42.118610
[25] 60.490933 51.781601 26.804279 33.634958 65.498712 23.719921
[31] 65.602853 60.516626 61.438060 33.378171 58.786994 55.863424
[37] 71.158262 85.087314 65.254664 74.966883 58.863447 33.258493
[43] 67.564891 39.195174 50.315585 59.863058 52.050526 59.059821
[49] 74.381470 61.656394 82.192530
Plot the simulated time series before de-trending.
Now de-trend the time series.
Suppose we don't know the true trend; we can use OLS to estimate it.

As we learned in lecture, this is caused by the high leverage of the starting and ending points of a time series. It can also be caused by selection bias if we truncate the time series.

But setting selection bias aside, we can simply use least squares to estimate the model. Longer time series yield better estimates!
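A minimal sketch of the OLS approach, reusing the simulated series (regenerated here with the same seed so the block is self-contained):

```r
set.seed(98105)
t <- seq(0, 50)
y <- 2 + (3/2) * t + rnorm(length(t), 0, 15) # same data as simulated above

trend.fit <- lm(y ~ t) # estimate the deterministic trend by OLS
coef(trend.fit) # the slope estimate (about 1.2 with this seed) is within sampling error of the true b1 = 1.5
y.detrended <- resid(trend.fit)
plot.ts(y.detrended) # what remains should look like white noise
```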
Besides using ACF and PACF to detect seasonality, we can use decompose() or stl() to extract seasonality from a time series. The latter is known as STL decomposition (Seasonal and Trend decomposition using Loess). It assumes that a time series can be decomposed either additively or multiplicatively:

\[ y_t = T_t + S_t + e_t \quad \text{(additive)} \]

\[ y_t = T_t \times S_t \times e_t \quad \text{(multiplicative)} \]

where \(T_t\) is the trend, \(S_t\) the seasonal component, and \(e_t\) the remainder.
Now let's compare and re-examine two previous examples: Australian beer production and Airline Passenger numbers.
The seasonal variation looks constant; it doesn’t change very much when the time series value increases. We should use the additive model.
It seems that as the time series increases in magnitude, the seasonal variation increases as well. Here we should use a multiplicative model.
Let’s check the beer example.
decompose() and stl() automatically use the pre-specified frequency of the ts object to extract seasonality. In the beer example, the frequency is 12, so decompose() assumes a 12-month cycle when extracting seasonality. If you suspect a different cycle length, re-create the ts object with a new frequency and then apply decompose() or stl() to extract seasonality based on that frequency.

# or you can extract seasonality from scratch - this is what decompose() or stl() actually does to extract seasonality
beer.detrend <- lm(aus.beer ~ time(aus.beer)) |> resid()
beer.detrend |>
matrix(ncol = 12, byrow = TRUE) |> # create a matrix in which each column denotes a month
colMeans() |> # calculate the average value of each month over years
rep(33) |> # repeat the cycle - there are 33 years in the time series
as.ts() |>
plot() # we get the same seasonal pattern in decompose() and stl()
We need to use lm() to extract the deterministic trend and then use decompose() or stl() to extract the seasonal pattern. After de-trending and removing seasonality, we can use acf and pacf to determine whether there is an autoregressive process in the remaining part of the time series.
We can directly examine the remainder from decompose() in ACF and PACF plots. However, because decompose() uses LOESS to fit the trend, it is difficult to recover the slope of the deterministic trend in our hypothesized functional form of the time series. Alternatively, we can use lm() to de-trend the time series:

beer.detrend <- lm(aus.beer ~ time(aus.beer)) |> resid() |> ts(freq = 12)
beer.stl <- stl(beer.detrend, s.window = 12)
pacf(beer.detrend - beer.stl$time.series[, "seasonal"])
The two PACF plots differ because the first code block fits a LOESS curve to extract the trend, whereas the second fits a straight regression line. Again, if we want to recover the functional form of the time series, the linear assumption is more convenient.
If the time series has a multiplicative pattern, we can take a log and then remove the trend and seasonality; subtracting on the log scale corresponds to dividing on the original scale.
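For instance, here is a sketch assuming the Airline Passengers series, which ships with base R as AirPassengers:

```r
AP <- AirPassengers # monthly airline passenger counts, 1949-1960
AP.log <- log(AP)   # the log turns the multiplicative pattern into an additive one
plot(decompose(AP.log)) # an additive decomposition is now appropriate
```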
After de-trending and removing seasonality from the time series, we can check whether it exhibits some form of serial correlation. In today's section, we will only focus on how to diagnose an autoregressive process.
Remember what population correlation formula looks like:
\[ corr(x,y) = \frac{cov(x,y)}{\sigma_x\sigma_y} = \frac{\overbrace{E[(x-\mu_x)(y-\mu_y)]}^\text{Covariance: measures how much x and y covary}}{\underbrace{\sigma_x\sigma_y}_\text{Standard Deviation: normalizes the degree of covarying}} \]
Remember what (sample) correlation formula looks like:
\[ \widehat{corr}(x,y) = \frac{\sum_{i=1}^n (x_i-\overline{x})(y_i-\overline{y})}{(n-1) s_x s_y} \]
To see why there should be \(n-1\), see Pearson correlation coefficient on Wikipedia.
Suppose we want to investigate the population of the time series; that is, we assume the number of periods is infinite. The population autocorrelation at lag \(k\), denoted \(\rho_k\), is \(corr(y_t, y_{t-k})\):

\[ \rho_k = \frac{cov(y_t, y_{t-k})}{\sigma_y^2} \]

Plainly, it measures how much each observation is correlated with the observation lagged by \(k\). Or, equivalently, how much each observation covaries with the observation lagged by \(k\) (the autocovariance, denoted \(\gamma_k\)) relative to the total variance of \(y\). Viewed as a function of \(k\), we call this the autocorrelation function, or ACF.
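As a concrete example (added for intuition; not tied to a particular dataset), a stationary AR(1) process \(y_t = \phi y_{t-1} + e_t\) with \(|\phi| < 1\) has

\[ \rho_k = \phi^k \]

so its autocorrelation decays geometrically in the lag \(k\). This pattern is what "tails off" means when we read ACF plots of AR processes later.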
Since we only observe a finite sample of the time series, we have to estimate the population autocorrelation with the sample autocorrelation.

First, the sample autocovariance, \(c_k\), is calculated as:

\[ c_k = \frac{1}{n}\sum_{t=k+1}^{n}(y_t-\overline{y})(y_{t-k}-\overline{y}) \]

Why does the sum contain only \(n-k\) terms? Because the first \(k\) observations have no lag-\(k\) partner. Consider the following example:
| Period | \(y_t\) | \(y_{t-1}\) | \(y_{t-3}\) |
|---|---|---|---|
| 1 | 0.8 | NA | NA |
| 2 | 7.6 | 0.8 | NA |
| 3 | 3.2 | 7.6 | NA |
| 4 | -5.9 | 3.2 | 0.8 |
| 5 | 8.7 | -5.9 | 7.6 |
| 6 | 6.6 | 8.7 | 3.2 |
Note that \(c_0\) is just the sample variance of \(y\) computed with divisor \(n\). Therefore, the sample autocorrelation, \(r_k\), is:

\[ r_k = \frac{c_k}{c_0} = \frac{\sum_{t=k+1}^{n}(y_t-\overline{y})(y_{t-k}-\overline{y})}{\sum_{t=1}^{n}(y_t-\overline{y})^2} \]

(the common divisor cancels). Let's calculate \(r_1\) with hypothetical data. The manual check below confirms that acf() uses exactly this sample autocorrelation formula to calculate \(r_k\).
set.seed(98105)
y <- rnorm(5, 10, 1) # simulate a time series with 5 periods
ybar <- mean(y)
# manually calculate the sample autocovariances at lag 1 and lag 0 (divisor n = 5)
c_1 <- sum((y[2:5] - ybar) * (y[1:4] - ybar)) / 5
c_0 <- sum((y - ybar)^2) / 5
r_1 <- c_1 / c_0
r_1
[1] -0.3032223
acf(y, plot = FALSE)

Autocorrelations of series 'y', by lag
0 1 2 3 4
1.000 -0.303 0.057 0.148 -0.402
Let's see another example for acf().
acf(wave, plot = FALSE)

Autocorrelations of series 'wave', by lag
0 1 2 3 4 5 6 7 8 9 10
1.000 0.470 -0.263 -0.499 -0.379 -0.215 -0.038 0.178 0.269 0.130 -0.074
11 12 13 14 15 16 17 18 19 20 21
-0.079 0.029 0.070 0.063 -0.010 -0.102 -0.125 -0.109 -0.048 0.077 0.165
22 23 24 25
0.124 0.049 -0.005 -0.066
We call this plot a correlogram.

To estimate the population ACF, we use the sample ACF. We assume that the sampling distribution of \(r_k\) is approximately normal, with mean \(-1/n\) and variance \(1/n\). Therefore, the 95% confidence bounds for testing the null hypothesis \(\rho_k = 0\) (the blue dashed lines in the ACF plot) are:
\[ \frac{-1}{n}\pm\frac{1.96}{\sqrt{n}} \]
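For example, with \(n = 50\), the bounds can be computed directly (a quick sketch):

```r
n <- 50
ci <- -1/n + c(-1, 1) * 1.96 / sqrt(n)
round(ci, 3) # -0.297  0.257
```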
Let’s see how ACF behaves when a time series has trend, seasonality, and AR(1) process.
# acf after de-trending
AP.trend <- lm(AP ~ seq_along(AP))
AP.detrend <- AP.trend$residuals
acf(AP.detrend) # now we only have fluctuation
# acf after removing seasonality
AP.remainder <- AP.detrend - decompose(AP)$seasonal
acf(AP.remainder) # less fluctuation now
The acf diminishes as lag increases due to the increasing trend, and the fluctuation is due to the seasonal variation.
Since true white noise simply varies randomly through time, its population ACFs are zero at every nonzero lag (no correlation between the series and its lags). In a sample, we expect about 5% of the estimated ACFs to appear significantly different from zero at the 5% significance level purely by chance.

As this white noise has \(T = 50\), the blue dashed bounds sit around \(\pm 2/\sqrt{50} \approx \pm 0.28\). The statistically significant value at lag 2 is due to sampling variation.
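We can verify the 5% claim by simulation (a sketch; 2000 replications is an arbitrary choice):

```r
set.seed(98105)
n <- 50
# simulate white noise repeatedly and count how often the lag-1
# sample ACF falls outside the 95% bounds centered at -1/n
outside <- replicate(2000, {
  r1 <- acf(rnorm(n), lag.max = 1, plot = FALSE)$acf[2]
  abs(r1 - (-1/n)) > 1.96 / sqrt(n)
})
mean(outside) # close to 0.05
```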
But it is often hard to determine the components of a series based only on ACF, especially when we have trends, seasonal variation or mixed ARMA processes.
Therefore, we also use partial ACF, PACF, which is a measure of correlation between time series observations that are \(k\) units apart, after the correlation at intermediate lags has been controlled for or “partialled” out. In other words, the PACF(\(k\)) is the correlation between \(y_t\) and \(y_{t-k}\) after removing the effect of the intermediate \(y\)’s.
We simply use the pacf() function to draw the PACF correlogram.
PACF can be computed as:

\[ PACF(j) = corr(y_{t-j}, \hat{e}_t) \]

where \(\hat{e}_t\) is the residual from regressing \(y_t\) on the intermediate lags \(y_{t-1}, \dots, y_{t-j+1}\).
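A sketch of the "partialling out" idea: for a simulated AR(1) series (the coefficient 0.6 and length 500 are arbitrary choices), the lag-2 coefficient from regressing \(y_t\) on its first two lags approximately matches PACF(2):

```r
set.seed(98105)
x <- as.numeric(arima.sim(list(ar = 0.6), n = 500))
# regress x_t on x_{t-1} and x_{t-2}; the coefficient on x_{t-2}
# is the lag-2 correlation after partialling out the lag-1 effect
fit <- lm(x[3:500] ~ x[2:499] + x[1:498])
unname(coef(fit)[3])         # close to ...
pacf(x, plot = FALSE)$acf[2] # ... the lag-2 partial autocorrelation
```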
Let’s see how PACF behaves for white noises.
Let’s see how PACF behaves under trend, seasonality, and autoregressive process.
We can easily detect that the Airline Passengers series is an AR(1) process with seasonality, because the ACF tails off while the PACF cuts off after lags 1 and 12. We will talk more about how to use the ACF and PACF to determine whether a time series follows an AR or MA process. To give you a sense, here is a table showing the different ACF and PACF patterns for ARMA models:
| | AR(\(p\)) | MA(\(q\)) | ARMA(\(p,q\)) |
|---|---|---|---|
| ACF | Tails off | Cuts off after lag \(q\) | Tails off |
| PACF | Cuts off after lag \(p\); PACF(\(p\)) \(= \phi_p\) | Tails off | Tails off |
We will go back to this table and discuss more about the practical rules to identify the functional form of time series.
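As a quick illustration of the MA(\(q\)) column (a sketch; the coefficient 0.8 and length 500 are arbitrary choices), the sample ACF of a simulated MA(1) process should cut off after lag 1:

```r
set.seed(98105)
ma1 <- arima.sim(list(ma = 0.8), n = 500)
round(acf(ma1, lag.max = 5, plot = FALSE)$acf[-1], 2)
# the lag-1 value is large (theoretical 0.8 / (1 + 0.8^2) = 0.49);
# lags 2-5 are near zero, i.e., the ACF "cuts off" after lag q = 1
```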
Use arima.sim() to Informally Verify Our Hypothesis

We can use arima.sim() to simulate time series and verify our guesses. For example, what are the differences in ACF and PACF between an AR process with and without a deterministic trend?
set.seed(98105)
# arima.sim() has no arguments for a deterministic trend,
# so simulate the AR(1) part and then add the trend manually
ar <- arima.sim(list(order = c(1, 0, 0), # (AR order, differencing order, MA order)
                     ar = 2/3),          # the coefficient on y_{t-1}
                n = 50)                  # length of the time series
ar.trend <- ar + 10 + (4/5) * seq_along(ar) # intercept 10, slope 4/5
As we discussed before, trend will induce an upward bias in ACF plot. That’s why we want to de-trend a time series in the first place.
Trend can induce slight bias for PACF at \(t=1\), but it does not affect PACF plot very much.
Use arima.sim() to verify your guess.

In conclusion, to inspect any unknown time-series data, we need to draw on all the tools above, including de-trending, seasonal decomposition, ACF and PACF plots, and simulation.
We can use the regweight package to investigate these weights. Below, we instead compute the weights manually with the Jensen replication data:

library(tidyverse)
library(dataverse) # use this package to download their data from dataverse
Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu")
jensen_data <- get_dataframe_by_name(
"jensen-rep.tab",
"10.7910/DVN/29098"
)
y_formula <- Fvar5 ~ regime + var5 + market + lgdppc + gdpgrowt + tradeofg + overallb + generalg + country + d2 + d3
y_model <- lm(y_formula, data = jensen_data)
summary(y_model) # democratic political institutions are associated with higher levels of FDI inflows
Call:
lm(formula = y_formula, data = jensen_data)
Residuals:
Min 1Q Median 3Q Max
-14.2092 -0.3934 -0.0639 0.2688 11.9197
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.780457 6.533424 1.191 0.23389
regime 0.021076 0.010401 2.026 0.04291 *
var5 0.364187 0.023856 15.266 < 2e-16 ***
market -0.554161 0.449721 -1.232 0.21805
lgdppc 0.834303 0.527355 1.582 0.11385
gdpgrowt 0.024384 0.007458 3.269 0.00110 **
tradeofg 0.005536 0.003071 1.803 0.07162 .
overallb -0.023144 0.008423 -2.748 0.00607 **
generalg -0.039327 0.014015 -2.806 0.00508 **
countryAlgeria -0.769471 1.759341 -0.437 0.66191
... (remaining country fixed-effect estimates omitted) ...
countryZimbabwe -0.938322 1.405607 -0.668 0.50452
d2 -0.076561 0.132325 -0.579 0.56296
d3 0.263350 0.206296 1.277 0.20195
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.28 on 1506 degrees of freedom
Multiple R-squared: 0.6137, Adjusted R-squared: 0.5822
F-statistic: 19.45 on 123 and 1506 DF, p-value: < 2.2e-16
d_formula <- regime ~ var5 + market + lgdppc + gdpgrowt + tradeofg + overallb + generalg + country + d2 + d3
d_model <- lm(d_formula, data = jensen_data)
weights <- d_model$residuals^2
# compare how different the nominal sample and the effective sample are
jensen_data %>%
select(-country, -year, -Fvar5, -regime) %>%
apply(., 2, function(x) {
c(mean(x), sd(x),
weighted.mean(x, weights),
sqrt(weighted.mean((x - weighted.mean(x, weights))^2, weights)))
}) %>%
t() %>%
`colnames<-`(c("Nominal Mean", "Nominal SD", "Effective Mean", "Effective SD"))
Nominal Mean Nominal SD Effective Mean Effective SD
var5 1.1752344 1.9642666 1.0396521 1.6495790
market 24.2338195 1.9218543 24.0152792 1.5606882
lgdppc 8.0542249 1.1339495 7.7979652 0.9124630
gdpgrowt 3.4257344 5.1275527 3.1994243 4.9895449
tradeofg 67.4539343 49.3379091 61.0085132 41.3038847
overallb -4.0107131 6.3324700 -3.2134829 5.8089877
generalg 15.6689503 6.3112568 13.1417353 4.9461324
d2 0.4693252 0.4992113 0.3610634 0.4803089
d3 0.3110429 0.4630624 0.3803371 0.4854696
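The effective sample up-weights observations whose treatment varies most after conditioning on the covariates. A minimal simulated sketch (a hypothetical toy setup, not the Jensen data) makes the shift visible:

```r
set.seed(1)
n <- 1e4
x <- rbinom(n, 1, 0.5)                      # binary covariate, nominal mean 0.5
d <- rnorm(n, sd = ifelse(x == 1, 2, 0.5))  # treatment varies far more when x = 1
w <- lm(d ~ x)$residuals^2                  # squared residuals = effective-sample weights
c(nominal = mean(x), effective = weighted.mean(x, w))
```

Because the `x = 1` group has the larger conditional treatment variance, its weights dominate: the effective mean of `x` rises to roughly 0.94 while the nominal mean stays near 0.5.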
jensen_data$weights <- weights
weighted_countries <- jensen_data %>%
select(country, weights) %>%
group_by(country) %>%
summarise(mean_weights = mean(weights)) %>% # average weight for each country over years
mutate(nominal = 1) # all countries are included in nominal sample
world <- map_data("world") %>%
left_join(., weighted_countries, by = c("region" = "country")) %>%
replace_na(list(mean_weights = 0, nominal = 0))
# geographic distribution of nominal sample
ggplot() +
geom_map(data = world, map = world,
aes(x = long, y = lat, map_id = region, fill = factor(nominal))) +
scale_fill_manual("Nominal Sample", values = c("white", "black"))
# geographical distribution of effective sample
ggplot() +
geom_map(data = world, map = world,
aes(x = long, y = lat, map_id = region, fill = mean_weights)) +
scale_fill_continuous("Effective Sample Weight %", limits = c(0, 16), low = "white", high = "black")
\[Y \leftarrow X \leftarrow D\]
D <- rnorm(1e3)
X <- 5 * D + rnorm(1e3)
Y <- 10 * X + rnorm(1e3)
lm(Y ~ D) |> summary() # recovers the total effect of D on Y: 5 * 10 = 50
Call:
lm(formula = Y ~ D)
Residuals:
Min 1Q Median 3Q Max
-38.428 -6.333 0.191 6.528 34.798
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.2348 0.3073 0.764 0.445
D 49.3792 0.2975 165.959 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 9.715 on 998 degrees of freedom
Multiple R-squared: 0.965, Adjusted R-squared: 0.965
F-statistic: 2.754e+04 on 1 and 998 DF, p-value: < 2.2e-16
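The estimated slope is consistent with the implied total effect: substituting the data-generating equations,

\[Y = 10X + \varepsilon_Y = 10(5D + \varepsilon_X) + \varepsilon_Y = 50D + 10\varepsilon_X + \varepsilon_Y,\]

so the total effect of \(D\) on \(Y\) is \(50\), close to the estimate of \(49.38\).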
lm(Y ~ D + X) |> summary() # conditioning on the post-treatment variable X blocks the path D -> X -> Y and masks the effect of D on Y
Call:
lm(formula = Y ~ D + X)
Residuals:
Min 1Q Median 3Q Max
-2.8764 -0.6670 0.0390 0.6896 3.1816
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.02553 0.03149 0.811 0.418
D 0.13366 0.16342 0.818 0.414
X 9.98520 0.03255 306.734 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.9953 on 997 degrees of freedom
Multiple R-squared: 0.9996, Adjusted R-squared: 0.9996
F-statistic: 1.359e+06 on 2 and 997 DF, p-value: < 2.2e-16
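This is exactly what conditioning on \(X\) implies: given \(X\),

\[Y = 10X + \varepsilon_Y, \qquad \varepsilon_Y \perp D,\]

so once \(X\) is held fixed, \(D\) carries no additional information about \(Y\), and its partial coefficient is \(0\) (estimated \(0.134\), not significant).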
\[Y \rightarrow X \leftarrow D\]
D <- rnorm(1e3) # redraw D so that it is independent of the Y simulated above
X <- 5 * D + 10 * Y + rnorm(1e3) # Y carries over from the previous chunk; X is a collider of D and Y
lm(Y ~ D) |> summary() # D and Y are independent, so the coefficient on D is near zero
Call:
lm(formula = Y ~ D)
Residuals:
Min 1Q Median 3Q Max
-180.062 -32.761 0.583 36.685 163.049
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.059 1.641 0.645 0.519
D 2.186 1.625 1.345 0.179
Residual standard error: 51.91 on 998 degrees of freedom
Multiple R-squared: 0.001811, Adjusted R-squared: 0.0008105
F-statistic: 1.81 on 1 and 998 DF, p-value: 0.1788
lm(Y ~ D + X) |> summary() # conditioning on the collider X induces a spurious association between D and Y
Call:
lm(formula = Y ~ D + X)
Residuals:
Min 1Q Median 3Q Max
-0.35055 -0.06536 -0.00104 0.06393 0.32794
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.733e-03 3.088e-03 0.561 0.575
D -5.016e-01 3.060e-03 -163.948 <2e-16 ***
X 1.000e-01 5.954e-06 16796.220 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.09763 on 997 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 1.413e+08 on 2 and 997 DF, p-value: < 2.2e-16
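The estimates match what conditioning on the collider implies algebraically: solving \(X = 5D + 10Y + \varepsilon_X\) for \(Y\),

\[Y = 0.1X - 0.5D - 0.1\varepsilon_X,\]

so holding \(X\) fixed, the partial coefficient on \(D\) is \(-0.5\) and on \(X\) is \(0.1\), exactly the spurious values estimated above.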
\[D \leftarrow U \rightarrow Y\]
\[D \rightarrow Y\]
We want to estimate the direct effect of \(D\) on \(Y\).
U <- rnorm(1e3)
D <- 2 * U + rnorm(1e3)
Y <- 5 * D + 10 * U + rnorm(1e3)
lm(Y ~ D) |> summary() # omitting the confounder U biases the estimate of the direct effect
Call:
lm(formula = Y ~ D)
Residuals:
Min 1Q Median 3Q Max
-16.4206 -3.0485 0.0342 3.2417 13.4430
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.08417 0.15252 -0.552 0.581
D 9.05345 0.06782 133.500 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.822 on 998 degrees of freedom
Multiple R-squared: 0.947, Adjusted R-squared: 0.9469
F-statistic: 1.782e+04 on 1 and 998 DF, p-value: < 2.2e-16
lm(Y ~ D + U) |> summary() # controlling for the confounder U recovers the direct effect of D
Call:
lm(formula = Y ~ D + U)
Residuals:
Min 1Q Median 3Q Max
-3.7377 -0.7171 -0.0568 0.6967 3.1830
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.861e-04 3.264e-02 0.015 0.988
D 4.950e+00 3.194e-02 155.000 <2e-16 ***
U 1.011e+01 7.011e-02 144.215 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.032 on 997 degrees of freedom
Multiple R-squared: 0.9976, Adjusted R-squared: 0.9976
F-statistic: 2.05e+05 on 2 and 997 DF, p-value: < 2.2e-16
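The biased estimate of \(9.05\) matches the omitted-variable-bias formula. With \(D = 2U + \varepsilon_D\), we have \(\operatorname{Cov}(D, U) = 2\) and \(\operatorname{Var}(D) = 5\), so

\[\operatorname{plim}\,\hat{\beta}_D = 5 + 10 \cdot \frac{\operatorname{Cov}(D, U)}{\operatorname{Var}(D)} = 5 + 10 \cdot \frac{2}{5} = 9,\]

while controlling for \(U\) recovers both the direct effect of \(5\) and the confounder effect of \(10\).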
CSSS/POLS 512 Time Series and Panel Data for the Social Sciences
See https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1540-5907.2012.00602.x.
This is the abbreviation of Locally Weighted Scatterplot Smoothing (also called locally weighted smoothing).