How to Look for Help? reprex, Piazza & Stack Overflow
Acknowledgement
The section materials are adapted from the original version made by Daniel Yoo and Inhwan Ko, former TAs for CSSS/POLS 512, Department of Political Science, University of Washington.
Logistics
Labs, Office Hours, and Homework
Lab Sessions: Fri, 1:30-3:20 pm via Zoom (link available on the website)
Covers applications of material from lecture using examples; clarification and extension of lecture material; Q&A for homework and lectures
Materials will be available on the course website, offered as a compact .zip file (thanks to participants’ suggestions today!).
Each section is expected to have slides, a coding exercise, and data files. Solutions to the coding exercises will be released after the section.
Office Hours: Wed 11:00-12:00 pm & Fri 3:20-4:20 pm via Zoom (the same link for lab session)
Available for troubleshooting and specific questions about homework and lecture materials
Homework: 3-4 assignments, due every 2 weeks or so
Using RStudio with R Markdown, with write-up in \(\LaTeX\)
Many packages: tseries, forecast, lmtest, urca, quantmod, etc.
Expectation and Goals
When this course is over, you should be able to do the following (and more):
Identify and understand time series dynamics: seasonality, deterministic trends, moving average processes, autoregressive processes
Distinguish between stationary and nonstationary time series, perform unit root tests, fit ARMA and ARIMA models, use cross validation for model assessment
Analyze multiple continuous time series using vector autoregression, perform cointegration tests, and estimate error correction models for cointegrated time series
Distinguish between random effects, fixed effects, and mixed effects and decide when each of these is appropriate
Understand Nickell bias and use an instrumental variable approach with GMM to address the issue
Perform multiple imputation and in-sample simulations for panel data
Expectation and Goals
The course moves fast: you should be comfortable doing the following for the homework assignments and project
tidying and transforming data, especially time series and panel data
importing and exporting datasets
generating plots of your data and results
writing basic functions and loops for repeated procedures
Fortunately, for those of you new to R, there are many resources to get you up to speed
library(reprex)
library(datapasta)

tribble_paste(mtcars) # if you have a small data.frame, you can use it

reprex(
  {
    f <- function(x) x + 1 # create f
    g <- function(x) f(x)  # create g
    g("a")
    ggplot()
  },
  session_info = TRUE, # include information about your computer and current R session (important!)
  style = TRUE,        # make your code prettier
  venue = "slack"      # choose slack style
)
Data Wrangling
Let’s refresh some basic packages and functions for wrangling data:
Loading the packages:
library(haven)     # for loading datasets; library(foreign) is an alternative
library(tidyverse) # for data wrangling
library(ggplot2)   # for data visualization; already integrated in tidyverse
Loading the data:
data <- read.csv("Lab1_data/Lab1data.csv", header = TRUE)
# if NAs are coded as spaces, add an argument: na.strings = " "
# if NAs are coded as, say, 77, add an argument: na.strings = "77"
# using read_csv() will load the data as a tibble

# How to view the data?
head(data, n = 10) # view the first 10 rows
country Year GDP.per.capita.PPP.current.international polity2
1 Antigua and Barbuda 2000 12345.82 NA
2 Antigua and Barbuda 2001 12654.92 NA
3 Antigua and Barbuda 2002 12959.93 NA
4 Antigua and Barbuda 2003 13699.04 NA
5 Antigua and Barbuda 2004 14866.37 NA
6 Antigua and Barbuda 2005 15791.64 NA
7 Antigua and Barbuda 2006 18234.80 NA
8 Antigua and Barbuda 2007 20324.14 NA
9 Antigua and Barbuda 2008 20565.86 NA
10 Antigua and Barbuda 2009 18778.49 NA
data$country |> unique() |> length() # view the total number of countries

# R 4.1 introduced the native pipe |>; without it we would write nested code like this:
# length(unique(data$country)) # very annoying when we deal with many parentheses
# anyway, we have N = 174, T = 11 (2000-2010), GDP per capita, polity
# It's panel data!
Data Wrangling
Change the name of variables:
rawdata <- data # save our raw data before touching it
colnames(data)

# Honestly, "GDP.per.capita.PPP.current.international" is too long.
# How do we fix this?
colnames(data) <- c("country", "Year", "income", "polity2")
colnames(data)
[1] "country" "Year" "income" "polity2"
# or try this
data <- rawdata
data <- rawdata %>% # this pipe operator was introduced by magrittr; already integrated in tidyverse
  rename("income" = "GDP.per.capita.PPP.current.international")
colnames(data)
[1] "country" "Year" "income" "polity2"
Data Wrangling
Subset the data
## only include year = 2000
data2000 <- subset(data, subset = (Year == 2000))
## or
data2000 <- data[data$Year == 2000, ]
## or
data2000 <- data %>% filter(Year == 2000)
head(data2000, n = 10)
country Year income polity2
1 Antigua and Barbuda 2000 12345.817 NA
2 Afghanistan 2000 NA -7
3 Albania 2000 4259.308 5
4 Algeria 2000 5395.332 -3
5 Andorra 2000 NA NA
6 Angola 2000 2277.127 -3
7 Argentina 2000 9112.116 8
8 Armenia 2000 2033.462 5
9 Aruba 2000 NA NA
10 Azerbaijan 2000 2207.057 -7
Data Wrangling
Summarize the data
summary(data)
country Year income polity2
Length:1914 Min. :2000 Min. : 219.2 Min. :-10.000
Class :character 1st Qu.:2002 1st Qu.: 1625.0 1st Qu.: -4.000
Mode :character Median :2005 Median : 4299.2 Median : 5.000
Mean :2005 Mean : 7874.9 Mean : 2.431
3rd Qu.:2008 3rd Qu.: 9818.6 3rd Qu.: 8.000
Max. :2010 Max. :91712.3 Max. : 10.000
NA's :373 NA's :542
# if you only care about income and polity2, you can do this
data.mean <- sapply(data[, 3:4], mean, na.rm = TRUE) # sapply() applies mean() to each column and simplifies the result to a vector
data.sd <- sapply(data[, 3:4], sd, na.rm = TRUE)
rbind(data.mean, data.sd)
income polity2
data.mean 7874.852 2.430758
data.sd 10194.125 6.413676
quantile(data$income, probs = .25, na.rm = TRUE) # view the 25% quantile of income
25%
1624.957
quantile(data$income, probs = seq(0, 1, .25), na.rm = TRUE) # view quartiles; use seq() to create an evenly spaced sequence
# create a threshold for high- and low-income countries
data2000$group <- NA
data2000$group[data2000$income > mean(data2000$income, na.rm = TRUE)] <- "High"
data2000$group[data2000$income <= mean(data2000$income, na.rm = TRUE)] <- "Low"
data2000$group <- as.factor(data2000$group) # make it a "factor" variable

# look at the distribution of income for each group
p3 <- ggplot(na.omit(data2000), aes(group, income, color = group))
p3 + geom_boxplot() # boxplot
Time Series Data - Unemployment in Maine
# Acquire the data
# Monthly unemployment in Maine from January 1996 to August 2006
maine.month <- read.table("Lab1_data/Maine.dat", header = TRUE)

# Attach the object and check its class (i.e. the type of data structure)
attach(maine.month)
class(maine.month)
[1] "data.frame"
# Monthly unemployment data
head(maine.month, n = 10)
# use ts() to create a time series object
# try help(ts) or ?ts to see what this function does
maine.month.ts <- ts(
  unemploy,
  start = c(1996, 1), # the series starts in January 1996
  freq = 12           # 12 observations per unit of time, so the sampling interval is 1/12 year
)
class(maine.month.ts)
# Find the mean unemployment per year
maine.annual.ts <- aggregate(maine.month.ts) / 12
maine.annual.ts
Time Series:
Start = 1996
End = 2005
Frequency = 1
[1] 5.258333 5.125000 4.508333 3.950000 3.275000 3.733333 4.341667 4.991667
[9] 4.616667 4.841667
# what does aggregate() do?
# ?aggregate tells us that it sums data by subsets
# for a "ts" object, it will split and sum the data by year
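As a sketch of this behavior on a toy monthly series (made-up values, not the Maine data), note that aggregate() also accepts a FUN argument, so the divide-by-12 step above can be replaced by FUN = mean:

```r
# Toy monthly series: two years of the values 1..12
x <- ts(rep(1:12, 2), start = c(2000, 1), frequency = 12)

aggregate(x)             # annual sums: 78 78 (default FUN = sum)
aggregate(x, FUN = mean) # annual means: 6.5 6.5 (same as aggregate(x) / 12)
```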
Time Series Data - Unemployment in Maine
plot(maine.month.ts, ylab = "Unemployed (%)", main = "Monthly Unemployment Rate in Maine")
How would you describe the pattern of unemployment?
Time Series Data - Unemployment in Maine
plot(maine.annual.ts, ylab = "Unemployed (%)", main = "Annual Unemployment Rate in Maine")
How would you describe the pattern of unemployment?
Time Series Data - Unemployment in Maine
# Find unemployment rates for February and August
maine.feb <- window(maine.month.ts, start = c(1996, 2), freq = 1)
maine.aug <- window(maine.month.ts, start = c(1996, 8), freq = 1)

# Find the ratio of mean unemployment in Feb and August versus the grand mean
feb.ratio <- mean(maine.feb) / mean(maine.month.ts)
aug.ratio <- mean(maine.aug) / mean(maine.month.ts)
# what does window() do?
# what does `feb.ratio <- mean(maine.feb) / mean(maine.month.ts)` do?
Time Series Data - Unemployment in Maine
# Find unemployment rates for February and August
maine.feb <- window(
  maine.month.ts,
  start = c(1996, 2), # start from February 1996
  freq = 1            # sample one observation per year
)
# if you specify freq = 2, the sampling interval becomes 1/2 year.
maine.aug <- window(maine.month.ts, start = c(1996, 8), freq = 1)

# Find the ratio of mean unemployment in Feb and August versus the grand mean
feb.ratio <- mean(maine.feb) / mean(maine.month.ts)
aug.ratio <- mean(maine.aug) / mean(maine.month.ts)

maine.feb # window() can subset a time series between two cutoffs
Time Series:
Start = 1996.083
End = 2006.083
Frequency = 1
[1] 6.7 6.5 5.7 5.0 4.4 4.2 4.9 5.8 5.6 5.8 5.6
feb.ratio # compare the average unemployment rate in February to the average over all months
[1] 1.222529
aug.ratio
[1] 0.8163732
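To answer the “what does window() do?” question in the code above, here is a toy illustration on a made-up series (not the Maine data): window() can either take a contiguous slice between two cutoffs, or resample at a lower frequency.

```r
x <- ts(1:24, start = c(2000, 1), frequency = 12) # two years of monthly data

# contiguous subset between two cutoffs
window(x, start = c(2000, 3), end = c(2000, 6)) # values 3 4 5 6

# with frequency = 1, sample one observation per year, starting in March
window(x, start = c(2000, 3), frequency = 1) # values 3 15
```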
What Does ts() Do To Your Data?
A ts object is a matrix-like object that carries “time” information (analogous to row and column names) and holds realizations of random variables as its elements (in the example above, the random variable is monthly unemployment in Maine).
In the example above, we feed in a data.frame with a single column. ts() first uses data.matrix() to transform it into a matrix, then treats the number of columns as the total number of time series (so we have just 1 time series) and the number of rows as the total number of observations within that single series.
Caveat ⚠️: if a vector, matrix, or data.frame contains any character values, R will automatically convert the remaining numeric variables to characters. If you have NA in your vector, ts() will return NA, which is fine. However, if you use white space " " or other strings (e.g. "NA") to code missing values, the other numeric values become characters. In that case, data.matrix() inside ts() first converts these characters to a factor and then converts the factor to numeric codes, and you will find the resulting numeric values are totally different from the original ones. So make sure that your data contains only numeric values or NA.
maine <- read.table("Lab1_data/Maine.dat", header = TRUE)
maine[6, ] <- NA # assign NA to June 1996
maine |> as.matrix() # we still get a numeric sequence
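To see the failure mode the caveat describes, here is a minimal sketch with made-up values: coding a missing value as the string "NA" makes the whole column character, and data.matrix() then returns factor codes rather than the original numbers.

```r
# "NA" as a string (not the NA constant) makes the column character
bad <- data.frame(unemploy = c("4.4", "NA", "6.5"))

# data.matrix() converts the character column to a factor, then to integer codes:
data.matrix(bad) # factor codes 1, 3, 2 -- not 4.4 and 6.5!
```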
You can use debugonce(ts) to check its source code and understand how data is transformed within the function.
A (statistical) definition of time series data:
realizations of sequences of random variables defined at fixed sampling intervals
sampling interval = one time unit of the series (e.g. second, minute, hour, day, month, quarter)
For example, if we have quarterly pattern in our time-series data, the frequency should be 4:
maine <- read.table("Lab1_data/Maine.dat", header = TRUE)
maine.quarter.ts <- ts(
  maine,
  start = c(1996, 1), # this time we interpret the starting point as the 1st quarter of 1996
  freq = 4            # every year has 4 quarters
)
maine.quarter.ts |> window(start = 1996, end = 2015)

# this doesn't make sense, because this is actually monthly data.
# But it's a good example for understanding how ts() works.
You might think the format of a ts object looks like a data.frame, so can we transform it into a data.frame? The answer is that we don’t need to.
A ts object is essentially a sequence, but we use two “tags” to identify each element in this sequence. For example, we use “year” (as a cycle of 12 months) and “month” to identify each value of unemployment rate. We can also use “hour” (as a cycle of 60 minutes) and “minute” to identify the position of each minute in a day. That’s why we finally see a matrix-like object. But a single time series is just a column of a matrix!
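The two “tags” can be inspected directly: cycle() returns each element’s position within the cycle, and time() returns each element’s time. A small sketch with made-up values:

```r
x <- ts(1:6, start = c(2000, 11), frequency = 12) # monthly, starting November 2000

cycle(x) # position within the yearly cycle: 11 12 1 2 3 4
time(x)  # time of each observation, in fractional years
```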
To see why, use as.matrix() or as.data.frame(): either will transform it into a matrix or data.frame with a single column, which is just the original form of our data. We will see evidence of this fact in the next example.
Time Series Data - Global Temperature
# Acquire the data
# Average global temperature from Univ. East Anglia and UK Met Office
# Monthly from January 1856 to December 2005
global <- read.table("Lab1_data/global.dat")
global <- t(global) |> as.vector() |> as.data.frame()
Coding Exercise: Time Series Data - Global Temperature
Create a time series object using the data that starts in Jan 1856 and ends in Dec 2005 with monthly observations.
Find the mean temperature for each year and save in a new time series object.
Plot the two objects.
Observe global temperature from 1970 to 2005 using the window function and plot.
Multiple Time Series - Electricity, Beer, Chocolate Production
# Acquire the data
# Electricity (millions of kWh), beer (Ml), and chocolate production (tonnes)
# in Australia from January 1958 to December 1990
# from the Australian Bureau of Statistics
cbe <- read.table("Lab1_data/cbe.dat", header = TRUE)
class(cbe)
[1] "data.frame"
Multiple Time Series - Electricity, Beer, Chocolate Production
# Create separate time series objects for each
choc.ts <- ts(cbe[, 1], start = 1958, freq = 12)
beer.ts <- ts(cbe[, 2], start = 1958, freq = 12)
elec.ts <- ts(cbe[, 3], start = 1958, freq = 12)

# or, since they have the same starting point and the same frequency
# (by our prior knowledge), we can write the code more compactly:
cbe.ts <- ts(cbe, start = 1958, freq = 12)
# More evidence that a time series is just a column of a matrix:
# here we see three time series, one per column.
cbe.ts
choc beer elec
Jan 1958 1451 96.3 1497
Feb 1958 2037 84.4 1463
Mar 1958 2477 91.2 1648
Apr 1958 2785 81.9 1595
May 1958 2994 80.5 1777
Jun 1958 2681 70.4 1824
Jul 1958 3098 74.8 1994
Aug 1958 2708 75.9 1835
Sep 1958 2517 86.3 1787
Oct 1958 2445 98.7 1699
Nov 1958 2087 100.9 1633
Dec 1958 1801 113.8 1645
... [output truncated: the monthly series continue through Dec 1990]
Multiple Time Series - Electricity, Beer, Chocolate Production
cbind(choc.ts, beer.ts, elec.ts) |> plot()
# or
plot(cbe.ts)
Panel Data - Democracy and Income
Let’s use democracy and income data again here.
data <- read.csv("Lab1_data/Lab1data.csv", header = TRUE)
colnames(data) <- c("country", "Year", "income", "polity2")
head(unique(data$country)) # observations on 174 countries
[1] "Antigua and Barbuda" "Afghanistan" "Albania"
[4] "Algeria" "Andorra" "Angola"
head(tapply(data$country, data$Year, length)) # inspect how many countries are observed in each year
p <- ggplot(data = na.omit(data), aes(x = Year, y = income, group = country, color = country))
p + geom_line(alpha = 0.5) + # make lines semi-transparent
  guides(color = "none")     # turn off the legend because there are so many countries
                             # (guides(color = FALSE) is deprecated in newer ggplot2)
Coding Exercise: Panel Data - Democracy and Income
Subset the data frame to show only country name and GDP per capita
Rearrange the rows of the data frame ascending by polity score
Show only values of GDP per capita for South Africa from 2002 to 2008
Create a new variable that takes the first letter of the country and attaches it to the year of observation
Find the mean of GDP per capita for each year of observation
More on TSCS Data Wrangling
use pivot_longer() and pivot_wider() from tidyr (part of tidyverse) to reshape data (non-tidyverse solution: reshape2)
long version of data (see income and democracy data)
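A minimal sketch of the wide-to-long round trip on a made-up two-country panel (assuming tidyr is installed; the column names here are hypothetical, not from the course data):

```r
library(tidyr)

# hypothetical wide panel: one column per year
wide <- data.frame(country = c("A", "B"), y2000 = c(1, 2), y2001 = c(3, 4))

# wide -> long: one row per country-year
long <- pivot_longer(wide, cols = starts_with("y"),
                     names_to = "year", names_prefix = "y",
                     values_to = "income")
long

# long -> wide again
pivot_wider(long, names_from = "year", values_from = "income", names_prefix = "y")
```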
The good news is that if we use a tibble, assigning a character value to a numeric column gives us an error saying that the data types are incompatible. This is one reason why some people prefer tibbles to the traditional data.frame.
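A minimal sketch of that contrast, assuming the tibble package is installed (the exact error wording varies by version):

```r
library(tibble)

tb <- tibble(x = c(1, 2, 3))
res <- try(tb[2, "x"] <- "oops", silent = TRUE) # tibble refuses: can't convert character to double
inherits(res, "try-error") # TRUE

df <- data.frame(x = c(1, 2, 3))
df[2, "x"] <- "oops" # base data.frame silently coerces the whole column
class(df$x)          # "character"
```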