---
title: "Lab 1: Practice Code"
author: "Your name"
output:
  pdf_document: default
  html_document:
    df_print: paged
editor_options:
  chunk_output_type: console
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Prerequisites
```{r, message=FALSE}
rm(list = ls()) # Clear memory
library(tidyverse) # Load package
```
# Vector Practice
1. `Vector1` : The numbers one through five and then the number six five times
2. `Vector2` : 10 randomly drawn numbers from a normal distribution with a mean of 10 and a standard deviation of 1
3. `Vector3` : Results of 10 single binomial trials with a probability of 0.4
4. `Vector4` : Sample 100 observations from a 5-trial binomial distribution with a probability of success of 0.4
5. `Vector5` : The numbers one through three and the word apple
```{r}
#1
Vector1 <- c(1, 2, 3, 4, 5, 6, 6, 6, 6, 6)
Vector1 <- c(1:5, 6, 6, 6, 6, 6)
Vector1 <- c(seq(from = 1, to = 5, by = 1), rep(6, 5))
#2
Vector2 <- rnorm(n = 10, mean = 10, sd = 1)
#3
Vector3 <- rbinom(n = 10, size = 1, prob = 0.4)
#4
Vector4 <- rbinom(n = 100, size = 5, prob = 0.4)
?rbinom
# size is the number of trials per draw; on average, size * prob of them are successes.
stem(rbinom(n = 10, size = 1, prob = 0.4))
# 10 runs of 1 coin flip each, with success probability 0.4; returns the number of successes in each run
stem(rbinom(n = 100, size = 5, prob = 0.4))
# 100 runs of 5 coin flips each, with success probability 0.4; returns the number of successes in each run
#5
Vector5 <- c(1:3, "apple")
```
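The comment about `size` and `prob` can be checked empirically. The following is a minimal sketch, not part of the original lab: the seed value and the name `Draws` are illustrative assumptions. With many draws, the sample mean of `rbinom()` should land close to `size * prob`.

```{r}
set.seed(123)  # arbitrary seed (an assumption) so the draws are reproducible
Draws <- rbinom(n = 10000, size = 5, prob = 0.4)  # hypothetical object name
mean(Draws)    # should be close to size * prob = 5 * 0.4 = 2
range(Draws)   # every draw lies between 0 and size
table(Draws)   # how often each number of successes was observed
```

Setting a seed before any of the random vectors above would make their values repeat exactly across runs.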
6. What type of data is Vector2?
7. Round Vector2 to two decimal places
8. What happened in Vector5?
```{r}
#6
is.character(Vector2)
mode(Vector2)
class(Vector2)
#7
round(Vector2, 2)
#8
class(Vector5)
Vector5
```
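Question 8 comes down to type coercion: an atomic vector holds a single type, so mixing numbers with a string converts everything to character. A small sketch of that behavior, using the hypothetical names `MixedVector` and `BackToNumeric` so the lab objects are left untouched:

```{r}
MixedVector <- c(1:3, "apple")  # same construction as Vector5, hypothetical name
class(MixedVector)              # "character": the numbers became strings
MixedVector[1] == "1"           # TRUE

# Converting back to numeric turns non-numeric strings into NA.
# as.numeric() normally warns about this; the warning is suppressed here.
BackToNumeric <- suppressWarnings(as.numeric(MixedVector))
BackToNumeric
is.na(BackToNumeric)
```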
# Matrices Practice
1. Matrix1: Create a 5 by 5 matrix containing all NAs
2. Assign Matrix1 the row names (a,b,c,d,e) and the column names (1,2,3,4,5)
3. Replace the NAs in the first column of Matrix1 with `Inf`
```{r}
#1
Matrix1 <- matrix(data = NA, nrow = 5, ncol = 5)
#2
rownames(Matrix1) <- c("a", "b", "c", "d", "e")
colnames(Matrix1) <- c(1, 2, 3, 4, 5)
#3
Matrix1[, 1] <- Inf
# Uncommenting the next line would coerce the whole matrix to character
#Matrix1[1,3]<-"apple"
```
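The commented-out assignment in the chunk above hints at a rule worth seeing in action: a matrix, like a vector, stores one type only. The sketch below uses a hypothetical 2 by 2 matrix named `DemoMatrix` to trace how its type changes with each assignment.

```{r}
DemoMatrix <- matrix(data = NA, nrow = 2, ncol = 2)  # hypothetical toy matrix
class(DemoMatrix[1, 1])      # "logical": a bare NA is logical
DemoMatrix[, 1] <- Inf
class(DemoMatrix[1, 1])      # "numeric": Inf promotes the matrix to numeric
DemoMatrix[1, 2] <- "apple"
class(DemoMatrix[1, 1])      # "character": one string converts every cell
DemoMatrix
```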
# List Practice
1. Create a list that contains Vector1, Vector2, Vector3, and Matrix1
2. Name each list component as Vector1, Vector2, Vector3, and Matrix1 respectively
3. Locate Vector2 from the list
```{r}
#1
List1 <- list(Vector1, Vector2, Vector3, Matrix1)
#2
names(List1) <- c("Vector1", "Vector2", "Vector3", "Matrix1")
#3
List1[[2]]
#or
List1$Vector2
```
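The two ways of locating an element shown above both extract the element itself. Single brackets behave differently: they return a shorter list. A brief sketch with a hypothetical toy list (`DemoList` and its component names are illustrative only):

```{r}
DemoList <- list(Numbers = 1:3, Words = c("a", "b"))  # hypothetical toy list
class(DemoList[1])        # "list": single brackets keep the list wrapper
class(DemoList[[1]])      # "integer": double brackets extract the element
class(DemoList$Numbers)   # "integer": $ extracts by name, like [[ ]]
identical(DemoList[["Numbers"]], DemoList$Numbers)
```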
# Data Frames Practice 1
## 1. Load lab1_data.csv in R
```{r}
# Load data
#Dta <- read.csv("lab1_data.csv", header = TRUE, stringsAsFactors = FALSE)
DataURL <- "http://students.washington.edu/rllobet/mle_2024/Lab1/data/lab1_data.csv"
Dta <- read.csv(DataURL)
```
## 2. What is the data structure? What does that tell us about type?
```{r}
# Check structure
dim(Dta)
class(Dta)
is.data.frame(Dta)
is.matrix(Dta)
# Alternatively
str(Dta)
summary(Dta)
```
## 3. Check the names and summary statistics of the data. Fix any names that are unclear or inconsistent.
```{r}
# Check and fix names
names(Dta)
names(Dta)[3] <- "GdpPerCap"
names(Dta) # Check again
Dta <-
  Dta |>
  rename(Country = country,
         Polity2 = polity2)
# Summary Statistics
summary(Dta)
```
## 4. Remove observations with missing values
```{r}
# Remove NAs
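DataClean <- na.omit(Dta) # Listwise deletion: drops every row with at least one NA
Dta |> na.omit()          # Same operation with the pipe; prints the result without saving it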
dim(Dta)
dim(DataClean)
```
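Listwise deletion can remove more rows than expected, because a row is dropped if any one of its columns is missing. The sketch below illustrates this on a made-up data frame (`ToyData` and its values are assumptions for illustration, not the lab data).

```{r}
# Hypothetical toy data: rows 2 and 3 each have one missing value
ToyData <- data.frame(
  Country   = c("A", "B", "C"),
  GdpPerCap = c(1000, NA, 3000),
  Polity2   = c(5, 7, NA)
)
complete.cases(ToyData)                  # TRUE only for rows with no NA
ToyClean <- na.omit(ToyData)             # keeps row 1 only
nrow(ToyData) - nrow(ToyClean)           # number of rows dropped
colSums(is.na(ToyData))                  # missing values per column
```

Checking `colSums(is.na(...))` before deleting shows which variables are responsible for the lost observations.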
## 5. Calculate the average GDP per capita for Brazil for the observed period. Repeat the calculation for all countries.
```{r}
# Base R
mean(DataClean[DataClean$Country == "Brazil", "GdpPerCap"])
# Tidy way
DataClean %>%
  filter(Country == "Brazil") %>%
  summarize(mean(GdpPerCap))
# Average GdpPerCap for all countries
DataClean %>%
  group_by(Country) %>%
  summarize(mean(GdpPerCap))
# Mean and median together; naming the functions gives readable column names
DataClean %>%
  group_by(Country) %>%
  summarize_at(vars(GdpPerCap), list(mean = mean, median = median))
```
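The grouped averages can also be computed in base R, which mirrors the base versus tidy comparison used for Brazil. This is a sketch on a hypothetical toy data frame (`ToyGdp` and its values are made up), using `tapply()` and `aggregate()`.

```{r}
# Hypothetical toy data with two countries and two observations each
ToyGdp <- data.frame(
  Country   = c("A", "A", "B", "B"),
  GdpPerCap = c(1000, 3000, 500, 700)
)
# tapply() returns a named vector of group means
GroupMeans <- tapply(ToyGdp$GdpPerCap, ToyGdp$Country, mean)
GroupMeans
# aggregate() returns a data frame, closer to the summarize() output
aggregate(GdpPerCap ~ Country, data = ToyGdp, FUN = mean)
```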
## 6. Plot GDP per capita (on the x-axis) and Polity2 (on the y-axis)
```{r}
# Base Graphics
plot(x = DataClean$GdpPerCap,
     y = DataClean$Polity2)
# Try logging GDP
plot(x = log(DataClean$GdpPerCap),
     y = DataClean$Polity2,
     xlab = "Logged GDP per capita",
     ylab = "Polity2")
# ggplot2
ggplot(DataClean, aes(y = Polity2, x = log(GdpPerCap))) +
  geom_point() +
  labs(x = "Logged GDP per capita", y = "Polity2") +
  theme_classic()
```
## 7. Create a new variable called "democracy". Assign 0 to countries with a negative or zero Polity2 score, and 1 to countries with a positive score.
```{r, results='hide'}
# Create a variable called "democracy"
DataClean$democracy <- NA
head(DataClean)
# You can subset data based on a logical statement
DataClean$Polity2 <= 0
DataClean[DataClean$Polity2 <= 0, ] # Leaving the column index blank after the comma selects all columns
# Take advantage of this: Assign values to "democracy" based on Polity2 values
DataClean$democracy[DataClean$Polity2 <= 0] <- 0
# Do the same for positive Polity2 score
DataClean$democracy[DataClean$Polity2 > 0] <- 1
# Tidy way
DataClean %>%
  mutate(democracy = case_when(Polity2 <= 0 ~ 0,
                               TRUE ~ 1)) # "Polity2 > 0 ~ 1" also works
DataClean %>%
  mutate(democracy = case_when(Polity2 <= 0 ~ 0,
                               Polity2 > 0 ~ 1))
```
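Two shorter base R alternatives give the same dummy variable. The sketch below uses a hypothetical toy data frame (`ToyPolity`, with a deliberately missing score) to show how each handles `NA`. Unlike a `TRUE ~ 1` fallback in `case_when()`, which would label a missing score as 1, both versions here keep `NA` as `NA`. That difference does not matter for `DataClean`, where missing values were already removed.

```{r}
# Hypothetical toy data, including a zero and a missing score
ToyPolity <- data.frame(Polity2 = c(-10, 0, 3, NA))
# ifelse(): 1 if the score is positive, 0 otherwise, NA stays NA
ToyPolity$democracy <- ifelse(ToyPolity$Polity2 > 0, 1, 0)
# Converting the logical comparison directly gives the same result
ToyPolity$democracy2 <- as.numeric(ToyPolity$Polity2 > 0)
ToyPolity
```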
## 8. Export (save) the data set with the new variable "democracy" both as .csv and .rdata files
```{r}
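# Create the output folder if it does not exist yet; otherwise the writes below fail
dir.create("data", showWarnings = FALSE)
write_csv(DataClean, file = "data/DataClean.csv")
save(DataClean, file = "data/DataClean.Rdata")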
```
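Reading the files back is a quick way to confirm the export worked. The sketch below is a round trip on a hypothetical toy data frame written to a temporary folder, so nothing in the project directory is touched. The object and file names are illustrative assumptions, and base `write.csv()` stands in for `write_csv()` to keep the example free of package dependencies.

```{r}
# Hypothetical toy data and temporary file paths
ToySave   <- data.frame(Country = c("A", "B"), democracy = c(0, 1))
CsvPath   <- file.path(tempdir(), "ToySave.csv")
RdataPath <- file.path(tempdir(), "ToySave.Rdata")

write.csv(ToySave, file = CsvPath, row.names = FALSE)
save(ToySave, file = RdataPath)

# A .csv file is read into an object of your choosing
FromCsv <- read.csv(CsvPath)
# load() restores objects under their original names and returns those names
LoadedNames <- load(RdataPath)
LoadedNames
identical(FromCsv$Country, ToySave$Country)
```

The practical difference is that a .csv file stores only the table, while an .Rdata file stores the R object together with its name and column types.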