---
title: "lab1_key"
author: "Inhwan Ko"
date: "Oct 1, 2021"
output:
  html_document:
    df_print: paged
  pdf_document: default
editor_options:
  chunk_output_type: console
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Prerequisites
```{r, message=FALSE}
rm(list = ls()) # Clear memory
library(tidyverse) # Load package
```
# Vector Practice
1. `vector1` : The numbers one through five and then the number six five times
2. `vector2` : 10 randomly drawn numbers from a normal distribution with a mean of 10 and a standard deviation of 1
3. `vector3` : Results of 10 single binomial trials with a probability of 0.4
4. `vector4` : Sample 100 observations from a 5-trial binomial distribution with a probability of success of 0.4
5. `vector5` : The numbers one through three and the word apple
```{r}
```
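One possible way to build the five vectors is sketched below. The call to `set.seed()` is an assumption added only so the random draws are reproducible; the exercise does not ask for it.

```r
set.seed(1)  # assumption: fixed seed so the random draws can be reproduced

# 1. One through five, then the number six repeated five times
vector1 <- c(1:5, rep(6, 5))

# 2. Ten draws from a normal distribution with mean 10 and s.d. 1
vector2 <- rnorm(10, mean = 10, sd = 1)

# 3. Ten single binomial trials (size = 1) with success probability 0.4
vector3 <- rbinom(10, size = 1, prob = 0.4)

# 4. 100 draws from a 5-trial binomial with success probability 0.4
vector4 <- rbinom(100, size = 5, prob = 0.4)

# 5. The numbers one through three and the word "apple"
vector5 <- c(1:3, "apple")
```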
6. What type of data is `vector2`?
7. Round `vector2` to two decimal places
8. What happened in `vector5`?
```{r}
```
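A short sketch of how these three questions might be checked. The vectors are rebuilt here so the block runs on its own, and `vector2_rounded` is a hypothetical name used only for illustration.

```r
set.seed(1)  # assumption: fixed seed for reproducibility
vector2 <- rnorm(10, mean = 10, sd = 1)
vector5 <- c(1:3, "apple")

# 6. Draws from rnorm() are stored as numeric (double) values
class(vector2)   # "numeric"
typeof(vector2)  # "double"

# 7. round() with digits = 2 keeps two decimal places
vector2_rounded <- round(vector2, digits = 2)

# 8. A vector holds one type only, so the numbers were coerced to character
class(vector5)   # "character"
vector5
```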
# Matrices Practice
1. `matrix1`: Create a 5 by 5 matrix containing only NAs
2. Assign `matrix1` the row names (a,b,c,d,e) and the column names (1,2,3,4,5)
3. Replace the NAs in the first column of `matrix1` with `Inf`
```{r}
```
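The matrix steps could look roughly like this. The sketch assumes the exercise means the numeric value `Inf` rather than the string `"Inf"`.

```r
# 1. A 5 by 5 matrix filled with NA
matrix1 <- matrix(NA, nrow = 5, ncol = 5)

# 2. Row and column names (dimension names are always stored as character)
rownames(matrix1) <- c("a", "b", "c", "d", "e")
colnames(matrix1) <- as.character(1:5)

# 3. Overwrite the whole first column; the matrix becomes numeric
matrix1[, 1] <- Inf
matrix1
```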
# List Practice
1. Create a list `list1` that contains `vector1`, `vector2`, `vector3`, and `matrix1`
2. Name each list component as `vector1`, `vector2`, `vector3`, and `matrix1` respectively
3. Locate `vector2` from the list
```{r}
```
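A minimal sketch of the list exercise follows. The component objects are recreated first so the block is self-contained; in the lab they would already exist from the earlier sections.

```r
# Recreate the components (assumption: same definitions as in the earlier exercises)
vector1 <- c(1:5, rep(6, 5))
vector2 <- rnorm(10, mean = 10, sd = 1)
vector3 <- rbinom(10, size = 1, prob = 0.4)
matrix1 <- matrix(NA, nrow = 5, ncol = 5)

# 1. Combine them into a list
list1 <- list(vector1, vector2, vector3, matrix1)

# 2. Name each component
names(list1) <- c("vector1", "vector2", "vector3", "matrix1")

# 3. Three equivalent ways to pull out vector2
list1$vector2
list1[["vector2"]]
list1[[2]]
```

Note that `list1["vector2"]` with single brackets would return a one-element list rather than the vector itself.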
# Data Frames Practice 1
## Working directory
Check that your working directory is the folder where you saved `lab01_data.csv`.
```{r}
```
## 1. Load Lab1data.csv in R
```{r}
```
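A hedged sketch of checking the working directory and loading the file. The file name `lab01_data.csv` is taken from the working-directory note above (the heading spells it differently, so adjust as needed), and `lab_data` is a hypothetical object name.

```r
# Where is R currently looking for files?
getwd()

data_file <- "lab01_data.csv"  # assumption: adjust to the actual file name

if (file.exists(data_file)) {
  lab_data <- read.csv(data_file)  # hypothetical object name
  head(lab_data)
} else {
  # setwd("path/to/your/folder") would change the working directory
  message("File not found in ", getwd(), "; check the path or use setwd().")
}
```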
## 2. What is the data structure? What does that tell us about type?
```{r}
# Check structure
# Alternatively
```
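The checks might look like the sketch below. Because the lab file is not reproduced here, it uses `toy`, a made-up data frame whose column names and values are assumptions for illustration only.

```r
# Made-up illustration data (not the lab data)
toy <- data.frame(
  country = c("Brazil", "Brazil", "Other", "Other"),
  year = c(2000, 2001, 2000, 2001),
  gdp.per.cap = c(1000, 1100, 2000, 2100),
  polity2 = c(8, 0, 9, -2),
  stringsAsFactors = FALSE
)

# Check structure: one line per column with its type and first values
str(toy)

# Alternatively: class of the object, then class of each column
class(toy)
sapply(toy, class)
```

A data frame is a list of equal-length columns, so each column can have its own type.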
## 3. Check the names and summary statistics of the data. Fix any names that are unclear or awkward to type.
```{r}
# Check and fix names
# Summary Statistics
```
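As a rough illustration of renaming, the sketch below starts from a made-up data frame with deliberately awkward names. Both the awkward names and the values are invented; only `gdp.per.cap` and `polity2` echo names used elsewhere in this lab.

```r
# Made-up illustration data with awkward column names
toy <- data.frame(
  a = c("Brazil", "Brazil", "Other", "Other"),
  b = c(2000, 2001, 2000, 2001),
  c = c(1000, 1100, 2000, 2100),
  d = c(8, 0, 9, -2),
  stringsAsFactors = FALSE
)
names(toy) <- c("Country Name", "Year", "GDP per capita", "polity2")

# Check names
names(toy)

# Fix names: short, lowercase, no spaces
names(toy) <- c("country", "year", "gdp.per.cap", "polity2")

# Summary statistics for every column
summary(toy)
```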
## 4. Remove observations with missing values
```{r}
# Remove NAs
```
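Dropping incomplete rows can be sketched as follows, again on made-up data with one missing value planted for illustration. `toy_complete` is a hypothetical name.

```r
# Made-up illustration data with one missing GDP value
toy <- data.frame(
  country = c("Brazil", "Brazil", "Other", "Other"),
  gdp.per.cap = c(1000, NA, 2000, 2100),
  polity2 = c(8, 0, 9, -2),
  stringsAsFactors = FALSE
)

# Remove NAs: keep only rows with no missing value in any column
toy_complete <- na.omit(toy)

# Equivalent subsetting with a logical vector
toy_complete2 <- toy[complete.cases(toy), ]

nrow(toy)           # 4
nrow(toy_complete)  # 3
```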
## 5. Calculate the average GDP per capita for Brazil for the observed period. Repeat the calculation for all countries.
```{r}
# Base R
# Tidy way
# Average gdp.per.cap for all countries
```
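A base R sketch of the two averages, computed on made-up values rather than the lab data. The names `brazil_mean` and `all_means` are hypothetical.

```r
# Made-up illustration data (not the lab data)
toy <- data.frame(
  country = c("Brazil", "Brazil", "Other", "Other"),
  gdp.per.cap = c(1000, 1100, 2000, 2100),
  stringsAsFactors = FALSE
)

# Average for one country: subset with a logical statement, then take the mean
brazil_mean <- mean(toy$gdp.per.cap[toy$country == "Brazil"])
brazil_mean  # 1050

# Average for all countries: one mean per group
all_means <- aggregate(gdp.per.cap ~ country, data = toy, FUN = mean)
all_means
```

With the tidyverse loaded, the same grouping could be written with `group_by(country)` followed by `summarize()`.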
## 6. Plot GDP per capita (on the x-axis) and polity2 (on the y-axis).
```{r}
# Base Graphics
# Try logging GDP
# ggplot2
```
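The base graphics version might be sketched like this on made-up values. Logging GDP compresses the long right tail that income data usually have, which tends to make the relationship easier to see.

```r
# Made-up illustration data (not the lab data)
toy <- data.frame(
  gdp.per.cap = c(1000, 1100, 2000, 2100),
  polity2 = c(8, 0, 9, -2)
)

# Base graphics: GDP per capita on the x-axis, polity2 on the y-axis
plot(toy$gdp.per.cap, toy$polity2,
     xlab = "GDP per capita", ylab = "Polity2 score")

# Same plot with logged GDP
log_gdp <- log(toy$gdp.per.cap)
plot(log_gdp, toy$polity2,
     xlab = "log(GDP per capita)", ylab = "Polity2 score")
```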
## 7. Create a new variable called "democracy". Assign 0 to countries with a negative or zero polity2 score, and 1 to countries with a positive score.
```{r, results='hide'}
# Create a variable called "democracy"
# You can subset data based on a logical statement
# Take advantage of this: Assign values to "democracy" based on polity2 values
# Do the same for positive Polity2 score
# Tidy way
```
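The logical-subsetting approach hinted at in the comments could be sketched as follows, on made-up scores. `democracy2` is a hypothetical second column showing a one-line alternative.

```r
# Made-up illustration data (not the lab data)
toy <- data.frame(polity2 = c(8, 0, 9, -2))

# Create the variable, initially missing everywhere
toy$democracy <- NA

# Assign 0 where polity2 is negative or zero
toy$democracy[toy$polity2 <= 0] <- 0

# Assign 1 where polity2 is positive
toy$democracy[toy$polity2 > 0] <- 1

# One-line alternative with ifelse()
toy$democracy2 <- ifelse(toy$polity2 > 0, 1, 0)

toy
```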
## 8. Use a loop to do the same coding.
```{r}
```
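A loop version, sketched on the same kind of made-up scores. Skipping missing scores with `next` is an added assumption; the vectorized approach is usually shorter and faster, so the loop is mainly practice.

```r
# Made-up illustration data (not the lab data)
toy <- data.frame(polity2 = c(8, 0, 9, -2))
toy$democracy <- NA

# Visit each row and code democracy from that row's polity2 score
for (i in seq_len(nrow(toy))) {
  if (is.na(toy$polity2[i])) next  # assumption: leave missing scores as NA
  if (toy$polity2[i] > 0) {
    toy$democracy[i] <- 1
  } else {
    toy$democracy[i] <- 0
  }
}

toy
```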
# Data Frames Practice 2
## 1. Read in the data "lab1_survey.csv"
```{r}
# Clear and load data
rm(list = ls())
survey_data <- read.csv(file = "lab1_survey.csv")
```
## 2. Inspect the data. What format are they in? What values do the data take, and how do those values correspond with the survey?
```{r}
str(survey_data)
```
## 3. Generate some summary statistics.
```{r}
summary(survey_data)
mean(survey_data$R)
mean(survey_data$latex)
median(survey_data$R)
median(survey_data$latex)
sd(survey_data$R)
sd(survey_data$latex)
# Tidy way
survey_data %>%
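  # funs() is deprecated in dplyr; pass a named list of functions instead
  summarize_all(list(mean = mean, median = median, sd = sd, min = min, max = max))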
# %>% gather(key = "stat")
```
## 4. How are these two variables related to each other (assuming equal intervals between categories)?
```{r}
cor1 <- cor(survey_data$R, survey_data$latex)
```
The correlation between R knowledge and LaTeX knowledge is `r cor1`, or more nicely, `r round(cor1, 2)`.
## 5. Are there any problems with the way the data are coded? (Think about yesterday's lecture.)
## 6. Recode the data
```{r}
survey_data %>%
mutate(# Recode R into categories
R_cat = case_when(R == 0 ~ "What's that?",
R == 1 ~ "I've heard of it",
R == 2 ~ "I can use it or apply it",
TRUE ~ "I understand it well"),
# Recode latex into categories
latex_cat = case_when(latex == 0 ~ "What's that?",
latex == 1 ~ "I've heard of it",
latex == 2 ~ "I can use it or apply it",
TRUE ~ "I understand it well"))
# We're repeating ourselves... Must be a faster way
survey_data <-
survey_data %>%
mutate_at(vars(R, latex),
function(x) case_when(x == 0 ~ "What's that?",
x == 1 ~ "I've heard of it",
x == 2 ~ "I can use it or apply it",
TRUE ~ "I understand it well"))
```
## 7. Why is this coding method better?
## 8. Generate some plots of the data: bar charts are good here, scatterplots even better.
```{r, echo= FALSE}
# Bar charts
ggplot(survey_data, aes(x = R)) +
geom_bar() +
labs(x = "R knowledge")
ggplot(survey_data, aes(x = latex)) +
geom_bar() +
labs(x = "LaTeX knowledge")
# Scatter plot
ggplot(survey_data, aes(x = R, y = latex)) +
geom_jitter(alpha = .7, height = .2, width = .2) +
labs(x = "R knowledge", y = "LaTeX knowledge") +
theme_classic()
##### Something is wrong? The categories are sorted alphabetically, not by knowledge level #####
# Convert two variables into factors
knowledge_levels <- c("What's that?",
"I've heard of it",
"I can use it or apply it",
"I understand it well")
survey_data <-
survey_data %>%
mutate(R = factor(R, levels = knowledge_levels),
latex = factor(latex, levels = knowledge_levels)
)
# Redo the scatter plot
ggplot(survey_data, aes(x = R, y = latex)) +
geom_jitter(alpha = .7, height = .2, width = .2) +
labs(x = "R knowledge", y = "LaTeX knowledge") +
scale_x_discrete(limits = knowledge_levels) +
theme_classic()
```
# LaTeX in R Markdown
$$
1 + 1 = 2
$$
$$
11 \times 11 = 121
$$
$$
E = mc^2
$$
I think it's Einstein who proposed $E = mc^2$.
$$
x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
$$
$$
\begin{split}
X & = (x+a)(x-b) \\
& = x(x-b) + a(x-b) \\
& = x^2 + x(a-b) - ab
\end{split}
$$