---
title: "Lab3CodePractice"
author: "Your name"
date: \today
fontfamily: accanthis
linkcolor: blue
output:
  pdf_document: default
editor_options:
  chunk_output_type: console
fontsize: 12pt
header-includes:
  - \usepackage{float}
---
## Prerequisite
In today's lab, we will use the built-in dataset *Salaries* from the `car`/`carData` package, which provides salary data for 397 randomly selected U.S. university faculty members.
```{r setup, include=FALSE}
knitr::opts_chunk$set(include = TRUE,
                      echo = TRUE,
                      message = FALSE,
                      warning = FALSE,
                      error = FALSE)
# Good practice to remove all objects from the workspace
rm(list = ls())
library(tidyverse)
# Load the data
data("Salaries", package = "carData")
# Prepare data
df <-
  Salaries %>%
  as_tibble() %>%                 # optional: turn it into a tibble
  mutate(salary = salary / 1000)  # re-scale salary
# Setting the seed ensures that we get the same random draw over and over again.
set.seed(20201009)
```
# Review of Least Squares - Normal homoskedastic
## 1. The outcome variable is salary. Manually calculate the mean salary for each category of the dichotomous variable sex (differentiating between Male and Female) and find the difference. Consider Female as the treatment group and Male as the control group. Then, use the `lm()` function to fit the following model: $salary = \alpha + \beta * sex + \epsilon$.
```{r}
## differences-in-means estimator (by hand):
# Regress salaries on sex using lm()
```
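One possible sketch, using the `df` prepared in the setup chunk. Note that `sex` is releveled here (an assumption about the factor's default ordering) so that Male is the reference category and the coefficient equals Female minus Male:

```{r}
## differences-in-means estimator (by hand)
group_means   <- tapply(df$salary, df$sex, mean)
diff_in_means <- unname(group_means["Female"] - group_means["Male"])
diff_in_means

## the same estimate via lm(); relevel() makes Male the control
## (reference) category, so the slope is the Female - Male difference
fit_sex <- lm(salary ~ relevel(sex, ref = "Male"), data = df)
coef(fit_sex)
```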
## 2. Now regress salaries on years since PhD: $salary = \alpha + \beta * yrs.since.phd + \epsilon$. Estimate $\beta$ again from this bivariate regression, but without using the `lm()` function.
```{r}
# Regress salaries on years since phd using lm()
# Estimate again the slope of "yrs.since.phd" without using lm() or optim()
```
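A sketch of one route: in the bivariate case the OLS slope has the closed form $\hat{\beta} = \mathrm{cov}(x, y)/\mathrm{var}(x)$, with $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$, which we can compare against `lm()`:

```{r}
## via lm()
fit_phd <- lm(salary ~ yrs.since.phd, data = df)
coef(fit_phd)

## by hand: slope = cov(x, y) / var(x)
x <- df$yrs.since.phd
y <- df$salary
beta_hat  <- cov(x, y) / var(x)
alpha_hat <- mean(y) - beta_hat * mean(x)
c(alpha_hat, beta_hat)
```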
Regression coefficients can be thought of as comparisons across predicted values or as comparisons among averages in the data.
## 3. Calculate the coefficients for the model $salary = \alpha + \beta_1 * sex + \beta_2 * yrs.since.phd + \epsilon$ using a custom function that minimizes the residual sum of squares. Remember that $\hat{\epsilon} = y - \hat{\alpha} - \hat{\beta} X$. Employ the optim function to identify the minimum sum of squared errors and provide the coefficient values. Subsequently, re-run the model using `lm()`.
```{r}
# Define function to minimize residual sum of squares
min_residuals <- function() {
  # define the parameters
  # define SSR
}
# Find coefficients of linear regression model with optim()
# regress salary on sex and yrs.since.phd using lm()
## compare point estimates
```
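One way to fill in the skeleton, sketched here under the assumption that we stack the intercept and predictors into a design matrix with `model.matrix()`:

```{r}
## residual sum of squares as a function of the parameters
min_residuals <- function(par, y, X) {
  e <- y - X %*% par   # residuals: y - alpha - beta * X
  sum(e^2)             # SSR
}

X   <- model.matrix(~ sex + yrs.since.phd, data = df)
opt <- optim(par = rep(0, ncol(X)), fn = min_residuals,
             y = df$salary, X = X, method = "BFGS")

## compare point estimates
opt$par
coef(lm(salary ~ sex + yrs.since.phd, data = df))
```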
## 4. The most important mathematical assumption of the regression model is that its deterministic component is a linear function of the separate predictors: $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots$. Evaluate whether this assumption is valid in the model you fit in the previous exercise.
```{r}
# Question: how can we check for LS assumptions like additivity or linearity?
```
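One common diagnostic, sketched below, is to plot residuals against fitted values; a systematic curve in that plot suggests the linearity/additivity assumption is violated:

```{r}
fit_mult <- lm(salary ~ sex + yrs.since.phd, data = df)
plot(fitted(fit_mult), resid(fit_mult),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. fitted")
abline(h = 0, lty = 2)  # residuals should scatter evenly around zero
```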
## 5. Devise a model specification that more accurately reflects the observed data generating process in the previous exercise. Please note that in this new model specification, you should not incorporate any additional predictors (e.g. rank or discipline) from the dataset.
```{r}
```
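One candidate specification, offered only as a sketch: if the residual plot suggests a curved relationship with experience, a quadratic term in `yrs.since.phd` respects the constraint of adding no new predictors:

```{r}
fit_quad <- lm(salary ~ sex + poly(yrs.since.phd, 2), data = df)
summary(fit_quad)
```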
## 6. Compare all the models estimated with `lm()` above. Which one provides the best fit?
```{r}
```
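A sketch of one comparison, refitting the earlier specifications (the quadratic model is the assumed answer to exercise 5) and contrasting adjusted $R^2$ and AIC, where lower AIC indicates better fit after penalizing complexity:

```{r}
mods <- list(
  sex_only  = lm(salary ~ sex, data = df),
  phd_only  = lm(salary ~ yrs.since.phd, data = df),
  additive  = lm(salary ~ sex + yrs.since.phd, data = df),
  quadratic = lm(salary ~ sex + poly(yrs.since.phd, 2), data = df)
)
sapply(mods, function(m) summary(m)$adj.r.squared)
sapply(mods, AIC)
```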
\newpage
# Maximum Likelihood Estimation - Normal homoskedastic
## 1. We will re-fit the model: $salary = \alpha + \beta_1 * sex + \beta_2 * yrs.since.phd + \epsilon$. This time, we will use maximum likelihood estimation, deriving the estimator from the normal distribution. We will estimate both the $\mu_i$ parameter and $\sigma_i^2$, where...
Stochastic component:
$$y_i \sim f_\mathcal{N}(\mu_{i}, \sigma_i^{2})$$
Systematic components:
$$\mu_i = \boldsymbol{x}_{i} \boldsymbol{\beta}$$
$$\sigma_{i}^{2} = \exp(\boldsymbol{z}_{i} \boldsymbol{\gamma})$$
After estimating both the LS homoskedastic normal and the MLE heteroskedastic normal, compare the output. What differences do you observe, and why do these differences occur?
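For reference, here is a hedged sketch of the heteroskedastic normal log-likelihood (not the contents of `mle_normal.R`), assuming $\boldsymbol{z}_i = \boldsymbol{x}_i$, i.e. the same covariates enter both systematic components, and maximizing with `optim()` via `fnscale = -1`:

```{r}
## log-likelihood of the heteroskedastic normal
ll_hetnorm <- function(theta, y, X, Z) {
  beta   <- theta[1:ncol(X)]
  gamma  <- theta[(ncol(X) + 1):(ncol(X) + ncol(Z))]
  mu     <- X %*% beta         # systematic component for the mean
  sigma2 <- exp(Z %*% gamma)   # exp() keeps the variance positive
  sum(dnorm(y, mean = mu, sd = sqrt(sigma2), log = TRUE))
}

X <- model.matrix(~ sex + yrs.since.phd, data = df)
Z <- X  # assumption: same covariates model the variance
start <- c(coef(lm(salary ~ sex + yrs.since.phd, data = df)),
           rep(0, ncol(Z)))
mle <- optim(par = start, fn = ll_hetnorm,
             y = df$salary, X = X, Z = Z,
             control = list(fnscale = -1), method = "BFGS")
mle$par
```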
```{r}
## copy and paste the code from "mle_normal.R" and make necessary
## adjustments to fit the model
```