Binary Models

Maximum Likelihood Estimate from the Bernoulli Distribution

To derive the maximum likelihood estimate (MLE) from a Bernoulli distribution, we start by recognizing that binary data fit the assumptions of a Bernoulli distribution. This distribution models a single trial with two possible, mutually exclusive and exhaustive outcomes: success (\(y_i = 1\)) or failure (\(y_i = 0\)).

The probability mass function (PMF) of a Bernoulli trial is:

\[ \text{Pr}(Y_i = y_i|\pi_i) = f_{\text{Bernoulli}}(y_i|\pi_i) = \pi_i^{y_i}(1 - \pi_i)^{1 - y_i} \]

where \(\pi_i\) is the probability of success, and the expectation of \(y_i\) is \(\mathbb{E}(y_i) = \pi_i\). This framework allows us to construct the likelihood function for a sample of \(n\) observations:

\[ \mathcal{L}(\pi|\mathbf{y}) \propto \prod_{i=1}^{n} \pi_i^{y_i}(1 - \pi_i)^{1 - y_i} \]

Taking the log-likelihood simplifies the product into a sum:

\[ \log \mathcal{L}(\pi|\mathbf{y}) \propto \sum_{i=1}^{n} \left[ \; y_i \log \pi_i + (1 - y_i) \log (1 - \pi_i) \; \right] \]

This log-likelihood is straightforward to maximize numerically: it is smooth and approximately quadratic near its maximum, so standard optimization methods handle it easily.
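To see the mechanics in the simplest possible case, the sketch below (hypothetical toy data, not part of the lab) maximizes the Bernoulli log-likelihood for a single constant probability and confirms that the numerical maximizer matches the analytical MLE, the sample mean.

# A minimal sketch with hypothetical toy data: for a constant p with no
# covariates, the Bernoulli MLE is the sample mean
set.seed(123)
y_toy <- rbinom(100, size = 1, prob = 0.3)

llkBern <- function(p, y) sum(y * log(p) + (1 - y) * log(1 - p))

# maximize over the unit interval and compare with the analytical MLE
optimize(llkBern, interval = c(1e-6, 1 - 1e-6), y = y_toy, maximum = TRUE)$maximum
mean(y_toy)  # agrees up to numerical tolerance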

This expression is general and applies to any binary model (like logit or probit). Now, we proceed to define the systematic component for both models and plug it into the log-likelihood.

Logit Model

One popular way to model the systematic component \(\pi_i\) (the probability of success) is with the inverse logit (logistic) function. It maps any real number \(x_i \beta\) into the unit interval, so the fitted value always respects the limits of a probability (no extrapolation outside \([0, 1]\)):

\[ \pi_i = \text{logit}^{-1}(x_i \beta) = \frac{\exp(x_i \beta)}{1 + \exp(x_i \beta)} = \frac{1}{1 + \exp(-x_i \beta)} \]

We now substitute the logit function for \(\pi_i\) into the log-likelihood:

\[ \log \mathcal{L}(\boldsymbol{\beta}|\mathbf{y},\mathbf{x}) \propto \sum_{i=1}^{n} \left[ y_i \log \left( \frac{1}{1 + \exp(-x_i \boldsymbol{\beta})} \right) + (1 - y_i) \log \left( 1 - \frac{1}{1 + \exp(-x_i \boldsymbol{\beta})} \right) \right] \]

We can simplify the above expression to get the log-likelihood for the logit model:

\[ \log \mathcal{L}(\boldsymbol{\beta}|\mathbf{y},\mathbf{x}) \propto \sum_{i=1}^{n} \left[ y_i x_i \boldsymbol{\beta} - \log(1 + \exp(x_i \boldsymbol{\beta})) \right] \]
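As a quick sanity check on this algebra, the snippet below (arbitrary hypothetical values for \(y_i\) and \(x_i \boldsymbol{\beta}\)) verifies that the substituted and simplified forms of the log-likelihood agree numerically.

# Numerical check of the simplification above, using made-up values
y_toy  <- c(1, 0, 1, 1, 0)
xb_toy <- c(0.5, -1.2, 2.0, 0.1, -0.3)   # hypothetical values of x_i * beta
pi_toy <- 1 / (1 + exp(-xb_toy))         # inverse logit

llk_long  <- sum(y_toy * log(pi_toy) + (1 - y_toy) * log(1 - pi_toy))
llk_short <- sum(y_toy * xb_toy - log(1 + exp(xb_toy)))
all.equal(llk_long, llk_short)  # TRUE: the two forms are identical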

To estimate \(\boldsymbol{\beta}\), we maximize this log-likelihood with respect to \(\boldsymbol{\beta}\) numerically, below using R's optim(). The resulting estimates are the maximum likelihood estimates (MLEs) of the parameters.

Probit Model

An alternative to the logit model is the probit model, which uses the cumulative distribution function (CDF) of the standard Normal distribution to map \(x_i \beta\) to \(\pi_i\):

\[ \pi_i = \Phi(x_i \beta) = \int_{-\infty}^{x_i \beta} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{y_i^{*2}}{2}\right) dy_i^{*} \]

Here, \(\Phi(\cdot)\) is the CDF of the standard normal distribution; the probit link function is its inverse, \(\Phi^{-1}(\cdot)\). The probit model thus assumes that the probability of success is the standard normal CDF evaluated at the linear predictor.
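In R, \(\Phi(\cdot)\) is pnorm() and its inverse, the probit function, is qnorm(); a one-line check:

pnorm(0)           # Phi(0) = 0.5
qnorm(0.5)         # probit(0.5) = 0
qnorm(pnorm(1.3))  # applying the inverse recovers 1.3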

Once again, the starting point is the Bernoulli log-likelihood:

\[ \log \mathcal{L}(\pi|\mathbf{y}) \propto \sum_{i=1}^{n} \left[ y_i \log \pi_i + (1 - y_i) \log (1 - \pi_i) \right] \]

In the probit model, the systematic component \(\pi_i\) is expressed using the cumulative distribution function (CDF) of the standard normal distribution \(\Phi(\cdot)\):

\[ \pi_i = \Phi(\mathbf{x}_i \boldsymbol{\beta}) \]

Substituting this systematic component into the log-likelihood gives:

\[ \log \mathcal{L}(\boldsymbol{\beta}|\mathbf{y}, \mathbf{x}) \propto \sum_{i=1}^{n} \left[ y_i \log \Phi(x_i \boldsymbol{\beta}) + (1 - y_i) \log (1 - \Phi(x_i \boldsymbol{\beta})) \right] \]

Unlike the logit case, the probit log-likelihood does not simplify to a neat closed form, because \(\Phi(\cdot)\) cannot be expressed in elementary algebraic terms. As before, we compute the estimates of \(\boldsymbol{\beta}\) via numerical optimization (e.g., Nelder–Mead or BFGS) using optim().
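For comparison with the logit code later in this lab, here is a sketch of a negated probit log-likelihood suitable for optim(); the argument conventions (param, y, x) mirror the logit version, and pnorm()'s log.p= option is used for numerical stability.

# Sketch: negated probit log-likelihood for optim()
# y is a 0/1 vector; x is a covariate matrix without an intercept column
llkProbit <- function(param, y, x) {
  x  <- cbind(1, x)        # add an intercept
  b  <- param[1:ncol(x)]
  xb <- x %*% b
  # pnorm(xb, log.p = TRUE) is a numerically stable log(Phi(xb))
  -sum(y * pnorm(xb, log.p = TRUE) +
       (1 - y) * pnorm(xb, lower.tail = FALSE, log.p = TRUE))
}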

Logit or Probit?

Logit and probit models typically yield similar results, since both assume a symmetric latent distribution with a baseline probability of 0.5. If we believe the data-generating process (DGP) does not follow this symmetry, a model such as scobit (skewed logit) might be better suited, as it allows an asymmetric response curve. Generally, though, logit and probit models produce comparable probability estimates.

Their key difference lies in the link functions:

  • The logit model applies the logistic CDF: \(\frac{1}{1 + \exp(-\mathbf{x}_i \boldsymbol{\beta})}\).
  • The probit model uses the standard normal CDF: \(\Phi(\mathbf{x}_i \boldsymbol{\beta})\), which lacks a simple analytical form and thus requires numerical approximation.

Both models estimate \(\boldsymbol{\beta}\) by maximizing their log-likelihood functions. While the choice between logit and probit often depends on computational preferences and ease of interpretation, both are founded on the Bernoulli distribution.
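The practical similarity of the two links is easy to see by overlaying the logistic CDF on a rescaled normal CDF; the factor of roughly 1.6 below is only a rule-of-thumb rescaling for comparing coefficients across the two models.

# Sketch: the logistic CDF and a rescaled normal CDF nearly coincide
curve(plogis(x), from = -4, to = 4, xlab = "x*beta", ylab = "Pr(y = 1)")
curve(pnorm(1.6 * x), add = TRUE, lty = 2)   # rule-of-thumb rescaling
legend("topleft", legend = c("logit", "probit (rescaled)"),
       lty = c(1, 2), bty = "n")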

Application: Voting Data - NES 2000

We are now going to replicate the application from lecture, using the 2000 American National Election Studies (ANES) and the survey item “Did you vote in the 2000 election?”

Turnout is binary: people either vote (\(\text{Vote}_i = 1\)) or they do not (\(\text{Vote}_i = 0\)).

We can model the probability of voting in the 2000 U.S. presidential election as a function of age (and its square), high school degree (hsdeg), and college degree (coldeg): \[ \begin{aligned} \text{Vote}_i & \sim \text{Bernoulli}(\pi_i) \\ \pi_i & = \text{logit}^{-1}(\mathbf{x}_i \boldsymbol{\beta}) \end{aligned} \]

where \(\mathbf{x}_i\) represents the vector of covariates for individual \(i\) (including age, high school degree, and college degree) and \(\boldsymbol{\beta}\) is the vector of coefficients.

Below we estimate this model by manually defining the likelihood and maximizing it with optim().

# Load data
data <- read.csv("https://faculty.washington.edu/cadolph/mle/nes00a.csv")
# Or library(readr) then use read_csv()
head(data)
##   vote00 age hsdeg coldeg vote96 married marriedo
## 1      1  49     1      0      1       1        1
## 2      0  35     1      0      0       1        1
## 3      1  57     1      0      1       1        1
## 4      1  63     1      0      1       1        0
## 5      1  40     1      0      1       1        1
## 6      1  77     0      0      1       1        1
# Estimate logit model using optim()
# Construct variables and model objects
y <- data$vote00


library(dplyr) 
x <- 
  data |> 
  mutate(age2 = age^2) |> 
  select(age,age2,hsdeg,coldeg) |> as.matrix()


# Likelihood function for logit
llkLogit <- function(param,y,x) {
  
  ## 1. Arrange the data and include an intercept
  
  # intercept
  os <- rep(1,nrow(x)) # Or length(x[,1])
  
  # bind intercept + covariates
  x <- cbind(os,x)
  
  ## 2. define parameters to optimize and systematic component
  b <- param[ 1 : ncol(x) ]
  xb <- x%*%b
  
  ## 3. define the (negated) log-likelihood
  # optim() is a minimizer, so we return -ln L(param|y)
  -sum(y*xb - log(1 + exp(xb)))
}

# Fit logit model using optim
lsResult <- lm(y~x)  # use ls estimates as starting values
stval <- lsResult$coefficients  # initial guesses

logit_optim <- optim(par = stval,
                     fn = llkLogit,
                     method = "Nelder-Mead", # notice method
                     hessian = TRUE,
                     y = y,
                     x = x)


(pe_optim <- logit_optim$par)   # point estimates
##   (Intercept)          xage         xage2        xhsdeg       xcoldeg 
## -3.0120951884  0.0745076721 -0.0004408562  1.1219062683  1.0812752168
(vc_optim <- solve(logit_optim$hessian))  # var-cov matrix
##               (Intercept)          xage         xage2        xhsdeg
## (Intercept)  6.821849e-02 -8.281879e-04  3.462127e-06 -3.671212e-02
## xage        -8.281879e-04  1.774582e-05 -1.366226e-07  2.490703e-04
## xage2        3.462127e-06 -1.366226e-07  2.763014e-09 -1.633932e-06
## xhsdeg      -3.671212e-02  2.490703e-04 -1.633932e-06  3.241723e-02
## xcoldeg     -3.775257e-03  1.120854e-04 -1.717767e-06 -2.837779e-03
##                   xcoldeg
## (Intercept) -3.775257e-03
## xage         1.120854e-04
## xage2       -1.717767e-06
## xhsdeg      -2.837779e-03
## xcoldeg      1.822990e-02
(se_optim <- sqrt(diag(vc_optim)))    # standard errors
##  (Intercept)         xage        xage2       xhsdeg      xcoldeg 
## 2.611867e-01 4.212578e-03 5.256438e-05 1.800479e-01 1.350181e-01
(ll_optim <- -logit_optim$value)  # log-likelihood at the maximum (flip the sign of the minimized negative log-likelihood)
## [1] -1034.478

Alternatively, we can estimate a logit model using base R’s glm function, but remember to specify the correct link function in the family argument:

# Estimate logit model using glm()
# Set up model formula and model specific data frame
model <- vote00 ~ age + I(age^2) + hsdeg + coldeg
mdata <- extractdata(model, data, na.rm=TRUE) # listwise-delete missing values; extractdata() is from simcf

# Run logit & extract results
logit_glm <- glm(model,
                 family=binomial(link="logit"),
                 data=mdata)

(pe_glm <- logit_glm$coefficients)  # point estimates
##   (Intercept)           age      I(age^2)         hsdeg        coldeg 
## -3.0193890720  0.0747251922 -0.0004427014  1.1243907749  1.0795702357
(vc_glm <- vcov(logit_glm))         # var-cov matrix
##               (Intercept)           age      I(age^2)         hsdeg
## (Intercept)  1.748828e-01 -6.158232e-03  5.514176e-05 -2.453127e-02
## age         -6.158232e-03  2.837215e-04 -2.733927e-06 -3.354912e-04
## I(age^2)     5.514176e-05 -2.733927e-06  2.740567e-08  5.053554e-06
## hsdeg       -2.453127e-02 -3.354912e-04  5.053554e-06  3.240248e-02
## coldeg       9.264483e-04 -9.948239e-05  1.270344e-06 -3.618422e-03
##                    coldeg
## (Intercept)  9.264483e-04
## age         -9.948239e-05
## I(age^2)     1.270344e-06
## hsdeg       -3.618422e-03
## coldeg       1.721641e-02
(se_glm <- sqrt(diag(vc_glm)))    # standard errors
##  (Intercept)          age     I(age^2)        hsdeg       coldeg 
## 0.4181899367 0.0168440337 0.0001655466 0.1800068893 0.1312113265

Quantities of Interest using simcf

Once we have estimated our binary model and extracted the point estimates and variance-covariance matrix, we can use the simulation approach to compute predicted counterfactuals and other scenarios of interest from the estimated model.

## Simulate quantities of interest using simcf

## We could do this from the optim or glm results;
## here, we use glm


# 1. Simulate parameter distributions using MASS::mvrnorm
sims <- 10000

simbetas <- mvrnorm(n=sims, 
                    mu=pe_glm,      # point estimate 
                    Sigma=vc_glm)   # variance covariance matrix

# 2.  Set up counterfactuals:  all ages, each of three educations

## Age range
xhyp <- seq(18,97,1)
nscen <- length(xhyp)

## Scenarios of interest: college, high school, no high school

nohsScen <- hsScen <- collScen <- cfMake(formula=model, 
                                         data=mdata, 
                                         nscen=nscen) # how many scenarios we have

# cfMake creates a simcf counterfactual object
# We have now created 3 simcf objects, one for each education scenario of interest;
# below, we use the cfChange function to set the hypothetical age value
# for each scenario

for (i in 1:nscen) {
  # No High school scenarios (loop over each age)
  nohsScen <- cfChange(xscen=nohsScen, covname="age", x = xhyp[i], scen = i)
  nohsScen <- cfChange(xscen=nohsScen, covname="hsdeg", x = 0, scen = i)
  nohsScen <- cfChange(xscen=nohsScen, covname="coldeg", x = 0, scen = i)

  # HS grad scenarios (loop over each age)
  hsScen <- cfChange(xscen=hsScen, covname="age", x = xhyp[i], scen = i)
  hsScen <- cfChange(xscen=hsScen, covname="hsdeg", x = 1, scen = i)
  hsScen <- cfChange(xscen=hsScen, covname="coldeg", x = 0, scen = i)

  # College grad scenarios (loop over each age)
  collScen <- cfChange(xscen=collScen, covname="age", x = xhyp[i], scen = i)
  collScen <- cfChange(xscen=collScen, covname="hsdeg", x = 1, scen = i)
  collScen <- cfChange(xscen=collScen, covname="coldeg", x = 1, scen = i)
}

# For expected values, only x matters; xpre becomes relevant later for first differences

# 3. Simulate quantities of interest

# Simulate expected probabilities for all scenarios
nohsSims <- logitsimev(x = nohsScen, # counterfactual object
                       b = simbetas, # simulated parameters
                       ci = 0.95)
hsSims <- logitsimev(hsScen, simbetas, ci=0.95)
collSims <- logitsimev(collScen, simbetas, ci=0.95)

Visualization: ggplot

# First, we need to arrange all the predictions and the hypothetical values in one data frame

# Data wrangling with tidyverse

nohsSims_tb <- 
  nohsSims |>
  bind_rows() |>
  mutate(
    xhyp = xhyp,  # add "xhyp" as a covariate
    edu = "nohs"  # add a column to identify which edu. scenario
  )

hsSims_tb <- hsSims |>
  bind_rows() |>
  mutate(xhyp = xhyp, edu = "hs")

collSims_tb <- collSims |>
  bind_rows() |>
  mutate(xhyp = xhyp, edu = "coll")

# bind them all together, in long format

allSims_tb <- bind_rows(nohsSims_tb, hsSims_tb, collSims_tb)
# unique(allSims_tb$edu)

# Visualize expected probabilities using ggplot2
# Basic: 

allSims_tb |>
  ggplot(aes(x = xhyp, 
             y = pe, 
             ymax = upper, 
             ymin = lower, 
             colour = edu, 
             fill = edu)) +
  geom_line() +
  geom_ribbon(alpha = 0.2, linetype = 0) +
  labs(y = "Probability of Voting", x = "Age of Respondent")

# More elaborate way if you are interested in approximating Chris's graph...
col <- brewer.pal(n=3, name="Dark2") # from RColorBrewer package

allSims_tb |>
  ggplot(aes(x = xhyp, 
             y = pe, 
             ymax = upper, 
             ymin = lower, 
             color = edu, 
             fill = edu)) +
  geom_line() +
  geom_ribbon(alpha = 0.2, linetype = 0) +
  annotate(geom = "text", x = 55, y = 0.26, 
           label = "Less than HS", col = col[1]) +
  annotate(geom = "text", x = 49, y = 0.56, 
           label = "High School", col = col[2]) +
  annotate(geom = "text", x = 30, y = 0.87, 
           label = "College", col = col[3]) +
  scale_color_manual(values = rev(col)) +
  scale_fill_manual(values = rev(col)) +
  scale_x_continuous(breaks = seq(20, 90, 10)) +
  scale_y_continuous(breaks = seq(0, 1, 0.2), 
                     limits = c(0, 1), expand = c(0, 0)) +
  labs(y = "Probability of Voting", x = "Age of Respondent") +
  
theme_cavis_hgrid +    # Brian's theme
  
theme(
  panel.background = element_rect(fill = NA),
  axis.line.x.bottom = element_line(size = 0.5),
  axis.ticks.length = unit(0.5, "char"),
  legend.position = "none"
)

Visualization: tile

We can replicate the visualization using Chris’s tile package.

With tile, we first create data trace objects. Since the predictor age is continuous, we’ll use tile::lineplot to plot lines for expected values and their confidence intervals. We’ll then add text labels with tile::textTile.

The lineplot function includes an extrapolate= argument, which lets you restrict the visualization to counterfactuals that do not extrapolate beyond the observed data. For this, the WhatIf package must be installed and loaded. If you’re unfamiliar with this setup, you can skip the argument, but note that your predictions may then include extrapolated values.

# Set up lineplot traces of expected probabilities

# When recycling this code, omit the extrapolate input
# if you are unsure how to use it correctly
nohsTrace <- lineplot(x=xhyp,
                      y=nohsSims$pe,
                      lower=nohsSims$lower,
                      upper=nohsSims$upper,
                      col=col[1],
                      # extrapolate requires package WhatIf
                      extrapolate=list(data=mdata[,2:ncol(mdata)], # real data (w/o y)
                                       cfact=nohsScen$x[,2:ncol(nohsScen$x)], # hypothetical data (w/o y)
                                       omit.extrapolated=TRUE),
                      plot=1) # attach to the plot 

# extrapolate: constrain the visualization, so that it includes only plausible, real, expected values 

# create trace for each geometry 

hsTrace <- lineplot(x=xhyp,
                    y=hsSims$pe,
                    lower=hsSims$lower,
                    upper=hsSims$upper,
                    col=col[2],
                      # extrapolate requires package WhatIf
                    extrapolate=list(data=mdata[,2:ncol(mdata)], 
                                     cfact=hsScen$x[,2:ncol(hsScen$x)], 
                                     omit.extrapolated=TRUE),
                    plot=1)

collTrace <- lineplot(x=xhyp,
                      y=collSims$pe,
                      lower=collSims$lower,
                      upper=collSims$upper,
                      col=col[3],
                      # extrapolate requires package WhatIf
                      extrapolate=list(data=mdata[,2:ncol(mdata)],
                                       cfact=collScen$x[,2:ncol(collScen$x)],
                                       omit.extrapolated=TRUE),
                      plot=1)

# Set up traces with labels and legend
labelTrace <- textTile(labels=c("Less than HS", "High School", "College"),
                       x=c( 55,    49,     30),
                       y=c( 0.26,  0.56,   0.87),
                       col=col,
                       plot=1)

legendTrace <- textTile(labels=c("Logit estimates:", "95% confidence", "interval is shaded"),
                        x=c(80, 80, 80),
                        y=c(0.2, 0.15, 0.10),
                        cex=0.9,
                        plot=1)

# Plot traces using tile
tile(nohsTrace,
     hsTrace,
     collTrace,
     labelTrace,
     legendTrace,
     limits=c(18,94,0,1),# first two: x axis, last two: y axis
     xaxis=list(at=c(20,30,40,50,60,70,80,90), fontsize=11),
     yaxis=list(label.loc=-0.5, major=FALSE, fontsize=11),
     xaxistitle=list(labels="Age of Respondent", y=-0.5),
     yaxistitle=list(labels="Probability of Voting"),
     width=list(null=5,yaxistitle=4,yaxis.labelspace=-0.5)
     )

Application: competing model with new predictor

Next, we consider a new specification by adding the variable marriedo.

We’ll estimate this updated model with glm and simulate expected values for new scenarios comparing married and non-married respondents.

# Estimate logit model using glm()

# Set up model formula and model specific data frame
model2 <- vote00 ~ age + I(age^2) + hsdeg + coldeg + marriedo
mdata2 <- extractdata(model2, data, na.rm=TRUE)

# Run logit & extract results
logitM2 <- glm(model2, family=binomial(link="logit"), data=mdata2)
peM2 <- logitM2$coefficients  # point estimates
vcM2 <- vcov(logitM2)         # var-cov matrix

Quantities of Interest using simcf

Now, we will compute quantities of interest (QoI) again. Specifically, we will calculate the first differences and relative risks between the scenarios of being married and not being married. This allows us to visualize the marginal effect of marriage on the expected probability and relative risk of voting.

When using simcf to compute first differences (logitsimfd) or relative risks (logitsimrr), we need to set up two parallel scenarios, x and xpre, in the cfMake object. The first, x, represents the counterfactual (“after”) value, while the second, xpre, represents the baseline (“before”) value. For example, the first difference is the expected probability under x minus the expected probability under xpre.

# Simulate parameter distributions
sims <- 10000
simbetasM2 <- mvrnorm(n=sims, 
                      mu=peM2, 
                      Sigma=vcM2)


# Set up counterfactuals:  all ages, married vs. not married
xhyp <- seq(18,97,1)
nscen <- length(xhyp)

marriedScen <- notmarrScen <- cfMake(formula=model2,
                                     data=mdata2,
                                     nscen=nscen)


for (i in 1:nscen) {
  
  # Married (loop over each age)
  # Note below the careful use of before scenarios (xpre) and after scenarios (x)
  # we will use the marriedScen counterfactuals in FDs and RRs as well as EVs
  marriedScen <- cfChange(xscen=marriedScen, 
                          covname="age", 
                          x = xhyp[i], 
                          xpre= xhyp[i],  
                          scen = i)
  
  marriedScen <- cfChange(xscen=marriedScen, 
                          covname="marriedo", 
                          x = 1,     # compare before and after
                          xpre= 0, 
                          scen = i)
  # We want the difference in Pr(vote) from not married (xpre) to married (x);
  # other covariates (hsdeg, coldeg) stay at their means (cfMake's default),
  # while age varies across scenarios because it is the covariate we trace
  
  # Not married (loop over each age)
  notmarrScen <- cfChange(xscen=notmarrScen, 
                          covname="age",
                          x = xhyp[i], 
                          scen = i)
  
  notmarrScen <- cfChange(xscen=notmarrScen, 
                          covname="marriedo", 
                          x = 0, 
                          scen = i)
}

# Simulate expected probabilities for all scenarios
marriedSims <- logitsimev(marriedScen, simbetasM2, ci=0.95)
notmarrSims <- logitsimev(notmarrScen, simbetasM2, ci=0.95)

# Simulate first difference of voting with respect to marriage
marriedFD <- logitsimfd(marriedScen, simbetasM2, ci=0.95)

# Simulate relative risk of voting wrt marriage
marriedRR <- logitsimrr(marriedScen, simbetasM2, ci=0.95)
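To make the x versus xpre logic concrete, the sketch below computes a single first difference by hand, without simcf, for one hypothetical scenario (a 40-year-old high-school graduate without a college degree); the coefficient order follows model2’s formula.

# Hand-rolled first difference for one hypothetical scenario:
# married (x) vs. not married (xpre), age 40, hsdeg = 1, coldeg = 0
x_now <- c(1, 40, 40^2, 1, 0, 1)  # (Intercept), age, age^2, hsdeg, coldeg, marriedo
x_pre <- c(1, 40, 40^2, 1, 0, 0)

fd_draws <- plogis(simbetasM2 %*% x_now) - plogis(simbetasM2 %*% x_pre)

mean(fd_draws)                       # point estimate of the first difference
quantile(fd_draws, c(0.025, 0.975))  # 95% confidence interval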

Visualization: ggplot

# Data wrangling in tidyverse

AllMarried <- 
  bind_rows(  
    
    marriedSims |>
      bind_rows() |>
      mutate(notmarried = 0,
             xhyp = xhyp,
             type = "EV"),
    
    notmarrSims |>
      bind_rows() |>
      mutate(notmarried = 1,
             xhyp = xhyp,
             type = "EV"),
    
    marriedFD |>
      bind_rows() |>
      mutate(notmarried = NA,
             xhyp = xhyp,
             type = "FD"),
    
    marriedRR |>
      bind_rows() |>
      mutate(notmarried = NA,
             xhyp = xhyp,
             type = "RR")
  
) |> 
  mutate(notmarried = factor(notmarried))
  
  
  

AllMarried |> 
  filter(type == "EV") |> 
  ggplot(aes(x = xhyp, y = pe, 
           color = notmarried, 
           fill = notmarried)) +
  # Point estimates lines
  geom_line(show.legend = FALSE) +
  scale_color_manual(values = c(col[1], 
                                col[2]), 
                     labels = c("Currently Married", 
                                "Not Married")) +
  # CIs for married
  geom_ribbon(aes(ymin = lower, 
                  ymax = upper),
              alpha = 0.3, 
              linetype = 0,
              show.legend = FALSE) +
  scale_fill_manual(values = c(col[1], col[2])) +
  # CIs for not married
  geom_line(aes(y = upper, 
                linetype = notmarried),
            show.legend = FALSE) +
  geom_line(aes(y = lower, 
                linetype = notmarried), show.legend = FALSE) +
  scale_linetype_manual(values = c(0, 2)) +
  # Label
  annotate(geom = "text", 
           x = 40, y = 0.86, 
           label = "Currently Married", 
           col = col[1]) +
  annotate(geom = "text", 
           x = 49, y = 0.46, 
           label = "Not Married", 
           col = col[2]) +
  # Other adjustments
  scale_x_continuous(breaks = seq(20, 90, 10)) +
  scale_y_continuous(breaks = seq(0, 1, 0.2), limits = c(0, 1), 
                     expand = c(0, 0)) +
  theme_cavis_hgrid +
  theme(legend.position.inside = c(0.2, 0.13), 
        legend.key.size = unit(0.2, "cm"),
        panel.grid.major.y = element_blank())+
  labs(y = "Probability of Voting", x = "Age of Respondent")

# First Difference
AllMarried |> 
  filter(type == "FD") |> 
  ggplot(aes(x = xhyp, y = pe)) +

  # Point estimates lines
  geom_line(color = col[1]) +
  # CI ribbon for the first difference
  geom_ribbon(aes(ymin = lower, ymax = upper),
              alpha = 0.3, linetype = 0, 
              show.legend = FALSE, fill = col[1]) +
  # Other adjustments
  scale_x_continuous(breaks = seq(20, 90, 10)) +
  scale_y_continuous(breaks = seq(-0.1, 0.5, 0.1), limits = c(-0.1, 0.5)) +
  geom_hline(yintercept = 0) +
  theme_cavis_hgrid +
  theme(legend.position.inside = c(0.2, 0.13), 
        legend.key.size = unit(0.2, "cm"),
        panel.grid.major.y = element_blank())+
  labs(y = "Difference in Probability of Voting", x = "Age of Respondent")

# Relative Risk
AllMarried |> 
  filter(type == "RR") |> 
  ggplot(aes(x = xhyp, y = pe)) +
  
  # Point estimates lines
  geom_line(color = col[1]) +
  # CI ribbon for the relative risk
  geom_ribbon(aes(ymin = lower, ymax = upper),
              alpha = 0.3, linetype = 0, 
              show.legend = FALSE, fill = col[1]) +
  # Other adjustments
  scale_x_continuous(breaks = seq(20, 90, 10)) +
  scale_y_continuous(breaks = seq(0.9, 1.5, 0.1), limits = c(0.9, 1.5)) +
  geom_hline(yintercept = 1) +
  theme_cavis_hgrid +
  theme(legend.position.inside = c(0.2, 0.13), 
        legend.key.size = unit(0.2, "cm"),
        panel.grid.major.y = element_blank())+
  labs(y = "Relative Risk of Voting", x = "Age of Respondent")

Visualization: tile

Below we produce the same plots, this time using tile.

Please open the Lab5_practice.rmd and make a plot of expected probabilities. Your plot should look similar to the one below.

The plot of the first difference can be produced with the following code.

# Plot First Difference

# Set up lineplot trace of the first difference
marriedFDTrace <- lineplot(x=xhyp,
                           y=marriedFD$pe,
                           lower=marriedFD$lower,
                           upper=marriedFD$upper,
                           col=col[1],
                           extrapolate=list(data=mdata2[,2:ncol(mdata2)],
                                            cfact=marriedScen$x[,2:ncol(marriedScen$x)],
                                            omit.extrapolated=TRUE),
                           plot=1)


# Set up baseline: for first difference, this is 0
baseline <- linesTile(x=c(18,94),
                      y=c(0,0),
                      plot=1)

# Set up traces with labels and legend
labelFDTrace <- textTile(labels=c("Married compared \n to Not Married"),
                       x=c( 40),
                       y=c( 0.20),
                       col=col[1],
                       plot=1)

legendFDTrace <- textTile(labels=c("Logit estimates:", "95% confidence", "interval is shaded"),
                        x=c(80, 80, 80),
                        y=c(-0.02, -0.05, -0.08),
                        cex=0.9,
                        plot=1)

# Plot traces using tile
tile(marriedFDTrace,
     labelFDTrace,
     legendFDTrace,
     baseline,
     limits=c(18,94,-0.1,0.5),
     xaxis=list(at=c(20,30,40,50,60,70,80,90), fontsize=11),
     yaxis=list(label.loc=-0.5, major=FALSE, fontsize=11),
     xaxistitle=list(labels="Age of Respondent", y=-0.5),
     yaxistitle=list(labels="Difference in Probability of Voting"),
     width=list(null=5,yaxistitle=4,yaxis.labelspace=-0.5)#,
     #output=list(file="marriedFD",width=5.5)
     )

Now, can you make a plot of relative risk? Again, your plot should look similar to the one below.

What model fits best?

In Maximum Likelihood Estimation (MLE), the likelihood function measures how well a model explains the observed data. It calculates the probability of the observed data given the model’s parameters. A higher likelihood value indicates that the model better fits the data.

In model selection, metrics like the Akaike Information Criterion (AIC) are built upon the log-likelihood. A higher log-likelihood means a better fit, but AIC adds a penalty for model complexity to guard against overfitting. Thus, the likelihood is key to assessing model fit, but it must be balanced against complexity to ensure the model is both accurate and generalizable.

The AIC is a measure of model fit that can be used to compare models, including non-nested ones. It balances fit and complexity by penalizing overfitting. AIC is defined as:

\[ AIC(\mathcal{M}) = D(\mathcal{M}) + 2 \times |\mathcal{M}| \]

where \(D(\mathcal{M})\) is the deviance of model \(\mathcal{M}\) (that is, \(-2\) times its maximized log-likelihood) and \(|\mathcal{M}|\) is the number of estimated parameters.

A lower AIC indicates a better balance between goodness of fit and model simplicity. Smaller AIC values are preferred because they suggest that the model explains the data well without overfitting.
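For the two logit specifications estimated above, the maximized log-likelihoods and AICs can be pulled directly from the glm fits; note that AIC comparisons are only meaningful when both models are fit to the same observations.

# Compare the two logit specifications fitted earlier
logLik(logit_glm)        # log-likelihood at the maximum, model 1
logLik(logitM2)          # log-likelihood at the maximum, model 2
AIC(logit_glm, logitM2)  # lower AIC = better fit/complexity trade-off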