To derive the maximum likelihood estimate (MLE) for a Bernoulli model, we start by recognizing that binary data fit the assumptions of a Bernoulli distribution. This distribution models a single trial with two mutually exclusive and exhaustive outcomes: success (\(y_i = 1\)) or failure (\(y_i = 0\)).
The probability mass function (PMF) of a Bernoulli trial is:
\[ \text{Pr}(y_i = 1|\pi_i) = f_{\text{Bernoulli}}(y_i|\pi_i) = \pi_i^{y_i}(1 - \pi_i)^{1 - y_i} \]
where \(\pi_i\) is the probability of success, and the expectation of \(y_i\) is \(\mathbb{E}(y_i) = \pi_i\). This framework allows us to construct the likelihood function for a sample of \(n\) observations:
\[ \mathcal{L}(\pi|\mathbf{y}) \propto \prod_{i=1}^{n} \pi_i^{y_i}(1 - \pi_i)^{1 - y_i} \]
Taking the log-likelihood simplifies the product into a sum:
\[ \log \mathcal{L}(\pi|\mathbf{y}) \propto \sum_{i=1}^{n} \left[ \; y_i \log \pi_i + (1 - y_i) \log (1 - \pi_i) \; \right] \]
This log-likelihood is smooth and, for the models considered here, globally concave, so it is straightforward to maximize numerically with standard optimization methods.
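As a quick aside (a worked special case, not part of the main derivation), suppose every observation shares a common probability \(\pi_i = \pi\). Setting the derivative of the log-likelihood to zero recovers the sample mean as the MLE:
\[ \frac{\partial \log \mathcal{L}}{\partial \pi} = \sum_{i=1}^{n} \left[ \frac{y_i}{\pi} - \frac{1 - y_i}{1 - \pi} \right] = 0 \quad \Longrightarrow \quad \hat{\pi} = \frac{1}{n} \sum_{i=1}^{n} y_i \]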
This expression is general and applies to any binary model (like logit or probit). Now, we proceed to define the systematic component for both models and plug it into the log-likelihood.
One popular way to model the systematic component \(\pi_i\) (the probability of success) is to use the logistic function. The logistic function maps any real number \(x_i \beta\) onto the interval \((0, 1)\), so the fitted values always respect the bounds of a probability:
\[ \pi_i = \text{logit}^{-1}(x_i \beta) = \frac{\exp(x_i \beta)}{1 + \exp(x_i \beta)} = \frac{1}{1 + \exp(-x_i \beta)} \]
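As a minimal illustration (the values below are arbitrary and chosen only for this sketch), the inverse-logit transformation can be computed by hand or with plogis(), the logistic CDF in base R:
# Inverse logit maps any real number to a probability in (0, 1)
inv_logit <- function(eta) 1 / (1 + exp(-eta))
eta <- c(-5, -1, 0, 1, 5)   # hypothetical values of x_i %*% beta
inv_logit(eta)              # manual computation
plogis(eta)                 # base R equivalent (logistic CDF)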
We now substitute the logit function for \(\pi_i\) into the log-likelihood:
\[ \log \mathcal{L}(\boldsymbol{\beta}|\mathbf{y},\mathbf{x}) \propto \sum_{i=1}^{n} \left[ y_i \log \left( \frac{1}{1 + \exp(-x_i \boldsymbol{\beta})} \right) + (1 - y_i) \log \left( 1 - \frac{1}{1 + \exp(-x_i \boldsymbol{\beta})} \right) \right] \]
We can simplify the above expression to get the log-likelihood for the logit model:
\[ \log \mathcal{L}(\boldsymbol{\beta}|\mathbf{y},\mathbf{x}) \propto \sum_{i=1}^{n} \left[ y_i x_i \boldsymbol{\beta} - \log(1 + \exp(x_i \boldsymbol{\beta})) \right] \]
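The simplification uses two identities that follow directly from the logistic form of \(\pi_i\):
\[ \log \pi_i = x_i \boldsymbol{\beta} - \log(1 + \exp(x_i \boldsymbol{\beta})), \qquad \log(1 - \pi_i) = -\log(1 + \exp(x_i \boldsymbol{\beta})) \]
Substituting these into the Bernoulli log-likelihood and collecting terms gives the expression above.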
To estimate \(\boldsymbol{\beta}\), we maximize this log-likelihood function with respect to \(\boldsymbol{\beta}\) via numerical methods using optim. The resulting estimates are the maximum likelihood estimates (MLE) of the parameters.
An alternative to the logit model is the probit model, which uses the cumulative distribution function (CDF) of the standard Normal distribution to map \(x_i \beta\) to \(\pi_i\):
\[ \pi_i = \Phi(x_i \beta) = \int_{-\infty}^{x_i \beta} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{y_i^{*2}}{2}\right) dy_i^{*} \]
Here, \(\Phi(\cdot)\) is the CDF of the standard Normal distribution; the probit link is its inverse, \(\Phi^{-1}(\cdot)\). The probit model thus assumes that the probability of success is the standard Normal CDF evaluated at \(x_i \beta\).
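In R, \(\Phi(\cdot)\) corresponds to pnorm() and its inverse (the probit link) to qnorm(); a quick sketch with arbitrary illustrative values:
eta <- c(-2, 0, 2)      # hypothetical values of x_i %*% beta
pnorm(eta)              # Phi(): maps the linear predictor to a probability
qnorm(pnorm(eta))       # probit link Phi^{-1}(): recovers the linear predictor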
Once again, the starting point is the Bernoulli log-likelihood:
\[ \log \mathcal{L}(\pi|\mathbf{y}) \propto \sum_{i=1}^{n} \left[ y_i \log \pi_i + (1 - y_i) \log (1 - \pi_i) \right] \]
In the probit model, the systematic component \(\pi_i\) is expressed using the cumulative distribution function (CDF) of the standard normal distribution \(\Phi(\cdot)\):
\[ \pi_i = \Phi(\mathbf{x}_i \boldsymbol{\beta}) \]
We substitute the systematic component \(\Phi(\mathbf{x}_i \boldsymbol{\beta})\) into the log-likelihood:
\[ \log \mathcal{L}(\boldsymbol{\beta}|\mathbf{y}, \mathbf{x}) \propto \sum_{i=1}^{n} \left[ y_i \log \Phi(x_i \boldsymbol{\beta}) + (1 - y_i) \log (1 - \Phi(x_i \boldsymbol{\beta})) \right] \]
Unlike the logit model, the probit log-likelihood does not simplify to a closed algebraic form, because \(\Phi(\cdot)\) has no elementary expression. We therefore compute the estimates of \(\boldsymbol{\beta}\) via numerical optimization (e.g., Nelder-Mead or BFGS) using optim.
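Although the lab below only codes the logit likelihood by hand, the probit version is a small change. The following is a minimal sketch, mirroring the llkLogit function defined later in this lab; llkProbit is a name introduced here for illustration and is not used elsewhere:
# Negative Bernoulli log-likelihood with a probit systematic component (sketch)
llkProbit <- function(param, y, x) {
  x <- cbind(1, x)                # add an intercept column
  b <- param[1:ncol(x)]
  xb <- x %*% b
  # log Phi(xb) and log(1 - Phi(xb)) = log Phi(-xb), evaluated on the log scale;
  # optim() is a minimizer, so return the negative log-likelihood
  -sum(y * pnorm(xb, log.p = TRUE) + (1 - y) * pnorm(-xb, log.p = TRUE))
}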
Logit and probit models typically yield similar results: both assume a symmetric latent distribution, so the effect of a covariate on the probability is steepest at a baseline probability of 0.5. If we believe the data-generating process (DGP) does not share this symmetry, a model such as scobit (skewed logit) may be better suited, since it allows the point of steepest response to differ from 0.5. Generally, though, logit and probit models produce comparable probability estimates.
Their key difference lies in the link function: the logit model uses the log-odds (inverse logistic CDF) link, while the probit model uses the inverse standard Normal CDF, \(\Phi^{-1}(\cdot)\).
Both models estimate \(\boldsymbol{\beta}\) by maximizing their log-likelihood functions. While the choice between logit and probit often depends on computational preferences and ease of interpretation, both are founded on the Bernoulli distribution.
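To see the similarity numerically, one can compare the two inverse links after rescaling the linear predictor; the factor of roughly 1.6 below is the conventional approximation, not an exact identity:
# Compare the logistic CDF with a rescaled standard Normal CDF
eta <- seq(-4, 4, by = 1)               # hypothetical linear predictors
round(cbind(eta,
            logit  = plogis(eta),       # inverse logit
            probit = pnorm(eta / 1.6)), # inverse probit, index rescaled by ~1.6
      3)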
We are now going to replicate the application we saw in lecture, using the 2000 American National Election Study (ANES) and the survey item "Did you vote in the 2000 election?"
Turnout is binary: people either vote (\(\text{Vote}_i = 1\)) or they do not (\(\text{Vote}_i = 0\)).
We can model the probability of voting in the 2000 U.S. presidential election as a function of Age, High School Degree (HS Degree), and College Degree (ColDeg). \[ \begin{aligned} Vote_i & \sim \text{Bernoulli}(\pi_i) \\ \pi_i & = \text{logit}^{-1}(\mathbf{x}_i \boldsymbol{\beta}) \end{aligned} \]
where \(\mathbf{x}_i\) represents the vector of covariates for individual \(i\) (including age, high school degree, and college degree) and \(\boldsymbol{\beta}\) is the vector of coefficients.
Below we estimate this model by manually defining the likelihood and using optim.
# Load data
data <- read.csv("https://faculty.washington.edu/cadolph/mle/nes00a.csv")
# Or library(readr) then use read_csv()
head(data)
## vote00 age hsdeg coldeg vote96 married marriedo
## 1 1 49 1 0 1 1 1
## 2 0 35 1 0 0 1 1
## 3 1 57 1 0 1 1 1
## 4 1 63 1 0 1 1 0
## 5 1 40 1 0 1 1 1
## 6 1 77 0 0 1 1 1
# Estimate logit model using optim()
# Construct variables and model objects
y <- data$vote00
library(dplyr)
x <-
data |>
mutate(age2 = age^2) |>
select(age,age2,hsdeg,coldeg) |> as.matrix()
# Likelihood function for logit
llkLogit <- function(param, y, x) {
  ## 1. Arrange the data and include an intercept
  os <- rep(1, nrow(x))   # intercept (or length(x[,1]))
  x <- cbind(os, x)       # bind intercept + covariates
  ## 2. Define the parameters to optimize and the systematic component
  b <- param[1:ncol(x)]
  xb <- x %*% b
  ## 3. Return the negative log-likelihood:
  ##    optim() is a minimizer, so we minimize -ln L(param|y)
  -sum(y * xb - log(1 + exp(xb)))
}
# Fit logit model using optim
lsResult <- lm(y~x) # use ls estimates as starting values
stval <- lsResult$coefficients # initial guesses
logit_optim <- optim(par = stval,
fn = llkLogit,
method = "Nelder-Mead", # notice method
hessian = TRUE,
y = y,
x = x)
(pe_optim <- logit_optim$par) # point estimates
## (Intercept) xage xage2 xhsdeg xcoldeg
## -3.0120951884 0.0745076721 -0.0004408562 1.1219062683 1.0812752168
(vc_optim <- solve(logit_optim$hessian)) # var-cov matrix
## (Intercept) xage xage2 xhsdeg
## (Intercept) 6.821849e-02 -8.281879e-04 3.462127e-06 -3.671212e-02
## xage -8.281879e-04 1.774582e-05 -1.366226e-07 2.490703e-04
## xage2 3.462127e-06 -1.366226e-07 2.763014e-09 -1.633932e-06
## xhsdeg -3.671212e-02 2.490703e-04 -1.633932e-06 3.241723e-02
## xcoldeg -3.775257e-03 1.120854e-04 -1.717767e-06 -2.837779e-03
## xcoldeg
## (Intercept) -3.775257e-03
## xage 1.120854e-04
## xage2 -1.717767e-06
## xhsdeg -2.837779e-03
## xcoldeg 1.822990e-02
(se_optim <- sqrt(diag(vc_optim))) # standard errors
## (Intercept) xage xage2 xhsdeg xcoldeg
## 2.611867e-01 4.212578e-03 5.256438e-05 1.800479e-01 1.350181e-01
(ll_optim <- -logit_optim$value) # log-likelihood at the maximum (undo the sign flip in llkLogit)
## [1] -1034.478
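As a small addition (not part of the original lab code), the point estimates and standard errors from optim can be combined into approximate 95% Wald confidence intervals:
# Approximate 95% Wald confidence intervals from the optim results
cbind(estimate = pe_optim,
      lower = pe_optim - qnorm(0.975) * se_optim,
      upper = pe_optim + qnorm(0.975) * se_optim)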
Alternatively, we can estimate a logit model using base R's glm function, but remember to specify the correct link function in the family argument: binomial(link = "logit") for a logit model or binomial(link = "probit") for a probit model.
# Estimate logit model using glm()
# Set up model formula and model-specific data frame
library(simcf) # provides extractdata(), cfMake(), cfChange(), logitsimev(), etc.
model <- vote00 ~ age + I(age^2) + hsdeg + coldeg
mdata <- extractdata(model, data, na.rm=TRUE) # listwise-delete missing values
# Run logit & extract results
logit_glm <- glm(model,
family=binomial(link="logit"),
data=mdata)
(pe_glm <- logit_glm$coefficients) # point estimates
## (Intercept) age I(age^2) hsdeg coldeg
## -3.0193890720 0.0747251922 -0.0004427014 1.1243907749 1.0795702357
(vc_glm <- vcov(logit_glm)) # var-cov matrix
## (Intercept) age I(age^2) hsdeg
## (Intercept) 1.748828e-01 -6.158232e-03 5.514176e-05 -2.453127e-02
## age -6.158232e-03 2.837215e-04 -2.733927e-06 -3.354912e-04
## I(age^2) 5.514176e-05 -2.733927e-06 2.740567e-08 5.053554e-06
## hsdeg -2.453127e-02 -3.354912e-04 5.053554e-06 3.240248e-02
## coldeg 9.264483e-04 -9.948239e-05 1.270344e-06 -3.618422e-03
## coldeg
## (Intercept) 9.264483e-04
## age -9.948239e-05
## I(age^2) 1.270344e-06
## hsdeg -3.618422e-03
## coldeg 1.721641e-02
(se_glm <- sqrt(diag(vc_glm))) # standard errors
## (Intercept) age I(age^2) hsdeg coldeg
## 0.4181899367 0.0168440337 0.0001655466 0.1800068893 0.1312113265
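As a quick check (not shown in the original lab output), the estimates from the hand-coded optim approach and from glm can be placed side by side; they should agree closely:
# Compare the hand-rolled optim estimates with glm's estimates
round(cbind(optim = pe_optim, glm = pe_glm), 4)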
Once we have estimated our binary model and extracted the point estimates and variance-covariance matrix, we can use our simulation approach to predict counterfactuals and scenarios of interest from the estimated model.
## Simulate quantities of interest using simcf
## We could do this from the optim or glm results;
## here, we use glm
# 1. Simulate parameter distributions using MASS::mvrnorm
library(MASS) # note: load MASS after the dplyr wrangling above, since MASS masks dplyr::select()
sims <- 10000
simbetas <- mvrnorm(n=sims,
mu=pe_glm, # point estimates
Sigma=vc_glm) # variance-covariance matrix
# simbetas is a sims x length(pe_glm) matrix; each row is one simulated coefficient vector
# 2. Set up counterfactuals: all ages, each of three educations
## Age range
xhyp <- seq(18,97,1)
nscen <- length(xhyp)
## Scenarios of interest: college, high school, no high school
nohsScen <- hsScen <- collScen <- cfMake(formula=model,
data=mdata,
nscen=nscen) # how many scenarios we have
# cfMake() creates a simcf counterfactual object; we now have three,
# one for each education scenario. Next we use cfChange() to set the
# hypothetical covariate values (age and the education dummies) for each scenario.
for (i in 1:nscen) {
# No High school scenarios (loop over each age)
nohsScen <- cfChange(xscen=nohsScen, covname="age", x = xhyp[i], scen = i)
nohsScen <- cfChange(xscen=nohsScen, covname="hsdeg", x = 0, scen = i)
nohsScen <- cfChange(xscen=nohsScen, covname="coldeg", x = 0, scen = i)
# HS grad scenarios (loop over each age)
hsScen <- cfChange(xscen=hsScen, covname="age", x = xhyp[i], scen = i)
hsScen <- cfChange(xscen=hsScen, covname="hsdeg", x = 1, scen = i)
hsScen <- cfChange(xscen=hsScen, covname="coldeg", x = 0, scen = i)
# College grad scenarios (loop over each age)
collScen <- cfChange(xscen=collScen, covname="age", x = xhyp[i], scen = i)
collScen <- cfChange(xscen=collScen, covname="hsdeg", x = 1, scen = i)
collScen <- cfChange(xscen=collScen, covname="coldeg", x = 1, scen = i)
}
# For expected values we only set x; the xpre ("before") scenario is used later for first differences and relative risks
# 3. Simulate quantities of interest
# Simulate expected probabilities for all scenarios
nohsSims <- logitsimev(x = nohsScen, # counterfactual object
b = simbetas, # simulated parameters
ci = 0.95)
hsSims <- logitsimev(hsScen, simbetas, ci=0.95)
collSims <- logitsimev(collScen, simbetas, ci=0.95)
# First, we need to arrange all the predictions and the hypothetical values in one data frame
# Data wrangling with tidyverse
nohsSims_tb <-
nohsSims |>
bind_rows() |>
mutate(
xhyp = xhyp, # add "xhyp" as a covariate
edu = "nohs" # add a column to identify which edu. scenario
)
hsSims_tb <- hsSims |>
bind_rows() |>
mutate(xhyp = xhyp, edu = "hs")
collSims_tb <- collSims |>
bind_rows() |>
mutate(xhyp = xhyp, edu = "coll")
# bind them all together, in long format
allSims_tb <- bind_rows(nohsSims_tb, hsSims_tb, collSims_tb)
# unique(allSims_tb$edu)
# Visualize expected probabilities using ggplot2
library(ggplot2)
# Basic version:
allSims_tb |>
ggplot(aes(x = xhyp,
y = pe,
ymax = upper,
ymin = lower,
colour = edu,
fill = edu)) +
geom_line() +
geom_ribbon(alpha = 0.2, linetype = 0) +
labs(y = "Probability of Voting", x = "Age of Respondent")
# More elaborate way if you are interested in approximating Chris's graph...
library(RColorBrewer)
col <- brewer.pal(n=3, name="Dark2") # 3-color qualitative palette from RColorBrewer
allSims_tb |>
ggplot(aes(x = xhyp,
y = pe,
ymax = upper,
ymin = lower,
color = edu,
fill = edu)) +
geom_line() +
geom_ribbon(alpha = 0.2, linetype = 0) +
annotate(geom = "text", x = 55, y = 0.26,
label = "Less than HS", col = col[1]) +
annotate(geom = "text", x = 49, y = 0.56,
label = "High School", col = col[2]) +
annotate(geom = "text", x = 30, y = 0.87,
label = "College", col = col[3]) +
scale_color_manual(values = rev(col)) +
scale_fill_manual(values = rev(col)) +
scale_x_continuous(breaks = seq(20, 90, 10)) +
scale_y_continuous(breaks = seq(0, 1, 0.2),
limits = c(0, 1), expand = c(0, 0)) +
labs(y = "Probability of Voting", x = "Age of Respondent") +
theme_cavis_hgrid + # Brian's theme
theme(
panel.background = element_rect(fill = NA),
axis.line.x.bottom = element_line(size = 0.5),
axis.ticks.length = unit(0.5, "char"),
legend.position = "none"
)
We can replicate the visualization using Chris's tile package.
With tile, we first create data trace
objects. Since the predictor age is continuous, we’ll
use tile::lineplot to plot lines for expected values and
their confidence intervals. We’ll then add text labels with
tile::textTile.
The lineplot function includes an extrapolate= argument, which limits the visualization to counterfactuals that do not require extrapolating beyond the observed data. To use it, the WhatIf package must be installed and loaded. If you're unfamiliar with this setup, you can skip the argument, but note that your predictions may then include extrapolated values.
# Set up lineplot traces of expected probabilities
library(tile)   # provides lineplot(), textTile(), linesTile(), tile()
library(WhatIf) # required by the extrapolate= argument below
# When recycling this code, omit the extrapolate input
# if you are unsure how to use it correctly
nohsTrace <- lineplot(x=xhyp,
y=nohsSims$pe,
lower=nohsSims$lower,
upper=nohsSims$upper,
col=col[1],
# extrapolate requires package WhatIf
extrapolate=list(data=mdata[,2:ncol(mdata)], # observed data (w/o y)
cfact=nohsScen$x[,2:ncol(nohsScen$x)], # hypothetical data (w/o y)
omit.extrapolated=TRUE),
plot=1) # attach to plot 1
# extrapolate: restricts the plot to counterfactuals supported by the observed data
# create trace for each geometry
hsTrace <- lineplot(x=xhyp,
y=hsSims$pe,
lower=hsSims$lower,
upper=hsSims$upper,
col=col[2],
# extrapolate requires package WhatIf
extrapolate=list(data=mdata[,2:ncol(mdata)],
cfact=hsScen$x[,2:ncol(hsScen$x)],
omit.extrapolated=TRUE),
plot=1)
collTrace <- lineplot(x=xhyp,
y=collSims$pe,
lower=collSims$lower,
upper=collSims$upper,
col=col[3],
# extrapolate requires package WhatIf
extrapolate=list(data=mdata[,2:ncol(mdata)],
cfact=collScen$x[,2:ncol(collScen$x)],
omit.extrapolated=TRUE),
plot=1)
# Set up traces with labels and legend
labelTrace <- textTile(labels=c("Less than HS", "High School", "College"),
x=c( 55, 49, 30),
y=c( 0.26, 0.56, 0.87),
col=col,
plot=1)
legendTrace <- textTile(labels=c("Logit estimates:", "95% confidence", "interval is shaded"),
x=c(80, 80, 80),
y=c(0.2, 0.15, 0.10),
cex=0.9,
plot=1)
# Plot traces using tile
tile(nohsTrace,
hsTrace,
collTrace,
labelTrace,
legendTrace,
limits=c(18,94,0,1),# first two: x axis, last two: y axis
xaxis=list(at=c(20,30,40,50,60,70,80,90), fontsize=11),
yaxis=list(label.loc=-0.5, major=FALSE, fontsize=11),
xaxistitle=list(labels="Age of Respondent", y=-0.5),
yaxistitle=list(labels="Probability of Voting"),
width=list(null=5,yaxistitle=4,yaxis.labelspace=-0.5)
)
Next, we consider a new specification by adding the variable marriedo.
We’ll estimate this updated model with glm and simulate
expected values for new scenarios comparing married and non-married
respondents.
# Estimate logit model using glm()
# Set up model formula and model specific data frame
model2 <- vote00 ~ age + I(age^2) + hsdeg + coldeg + marriedo
mdata2 <- extractdata(model2, data, na.rm=TRUE)
# Run logit & extract results
logitM2 <- glm(model2, family=binomial(link="logit"), data=mdata2)
peM2 <- logitM2$coefficients # point estimates
vcM2 <- vcov(logitM2) # var-cov matrix
Now we compute quantities of interest (QoI) again. Specifically, we calculate first differences and relative risks between the married and not-married scenarios. This lets us visualize the marginal effect of marriage on the expected probability and relative risk of voting.
When using simcf to compute first differences (logitsimfd) or relative risks (logitsimrr), we need to set up two parallel scenarios, x and xpre, in the cfMake object. The first, x, represents the counterfactual ("after") value, while the second, xpre, represents the baseline ("before") value. For example, the first difference is the expected probability under x minus the expected probability under xpre.
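Concretely, for each age scenario \(i\), the simulated quantities of interest are
\[ \text{FD}_i = \text{logit}^{-1}(\mathbf{x}_i \boldsymbol{\beta}) - \text{logit}^{-1}(\mathbf{x}_i^{\text{pre}} \boldsymbol{\beta}), \qquad \text{RR}_i = \frac{\text{logit}^{-1}(\mathbf{x}_i \boldsymbol{\beta})}{\text{logit}^{-1}(\mathbf{x}_i^{\text{pre}} \boldsymbol{\beta})} \]
where \(\mathbf{x}_i\) sets marriedo = 1 and \(\mathbf{x}_i^{\text{pre}}\) sets marriedo = 0, holding age at the same hypothetical value in both scenarios.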
# Simulate parameter distributions
sims <- 10000
simbetasM2 <- mvrnorm(n=sims,
mu=peM2,
Sigma=vcM2)
# Set up counterfactuals: all ages, each of three educations
xhyp <- seq(18,97,1)
nscen <- length(xhyp)
marriedScen <- notmarrScen <- cfMake(formula=model2,
data=mdata2,
nscen=nscen)
for (i in 1:nscen) {
# Married (loop over each age)
# Note below the careful use of before scenarios (xpre) and after scenarios (x)
# we will use the marriedScen counterfactuals in FDs and RRs as well as EVs
marriedScen <- cfChange(xscen=marriedScen,
covname="age",
x = xhyp[i],
xpre= xhyp[i],
scen = i)
marriedScen <- cfChange(xscen=marriedScen,
covname="marriedo",
x = 1, # compare before and after
xpre= 0,
scen = i)
# We want the difference from not married (xpre = 0) to married (x = 1);
# other covariates (hsdeg, coldeg) are held at their means (the cfMake defaults),
# and age is varied because we are interested in how the effect changes with age
# Not married (loop over each age)
notmarrScen <- cfChange(xscen=notmarrScen,
covname="age",
x = xhyp[i],
scen = i)
notmarrScen <- cfChange(xscen=notmarrScen,
covname="marriedo",
x = 0,
scen = i)
}
# Simulate expected probabilities for all scenarios
marriedSims <- logitsimev(marriedScen, simbetasM2, ci=0.95)
notmarrSims <- logitsimev(notmarrScen, simbetasM2, ci=0.95)
# Simulate first difference of voting with respect to marriage
marriedFD <- logitsimfd(marriedScen, simbetasM2, ci=0.95)
# Simulate relative risk of voting wrt marriage
marriedRR <- logitsimrr(marriedScen, simbetasM2, ci=0.95)
# Data wrangling in tidyverse
AllMarried <-
bind_rows(
marriedSims |>
bind_rows() |>
mutate(notmarried = 0,
xhyp = xhyp,
type = "EV"),
notmarrSims |>
bind_rows() |>
mutate(notmarried = 1,
xhyp = xhyp,
type = "EV"),
marriedFD |>
bind_rows() |>
mutate(notmarried = NA,
xhyp = xhyp,
type = "FD"),
marriedRR |>
bind_rows() |>
mutate(notmarried = NA,
xhyp = xhyp,
type = "RR")
) |>
mutate(notmarried = factor(notmarried))
AllMarried |>
filter(type == "EV") |>
ggplot(aes(x = xhyp, y = pe,
color = notmarried,
fill = notmarried)) +
# Point estimates lines
geom_line(show.legend = FALSE) +
scale_color_manual(values = c(col[1],
col[2]),
labels = c("Currently Married",
"Not Married")) +
# CIs for married
geom_ribbon(aes(ymin = lower,
ymax = upper),
alpha = 0.3,
linetype = 0,
show.legend = FALSE) +
scale_fill_manual(values = c(col[1], col[2])) +
# CIs for not married
geom_line(aes(y = upper,
linetype = notmarried),
show.legend = FALSE) +
geom_line(aes(y = lower,
linetype = notmarried), show.legend = FALSE) +
scale_linetype_manual(values = c(0, 2)) +
# Label
annotate(geom = "text",
x = 40, y = 0.86,
label = "Currently Married",
col = col[1]) +
annotate(geom = "text",
x = 49, y = 0.46,
label = "Not Married",
col = col[2]) +
# Other adjustments
scale_x_continuous(breaks = seq(20, 90, 10)) +
scale_y_continuous(breaks = seq(0, 1, 0.2), limits = c(0, 1),
expand = c(0, 0)) +
theme_cavis_hgrid +
theme(legend.position.inside = c(0.2, 0.13),
legend.key.size = unit(0.2, "cm"),
panel.grid.major.y = element_blank())+
labs(y = "Probability of Voting", x = "Age of Respondent")
# First Difference
AllMarried |>
filter(type == "FD") |>
ggplot(aes(x = xhyp, y = pe)) +
# Point estimates lines
geom_line(color = col[1]) +
# Shaded CI for the first difference
geom_ribbon(aes(ymin = lower, ymax = upper),
alpha = 0.3, linetype = 0,
show.legend = FALSE, fill = col[1]) +
# Other adjustments
scale_x_continuous(breaks = seq(20, 90, 10)) +
scale_y_continuous(breaks = seq(-0.1, 0.5, 0.1), limits = c(-0.1, 0.5),
)+
geom_hline(yintercept = 0)+
theme_cavis_hgrid +
theme(legend.position.inside = c(0.2, 0.13),
legend.key.size = unit(0.2, "cm"),
panel.grid.major.y = element_blank())+
labs(y = "Difference in Probability of Voting", x = "Age of Respondent")
# Relative Risk
AllMarried |>
filter(type == "RR") |>
ggplot(aes(x = xhyp, y = pe)) +
# Point estimates lines
geom_line(color = col[1]) +
# Shaded CI for the relative risk
geom_ribbon(aes(ymin = lower, ymax = upper),
alpha = 0.3, linetype = 0,
show.legend = FALSE, fill = col[1]) +
# Other adjustments
scale_x_continuous(breaks = seq(20, 90, 10)) +
scale_y_continuous(breaks = seq(0.9, 1.5, 0.1), limits = c(0.9, 1.5),
)+
geom_hline(yintercept = 1)+
theme_cavis_hgrid +
theme(legend.position.inside = c(0.2, 0.13),
legend.key.size = unit(0.2, "cm"),
panel.grid.major.y = element_blank())+
labs(y = "Relative Risk of Voting", x = "Age of Respondent")
Below we make the same plots, this time using tile.
Please open the Lab5_practice.rmd and make a plot of expected probabilities. Your plot should look similar to the one below.
The plot of the first difference can be produced with the following code.
# Plot First Difference
# Set up lineplot trace of the first difference
marriedFDTrace <- lineplot(x=xhyp,
y=marriedFD$pe,
lower=marriedFD$lower,
upper=marriedFD$upper,
col=col[1],
extrapolate=list(data=mdata2[,2:ncol(mdata2)],
cfact=marriedScen$x[,2:ncol(marriedScen$x)],
omit.extrapolated=TRUE),
plot=1)
# Set up baseline: for first difference, this is 0
baseline <- linesTile(x=c(18,94),
y=c(0,0),
plot=1)
# Set up traces with labels and legend
labelFDTrace <- textTile(labels=c("Married compared \n to Not Married"),
x=c( 40),
y=c( 0.20),
col=col[1],
plot=1)
legendFDTrace <- textTile(labels=c("Logit estimates:", "95% confidence", "interval is shaded"),
x=c(80, 80, 80),
y=c(-0.02, -0.05, -0.08),
cex=0.9,
plot=1)
# Plot traces using tile
tile(marriedFDTrace,
labelFDTrace,
legendFDTrace,
baseline,
limits=c(18,94,-0.1,0.5),
xaxis=list(at=c(20,30,40,50,60,70,80,90), fontsize=11),
yaxis=list(label.loc=-0.5, major=FALSE, fontsize=11),
xaxistitle=list(labels="Age of Respondent", y=-0.5),
yaxistitle=list(labels="Difference in Probability of Voting"),
width=list(null=5,yaxistitle=4,yaxis.labelspace=-0.5)#,
#output=list(file="marriedFD",width=5.5)
)
Now, can you make a plot of relative risk? Again, your plot should look similar to the one below.
In Maximum Likelihood Estimation (MLE), the likelihood function measures how well a model explains the observed data. It calculates the probability of the observed data given the model’s parameters. A higher likelihood value indicates that the model better fits the data.
In model selection, metrics like Akaike Information Criterion (AIC) are built upon the log likelihood. A higher log likelihood means a better fit, but AIC also includes a penalty for model complexity to avoid overfitting. Thus, likelihood is key in assessing model fit, but it must be balanced with complexity to ensure the model is both accurate and generalizable.
The AIC is a measure used to compare the fit of models, including non-nested ones. It balances model fit and complexity by penalizing overfitting. AIC is defined as:
\[ AIC(\mathcal{M}) = D(\mathcal{M}) + 2 \times |\mathcal{M}| \]
where \(D(\mathcal{M})\) is the deviance of model \(\mathcal{M}\) (for a binary-outcome model, \(-2\) times its maximized log-likelihood) and \(|\mathcal{M}|\) is the number of estimated parameters.
A lower AIC indicates a better balance between goodness of fit and model simplicity. Smaller AIC values are preferred because they suggest that the model explains the data well without overfitting.
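As a quick illustration using the two glm fits from this lab (logit_glm and logitM2), AIC can be computed by hand from the maximized log-likelihood and compared with the value base R reports. Note that the two models were fit to model-specific data frames, so in practice you should confirm they use the same observations before comparing:
# AIC = -2 * log-likelihood + 2 * (number of estimated parameters)
k1 <- length(coef(logit_glm))   # parameters in the education-only model
k2 <- length(coef(logitM2))     # parameters in the model adding marriedo
aic1 <- -2 * as.numeric(logLik(logit_glm)) + 2 * k1
aic2 <- -2 * as.numeric(logLik(logitM2))  + 2 * k2
c(model1 = aic1, model2 = aic2)  # smaller is better
AIC(logit_glm, logitM2)          # base R equivalent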