To derive the maximum likelihood estimate (MLE) from a Bernoulli distribution, we start by recognizing that binary data fit the assumptions of a Bernoulli distribution. This distribution models a single trial with two possible, mutually exclusive and exhaustive outcomes: success (\(y_i = 1\)) or failure (\(y_i = 0\)).
The probability mass function (PMF) of a Bernoulli trial is:
\[ \text{Pr}(y_i = 1|\pi_i) = f_{\text{Bernoulli}}(y_i|\pi_i) = \pi_i^{y_i}(1 - \pi_i)^{1 - y_i} \]
where \(\pi_i\) is the probability of success, and the expectation of \(y_i\) is \(\mathbb{E}(y_i) = \pi_i\). This framework allows us to construct the likelihood function for a sample of \(n\) observations:
\[ \mathcal{L}(\pi|\mathbf{y}) \propto \prod_{i=1}^{n} \pi_i^{y_i}(1 - \pi_i)^{1 - y_i} \]
Taking the log-likelihood simplifies the product into a sum:
\[ \log \mathcal{L}(\pi|\mathbf{y}) \propto \sum_{i=1}^{n} \left[ \; y_i \log \pi_i + (1 - y_i) \log (1 - \pi_i) \; \right] \]
This log-likelihood is smooth and approximately quadratic near its maximum, making it straightforward to maximize with standard numerical optimization methods.
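As a small illustration with hypothetical data (a constant success probability \(\pi\) rather than a regression), numerical maximization recovers the analytical MLE of \(\pi\), which is simply the sample proportion of successes:
# Toy example: MLE of a constant Bernoulli probability (hypothetical data)
y_toy <- c(1, 0, 1, 1, 0, 1)
loglikBern <- function(p) sum(y_toy * log(p) + (1 - y_toy) * log(1 - p))
optimize(loglikBern, interval = c(0.001, 0.999), maximum = TRUE)$maximum
mean(y_toy)  # the analytical MLE: the sample proportion of successes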
This expression is general and applies to any binary model (like logit or probit). Now, we proceed to define the systematic component for both models and plug it into the log-likelihood.
One popular way to model the systematic component \(\pi_i\) (the probability of success) is the logistic function. The logistic function maps any real number \(x_i \beta\) to the interval \((0, 1)\), so predicted probabilities always respect the bounds of a probability:
\[ \pi_i = \text{logit}^{-1}(x_i \beta) = \frac{\exp(x_i \beta)}{1 + \exp(x_i \beta)} = \frac{1}{1 + \exp(-x_i \beta)} \]
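In R, this inverse-logit transformation is available as the logistic CDF plogis(); a quick sketch over a hypothetical grid of linear-predictor values confirms that the two algebraic forms above agree with it:
# Inverse logit: two equivalent algebraic forms and base R's plogis()
inv_logit <- function(z) 1 / (1 + exp(-z))
z <- seq(-4, 4, by = 2)  # hypothetical linear-predictor values
cbind(form1 = exp(z) / (1 + exp(z)),
      form2 = inv_logit(z),
      plogis = plogis(z))  # all three columns are identical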
We now substitute the logit function for \(\pi_i\) into the log-likelihood:
\[ \log \mathcal{L}(\boldsymbol{\beta}|\mathbf{y},\mathbf{x}) \propto \sum_{i=1}^{n} \left[ y_i \log \left( \frac{1}{1 + \exp(-x_i \boldsymbol{\beta})} \right) + (1 - y_i) \log \left( 1 - \frac{1}{1 + \exp(-x_i \boldsymbol{\beta})} \right) \right] \]
We can simplify the above expression to get the log-likelihood for the logit model:
\[ \log \mathcal{L}(\boldsymbol{\beta}|\mathbf{y},\mathbf{x}) \propto \sum_{i=1}^{n} \left[ y_i x_i \boldsymbol{\beta} - \log(1 + \exp(x_i \boldsymbol{\beta})) \right] \]
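The simplification uses two identities for the inverse logit (writing \(z_i = x_i \boldsymbol{\beta}\) and multiplying the numerator and denominator by \(\exp(z_i)\)):
\[ \log \left( \frac{1}{1 + \exp(-z_i)} \right) = z_i - \log(1 + \exp(z_i)), \qquad \log \left( 1 - \frac{1}{1 + \exp(-z_i)} \right) = - \log(1 + \exp(z_i)) \]
Substituting these into the \(y_i\) and \((1 - y_i)\) terms and collecting yields the expression above.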
To estimate \(\boldsymbol{\beta}\), we maximize this log-likelihood function with respect to \(\boldsymbol{\beta}\) via numerical methods using optim. The resulting estimates are the maximum likelihood estimates (MLE) of the parameters.
An alternative to the logit model is the probit model, which uses the cumulative distribution function (CDF) of the standard Normal distribution to map \(x_i \beta\) to \(\pi_i\):
\[ \pi_i = \Phi(x_i \beta) = \int_{-\infty}^{x_i \beta} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{y_i^{*2}}{2}\right) dy_i^{*} \]
Here, \(\Phi(\cdot)\) is the CDF of the standard Normal distribution; the probit link is its inverse, \(\Phi^{-1}(\cdot)\). The probit model thus assumes that the probability of success follows the normal CDF:
\[ \pi_i = \Phi(x_i \beta) \]
Once again, the starting point is the Bernoulli log-likelihood:
\[ \log \mathcal{L}(\pi|\mathbf{y}) \propto \sum_{i=1}^{n} \left[ y_i \log \pi_i + (1 - y_i) \log (1 - \pi_i) \right] \]
In the probit model, the systematic component \(\pi_i\) is expressed using the cumulative distribution function (CDF) of the standard normal distribution \(\Phi(\cdot)\):
\[ \pi_i = \Phi(\mathbf{x}_i \boldsymbol{\beta}) \]
We substitute the systematic component \(\Phi(\cdot)\):
\[ \log \mathcal{L}(\boldsymbol{\beta}|\mathbf{y}, \mathbf{x}) \propto \sum_{i=1}^{n} \left[ y_i \log \Phi(x_i \boldsymbol{\beta}) + (1 - y_i) \log (1 - \Phi(x_i \boldsymbol{\beta})) \right] \]
Unlike the logit model, the probit log-likelihood has no closed-form simplification, because \(\Phi(\cdot)\) cannot be expressed in a simple algebraic form. We therefore compute the estimates of \(\boldsymbol{\beta}\) via numerical methods (e.g., gradient-based optimization) using optim.
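Only the logit model is estimated with optim below, but a probit counterpart is easy to write. Here is a minimal sketch of a hypothetical llkProbit function, mirroring the llkLogit function defined later and using pnorm() for \(\Phi(\cdot)\), with its log.p argument for numerical stability:
# Negative Bernoulli log-likelihood with a probit link (sketch)
llkProbit <- function(param, y, x) {
  x  <- cbind(1, x)              # add an intercept column
  xb <- x %*% param[1:ncol(x)]   # linear predictor
  # log Phi(xb) and log(1 - Phi(xb)), both computed directly on the log scale
  -sum(y * pnorm(xb, log.p = TRUE) +
         (1 - y) * pnorm(xb, lower.tail = FALSE, log.p = TRUE))
  # optim() is a minimizer, so we return -log L(param|y)
}
It can then be passed to optim() in exactly the same way as llkLogit below.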
Logit and probit models typically yield similar results, since both assume a symmetric latent distribution, with the response steepest around a baseline probability of 0.5. If we believe the data-generating process (DGP) is not symmetric in this way, a model such as scobit (skewed logit) may be better suited, as it accommodates asymmetry. Generally, though, logit and probit models produce comparable probability estimates.
Their key difference lies in the link function: the logit model maps \(x_i \boldsymbol{\beta}\) to \(\pi_i\) through the logistic CDF, while the probit model uses the standard normal CDF.
Both models estimate \(\boldsymbol{\beta}\) by maximizing their log-likelihood functions. While the choice between logit and probit often depends on computational preferences and ease of interpretation, both are founded on the Bernoulli distribution.
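A quick numerical sketch of this similarity, using a hypothetical grid of linear-predictor values and the common rule of thumb that logit coefficients are roughly 1.6 times their probit counterparts:
# Logistic CDF vs. rescaled normal CDF over a grid of linear-predictor values
z <- seq(-4, 4, by = 1)
round(cbind(logit = plogis(z), probit_rescaled = pnorm(z / 1.6)), 3)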
We now replicate the application from lecture, using the 2000 American National Election Study (ANES) and the survey item "Did you vote in the 2000 election?" Turnout is binary: people either vote (\(\text{Vote}_i = 1\)) or they do not (\(\text{Vote}_i = 0\)).
We can model the probability of voting in the 2000 U.S. presidential election as a function of Age, High School Degree (HS Degree), and College Degree (ColDeg). \[ \begin{aligned} Vote_i & \sim \text{Bernoulli}(\pi_i) \\ \pi_i & = \text{logit}^{-1}(\mathbf{x}_i \boldsymbol{\beta}) \end{aligned} \]
where \(\mathbf{x}_i\) represents the vector of covariates for individual \(i\) (including age, high school degree, and college degree) and \(\boldsymbol{\beta}\) is the vector of coefficients.
Below we estimate this model by manually defining the likelihood and maximizing it with optim.
# Packages used in this section (assumed already installed)
library(MASS)          # mvrnorm()
library(simcf)         # extractdata(), cfMake(), cfChange(), logitsimev(), logitsimfd(), logitsimrr()
library(tile)          # lineplot(), textTile(), linesTile(), tile()
library(RColorBrewer)  # brewer.pal()
library(WhatIf)        # needed by lineplot()'s extrapolate argument
library(tidyverse)     # read_csv(), dplyr, ggplot2; loaded last so dplyr::select() is not masked by MASS::select()
# Load data
data <- read_csv("https://faculty.washington.edu/cadolph/mle/nes00a.csv")
# Estimate logit model using optim()
# Construct variables and model objects
y <- data$vote00
x <-
data |>
mutate(age2 = age^2) |>
select(age,age2,hsdeg,coldeg) |> as.matrix()
# Likelihood function for logit
llkLogit <- function(param, y, x) {
  ## 1. Arrange the data and include an intercept
  os <- rep(1, length(x[, 1]))   # column of ones for the intercept
  x  <- cbind(os, x)             # bind intercept + covariates
  ## 2. Define parameters to optimize and the systematic component
  b  <- param[1:ncol(x)]
  xb <- x %*% b
  ## 3. Return the negative log-likelihood
  ##    (optim is a minimizer, so we minimize -log L(param|y))
  -sum(y * xb - log(1 + exp(xb)))
}
# Fit logit model using optim
lsResult <- lm(y ~ x)              # least squares estimates as starting values
stval <- lsResult$coefficients     # initial guesses
logit_optim <- optim(par = stval,
                     fn = llkLogit,
                     method = "Nelder-Mead",  # derivative-free simplex method (optim's default)
                     hessian = TRUE,          # numerical Hessian at the optimum (for standard errors)
                     y = y,
                     x = x)
(pe_optim <- logit_optim$par) # point estimates
## (Intercept) xage xage2 xhsdeg xcoldeg
## -3.0120951884 0.0745076721 -0.0004408562 1.1219062683 1.0812752168
(vc_optim <- solve(logit_optim$hessian)) # var-cov matrix
## (Intercept) xage xage2 xhsdeg
## (Intercept) 6.821849e-02 -8.281879e-04 3.462127e-06 -3.671212e-02
## xage -8.281879e-04 1.774582e-05 -1.366226e-07 2.490703e-04
## xage2 3.462127e-06 -1.366226e-07 2.763014e-09 -1.633932e-06
## xhsdeg -3.671212e-02 2.490703e-04 -1.633932e-06 3.241723e-02
## xcoldeg -3.775257e-03 1.120854e-04 -1.717767e-06 -2.837779e-03
## xcoldeg
## (Intercept) -3.775257e-03
## xage 1.120854e-04
## xage2 -1.717767e-06
## xhsdeg -2.837779e-03
## xcoldeg 1.822990e-02
(se_optim <- sqrt(diag(vc_optim))) # standard errors
## (Intercept) xage xage2 xhsdeg xcoldeg
## 2.611867e-01 4.212578e-03 5.256438e-05 1.800479e-01 1.350181e-01
(ll_optim <- -logit_optim$value) # log-likelihood at the maximum (flip the sign of the minimized value)
## [1] -1034.478
Alternatively, we can estimate the same model with base R's glm function; just remember to specify the correct link in the family argument: binomial(link = "logit") for the logit model or binomial(link = "probit") for the probit model.
# Estimate logit model using glm()
# Set up model formula and model specific data frame
model <- vote00 ~ age + I(age^2) + hsdeg + coldeg
mdata <- extractdata(model, data, na.rm=TRUE)
# Run logit & extract results
logit_glm <- glm(model,
family=binomial(link="logit"),
data=mdata)
(pe_glm <- logit_glm$coefficients) # point estimates
## (Intercept) age I(age^2) hsdeg coldeg
## -3.0193890720 0.0747251922 -0.0004427014 1.1243907749 1.0795702357
(vc_glm <- vcov(logit_glm)) # var-cov matrix
## (Intercept) age I(age^2) hsdeg
## (Intercept) 1.748828e-01 -6.158232e-03 5.514176e-05 -2.453127e-02
## age -6.158232e-03 2.837215e-04 -2.733927e-06 -3.354912e-04
## I(age^2) 5.514176e-05 -2.733927e-06 2.740567e-08 5.053554e-06
## hsdeg -2.453127e-02 -3.354912e-04 5.053554e-06 3.240248e-02
## coldeg 9.264483e-04 -9.948239e-05 1.270344e-06 -3.618422e-03
## coldeg
## (Intercept) 9.264483e-04
## age -9.948239e-05
## I(age^2) 1.270344e-06
## hsdeg -3.618422e-03
## coldeg 1.721641e-02
(se_glm <- sqrt(diag(vc_glm))) # standard errors
## (Intercept) age I(age^2) hsdeg coldeg
## 0.4181899367 0.0168440337 0.0001655466 0.1800068893 0.1312113265
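As a quick check (not part of the original output), we can compare the two sets of point estimates side by side; they should agree closely, up to numerical error from the derivative-free Nelder-Mead search:
# Compare point estimates from optim() and glm(); names differ, values should be close
round(cbind(optim = unname(pe_optim), glm = unname(pe_glm)), 4)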
simcf
Once we have estimated our binary model and extracted the point estimates and variance-covariance matrix, we can use our simulation approach to predict counterfactual scenarios of interest from the estimated model.
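Conceptually, the approach is to draw many \(\boldsymbol{\beta}\) vectors from a multivariate normal centered at the MLEs with the estimated variance-covariance matrix, push each draw through the inverse link for a scenario of interest, and summarize the resulting distribution. A minimal hand-rolled sketch for a single hypothetical scenario (a 30-year-old college graduate), before automating this with simcf below:
# Hand-rolled expected-probability simulation for one hypothetical scenario
set.seed(1)                                          # hypothetical seed, for reproducibility
draws <- mvrnorm(10000, mu = pe_glm, Sigma = vc_glm) # simulated betas (10000 x 5)
xscen <- c(1, 30, 30^2, 1, 1)                        # intercept, age, age^2, hsdeg, coldeg
p_sim <- plogis(draws %*% xscen)                     # simulated probabilities of voting
quantile(p_sim, c(0.025, 0.5, 0.975))                # point estimate (median) and 95% interval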
## Simulate quantities of interest using simcf
##
## We could do this from the optim or glm results;
## here, we do from glm
# 1. Simulate parameter distributions using MASS::mvrnorm()
sims <- 10000
simbetas <- mvrnorm(n=sims,
mu=pe_glm,
Sigma=vc_glm)
# 2. Set up counterfactuals: all ages, each of three educations
## Age range
xhyp <- seq(18,97,1)
nscen <- length(xhyp)
## Scenarios of interest: college, high school, no high school
nohsScen <- hsScen <- collScen <- cfMake(formula=model,
data=mdata,
nscen=nscen)
# We have now created 3 simcf objects, one for each scenario of interest.
# Next, we use cfChange() to set the hypothetical covariate values
# (age and the education dummies) for each scenario.
for (i in 1:nscen) {
# No High school scenarios (loop over each age)
nohsScen <- cfChange(xscen=nohsScen, covname="age", x = xhyp[i], scen = i)
nohsScen <- cfChange(xscen=nohsScen, covname="hsdeg", x = 0, scen = i)
nohsScen <- cfChange(xscen=nohsScen, covname="coldeg", x = 0, scen = i)
# HS grad scenarios (loop over each age)
hsScen <- cfChange(xscen=hsScen, covname="age", x = xhyp[i], scen = i)
hsScen <- cfChange(xscen=hsScen, covname="hsdeg", x = 1, scen = i)
hsScen <- cfChange(xscen=hsScen, covname="coldeg", x = 0, scen = i)
# College grad scenarios (loop over each age)
collScen <- cfChange(xscen=collScen, covname="age", x = xhyp[i], scen = i)
collScen <- cfChange(xscen=collScen, covname="hsdeg", x = 1, scen = i)
collScen <- cfChange(xscen=collScen, covname="coldeg", x = 1, scen = i)
}
# 3. Simulate quantities of interest
# Simulate expected probabilities for all scenarios
nohsSims <- logitsimev(nohsScen, simbetas, ci=0.95)
hsSims <- logitsimev(hsScen, simbetas, ci=0.95)
collSims <- logitsimev(collScen, simbetas, ci=0.95)
ggplot
# ggplot
# First, arrange all the predictions and the hypothetical values in a single data frame
# Data wrangling with tidyverse
nohsSims_tb <-
nohsSims |>
bind_rows() |>
mutate(
xhyp = xhyp, # add "xhyp" as a covariate
edu = "nohs" # add a column to identify which edu. scenario
)
hsSims_tb <- hsSims |>
bind_rows() |>
mutate(xhyp = xhyp, edu = "hs")
collSims_tb <- collSims |>
bind_rows() |>
mutate(xhyp = xhyp, edu = "coll")
# bind them all together, in long format
allSims_tb <- bind_rows(nohsSims_tb, hsSims_tb, collSims_tb)
# Visualize expected probabilities using ggplot2
# Basic:
allSims_tb |>
ggplot(aes(x = xhyp,
y = pe,
ymax = upper,
ymin = lower,
colour = edu,
fill = edu)) +
geom_line() +
geom_ribbon(alpha = 0.2, linetype = 0) +
labs(y = "Probability of Voting", x = "Age of Respondent")
# More elaborate way if you are interested in approximating Chris's graph...
col <- brewer.pal(3, "Dark2")
allSims_tb |>
ggplot(aes(x = xhyp,
y = pe,
ymax = upper,
ymin = lower,
color = edu,
fill = edu)) +
geom_line() +
geom_ribbon(alpha = 0.2, linetype = 0) +
annotate(geom = "text", x = 55, y = 0.26,
label = "Less than HS", col = col[1]) +
annotate(geom = "text", x = 49, y = 0.56,
label = "High School", col = col[2]) +
annotate(geom = "text", x = 30, y = 0.87,
label = "College", col = col[3]) +
scale_color_manual(values = rev(col)) +
scale_fill_manual(values = rev(col)) +
scale_x_continuous(breaks = seq(20, 90, 10)) +
scale_y_continuous(breaks = seq(0, 1, 0.2),
limits = c(0, 1), expand = c(0, 0)) +
labs(y = "Probability of Voting", x = "Age of Respondent") +
theme_cavis_hgrid + # Brian's theme
theme(
panel.background = element_rect(fill = NA),
axis.line.x.bottom = element_line(size = 0.5),
axis.ticks.length = unit(0.5, "char"),
legend.position = "none"
)
tile
We can replicate the visualization using Chris's tile package. With tile, we first create data trace objects. Since the predictor age is continuous, we'll use tile::lineplot to plot lines for the expected values and their confidence intervals, and then add text labels with tile::textTile.
The lineplot function includes an extrapolate= argument, which lets you restrict the visualization to counterfactuals that do not require extrapolation beyond the observed data. For this, the WhatIf package must be installed and loaded. If you're unfamiliar with this setup, you can skip the argument, but note that your predictions may then include extrapolated values.
# tile
# Set up lineplot traces of expected probabilities
# When recycling this code, omit the extrapolate input
# if you are unsure how to use it correctly
nohsTrace <- lineplot(x=xhyp,
y=nohsSims$pe,
lower=nohsSims$lower,
upper=nohsSims$upper,
col=col[1],
# extrapolate requires package WhatIf
extrapolate=list(data=mdata[,2:ncol(mdata)],
cfact=nohsScen$x[,2:ncol(nohsScen$x)],
omit.extrapolated=TRUE),
plot=1)
hsTrace <- lineplot(x=xhyp,
y=hsSims$pe,
lower=hsSims$lower,
upper=hsSims$upper,
col=col[2],
# extrapolate requires package WhatIf
extrapolate=list(data=mdata[,2:ncol(mdata)],
cfact=hsScen$x[,2:ncol(hsScen$x)],
omit.extrapolated=TRUE),
plot=1)
collTrace <- lineplot(x=xhyp,
y=collSims$pe,
lower=collSims$lower,
upper=collSims$upper,
col=col[3],
# extrapolate requires package WhatIf
extrapolate=list(data=mdata[,2:ncol(mdata)],
cfact=collScen$x[,2:ncol(collScen$x)],
omit.extrapolated=TRUE),
plot=1)
# Set up traces with labels and legend
labelTrace <- textTile(labels=c("Less than HS", "High School", "College"),
x=c( 55, 49, 30),
y=c( 0.26, 0.56, 0.87),
col=col,
plot=1)
legendTrace <- textTile(labels=c("Logit estimates:", "95% confidence", "interval is shaded"),
x=c(82, 82, 82),
y=c(0.2, 0.15, 0.10),
cex=0.9,
plot=1)
# Plot traces using tile
tile(nohsTrace,
hsTrace,
collTrace,
labelTrace,
legendTrace,
limits=c(18,94,0,1),
xaxis=list(at=c(20,30,40,50,60,70,80,90)),
yaxis=list(label.loc=-0.5, major=FALSE),
xaxistitle=list(labels="Age of Respondent"),
yaxistitle=list(labels="Probability of Voting"),
width=list(null=5,yaxistitle=4,yaxis.labelspace=-0.5)
)
Next, we consider a new specification by adding the marital status indicator marriedo. We'll estimate this updated model with glm and simulate expected values for new scenarios comparing married and not-married respondents.
# Estimate logit model using glm()
# Set up model formula and model specific data frame
model2 <- vote00 ~ age + I(age^2) + hsdeg + coldeg + marriedo
mdata2 <- extractdata(model2, data, na.rm=TRUE)
# Run logit & extract results
logitM2 <- glm(model2, family=binomial(link="logit"), data=mdata2)
peM2 <- logitM2$coefficients # point estimates
vcM2 <- vcov(logitM2) # var-cov matrix
simcf
Now, we will compute quantities of interest (QoI) again. Specifically, we will calculate the first differences and relative risks between the scenarios of being married and not being married. This allows us to visualize the marginal effect of marriage on the expected probability and relative risk of voting.
When using simcf to compute first differences or relative risks, we need to set up two parallel sets of scenarios, x and xpre, in the cfMake object. The first, x, holds the counterfactual ("after") values, while the second, xpre, holds the baseline ("before") values. The first difference is then the expected probability under x minus the expected probability under xpre, and the relative risk is their ratio.
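In the logit case, the quantities simulated below are, for each scenario \(i\):
\[ \text{FD}_i = \text{logit}^{-1}(x_i \boldsymbol{\beta}) - \text{logit}^{-1}(x_i^{\text{pre}} \boldsymbol{\beta}), \qquad \text{RR}_i = \frac{\text{logit}^{-1}(x_i \boldsymbol{\beta})}{\text{logit}^{-1}(x_i^{\text{pre}} \boldsymbol{\beta})} \]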
# Simulate parameter distributions
sims <- 10000
simbetasM2 <- mvrnorm(n=sims,
mu=peM2,
Sigma=vcM2)
# Set up counterfactuals: all ages, each of three educations
xhyp <- seq(18,97,1)
nscen <- length(xhyp)
marriedScen <- notmarrScen <-
cfMake(formula=model2,
data=mdata2,
nscen=nscen)
for (i in 1:nscen) {
# Married (loop over each age)
# Note below the careful use of before scenarios (xpre) and after scenarios (x)
# - we will use the marriedScen counterfactuals in FDs and RRs as well as EVs
marriedScen <- cfChange(xscen=marriedScen,
covname="age",
x = xhyp[i],
xpre= xhyp[i],
scen = i)
marriedScen <- cfChange(xscen=marriedScen,
covname="marriedo",
x = 1,
xpre= 0,
scen = i)
# Not married (loop over each age)
notmarrScen <- cfChange(xscen=notmarrScen,
covname="age",
x = xhyp[i],
scen = i)
notmarrScen <- cfChange(xscen=notmarrScen,
covname="marriedo",
x = 0,
scen = i)
}
# Simulate expected probabilities for all scenarios
marriedSims <- logitsimev(marriedScen, simbetasM2, ci=0.95)
notmarrSims <- logitsimev(notmarrScen, simbetasM2, ci=0.95)
# Simulate first difference of voting wrt marriage
marriedFD <- logitsimfd(marriedScen, simbetasM2, ci=0.95)
# Simulate relative risk of voting wrt marriage
marriedRR <- logitsimrr(marriedScen, simbetasM2, ci=0.95)
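As a sanity check on what logitsimfd and logitsimrr compute, we can reproduce the point estimates by hand from the simulated betas and the counterfactual data. This is only a sketch: it assumes marriedScen$x and marriedScen$xpre store the counterfactual covariate values in the same layout as mdata2 (as the tile code below also assumes), and simcf may construct its point estimates and intervals somewhat differently.
# Manual first differences and relative risks from the simulated betas (sketch)
X1 <- model.matrix(model2, data = marriedScen$x)     # "after" scenarios (married)
X0 <- model.matrix(model2, data = marriedScen$xpre)  # "before" scenarios (not married)
p1 <- plogis(simbetasM2 %*% t(X1))                   # sims x nscen matrix of probabilities
p0 <- plogis(simbetasM2 %*% t(X0))
fd_manual <- colMeans(p1 - p0)                       # average first difference by age
rr_manual <- colMeans(p1 / p0)                       # average relative risk by age
head(cbind(fd = fd_manual, rr = rr_manual))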
ggplot
# ggplot
# Data wrangling in tidyverse
AllMarried <-
bind_rows(
marriedSims |>
bind_rows() |>
mutate(notmarried = 0,
xhyp = xhyp,
type = "EV"),
notmarrSims |>
bind_rows() |>
mutate(notmarried = 1,
xhyp = xhyp,
type = "EV"),
marriedFD |>
bind_rows() |>
mutate(notmarried = NA,
xhyp = xhyp,
type = "FD"),
marriedRR |>
bind_rows() |>
mutate(notmarried = NA,
xhyp = xhyp,
type = "RR")
) |>
mutate(notmarried = factor(notmarried))
AllMarried |>
filter(type == "EV") |>
ggplot(aes(x = xhyp, y = pe,
color =
notmarried,
fill = notmarried)) +
# Point estimates lines
geom_line(show.legend = FALSE) +
scale_color_manual(values = c(col[1],
col[2]),
labels = c("Currently Married",
"Not Married")) +
# CIs for married
geom_ribbon(aes(ymin = lower,
ymax = upper),
alpha = 0.3,
linetype = 0,
show.legend = FALSE) +
scale_fill_manual(values = c(col[1], col[2])) +
# CIs for not married
geom_line(aes(y = upper,
linetype = notmarried),
show.legend = FALSE) +
geom_line(aes(y = lower,
linetype = notmarried), show.legend = FALSE) +
scale_linetype_manual(values = c(0, 2)) +
# Label
annotate(geom = "text",
x = 40, y = 0.86,
label = "Currently Married",
col = col[1]) +
annotate(geom = "text",
x = 49, y = 0.46,
label = "Not Married",
col = col[2]) +
# Other adjustments
scale_x_continuous(breaks = seq(20, 90, 10)) +
scale_y_continuous(breaks = seq(0, 1, 0.2), limits = c(0, 1),
expand = c(0, 0)) +
theme_cavis_hgrid +
theme(legend.position = c(0.2, 0.13),
legend.key.size = unit(0.2, "cm"),
panel.grid.major.y = element_blank())+
labs(y = "Probability of Voting", x = "Age of Respondent")
# First Difference
AllMarried |>
filter(type == "FD") |>
ggplot(aes(x = xhyp, y = pe)) +
# Point estimates lines
geom_line(color = col[1]) +
# CI shading for the first difference
geom_ribbon(aes(ymin = lower, ymax = upper),
alpha = 0.3, linetype = 0,
show.legend = FALSE, fill = col[1]) +
# Other adjustments
scale_x_continuous(breaks = seq(20, 90, 10)) +
scale_y_continuous(breaks = seq(-0.1, 0.5, 0.1), limits = c(-0.1, 0.5),
)+
geom_hline(yintercept = 0)+
theme_cavis_hgrid +
theme(legend.position = c(0.2, 0.13),
legend.key.size = unit(0.2, "cm"),
panel.grid.major.y = element_blank())+
labs(y = "Difference in Probability of Voting", x = "Age of Respondent")
# Relative Risk
AllMarried |>
filter(type == "RR") |>
ggplot(aes(x = xhyp, y = pe)) +
# Point estimates lines
geom_line(color = col[1]) +
# CI shading for the relative risk
geom_ribbon(aes(ymin = lower, ymax = upper),
alpha = 0.3, linetype = 0,
show.legend = FALSE, fill = col[1]) +
# Other adjustments
scale_x_continuous(breaks = seq(20, 90, 10)) +
scale_y_continuous(breaks = seq(0.9, 1.5, 0.1), limits = c(0.9, 1.5),
)+
geom_hline(yintercept = 1)+
theme_cavis_hgrid +
theme(legend.position = c(0.2, 0.13),
legend.key.size = unit(0.2, "cm"),
panel.grid.major.y = element_blank())+
labs(y = "Relative Risk of Voting", x = "Age of Respondent")
tile
Below we produce the same plots, this time using tile. Notice that the not-married trace marks its confidence interval with dashed lines via ci = list(mark="dashed").
# tile
## Make plots using tile
# Get 3 nice colors for traces
col <- brewer.pal(3,"Dark2")
# Set up lineplot traces of expected probabilities
marriedTrace <- lineplot(x=xhyp,
y=marriedSims$pe,
lower=marriedSims$lower,
upper=marriedSims$upper,
col=col[1],
extrapolate=list(data=mdata2[,2:ncol(mdata2)],
cfact=marriedScen$x[,2:ncol(marriedScen$x)],
omit.extrapolated=TRUE),
plot=1)
notmarrTrace <- lineplot(x=xhyp,
y=notmarrSims$pe,
lower=notmarrSims$lower,
upper=notmarrSims$upper,
col=col[2],
ci = list(mark="dashed"),
extrapolate=list(data=mdata2[,2:ncol(mdata2)],
cfact=notmarrScen$x[,2:ncol(notmarrScen$x)],
omit.extrapolated=TRUE),
plot=1)
# Set up traces with labels and legend
labelTrace <- textTile(labels=c("Currently Married", "Not Married"),
x=c( 35, 53),
y=c( 0.8, 0.56),
col=col,
plot=1)
legendTrace <- textTile(labels=c("Logit estimates:", "95% confidence", "interval is shaded"),
x=c(80, 80, 80),
y=c(0.2, 0.15, 0.10),
cex=0.9,
plot=1)
# Plot traces using tile
tile(marriedTrace,
notmarrTrace,
labelTrace,
legendTrace,
limits=c(18,94,0,1),
xaxis=list(at=c(20,30,40,50,60,70,80,90)),
yaxis=list(label.loc=-0.5, major=FALSE),
xaxistitle=list(labels="Age of Respondent"),
yaxistitle=list(labels="Probability of Voting"),
width=list(null=5,yaxistitle=4,yaxis.labelspace=-0.5)
)
# Plot First Difference
# Set up lineplot trace of the first difference
marriedFDTrace <- lineplot(x=xhyp,
y=marriedFD$pe,
lower=marriedFD$lower,
upper=marriedFD$upper,
col=col[1],
extrapolate=list(data=mdata2[,2:ncol(mdata2)],
cfact=marriedScen$x[,2:ncol(marriedScen$x)],
omit.extrapolated=TRUE),
plot=1)
# Set up baseline: for first difference, this is 0
baseline <- linesTile(x=c(18,94),
y=c(0,0),
plot=1)
# Set up traces with labels and legend
labelFDTrace <- textTile(labels=c("Married compared \n to Not Married"),
x=c( 40),
y=c( 0.20),
col=col[1],
plot=1)
legendFDTrace <- textTile(labels=c("Logit estimates:", "95% confidence", "interval is shaded"),
x=c(80, 80, 80),
y=c(-0.02, -0.05, -0.08),
cex=0.9,
plot=1)
# Plot traces using tile
tile(marriedFDTrace,
labelFDTrace,
legendFDTrace,
baseline,
limits=c(18,94,-0.1,0.5),
xaxis=list(at=c(20,30,40,50,60,70,80,90)),
yaxis=list(label.loc=-0.5, major=FALSE),
xaxistitle=list(labels="Age of Respondent"),
yaxistitle=list(labels="Difference in Probability of Voting"),
width=list(null=5,yaxistitle=4,yaxis.labelspace=-0.5)#,
#output=list(file="marriedFD",width=5.5)
)
# Plot Relative Risk
# Set up lineplot trace of relative risk
marriedRRTrace <- lineplot(x=xhyp,
y=marriedRR$pe,
lower=marriedRR$lower,
upper=marriedRR$upper,
col=col[1],
extrapolate=list(data=mdata2[,2:ncol(mdata2)],
cfact=marriedScen$x[,2:ncol(marriedScen$x)],
omit.extrapolated=TRUE),
plot=1)
# Set up baseline: for relative risk, this is 1
baseline <- linesTile(x=c(18,94),
y=c(1,1),
plot=1)
# Set up traces with labels and legend
labelRRTrace <- textTile(labels=c("Married compared \n to Not Married"),
x=c( 55),
y=c( 1.25),
col=col[1],
plot=1)
legendRRTrace <- textTile(labels=c("Logit estimates:", "95% confidence", "interval is shaded"),
x=c(80, 80, 80),
y=c(0.98, 0.95, 0.92),
cex=0.9,
plot=1)
# Plot traces using tile
tile(marriedRRTrace,
labelRRTrace,
legendRRTrace,
baseline,
limits=c(18,94,0.9,1.5),
xaxis=list(at=c(20,30,40,50,60,70,80,90)),
yaxis=list(label.loc=-0.5, major=FALSE),
xaxistitle=list(labels="Age of Respondent"),
yaxistitle=list(labels="Relative Risk of Voting"),
width=list(null=5,yaxistitle=4,yaxis.labelspace=-0.5)
)
In Maximum Likelihood Estimation (MLE), the likelihood function measures how well a model explains the observed data. It calculates the probability of the observed data given the model’s parameters. A higher likelihood value indicates that the model better fits the data.
In model selection, metrics like Akaike Information Criterion (AIC) are built upon the log likelihood. A higher log likelihood means a better fit, but AIC also includes a penalty for model complexity to avoid overfitting. Thus, likelihood is key in assessing model fit, but it must be balanced with complexity to ensure the model is both accurate and generalizable.
The AIC is a measure used to compare the fit of models, including non-nested ones. It balances goodness of fit against complexity by penalizing the number of parameters, guarding against overfitting. AIC is defined as:
\[ AIC(\mathcal{M}) = D(\mathcal{M}) + 2 \times |\mathcal{M}| \]
where \(D(\mathcal{M})\) is the deviance of model \(\mathcal{M}\) (for ungrouped binary data, \(-2\) times its maximized log-likelihood) and \(|\mathcal{M}|\) is the number of estimated parameters.
A lower AIC indicates a better balance between goodness of fit and model simplicity. Smaller AIC values are preferred because they suggest that the model explains the data well without overfitting.
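As a quick illustration with the first logit model fit above (a sketch; R's built-in AIC() computes the same quantity from the stored log-likelihood):
# AIC by hand for the first logit model, compared with R's built-in AIC()
k <- length(coef(logit_glm))                   # number of estimated parameters |M|
-2 * as.numeric(logLik(logit_glm)) + 2 * k     # manual AIC
AIC(logit_glm)                                 # should match
deviance(logit_glm) + 2 * k                    # equivalent here: deviance = -2 log L for ungrouped binary data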