Ordered Models
- Simulating an Ordered Probit Model
Application: Sexist Attitudes to Working Mothers

Ordered Models

Ordered models are discrete distributions used to analyze dependent variables with more than two categories, where those categories have a natural order or ranking. Unlike binary models, which only accommodate two possible outcomes, ordered models allow for multiple ordered outcomes and are often employed in situations where the dependent variable is ordinal.

To estimate an ordered model, we typically use a latent variable interpretation of the dependent variable. The idea behind this interpretation is that while we would ideally measure the dependent variable on a continuous scale and express it as a linear function of one or more independent variables, in practice, the variable is only observed in a limited number of categories. This latent variable approach provides a convenient framework for modeling such outcomes.

The unobserved continuous measure is denoted as \(Y^*\) (a latent or underlying variable). The observed dependent variable, \(Y\), represents categories that each cover a specific range of this latent variable. Formally, given \(k\) categories and labeling the cutpoints that separate these categories as \(\tau_k\), \(Y\) can be defined as:

\[ Y = \begin{cases} 1 & \text{if} \quad Y^* < \tau_1 \\ 2 & \text{if} \quad \tau_1 \leq Y^* < \tau_2 \\ 3 & \text{if} \quad \tau_2 \leq Y^* < \tau_3 \\ \vdots & \\ k & \text{if} \quad Y^* \geq \tau_{k-1} \end{cases} \]

Simulating an Ordered Probit Model

In this example, we simulate an ordered probit model, which assumes that the unobserved latent variable \(Y^*\) follows a normal distribution.

The systematic component of the model is common across all categories, meaning that the effect of a change in an independent variable \(X\) on the probability of a respondent moving from one category to the next on the observed variable \(Y\) is the same across all categories. This feature is known as the parallel probits assumption or proportional odds assumption (more on this in the next lab).

To implement the latent variable framework, we generate random samples of \(Y^*\) and then assign these simulated values of \(Y^*\) to different categories of \(Y\) based on the position of \(Y^*\) relative to the cutpoints. Therefore, defining these cutpoints is a critical step.

Location of the Latent Variable \(Y^*\)

The location of the latent variable \(Y^*\) can be set in two common ways. One approach is to fix one of the cutpoints \(\tau_k\) at zero and estimate the intercept term \(\beta_0\) along with the other cutpoints relative to this fixed point. This method is often the default in binary model estimation routines.

The second approach, which we use here, is to fix the intercept at zero and estimate all cutpoints relative to it. Importantly, this choice does not affect the relative estimates and coefficients, but restricting either the intercept or a cutpoint to zero is necessary for identification.

Since the estimator fixes the intercept at zero, we must define the cutpoints in the data-generating process (DGP) to check if the estimator accurately recovers them. This requires defining the cutpoints relative to the expected mean and variance of \(Y^*\), based on its assumed probability distribution.

Scaling the Latent Variable \(Y^*\)

Since \(Y^*\) is a latent variable that we do not directly observe or measure, it lacks an inherent scale. To estimate the model, we must establish a scale for \(Y^*\), which involves defining both its location and variance.

Probability Distribution Assumption: we must first assume a probability distribution for \(Y^*\). In ordered models, this distribution is typically either the normal distribution (for ordered probit models) or the logistic distribution (for ordered logit models).
Variance of \(Y^*\): - For a standard normal distribution, the variance is equal to 1. - For a standard logistic distribution, the variance is \(\frac{\pi^2}{3}\).

This variance specification applies to the stochastic portion of \(Y^*\). Without a loss of generality, it could be scaled by a constant to introduce more or less variance if desired. Importantly, in binary logit and probit models, we also make these same assumptions about the variance of the stochastic term.

Simulation code

Below is the code to simulate an ordered probit model. In this example, we will simulate an ordered categorical variable with four discrete levels \(Y = {1, 2, 3, 4}\). You can think of these levels as survey responses or any other common application involving discrete ordered distributions.

# Simulation of Oredered Probit Model

set.seed(8732)


b0 <- 0    # true value of the intercept
b1 <- 0.5  # true value for the slope

n <- 10000 # sample size

# create independent variable
X <- rnorm(n, mean=0, sd=1)

# Systematic component
XB <- b0 + b1 * X

# SD of error of unobserved Y*
sd.error <- 1

# Define the true cutpoints
tau1 <- qnorm(0.1, 
              mean = mean(XB),
              sd = sqrt(var(XB)) + sd.error^2)

tau2 <- qnorm(0.3, 
              mean = mean(XB),
              sd = sqrt(var(XB)) + sd.error^2)

tau3 <- qnorm(0.8, 
              mean = mean(XB),
              sd = sqrt(var(XB)) + sd.error^2)

# set the latent unobserved variable
Y.star <- rnorm(n, XB, sd.error)

# Define Y as a vector of NAs with length n
Y <- rep(NA, n)
 

# Create observed categorical outcomes Y
# Set Y equal to a value according to Y.star

Y <- dplyr::case_when(
  Y.star < tau1 ~ 1,
  Y.star >= tau1 & Y.star < tau2 ~ 2,
  Y.star >= tau2 & Y.star < tau3 ~ 3,
  Y.star >= tau3 ~ 4
  )

# Save the data
dat <- tibble(Y.star = Y.star,
              Y = Y,
              X = X)

To understand the intuition behind the latent variable \(Y^*\), the observed categories \(Y\), and how the estimation maps the cutoff thresholds into the latent continuous space, we present some visualizations of the simulated data below.

Now, we can estimate an ordinal probit model using the polr function from the MASS package. This allows us to compare the estimated cutoff points and the marginal effect of \(X\) against the true parameters used in the simulation. While the estimated values may not perfectly match the true parameters (as is expected due to random variation in the sample), the estimates should be close, demonstrating the effectiveness of the model in recovering the underlying structure of the data.

# Estimate ordered model
model <- MASS::polr(as.ordered(Y) ~ X, 
                    method = "probit", 
                    Hess = TRUE,
                    data = )

summary(model)

## Call:
## MASS::polr(formula = as.ordered(Y) ~ X, Hess = TRUE, method = "probit")
## 
## Coefficients:
##    Value Std. Error t value
## X 0.5036    0.01229   40.97
## 
## Intercepts:
##     Value    Std. Error t value 
## 1|2  -1.8885   0.0242   -78.1950
## 2|3  -0.7741   0.0146   -52.9629
## 3|4   1.2570   0.0173    72.5066
## 
## Residual Deviance: 18635.07 
## AIC: 18643.07

# Compare estimates vs true

tibble(
  parameter = c("tau1","tau2","tau3","X"),
  estimate = c(round(as.numeric(model$zeta),4),b1),
  true = c(round(tau1,4),round(tau2,4),round(tau3,4),
           round(as.numeric(model$coefficients),4))
)

## # A tibble: 4 × 3
##   parameter estimate   true
##   <chr>        <dbl>  <dbl>
## 1 tau1        -1.89  -1.91 
## 2 tau2        -0.774 -0.775
## 3 tau3         1.26   1.28 
## 4 X            0.5    0.504

Application: Sexist Attitudes to Working Mothers

Below, we revisit the example discussed in class on sexist attitudes toward working mothers. The dependent variable is an ordered categorical variable with four levels, ranging from “Strongly Disagree” to “Strongly Agree.”

The wording of the survey item is as follows:

A working mother can establish just as warm and secure of a relationship with her child as a mother who does not work.

We will filter the sample into one sample from the 1977 wave, and the other on the 1989 wave. The main variables of interest are:

male: whether the respondent was male (1) or female (0).
white: whether the respondent was white (1) or some other race (0).
age: respondent’s age in years.
education: respondent’s years of education.
prestige: completed prestige % of other survey respondents rating this R’s job as prestigious.

## Load data
workmom <- read_csv("https://faculty.washington.edu/cadolph/mle/ordwarm2.csv")

names(workmom)

##  [1] "warm"    "yr89"    "male"    "white"   "age"     "ed"      "prst"   
##  [8] "warmlt2" "warmlt3" "warmlt4"

workmom77 <- workmom |> filter(yr89==0)
workmom89 <- workmom |> filter(yr89==1)


# Data from 1977, 1989 GSS:  Attitudes towards working mothers

## Mother can have warm feelings towards child?
y <- workmom77$warm  

## selecting covariates
x <- workmom77 |> 
  select(male,white,age,ed,prst) |> 
  as.matrix()

## male respondent; white resp; age of resp;
## years of education of respondent;
## prestige of respondent's occupation (% considering prestigious)

Estimation: `optim` and `polr`

We will first estimate an ordinal probit using a likelihood function and optim(). After, we will re-estimate the model but this time using polr() from MASS:: library.

Probabilities we want to estimate in four category case:

\[ \begin{aligned} \text{Pr}(y_i=1|\boldsymbol{x_i})&=\Phi(\tau_1-\alpha-\boldsymbol{x_i\beta})\\ \text{Pr}(y_i=2|\boldsymbol{x_i})&=\Phi(\tau_2-\alpha-\boldsymbol{x_i\beta})-\Phi(\tau_{1}-\alpha-\boldsymbol{x_i\beta})\\ \text{Pr}(y_i=3|\boldsymbol{x_i})&=\Phi(\tau_3-\alpha-\boldsymbol{x_i\beta})-\Phi(\tau_{2}-\alpha-\boldsymbol{x_i\beta})\\ \text{Pr}(y_i=4|\boldsymbol{x_i})&=1-\Phi(\tau_3-\alpha-\boldsymbol{x_i\beta}) \end{aligned} \]

To identify the model, we commonly make one of two assumptions:

Assume that \(\tau_{1}=0\). This is also the identifying assumption of logit and probit. optim() uses this.
Assume that \(\alpha=0\). polr() uses this.

The likelihood function for ordered probit finds the \(\boldsymbol{\beta}\) and \(\tau\) that make the observed data most likely.

# Model specification (for simcf)
model <- warm ~ male + white + age + ed + prst


## Likelihood for 4 category ordered probit
llk.oprobit4 <- function(param, x, y) {
  # preliminaries
  os <- rep(1, nrow(x))
  x <- cbind(os, x)  
  b <- param[1:ncol(x)]
  t2 <- param[(ncol(x)+1)]
  t3 <- param[(ncol(x)+2)]
  t4 <- param[(ncol(x)+3)]

  # probabilities and penalty function
  xb <- x%*%b
  p1 <- log(pnorm(-xb))
  # penalty function to keep t2>0
  if (t2<=0)  p2 <- -(abs(t2)*10000)
  else p2 <- log(pnorm(t2-xb)-pnorm(-xb))
  # penalty to keep t3>t2
  if (t3<=t2) p3 <- -((t2-t3)*10000)
  else p3 <- log(pnorm(t3-xb)-pnorm(t2-xb))
  # penalty to keep t3>t2
  if (t3<=t2) p3 <- -((t2-t3)*10000)
  else p3 <- log(pnorm(t3-xb)-pnorm(t2-xb))  
  p4 <- log(1-pnorm(t3-xb)) 
  
  # -1 * log likelihood (optim is a minimizer)
  -sum(cbind(y==1,y==2,y==3,y==4) * cbind(p1,p2,p3,p4))
}


# MLE via optim

## use ls estimates as starting values
ls.result <- lm(model, data=workmom77)

# initial guesses
stval <- c(coef(ls.result),1,2)

oprobit.res77 <- optim(stval, 
                       llk.oprobit4, 
                       method="BFGS", 
                       x=x, 
                       y=y, 
                       hessian=TRUE)

(pe77 <- oprobit.res77$par)                # point estimates

##  (Intercept)         male        white          age           ed         prst 
##  1.321689587 -0.397228916 -0.238335818 -0.011703265  0.044989779  0.002116029 
##                           
##  1.016421525  2.077970893

(vc77 <- solve(oprobit.res77$hessian))     # var-cov matrix

##               (Intercept)          male         white           age
## (Intercept)  3.205974e-02 -1.832887e-03 -5.475643e-03 -2.262774e-04
## male        -1.832887e-03  3.414003e-03 -3.709616e-05  6.780298e-06
## white       -5.475643e-03 -3.709616e-05  8.296760e-03 -8.602415e-06
## age         -2.262774e-04  6.780298e-06 -8.602415e-06  3.670450e-06
## ed          -1.348305e-03  1.698673e-05 -6.289883e-05  8.993977e-06
## prst         3.980131e-05 -1.157565e-05 -2.126967e-05 -1.186523e-06
##              1.163657e-03 -1.141861e-04 -1.208110e-04 -5.034573e-06
##              1.745683e-03 -3.829872e-04 -2.107986e-04 -1.175574e-05
##                        ed          prst                            
## (Intercept) -1.348305e-03  3.980131e-05  1.163657e-03  1.745683e-03
## male         1.698673e-05 -1.157565e-05 -1.141861e-04 -3.829872e-04
## white       -6.289883e-05 -2.126967e-05 -1.208110e-04 -2.107986e-04
## age          8.993977e-06 -1.186523e-06 -5.034573e-06 -1.175574e-05
## ed           1.458304e-04 -1.844602e-05  2.900561e-05  4.117394e-05
## prst        -1.844602e-05  6.691980e-06 -6.657005e-07  2.512622e-06
##              2.900561e-05 -6.657005e-07  1.688779e-03  1.414225e-03
##              4.117394e-05  2.512622e-06  1.414225e-03  2.962733e-03

(se77 <- sqrt(diag(vc77)))                 # standard errors

## (Intercept)        male       white         age          ed        prst 
## 0.179052337 0.058429471 0.091086552 0.001915842 0.012076026 0.002586886 
##                         
## 0.041094751 0.054430991

(ll77 <- -oprobit.res77$value)             # likelihood at maximum

## [1] -1758.824

# Use MASS::polr to do ordered probit (need to turn DV into categorical)

workmom77 <- 
  workmom77 |> 
  mutate(
    warm_f = factor(warm,
                    labels = c("Strongly Disagree",
                               "Disagree",
                               "Agree",
                               "Strongly Agree"))
  ) |> select(warm_f,everything())

glm.res77 <- MASS::polr(warm_f ~ male + white + age + ed + prst, 
                        data=workmom77,
                        method="probit",
                        Hess = TRUE,
                        na.action=na.omit)

summary(glm.res77) # polr output

## Call:
## MASS::polr(formula = warm_f ~ male + white + age + ed + prst, 
##     data = workmom77, na.action = na.omit, Hess = TRUE, method = "probit")
## 
## Coefficients:
##           Value Std. Error t value
## male  -0.397225   0.058429 -6.7984
## white -0.238340   0.091087 -2.6166
## age   -0.011701   0.001916 -6.1077
## ed     0.044996   0.012076  3.7260
## prst   0.002115   0.002587  0.8175
## 
## Intercepts:
##                            Value   Std. Error t value
## Strongly Disagree|Disagree -1.3216  0.1791    -7.3809
## Disagree|Agree             -0.3052  0.1773    -1.7216
## Agree|Strongly Agree        0.7564  0.1776     4.2597
## 
## Residual Deviance: 3517.649 
## AIC: 3533.649

pe77 # optim ouput

##  (Intercept)         male        white          age           ed         prst 
##  1.321689587 -0.397228916 -0.238335818 -0.011703265  0.044989779  0.002116029 
##                           
##  1.016421525  2.077970893

# we can retrive the same threhsolds by subtracting the intercepts

0 - pe77[1]     # tau 1

## (Intercept) 
##    -1.32169

pe77[7] - pe77[1]# tau 2

##            
## -0.3052681

pe77[8] - pe77[1]# tau 3

##           
## 0.7562813

Notice that the estimation of the coefficients using either optim or polr is practically the same.

In the optim routine, where we have estimated the intercept and applied the constraint of 0 in \(\tau_1\), we can get the same thresholds and cutoff points as in polr using the above formula of \(\tau_1-\alpha\).

Quantities of Interest

Now that we have estimated an ordinal probit model, the next step, as usual, is to predict quantities of interest using simcf().

In some cases, it may be more useful to report the marginal effect of a discrete change. This is straightforward for binary discrete variables: we can simply compute the expected value for a specific dichotomous state, such as \(\mathbb{E}[Y \mid \text{male} = 0]\) or \(\mathbb{E}[Y \mid \text{male} = 1]\). Alternatively, we can calculate the first difference: \(\mathbb{E}[Y \mid \text{male} = 1] - \mathbb{E}[Y \mid \text{male} = 0]\).

However, discrete changes for continuous variables, such as age in this example, are less straightforward. There are various strategies to compute and present meaningful discrete changes for continuous variables that can be reported.

Three common strategies are:

Theoretically meaningful quantities: For example, we might hypothesize meaningful cohort effects between respondents in their 20s or younger compared to those in their 50s or older. While this approach may seem arbitrary, it is motivated by theoretical considerations and requires assumptions based on domain knowledge.
Standard deviation changes: This involves comparing the mean minus one standard deviation with the mean plus one standard deviation. However, caution is necessary if the distribution is highly skewed; in such cases, it may be better to use the third strategy.
Changes in quantiles: Select quantiles that offer a clear contrast, such as the interquartile range (comparing the 3rd quartile with the 1st quartile) or comparing the 80th percentile to the 20th percentile.

In general, I have found that the third strategy often provides the best representation of discrete changes, though it is important to acknowledge that each approach involves some degree of arbitrariness.

## Q: how many point estimates does polr provide? Think about how to fit them in simcf

# Simulate parameters from predictive distributions
sims <- 10000
simbetas <- mvrnorm(sims, pe77, vc77)       # draw parameters, using MASS::mvrnorm

# Create example counterfactuals
xhyp <- cfMake(model, workmom77, nscen=10)

xhyp <- cfName(xhyp,"Male", scen=1)
xhyp <- cfChange(xhyp, "male", x=1, xpre=0, scen=1)

xhyp <- cfName(xhyp, "Female", scen=2)
xhyp <- cfChange(xhyp, "male", x=0, xpre=1, scen=2)

xhyp <- cfName(xhyp, "Nonwhite", scen=3)
xhyp <- cfChange(xhyp, "white", x=0, xpre=1, scen=3)

xhyp <- cfName(xhyp, "White", scen=4)
xhyp <- cfChange(xhyp, "white", x=1, xpre=0, scen=4)

xhyp <- cfName(xhyp, "Age + 1sd = 61", scen=5)
xhyp <- cfChange(xhyp, "age",
                 x=mean(na.omit(workmom77$age))+sd(na.omit(workmom77$age)),
                 xpre=mean(na.omit(workmom77$age)),
                 scen=5)

xhyp <- cfName(xhyp, "Age - 1sd = 28", scen=6)
xhyp <- cfChange(xhyp, "age",
                 x=mean(na.omit(workmom77$age))-sd(na.omit(workmom77$age)),
                 xpre=mean(na.omit(workmom77$age)),
                 scen=6)

xhyp <- cfName(xhyp, "High School Grad", scen=7)
xhyp <- cfChange(xhyp, "ed", x=12, xpre=mean(na.omit(workmom77$ed)), scen=7)

xhyp <- cfName(xhyp,"College Grad", scen=8)
xhyp <- cfChange(xhyp, "ed", x=16, xpre=mean(na.omit(workmom77$ed)), scen=8)

xhyp <- cfName(xhyp,"High Prestige Job (+1 sd)", scen=9)
xhyp <- cfChange(xhyp, "prst",
                 x=mean(na.omit(workmom77$prst))+sd(na.omit(workmom77$prst)),
                 xpre=mean(na.omit(workmom77$prst)),
                 scen=9)

xhyp <- cfName(xhyp,"Low Prestige Job (-1 sd)", scen=10)
xhyp <- cfChange(xhyp, "prst",
                 x=mean(na.omit(workmom77$prst))-sd(na.omit(workmom77$prst)),
                 xpre=mean(na.omit(workmom77$prst)),
                 scen=10)

# Simulate expected probabilities (all four categories)
oprobit.ev77 <- oprobitsimev(xhyp, simbetas, cat=4)

# Simulate first differences (all four categories)
oprobit.fd77 <- oprobitsimfd(xhyp, simbetas, cat=4)

# Simulate relative risks (all four categories)
oprobit.rr77 <- oprobitsimrr(xhyp, simbetas, cat=4)

Visualization: `ggplot`

# Plot predicted probabilities for all four categories, sorted by size


##  ggplot

CatNames <- 
  c("Strongly\n Disagree", 
    "Disagree",  
    "Agree", 
    "Strongly\n Agree")

scenNames <- row.names(xhyp$x)


oprobit.ev77.tidy <- 
  oprobit.ev77 |>
  # Set names for columns
  map(set_colnames, CatNames) |> 
  # Set names for rows
  map(set_rownames, scenNames) |> 
  # collapse list into data frame
  map_df(as_tibble, rownames = "scenNames", .id = "Type") |>
  # Wrangle to long format
  pivot_longer(CatNames[1]:CatNames[4], names_to = "Category") |>
  # Wrangle to ggplot friendly
  pivot_wider(names_from = Type, values_from = value) |>
  mutate(Category = as_factor(Category)) |>
  # Grouping by category first
  group_by(Category) |> 
  # fct_reorder to sort scenNames by the pe of Strongly Dis
  mutate(scenNames = fct_reorder(scenNames, pe, .desc = TRUE))


oprobit.ev77.tidy |> 
  ggplot() +
  aes(y = scenNames,
      x = pe, 
      xmax = upper, 
      xmin = lower) +
  ggstance::geom_pointrangeh(size = 0.3) +                        
  scale_x_continuous(limits = c(0, 0.5),
                     breaks = seq(0.1, 0.4, by =0.1),
                     # dup_axis() creates symmetric axes
                     sec.axis = dup_axis(name = NULL)) +
  
  facet_grid(~ Category)+
  
  labs(x = "Predicted probability", 
       y = NULL) +
  
  # theme from Brian
  theme_caviz_vgrid +
  
  theme(aspect.ratio = 2, 
        axis.text.x = element_text(color = "black", size = 10),
        axis.ticks.x = element_blank())

Visualization: `tile`

##  tile

sorted <- order(oprobit.ev77$pe[,1])
scenNames <- row.names(xhyp$x)

trace1 <- ropeladder(x = oprobit.ev77$pe[sorted,1],
                     lower = oprobit.ev77$lower[sorted,1],
                     upper = oprobit.ev77$upper[sorted,1],
                     labels = scenNames[sorted],
                     size=0.5,
                     lex=1.5,
                     plot=1
)

trace2 <- ropeladder(x = oprobit.ev77$pe[sorted,2],
                     lower = oprobit.ev77$lower[sorted,2],
                     upper = oprobit.ev77$upper[sorted,2],
                     size=0.5,
                     lex=1.5,
                     plot=2
)

trace3 <- ropeladder(x = oprobit.ev77$pe[sorted,3],
                     lower = oprobit.ev77$lower[sorted,3],
                     upper = oprobit.ev77$upper[sorted,3],
                     size=0.5,
                     lex=1.5,
                     plot=3
)

trace4 <- ropeladder(x = oprobit.ev77$pe[sorted,4],
                     lower = oprobit.ev77$lower[sorted,4],
                     upper = oprobit.ev77$upper[sorted,4],
                     size=0.5,
                     lex=1.5,
                     plot=4
)


tile(trace1, trace2, trace3, trace4,
     limits = c(0,0.5),
     gridlines = list(type="xt"),
     topaxis=list(add=TRUE, at=c(0,0.1,0.2,0.3,0.4,0.5)),
     xaxistitle=list(labels="probability"),
     topaxistitle=list(labels="probability"),
     plottitle=list(labels=c("Strongly Disagree", "Disagree",
                             "Agree", "Strongly Agree")),
     width=list(spacer=3),
     height = list(plottitle=3,xaxistitle=3.5,topaxistitle=3.5)
)

Collpasing categories

Above, we computed discrete marginal associations for the probabilities of agreeing with the statement from the survey item. However, we can improve the communication of these results by collapsing the four levels into two broader categories: combining “Strongly Disagree” with “Disagree” and “Agree” with “Strongly Agree.”

To collapse ordered categories, provide a list of concatenation in the recode= argument from simcf::oprobitsimev. The code example below demonstrates this approach.

## Re-simulate, now collapsing presentation to two categories ("SD/D" vs "SA/A")


## Simulate expected probabilities (collapsing categories)
oprobit.ev77c <- oprobitsimev(xhyp, simbetas, cat=4,
                              recode=list(c(1,2), c(3,4)) )

## Simulate first differences (collapsing categories)
oprobit.fd77c <- oprobitsimfd(xhyp, simbetas, cat=4,
                              recode=list(c(1,2), c(3,4)) )

## Simulate relative risks (collapsing categories)
oprobit.rr77c <- oprobitsimrr(xhyp, simbetas, cat=4,
                              recode=list(c(1,2), c(3,4)) )

ropeladder plot with `ggplot`

##  ggplot

### ropeladder: plot, EV of Dd vs aA
CatNames.c <- 
  c("Disagree or Str Disagree", "Agree or Str Agree")

oprobit.ev77c.tidy <- 
  oprobit.ev77c |>
  map(set_colnames, CatNames.c) |>   #Set names for columns
  map(set_rownames, scenNames) |>  #Set names for rows
  map_df(as_tibble, rownames = "scenNames", .id = "Type") |> #collapse list into data frame
  
  pivot_longer(CatNames.c[1]:CatNames.c[2], names_to = "Category") |> #Wrangle to long format
  
  pivot_wider(names_from = Type, values_from = value) |> #Wrangle to ggplot friendly
  
  mutate(Category = as_factor(Category)) |>
  group_by(Category) |>                               # Grouping by category first 
  mutate(scenNames = fct_reorder(scenNames, pe, .desc = TRUE)) # helps fct_reorder to sort scenNames by the pe of Strongly Dis


oprobit.ev77c.tidy |> 
  ggplot() +
  aes(y = scenNames,
      x = pe, 
      xmax = upper, 
      xmin = lower) +
  geom_pointrangeh(size = 0.3) +                        
  scale_x_continuous(sec.axis = dup_axis(name = NULL)) + # dup_axis() creates symmetric axes
  
  facet_grid(~ Category)+
  
  labs(x = "Predicted probability", 
       y = NULL,
       title = "Working Mothers Can Be Warm to Kids") +
  
  theme_caviz_vgrid +  # theme from Brian
  
  theme(aspect.ratio = 2, 
        axis.text.x = element_text(color = "black", size = 10),
        axis.ticks.x = element_blank())

### ropeladder: Plot to show only "SA/A"
oprobit.ev77c.tidy |> 
  filter(Category == CatNames.c[2]) |> 
  ggplot() +
  aes(y = scenNames,
      x = pe, 
      xmax = upper, 
      xmin = lower) +
  geom_pointrangeh(size = 0.3) +                        
  scale_x_continuous(sec.axis = dup_axis(name = NULL)) + # dup_axis() creates symmetric axes
  
  labs(x = "Predicted probability", 
       y = NULL,
       title = "Working Mothers Can Be Warm to Kids") +
  
  theme_caviz_vgrid +  # theme from Brian
  
  theme(axis.text.x = element_text(color = "black", size = 10),
        axis.ticks.x = element_blank())

ropeladder plot with `tile`

##  tile

## Make a new rl plot, EV of Dd vs aA
trace1b <- ropeladder(x = oprobit.ev77c$pe[sorted,1],
                      lower = oprobit.ev77c$lower[sorted,1],
                      upper = oprobit.ev77c$upper[sorted,1],
                      labels = scenNames[sorted],
                      size=0.65,
                      lex=1.75,
                      plot=1
)

trace2b <- ropeladder(x = oprobit.ev77c$pe[sorted,2],
                      lower = oprobit.ev77c$lower[sorted,2],
                      upper = oprobit.ev77c$upper[sorted,2],
                      size=0.65,
                      lex=1.75,
                      plot=2
)


tile(trace1b, trace2b,
     limits = c(0.35,0.65),
     gridlines = list(type="xt"),
     xaxis=list(at=c(0.4, 0.5, 0.6)),
     topaxis=list(add=TRUE, at=c(0.4, 0.5, 0.6)),
     xaxistitle=list(labels="probability"),
     topaxistitle=list(labels="probability"),
     plottitle=list(labels=c("Disagree or Str Disagree",
                             "Agree or Str Agree")),
     width=list(spacer=3),
     height = list(plottitle=3,xaxistitle=3.5,topaxistitle=3.5)
)

## Revise traces and plot to show only "SA/A"
trace2b$entryheight <- 0.2
trace2b$plot <- 1
trace2b$labels <- trace1b$labels

tile(trace2b,
     limits = c(0.35,0.65),
     gridlines = list(type="xt"),
     xaxis=list(at=seq(0.35, 0.65, 0.05)),
     topaxis=list(add=TRUE, at=seq(0.35, 0.65, 0.05)),
     xaxistitle=list(labels="probability agree or strongly agree, 1977"),
     topaxistitle=list(labels="probability agree or strongly agree, 1977"),
     plottitle=list(labels="\"Working Mothers Can Be Warm to Kids\""),
     width=list(plot=2.5),
     height = list(plottitle=3,xaxistitle=3.5,topaxistitle=3.5)
)

Analysis of two samples: 1977 vs 1989

Now we can re-do the analysis for the sample from 1989.

## Now estimate 1989 model
y <- workmom89$warm    # Mother can have warm feelings towards child?  

x <- workmom89 |> select(male,white,age,ed,prst) |> as.matrix()



# Use optim directly to get MLE
ls.result <- lm(model, data=workmom89)   # use ls estimates as starting values
stval <- c(coef(ls.result),1,2)          # initial guesses
oprobit.res89 <- optim(stval, 
                       llk.oprobit4, 
                       method="BFGS", 
                       x=x, 
                       y=y, 
                       hessian=TRUE)
(pe89 <- oprobit.res89$par)                # point estimates

##  (Intercept)         male        white          age           ed         prst 
##  1.983971961 -0.442980617 -0.206295345 -0.012979963  0.027631612  0.005149637 
##                           
##  1.198280077  2.414641320

(vc89 <- solve(oprobit.res89$hessian))     # var-cov matrix

##               (Intercept)          male         white           age
## (Intercept)  5.243718e-02 -2.275698e-03 -7.298131e-03 -3.041666e-04
## male        -2.275698e-03  5.348504e-03 -4.601340e-04  1.458896e-05
## white       -7.298131e-03 -4.601340e-04  1.160058e-02 -1.972153e-05
## age         -3.041666e-04  1.458896e-05 -1.972153e-05  4.881033e-06
## ed          -2.090501e-03 -5.626598e-05 -1.219894e-04  8.923750e-06
## prst         1.027141e-05 -2.043623e-06 -6.284932e-06 -8.863255e-07
##              3.805373e-03 -3.094736e-04 -5.391857e-05 -1.414865e-05
##              4.875512e-03 -6.956017e-04 -2.618378e-04 -2.451509e-05
##                        ed          prst                            
## (Intercept) -2.090501e-03  1.027141e-05  3.805373e-03  4.875512e-03
## male        -5.626598e-05 -2.043623e-06 -3.094736e-04 -6.956017e-04
## white       -1.219894e-04 -6.284932e-06 -5.391857e-05 -2.618378e-04
## age          8.923750e-06 -8.863255e-07 -1.414865e-05 -2.451509e-05
## ed           2.211948e-04 -2.399284e-05  2.138751e-05  3.999702e-05
## prst        -2.399284e-05  8.465097e-06  6.235541e-06  1.032785e-05
##              2.138751e-05  6.235541e-06  4.430397e-03  3.931261e-03
##              3.999702e-05  1.032785e-05  3.931261e-03  6.139354e-03

(se89 <- sqrt(diag(vc89)))                 # standard errors

## (Intercept)        male       white         age          ed        prst 
## 0.228991670 0.073133468 0.107705986 0.002209306 0.014872619 0.002909484 
##                         
## 0.066561229 0.078354032

(ll89 <- -oprobit.res89$value)             # likelihood at maximum

## [1] -1082.657

# re-estimating the model with the sample from 1977
glm.res77 <- MASS::polr(warm_f ~ male + white + age + ed + prst, 
                        data=workmom77,
                        method="probit", 
                        na.action=na.omit)

(pe77 <- c(coef(glm.res77),glm.res77$zeta))

##                       male                      white 
##               -0.397225180               -0.238339515 
##                        age                         ed 
##               -0.011701218                0.044995843 
##                       prst Strongly Disagree|Disagree 
##                0.002114664               -1.321570640 
##             Disagree|Agree       Agree|Strongly Agree 
##               -0.305164619                0.756389946

(vc77 <- vcov(glm.res77))

##                                     male         white           age
## male                        3.413998e-03 -3.709926e-05  6.780050e-06
## white                      -3.709926e-05  8.296751e-03 -8.602815e-06
## age                         6.780050e-06 -8.602815e-06  3.670353e-06
## ed                          1.698540e-05 -6.290118e-05  8.993936e-06
## prst                       -1.157493e-05 -2.126878e-05 -1.186494e-06
## Strongly Disagree|Disagree  1.832883e-03  5.475612e-03  2.262743e-04
## Disagree|Agree              1.718697e-03  5.354805e-03  2.212394e-04
## Agree|Strongly Agree        1.449894e-03  5.264818e-03  2.145182e-04
##                                       ed          prst
## male                        1.698540e-05 -1.157493e-05
## white                      -6.290118e-05 -2.126878e-05
## age                         8.993936e-06 -1.186494e-06
## ed                          1.458303e-04 -1.844571e-05
## prst                       -1.844571e-05  6.691778e-06
## Strongly Disagree|Disagree  1.348311e-03 -3.980224e-05
## Disagree|Agree              1.377316e-03 -4.046800e-05
## Agree|Strongly Agree        1.389483e-03 -3.728944e-05
##                            Strongly Disagree|Disagree Disagree|Agree
## male                                     1.832883e-03   0.0017186966
## white                                    5.475612e-03   0.0053548051
## age                                      2.262743e-04   0.0002212394
## ed                                       1.348311e-03   0.0013773156
## prst                                    -3.980224e-05  -0.0000404680
## Strongly Disagree|Disagree               3.205968e-02   0.0308960118
## Disagree|Agree                           3.089601e-02   0.0314211044
## Agree|Strongly Agree                     3.031397e-02   0.0305645167
##                            Agree|Strongly Agree
## male                               1.449894e-03
## white                              5.264818e-03
## age                                2.145182e-04
## ed                                 1.389483e-03
## prst                              -3.728944e-05
## Strongly Disagree|Disagree         3.031397e-02
## Disagree|Agree                     3.056452e-02
## Agree|Strongly Agree               3.153099e-02

Quantities of Interest and Visualizations

sims  <- 10000

# draw parameters, using MASS::mvrnorm
simbetas89 <- mvrnorm(sims, pe89, vc89)

# draw parameters, using MASS::mvrnorm
simbetas77 <- mvrnorm(sims, pe77, vc77)


# Create example counterfactuals -- for diffs
xhyp <- cfMake(model, workmom89, nscen=5)

xhyp <- cfName(xhyp, "Female (Male)", scen=1)
xhyp <- cfChange(xhyp, "male", x=0, xpre=1, scen=1)

xhyp <- cfName(xhyp, "Nonwhite (White)", scen=2)
xhyp <- cfChange(xhyp, "white", x=0, xpre=1, scen=2)

xhyp <- cfName(xhyp, "28 Year Olds (61)", scen=3)
xhyp <- cfChange(xhyp, "age",
                 x=mean(na.omit(workmom89$age))-sd(na.omit(workmom89$age)),
                 xpre=mean(na.omit(workmom89$age)),
                 scen=3)

xhyp <- cfName(xhyp,"College Grad (High School)", scen=4)
xhyp <- cfChange(xhyp, "ed", x=16, xpre=12, scen=4)

xhyp <- cfName(xhyp,"High Prestige Job (Low)", scen=5)
xhyp <- cfChange(xhyp, "prst",
                 x=mean(na.omit(workmom89$prst))+sd(na.omit(workmom89$prst)),
                 xpre=mean(na.omit(workmom89$prst)) - sd(na.omit(workmom89$prst)),
                 scen=5)

# Simulate expected probabilities (all four categories)
oprobit.ev89 <- oprobitsimev(xhyp, simbetas89, cat=4)

# Simulate first differences (all four categories)
oprobit.fd89 <- oprobitsimfd(xhyp, simbetas89, cat=4)

# Simulate relative risks (all four categories)
oprobit.rr89 <- oprobitsimrr(xhyp, simbetas89, cat=4)


# Re-simulate, now collapsing presentation to two categories ("SD/D" vs "SA/A")

# Simulate first differences, 1989
oprobit.fd89c <- oprobitsimfd(xhyp, simbetas89, cat=4,
                              recode=list(c(1,2), c(3,4)) )

# Simulate relative risks, 1989
oprobit.rr89c <- oprobitsimrr(xhyp, simbetas89, cat=4,
                              recode=list(c(1,2), c(3,4)) )

# Simulate first differences, 1977
oprobit.fd77c <- oprobitsimfd(xhyp, simbetas77, cat=4,
                              constant=NA, # for polr
                              recode=list(c(1,2), c(3,4)) )

# Simulate relative risks, 1977
oprobit.rr77c <- oprobitsimrr(xhyp, simbetas77, cat=4,
                              constant=NA, # for polr
                              recode=list(c(1,2), c(3,4)) )


# Make a new ropeladder plot, showing just change in probability of any agreement



##  ggplot

scenNames <- row.names(xhyp$x)
CatNames.c <- 
  c("Disagree or Str Disagree", "Agree or Str Agree")

oprobit.fd89c.tidy <- 
  bind_rows(oprobit.fd77c, oprobit.fd89c) |>
  map(set_colnames, CatNames.c)  |>
  #Set names for rows
  map(set_rownames, rep(scenNames, 2)) |>
  #collapse list into data frame
  map_df(as_tibble, rownames = "scenNames", .id = "Type") |>
  add_column(Year = rep(c(1979, 1989), each =5) |> rep(3)) |> 
  #Wrangle to long format
  pivot_longer(CatNames.c[1]:CatNames.c[2], names_to = "Category") |>
  #Wrangle to ggplot friendly
  pivot_wider(names_from = Type, values_from = value) |>
  mutate(Category = as_factor(Category)) |>
  # Grouping by category first 
  group_by(Year, Category) |>
  # fct_reorder to sort scenNames by the pe of Strongly Dis
  mutate(scenNames = fct_reorder(scenNames, pe, .desc = TRUE),
         # Add 95% significance
         conf95 = case_when(upper>0 & lower>0 ~ TRUE,
                            upper<0 & lower<0 ~ TRUE,
                            TRUE ~ FALSE)) 

oprobit.fd89c.tidy |> 
  filter(Category == CatNames.c[2]) |> 
  ggplot() +
  aes(y = scenNames,
      x = pe, 
      xmax = upper, 
      xmin = lower,
      color = factor(Year),
      shape = conf95,
      label = factor(Year)) +
  # points with intervals
  geom_pointrangeh(size = 0.6, fill = "white",
                   #Plot lines parallelly 
                   position = position_dodge2v(height = 0.6, 
                                               reverse = T)) +
  # include labels
  geom_text(size = 4, nudge_y = c(rep(0.3, 5), rep(-0.3, 5)))+
  # dup_axis() creates symmetric axes
  scale_x_continuous(sec.axis = dup_axis(name = NULL)) +
  # For line color
  scale_color_manual(values = c(orange, blue), guide = FALSE)+
  # For shape color
  scale_shape_manual(values = c(21, 19), guide = FALSE)+ 
  labs(x = "Predicted probability", 
       y = NULL,
       title = "Working Mothers Can Be Warm to Kids") +
  # theme from Brian
  theme_caviz_vgrid +
  theme(axis.text.x = element_text(color = "black", size = 10),
        axis.ticks.x = element_blank())

##  tile


sortedc <- rev(order(oprobit.fd89c$pe[,2]))
scenNames <- row.names(xhyp$x)

trace1c <- ropeladder(x = oprobit.fd77c$pe[sortedc,2],
                      lower = oprobit.fd77c$lower[sortedc,2],
                      upper = oprobit.fd77c$upper[sortedc,2],
                      labels = scenNames[sortedc],
                      sublabels="1979",
                      sublabelsyoffset=0.04,
                      col=orange,
                      size=0.65,
                      lex=1.75,
                      plot=1
)

trace2c <- ropeladder(x = oprobit.fd89c$pe[sortedc,2],
                      lower = oprobit.fd89c$lower[sortedc,2],
                      upper = oprobit.fd89c$upper[sortedc,2],
                      labels = scenNames[sortedc],
                      sublabels = "1989",
                      sublabelsyoffset = -0.04,
                      col=blue,
                      size=0.65,
                      lex=1.75,
                      entryheight=0.40,
                      subentryheight=.8,
                      plot=1
)

sigMark1 <- oprobit.fd77c$pe[sortedc,2]
is.na(sigMark1) <- (oprobit.fd89c$lower[sortedc,2]>0)
traceSig1 <- ropeladder(x=sigMark1,
                        col="white",
                        group=1,
                        plot=1)

sigMark2 <- oprobit.fd89c$pe[sortedc,2]
is.na(sigMark2) <- (oprobit.fd89c$lower[sortedc,2]>0)
traceSig2 <- ropeladder(x=sigMark2,
                        col="white",
                        group=2,
                        plot=1)

vertmark <- linesTile(x=c(0,0), y=c(0,1), plot=1)


tile(trace1c, trace2c, vertmark, traceSig1, traceSig2,
     limits=c(-0.05,0.25),
     gridlines=list(type="xt"),
     topaxis=list(add=TRUE, at=seq(from=0, to=0.2, by=0.05),
                  labels=c("0%", "+5%", "+10%", "+15%", "+20%")),
     xaxis=list(at=seq(from=0, to=0.2, by=0.05), labels=c("0%", "+5%", "+10%", "+15%", "+20%")),
     xaxistitle=list(labels="difference in probability agree or strongly agree"),
     topaxistitle=list(labels="difference in probability agree or strongly agree"),
     plottitle=list(labels="\"Working Mothers Can Be Warm to Kids\""),
     width=list(plot=2),
     height=list(plottitle=3,xaxistitle=3.5,topaxistitle=3.5)
)

CSSS/POLS 510 - Maximum Likelihood Estimation

Lab 7. Ordered Categories

Ramses Llobet

Ordered Models

Simulating an Ordered Probit Model

Location of the Latent Variable \(Y^*\)

Scaling the Latent Variable \(Y^*\)

Simulation code

Application: Sexist Attitudes to Working Mothers

Estimation: `optim` and `polr`

Quantities of Interest

Visualization: `ggplot`

Visualization: `tile`

Collpasing categories

ropeladder plot with `ggplot`

ropeladder plot with `tile`

Analysis of two samples: 1977 vs 1989

Quantities of Interest and Visualizations

CSSS/POLS 510 - Maximum Likelihood Estimation

Lab 7. Ordered Categories

Ramses Llobet

Ordered Models

Simulating an Ordered Probit Model

Location of the Latent Variable \(Y^*\)

Scaling the Latent Variable \(Y^*\)

Simulation code

Application: Sexist Attitudes to Working Mothers

Estimation: optim and polr

Quantities of Interest

Visualization: ggplot

Visualization: tile

Collpasing categories

ropeladder plot with ggplot

ropeladder plot with tile

Analysis of two samples: 1977 vs 1989

Quantities of Interest and Visualizations

Estimation: `optim` and `polr`

Visualization: `ggplot`

Visualization: `tile`

ropeladder plot with `ggplot`

ropeladder plot with `tile`