Count models are used for dependent variables that take on non-negative integer values, such as the number of births at a hospital on a given day or the number of patents awarded during a specified period. A typical starting point for estimating the parameters of a count model is Poisson regression, which assumes that counts follow a Poisson distribution.
Recall from problem set 2 that the Poisson distribution is characterized by a single parameter, \(\lambda\), which represents both the mean and the variance of the distribution. Specifically, for a Poisson-distributed random variable \(Y\), we have:
\[ \mathbb{E}(Y) = \text{Var}(Y) = \lambda \]
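As a quick sanity check of this property (a sketch, not part of the original problem set material), we can draw a large Poisson sample and compare its sample mean and variance:
# Sample mean and variance of a Poisson draw should both be close to lambda
set.seed(42)
y <- rpois(n = 100000, lambda = 3)
mean(y)   # approximately 3
var(y)    # approximately 3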
Poisson regression links the mean of the dependent variable to a linear combination of predictors, using a log-link function to ensure that the mean remains positive. This relationship is specified through \(\lambda\), the expected count, by modeling the log of the expected value of \(Y\) as a linear function of the predictors.
The model is defined as follows: \[ \log \left[ \mathbb{E}(Y \mid X) \right] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 \]
Exponentiating both sides yields:
\[ \mathbb{E}(Y \mid X) = \exp \left( \beta_0 + \beta_1 X_1 + \beta_2 X_2 \right) \]
where \(\mathbb{E}(Y \mid X)\) is the expected count, and \(\beta_0\), \(\beta_1\), and \(\beta_2\) are coefficients to be estimated.
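Because the model is log-linear, each coefficient has a multiplicative interpretation: a one-unit increase in \(X_1\) multiplies the expected count by \(\exp(\beta_1)\), holding \(X_2\) fixed. A minimal numerical sketch (the coefficient values here are purely illustrative, not estimates):
# Illustrative coefficient values
b0 <- 0.2; b1 <- 0.5; b2 <- -0.3
# Expected counts at X1 = 0 and X1 = 1, holding X2 = 0
mu_x1_0 <- exp(b0 + b1 * 0 + b2 * 0)
mu_x1_1 <- exp(b0 + b1 * 1 + b2 * 0)
mu_x1_1 / mu_x1_0   # equals exp(b1), about a 65% increase in the expected count
exp(b1)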
We can simulate data from a Poisson model using the rpois() function in R. Here, we specify a systematic component (i.e., a linear predictor) and exponentiate it to obtain the mean parameter \(\lambda\), ensuring that the expected value is positive.
# Poisson Simulation
set.seed(1234)
# Number of samples
n <- 1000
# "True" parameters
b0 <- 0.2
b1 <- 0.5
# Independent variable
X <- runif(n = n, min = -1, max = 1)
# Poisson DGP
Y1 <- rpois(n = n, lambda = exp(b0 + b1 * X))
# Fit model
model.pois <- glm(Y1 ~ X, family = "poisson")
summary(model.pois)
##
## Call:
## glm(formula = Y1 ~ X, family = "poisson")
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.13931 0.03037 4.587 4.5e-06 ***
## X 0.56232 0.05073 11.085 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 1189.6 on 999 degrees of freedom
## Residual deviance: 1062.7 on 998 degrees of freedom
## AIC: 2737.3
##
## Number of Fisher Scoring iterations: 5
The above code generates a Poisson-distributed response variable \(Y_1\) based on a linear predictor of \(X\). The model parameters are then estimated using glm() with a Poisson family specification.
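As a quick follow-up (a sketch, not part of the original code), we can compare the estimates with the values used in the DGP and translate them back to the count scale:
# Compare estimated coefficients with the "true" values from the DGP
cbind(truth = c(b0, b1), estimate = coef(model.pois))
# Exponentiated coefficients are multiplicative effects on the expected count
exp(coef(model.pois))
# Fitted expected count at X = 0
predict(model.pois, newdata = data.frame(X = 0), type = "response")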
To evaluate the stability of our estimates, we can repeat the Poisson model estimation multiple times and observe the distribution of the parameter estimates:
# Number of simulations
sims <- 1000
# Store estimates
par.est.pois <-
tibble(
b0_hat = rep(NA, sims),
b1_hat = rep(NA, sims),
b0_se = rep(NA, sims),
b1_se = rep(NA, sims),
model = "Poisson"
)
# Loop to estimate the model multiple times
for (i in 1:sims) {
Y <- rpois(n, exp(b0 + b1 * X))
model.pois <- glm(Y ~ X, family = "poisson")
vcv <- vcov(model.pois)
par.est.pois[i, 1] <- model.pois$coef[1]
par.est.pois[i, 2] <- model.pois$coef[2]
par.est.pois[i, 3] <- sqrt(diag(vcv)[1])
par.est.pois[i, 4] <- sqrt(diag(vcv)[2])
}
# Check out estimates
par.est.pois |>
pivot_longer(cols = 1:4, names_to = "estimates") |>
group_by(estimates) |>
summarize(mean = mean(value), sd = sd(value))
## # A tibble: 4 × 3
## estimates mean sd
## <chr> <dbl> <dbl>
## 1 b0_hat 0.200 0.0276
## 2 b0_se 0.0293 0.000453
## 3 b1_hat 0.497 0.0487
## 4 b1_se 0.0492 0.000685
This code performs 1000 simulations of the Poisson model and stores the estimated parameters (\(\hat{\beta}_0\), \(\hat{\beta}_1\)) and their standard errors for each run. By examining the mean and standard deviation of these estimates, we can assess their variability and reliability.
An important limitation of the Poisson model is its assumption that the mean and variance are equal. In many real-world count processes, especially in social science applications, the variance exceeds the mean, a phenomenon known as overdispersion. When overdispersion is present, the Poisson model yields standard errors that are too small, leading to overconfident inferences.
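A simple informal diagnostic (a sketch, not from the original code) is the Pearson dispersion statistic: the sum of squared Pearson residuals divided by the residual degrees of freedom, which should be close to 1 if the Poisson assumption holds.
# Pearson dispersion statistic for the Poisson fit above
phi <- sum(residuals(model.pois, type = "pearson")^2) / model.pois$df.residual
phi   # near 1: consistent with Poisson; well above 1: overdispersion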
The Negative Binomial (NB) model is a more flexible approach for modeling count data because it can accommodate overdispersion by introducing a second parameter, which allows the variance to differ from the mean.
The NB model can be viewed as a “mixture” model because it combines the Poisson distribution with a Gamma distribution to model the variability. In contrast to the Poisson distribution, which assumes that the mean and variance are equal, the NB model relaxes this assumption by allowing the variance to exceed the mean.
In the last simulation, we set \(\lambda\) equal to the exponentiated systematic component of the model. In the NB model, \(\lambda\) combines the systematic component with a stochastic (Gamma-distributed) component, yielding a mean \(\mu\) and a variance \(\nu\) defined as follows:
\[ \begin{aligned} \mu &= \exp \left( \beta_0 + \beta_1 X_1 + \beta_2 X_2 \right) \\ \nu &= \mu + \frac{\mu^2}{\theta} \end{aligned} \]
where \(\theta\) is a dispersion parameter that controls the degree of overdispersion. Because the NB model has a separate parameter for the variance, it is robust to overdispersion, unlike the Poisson model.
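We can verify this mean-variance relationship by simulation (a sketch; the values of \(\theta\) and \(\mu\) below are chosen only for illustration):
# Check Var(Y) = mu + mu^2 / theta for the Negative Binomial
theta <- 0.5
mu <- 2
y_nb <- rnbinom(n = 100000, size = theta, mu = mu)
mean(y_nb)          # approximately mu = 2
var(y_nb)           # approximately mu + mu^2 / theta = 10
mu + mu^2 / theta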
To illustrate the impact of overdispersion, we simulate overdispersed data, estimate Poisson and NB models, and compare their results. To simulate overdispersed count data, we use the rnbinom() function, where the size argument represents the dispersion parameter \(\theta\) and mu is the mean \(\mu\).
# Negative Binomial DGP
Y2 <- rnbinom(n = n, size = 0.5, mu = exp(b0 + b1 * X))
# run NB regression with MASS package
model.nb <- MASS::glm.nb(Y2 ~ X)
summary(model.nb)
##
## Call:
## MASS::glm.nb(formula = Y2 ~ X, init.theta = 0.5210878262, link = log)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.18333 0.05305 3.456 0.000548 ***
## X 0.61331 0.09114 6.729 1.7e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(0.5211) family taken to be 1)
##
## Null deviance: 957.71 on 999 degrees of freedom
## Residual deviance: 911.51 on 998 degrees of freedom
## AIC: 3019.1
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 0.5211
## Std. Err.: 0.0433
##
## 2 x log-likelihood: -3013.1000
In the comparison below, a solid line represents the mean of \(Y\), and a dashed line denotes its variance. Observe that in Panel (a) (Poisson model), the mean and variance are nearly identical, whereas in Panel (b) (Negative Binomial model), the variance is substantially larger than the mean.
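A minimal sketch of such a comparison, using the simulated Y1 (Poisson) and Y2 (Negative Binomial) vectors from above (the plotting choices here are illustrative, not the original figure code):
# Sketch of the mean/variance comparison for Y1 and Y2
bind_rows(
tibble(Y = Y1, model = "(a) Poisson"),
tibble(Y = Y2, model = "(b) Negative Binomial")
) |>
group_by(model) |>
mutate(mean_Y = mean(Y), var_Y = var(Y)) |>
ggplot(aes(x = Y)) +
geom_histogram(binwidth = 1) +
geom_vline(aes(xintercept = mean_Y), linetype = "solid") +
geom_vline(aes(xintercept = var_Y), linetype = "dashed") +
facet_wrap(~ model)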
Next, we assess the implications of overdispersion on parameter estimation by comparing the Poisson and Negative Binomial estimators.
# Poisson vs. Negative Binomial
set.seed(1234)
# Store estimates
par.est.nb <- tibble(
b1_pois = rep(NA, sims),
b1_nb = rep(NA, sims),
b1_pois_se = rep(NA, sims),
b1_nb_se = rep(NA, sims)
)
# Loop for Negative Binomial regression
for (i in 1:sims) {
# Create count data with overdispersion
Y <- rnbinom(n = n, size = 0.5, mu = exp(b0 + b1 * X))
# Estimate models
model.pois <- glm(Y ~ X, family = "poisson")
model.nb <- MASS::glm.nb(Y ~ X)
# Variance-covariance matrices
vcv.pois <- vcov(model.pois)
vcv.nb <- vcov(model.nb)
# Store point estimates
par.est.nb[i, 1] <- model.pois$coef[2]
par.est.nb[i, 2] <- model.nb$coef[2]
# Store standard errors
par.est.nb[i, 3] <- sqrt(diag(vcv.pois)[2])
par.est.nb[i, 4] <- sqrt(diag(vcv.nb)[2])
}
# store results
results <- par.est.nb |>
pivot_longer(cols = 1:4, names_to = "estimates") |>
group_by(estimates) |>
summarize(mean = mean(value), sd = sd(value))
# Check estimates
results
## # A tibble: 4 × 3
## estimates mean sd
## <chr> <dbl> <dbl>
## 1 b1_nb 0.501 0.0944
## 2 b1_nb_se 0.0919 0.00271
## 3 b1_pois 0.501 0.0952
## 4 b1_pois_se 0.0493 0.00131
# ratio of mean SE to SD of estimates: negative binomial
round(results$mean[2], 3) / round(results$sd[1], 3)
## [1] 0.9787234
# ratio of mean SE to SD of estimates: poisson
round(results$mean[4], 3) / round(results$sd[3], 3)
## [1] 0.5157895
The simulation above highlights two important insights:
1. Both models recover essentially unbiased point estimates of \(\beta_1\): the mean \(\hat{\beta}_1\) is 0.501 for the Poisson model and 0.501 for the Negative Binomial model.
2. However, the mean of the simulated standard errors is 0.092 for the Negative Binomial model but only 0.049 for the Poisson model, while the standard deviation of the point estimates \(\sigma(\hat{\beta}_1)\) is 0.094 and 0.095, respectively. The ratio of the average standard error to the standard deviation of the estimates is therefore approximately 0.98 for the Negative Binomial model but only 0.52 for the Poisson model.
The low ratio for the Poisson model implies that when data are overdispersed, the Poisson model's equal mean-variance assumption produces standard errors that are too small relative to the true sampling variability of the estimates. This inflates the risk of Type I errors (false positives): we may reject the null hypothesis even when there is no real effect.
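To make this concrete, we can compute the empirical coverage of nominal 95% confidence intervals across the stored simulations (a sketch using the par.est.nb object from above; with overdispersed data we would expect coverage close to 0.95 for the Negative Binomial intervals and well below 0.95 for the Poisson intervals):
# Empirical coverage of nominal 95% CIs for b1 (true value b1 = 0.5)
par.est.nb |>
summarize(
coverage_pois = mean(b1_pois - 1.96 * b1_pois_se <= b1 &
b1 <= b1_pois + 1.96 * b1_pois_se),
coverage_nb = mean(b1_nb - 1.96 * b1_nb_se <= b1 &
b1 <= b1_nb + 1.96 * b1_nb_se)
)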
Another potential issue with count models arises when there is a large number of cases where the value of \(Y\) is zero. These excess zeros may either be a natural part of the data-generating process (DGP) governing other counts (e.g., 1s, 2s, 3s, etc.) or they may be “inflated” by an additional process unrelated to counts greater than zero. In other words, the DGP may involve two separate systematic processes:
\[ \Pr(Y = 0) = \pi(X) \quad \text{(typically modeled with a binary logit or probit)} \]
where \(\pi(X)\) represents the probability of an observation being zero, given predictors \(X\).
\[ \Pr(Y = y \mid Y > 0) = f(y; \lambda) \quad \text{(e.g., a Poisson or Negative Binomial distribution)} \] where \(f(y; \lambda)\) represents the count distribution with parameter \(\lambda\).
Zero-inflated count models combine these two processes by estimating:
\[ \log \left( \frac{\pi(X)}{1 - \pi(X)} \right) = \beta_0 + \beta_1 X_1 + \ldots + \beta_k X_k \]
\[ \log(\lambda) = \alpha_0 + \alpha_1 X_1 + \ldots + \alpha_k X_k \]
In summary, zero-inflation models work by modeling the probability of zeros separately from the count-generating process, allowing flexibility to handle excess zeros present in the data. They can be applied using both Poisson and Negative Binomial distributions for the count component.
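To make the two-part DGP concrete, here is a minimal sketch of a zero-inflated Poisson data-generating process (the inflation probability pi_zero is illustrative, and the code reuses n, b0, b1, and X from the earlier simulations):
# Sketch of a zero-inflated Poisson DGP
set.seed(99)
pi_zero <- 0.3                                # probability of a structural zero
z <- rbinom(n = n, size = 1, prob = pi_zero)  # z = 1 indicates a structural zero
Y_zip <- ifelse(z == 1, 0, rpois(n, exp(b0 + b1 * X)))
# Compare the share of zeros with what a plain Poisson DGP would produce
mean(Y_zip == 0)
mean(rpois(n, exp(b0 + b1 * X)) == 0)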
Below, we review an application that will be discussed in class:
Foreclosure Filings of Homeowners’ Associations in Harris County (Houston), TX, from 1995 to 2001.
The data include the following variables:
- Foreclosure Filings (file9501): The number of filings in a neighborhood during 1995–2001.
- Median Valuation (mdvaluation): The median home price in the neighborhood (entered in the models in logs).
- Post-1975 Neighborhood: An indicator of whether the median home in the neighborhood was built after 1975.
The unit of observation is the neighborhood, which may or may not have an organized Homeowners’ Association (HOA), though this is unobserved.
## Load data
data <- read_csv("https://faculty.washington.edu/cadolph/mle/hoaCounts.csv")[,-1]
names(data)
## [1] "index" "filings" "file9501" "file8594" "harhoa"
## [6] "file8587" "file8894" "filb9500" "nhomes" "mdvaluation"
## [11] "mdyear" "pctwhite" "pcthispanic" "pctblack" "pctasian"
## data management
data <-
data %>%
## Restrict to places with >=100 homes
filter(nhomes >= 100) %>%
mutate(
## Construct exposure variable and add to dataset
exposure = nhomes * length(1995:2001),
## Explore relationship between ever-filers and age of neighborhood
everfile = ifelse(file9501 > 0, 1, 0),
## GAM suggests constructing new neighborhood (after 1975) covariate
post1975construction = ifelse(mdyear > 1975, 1, 0)
)
## Create alternate "no zeros" dataset
dataNZ <- data %>% filter(file9501 > 0)
## Process data for model
m1 <- file9501 ~ log(mdvaluation) + post1975construction
m1data <- extractdata(m1, data, extra=~exposure, na.rm = TRUE)
m1dataNZ <- extractdata(m1, dataNZ, extra=~exposure, na.rm = TRUE)
First, we run the analysis on the full data, including the observations with \(Y = 0\).
### Model 1 - Poisson using GLM with zeros included
glm.poisWZ <- glm(m1,
data = m1data,
family = "poisson",
# include exposure as a log offset
offset = log(m1data$exposure))
pe.poisWZ <- coef(glm.poisWZ) # point estimates
vc.poisWZ <- vcov(glm.poisWZ) # var-cov matrix
se.poisWZ <- sqrt(diag(vc.poisWZ)) # standard errors
ll.poisWZ <- logLik(glm.poisWZ) # likelihood at maximum
aic.poisWZ <- AIC(glm.poisWZ)
## Set up counterfactuals (range over mdvaluation, all else equal)
valueseq <- seq(35000, 400000, 1000)
xhypWZ <- cfMake(formula = m1,
data = m1data,
nscen = length(valueseq))
for (i in 1:length(valueseq)) {
xhypWZ <- cfChange(xhypWZ, "mdvaluation", x=valueseq[i], scen=i)
}
## Simulate results
sims <- 10000
simbetas.poisWZ <- mvrnorm(sims, pe.poisWZ, vc.poisWZ) # draw parameters
yhyp.poisWZ <- loglinsimev(xhypWZ, simbetas.poisWZ)
# Transform expected counts to expected rates
yhyp.poisWZ$pe <- yhyp.poisWZ$pe*1000
yhyp.poisWZ$lower <- yhyp.poisWZ$lower*1000
yhyp.poisWZ$upper <- yhyp.poisWZ$upper*1000
# Model 1 - ggplot visual
vis_poisWZ <-
yhyp.poisWZ %>% as.data.frame() %>%
bind_cols() %>% as_tibble() %>%
mutate(model="poisson",
valueseq=valueseq/1000)
vis_poisWZ %>%
ggplot(aes(y=pe,
ymin=lower,
ymax=upper,
x=valueseq)) +
geom_line(color=purple) +
geom_ribbon(alpha=0.2, fill=purple) +
scale_y_continuous(limits = c(0,25)) +
scale_x_continuous(breaks = seq(50,400,50),
limits = c(50,300),
labels = scales::dollar_format(suffix = "k")) +
labs(title= "Expected HOA foreclosure fillings per 1000 homes per year",
y=NULL,
x="Median home price of neighborhood") +
theme_caviz_hgrid +
coord_cartesian(xlim=c(50,225))+
annotate("text", x = 150, y = 10,
label = "Poisson including excess zeros",
color=purple, size=6)
Now, we run the analysis excluding the observations with \(Y = 0\).
### Model 2 - Poisson using GLM with zeros removed
glm.poisNZ <- glm(m1,
data = m1dataNZ,
family = "poisson",
# include exposure as a log offset
offset = log(m1dataNZ$exposure))
pe.poisNZ <- coef(glm.poisNZ) # point estimates
vc.poisNZ <- vcov(glm.poisNZ) # var-cov matrix
se.poisNZ <- sqrt(diag(vc.poisNZ)) # standard errors
ll.poisNZ <- logLik(glm.poisNZ) # likelihood at maximum
aic.poisNZ <- AIC(glm.poisNZ)
stargazer::stargazer(glm.poisWZ,glm.poisNZ, type="text")
##
## =================================================
## Dependent variable:
## ----------------------------
## file9501
## (1) (2)
## -------------------------------------------------
## log(mdvaluation) -0.904*** -1.559***
## (0.029) (0.042)
##
## post1975construction 2.706*** 0.737***
## (0.043) (0.037)
##
## Constant 1.845*** 11.648***
## (0.309) (0.478)
##
## -------------------------------------------------
## Observations 1,417 326
## Log Likelihood -7,799.816 -3,025.548
## Akaike Inf. Crit. 15,605.630 6,057.096
## =================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
# Set up counterfactuals (range over mdvaluation, all else equal)
valueseq <- seq(35000, 400000, 1000)
xhypNZ <- cfMake(m1, m1dataNZ, nscen = length(valueseq))
for (i in 1:length(valueseq)) {
xhypNZ <- cfChange(xhypNZ, "mdvaluation", x=valueseq[i], scen=i)
}
# Simulate results
simbetas.poisNZ <- mvrnorm(sims, pe.poisNZ, vc.poisNZ) # draw parameters
yhyp.poisNZ <- loglinsimev(xhypNZ, simbetas.poisNZ)
# Transform expected counts to expected rates
yhyp.poisNZ$pe <- yhyp.poisNZ$pe*1000
yhyp.poisNZ$lower <- yhyp.poisNZ$lower*1000
yhyp.poisNZ$upper <- yhyp.poisNZ$upper*1000
### Model 2 - ggplot visual
vis_poisNZ <-
yhyp.poisNZ %>% as.data.frame() %>%
bind_cols() %>% as_tibble() %>%
mutate(model="poisson_NZ",
valueseq=valueseq/1000)
vis_poisNZ %>%
ggplot(aes(y=pe,
ymin=lower,
ymax=upper,
x=valueseq)) +
geom_line(color=red) +
geom_ribbon(alpha=0.2, fill=red) +
scale_y_continuous(limits = c(0,25)) +
scale_x_continuous(labels = scales::dollar_format(suffix = "k")) +
labs(title= "Expected HOA foreclosure fillings per 1000 homes per year",
y=NULL,
x="Median home price of neighborhood") +
theme_caviz_hgrid +
coord_cartesian(xlim=c(40,225))+
annotate("text", x = 150, y = 10,
label = "Poisson including excess zeros",
color=red, size=6)
Now, we address the issue using the Negative Binomial (NB) model.
Recall that the NB model is a combination of the Poisson and Gamma distributions, where the Gamma distribution models the random error in the Poisson model. For more details, refer to slides 198-216.
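A quick simulation (a sketch; the parameter values here are illustrative) makes the mixture interpretation concrete: draw a Gamma-distributed rate for each observation, then draw Poisson counts around those rates, and the result matches a Negative Binomial draw.
# Sketch: Negative Binomial as a Poisson-Gamma mixture
set.seed(2023)
theta <- 0.5
mu <- 2
# Gamma multiplier with mean 1 and variance 1/theta
lambda_i <- mu * rgamma(100000, shape = theta, rate = theta)
y_mix <- rpois(100000, lambda = lambda_i)
y_nb  <- rnbinom(100000, size = theta, mu = mu)
c(mean(y_mix), mean(y_nb))   # both approximately mu
c(var(y_mix), var(y_nb))     # both approximately mu + mu^2 / theta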
### Model 3 - Negative binomial regression, zeros excluded
## Note: glm.nb requires the offset to be included in the model formula, thus:
m1nb <- file9501 ~ log(mdvaluation) + post1975construction + offset(log(exposure))
# NB is similar to the Beta-Binomial, but only bounded on one side
nb.result <- glm.nb(m1nb, data=m1dataNZ) # glm.nb is from MASS package
pe.nb <- coef(nb.result)
vc.nb <- vcov(nb.result)
se.nb <- sqrt(diag(vc.nb))
ll.nb <- logLik(nb.result)
aic.nb <- AIC(nb.result)
theta.nb <- nb.result$theta
setheta.nb <- nb.result$SE.theta
# Simulate results (we re-use xhyp from above)
simbetas.nb <- mvrnorm(sims, pe.nb, vc.nb) # draw parameters
yhyp.nb <- loglinsimev(xhypNZ, simbetas.nb)
# Transform expected counts to expected rates
yhyp.nb$pe <- yhyp.nb$pe*1000
yhyp.nb$lower <- yhyp.nb$lower*1000
yhyp.nb$upper <- yhyp.nb$upper*1000
### Model 3 - ggplot visual
vis_nb <-
yhyp.nb %>% as.data.frame() %>%
bind_cols() %>% as_tibble() %>%
mutate(model="nb",
valueseq=valueseq/1000)
vis_nb %>%
ggplot(aes(y=pe,
ymin=lower,
ymax=upper,
x=valueseq)) +
geom_line(color=blue) +
geom_ribbon(alpha=0.2, fill=blue) +
scale_y_continuous(limits = c(0,25)) +
scale_x_continuous(labels = scales::dollar_format(suffix = "k")) +
labs(title= "Expected HOA foreclosure filings per 1000 homes per year",
y=NULL,
x="Median home price of neighborhood") +
theme_caviz_hgrid +
coord_cartesian(xlim=c(40,225))+
annotate("text", x = 150, y = 10,
label = "Negative Binomial with all zeros deleted",
color=blue, size=6)
The Negative Binomial (NB) model constructs a probability model for overdispersion using the Gamma distribution.
In contrast, the Quasi-Poisson approach retains the Poisson-like mean-variance relationship but multiplicatively re-scales the variance up or down as needed.
As a result, the Quasi-Poisson model is more flexible than the NB model because it can scale down the variance to accommodate underdispersion, although underdispersion is very rare.
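As a sketch of this rescaling (not part of the original code), the dispersion can be estimated from the Pearson residuals of the zero-deleted Poisson fit glm.poisNZ from Model 2, and the quasi-Poisson standard errors are then the Poisson standard errors inflated by its square root:
# Moment-based dispersion estimate from Model 2's Pearson residuals
phi_hat <- sum(residuals(glm.poisNZ, type = "pearson")^2) / glm.poisNZ$df.residual
phi_hat
# Quasipoisson SEs are (approximately) the Poisson SEs scaled by sqrt(phi_hat)
se.poisNZ * sqrt(phi_hat)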
### Model 4 - Quasipoisson using GLM with zeros removed
## Quasipoisson using GLM with zeros removed
glm.qpoisNZ <- glm(m1, data= m1dataNZ,
family=quasipoisson,
offset=log(m1dataNZ$exposure))
pe.qpoisNZ <- coef(glm.qpoisNZ) # point estimates
vc.qpoisNZ <- vcov(glm.qpoisNZ) # var-cov matrix
se.qpoisNZ <- sqrt(diag(vc.qpoisNZ)) # standard errors
ll.qpoisNZ <- logLik(glm.qpoisNZ) # NA: quasi-likelihood fits do not define a true log-likelihood
aic.qpoisNZ <- AIC(glm.qpoisNZ)   # NA for the same reason
## Simulate results
simbetas.qpoisNZ <- mvrnorm(sims, pe.qpoisNZ, vc.qpoisNZ) # draw parameters
yhyp.qpoisNZ <- loglinsimev(xhypNZ, simbetas.qpoisNZ)
## Transform expected counts to expected rates
yhyp.qpoisNZ$pe <- yhyp.qpoisNZ$pe*1000
yhyp.qpoisNZ$lower <- yhyp.qpoisNZ$lower*1000
yhyp.qpoisNZ$upper <- yhyp.qpoisNZ$upper*1000
### Model 4 - ggplot visual
vis_qpoisNZ <-
yhyp.qpoisNZ %>% as.data.frame() %>%
bind_cols() %>% as_tibble() %>%
mutate(model="qpoisNZ",
valueseq=valueseq/1000)
vis_qpoisNZ %>%
ggplot(aes(y=pe,
ymin=lower,
ymax=upper,
x=valueseq)) +
geom_line(color=brown) +
geom_ribbon(alpha=0.2, fill=brown) +
scale_y_continuous(limits = c(0,25)) +
scale_x_continuous(labels = scales::dollar_format(suffix = "k")) +
labs(title= "Expected HOA foreclosure filings per 1000 homes per year",
y=NULL,
x="Median home price of neighborhood") +
theme_caviz_hgrid +
coord_cartesian(xlim=c(40,225))+
annotate("text", x = 150, y = 10,
label = "Quasipoisson with all zeros deleted",
color=brown, size=6)
Here, we compare the models fitted so far.
## We can compare model 4 vs 3
vis_qpoisNZ %>%
ggplot(aes(y=pe,
ymin=lower,
ymax=upper,
x=valueseq)) +
geom_line(color=brown) +
geom_ribbon(alpha=0.2, fill=brown) +
geom_line(inherit.aes = F,
data=vis_nb,
aes(y=pe, x=valueseq),
color=blue) +
geom_ribbon(data=vis_nb,
aes(ymin=lower,
ymax=upper),
alpha=0,
color=blue,
linetype="dashed") +
scale_y_continuous(limits = c(0,25)) +
scale_x_continuous(labels = scales::dollar_format(suffix = "k")) +
labs(title= "Expected HOA foreclosure filings per 1000 homes per year",
y=NULL,
x="Median home price of neighborhood") +
theme_caviz_hgrid +
coord_cartesian(xlim=c(40,225))+
annotate("text", x = 100, y = 12,
label = "Quasipoisson with all zeros deleted",
color=brown, size=6) +
annotate("text", x = 150, y = 6,
label = "Negative Binomial with all zeros deleted",
color=blue, size=6)
## or model 4 vs 2
vis_qpoisNZ %>%
ggplot(aes(y=pe,
ymin=lower,
ymax=upper,
x=valueseq)) +
geom_line(color=brown) +
geom_ribbon(alpha=0.2, fill=brown) +
geom_line(inherit.aes = F,
data=vis_poisNZ,
aes(y=pe, x=valueseq),
color=red) +
geom_ribbon(data=vis_poisNZ,
aes(ymin=lower,
ymax=upper),
alpha=0,
color=red,
linetype="dashed") +
scale_y_continuous(limits = c(0,25)) +
scale_x_continuous(labels = scales::dollar_format(suffix = "k")) +
labs(title= "Expected HOA foreclosure filings per 1000 homes per year",
y=NULL,
x="Median home price of neighborhood") +
theme_caviz_hgrid +
coord_cartesian(xlim=c(40,225))+
annotate("text", x = 100, y = 12,
label = "Quasipoisson with all zeros deleted",
color=brown, size=6) +
annotate("text", x = 150, y = 6,
label = "Poisson with all zeros deleted",
color=red, size=6)
### Model 5 - Zero-Inflated Negative Binomial
## Estimate ZINB using zeroinfl in the pscl library
## Notice the particular form of the formula -- count first, then zero-inflation
m1zinb <- file9501 ~ log(mdvaluation) + post1975construction | log(mdvaluation) +
post1975construction + log(exposure)
## zeroinfl function comes from the pscl package
zinb.result <- zeroinfl(m1zinb,
data=m1data, # we are now using the full data
dist="negbin", # character specification of count model family
offset=log(m1data$exposure))
## Extract results, taking care to keep track of both equations
pe.zinb.count <- zinb.result$coefficients$count
pe.zinb.zeros <- zinb.result$coefficients$zero
pe.zinb <- c(pe.zinb.count,pe.zinb.zeros)
vc.zinb <- vcov(zinb.result)
se.zinb <- sqrt(diag(vc.zinb))
se.zinb.count <- se.zinb[1:length(pe.zinb.count)]
se.zinb.zeros <- se.zinb[ (length(pe.zinb.count)+1) : length(se.zinb) ]
ll.zinb <- logLik(zinb.result)
aic.zinb <- AIC(zinb.result)
theta.zinb <- zinb.result$theta
setheta.zinb <- zinb.result$SE.theta
# Simulate results (we re-use xhyp from above)
# Option 1: E(count | not a structural zero)
# { other options include unconditional E(count) and Pr(Structural Zero) }
simparam.zinb <- mvrnorm(sims,pe.zinb,vc.zinb) # draw parameters
simbetas.zinb <- simparam.zinb[,1:length(pe.zinb.count)]
simgammas.zinb <- simparam.zinb[,(length(pe.zinb.count)+1) : length(se.zinb) ]
yhyp.zinbC <- loglinsimev(xhypWZ, simbetas.zinb)
## Transform expected counts to expected rates
yhyp.zinbC$pe <- yhyp.zinbC$pe*1000
yhyp.zinbC$lower <- yhyp.zinbC$lower*1000
yhyp.zinbC$upper <- yhyp.zinbC$upper*1000
## Option 2: Pr(Structural Zero)
## { other options include E(count|not a structural zero) and unconditional count }
xhypWZ0 <- xhypWZ
xhypWZ0$model <- file9501 ~ log(mdvaluation) + post1975construction + log(exposure)
xhypWZ0$x$exposure <- xhypWZ0$xpre$exposure <- mean(m1data$exposure)
yhyp.zinbZ <- logitsimev(xhypWZ0, simgammas.zinb)
### Model 5 - ggplot visual option 1
vis_zinbC <-
yhyp.zinbC %>% as.data.frame() %>%
bind_cols() %>% as_tibble() %>%
mutate(model="zinbC",
valueseq=valueseq/1000)
(p1 <-
vis_zinbC %>%
ggplot(aes(y=pe,
ymin=lower,
ymax=upper,
x=valueseq)) +
geom_line(color=orange) +
geom_ribbon(alpha=0.2, fill=orange) +
scale_y_continuous(limits = c(0,40)) +
scale_x_continuous(labels = scales::dollar_format(suffix = "k")) +
labs(title= "Expected HOA foreclosure filings per 1000 homes per year",
y=NULL,
x="Median home price of neighborhood") +
theme_caviz_hgrid +
coord_cartesian(xlim=c(40,225))+
annotate("text", x = 150, y = 10,
label = "Zero-Inflated Negative Binomial",
color=orange, size=6))
### Model 5 - ggplot visual option 2
vis_zinbZ <-
yhyp.zinbZ %>% as.data.frame() %>%
bind_cols() %>% as_tibble() %>%
mutate(model="zinbZ",
pe = 1 - pe,
lower = 1 - lower,
upper = 1 - upper,
valueseq=valueseq/1000)
(p2 <-
vis_zinbZ %>%
ggplot(aes(y=pe,
ymin=lower,
ymax=upper,
x=valueseq)) +
geom_line(color=orange) +
geom_ribbon(alpha=0.2, fill=orange) +
scale_y_continuous(limits = c(0,1),breaks=seq(0,1,0.2)) +
scale_x_continuous(labels = scales::dollar_format(suffix = "k")) +
labs(title= "Probability neighborhood has a filing HOA",
y=NULL,
x="Median home price of neighborhood") +
theme_caviz_hgrid +
coord_cartesian(xlim=c(40,225))+
annotate("text", x = 100, y = 0.7,
label = "Zero-Inflated Negative Binomial",
color=orange, size=6))
### Combine options 1 and 2
bind_rows(vis_zinbC,vis_zinbZ) %>%
ggplot(aes(y=pe,
ymin=lower,
ymax=upper,
x=valueseq)) +
geom_line(color=orange) +
geom_ribbon(alpha=0.2, fill=orange) +
scale_x_continuous(labels = scales::dollar_format(suffix = "k")) +
labs(title= "Expected HOA foreclosure filings per 1000 homes per year",
y=NULL,
x="Median home price of neighborhood") +
theme_caviz_hgrid +
coord_cartesian(xlim=c(40,225)) +
facet_wrap(~model, scales = "free")
## You can also use packages like gridExtra
gridExtra::grid.arrange(p1, p2, ncol=2) # adjust the size of the labels if you choose this visual
### Model 6 - Zero-Inflated Poisson
## Estimate ZIP using zeroinfl in the pscl library
zip.result <- zeroinfl(m1zinb,
data=m1data,
dist="poisson",
offset=log(m1data$exposure))
## Extract results, taking care to keep track of both equations
pe.zip.count <- zip.result$coefficients$count
pe.zip.zeros <- zip.result$coefficients$zero
pe.zip <- c(pe.zip.count, pe.zip.zeros)
vc.zip <- vcov(zip.result)
se.zip <- sqrt(diag(vc.zip))
se.zip.count <- se.zip[1:length(pe.zip.count)]
se.zip.zeros <- se.zip[ (length(pe.zip.count)+1) : length(se.zip) ]
ll.zip <- logLik(zip.result)
aic.zip <- AIC(zip.result)
## Simulate results (we re-use xhyp from above)
## Option 1: E(count | not a structural zero)
## { other options include unconditional E(count) and Pr(Structural Zero) }
simparam.zip <- mvrnorm(sims,pe.zip,vc.zip) # draw parameters
simbetas.zip <- simparam.zip[,1:length(pe.zip.count)]
simgammas.zip <- simparam.zip[,(length(pe.zip.count)+1) : length(se.zip) ]
yhyp.zipC <- loglinsimev(xhypWZ, simbetas.zip)
## Transform expected counts to expected rates
yhyp.zipC$pe <- yhyp.zipC$pe*1000
yhyp.zipC$lower <- yhyp.zipC$lower*1000
yhyp.zipC$upper <- yhyp.zipC$upper*1000
## Option 2: Pr(Structural Zero)
## { other options include E(count|not a structural zero) and unconditional count }
yhyp.zipZ <- logitsimev(xhypWZ0, simgammas.zip)
### Model 6 - ggplot visual option 1
vis_zipC <-
yhyp.zipC %>% as.data.frame() %>%
bind_cols() %>% as_tibble() %>%
mutate(model="zipC",
valueseq=valueseq/1000)
(p3 <-
vis_zipC %>%
ggplot(aes(y=pe,
ymin=lower,
ymax=upper,
x=valueseq)) +
geom_line(color=green) +
geom_ribbon(alpha=0.2, fill=green) +
scale_y_continuous(limits = c(0,25)) +
scale_x_continuous(labels = scales::dollar_format(suffix = "k")) +
labs(title= "Expected HOA foreclosure filings per 1000 homes per year",
y=NULL,
x="Median home price of neighborhood") +
theme_caviz_hgrid +
coord_cartesian(xlim=c(40,225))+
annotate("text", x = 150, y = 10,
label = "Expected counts Poisson",
color=green, size=6))
### Model 6 - ggplot visual option 2
vis_zipZ <-
yhyp.zipZ %>% as.data.frame() %>%
bind_cols() %>% as_tibble() %>%
mutate(model="zinbZ",
pe = 1 - pe,
lower = 1 - lower,
upper = 1 - upper,
valueseq=valueseq/1000)
(p4 <-
vis_zipZ %>%
ggplot(aes(y=pe,
ymin=lower,
ymax=upper,
x=valueseq)) +
geom_line(color=green) +
geom_ribbon(alpha=0.2, fill=green) +
scale_y_continuous(limits = c(0,1),breaks=seq(0,1,0.2)) +
scale_x_continuous(labels = scales::dollar_format(suffix = "k")) +
labs(title= "Probability neighborhood has a filing HOA",
y=NULL,
x="Median home price of neighborhood") +
theme_caviz_hgrid +
coord_cartesian(xlim=c(40,225))+
annotate("text", x = 100, y = 0.7,
label = "Zero-Inflated Poisson",
color=green, size=6))
### Combine options 1 and 2
## You can also use packages like gridExtra
gridExtra::grid.arrange(p3, p4, ncol=2) # adjust the size of the labels if you choose this visual
Recreating the figures above with the tile package
## NOTE: to run the code below you need all the quantities estimated in the previous section
### Model 1 - Poisson using GLM with zeros included
# Set up lineplot trace of result for plotting using tile
trace1poisWZ <- lineplot(x = valueseq/1000,
y = yhyp.poisWZ$pe,
lower = yhyp.poisWZ$lower,
upper = yhyp.poisWZ$upper,
ci = list(mark="shaded"),
lex=1.5,
extrapolate = list(data=model.frame(m1, m1data),
cfact=model.frame(m1, xhypWZ$x),
omit.extrapolated=TRUE),
col = purple,
plot = 1
)
poisLabWZ <- textTile(labels="Poisson including excess zeros",
x=77,
y=3,
col=purple,
cex=1,
plot=1)
rug1 <- rugTile(x=m1data$mdvaluation[(m1data$mdvaluation>35000)
&(m1data$mdvaluation<400000)]/1000,
type="jitter",
col=nicegray, alpha=0.4, plot=1)
xat <- c(50, 100, 200)
xlabs <- c("$50k", "$100k", "$200k")
## Plot traces using tile
tile(trace1poisWZ, rug1, poisLabWZ,
RxC = c(1,1),
limits = c(35,400,0,28.5),
yaxis=list(major=FALSE),
xaxis=list(log=TRUE, at=xat, labels=xlabs),
xaxistitle = list(labels="Median home price of neighborhood"),
plottitle = list(labels="Expected HOA foreclosure filings per 1000 homes per year",
x=0.39, y=0.75),
height = list(plot="golden", xaxistitle=3.5, plottitle=3.5),
gridlines = list(type="y")
)
### Model 2 - Poisson using GLM with zeros removed
# Set up lineplot trace of result for plotting using tile
trace1poisNZ <- lineplot(x = valueseq/1000,
y = yhyp.poisNZ$pe,
lower = yhyp.poisNZ$lower,
upper = yhyp.poisNZ$upper,
ci = list(mark="shaded"),
lex=1.5,
extrapolate = list(data=model.frame(m1, m1dataNZ),
cfact=model.frame(m1, xhypNZ$x),
omit.extrapolated=TRUE),
col = red,
plot = 1
)
poisLabNZ <- textTile(labels="Poisson with all zeros deleted",
x=81,
y=12.5,
col=red,
cex=1,
plot=1)
rug1 <- rugTile(x=m1dataNZ$mdvaluation[(m1dataNZ$mdvaluation>35000)
&(m1dataNZ$mdvaluation<400000)]/1000,
type="jitter",
col=nicegray, alpha=0.4, plot=1)
xat <- c(50, 100, 200)
xlabs <- c("$50k", "$100k", "$200k")
## Plot traces using tile
tile(trace1poisNZ, rug1, poisLabNZ,
RxC = c(1,1),
limits = c(35,400,0,28.5),
yaxis=list(major=FALSE),
xaxis=list(log=TRUE, at=xat, labels=xlabs),
xaxistitle = list(labels="Median home price of neighborhood"),
plottitle = list(labels="Expected HOA foreclosure filings per 1000 homes per year",
x=0.39, y=0.75),
height = list(plot="golden", xaxistitle=3.5, plottitle=3.5),
gridlines = list(type="y")
)
### Model 3 - Negative binomial regression, zeros excluded
# Set up lineplot trace of result for plotting using tile
trace1nb <- lineplot(x = valueseq/1000,
y = yhyp.nb$pe,
lower = yhyp.nb$lower,
upper = yhyp.nb$upper,
ci = list(mark="shaded"),
lex=1.5,
extrapolate = list(data=model.frame(m1, m1dataNZ),
cfact=model.frame(m1, xhypNZ$x),
omit.extrapolated=TRUE),
col = blue,
plot = 1
)
poisLabNB <- textTile(labels="Negative Binomial with all zeros deleted",
x=98,
y=12.5,
col=blue,
cex=1,
plot=1)
xat <- c(50, 100, 200)
xlabs <- c("$50k", "$100k", "$200k")
## Plot traces using tile
tile(trace1nb, rug1, poisLabNB,
limits = c(35,400,0,28.5),
yaxis=list(major=FALSE),
xaxis=list(log=TRUE, at=xat, labels=xlabs),
xaxistitle = list(labels="Median home price of neighborhood"),
plottitle = list(labels="Expected HOA foreclosure filings per 1000 homes per year",
x=0.39, y=0.75),
height = list(plot="golden", xaxistitle=3.5, plottitle=3.5),
gridlines = list(type="y")
)
### Model 4 - Quasipoisson using GLM with zeros removed
## Set up lineplot trace of result for plotting using tile
trace1qpoisNZ <- lineplot(x = valueseq/1000,
y = yhyp.qpoisNZ$pe,
lower = yhyp.qpoisNZ$lower,
upper = yhyp.qpoisNZ$upper,
ci = list(mark="shaded"),
lex=1.5,
extrapolate = list(data=model.frame(m1, m1dataNZ),
cfact=model.frame(m1, xhypNZ$x),
omit.extrapolated=TRUE),
col = brown,
plot = 1
)
qpoisLabNZ <- textTile(labels="Quasipoisson with all zeros deleted",
x=93,
y=12.5,
col=brown,
cex=1,
plot=1)
# Switch negative binomial CIs to dashed lines
trace1nb$ci$mark <- "dashed"
# Move negative binomial labels
poisLabNB$y <- 7.5
poisLabNB$x <- 140
## Plot traces using tile
tile(trace1qpoisNZ, trace1nb, rug1, qpoisLabNZ, poisLabNB,
limits = c(35,400,0,28.5),
yaxis=list(major=FALSE),
xaxis=list(log=TRUE, at=xat, labels=xlabs),
xaxistitle = list(labels="Median home price of neighborhood"),
plottitle = list(labels="Expected HOA foreclosure filings per 1000 homes per year",
x=0.39, y=0.75),
height = list(plot="golden", xaxistitle=3.5, plottitle=3.5),
gridlines = list(type="y")
)
## Switch poisson CIs to dashed lines
trace1poisNZ$ci$mark <- "dashed"
## Move poisson labels
poisLabNZ$y <- 7.5
poisLabNZ$x <- 125
## Plot traces using tile
tile(trace1qpoisNZ, rug1, qpoisLabNZ, poisLabNZ, trace1poisNZ,
limits = c(35,400,0,28.5),
yaxis=list(major=FALSE),
xaxis=list(log=TRUE, at=xat, labels=xlabs),
xaxistitle = list(labels="Median home price of neighborhood"),
plottitle = list(labels="Expected HOA foreclosure filings per 1000 homes per year",
x=0.39, y=0.75),
height = list(plot="golden", xaxistitle=3.5, plottitle=3.5),
gridlines = list(type="y")
)
### Model 5 - Zero-Inflated Negative Binomial (Option 1)
## Set up lineplot trace of result for plotting using tile
trace1zinbC <- lineplot(x = valueseq/1000,
y = yhyp.zinbC$pe,
lower = yhyp.zinbC$lower,
upper = yhyp.zinbC$upper,
ci = list(mark="shaded"),
lex=1.5,
extrapolate = list(data=model.frame(m1, m1data),
cfact=model.frame(m1, xhypWZ$x),
omit.extrapolated=TRUE),
col = orange,
plot = 1
)
poisLabZINB2 <- textTile(labels="Zero-Inflated Negative Binomial",
x=80,
y=17.5,
col=orange,
cex=1,
plot=1)
xat <- c(50, 100, 200)
xlabs <- c("$50k", "$100k", "$200k")
## Plot traces using tile
tile(trace1zinbC, rug1, poisLabZINB2,
RxC = c(1,1),
limits = c(35,400,0,28.5),
yaxis=list(major=FALSE),
xaxis=list(log=TRUE, at=xat, labels=xlabs),
xaxistitle = list(labels="Median home price of neighborhood"),
plottitle = list(labels="Expected HOA foreclosure filings per 1000 homes per year",
x=0.39, y=0.75),
height = list(plot="golden", xaxistitle=3.5, plottitle=3.5),
gridlines = list(type="y")
)
### Model 5 - Zero-Inflated Negative Binomial (Option 2)
# recall: this option is Pr(Structural Zero); other options include E(count | not a structural zero) and the unconditional count
## Set up lineplot trace of result for plotting using tile
trace1zinbZ <- lineplot(x = valueseq/1000,
y = 1-yhyp.zinbZ$pe,
lower = 1-yhyp.zinbZ$lower,
upper = 1-yhyp.zinbZ$upper,
ci = list(mark="shaded"),
lex=1.5,
extrapolate = list(data=model.frame(m1, m1data),
cfact=model.frame(m1, xhypWZ0$x),
omit.extrapolated=TRUE),
col = orange,
plot = 1
)
poisLabZINB1 <- textTile(labels="Zero-Inflated Negative Binomial",
x=189,
y=0.3,
col=orange,
cex=1,
plot=1)
xat <- c(50, 100, 200)
xlabs <- c("$50k", "$100k", "$200k")
## Plot traces using tile
tile(trace1zinbZ, rug1, poisLabZINB1,
RxC = c(1,1),
limits = c(35,400,-0.01,1.01),
yaxis=list(major=FALSE),
xaxis=list(log=TRUE, at=xat, labels=xlabs),
xaxistitle = list(labels="Median home price of neighborhood"),
plottitle = list(labels="Probability neighborhood has a foreclosure-filing HOA",
x=0.341, y=0.75),
height = list(plot="golden", xaxistitle=3.5, plottitle=3.5),
gridlines = list(type="y")
)
## Combined plot: revise some details
trace1nb$plot <- 2
trace1nb$ci <- list(mark="dashed")
trace1zinbC$plot <- 2
rug1$plot <- c(1,2)
poisLabZINB2$plot <- 2
poisLabNB2 <- textTile(labels="Negative Binomial",
x=115,
y=7.5,
col=blue,
cex=1,
plot=2)
poisLabZINB1$x <- 205
poisLabZINB2$x <- 94
## Plot traces using tile
tile(trace1zinbZ, trace1zinbC, rug1,
poisLabZINB1, poisLabZINB2,
RxC = c(1,2),
limits = rbind(c(35,400,0,1.0),
c(35,400,0,28.5)),
yaxis=list(major=FALSE),
xaxis=list(log=TRUE, at=xat, labels=xlabs),
xaxistitle = list(labels="Median home price of neighborhood"),
plottitle = list(labels1="Probability neighborhood has a filing HOA",
labels2="Expected HOA foreclosure filings per 1000 homes",
x1=0.32625, y1=0.75, x2=0.42025, y2=0.75), # 355 405
height = list(plot="golden", xaxistitle=3.5, plottitle=3.5),
width = list(spacer=3),
gridlines = list(type="y")
)
## Plot traces using tile
tile(trace1zinbZ, trace1zinbC, trace1nb, rug1,
poisLabZINB1, poisLabZINB2, poisLabNB2,
RxC = c(1,2),
limits = rbind(c(35,400,0,1.0),
c(35,400,0,28.5)),
yaxis=list(major=FALSE),
xaxis=list(log=TRUE, at=xat, labels=xlabs),
xaxistitle = list(labels="Median home price of neighborhood"),
plottitle = list(labels1="Probability neighborhood has a filing HOA",
labels2="Expected HOA foreclosure filings per 1000 homes",
x1=0.32625, y1=0.75, x2=0.42025, y2=0.75), # 355 405
height = list(plot="golden", xaxistitle=3.5, plottitle=3.5),
width = list(spacer=3),
gridlines = list(type="y")
)
### Model 6 - Zero-Inflated Poisson
## Set up lineplot trace of result for plotting using tile
trace1zipC <- lineplot(x = valueseq/1000,
y = yhyp.zipC$pe,
lower = yhyp.zipC$lower,
upper = yhyp.zipC$upper,
ci = list(mark="shaded"),
lex=1.5,
extrapolate = list(data=model.frame(m1, m1data),
cfact=model.frame(m1, xhypWZ$x),
omit.extrapolated=TRUE),
col = green,
plot = 1
)
poisLabZIP2 <- textTile(labels="Zero-Inflated Poisson",
x=120,
y=7.5,
col=green,
cex=1,
plot=1)
## Option 2: Pr(Structural Zero)
## { other options include E(count|not a structural zero) and unconditional count }
yhyp.zipZ <- logitsimev(xhypWZ0, simgammas.zip)
## Set up lineplot trace of result for plotting using tile
trace1zipZ <- lineplot(x = valueseq/1000,
y = 1-yhyp.zipZ$pe,
lower = 1-yhyp.zipZ$lower,
upper = 1-yhyp.zipZ$upper,
ci = list(mark="shaded"),
lex=1.5,
extrapolate = list(data=model.frame(m1, m1data),
cfact=model.frame(m1, xhypWZ0$x),
omit.extrapolated=TRUE),
col = green,
plot = 1
)
poisLabZIP1 <- textTile(labels="Zero-Inflated Poisson",
x=205,
y=0.133,
col=green,
cex=1,
plot=1)
## Combined plot: revise some details
trace1zipC$plot <- 2
poisLabZIP2$plot <- 2
trace1zinbZ$ci$mark <- "dashed"
trace1zinbC$ci$mark <- "dashed"
poisLabZINB1$x <- 75
poisLabZINB1$y <- 0.5
## Plot traces using tile
tile(trace1zipZ, trace1zipC, rug1,
poisLabZIP1, poisLabZIP2,
RxC = c(1,2),
limits = rbind(c(35,400,0,1.0),
c(35,400,0,28.5)),
yaxis=list(major=FALSE),
xaxis=list(log=TRUE, at=xat, labels=xlabs),
xaxistitle = list(labels="Median home price of neighborhood"),
plottitle = list(labels1="Probability neighborhood has a filing HOA",
labels2="Expected HOA foreclosure filings per 1000 homes",
x1=0.32625, y1=0.75, x2=0.42025, y2=0.75), # 355 405
height = list(plot="golden", xaxistitle=3.5, plottitle=3.5),
width = list(spacer=3),
gridlines = list(type="y")
)
## Plot traces using tile
tile(trace1zipZ, trace1zipC, rug1, trace1zinbZ, trace1zinbC,
poisLabZIP1, poisLabZIP2, poisLabZINB1, poisLabZINB2,
RxC = c(1,2),
limits = rbind(c(35,400,0,1.0),
c(35,400,0,28.5)),
yaxis=list(major=FALSE),
xaxis=list(log=TRUE, at=xat, labels=xlabs),
xaxistitle = list(labels="Median home price of neighborhood"),
plottitle = list(labels1="Probability neighborhood has a filing HOA",
labels2="Expected HOA foreclosure filings per 1000 homes",
x1=0.32625, y1=0.75, x2=0.42025, y2=0.75), # 355 405
height = list(plot="golden", xaxistitle=3.5, plottitle=3.5),
width = list(spacer=3),
gridlines = list(type="y")
)