BTS 510 Lab 12

set.seed(12345)
library(tidyverse)
Warning: package 'purrr' was built under R version 4.5.1
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(Stat2Data)
theme_set(theme_classic(base_size = 16))

1 Learning objectives

  • Conduct Poisson regression for count outcomes
  • Interpret results in terms of predicted counts
  • Describe when to use alternative models / specifications
    • Negative binomial, inflated zeroes, offset, etc.

2 Data

  • Read in data from text file: jpapoisson.txt
dat <- read_table("jpapoisson.txt",
                  col_names = FALSE)
colnames(dat) <- c("ID", "sensation", "sex", "alcohol")
  • n = 200 observations on 4 variables
    • ID: ID variable
    • sensation: 1 to 7 scale of sensation seeking
    • sex: 0 = female, 1 = male
    • alcohol: count of number of alcoholic drinks consumed

3 Analysis

  • Extend the analysis from the lecture
    • sensation and sex predict alcohol
    • Be sure to mean center sensation for interpretability

4 Tasks

  • Model 1: Poisson regression for alcohol ~ sensation + sex
  • Model 2: Negative binomial for alcohol ~ sensation + sex
  1. How can we compare models 1 and 2? Do that comparison. Which model is preferred?

  2. Report the full results of the preferred model.

  • Report the R^2 for the model
  • For each coefficient (including intercept), report:
    • Regression coefficient
    • Standard error
    • z statistic
    • p value
  1. Calculate predicted counts for alcohol for 6 predictor combinations:
  • Male, low sensation seeking
  • Male, mean sensation seeking
  • Male, high sensation seeking
  • Female, low sensation seeking
  • Female, mean sensation seeking
  • Female, high sensation seeking
  1. Report the rate ratios for each predictor and their confidence intervals. Interpret the rate ratios in words.

  2. Describe the overall findings for this model. Be statistically accurate but avoid jargon and technical terms as much as you can. Be sure to use the names of the variables studied (i.e., sensation seeking, sex, alcoholic drinks consumed) rather than X and Y.

5 Bonus task: Are there excess zeroes?

  • Zero-inflated Poisson
    • y ~ count_predictors | logistic_predictors
library(pscl)
zip <- zeroinfl(y ~ sensationC + sex | sensationC + sex, 
                data = dat,
                dist = "poisson")
summary(zip)
  • Zero-inflated negative binomial
    • y ~ count_predictors | logistic_predictors
library(pscl)
zinb <- zeroinfl(y ~ sensationC + sex | sensationC + sex, 
                 data = dat,
                 dist = "negbin")
summary(zinb)
  • Hurdle Poisson
    • y ~ count_predictors | logistic_predictors
library(pscl)
hurdle_poi <- hurdle(y ~ sensationC + sex | sensationC + sex, 
                     data = dat, 
                     dist = "poisson")
summary(hurdle_poi)
  • Hurdle negative binomial
    • y ~ count_predictors | logistic_predictors
library(pscl)
hurdle_nb <- hurdle(y ~ sensationC + sex | sensationC + sex, 
                    data = dat, 
                    dist = "negbin")
summary(hurdle_nb)