BTS 510 Lab 11

set.seed(12345)
library(tidyverse)
Warning: package 'purrr' was built under R version 4.5.1
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(Stat2Data)
theme_set(theme_classic(base_size = 16))

1 Learning objectives

  • Conduct logistic regression for binary outcomes
  • Interpret results in terms of probability, odds, and log-odds

2 Data

  • ICU data from the Stat2Data package: n = 200
    • ID: Patient ID code
    • Survive: 1 = patient survived to discharge or 0 = patient died
    • Age: Age (in years)
    • AgeGroup: 1 = young (under 50), 2 = middle (50-69), 3 = old (70+)
    • Sex: 1 = female or 0 = male
    • Infection: 1 = infection suspected or 0 = no infection
    • SysBP: Systolic blood pressure (in mm of Hg)
    • Pulse: Heart rate (beats per minute)
    • Emergency: 1 = emergency admission or 0 = elective admission

3 Analysis

  • Replicate and extend the analysis from the lecture
    • Age and Emergency predict Survive
    • Be sure to mean center Age for interpretability

4 Pseudo R^2

  • For linear regression, R^2_{multiple} is calculated from the sums of squares from ordinary least squares estimation
  • Logistic regression is estimated via maximum likelihood
    • There are no sums of squares
    • There is, however, deviance
  • Deviance is a function of the log-likelihood
    • Deviance for a model is how far the model is from a theoretical perfect model where every observation is perfectly predicted
    • Relative measure so can only compare to another model
    • Big deviance (far from perfect) is worse than small deviance (closer to perfect)
  • Pseudo R^2
    • Compare your model to a model with no predictors (only intercept)
    • How much closer is this model to the perfect model than the intercept only model?
    • Theoretically ranges from 0 to 1 (but not in practice)
    • R^2_{deviance} = 1 - \frac{deviance_{M}}{deviance_{intercept\ only}}
    • Deviance for your model is “Residual deviance” from output
    • Deviance for intercept model is “Null deviance” from output

5 Confidence intervals

  • Calculate confidence intervals for all coefficients in a model using confint()
    • For example: confint(m1) to get confidence intervals for model m1
    • If the CI doesn’t include zero, the coefficient is significant
    • Default is 99% confidence interval
      • Use level argument to change
      • e.g., confint(m1, level = 0.99) produces 99% CIs
  • To convert from default logit metric to odds metric: Exponentiate
    • Logit metric: b_1
    • Convert to odds ratio: e^{b_1}
    • Do the same thing to the confidence intervals
      • exp(confint(m1)) produces the odds metric CIs
      • Compare to “no effect” value of one for odds
      • If the CI doesn’t include one, the coefficient is significant

6 Predicted values

  • Two main options
    • Use R like a calculator to calculate predicted values using equations
      • You can be fancy and grab the coefficients directly from the output
      • e.g., using coef(m1)[1] to get the first coefficient, etc.
      • Or just type the numbers in
    • Use predict() to calculate predicted values
      • Set up a new dataset with the predictor values you’re interested in
      • e.g., age_vals <- data.frame(AgeC = c(-20, 0, 20))
      • Then use that new data in predict()
      • e.g., predict(m1, newdata = age_vals, type = "response")
      • type = "response" gives predicted values in probability
      • type = "link" gives predicted values in logit

7 Tasks

  • Model 1: Survive ~ Age
  • Model 2: Survive ~ Age + Emergency
  1. Conduct a likelihood ratio test to compare the models. Report the results. Which model is preferred?

  2. Report the full results of the preferred model. This is similar to linear regression. For each coefficient (including intercept), report:

  • Regression coefficient
  • Standard error
  • z statistic (note that there aren’t degrees of freedom for z tests)
  • p value
  1. Calculate predicted probabilities for 6 predictor combinations:
  • Emergency admit, 20 years younger than mean
  • Emergency admit, mean age
  • Emergency admit, 20 years older than mean
  • Non-emergency admit, 20 years younger than mean
  • Non-emergency admit, mean age
  • Non-emergency admit, 20 years older than mean
  1. Report the odds ratios for each predictor and their confidence intervals. Interpret the odds ratios in words.

  2. Describe the overall findings for this model. Be statistically accurate but avoid jargon and technical terms as much as you can. Be sure to use the names of the variables studied (i.e., survival, age, emergency admission) rather than X and Y.