Multivariate: Poisson regression

1 Goals

1.1 Goals

1.1.1 Goals of this lecture

  • My outcome variable isn’t normally distributed

    • It’s a count of how often something happened!!!

    • Probably right skewed (among other things)

    • May also have a lot of zeroes

  • Use Poisson regression (or a related model) to analyze the outcome

    • It’s an extension of linear regression

    • Many of the same concepts still apply

2 Counts and frequencies

2.1 Count outcomes

2.1.1 Count outcomes

  • Easy to miss that linear regression assumptions are violated

  • Count variables are

    • Discrete: only take on whole number (integer) values

    • Lower bound of 0

    • Typically right skewed

2.1.2 Figure: histogram of count

2.1.3 Figure: count as outcome

2.1.4 GLM with count outcomes

  • Count outcomes violate assumptions of GLM (regression and ANOVA)

    • Residuals are not conditionally normally distributed
    • Residuals do not have a constant variance (heteroscedasticity)
  • GLM can also produce out-of-bounds (i.e., less than 0) predicted values

2.1.5 Figure: using GLM, out-of-bounds prediction

2.1.6 Figure: using GLM, histogram of count residuals

2.1.7 Figure: Using GLM, X versus count residuals

3 Poisson regression

3.1 Model

3.1.1 Poisson regression

  • Outcome: count in a fixed period of time

    • Integer (whole number) values greater than or equal to 0
  • Residuals: Poisson distribution

  • Link function: natural log (\(ln\))

3.1.2 Poisson distribution

  • Named after Siméon Denis Poisson (1781 - 1840)

  • Pictures removed (too large to include): Jim Morrison, Jim Morrison’s grave, Père Lachaise cemetery sign, Père Lachaise cemetery

3.1.3 Poisson distribution

  • Number of times a low probability event happens
    • Number of soldiers killed by horse-kicks each year in each corps in the Prussian cavalry (Bortkiewicz)
    • Number of yeast cells used when brewing Guinness beer (Gosset, “Student” of “Student’s \(t\)”)
    • Number of typographical errors on each page of a book
    • Number of bombs falling in London in each square block
    • Number of earthquakes per day in Southern California

3.1.4 Poisson distribution

\[P(X = k) = \frac{\lambda^k}{k!}e^{-\lambda}\]

  • Poisson distribution properties:

    • Mean of a Poisson distribution = \(\lambda\)
    • Variance of a Poisson distribution = \(\lambda\)
    • Discrete distribution, so only defined for integers
    • Undefined below zero
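These properties can be checked numerically. A minimal sketch (Python standard library only) that evaluates the pmf above and confirms mean = variance = \(\lambda\):

```python
import math

def poisson_pmf(k, lam):
    # P(X = k) = lambda^k * e^(-lambda) / k!
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 4
ks = range(100)  # truncate the infinite support; the tail is negligible here
probs = [poisson_pmf(k, lam) for k in ks]

mean = sum(k * p for k, p in zip(ks, probs))
var = sum((k - mean) ** 2 * p for k, p in zip(ks, probs))
# mean and var both come out (numerically) equal to lam
```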

3.1.5 Figure: mean = variance = 1

3.1.6 Figure: mean = variance = 4

3.1.7 Figure: mean = variance = 7

3.1.8 Figure: mean = variance = 10

3.1.9 Poisson regression

  • Outcome: count in a fixed period of time

    • Integer (whole number) values greater than or equal to 0
  • Residuals: Poisson distribution

    • Heteroscedasticity is a feature of Poisson distribution
    • Mean and variance are related
  • Link function: natural log (\(ln\))

    • \(\ln\) is only defined for positive values, so no predicted values < 0

3.1.10 Figure: Poisson regression graphically

3.1.11 Figure: Poisson regression graphically

3.2 Interpretation

3.2.1 Two forms of Poisson regression

  • In terms of the predicted count, \(\hat{\mu}\):

\[\hat{Y} = \hat{\mu} = e^{b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_p X_p}\]

  • In terms of the natural log of the predicted count, \(ln(\hat{\mu})\):

\[ln(\hat{Y}) = ln(\hat{\mu}) = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_p X_p\]

  • Poisson regression is sometimes called the log-linear model because the log (i.e., \(ln\)) form is a straight line

3.2.2 Count metric interpretation: \(\hat{\mu} = e^{0.466 + 0.925 X}\)

3.2.3 Count metric interpretation

  • Intercept:

    • \(e^{b_0}\) is the predicted count when \(X\) = 0
  • Slope:

    • \(e^{b_1}\) is the multiplicative effect of \(X\) on the predicted count

      • For a 1 unit increase in \(X\), the predicted count is multiplied by \(e^{b_1}\)
      • This is sometimes called the rate ratio (RR)

3.2.4 Count metric interpretation

  • \(e^{b_0} = e^{0.4659581} = 1.594\)

    • Predicted count when \(X\) = 0
  • \(e^{b_1} = e^{0.9247665} = 2.521\)

    • Multiplicative effect of \(X\) on the predicted count
    • For a 1 unit increase in \(X\), the predicted count is multiplied by \(2.521\)

3.2.5 Count metric interpretation

X Predicted count
-2 0.251
-1 0.632
0 1.594
1 4.018
2 10.130
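The table above can be reproduced directly from the count-metric form of the model. A quick sketch using the coefficients from this example:

```python
import math

b0, b1 = 0.4659581, 0.9247665  # example coefficients from these slides

def predicted_count(x):
    # count metric: mu-hat = e^(b0 + b1 * x)
    return math.exp(b0 + b1 * x)

for x in (-2, -1, 0, 1, 2):
    print(x, round(predicted_count(x), 3))

# rate ratio: each 1-unit increase in X multiplies the count by e^(b1)
rate_ratio = predicted_count(1) / predicted_count(0)
```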

3.2.6 Log of predicted count metric interpretation \(ln(\hat{\mu}) = 0.466 + 0.925 X\)

  • Intercept:

    • \(b_0\) is the \(ln(\hat{\mu})\) when \(X\) = 0
  • Slope:

    • \(b_1\) is the additive effect of \(X\) on the \(ln(\hat{\mu})\)

3.2.7 Log of predicted count metric interpretation

  • \(b_0 = 0.466\)

    • \(ln(\hat{\mu})\) when \(X\) = 0
  • \(b_1 = 0.9248\)

    • Additive effect of \(x\) on the \(ln(\hat{\mu})\)
    • For a 1 unit increase in \(X\), the \(ln(\hat{\mu})\) increases by \(0.9248\)

3.2.8 Log of predicted count metric interpretation

X Predicted count ln(Predicted count)
-2 0.251 -1.384
-1 0.632 -0.459
0 1.594 0.466
1 4.018 1.391
2 10.130 2.315

3.2.9 Confidence intervals

  • Results from software are in \(ln(\hat{\mu})\) metric (linear metric)
    b Lower CL Upper CL
    (Intercept) 0.466 0.277 0.643
    x 0.925 0.806 1.047
  • To get predicted count metric, exponentiate each value (\(e^{b}\))
    exp.b. Lower.CL Upper.CL
    (Intercept) 1.594 1.320 1.902
    x 2.521 2.239 2.848
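Exponentiating is the same operation for the estimate and both confidence limits. A minimal sketch using the numbers printed above (tiny discrepancies are possible because the printed coefficients are already rounded):

```python
import math

# b and 95% confidence limits in the ln(mu-hat) metric, from the output above
ln_metric = {
    "(Intercept)": (0.466, 0.277, 0.643),
    "x": (0.925, 0.806, 1.047),
}

# apply e^b to each value to move to the predicted-count metric
count_metric = {term: tuple(math.exp(v) for v in vals)
                for term, vals in ln_metric.items()}

for term, (b, lo, hi) in count_metric.items():
    print(term, round(b, 3), round(lo, 3), round(hi, 3))
```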

4 Overdispersion

4.1 Overdispersion

4.1.1 Equidispersion

  • Remember the Poisson distribution:

    • Mean = Variance = \(\lambda\)
    • This is called equidispersion

4.1.2 Overdispersion

  • In Poisson regression, equidispersion means that the (conditional) mean and (conditional) variance of \(Y\) are equal

    • In practice, the actual variance of \(Y\) is typically larger than expected by the Poisson distribution
    • This is called overdispersion
    • (The variance can technically be smaller than the mean; this is called underdispersion, but it is uncommon)

4.1.3 Overdispersion

  • Why does overdispersion happen?

    • Sometimes, just because
    • Due to omitted predictors
    • “Excess” zeroes

4.1.4 Overdispersion

  • Why do we care about overdispersion?

    • The variance estimated in the model (\(\lambda\)) helps determine the standard errors for regression coefficients
    • It also plays a role in calculating deviance
    • If the variance in our data is larger than \(\lambda\), we are using the wrong standard errors and the deviance is wrong
      • Test of significance, pseudo-\(R^2\), model comparisons are wrong

4.1.5 What do we do?

  • There are two main alternative models for overdispersion

    • Overdispersed Poisson regression
    • Negative binomial regression
  • Both handle overdispersion, but in slightly different ways

4.2 Overdispersed Poisson regression

4.2.1 Overdispersed (or quasi-) Poisson regression

  • The simplest way to deal with overdispersion is to include a “fudge factor” to account for overdispersion

    • The variance is larger than the model says, so we just multiply it by something to make it larger
  • The fudge factor is provided by the scale parameter (\(\psi\))

    • If equidispersion holds, \(\psi\) = 1
    • If there is overdispersion, \(\psi\) > 1
    • (\(\psi\) will show up in your output, no need to calculate)

4.2.2 Overdispersed (or quasi-) Poisson regression

  • Compared to Poisson regression:
    • Each standard error in the OD Poisson model is multiplied by \(\sqrt{\psi}\)

    • If \(\psi\) > 1, this will make the standard errors larger

  • Software
    • SPSS: Scale value in output is \(\psi\)
    • R: Scale value in output is \(\psi\)
    • SAS: Scale value in output is \(\sqrt{\psi}\)

4.2.3 Estimating the scale parameter

  • The scale parameter can be estimated in two similar ways

    • Pearson \(\chi^2\): \(\psi = \frac{\chi^2_{Pearson}}{df}\)

    • Deviance: \(\psi = \frac{Deviance}{df}\)

  • The results are typically very close, but Pearson is preferred
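A sketch of the Pearson version, with hypothetical observed counts \(y\) and fitted means \(\hat{\mu}\) (both made up purely for illustration):

```python
# hypothetical observed counts and fitted means from a 2-coefficient model
y = [0, 2, 1, 5, 3, 8, 4, 12]
mu = [0.5, 1.0, 1.5, 2.5, 3.5, 5.0, 6.0, 8.0]

# Pearson chi-square: sum of (y_i - mu_i)^2 / mu_i
chi2_pearson = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))

df = len(y) - 2  # residual df: n minus number of estimated coefficients
psi = chi2_pearson / df  # psi > 1 suggests overdispersion
```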

4.2.4 Overdispersed Poisson regression interpretation

  • Interpretation for the overdispersed Poisson regression model is identical to interpretation for the Poisson regression model

    • Interpret coefficients the same
      • Predicted count metric or log of predicted count metric
  • Regression coefficients same as coefficients from Poisson regression

    • Only standard errors (and therefore potentially significance) will change

4.2.5 Comparison

  • Poisson regression:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.466 0.093 5.000 0
x 0.925 0.061 15.063 0
  • Overdispersed Poisson regression (from R, \(\psi\) = 2.45):
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.466 0.146 3.196 0.002
x 0.925 0.096 9.628 0.000
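Up to rounding of the printed values, the overdispersed standard errors above are just the Poisson ones multiplied by \(\sqrt{\psi}\), which a couple of lines verify:

```python
import math

psi = 2.45  # scale parameter reported by the software
poisson_se = {"(Intercept)": 0.093, "x": 0.061}

# each overdispersed-Poisson SE = Poisson SE * sqrt(psi)
od_se = {term: se * math.sqrt(psi) for term, se in poisson_se.items()}
# close to the 0.146 and 0.096 printed above; small differences reflect
# rounding of the inputs
```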

4.3 Negative binomial regression

4.3.1 Negative binomial distribution

  • Related to Poisson distribution
    • But that’s really not obvious
    • Mixture of Poisson and gamma distributions
    • Addition of gamma adds in extra variance to make variance > mean

\[P(X = k) = {k + r - 1 \choose k} (1 - p)^r p^k\]

4.3.2 Negative binomial in words 1

Poisson regression assumes that every person with the same predictor value(s) comes from the same Poisson distribution

  • Person A with \(X\) = 1: \(Y\) from Poisson distribution with mean = \(e^{b_0+b_1(1)}\)
  • Person B with \(X\) = 1: \(Y\) from Poisson distribution with mean = \(e^{b_0+b_1(1)}\)
  • Person C with \(X\) = 1: \(Y\) from Poisson distribution with mean = \(e^{b_0+b_1(1)}\)

But what if there’s actually more variability among subjects than captured by the predictors? (i.e., overdispersion)

  • For example, what if we have omitted a predictor?

4.3.3 Negative binomial in words 2

Negative binomial regression allows people with the same predictor value(s) to come from different Poisson distributions

  • Person A with \(X\) = 1: \(Y\) from one Poisson distribution
  • Person B with \(X\) = 1: \(Y\) from another Poisson distribution
  • Person C with \(X\) = 1: \(Y\) from yet another Poisson distribution

The means of the Poisson distributions follow a gamma distribution

  • The average of those distributions is the same as Poisson regression
  • But variability of several different Poisson distributions > variability of single Poisson distribution

4.3.4 Person A: \(Y\) from Poisson distribution with \(\lambda\) = 1

4.3.5 Person B: \(Y\) from Poisson distribution with \(\lambda\) = 4

4.3.6 Person C: \(Y\) from Poisson distribution with \(\lambda\) = 7

4.3.7 Negative binomial distribution

4.3.8 Negative binomial distribution

4.3.9 Negative binomial distribution

4.3.10 Negative binomial scale parameter

  • Negative binomial regression estimates a different scale parameter

    • Usually called \(\alpha\) (SPSS, SAS, STATA)
    • Sometimes called \(\theta\), which is \(1/\alpha\) (R)

4.3.11 Negative binomial scale parameter

  • Conditional residual distribution based on negative binomial

    • Mean = \(\mu\)
    • Variance = \(\mu + \alpha \mu^2\)
      • If there is overdispersion, \(\alpha\) > 0
      • If there is not, \(\alpha\) = 0 and this reduces to Poisson
  • Standard errors don’t need to be adjusted

    • They are calculated using variance = \(\mu + \alpha \mu^2\) to begin with
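To see how quickly the extra \(\alpha \mu^2\) term matters, a tiny sketch comparing negative binomial and Poisson variances at a few means (the \(\alpha\) value here is purely illustrative):

```python
alpha = 0.5  # illustrative dispersion parameter

for mu in (1, 4, 7, 10):
    poisson_var = mu              # Poisson: variance = mean
    nb_var = mu + alpha * mu ** 2 # negative binomial variance
    print(mu, poisson_var, nb_var)
```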

4.3.12 Negative binomial regression interpretation

  • Interpretation for the negative binomial regression model is identical to interpretation for the Poisson regression model

    • Interpret coefficients the same
      • Predicted count metric or log of predicted count metric
  • Regression coefficients for negative binomial will be different

    • Now that the model isn’t constrained to “mean = variance” anymore, the mean may shift

      • Regression coefficients change
      • Standard errors will also change

4.3.13 Comparison

Poisson regression:

Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.466 0.093 5.000 0
x 0.925 0.061 15.063 0

Negative binomial regression (\(\alpha\) = 0.49):

Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.501 0.122 4.123 0
x 0.880 0.101 8.719 0

4.3.14 Figure: comparison (blue = Poisson, red = NB)

5 Miscellaneous

5.1 Choosing a model

5.1.1 Which model to pick?

  • Poisson regression

    • You will likely never satisfy the equidispersion assumption
  • Overdispersed Poisson regression

    • Preferred to Poisson regression
    • Appropriate standard errors and relatively simple model
  • Negative binomial regression

    • More complex, but better able to reproduce the observed counts
    • This may be preferred if you’re interested in prediction

5.1.2 A word of caution…

Berk & MacDonald (2008)

If apparent overdispersion results from specification errors in the systematic part of the Poisson regression model, resorting to the negative binomial distribution does not help. It can make things worse by giving a false sense of security when the fundamental errors in the model remain.

  • In other words, negative binomial regression doesn’t fix a bad model

5.1.3 Variable lengths of time

  • Poisson regression and variants assume a fixed length of time

    • number of aggressive acts committed by a child in 1 hour
    • number of cigarettes smoked per day
  • Often, we measure a count over some variable period of time

5.1.4 Variable lengths of time

  • Can extend Poisson type models to variable time periods
    • Include natural log of the measurement interval as a predictor with regression coefficient equal to 1:

      • \(ln(\hat{\mu}) = ln(time) + b_0 + b_1X_1 + b_2X_2 + \cdots + b_pX_p\)
    • In software: include time (specifically, ln(time)) as an “offset”
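A sketch of how the offset works arithmetically, reusing the coefficients from the earlier example; `time` is the length of each observation window:

```python
import math

b0, b1 = 0.466, 0.925  # example coefficients

def predicted_count(x, time):
    # ln(mu-hat) = ln(time) + b0 + b1 * x, so mu-hat = time * e^(b0 + b1 * x)
    return math.exp(math.log(time) + b0 + b1 * x)

# observing twice as long doubles the predicted count, holding X fixed
```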

5.2 Too many or too few zeroes

5.2.1 Zeroes can be really important

  • Conceptually, zeroes are meaningful

    • Lowest possible value of a count
    • Indicate “nothing”
    • Two situations:
      • Too few zeroes (relative to a Poisson distribution)
      • Too many zeroes (relative to a Poisson distribution)

5.2.2 Too few zeroes

  • Sometimes you have a study where the outcome is a count, but it cannot take on a value of 0

    • Study of medical visits

      • Have to visit the doctor to get involved in the study
    • Study of substance use

      • Only interested in substance users
  • Truncated Poisson regression model uses a truncated Poisson distribution that removes the probability of zeroes

5.2.3 Excess zeroes

  • Counts often display “excess” zeroes

    • More values of 0 than expected for a Poisson distribution
  • Even if the rest of the distribution is approximately Poisson, the “excess” zeroes lead to overdispersion

    • Sometimes, what looks like overdispersion is really excess zeroes
  • There are specific Poisson family models to appropriately deal with excess zeroes, depending on why the zeroes are there

5.2.4 Why are there all these zeroes?

  • This is a substantive question

    • You need to know about the outcome you’re studying
  • Question: Do the people who are responding zero have some probability of responding otherwise?

5.2.5 Do they have some probability of responding other than 0?

  • Yes \(\rightarrow\) zero-inflated Poisson regression

    • Cigarettes smoked for smoker who hasn’t smoked yet today
    • Alcohol consumed for someone who hasn’t had any yet today
  • No \(\rightarrow\) hurdle regression or with-zeroes regression

    • Cigarettes smoked: nonsmoker always responds 0
    • Alcoholic drinks consumed: abstainer always responds 0
    • (These are called “structural zeroes”)

5.2.6 Zero-inflated Poisson regression

  • There are zeroes that have some probability of being non-zero

  • Two parts modeled simultaneously:

    • Logistic regression: is each zero structural or not?
    • Poisson regression: non-structural zeroes and positive values
  • Can use same set of predictors in both parts, but do not have to

  • Can also model non-structural zeroes and positive values using overdispersed Poisson regression or negative binomial regression
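The two-part structure can be written as a single pmf. A sketch of the standard zero-inflated Poisson mixture, where \(\pi\) is the probability of a structural zero:

```python
import math

def zip_pmf(k, pi, lam):
    # Poisson part for the non-structural process
    pois = lam ** k * math.exp(-lam) / math.factorial(k)
    if k == 0:
        # a zero is either structural (prob pi) or an ordinary Poisson zero
        return pi + (1 - pi) * pois
    return (1 - pi) * pois

# with pi = 0.3, zeroes are far more common than Poisson(2) alone predicts
```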

5.2.7 Hurdle regression or with-zeroes regression

  • There are zeroes that have no probability of being non-zero

    • Two different populations: smokers and nonsmokers, drinkers and nondrinkers
  • Two parts modeled simultaneously:

    • Logistic regression: whether zero or not
    • Truncated Poisson regression: positive values only
  • Can use same set of predictors in both parts, but do not have to

  • Can also model positive values using truncated overdispersed Poisson regression or truncated negative binomial regression

5.3 Evaluation and comparison

5.3.1 Pseudo-\(R^2\) values

\[R^2 = 1 - \frac{deviance_{model}}{deviance_{intercept-only\ model}}\]

  • Theoretically bounded by 0 and 1

  • Recommended for count models

  • Also recommended: correlation between \(Y\) and \(\hat{Y}\)

NOTE: for count models, use deviance not -2LL
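A sketch with hypothetical deviance values (real values would come from the fitted model and the intercept-only model):

```python
# hypothetical deviances
dev_model = 120.5      # model with predictors
dev_intercept = 310.2  # intercept-only model

pseudo_r2 = 1 - dev_model / dev_intercept
```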

5.3.2 Likelihood ratio tests

\[\chi^2 = deviance_{model1} - deviance_{model2}\]

  • Model 1: simpler model (fewer predictors, worse fit)

  • Model 2: more complex model (more predictors, better fit)

  • Degrees of freedom = difference in number of parameters

    • Significant test: Model 1 is significantly worse than Model 2
    • NS test: Model 1 and 2 are not significantly different, so go with simpler one (Model 1)

NOTE: for count models, use deviance not -2LL
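A sketch of the test with hypothetical deviances; for one extra parameter (df = 1), the chi-square CDF has a closed form via the error function:

```python
import math

dev_model1 = 310.2  # simpler model
dev_model2 = 298.7  # model with one extra predictor

chi2 = dev_model1 - dev_model2  # difference in deviances, df = 1
# for df = 1: P(chi-square <= x) = erf(sqrt(x / 2))
p_value = 1 - math.erf(math.sqrt(chi2 / 2))
# small p: the simpler model fits significantly worse
```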

5.4 Nested models

5.4.1 Nested models

  • Two models are nested if

    • All terms in one are included in the other
    • You can get from one to the other by fixing some paths to a value
  • Why do we care?

    • Likelihood ratio tests can only be used to compare nested models
      • If your models are not nested, you cannot use the likelihood ratio tests to compare them
    • Instead, use AIC or BIC (smaller is better)

5.4.2 Nested models

  1. Linear regression with predictor \(X_1\)
  2. Linear regression with predictors \(X_1\) and \(X_2\)
  • Model 1 is nested within model 2
    • Fix the regression path for \(X_2\) to 0 and Model 2 becomes Model 1

5.4.3 Nested models

  3. Poisson regression with predictors \(X_3\) and \(X_4\)
  4. Overdispersed Poisson regression w/ predictors \(X_3\) and \(X_4\)
  • Model 3 is nested within model 4
  • Note that this is a test of the dispersion parameter, \(\psi\)

5.4.4 Complicated nesting situations 1

  5. Overdispersed Poisson regression w/ predictor \(X_5\)
  6. Overdispersed Poisson regression w/ predictors \(X_5\) and \(X_6\)
  • Model 5 is NOT nested within model 6

  • In addition to the extra predictor (\(X_6\)), the overdispersion parameter is not the same in these models

    • Model 5 has a \(\psi\) parameter based on only \(X_5\)
    • Model 6 has a different \(\psi\) parameter based on both \(X_5\) and \(X_6\)
    • You can’t get from Model 6 to Model 5 by fixing a parameter to 0

5.4.5 Complicated nesting situations 2

  7. Overdispersed Poisson regression w/ predictors \(X_7\) and \(X_8\)
  8. Negative binomial regression w/ predictors \(X_7\) and \(X_8\)
  • Model 7 is NOT nested within model 8

  • The overdispersion parameters are completely different for the two models

    • Model 7 has a \(\psi\) parameter and variance = \(\psi \mu\)
    • Model 8 has an \(\alpha\) parameter and variance = \(\mu + \alpha \mu^2\)

5.5 Conclusion

5.5.1 Summary

  • When your outcome is a count, use a model from the Poisson regression family

    • Nonlinear, exponential model or log-linear model
    • Overdispersion: overdispersed Poisson or negative binomial
    • Too many or too few zeroes

5.5.2 In class exercises

  • Use the same data as last time
    • Original count outcome of number of drinks, not binge
  • Run a variety of count models
    • Poisson, overdispersed Poisson, negative binomial