My outcome variable isn’t normally distributed
It’s a count of how often something happened!
Probably right skewed (among other things)
May also have a lot of zeroes
Use Poisson regression (or a related model) to analyze the outcome
It’s an extension of linear regression
Many of the same concepts still apply
Easy to miss that linear regression assumptions are violated
Count variables are:
Discrete: only take on whole number (integer) values
Lower bound of 0
Typically right skewed
Count outcomes violate assumptions of the general linear model (GLM; i.e., regression and ANOVA)
The GLM also produces out-of-bounds (i.e., less than 0) predicted values
Outcome: count in a fixed period of time
Residuals: Poisson distribution
Link function: natural log (\(ln\))
Named after Siméon Denis Poisson (1781–1840)
\[P(X = k) = \frac{\lambda^k}{k!}e^{-\lambda}\]
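For example, with \(\lambda = 1.594\) (the predicted count at \(X = 0\) used below), the probability of observing zero events is
\[P(X = 0) = \frac{1.594^0}{0!}e^{-1.594} = e^{-1.594} \approx 0.20\]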
Poisson distribution properties:
Mean = \(\lambda\)
Variance = \(\lambda\) (the mean and variance are equal)
Outcome: count in a fixed period of time
Residuals: Poisson distribution
Link function: natural log (\(ln\))
\[\hat{Y} = \hat{\mu} = e^{b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_p X_p}\]
\[ln(\hat{Y}) = ln(\hat{\mu}) = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_p X_p\]
Intercept: \(e^{b_0}\) is the predicted count when \(X = 0\)
\(e^{b_0} = e^{0.4659581} = 1.594\)
Slope: \(e^{b_1}\) is the multiplicative effect of a one-unit increase in \(X\) on the predicted count
\(e^{b_1} = e^{0.9247665} = 2.521\)
X | Predicted count |
---|---|
-2 | 0.251 |
-1 | 0.632 |
0 | 1.594 |
1 | 4.018 |
2 | 10.130 |
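As a check, the \(X = 1\) row follows directly from the coefficients:
\[\hat{\mu} = e^{0.4659581 + 0.9247665(1)} = e^{1.3907} \approx 4.018\]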
Intercept: \(b_0 = 0.466\) is the predicted log count when \(X = 0\)
Slope: \(b_1 = 0.9248\) is the additive effect of a one-unit increase in \(X\) on the predicted log count
X | Predicted count | ln(Predicted count) |
---|---|---|
-2 | 0.251 | -1.384 |
-1 | 0.632 | -0.459 |
0 | 1.594 | 0.466 |
1 | 4.018 | 1.391 |
2 | 10.130 | 2.315 |
|  | b | Lower CL | Upper CL |
|---|---|---|---|
| (Intercept) | 0.466 | 0.277 | 0.643 |
| x | 0.925 | 0.806 | 1.047 |
|  | exp(b) | Lower CL | Upper CL |
|---|---|---|---|
| (Intercept) | 1.594 | 1.320 | 1.902 |
| x | 2.521 | 2.239 | 2.848 |
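A minimal R sketch of how results like these could be produced (the data frame dat and variables y and x are hypothetical placeholders):

```r
# Poisson regression: log link, Poisson conditional distribution
fit <- glm(y ~ x, family = poisson, data = dat)
summary(fit)       # coefficients on the log-count scale (b_0, b_1)
exp(coef(fit))     # exponentiated coefficients: e^b0, e^b1
exp(confint(fit))  # profile-likelihood CIs, exponentiated
predict(fit, newdata = data.frame(x = -2:2), type = "response")  # predicted counts
```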
Remember the Poisson distribution: its mean and variance are both equal to \(\lambda\)
In Poisson regression, equidispersion means that the (conditional) mean and (conditional) variance of \(Y\) are equal
Why does overdispersion happen? Usually because there is more variability among subjects than the predictors capture (e.g., important predictors are omitted, or there are excess zeroes)
Why do we care about overdispersion? If it is ignored, standard errors are too small, so tests and confidence intervals are too liberal
There are two main alternative models for overdispersion
Both handle overdispersion, but in slightly different ways
The simplest way to deal with overdispersion is to include a “fudge factor” that inflates the standard errors
The fudge factor is provided by the scale parameter (\(\psi\))
Each standard error in the OD Poisson model is multiplied by \(\sqrt{\psi}\)
If \(\psi\) > 1, this will make the standard errors larger
The scale parameter can be estimated in two similar ways
Pearson \(\chi^2\): \(\psi = \frac{\chi^2_{Pearson}}{df}\)
Deviance: \(\psi = \frac{Deviance}{df}\)
The results are typically very close, but Pearson is preferred
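A sketch of the two estimates in R, plus the built-in quasi-Poisson fit (same hypothetical dat, y, and x as above):

```r
# Pearson chi-square / residual df
psi_pearson  <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
# Deviance / residual df
psi_deviance <- deviance(fit) / df.residual(fit)

# The quasipoisson family refits the model and multiplies each SE by sqrt(psi),
# using the Pearson-based estimate of the scale parameter
fit_od <- glm(y ~ x, family = quasipoisson, data = dat)
summary(fit_od)$dispersion
```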
Interpretation for the overdispersed Poisson regression model is identical to interpretation for the Poisson regression model
Regression coefficients same as coefficients from Poisson regression
Poisson regression:

|  | Estimate | Std. Error | z value | Pr(>|z|) |
|---|---|---|---|---|
| (Intercept) | 0.466 | 0.093 | 5.000 | 0 |
| x | 0.925 | 0.061 | 15.063 | 0 |

Overdispersed Poisson regression:

|  | Estimate | Std. Error | t value | Pr(>|t|) |
|---|---|---|---|---|
| (Intercept) | 0.466 | 0.146 | 3.196 | 0.002 |
| x | 0.925 | 0.096 | 9.628 | 0.000 |
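Note that the coefficients are identical and only the standard errors change: \(0.146 / 0.093 \approx 1.57 \approx \sqrt{\psi}\), so these example results imply a scale parameter of roughly \(\psi \approx 2.5\).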
\[P(X = k) = {k + r - 1 \choose k} (1 - p)^r p^k\]
Poisson regression assumes that every person with the same predictor value(s) comes from the same Poisson distribution
But what if there’s actually more variability among subjects than captured by the predictors? (i.e., overdispersion)
Negative binomial regression allows people with the same predictor value(s) to come from different Poisson distributions
The means of the Poisson distributions follow a gamma distribution
Negative binomial regression estimates a different scale parameter
Conditional residual distribution based on negative binomial
Standard errors don’t need to be adjusted
Interpretation for the negative binomial regression model is identical to interpretation for the Poisson regression model
Regression coefficients for negative binomial will be different
Now that the model isn’t constrained to “mean = variance” anymore, the mean may shift
Poisson regression:
|  | Estimate | Std. Error | z value | Pr(>|z|) |
|---|---|---|---|---|
| (Intercept) | 0.466 | 0.093 | 5.000 | 0 |
| x | 0.925 | 0.061 | 15.063 | 0 |
Negative binomial regression (\(\alpha\) = 0.49):
|  | Estimate | Std. Error | z value | Pr(>|z|) |
|---|---|---|---|---|
| (Intercept) | 0.501 | 0.122 | 4.123 | 0 |
| x | 0.880 | 0.101 | 8.719 | 0 |
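A sketch of the negative binomial fit in R (same hypothetical dat; this assumes the \(\alpha\) reported above is the usual NB2 dispersion parameter, with \(\alpha = 1/\theta\)):

```r
library(MASS)
# Negative binomial regression: same log link, plus an estimated dispersion parameter
fit_nb <- glm.nb(y ~ x, data = dat)
summary(fit_nb)
# glm.nb reports theta, where Var(Y) = mu + mu^2 / theta;
# alpha = 1/theta, so alpha = 0.49 corresponds to theta of about 2
fit_nb$theta
```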
Poisson regression
Overdispersed Poisson regression
Negative binomial regression
Berk & MacDonald (2008)
If apparent overdispersion results from specification errors in the systematic part of the Poisson regression model, resorting to the negative binomial distribution does not help. It can make things worse by giving a false sense of security when the fundamental errors in the model remain.
Poisson regression and variants assume a fixed length of time
Often, we measure a count over some variable period of time
Include the natural log of the measurement interval \(t\) as a predictor with regression coefficient fixed at 1:
\[ln(\hat{\mu}) = ln(t) + b_0 + b_1 X_1 + \cdots + b_p X_p\]
In software: include time (specifically, ln(time)) as an “offset”
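For example, in R (time is a hypothetical variable holding each person’s measurement interval):

```r
# The offset enters the linear predictor with its coefficient fixed at 1
fit_rate <- glm(y ~ x + offset(log(time)), family = poisson, data = dat)
```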
Conceptually, zeroes are meaningful
Sometimes you have a study where the outcome is a count, but it cannot take on a value of 0
Example: a study of medical visits in which everyone sampled has at least one visit
Example: a study of substance use in which everyone sampled reports at least some use
Truncated Poisson regression model uses a truncated Poisson distribution that removes the probability of zeroes
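A minimal sketch of the resulting distribution: the Poisson probabilities are rescaled so that the zero category is impossible,
\[P(X = k \mid X > 0) = \frac{\lambda^k e^{-\lambda}}{k!\,(1 - e^{-\lambda})}, \quad k = 1, 2, 3, \ldots\]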
Counts often display “excess” zeroes
Even if the rest of the distribution is approximately Poisson, the “excess” zeroes lead to overdispersion
There are specific Poisson family models to appropriately deal with excess zeroes, depending on why the zeroes are there
This is a substantive question
Question: Do the people who are responding zero have some probability of responding otherwise?
Yes \(\rightarrow\) zero-inflated Poisson regression
No \(\rightarrow\) hurdle regression or with-zeroes regression
There are zeroes that have some probability of being non-zero
Two parts modeled simultaneously: a logistic regression for whether a zero is structural, and a Poisson regression for the non-structural zeroes and positive counts
Can use same set of predictors in both parts, but do not have to
Can also model non-structural zeroes and positive values using overdispersed Poisson regression or negative binomial regression
There are zeroes that have no probability of being non-zero
Two parts modeled simultaneously: a logistic regression for zero versus non-zero, and a truncated Poisson regression for the positive counts
Can use same set of predictors in both parts, but do not have to
Can also model positive values using truncated overdispersed Poisson regression or truncated negative binomial regression
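A sketch in R using the pscl package (formula parts before and after | are the count model and the zero model; using x in both parts is just illustrative):

```r
library(pscl)
# Zero-inflated Poisson: Poisson counts mixed with structural zeroes
fit_zip <- zeroinfl(y ~ x | x, dist = "poisson", data = dat)
# Hurdle model: binomial zero-vs-nonzero part plus truncated Poisson for positives
fit_hur <- hurdle(y ~ x | x, dist = "poisson", data = dat)
# dist = "negbin" in either call allows overdispersion in the count part
```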
\[R^2 = 1 - \frac{deviance_{model}}{deviance_{intercept\ only\ model}}\]
Theoretically bounded by 0 and 1
Recommended for count models
Also recommended: correlation between \(Y\) and \(\hat{Y}\)
NOTE: for count models, use deviance not -2LL
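A sketch of both statistics in R for a fitted count model fit (same hypothetical dat):

```r
# Deviance-based pseudo-R^2 (null.deviance is the intercept-only deviance)
r2_dev <- 1 - deviance(fit) / fit$null.deviance
# Correlation between observed and model-predicted counts
r2_cor <- cor(dat$y, fitted(fit))
```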
\[\chi^2 = deviance_{model1} - deviance_{model2}\]
Model 1: simpler model (fewer predictors, worse fit)
Model 2: more complex model (more predictors, better fit)
Degrees of freedom = difference in number of parameters
NOTE: for count models, use deviance not -2LL
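A sketch in R for two nested fits, fit1 (simpler) and fit2 (more complex):

```r
# Deviance (chi-square) difference test for nested count models
anova(fit1, fit2, test = "Chisq")

# Or by hand:
chisq <- deviance(fit1) - deviance(fit2)
df    <- df.residual(fit1) - df.residual(fit2)
pchisq(chisq, df, lower.tail = FALSE)
```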
Two models are nested if the simpler model can be obtained from the more complex model by constraining some of its parameters (e.g., fixing coefficients to 0)
Why do we care? The deviance (\(\chi^2\)) difference test above is only valid for nested models
Model 5 is NOT nested within model 6
In addition to the extra predictor (\(X_6\)), the overdispersion parameter is not the same in these models
Model 7 is NOT nested within model 8
The overdispersion parameters are completely different for the two models
When your outcome is a count, use a model from the Poisson regression family