\[ln(\hat{\mu}) = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_p X_p\]
\[f(x) = {\frac {1}{{\sqrt {2\pi \sigma^2}}}}e^{-{\frac {(x-\mu)^2 }{2 \sigma^2 }}}\]
Mean of normal distribution = \(\mu\)
Variance of normal distribution = \(\sigma^2\)
\[P(X = k) = \frac{\lambda^k}{k!}e^{-\lambda}\]
glm()
function in stats package (always loaded)
data
: same as lm()
formula
: same as lm()
family
: Residual distribution and link function
Call:
glm(formula = alcohol ~ sensationC, family = poisson(link = "log"),
data = dat)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.05963 0.02967 35.715 < 2e-16 ***
sensationC 0.23148 0.03966 5.837 5.33e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 1186.8 on 399 degrees of freedom
Residual deviance: 1151.7 on 398 degrees of freedom
AIC: 2079
Number of Fisher Scoring iterations: 5
Predicted count:
\[\hat{\mu} = e^{\color{OrangeRed}{b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_p X_p}}\]
Natural log of predicted count:
\[ln(\hat{\mu}) = \color{OrangeRed}{b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_p X_p}\]
\[\hat{\mu} = e^{b_0 + b_1 X}\]
sensation
are expected to drink 2.886 alcoholic drinkssensation
, the number of predicted alcoholic drinks is multiplied by 1.26sensation
sens_vals <- data.frame(sensationC = c(-2, -1, 0, 1))
pred_counts <- predict(m1, newdata = sens_vals, type = "response")
sensationC <- c(-2, -1, 0, 1)
pred_alc <- c(1.82, 2.29, 2.89, 3.64)
pred_counts <- data.frame(sensationC, pred_alc)
ggplot(dat,
aes(x=sensationC,
y=alcohol)) +
geom_point(color = "#3697aa",
alpha = 0.3,
size = 2) +
stat_smooth(method="glm",
method.args=list(family="poisson"),
se=FALSE,
color = "black") +
geom_point(data = pred_counts,
aes(x = sensationC, y = pred_alc),
size = 3,
color = "black") +
annotate("text", x = -2, y = 1.82,
label = "1.82", vjust = -1, size = 6) +
annotate("text", x = -1, y = 2.29,
label = "2.29", vjust = -1, size = 6) +
annotate("text", x = 0, y = 2.89,
label = "2.89", vjust = -1, size = 6) +
annotate("text", x = 1, y = 3.64,
label = "3.64", vjust = -1, size = 6)
(Intercept) sensationC
1.0596254 0.2314809
2.5 % 97.5 %
(Intercept) 1.0008860 1.1172000
sensationC 0.1541381 0.3096215
(Intercept) sensationC
2.885290 1.260465
2.5 % 97.5 %
(Intercept) 2.720691 3.056284
sensationC 1.166652 1.362909
Call:
glm(formula = alcohol ~ sensationC, family = poisson(link = "log"),
data = dat)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.05963 0.02967 35.715 < 2e-16 ***
sensationC 0.23148 0.03966 5.837 5.33e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 1186.8 on 399 degrees of freedom
Residual deviance: 1151.7 on 398 degrees of freedom
AIC: 2079
Number of Fisher Scoring iterations: 5
Call:
glm(formula = alcohol ~ sensationC + sex, family = poisson(link = "log"),
data = dat)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.56334 0.05248 10.734 < 2e-16 ***
sensationC 0.26085 0.03882 6.719 1.83e-11 ***
sex 0.83947 0.06292 13.342 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 1186.76 on 399 degrees of freedom
Residual deviance: 959.46 on 397 degrees of freedom
AIC: 1888.8
Number of Fisher Scoring iterations: 5
Call:
glm(formula = alcohol ~ sensationC, family = quasipoisson, data = dat)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.05963 0.05006 21.166 < 2e-16 ***
sensationC 0.23148 0.06692 3.459 0.000601 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for quasipoisson family taken to be 2.847168)
Null deviance: 1186.8 on 399 degrees of freedom
Residual deviance: 1151.7 on 398 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 5
Call:
glm(formula = alcohol ~ sensationC, family = poisson(link = "log"),
data = dat)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.05963 0.02967 35.715 < 2e-16 ***
sensationC 0.23148 0.03966 5.837 5.33e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 1186.8 on 399 degrees of freedom
Residual deviance: 1151.7 on 398 degrees of freedom
AIC: 2079
Number of Fisher Scoring iterations: 5
Call:
glm(formula = alcohol ~ sensationC, family = quasipoisson, data = dat)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.05963 0.05006 21.166 < 2e-16 ***
sensationC 0.23148 0.06692 3.459 0.000601 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for quasipoisson family taken to be 2.847168)
Null deviance: 1186.8 on 399 degrees of freedom
Residual deviance: 1151.7 on 398 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 5
Call:
glm.nb(formula = alcohol ~ sensationC, data = dat, init.theta = 1.392859438,
link = log)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.06039 0.05170 20.509 < 2e-16 ***
sensationC 0.22050 0.06822 3.232 0.00123 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Negative Binomial(1.3929) family taken to be 1)
Null deviance: 463.49 on 399 degrees of freedom
Residual deviance: 452.56 on 398 degrees of freedom
AIC: 1772.1
Number of Fisher Scoring iterations: 1
Theta: 1.393
Std. Err.: 0.163
2 x log-likelihood: -1766.091
Call:
glm(formula = alcohol ~ sensationC, family = poisson(link = "log"),
data = dat)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.05963 0.02967 35.715 < 2e-16 ***
sensationC 0.23148 0.03966 5.837 5.33e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 1186.8 on 399 degrees of freedom
Residual deviance: 1151.7 on 398 degrees of freedom
AIC: 2079
Number of Fisher Scoring iterations: 5
Call:
glm.nb(formula = alcohol ~ sensationC, data = dat, init.theta = 1.392859438,
link = log)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.06039 0.05170 20.509 < 2e-16 ***
sensationC 0.22050 0.06822 3.232 0.00123 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Negative Binomial(1.3929) family taken to be 1)
Null deviance: 463.49 on 399 degrees of freedom
Residual deviance: 452.56 on 398 degrees of freedom
AIC: 1772.1
Number of Fisher Scoring iterations: 1
Theta: 1.393
Std. Err.: 0.163
2 x log-likelihood: -1766.091
term estimate std.error statistic p.value
1 (Intercept) 1.0596254 0.02966915 35.714723 2.335830e-279
2 sensationC 0.2314809 0.03966037 5.836581 5.328294e-09
term estimate std.error statistic p.value
1 (Intercept) 1.0596254 0.05006239 21.166096 3.702312e-67
2 sensationC 0.2314809 0.06692113 3.459011 6.007424e-04
term estimate std.error statistic p.value
1 (Intercept) 1.0603897 0.05170433 20.508721 1.799643e-93
2 sensationC 0.2204963 0.06822183 3.232049 1.229061e-03
nb_b0 <- summary(m3)$coefficients[1]
nb_b1 <- summary(m3)$coefficients[2]
ggplot(data = dat, aes(x = sensationC, y = alcohol)) +
geom_point(size = 3,
alpha = 0.4,
color = "#3697aa") +
labs(x = "Sensation seeking (centered)", y = "Number of alcoholic drinks") +
geom_smooth(method = "glm",
method.args = list(family = 'poisson'),
se = FALSE,
color = "black",
fullrange = TRUE) +
geom_function(fun = function(x) exp(nb_b0 + nb_b1 * x),
color = "#dc1e34",
size = 1)
library(pscl) # predprob() function for predicted counts
# zeroinfl() function for zero-inflated count models
# hurdle() for hurdle count models
# vuong() for non-nested model comparisons
phat.poi <- predprob(m1)
phat.poi.mn <- apply(phat.poi, 2, mean)
#phat.poi.mn
phat.nb <- predprob(m3)
phat.nb.mn <- apply(phat.nb, 2, mean)
#phat.nb.mn
#phat.zinb <- predprob(zinb)
#phat.zinb.mn <- apply(phat.zinb, 2, mean)
#phat.zinb.mn
x <- seq(0, 17, 1)
probs <- data.frame(x, phat.poi.mn, phat.nb.mn)#, phat.zinb.mn)
#probs
ggplot(data = dat, aes(x = alcohol)) +
geom_bar(fill = "#3697aa",
color = "#76777a",
size = 1) +
geom_line(data = probs,
aes(x = x, y = 400*phat.poi.mn),
linetype = "solid",
size = 1,
color = "black") +
labs(x = "Number of alcoholic drinks", y = "Frequency") +
ylim(0, 90)
ggplot(data = dat, aes(x = alcohol)) +
geom_bar(fill = "#3697aa",
color = "#76777a",
size = 1) +
geom_line(data = probs,
aes(x = x, y = 400*phat.poi.mn),
linetype = "solid",
size = 1,
color = "black") +
geom_line(data = probs,
aes(x = x, y = 400*phat.nb.mn),
size = 1,
#linetype = "dashed",
color = "#dc1e34") +
labs(x = "Number of alcoholic drinks", y = "Frequency") +
ylim(0, 90)
Berk & MacDonald (2008)
If apparent overdispersion results from specification errors in the systematic part of the Poisson regression model, resorting to the negative binomial distribution does not help. It can make things worse by giving a false sense of security when the fundamental errors in the model remain.
Conceptually, zeroes are meaningful
Lowest possible value of a count
Indicate “nothing”
Two situations:
\[ln(\hat{\mu}) = ln(time) + b_0 + b_1X_1 + b_2X_2 + \cdots + b_pX_p\]
\[R^2_{deviance} = 1 - \frac{deviance_{model}}{deviance_{intercept\ only}}\]
m1
(only sensationC
) to m1add
(sensationC
and sex
)Model 5 is NOT nested within model 6
In addition to the extra predictor (\(X_6\)), the overdispersion parameter is not the same in these models
Model 7 is NOT nested within model 8 (or vice versa)
The overdispersion parameters are different for the two models
vuong()
function in pscl packageVuong Non-Nested Hypothesis Test-Statistic:
(test-statistic is asymptotically distributed N(0,1) under the
null that the models are indistinguishible)
-------------------------------------------------------------
Vuong z-statistic H_A p-value
Raw -6.911618 model2 > model1 2.3958e-12
AIC-corrected -6.911618 model2 > model1 2.3958e-12
BIC-corrected -6.911618 model2 > model1 2.3958e-12