My outcome variable isn’t normally distributed
It’s binary!!!
Two mutually exclusive categories
Linear regression assumptions are violated
Use logistic regression to analyze the outcome
The general linear model (GLM: linear regression, ANOVA) makes three assumptions about the residuals (\(e_i = Y_i - \hat{Y}_i\)) of the model: they are normally distributed, have constant variance (homoscedasticity), and are independent
Three ways to respond when these assumptions are violated:
Ignore the problem
Transform the outcome
Use the generalized linear model (GLiM)
\[ln\left(\frac{\hat{Y}}{1-\hat{Y}}\right) = ln\left(\frac{\hat{p}}{1-\hat{p}}\right) = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_p X_p\]
\[f(x) = {\frac {1}{{\sqrt {2\pi \sigma^2}}}}e^{-{\frac {(x-\mu)^2 }{2 \sigma^2 }}}\]
Mean of normal distribution = \(\mu\)
Variance of normal distribution = \(\sigma^2\)
\[P(X = k) = {n \choose k} p^k (1-p)^{n-k}\]
What is the probability of having \(k\) events in \(n\) trials, each of which has probability \(p\) of being an “event”?
\[P(X = k) = {n \choose k} p^k (1-p)^{n-k}\]
Mean of a binomial distribution: \(np\)
Variance of a binomial distribution: \(np(1-p)\)
Mean and variance are related to one another: the variance is determined by the mean
Heteroscedasticity is built into logistic regression: the variance of a binary outcome changes with the predicted probability, so it cannot be constant
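A minimal sketch of this mean–variance link in R (the probabilities below are arbitrary illustration values):

```r
# For a single binary trial (n = 1), mean = p and variance = p * (1 - p):
# the variance is determined by the mean and peaks at p = 0.5.
p <- c(0.1, 0.3, 0.5, 0.7, 0.9)  # arbitrary example probabilities
data.frame(
  p        = p,
  mean     = p,             # mean of one Bernoulli trial
  variance = p * (1 - p)    # changes with the mean: heteroscedasticity
)
```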
Linear regression: Model the mean of the outcome (conditional on predictor(s))
Logistic regression: Model the probability of a “success” or “event” (conditional on predictor(s))
Probability:
\[\hat{p} = \frac{e^{(\color{OrangeRed}{b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_p X_p})}}{1+e^{(\color{OrangeRed}{b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_p X_p})}}\]
Odds:
\[\hat{odds} = \frac{\hat{p}}{1-\hat{p}} = e^{\color{OrangeRed}{b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_p X_p}}\]
Logit:
\[ln\left(\frac{\hat{p}}{1-\hat{p}}\right) = \color{OrangeRed}{b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_p X_p}\]
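As a sketch of how the three metrics line up, base R's `qlogis()` (logit) and `plogis()` (inverse logit) convert between them; the coefficients are the example values used throughout these slides:

```r
b0 <- 0.251; b1 <- 1.219   # example coefficients from the slides
x  <- 1                    # an arbitrary predictor value

logit <- b0 + b1 * x       # logit metric: linear in x
odds  <- exp(logit)        # odds metric: exponentiate the logit
p     <- plogis(logit)     # probability metric: inverse logit
c(logit = logit, odds = odds, probability = p)

all.equal(qlogis(p), logit)  # round trip: logit of the probability
```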
\[\hat{p} = \frac{e^{0.251 + 1.219 X}}{1 + e^{0.251 + 1.219 X}}\]
General interpretation of intercept:
\(b_0\) is related to the probability of success when X = 0
\[\hat{p} = \frac{e^{0.251 + 1.219 X}}{1 + e^{0.251 + 1.219 X}}\]
General interpretation of slope:
\(b_1\) tells you how predictor X relates to probability of success
\[\hat{p} = \frac{e^{\color{OrangeRed}{0.251} + 1.219 X}}{1 + e^{\color{OrangeRed}{0.251} + 1.219 X}}\]
Interpretation of example intercept:
\(\frac{e^{\color{OrangeRed}{b_0}}}{1 + e^{\color{OrangeRed}{b_0}}} = \frac{e^{\color{OrangeRed}{0.251}}}{1 + e^{\color{OrangeRed}{0.251}}} =0.562\)
\[\hat{p} = \frac{e^{0.251 + \color{OrangeRed}{1.219} X}}{1 + e^{0.251 + \color{OrangeRed}{1.219} X}}\]
Interpretation of example slope:
\(\color{OrangeRed}{b_1} = \color{OrangeRed}{1.219} > 0\): the probability of success increases with \(X\), but not at a constant rate; the size of the change depends on where you are on the curve
Linear regression: the slope is the same at every value of \(X\)
Logistic regression (probability): the slope of the probability curve changes depending on the value of \(X\)
When \(\color{blue}{X = 1.5}\):
\[\hat{P}(success) = \hat{p} = \frac{e^{b_0 + b_1 \color{blue}{X}}}{1+e^{b_0 + b_1 \color{blue}{X}}} = \frac{e^{0.251 + 1.219 \times \color{blue}{1.5}}}{1 + e^{0.251 + 1.219 \times \color{blue}{1.5}}} = 0.889\]
The approximate slope of the probability curve at that point is:
\[\hat{p} (1-\hat{p}) \color{OrangeRed}{b_1} = 0.889 \times (1 - 0.889) \times \color{OrangeRed}{1.219} = 0.12\]
| X value | Predicted probability | Slope |
|---|---|---|
| -3 | 0.03 | 0.04 |
| -2 | 0.10 | 0.11 |
| -1 | 0.28 | 0.24 |
| 0 | 0.56 | 0.30 |
| 1 | 0.81 | 0.19 |
| 2 | 0.94 | 0.07 |
| 3 | 0.98 | 0.02 |
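This table can be reproduced from the example coefficients; a minimal sketch (values rounded to two decimals):

```r
b0 <- 0.251; b1 <- 1.219
x     <- -3:3
p     <- plogis(b0 + b1 * x)   # predicted probability at each x
slope <- p * (1 - p) * b1      # slope of the probability curve at each x
round(data.frame(x, p, slope), 2)
```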
Odds is the ratio of two probabilities: the probability of “success” over the probability of “failure”
\[odds = \frac{\hat{p}}{(1 - \hat{p})}\]
As the probability of “success” increases (nonlinearly), the odds of “success” increase (also nonlinearly, but in a different way)
Probability ranges from 0 to 1, with the “even chances” point at 0.5
Odds range from \(0\) to \(+\infty\), with the “even chances” point at 1
\[\hat{odds} = \frac{\hat{p}}{(1 - \hat{p})} = e^{0.251 + 1.219 X}\]
General interpretation of intercept:
\(b_0\) is related to the odds of success when \(X\) = 0
\[\hat{odds} = \frac{\hat{p}}{(1 - \hat{p})} = e^{0.251 + 1.219 X}\]
General interpretation of slope:
\(b_1\) = relationship between predictor \(X\) and the odds of success
\[\hat{odds} = \frac{\hat{p}}{(1 - \hat{p})} = e^{\color{OrangeRed}{0.251} + 1.219 X}\]
Interpretation of example intercept:
\(e^{\color{OrangeRed}{b_0}} = e^{\color{OrangeRed}{0.251}} = 1.285\) = the predicted odds of success when \(X = 0\)
\[\hat{odds} = \frac{\hat{p}}{(1 - \hat{p})} = e^{0.251 + \color{OrangeRed}{1.219} X}\]
Interpretation of example slope:
\(b_1\) > 0: Odds of a success increase as \(X\) increases
This nonlinear (multiplicative) change is presented in terms of an odds ratio
Example: odds ratio \(= e^{b_1}= e^{1.219} = 3.38\)
Odds ratio \(= e^{b_1}= e^{1.219} = 3.38\)
Odds ratio for \(X = 1\) versus \(X = 0\): \(\frac{odds(X = 1)}{odds(X = 0)} = \frac{4.349}{1.285} = 3.38\)
Odds ratio for \(X = 2\) versus \(X = 1\): \(\frac{odds(X = 2)}{odds(X = 1)} = \frac{14.717}{4.349} = 3.38\)
In fact, ANY 1-unit difference in \(X\) multiplies the odds by the same factor
Constant multiplicative change (see the derivation below)
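Why any 1-unit difference gives the same ratio follows directly from the model:

\[\frac{odds(X = x + 1)}{odds(X = x)} = \frac{e^{b_0 + b_1 (x + 1)}}{e^{b_0 + b_1 x}} = e^{b_1}\]

which does not depend on \(x\).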
| X value | Predicted probability | Predicted odds |
|---|---|---|
| -3 | 0.03 | 0.03 |
| -2 | 0.10 | 0.11 |
| -1 | 0.28 | 0.38 |
| 0 | 0.56 | 1.29 |
| 1 | 0.81 | 4.35 |
| 2 | 0.94 | 14.72 |
| 3 | 0.98 | 49.80 |
Logit or log-odds is the natural log (\(ln\)) of the odds
As the probability of “success” increases (nonlinearly, S-shaped curve in \(X\)), the logit of “success” increases as well, and it is a linear function of \(X\)
Probability ranges from 0 to 1, with the “even chances” point at 0.5
Odds range from 0 to \(+\infty\), with the “even chances” point at 1
Logit ranges from \(-\infty\) to \(+\infty\), with the “even chances” point at 0
\[\hat{logit} = ln\left(\frac{\hat{p}}{(1 - \hat{p})}\right) = 0.251 + 1.219 X\]
General interpretation of intercept:
\(b_0\) is related to the logit of success when X = 0
\[\hat{logit} = ln\left(\frac{\hat{p}}{(1 - \hat{p})}\right) = 0.251 + 1.219 X\]
General interpretation of slope:
\(b_1\) is the relationship between predictor X and logit of success
\[\hat{logit} = ln\left(\frac{\hat{p}}{(1 - \hat{p})}\right) = \color{OrangeRed}{0.251} + 1.219 X\]
Interpretation of example intercept:
\(\color{OrangeRed}{b_0} = \color{OrangeRed}{0.251}\) is the predicted logit (log-odds) of success when \(X = 0\)
\[\hat{logit} = ln\left(\frac{\hat{p}}{(1 - \hat{p})}\right) = 0.251 + \color{OrangeRed}{1.219} X\]
Interpretation of example slope:
\(\color{OrangeRed}{b_1} = \color{OrangeRed}{1.219}\): each 1-unit increase in \(X\) increases the predicted logit of success by 1.219
Probability, odds, and logit are equivalent ways of expressing the same model, so use the metric that best answers your question
Odds ratios tell you about change, but not where you start
Logit is nice because it’s linear, but it’s not very interpretable
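In R, a fitted `glm` can report predictions in either metric via `predict()`; a minimal sketch on simulated data (the data and model object `m` are hypothetical, not the slides' example data):

```r
set.seed(1)
x <- rnorm(100)
y <- rbinom(100, size = 1, prob = plogis(0.25 + 1.2 * x))
m <- glm(y ~ x, family = binomial)   # logistic regression

newdat <- data.frame(x = 1.5)
predict(m, newdat, type = "link")        # logit metric
exp(predict(m, newdat, type = "link"))   # odds metric
predict(m, newdat, type = "response")    # probability metric
```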
Default results are in logit metric: compare to null value of 0
| term | estimate |
|---|---|
| (Intercept) | 0.251 |
| x | 1.219 |
Confidence intervals are in logit metric: does it contain 0?
| | 2.5 % | 97.5 % |
|---|---|---|
| (Intercept) | -0.188 | 0.703 |
| x | 0.661 | 1.876 |
\(e^{estimate}\) converts to odds ratio metric: compare to null value of 1
| term | estimate | OR |
|---|---|---|
| (Intercept) | 0.251 | 1.285 |
| x | 1.219 | 3.383 |
\(e^{estimate}\) converts to odds ratio metric: does it contain 1?
| | 2.5 % | 97.5 % | OR 2.5 % | OR 97.5 % |
|---|---|---|---|---|
| (Intercept) | -0.188 | 0.703 | 0.829 | 2.019 |
| x | 0.661 | 1.876 | 1.938 | 6.528 |
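A sketch of how tables like these are produced in R; the data are simulated for illustration, so the numbers will differ from the slides:

```r
set.seed(1)
x <- rnorm(100)
y <- rbinom(100, size = 1, prob = plogis(0.25 + 1.2 * x))
m <- glm(y ~ x, family = binomial)

coef(m)          # estimates in the logit metric (compare to 0)
confint(m)       # profile CIs in the logit metric (does it contain 0?)
exp(coef(m))     # odds ratio metric (compare to 1)
exp(confint(m))  # CIs in the odds ratio metric (does it contain 1?)
```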
Once you have a fitted model, there are usually two things you want to do with it:
Compute some measure of predictive power or model fit
Compare that model to another competing model
Linear regression is estimated using ordinary least squares (OLS)
GLiMs (like logistic regression) are estimated using maximum likelihood
Deviance is conceptually similar to \(SS_{residual}\) in linear regression
If you had as many parameters as observations (a “saturated” model), you could reproduce the data perfectly
Deviance is how far from this “perfect” model you are (smaller deviance = better fit)
\(R^2\) for linear regression has many desirable qualities
Without \(SS_{residual}\), what can we do?
\[R^2_{deviance} = 1 - \frac{deviance_{model}}{deviance_{intercept.only.model}}\]
Compare your model to a model with no predictor (only intercept)
\[R^2_{McFadden} = 1 - \frac{LL_{model}}{LL_{intercept.only.model}}\]
Same idea as \(R^2_{deviance}\), just using LL instead of deviance
In linear regression, \(R^2_{multiple}\) is also the squared correlation between the observed \(Y\) values and the predicted \(Y\) values
Most software packages can produce predicted \(Y\) values for your analysis, so you can correlate them with the observed values and square that correlation (see the sketch below)
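A sketch of all three measures in R, again on simulated data (any fitted `glm` works the same way):

```r
set.seed(1)
x <- rnorm(100)
y <- rbinom(100, size = 1, prob = plogis(0.25 + 1.2 * x))
m  <- glm(y ~ x, family = binomial)   # model of interest
m0 <- glm(y ~ 1, family = binomial)   # intercept-only model

1 - deviance(m) / deviance(m0)                      # deviance R^2
1 - as.numeric(logLik(m)) / as.numeric(logLik(m0))  # McFadden's R^2
cor(y, fitted(m))^2   # squared correlation of observed and predicted
```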
In linear regression, if you added a predictor, there were two ways to tell whether that predictor was adding to the model: the \(t\)-test of its regression coefficient and the \(F\)-test comparing models (\(\Delta R^2\))
For logistic regression, the Wald test of the regression coefficient may not be reliable (see Vaeth, 1985), so model comparison via the likelihood ratio test is preferred
The likelihood ratio test is based on the ratio of the likelihoods of two nested models
Test statistic:
\[\chi^2 = deviance_{model1} - deviance_{model2}\]
Model 1: simpler model (fewer predictors, worse fit)
Model 2: more complex model (more predictors, better fit)
Degrees of freedom = difference in number of parameters
Example logistic regression model (one predictor): Deviance \(= 116.146\)
Logistic regression model with no predictors (intercept only): Deviance \(= 137.989\)
\(\chi^2(1) = 137.989 - 116.146 = 21.843\)
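A sketch of this comparison in R, once more on simulated data (the deviances will differ from the example numbers above):

```r
set.seed(1)
x <- rnorm(100)
y <- rbinom(100, size = 1, prob = plogis(0.25 + 1.2 * x))
m0 <- glm(y ~ 1, family = binomial)  # Model 1: intercept only
m1 <- glm(y ~ x, family = binomial)  # Model 2: adds the predictor

anova(m0, m1, test = "Chisq")  # likelihood ratio (deviance difference) test

# Equivalently, by hand:
chi2 <- deviance(m0) - deviance(m1)
pchisq(chi2, df = 1, lower.tail = FALSE)  # df = difference in parameters
```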
Use logistic regression when your outcome is binary
Be careful with interpretation no matter which metric (probability, odds, or logit) you report
But many basic concepts parallel linear regression