Introduction to Biostatistics

1 Learning objectives

1.1 Learning objectives

  • Describe maximum likelihood estimation for linear regression
  • Describe hypothesis testing for linear regression coefficients, including the sampling distribution used and degrees of freedom

2 Estimation

2.1 What is estimation?

  • Method by which we get estimates of regression coefficients
  • What are the \(b_0\) and \(b_1\) for the regression equation?
Code
library(ggplot2)   # plotting (assumed to be loaded in the slides' setup chunk)

# simulate 50 subjects: weight depends linearly on height, plus noise
height <- 3*rnorm(50) + 70
weight <- 7 * (height - 70) + 25 * rnorm(50) + 125
dat <- data.frame(height, weight)
ggplot(data = dat, aes(x = height, y = weight)) + 
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

2.2 Ordinary least squares (OLS) estimation

  • Minimize the sum of squared residuals
  • Find \(b_0\) and \(b_1\) that give the smallest \(\Sigma((Y_i - \hat{Y}_i)^2)\)
  • For linear regression, there is one value of \(b_0\) and one value of \(b_1\) that minimize the sum of squared residuals
    • Not true for all estimation methods
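
To see the least squares criterion numerically (a small sketch using the simulated dat from the previous slide), compare the SSR at the lm() estimates to the SSR at nearby values:
Code
fit <- lm(weight ~ height, data = dat)           # OLS fit
ssr_at <- function(b0, b1) sum((dat$weight - (b0 + b1 * dat$height))^2)
ssr_at(coef(fit)[1], coef(fit)[2])               # SSR at the OLS estimates
ssr_at(coef(fit)[1], coef(fit)[2] + 0.5)         # any other slope gives a larger SSR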

2.3 Minimizing

  • Functions with squared terms in them look like a “U”
  • Minimum is the bottom of the “U”
  • Tangent line is a line that touches a curve at a single point
    • The slope of the tangent line is 0 at the minimum
  • \(Y = X^2\)
Code
x <- c(-10:10)
y <- x*x
dat1 <- data.frame(x, y)
ggplot(data = dat1, aes(x = x, y = y)) + 
  geom_line(linewidth = 1)

2.4 Tangent lines

Code
b1 <- c(0:20)
ssr <- b1*b1 - 20*b1 + 115
dat_ssr <- data.frame(b1, ssr)
ggplot(data = dat_ssr, aes(x = b1, y = ssr)) +
  geom_line(linewidth = 1) +
  ylim(0, 120) +
  labs(x = "Regression coefficient (b)", y = "Sum of squared residuals")

2.5 Tangent lines (at b = 15)

Code
ggplot(data = dat_ssr, aes(x = b1, y = ssr)) +
  geom_line(linewidth = 1) +
  ylim(0, 120) +
  labs(x = "Regression coefficient (b)", y = "Sum of squared residuals") +
  geom_abline(intercept = -110, slope = 10, linewidth = 1, linetype = "dashed", color = "red") +
  annotate("pointrange", x = 15, y = 15*15 - 20*15 + 115, ymin = 15*15 - 20*15 + 115, ymax = 15*15 - 20*15 + 115) +
  geom_segment(x = 15, xend = 15, y = -1, yend = 15*15 - 20*15 + 115, linetype = "dotted", linewidth = 1)

2.6 Tangent lines (at b = 12)

Code
ggplot(data = dat_ssr, aes(x = b1, y = ssr)) +
  geom_line(linewidth = 1) +
  ylim(0, 120) +
  labs(x = "Regression coefficient (b)", y = "Sum of squared residuals") +
  geom_abline(intercept = -29, slope = 4, linewidth = 1, linetype = "dashed", color = "red") +
  annotate("pointrange", x = 12, y = 12*12 - 20*12 + 115, ymin = 12*12 - 20*12 + 115, ymax = 12*12 - 20*12 + 115) +
  geom_segment(x = 12, xend = 12, y = -1, yend = 12*12 - 20*12 + 115, linetype = "dotted", linewidth = 1)

2.7 Tangent lines (at b = 10)

Code
ggplot(data = dat_ssr, aes(x = b1, y = ssr)) +
  geom_line(linewidth = 1) +
  ylim(0, 120) +
  labs(x = "Regression coefficient (b)", y = "Sum of squared residuals") +
  geom_abline(intercept = 15, slope = 0, linewidth = 1, linetype = "dashed", color = "red") +
  annotate("pointrange", x = 10, y = 10*10 - 20*10 + 115, ymin = 10*10 - 20*10 + 115, ymax = 10*10 - 20*10 + 115) +
  geom_segment(x = 10, xend = 10, y = -1, yend = 10*10 - 20*10 + 115, linetype = "dotted", linewidth = 1)

2.8 Tangents and minimums

  • The tangent line is horizontal (slope = 0) at the minimum
  • Want to find the minimum of the sum of squared residuals
  • Value of regression coefficient that meets the least squares criterion
  • Use calculus: The derivative of a function gives the slope of the tangent line
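
For the SSR curve plotted on the previous slides, \(ssr(b) = b^2 - 20b + 115\), so \(\frac{d\,ssr}{db} = 2b - 20\). Setting this equal to 0 gives \(b = 10\), exactly where the tangent line was flat and the curve reached its minimum of \(ssr(10) = 15\).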

2.9 Least squares solution

  1. State function to minimize: \(\Sigma((Y_i - \hat{Y}_i)^2)\)
  2. Take the derivative of the function with respect to each constant of interest (i.e., \(b_0\) and \(b_1\))
  3. Set the derivatives equal to 0: the normal equations
  4. Solve for the constants

2.10 1. Function to minimize

\[\Sigma(Y_i - \hat{Y}_i)^2 = \]

\[\Sigma(Y - (b_1 X + b_0))^2 =\] \[\Sigma(Y - b_1 X - b_0)^2 =\] \[\Sigma(Y^2 + {b^2_0} + {b^2_1} X^2 - 2 b_0 Y - 2 b_1 X Y + 2 b_0 b_1 X ) =\] \[\Sigma Y^2 + \Sigma{b^2_0} + \Sigma{b^2_1} X^2 - \Sigma 2 b_0 Y - \Sigma 2 b_1 X Y + \Sigma 2 b_0 b_1 X =\] \[\Sigma Y^2 + n{b^2_0} + {b^2_1} \Sigma X^2 - 2 b_0 \Sigma Y - 2 b_1 \Sigma X Y + 2 b_0 b_1 \Sigma X\]

2.11 2. Get the derivative

  • With respect to \(b_1\):

\[\frac{\partial \Sigma (Y - \hat{Y})^2}{\partial b_1} = 2 b_1 \Sigma X^2 - 2 \Sigma X Y + 2 b_0 \Sigma X\]

  • With respect to \(b_0\):

\[\frac{\partial \Sigma (Y - \hat{Y})^2}{\partial b_0} = 2 n b_0 - 2 \Sigma Y + 2 b_1 \Sigma X\]

2.12 3 and 4. Set to 0 and solve

For \(b_1\):

\[2 b_1 \Sigma X^2 - 2 \Sigma X Y + 2 b_0 \Sigma X = 0 \] \[ \vdots \] \[b_1 = \frac{n \Sigma X Y - (\Sigma X) (\Sigma Y)}{n \Sigma X^2 - (\Sigma X)^2} = \frac{SP_{XY}}{SS_X} = \frac{s_{XY}}{{s^2_X}}\]

2.13 3 and 4. Set to 0 and solve

For \(b_0\):

\[2 n b_0 - 2 \Sigma Y + 2 b_1 \Sigma X = 0\] \[ \vdots \] \[b_0 = \overline{Y} - b_1 \overline{X}\]

2.14 How one variable varies

  • Variation = sum of squared deviations from the mean
    • \(SS_X = \Sigma((X - \bar{X})^2)\)
  • Variance = variation divided by \(n - 1\)
    • \(s^2_X = \frac{SS_X}{n-1}\)
  • Standard deviation = square root of variance
    • \(s_X = \sqrt{s^2_X}\)

2.15 How two variables covary

  • Covariation = sum of product of deviations from the mean
    • \(SP_{XY} = \Sigma((X - \bar{X})(Y - \bar{Y}))\)
  • Covariance = covariation divided by \(n - 1\)
    • \(s_{XY} = \frac{SP_{XY}}{n-1}\)

2.16 Least squares estimates

  • \(b_1\) is the
    • Covariation between X and Y, divided by the variation of X
    • Covariance between X and Y, divided by the variance of X

\[b_1 = \frac{SP_{XY}}{SS_X} = \frac{s_{XY}}{{s^2_X}}\]

\[b_0 = \overline{Y} - b_1 \overline{X}\]
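
These formulas can be verified directly in R (continuing with the simulated dat from earlier):
Code
b1_hat <- cov(dat$height, dat$weight) / var(dat$height)   # s_XY / s^2_X
b0_hat <- mean(dat$weight) - b1_hat * mean(dat$height)
c(b0_hat, b1_hat)
coef(lm(weight ~ height, data = dat))                      # same values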

2.17 Aside: \(R^2_{multiple}\)

  • \(R^2\) = proportion of variation in \(Y\) that’s explained by the predictors
    • Also: Maximum possible squared correlation between \(Y\) and \(\hat{Y}\)

\[R^2_{multiple} = r^2_{Y\hat{Y}} = \frac{SS_{regression}}{SS_Y} = \frac{predictable\ variation}{total\ variation}\]

where \(SS_{regression} = \Sigma(\hat{Y}_i - \overline{Y})^2\)
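
In R (again using the simulated dat), the equivalence of these definitions is easy to check:
Code
fit <- lm(weight ~ height, data = dat)
SS_reg <- sum((fitted(fit) - mean(dat$weight))^2)   # predictable variation
SS_Y   <- sum((dat$weight - mean(dat$weight))^2)    # total variation
SS_reg / SS_Y
cor(dat$weight, fitted(fit))^2                      # squared correlation of Y and Y-hat
summary(fit)$r.squared                              # matches Multiple R-squared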

2.18 How do we really estimate?

  • Maximum likelihood (ML)
    • Maximizes a likelihood function
    • Makes assumptions about the distribution of the outcome (i.e., the observations)

2.19 Likelihood

  • Conceptually similar to probability
    • Higher values are more likely / probable
    • Likelihood = relative probability
  • Make assumptions about the distribution of observations
    • For example, that they’re normally distributed
    • Use (normal) probability distributions to create a likelihood function

2.20 Recall: Normal distribution

\[f(X) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(X - \mu)^2}{2\sigma^2}}\]

  • \(\mu\) = mean = location of the distribution
  • \(\sigma^2\) = variance = spread of the distribution
    • \(\sigma\) = standard deviation

2.21 Likelihood for one subject

  • Make assumptions about the population distribution
  • Assume outcome (BDI) has mean = 20, SD = 5
  • For a single subject with BDI = 21:
    • \(L_i = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(X - \mu)^2}{2\sigma^2}} = \frac{1}{\sqrt{2\pi*5^2}}e^{-\frac{(21 - 20)^2}{2*5^2}} = 0.078\)
  • \(L_i\) = 0.078 is the likelihood or relative probability of X = 21 from a normally distributed population with mean = 20 and SD = 5
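
In R, this is just the normal density function:
Code
dnorm(21, mean = 20, sd = 5)   # 0.078, the likelihood of BDI = 21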

2.22 Subject likelihoods

  • Small example from Enders (2005)
  • Assume \(\mu\) = 20, \(\sigma\) = 5
Code
ID <- c(1:10)
BDI <- c(5, 33, 17, 21, 13, 17, 5, 13, 17, 13)
Likelihood <- dnorm(BDI, 20, 5)
LogLikelihood <- log(Likelihood)
LL_dat <- data.frame(ID, BDI, Likelihood, LogLikelihood)
round(LL_dat, 3)
   ID BDI Likelihood LogLikelihood
1   1   5      0.001        -7.028
2   2  33      0.003        -5.908
3   3  17      0.067        -2.708
4   4  21      0.078        -2.548
5   5  13      0.030        -3.508
6   6  17      0.067        -2.708
7   7   5      0.001        -7.028
8   8  13      0.030        -3.508
9   9  17      0.067        -2.708
10 10  13      0.030        -3.508

2.23 Normal distribution, mean = 20, SD = 5

Code
norm_plot <- ggplot(data = data.frame(x = c(0, 40)), aes(x)) +
  stat_function(fun = dnorm, n = 101, args = list(mean = 20, sd = 5)) + 
  ylab("Likelihood")
norm_plot + 
  annotate("pointrange", x = LL_dat[4,2], y = LL_dat[4,3], ymin = LL_dat[4,3], ymax = LL_dat[4,3], size = 1) +
  annotate("text", x = LL_dat[4,2] + 2, y = LL_dat[4,3], label = "0.078", hjust = 0, size = 8) +
  geom_segment(x = LL_dat[4,2], xend = LL_dat[4,2], y = 0, yend = LL_dat[4,3], linetype = "dashed") + 
  annotate("pointrange", x = LL_dat[2,2], y = LL_dat[2,3], ymin = LL_dat[2,3], ymax = LL_dat[2,3], size = 1) +
  annotate("text", x = LL_dat[2,2] - 2, y = LL_dat[2,3], label = "0.003", hjust = 1, size = 8) +
  geom_segment(x = LL_dat[2,2], xend = LL_dat[2,2], y = 0, yend = LL_dat[2,3], linetype = "dashed") + 
  annotate("pointrange", x = LL_dat[3,2], y = LL_dat[3,3], ymin = LL_dat[3,3], ymax = LL_dat[3,3], size = 1) +
  annotate("text", x = LL_dat[3,2] - 2, y = LL_dat[3,3], label = "0.067", hjust = 1, size = 8) +
  geom_segment(x = LL_dat[3,2], xend = LL_dat[3,2], y = 0, yend = LL_dat[3,3], linetype = "dashed") + 
  annotate("pointrange", x = LL_dat[1,2], y = LL_dat[1,3], ymin = LL_dat[1,3], ymax = LL_dat[1,3], size = 1) +
  annotate("text", x = LL_dat[1,2] + 2, y = LL_dat[1,3], label = "0.001", hjust = 0, size = 8) +
  geom_segment(x = LL_dat[1,2], xend = LL_dat[1,2], y = 0, yend = LL_dat[1,3], linetype = "dashed")

2.24 Likelihood for a sample

  • Each subject has a likelihood for their observation
    • How to combine them?
  • Probability theory: Joint probability of independent events is the product of the individual probabilities
    • Same works for likelihoods:

\[L = \prod_{i=1}^{n} L_i = L_1 \times L_2 \times L_3 \times \cdots \times L_n\]

2.25 What does likelihood mean?

  • Individual
    • Relatively higher \(L_i\) means that that individual’s score is more likely than other scores (given the distributional assumptions)
  • Sample
    • Relatively higher \(L\) means that the sample’s scores are more likely than other samples of scores (given the distributional assumptions)

2.26 Sample likelihood

  • Sample likelihood for this very small sample

\[L = 0.001 \times 0.003 \times 0.067 \times 0.078 \times 0.030 \times 0.067 \times 0.001\]

\[\times 0.030 \times 0.067 \times 0.030 = 0.000000000000000001327\]

  • Especially as \(n\) increases: a very, very small number, prone to rounding error
    • Computers have problems with this (numerical underflow)
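
In R, the sample likelihood is the product of the subject likelihoods computed earlier:
Code
prod(LL_dat$Likelihood)   # about 1.3e-18; values this small invite underflow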

2.27 Log-likelihood

  • Logarithms: \(log(ab) = log(a) + log(b)\)

\[L = \prod_{i=1}^{n} L_i = L_1 \times L_2 \times L_3 \times \cdots \times L_n\] becomes

\[log(L) = \sum_{i=1}^{n} log(L_i) = log(L_1) + log(L_2) + log(L_3) + \cdots + log(L_n)\]

2.28 Log-likelihood

Code
round(LL_dat, 3)
   ID BDI Likelihood LogLikelihood
1   1   5      0.001        -7.028
2   2  33      0.003        -5.908
3   3  17      0.067        -2.708
4   4  21      0.078        -2.548
5   5  13      0.030        -3.508
6   6  17      0.067        -2.708
7   7   5      0.001        -7.028
8   8  13      0.030        -3.508
9   9  17      0.067        -2.708
10 10  13      0.030        -3.508
  • \(L = 0.000000000000000001327\)
  • \(log(L) = -7.028 + -5.908 + -2.708 + \cdots + -3.508 = -41.16\)
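
Equivalently, sum the individual log-likelihoods instead of multiplying the likelihoods:
Code
sum(LL_dat$LogLikelihood)       # -41.16
log(prod(LL_dat$Likelihood))    # same value, but the product can underflow first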

2.29 What does log-likelihood mean?

  • Log-likelihood is negative whenever the likelihood is less than 1 (as in these examples)

  • Individual

    • Relatively higher \(log(L_i)\) means that that individual’s score is more likely than other scores (given the distributional assumptions)
  • Sample

    • Relatively higher \(log(L)\) means that the sample’s scores are more likely than other samples of scores (given the distributional assumptions)

2.30 Log-likelihoods

Code
ggplot(data = LL_dat, aes(x = BDI, y = LogLikelihood)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm",formula = y~poly(x,3), se = FALSE, color = "black")

2.31 Distribution actually unknown!!!

  • We don’t actually know the population values
    • We draw samples and calculate statistics to estimate them
  • Conceptually, with maximum likelihood, we try out different values of the mean and SD to figure out which are most likely
    • Which values result in the highest (log)likelihood

2.32 Assume SD = 5

  • Calculate the sample log-likelihood for several potential mean values
Code
mu <- c(10:20)
logL <- c(-42.764,  -40.804,    -39.244,    -38.084,    -37.324,    -36.964,    -37.004,    -37.444,    -38.284,    -39.524,    -41.164)
aud_LL <- data.frame(mu, logL)
aud_LL
   mu    logL
1  10 -42.764
2  11 -40.804
3  12 -39.244
4  13 -38.084
5  14 -37.324
6  15 -36.964
7  16 -37.004
8  17 -37.444
9  18 -38.284
10 19 -39.524
11 20 -41.164
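
The hard-coded logL values above can be reproduced directly from the BDI scores:
Code
sapply(10:20, function(m) sum(dnorm(BDI, mean = m, sd = 5, log = TRUE)))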

2.33 Tangents and minimums again

  • Conceptually, we try out different values of the mean and SD to figure out which are most likely
  • Actually, we again use calculus and derivatives
    • Find the maximum of the likelihood function

2.34 LL function maximized at 15.4

Code
ggplot(data = aud_LL, aes(x = mu, y = logL)) +
  geom_smooth(method = "lm",formula = y~poly(x,3), se = FALSE, color = "black") + 
  labs(x = "Estimate of mean", y = "log(L)") +
  geom_hline(yintercept = -36.932, color = "dodgerblue", linetype = "dashed", linewidth = 1) +
  annotate("pointrange", x = 15.4, y = -36.932, ymin = -36.932, ymax = -36.932, color = "dodgerblue", size = 1)

2.35 Extensions

  • More than 1 parameter estimated
    • 1 parameter = 2D plots
    • k parameters = (k+1)D plots
  • Can’t assume the outcome is from a normally distributed population
    • Replace normal likelihood (i.e., PDF) with appropriate version
    • A slight simplification

2.36 Extensions

  • A variation on maximum likelihood called full information maximum likelihood or “FIML” can be used to improve estimation in the presence of missing data
    • See the “References” section of the course website

2.37 Caveats

  • For linear regression, the likelihood function is very well-behaved
    • Shaped like upside down “U”
    • Only one maximum value
  • For more complex models, the likelihood function is less well-behaved
    • Missing data
    • More complex, multivariate models
    • Non-normal outcomes

3 Hypothesis testing

3.1 Uncertainty

  • Sampling distributions capture the uncertainty in an estimate
    • Sample is not population
    • Estimates vary from sample to sample
    • Uncertainty in the point estimate
  • Produce quantities like \(p\)-values, confidence intervals
    • Based on the standard error
    • Standard error = standard deviation of sampling distribution

3.2 Standard error estimation

  • Use the likelihood function
    • Second derivative of the likelihood function
    • Tells you how quickly the 1st derivative changes
    • Basically, how sharply peaked the likelihood function is at its maximum
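
A minimal numerical sketch of this idea, using the BDI example with the SD fixed at 5: the curvature of the log-likelihood at its maximum determines the standard error.
Code
ll <- function(mu) sum(dnorm(BDI, mean = mu, sd = 5, log = TRUE))
h  <- 1e-4
d2 <- (ll(15.4 + h) - 2 * ll(15.4) + ll(15.4 - h)) / h^2   # numerical 2nd derivative
sqrt(-1 / d2)   # about 1.58 = 5 / sqrt(10), the standard error of the mean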

3.3 Small SE vs large SE

Code
mu <- c(0:40)
L_smallse <- dnorm(mu, 20, 2)
L_largese <- dnorm(mu, 20, 5)
LL_smallse <- log(L_smallse)
LL_largese <- log(L_largese)
LLse_dat <- data.frame(mu, L_smallse, L_largese, LL_smallse, LL_largese)
#round(LLse_dat, 3)
ggplot(data = LLse_dat, aes(x = mu, y = LL_smallse)) +
  geom_smooth(method = "lm",formula = y~poly(x,3), se = FALSE, color = "black") +
  ylim(-10,0) +
  xlim(0,40) +
  annotate("pointrange", x = 15, y = -4.737, ymin = -4.737, ymax = -4.737, color = "dodgerblue", size = 1) +
  geom_segment(x = 15, xend = 15, y = -10, yend = -4.737, linetype = "dotted", linewidth = 1) +
  geom_hline(yintercept = -1.6) +
  labs(x = "Estimate of mean", y = "Log-likelihood")

Code
ggplot(data = LLse_dat, aes(x = mu, y = LL_largese)) +
  geom_smooth(method = "lm",formula = y~poly(x,3), se = FALSE, color = "black") +
  ylim(-10,0) +
  xlim(0,40) +
  annotate("pointrange", x = 15, y = -3.028, ymin = -3.028, ymax = -3.028, color = "dodgerblue", size = 1) +
  geom_segment(x = 15, xend = 15, y = -10, yend = -3.028, linetype = "dotted", linewidth = 1) +
  geom_hline(yintercept = -2.5) +
  labs(x = "Estimate of mean", y = "Log-likelihood")

3.4 Information matrix

  • a.k.a. Fisher Information Matrix
  • \(k \times k\) matrix, where \(k\) is the number of parameters estimated
  • Its inverse is the variance-covariance matrix of the estimates
    • Values on the diagonal of that inverse are the variances of the parameter estimates
    • Square root to get the standard error
  • An error or warning message about the “information matrix” usually signals a problem estimating standard errors

3.5 Types of significance tests

  • Equivalent in very large samples
    • Wald tests
    • Likelihood ratio tests

3.6 Wald tests

  • Usually used to test a single parameter
    • e.g., regression coefficient
  • Divide estimate by its standard error
    • Compare to appropriate hypothesized distribution
  • Regression coefficients use a \(t\)-test, but you may also see \(z\)-tests
    • ML estimates are asymptotically normally distributed
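
As a minimal sketch of the arithmetic (using the built-in mtcars data, not the course data), the Wald statistic is the estimate divided by its standard error, compared to a \(t\) distribution:
m <- lm(mpg ~ wt, data = mtcars)
est <- coef(summary(m))["wt", "Estimate"]
se  <- coef(summary(m))["wt", "Std. Error"]
est / se                                                        # Wald t, matches summary(m)
2 * pt(abs(est / se), df = df.residual(m), lower.tail = FALSE)  # two-sided p-value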

3.7 Likelihood ratio tests

  • Used to test 1 or more parameters
    • e.g., sets of predictors
    • Similar to anova() that we’ve used previously
  • Compare likelihoods of two different nested models
    • Difference in -2*LL between models \(\sim \chi^2(df)\)
      • Degrees of freedom = difference in parameters
    • anova(m1, m2, test = "LRT")

3.8 Degrees of freedom

  • Observations are independent pieces of information
  • Each parameter we estimate uses up one of those pieces of information
    • One sample \(t\)-test: Degrees of freedom = \(n - 1 = 10 - 1 = 9\)
    • Regression coefficient: Degrees of freedom = \(n - k - 1\)
    • \(F\)-test of \(R^2\): \(k\), \(n - k - 1\)

3.9 Example data

  • ICU data from the Stat2Data package
library(Stat2Data)
data(ICU)
head(ICU)
  ID Survive Age AgeGroup Sex Infection SysBP Pulse Emergency
1  4       0  87        3   1         1    80    96         1
2  8       1  27        1   1         1   142    88         1
3 12       1  59        2   0         0   112    80         1
4 14       1  77        3   0         0   100    70         0
5 27       0  76        3   1         1   128    90         1
6 28       1  54        2   0         1   142   103         1

3.10 Model 1

  • Age and Sex predict Pulse
m1 <- lm(data = ICU, Pulse ~ Age + Sex)
summary(m1)

Call:
lm(formula = Pulse ~ Age + Sex, data = ICU)

Residuals:
    Min      1Q  Median      3Q     Max 
-59.231 -17.730  -4.163  19.440  93.225 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 95.55647    5.87949  16.253   <2e-16 ***
Age          0.04533    0.09563   0.474    0.636    
Sex          2.00005    3.94123   0.507    0.612    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 26.93 on 197 degrees of freedom
Multiple R-squared:  0.0027,    Adjusted R-squared:  -0.007425 
F-statistic: 0.2667 on 2 and 197 DF,  p-value: 0.7662

3.11 Degrees of freedom

  • Regression coefficient
    • \(b_{Age} = 0.04533, t(197) = 0.474, p = .636\)
    • \(n - k - 1\) (sample size minus predictors minus 1)
  • Overall \(F\)-test
    • \(R^2 = 0.0027, F(2, 197) = 0.2667, p = .7662\)
    • Degree of freedom 1: \(k\) (number of predictors)
    • Degree of freedom 2: \(n - k - 1\) (sample size minus predictors minus 1)

3.12 Information matrix

  • vcov() function in stats package (always loaded)
    • Values on main diagonal are variances
    • Square root to get standard error
vcov(m1)
            (Intercept)          Age         Sex
(Intercept)  34.5683578 -0.512491478 -3.81886908
Age          -0.5124915  0.009145049 -0.03621144
Sex          -3.8188691 -0.036211443 15.53330671
  • \(\sqrt{15.53330671}\) = 3.94123, which is standard error for Sex
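
The same idea gives all of the standard errors and \(t\) values at once:
sqrt(diag(vcov(m1)))               # standard errors for all three coefficients
coef(m1) / sqrt(diag(vcov(m1)))    # Wald t values, matching summary(m1)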

3.13 Model 2

  • Age and Sex predict Pulse, then add Infection and Emergency
m2 <- lm(data = ICU, Pulse ~ Age + Sex + Infection + Emergency)
summary(m2)

Call:
lm(formula = Pulse ~ Age + Sex + Infection + Emergency, data = ICU)

Residuals:
    Min      1Q  Median      3Q     Max 
-70.688 -16.922  -3.541  15.705  82.057 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 85.11154    6.83731  12.448  < 2e-16 ***
Age          0.02128    0.09430   0.226   0.8217    
Sex          0.88884    3.77520   0.235   0.8141    
Infection   15.52284    3.77981   4.107  5.9e-05 ***
Emergency    7.79833    4.29475   1.816   0.0709 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 25.53 on 195 degrees of freedom
Multiple R-squared:  0.1127,    Adjusted R-squared:  0.09446 
F-statistic:  6.19 on 4 and 195 DF,  p-value: 0.0001041

3.14 Information matrix

vcov(m2)
            (Intercept)          Age         Sex   Infection    Emergency
(Intercept)  46.7487516 -0.535360582 -1.31998556  0.23115884 -16.70442963
Age          -0.5353606  0.008892873 -0.04388807 -0.06836532   0.09389264
Sex          -1.3199856 -0.043888071 14.25216756  0.30122681  -2.30859090
Infection     0.2311588 -0.068365322  0.30122681 14.28699784  -3.28174376
Emergency   -16.7044296  0.093892640 -2.30859090 -3.28174376  18.44486337

3.15 Likelihood ratio test

  • logLik() function from stats package (always loaded)
logLik(m1)
'log Lik.' -940.9174 (df=4)
logLik(m2)
'log Lik.' -929.2347 (df=6)
  • Why different degrees of freedom here?
    • logLik() counts the estimated residual variance as a parameter as well
    • Intercept + number of predictors + variance
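
For these models:
length(coef(m1)) + 1   # 3 coefficients (intercept, Age, Sex) plus residual variance = 4
length(coef(m2)) + 1   # 5 coefficients plus residual variance = 6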

3.16 Likelihood ratio test

anova(m1, m2, test = "LRT")
Analysis of Variance Table

Model 1: Pulse ~ Age + Sex
Model 2: Pulse ~ Age + Sex + Infection + Emergency
  Res.Df    RSS Df Sum of Sq  Pr(>Chi)    
1    197 142859                           
2    195 127107  2     15752 5.657e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-2 * (logLik(m1) - logLik(m2))
'log Lik.' 23.36533 (df=4)
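
The same test can be done by hand: compare \(-2 \times\) the difference in log-likelihoods to a \(\chi^2\) distribution with 2 degrees of freedom. (Note: anova() with test = "LRT" uses a slightly differently scaled statistic, so its p-value is close to, but not identical to, the one computed this way.)
lr_stat <- as.numeric(-2 * (logLik(m1) - logLik(m2)))   # 23.37
pchisq(lr_stat, df = 2, lower.tail = FALSE)             # roughly 8e-06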

4 Next time

4.1 Interactions

  • Non-additive effects
    • Relationship differs across groups
    • Easy to add to the model
      • Interpretation is more difficult