Introduction to Biostatistics

1 Learning objectives

1.1 Learning objectives

  • Describe maximum likelihood estimation for linear regression
  • Describe hypothesis testing for linear regression coefficients, including the sampling distribution used and degrees of freedom

2 Estimation

2.1 What is estimation?

  • Method by which we get estimates of regression coefficients
  • What are the \(b_0\) and \(b_1\) for the regression equation?
Code
library(ggplot2)   # plotting (assumed to be loaded in the slides' setup chunk)

# simulate 50 subjects: weight depends linearly on height, plus noise
height <- 3*rnorm(50) + 70
weight <- 7 * (height - 70) + 25 * rnorm(50) + 125
dat <- data.frame(height, weight)
ggplot(data = dat, aes(x = height, y = weight)) + 
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

2.2 Ordinary least squares (OLS) estimation

  • Minimize the sum of squared residuals
  • Find \(b_0\) and \(b_1\) that give the smallest \(\Sigma((Y_i - \hat{Y}_i)^2)\)
  • For linear regression, there is one value of \(b_0\) and one value of \(b_1\) that minimize the sum of squared residuals
    • Not true for all estimation methods
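
To see the least squares criterion numerically (a small sketch using the simulated dat from the previous slide), compare the SSR at the lm() estimates to the SSR at nearby values:
Code
fit <- lm(weight ~ height, data = dat)           # OLS fit
ssr_at <- function(b0, b1) sum((dat$weight - (b0 + b1 * dat$height))^2)
ssr_at(coef(fit)[1], coef(fit)[2])               # SSR at the OLS estimates
ssr_at(coef(fit)[1], coef(fit)[2] + 0.5)         # any other slope gives a larger SSR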

2.3 Minimizing

  • Functions with squared terms in them look like a “U”
  • Minimum is the bottom of the “U”
  • Tangent line is a line that touches a curve at a single point
    • The slope of the tangent line is 0 at the minimum
  • \(Y = X^2\)
Code
x <- c(-10:10)
y <- x*x
dat1 <- data.frame(x, y)
ggplot(data = dat1, aes(x = x, y = y)) + 
  geom_line(linewidth = 1)

2.4 Tangent lines

Code
b1 <- c(0:20)
ssr <- b1*b1 - 20*b1 + 115
dat_ssr <- data.frame(b1, ssr)
ggplot(data = dat_ssr, aes(x = b1, y = ssr)) +
  geom_line(linewidth = 1) +
  ylim(0, 120) +
  labs(x = "Regression coefficient (b)", y = "Sum of squared residuals")

2.5 Tangent lines (at b = 15)

Code
ggplot(data = dat_ssr, aes(x = b1, y = ssr)) +
  geom_line(linewidth = 1) +
  ylim(0, 120) +
  labs(x = "Regression coefficient (b)", y = "Sum of squared residuals") +
  geom_abline(intercept = -110, slope = 10, linewidth = 1, linetype = "dashed", color = "red") +
  annotate("pointrange", x = 15, y = 15*15 - 20*15 + 115, ymin = 15*15 - 20*15 + 115, ymax = 15*15 - 20*15 + 115) +
  geom_segment(x = 15, xend = 15, y = -1, yend = 15*15 - 20*15 + 115, linetype = "dotted", linewidth = 1)

2.6 Tangent lines (at b = 12)

Code
ggplot(data = dat_ssr, aes(x = b1, y = ssr)) +
  geom_line(linewidth = 1) +
  ylim(0, 120) +
  labs(x = "Regression coefficient (b)", y = "Sum of squared residuals") +
  geom_abline(intercept = -29, slope = 4, linewidth = 1, linetype = "dashed", color = "red") +
  annotate("pointrange", x = 12, y = 12*12 - 20*12 + 115, ymin = 12*12 - 20*12 + 115, ymax = 12*12 - 20*12 + 115) +
  geom_segment(x = 12, xend = 12, y = -1, yend = 12*12 - 20*12 + 115, linetype = "dotted", linewidth = 1)

2.7 Tangent lines (at b = 10)

Code
ggplot(data = dat_ssr, aes(x = b1, y = ssr)) +
  geom_line(linewidth = 1) +
  ylim(0, 120) +
  labs(x = "Regression coefficient (b)", y = "Sum of squared residuals") +
  geom_abline(intercept = 15, slope = 0, linewidth = 1, linetype = "dashed", color = "red") +
  annotate("pointrange", x = 10, y = 10*10 - 20*10 + 115, ymin = 10*10 - 20*10 + 115, ymax = 10*10 - 20*10 + 115) +
  geom_segment(x = 10, xend = 10, y = -1, yend = 10*10 - 20*10 + 115, linetype = "dotted", linewidth = 1)

2.8 Tangents and minimums

  • The tangent line is horizontal (slope = 0) at the minimum
  • Want to find the minimum of the sum of squared residuals
  • Value of regression coefficient that meets the least squares criterion
  • Use calculus: The derivative of a function gives the slope of the tangent line
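
For the SSR curve plotted on the previous slides, \(ssr(b) = b^2 - 20b + 115\), so \(\frac{d\,ssr}{db} = 2b - 20\). Setting this equal to 0 gives \(b = 10\), exactly where the tangent line was flat and the curve reached its minimum of \(ssr(10) = 15\).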

2.9 Least squares solution

  1. State function to minimize: \(\Sigma((Y_i - \hat{Y}_i)^2)\)
  2. Take the derivative of the function with respect to each constant of interest (i.e., \(b_0\) and \(b_1\))
  3. Set the derivatives equal to 0: the normal equations
  4. Solve for the constants

2.10 1. Function to minimize

\[\Sigma(Y_i - \hat{Y}_i)^2 = \]

\[\Sigma(Y - (b_1 X + b_0))^2 =\] \[\Sigma(Y - b_1 X - b_0)^2 =\] \[\Sigma(Y^2 + {b^2_0} + {b^2_1} X^2 - 2 b_0 Y - 2 b_1 X Y + 2 b_0 b_1 X ) =\] \[\Sigma Y^2 + \Sigma{b^2_0} + \Sigma{b^2_1} X^2 - \Sigma 2 b_0 Y - \Sigma 2 b_1 X Y + \Sigma 2 b_0 b_1 X =\] \[\Sigma Y^2 + n{b^2_0} + {b^2_1} \Sigma X^2 - 2 b_0 \Sigma Y - 2 b_1 \Sigma X Y + 2 b_0 b_1 \Sigma X\]

2.11 2. Get the derivative

  • With respect to \(b_1\):

\[\frac{\partial \Sigma (Y - \hat{Y})^2}{\partial b_1} = 2 b_1 \Sigma X^2 - 2 \Sigma X Y + 2 b_0 \Sigma X\]

  • With respect to \(b_0\):

\[\frac{\partial \Sigma (Y - \hat{Y})^2}{\partial b_0} = 2 n b_0 - 2 \Sigma Y + 2 b_1 \Sigma X\]

2.12 3 and 4. Set to 0 and solve

For \(b_1\):

\[2 b_1 \Sigma X^2 - 2 \Sigma X Y + 2 b_0 \Sigma X = 0 \] \[ \vdots \] \[b_1 = \frac{n \Sigma X Y - (\Sigma X) (\Sigma Y)}{n \Sigma X^2 - (\Sigma X)^2} = \frac{SP_{XY}}{SS_X} = \frac{s_{XY}}{{s^2_X}}\]

2.13 3 and 4. Set to 0 and solve

For \(b_0\):

\[2 n b_0 - 2 \Sigma Y + 2 b_1 \Sigma X = 0\] \[ \vdots \] \[b_0 = \overline{Y} - b_1 \overline{X}\]

2.14 How one variable varies

  • Variation = sum of squared deviations from the mean
    • \(SS_X = \Sigma((X - \bar{X})^2)\)
  • Variance = variation divided by \(n - 1\)
    • \(s^2_X = \frac{SS_X}{n-1}\)
  • Standard deviation = square root of variance
    • \(s_X = \sqrt{s^2_X}\)

2.15 How two variables covary

  • Covariation = sum of product of deviations from the mean
    • \(SP_{XY} = \Sigma((X - \bar{X})(Y - \bar{Y}))\)
  • Covariance = covariation divided by \(n - 1\)
    • \(s_{XY} = \frac{SP_{XY}}{n-1}\)

2.16 Least squares estimates

  • \(b_1\) is the
    • Covariation between X and Y, divided by the variation of X
    • Covariance between X and Y, divided by the variance of X

\[b_1 = \frac{SP_{XY}}{SS_X} = \frac{s_{XY}}{{s^2_X}}\]

\[b_0 = \overline{Y} - b_1 \overline{X}\]
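
These formulas can be verified directly in R (continuing with the simulated dat from earlier):
Code
b1_hat <- cov(dat$height, dat$weight) / var(dat$height)   # s_XY / s^2_X
b0_hat <- mean(dat$weight) - b1_hat * mean(dat$height)
c(b0_hat, b1_hat)
coef(lm(weight ~ height, data = dat))                      # same values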

2.17 Aside: \(R^2_{multiple}\)

  • \(R^2\) = proportion of variation in \(Y\) that’s explained by the predictors
    • Also: Maximum possible squared correlation between \(Y\) and \(\hat{Y}\)

\[R^2_{multiple} = r^2_{Y\hat{Y}} = \frac{SS_{regression}}{SS_Y} = \frac{predictable\ variation}{total\ variation}\]

where \(SS_{regression} = \Sigma(\hat{Y}_i - \overline{Y})^2\)
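
In R (again using the simulated dat), the equivalence of these definitions is easy to check:
Code
fit <- lm(weight ~ height, data = dat)
SS_reg <- sum((fitted(fit) - mean(dat$weight))^2)   # predictable variation
SS_Y   <- sum((dat$weight - mean(dat$weight))^2)    # total variation
SS_reg / SS_Y
cor(dat$weight, fitted(fit))^2                      # squared correlation of Y and Y-hat
summary(fit)$r.squared                              # matches Multiple R-squared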

2.18 How do we really estimate?

  • Maximum likelihood (ML)
    • Maximizes a likelihood function
    • Makes assumptions about the distribution of the outcome (i.e., the observations)

2.19 Likelihood

  • Conceptually similar to probability
    • Higher values are more likely / probable
    • Likelihood = relative probability
  • Make assumptions about the distribution of observations
    • For example, that they’re normally distributed
    • Use (normal) probability distributions to create a likelihood function

2.20 Recall: Normal distribution

\[f(X) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(X - \mu)^2}{2\sigma^2}}\]

  • \(\mu\) = mean = location of the distribution
  • \(\sigma^2\) = variance = spread of the distribution
    • \(\sigma\) = standard deviation

2.21 Likelihood for one subject

  • Make assumptions about the population distribution
  • Assume outcome (BDI) has mean = 20, SD = 5
  • For a single subject with BDI = 21:
    • \(L_i = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(X - \mu)^2}{2\sigma^2}} = \frac{1}{\sqrt{2\pi*5^2}}e^{-\frac{(21 - 20)^2}{2*5^2}} = 0.078\)
  • \(L_i\) = 0.078 is the likelihood or relative probability of X = 21 from a normally distributed population with mean = 20 and SD = 5
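
In R, this is just the normal density function:
Code
dnorm(21, mean = 20, sd = 5)   # 0.078, the likelihood of BDI = 21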

2.22 Subject likelihoods

  • Small example from Enders (2005)
  • Assume \(\mu\) = 20, \(\sigma\) = 5
Code
ID <- c(1:10)
BDI <- c(5, 33, 17, 21, 13, 17, 5, 13, 17, 13)
Likelihood <- dnorm(BDI, 20, 5)
LogLikelihood <- log(Likelihood)
LL_dat <- data.frame(ID, BDI, Likelihood, LogLikelihood)
round(LL_dat, 3)
   ID BDI Likelihood LogLikelihood
1   1   5      0.001        -7.028
2   2  33      0.003        -5.908
3   3  17      0.067        -2.708
4   4  21      0.078        -2.548
5   5  13      0.030        -3.508
6   6  17      0.067        -2.708
7   7   5      0.001        -7.028
8   8  13      0.030        -3.508
9   9  17      0.067        -2.708
10 10  13      0.030        -3.508

2.23 Normal distribution, mean = 20, SD = 5

Code
norm_plot <- ggplot(data = data.frame(x = c(0, 40)), aes(x)) +
  stat_function(fun = dnorm, n = 101, args = list(mean = 20, sd = 5)) + 
  ylab("Likelihood")
norm_plot + 
  annotate("pointrange", x = LL_dat[4,2], y = LL_dat[4,3], ymin = LL_dat[4,3], ymax = LL_dat[4,3], size = 1) +
  annotate("text", x = LL_dat[4,2] + 2, y = LL_dat[4,3], label = "0.078", hjust = 0, size = 8) +
  geom_segment(x = LL_dat[4,2], xend = LL_dat[4,2], y = 0, yend = LL_dat[4,3], linetype = "dashed") + 
  annotate("pointrange", x = LL_dat[2,2], y = LL_dat[2,3], ymin = LL_dat[2,3], ymax = LL_dat[2,3], size = 1) +
  annotate("text", x = LL_dat[2,2] - 2, y = LL_dat[2,3], label = "0.003", hjust = 1, size = 8) +
  geom_segment(x = LL_dat[2,2], xend = LL_dat[2,2], y = 0, yend = LL_dat[2,3], linetype = "dashed") + 
  annotate("pointrange", x = LL_dat[3,2], y = LL_dat[3,3], ymin = LL_dat[3,3], ymax = LL_dat[3,3], size = 1) +
  annotate("text", x = LL_dat[3,2] - 2, y = LL_dat[3,3], label = "0.067", hjust = 1, size = 8) +
  geom_segment(x = LL_dat[3,2], xend = LL_dat[3,2], y = 0, yend = LL_dat[3,3], linetype = "dashed") + 
  annotate("pointrange", x = LL_dat[1,2], y = LL_dat[1,3], ymin = LL_dat[1,3], ymax = LL_dat[1,3], size = 1) +
  annotate("text", x = LL_dat[1,2] + 2, y = LL_dat[1,3], label = "0.001", hjust = 0, size = 8) +
  geom_segment(x = LL_dat[1,2], xend = LL_dat[1,2], y = 0, yend = LL_dat[1,3], linetype = "dashed")

2.24 Likelihood for a sample

  • Each subject has a likelihood for their observation
    • How to combine them?
  • Probability theory: Joint probability of independent events is the product of the individual probabilities
    • Same works for likelihoods:

\[L = \prod_{i=1}^{n} L_i = L_1 \times L_2 \times L_3 \times \cdots \times L_n\]

2.25 What does likelihood mean?

  • Individual
    • Relatively higher \(L_i\) means that that individual’s score is more likely than other scores (given the distributional assumptions)
  • Sample
    • Relatively higher \(L\) means that the sample’s scores are more likely than other samples of scores (given the distributional assumptions)

2.26 Sample likelihood

  • Sample likelihood for this very small sample

\[L = 0.001 \times 0.003 \times 0.067 \times 0.078 \times 0.030 \times 0.067 \times 0.001\]

\[\times 0.030 \times 0.067 \times 0.030 = 0.000000000000000001327\]

  • Especially as \(n\) increases: a very, very small number, prone to rounding error
    • Computers have problems with this (numerical underflow)
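
In R, the sample likelihood is the product of the subject likelihoods computed earlier:
Code
prod(LL_dat$Likelihood)   # about 1.3e-18; values this small invite underflow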

2.27 Log-likelihood

  • Logarithms: \(log(ab) = log(a) + log(b)\)

\[L = \prod_{i=1}^{n} L_i = L_1 \times L_2 \times L_3 \times \cdots \times L_n\] becomes

\[log(L) = \sum_{i=1}^{n} log(L_i) = log(L_1) + log(L_2) + log(L_3) + \cdots + log(L_n)\]

2.28 Log-likelihood

Code
round(LL_dat, 3)
   ID BDI Likelihood LogLikelihood
1   1   5      0.001        -7.028
2   2  33      0.003        -5.908
3   3  17      0.067        -2.708
4   4  21      0.078        -2.548
5   5  13      0.030        -3.508
6   6  17      0.067        -2.708
7   7   5      0.001        -7.028
8   8  13      0.030        -3.508
9   9  17      0.067        -2.708
10 10  13      0.030        -3.508
  • \(L = 0.000000000000000001327\)
  • \(log(L) = -7.028 + -5.908 + -2.708 + \cdots + -3.508 = -41.16\)
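
Equivalently, sum the individual log-likelihoods instead of multiplying the likelihoods:
Code
sum(LL_dat$LogLikelihood)       # -41.16
log(prod(LL_dat$Likelihood))    # same value, but the product can underflow first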

2.29 What does log-likelihood mean?

  • Log-likelihood is negative whenever the likelihood is less than 1 (as in these examples)

  • Individual

    • Relatively higher \(log(L_i)\) means that that individual’s score is more likely than other scores (given the distributional assumptions)
  • Sample

    • Relatively higher \(log(L)\) means that the sample’s scores are more likely than other samples of scores (given the distributional assumptions)

2.30 Log-likelihoods

Code
ggplot(data = LL_dat, aes(x = BDI, y = LogLikelihood)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm",formula = y~poly(x,3), se = FALSE, color = "black")

2.31 Distribution actually unknown!!!

  • We don’t actually know the population values
    • We draw samples and calculate statistics to estimate them
  • Conceptually, with maximum likelihood, we try out different values of the mean and SD to figure out which are most likely
    • Which values result in the highest (log)likelihood

2.32 Assume SD = 5

  • Calculate the sample log-likelihood for several potential mean values
Code
mu <- c(10:20)
logL <- c(-42.764,  -40.804,    -39.244,    -38.084,    -37.324,    -36.964,    -37.004,    -37.444,    -38.284,    -39.524,    -41.164)
aud_LL <- data.frame(mu, logL)
aud_LL
   mu    logL
1  10 -42.764
2  11 -40.804
3  12 -39.244
4  13 -38.084
5  14 -37.324
6  15 -36.964
7  16 -37.004
8  17 -37.444
9  18 -38.284
10 19 -39.524
11 20 -41.164
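
The hard-coded logL values above can be reproduced directly from the BDI scores:
Code
sapply(10:20, function(m) sum(dnorm(BDI, mean = m, sd = 5, log = TRUE)))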

2.33 Tangents and minimums again

  • Conceptually, we try out different values of the mean and SD to figure out which are most likely
  • Actually, we again use calculus and derivatives
    • Find the maximum of the likelihood function

2.34 LL function maximized at 15.4

Code
ggplot(data = aud_LL, aes(x = mu, y = logL)) +
  geom_smooth(method = "lm",formula = y~poly(x,3), se = FALSE, color = "black") + 
  labs(x = "Estimate of mean", y = "log(L)") +
  geom_hline(yintercept = -36.932, color = "dodgerblue", linetype = "dashed", linewidth = 1) +
  annotate("pointrange", x = 15.4, y = -36.932, ymin = -36.932, ymax = -36.932, color = "dodgerblue", size = 1)

2.35 Extensions

  • More than 1 parameter estimated
    • 1 parameter = 2D plots
    • k parameters = (k+1)D plots
  • Can’t assume the outcome is from a normally distributed population
    • Replace normal likelihood (i.e., PDF) with appropriate version
    • A slight simplification

2.36 Extensions

  • A variation on maximum likelihood called full information maximum likelihood or “FIML” can be used to improve estimation in the presence of missing data
    • See the “References” section of the course website

2.37 Caveats

  • For linear regression, the likelihood function is very well-behaved
    • Shaped like upside down “U”
    • Only one maximum value
  • For more complex models, the likelihood function is less well-behaved
    • Missing data
    • More complex, multivariate models
    • Non-normal outcomes

3 Hypothesis testing

3.1 Uncertainty

  • Sampling distributions capture the uncertainty in an estimate
    • Sample is not population
    • Estimates vary from sample to sample
    • Uncertainty in the point estimate
  • Produce quantities like \(p\)-values, confidence intervals
    • Based on the standard error
    • Standard error = standard deviation of sampling distribution

3.2 Standard error estimation

  • Use the likelihood function
    • Second derivative of the likelihood function
    • Tells you how quickly the 1st derivative changes
    • Basically, how sharply peaked the likelihood function is at its maximum
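
A minimal numerical sketch of this idea, using the BDI example with the SD fixed at 5: the curvature of the log-likelihood at its maximum determines the standard error.
Code
ll <- function(mu) sum(dnorm(BDI, mean = mu, sd = 5, log = TRUE))
h  <- 1e-4
d2 <- (ll(15.4 + h) - 2 * ll(15.4) + ll(15.4 - h)) / h^2   # numerical 2nd derivative
sqrt(-1 / d2)   # about 1.58 = 5 / sqrt(10), the standard error of the mean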

3.3 Small SE vs large SE

Code
mu <- c(0:40)
L_smallse <- dnorm(mu, 20, 2)
L_largese <- dnorm(mu, 20, 5)
LL_smallse <- log(L_smallse)
LL_largese <- log(L_largese)
LLse_dat <- data.frame(mu, L_smallse, L_largese, LL_smallse, LL_largese)
#round(LLse_dat, 3)
ggplot(data = LLse_dat, aes(x = mu, y = LL_smallse)) +
  geom_smooth(method = "lm",formula = y~poly(x,3), se = FALSE, color = "black") +
  ylim(-10,0) +
  xlim(0,40) +
  annotate("pointrange", x = 15, y = -4.737, ymin = -4.737, ymax = -4.737, color = "dodgerblue", size = 1) +
  geom_segment(x = 15, xend = 15, y = -10, yend = -4.737, linetype = "dotted", linewidth = 1) +
  geom_hline(yintercept = -1.6) +
  labs(x = "Estimate of mean", y = "Log-likelihood")

Code
ggplot(data = LLse_dat, aes(x = mu, y = LL_largese)) +
  geom_smooth(method = "lm",formula = y~poly(x,3), se = FALSE, color = "black") +
  ylim(-10,0) +
  xlim(0,40) +
  annotate("pointrange", x = 15, y = -3.028, ymin = -3.028, ymax = -3.028, color = "dodgerblue", size = 1) +
  geom_segment(x = 15, xend = 15, y = -10, yend = -3.028, linetype = "dotted", linewidth = 1) +
  geom_hline(yintercept = -2.5) +
  labs(x = "Estimate of mean", y = "Log-likelihood")

3.4 Information matrix

  • a.k.a. Fisher Information Matrix
  • \(k \times k\) matrix, where \(k\) is the number of parameters estimated
  • Its inverse is the variance-covariance matrix of the estimates
    • Values on the diagonal of that inverse are the variances of the parameter estimates
    • Square root to get the standard error
  • An error or warning message about the “information matrix” usually signals a problem estimating standard errors

3.5 Types of significance tests

  • Equivalent in very large samples
    • Wald tests
    • Likelihood ratio tests

3.6 Wald tests

  • Usually used to test a single parameter
    • e.g., regression coefficient
  • Divide estimate by its standard error
    • Compare to appropriate hypothesized distribution
  • Regression coefficients use a \(t\)-test, but you may also see \(z\)-tests
    • ML estimates are asymptotically normally distributed
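
As a minimal sketch of the arithmetic (using the built-in mtcars data, not the course data), the Wald statistic is the estimate divided by its standard error, compared to a \(t\) distribution:
m <- lm(mpg ~ wt, data = mtcars)
est <- coef(summary(m))["wt", "Estimate"]
se  <- coef(summary(m))["wt", "Std. Error"]
est / se                                                        # Wald t, matches summary(m)
2 * pt(abs(est / se), df = df.residual(m), lower.tail = FALSE)  # two-sided p-value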

3.7 Likelihood ratio tests

  • Used to test 1 or more parameters
    • e.g., sets of predictors
    • Similar to anova() that we’ve used previously
  • Compare likelihoods of two different nested models
    • Difference in -2*LL between models \(\sim \chi^2(df)\)
      • Degrees of freedom = difference in parameters
    • anova(m1, m2, test = "LRT")

3.8 Degrees of freedom

  • Observations are independent pieces of information
  • Each parameter we estimate uses up one of those pieces of information
    • One sample \(t\)-test: Degrees of freedom = \(n - 1 = 10 - 1 = 9\)
    • Regression coefficient: Degrees of freedom = \(n - k - 1\)
    • \(F\)-test of \(R^2\): \(k\), \(n - k - 1\)

3.9 Example data

  • ICU data from the Stat2Data package
library(Stat2Data)
data(ICU)
head(ICU)
  ID Survive Age AgeGroup Sex Infection SysBP Pulse Emergency
1  4       0  87        3   1         1    80    96         1
2  8       1  27        1   1         1   142    88         1
3 12       1  59        2   0         0   112    80         1
4 14       1  77        3   0         0   100    70         0
5 27       0  76        3   1         1   128    90         1
6 28       1  54        2   0         1   142   103         1

3.10 Model 1

  • Age and Sex predict Pulse
m1 <- lm(data = ICU, Pulse ~ Age + Sex)
summary(m1)

Call:
lm(formula = Pulse ~ Age + Sex, data = ICU)

Residuals:
    Min      1Q  Median      3Q     Max 
-59.231 -17.730  -4.163  19.440  93.225 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 95.55647    5.87949  16.253   <2e-16 ***
Age          0.04533    0.09563   0.474    0.636    
Sex          2.00005    3.94123   0.507    0.612    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 26.93 on 197 degrees of freedom
Multiple R-squared:  0.0027,    Adjusted R-squared:  -0.007425 
F-statistic: 0.2667 on 2 and 197 DF,  p-value: 0.7662

3.11 Degrees of freedom

  • Regression coefficient
    • \(b_{Age} = 0.04533, t(197) = 0.474, p = .636\)
    • \(n - k - 1\) (sample size minus predictors minus 1)
  • Overall \(F\)-test
    • \(R^2 = 0.0027, F(2, 197) = 0.2667, p = .7662\)
    • Degree of freedom 1: \(k\) (number of predictors)
    • Degree of freedom 2: \(n - k - 1\) (sample size minus predictors minus 1)

3.12 Information matrix

  • vcov() function in stats package (always loaded)
    • Values on main diagonal are variances
    • Square root to get standard error
vcov(m1)
            (Intercept)          Age         Sex
(Intercept)  34.5683578 -0.512491478 -3.81886908
Age          -0.5124915  0.009145049 -0.03621144
Sex          -3.8188691 -0.036211443 15.53330671
  • \(\sqrt{15.53330671}\) = 3.94123, which is standard error for Sex
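
The same idea gives all of the standard errors and \(t\) values at once:
sqrt(diag(vcov(m1)))               # standard errors for all three coefficients
coef(m1) / sqrt(diag(vcov(m1)))    # Wald t values, matching summary(m1)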

3.13 Model 2

  • Age and Sex predict Pulse, then add Infection and Emergency
m2 <- lm(data = ICU, Pulse ~ Age + Sex + Infection + Emergency)
summary(m2)

Call:
lm(formula = Pulse ~ Age + Sex + Infection + Emergency, data = ICU)

Residuals:
    Min      1Q  Median      3Q     Max 
-70.688 -16.922  -3.541  15.705  82.057 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 85.11154    6.83731  12.448  < 2e-16 ***
Age          0.02128    0.09430   0.226   0.8217    
Sex          0.88884    3.77520   0.235   0.8141    
Infection   15.52284    3.77981   4.107  5.9e-05 ***
Emergency    7.79833    4.29475   1.816   0.0709 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 25.53 on 195 degrees of freedom
Multiple R-squared:  0.1127,    Adjusted R-squared:  0.09446 
F-statistic:  6.19 on 4 and 195 DF,  p-value: 0.0001041

3.14 Information matrix

vcov(m2)
            (Intercept)          Age         Sex   Infection    Emergency
(Intercept)  46.7487516 -0.535360582 -1.31998556  0.23115884 -16.70442963
Age          -0.5353606  0.008892873 -0.04388807 -0.06836532   0.09389264
Sex          -1.3199856 -0.043888071 14.25216756  0.30122681  -2.30859090
Infection     0.2311588 -0.068365322  0.30122681 14.28699784  -3.28174376
Emergency   -16.7044296  0.093892640 -2.30859090 -3.28174376  18.44486337

3.15 Likelihood ratio test

  • logLik() function from stats package (always loaded)
logLik(m1)
'log Lik.' -940.9174 (df=4)
logLik(m2)
'log Lik.' -929.2347 (df=6)
  • Why different degrees of freedom here?
    • logLik() counts the estimated residual variance as a parameter as well
    • Intercept + number of predictors + variance
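
For these models:
length(coef(m1)) + 1   # 3 coefficients (intercept, Age, Sex) plus residual variance = 4
length(coef(m2)) + 1   # 5 coefficients plus residual variance = 6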

3.16 Likelihood ratio test

anova(m1, m2, test = "LRT")
Analysis of Variance Table

Model 1: Pulse ~ Age + Sex
Model 2: Pulse ~ Age + Sex + Infection + Emergency
  Res.Df    RSS Df Sum of Sq  Pr(>Chi)    
1    197 142859                           
2    195 127107  2     15752 5.657e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-2 * (logLik(m1) - logLik(m2))
'log Lik.' 23.36533 (df=4)
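
The same test can be done by hand: compare \(-2 \times\) the difference in log-likelihoods to a \(\chi^2\) distribution with 2 degrees of freedom. (Note: anova() with test = "LRT" uses a slightly differently scaled statistic, so its p-value is close to, but not identical to, the one computed this way.)
lr_stat <- as.numeric(-2 * (logLik(m1) - logLik(m2)))   # 23.37
pchisq(lr_stat, df = 2, lower.tail = FALSE)             # roughly 8e-06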

4 Next time

4.1 Interactions

  • Non-additive effects
    • Relationship differs across groups
    • Easy to add to the model
      • Interpretation is more difficult