Introduction to Biostatistics

1 Learning objectives

1.1 Learning objectives

  • Select an appropriate test for a contingency table, taking study design into consideration
  • Interpret tests comparing two related samples

2 Review: Contingency tables

2.1 Contingency tables

  • Cross-tabs, summary tables, \(2 \times 2\) tables
    • Relationship between two (or more) categorical variables
    • Each cell is a frequency for that combination
  • Sex and Smoke from the Pulse dataset
Code
library(Stat2Data)
data(Pulse)
smoke_sex <- table(Pulse$Sex, Pulse$Smoke)
colnames(smoke_sex) <- c("Non-smoker", "Smoker")
rownames(smoke_sex) <- c("Male", "Female")
smoke_sex_margins <- addmargins(smoke_sex)
smoke_sex_prop_margins <- addmargins(prop.table(smoke_sex))
smoke_sex
        
         Non-smoker Smoker
  Male          105     17
  Female        101      9

2.2 Sex and Smoke: Frequencies

Code
smoke_sex
        
         Non-smoker Smoker
  Male          105     17
  Female        101      9

2.3 Sex and Smoke: Margins

Code
smoke_sex_margins
        
         Non-smoker Smoker Sum
  Male          105     17 122
  Female        101      9 110
  Sum           206     26 232

2.4 Sex and Smoke: Marginal prob

Code
smoke_sex_prop_margins
        
         Non-smoker     Smoker        Sum
  Male   0.45258621 0.07327586 0.52586207
  Female 0.43534483 0.03879310 0.47413793
  Sum    0.88793103 0.11206897 1.00000000

2.5 Sex and Smoke: Conditional prob

Code
prop.table(smoke_sex, margin = 1)
        
         Non-smoker     Smoker
  Male   0.86065574 0.13934426
  Female 0.91818182 0.08181818

2.6 Types of study designs

  • Cross-sectional
    • Total marginal value fixed (i.e., total \(n\))
  • Retrospective
    • Outcome marginal values fixed (i.e., Smoke)
  • Prospective
    • Predictor marginal values fixed (i.e., Sex)

2.7 Measures of relationship

  • Difference in proportion (prospective design only)
  • Relative risk (prospective design only)
  • Odds ratio
  • (Chi-square) Test of independence

3 Tests for contingency tables

3.1 Difference in proportion: Assumptions

  • \(2 \times 2\) contingency table
    • Only compare two groups
    • Probability needs only 2 categories
  • Prospective design: \(X\) marginals are fixed
  • “Large sample”: Uses the normal distribution for CIs

3.2 Difference in proportion: Hypotheses

  • Directional (one-tailed) tests
    • \(H_0\): \(\pi_1 \le \pi_2\)
      • \(H_1\): \(\pi_1 > \pi_2\)
    • \(H_0\): \(\pi_1 \ge \pi_2\)
      • \(H_1\): \(\pi_1 < \pi_2\)
  • Non-directional (two-tailed) tests
    • \(H_0\): \(\pi_1 = \pi_2\)
      • \(H_1\): \(\pi_1 \ne \pi_2\)

3.3 Difference in proportion: Calculations

  • Difference: \(p_1 - p_2\)
  • Standard error (SE): \(\sqrt{\frac{p_1(1-p_1)}{n_{1+}} + \frac{p_2(1-p_2)}{n_{2+}}}\)
  • Observed \(z\) statistic: \(\frac{p_1 - p_2}{SE}\)
  • Confidence interval for difference: \((p_1 - p_2) \pm z_{\alpha/2}(SE)\)
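The calculations above can be sketched in base R using the counts from the smoke_sex table (17/122 male smokers vs 9/110 female smokers). Note that prop.test() on the example slide also applies a continuity correction, so its interval is slightly wider than this uncorrected one:

```r
# Cell counts from the Sex x Smoke table
x1 <- 17; n1 <- 122   # male smokers, total males
x2 <- 9;  n2 <- 110   # female smokers, total females

p1 <- x1 / n1
p2 <- x2 / n2

d_hat <- p1 - p2                                        # difference in proportions
se    <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # standard error
z_obs <- d_hat / se                                     # observed z statistic
ci    <- d_hat + c(-1, 1) * qnorm(0.975) * se           # 95% CI (no correction)

round(c(diff = d_hat, se = se, z = z_obs, lower = ci[1], upper = ci[2]), 4)
```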

3.4 Difference in proportion: Example

  • Is the proportion of smokers the same for males and females?
  • prop.test() function from stats package
Code
smoke_sex
        
         Non-smoker Smoker
  Male          105     17
  Female        101      9
Code
smoke_sex_margins
        
         Non-smoker Smoker Sum
  Male          105     17 122
  Female        101      9 110
  Sum           206     26 232
Code
prop.test(x = smoke_sex[,2], 
          n = smoke_sex_margins[c(1,2),3],
          alternative = "two.sided",
          conf.level = 0.95,
          correct = TRUE)

    2-sample test for equality of proportions with continuity correction

data:  smoke_sex[, 2] out of smoke_sex_margins[c(1, 2), 3]
X-squared = 1.389, df = 1, p-value = 0.2386
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.03111589  0.14616805
sample estimates:
    prop 1     prop 2 
0.13934426 0.08181818 
Code
prop_out <- prop.test(x = smoke_sex[, 2], 
                      n = smoke_sex_margins[c(1, 2), 3],
                      alternative = "two.sided",
                      conf.level = 0.95,
                      correct = TRUE)
p1 <- prop_out$estimate[1]
p2 <- prop_out$estimate[2]

3.5 Relative risk: Assumptions

  • \(2 \times 2\) contingency table
    • Only compare two groups
    • Probability needs only 2 categories
  • Prospective design: \(X\) marginals are fixed
  • “Large sample”: Uses the normal distribution for CIs

3.6 Relative risk: Hypotheses

  • Directional (one-tailed) tests
    • \(H_0\): \(\frac{\pi_1}{\pi_2} \le 1\)
      • \(H_1\): \(\frac{\pi_1}{\pi_2} > 1\)
    • \(H_0\): \(\frac{\pi_1}{\pi_2} \ge 1\)
      • \(H_1\): \(\frac{\pi_1}{\pi_2} < 1\)
  • Non-directional (two-tailed) tests
    • \(H_0\): \(\frac{\pi_1}{\pi_2} = 1\)
      • \(H_1\): \(\frac{\pi_1}{\pi_2} \ne 1\)

3.7 Relative risk: Calculations

  • Relative risk: \(\frac{p_1}{p_2}\)
  • Standard error (SE) for \(ln(RR)\): \(\sqrt{\frac{1}{n_{11}} + \frac{1}{n_{21}} - \frac{1}{n_{1+}} - \frac{1}{n_{2+}}}\)
  • Observed \(z\) statistic: \(\frac{ln(p_1 / p_2)}{SE}\)
  • Confidence interval for \(ln(RR)\): \(ln\left(\frac{p_1}{p_2}\right) \pm z_{\alpha/2}(SE)\)
    • Exponentiate (\(e^x\)) to convert back to RR metric
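As a check on these formulas, here is a base-R sketch of the female-vs-male relative risk (male as reference, matching the riskratio.wald() output on the example slide). The CI is built on the log scale and then exponentiated:

```r
# Counts from the Sex x Smoke table (event = smoker)
a <- 9;  n_f <- 110   # female smokers, total females
b <- 17; n_m <- 122   # male smokers, total males

rr    <- (a / n_f) / (b / n_m)               # relative risk, female vs male
se_ln <- sqrt(1/a + 1/b - 1/n_f - 1/n_m)     # SE of ln(RR)
z_obs <- log(rr) / se_ln                     # observed z statistic
ci    <- exp(log(rr) + c(-1, 1) * qnorm(0.975) * se_ln)  # back to RR metric

round(c(rr = rr, z = z_obs, lower = ci[1], upper = ci[2]), 4)
```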

3.8 Why transform?

  • Ratios (like relative risk) have non-symmetric distributions
    • They range from 0 to \(+\infty\)
    • “No effect” is 1: This is not in the middle
  • The natural log of relative risk does have a symmetric distribution
    • Ranges from \(-\infty\) to \(+\infty\)
    • “No effect” is 0: Right in the middle
  • The CIs we have seen so far are symmetric around the estimate
    • The confidence interval for relative risk is not: it is symmetric on the \(ln(RR)\) scale and becomes asymmetric after exponentiating

3.9 Relative risk: Example

  • Is the risk of smoking the same for males and females?
  • riskratio.wald() function from epitools package
Code
library(epitools)
riskratio.wald(smoke_sex)
$data
        
         Non-smoker Smoker Total
  Male          105     17   122
  Female        101      9   110
  Total         206     26   232

$measure
        risk ratio with 95% C.I.
          estimate     lower    upper
  Male   1.0000000        NA       NA
  Female 0.5871658 0.2730208 1.262774

$p.value
        two-sided
         midp.exact fisher.exact chi.square
  Male           NA           NA         NA
  Female  0.1729872    0.2118609  0.1654532

$correction
[1] FALSE

attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"

3.10 Odds ratio: Assumptions

  • \(2 \times 2\) contingency table
    • Only compare two groups
    • Probability needs only 2 categories
  • Any study design
  • “Large sample”: Uses the normal distribution for CIs

3.11 Odds ratio: Hypotheses

  • Directional (one-tailed) tests
    • \(H_0\): \(\frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)} \le 1\)
      • \(H_1\): \(\frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)} > 1\)
    • \(H_0\): \(\frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)} \ge 1\)
      • \(H_1\): \(\frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)} < 1\)
  • Non-directional (two-tailed) tests
    • \(H_0\): \(\frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)} = 1\)
      • \(H_1\): \(\frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)} \ne 1\)

3.12 Odds ratio: Calculations

  • Odds ratio (OR): \(\frac{p_1/(1-p_1)}{p_2/(1-p_2)}\)
  • Standard error (SE) for \(ln(OR)\): \(\sqrt{\frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}}\)
  • Observed \(z\) statistic: \(\frac{ln(OR)}{SE}\)
  • Confidence interval for \(ln(OR)\): \(ln(OR) \pm z_{\alpha/2}(SE)\)
    • Exponentiate (\(e^x\)) to convert back to OR metric
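The same log-scale construction works for the odds ratio; a base-R sketch of the female-vs-male OR (matching the oddsratio.wald() output on the example slide):

```r
# 2x2 cell counts (rows: male, female; cols: non-smoker, smoker)
n11 <- 105; n12 <- 17
n21 <- 101; n22 <- 9

or    <- (n22 / n21) / (n12 / n11)              # odds ratio, female vs male
se_ln <- sqrt(1/n11 + 1/n12 + 1/n21 + 1/n22)    # SE of ln(OR)
z_obs <- log(or) / se_ln                        # observed z statistic
ci    <- exp(log(or) + c(-1, 1) * qnorm(0.975) * se_ln)  # back to OR metric

round(c(or = or, z = z_obs, lower = ci[1], upper = ci[2]), 4)
```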

3.13 Odds ratio: Example

  • Are the odds of smoking the same for males and females?
  • oddsratio.wald() function from epitools package
Code
library(epitools)
oddsratio.wald(smoke_sex)
$data
        
         Non-smoker Smoker Total
  Male          105     17   122
  Female        101      9   110
  Total         206     26   232

$measure
        odds ratio with 95% C.I.
          estimate     lower    upper
  Male   1.0000000        NA       NA
  Female 0.5503786 0.2345618 1.291415

$p.value
        two-sided
         midp.exact fisher.exact chi.square
  Male           NA           NA         NA
  Female  0.1729872    0.2118609  0.1654532

$correction
[1] FALSE

attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"

3.14 Compare RR and OR

  • Relative risk = 0.587
  • Odds ratio = 0.55

\[odds~ratio = \frac{p_1/(1 - p_1)}{p_2/(1 - p_2)} = relative~risk \times \frac{(1 - p_2)}{(1 - p_1)} \]

  • When \(p_1\) and \(p_2\) are both close to 0 or both close to 1
    • Odds ratio and relative risk are similar
    • In this example, \(p_1 = 0.139\) and \(p_2 = 0.082\)
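A quick numeric check of this identity with the smoking data (group 1 = female, group 2 = male):

```r
p1 <- 9 / 110    # P(smoker | female)
p2 <- 17 / 122   # P(smoker | male)

rr <- p1 / p2                                  # relative risk, approx 0.587
or <- (p1 / (1 - p1)) / (p2 / (1 - p2))        # odds ratio, approx 0.550

# OR equals RR rescaled by (1 - p2) / (1 - p1)
all.equal(or, rr * (1 - p2) / (1 - p1))
```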

3.15 Chi-square test: Assumptions

  • Categorical variables (nominal or ordinal)
    • Can have more than 2 levels (but not covered here)
  • Cells are counts of observations in each combination
  • All expected frequencies \(\ge\) 5 (common rule of thumb)
    • It is the expected counts that matter; observed cell counts may be small

3.16 Chi-square test: Hypotheses

  • \(H_0\): Variables are independent
  • \(H_1\): Variables are not independent

3.17 Chi-square test: Expected values 1

  • Expected values under the null hypothesis (independence)
  • Independence is a stronger condition than zero correlation
    • If variables are independent, then correlation = 0 and covariance = 0
    • The converse does not hold: zero correlation does not imply independence

3.18 Chi-square test: Expected values 2

  • \(\color{red}{Joint\ probability}\) is a function of \(\color{blue}{marginal\ probs}\) and covariance
    • \(\color{red}{E(XY)} = \color{blue}{E(X)} * \color{blue}{E(Y)} + cov(X,Y)\)
  • If \(H_0\): Independence is true, then \(cov(X,Y) = 0\)
    • \(\color{red}{E(XY)} = \color{blue}{E(X)} * \color{blue}{E(Y)}\)
    • Joint frequencies depend only on their marginal frequencies

3.19 Chi-square test: Expected values 3

  • Observed joint and marginal frequencies
Code
smoke_sex_margins
        
         Non-smoker Smoker Sum
  Male          105     17 122
  Female        101      9 110
  Sum           206     26 232
  • Expected joint frequencies: \(\mu_{ij} = \frac{n_{i+}n_{+j}}{n}\)
Code
library(epitools)   # expected() comes from epitools
expected(smoke_sex)
        
         Non-smoker   Smoker
  Male    108.32759 13.67241
  Female   97.67241 12.32759
  • Expected joint frequency for male non-smokers = \(\frac{122 \times 206}{232} = 108.328\)
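The same expected frequencies can be computed from the margins alone in base R (rebuilding the table here so the sketch runs on its own):

```r
# Rebuild the 2x2 table so this sketch is self-contained
smoke_sex <- matrix(c(105, 17, 101, 9), nrow = 2, byrow = TRUE,
                    dimnames = list(c("Male", "Female"),
                                    c("Non-smoker", "Smoker")))

# Expected count mu_ij = (row total * column total) / grand total
expected_counts <- outer(rowSums(smoke_sex), colSums(smoke_sex)) / sum(smoke_sex)
round(expected_counts, 5)
```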

3.20 Chi-square test: Test statistic 1

\[\chi^2 = \sum_{i = 1}^{N} \frac{(O_i - E_i)^2}{E_i}\]

  • with \(df = (\#\ rows - 1)*(\#\ columns - 1)\)

3.21 Chi-square test: Test statistic 2

  • Yates’ continuity correction
    • Smooths discrete distribution closer to assumed continuous
    • More accurate with small cell sizes
    • Doesn’t matter much with very large samples

\[\chi^2 = \sum_{i = 1}^{N} \frac{(|O_i - E_i| - 0.5)^2}{E_i}\]
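Combining the expected counts with the two formulas, a self-contained base-R sketch; the Yates-corrected value matches the chisq.test() output on the example slide:

```r
# Observed counts and expected counts from the margins
smoke_sex <- matrix(c(105, 17, 101, 9), nrow = 2, byrow = TRUE)
exp_cts   <- outer(rowSums(smoke_sex), colSums(smoke_sex)) / sum(smoke_sex)

chisq_plain <- sum((smoke_sex - exp_cts)^2 / exp_cts)             # uncorrected
chisq_yates <- sum((abs(smoke_sex - exp_cts) - 0.5)^2 / exp_cts)  # Yates

# df = (rows - 1) * (columns - 1) = 1 for a 2x2 table
p_val <- pchisq(chisq_yates, df = 1, lower.tail = FALSE)

round(c(plain = chisq_plain, yates = chisq_yates, p = p_val), 4)
```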

3.22 Chi-square test: Example

  • Is the gender split the same for smokers and non-smokers?
Code
chisq.test(smoke_sex, correct = TRUE)

    Pearson's Chi-squared test with Yates' continuity correction

data:  smoke_sex
X-squared = 1.389, df = 1, p-value = 0.2386
  • Retain \(H_0\): Variables are independent
  • Smoking and sex are independent, \(\chi^2(1) = 1.389\), \(p = .2386\)
    • Knowing someone’s sex doesn’t tell you whether they smoke

3.23 Fisher’s exact test

  • “Alternative to \(\chi^2\) for small samples: \(\le\) 5 in a cell”
    • But that’s not really correct
  • Used when all margins are “fixed”
    • Hypergeometric distribution
  • Origin: “Lady tasting tea”
    • Truth: Milk first or tea first? 4 cups each
    • Guess: Milk first or tea first? 4 cups each

3.24 Fisher’s exact test

              Tea first  Milk first  Total
  Guess tea           3           1      4
  Guess milk          1           3      4
  Total               4           4      8

3.25 Fisher’s exact test

  • fisher.test() function in stats package
Code
TeaTasting <- matrix(c(3, 1, 1, 3),
                     nrow = 2,
                     dimnames = list(Guess = c("Milk", "Tea"),
                                     Truth = c("Milk", "Tea")))
fisher.test(TeaTasting, alternative = "greater")

    Fisher's Exact Test for Count Data

data:  TeaTasting
p-value = 0.2429
alternative hypothesis: true odds ratio is greater than 1
95 percent confidence interval:
 0.3135693       Inf
sample estimates:
odds ratio 
  6.408309 

4 Two dependent sample tests

4.1 Dependent samples tests

  • Parametric tests
    • Matched pairs \(z\)-test
    • Matched pairs (or paired) \(t\)-test
  • Non-parametric tests
    • Sign test
    • Wilcoxon signed-rank test
    • McNemar’s test (proportion)

4.2 Non-independence

  • Opposite of independence
    • The two samples are made up of the same (or related) units
  • When does this happen?
    • Most common: Pre-post designs, each unit in multiple conditions
    • Also: Dyads, multiple reporters or sources

4.3 Matched pairs \(t\)-test: Assumptions

  • Data are continuous (i.e., ratio or interval)
  • Data are randomly sampled from the population
  • Data are independent within a group
  • Two matched or paired observations per unit
  • Distribution of the difference between two observations is approximately normally distributed OR sample size is large enough for normally distributed sampling distribution

4.4 Matched pairs \(t\)-test: Hypotheses

  • Directional (one-tailed) tests
    • \(H_0\): \(\mu_d \le 0\)
      • \(H_1\): \(\mu_d > 0\)
    • \(H_0\): \(\mu_d \ge 0\)
      • \(H_1\): \(\mu_d < 0\)
  • Non-directional (two-tailed) tests
    • \(H_0\): \(\mu_d = 0\)
      • \(H_1\): \(\mu_d \ne 0\)

4.5 Matched pairs \(t\)-test: Calculations

Code
library(dplyr)
paired <- Pulse %>% 
  select(Active, Rest) %>%
  mutate(diff = Active - Rest)
head(paired, n = 10)
   Active Rest diff
1      97   78   19
2      82   68   14
3      88   62   26
4     106   74   32
5      78   63   15
6     109   65   44
7      66   43   23
8      68   65    3
9     100   63   37
10     70   59   11

\[t = \frac{\bar{d} - 0}{s_d/\sqrt{n}}\]

  • where \(s_d\) is the sample standard deviation of the differences
  • Degrees of freedom = \(n - 1\) (where \(n\) is the number of pairs)
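Using just the 10 differences printed above as a toy sample (the full dataset has 232 pairs), the formula can be checked against R's built-in test:

```r
d <- c(19, 14, 26, 32, 15, 44, 23, 3, 37, 11)  # first 10 Active - Rest differences

n     <- length(d)
t_obs <- (mean(d) - 0) / (sd(d) / sqrt(n))     # matched pairs t, df = n - 1 = 9

# Same result as the built-in one-sample t-test on the differences
all.equal(t_obs, unname(t.test(d)$statistic))
```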

4.6 Matched pairs \(t\)-test: Example 1

  • Is there a difference between active and resting pulse rate in the same person?
Code
t.test(x = paired$Active,
       y = paired$Rest,
       paired = TRUE,
       alternative = "two.sided",
       mu = 0, 
       conf.level = .95)

    Paired t-test

data:  paired$Active and paired$Rest
t = 23.204, df = 231, p-value < 0.00000000000000022
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 20.99966 24.89689
sample estimates:
mean difference 
       22.94828 

4.7 Versus independent samples

  • Two independent groups: Resting pulse rate vs active pulse rate
    • Research question: Group differences
  • Matched-pairs \(t\)-test
    • Research question: Individual differences
      • Is the average difference for a person different from 0?
    • Statistical: Use each person as their own “control”
      • Reduces error variance

4.8 Extensions

  • Paired \(t\)-test is a one-sample \(t\)-test on the difference scores, with \(H_0: \mu_d = 0\)
  • Can extend this to the equivalent of a two-sample \(t\)-test
    • Is the difference between active and resting pulse rate different for smokers vs non-smokers?
  • Often done in ANOVA designs
    • “Mixed ANOVA” or “between-within ANOVA”
  • Extends to mixed models (which are related but regression-based)

4.9 Sign test: Assumptions

  • Data are at least ordinal (ordinal, interval, or ratio)
  • Data are randomly sampled from the population
  • No assumptions about distribution of the data
  • No assumptions about sampling distribution of the median
  • Hypotheses are about median differences between two related groups

4.10 Sign test: Logic

  • Count how many pairs increase vs decrease
    • Null hypothesis: “No change” (equal numbers increase and decrease)
  • Compare to binomial distribution to determine probability of observed number of increases

4.11 Sign test: Example

  • SIGN.test() function in BSDA package
    • But there are similar functions in different packages
Code
library(BSDA)
SIGN.test(x = paired$Active,
          y = paired$Rest,
          mu = 0,
          alternative = "two.sided",
          conf.level = 0.95)

    Dependent-samples Sign-Test

data:  paired$Active and paired$Rest
S = 227, p-value < 0.00000000000000022
alternative hypothesis: true median difference is not equal to 0
95 percent confidence interval:
 20 23
sample estimates:
median of x-y 
           21 

Achieved and Interpolated Confidence Intervals: 

                  Conf.Level L.E.pt U.E.pt
Lower Achieved CI     0.9433     20     23
Interpolated CI       0.9500     20     23
Upper Achieved CI     0.9584     20     23

4.12 Wilcoxon Signed rank test: Assumptions

  • Data are at least interval (interval or ratio)
  • Data are randomly sampled from the population
  • No assumptions about distribution of the data
  • No assumptions about sampling distribution of the median
  • Hypotheses are about median differences between two related groups

4.13 Signed rank test: Logic

  • Rank absolute values of all differences
  • Above hypothesized median (0): Assign “+”
  • Below hypothesized median (0): Assign “-”
  • Add positive ranks, add negative ranks
  • Compare to distribution under the null hypothesis
    • i.e., sum of positive ranks = sum of negative ranks
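A base-R sketch of these steps with the 10 differences shown earlier (all positive, so the negative rank sum is 0):

```r
d <- c(19, 14, 26, 32, 15, 44, 23, 3, 37, 11)  # toy sample of paired differences

r     <- rank(abs(d))          # rank the absolute differences
v_pos <- sum(r[d > 0])         # sum of positive ranks
v_neg <- sum(r[d < 0])         # sum of negative ranks

# v_pos is the V statistic reported by wilcox.test()
c(v_pos = v_pos, v_neg = v_neg)
```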

4.14 Signed rank test: Example

  • wilcox.test() function from stats package
Code
wilcox.test(x = paired$Active,
            y = paired$Rest,
            paired = TRUE,
            mu = 0,
            alternative = "two.sided",
            conf.level = 0.95)

    Wilcoxon signed rank test with continuity correction

data:  paired$Active and paired$Rest
V = 26931, p-value < 0.00000000000000022
alternative hypothesis: true location shift is not equal to 0

4.15 McNemar’s test

  • Test of paired proportions
    • Proportion before vs proportion after
Design

                         Post
               Success   Failure
  Pre Success     a         b       a+b
      Failure     c         d       c+d
                 a+c       b+d       n

4.16 McNemar’s test

Code
paired_prop <- matrix(c(183, 55, 49, 13), nrow = 2, ncol = 2, byrow = TRUE)
colnames(paired_prop) <- c("Post success", "Post failure")
rownames(paired_prop) <- c("Pre success", "Pre failure")
addmargins(paired_prop)
            Post success Post failure Sum
Pre success          183           55 238
Pre failure           49           13  62
Sum                  232           68 300
  • mcnemar.test() function in stats package
Code
mcnemar.test(paired_prop)

    McNemar's Chi-squared test with continuity correction

data:  paired_prop
McNemar's chi-squared = 0.24038, df = 1, p-value = 0.6239
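McNemar's statistic depends only on the two discordant cells (b = 55 and c = 49 here); with continuity correction it is \((|b - c| - 1)^2 / (b + c)\), which reproduces the output above:

```r
b_cell <- 55   # pre success, post failure
c_cell <- 49   # pre failure, post success

stat  <- (abs(b_cell - c_cell) - 1)^2 / (b_cell + c_cell)  # with correction
p_val <- pchisq(stat, df = 1, lower.tail = FALSE)

round(c(statistic = stat, p.value = p_val), 5)
```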

5 In-class activities

5.1 In-class activities

  • Create some contingency tables
    • Perform some tests on them
  • Perform some paired tests

5.2 Next week

  • Multiple comparisons
    • \(\alpha\) is the type I error rate for a single test
    • We often perform more than 1 test
      • Sometimes a couple, sometimes 100s
    • Each additional test increases \(\alpha\)
      • How can we maintain the type I error rate with multiple tests?