BTS 510 Lab 12

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
set.seed(12345)
theme_set(theme_classic(base_size = 16))

1 Learning objectives

  • Describe the logic of ANOVA
  • Conduct and interpret the overall ANOVA test
  • Conduct post hoc tests

2 Data

  • Amyloid dataset (N = 57) in Stat2Data package
    • Group: mAD=Alzheimer’s, MCI=mild impairment, NCI=no impairment
    • Abeta: Amount of Abeta from the posterior cingulate cortex (pmol/g tissue)
library(Stat2Data)
data(Amyloid)
head(Amyloid)
  Group Abeta
1   NCI   114
2   NCI    41
3   NCI   276
4   NCI     0
5   NCI    16
6   NCI   228

3 Descriptives and data visualization

3.1 Means, variances, and SDs by group

Amyloid %>% group_by(Group) %>%
  summarize(mean = mean(Abeta),
            variance = var(Abeta),
            sd = sd(Abeta))
# A tibble: 3 × 4
  Group  mean variance    sd
  <fct> <dbl>    <dbl> <dbl>
1 mAD    761.  182068.  427.
2 MCI    341.  165168.  406.
3 NCI    336.  189756.  436.
  • For some reason, these values get rounded to the whole number. You can get the full numbers to print if you use a package to print prettier tables, like kableExtra (only works for HTML output)
library(kableExtra)

Attaching package: 'kableExtra'
The following object is masked from 'package:dplyr':

    group_rows
Amyloid %>% group_by(Group) %>%
  summarize(mean = mean(Abeta),
            variance = var(Abeta),
            sd = sd(Abeta)) %>%
  kable()
Group mean variance sd
mAD 761.2941 182068.0 426.6942
MCI 341.0476 165168.4 406.4092
NCI 336.2632 189755.8 435.6096

3.2 Equal variance assumption

  • Graphically
ggplot(data = Amyloid,
       aes(x = Group, y = Abeta)) +
  geom_boxplot(alpha = 0.5) +
  geom_jitter(width = 0.2)

  • Statistical test
library(car)
Loading required package: carData

Attaching package: 'car'
The following object is masked from 'package:dplyr':

    recode
The following object is masked from 'package:purrr':

    some
leveneTest(Abeta ~ Group, data = Amyloid)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  2  0.3505 0.7059
      54               
  • Variances are roughly equal – good to go!

4 Analysis of variance

  • Use both lm() with contr.sum and anova() to get correct type III sums of squares
m1 <- lm(Abeta ~ Group,
         contrasts=list(Group='contr.sum'),
         data = Amyloid)
anova(m1)
Analysis of Variance Table

Response: Abeta
          Df  Sum Sq Mean Sq F value   Pr(>F)   
Group      2 2129969 1064985  5.9706 0.004544 **
Residuals 54 9632060  178371                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Overall, there is an effect of Group
    • We reject the H_0 that all the group means are equal, F(2, 54) = 5.97, p <.01. At least one group mean is different from the others.

4.1 Normality assumption

  • Graphically
m1resid <- data.frame(residuals(m1))
ggplot(data = m1resid, 
       aes(sample = residuals.m1.)) +
  geom_qq() +
  geom_qq_line()

m1resid <- data.frame(residuals(m1))
ggplot(data = m1resid, 
       aes(x = residuals.m1.)) +
  geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

  • Statistical test
shapiro.test(resid(m1))

    Shapiro-Wilk normality test

data:  resid(m1)
W = 0.88246, p-value = 4.857e-05
  • Normality assumption is violated
    • We should consider non-parametric alternatives that don’t assume normality
    • But for the purposes of this class, we’re going to carry on anyway

5 Post hoc tests

  • I’m only going to use the reference / dummy / treatment contrasts here
    • Feel free to try out the others
  • Dummy coding compares each group to a specific reference group (e.g., control)
    • In R, the default reference group is the first group
    • Factors in R are ordered alphabetically (if character names) or numerically (if number names)
    • So this will do two comparisons (mAD is the reference group)
      • mAD vs MCI
      • mAD vs NCI
m1_trt <- lm(Abeta ~ Group,
         contrasts=list(Group='contr.treatment'),
         data = Amyloid)
summary(m1_trt)

Call:
lm(formula = Abeta ~ Group, data = Amyloid, contrasts = list(Group = "contr.treatment"))

Residuals:
   Min     1Q Median     3Q    Max 
-623.3 -322.3 -108.3  227.9 1269.0 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)    761.3      102.4   7.432 8.19e-10 ***
GroupMCI      -420.2      137.8  -3.050  0.00354 ** 
GroupNCI      -425.0      141.0  -3.014  0.00392 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 422.3 on 54 degrees of freedom
Multiple R-squared:  0.1811,    Adjusted R-squared:  0.1508 
F-statistic: 5.971 on 2 and 54 DF,  p-value: 0.004544
  • Intercept = mean for reference group
    • Mean for mAD = 761.3
  • GroupMCI is the difference between reference group and MCI
    • Mean for MCI = 761.3 - 420.2 = 341.1
  • GroupNCI is the difference between reference group and NCI
    • Mean for NCI = 761.3 - 425.0 = 336.3
  • Compare these values to the group means calculated above
  • t tests tell you whether that contrast (difference) is significant
    • Both contrasts are significant here
    • Mean Abeta for mAD is different from mean Abeta for MCI
    • Mean Abeta for mAD is different from mean Abeta for NCI
    • Report t tests as we’ve dont before
      • The degrees of freedom for each test is 1

5.1 Multiple comparisons

  • If we controlled for multiple comparisons what are our conclusions?
    • Do those differ from the conclusions where we didn’t control for multiple comparisons?
p.adjust(c(0.00354, 0.00392), "holm", 2)
[1] 0.00708 0.00708
p.adjust(c(0.00354, 0.00392), "BH", 2)
[1] 0.00392 0.00392

6 Conclusions

In plain language:

  1. Are there differences in amyloid beta levels for the different impairment groups?
  2. Which groups are different from one another?