BTS 510 Lab 12

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

set.seed(12345)
theme_set(theme_classic(base_size = 16))

1 Learning objectives

Describe the logic of ANOVA
Conduct and interpret the overall ANOVA test
Conduct post hoc tests

2 Data

Amyloid dataset (N = 57) in Stat2Data package
- Group: mAD=Alzheimer’s, MCI=mild impairment, NCI=no impairment
- Abeta: Amount of Abeta from the posterior cingulate cortex (pmol/g tissue)

library(Stat2Data)
data(Amyloid)
head(Amyloid)

  Group Abeta
1   NCI   114
2   NCI    41
3   NCI   276
4   NCI     0
5   NCI    16
6   NCI   228

3 Descriptives and data visualization

3.1 Means, variances, and SDs by group

Amyloid %>% group_by(Group) %>%
  summarize(mean = mean(Abeta),
            variance = var(Abeta),
            sd = sd(Abeta))

# A tibble: 3 × 4
  Group  mean variance    sd
  <fct> <dbl>    <dbl> <dbl>
1 mAD    761.  182068.  427.
2 MCI    341.  165168.  406.
3 NCI    336.  189756.  436.

For some reason, these values get rounded to the whole number. You can get the full numbers to print if you use a package to print prettier tables, like kableExtra (only works for HTML output)

library(kableExtra)


Attaching package: 'kableExtra'

The following object is masked from 'package:dplyr':

    group_rows

Amyloid %>% group_by(Group) %>%
  summarize(mean = mean(Abeta),
            variance = var(Abeta),
            sd = sd(Abeta)) %>%
  kable()

Group	mean	variance	sd
mAD	761.2941	182068.0	426.6942
MCI	341.0476	165168.4	406.4092
NCI	336.2632	189755.8	435.6096

3.2 Equal variance assumption

Graphically

ggplot(data = Amyloid,
       aes(x = Group, y = Abeta)) +
  geom_boxplot(alpha = 0.5) +
  geom_jitter(width = 0.2)

Statistical test

library(car)

Loading required package: carData


Attaching package: 'car'

The following object is masked from 'package:dplyr':

    recode

The following object is masked from 'package:purrr':

    some

leveneTest(Abeta ~ Group, data = Amyloid)

Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  2  0.3505 0.7059
      54

Variances are roughly equal – good to go!

4 Analysis of variance

Use both lm() with contr.sum and anova() to get correct type III sums of squares

m1 <- lm(Abeta ~ Group,
         contrasts=list(Group='contr.sum'),
         data = Amyloid)
anova(m1)

Analysis of Variance Table

Response: Abeta
          Df  Sum Sq Mean Sq F value   Pr(>F)   
Group      2 2129969 1064985  5.9706 0.004544 **
Residuals 54 9632060  178371                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Overall, there is an effect of Group
- We reject the H_0 that all the group means are equal, F(2, 54) = 5.97, p <.01. At least one group mean is different from the others.

4.1 Normality assumption

Graphically

m1resid <- data.frame(residuals(m1))
ggplot(data = m1resid, 
       aes(sample = residuals.m1.)) +
  geom_qq() +
  geom_qq_line()

m1resid <- data.frame(residuals(m1))
ggplot(data = m1resid, 
       aes(x = residuals.m1.)) +
  geom_histogram()

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Statistical test

shapiro.test(resid(m1))


    Shapiro-Wilk normality test

data:  resid(m1)
W = 0.88246, p-value = 4.857e-05

Normality assumption is violated
- We should consider non-parametric alternatives that don’t assume normality
- But for the purposes of this class, we’re going to carry on anyway

5 Post hoc tests

I’m only going to use the reference / dummy / treatment contrasts here
- Feel free to try out the others
Dummy coding compares each group to a specific reference group (e.g., control)
- In R, the default reference group is the first group
- Factors in R are ordered alphabetically (if character names) or numerically (if number names)
- So this will do two comparisons (mAD is the reference group)
  - mAD vs MCI
  - mAD vs NCI

m1_trt <- lm(Abeta ~ Group,
         contrasts=list(Group='contr.treatment'),
         data = Amyloid)
summary(m1_trt)


Call:
lm(formula = Abeta ~ Group, data = Amyloid, contrasts = list(Group = "contr.treatment"))

Residuals:
   Min     1Q Median     3Q    Max 
-623.3 -322.3 -108.3  227.9 1269.0 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)    761.3      102.4   7.432 8.19e-10 ***
GroupMCI      -420.2      137.8  -3.050  0.00354 ** 
GroupNCI      -425.0      141.0  -3.014  0.00392 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 422.3 on 54 degrees of freedom
Multiple R-squared:  0.1811,    Adjusted R-squared:  0.1508 
F-statistic: 5.971 on 2 and 54 DF,  p-value: 0.004544

Intercept = mean for reference group
- Mean for mAD = 761.3
GroupMCI is the difference between reference group and MCI
- Mean for MCI = 761.3 - 420.2 = 341.1
GroupNCI is the difference between reference group and NCI
- Mean for NCI = 761.3 - 425.0 = 336.3
Compare these values to the group means calculated above
t tests tell you whether that contrast (difference) is significant
- Both contrasts are significant here
- Mean Abeta for mAD is different from mean Abeta for MCI
- Mean Abeta for mAD is different from mean Abeta for NCI
- Report t tests as we’ve dont before
  - The degrees of freedom for each test is 1

5.1 Multiple comparisons

If we controlled for multiple comparisons what are our conclusions?
- Do those differ from the conclusions where we didn’t control for multiple comparisons?

p.adjust(c(0.00354, 0.00392), "holm", 2)

[1] 0.00708 0.00708

p.adjust(c(0.00354, 0.00392), "BH", 2)

[1] 0.00392 0.00392

6 Conclusions

In plain language:

Are there differences in amyloid beta levels for the different impairment groups?
Which groups are different from one another?