BTS 510 Lab 10

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

set.seed(12345)
theme_set(theme_classic(base_size = 16))

1 Learning objectives

Select an appropriate test for a contingency table, taking study design into consideration
Interpret tests comparing two related samples

2 Data

MedGPA dataset from the Stat2Data package
- A dataset with n = 55 observations on the following 11 variables
  - Accept: A=accepted to medical school or D=denied admission
  - Acceptance: Indicator for Accept: 1=accepted or 0=denied
  - Sex: F=female or M=male
  - BCPM: Bio/Chem/Physics/Math grade point average
  - GPA: College grade point average
  - VR: Verbal reasoning (subscore)
  - PS: Physical sciences (subscore)
  - WS: Writing sample (subcore)
  - BS: Biological sciences (subscore)
  - MCAT: Score on the MCAT exam (sum of CR+PS+WS+BS)
  - Apps: Number of medical schools applied to

library(Stat2Data)
data("MedGPA")

3 Contingency tables

table() function
- Row variable first, then column variable
- Use rownames() and colnames() to add appropriate labels for each
  - Instead of default 0 and 1, which we don’t know what they are
- Add marginal frequencies with addmargins()

accept_sex <- table(MedGPA$Sex, MedGPA$Accept)
colnames(accept_sex) <- c("Accepted", "Denied")
rownames(accept_sex) <- c("Female", "Male")
accept_sex_margins <- addmargins(accept_sex)
accept_sex_margins

        
         Accepted Denied Sum
  Female       18     10  28
  Male         12     15  27
  Sum          30     25  55

3.1 Difference in proportions

Assume prospective design (which this isn’t)
prop.test() function from stats package
- Proportion = x out of n
  - Choose data accordingly
  - Use column 1 which is the “Accepted” category to model probability of being accepted
- correct = TRUE for continuity correction
  - Especially with smaller samples

prop.test(x = accept_sex[,1], 
          n = accept_sex_margins[c(1,2),3],
          alternative = "two.sided",
          conf.level = 0.95,
          correct = TRUE)


    2-sample test for equality of proportions with continuity correction

data:  accept_sex[, 1] out of accept_sex_margins[c(1, 2), 3]
X-squared = 1.4556, df = 1, p-value = 0.2276
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.09608848  0.49291387
sample estimates:
   prop 1    prop 2 
0.6428571 0.4444444

3.2 Relative risk

Assume prospective design (which this isn’t)
riskratio.wald() function from epitools package
- Rows in table are assumed to be exposure / predictor
  - i.e., the fixed margins
- First row is reference group
  - Relative risk = 2nd row group / 1st row group
- Second column is event
  - Model the probability of the event
- Use rev = to switch rows and or columns
  - Options: "rows", "columns", "both", "neither" (default)

library(epitools)
riskratio.wald(accept_sex,
               rev = "both",
               correction = TRUE)

$data
        
         Denied Accepted Total
  Male       15       12    27
  Female     10       18    28
  Total      25       30    55

$measure
        risk ratio with 95% C.I.
         estimate     lower    upper
  Male   1.000000        NA       NA
  Female 1.446429 0.8737588 2.394431

$p.value
        two-sided
         midp.exact fisher.exact chi.square
  Male           NA           NA         NA
  Female  0.1534856    0.1798884  0.2276263

$correction
[1] TRUE

attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"

Note that rev = "both" results in the proportion of "Accepted" for females compared to males
- Females’ probability of acceptance is 1.446 times that of males’ probability of acceptance

3.3 Odds ratio

No assumptions about design
oddsratio.wald() function from epitools package
- Rows in table are assumed to be exposure / predictor
  - Not necessarily fixed
- First row is reference group
  - Odds ratio = 2nd row group / 1st row group
- Second column is event
  - Model the probability of the event
- Use rev = to switch rows and or columns
  - Options: "rows", "columns", "both", "neither" (default)

library(epitools)
accept_sex

        
         Accepted Denied
  Female       18     10
  Male         12     15

oddsratio.wald(accept_sex,
               rev = "both",
               correction = TRUE)

$data
        
         Denied Accepted Total
  Male       15       12    27
  Female     10       18    28
  Total      25       30    55

$measure
        odds ratio with 95% C.I.
         estimate     lower    upper
  Male       1.00        NA       NA
  Female     2.25 0.7614882 6.648166

$p.value
        two-sided
         midp.exact fisher.exact chi.square
  Male           NA           NA         NA
  Female  0.1534856    0.1798884  0.2276263

$correction
[1] TRUE

attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"

3.4 Chi-square test of independence

Code

chisq.test(accept_sex, 
           correct = TRUE)


    Pearson's Chi-squared test with Yates' continuity correction

data:  accept_sex
X-squared = 1.4556, df = 1, p-value = 0.2276

3.5 Compare results

What can we say about the relationship between Sex and Accept?
- In terms of proportions, relative risk, odds ratios, independence?
- Are all the tests telling a similar story?
  - Why or why not?

4 Dependent samples tests

Are total GPA and Bio/Chem/Physics/Math (BCPM) GPA the same for the same person?

4.1 THE WRONG WAY!!!

Independent samples t-test
- Ignores that observations are from the same person

mean(MedGPA$GPA)

[1] 3.553273

mean(MedGPA$BCPM)

[1] 3.500545

t.test(x = MedGPA$GPA,
       y = MedGPA$BCPM,
       alternative = "two.sided",
       mu = 0, 
       conf.level = .95)


    Welch Two Sample t-test

data:  MedGPA$GPA and MedGPA$BCPM
t = 0.8646, df = 103.94, p-value = 0.3892
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.0682078  0.1736623
sample estimates:
mean of x mean of y 
 3.553273  3.500545

4.2 Matched-pairs t-test

mean(MedGPA$GPA)

[1] 3.553273

mean(MedGPA$BCPM)

[1] 3.500545

t.test(x = MedGPA$GPA,
       y = MedGPA$BCPM,
       paired = TRUE,
       alternative = "two.sided",
       mu = 0, 
       conf.level = .95)


    Paired t-test

data:  MedGPA$GPA and MedGPA$BCPM
t = 3.5129, df = 54, p-value = 0.0009044
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 0.02263517 0.08281938
sample estimates:
mean difference 
     0.05272727

4.3 Sign test

library(BSDA)

Loading required package: lattice


Attaching package: 'BSDA'

The following object is masked from 'package:datasets':

    Orange

median(MedGPA$GPA)

[1] 3.58

median(MedGPA$BCPM)

[1] 3.53

SIGN.test(x = MedGPA$GPA,
          y = MedGPA$BCPM,
          mu = 0,
          alternative = "two.sided",
          conf.level = 0.95)


    Dependent-samples Sign-Test

data:  MedGPA$GPA and MedGPA$BCPM
S = 36, p-value = 0.007787
alternative hypothesis: true median difference is not equal to 0
95 percent confidence interval:
 0.01000000 0.05287644
sample estimates:
median of x-y 
         0.04 

Achieved and Interpolated Confidence Intervals: 

                  Conf.Level L.E.pt U.E.pt
Lower Achieved CI     0.9419   0.01 0.0500
Interpolated CI       0.9500   0.01 0.0529
Upper Achieved CI     0.9700   0.01 0.0600

4.4 Signed rank test

median(MedGPA$GPA)

[1] 3.58

median(MedGPA$BCPM)

[1] 3.53

wilcox.test(x = MedGPA$GPA,
            y = MedGPA$BCPM,
            paired = TRUE,
            mu = 0,
            alternative = "two.sided",
            conf.level = 0.95)


    Wilcoxon signed rank test with continuity correction

data:  MedGPA$GPA and MedGPA$BCPM
V = 1034.5, p-value = 0.001671
alternative hypothesis: true location shift is not equal to 0

4.5 Compare results

What can we say about individual differences between total GPA and BCPM GPA?
- Are all the tests telling a similar story?
  - Why or why not?