BTS 510 Lab 8

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
set.seed(12345)
theme_set(theme_classic(base_size = 16))

1 Learning objectives

  • Describe the logic of hypothesis testing
  • Interpret tests comparing one sample to a hypothesized value
  • Relate hypothesis testing to confidence intervals
  • Recognize when to use a nonparametric test

2 Data

  • Pulse dataset from the Stat2Data package
    • A dataset with n = 232 observations on the following 7 variables.
      • Active: Pulse rate (beats per minute) after exercise
      • Rest: Resting pulse rate (beats per minute)
      • Smoke: 1=smoker or 0=nonsmoker
      • Sex: 1=female or 0=male
      • Exercise: Typical hours of exercise (per week)
      • Hgt: Height (in inches)
      • Wgt: Weight (in pounds)
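  • Categorical variables can be converted to factors with the as.factor() function, as in the lecture
    • The conversion lines below are left commented out so that Smoke and Sex stay numeric 0/1, which the z.test() calls need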
library(Stat2Data)
data(Pulse)
#Pulse$Smoke <- as.factor(Pulse$Smoke)
#Pulse$Sex <- as.factor(Pulse$Sex)
head(Pulse)
  Active Rest Smoke Sex Exercise Hgt Wgt
1     97   78     0   1        1  63 119
2     82   68     1   0        3  70 225
3     88   62     0   0        3  72 175
4    106   74     0   0        3  72 170
5     78   63     0   1        3  67 125
6    109   65     0   0        3  74 188
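Keeping Smoke and Sex numeric works because the mean of a 0/1 variable is simply the sample proportion of 1s. That is why z.test() on Pulse$Sex in Section 3.1 reports a mean equal to the binomial estimate. The sketch below assumes only base R. It rebuilds a stand-in for Pulse$Sex from the counts shown by table(Pulse$Sex) (122 zeros, 110 ones), so the original row order is not preserved. The names sex_num, prop_female, and sex_fac are hypothetical.

```r
# Stand-in for Pulse$Sex built from the counts in table(Pulse$Sex):
# 122 males (coded 0) and 110 females (coded 1); row order is not preserved
sex_num <- rep(c(0, 1), times = c(122, 110))

# The mean of a 0/1 variable is the proportion of 1s (females here)
prop_female <- mean(sex_num)
prop_female  # equals 110 / 232

# A labelled factor copy is handy for tables and plots...
sex_fac <- factor(sex_num, levels = c(0, 1), labels = c("male", "female"))
table(sex_fac)

# ...but a factor has no numeric mean, so functions that average the
# 0/1 coding (such as z.test()) need the numeric version
suppressWarnings(mean(sex_fac))  # NA: mean() does not work on factors
```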

3 Tasks

  • Make plots of variables as needed

3.1 Gender split

  • Is the gender split in this sample the same as in the total population (i.e., 50/50)?
    • What kind of variable is this?
    • What test should you do?
    • Directional or non-directional?
    • Present the results of the test
    • Write your conclusions: Is the gender split in this sample comparable to that in the total population?
table(Pulse$Sex)

  0   1 
122 110 
binom.test(x = 110, 
           n = 232, 
           p = 0.5, 
           alternative = "two.sided")

    Exact binomial test

data:  110 and 232
number of successes = 110, number of trials = 232, p-value = 0.4703
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.4084248 0.5405202
sample estimates:
probability of success 
             0.4741379 
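Reproducing the exact binomial p-value by hand makes the logic of the test visible. Under H0 (p = 0.5), add up the probabilities of every possible count that is no more likely than the observed 110. This is a sketch under that assumption. exact_two_sided_p() is a hypothetical helper, and the small relative tolerance guards against floating-point ties; binom.test() itself uses a similar tolerance.

```r
# Two-sided exact binomial p-value: add up the probabilities of every
# outcome that is no more likely than the observed count under H0.
# exact_two_sided_p() is a hypothetical helper written for illustration.
exact_two_sided_p <- function(x, n, p0) {
  probs <- dbinom(0:n, size = n, prob = p0)  # P(X = k) for k = 0..n under H0
  d_obs <- dbinom(x, size = n, prob = p0)    # probability of the observed count
  # small relative tolerance so exact ties are not lost to rounding error
  min(1, sum(probs[probs <= d_obs * (1 + 1e-7)]))
}

# 110 females out of 232, H0: p = 0.5
exact_two_sided_p(110, 232, 0.5)
binom.test(110, 232, p = 0.5)$p.value  # should agree
```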
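Note
  • sigma.x is the standard deviation, not the variance
    • I accidentally used the variance in class
    • Remember that SD = \sqrt{variance}
    • For a 0/1 variable with hypothesized proportion p, variance = p(1 - p), so here SD = \sqrt{0.5(1 - 0.5)} = \sqrt{0.25} = 0.5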
library(BSDA)
Loading required package: lattice

Attaching package: 'BSDA'
The following object is masked from 'package:datasets':

    Orange
z.test(x = Pulse$Sex, 
       alternative = "two.sided",
       mu = 0.5, 
       sigma.x = sqrt(0.25),
       conf.level = .95)

    One-sample z-Test

data:  Pulse$Sex
z = -0.78784, p-value = 0.4308
alternative hypothesis: true mean is not equal to 0.5
95 percent confidence interval:
 0.4097990 0.5384769
sample estimates:
mean of x 
0.4741379 
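The z statistic and confidence interval above take only a few lines of arithmetic, with the hypothesized SD \sqrt{p_0(1-p_0)} in the standard error. This sketch recomputes them from the counts alone (the variable names are illustrative). It also shows how the test relates to the interval: with the same standard error, the 95% CI contains 0.5 exactly when the two-sided p-value exceeds 0.05. The z-test p-value (0.4308) differs from the exact binomial p-value (0.4703) because the z-test uses a normal approximation.

```r
# Summary numbers from table(Pulse$Sex): 110 females out of 232
x  <- 110
n  <- 232
p0 <- 0.5

p_hat <- x / n                   # sample proportion (mean of the 0/1 variable)
sd0   <- sqrt(p0 * (1 - p0))     # SD of a 0/1 variable under H0: sqrt(0.25) = 0.5
se    <- sd0 / sqrt(n)           # standard error of the proportion
z     <- (p_hat - p0) / se       # test statistic
p_val <- 2 * pnorm(-abs(z))      # two-sided p-value

# 95% CI built from the same standard error
ci <- p_hat + c(-1, 1) * qnorm(0.975) * se

c(z = z, p = p_val)
ci
# TRUE here, matching p > 0.05
p0 >= ci[1] && p0 <= ci[2]
```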

3.2 Smoking rate

  • Is the smoking rate in this sample the same as the 11.5% rate in the US? (CDC info on smoking here)
    • What kind of variable is this?
    • What test should you do?
    • Directional or non-directional?
    • Present the results of the test
    • Write your conclusions: Is the smoking rate in this sample comparable to the 11.5% rate in the US?
table(Pulse$Smoke)

  0   1 
206  26 
binom.test(x = 26, 
           n = 232, 
           p = 0.115, 
           alternative = "two.sided")

    Exact binomial test

data:  26 and 232
number of successes = 26, number of trials = 232, p-value = 1
alternative hypothesis: true probability of success is not equal to 0.115
95 percent confidence interval:
 0.07452657 0.15988360
sample estimates:
probability of success 
              0.112069 
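The confidence interval printed by binom.test() is the exact Clopper-Pearson interval, which can be written in terms of beta-distribution quantiles. The sketch below assumes the default 95% level. It also checks whether the hypothesized 11.5% falls inside the interval, which agrees with the p-value of 1.

```r
# 26 smokers out of 232
x     <- 26
n     <- 232
alpha <- 0.05

# Exact (Clopper-Pearson) 95% interval from beta quantiles
lower <- qbeta(alpha / 2, x, n - x + 1)
upper <- qbeta(1 - alpha / 2, x + 1, n - x)
c(lower, upper)

# Compare with the interval reported by binom.test()
binom.test(x, n, p = 0.115)$conf.int

# Is the hypothesized 11.5% inside the interval?
0.115 >= lower && 0.115 <= upper
```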
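Note
  • sigma.x is the standard deviation, not the variance
    • I accidentally used the variance in class
    • Remember that SD = \sqrt{variance}
    • For the hypothesized smoking rate p = 0.115, SD = \sqrt{0.115(1 - 0.115)} ≈ 0.319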
library(BSDA)
z.test(x = Pulse$Smoke, 
       alternative = "two.sided",
       mu = 0.115, 
       sigma.x = sqrt(0.115*(1-0.115)),
       conf.level = .95)

    One-sample z-Test

data:  Pulse$Smoke
z = -0.13994, p-value = 0.8887
alternative hypothesis: true mean is not equal to 0.115
95 percent confidence interval:
 0.07101788 0.15312005
sample estimates:
mean of x 
 0.112069 
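To see why the variance/SD mix-up in the note matters, this sketch computes the smoking z statistic twice. The first version uses the SD \sqrt{p_0(1-p_0)} that sigma.x expects and reproduces z = -0.13994. The second plugs in the variance p_0(1-p_0) by mistake. For a proportion the variance is smaller than the SD, so the mistaken standard error is too small and |z| is inflated. Variable names are illustrative.

```r
# 26 smokers out of 232; H0: p = 0.115
n     <- 232
p_hat <- 26 / n
p0    <- 0.115

var0 <- p0 * (1 - p0)  # variance of a 0/1 variable under H0
sd0  <- sqrt(var0)     # standard deviation: what sigma.x expects

z_sd  <- (p_hat - p0) / (sd0 / sqrt(n))   # correct: matches z.test() above
z_var <- (p_hat - p0) / (var0 / sqrt(n))  # mistake: variance passed as sigma.x

c(z_sd = z_sd, z_var = z_var)
```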

3.3 Elevated pulse rate

  • Is the active pulse rate higher than the high end of the normal resting pulse rate (100 bpm)?
    • What kind of variable is this?
    • What test should you do?
    • Directional or non-directional?
    • Present the results of the test
    • Write your conclusions: Is the active pulse rate higher than the high end of the normal resting pulse rate (100 bpm)?
t.test(x = Pulse$Active,
       alternative = "greater",
       mu = 100, 
       conf.level = .95)

    One Sample t-test

data:  Pulse$Active
t = -7.0432, df = 231, p-value = 1
alternative hypothesis: true mean is greater than 100
95 percent confidence interval:
 89.25683      Inf
sample estimates:
mean of x 
 91.29741 
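The p-value of 1 follows from the directional alternative. The sample mean (91.3) is below 100, so it lies in the wrong tail for the alternative mu > 100. The sketch below uses only the numbers printed by t.test() (t, df, and the mean) and recovers the standard error from t = (mean - mu)/SE. It then contrasts the one-sided p-value with the two-sided one. The direction has to be chosen before looking at the data; the two-sided value is shown only for contrast.

```r
# Numbers reported by t.test() above
t_stat <- -7.0432
df     <- 231
x_bar  <- 91.29741
mu0    <- 100

# One-sided alternative (mu > 100) uses the upper tail only
p_greater <- pt(t_stat, df, lower.tail = FALSE)

# A two-sided alternative (mu != 100) would use both tails
p_two <- 2 * pt(-abs(t_stat), df)

# Recover the standard error from t = (x_bar - mu0) / SE
se <- (x_bar - mu0) / t_stat

# One-sided 95% lower bound, matching the reported interval (89.25683, Inf)
lower <- x_bar - qt(0.95, df) * se

c(p_greater = p_greater, p_two = p_two, lower = lower)
```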

3.4 Height among men

  • Is the height of men in this sample different from the US average of 5 feet 9 inches?
    • What kind of variable is this?
    • What test should you do?
    • Directional or non-directional?
    • Present the results of the test
    • Write your conclusions: Is the height of men in this sample different from the US average of 5 feet 9 inches?
# Keep only the men (Sex is coded 0 = male)
PulseM <- Pulse %>%
    filter(Sex == 0)
t.test(x = PulseM$Hgt,
       alternative = "two.sided",
       mu = 69, 
       conf.level = .95)

    One Sample t-test

data:  PulseM$Hgt
t = 8.5883, df = 121, p-value = 3.622e-14
alternative hypothesis: true mean is not equal to 69
95 percent confidence interval:
 70.46958 71.35009
sample estimates:
mean of x 
 70.90984 
library(BSDA)
sd(PulseM$Hgt)
[1] 2.456242
z.test(x = PulseM$Hgt, 
       alternative = "two.sided",
       mu = 69, 
       sigma.x = 3,   # SD (inches) treated as known; compare the sample SD above
       conf.level = .95)

    One-sample z-Test

data:  PulseM$Hgt
z = 7.0316, p-value = 2.042e-12
alternative hypothesis: true mean is not equal to 69
95 percent confidence interval:
 70.37750 71.44218
sample estimates:
mean of x 
 70.90984
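The t-test and z-test give different statistics for the same sample mean because they put different SDs into the standard error. The t-test uses the sample SD (2.456, from sd(PulseM$Hgt)). z.test() uses the sigma.x = 3 supplied above, treated as a known population SD. The two tests also compare the statistic against different reference distributions (t with n - 1 df versus standard normal). The sketch below works only from the printed summary statistics, and the variable names are illustrative.

```r
# Summary statistics reported above for men (Sex == 0)
n     <- 122
x_bar <- 70.90984
s     <- 2.456242  # sample SD, from sd(PulseM$Hgt)
sigma <- 3         # the value passed as sigma.x to z.test()
mu0   <- 69        # 5 feet 9 inches

# t-test: sample SD in the standard error, t distribution with n - 1 df
se_t  <- s / sqrt(n)
t_val <- (x_bar - mu0) / se_t
ci_t  <- x_bar + c(-1, 1) * qt(0.975, df = n - 1) * se_t

# z-test: known sigma in the standard error, standard normal distribution
se_z  <- sigma / sqrt(n)
z_val <- (x_bar - mu0) / se_z
ci_z  <- x_bar + c(-1, 1) * qnorm(0.975) * se_z

rbind(t = c(stat = t_val, lower = ci_t[1], upper = ci_t[2]),
      z = c(stat = z_val, lower = ci_z[1], upper = ci_z[2]))
```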