BTS 510 Lab 8

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

set.seed(12345)
theme_set(theme_classic(base_size = 16))

1 Learning objectives

Describe the logic of hypothesis testing
Interpret tests comparing one sample to hypothesized value
Relate hypothesis testing to confidence intervals
Recognize when to use a nonparametric test

2 Data

Pulse dataset from the Stat2Data package
- A dataset with n = 232 observations on the following 7 variables.
  - Active: Pulse rate (beats per minute) after exercise
  - Rest: Resting pulse rate (beats per minute)
  - Smoke: 1=smoker or 0=nonsmoker
  - Sex: 1=female or 0=male
  - Exercise: Typical hours of exercise (per week)
  - Hgt: Height (in inches)
  - Wgt: Weight (in pounds)
Convert the categorical variables (Smoke and Sex) to factor variables, as in the lecture
- as.factor() function

library(Stat2Data)
data(Pulse)
# Left commented out so Smoke and Sex stay numeric 0/1, which the z.test() calls below need
#Pulse$Smoke <- as.factor(Pulse$Smoke)
#Pulse$Sex <- as.factor(Pulse$Sex)
head(Pulse)

  Active Rest Smoke Sex Exercise Hgt Wgt
1     97   78     0   1        1  63 119
2     82   68     1   0        3  70 225
3     88   62     0   0        3  72 175
4    106   74     0   0        3  72 170
5     78   63     0   1        3  67 125
6    109   65     0   0        3  74 188

3 Tasks

Make plots of variables as needed

3.1 Gender split

Is the gender split in this sample the same as in the total population (i.e., 50/50)?
- What kind of variable is this?
- What test should you do?
- Directional or non-directional?
- Present the results of the test
- Write your conclusions: Is the gender split in this sample comparable to that in the total population?

table(Pulse$Sex)


  0   1 
122 110

binom.test(x = 110, 
           n = 232, 
           p = 0.5, 
           alternative = "two.sided")


    Exact binomial test

data:  110 and 232
number of successes = 110, number of trials = 232, p-value = 0.4703
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.4084248 0.5405202
sample estimates:
probability of success 
             0.4741379
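To relate this test to its confidence interval, here is a minimal sketch using the counts from the table above. It checks whether the hypothesized value 0.5 falls inside the 95% interval. The names `res`, `ci`, and `contains_null` are only illustrative. For the exact binomial test, the interval (Clopper-Pearson) and the two-sided p-value are computed separately, so they usually, but not always, lead to the same decision.

```r
# Exact binomial test on the counts from table(Pulse$Sex): 110 of 232 coded 1
res <- binom.test(x = 110, n = 232, p = 0.5, alternative = "two.sided")

# 95% confidence interval for the proportion
ci <- res$conf.int

# Does the interval contain the hypothesized value?
contains_null <- ci[1] <= 0.5 && 0.5 <= ci[2]
contains_null

# Is the test non-significant at alpha = .05?
res$p.value > 0.05
```

Here both checks agree: the interval contains 0.5 and the test does not reject the null hypothesis.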

Note

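sigma.x is the population standard deviation, not the variance
- I accidentally used the variance in class
- Remember that SD = \sqrt{variance}; for a 0/1 variable under the null hypothesis, variance = p(1 - p), so here SD = \sqrt{0.5 \times 0.5} = 0.5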

library(BSDA)

Loading required package: lattice


Attaching package: 'BSDA'

The following object is masked from 'package:datasets':

    Orange

z.test(x = Pulse$Sex, 
       alternative = "two.sided",
       mu = 0.5, 
       sigma.x = sqrt(0.25),
       conf.level = .95)


    One-sample z-Test

data:  Pulse$Sex
z = -0.78784, p-value = 0.4308
alternative hypothesis: true mean is not equal to 0.5
95 percent confidence interval:
 0.4097990 0.5384769
sample estimates:
mean of x 
0.4741379
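The z statistic above can be reproduced by hand, which also shows why sigma.x must be the standard deviation rather than the variance. This is a sketch from the table counts only; the variable names are illustrative and not part of BSDA.

```r
# Counts from table(Pulse$Sex)
n     <- 232
p_hat <- 110 / n      # sample proportion coded 1
p0    <- 0.5          # hypothesized proportion

# Under H0 the SD of a 0/1 variable is sqrt(p0 * (1 - p0)), not the variance
sigma0 <- sqrt(p0 * (1 - p0))
se     <- sigma0 / sqrt(n)

# z statistic and two-sided p-value
z       <- (p_hat - p0) / se
p_value <- 2 * pnorm(-abs(z))

# 95% confidence interval based on the same standard error
ci <- p_hat + c(-1, 1) * qnorm(0.975) * se

round(c(z = z, p_value = p_value, lower = ci[1], upper = ci[2]), 4)
```

The results should match the z.test() output up to rounding.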

3.2 Smoking rate

Is the smoking rate in this sample the same as the 11.5% rate in the US (rate from CDC information on smoking)?
- What kind of variable is this?
- What test should you do?
- Directional or non-directional?
- Present the results of the test
- Write your conclusions: Is the smoking rate in this sample comparable to the 11.5% rate in the US?

table(Pulse$Smoke)


  0   1 
206  26

binom.test(x = 26, 
           n = 232, 
           p = 0.115, 
           alternative = "two.sided")


    Exact binomial test

data:  26 and 232
number of successes = 26, number of trials = 232, p-value = 1
alternative hypothesis: true probability of success is not equal to 0.115
95 percent confidence interval:
 0.07452657 0.15988360
sample estimates:
probability of success 
              0.112069

Note

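sigma.x is the population standard deviation, not the variance
- I accidentally used the variance in class
- Remember that SD = \sqrt{variance}; for a 0/1 variable under the null hypothesis, variance = p(1 - p), so here SD = \sqrt{0.115 \times 0.885}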

library(BSDA)
z.test(x = Pulse$Smoke, 
       alternative = "two.sided",
       mu = 0.115, 
       sigma.x = sqrt(0.115*(1-0.115)),
       conf.level = .95)


    One-sample z-Test

data:  Pulse$Smoke
z = -0.13994, p-value = 0.8887
alternative hypothesis: true mean is not equal to 0.115
95 percent confidence interval:
 0.07101788 0.15312005
sample estimates:
mean of x 
 0.112069
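As a cross-check that does not need BSDA, the same large-sample test can be sketched with prop.test() from base R's stats package, with the continuity correction turned off. Its chi-squared statistic is the square of the z statistic, so the p-value should agree with the z.test() output. Its confidence interval is a Wilson score interval, so it will differ slightly from the one z.test() reports. The names `res` and `z_abs` are illustrative.

```r
# Score test for one proportion, no continuity correction
res <- prop.test(x = 26, n = 232, p = 0.115,
                 alternative = "two.sided", correct = FALSE)

# The chi-squared statistic (df = 1) is the square of the z statistic
z_abs <- sqrt(unname(res$statistic))

z_abs
res$p.value
```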

3.3 Elevated pulse rate

Is the active pulse rate higher than 100 bpm, the high end of the resting pulse rate range?
- What kind of variable is this?
- What test should you do?
- Directional or non-directional?
- Present the results of the test
- Write your conclusions: Is the active pulse rate higher than 100 bpm, the high end of the resting pulse rate range?

t.test(x = Pulse$Active,
       alternative = "greater",
       mu = 100, 
       conf.level = .95)


    One Sample t-test

data:  Pulse$Active
t = -7.0432, df = 231, p-value = 1
alternative hypothesis: true mean is greater than 100
95 percent confidence interval:
 89.25683      Inf
sample estimates:
mean of x 
 91.29741
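The p-value of 1 follows from the direction of the test: the sample mean is below 100, so almost all of the t distribution lies above the observed statistic. A small sketch using only the numbers printed above (the variable names are illustrative) makes this visible and recovers the one-sided lower confidence bound.

```r
# Values copied from the t.test output above
t_stat <- -7.0432
df     <- 231
x_bar  <- 91.29741
mu0    <- 100

# Upper-tail p-value for alternative = "greater"
p_greater <- pt(t_stat, df = df, lower.tail = FALSE)

# Lower-tail p-value, which alternative = "less" would have reported
p_less <- pt(t_stat, df = df, lower.tail = TRUE)

# Standard error recovered from t = (x_bar - mu0) / SE
se <- (x_bar - mu0) / t_stat

# One-sided 95% lower confidence bound: x_bar - t_crit * SE
lower <- x_bar - qt(0.95, df = df) * se

c(p_greater = p_greater, p_less = p_less, lower = lower)
```

A directional test in the wrong direction cannot reject the null hypothesis, no matter how far the sample mean is from 100.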

3.4 Height among men

Is the height of men in this sample different from the US average of 5 feet 9 inches?
- What kind of variable is this?
- What test should you do?
- Directional or non-directional?
- Present the results of the test
- Write your conclusions: Is the height of men in this sample different from the US average of 5 feet 9 inches?

PulseM <- Pulse %>%
    filter(Sex == 0)

t.test(x = PulseM$Hgt,
       alternative = "two.sided",
       mu = 69, 
       conf.level = .95)


    One Sample t-test

data:  PulseM$Hgt
t = 8.5883, df = 121, p-value = 3.622e-14
alternative hypothesis: true mean is not equal to 69
95 percent confidence interval:
 70.46958 71.35009
sample estimates:
mean of x 
 70.90984

library(BSDA)
sd(PulseM$Hgt)

[1] 2.456242

z.test(x = PulseM$Hgt, 
       alternative = "two.sided",
       mu = 69, 
       sigma.x = 3,
       conf.level = .95)


    One-sample z-Test

data:  PulseM$Hgt
z = 7.0316, p-value = 2.042e-12
alternative hypothesis: true mean is not equal to 69
95 percent confidence interval:
 70.37750 71.44218
sample estimates:
mean of x 
 70.90984
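The t and z statistics differ only in which standard deviation goes into the standard error. The sketch below rebuilds both from the summary values printed above. Here sigma = 3 is simply the value passed to z.test() in this lab (its source is not stated), and the variable names are illustrative.

```r
# Summary values copied from the output above
n     <- 122        # df + 1 from the t-test
x_bar <- 70.90984   # sample mean height of men
s     <- 2.456242   # sample SD from sd(PulseM$Hgt)
sigma <- 3          # population SD assumed in the z.test() call
mu0   <- 69         # 5 feet 9 inches

# t uses the sample SD; z uses the assumed population SD
t_stat <- (x_bar - mu0) / (s / sqrt(n))
z_stat <- (x_bar - mu0) / (sigma / sqrt(n))

# Two-sided p-values
p_t <- 2 * pt(-abs(t_stat), df = n - 1)
p_z <- 2 * pnorm(-abs(z_stat))

round(c(t = t_stat, z = z_stat), 3)
```

Because the assumed sigma is larger than the sample SD, the z statistic is smaller than the t statistic, but both lead to the same conclusion.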