Is active pulse rate higher among smokers than non-smokers?
library(BSDA)
Loading required package: lattice
Attaching package: 'BSDA'
The following object is masked from 'package:datasets':
Orange
ztest1 <- z.test(x = Pulse_smoke$Active, y = Pulse_nosmoke$Active,
                 sigma.x = sd(Pulse_smoke$Active),
                 sigma.y = sd(Pulse_nosmoke$Active),
                 alternative = "greater")
ztest1
Two-sample z-Test
data: Pulse_smoke$Active and Pulse_nosmoke$Active
z = 1.7668, p-value = 0.03863
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
0.4791124 NA
sample estimates:
mean of x mean of y
97.46154 90.51942
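Because `sigma.x` and `sigma.y` are set to the sample standard deviations, the z statistic here is the difference in means divided by the unpooled standard error. The following is a minimal base-R sketch of that calculation. The helper name `two_sample_z` and the simulated vectors are illustrative assumptions; they are not part of the lab, the BSDA package, or the Pulse data.

```r
# Hand computation of a two-sample z statistic that plugs in sample SDs.
# `two_sample_z` is a hypothetical helper, not a BSDA function.
two_sample_z <- function(x, y,
                         alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  # Unpooled standard error of the difference in means
  se <- sqrt(var(x) / length(x) + var(y) / length(y))
  z <- (mean(x) - mean(y)) / se
  p <- switch(alternative,
    greater   = pnorm(z, lower.tail = FALSE),
    less      = pnorm(z),
    two.sided = 2 * pnorm(abs(z), lower.tail = FALSE)
  )
  list(z = z, p.value = p)
}

# Simulated stand-in data (NOT the Pulse data), with unequal group sizes
set.seed(1)
x <- rnorm(26, mean = 97, sd = 20)
y <- rnorm(206, mean = 91, sd = 18)

res <- two_sample_z(x, y, alternative = "greater")
res$z
res$p.value

# Welch's t uses the same statistic; only the reference distribution differs
welch <- t.test(x, y, alternative = "greater", var.equal = FALSE)
unname(welch$statistic)
```

This is why the z statistic for active pulse rate (1.7668) is identical to the Welch t statistic for the same variables, while the p-values differ: one is read from the standard normal distribution and the other from a t distribution.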
ttest1 <- t.test(x = Pulse_smoke$Active, y = Pulse_nosmoke$Active,
                 alternative = "greater", var.equal = TRUE)
ttest1
Two Sample t-test
data: Pulse_smoke$Active and Pulse_nosmoke$Active
t = 1.7806, df = 230, p-value = 0.03815
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
0.5034263 Inf
sample estimates:
mean of x mean of y
97.46154 90.51942
ttest1b <- t.test(x = Pulse_smoke$Active, y = Pulse_nosmoke$Active,
                  alternative = "greater", var.equal = FALSE)
ttest1b
Welch Two Sample t-test
data: Pulse_smoke$Active and Pulse_nosmoke$Active
t = 1.7668, df = 31.509, p-value = 0.04348
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
0.2833517 Inf
sample estimates:
mean of x mean of y
97.46154 90.51942
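The Welch test reports fractional degrees of freedom (31.509 here, against 230 for the pooled test) because it uses the Welch-Satterthwaite approximation rather than n1 + n2 - 2. A minimal sketch of that approximation in base R follows. The helper name `welch_df` and the simulated vectors are assumptions for illustration, not the Pulse data.

```r
# Welch-Satterthwaite degrees of freedom, computed by hand.
# `welch_df` is a hypothetical helper name used for illustration.
welch_df <- function(x, y) {
  vx <- var(x) / length(x)  # squared standard error of mean(x)
  vy <- var(y) / length(y)  # squared standard error of mean(y)
  (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
}

# Simulated stand-in data (NOT the Pulse data), with unequal group sizes
set.seed(2)
x <- rnorm(26, mean = 97, sd = 20)
y <- rnorm(206, mean = 91, sd = 18)

df_hand <- welch_df(x, y)
df_r <- unname(t.test(x, y, var.equal = FALSE)$parameter)
df_hand
df_r
```

When one group is much smaller than the other, the approximate degrees of freedom tend to sit close to the smaller group's size, which makes the Welch test more conservative than the pooled test.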
Do smokers weigh less than non-smokers?
ztest2 <- z.test(x = Pulse_smoke$Wgt, y = Pulse_nosmoke$Wgt,
                 sigma.x = sd(Pulse_smoke$Wgt),
                 sigma.y = sd(Pulse_nosmoke$Wgt),
                 alternative = "less")
ztest2
Two-sample z-Test
data: Pulse_smoke$Wgt and Pulse_nosmoke$Wgt
z = 2.1167, p-value = 0.9829
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
NA 28.25999
sample estimates:
mean of x mean of y
172.0385 156.1359
ttest2 <- t.test(x = Pulse_smoke$Wgt, y = Pulse_nosmoke$Wgt,
                 alternative = "less", var.equal = TRUE)
ttest2
Two Sample t-test
data: Pulse_smoke$Wgt and Pulse_nosmoke$Wgt
t = 2.4256, df = 230, p-value = 0.992
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
-Inf 26.73016
sample estimates:
mean of x mean of y
172.0385 156.1359
ttest2b <- t.test(x = Pulse_smoke$Wgt, y = Pulse_nosmoke$Wgt,
                  alternative = "less", var.equal = FALSE)
ttest2b
Welch Two Sample t-test
data: Pulse_smoke$Wgt and Pulse_nosmoke$Wgt
t = 2.1167, df = 29.613, p-value = 0.9786
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
-Inf 28.65903
sample estimates:
mean of x mean of y
172.0385 156.1359
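The p-values for weight are close to 1 because the observed difference runs opposite to the stated alternative: the smokers' sample mean (172.0) is above the non-smokers' (156.1), so the test statistics are positive, while `alternative = "less"` reads off the lower-tail area. The sketch below reproduces those tail areas from the reported statistics using only base R. The variable names are illustrative.

```r
# Test statistics copied from the weight output above
z_wgt <- 2.1167
t_wgt <- 2.4256
df_pooled <- 230

# alternative = "less": the p-value is the lower-tail area
p_less_z <- pnorm(z_wgt)
p_less_t <- pt(t_wgt, df = df_pooled)

# alternative = "greater": the upper-tail area, i.e. 1 minus the above
p_greater_z <- pnorm(z_wgt, lower.tail = FALSE)
p_greater_t <- pt(t_wgt, df = df_pooled, lower.tail = FALSE)

round(c(z = p_less_z, t = p_less_t), 4)
round(c(z = p_greater_z, t = p_greater_t), 4)
```

The two one-sided p-values always sum to 1 for the same statistic, so a very large p-value under one direction signals that the data point the other way. The direction of the hypothesis should still be fixed before looking at the data.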
Do smokers and non-smokers exercise the same amount?
ztest3 <- z.test(x = Pulse_smoke$Exercise, y = Pulse_nosmoke$Exercise,
                 sigma.x = sd(Pulse_smoke$Exercise),
                 sigma.y = sd(Pulse_nosmoke$Exercise),
                 alternative = "two.sided")
ztest3
Two-sample z-Test
data: Pulse_smoke$Exercise and Pulse_nosmoke$Exercise
z = -3.4643, p-value = 0.0005317
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.7875596 -0.2184150
sample estimates:
mean of x mean of y
1.807692 2.310680
ttest3 <- t.test(x = Pulse_smoke$Exercise, y = Pulse_nosmoke$Exercise,
                 alternative = "two.sided", var.equal = TRUE)
ttest3
Two Sample t-test
data: Pulse_smoke$Exercise and Pulse_nosmoke$Exercise
t = -3.3437, df = 230, p-value = 0.0009651
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.7993817 -0.2065929
sample estimates:
mean of x mean of y
1.807692 2.310680
ttest3b <- t.test(x = Pulse_smoke$Exercise, y = Pulse_nosmoke$Exercise,
                  alternative = "two.sided", var.equal = FALSE)
ttest3b
Welch Two Sample t-test
data: Pulse_smoke$Exercise and Pulse_nosmoke$Exercise
t = -3.4643, df = 32.314, p-value = 0.001521
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.7986223 -0.2073523
sample estimates:
mean of x mean of y
1.807692 2.310680
library(coin)
Loading required package: survival
median_test(Exercise ~ as.factor(Smoke), data = Pulse)
Asymptotic Two-Sample Brown-Mood Median Test
data: Exercise by as.factor(Smoke) (0, 1)
Z = 3.0223, p-value = 0.002509
alternative hypothesis: true mu is not equal to 0
wilcox.test(Exercise ~ as.factor(Smoke), data = Pulse)
Wilcoxon rank sum test with continuity correction
data: Exercise by as.factor(Smoke)
W = 3657, p-value = 0.001026
alternative hypothesis: true location shift is not equal to 0
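The `W` reported by `wilcox.test()` is the sum of the first group's ranks in the combined sample, minus the smallest value that sum could take. With the formula interface, the first factor level (here `0`) is expected to play the role of the first group. The sketch below computes `W` by hand in base R. The helper name `rank_sum_w` and the simulated tied data are assumptions for illustration, not the Pulse data.

```r
# Rank-sum statistic W as reported by wilcox.test(), computed by hand.
# `rank_sum_w` is a hypothetical helper name.
rank_sum_w <- function(x, y) {
  r <- rank(c(x, y))  # ties receive midranks
  n_x <- length(x)
  # Sum of the first group's ranks minus its minimum possible value
  sum(r[seq_len(n_x)]) - n_x * (n_x + 1) / 2
}

# Simulated stand-in data with many ties (NOT the Pulse data)
set.seed(3)
grp0 <- sample(1:3, 40, replace = TRUE)
grp1 <- sample(1:3, 15, replace = TRUE, prob = c(0.5, 0.3, 0.2))

w_hand <- rank_sum_w(grp0, grp1)
w_r <- unname(wilcox.test(grp0, grp1, exact = FALSE)$statistic)
w_hand
w_r
```

Because the statistic depends only on ranks, it is unaffected by skew or outliers in the raw values, which is one reason to consider it when the t-test assumptions look doubtful.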
Source Code
---
title: "BTS 510 Lab 9"
format:
  html:
    embed-resources: true
    self-contained-math: true
    html-math-method: katex
    number-sections: true
    toc: true
    code-tools: true
    code-block-bg: true
    code-block-border-left: "#31BAE9"
---

```{r}
#| label: setup
library(tidyverse)
set.seed(12345)
theme_set(theme_classic(base_size = 16))
```

## Learning objectives

* *Interpret* tests comparing **two unrelated samples**
* *Summarize data* using **contingency tables**
* *Describe* different study designs for **contingency tables**

## Data

* `Pulse` dataset from the **Stat2Data** package
    * A dataset with $n$ = 232 observations on the following 7 variables.
        * `Active`: Pulse rate (beats per minute) after exercise
        * `Rest`: Resting pulse rate (beats per minute)
        * `Smoke`: 1=smoker or 0=nonsmoker
        * `Sex`: 1=female or 0=male
        * `Exercise`: Typical hours of exercise (per week)
        * `Hgt`: Height (in inches)
        * `Wgt`: Weight (in pounds)

## Tasks

* Make plots of variables as needed (e.g., to assess assumptions)
* Conduct a $z$-test, $t$-test, and Welch's $t$-test
    * What is/are your conclusion(s) based on the tests?
    * Are the assumptions met?
        * e.g., large enough sample to justify $z$ test using sample variance
        * e.g., equal variances in both groups
    * Which test seems the best choice? (Don't make this decision based on what is significant -- here or elsewhere)
    * Do you think a non-parametric test might be a good option?

### Some useful code

* To split the dataset into `Smoke` = 0 and `Smoke` = 1
    * There are other ways to do this, so you don't *need* to use this code

```{r}
library(Stat2Data)
data(Pulse)
library(tidyverse)
Pulse_smoke <- Pulse %>% filter(Smoke == 1)
Pulse_nosmoke <- Pulse %>% filter(Smoke == 0)
head(Pulse_smoke)
head(Pulse_nosmoke)
```

* Use `alternative = "greater"` if $H_1$: $\mu_1 > \mu_2$
* Use `alternative = "less"` if $H_1$: $\mu_1 < \mu_2$
    * Where $\mu_1$ is the mean for the first-entered group (`x`)
    * The order you enter them (`x` vs `y`) doesn't matter, **just make sure you set up the directional hypothesis accordingly**

### Active pulse rate

```{r}
mean(Pulse_smoke$Active)
mean(Pulse_nosmoke$Active)
var(Pulse_smoke$Active)
var(Pulse_nosmoke$Active)
ggplot(data = Pulse_nosmoke, aes(x = Active)) +
  geom_histogram(fill = "red", alpha = 0.5, bins = 30) +
  geom_histogram(data = Pulse_smoke, aes(x = Active), fill = "black", bins = 30)
```

* Is active pulse rate **higher** among smokers than non-smokers?

```{r}
library(BSDA)
ztest1 <- z.test(x = Pulse_smoke$Active, y = Pulse_nosmoke$Active,
                 sigma.x = sd(Pulse_smoke$Active),
                 sigma.y = sd(Pulse_nosmoke$Active),
                 alternative = "greater")
ztest1
ttest1 <- t.test(x = Pulse_smoke$Active, y = Pulse_nosmoke$Active,
                 alternative = "greater", var.equal = TRUE)
ttest1
ttest1b <- t.test(x = Pulse_smoke$Active, y = Pulse_nosmoke$Active,
                  alternative = "greater", var.equal = FALSE)
ttest1b
```

### Weight

```{r}
mean(Pulse_smoke$Wgt)
mean(Pulse_nosmoke$Wgt)
var(Pulse_smoke$Wgt)
var(Pulse_nosmoke$Wgt)
ggplot(data = Pulse_nosmoke, aes(x = Wgt)) +
  geom_histogram(fill = "red", alpha = 0.5, bins = 30) +
  geom_histogram(data = Pulse_smoke, aes(x = Wgt), fill = "black", bins = 30)
```

* Do smokers weigh **less than** non-smokers?

```{r}
ztest2 <- z.test(x = Pulse_smoke$Wgt, y = Pulse_nosmoke$Wgt,
                 sigma.x = sd(Pulse_smoke$Wgt),
                 sigma.y = sd(Pulse_nosmoke$Wgt),
                 alternative = "less")
ztest2
ttest2 <- t.test(x = Pulse_smoke$Wgt, y = Pulse_nosmoke$Wgt,
                 alternative = "less", var.equal = TRUE)
ttest2
ttest2b <- t.test(x = Pulse_smoke$Wgt, y = Pulse_nosmoke$Wgt,
                  alternative = "less", var.equal = FALSE)
ttest2b
```

### Exercise

```{r}
mean(Pulse_smoke$Exercise)
mean(Pulse_nosmoke$Exercise)
var(Pulse_smoke$Exercise)
var(Pulse_nosmoke$Exercise)
ggplot(data = Pulse_nosmoke, aes(x = Exercise)) +
  geom_histogram(fill = "red", alpha = 0.5, bins = 30) +
  geom_histogram(data = Pulse_smoke, aes(x = Exercise), fill = "black", bins = 30)
```

* Do smokers and non-smokers exercise the **same** amount?

```{r}
ztest3 <- z.test(x = Pulse_smoke$Exercise, y = Pulse_nosmoke$Exercise,
                 sigma.x = sd(Pulse_smoke$Exercise),
                 sigma.y = sd(Pulse_nosmoke$Exercise),
                 alternative = "two.sided")
ztest3
ttest3 <- t.test(x = Pulse_smoke$Exercise, y = Pulse_nosmoke$Exercise,
                 alternative = "two.sided", var.equal = TRUE)
ttest3
ttest3b <- t.test(x = Pulse_smoke$Exercise, y = Pulse_nosmoke$Exercise,
                  alternative = "two.sided", var.equal = FALSE)
ttest3b
```

```{r}
library(coin)
median_test(Exercise ~ as.factor(Smoke), data = Pulse)
wilcox.test(Exercise ~ as.factor(Smoke), data = Pulse)
```