BTS 510 Lab 7

library(tidyverse)
set.seed(12345)
theme_set(theme_classic(base_size = 16))

1 Learning objectives

  • Compare and contrast point estimates and interval estimates
  • Construct confidence intervals
  • Interpret confidence intervals

2 Vary population distribution, sample size, confidence interval level

2.1 Normally distributed: population \sim \mathcal{N}(\mu = 10, \sigma^2 = 25)

  • Two sample sizes: 30 and 100
  • Note that rnorm() takes the standard deviation (sd), not the variance, as its argument: sd = 5 corresponds to \sigma^2 = 25
norm30 <- rnorm(n = 30, mean = 10, sd = 5)
norm100 <- rnorm(n = 100, mean = 10, sd = 5)
  • Calculate the means, variances, and standard deviations for each sample
mean(norm30)
[1] 10.39404
mean(norm100)
[1] 11.11517
var(norm30)
[1] 22.00669
var(norm100)
[1] 34.24083
sd(norm30)
[1] 4.691129
sd(norm100)
[1] 5.851567
  • Construct the 95% and 99% confidence intervals for each mean

[\bar{X} - z_{1-\alpha/2}\ \sqrt{\frac{\sigma^2}{n}}, \bar{X} + z_{1-\alpha/2}\ \sqrt{\frac{\sigma^2}{n}}]
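As a worked check of the formula, plug in the n = 30 summaries from above (\bar{X} = 10.394, s^2 = 22.007). As in the code, the sample variance stands in for \sigma^2, and z_{0.975} \approx 1.96:

10.394 \pm 1.96\sqrt{\frac{22.007}{30}} = 10.394 \pm 1.96 \times 0.8565 = 10.394 \pm 1.679 = [8.715,\ 12.073]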

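  • I’m just having R make the calculations for me, rather than doing them by hand
    • Because \sigma^2 is usually unknown in practice, the code plugs in the sample variance, var(), in place of \sigma^2
    • I’m also doing it in several steps so you can see where each piece comes from, but you can do it all in one step if you’re confident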
# 95% CI: z_{0.975} = qnorm(0.975), about 1.96
norm30_95moe <- 1.96 * sqrt(var(norm30)/30)      # margin of error
norm30_95lcl <- mean(norm30) - norm30_95moe      # lower confidence limit
norm30_95ucl <- mean(norm30) + norm30_95moe      # upper confidence limit

# 99% CI: z_{0.995} = qnorm(0.995), about 2.576
norm30_99moe <- 2.576 * sqrt(var(norm30)/30)
norm30_99lcl <- mean(norm30) - norm30_99moe
norm30_99ucl <- mean(norm30) + norm30_99moe

norm100_95moe <- 1.96 * sqrt(var(norm100)/100)
norm100_95lcl <- mean(norm100) - norm100_95moe
norm100_95ucl <- mean(norm100) + norm100_95moe

norm100_99moe <- 2.576 * sqrt(var(norm100)/100)
norm100_99lcl <- mean(norm100) - norm100_99moe
norm100_99ucl <- mean(norm100) + norm100_99moe
  • You can look at them in the output, or you can have R print them in your Quarto document
  • (Look at the code to see how this was done)
    • n = 30, 95%: [8.715, 12.073]
    • n = 30, 99%: [8.188, 12.600]
    • n = 100, 95%: [9.968, 12.262]
    • n = 100, 99%: [9.608, 12.623]
  • How do the confidence intervals vary with 1) sample size (i.e., n = 30 vs n = 100) and 2) confidence level (i.e., 95% vs 99%)?
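The step-by-step calculation above can be wrapped in a small function and reused for the remaining distributions. This is a sketch added for illustration, not part of the original lab: `z_ci()` is a hypothetical helper name, it uses the sample variance in place of σ² just like the code above, and it gets the critical value from `qnorm()` instead of typing it in. For comparison it also prints the interval from `t.test()`, which uses the t distribution with n − 1 degrees of freedom instead of z and is therefore somewhat wider.

```r
# Hypothetical helper (not from the original lab): z-based confidence interval
# for a mean, plugging in the sample variance var(x) for sigma^2
z_ci <- function(x, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)       # z_{1 - alpha/2}: about 1.96 (95%) or 2.576 (99%)
  moe <- z * sqrt(var(x) / length(x))   # margin of error
  c(lcl = mean(x) - moe, ucl = mean(x) + moe)
}

# Small made-up sample, used only to demonstrate the calls
x <- c(8, 12, 9, 11, 10)
z_ci(x, level = 0.95)
z_ci(x, level = 0.99)

# t-based interval for comparison (stats::t.test); wider than z, especially for small n
t.test(x, conf.level = 0.95)$conf.int
```

In the lab you would call, for example, `z_ci(norm30, level = 0.99)`. With n = 5 the z and t intervals differ noticeably; with n = 30 or 100 the gap is much smaller.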

2.2 Bernoulli distributed: population \sim B(0.3)

  • Two sample sizes: 30 and 100
bern30 <- rbinom(30, 1, 0.3)
bern100 <- rbinom(100, 1, 0.3)
ggplot(data = data.frame(bern30), aes(x = bern30)) +
    geom_histogram(binwidth = 1)   # one bar per outcome (0 or 1)

  • Calculate the means, variances, and standard deviations for each sample
mean(bern30)
[1] 0.4
sd(bern30)
[1] 0.4982729
var(bern30)
[1] 0.2482759
mean(bern100)
[1] 0.33
sd(bern100)
[1] 0.4725816
var(bern100)
[1] 0.2233333
  • Construct the 95% and 99% confidence intervals for each mean

[\bar{X} - z_{1-\alpha/2}\ \sqrt{\frac{\sigma^2}{n}}, \bar{X} + z_{1-\alpha/2}\ \sqrt{\frac{\sigma^2}{n}}]
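For 0/1 data the sample mean is the sample proportion \hat{p}, and because x_i^2 = x_i, the sample variance that var() reports (dividing by n − 1) simplifies. For the n = 30 sample, where \hat{p} = 0.4:

s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \hat{p})^2 = \frac{n}{n-1}\,\hat{p}(1-\hat{p}) = \frac{30}{29}(0.4)(0.6) \approx 0.2483

This matches var(bern30) above. The population values for comparison are \mu = p = 0.3 and \sigma^2 = p(1-p) = 0.21.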

# 95% CI: z_{0.975} = qnorm(0.975), about 1.96
bern30_95moe <- 1.96 * sqrt(var(bern30)/30)      # margin of error
bern30_95lcl <- mean(bern30) - bern30_95moe      # lower confidence limit
bern30_95ucl <- mean(bern30) + bern30_95moe      # upper confidence limit

# 99% CI: z_{0.995} = qnorm(0.995), about 2.576
bern30_99moe <- 2.576 * sqrt(var(bern30)/30)
bern30_99lcl <- mean(bern30) - bern30_99moe
bern30_99ucl <- mean(bern30) + bern30_99moe

bern100_95moe <- 1.96 * sqrt(var(bern100)/100)
bern100_95lcl <- mean(bern100) - bern100_95moe
bern100_95ucl <- mean(bern100) + bern100_95moe

bern100_99moe <- 2.576 * sqrt(var(bern100)/100)
bern100_99lcl <- mean(bern100) - bern100_99moe
bern100_99ucl <- mean(bern100) + bern100_99moe
  • You can look at them in the output, or you can have R print them in your Quarto document
  • (Look at the code to see how this was done)
    • n = 30, 95%: [0.222, 0.578]
    • n = 30, 99%: [0.166, 0.634]
    • n = 100, 95%: [0.237, 0.423]
    • n = 100, 99%: [0.208, 0.452]
  • How do the confidence intervals vary with 1) sample size (i.e., n = 30 vs n = 100) and 2) confidence level (i.e., 95% vs 99%)?
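As an alternative to writing out each interval, all sample-by-level combinations can be built in one step. This is a base-R sketch, not the lab's own code: `samples` is a hypothetical list of two short made-up 0/1 vectors, and `ci_row()` is a hypothetical helper. In the lab you would use `list(n30 = bern30, n100 = bern100)` for `samples`.

```r
# Hypothetical inputs: two short made-up 0/1 samples
samples <- list(a = c(0, 1, 0, 0, 1),
                b = c(0, 1, 0, 0, 0, 0, 1, 0, 0, 1))

# Every combination of sample and confidence level
grid <- expand.grid(sample = names(samples), level = c(0.95, 0.99),
                    stringsAsFactors = FALSE)

# z-based interval for one sample at one confidence level
ci_row <- function(sample, level) {
  x <- samples[[sample]]
  z <- qnorm(1 - (1 - level) / 2)
  moe <- z * sqrt(var(x) / length(x))
  c(mean = mean(x), lcl = mean(x) - moe, ucl = mean(x) + moe)
}

# mapply() returns one column per row of grid; t() turns those into rows
results <- cbind(grid, t(mapply(ci_row, grid$sample, grid$level, USE.NAMES = FALSE)))
results
```

Because `qnorm()` gives the exact critical values rather than 1.96 and 2.576, the bounds can differ from the step-by-step version in the last reported digit at most.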

2.3 Binomial distributed: population \sim Bin(50, 0.3)

  • Two sample sizes: 30 and 100
binom30 <- rbinom(30, 50, 0.3)
binom100 <- rbinom(100, 50, 0.3)
  • Calculate the means, variances, and standard deviations for each sample
  • Construct the 95% and 99% confidence intervals for each mean

[\bar{X} - z_{1-\alpha/2}\ \sqrt{\frac{\sigma^2}{n}}, \bar{X} + z_{1-\alpha/2}\ \sqrt{\frac{\sigma^2}{n}}]
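To judge whether your intervals capture the true mean, compare them with the population values. For Bin(50, 0.3), where 50 is the number of trials per draw (not the sample size of 30 or 100):

\mu = 50 \times 0.3 = 15, \qquad \sigma^2 = 50 \times 0.3 \times (1 - 0.3) = 10.5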

  • How do the confidence intervals vary with 1) sample size (i.e., n = 30 vs n = 100) and 2) confidence level (i.e., 95% vs 99%)?

2.4 Poisson distributed: population \sim Poisson(1.5)

  • Two sample sizes: 30 and 100
pois30 <- rpois(n = 30, lambda = 1.5)
pois100 <- rpois(n = 100, lambda = 1.5)
  • Calculate the means, variances, and standard deviations for each sample
  • Construct the 95% and 99% confidence intervals for each mean

[\bar{X} - z_{1-\alpha/2}\ \sqrt{\frac{\sigma^2}{n}}, \bar{X} + z_{1-\alpha/2}\ \sqrt{\frac{\sigma^2}{n}}]
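For a Poisson population the mean and the variance both equal \lambda, so the population values to compare your intervals with are:

\mu = \sigma^2 = \lambda = 1.5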

  • How do the confidence intervals vary with 1) sample size (i.e., n = 30 vs n = 100) and 2) confidence level (i.e., 95% vs 99%)?

3 Summary

  • How do the confidence intervals vary with sample size?

  • How do the confidence intervals vary with confidence level?

  • How do the confidence intervals vary with population distribution?

  • Did you notice anything else?
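One way to check the meaning of a 95% confidence level is by simulation: draw many samples from a known population, build an interval from each, and count how often the interval contains the true mean. This sketch is an addition, not part of the original lab. `coverage()` is a hypothetical helper, the seed and the 2,000 replications are arbitrary choices, and the Poisson(1.5) population is just one example.

```r
set.seed(510)  # arbitrary seed so the result is reproducible

# Hypothetical helper: proportion of z-based intervals that contain the true mean
#   draw: function(n) returning a sample of size n from the population
#   mu:   true population mean
coverage <- function(draw, mu, n, level = 0.95, reps = 2000) {
  z <- qnorm(1 - (1 - level) / 2)
  hits <- replicate(reps, {
    x <- draw(n)
    moe <- z * sqrt(var(x) / n)
    (mean(x) - moe <= mu) && (mu <= mean(x) + moe)
  })
  mean(hits)
}

# Poisson(1.5) population: the true mean is lambda = 1.5
cov30 <- coverage(function(n) rpois(n, lambda = 1.5), mu = 1.5, n = 30)
cov100 <- coverage(function(n) rpois(n, lambda = 1.5), mu = 1.5, n = 100)
c(n30 = cov30, n100 = cov100)
```

Swapping in `rnorm`, `rbinom`, or a different `level` lets you compare coverage across the populations and confidence levels in this lab. Each coverage estimate is itself random, so expect values near the nominal level rather than exactly at it.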