Introduction to Biostatistics

1 Learning objectives

  • Describe the logic of hypothesis testing
  • Interpret tests comparing one sample to a hypothesized value
  • Relate hypothesis testing to confidence intervals
  • Recognize when to use a nonparametric test

2 Hypothesis testing

2.1 History of hypothesis testing

  • William Sealy Gosset (“Student”)
    • \(t\) distribution, \(t\)-test
  • Sir Ronald Fisher
    • “Significance testing”
    • \(p\)-values, \(\alpha\) = .05, degrees of freedom
  • Jerzy Neyman and Egon Pearson (son of Karl Pearson)
    • “Hypothesis testing”
    • Type I and Type II errors (\(\beta\)), power

2.2 Null and alternative hypotheses

  • Null hypothesis = no effect in population
    • \(H_0\)
  • Alternative hypothesis = effect in population
    • \(H_1\) or \(H_A\)
  • If there is enough evidence, we reject \(H_0\)
    • Otherwise, we retain \(H_0\) (fail to reject it); this does not prove \(H_0\) is true
    • The test evaluates only \(H_0\), so we never reject \(H_A\)

2.3 Directional vs non-directional tests

  • Directional (one-tailed) tests
    • \(H_0\): \(\mu \le \mu_0\)
    • \(H_1\): \(\mu > \mu_0\)
  • Non-directional (two-tailed) tests
    • \(H_0\): \(\mu = \mu_0\)
    • \(H_1\): \(\mu \ne \mu_0\)

2.4 One-tailed vs two-tailed tests (\(\alpha\) = .05)

  • One-tailed test (\(H_1: \mu > 0\))
library(ggplot2)

# Standard normal density; the 95% below the one-tailed critical value is shaded
ggplot(data.frame(x = c(-4, 4)), aes(x)) +
  stat_function(fun = dnorm,
                geom = "line",
                xlim = c(-4, 4)) +
  stat_function(fun = dnorm,
                geom = "area",
                fill = "steelblue",
                xlim = c(-4, 1.645)) +
  annotate("text", x = 0, y = 0.1, label = "95%", color = "white", size = 6) +
  annotate("text", x = 3, y = 0.1, label = "5%", size = 6) +
  geom_vline(xintercept = 1.645, linetype = "dashed", linewidth = 1) +
  annotate("text", x = 2.25, y = 0.35, label = "z=1.645", size = 6) +
  xlim(-4, 4) +
  labs(x = "z", y = "f(z)")

  • Two-tailed test (\(H_1: \mu \ne 0\))
ggplot(data.frame(x = c(-4, 4)), aes(x)) +
  stat_function(fun = dnorm,
                geom = "line",
                xlim = c(-4, 4)) +
  stat_function(fun = dnorm,
                geom = "area",
                fill = "steelblue",
                xlim = c(-1.96, 1.96)) +
  annotate("text", x = 0, y = 0.1, label = "95%", color = "white", size = 6) +
  annotate("text", x = 3, y = 0.1, label = "2.5%", size = 6) +
  annotate("text", x = -3, y = 0.1, label = "2.5%", size = 6) +
  geom_vline(xintercept = 1.96, linetype = "dashed", linewidth = 1) +
  geom_vline(xintercept = -1.96, linetype = "dashed", linewidth = 1) +
  annotate("text", x = 2.6, y = 0.35, label = "z=+1.96", size = 6) +
  annotate("text", x = -2.6, y = 0.35, label = "z=-1.96", size = 6) +
  xlim(-4, 4) +
  labs(x = "z", y = "f(z)")

2.5 Critical values and decisions

  • Critical value: the value of the test statistic’s distribution that puts \(\alpha\) in the tail(s)
    • e.g., \(\pm\)1.96 for a two-tailed \(z\)-test and 1.645 for a one-tailed \(z\)-test at \(\alpha\) = .05
  • If the test statistic is more extreme than the critical value
    • Reject \(H_0\)
  • Otherwise, retain \(H_0\)
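The critical values above are quantiles of the standard normal distribution, so R’s `qnorm()` can compute them directly instead of reading them from a table. A minimal sketch, assuming \(\alpha\) = .05:

```r
alpha <- 0.05

# Two-tailed: alpha is split across both tails, so use the 1 - alpha/2 quantile
z_two <- qnorm(1 - alpha / 2)   # approximately 1.96

# One-tailed (upper tail): all of alpha goes in one tail
z_one <- qnorm(1 - alpha)       # approximately 1.645

round(c(two_tailed = z_two, one_tailed = z_one), 3)
```

Changing `alpha` to .01 or .001 gives the critical values for the other conventional levels.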

2.6 Rejection region(s)

  • One-tailed test (\(H_1: \mu > 0\))
ggplot(data.frame(x = c(-4, 4)), aes(x)) +
  stat_function(fun = dnorm,
                geom = "line",
                xlim = c(-4, 4)) +
  stat_function(fun = dnorm,
                geom = "area",
                fill = "steelblue",
                xlim = c(-4, 1.645)) +
  annotate("text", x = 0, y = 0.1, label = "95%", color = "white", size = 6) +
  annotate("text", x = 3, y = 0.1, label = "5%", size = 6) +
  geom_vline(xintercept = 1.645, linetype = "dashed", linewidth = 1) +
  annotate("text", x = 2.25, y = 0.35, label = "z=1.645", size = 6) +
  annotate("text", x = 3, y = 0.25, label = "Rejection \nregion", size = 6) +
  xlim(-4, 4) +
  labs(x = "z", y = "f(z)")

  • Two-tailed test (\(H_1: \mu \ne 0\))
ggplot(data.frame(x = c(-4, 4)), aes(x)) +
  stat_function(fun = dnorm,
                geom = "line",
                xlim = c(-4, 4)) +
  stat_function(fun = dnorm,
                geom = "area",
                fill = "steelblue",
                xlim = c(-1.96, 1.96)) +
  annotate("text", x = 0, y = 0.1, label = "95%", color = "white", size = 6) +
  annotate("text", x = 3, y = 0.1, label = "2.5%", size = 6) +
  annotate("text", x = -3, y = 0.1, label = "2.5%", size = 6) +
  geom_vline(xintercept = 1.96, linetype = "dashed", linewidth = 1) +
  geom_vline(xintercept = -1.96, linetype = "dashed", linewidth = 1) +
  annotate("text", x = 2.6, y = 0.35, label = "z=+1.96", size = 6) +
  annotate("text", x = -2.6, y = 0.35, label = "z=-1.96", size = 6) +
  annotate("text", x = 3, y = 0.25, label = "Rejection \nregion", size = 6) +
  annotate("text", x = -3, y = 0.25, label = "Rejection \nregion", size = 6) +
  xlim(-4, 4) +
  labs(x = "z", y = "f(z)")

2.7 Statistical errors

  • Any statistical decision has some probability of being an error
    • Type I error: Detecting an effect in the sample that doesn’t exist in the population
    • Type II error: Not detecting an effect in the sample that does exist in the population
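|                     | Effect in population      | No effect in population   |
|---------------------|---------------------------|---------------------------|
| Effect in sample    | Correct                   | Type I error (\(\alpha\)) |
| No effect in sample | Type II error (\(\beta\)) | Correct                   |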
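The meaning of \(\alpha\) can be checked by simulation. The sketch below is illustrative only (the seed, the number of simulated samples, and the population values \(\mu\) = 100, \(\sigma\) = 15, \(n\) = 50 are assumptions): it draws many samples from a population where \(H_0\) is true and counts how often a two-tailed \(z\)-test rejects \(H_0\). Every rejection here is a Type I error.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

n_sims <- 10000
n      <- 50
mu0    <- 100
sigma  <- 15

# Each replicate: sample from a population where H0 is TRUE (mu = 100),
# then record whether a two-tailed z-test at alpha = .05 rejects H0
reject <- replicate(n_sims, {
  x <- rnorm(n, mean = mu0, sd = sigma)
  z <- (mean(x) - mu0) / (sigma / sqrt(n))
  abs(z) > qnorm(0.975)
})

# Proportion of false rejections; should be close to alpha = .05
type1_rate <- mean(reject)
type1_rate
```

The rate lands near .05 but not exactly on it, because of simulation noise.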

2.8 Alpha (\(\alpha\))

  • \(\alpha\) = Probability of type I error
    • P(incorrectly detecting an effect that doesn’t exist in the population)
  • Typical \(\alpha\) values: .05, .01, .001
    • Correspond to 95%, 99%, 99.9% confidence levels, respectively
  • Fisher: Type I error should happen 1 time in 20 \(\rightarrow\) \(\alpha\) = .05

2.9 Beta (\(\beta\))

  • \(\beta\) = Probability of type II error
    • P(incorrectly missing an effect that exists in the population)
  • Statistical power = \(1 - \beta\)
    • Probability of correctly detecting an effect that does exist
  • Typical \(\beta\) values: .2, .1
    • Correspond to .8, .9 power, respectively
  • Cohen: Type I errors are roughly 4 times as serious as Type II errors \(\rightarrow\) \(\beta\) = 4 \(\times\) .05 = .20
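Power can be computed directly for a two-tailed \(z\)-test once we assume a true population mean. A minimal sketch, assuming \(\mu_0\) = 100, a true mean of 103 (an illustrative value), \(\sigma\) = 15, and \(n\) = 50. The sample-size line uses the common approximation \(n = \left(\frac{(z_{1-\alpha/2} + z_{1-\beta})\,\sigma}{\mu_1 - \mu_0}\right)^2\), which ignores the far rejection tail:

```r
alpha <- 0.05
sigma <- 15     # known population SD
n     <- 50
mu0   <- 100    # hypothesized mean under H0
mu1   <- 103    # assumed true mean (illustrative)

se     <- sigma / sqrt(n)
z_crit <- qnorm(1 - alpha / 2)
shift  <- (mu1 - mu0) / se   # distance of the true mean from mu0, in SE units

# Power = P(reject H0 | true mean = mu1), adding both rejection regions
power <- pnorm(-z_crit + shift) + pnorm(-z_crit - shift)
beta  <- 1 - power

# Approximate n needed for 80% power (ignores the far tail)
n_needed <- ceiling(((z_crit + qnorm(0.80)) * sigma / (mu1 - mu0))^2)

c(power = power, beta = beta, n_needed = n_needed)
```

Under these assumptions power is only about .29 (\(\beta\) about .71), and roughly 197 observations would be needed to reach .80 power.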

3 One sample tests

3.1 One sample tests

  • Compare one sample mean to a hypothesized population mean value
  • Reject \(H_0\)
    • Sample is unlikely to have come from a population with that mean
  • Retain \(H_0\)
    • Sample could have come from a population with that mean
  • Three tests here: \(z\)-test, \(t\)-test, binomial test

3.2 \(z\)-test: Assumptions

  • Data are continuous (i.e., ratio or interval)
  • Data are randomly sampled from the population
  • Data are independent
  • Data are approximately normally distributed OR the sample is large enough for the sampling distribution of the mean to be approximately normal (central limit theorem)
  • Population variance (or standard deviation) is known

3.3 \(z\)-test: Hypotheses

  • Directional (one-tailed) tests
    • \(H_0\): \(\mu \le \mu_0\)
    • \(H_1\): \(\mu > \mu_0\)
  • Non-directional (two-tailed) tests
    • \(H_0\): \(\mu = \mu_0\)
    • \(H_1\): \(\mu \ne \mu_0\)

3.4 \(z\)-test: Test statistic

\[z = \frac{\bar{X} - \mu_0}{\sigma /\sqrt{n}}\]

  • Sample mean minus hypothesized population mean, divided by standard error
    • Standard error (\(\sigma/\sqrt{n}\)) is the standard deviation of the sampling distribution of the mean

3.5 \(z\)-test: Decision

  • Determine critical value for \(\alpha\)
    • e.g., 1.96 for a two-tailed test at \(\alpha = .05\)
  • If observed \(z\) statistic is more extreme than the critical value
    • Reject \(H_0\)
  • Otherwise, retain \(H_0\)

3.6 \(z\)-test: Example 1

  • Sample of \(n\) = 50
  • Known population SD = 15
  • Does this sample come from a population with a mean of 100?
  • Two-tailed test
    • \(H_0\): \(\mu = 100\)
    • \(H_1\): \(\mu \ne 100\)
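# Simulated data: the true mean is 103, so H0 (mu = 100) is actually false.
# No seed is set, so rerunning this gives values different from those shown here.
sample1 <- rnorm(n = 50, mean = 103, sd = 15)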

3.7 \(z\)-test: Example 2

\[z = \frac{\bar{X} - \mu_0}{\sigma /\sqrt{n}}\]

mean(sample1)
[1] 105.6935
z <- (mean(sample1) - 100) / (15 / sqrt(50))
z
[1] 2.683939
  • Two-tailed test, \(\alpha\) = .05: Critical value = 1.96
    • Is the observed test statistic greater than +1.96 or lower than -1.96?
    • If yes, reject \(H_0\)

3.8 \(z\)-test: Example 3

  • Another option: z.test() function from BSDA (Basic Statistics and Data Analysis) package
    • x: Data
    • alternative: “two.sided” (default), “greater”, or “less”
    • mu: Hypothesized population mean (\(\mu_0\))
    • sigma.x: Known population standard deviation
    • conf.level: Confidence level (default = .95)

3.9 \(z\)-test: Example 4

library(BSDA)  # provides z.test()
z.test(x = sample1, 
       alternative = "two.sided",
       mu = 100, 
       sigma.x = 15,
       conf.level = .95)

    One-sample z-Test

data:  sample1
z = 2.6839, p-value = 0.007276
alternative hypothesis: true mean is not equal to 100
95 percent confidence interval:
 101.5358 109.8512
sample estimates:
mean of x 
 105.6935 

3.10 \(z\)-test: Report results

  • \(p\)-value < .05 and the 95% confidence interval does not contain 100
    • Reject \(H_0: \mu = 100\)
    • This sample is unlikely to have come from a population with \(\mu\) = 100
  • Using a one sample \(z\)-test, we rejected the null hypothesis that \(\mu\) = 100, \(z\) = 2.684, \(p\) = 0.007.
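The \(p\)-value and confidence interval from `z.test()` can be reproduced from the sample mean and \(z\) printed earlier. This also shows how the test and the interval agree: the 95% CI excludes 100 exactly when the two-tailed test rejects at \(\alpha\) = .05. A sketch, with the printed values typed in as assumptions (rerunning `rnorm()` gives different numbers):

```r
x_bar <- 105.6935   # sample mean, as printed above
z_obs <- 2.683939   # observed z, as printed above
se    <- 15 / sqrt(50)

# Two-tailed p-value: probability of a z at least this extreme in either tail
p_value <- 2 * pnorm(-abs(z_obs))

# 95% CI: sample mean +/- critical value * standard error
ci <- x_bar + c(-1, 1) * qnorm(0.975) * se

p_value
ci
```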

3.11 What is the \(p\)-value?

  • P(observing a result at least as extreme as ours if the null hypothesis is true)
    • For a two-tailed test, this area includes both tails
mu0 <- "mu[0]"                    # plotmath label for the hypothesized mean
se  <- 15 / sqrt(50)              # standard error under H0
dev <- abs(mean(sample1) - 100)   # distance of the observed mean from mu0
ggplot(data.frame(x = c(90, 110)), aes(x)) +
  stat_function(fun = dnorm,
                args = list(mean = 100, sd = se),
                geom = "line",
                xlim = c(90, 110)) +
  stat_function(fun = dnorm,
                args = list(mean = 100, sd = se),
                geom = "area",
                fill = "steelblue",
                xlim = c(100 - 1.96*se, 100 + 1.96*se)) +
  # Two-tailed p-value: shade both tails at least as far from mu0 as the observed mean
  stat_function(fun = dnorm,
                args = list(mean = 100, sd = se),
                geom = "area",
                fill = "red",
                xlim = c(100 + dev, 110)) +
  stat_function(fun = dnorm,
                args = list(mean = 100, sd = se),
                geom = "area",
                fill = "red",
                xlim = c(90, 100 - dev)) +
  geom_vline(xintercept = mean(sample1), linewidth = 1) +
  annotate("text", x = mean(sample1) - 2, y = 0.15, label = "Observed \nmean", size = 6) +
  annotate("text", x = mean(sample1) + 2, y = 0.02, label = "p-value", size = 6, color = "red") +
  geom_vline(xintercept = 100, linewidth = 1, color = "white", linetype = "dashed") +
  annotate("text", x = 99, y = 0.05, label = mu0, size = 6, color = "white", parse = TRUE) +
  xlim(90, 110) +
  labs(x = "X", y = "f(x)")

3.12 \(t\)-test: Assumptions

  • Data are continuous (i.e., ratio or interval)
  • Data are randomly sampled from the population
  • Data are independent
  • Data are approximately normally distributed OR the sample is large enough for the sampling distribution of the mean to be approximately normal (central limit theorem)
  • Population variance (or standard deviation) is unknown

3.13 Unknown population variance

  • Population variance is not known when using a \(t\)-test
    • So we estimate it with the sample variance
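| Quantity                           | Definitional formula                       | Computational formula                           |
|------------------------------------|--------------------------------------------|-------------------------------------------------|
| Population variance (\(\sigma^2\)) | \(\frac{\sum (X - \mu)^2}{N}\)             | \(\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}\)   |
| Sample variance (\(s^2\))          | \(\frac{\sum (X - \overline{X})^2}{n-1}\)  | \(\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n-1}\) |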
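A quick check that the definitional and computational formulas agree, and that R’s `var()` uses the sample (\(n - 1\)) denominator. The data vector is an illustrative assumption:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # small illustrative data set
n <- length(x)

# Definitional formulas: squared deviations from the mean
pop_var_def    <- sum((x - mean(x))^2) / n
sample_var_def <- sum((x - mean(x))^2) / (n - 1)

# Computational formula for the sample variance
sample_var_comp <- (sum(x^2) - sum(x)^2 / n) / (n - 1)

# var() uses the n - 1 (sample) denominator
c(pop = pop_var_def, sample = sample_var_def,
  computational = sample_var_comp, r_var = var(x))
```

Here the population-style formula gives 4 and the sample formulas all give 32/7 (about 4.57).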

3.14 \(t\)-test: Hypotheses

  • Directional (one-tailed) tests
    • \(H_0\): \(\mu \le \mu_0\)
    • \(H_1\): \(\mu > \mu_0\)
  • Non-directional (two-tailed) tests
    • \(H_0\): \(\mu = \mu_0\)
    • \(H_1\): \(\mu \ne \mu_0\)

3.15 \(t\)-test: Test statistic

\[t = \frac{\bar{X} - \mu_0}{s /\sqrt{n}}\]

  • Sample mean minus hypothesized population mean, divided by the estimated standard error
    • \(s/\sqrt{n}\) replaces \(\sigma/\sqrt{n}\) because \(\sigma\) is unknown

3.16 Degrees of freedom

  • How many independent pieces of information do we have?
    • At most, this is the sample size (\(n\))
  • But if we estimate something (e.g., \(\bar{X}\)) and then use it to estimate something else (e.g., \(s^2\)), we have less information
    • Once you know \(n - 1\) values and the mean, the last value is determined
  • Degrees of freedom quantify how much independent information remains
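The \(n - 1\) argument can be made concrete: fix the mean and any \(n - 1\) values, and the last value has no freedom left. A sketch with made-up numbers:

```r
x_known <- c(12, 15, 9, 14)   # n - 1 = 4 freely chosen values (illustrative)
x_mean  <- 12                 # mean of all n = 5 values, already estimated
n       <- length(x_known) + 1

# With the mean fixed, the last value is forced: total = n * mean
x_last <- n * x_mean - sum(x_known)
x_last

mean(c(x_known, x_last))   # recovers the fixed mean exactly
```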

3.17 Degrees of freedom: \(t\) distribution

ggplot(data.frame(x = c(-3, 3)), aes(x)) +
  stat_function(fun = dnorm, args = list(mean = 0, sd = 1), geom = "line", color = "black", linewidth = 1) +
  stat_function(fun = dt, args = list(df = 1, ncp = 0), geom = "line", color = "red", linewidth = 1, linetype = "dashed") +
  stat_function(fun = dt, args = list(df = 30, ncp = 0), geom = "line", color = "darkgreen", linewidth = 1, linetype = "dashed") +
  ylim(0,.5) + 
  scale_x_continuous(breaks = -3:3) +
  annotate("text", x = 0, y = 0.29, label = "t(1)", color = "red", size = 6) +
  annotate("text", x = 0, y = 0.37, label = "t(30)", color = "darkgreen", size = 6) +
  annotate("text", x = 0, y = 0.43, label = "z", color = "black", size = 6) +
  labs(x = "t", y = "f(t)") 

3.18 \(t\)-test: Decision

  • Determine critical value for \(\alpha\) and degrees of freedom
    • Degrees of freedom = \(n - 1\)
  • If observed \(t\) statistic is more extreme than the critical value
    • Reject \(H_0\)
  • Otherwise, retain \(H_0\)
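Critical values of \(t\) come from `qt()`. The sketch below (the list of df values is arbitrary) shows that they exceed 1.96 for small df, reflecting the heavier tails of the \(t\) distribution, and approach 1.96 as df grows:

```r
alpha <- 0.05
df    <- c(1, 5, 10, 30, 49, 100, 1000)

# Two-tailed critical values of t for each df
t_crit <- qt(1 - alpha / 2, df = df)

# Compare with the z critical value: t_crit shrinks toward 1.96 as df grows
round(data.frame(df = df, t_crit = t_crit, z_crit = qnorm(1 - alpha / 2)), 3)
```

The row for df = 49 gives 2.01, the critical value used in the example that follows.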

3.19 \(t\)-test: Example 1

  • Same data as for \(z\)-test
  • Sample of \(n\) = 50
  • Unknown population SD
  • Does this sample come from a population with a mean of 100?
  • Two-tailed test
    • \(H_0\): \(\mu = 100\)
    • \(H_1\): \(\mu \ne 100\)

3.20 \(t\)-test: Example 2

\[t = \frac{\bar{X} - \mu_0}{s /\sqrt{n}}\]

mean(sample1)
[1] 105.6935
t <- (mean(sample1) - 100) / (sd(sample1) / sqrt(50))
t
[1] 2.447541
  • Two-tailed test, \(\alpha\) = .05, df = 49: Critical value = 2.01
    • Is the observed test statistic greater than +2.01 or lower than -2.01?
    • If yes, reject \(H_0\)
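The two-tailed \(p\)-value for this \(t\) statistic comes from the \(t\) distribution with 49 df, via `pt()`. For comparison, the sketch also refers the same statistic to the standard normal, which ignores the extra uncertainty from estimating \(\sigma\). The observed \(t\) is typed in from the output above:

```r
t_obs <- 2.447541   # observed t, as printed above
df    <- 50 - 1     # n - 1

# Two-tailed p-value from the t distribution with n - 1 df
p_t <- 2 * pt(-abs(t_obs), df = df)

# Same statistic against the standard normal: heavier t tails give a larger p
p_z <- 2 * pnorm(-abs(t_obs))

c(p_t = p_t, p_z = p_z)
```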

3.21 \(t\)-test: Example 3

  • Another option: t.test() function in stats package
    • x: Data
    • alternative: “two.sided” (default), “greater”, or “less”
    • mu: Hypothesized population mean (\(\mu_0\))
    • conf.level: Confidence level (default = .95)

3.22 \(t\)-test: Example 4

t.test(x = sample1,
       alternative = "two.sided",
       mu = 100, 
       conf.level = .95)

    One Sample t-test

data:  sample1
t = 2.4475, df = 49, p-value = 0.01801
alternative hypothesis: true mean is not equal to 100
95 percent confidence interval:
 101.0188 110.3682
sample estimates:
mean of x 
 105.6935 

3.23 \(t\)-test: Report results

  • \(p\)-value < .05 and the 95% confidence interval does not contain 100
    • Reject \(H_0: \mu = 100\)
    • This sample is unlikely to have come from a population with \(\mu\) = 100
  • Using a one sample \(t\)-test, we rejected the null hypothesis that \(\mu\) = 100, \(t(49)\) = 2.448, \(p\) = 0.018

3.24 Compare \(z\)-test and \(t\)-test

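|                    | \(z\)-test     | \(t\)-test        |
|--------------------|----------------|-------------------|
| Sample mean        | 105.69         | 105.69            |
| Population SD      | 15 (known)     | 16.45 (estimated) |
| Test statistic     | 2.684          | 2.448             |
| Degrees of freedom | N/A            | 49                |
| Critical value     | 1.96           | 2.01              |
| \(p\)-value        | 0.007          | 0.018             |
| Decision           | Reject \(H_0\) | Reject \(H_0\)    |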

3.25 \(\mathcal{N}(0, 1)\) vs \(t(49)\)

ggplot(data.frame(x = c(-3, 3)), aes(x)) +
  stat_function(fun = dnorm, args = list(mean = 0, sd = 1), geom = "line", color = "black", linewidth = 1) +
  stat_function(fun = dt, args = list(df = 49, ncp = 0), geom = "line", color = "red", linewidth = 1, linetype = "dashed") +
  ylim(0,.5) + 
  scale_x_continuous(breaks = -3:3) +
  labs(x = "z or t", y = "Density",
       caption = "Solid black: N(0, 1); dashed red: t(49)")

3.26 Binomial test: Assumptions

  • Data are binomial (i.e., a fixed number of 0/1 Bernoulli trials)
    • With the same probability of success (\(p\)) on each trial
  • Data are randomly sampled from the population
  • Data are independent

3.27 Binomial test: Hypotheses

  • Directional (one-tailed) tests
    • \(H_0\): \(\pi \le \pi_0\)
    • \(H_1\): \(\pi > \pi_0\)
  • Non-directional (two-tailed) tests
    • \(H_0\): \(\pi = \pi_0\)
    • \(H_1\): \(\pi \ne \pi_0\)

3.28 Binomial test: Test and decision

\(P(X = x) = {m \choose x} p^x (1 - p)^{m -x}\)

  • Binomial test uses the binomial distribution
    • Exact test
    • Does not require “large” sample
    • Compare the observed number of successes (\(x\)) to the binomial distribution with \(m\) trials and hypothesized success probability \(p = \pi_0\)
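To see what “exact” means here, the two-sided \(p\)-value can be computed by summing binomial probabilities. The sketch uses 20 successes in 50 trials (the counts in the coin-flip example) and sums the probabilities of all outcomes that are no more likely than the observed one, which is the rule R’s `binom.test()` applies for two-sided tests:

```r
x  <- 20     # observed successes
n  <- 50     # number of trials
p0 <- 0.5    # hypothesized probability of success

# Probability of every possible outcome 0..n under H0
probs <- dbinom(0:n, size = n, prob = p0)

# Two-sided exact p-value: add up all outcomes no more likely than the
# observed one (small relative tolerance guards against rounding error)
p_exact <- sum(probs[probs <= dbinom(x, n, p0) * (1 + 1e-7)])

p_exact
binom.test(x, n, p = p0)$p.value
```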

3.29 Binomial test: Example 1

  • Sample of \(n\) = 50 coin flips
  • Does this sample come from a fair coin (\(\pi\) = 0.5)?
  • Two-tailed test
    • \(H_0\): \(\pi = 0.5\)
    • \(H_1\): \(\pi \ne 0.5\)
sample2 <- rbinom(50, 1, 0.53)   # simulated flips; true P(success) = 0.53, no seed set
 0  1 
30 20 

3.30 Binomial test: Example 2

\(P(X = x) = {m \choose x} p^x (1 - p)^{m - x}\)

binom_dat <- data.frame(x = 0:50, y = dbinom(0:50, 50, 0.5))
ggplot(data = binom_dat, aes(x = x, y = y)) +
  geom_col() +
  ylim(0,0.15) +
  labs(x = "X", y = "P(X = x)") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

3.31 Binomial test: Example 3

  • binom.test() function from stats package
    • x: number of successes
    • n: sample size
    • p: hypothesized probability of success (\(\pi_0\))
    • alternative: “two.sided” (default), “less”, “greater”

3.32 Binomial test: Example 4

binom.test(x = sum(sample2 > 0), 
           n = length(sample2), 
           p = 0.5, 
           alternative = "two.sided")

    Exact binomial test

data:  sum(sample2 > 0) and length(sample2)
number of successes = 20, number of trials = 50, p-value = 0.2026
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.2640784 0.5482060
sample estimates:
probability of success 
                   0.4 

3.33 Binomial test: Report results

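  • \(p\)-value > .05 and the 95% confidence interval contains 0.5
    • Retain \(H_0: \pi = 0.5\)
    • This sample could have come from a fair coin (\(\pi\) = 0.5); retaining \(H_0\) does not show the coin is fair
    • The data were simulated with \(\pi\) = 0.53, so this decision is actually a Type II error
  • Using a binomial test, we retained the null hypothesis that \(\pi\) = 0.5, observed proportion = 0.40, \(p\) = 0.203.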

3.34 Alternative: \(z\)-test for proportion

  • Proportion = mean of binary (i.e., Bernoulli) variable
    • \(z\)-test as test of proportion
  • Observed distribution is not normal
    • Sampling distribution is approximately normal with a “large” sample
    • Approximate test based on assumption of normality
  • Assumptions, hypotheses, critical values are the same as \(z\)-test
    • Replace \(\mu\) with \(\pi\) and \(\bar{X}\) with \(p\)

3.35 Proportion: \(z\)-test statistic

\[z = \frac{p - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}}\]

  • If \(H_0\) is true
    • Population mean = \(\pi_0\)
    • Population variance = \(\pi_0(1-\pi_0)\)
  • Works best when \(p\) and \(\pi\) are between 0.2 and 0.8
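A sketch of the \(z\)-test for a proportion, again assuming 20 successes in 50 trials and \(\pi_0\) = 0.5. As a cross-check, `prop.test()` with `correct = FALSE` performs the same test (it reports \(z^2\) as a chi-squared statistic with 1 df), so the \(p\)-values should match:

```r
x   <- 20      # successes
n   <- 50      # trials
pi0 <- 0.5     # hypothesized proportion
p   <- x / n   # observed sample proportion

# z statistic uses the SE implied by H0: sqrt(pi0 * (1 - pi0) / n)
z <- (p - pi0) / sqrt(pi0 * (1 - pi0) / n)
p_value <- 2 * pnorm(-abs(z))

c(z = z, p_value = p_value)

# Same test without continuity correction; X-squared equals z^2
prop.test(x, n, p = pi0, correct = FALSE)$p.value
```

Here the normal approximation gives \(p\) of about .157, smaller than the exact binomial \(p\) of about .203, but both lead to retaining \(H_0\).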

4 Non-parametric tests

4.1 Parametric vs non-parametric tests

  • Parametric = assumes a particular distribution, described by parameters
    • e.g., sampling distribution of the mean is normally distributed
  • Non-parametric = makes no assumption about the form of the distribution
    • Don’t assume any particular sampling distribution
    • Often used when the variables don’t follow a known distribution

4.2 Non-parametric tests

  • Many non-parametric tests
    • Not well organized or named
  • Two commonly used one sample non-parametric tests
    • Sign test: Test of median for ordinal+ data
    • Wilcoxon signed rank test: Test of median for interval+ data

4.3 Why non-parametric tests?

  • Non-numeric data (i.e., nominal or ordinal)
  • Continuous but very non-normal variables
  • Especially with smaller samples, where the CLT can’t be relied on
  • Note that parametric tests are generally more powerful than non-parametric tests when their assumptions are met

4.4 Sign test: Assumptions

  • Data are at least ordinal (ordinal, interval, or ratio)
  • Data are randomly sampled from the population
  • No assumptions about distribution of the data
  • No assumptions about sampling distribution of the median
  • Hypotheses are about median

4.5 Sign test: Logic

  • Count observations above and below the hypothesized median
    • If \(H_0\) is true, each observation has a 50% chance of falling above it and a 50% chance of falling below it
  • Compare to binomial distribution to determine probability of observed number of “successes”
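The sign test is a binomial test on the count of values above the hypothesized median. The function below is a hypothetical helper (`sign_test` is not part of base R) and follows a common convention of dropping observations exactly equal to the hypothesized median, since they are neither above nor below it. The data are made up:

```r
# Hypothetical helper: sign test of H0: median = m0, built on binom.test()
sign_test <- function(x, m0, alternative = "two.sided") {
  x <- x[x != m0]                    # drop ties with m0 (they carry no sign)
  binom.test(x = sum(x > m0),        # "successes" = values above m0
             n = length(x),
             p = 0.5,                # under H0, P(value above m0) = .5
             alternative = alternative)
}

scores <- c(101, 99, 105, 110, 100, 120)   # illustrative data
res <- sign_test(scores, m0 = 100)
res$statistic   # number of values above 100, after dropping the tie
res$p.value
```

With the tie dropped, 4 of the remaining 5 values are above 100, giving \(p\) = 12/32 = 0.375.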

4.6 Sign test: Example

  • binom.test() function from stats package
    • x: number of successes (here, values > hypothesized median)
    • n: sample size
    • p: hypothesized probability of success (median = 50th %ile)
    • alternative: “two.sided” (default), “less”, “greater”

4.7 Sign test: Example

binom.test(x = sum(sample1 > 100), 
           n = length(sample1), 
           p = 0.5, 
           alternative = "two.sided")

    Exact binomial test

data:  sum(sample1 > 100) and length(sample1)
number of successes = 32, number of trials = 50, p-value = 0.06491
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.4919314 0.7708429
sample estimates:
probability of success 
                  0.64 

4.8 Wilcoxon Signed rank test: Assumptions

  • Data are at least interval (interval or ratio)
  • Data are randomly sampled from the population
  • Distribution of the data is assumed symmetric around the median (no other shape assumptions)
  • No assumptions about sampling distribution of the median
  • Hypotheses are about median

4.9 Signed rank test: Logic

  • Compute each value’s difference from the hypothesized median
  • Rank the absolute differences
  • Above hypothesized median: give the rank a “+” sign
  • Below hypothesized median: give the rank a “-” sign
  • Add the positive ranks; add the negative ranks
  • Compare to the distribution under the null hypothesis
    • Under \(H_0\), the expected sums of positive and negative ranks are equal
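The signed ranks can be computed by hand and compared with the \(V\) statistic from `wilcox.test()`, which is the sum of the positive ranks. The data below are illustrative and chosen to have no ties and no zero differences:

```r
x  <- c(103, 94, 110, 99, 108, 112, 96)   # illustrative data
mu <- 100                                  # hypothesized median

d <- x - mu               # differences from the hypothesized median
r <- rank(abs(d))         # rank the absolute differences

V_plus  <- sum(r[d > 0])  # sum of positive ranks (reported by R as V)
V_minus <- sum(r[d < 0])  # sum of negative ranks

c(V_plus = V_plus, V_minus = V_minus)

wilcox.test(x, mu = mu)$statistic
```

The two sums always add to \(n(n+1)/2\) (here 28), so a large imbalance between them is evidence against \(H_0\).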

4.10 Signed rank test: Example

  • wilcox.test() function from stats package
[1] 109.4636
wilcox.test(sample1, mu = 100, alternative = "two.sided")

    Wilcoxon signed rank test with continuity correction

data:  sample1
V = 877, p-value = 0.02105
alternative hypothesis: true location is not equal to 100

5 Summary

5.1 Hypothesis testing

  • Null and alternative hypotheses
    • Retain or reject \(H_0\)
  • Hypotheses are about the population value (mean, median, etc.)
    • We use the sample to make inference about the population
  • If we reject \(H_0\), that just means the observed data would be very unlikely if \(H_0\) were true
    • We can still make a Type I error (and, if we retain \(H_0\), a Type II error)

5.2 Deciding on a test

  • Normal outcome, known \(\sigma^2\): \(z\)-test
  • Normal outcome, unknown \(\sigma^2\): \(t\)-test
  • Proportion: Binomial test for small sample, \(z\)-test for large sample
  • Ordinal or very non-normal, especially with small sample: Non-parametric test

6 In-class activities

6.1 In-class activities

  • Test hypotheses about several variables
    • What type of variable?
    • What type of test?
    • What do we conclude?

6.2 Next week

  • Tests for two unrelated samples
  • Tests for two related samples
  • Chi-square test of independence for two binary variables