Visual Inference

From EDA to inference

So far, we’ve been making plots for their own sake

  • We’ve focused on exploratory data analysis rather than statistical inference

But graphics are also useful for inference, both for:

  • Determining statistical significance

  • Presenting statistical significance

Are we good at it?

Are we good at telling if, e.g., two variables are significantly correlated?

Not so much…

http://guessthecorrelation.com/

But we do a better job of comparing relationships

  • We can tell which relationship is stronger

Visual Inference: Line-up method

What do p-values represent?

Alpha

Alpha (usually .05) is the type I error rate

This means two things:

  1. Even if there’s no effect in the population, we still expect to find a significant effect in 5% of samples
  • Even if there’s actually no effect, we will find one about 1 time in 20
  2. If the observed effect is in the extreme 5% of the distribution, we call that significant
  • We reject the null hypothesis that there’s no effect

So alpha means both of these things:

  1. Probability of an extreme value given the null hypothesis is true

  2. How extreme a value has to be to reject the null hypothesis
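The first meaning can be checked directly by simulation: generate many samples in which the null hypothesis is true and count how often the test nevertheless comes out significant (a sketch; the seed, sample sizes, and variable names are my own):

```r
# Simulate the type I error rate: with no true correlation,
# about 5% of null samples should still test significant at alpha = .05
set.seed(42)
n_sims <- 2000
p_values <- replicate(n_sims, {
  x <- rnorm(50)
  y <- rnorm(50)            # independent of x, so the null is true
  cor.test(x, y)$p.value
})
mean(p_values < .05)        # should land close to .05
```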

How can this help us use graphics for inference?

Imagine generating data from the null hypothesis (in the population)

  • e.g., correlation = 0, \(b_1\) = 0

Do this several times (say, 19) and make a plot of each

Compare the now 20 versions of the plot

  • A significant effect will stand out compared to the null effect

  • In the same way that a significant effect is extreme in test statistic, it will be extreme visually

Data line-up code

# three datasets with no relationship between x and y
x <- rnorm(100, 0, 1)
y <- rnorm(100, 0, 1)
data1 <- data.frame(x, y)
x <- rnorm(100, 0, 1)
y <- rnorm(100, 0, 1)
data2 <- data.frame(x, y)
x <- rnorm(100, 0, 1)
y <- rnorm(100, 0, 1)
data3 <- data.frame(x, y)
# this one has a relationship
x <- rnorm(100, 0, 1)
e <- rnorm(100, 0, 0.5)
y <- 0.4 * x + e
data4 <- data.frame(x, y)
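One way to turn those four data frames into a line-up figure (a sketch, not the slide's exact code: the seed, the `make_null()` helper, and the panel-shuffling approach are my additions): stack the datasets with a shuffled panel ID, then facet, so the viewer does not know which panel holds the real effect.

```r
library(ggplot2)

# Regenerate the four datasets so this sketch runs standalone
# (three nulls plus one real effect, as above)
set.seed(1)
make_null <- function() data.frame(x = rnorm(100), y = rnorm(100))
data1 <- make_null(); data2 <- make_null(); data3 <- make_null()
x <- rnorm(100)
data4 <- data.frame(x = x, y = 0.4 * x + rnorm(100, 0, 0.5))

# Stack the datasets and give each a random panel position
datasets <- list(data1, data2, data3, data4)
panel_order <- sample(1:4)
all_data <- do.call(rbind, lapply(1:4, function(i)
  cbind(datasets[[i]], panel = panel_order[i])))

# One scatterplot per panel; the viewer guesses which one is real
lineup_plot <- ggplot(all_data, aes(x = x, y = y)) +
  geom_point() +
  facet_wrap(~ panel)
```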

Data line-up figures (random order)

Data line-ups

In practice, you would want to have more plots

  • 1 out of 20 = .05 alpha

  • With 4 plots, the p-value is .25

You could do this by hand like I have

  • Pretty easy to do and you have lots of control over the models

There is also a package to automatically create line-ups: nullabor (by the authors of the Wickham et al. article below)

References

The title of this article is incredibly vague, but it lays out the logic of the line-up procedure:

  • Wickham, H., Cook, D., Hofmann, H., & Buja, A. (2010). Graphical inference for infovis. IEEE Transactions on Visualization and Computer Graphics, 16(6), 973-979.

Theoretical background for the line-up method for a variety of tests:

  • Majumder, M., Hofmann, H., & Cook, D. (2013). Validation of visual statistical inference, applied to linear models. Journal of the American Statistical Association, 108(503), 942-956.


Visual Inference: Confidence intervals

Confidence intervals

Range of plausible values for the population value we are estimating

We have a certain degree of confidence (e.g., 95%) in this range

Do not say that the CI has a 95% chance of including the population value

  • The population value is what it is, it does not vary

  • Our sample varies (not every sample is perfectly representative of the population = sampling variability) and therefore our estimate varies from sample to sample

Can be derived in a variety of different ways: theoretical, bootstrapping, Monte Carlo simulation, etc.
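As one concrete example of the bootstrapping route (a minimal sketch; the data, seed, and number of resamples are made up):

```r
# Percentile bootstrap 95% CI for a mean: resample the observed data
# with replacement many times, then take the 2.5th and 97.5th
# percentiles of the resampled means
set.seed(101)
obs <- rnorm(60, mean = 50, sd = 10)   # made-up sample
boot_means <- replicate(5000, mean(sample(obs, replace = TRUE)))
quantile(boot_means, c(.025, .975))    # the percentile bootstrap CI
```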

APA recommendations

APA Task Force on Statistical Inference

https://www.apa.org/science/leadership/bsa/statistical/

  • Report confidence intervals in addition to estimates

  • CIs are consistent with p-values / statistical testing

  • Provide information about precision of the estimates

What are we doing in our plots?

  • We usually do not have CIs in plots, but we should

  • Sometimes, we include other indicators of variability, but we should include CIs

Inference by eye

Geoff Cumming has made a career out of this

Using confidence intervals in plots (instead of standard error or standard deviation) allows you to make (rough) statistical inference based on the plot

  • Good plot: Plot matches the story

Overlap in CIs approximates p-value

For independent groups

95% confidence intervals will just touch when p = .01

95% confidence intervals will overlap half way when p = .05

  • “Half way” here means the overlap is about half the average margin of error: the upper end of CI1 reaches half way from the lower end of CI2 to mean2
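The overlap rule is just arithmetic on the interval endpoints. A sketch with made-up means and margins of error (the function and its name are mine, not Cumming's):

```r
# Proportion overlap of two 95% CIs, relative to the average margin of
# error. Rule of thumb: ~0.5 overlap ~ p = .05; just touching ~ p = .01
ci_overlap <- function(mean1, moe1, mean2, moe2) {
  # assumes mean2 > mean1
  overlap <- (mean1 + moe1) - (mean2 - moe2)  # top of CI1 minus bottom of CI2
  overlap / mean(c(moe1, moe2))
}
ci_overlap(mean1 = 50, moe1 = 2, mean2 = 53, moe2 = 2)  # 0.5 -> roughly p = .05
ci_overlap(mean1 = 50, moe1 = 2, mean2 = 54, moe2 = 2)  # 0   -> roughly p = .01
```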

What does it look like? p > .05

What does it look like? p = .05

What does it look like? p = .01

How does it work?

total_N <- 50
set.seed(12345)
group <- rbinom(total_N, 1, 0.5)
#group
set.seed(13579)
outcome <- 50 + 1 * group + rnorm(total_N, 0, 1.5)
#outcome
group_data <- data.frame(group, outcome)
#group_data

model1 <- lm(data = group_data, outcome ~ group)
#summary(model1)

Plot of just the raw data (jittered)

rawdata <- 
  ggplot(data = group_data, 
          aes(x = as.factor(group), y = outcome)) + 
  geom_jitter() + 
  geom_boxplot(alpha = 0.2)

Plot of just the raw data (jittered)

rawdata

Group by group and calculate the CIs

summary_data <- group_data %>% 
  group_by(group) %>% 
  summarize(group_n = n(), 
            out_mean = mean(outcome), 
            out_sd = sd(outcome), 
            error = qt(0.975, df = group_n - 1) * out_sd / sqrt(group_n))
summary_data
## # A tibble: 2 × 5
##   group group_n out_mean out_sd error
##   <int>   <int>    <dbl>  <dbl> <dbl>
## 1     0      27     49.9   1.51 0.599
## 2     1      23     50.9   1.59 0.688

Plot with CI error bars

CIerror <- 
  ggplot(data = summary_data, 
          aes(x = as.factor(group), y = out_mean)) +
  geom_point(size = 6) +
  geom_errorbar(data = summary_data,
                aes(x = as.factor(group), 
                  ymin = out_mean - error, 
                  ymax = out_mean + error)) +  
  geom_jitter(data = group_data, 
             aes(x = as.factor(group), 
                 y = outcome), 
             color = "blue", alpha = 0.5)

Plot with CI error bars

CIerror 

Plot versus statistical test

p-value for the group effect

summary(model1)$coefficients["group", "Pr(>|t|)"]  # p-value for the group slope
## [1] 0.03028818

Continuous effects?

The geom_smooth() function produces a smoothed 95% (default) CI of predicted values

  • Estimates predicted values for all subjects

  • At each value of X, create the CI and then smooth together

Comparing two lines with their own standard errors

  • Where the CIs overlap halfway, the lines are different at p<.05

  • Where the CIs don’t overlap, the lines are different at p<.01

  • Most useful for interactions, but not the only place

  • Keep in mind that 1) the CIs vary across X and 2) they’ll always get larger near the ends of the data
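The band that geom_smooth(method = lm) draws can be computed by hand with predict(), which makes the "CI at each value of X" idea concrete (a sketch on made-up data; the grid size is arbitrary):

```r
# Confidence band for predicted values, roughly what
# geom_smooth(method = lm) displays
set.seed(7)
d <- data.frame(x = rnorm(80))
d$y <- 2 + 0.5 * d$x + rnorm(80)
fit <- lm(y ~ x, data = d)

# Evaluate the fitted line and its 95% CI over a grid of x values
grid <- data.frame(x = seq(min(d$x), max(d$x), length.out = 100))
band <- predict(fit, newdata = grid, interval = "confidence")  # fit, lwr, upr
head(band)
```

Note that the band is narrowest near the mean of x and widens toward the ends of the data, as the bullet above says.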

Continuous effects

data("FirstYearGPA")  # from the Stat2Data package
#glimpse(FirstYearGPA)

hsgpa_x_white <- 
  ggplot(data = FirstYearGPA, 
         aes(x = HSGPA, y = GPA, 
             color = as.factor(White))) +
  geom_point() +
  geom_smooth(method = lm)

Continuous effects

hsgpa_x_white
## `geom_smooth()` using formula 'y ~ x'

Within-subjects designs?

Things are very simple for independent groups

What about non-independence?

  • Paired t-test
  • Repeated measures ANOVA
  • Mixed model

For these types of models, the effect of interest is no longer just the mean difference

  • Average difference between observations across all pairs
  • Similar to above, but generalized to more than 2 groups
  • Within-person slope

Standard errors for repeated-measures designs

There are two somewhat similar methods

  1. Loftus and Masson (1994):
  • Use the \(MS_{error}\) from the repeated-measures ANOVA as the standard error

  • CI is \(\pm t_{crit} \times \sqrt{MS_{error} / n}\), where \(n\) is the number of subjects

  • Single CI width across all conditions = assumes perfect homogeneity

  • Use highest order interaction, such as subject x condition
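A sketch of the Loftus-Masson computation on made-up long-format data (my own example; for a single within-subject factor, the subject x condition \(MS_{error}\) equals the residual MS of a model with subject and condition effects):

```r
# Loftus & Masson (1994) within-subject CI, one within factor
set.seed(99)
n_subj <- 20
d <- expand.grid(subj = factor(1:n_subj), cond = factor(c("a", "b", "c")))
subj_eff <- rnorm(n_subj, 0, 2)       # between-subject noise
d$score <- 10 + as.integer(d$cond) + subj_eff[d$subj] + rnorm(nrow(d))

# Residual MS here is the subject x condition mean square
fit <- lm(score ~ subj + cond, data = d)
ms_error <- sum(residuals(fit)^2) / fit$df.residual
t_crit <- qt(.975, df = fit$df.residual)
moe <- t_crit * sqrt(ms_error / n_subj)  # one CI half-width for all conditions
```

The single `moe` is then attached to every condition mean, which is where the "assumes perfect homogeneity" caveat above comes from.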

Standard errors for repeated-measures designs

  2. Masson and Loftus (2003) and Cousineau (2005):
  • There is substantial variability across subjects (which is modeled in RM ANOVA or mixed model)

  • “Normalize” observations by subtracting the difference between the subject mean and the grand mean from each observation: \(X_{ij} - (\bar{X}_{personi} - \bar{X}_{grand})\)

  • This removes the between-subject variability

  • Use this modified data for calculating CIs as you would for between-subjects designs
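The normalization step can be written directly from that formula (a sketch on made-up long-format data; the example design and names are mine):

```r
# Cousineau (2005) normalization: subtract each subject's mean and add
# back the grand mean, removing between-subject variability
set.seed(8)
d <- expand.grid(subj = factor(1:15), cond = factor(c("pre", "post")))
d$score <- 20 + (d$cond == "post") * 2 +
  rnorm(15, 0, 3)[d$subj] +            # stable per-subject offset
  rnorm(nrow(d))                       # trial noise

subj_means <- tapply(d$score, d$subj, mean)
grand_mean <- mean(d$score)
d$score_norm <- d$score - subj_means[d$subj] + grand_mean
# CIs on score_norm can now be computed as for between-subjects designs
```

After normalization every subject has the same mean (the grand mean), so only the within-subject condition differences remain.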

References - between-subjects

Cumming, G., & Finch, S. (2005). Inference by eye: confidence intervals and how to read pictures of data. American Psychologist, 60(2), 170.

Cumming, G. (2007). Inference by eye: Pictures of confidence intervals and thinking about levels of confidence. Teaching Statistics, 29(3), 89-93.

Cumming, G. (2009). Inference by eye: reading the overlap of independent confidence intervals. Statistics in medicine, 28(2), 205-220.

References - within-subjects

Loftus, G. R., & Masson, M. E. (1994). Using confidence intervals in within-subject designs. Psychonomic bulletin & review, 1(4), 476-490.

Masson, M. E., & Loftus, G. R. (2003). Using confidence intervals for graphically based data interpretation. Canadian Journal of Experimental Psychology/Revue canadienne de psychologie expérimentale, 57(3), 203.

Cousineau, D. (2005). Confidence intervals in within-subject designs: A simpler solution to Loftus and Masson’s method. Tutorials in quantitative methods for psychology, 1(1), 42-45.

Notes on plots

Adding captions to plots

+ labs(caption = "This is my caption for this plot.")

  • Captions are the place to point out that you are showing 95% confidence intervals in your plot…