Visual Inference

From EDA to inference

So far, we’ve been making plots for their own sake

  • We’ve focused on exploratory data analysis rather than statistical inference

But graphics are also useful for inference, both for:

  • Determining statistical significance

  • Presenting statistical significance

Are we good at it?

Are we good at telling if, e.g., two variables are significantly correlated?

Not so much…

http://guessthecorrelation.com/

But we do a better job of comparing relationships

  • We can tell which relationship is stronger

Visual Inference: Line-up method

What do p-values represent?

Alpha

Alpha (usually .05) is the type I error rate

This means two things:

  1. Even if there’s no effect in the population, we still expect to find a significant effect in 5% of samples
  • Even if there’s actually no effect, we will find one about 1 time in 20
  2. If the observed effect is in the extreme 5% of the distribution, we call that significant
  • We reject the null hypothesis that there’s no effect

So alpha means both of these things:

  1. Probability of an extreme value given the null hypothesis is true

  2. How extreme a value has to be to reject the null hypothesis
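The first meaning can be checked directly by simulation: generate many samples in which the null hypothesis is true and count how often the test nevertheless comes out significant (a sketch; the seed, sample sizes, and variable names are my own):

```r
# Simulate the type I error rate: with no true correlation,
# about 5% of null samples should still test significant at alpha = .05
set.seed(42)
n_sims <- 2000
p_values <- replicate(n_sims, {
  x <- rnorm(50)
  y <- rnorm(50)            # independent of x, so the null is true
  cor.test(x, y)$p.value
})
mean(p_values < .05)        # should land close to .05
```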

How can this help us use graphics for inference?

Imagine generating data from the null hypothesis (in the population)

  • e.g., correlation = 0, \(b_1\) = 0

Do this several times (say, 19) and make a plot of each

Compare the now 20 versions of the plot

  • A significant effect will stand out compared to the null effect

  • In the same way that a significant effect is extreme in test statistic, it will be extreme visually

Data line-up code

# three datasets with no relationship between x and y
x <- rnorm(100, 0, 1)
y <- rnorm(100, 0, 1)
data1 <- data.frame(x, y)
x <- rnorm(100, 0, 1)
y <- rnorm(100, 0, 1)
data2 <- data.frame(x, y)
x <- rnorm(100, 0, 1)
y <- rnorm(100, 0, 1)
data3 <- data.frame(x, y)
# this one has a relationship
x <- rnorm(100, 0, 1)
e <- rnorm(100, 0, 0.5)
y <- 0.4 * x + e
data4 <- data.frame(x, y)
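One way to turn those four data frames into a line-up figure (a sketch, not the slide's exact code: the seed, the `make_null()` helper, and the panel-shuffling approach are my additions): stack the datasets with a shuffled panel ID, then facet, so the viewer does not know which panel holds the real effect.

```r
library(ggplot2)

# Regenerate the four datasets so this sketch runs standalone
# (three nulls plus one real effect, as above)
set.seed(1)
make_null <- function() data.frame(x = rnorm(100), y = rnorm(100))
data1 <- make_null(); data2 <- make_null(); data3 <- make_null()
x <- rnorm(100)
data4 <- data.frame(x = x, y = 0.4 * x + rnorm(100, 0, 0.5))

# Stack the datasets and give each a random panel position
datasets <- list(data1, data2, data3, data4)
panel_order <- sample(1:4)
all_data <- do.call(rbind, lapply(1:4, function(i)
  cbind(datasets[[i]], panel = panel_order[i])))

# One scatterplot per panel; the viewer guesses which one is real
lineup_plot <- ggplot(all_data, aes(x = x, y = y)) +
  geom_point() +
  facet_wrap(~ panel)
```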

Data line-up figures (random order)

Data line-ups

In practice, you would want to have more plots

  • 1 out of 20 = .05 alpha

  • With 4 plots, the p-value is .25

You could do this by hand like I have

  • Pretty easy to do and you have lots of control over the models

There is also a package to automatically create line-ups: nullabor (by the authors of the Wickham et al. article below)

References

The title of this article is incredibly vague, but it lays out the logic of the line-up procedure:

  • Wickham, H., Cook, D., Hofmann, H., & Buja, A. (2010). Graphical inference for infovis. IEEE Transactions on Visualization and Computer Graphics, 16(6), 973-979.

Theoretical background for the line-up method for a variety of tests:

  • Majumder, M., Hofmann, H., & Cook, D. (2013). Validation of visual statistical inference, applied to linear models. Journal of the American Statistical Association, 108(503), 942-956.


Visual Inference: Confidence intervals

Confidence intervals

Range of plausible values for the population value we are estimating

We have a certain degree of confidence (e.g., 95%) in this range

Do not say that the CI has a 95% chance of including the population value

  • The population value is what it is, it does not vary

  • Our sample varies (not every sample is perfectly representative of the population = sampling variability) and therefore our estimate varies from sample to sample

Can be derived in a variety of different ways: theoretical, bootstrapping, Monte Carlo simulation, etc.
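As one concrete example of the bootstrapping route (a minimal sketch; the data, seed, and number of resamples are made up):

```r
# Percentile bootstrap 95% CI for a mean: resample the observed data
# with replacement many times, then take the 2.5th and 97.5th
# percentiles of the resampled means
set.seed(101)
obs <- rnorm(60, mean = 50, sd = 10)   # made-up sample
boot_means <- replicate(5000, mean(sample(obs, replace = TRUE)))
quantile(boot_means, c(.025, .975))    # the percentile bootstrap CI
```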

APA recommendations

APA Task Force on Statistical Inference

https://www.apa.org/science/leadership/bsa/statistical/

  • Report confidence intervals in addition to estimates

  • CIs are consistent with p-values / statistical testing

  • Provide information about precision of the estimates

What are we doing in our plots?

  • We usually do not have CIs in plots, but we should

  • Sometimes, we include other indicators of variability, but we should include CIs

Inference by eye

Geoff Cumming has made a career out of this

Using confidence intervals in plots (instead of standard error or standard deviation) allows you to make (rough) statistical inference based on the plot

  • Good plot: Plot matches the story

Overlap in CIs approximates p-value

For independent groups

95% confidence intervals will just touch when p = .01

95% confidence intervals will overlap half way when p = .05

  • “Half way” here means the overlap is about half the average margin of error: the upper end of CI1 reaches half way from the lower end of CI2 to mean2
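The overlap rule is just arithmetic on the interval endpoints. A sketch with made-up means and margins of error (the function and its name are mine, not Cumming's):

```r
# Proportion overlap of two 95% CIs, relative to the average margin of
# error. Rule of thumb: ~0.5 overlap ~ p = .05; just touching ~ p = .01
ci_overlap <- function(mean1, moe1, mean2, moe2) {
  # assumes mean2 > mean1
  overlap <- (mean1 + moe1) - (mean2 - moe2)  # top of CI1 minus bottom of CI2
  overlap / mean(c(moe1, moe2))
}
ci_overlap(mean1 = 50, moe1 = 2, mean2 = 53, moe2 = 2)  # 0.5 -> roughly p = .05
ci_overlap(mean1 = 50, moe1 = 2, mean2 = 54, moe2 = 2)  # 0   -> roughly p = .01
```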

What does it look like? p > .05

What does it look like? p = .05

What does it look like? p = .01

How does it work?

total_N <- 50
set.seed(12345)
group <- rbinom(total_N, 1, 0.5)
#group
set.seed(13579)
outcome <- 50 + 1 * group + rnorm(total_N, 0, 1.5)
#outcome
group_data <- data.frame(group, outcome)
#group_data

model1 <- lm(data = group_data, outcome ~ group)
#summary(model1)

Plot of just the raw data (jittered)

rawdata <- 
  ggplot(data = group_data, 
          aes(x = as.factor(group), y = outcome)) + 
  geom_jitter() + 
  geom_boxplot(alpha = 0.2)

Plot of just the raw data (jittered)

rawdata

Group by group and calculate the CIs

summary_data <- group_data %>% 
  group_by(group) %>% 
  summarize(group_n = n(), 
            out_mean = mean(outcome), 
            out_sd = sd(outcome), 
            error = qt(0.975, df = group_n - 1) * out_sd / sqrt(group_n))
summary_data
## # A tibble: 2 × 5
##   group group_n out_mean out_sd error
##   <int>   <int>    <dbl>  <dbl> <dbl>
## 1     0      27     49.9   1.51 0.599
## 2     1      23     50.9   1.59 0.688

Plot with CI error bars

CIerror <- 
  ggplot(data = summary_data, 
          aes(x = as.factor(group), y = out_mean)) +
  geom_point(size = 6) +
  geom_errorbar(data = summary_data,
                aes(x = as.factor(group), 
                  ymin = out_mean - error, 
                  ymax = out_mean + error)) +  
  geom_jitter(data = group_data, 
             aes(x = as.factor(group), 
                 y = outcome), 
             color = "blue", alpha = 0.5)

Plot with CI error bars

CIerror 

Plot versus statistical test

p-value for the group effect

summary(model1)$coefficients["group", "Pr(>|t|)"]  # p-value for the group slope
## [1] 0.03028818

Continuous effects?

The geom_smooth() function produces a smoothed 95% (default) CI of predicted values

  • Estimates predicted values for all subjects

  • At each value of X, create the CI and then smooth together

Comparing two lines with their own standard errors

  • Where the CIs overlap halfway, the lines are different at p<.05

  • Where the CIs don’t overlap, the lines are different at p<.01

  • Most useful for interactions, but not the only place

  • Keep in mind that 1) the CIs vary across X and 2) they’ll always get larger near the ends of the data
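The band that geom_smooth(method = lm) draws can be computed by hand with predict(), which makes the "CI at each value of X" idea concrete (a sketch on made-up data; the grid size is arbitrary):

```r
# Confidence band for predicted values, roughly what
# geom_smooth(method = lm) displays
set.seed(7)
d <- data.frame(x = rnorm(80))
d$y <- 2 + 0.5 * d$x + rnorm(80)
fit <- lm(y ~ x, data = d)

# Evaluate the fitted line and its 95% CI over a grid of x values
grid <- data.frame(x = seq(min(d$x), max(d$x), length.out = 100))
band <- predict(fit, newdata = grid, interval = "confidence")  # fit, lwr, upr
head(band)
```

Note that the band is narrowest near the mean of x and widens toward the ends of the data, as the bullet above says.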

Continuous effects

data("FirstYearGPA")  # from the Stat2Data package
#glimpse(FirstYearGPA)

hsgpa_x_white <- 
  ggplot(data = FirstYearGPA, 
         aes(x = HSGPA, y = GPA, 
             color = as.factor(White))) +
  geom_point() +
  geom_smooth(method = lm)

Continuous effects

hsgpa_x_white
## `geom_smooth()` using formula 'y ~ x'

Within-subjects designs?

Things are very simple for independent groups

What about non-independence?

  • Paired t-test
  • Repeated measures ANOVA
  • Mixed model

For these types of models, the effect of interest is no longer just the mean difference

  • Average difference between observations across all pairs
  • Similar to above, but generalized to more than 2 groups
  • Within-person slope

Standard errors for repeated-measures designs

There are two somewhat similar methods

  1. Loftus and Masson (1994):
  • Use the \(MS_{error}\) from the repeated-measures ANOVA as the standard error

  • CI is \(\pm t_{crit} \times \sqrt{MS_{error} / n}\), where \(n\) is the number of subjects

  • Single CI width across all conditions = assumes perfect homogeneity

  • Use highest order interaction, such as subject x condition
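A sketch of the Loftus-Masson computation on made-up long-format data (my own example; for a single within-subject factor, the subject x condition \(MS_{error}\) equals the residual MS of a model with subject and condition effects):

```r
# Loftus & Masson (1994) within-subject CI, one within factor
set.seed(99)
n_subj <- 20
d <- expand.grid(subj = factor(1:n_subj), cond = factor(c("a", "b", "c")))
subj_eff <- rnorm(n_subj, 0, 2)       # between-subject noise
d$score <- 10 + as.integer(d$cond) + subj_eff[d$subj] + rnorm(nrow(d))

# Residual MS here is the subject x condition mean square
fit <- lm(score ~ subj + cond, data = d)
ms_error <- sum(residuals(fit)^2) / fit$df.residual
t_crit <- qt(.975, df = fit$df.residual)
moe <- t_crit * sqrt(ms_error / n_subj)  # one CI half-width for all conditions
```

The single `moe` is then attached to every condition mean, which is where the "assumes perfect homogeneity" caveat above comes from.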

Standard errors for repeated-measures designs

  2. Masson and Loftus (2003) and Cousineau (2005):
  • There is substantial variability across subjects (which is modeled in RM ANOVA or mixed model)

  • “Normalize” observations by subtracting the difference between the subject mean and the grand mean from each observation: \(X_{ij} - (\bar{X}_{personi} - \bar{X}_{grand})\)

  • This removes the between-subject variability

  • Use this modified data for calculating CIs as you would for between-subjects designs
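The normalization step can be written directly from that formula (a sketch on made-up long-format data; the example design and names are mine):

```r
# Cousineau (2005) normalization: subtract each subject's mean and add
# back the grand mean, removing between-subject variability
set.seed(8)
d <- expand.grid(subj = factor(1:15), cond = factor(c("pre", "post")))
d$score <- 20 + (d$cond == "post") * 2 +
  rnorm(15, 0, 3)[d$subj] +            # stable per-subject offset
  rnorm(nrow(d))                       # trial noise

subj_means <- tapply(d$score, d$subj, mean)
grand_mean <- mean(d$score)
d$score_norm <- d$score - subj_means[d$subj] + grand_mean
# CIs on score_norm can now be computed as for between-subjects designs
```

After normalization every subject has the same mean (the grand mean), so only the within-subject condition differences remain.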

References - between-subjects

Cumming, G., & Finch, S. (2005). Inference by eye: confidence intervals and how to read pictures of data. American Psychologist, 60(2), 170.

Cumming, G. (2007). Inference by eye: Pictures of confidence intervals and thinking about levels of confidence. Teaching Statistics, 29(3), 89-93.

Cumming, G. (2009). Inference by eye: reading the overlap of independent confidence intervals. Statistics in medicine, 28(2), 205-220.

References - within-subjects

Loftus, G. R., & Masson, M. E. (1994). Using confidence intervals in within-subject designs. Psychonomic bulletin & review, 1(4), 476-490.

Masson, M. E., & Loftus, G. R. (2003). Using confidence intervals for graphically based data interpretation. Canadian Journal of Experimental Psychology/Revue canadienne de psychologie expérimentale, 57(3), 203.

Cousineau, D. (2005). Confidence intervals in within-subject designs: A simpler solution to Loftus and Masson’s method. Tutorials in quantitative methods for psychology, 1(1), 42-45.

Notes on plots

Adding captions to plots

+ labs(caption = "This is my caption for this plot.")

  • Captions are the place to point out that you are showing 95% confidence intervals in your plot…