Storytelling through statistics

Data analysis as…?

  • “Statistics as principled argument” by Robert Abelson

“The focus of the book is that the purpose of statistics is to organize a useful argument from quantitative evidence, using a form of principled rhetoric. Five criteria, described by the acronym MAGIC (magnitude, articulation, generality, interestingness, and credibility) are proposed as crucial features of a persuasive, principled argument.”

  • “Statistics as storytelling”

Lisa Lambert - Storytelling through statistics

Consortium for the Advancement of Research Methods and Analysis talk, January 2018: https://carma.azurewebsites.net/video

Methods and Results should be the most exciting part of a paper!

  • Where you tell the reader what you did and what you found

Four principles

  1. Maintain the narrative

  2. Write for multiple audiences

  3. Create transparency

  4. Seek correspondence

Lisa Lambert - Maintain the narrative

  • BAD: Bland and boring, could be any paper

“Table 1 displays means and standard deviations of the variables, their correlations, and includes Cronbach’s alpha for the scales along the diagonal.”

  • GOOD: Include a table, but tell its story in the text

“Table 1 displays descriptive statistics for the full sample as well as subsets of NYC‐born attorneys and their officemates. Comparing NYC metro‐born attorneys and their officemates, we see predictable differences. NYC metro‐born attorneys are more likely to have attended a top and/or NYC‐based law school, and they are slightly more likely to be licensed to practice in NY.” (Carnahan, Kryscynski, & Olson, AMJ 2016, p1942)

Lisa Lambert - Maintain the narrative

  • BAD: Cut-and-paste results

“Results are reported in Table 2 and in Figures 2a and 2b. These results show that the mean of the experimental group was significantly higher than the mean of the control group supporting H1a.”

  • GOOD: Directly tie the findings back to the hypotheses

“Did overconfidence drive participants’ entry choices? If so, which type of overconfidence was responsible? We compared easy market entrants with difficult market entrants, examining their beliefs about themselves and their relative placements on each quiz. As Figures 2a and 2b show, …” (Cain, Moore & Haran, SMJ, 2015 36:1, p6)

Lisa Lambert - Maintain the narrative

Tables (and figures)

  • Should be interpretable without reference to the text

  • Should not require repetitive flipping back to other sections

  • Variables should be written with full names

  • Should indicate the type of analysis and the relationship between the statistics and the hypotheses

Lisa Lambert - Write for multiple audiences

  • BAD: Too much (unnecessary) detail

“The endogenous dependent variable of interest is Organizational Citizenship Behavior (OCB) with the focal entity on the individual (individual level). OCB is conceptualized as the employee intention to commit OCB in the future, how likely an employee is to commit an act of OCB and the unit of analysis is the employee’s intention consisting of one item on a 7‐point Likert scale. This should be measured at the individual level to reflect the intention level within each individual.”

  • GOOD: Simple enough that a novice can understand, but still covers the important points

“The dependent variable is Organizational Citizenship Behavior Intentions measured at the individual level with a single item on a 7‐point Likert scale.”

Lisa Lambert - Write for multiple audiences

Explain your choice of method using simple language and your research question

  • Igic, Keller, Elfering, Tschan, Kalin, & Semmer (JAP, 2017, 102): Change in job-related stress and control

  • Hypothesized change, but also that change could be roughly categorized (e.g., increasing, decreasing, stable)

  • p1319: Describe options (change scores, theoretically-defined groups, empirically-defined trajectories)

  • p1324: Use of growth mixture modeling: “This approach captures information about interindividual differences in intraindividual change over time, and allows for differences in growth parameters across unobserved subpopulations (Muthén, 2001; Muthén & Muthén, 2000).”

    • How the analysis ties back to research question, with references

Lisa Lambert - Write for multiple audiences

Summarize practical points

  • e.g., Table 8 in Lacerenza, Reyes, Marlow, Joseph & Salas, JAP 2017 p1704

  • Evidence-based best practices for designing a leadership training program

  • Define 8 best practices

  • Tips for implementation in terms of Learning, Transfer, Results

Lisa Lambert - Create transparency

Present full and complete results

  • Regression results plus simple slopes

  • Don’t use outdated methods (e.g., Baron & Kenny for mediation)
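A minimal sketch of what "regression results plus simple slopes" can look like in base R. The mtcars data, the centered variable names, and the simple_slope() helper are illustrative assumptions and do not come from the talk; the helper evaluates the slope of one predictor at a chosen value of the moderator and takes its standard error from the coefficient covariance matrix.

```r
# Moderated regression on mtcars (ships with R): does the effect of weight
# on mileage depend on horsepower? Both predictors are mean-centered.
cars <- transform(mtcars,
  wt_c = wt - mean(wt),
  hp_c = hp - mean(hp)
)
fit <- lm(mpg ~ wt_c * hp_c, data = cars)

# Simple slope of weight at a given (centered) horsepower value z:
#   estimate = b_wt + b_interaction * z
#   variance = var(b_wt) + z^2 * var(b_int) + 2 * z * cov(b_wt, b_int)
simple_slope <- function(fit, z) {
  b <- coef(fit)
  v <- vcov(fit)
  est <- b[["wt_c"]] + b[["wt_c:hp_c"]] * z
  se <- sqrt(
    v["wt_c", "wt_c"] +
      z^2 * v["wt_c:hp_c", "wt_c:hp_c"] +
      2 * z * v["wt_c", "wt_c:hp_c"]
  )
  c(estimate = est, se = se, t = est / se)
}

hp_sd <- sd(cars$hp_c)
slopes <- rbind(
  low_hp  = simple_slope(fit, -hp_sd),
  mean_hp = simple_slope(fit, 0),
  high_hp = simple_slope(fit, hp_sd)
)

summary(fit)$coefficients  # the full regression table
slopes                     # slope of weight at -1 SD, mean, and +1 SD of horsepower
```

Reporting both tables lets the reader see the interaction term and what it means at low, average, and high values of the moderator.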

Lisa Lambert - Create transparency

  • GOOD: Be honest about shortcomings with data or design

“However, Study 1 also has some limitations. First, we were unable to disentangle the causal ordering among our independent and mediator variables because they were collected at the same time. Although longitudinally separated, the causal relationship between our mediators and dependent variables could also be strengthened. Second, our measure of identity conflict and enhancement may not reflect the underlying values associated with brand identities. Last, the unexpected negative relationship between perspective‐taking and performance also requires further investigation. To address the above limitations, in Studies 2a and 2b we conduct experiments using a between subjects design to isolate and better examine the mechanisms we find in the field study.” (Ramarajan, Rothbard, and Wilk, AMJ 2017, Dec p2222)

Lisa Lambert - Seek correspondence

Parallelism: Verbal story should match statistical story should match graphical story

Examples of the wrong way:

  • Theory is causal but design is poor for making causal inference (i.e., not randomized)

  • Timing of measurements doesn’t match the theoretical process

  • Model as described (or as presented in equations) doesn’t match figure for model

  • Verbal story (Introduction) doesn’t match the statistical story (Method & Results)

    • Maybe written by different people?

Lisa Lambert - Seek correspondence

Examples of the right way:

  • Analytical plan: map that links the verbal story (Introduction) to the statistical story

  • Argyres, Bigelow & Nickerson (SMJ, 2015, 36:2, p228)

  • Our analysis follows the general approach taken by Carroll et al. (1996) and Klepper (2006) in their survival analyses of the U.S. auto industry. We estimate a Gompertz hazard rate model of firm mortality because nonparametric analyses suggest that the Gompertz specification shows superior goodness of fit relative to the Weibull and other specifications. We conduct this analysis at the level of the firm rather than at the level of the car model because our theory is about firms.

Lisa Lambert - Storytelling through statistics

  1. Maintain the narrative

  2. Write for multiple audiences

  3. Create transparency

  4. Seek correspondence

Storytelling with R (RStudio conference 2018)

https://resources.rstudio.com/rstudio-conf-2018/storytelling-with-r-olga-pierce

  • Olga Pierce - Data journalist for ProPublica

  • Make things easy for people to see / read

  • Help people make sense for themselves (context)

  • Great example using specific individual observations to demonstrate the findings

    • Recidivism prediction algorithm shows racial bias

    • Different false positive rate for black vs white

    • Contrast an exemplar white person rated “low risk,” who had many priors and went on to commit a violent crime, with a black person rated “high risk,” who had no priors and never reoffended

Finding & Telling Stories w/ R (RStudio conference 2017)

RStudio conference, 2017: https://resources.rstudio.com/wistia-rstudio-conf-2017/finding-and-telling-stories-with-r-andrew-flowers

Andrew Flowers - Data journalist (formerly at fivethirtyeight.com)

6 types of data stories (and their dangers)

  • Novelty (Triviality)

  • Outlier (Spurious result)

  • Archetype (Oversimplification)

  • Trend (Variance)

  • Debunking (Confirmation bias)

  • Forecast (Overfitting)

Telling a story with data

https://www.forbes.com/sites/evamurray/2019/02/06/how-do-you-tell-a-story-with-data-visualization/amp/

  • Guide your audience

  • Tell your story chronologically, if possible

  • Start with a summary (general), then move to details

  • Is there a conclusion or are you just passing on information?

Science communication

Effective communication

  • Public communication of science

  • What are effective science communication approaches?

  • Explain things simply and succinctly

  • Who are you trying to reach and why does this matter to them?

  • Listen to and engage with audience: bidirectional dialogue

  • Systems approach: Science is just one piece of information that people learn about and use

https://blogs.scientificamerican.com/guest-blog/effective-communication-better-science/

https://theconversation.com/what-does-research-say-about-how-to-effectively-communicate-about-science-70244

What is your goal?

  • Dissemination of information

    • Purely getting the information out
  • Dissemination to practitioners

    • Anyone who might be applying it – therapists, parents, teachers, managers, human resources
  • Persuade an audience

    • Safety of vaccines, climate change is real, etc.

Who is your audience?

  • Others in your exact field

  • Others in a similar field (e.g., psychology)

  • Other scientists in a different field (e.g., chemist)

  • Affiliated non-scientist professionals (e.g., therapists, teachers, lawyers, judges)

  • Less-affiliated non-scientist professionals (e.g., managers, business leaders, administrators)

  • Non-scientist non-professionals (e.g., parents, general public)

How do you communicate differently?

  • Speak simply, to the lowest level

    • Repeat important information
  • Avoid jargon and abbreviations

  • Consider what you want your audience to get out of it, as well as what your audience wants to get out of it

  • Dialogue is important: no one wants to be lectured at

Communicating with different groups

Statisticians communicating with non-statisticians:

  • What does the client (non-statistician) want from the interaction?

    • Do they want to learn statistics or do they want an answer to their question?

    • Should I bring a basket of fish or some fishing poles?

  • Pay attention to whether your explanations are clicking

    • Listen, paraphrase, summarize (or ask them to)
  • Tailor communication to their knowledge / experience

https://magazine.amstat.org/blog/2019/09/01/comm_nonstatisticians/

Communicating with different groups

Communicating in non-traditional settings

  • Technology may not be available, data may be poor quality, question may not be well-developed

    • Analyst’s job is often to do what they can
  • The research can involve important, life-and-death decisions

    • Spread of disease, availability of water, etc.

Statistics in the Wild: Practicing Statistics in Nontraditional Places, from a Tiny Island in the Pacific to the Federal Cabinet: https://ww2.amstat.org/meetings/csp/2018/onlineprogram/AbstractDetails.cfm?AbstractID=303499

Data visualization for low and middle income countries: https://medium.com/nightingale/data-visualization-for-audiences-in-low-middle-income-countries-ed722d161313?source=friends_link&sk=8ca77bd05f6ea12fbd7e240b98b255bd

Communicating with different groups

Rural United States

  • Showed 10 different graphics to 40+ people in rural Pennsylvania

  • Asked them to rank the graphics by usefulness

  • Findings: Personal connection to the data is more important than any design aspect

    • Family / friend dealing with addiction: drawn to addiction graphic

    • Title of the graphic said “America” so they related to it more (same data in both)

https://medium.com/multiple-views-visualization-research-explained/data-is-personal-what-we-learned-from-42-interviews-in-rural-america-93539f25836d

Communicating with different groups

3 minute thesis / dissertation, elevator pitch for job interviews: https://en.wikipedia.org/wiki/Three_Minute_Thesis

  • Very short, very concise summary of what you’ve spent the last half-decade of your life completely absorbed in….

  • General audience, so must be non-technical

  • You should have this ready at all times

Science for children

  • Through the Frontiers journals

  • Children and teens review articles written by scientists (for kids)

  • Important things, broken down to simple concepts

    • Example: “Hitting Your Head Can Result in Invisible Disability That Affects Your Body and Beyond!”

https://kids.frontiersin.org/

Teaching communication

Wikipedia: gigantic online encyclopedia

  • Have enlisted high school teachers and students to create articles

  • Wikipedia Fellows comes from the opposite direction

    • Enlist scientists (experts) to edit and refine articles

    • We have “big picture” knowledge that e.g., high school and college students don’t

  • http://wikiedu.org/wikipedia-fellows/

Communicating Science Conference (comscicon):

  • Was actually in Miami a few weeks ago, co-hosted by FIU, would have been nice if they’d circulated the info to us…

  • https://comscicon.com/

Making plots

Give plots titles that describe the findings

https://twitter.com/mikemorrison/status/1110191245035479041

http://betterposters.blogspot.com/2019/04/critique-morrison-billboard-poster.html

Markdown template: https://github.com/GerkeLab/betterposter

Expanding plot limits

For example, force the Y axis to include 0 on a plot that would not otherwise show it

+ expand_limits(y = 0)

Or allow a smooth transition from one plot to another by expanding the limits on the first plot to the (larger) limits on the second

Or make two plots more similar and easier to compare by making the axes extend to the same place
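A short sketch of both uses, assuming the mpg data set that ships with ggplot2. The object names and the choice of the compact and suv subsets are illustrative only.

```r
library(ggplot2)

# Baseline: by default the y axis starts near the smallest observed value.
p_default <- ggplot(mpg, aes(displ, hwy)) +
  geom_point()

# Force the y axis to include 0.
p_zero <- p_default + expand_limits(y = 0)

# Two subsets with different ranges, drawn as separate plots.
compact <- subset(mpg, class == "compact")
suv     <- subset(mpg, class == "suv")

# Compute the shared ranges once from the full data, then expand both
# plots to them so their axes extend to the same place.
x_range <- range(mpg$displ)
y_range <- range(mpg$hwy)

p_compact <- ggplot(compact, aes(displ, hwy)) +
  geom_point() +
  expand_limits(x = x_range, y = y_range)

p_suv <- ggplot(suv, aes(displ, hwy)) +
  geom_point() +
  expand_limits(x = x_range, y = y_range)
```

Because expand_limits() only widens the axes, it never drops data the way a too-narrow explicit limit can.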

R for Data Science - chapter on communication

https://r4ds.had.co.nz/graphics-for-communication.html

Title, subtitle, and caption:

+ labs(
    title = "Fuel efficiency generally decreases with engine size",
    subtitle = "Two seaters (sports cars) are an exception because of their light weight",
    caption = "Data from fueleconomy.gov"
  )
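For context, here is the same call attached to a complete plot. This is a sketch assuming the mpg data set from ggplot2; the point and smooth layers are one plausible base plot, chosen so the title's claim is visible in the figure.

```r
library(ggplot2)

p_labeled <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  # A loess smooth makes the overall downward trend easy to see.
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  labs(
    title = "Fuel efficiency generally decreases with engine size",
    subtitle = "Two seaters (sports cars) are an exception because of their light weight",
    caption = "Data from fueleconomy.gov"
  )
```

The title states the finding, the subtitle adds the qualification, and the caption gives the data source.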

R for Data Science - chapter on communication

Label X and Y axis with equations:

+ labs(
x = quote(sum(x[i] ^ 2, i == 1, n)),
y = quote(alpha + beta + frac(delta, theta)))

Both of these labels are all math text

If a label mixes plain text and math text, use the methods we talked about before: put the text in quotes, leave the math text unquoted, and use ~ as a spacer between them
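A small sketch of a mixed label, assuming the mpg data set from ggplot2. The label wording and symbols are made up for illustration and do not describe a fitted model.

```r
library(ggplot2)

p_mixed <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  labs(
    # Quoted pieces are drawn as plain text, unquoted pieces as math,
    # and ~ puts a space between them.
    x = quote("Engine displacement" ~ x[i]),
    y = quote("Predicted mileage:" ~ alpha + beta * x[i])
  )
```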

R for Data Science - chapter on communication

Justifying text in annotations

annotate, geom_text, or element_text (element_text takes numeric values from 0 to 1 rather than the strings below)

hjust takes options
- left
- center
- right

vjust takes options
- top
- center
- bottom

e.g., put an annotation in a spot, justified in upper right corner

+ geom_text(aes(label = label), data = label, vjust = "top", hjust = "right")
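A complete version of that call, sketched with the mpg data set from ggplot2. The one-row label data frame and its wording are assumptions for illustration; Inf is used for the coordinates so the text sits at the panel edge whatever the data range is.

```r
library(ggplot2)

# One-row data frame holding the annotation and where to put it.
# Inf places the anchor point at the upper right edge of the panel.
label <- data.frame(
  displ = Inf,
  hwy = Inf,
  label = "Larger engines tend to\nget lower highway mileage."
)

p_annotated <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  # "top" and "right" align the top right corner of the text with the
  # anchor point, so the text stays inside the panel.
  geom_text(aes(label = label), data = label, vjust = "top", hjust = "right")
```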

R for Data Science - chapter on communication

Be specific with your tick marks

Specify all tick marks:

+ scale_y_continuous(breaks = c(15, 20, 25, 30, 35, 40))

Specify a pattern to the ticks:

+ scale_y_continuous(breaks = seq(15, 40, by = 5))

The ticks can be unequally spaced if that reflects your data
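One case where unequal spacing reflects the data is a set of event dates. The sketch below assumes the presidential data set that ships with ggplot2 and puts a tick at the start of each term; the term index column is added here only to stack the segments.

```r
library(ggplot2)

# presidential has one row per U.S. presidential term (name, start, end).
terms <- presidential
terms$term <- seq_len(nrow(terms))  # simple index used for the y position

p_terms <- ggplot(terms, aes(start, term)) +
  geom_point() +
  geom_segment(aes(xend = end, yend = term)) +
  # Ticks fall exactly where terms begin, so their spacing is uneven
  # whenever a term was cut short.
  scale_x_date(name = NULL, breaks = terms$start, date_labels = "'%y")
```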