“The focus of the book is that the purpose of statistics is to organize a useful argument from quantitative evidence, using a form of principled rhetoric. Five criteria, described by the acronym MAGIC (magnitude, articulation, generality, interestingness, and credibility) are proposed as crucial features of a persuasive, principled argument.”
Consortium for the Advancement of Research Methods and Analysis talk, January 2018: https://carma.azurewebsites.net/video
Methods and Results should be the most exciting part of a paper!
Four principles
Maintain the narrative
Write for multiple audiences
Create transparency
Seek correspondence
“Table 1 displays means and standard deviations of the variables, their correlations, and includes Cronbach’s alpha for the scales along the diagonal.”
“Table 1 displays descriptive statistics for the full sample as well as subsets of NYC‐born attorneys and their officemates. Comparing NYC metro‐born attorneys and their officemates, we see predictable differences. NYC metro‐born attorneys are more likely to have attended a top and/or NYC‐based law school, and they are slightly more likely to be licensed to practice in NY.” (Carnahan, Kryscynski, & Olson, AMJ 2016, p1942)
“Results are reported in Table 2 and in Figures 2a and 2b. These results show that the mean of the experimental group was significantly higher than the mean of the control group supporting H1a.”
“Did overconfidence drive participants’ entry choices? If so, which type of overconfidence was responsible? We compared easy market entrants with difficult market entrants, examining their beliefs about themselves and their relative placements on each quiz. As Figures 2a and 2b show, …” (Cain, Moore & Haran, SMJ, 2015 36:1, p6)
Tables (and figures)
Should be interpretable without reference to the text
Should not require repetitive flipping back to other sections
Variables should be written with full names
Should indicate the type of analysis and the relationship between the statistics and the hypotheses
“The endogenous dependent variable of interest is Organizational Citizenship Behavior (OCB) with the focal entity on the individual (individual level). OCB is conceptualized as the employee intention to commit OCB in the future, how likely an employee is to commit an act of OCB and the unit of analysis is the employee’s intention consisting of one item on a 7‐point Likert scale. This should be measured at the individual level to reflect the intention level within each individual.”
“The dependent variable is Organizational Citizenship Behavior Intentions measured at the individual level with a single item on a 7‐point Likert scale.”
Explain your choice of method using simple language and your research question
Igic, Keller, Elfering, Tschan, Kalin, & Semmer (JAP, 2017, 102): Change in job-related stress and control
Hypothesized change, but also that change could be roughly categorized (e.g., increasing, decreasing, stable)
p1319: Describe options (change scores, theoretically-defined groups, empirically-defined trajectories)
p1324: Use of growth mixture modeling: “This approach captures information about interindividual differences in intraindividual change over time, and allows for differences in growth parameters across unobserved subpopulations (Muthén, 2001; Muthén & Muthén, 2000).”
Summarize practical points
e.g., Table 8 in Lacerenza, Reyes, Marlow, Joseph & Salas, JAP 2017 p1704
Evidence-based best practices for designing a leadership training program
Define 8 best practices
Tips for implementation in terms of Learning, Transfer, Results
Present full and complete results
Regression results plus simple slopes
Don’t use outdated methods (e.g., Baron & Kenny for mediation)
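A minimal sketch of "regression results plus simple slopes", assuming a linear model with a two-way interaction. The data are simulated and the names x, z, and y are placeholders, not variables from any study cited here. Simple slopes of x are evaluated at the mean of z and at one standard deviation above and below it.

```r
# Simulated data (illustrative only): y depends on x, z, and their interaction
set.seed(1)
n <- 200
x <- rnorm(n)
z <- rnorm(n)
y <- 0.5 * x + 0.3 * z + 0.4 * x * z + rnorm(n)

# Full regression results
fit <- lm(y ~ x * z)
b <- coef(fit)
V <- vcov(fit)

# Values of the moderator at which to evaluate the slope of x
z_vals <- c(low = mean(z) - sd(z), mean = mean(z), high = mean(z) + sd(z))

# Simple slope of x at each value of z: b_x + b_xz * z
simple_slopes <- unname(b["x"]) + unname(b["x:z"]) * z_vals

# Standard error of each simple slope from the coefficient covariance matrix
se <- sqrt(V["x", "x"] + z_vals^2 * V["x:z", "x:z"] + 2 * z_vals * V["x", "x:z"])

simple_slope_table <- data.frame(z = z_vals, slope = simple_slopes, se = se)
```

Reporting simple_slope_table next to summary(fit) lets readers see both the overall model and what the interaction means at specific moderator values.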
“However, Study 1 also has some limitations. First, we were unable to disentangle the causal ordering among our independent and mediator variables because they were collected at the same time. Although longitudinally separated, the causal relationship between our mediators and dependent variables could also be strengthened. Second, our measure of identity conflict and enhancement may not reflect the underlying values associated with brand identities. Last, the unexpected negative relationship between perspective‐taking and performance also requires further investigation. To address the above limitations, in Studies 2a and 2b we conduct experiments using a between subjects design to isolate and better examine the mechanisms we find in the field study.” (Ramarajan, Rothbard, and Wilk, AMJ 2017, Dec p2222)
Parallelism: Verbal story should match statistical story should match graphical story
Examples of the wrong way:
Theory is causal but design is poor for making causal inference (i.e., not randomized)
Timing of measurements doesn’t match the theoretical process
Model as described (or as presented in equations) doesn’t match figure for model
Verbal story (Introduction) doesn’t match the statistical story (Method & Results)
Examples of the right way:
Analytical plan: map that links the verbal story (Introduction) to the statistical story
Argyres, Bigelow & Nickerson (SMJ, 2015, 36:2, p228)
Our analysis follows the general approach taken by Carroll et al. (1996) and Klepper (2006) in their survival analyses of the U.S. auto industry. We estimate a Gompertz hazard rate model of firm mortality because nonparametric analyses suggest that the Gompertz specification shows superior goodness of fit relative to the Weibull and other specifications. We conduct this analysis at the level of the firm rather than at the level of the car model because our theory is about firms.
Maintain the narrative
Write for multiple audiences
Create transparency
Seek correspondence
https://resources.rstudio.com/rstudio-conf-2018/storytelling-with-r-olga-pierce
Olga Pierce - Data journalist for ProPublica
Make things easy for people to see / read
Help people make sense for themselves (context)
Great example using specific individual observations to demonstrate the findings
Recidivism prediction algorithm shows racial bias
Different false positive rate for black vs white
Show exemplar white person who was “low risk” with many priors and re-committed a violent crime, with black person who was “high risk” with no priors and never re-committed a crime
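A small sketch of what "different false positive rate" means computationally. The data frame below is made up purely for illustration (neutral group labels A and B, hypothetical column names) and is not the data from the talk. The false positive rate is the share of people flagged high risk among those who did not re-offend, computed within each group.

```r
# Made-up toy data, for illustration only
toy <- data.frame(
  group             = c("A", "A", "A", "A", "B", "B", "B", "B"),
  flagged_high_risk = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE),
  reoffended        = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE)
)

# Keep only people who did not re-offend (the "negatives")
no_reoffense <- subset(toy, !reoffended)

# False positive rate per group: proportion of negatives flagged as high risk
fpr <- tapply(no_reoffense$flagged_high_risk, no_reoffense$group, mean)
```

In this toy example, fpr holds one rate per group, which is the comparison the talk then illustrates with individual exemplar cases.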
RStudio conference, 2017: https://resources.rstudio.com/wistia-rstudio-conf-2017/finding-and-telling-stories-with-r-andrew-flowers
Andrew Flowers - Data journalist (formerly at fivethirtyeight.com)
6 types of data stories (and their dangers)
Novelty (Triviality)
Outlier (Spurious result)
Archetype (Oversimplification)
Trend (Variance)
Debunking (Confirmation bias)
Forecast (Overfitting)
Guide your audience
Tell your story chronologically, if possible
Start with a summary (general), then move to details
Is there a conclusion or are you just passing on information?
Public communication of science
What are effective science communication approaches?
Explain things simply and succinctly
Who are you trying to reach and why does this matter to them?
Listen to and engage with audience: bidirectional dialogue
Systems approach: Science is just one piece of information that people learn about and use
https://blogs.scientificamerican.com/guest-blog/effective-communication-better-science/
Dissemination of information
Dissemination to practitioners
Persuade an audience
Others in your exact field
Others in a similar field (e.g., psychology)
Other scientists in a different field (e.g., chemist)
Affiliated non-scientist professionals (e.g., therapists, teachers, lawyers, judges)
Less-affiliated non-scientist professionals (e.g., managers, business leaders, administrators)
Non-scientist non-professionals (e.g., parents, general public)
Speak simply, to the lowest level
Avoid jargon and abbreviations
Consider what you want your audience to get out of it, as well as what your audience wants to get out of it
Dialogue is important: no one wants to be lectured at
Statisticians communicating with non-statisticians:
What does the client (non-statistician) want from the interaction?
Do they want to learn statistics or do they want an answer to their question?
Should I bring a basket of fish or some fishing poles?
Pay attention to whether your explanations are clicking
Tailor communication to their knowledge / experience
Communicating in non-traditional settings
Technology may not be available, data may be poor quality, question may not be well-developed
The research can involve important, life-and-death decisions
Statistics in the Wild: Practicing Statistics in Nontraditional Places, from a Tiny Island in the Pacific to the Federal Cabinet: https://ww2.amstat.org/meetings/csp/2018/onlineprogram/AbstractDetails.cfm?AbstractID=303499
Data visualization for low and middle income countries: https://medium.com/nightingale/data-visualization-for-audiences-in-low-middle-income-countries-ed722d161313?source=friends_link&sk=8ca77bd05f6ea12fbd7e240b98b255bd
Rural United States
Showed 10 different graphics to 40+ people in rural Pennsylvania
Asked them to rank the graphics by usefulness
Findings: Personal connection to the data is more important than any design aspect
Family / friend dealing with addiction: drawn to addiction graphic
Title of the graphic said “America” so they related to it more (same data in both)
3 minute thesis / dissertation, elevator pitch for job interviews: https://en.wikipedia.org/wiki/Three_Minute_Thesis
Very short, very concise summary of what you’ve spent the last half-decade of your life completely absorbed in….
General audience, so must be non-technical
You should have this ready at all times
Science for children
Through the Frontiers journals
Children and teens review articles written by scientists (for kids)
Important things, broken down to simple concepts
Wikipedia: gigantic online encyclopedia
Have enlisted HS teachers and students to create articles
Wikipedia fellows comes from the opposite direction
Enlist scientists (experts) to edit and refine articles
We have “big picture” knowledge that e.g., high school and college students don’t
Communicating Science Conference (ComSciCon):
Was actually in Miami a few weeks ago, co-hosted by FIU, would have been nice if they’d circulated the info to us…
Now: Neutral titles with basic information about the data
UK Office for National Statistics: https://digitalblog.ons.gov.uk/2019/01/28/say-what-you-see-the-way-we-write-chart-titles-is-changing/
Pew Research: https://www.pewresearch.org/fact-tank/2019/06/25/stark-partisan-divisions-in-americans-views-of-socialism-capitalism/
Related: Better (?) posters
https://twitter.com/mikemorrison/status/1110191245035479041
http://betterposters.blogspot.com/2019/04/critique-morrison-billboard-poster.html
Markdown template: https://github.com/GerkeLab/betterposter
For example, force a plot whose Y axis does not start at 0 to include 0
+ expand_limits(y = 0)
Or allow smooth transition from one plot to another, by expanding the limits on the first plot to the (larger) limits on the second
Or make two plots more similar and easier to compare, by making the axes extend to the same place
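A sketch of both uses of expand_limits(), assuming the mpg data that ships with ggplot2. The subsets (suv, compact) and object names are chosen here for illustration only.

```r
library(ggplot2)

# Two subsets of mpg whose default axis ranges would differ
suv     <- subset(mpg, class == "suv")
compact <- subset(mpg, class == "compact")

# 1. Force the y axis to include 0
p_zero <- ggplot(suv, aes(displ, hwy)) +
  geom_point() +
  expand_limits(y = 0)

# 2. Give two plots the same axes by expanding both to the full-data range
x_lim <- range(mpg$displ)
y_lim <- range(mpg$hwy)

p_suv <- ggplot(suv, aes(displ, hwy)) +
  geom_point() +
  expand_limits(x = x_lim, y = y_lim)

p_compact <- ggplot(compact, aes(displ, hwy)) +
  geom_point() +
  expand_limits(x = x_lim, y = y_lim)
```

Because expand_limits() only widens the axes, no points are dropped. Setting hard limits with xlim() or ylim() would instead remove data outside the range.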
https://r4ds.had.co.nz/graphics-for-communication.html
Title, subtitle, and caption:
+ labs(
title = "Fuel efficiency generally decreases with engine size",
subtitle = "Two seaters (sports cars) are an exception because of their light weight",
caption = "Data from fueleconomy.gov"
)
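For a runnable version of the labs() fragment above, here is a sketch that attaches it to a base plot. The base plot (mpg data, points colored by class) is an assumption to make the example self-contained.

```r
library(ggplot2)

# Assumed base plot: highway mileage against engine displacement
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(color = class)) +
  labs(
    title = "Fuel efficiency generally decreases with engine size",
    subtitle = "Two seaters (sports cars) are an exception because of their light weight",
    caption = "Data from fueleconomy.gov"
  )
```

Printing p draws the plot. The title states the finding, the subtitle adds detail, and the caption names the data source.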
Label X and Y axis with equations:
+ labs(
x = quote(sum(x[i] ^ 2, i == 1, n)),
y = quote(alpha + beta + frac(delta, theta)))
Both of these labels are entirely math text
If a label mixes text and math text, use the methods we talked about before: put text in quotes, leave math text unquoted, and use ~ as a spacer between them
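A short sketch of a label that mixes text and math text. The data frame and the label wording are made up for illustration. Quoted pieces render as plain text, unquoted pieces as math, and ~ adds a space between them.

```r
library(ggplot2)

# Toy data, for illustration only
df <- data.frame(x = 1:10, y = (1:10)^2)

p <- ggplot(df, aes(x, y)) +
  geom_point() +
  labs(
    # "Predictor" is plain text; x[i] is rendered as x with subscript i
    x = quote("Predictor" ~ x[i]),
    # Plain text followed by a math expression
    y = quote("Squared value:" ~ x[i]^2)
  )
```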
Justifying text in annotations (annotate, geom_text, or element_text)
hjust takes options
- left
- center
- right
vjust takes options
- top
- center
- bottom
e.g., put an annotation in a spot, justified in upper right corner
+ geom_text(aes(label = label), data = label, vjust = "top", hjust = "right")
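A sketch of the upper-right annotation as a complete plot, assuming the mpg data. The one-row label data frame and its wording are illustrative. Placing the label at Inf on both axes puts it at the plot border, and the justification keeps the text inside the panel.

```r
library(ggplot2)

# One-row data frame holding the annotation; Inf means "at the edge of the panel"
label <- data.frame(
  displ = Inf,
  hwy   = Inf,
  label = "Larger engines tend to\nhave lower fuel economy"
)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  # Top/right justification anchors the text's upper-right corner at (Inf, Inf)
  geom_text(aes(label = label), data = label, vjust = "top", hjust = "right")
```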
Be specific with your tick marks
Specify all tick marks:
+ scale_y_continuous(breaks = c(15, 20, 25, 30, 35, 40))
Specify a pattern to the ticks:
+ scale_y_continuous(breaks = seq(15, 40, by = 5))
The ticks can be unequally spaced if that reflects your data
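A sketch of unequally spaced ticks, assuming the presidential data that ships with ggplot2 (one row per president with start and end dates). The term column is a plain row index added here only to position the points.

```r
library(ggplot2)

pres <- as.data.frame(presidential)
pres$term <- seq_len(nrow(pres))  # row index, used only for the y position

p <- ggplot(pres, aes(start, term)) +
  geom_point() +
  geom_segment(aes(xend = end, yend = term)) +
  # Ticks fall on the actual start dates, so their spacing follows the data
  scale_x_date(NULL, breaks = pres$start, date_labels = "'%y")
```

Here the breaks come straight from the data, so a reader can see exactly when each term began instead of estimating from evenly spaced ticks.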