Graphics

What are we talking about?

  • Data visualization (“datavis”)
    Basic depictions of (typically) raw data

  • Infographics
    Graphical representations of information (not necessarily data)

  • Exploratory data analysis
    Statistical analysis without hypothesis testing

  • Statistical graphics
    Graphics related to specific statistical procedures and their output (e.g., regression lines)

Why do we make plots?

  • Exploration

  • Analysis

  • Presentation

Certainty and uncertainty

Error bars or CIs: uncertainty of an estimate or variability of a sample or population

It is much easier to understand uncertainty if it is displayed graphically

  • You can see the estimate and the standard error together
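
A minimal sketch of this in base R, showing each estimate with its standard error drawn as an error bar. The built-in ToothGrowth dataset stands in for real data, and the choice of one standard error (rather than a confidence interval) is an assumption made for illustration.

```r
# Mean tooth length and its standard error for each dose in ToothGrowth
# (a dataset that ships with R; used here only as example data).
dose  <- sort(unique(ToothGrowth$dose))
means <- as.vector(tapply(ToothGrowth$len, ToothGrowth$dose, mean))
ses   <- as.vector(tapply(ToothGrowth$len, ToothGrowth$dose,
                          function(x) sd(x) / sqrt(length(x))))

# Plot the estimates, leaving room on the y-axis for the error bars.
plot(dose, means,
     ylim = range(means - ses, means + ses),
     pch = 19,
     xlab = "Dose (mg/day)",
     ylab = "Mean tooth length",
     main = "Estimate with one standard error on each side")

# arrows() with angle = 90 and code = 3 draws flat caps at both ends,
# which turns each line segment into an error bar.
arrows(dose, means - ses, dose, means + ses,
       angle = 90, code = 3, length = 0.05)
```

Swapping `ses` for a confidence-interval half-width changes what the bars mean, so the caption or axis title should always say which one is shown.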

What makes a good or bad graphic?

So many things, but…

  • Does it actually tell the story that it’s trying to tell?
  • Does it do that on its own, without additional explanation?
  • Does it lie or mislead to tell the story?

What makes a good or bad graphic?

Trifecta checkup for a graphic

  • What is the practical question?
  • What does the data say?
  • What does the chart say?

Do the answers to these questions match?

https://junkcharts.typepad.com/junk_charts/2019/07/what-is-a-bad-chart.html

Tips for better graphics

  1. Who is the audience?
  2. Answer a question
  3. Which type of graph?
  4. Outliers
  5. Remove unnecessary information
  6. Don’t mislead
  7. Color palette
  8. Axis titles and labels
  9. Adjust titles, labels, and legend text
  10. Export your plot

https://michaeltoth.me/10-steps-to-better-graphs-in-r.html
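
Several of these tips (answer a question, remove unnecessary information, color palette, axis titles, export) can be sketched in a few lines of base R. The mtcars dataset, the question being asked, the colors, and the output file name are all assumptions chosen for illustration; this is not the code from the linked article.

```r
# Question: do heavier cars get worse fuel economy? (mtcars ships with R.)
out_file <- file.path(tempdir(), "mpg_by_weight.pdf")  # hypothetical file name

# Tip 10: export to a file at a known size (in inches).
pdf(out_file, width = 8, height = 6)

# Tip 7: a small, deliberate palette; one color per transmission type.
pal <- c("grey40", "darkorange")

plot(mtcars$wt, mtcars$mpg,
     col = pal[mtcars$am + 1], pch = 19,  # am is 0 (automatic) or 1 (manual)
     # Tips 8 and 9: readable axis titles with units, and a title
     # that states the point of the plot.
     xlab = "Weight (1000 lbs)",
     ylab = "Fuel economy (miles per gallon)",
     main = "Heavier cars get fewer miles per gallon",
     # Tip 5: draw only the left and bottom axis lines, not a full box.
     bty = "l")
legend("topright", legend = c("Automatic", "Manual"),
       col = pal, pch = 19, bty = "n")

invisible(dev.off())  # close the device so the file is written
```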

Bad graphics

  • A collection of submitted bad graphics

https://viz.wtf/

  • Terrible graphic about salaries in tech

https://howmuch.net/articles/highest-lowest-paying-jobs-in-tech

  • Summing areas is hard to do mentally

https://eagereyes.org/criticism/visual-math-wrong

History

Long history of information being depicted in figures

  • Cave paintings of constellations, hunted animals
  • Maps
  • Mechanical diagrams

Graphics used in science after about 1600

William Playfair (1759 - 1823)

  • Scottish engineer and economist (also a secret agent)

  • Invented
    • Line chart
    • Area chart
    • Bar chart
    • Pie chart
  • Charts communicate better than tables

Most famous figure: https://upload.wikimedia.org/wikipedia/commons/5/52/Playfair_TimeSeries-2.png

Charles Joseph Minard (1781 - 1870)

  • French civil engineer

  • Best known for maps, especially “flow maps” that depict movement in both space and time

Most famous figure (English translation): https://upload.wikimedia.org/wikipedia/commons/e/e2/Minard_Update.png

John Snow (1813 - 1858)

  • English physician who helped develop modern anaesthesia and medical hygiene; considered the father of modern epidemiology

  • Helped prove that cholera was spread by germs and not by miasma (“bad air”), which was the prevailing theory at the time
    • Studied the relationship between water quality and cholera cases, using statistics
    • Showed that polluted water from the Thames was being delivered to certain areas, and people in those areas were getting cholera
    • “Outlier” story that proved the theory
  • Most famous plot: https://upload.wikimedia.org/wikipedia/commons/2/27/Snow-cholera-map-1.jpg

Florence Nightingale (1820 - 1910)

W.E.B. Du Bois (1868 - 1963)

Mary Eleanor Spear (1897 - 1986)

  • Career working for the federal government
    • IRS, Bureau of Labor Statistics
  • Background in drafting, so she focused on the physical aspects of creating graphs

  • Wrote several books on charting statistics

  • Created the box plot (she called it the range bar), but was not credited with it on Wikipedia until the Medium article below
    • Women in Red

Medium article on her: https://medium.com/nightingale/credit-where-credit-is-due-mary-eleanor-spear-6a7a1951b8e6

Hans Rosling (1948 - 2017)

  • Swedish physician and professor

  • Co-founder and chairman of Gapminder Foundation
    • The “gapminder” dataset includes repeated measurements of life expectancy, population, and GDP per capita for more than 100 countries and is frequently used to demonstrate packages in R
  • Emphasis on communicating statistics and science to lay audiences

  • Most famous plot (not actually a plot, but a TED talk): https://www.ted.com/talks/hans_rosling_the_truth_about_hiv
  • Has several really good TED talks

John Tukey (1915 - 2000)

  • American mathematician and professor, worked for Bell Labs, founded statistics department at Princeton

  • Likely coined the words “bit” (binary digit) and “software”
  • Often credited with creating the box plot, which he popularized (but see Mary Eleanor Spear above)

  • Distinction between exploratory and confirmatory data analysis
    • Graphics as a major part of exploratory data analysis
  • Best quotes:
    • The simple graph has brought more information to the data analyst’s mind than any other device.
    • The greatest value of a picture is when it forces us to notice what we never expected to see.

Edward Tufte (1942 - present)

  • American professor of political science, statistics, and computer science

  • Focus is on “information design” for data visualization
    • Get rid of “chartjunk” (things that don’t convey information)
    • Other terms: data-ink ratio, lie factor, data density
    • Critical of PowerPoint
  • Developed the concept of “small multiples”
    • Sets of small figures with the same X and Y axes
    • Allows easy comparison
  • Best plot (that’s not really a plot): The data duck
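
The small-multiples idea can be sketched in base R with par(mfrow = ...). This is an illustration only: the iris dataset and the choice of variables are assumptions. The key step is computing the axis limits once, from the full data, so that every panel shares them.

```r
# One panel per species in iris (a dataset that ships with R).
species <- levels(iris$Species)

# Shared axis limits, computed from the full data, so panels are comparable.
xlim <- range(iris$Sepal.Length)
ylim <- range(iris$Sepal.Width)

# Lay out one row with as many columns as there are species.
old_par <- par(mfrow = c(1, length(species)))

for (s in species) {
  d <- iris[iris$Species == s, ]
  plot(d$Sepal.Length, d$Sepal.Width,
       xlim = xlim, ylim = ylim, pch = 19,
       xlab = "Sepal length (cm)", ylab = "Sepal width (cm)",
       main = s)
}

par(old_par)  # restore the previous layout
```

If each panel were allowed to pick its own limits, the eye would compare shapes that are drawn on different scales, which defeats the purpose of the layout.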

Software

R

  • Software environment and programming language

  • Known for use in statistical analysis, but does much more

  • “Object-oriented” programming language
    • Everything is an object – dataset, variable, output
    • Everything has a name
    • Just refer to an object’s name to do something with it
  • Can be expanded by writing your own code or using “packages” developed by others

  • Download here: https://cran.r-project.org/
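
A short sketch of what “everything is an object with a name” looks like in practice. The object names below are hypothetical; mtcars is a dataset that ships with R.

```r
# A dataset is an object: give it a name with the assignment arrow.
my_data <- mtcars

# A variable (column) inside the dataset is also an object.
car_weight <- my_data$wt

# The output of a procedure is an object too, here a fitted regression.
fit <- lm(mpg ~ wt, data = my_data)

# Refer to an object by name to do something with it.
class(my_data)     # what kind of object is it?
mean(car_weight)   # summarize a variable
coef(fit)          # pull the estimates out of the model object
```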

RStudio

  • Integrated development environment (IDE) for R
    • Nice user interface and some quality-of-life features that make using R a bit more pleasant
  • To download to your computer: www.rstudio.com

  • To use via web browser: www.rstudio.cloud
    • Link to class space on rstudio.cloud
    • I may put some files here

R Markdown

  • Markdown is a markup language (like HTML) to format documents

  • Idea: you put in your content, let Markdown handle the formatting

    • Written in simple code

    • Final document can be in multiple formats (PDF, Word, PowerPoint, HTML)

  • Implemented in R via the rmarkdown package

R Markdown

  • Reproducible!

    • Run your code and write your report in a single file

    • Make calls to parts of the output in the report

    • You don’t have to copy values from the results to a report document

  • Can easily upload HTML files to a webpage using RPubs (integrated with RStudio)
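
A minimal sketch of such a file, written out from R so it can be tried directly. The file name and the report contents are hypothetical. The inline r expression in the last line of the document is how a value from the output is pulled into the text instead of being copied by hand. The render step assumes that the rmarkdown package and pandoc are installed, so it is guarded and skipped otherwise.

```r
# Build a tiny R Markdown document as a character vector.
fence <- strrep("`", 3)  # three backticks open and close a code chunk
rmd <- c(
  "---",
  "title: \"Fuel economy report\"",
  "output: html_document",
  "---",
  "",
  paste0(fence, "{r}"),
  "fit <- lm(mpg ~ wt, data = mtcars)",
  fence,
  "",
  # Inline code: the slope is computed when the report is rendered,
  # not typed in by hand.
  "Each extra 1000 lbs changes fuel economy by `r round(coef(fit)[2], 2)` mpg."
)

rmd_file <- file.path(tempdir(), "report.Rmd")  # hypothetical file name
writeLines(rmd, rmd_file)

# Render only if the rmarkdown package and pandoc are both available.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  rmarkdown::render(rmd_file, quiet = TRUE)
}
```

If the data or the model changes, re-rendering updates both the code output and the sentence that quotes it.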