Research question to plot

Different questions = different visualizations

What is your research question? How can you answer it?

Different kinds of plots let you answer different questions and make different comparisons

One dataset, 25 visualizations:

https://flowingdata.com/2017/01/24/one-dataset-visualized-25-ways/

Same data, (roughly) same question, a bunch of different visualizations:

https://flowingdata.com/2018/10/17/ask-the-question-visualize-the-answer/

Recap of ggplot2

ggplot2

Based on the grammar of graphics

Dataset + variables mapped to a coordinate system + geometric object = plot

Geometric objects (“geoms”) are the key piece of the plot

Everything that goes into a plot (including the “ggplot()” part itself) is a function

  • You can tell by the parentheses at the end

  • Sometimes you add options (“arguments”) in the parentheses

  • Sometimes you leave the parentheses empty: ()
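As a minimal sketch of how these pieces combine, the example below builds a scatterplot from the mpg dataset that ships with ggplot2. The dataset and the variables (displ, hwy) are assumptions chosen only for illustration.

```r
library(ggplot2)

# Dataset + variables mapped to the axes + geometric object = plot.
# mpg ships with ggplot2; displ and hwy are illustrative choices.
p_basic <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()                 # a function with empty parentheses

p_colored <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(color = "blue")   # the same function with an argument

print(p_basic)
```

Each piece is a function call, and the pieces are joined with +.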

Basic ggplot template

All screenshots from the “Data Visualization” cheatsheet from RStudio at: https://www.rstudio.com/resources/cheatsheets/

Geom functions

Geoms for 1 variable

Geoms for 2 variables: Continuous X and Y

Geoms for 2 variables: Discrete X, continuous Y

Geoms for 2 variables: Discrete X and Y

Geoms for 2 variables: Continuous bivariate distribution

Geoms for 2 variables: Continuous function

Geoms for 2 variables: Error / variability

Four different ways to include error bars

  • You must specify the min and max for the error bars
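A minimal sketch of supplying the min and max for error bars, assuming the built-in mtcars dataset and a mean plus or minus one standard error as the interval. Both choices are illustrative assumptions, not the only way to define the bars.

```r
library(ggplot2)

# Summarise mpg by number of cylinders: mean and standard error.
means <- tapply(mtcars$mpg, mtcars$cyl, mean)
ses   <- tapply(mtcars$mpg, mtcars$cyl, function(v) sd(v) / sqrt(length(v)))

summary_df <- data.frame(
  cyl  = factor(names(means)),
  mean = as.numeric(means),
  se   = as.numeric(ses)
)

p_error <- ggplot(summary_df, aes(x = cyl, y = mean)) +
  geom_point() +
  # ymin and ymax must be supplied; here they are mean -/+ one SE
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.2)

print(p_error)
```

Swapping geom_errorbar() for geom_linerange(), geom_pointrange(), or geom_crossbar() uses the same ymin and ymax mapping.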

Geoms for 3 variables

Geom primitives

Line segments

Stat functions

Statistical functions

Each geom has a default statistic associated with it

  • e.g., geom_bar uses the “count” statistic

But you can explicitly change the statistic

  • e.g., instead of the count (frequency), plot a proportion that you’ve already calculated by mapping it to y and using the “identity” statistic
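The difference between the default statistic and an explicit one can be sketched as follows, assuming the mpg dataset from ggplot2 and proportions precomputed with base R. The data frame name prop_df and its column names are hypothetical.

```r
library(ggplot2)

# Default: geom_bar() uses stat = "count" and tallies rows per class.
p_count <- ggplot(mpg, aes(x = class)) +
  geom_bar()

# Precompute the proportion of cars in each class.
prop_df <- as.data.frame(prop.table(table(mpg$class)))
names(prop_df) <- c("class", "proportion")

# stat = "identity" plots the y values exactly as given.
p_prop <- ggplot(prop_df, aes(x = class, y = proportion)) +
  geom_bar(stat = "identity")

print(p_prop)
```

geom_col() is a shortcut for geom_bar(stat = "identity").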

Position functions

How to arrange geoms that would otherwise overlap

Arguments in the geom_bar (and similar) geoms:

geom_bar(position = "dodge")

  • Move bars to be next to one another

geom_bar(position = "fill")

  • Fill up all vertical space with the bar, split appropriately between categories

geom_bar(position = "stack")

  • Stack the bars on top of one another, but don’t fill up the whole vertical space
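A short sketch comparing the three position arguments on the same bars, assuming the mpg dataset with class on the x axis and drv as the fill variable (both illustrative choices).

```r
library(ggplot2)

# Shared base: one bar per class, split by drive type.
base_bars <- ggplot(mpg, aes(x = class, fill = drv))

p_stack <- base_bars + geom_bar(position = "stack")  # default for geom_bar()
p_dodge <- base_bars + geom_bar(position = "dodge")  # side by side
p_fill  <- base_bars + geom_bar(position = "fill")   # each bar scaled to height 1

print(p_dodge)
```

With position = "fill" the y axis shows proportions within each class rather than counts.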

Position functions

geom_jitter() can be used in place of geom_point()

geom_jitter()

  • Add random noise on both x and y axes to avoid overplotting

  • Can be modified to jitter on only one dimension by setting the other to 0:

    • geom_jitter(width = 0.5, height = 0)

    • geom_jitter(width = 0, height = 0.5)

  • Or to control how far the jitter spreads in each direction:

    • geom_jitter(width = 0.5, height = 0.5)
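A minimal sketch of jittering, assuming the mpg dataset, where cty and hwy are whole numbers and many points overlap. Setting height = 0 is what switches off vertical noise here; leaving height unset keeps the default vertical jitter.

```r
library(ggplot2)

set.seed(1)  # jitter is random; a seed makes the plot reproducible

base_pts <- ggplot(mpg, aes(x = cty, y = hwy))

p_points      <- base_pts + geom_point()                            # overplotted
p_jitter_both <- base_pts + geom_jitter(width = 0.5, height = 0.5)  # noise on x and y
p_jitter_x    <- base_pts + geom_jitter(width = 0.5, height = 0)    # noise on x only

print(p_jitter_x)
```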

Coordinate functions

Coordinate system for the plot

coord_cartesian()

  • default for most plots

coord_fixed(ratio = y/x)

  • Fix the aspect ratio (argument), expressed as y / x; values above 1 make a unit on the y axis longer than a unit on the x axis

coord_flip()

  • Make x into y and y into x

coord_polar()

  • Polar coordinates

coord_trans(y = "sqrt")

  • Transform one of the axes (argument), such as taking the square root of the y axis
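The coordinate functions above can be sketched on one base plot, assuming the mpg dataset and a ggplot2 version in which coord_trans() is available and takes x and y arguments. The ratio of 1/5 is an arbitrary illustrative value.

```r
library(ggplot2)

base_xy <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

p_cartesian <- base_xy + coord_cartesian()        # the default, stated explicitly
p_fixed     <- base_xy + coord_fixed(ratio = 1/5) # one y unit drawn 1/5 as long as one x unit
p_flip      <- base_xy + coord_flip()             # swap the x and y axes
p_sqrt      <- base_xy + coord_trans(y = "sqrt")  # square-root transform of the y axis

# Polar coordinates are easiest to see on a bar chart.
p_polar <- ggplot(mpg, aes(x = class)) +
  geom_bar() +
  coord_polar()

print(p_flip)
```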

Facet functions

Faceting functions

There are a few faceting functions, but one does most of the things you will ever need: facet_grid()

facet_grid(cols = vars(X1))

  • Can facet into columns based on a variable (here, X1)

facet_grid(rows = vars(X2))

  • Can facet into rows based on a variable (here, X2)

facet_grid(rows = vars(X2), cols = vars(X1))

  • Can facet into both rows and columns based on two variables
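A brief sketch of the three facet_grid() calls, assuming the mpg dataset with drv and cyl standing in for X1 and X2. Those variable choices are illustrative.

```r
library(ggplot2)

base_facet <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

p_cols <- base_facet + facet_grid(cols = vars(drv))                   # one column per drive type
p_rows <- base_facet + facet_grid(rows = vars(cyl))                   # one row per cylinder count
p_both <- base_facet + facet_grid(rows = vars(cyl), cols = vars(drv)) # rows and columns together

print(p_both)
```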

Theme functions

Themes are sets of changes to the overall look of the plot

  • You just apply the theme and all the changes are made

You can use some built-in default themes (today), use packaged themes that other people have made (soon), or even make your own theme (just for fun)
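A minimal sketch of applying built-in themes and then adjusting one, assuming the mpg dataset. The specific tweaks in the last plot (legend position, hiding minor grid lines) are arbitrary examples of what a custom theme might change.

```r
library(ggplot2)

base_theme <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point()

p_bw      <- base_theme + theme_bw()       # built-in black-and-white theme
p_minimal <- base_theme + theme_minimal()  # built-in minimal theme

# Start from a built-in theme, then override individual elements.
p_custom <- base_theme +
  theme_minimal() +
  theme(
    legend.position  = "bottom",
    panel.grid.minor = element_blank()
  )

print(p_custom)
```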

Misc plots