Grammar of graphics

Wilkinson’s grammar of graphics

A grammar gives a language its rules.

Wilkinson, L. (2005). The Grammar of Graphics, 2nd Ed. New York: Springer-Verlag.

The grammar of graphics is the set of rules for developing a graphic.

It does not invoke a specific software package, but any flexible and well-developed graphics software should incorporate these rules.

Grammar of graphics

  • Data
  • Variables
  • Algebra
  • Scales
  • Statistics
  • Geometry
  • Coordinates
  • Aesthetics
  • Facets

Data

This is the information that we want to make a graphic of

In psychology (and science in general), we are used to data sets that are organized in a certain way

  • Rows for participants, columns for variables

  • But data is often not organized in such a way

Variables

Concept associated with the (usually numeric) values in a data set

Usually a column, but doesn’t need to be

Also includes transformations of variables

Algebra

Reshaping of the data to accommodate what you want to do

For example, converting longitudinal data from wide (1 row per person, 1 column per time point) to tall (multiple rows per person, 1 column per variable)

Also includes aggregation: for example, each row records whether something occurred, and you want to plot the count of occurrences

  • But not any statistical operations
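A minimal sketch of both operations in R, assuming the tidyr and dplyr packages are installed. The data frames `wide` and `events` and their column names are hypothetical, invented only for illustration.

```r
library(tidyr)   # pivot_longer()
library(dplyr)   # count()

# Hypothetical wide data: 1 row per person, 1 column per time point
wide <- data.frame(
  id = 1:3,
  t1 = c(10, 12, 9),
  t2 = c(11, 14, 9),
  t3 = c(13, 15, 10)
)

# Wide -> tall: multiple rows per person, one column holding all the scores
tall <- pivot_longer(wide, cols = c(t1, t2, t3),
                     names_to = "time", values_to = "score")

# Aggregation: each row records one event; count the events of each type
events <- data.frame(outcome = c("yes", "no", "yes", "yes", "no"))
counts <- count(events, outcome)

tall
counts
```

Neither step computes a statistic such as a mean; both only rearrange or tally rows, which is why they fall under algebra rather than statistics.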

Scales

The axes on which the graphic is presented

Depends on the type of variable that is being graphed

  • Nominal
  • Ordinal
  • Interval
  • Ratio

Former two will have categorical scales

Latter two will have continuous scales
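The grammar itself is software-agnostic, but as one possible illustration, here is how the four variable types are commonly represented in base R. The variable names and values are hypothetical.

```r
# Hypothetical variables, one per type
group  <- factor(c("control", "treatment", "control"))   # nominal
rating <- factor(c("low", "high", "medium"),
                 levels = c("low", "medium", "high"),
                 ordered = TRUE)                         # ordinal
temp_c <- c(20.5, 22.0, 19.5)                            # interval
rt_ms  <- c(350, 420, 390)                               # ratio

# Factors are categorical; only the ordered factor supports comparisons
is.factor(group)
is.ordered(rating)
rating[1] < rating[2]   # "low" < "high"

# Interval and ratio variables are both stored as plain numeric vectors
is.numeric(temp_c) && is.numeric(rt_ms)
```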

Statistics

Basic statistical operations on variables

  • Mean
  • Median
  • Range
  • Minimum
  • Maximum
  • Confidence intervals
  • etc.

Geometry

Anything that appears in the graphic is a geometric object

  • Line
  • Point
  • Bar
  • etc.

We will spend a lot of time talking about geometric objects

  • They are what distinguish one type of figure from another

Coordinates

Often related to the scale, but doesn’t need to be

Cartesian coordinates (typical x-y plane)

  • But also polar or other coordinate systems

Including information about rotation and reflection, and warping and stretching along 1 or more dimensions

  • Rotation in factor analysis: rotation and warping / stretching

Aesthetics

The physical attributes of the objects we perceive (see)

  • Position
  • Size
  • Shape
  • Color
  • Movement
  • etc.

Facets

Multiple similar versions of the same graphic, split by some variable

Relationship between two variables at 5 different time points

  • The 5 similar plots showing the relationship at each time point are facets

One panel for treatment group over time, one panel for control group over time

  • The 2 panels are facets

ggplot2

ggplot2 is the R graphics package that implements graphing according to Wilkinson’s rules

  • “gg” stands for “grammar of graphics”

The journal article introducing ggplot2:
Wickham, H. (2010). A layered grammar of graphics. Journal of Computational and Graphical Statistics, 19(1), 3-28.

The book expanding on the article:
Wickham, H. (2016). ggplot2: elegant graphics for data analysis. Springer.

ggplot2

ggplot2 frames each aspect as a layer in the graphic

  • Multiple aspects can be layered on top of one another to create a graphic
  • This includes multiple aspects from the same rule, such as a scatter plot (geometry) and a regression line (geometry)
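As a sketch of layering two geometries, the following adds a regression line on top of a scatter plot. It uses `mpg`, a small dataset bundled with ggplot2; that choice is an assumption made here for illustration.

```r
library(ggplot2)

# Two geometry layers on one plot: raw points plus a fitted straight line
layered <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point() +                                 # layer 1: scatter plot
  geom_smooth(method = "lm", formula = y ~ x)    # layer 2: regression line

layered
```

Each `+` adds another layer, and both layers inherit the data and the x and y mapping given in `ggplot()`.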

ggplot2’s layers closely (but not exactly) match Wilkinson’s components

  • Data and aesthetic mapping
  • Statistical transformation
  • Geometric objects
  • Position adjustment
  • Scales
  • Coordinate system
  • Faceting

Data and aesthetic mapping

Data is what you expect

  • ggplot2 lives in the “tidyverse” which generally expects data to be organized in rows for participants and columns for variables

Aesthetic mappings specify which variables are mapped to which parts of the plot

  • Which variable is associated with the x axis?

  • Which variable is associated with the y axis?

Statistical transformation

Any basic statistical operation conducted in the service of creating a graphic (not as an analysis in its own right)

For example:

  • Calculating the median, quartiles, minimum, and maximum in order to create a box plot

  • Creating bins for a histogram
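One way to see these transformations is to inspect what ggplot2 computed before drawing. The sketch below uses `layer_data()` for that and the bundled `mpg` dataset; the dataset and the choice of 20 bins are assumptions for illustration.

```r
library(ggplot2)

# Box plot: the stat computes the median, hinges, and whisker ends per group
box <- ggplot(mpg, aes(x = drv, y = hwy)) + geom_boxplot()
box_stats <- layer_data(box)

# Histogram: the stat assigns each value to a bin and counts the values per bin
hist_plot <- ggplot(mpg, aes(x = hwy)) + geom_histogram(bins = 20)
bin_counts <- layer_data(hist_plot)

box_stats[, c("ymin", "lower", "middle", "upper", "ymax")]
bin_counts[, c("xmin", "xmax", "count")]
```

The `middle` column holds each group's median, and the bin counts sum to the number of rows in the data.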

Geometric objects

These dictate what plot you’re creating

Points for x-y pairs create a scatterplot

  • geom_point()

A line creates a line plot

  • geom_smooth(method = lm) produces a straight line

A bar creates a bar plot

  • geom_bar()

Position adjustment

Jittering or other small changes to improve the appearance and readability of a plot

  • Jittering moves points slightly so that they’re not right on top of one another

  • Relatedly, you can lower the alpha (opacity) of points to make overplotted points easier to see (in ggplot2, alpha is set as an aesthetic rather than as a position adjustment)
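A minimal sketch of jittering combined with transparency, again using the bundled `mpg` dataset. The jitter width, height, seed, and alpha value are arbitrary choices made for illustration.

```r
library(ggplot2)

# cty and hwy are integers, so many points share exactly the same position
overplotted <- ggplot(mpg, aes(x = cty, y = hwy)) + geom_point()

# Move each point by up to 0.3 units in each direction; seed makes it reproducible.
# alpha is set separately as a fixed aesthetic value.
jittered <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point(position = position_jitter(width = 0.3, height = 0.3, seed = 1),
             alpha = 0.4)

jittered
```

Jittering changes where points are drawn, not the underlying data, so it should stay small relative to the spacing between values.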

Scales

Scales map data values to aesthetic attributes

Continuous and categorical variables have different types of scales

  • Color works well for continuous (or categorical) variables

  • Shape really only works for categorical variables
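The sketch below maps a continuous variable to color and a categorical variable to shape. It assumes the bundled `mpg` dataset, where `cty` is numeric and `drv` has three categories; the viridis color scale is just one possible choice.

```r
library(ggplot2)

# cty is numeric (continuous color scale); drv is categorical (discrete shape scale)
scaled <- ggplot(mpg, aes(x = displ, y = hwy, colour = cty, shape = drv)) +
  geom_point() +
  scale_colour_viridis_c() +   # continuous variable -> color gradient
  scale_shape()                # categorical variable -> distinct point shapes

scaled
```

Both scale calls could be omitted, since ggplot2 picks a default scale for each mapped aesthetic; they are written out here to make the mapping visible.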

Coordinate system

The axes and gridlines of the figure

Typically Cartesian (x-y) plane but can be higher dimensions, polar / radial, etc.
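As an illustration, the same bar chart can be drawn on two coordinate systems without changing the data or the geom. This sketch assumes the bundled `mpg` dataset.

```r
library(ggplot2)

bars <- ggplot(mpg, aes(x = drv)) + geom_bar()

cartesian <- bars + coord_cartesian()   # the default: ordinary x-y plane
polar     <- bars + coord_polar()       # same bars, drawn on polar coordinates

cartesian
polar
```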

Faceting

Multiple similar versions of the same graphic, split by some variable

Used to create “small multiples”
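A short sketch of faceting, assuming the bundled `mpg` dataset: the same scatter plot is repeated once per drive type (`drv`).

```r
library(ggplot2)

# One panel per value of drv; each panel shows the same displ-hwy scatter plot
faceted <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ drv)

faceted
```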

Conclusions

The grammar tells us what words make up our graphical “sentences,” but offers no advice on how to write well.

We can use the grammar of graphics to create any plot

  • But is it the correct plot for the research question?

  • ggplot2 can only help you make the plot…

Application

Defaults

ggplot2 is set up with a variety of default values to simplify making charts

  • In general, this means that you do not have to specify every aspect of the plot

Defaults: Full specification

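library(ggplot2)  # provides ggplot(), layer(), and the diamonds dataset

# Spell out every component instead of relying on defaults
scatter1 <-
  ggplot() +
  layer(
    data = diamonds, mapping = aes(x = carat, y = price),
    geom = "point", stat = "identity", position = "identity") +
  scale_y_continuous() +
  scale_x_continuous() +
  coord_cartesian()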

Specify every part of the graphic explicitly

Defaults: Full specification

scatter1

Defaults: Take advantage of the defaults and shorthand

scatter2 <-
  ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point()

First part specifies the data and the x and y variables

Second part says to add some points to it

ggplot2 makes assumptions and uses defaults for all other parts

  • Cartesian (x-y) plane
  • Continuous x and y
  • Plot the actual point rather than some function of it (identity)
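One way to check that the defaults fill in the rest is to build both versions and compare what they draw. This is a sketch rather than a formal proof of equivalence: the names `full`, `short`, and `same_positions` are introduced here, and `layer_data()` is used to extract the plotted positions.

```r
library(ggplot2)

# Every component written out
full <- ggplot() +
  layer(data = diamonds, mapping = aes(x = carat, y = price),
        geom = "point", stat = "identity", position = "identity") +
  scale_y_continuous() +
  scale_x_continuous() +
  coord_cartesian()

# Defaults and shorthand
short <- ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point()

# Compare the x and y positions that each version would draw
same_positions <- isTRUE(all.equal(layer_data(full)[, c("x", "y")],
                                   layer_data(short)[, c("x", "y")]))
same_positions
```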

Defaults: Take advantage of the defaults and shorthand

scatter2

Positional versus keyword matching

You tell a function what goes where using either positional or keyword matching

  • Positional: You have to know the argument order. In ggplot(), the dataset comes first, followed by the aesthetic mapping; within aes(), x comes first and then y
    • Less typing, but you have to know the function really well

ggplot(diamonds, aes(carat, price)) + geom_point()

  • Keyword: Each part of the function is matched to a keyword for that part of the function
    • More typing, but it works with any function and you don’t have to remember the order of its arguments

ggplot(data = diamonds, aes(x = carat, y = price)) + geom_point()
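A practical consequence of keyword matching is that argument order stops mattering. The sketch below deliberately reverses the order of the keywords and checks that both calls draw the same points; the names `by_position`, `by_keyword`, and `same_plot` are introduced here for illustration.

```r
library(ggplot2)

# Positional matching: order determines meaning
by_position <- ggplot(diamonds, aes(carat, price)) + geom_point()

# Keyword matching: arguments can appear in any order
by_keyword <- ggplot(mapping = aes(y = price, x = carat), data = diamonds) +
  geom_point()

same_plot <- isTRUE(all.equal(layer_data(by_position)[, c("x", "y")],
                              layer_data(by_keyword)[, c("x", "y")]))
same_plot
```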