Grammar gives language rules.
Wilkinson, L. (2005). The Grammar of Graphics, 2nd Ed. New York: Springer-Verlag.
The grammar of graphics is the set of rules for developing a graphic.
It does not invoke a specific software package, but any flexible and well-developed graphics software should incorporate these rules.
Data is the information that we want to make a graphic of
In psychology (and science in general), we are used to data sets that are organized in a certain way
Rows for participants, columns for variables
But data is often not organized in such a way
A variable is the concept associated with the (usually numeric) values in a data set
Usually a column, but doesn’t need to be
Also includes transformations of variables
Reshaping of the data to accommodate what you want to do
For example, converting longitudinal data from wide (1 row per person, 1 column per time point) to tall (multiple rows per person, 1 column per variable)
Also includes aggregation: rows indicate whether something occurred and you want to plot that count
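The reshaping and aggregation steps above can be sketched in R. This is a minimal illustration, assuming the tidyr package is available; the data frames `wide` and `events` and their column names are made up for the example.

```r
library(tidyr)

# Hypothetical wide data: 1 row per person, 1 column per time point
wide <- data.frame(
  id = 1:3,
  t1 = c(10, 12, 9),
  t2 = c(11, 14, 9),
  t3 = c(13, 15, 10)
)

# Tall: multiple rows per person, one column for time and one for the score
tall <- pivot_longer(wide, cols = c(t1, t2, t3),
                     names_to = "time", values_to = "score")

# Aggregation: each row records one occurrence; count occurrences per type
events <- data.frame(type = c("a", "b", "a", "c", "a"))
counts <- as.data.frame(table(type = events$type))
```

The tall version is usually the shape a plotting function expects, because time becomes a single variable that can be mapped to an axis.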
Scales are the axes on which the graphic is presented
Depends on the type of variable that is being graphed (nominal, ordinal, interval, or ratio)
The former two (nominal, ordinal) will have categorical scales
The latter two (interval, ratio) will have continuous scales
Statistics are basic statistical operations on variables
Anything that appears in the graphic is a geometric object
We will spend a lot of time talking about geometric objects
Coordinates are often related to the scale, but don’t need to be
Cartesian coordinates (typical x-y plane)
Including information about rotation and reflection, and warping and stretching along 1 or more dimensions
Aesthetics are the physical attributes of the objects we perceive (see)
Facets are multiple similar versions of the same graphic, split by some variable
Relationship between two variables at 5 different time points
One panel for treatment group over time, one panel for control group over time
ggplot2 is the R graphics package that implements graphing according to Wilkinson’s rules
The journal article introducing ggplot2:
Wickham, H. (2010). A layered grammar of graphics. Journal of Computational and Graphical Statistics, 19(1), 3-28.
The book expanding on the article:
Wickham, H. (2016). ggplot2: elegant graphics for data analysis. Springer.
ggplot2 frames each aspect as a layer in the graphic
ggplot layers closely match Wilkinson’s (but not exactly)
Data is what you expect
Aesthetics are which variables are mapped to which part of the plot
Which variable is associated with the x axis?
Which variable is associated with the y axis?
Stats are any basic statistical operations conducted in the service of creating a graphic (not as an analysis of their own)
For example:
Calculating the median, quartiles, minimum, and maximum in order to create a box plot
Creating bins for a histogram
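As a rough sketch of how these statistics show up in ggplot2, the plots below let the geoms run their default stats, and `layer_data()` is used to look at what was computed. The object names `hist_plot` and `box_plot` are illustrative, and the choice of 30 bins is arbitrary.

```r
library(ggplot2)

# Histogram: the binning stat counts observations per bin before bars are drawn
hist_plot <- ggplot(diamonds, aes(x = carat)) +
  geom_histogram(bins = 30)

# Box plot: the box plot stat computes the median, quartiles, and whisker limits
box_plot <- ggplot(diamonds, aes(x = cut, y = price)) +
  geom_boxplot()

# layer_data() returns the values computed for a layer (the first by default)
hist_stats <- layer_data(hist_plot)
box_stats <- layer_data(box_plot)
head(hist_stats)
```

Neither summary is reported as a result in its own right; each exists only so the geom has something to draw.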
Geoms dictate what plot you’re creating
Points for x-y pairs creates a scatterplot
A line creates a line plot
A bar creates a bar plot
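A short sketch of the three cases above, using the `diamonds` and `economics` data sets that ship with ggplot2. The object names are illustrative only.

```r
library(ggplot2)

# Points for x-y pairs: a scatterplot
p_points <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point()

# A line connecting observations in order of x: a line plot
p_line <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line()

# Bars whose heights are the counts in each category: a bar plot
p_bar <- ggplot(diamonds, aes(x = cut)) +
  geom_bar()
```

In each case only the geom function decides the plot type; the `ggplot()` call has the same form throughout.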
Position adjustments are jittering or other small changes to improve the appearance and readability of a plot
Jittering moves points slightly so that they’re not right on top of one another
May adjust the alpha (opacity) of points to more easily see overplotted points
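Both adjustments can be sketched with the `mpg` data set from ggplot2, where city and highway mileage are whole numbers and many points coincide. The jitter amounts, the seed, and the alpha value are arbitrary choices for illustration.

```r
library(ggplot2)

# Without adjustment: many cars share the same (cty, hwy) pair and overlap
overplotted <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point()

# Jitter nudges each point by a small random amount; alpha makes points
# semi-transparent so that dense regions look darker
jittered <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point(
    position = position_jitter(width = 0.3, height = 0.3, seed = 1),
    alpha = 0.5
  )
```

Setting `seed` makes the random jitter reproducible from one rendering to the next.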
Scales map the data to its aesthetic attributes
Continuous and categorical variables have different types of scales
Color works well for continuous (or categorical)
Shape really only works for categorical
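As an illustration, the sketch below maps a continuous variable to color and a categorical variable to shape, using the `mpg` data set from ggplot2. The particular variables and gradient colors are assumptions made for the example.

```r
library(ggplot2)

scaled <- ggplot(mpg, aes(x = displ, y = hwy, color = cty, shape = drv)) +
  geom_point() +
  # cty is continuous, so color is drawn from a gradient
  scale_color_gradient(low = "grey80", high = "darkblue")
# drv is categorical, so each of its three levels gets its own point shape
```

Mapping a continuous variable to `shape` is not supported by ggplot2, which matches the point that shape only works for categorical variables.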
Coordinates are the axes and gridlines of the figure
Typically Cartesian (x-y) plane but can be higher dimensions, polar / radial, etc.
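A small sketch of swapping the coordinate system while leaving everything else alone. The object names are illustrative.

```r
library(ggplot2)

# Default Cartesian coordinates: an ordinary bar plot
bars <- ggplot(diamonds, aes(x = cut)) +
  geom_bar()

# Same data, stat, and geom, drawn in polar coordinates instead
polar <- bars + coord_polar()
```

The counts behind the two plots are identical; only the way positions are drawn on the page changes.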
Facets are multiple similar versions of the same graphic, split by some variable
Used to create “small multiples”
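A minimal sketch of small multiples, splitting one scatterplot into a panel per level of `cut` in the `diamonds` data. The choice of splitting variable is only for illustration.

```r
library(ggplot2)

# One panel per level of cut; every panel shares the same axes and geom
faceted <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point() +
  facet_wrap(~ cut)
```

Because the panels share scales by default, differences between groups can be read by comparing panels directly.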
The grammar tells us what words make up our graphical “sentences,” but offers no advice on how to write well.
We can use the grammar of graphics to create any plot
But is it the correct plot for the research question?
ggplot2 can only help you make the plot…
ggplot2 is set up with a variety of default values to simplify making charts
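library(ggplot2)  # provides ggplot(), layer(), and the diamonds data

# Every component is spelled out; nothing is left to the defaults
scatter1 <-
  ggplot() +
  layer(
    data = diamonds, mapping = aes(x = carat, y = price),
    geom = "point", stat = "identity", position = "identity") +
  scale_y_continuous() +
  scale_x_continuous() +
  coord_cartesian()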
Specify every part of the graphic explicitly
scatter1
scatter2 <-
  ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point()
First part specifies the data and the x and y variables
Second part says to add some points to it
ggplot2 makes assumptions and uses defaults for all other parts
scatter2
You tell a function what goes where using either positional or keyword matching
ggplot(diamonds, aes(carat, price)) + geom_point()
ggplot(data = diamonds, aes(x = carat, y = price)) + geom_point()
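The same matching rules apply to any R function, not just `ggplot()`. As a small sketch using base R's `seq()`, whose first three arguments are `from`, `to`, and `by`:

```r
# Positional matching: arguments are assigned in the order they are written
by_position <- seq(1, 10, 3)

# Keyword matching: arguments are assigned by name
by_keyword <- seq(from = 1, to = 10, by = 3)

# With keywords the order no longer matters
reordered <- seq(by = 3, to = 10, from = 1)

identical(by_position, by_keyword)
identical(by_keyword, reordered)
```

Positional calls are shorter to type, while keyword calls are easier to read later and do not depend on remembering the argument order.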