Filtering and restructuring

The dplyr package

%>% is the pipe operator

  • Take the thing on the left and do the thing on the right to it

There are a variety of functions to use with pipes

  • These are referred to as “verbs” because they perform an action

dplyr verbs

summarize()

  • Reduce multiple values down to a single summary value

mutate()

  • Add or modify variables in the dataset
  • Create new variables that are functions of existing ones

select()

  • Pick variables (columns) based on their names

filter()

  • Pick cases (rows) based on their values

arrange()

  • Change the ordering of the rows
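
A minimal sketch of the verbs above chained with pipes. The scores data frame and its column names are hypothetical, made up only so the example runs without any extra data package.

```r
library(dplyr)

# Hypothetical toy data standing in for a real dataset
scores <- data.frame(
  student = c("Ana", "Ben", "Cy", "Dee"),
  section = c("A", "A", "B", "B"),
  midterm = c(80, 90, 70, 100),
  final   = c(90, 70, 80, 100)
)

result <- scores %>%
  mutate(average = (midterm + final) / 2) %>%  # add a new column
  filter(average >= 80) %>%                    # keep rows that meet a condition
  select(student, section, average) %>%        # keep only these columns
  arrange(desc(average))                       # sort rows, highest first

section_means <- scores %>%
  group_by(section) %>%
  summarize(mean_final = mean(final))          # one row per section

result
section_means
```

Each verb takes a data frame and returns a data frame, which is why they can be chained in any sensible order.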

Some other commands

%in%

  • filter(continent %in% c("Asia", "Europe"))

& (and)

  • filter((continent == "Asia") & (year == 1952))

| (or)

  • filter((continent == "Asia") | (continent == "Europe"))

== (equal to)

!= (not equal to)
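
A short sketch of these operators inside filter(). The toy data frame is hypothetical; it only mimics the shape of the gapminder columns used above.

```r
library(dplyr)

# Hypothetical rows shaped like the gapminder data
toy <- data.frame(
  country   = c("Japan", "France", "Kenya", "Japan"),
  continent = c("Asia", "Europe", "Africa", "Asia"),
  year      = c(1952, 1952, 1952, 2007)
)

asia_or_europe <- toy %>% filter(continent %in% c("Asia", "Europe"))
asia_1952      <- toy %>% filter((continent == "Asia") & (year == 1952))
not_asia       <- toy %>% filter(continent != "Asia")

asia_or_europe
asia_1952
not_asia
```

Note that year is numeric here, so it is compared against the number 1952 rather than a quoted string.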

Chain a series of dplyr verbs using pipes

total_gdp_by_country <- gapminder %>%
  mutate(totalGDP = gdpPercap * pop) %>%
  group_by(continent, country) %>%
  summarize(mean_gdp = mean(gdpPercap),
            mean_total_gdp = mean(totalGDP))

Take the gapminder dataset

  • Create a new variable that is the total GDP for each country-year by multiplying the GDP per capita by the population

  • Group it by both continent and country

  • Calculate the mean GDP per capita and the mean total GDP for all observations (within a group)

mutate() has to come before summarize() here: summarize() keeps only the grouping variables and the summaries it computes, so gdpPercap and pop are no longer available afterward

This creates a new dataset (total_gdp_by_country) at the continent-country level with two new variables (mean_gdp and mean_total_gdp)

There are other ways to do this

  • There are at least two other ways, and sometimes you’ll use a combination of all three ways, depending on the situation

Dollar sign syntax

  • Often used to manipulate data and create new variables

  • Refer to the population variable in the gapminder dataset: gapminder$pop

Formula syntax

  • Used to specify models in some packages

  • Run a regression using the lm() function: lm(gdpPercap ~ lifeExp, data = gapminder)
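
A small base R sketch of both syntaxes. The data frame df and its columns are hypothetical and were chosen so the regression result is easy to check by eye.

```r
# Hypothetical toy data; the column names are made up for illustration
df <- data.frame(hours = c(1, 2, 3, 4), score = c(52, 54, 56, 58))

# Dollar sign syntax: refer to one column, or create a new one
mean(df$score)
df$score_pct <- df$score / 100

# Formula syntax: outcome ~ predictor
fit <- lm(score ~ hours, data = df)
coef(fit)
```

Because score was built as 50 + 2 * hours, the fitted intercept and slope should come out as 50 and 2.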

See this cheatsheet for some more information:

https://github.com/rstudio/cheatsheets/raw/master/syntax.pdf

Themes and colors

Built-in ggplot themes

theme_grey()

theme_gray() (an alias for theme_grey(), the default theme)

theme_bw()

theme_linedraw()

theme_light()

theme_dark()

theme_minimal()

theme_classic()

theme_void()

theme_test()

Making your own theme adjustments

You can make your own adjustments to specific parts of the plot

For example:

theme(plot.background = element_rect(fill = "green"))

  • Changes the outer background to green

theme(panel.background = element_rect(fill = "white", colour = "grey50"))

  • Changes the inner (panel) background to white with a grey border

theme(axis.text = element_text(colour = "blue"))

  • Changes the axis tick label text to blue (use axis.title for the axis titles)

There are a lot of options that you can change:

https://ggplot2.tidyverse.org/reference/theme.html
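
A minimal sketch that combines a built-in theme with a few of the adjustments above. It uses the mtcars data that ships with R; the choice of variables and of "grey95" for the outer background is only for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  theme_bw() +                                   # start from a built-in theme
  theme(                                         # then adjust specific elements
    plot.background  = element_rect(fill = "grey95"),
    panel.background = element_rect(fill = "white", colour = "grey50"),
    axis.text        = element_text(colour = "blue")
  )
p
```

The theme() call comes after theme_bw() on purpose: a complete theme added later would overwrite the individual adjustments.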

Colors

R has 657 built-in named colors (list them with colors())

You can also use custom colors, like FIU blue and gold, by specifying their RGB values (with rgb()) or hex codes
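
A quick base R sketch of the ways to specify a color. The RGB values below are an illustrative navy, not an official brand color; look up the real codes before using them.

```r
length(colors())          # how many built-in color names R knows
head(colors())            # the first few names

# An illustrative navy, NOT an official brand value
navy_hex <- rgb(8, 30, 63, maxColorValue = 255)
navy_hex                  # the same color as a hex code

col2rgb("forestgreen")    # RGB components of a built-in named color
```

Anywhere a ggplot function accepts a color name, it also accepts a hex string such as the one stored in navy_hex.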

Color palettes

Instead of specifying a different color each time in each plot, use a color palette with a variety of colors

  • A set of colors to be used as needed

You can define a color palette that you like, such as:

  • A color-blind friendly palette from https://jfly.uni-koeln.de/color/
    cbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

  • For plots where you use the “fill” option:
    scale_fill_manual(values=cbPalette)

  • For plots with lines and points:
    scale_color_manual(values=cbPalette)
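
A minimal sketch of the palette applied to a plot, using the built-in mtcars data. Mapping cyl to color is an arbitrary choice for illustration.

```r
library(ggplot2)

cbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73",
               "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

# cyl is converted to a factor so it is treated as categories, not a number
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point(size = 3) +
  scale_color_manual(values = cbPalette)
p
```

With three categories, only the first three palette colors are used; the palette just needs at least as many colors as there are categories.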

Color palettes

You can also create a color palette using end point (and optionally, mid point) colors

Sequential color with a continuous variable mapped to color:

  • scale_color_gradient(low = 'greenyellow', high = 'forestgreen')

Diverging color with a continuous variable mapped to color:

  • scale_color_gradient2(low = 'blue', mid = 'white', high = 'red')
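
A sketch of both gradient types on the built-in mtcars data. The centered variable mpg_centered is a hypothetical helper column, added so the diverging scale has a meaningful midpoint of 0.

```r
library(ggplot2)

# Sequential: low-to-high values run from light to dark green
p_seq <- ggplot(mtcars, aes(x = wt, y = mpg, color = hp)) +
  geom_point(size = 3) +
  scale_color_gradient(low = "greenyellow", high = "forestgreen")

# Diverging: values below and above the midpoint get different hues
mtcars2 <- mtcars
mtcars2$mpg_centered <- mtcars2$mpg - mean(mtcars2$mpg)  # hypothetical helper column

p_div <- ggplot(mtcars2, aes(x = wt, y = hp, color = mpg_centered)) +
  geom_point(size = 3) +
  scale_color_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0)

p_seq
p_div
```

A diverging scale only makes sense when the variable has a natural center, which is why the example centers mpg at its mean first.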

Color palettes

RColorBrewer package (https://cran.r-project.org/web/packages/RColorBrewer/RColorBrewer.pdf), based on the ColorBrewer website for picking colors

  • Provides some pre-made color palettes to use

3 types of color palettes:

  • Qualitative (non-ordered categories)

  • Sequential

  • Diverging

Using a color palette

  • Install the package
    install.packages("RColorBrewer")

  • Load the package
    library(RColorBrewer)

  • Look at all available palettes
    display.brewer.all()

  • Or just one particular palette
    display.brewer.pal(n = 8, name = 'RdBu')

  • For plots where you use the “fill” option
    scale_fill_brewer(palette = "Dark2")

  • For plots with lines and points
    scale_color_brewer(palette = "Dark2")
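
A minimal sketch of a Brewer palette in use, assuming RColorBrewer and ggplot2 are already installed. The bar chart of cyl from the built-in mtcars data is just a convenient example.

```r
library(ggplot2)
library(RColorBrewer)

# The hex codes behind the first three "Dark2" colors
dark2 <- brewer.pal(n = 3, name = "Dark2")
dark2

p <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(cyl))) +
  geom_bar() +
  scale_fill_brewer(palette = "Dark2")
p
```

"Dark2" is a qualitative palette, so it suits unordered categories like the ones here.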

Package themes

There are a ton of themes available

The BBC has released a package of the theme they use for all graphics:

ggthemes is a package with several included themes:

Make your own themes

You can create your own custom theme by modifying some of the many arguments available:

You might also consider using a standard theme (including branding like logos) for all of your work:

Axes and titles

Labeling plots and axes

The axes are labeled by default with the variable names

  • That is not super helpful when your variable is called “PLSTo60mo” and no readers know what that means…

Changing the plot title

ggtitle("Plot of 1st year college GPA by high school GPA")

  • You can use \n inside the title string as a line break for a long title

Changing the X and Y axis labels

xlab("High school GPA")

ylab("First year college GPA")

Labeling plots and axes

A more general option: the labs function

  • Allows you to change the plot title, the axis labels, and the legend label in one function

labs(title = "Plot of 1st year college GPA by high school GPA",
     x = "High school GPA",
     y = "First year college GPA",
     color = "First generation status")
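
A runnable sketch of labs() on a full plot. The gpa data frame and its column names are hypothetical, invented so the labels from the slide have something to attach to.

```r
library(ggplot2)

# Hypothetical data; the column names are invented for this sketch
gpa <- data.frame(
  hs_gpa      = c(2.8, 3.1, 3.4, 3.7, 3.9, 3.0),
  college_gpa = c(2.5, 2.9, 3.2, 3.5, 3.8, 3.1),
  first_gen   = c("Yes", "No", "Yes", "No", "No", "Yes")
)

p <- ggplot(gpa, aes(x = hs_gpa, y = college_gpa, color = first_gen)) +
  geom_point(size = 3) +
  labs(title = "Plot of 1st year college GPA\nby high school GPA",  # \n breaks the line
       x = "High school GPA",
       y = "First year college GPA",
       color = "First generation status")
p
```

The color argument relabels the legend because color is the aesthetic that first_gen is mapped to; for a fill mapping you would use fill instead.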

Changing the size or font of title or axis labels

Plot title

  • theme(plot.title = element_text(family = "serif", face = "bold", colour = "black", size = 14))

X axis

  • theme(axis.title.x = element_text(family = "serif", face = "bold", colour = "black", size = 12))

Y axis

  • theme(axis.title.y = element_text(family = "serif", face = "bold", colour = "black", size = 12))

Font family (sans, serif), face (plain, bold, italic), size (points, so 10, 12, 14)

More information about fonts here: http://www.cookbook-r.com/Graphs/Fonts/

Removing titles or axis labels

You probably don’t want to do this, in general, but if you do

  • You can modify the theme elements to be “blank” which removes them

  • You can do this with any element in the plot; you just need to know its name

theme(
plot.title = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_blank())

Rotating axis titles

You may want to do this a lot

  • Rotate a sideways Y axis label so that you can read it without turning your head

theme(axis.title.y = element_text(angle = 0))
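
A sketch that puts the title, font, and rotation settings together in one theme() call, using the built-in mtcars data. The label text and sizes are arbitrary choices for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Fuel economy by weight",
       x = "Weight (1000 lbs)",
       y = "Miles per gallon") +
  theme(
    plot.title   = element_text(family = "serif", face = "bold", size = 14),
    axis.title.x = element_text(size = 12),
    # angle = 0 makes the label horizontal; vjust = 0.5 centers it vertically
    axis.title.y = element_text(size = 12, angle = 0, vjust = 0.5)
  )
p
```

A horizontal Y axis title takes more room on the left, so it tends to work best with short labels or a \n line break.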

Annotations

Annotations are any notes or additions that you make in the plot space to highlight or explain something

We’ve briefly talked about a few ways to annotate a plot

  • geom_polygon and geom_rect to select an area

  • geom_label and geom_text to add labels to all points

ggplot2 also has an annotate function to add annotations

  • Such as text, line segments, or highlighted boxes

Annotations

Add some text

  • annotate("text", x = 4, y = 25, label = "italic(R) ^ 2 == 0.75", parse = TRUE)

Add a rectangle to the plot

  • annotate("rect", xmin = 3, xmax = 4.2, ymin = 12, ymax = 21, alpha = .2)

Line segments

  • annotate("segment", x = 2.5, xend = 4, y = 15, yend = 25, colour = "blue")

Line segment with point

  • annotate("pointrange", x = 3.5, y = 20, ymin = 12, ymax = 28, colour = "red", size = 1.5)
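
A sketch of several annotate() calls layered on one plot. The coordinates above appear to fit a weight-versus-mpg plot of the built-in mtcars data, so that is assumed here; the 0.75 in the label is typed in by hand, not computed.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # shaded box to draw attention to a region
  annotate("rect", xmin = 3, xmax = 4.2, ymin = 12, ymax = 21, alpha = 0.2) +
  # a single line segment
  annotate("segment", x = 2.5, xend = 4, y = 15, yend = 25, colour = "blue") +
  # parse = TRUE renders the label as a math expression
  annotate("text", x = 4, y = 25, label = "italic(R) ^ 2 == 0.75", parse = TRUE)
p
```

Unlike geom_text() or geom_rect(), annotate() takes fixed positions directly instead of mapping them from the data, so each call draws exactly one thing.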

More complex annotations

What about something cool where you run a regression, save the values from it and print the regression equation on the plot?

That’s a little more complex

  • In the lab exercise

  • Some solutions here:

https://community.rstudio.com/t/annotate-ggplot2-with-regression-equation-and-r-squared/6112/4

https://stackoverflow.com/questions/7549694/adding-regression-line-equation-and-r2-on-graph#
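
One possible approach, sketched on the built-in mtcars data: fit the model with lm(), pull out the coefficients and R-squared, paste them into a label, and pass that label to annotate(). This is only one of several ways to do it and is not necessarily the approach the lab exercise expects.

```r
library(ggplot2)

# Fit the regression and save the pieces needed for the label
fit <- lm(mpg ~ wt, data = mtcars)
b   <- coef(fit)                 # b[1] = intercept, b[2] = slope
r2  <- summary(fit)$r.squared

# Build the label text from the saved values
eq_label <- sprintf("mpg = %.2f + (%.2f) * wt,  R^2 = %.2f", b[1], b[2], r2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  annotate("text", x = 4.3, y = 32, label = eq_label)
p
```

Because the label is built from the fitted model, it updates automatically if the data or the model formula changes.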

More complex annotations

Adding different annotations to different plots (facets) arranged together
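
A sketch of one way to do this: put the annotation text in its own small data frame with one row per facet, then add it with geom_text(). The notes data frame and the label positions are assumptions made for this example, which uses the built-in mtcars data.

```r
library(ggplot2)

# One row per facet: the facet variable plus the text to show there
notes <- aggregate(mpg ~ cyl, data = mtcars, FUN = length)
names(notes)[2] <- "n"
notes$label <- paste("n =", notes$n)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~ cyl) +
  # inherit.aes = FALSE so this layer uses only the aesthetics given here
  geom_text(data = notes, aes(x = 4.5, y = 32, label = label),
            inherit.aes = FALSE)
p
```

annotate() would draw the same text in every facet; using a data frame that contains the facet variable is what lets each panel get its own note.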

Final comments

Keep formatting guidelines in mind when making your plots

Remember that ggplot creates layers

  • Make your plot 1 layer at a time

  • Don’t try to change too many things at once
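
A small sketch of building a plot one layer at a time, using the built-in mtcars data. Saving each step under its own name (p1 to p4 are arbitrary names) makes it easy to print any stage and see which layer caused a change.

```r
library(ggplot2)

p1 <- ggplot(mtcars, aes(x = wt, y = mpg))       # data and aesthetics only
p2 <- p1 + geom_point()                          # add the points
p3 <- p2 + labs(title = "Fuel economy by weight",
                x = "Weight (1000 lbs)",
                y = "Miles per gallon")          # add labels
p4 <- p3 + theme_minimal()                       # add a theme last

p4
```

Printing p1, p2, and p3 in turn shows the plot growing step by step, which is a quick way to find the layer that needs fixing.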

Here is an example of someone going through all the steps to make the final plot that they actually want: