%>% is the pipe operator: it passes the result on its left to the function on its right as the first argument
There are a variety of functions and operators to use with pipes
summarize()
mutate()
select()
filter()
arrange()
%in%
& (and)
| (or)
== (equal to)
!= (not equal to)
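A minimal sketch of how these verbs and operators can combine in one pipeline, assuming the dplyr and gapminder packages are installed. The object names and the choice of countries and years are illustrative, not from the slides.

```r
library(dplyr)
library(gapminder)

# Keep two countries in the years after 1990, then reshape the result
recent_gdp <- gapminder %>%
  filter(country %in% c("Canada", "Mexico") & year > 1990) %>%  # %in% and & combine conditions
  select(country, year, gdpPercap, pop) %>%                     # keep only these columns
  mutate(total_gdp = gdpPercap * pop) %>%                       # add a new column
  arrange(country, desc(year))                                  # sort rows

# Collapse to one row per country
country_means <- recent_gdp %>%
  group_by(country) %>%
  summarize(mean_gdp = mean(gdpPercap))

recent_gdp
country_means
```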
total_gdp_by_country <- gapminder %>%
  group_by(continent, country) %>%
  # mutate() keeps the year-level rows, so gdpPercap and pop are still available below
  mutate(mean_gdp = mean(gdpPercap)) %>%
  mutate(totalGDP = gdpPercap * pop)
Take the gapminder dataset
Group it by both continent and country
Calculate the mean GDP per capita for all observations (within a group)
Create a new variable that is the total GDP for a country by multiplying the GDP per capita by the population
This creates a new dataset (total_gdp_by_country) with a new variable (mean_gdp) that is at the continent-country level and a new variable (totalGDP) that is at the continent-country-year level
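For contrast, a short sketch of how summarize() and mutate() differ after group_by(): summarize() collapses each group to a single row, while mutate() keeps every row and adds columns. The object names are illustrative, and the dplyr and gapminder packages are assumed to be installed.

```r
library(dplyr)
library(gapminder)

# summarize() returns one row per continent-country group;
# year-level columns such as gdpPercap and pop are gone afterwards
country_means <- gapminder %>%
  group_by(continent, country) %>%
  summarize(mean_gdp = mean(gdpPercap), .groups = "drop")

# mutate() keeps every continent-country-year row and adds a column
country_years <- gapminder %>%
  group_by(continent, country) %>%
  mutate(mean_gdp = mean(gdpPercap)) %>%
  ungroup()

nrow(country_means)  # one row per country
nrow(country_years)  # same number of rows as gapminder
```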
Dollar sign syntax
Often used to manipulate data and create new variables
Refer to the population variable in the gapminder dataset: gapminder$pop
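A small sketch of dollar sign syntax for reading a column and creating a new one. The copy named gm and the new column pop_millions are illustrative names, not from the slides.

```r
library(gapminder)

# Work on a copy so the packaged dataset is left untouched
gm <- as.data.frame(gapminder)

# Read a column as a vector
mean(gm$pop)

# Create a new column by assigning to a name that does not exist yet
gm$pop_millions <- gm$pop / 1e6
head(gm$pop_millions)
```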
Formula syntax
Used to specify models in some packages
Run a regression using the lm() function: lm(gdpPercap ~ lifeExp, data = gapminder)
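A brief sketch of formula syntax in use: the outcome goes on the left of the tilde and the predictors on the right. The object name fit is an illustrative choice.

```r
library(gapminder)

# Outcome on the left of ~, predictor(s) on the right
fit <- lm(gdpPercap ~ lifeExp, data = gapminder)

coef(fit)     # intercept and slope
summary(fit)  # standard errors, R-squared, and so on
```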
See this cheatsheet for some more information:
https://github.com/rstudio/cheatsheets/raw/master/syntax.pdf
theme_grey()
theme_gray()
theme_bw()
theme_linedraw()
theme_light()
theme_dark()
theme_minimal()
theme_classic()
theme_void()
theme_test()
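One way to compare these complete themes is to store a plot once and add a different theme each time. This sketch assumes the gapminder data as a stand-in; the variables plotted are an illustrative choice.

```r
library(ggplot2)
library(gapminder)

# Base plot, stored so that different themes can be tried on it
p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point()

p + theme_bw()       # white background with grey grid lines
p + theme_minimal()  # no background annotations
p + theme_classic()  # axis lines, no grid lines
```

theme_grey() and theme_gray() are two spellings of the same default theme.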
You can make your own adjustments to specific parts of the plot
For example:
theme(plot.background = element_rect(fill = "green"))
theme(panel.background = element_rect(fill = "white", colour = "grey50"))
theme(axis.text = element_text(colour = "blue"))
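A sketch of how such adjustments might sit in a full plot. The order matters here: a complete theme such as theme_bw() replaces earlier theme settings, so individual theme() tweaks go after it. The data and the object name p_themed are illustrative.

```r
library(ggplot2)
library(gapminder)

p_themed <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  theme_bw() +  # start from a complete theme
  theme(        # then override individual elements
    panel.background = element_rect(fill = "white", colour = "grey50"),
    axis.text = element_text(colour = "blue")
  )

p_themed
```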
There are a lot of options that you can change:
R has several hundred built-in pre-defined colors
View the list by typing colors() in the console
See the colors here: http://sape.inf.usi.ch/quick-reference/ggplot2/colour
You can specify a single color to be used by using its name
You can also use unique colors, like FIU blue and gold, by specifying their RGB, CMYK, or hex codes
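A sketch of specifying colors by name and by hex code. The RGB values below are placeholders chosen for illustration, not official brand colors; substitute the values from your institution's style guide.

```r
library(ggplot2)
library(gapminder)

head(colors())  # first few of the built-in color names

# Convert RGB values to a hex code (placeholder values)
my_blue <- rgb(8, 30, 63, maxColorValue = 255)
my_blue

# A named color for the points, a hex code for the trend line
p_col <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(color = "steelblue") +
  geom_smooth(method = "lm", formula = y ~ x, color = my_blue)

p_col
```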
Instead of specifying a different color each time in each plot, use a color palette with a variety of colors
You can define a color palette that you like, such as:
A color-blind friendly palette from https://jfly.uni-koeln.de/color/
cbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442",
               "#0072B2", "#D55E00", "#CC79A7")
For plots where you use the “fill” option:
scale_fill_manual(values=cbPalette)
For plots with lines and points:
scale_color_manual(values=cbPalette)
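A sketch of the palette applied to complete plots, assuming the gapminder data for 2007 as an example. The palette needs at least as many colors as there are categories; here there are five continents and eight colors.

```r
library(ggplot2)
library(gapminder)

cbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442",
               "#0072B2", "#D55E00", "#CC79A7")

gap_2007 <- subset(gapminder, year == 2007)

# Bars use the fill aesthetic, so the palette goes in scale_fill_manual()
p_fill <- ggplot(gap_2007, aes(x = continent, fill = continent)) +
  geom_bar() +
  scale_fill_manual(values = cbPalette)

# Points use the color aesthetic, so the palette goes in scale_color_manual()
p_color <- ggplot(gap_2007, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
  scale_color_manual(values = cbPalette)

p_fill
p_color
```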
You can also create a color palette using end point (and optionally, mid point) colors
Sequential color with a continuous variable mapped to color:
Diverging color with a continuous variable mapped to color:
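One possible sketch of both cases uses scale_color_gradient() for a sequential scale and scale_color_gradient2() for a diverging one. The colors, the data, and the use of the median as the midpoint are illustrative assumptions.

```r
library(ggplot2)
library(gapminder)

gap_2007 <- subset(gapminder, year == 2007)

# Sequential: two end-point colors
p_seq <- ggplot(gap_2007, aes(x = gdpPercap, y = lifeExp, color = lifeExp)) +
  geom_point() +
  scale_color_gradient(low = "lightblue", high = "darkblue")

# Diverging: end-point colors plus a mid-point color and value
p_div <- ggplot(gap_2007, aes(x = gdpPercap, y = lifeExp, color = lifeExp)) +
  geom_point() +
  scale_color_gradient2(low = "red", mid = "white", high = "blue",
                        midpoint = median(gap_2007$lifeExp))

p_seq
p_div
```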
RColorBrewer package, based on the ColorBrewer website for color picking: https://cran.r-project.org/web/packages/RColorBrewer/RColorBrewer.pdf
3 types of color palettes:
Qualitative (non-ordered categories)
Sequential
Diverging
Install the package
install.packages("RColorBrewer")
Load the package
library(RColorBrewer)
Look at all available palettes
display.brewer.all()
Or just one particular palette
display.brewer.pal(n = 8, name = "RdBu")
For plots where you use the “fill” option
scale_fill_brewer(palette = "Dark2")
For plots with lines and points
scale_color_brewer(palette = "Dark2")
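A sketch of a Brewer palette in a full plot, plus brewer.pal() for pulling the hex codes out to reuse elsewhere. The data and plot type are illustrative choices.

```r
library(ggplot2)
library(RColorBrewer)
library(gapminder)

# Extract five hex codes from the Dark2 palette
dark2 <- brewer.pal(n = 5, name = "Dark2")
dark2

gap_2007 <- subset(gapminder, year == 2007)

# Boxplots are filled, so use the fill version of the scale
p_brewer <- ggplot(gap_2007, aes(x = continent, y = lifeExp, fill = continent)) +
  geom_boxplot() +
  scale_fill_brewer(palette = "Dark2")

p_brewer
```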
There are a ton of themes available
The BBC has released a package of the theme they use for all graphics:
ggthemes is a package with several included themes:
https://cran.r-project.org/web/packages/ggthemes/ggthemes.pdf
Fivethirtyeight
Wall Street Journal
The Economist
APA theme from the jtools package
Tufte
Tint (Tint is not Tufte)
XKCD theme
Brooklyn-99
Game of Thrones
You can create your own custom theme by modifying some of the many arguments available:
You might also consider using a standard theme (including branding like logos) for all of your work:
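A minimal sketch of a custom theme wrapped in a function so it can be reused across plots. The name theme_mywork and the specific settings are hypothetical; logos are not covered here.

```r
library(ggplot2)
library(gapminder)

# Hypothetical house theme: a complete theme plus a few overrides
theme_mywork <- function(base_size = 12) {
  theme_minimal(base_size = base_size) +
    theme(
      plot.title = element_text(face = "bold"),
      panel.grid.minor = element_blank(),
      legend.position = "bottom"
    )
}

# Use it on one plot
p_house <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
  theme_mywork()

p_house

# Or make it the default for every plot in the session;
# theme_set() returns the previous theme so it can be restored later
old_theme <- theme_set(theme_mywork())
```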
The axes are labeled by default with the variable names
Changing the plot title
ggtitle("Plot of 1st year college GPA by high school GPA")
Changing the X and Y axis labels
xlab("High school GPA")
ylab("First year college GPA")
A more general option: the labs function
labs(title = "Plot of 1st year college GPA by high school GPA",
     x = "High school GPA",
     y = "First year college GPA",
     color = "First generation status")
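A sketch of labs() on a complete plot. The data frame gpa and its values are made up for this example; only the label text comes from the slides.

```r
library(ggplot2)

# Hypothetical data, made up for this sketch
gpa <- data.frame(
  hs_gpa      = c(2.8, 3.1, 3.4, 3.6, 3.9, 3.0, 3.3, 3.7),
  college_gpa = c(2.5, 2.9, 3.2, 3.3, 3.8, 2.6, 3.0, 3.5),
  first_gen   = rep(c("Yes", "No"), each = 4)
)

p_gpa <- ggplot(gpa, aes(x = hs_gpa, y = college_gpa, color = first_gen)) +
  geom_point() +
  labs(title = "Plot of 1st year college GPA by high school GPA",
       x = "High school GPA",
       y = "First year college GPA",
       color = "First generation status")

p_gpa
```

The color argument relabels the legend because color is the aesthetic mapped to first_gen; a fill legend would use fill instead.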
Change the size of the title
X axis
Y axis
Font family (sans, serif), face (plain, bold, italic), size (points, so 10, 12, 14)
More information about fonts here: http://www.cookbook-r.com/Graphs/Fonts/
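A sketch of changing the size, face, and family of the title and axis labels through theme(). The sizes and the plot itself are illustrative; which font families are available can depend on your system and graphics device.

```r
library(ggplot2)
library(gapminder)

p_fonts <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  ggtitle("Life expectancy by GDP per capita") +
  theme(
    plot.title   = element_text(size = 16, face = "bold", family = "serif"),  # title
    axis.title.x = element_text(size = 12, family = "sans"),                  # X axis
    axis.title.y = element_text(size = 12, face = "italic")                   # Y axis
  )

p_fonts
```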
You probably don’t want to do this, in general, but if you do
You can modify the theme elements to be “blank” which removes them
You can do this with any element in the plot, you just need to know its name
theme(
plot.title = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_blank())
You may want to do this a lot
theme(axis.title.y = element_text(angle = 0))
Annotations are any notes or additions that you make in the plot space to highlight or explain something
We’ve briefly talked about a few ways to annotate a plot
geom_polygon and geom_rect to select an area
geom_label and geom_text to add labels to all points
ggplot2 also has an annotate function to add annotations
Add some text
Add a rectangle to the plot
Line segments
Line segment with point
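A sketch of the four annotation types listed above, all using annotate(). The coordinates, colors, and label text are placeholders picked to land inside the 2007 gapminder plot.

```r
library(ggplot2)
library(gapminder)

gap_2007 <- subset(gapminder, year == 2007)

p_annot <- ggplot(gap_2007, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  # Text at a chosen position
  annotate("text", x = 40000, y = 50, label = "Some text") +
  # Rectangle highlighting a region
  annotate("rect", xmin = 0, xmax = 5000, ymin = 38, ymax = 60,
           alpha = 0.2, fill = "blue") +
  # Line segment between two points
  annotate("segment", x = 30000, xend = 40000, y = 60, yend = 70,
           colour = "blue") +
  # Line segment with a point on it
  annotate("pointrange", x = 20000, y = 55, ymin = 50, ymax = 60,
           colour = "red")

p_annot
```

Unlike geom_text() or geom_rect(), annotate() takes fixed values rather than columns from the data, so each call adds a single annotation.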
What about something cool where you run a regression, save the values from it and print the regression equation on the plot?
That’s a little more complex
In the lab exercise
Some solutions here:
https://community.rstudio.com/t/annotate-ggplot2-with-regression-equation-and-r-squared/6112/4
https://stackoverflow.com/questions/7549694/adding-regression-line-equation-and-r2-on-graph#
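One possible approach, sketched under simple assumptions: fit the model with lm(), pull out the coefficients and R-squared, build a text label, and place it with annotate(). The label format and its placement in the top-left corner are illustrative choices, and the linked solutions show more polished alternatives.

```r
library(ggplot2)
library(gapminder)

gap_2007 <- subset(gapminder, year == 2007)

# Fit the model and pull out the pieces for the label
fit <- lm(gdpPercap ~ lifeExp, data = gap_2007)
b   <- coef(fit)
r2  <- summary(fit)$r.squared

# Plain-text label; b[1] is the intercept and b[2] the slope
eq_label <- sprintf("y = %.2f + %.2f x,  R^2 = %.2f", b[1], b[2], r2)

p_eq <- ggplot(gap_2007, aes(x = lifeExp, y = gdpPercap)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # -Inf / Inf pin the label to the top-left corner of the panel
  annotate("text", x = -Inf, y = Inf, label = eq_label,
           hjust = -0.1, vjust = 1.5)

p_eq
```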
Adding different annotations to different plots (facets) arranged together
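A sketch of one common way to do this: put the annotations in their own small data frame that includes the faceting variable, then add them with geom_text(). Each label appears only in the panel whose facet value matches. The data frame facet_notes and its contents are hypothetical.

```r
library(ggplot2)
library(gapminder)

gap_2007 <- subset(gapminder, year == 2007)

# One row per annotated facet; the continent column decides the panel
facet_notes <- data.frame(
  continent = factor(c("Africa", "Europe"), levels = levels(gap_2007$continent)),
  gdpPercap = c(30000, 30000),
  lifeExp   = c(45, 45),
  label     = c("Note for Africa", "Note for Europe")
)

p_facets <- ggplot(gap_2007, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  geom_text(data = facet_notes, aes(label = label)) +
  facet_wrap(~ continent)

p_facets
```

Panels with no matching row in facet_notes get no label. annotate() would not work for this, because it draws the same annotation in every panel.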
Keep formatting guidelines in mind when making your plots
Remember that ggplot creates layers
Make your plot 1 layer at a time
Don’t try to change too many things at once
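A sketch of building a plot one layer at a time, checking the result after each step. The data, labels, and theme are illustrative choices.

```r
library(ggplot2)
library(gapminder)

gap_2007 <- subset(gapminder, year == 2007)

# Step 1: data and aesthetics only (an empty panel with axes)
p <- ggplot(gap_2007, aes(x = gdpPercap, y = lifeExp, color = continent))

# Step 2: add the geometry, then look at the plot
p <- p + geom_point()

# Step 3: add labels
p <- p + labs(x = "GDP per capita", y = "Life expectancy", color = "Continent")

# Step 4: add the theme last
p <- p + theme_minimal()
p
```

Printing p after each step makes it easier to see which layer caused a problem.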
Here is an example of someone going through all the steps to make the final plot that they actually want: