Introduction to Biostatistics

1 Learning objectives

1.1 Learning objectives

  • Advanced plotting
    • Color, size, opacity
    • Annotations
    • Changing themes, axis labels, re-ordering categories
    • Complex and combined plots

2 Perception

2.1 Perception

  • Data is encoded in graphics
    • How can we make it easier for people to understand?
    • What are humans good (and bad) at?

2.2 What are we good (and bad) at?

  • Good: Comparing position and length
    • Scatterplots, bar plots
  • Good: Color!
    • Use color to highlight, identify, group
    • Caveats: Colorblindness, contrast, accessibility
  • Bad: Comparing areas, volume, curvature
    • Pie charts
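
A minimal sketch of the contrast above: the same continent counts drawn once as bars (length and position) and once as a pie chart (angle and area). It assumes the gapminder data and the ggplot2 and dplyr packages used later in these slides; the names p_bar and p_pie are illustrative only.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

# Countries observed in 2007 (the same subset the later slides use)
gap_2007 <- gapminder %>% filter(year == 2007)

# Length/position encoding: bar heights share a common baseline
p_bar <- ggplot(gap_2007, aes(x = continent)) +
  geom_bar()

# Angle/area encoding: the same counts drawn as a pie chart
p_pie <- ggplot(gap_2007, aes(x = "", fill = continent)) +
  geom_bar(width = 1) +
  coord_polar(theta = "y")

p_bar
p_pie
```

Ranking the continents by number of countries is usually quicker from the bars than from the pie slices.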

3 Data

3.1 Data

library(gapminder)
library(dplyr)    # provides %>% and filter()
library(ggplot2)  # used for every plot that follows
data(gapminder)
gap_2007 <- gapminder %>% filter(year == 2007)
head(gap_2007)
# A tibble: 6 × 6
  country     continent  year lifeExp      pop gdpPercap
  <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
1 Afghanistan Asia       2007    43.8 31889923      975.
2 Albania     Europe     2007    76.4  3600523     5937.
3 Algeria     Africa     2007    72.3 33333216     6223.
4 Angola      Africa     2007    42.7 12420476     4797.
5 Argentina   Americas   2007    75.3 40301927    12779.
6 Australia   Oceania    2007    81.2 20434176    34435.

4 Colors

4.1 “color” and “fill”

  • Some objects have a “color” attribute
    • This is the color for the whole object
  • Some objects have both “color” and “fill” attributes
    • “color” is the outline color
    • “fill” is the “filled in” color
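
As a hedged illustration of the distinction, points are a convenient case: the default point shape (19) has only a color, while shapes 21 to 25 have both an outline color and a fill. The sketch assumes the gap_2007 subset used in the rest of these slides; p_solid and p_outline are illustrative names.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

gap_2007 <- gapminder %>% filter(year == 2007)

# Default point shape: "color" applies to the whole object
p_solid <- ggplot(gap_2007, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(color = "pink", size = 3)

# Shape 21: "color" draws the outline, "fill" the interior
p_outline <- ggplot(gap_2007, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(shape = 21, color = "black", fill = "pink", size = 3)

p_solid
p_outline
```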

4.2 Built-in colors in R
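
A short sketch for exploring the built-in color names from the console, using only base R. The search term "pink" is just an example.

```r
# colors() returns the names of R's built-in colors
built_in <- colors()
length(built_in)
head(built_in)

# Search for names containing "pink"
grep("pink", built_in, value = TRUE)

# Names used elsewhere in these slides are in the list
c("pink", "forestgreen") %in% built_in
```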

4.3 Specific colors: Cedars-Sinai

  • Red
    • RGB: 220, 30, 52
    • HEX: #dc1e34
  • Grey
    • RGB: 118, 119, 122
    • HEX: #76777a
CS_red <- "#dc1e34"
CS_grey <- "#76777a"
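
The RGB and HEX values above are two spellings of the same colors. A minimal base R sketch of the conversion in both directions (red_hex and grey_hex are illustrative names; R treats hex strings as case-insensitive):

```r
# RGB triple (0-255 scale) to hex string
red_hex  <- rgb(220, 30, 52, maxColorValue = 255)    # "#DC1E34"
grey_hex <- rgb(118, 119, 122, maxColorValue = 255)  # "#76777A"

# Hex string back to its red, green, and blue components
col2rgb("#dc1e34")
col2rgb("#76777a")
```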

4.4 Create a color palette

color_blind_friendly <- c("#E69F00", "#000000", "#56B4E9", "#009E73", 
                          "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

4.5 Change colors manually

ggplot(data = gap_2007, 
       aes(x = continent)) +
  geom_bar(color = "black", 
           fill = "pink")

4.6 Change colors manually

ggplot(data = gap_2007, 
       aes(x = continent)) +
  geom_bar(color = CS_grey, fill = CS_red)

4.7 Colors based on a variable

ggplot(data = gap_2007, 
       aes(x = gdpPercap,
           y = lifeExp, 
           color = continent)) +
  geom_point()

4.8 Colors based on a variable (colorblind-friendly)

ggplot(data = gap_2007, 
       aes(x = gdpPercap, 
           y = lifeExp, 
           color = continent)) +
  geom_point() +
  scale_color_manual(values = color_blind_friendly)

  • scale_color_manual() for color
  • scale_fill_manual() for fill
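
The fill counterpart can be sketched with a bar plot, where the interior of each bar is controlled by "fill". This assumes the same palette and gap_2007 subset as above; p_fill is an illustrative name.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

gap_2007 <- gapminder %>% filter(year == 2007)

color_blind_friendly <- c("#E69F00", "#000000", "#56B4E9", "#009E73",
                          "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

# Bars are filled, so the fill scale is the one to set
p_fill <- ggplot(data = gap_2007,
                 aes(x = continent,
                     fill = continent)) +
  geom_bar() +
  scale_fill_manual(values = color_blind_friendly)

p_fill
```

With five continents, only the first five palette entries are used.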

5 Size and opacity

5.1 Change line width

ggplot(data = gap_2007, 
       aes(x = continent)) +
  geom_bar(color = "black", 
           fill = "pink", 
           linewidth = 1.5)

5.2 Change point size

ggplot(data = gap_2007, 
       aes(x = gdpPercap, 
           y = lifeExp, 
           color = continent)) + 
  geom_point(size = 4)

5.3 Change opacity of points

ggplot(data = gap_2007, 
       aes(x = gdpPercap, 
           y = lifeExp, 
           color = continent)) + 
  geom_point(size = 4, alpha = 0.3)

6 Annotations

6.1 annotate() layer

  • General function to add different types of annotations
    • "text"
    • "rect"
    • "segment"
    • "pointrange"
  • Also specify location of the annotation
    • x, y
    • xmin, ymin, xmax, ymax, xend, yend
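
The "segment" and "pointrange" types are not used elsewhere in these slides, so here is a hedged sketch of both. The coordinates are arbitrary positions chosen for illustration, and the arrow comes from the grid package that ships with R.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

gap_2007 <- gapminder %>% filter(year == 2007)

p_annot <- ggplot(data = gap_2007,
                  aes(x = gdpPercap,
                      y = lifeExp,
                      color = continent)) +
  geom_point() +
  # "segment" needs a start (x, y) and an end (xend, yend)
  annotate("segment",
           x = 30000, xend = 45000,
           y = 60, yend = 78,
           arrow = grid::arrow(length = grid::unit(0.3, "cm"))) +
  # "pointrange" needs a point (x, y) and a vertical range (ymin, ymax)
  annotate("pointrange",
           x = 20000, y = 55,
           ymin = 50, ymax = 60)

p_annot
```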

6.2 Add some text

ggplot(data = gap_2007, 
       aes(x = gdpPercap, 
           y = lifeExp, 
           color = continent)) +
  geom_point() +
  annotate("text", 
           x = 30000, y = 60, 
           label = "Some text here")

6.3 Add a rectangle

ggplot(data = gap_2007, 
       aes(x = gdpPercap, 
           y = lifeExp, 
           color = continent)) +
  geom_point() +
  annotate("rect", 
           xmin = 20000, xmax = 40000, 
           ymin = 50, ymax = 70)

6.4 Useful annotation

ggplot(data = gap_2007, 
       aes(x = gdpPercap, 
           y = lifeExp, 
           color = continent)) +
  geom_point() +
  annotate("text", 
           x = 40000, y = 73, 
           label = "Countries with \nhigh GDP per capita and \nhigh life expectancy") +
  annotate("rect", 
           xmin = 30000, xmax = 50000, 
           ymin = 75, ymax = 83, 
           alpha = 0.2)

7 Labels and axes

7.1 labs() layer

  • All the labels in the plot
    • Title, subtitle, caption
    • Legend title
    • X and Y axis labels
  • Older helper functions still work: xlab(), ylab(), ggtitle()
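
A minimal sketch comparing the two styles; both plots should carry the same labels. The label text and the names p_base, p_labs, and p_helpers are illustrative.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

gap_2007 <- gapminder %>% filter(year == 2007)

p_base <- ggplot(data = gap_2007,
                 aes(x = gdpPercap,
                     y = lifeExp)) +
  geom_point()

# One labs() call sets several labels at once
p_labs <- p_base +
  labs(title = "Life expectancy vs GDP per capita",
       x = "GDP (per capita)",
       y = "Life expectancy (years)")

# The older helpers set one label each
p_helpers <- p_base +
  ggtitle("Life expectancy vs GDP per capita") +
  xlab("GDP (per capita)") +
  ylab("Life expectancy (years)")

p_labs
p_helpers
```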

7.2 Title and subtitle

ggplot(data = gap_2007, 
       aes(x = gdpPercap, 
           y = lifeExp, 
           color = continent)) +
  geom_point() +
  labs(title = "This is the title",
       subtitle = "This is the subtitle",
       caption = "And here's a caption")

7.3 Title and subtitle

ggplot(data = gap_2007, 
       aes(x = gdpPercap, 
           y = lifeExp, 
           color = continent)) +
  geom_point() +
  labs(title = "Scatterplot of life expectancy vs GDP per capita",
       subtitle = "There is a nonlinear relationship",
       caption = "Data: gapminder package, year 2007")

7.4 Axis and legend labels

ggplot(data = gap_2007, 
       aes(x = gdpPercap, 
           y = lifeExp, 
           color = continent)) +
  geom_point() +
  labs(x = "GDP (per capita)",
       y = "Life expectancy (years)",
       color = "The continents")

8 Changes to theme

8.1 Themes

  • Built-in themes: theme_classic(), theme_grey(), etc.
  • Packages to change theme
    • ggthemes: 538, WSJ, Economist
    • bbplot: BBC
  • Make your own theme: Branding
  • Change theme for whole document
    • e.g., theme_set(theme_classic()) in setup chunk
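
A hedged sketch of both approaches: changing the default theme for every later plot, and adding a built-in theme to a single plot. theme_set() returns the previous default, which makes it easy to restore; old_theme is an illustrative name.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

gap_2007 <- gapminder %>% filter(year == 2007)

# Change the default theme; keep the previous one so it can be restored
old_theme <- theme_set(theme_classic())

# Drawn with theme_classic() because it is now the default
ggplot(gap_2007, aes(x = gdpPercap, y = lifeExp)) +
  geom_point()

# A theme added to one plot overrides the default for that plot only
ggplot(gap_2007, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  theme_minimal()

# Restore the previous default
theme_set(old_theme)
```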

8.2 Theme elements

8.3 Theme elements

  • Change color of outer background to green
    • theme(plot.background = element_rect(fill = "green"))
  • Change inner background to white with medium grey border
    • theme(panel.background = element_rect(fill = "white", colour = "grey50"))
  • Change axis tick label text to blue
    • theme(axis.text = element_text(colour = "blue"))
  • Remove the legend
    • theme(legend.position = "none")
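
Several theme elements can be set in one theme() call. In the sketch below, plot.background is the area around the whole plot and panel.background is the plotting area behind the points; the colors are chosen to be obvious, not attractive, and p_theme is an illustrative name.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

gap_2007 <- gapminder %>% filter(year == 2007)

p_theme <- ggplot(data = gap_2007,
                  aes(x = gdpPercap,
                      y = lifeExp,
                      color = continent)) +
  geom_point() +
  theme(
    # Area around the whole plot (behind title and axes)
    plot.background = element_rect(fill = "green"),
    # Plotting area behind the points
    panel.background = element_rect(fill = "white", colour = "grey50"),
    # Tick labels on both axes
    axis.text = element_text(colour = "blue"),
    # No legend
    legend.position = "none"
  )

p_theme
```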

8.4 Limits, ticks, etc.

  • Adjust the axis limits, labels, or ticks for a continuous variable
    • scale_x_continuous() and scale_y_continuous()
  • Adjust the axis limits, labels, or ticks for a discrete variable
    • scale_x_discrete() and scale_y_discrete()
  • Adjust just the limits
    • xlim() and ylim()
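
A minimal sketch for the continuous case, with limits wide enough to keep every 2007 observation. The break positions and the "10k"-style labels are illustrative choices. Points that fall outside the limits would be dropped with a warning.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

gap_2007 <- gapminder %>% filter(year == 2007)

p_limits <- ggplot(data = gap_2007,
                   aes(x = gdpPercap,
                       y = lifeExp)) +
  geom_point() +
  # Limits, tick positions, and tick labels in one call
  scale_x_continuous(limits = c(0, 50000),
                     breaks = seq(0, 50000, by = 10000),
                     labels = c("0", "10k", "20k", "30k", "40k", "50k")) +
  # Shortcut that sets only the limits
  ylim(35, 85)

p_limits
```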

9 Adjust variables on the fly

9.1 Default order for factors

ggplot(data = gap_2007, 
       aes(x = continent)) +
  geom_bar()

9.2 Re-order factor levels in the plot

ggplot(data = gap_2007, 
       aes(x = reorder(continent, 
                       pop, 
                       FUN = median))) +
  geom_bar()

9.3 Re-order factor levels in the plot

ggplot(data = gap_2007, 
       aes(x = reorder(continent, 
                       lifeExp, 
                       FUN = median), 
           y = lifeExp)) +
  geom_boxplot()
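
To check what reorder() does before plotting, the factor levels can be inspected directly. This is a sketch assuming the gap_2007 subset; by_median is an illustrative name. The summary function is passed through the upper-case argument FUN. R matches argument names case-sensitively, so a lower-case fun = is not treated as FUN and the default (mean) would be used.

```r
library(dplyr)
library(gapminder)

gap_2007 <- gapminder %>% filter(year == 2007)

# Default: factor levels are in alphabetical order
levels(gap_2007$continent)

# Levels sorted by each continent's median life expectancy
by_median <- reorder(gap_2007$continent,
                     gap_2007$lifeExp,
                     FUN = median)
levels(by_median)
```

The data values are unchanged; only the order of the levels, and therefore the order on the axis, is different.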

9.4 Re-order and other things

gap_2007_americas <- gap_2007 %>% 
  filter(continent == "Americas")
ggplot(data = gap_2007_americas, 
       aes(x = country,
           y = lifeExp)) +
  geom_col()

9.5 Re-order and other things

ggplot(data = gap_2007_americas, 
       aes(x = country,
           y = lifeExp)) +
  geom_col() +
  coord_flip()

9.6 Re-order and other things

ggplot(data = gap_2007_americas, 
       aes(x = reorder(country, 
                       lifeExp, 
                       FUN = mean),
           y = lifeExp)) +
  geom_col() +
  coord_flip()

9.7 Alternative: Rotate axis text

ggplot(data = gap_2007_americas, 
       aes(x = reorder(country,
                       lifeExp, 
                       FUN = mean),
           y = lifeExp)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 90, 
                                   hjust=1))

10 Complex / combined plots

10.1 Index plot (Manhattan plot)

ggplot(data = gap_2007, 
       aes(x = 1:nrow(gap_2007), 
           y = pop)) +
  geom_point()

10.2 Index plot with labels

ggplot(data = gap_2007, 
       aes(x = 1:nrow(gap_2007), 
           y = pop)) +
  geom_point() +
  geom_text(aes(label=as.character(country)),
            hjust = 0, nudge_x = 2)

10.3 Index plot with labels

ggplot(data = gap_2007, 
       aes(x = 1:nrow(gap_2007), 
           y = pop)) +
  geom_point() +
  geom_text(aes(label=ifelse((pop > 500000000), 
                             as.character(country), 
                             '')),
            hjust = 0, nudge_x = 2)

10.4 Bells and whistles version

ggplot(data = gap_2007, 
       aes(x = 1:nrow(gap_2007), 
           y = pop)) +
  geom_point() +
  geom_text(aes(label=ifelse((pop > 500000000), 
                             as.character(country), 
                             '')),
            hjust = 0, nudge_x = 2) +
  geom_hline(yintercept = 500000000, 
             color = "blue", 
             linetype = "dashed") +
  annotate("text", 
           x = 50, y = 850000000, 
           label = "China and India have notably larger populations") +
  annotate("rect", 
           xmin = 20, xmax = 75, 
           ymin = 1000000000, ymax = 1400000000, 
           alpha = 0.2) +
  labs(title = "Most countries have populations below 500 million",
       x = "Country number (index)",
       y = "Population")

10.5 Add rugs to a scatterplot

ggplot(data = gap_2007, 
       aes(x = gdpPercap, 
           y = lifeExp)) + 
  geom_point(size = 4, alpha = 0.3) +
  geom_rug()

10.6 Add rugs to a scatterplot

ggplot(data = gap_2007, 
       aes(x = gdpPercap, 
           y = lifeExp)) + 
  geom_point(size = 4, alpha = 0.3) +
  geom_rug(sides = "tr")

10.7 Lollipop plot

ggplot(data = gap_2007_americas, 
       aes(x = country,
           y = lifeExp)) +
  theme(axis.text.x = element_text(angle = 90, 
                                   hjust=1, 
                                   vjust = 0.5)) +
  geom_point() +
  geom_segment(aes(x = country, 
                   xend = country, 
                   y = 0, 
                   yend = lifeExp))

10.8 Lollipop plot

ggplot(data = gap_2007_americas, 
       aes(x = country,
           y = lifeExp)) +
  theme(axis.text.x = element_text(angle = 90, 
                                   hjust=1, 
                                   vjust = 0.5)) +
  geom_point(size = 4) +
  geom_segment(aes(x = country, 
                   xend = country, 
                   y = 0, 
                   yend = lifeExp))

10.9 Raincloud plot

10.10 Raincloud plot

library(ggrain)
ggplot(gap_2007,
       aes(x = continent,
           y = gdpPercap)) +
  geom_rain()

10.11 Raincloud plot

library(ggrain)
ggplot(gap_2007,
       aes(x = 1,
           y = gdpPercap)) +
  geom_rain(color = "black",
            fill = "forestgreen",
            alpha = 0.5) +
  coord_flip()

10.12 Raincloud plot

library(ggrain)
ggplot(gap_2007,
       aes(x = 1,
           y = gdpPercap)) +
  geom_rain(color = "black",
            fill = "forestgreen",
            alpha = 0.5) +
  coord_flip() +
  labs(x = "", y = "GDP (per capita)") +
  scale_x_continuous(labels = NULL, 
                     breaks = NULL)

11 Final thoughts

11.1 Final thoughts

  • ggplot2 builds plots in layers
    • Build your plot one layer at a time
  • Keep perception in mind
    • What are we good at seeing and processing?
  • Use color carefully
    • And sparingly
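
One way to work a layer at a time is to store the plot in an object and add to it step by step, printing along the way. This is a sketch using the gap_2007 subset from these slides; the object name p is arbitrary.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

gap_2007 <- gapminder %>% filter(year == 2007)

# Step 1: data and aesthetic mappings (prints an empty plot with axes)
p <- ggplot(data = gap_2007,
            aes(x = gdpPercap,
                y = lifeExp,
                color = continent))
p

# Step 2: add the points
p <- p + geom_point(alpha = 0.5)
p

# Step 3: add meaningful labels
p <- p + labs(x = "GDP (per capita)",
              y = "Life expectancy (years)",
              color = "Continent")
p

# Step 4: change the theme
p <- p + theme_classic()
p
```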

11.2 Label your axes (meaningfully)!

https://xkcd.com/833/

12 In-class activities

12.1 In-class activities

  • Make some plots
    • Some of them will be ugly
    • Let’s try to fix them

12.2 Next week

  • Sample vs population
    • Conceptual: Statistical inference
  • Probability and distributions
    • Sampling distributions