Adding a specific equation to a plot

geom_smooth option: add a specific equation to plot

We have used geom_smooth before to create a general line

method = lm for a straight line
method = loess for a weighted average smooth line

What if you have a more specific equation?

This example will use a model-based equation
But you can also specify a certain (e.g., theoretical) equation

geom_smooth option: add a specific equation to plot

Remember that we had a not-quite-linear relationship between HS GPA and college GPA

data(FirstYearGPA)
scatter <- ggplot(data = FirstYearGPA, 
                      aes(x = HSGPA, y = GPA)) +
  geom_point()

geom_smooth option: add a specific equation to plot

scatter + geom_smooth(method = loess, se = FALSE)

## `geom_smooth()` using formula 'y ~ x'

geom_smooth option: add a specific equation to plot

Step 1. Run the model

cubic_model <- lm(GPA ~ HSGPA + I(HSGPA^2) + I(HSGPA^3), 
                  data = FirstYearGPA)
cubic_model

## 
## Call:
## lm(formula = GPA ~ HSGPA + I(HSGPA^2) + I(HSGPA^3), data = FirstYearGPA)
## 
## Coefficients:
## (Intercept)        HSGPA   I(HSGPA^2)   I(HSGPA^3)  
##    -23.9927      23.6388      -6.9833       0.6974

geom_smooth option: add a specific equation to plot

Step 2. Tidy the model

cubic_tidy <- tidy(cubic_model)

print(cubic_tidy)

## # A tibble: 4 × 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)  -24.0      13.6       -1.76  0.0793
## 2 HSGPA         23.6      12.6        1.88  0.0613
## 3 I(HSGPA^2)    -6.98      3.84      -1.82  0.0702
## 4 I(HSGPA^3)     0.697     0.388      1.80  0.0734

geom_smooth option: add a specific equation to plot

Step 3 - version 1. Use formula in geom_smooth

cubic_plot1 <- scatter +
  geom_point() +
  geom_smooth(method = lm, 
              formula = y ~ x + I(x^2) + I(x^3), 
              se = FALSE)

geom_smooth option: add a specific equation to plot

cubic_plot1

geom_smooth option: add a specific equation to plot

Step 3 - version 2. Use model values in geom_smooth

cubic_plot2 <- scatter +
  geom_point() +
  geom_smooth(method = nls, 
formula = y ~ intercept + linear*x + 
                quadratic*x*x + cubic*x*x*x, 
se = FALSE,
method.args = list(start = 
c(intercept = cubic_tidy$estimate[1], 
linear = cubic_tidy$estimate[2], 
quadratic = cubic_tidy$estimate[3], 
cubic = cubic_tidy$estimate[4])))

geom_smooth option: add a specific equation to plot

cubic_plot2

Two lines with a legend

Add another line with a legend

This is a bit of a cheat to get this to work

You take advantage of the fact that

Color arguments generally go outside of aes() in a geom_smooth
But if you put the color argument inside aes(), it treats it as a label instead of a color assignment

Add another line with a legend

model_colors <- RColorBrewer::brewer.pal(3, "Set1")
cubic_plot3 <- scatter + geom_point() +
  geom_smooth(method = nls, se = FALSE,
formula = y ~ intercept + linear*x + 
  quadratic*x*x + cubic*x*x*x, 
method.args = list(start = 
c(intercept = cubic_tidy$estimate[1], 
linear = cubic_tidy$estimate[2], 
quadratic = cubic_tidy$estimate[3], 
cubic = cubic_tidy$estimate[4])),
aes(color = "Cubic", fill = "Cubic")) +
  geom_smooth(method = lm, aes(color = "Linear", 
                               fill = "Linear"), 
              se = FALSE) +
  scale_color_manual(name = "Models", 
                     values = model_colors) +
  scale_fill_manual(name = "Models", 
                    values = model_colors)

Add another line with a legend

cubic_plot3

## `geom_smooth()` using formula 'y ~ x'

Confidence intervals based on your model

Add CIs to the line

Calculate the confidence intervals based on your model with the predict function

fit: predicted value
lwr: lower CI limit
upr: upper CI limit

Add CIs to the line

gpa_cubic_predictions <- predict(object = cubic_model, 
                                 newdata = FirstYearGPA, 
                                 interval = "predict")
head(gpa_cubic_predictions)

##        fit      lwr      upr
## 1 3.285010 2.460852 4.109168
## 2 3.460149 2.624765 4.295534
## 3 3.192314 2.368201 4.016426
## 4 3.100341 2.276226 3.924456
## 5 3.285010 2.460852 4.109168
## 6 3.010882 2.187038 3.834726

gpa_cubic_predict <- cbind(FirstYearGPA, 
                           gpa_cubic_predictions)

Add CIs to the line

geom_ribbon shows a ribbon of color from ymin to ymax

These values are the 95% CIs (default) for each value of X

cubic_plot4 <- scatter + geom_point() +
  geom_line(data = gpa_cubic_predict, 
            aes(x = HSGPA, y = fit)) +
  geom_ribbon(data = gpa_cubic_predict, 
              aes(ymin = lwr, ymax = upr), 
              alpha = .2)

Add CIs to the line

cubic_plot4

nest() and group_by()

Using nest() and group_by()

We have already used group_by() to perform an operation across several groups

e.g., calculate the mean GDP per capita for each country
One value of mean GDP per capita for each country

The nest() function goes one step further and actually splits the dataset up into chunks for each group

You end up with a nested table
You can run separate analyses and get separate results

Using nest() and group_by()

GPA_nest_genderrace <- FirstYearGPA %>%
  group_by(Male, White) %>%
  nest()

Using nest() and group_by()

GPA_nest_genderrace

## # A tibble: 4 × 3
## # Groups:   Male, White [4]
##    Male White data             
##   <int> <int> <list>           
## 1     1     1 <tibble [84 × 8]>
## 2     0     1 <tibble [89 × 8]>
## 3     0     0 <tibble [28 × 8]>
## 4     1     0 <tibble [18 × 8]>

A quick diversion: writing a function

We use functions all the time when using R

lm() is a function, ggplot() is a function, etc.

We can also write functions to do things

Useful when you’d have to write the same code over and over
Just call the function instead

A quick diversion: writing a function

name_of_fxn <- function(input_args){

calculations_and_other_stuff

}

General structure of a function

input_args are things you might give the function

They are optional

calculations_and_other_stuff are whatever calculations you perform (mathematical operations, models, graphics, etc.)

The function returns the last thing it calculated, or you can tell it to return a specific thing by using

return(thing_i_want_returned)

A quick diversion: writing a function

This function has one argument, dataframe

The function uses the specified dataset to run the regression of HSGPA predicing GPA

lm_fxn <- function(dataframe){

lm(GPA ~ HSGPA, data = dataframe)

}

Using nest() and group_by()

The map() function applies a function to a set of objects

data is the name of the column in our nested dataframe that contains the actual data

lm_fxn is the function we just created

This will run the regression specified in lm_fxn to each of the nested dataframes in the nested dataset
We will have 4 separate outputs, one for each group

grouped_lm <- GPA_nest_genderrace %>%
  mutate(model = map(data, lm_fxn))

Using nest() and group_by()

Here is the nested dataframe

One column contains the data for each group
One column contains the regression model for each group

grouped_lm

## # A tibble: 4 × 4
## # Groups:   Male, White [4]
##    Male White data              model 
##   <int> <int> <list>            <list>
## 1     1     1 <tibble [84 × 8]> <lm>  
## 2     0     1 <tibble [89 × 8]> <lm>  
## 3     0     0 <tibble [28 × 8]> <lm>  
## 4     1     0 <tibble [18 × 8]> <lm>

Using nest() and group_by()

We can apply the tidy() function to the nested dataframe

Again, using the map() function

tidy_grouped_lm <- grouped_lm %>% 
  mutate(tidied = map(model, tidy))
tidy_grouped_lm

## # A tibble: 4 × 5
## # Groups:   Male, White [4]
##    Male White data              model  tidied          
##   <int> <int> <list>            <list> <list>          
## 1     1     1 <tibble [84 × 8]> <lm>   <tibble [2 × 5]>
## 2     0     1 <tibble [89 × 8]> <lm>   <tibble [2 × 5]>
## 3     0     0 <tibble [28 × 8]> <lm>   <tibble [2 × 5]>
## 4     1     0 <tibble [18 × 8]> <lm>   <tibble [2 × 5]>

Using nest() and group_by()

Then we can unnest() the dataframe to see values

out_tidy <- tidy_grouped_lm %>%
  unnest(tidied)
out_tidy

## # A tibble: 8 × 9
## # Groups:   Male, White [4]
##    Male White data              model term  estimate std.error statistic p.value
##   <int> <int> <list>            <lis> <chr>    <dbl>     <dbl>     <dbl>   <dbl>
## 1     1     1 <tibble [84 × 8]> <lm>  (Int…    1.93      0.421     4.60  1.53e-5
## 2     1     1 <tibble [84 × 8]> <lm>  HSGPA    0.366     0.122     3.00  3.60e-3
## 3     0     1 <tibble [89 × 8]> <lm>  (Int…    1.14      0.390     2.93  4.33e-3
## 4     0     1 <tibble [89 × 8]> <lm>  HSGPA    0.571     0.111     5.14  1.69e-6
## 5     0     0 <tibble [28 × 8]> <lm>  (Int…   -0.150     0.521    -0.288 7.76e-1
## 6     0     0 <tibble [28 × 8]> <lm>  HSGPA    0.873     0.150     5.83  3.82e-6
## 7     1     0 <tibble [18 × 8]> <lm>  (Int…    0.875     1.09      0.805 4.32e-1
## 8     1     0 <tibble [18 × 8]> <lm>  HSGPA    0.574     0.321     1.79  9.26e-2

Then filter the dataset

The dataframe includes all the coefficients

Here, just intercept and slope
But say that we really only want to look at slope

slopes_only <- out_tidy %>%
  filter(term == "HSGPA")
slopes_only

## # A tibble: 4 × 9
## # Groups:   Male, White [4]
##    Male White data              model term  estimate std.error statistic p.value
##   <int> <int> <list>            <lis> <chr>    <dbl>     <dbl>     <dbl>   <dbl>
## 1     1     1 <tibble [84 × 8]> <lm>  HSGPA    0.366     0.122      3.00 3.60e-3
## 2     0     1 <tibble [89 × 8]> <lm>  HSGPA    0.571     0.111      5.14 1.69e-6
## 3     0     0 <tibble [28 × 8]> <lm>  HSGPA    0.873     0.150      5.83 3.82e-6
## 4     1     0 <tibble [18 × 8]> <lm>  HSGPA    0.574     0.321      1.79 9.26e-2

Then make a plot of the values

est_grouped <- ggplot(data = slopes_only,
            mapping = aes(x = as.factor(Male), y = estimate,
                          ymin = estimate - 2*std.error,
                          ymax = estimate + 2*std.error,
                          group = as.factor(White), color = as.factor(White))) +
  geom_errorbar(position = position_dodge(width = 1)) +
  labs(x = "Gender", y = "Estimate", color = "Race")

Then make a plot of the values

est_grouped

Interactivity overview

Interactivity – what is it?

There are many types of interactivity

Filter data based on a variable
Zoom in on a plot
Add fit lines of various types

Anything that changes your plot from static

Interactivity – how can I do it?

The very simplest way to introduce interactivity is with a package called plotly

Wrap your ggplot (or any other plot) in the ggplotly() function:

ggplotly(scatter)

(This only works for HTML output, so I’m not running it here – see plotlyHTMLdemo file)

You can select certain portions of the plot, zoom in, mouse over to see data values, etc.

This is a very easy way to make your HTML documents interactive, but just using plotly is very limited because the plot is static

But it also doesn’t require any connection to R to work

More ways to introduce interactivity

From least to most flexible / interactive:

flexdashboard package
shinydashboard package
shiny elements in a markdown file
stand-alone shiny app

flexdashboard

https://rmarkdown.rstudio.com/flexdashboard/

Done through a markdown document

After installing flexdashboard, there is a template available for your markdown document

Focus is on the layout and static objects

But you can add interactive objects using shiny
Need to specify runtime: shiny in the YAML (indented once, so it’s an output option)
Note that it’s no longer “Knit” but “Run”

See flexdashboarddemo file for an example

shinydashboard

https://rstudio.github.io/shinydashboard/get_started.html

Done through a (non-markdown) R syntax file

Two functions contain most of your code: ui and server

ui contains information about the look, including input and output objects
server constains information about any calculations or graphics

At the end, run shinyApp(ui, server) to run the app

Note that it’s no longer “Knit” but “Run”

See shinydashboarddemo file for an example

shiny elements in a markdown file

https://bookdown.org/yihui/rmarkdown/shiny-documents.html

Done within a markdown document, but not to make a whole dashboard (as in flexdashboard) but to add a small interactive element to a markdown document

Need to specify runtime: shiny in the YAML (not indented, so it’s a top-level option)
Note that it’s no longer “Knit” but “Run”

See shinyinmarkdowndemo file for an example

Stand-alone shiny app

Start by selecting “New File” and then “Shiny Web App”

You can choose a single file or multiple file format
- Single file: 1 file called “app.R” in a folder with the name of your application
- Multiple file: 1 file called “ui.R” with the UI code and 1 file called “server.R” with the calculations code
Single file can be easier because it’s all in one place, but multiple file is easier to deal with for large projects

The template is already in place, just fill in your input, output, and calculations code

See app.R file for an example

Put it in a folder called “shiny_demo”, open it in R, and run it

Shiny input functions

There are a bunch of pre-defined input widgets (functions)

numericInput

radioButtons

sliderInput

fileInput

With a few exceptions, they end with “Input”

See http://shiny.rstudio.com/tutorial/written-tutorial/lesson3/ for a full list with examples

Each one requires at least 1) a label that you’ll use to refer to it in the program and 2) a text label that will show up in the app

Some widgets have other options too

Shiny output functions

Output functions are actually pairs of functions

One defines the object you’re outputting (in server.R if you have one)
- These usually start with “render” such as renderPlot or renderTable
- Your code (e.g., a call to a named ggplot) will go inside
One defines where to put the object (in the ui.R if you have one)
- These will match the render functions, such as plotOutput or tableOutput
- Usually, the only argument is the named object from the server

See http://shiny.rstudio.com/tutorial/written-tutorial/lesson4/ for a full list with examples

Statistical Graphics and Communication

Stefany Coxe

October 31, 2019

Adding a specific equation to a plot

geom_smooth option: add a specific equation to plot

geom_smooth option: add a specific equation to plot

geom_smooth option: add a specific equation to plot

geom_smooth option: add a specific equation to plot

geom_smooth option: add a specific equation to plot

geom_smooth option: add a specific equation to plot

geom_smooth option: add a specific equation to plot

geom_smooth option: add a specific equation to plot

geom_smooth option: add a specific equation to plot

Two lines with a legend

Add another line with a legend

Add another line with a legend

Add another line with a legend

Confidence intervals based on your model

Add CIs to the line

Add CIs to the line

Add CIs to the line

Add CIs to the line

nest() and group_by()

Using nest() and group_by()

Using nest() and group_by()

Using nest() and group_by()

A quick diversion: writing a function

A quick diversion: writing a function

A quick diversion: writing a function

Using nest() and group_by()

Using nest() and group_by()

Using nest() and group_by()

Using nest() and group_by()

Then filter the dataset

Then make a plot of the values

Then make a plot of the values

Interactivity overview

Interactivity – what is it?

Interactivity – how can I do it?

More ways to introduce interactivity

flexdashboard

shinydashboard

shiny elements in a markdown file

Stand-alone shiny app

Shiny input functions

Shiny output functions