Adding a specific equation to a plot

geom_smooth option: add a specific equation to plot

We have used geom_smooth before to create a general line

  • method = lm for a straight line

  • method = loess for a weighted average smooth line

What if you have a more specific equation?

  • This example will use a model-based equation

  • But you can also specify a certain (e.g., theoretical) equation

geom_smooth option: add a specific equation to plot

Remember that we had a not-quite-linear relationship between HS GPA and college GPA

data(FirstYearGPA)
scatter <- ggplot(data = FirstYearGPA, 
                      aes(x = HSGPA, y = GPA)) +
  geom_point()

geom_smooth option: add a specific equation to plot

scatter + geom_smooth(method = loess, se = FALSE)
## `geom_smooth()` using formula 'y ~ x'

geom_smooth option: add a specific equation to plot

Step 1. Run the model

cubic_model <- lm(GPA ~ HSGPA + I(HSGPA^2) + I(HSGPA^3), 
                  data = FirstYearGPA)
cubic_model
## 
## Call:
## lm(formula = GPA ~ HSGPA + I(HSGPA^2) + I(HSGPA^3), data = FirstYearGPA)
## 
## Coefficients:
## (Intercept)        HSGPA   I(HSGPA^2)   I(HSGPA^3)  
##    -23.9927      23.6388      -6.9833       0.6974

geom_smooth option: add a specific equation to plot

Step 2. Tidy the model

cubic_tidy <- tidy(cubic_model)

print(cubic_tidy)
## # A tibble: 4 × 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)  -24.0      13.6       -1.76  0.0793
## 2 HSGPA         23.6      12.6        1.88  0.0613
## 3 I(HSGPA^2)    -6.98      3.84      -1.82  0.0702
## 4 I(HSGPA^3)     0.697     0.388      1.80  0.0734

geom_smooth option: add a specific equation to plot

Step 3 - version 1. Use formula in geom_smooth

cubic_plot1 <- scatter +
  geom_point() +
  geom_smooth(method = lm, 
              formula = y ~ x + I(x^2) + I(x^3), 
              se = FALSE)

geom_smooth option: add a specific equation to plot

cubic_plot1

geom_smooth option: add a specific equation to plot

Step 3 - version 2. Use model values in geom_smooth

cubic_plot2 <- scatter +
  geom_point() +
  geom_smooth(method = nls, 
formula = y ~ intercept + linear*x + 
                quadratic*x*x + cubic*x*x*x, 
se = FALSE,
method.args = list(start = 
c(intercept = cubic_tidy$estimate[1], 
linear = cubic_tidy$estimate[2], 
quadratic = cubic_tidy$estimate[3], 
cubic = cubic_tidy$estimate[4]))) 

geom_smooth option: add a specific equation to plot

cubic_plot2

Two lines with a legend

Add another line with a legend

This is a bit of a cheat to get this to work

You take advantage of the fact that

  • Color arguments generally go outside of aes() in a geom_smooth

  • But if you put the color argument inside aes(), it treats it as a label instead of a color assignment

Add another line with a legend

model_colors <- RColorBrewer::brewer.pal(3, "Set1")
cubic_plot3 <- scatter + geom_point() +
  geom_smooth(method = nls, se = FALSE,
formula = y ~ intercept + linear*x + 
  quadratic*x*x + cubic*x*x*x, 
method.args = list(start = 
c(intercept = cubic_tidy$estimate[1], 
linear = cubic_tidy$estimate[2], 
quadratic = cubic_tidy$estimate[3], 
cubic = cubic_tidy$estimate[4])),
aes(color = "Cubic", fill = "Cubic")) +
  geom_smooth(method = lm, aes(color = "Linear", 
                               fill = "Linear"), 
              se = FALSE) +
  scale_color_manual(name = "Models", 
                     values = model_colors) +
  scale_fill_manual(name = "Models", 
                    values = model_colors)

Add another line with a legend

cubic_plot3
## `geom_smooth()` using formula 'y ~ x'

Confidence intervals based on your model

Add CIs to the line

Calculate the confidence intervals based on your model with the predict function

  • fit: predicted value

  • lwr: lower CI limit

  • upr: upper CI limit

Add CIs to the line

gpa_cubic_predictions <- predict(object = cubic_model, 
                                 newdata = FirstYearGPA, 
                                 interval = "predict")
head(gpa_cubic_predictions)
##        fit      lwr      upr
## 1 3.285010 2.460852 4.109168
## 2 3.460149 2.624765 4.295534
## 3 3.192314 2.368201 4.016426
## 4 3.100341 2.276226 3.924456
## 5 3.285010 2.460852 4.109168
## 6 3.010882 2.187038 3.834726
gpa_cubic_predict <- cbind(FirstYearGPA, 
                           gpa_cubic_predictions)

Add CIs to the line

geom_ribbon shows a ribbon of color from ymin to ymax

These values are the 95% CIs (default) for each value of X

cubic_plot4 <- scatter + geom_point() +
  geom_line(data = gpa_cubic_predict, 
            aes(x = HSGPA, y = fit)) +
  geom_ribbon(data = gpa_cubic_predict, 
              aes(ymin = lwr, ymax = upr), 
              alpha = .2)

Add CIs to the line

cubic_plot4

nest() and group_by()

Using nest() and group_by()

We have already used group_by() to perform an operation across several groups

  • e.g., calculate the mean GDP per capita for each country

  • One value of mean GDP per capita for each country

The nest() function goes one step further and actually splits the dataset up into chunks for each group

  • You end up with a nested table

  • You can run separate analyses and get separate results

Using nest() and group_by()

GPA_nest_genderrace <- FirstYearGPA %>%
  group_by(Male, White) %>%
  nest()

Using nest() and group_by()

GPA_nest_genderrace
## # A tibble: 4 × 3
## # Groups:   Male, White [4]
##    Male White data             
##   <int> <int> <list>           
## 1     1     1 <tibble [84 × 8]>
## 2     0     1 <tibble [89 × 8]>
## 3     0     0 <tibble [28 × 8]>
## 4     1     0 <tibble [18 × 8]>

A quick diversion: writing a function

We use functions all the time when using R

  • lm() is a function, ggplot() is a function, etc.

We can also write functions to do things

  • Useful when you’d have to write the same code over and over

  • Just call the function instead

A quick diversion: writing a function

name_of_fxn <- function(input_args){

calculations_and_other_stuff

}

General structure of a function

input_args are things you might give the function

  • They are optional

calculations_and_other_stuff are whatever calculations you perform (mathematical operations, models, graphics, etc.)

The function returns the last thing it calculated, or you can tell it to return a specific thing by using

return(thing_i_want_returned)

A quick diversion: writing a function

This function has one argument, dataframe

The function uses the specified dataset to run the regression of HSGPA predicing GPA

lm_fxn <- function(dataframe){

lm(GPA ~ HSGPA, data = dataframe)

}

Using nest() and group_by()

The map() function applies a function to a set of objects

data is the name of the column in our nested dataframe that contains the actual data

lm_fxn is the function we just created

  • This will run the regression specified in lm_fxn to each of the nested dataframes in the nested dataset

  • We will have 4 separate outputs, one for each group

grouped_lm <- GPA_nest_genderrace %>%
  mutate(model = map(data, lm_fxn))

Using nest() and group_by()

Here is the nested dataframe

  • One column contains the data for each group

  • One column contains the regression model for each group

grouped_lm
## # A tibble: 4 × 4
## # Groups:   Male, White [4]
##    Male White data              model 
##   <int> <int> <list>            <list>
## 1     1     1 <tibble [84 × 8]> <lm>  
## 2     0     1 <tibble [89 × 8]> <lm>  
## 3     0     0 <tibble [28 × 8]> <lm>  
## 4     1     0 <tibble [18 × 8]> <lm>

Using nest() and group_by()

We can apply the tidy() function to the nested dataframe

  • Again, using the map() function
tidy_grouped_lm <- grouped_lm %>% 
  mutate(tidied = map(model, tidy))
tidy_grouped_lm
## # A tibble: 4 × 5
## # Groups:   Male, White [4]
##    Male White data              model  tidied          
##   <int> <int> <list>            <list> <list>          
## 1     1     1 <tibble [84 × 8]> <lm>   <tibble [2 × 5]>
## 2     0     1 <tibble [89 × 8]> <lm>   <tibble [2 × 5]>
## 3     0     0 <tibble [28 × 8]> <lm>   <tibble [2 × 5]>
## 4     1     0 <tibble [18 × 8]> <lm>   <tibble [2 × 5]>

Using nest() and group_by()

Then we can unnest() the dataframe to see values

out_tidy <- tidy_grouped_lm %>%
  unnest(tidied)
out_tidy
## # A tibble: 8 × 9
## # Groups:   Male, White [4]
##    Male White data              model term  estimate std.error statistic p.value
##   <int> <int> <list>            <lis> <chr>    <dbl>     <dbl>     <dbl>   <dbl>
## 1     1     1 <tibble [84 × 8]> <lm>  (Int…    1.93      0.421     4.60  1.53e-5
## 2     1     1 <tibble [84 × 8]> <lm>  HSGPA    0.366     0.122     3.00  3.60e-3
## 3     0     1 <tibble [89 × 8]> <lm>  (Int…    1.14      0.390     2.93  4.33e-3
## 4     0     1 <tibble [89 × 8]> <lm>  HSGPA    0.571     0.111     5.14  1.69e-6
## 5     0     0 <tibble [28 × 8]> <lm>  (Int…   -0.150     0.521    -0.288 7.76e-1
## 6     0     0 <tibble [28 × 8]> <lm>  HSGPA    0.873     0.150     5.83  3.82e-6
## 7     1     0 <tibble [18 × 8]> <lm>  (Int…    0.875     1.09      0.805 4.32e-1
## 8     1     0 <tibble [18 × 8]> <lm>  HSGPA    0.574     0.321     1.79  9.26e-2

Then filter the dataset

The dataframe includes all the coefficients

  • Here, just intercept and slope

  • But say that we really only want to look at slope

slopes_only <- out_tidy %>%
  filter(term == "HSGPA")
slopes_only
## # A tibble: 4 × 9
## # Groups:   Male, White [4]
##    Male White data              model term  estimate std.error statistic p.value
##   <int> <int> <list>            <lis> <chr>    <dbl>     <dbl>     <dbl>   <dbl>
## 1     1     1 <tibble [84 × 8]> <lm>  HSGPA    0.366     0.122      3.00 3.60e-3
## 2     0     1 <tibble [89 × 8]> <lm>  HSGPA    0.571     0.111      5.14 1.69e-6
## 3     0     0 <tibble [28 × 8]> <lm>  HSGPA    0.873     0.150      5.83 3.82e-6
## 4     1     0 <tibble [18 × 8]> <lm>  HSGPA    0.574     0.321      1.79 9.26e-2

Then make a plot of the values

est_grouped <- ggplot(data = slopes_only,
            mapping = aes(x = as.factor(Male), y = estimate,
                          ymin = estimate - 2*std.error,
                          ymax = estimate + 2*std.error,
                          group = as.factor(White), color = as.factor(White))) +
  geom_errorbar(position = position_dodge(width = 1)) +
  labs(x = "Gender", y = "Estimate", color = "Race")

Then make a plot of the values

est_grouped 

Interactivity overview

Interactivity – what is it?

There are many types of interactivity

  • Filter data based on a variable

  • Zoom in on a plot

  • Add fit lines of various types

Anything that changes your plot from static

Interactivity – how can I do it?

The very simplest way to introduce interactivity is with a package called plotly

Wrap your ggplot (or any other plot) in the ggplotly() function:

ggplotly(scatter)

(This only works for HTML output, so I’m not running it here – see plotlyHTMLdemo file)

You can select certain portions of the plot, zoom in, mouse over to see data values, etc.

This is a very easy way to make your HTML documents interactive, but just using plotly is very limited because the plot is static

  • But it also doesn’t require any connection to R to work

More ways to introduce interactivity

From least to most flexible / interactive:

  • flexdashboard package

  • shinydashboard package

  • shiny elements in a markdown file

  • stand-alone shiny app

flexdashboard

https://rmarkdown.rstudio.com/flexdashboard/

Done through a markdown document

  • After installing flexdashboard, there is a template available for your markdown document

Focus is on the layout and static objects

  • But you can add interactive objects using shiny

  • Need to specify runtime: shiny in the YAML (indented once, so it’s an output option)

  • Note that it’s no longer “Knit” but “Run”

See flexdashboarddemo file for an example

shinydashboard

https://rstudio.github.io/shinydashboard/get_started.html

Done through a (non-markdown) R syntax file

Two functions contain most of your code: ui and server

  • ui contains information about the look, including input and output objects

  • server constains information about any calculations or graphics

At the end, run shinyApp(ui, server) to run the app

  • Note that it’s no longer “Knit” but “Run”

See shinydashboarddemo file for an example

shiny elements in a markdown file

https://bookdown.org/yihui/rmarkdown/shiny-documents.html

Done within a markdown document, but not to make a whole dashboard (as in flexdashboard) but to add a small interactive element to a markdown document

  • Need to specify runtime: shiny in the YAML (not indented, so it’s a top-level option)

  • Note that it’s no longer “Knit” but “Run”

See shinyinmarkdowndemo file for an example

Stand-alone shiny app

Start by selecting “New File” and then “Shiny Web App”

  • You can choose a single file or multiple file format

    • Single file: 1 file called “app.R” in a folder with the name of your application

    • Multiple file: 1 file called “ui.R” with the UI code and 1 file called “server.R” with the calculations code

  • Single file can be easier because it’s all in one place, but multiple file is easier to deal with for large projects

The template is already in place, just fill in your input, output, and calculations code

See app.R file for an example

  • Put it in a folder called “shiny_demo”, open it in R, and run it

Shiny input functions

There are a bunch of pre-defined input widgets (functions)

numericInput

radioButtons

sliderInput

fileInput

With a few exceptions, they end with “Input”

See http://shiny.rstudio.com/tutorial/written-tutorial/lesson3/ for a full list with examples

Each one requires at least 1) a label that you’ll use to refer to it in the program and 2) a text label that will show up in the app

  • Some widgets have other options too

Shiny output functions

Output functions are actually pairs of functions

  • One defines the object you’re outputting (in server.R if you have one)

    • These usually start with “render” such as renderPlot or renderTable

    • Your code (e.g., a call to a named ggplot) will go inside

  • One defines where to put the object (in the ui.R if you have one)

    • These will match the render functions, such as plotOutput or tableOutput

    • Usually, the only argument is the named object from the server

See http://shiny.rstudio.com/tutorial/written-tutorial/lesson4/ for a full list with examples