We have used geom_smooth
before to create a general
line
method = lm
for a straight line
method = loess
for a weighted average smooth
line
What if you have a more specific equation?
This example will use a model-based equation
But you can also specify a certain (e.g., theoretical) equation
Remember that we had a not-quite-linear relationship between HS GPA and college GPA
data(FirstYearGPA)
scatter <- ggplot(data = FirstYearGPA,
aes(x = HSGPA, y = GPA)) +
geom_point()
scatter + geom_smooth(method = loess, se = FALSE)
## `geom_smooth()` using formula 'y ~ x'
Step 1. Run the model
cubic_model <- lm(GPA ~ HSGPA + I(HSGPA^2) + I(HSGPA^3),
data = FirstYearGPA)
cubic_model
##
## Call:
## lm(formula = GPA ~ HSGPA + I(HSGPA^2) + I(HSGPA^3), data = FirstYearGPA)
##
## Coefficients:
## (Intercept) HSGPA I(HSGPA^2) I(HSGPA^3)
## -23.9927 23.6388 -6.9833 0.6974
Step 2. Tidy the model
cubic_tidy <- tidy(cubic_model)
print(cubic_tidy)
## # A tibble: 4 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -24.0 13.6 -1.76 0.0793
## 2 HSGPA 23.6 12.6 1.88 0.0613
## 3 I(HSGPA^2) -6.98 3.84 -1.82 0.0702
## 4 I(HSGPA^3) 0.697 0.388 1.80 0.0734
Step 3 - version 1. Use formula in geom_smooth
cubic_plot1 <- scatter +
geom_point() +
geom_smooth(method = lm,
formula = y ~ x + I(x^2) + I(x^3),
se = FALSE)
cubic_plot1
Step 3 - version 2. Use model values in geom_smooth
cubic_plot2 <- scatter +
geom_point() +
geom_smooth(method = nls,
formula = y ~ intercept + linear*x +
quadratic*x*x + cubic*x*x*x,
se = FALSE,
method.args = list(start =
c(intercept = cubic_tidy$estimate[1],
linear = cubic_tidy$estimate[2],
quadratic = cubic_tidy$estimate[3],
cubic = cubic_tidy$estimate[4])))
cubic_plot2
This is a bit of a cheat to get this to work
You take advantage of the fact that
Color arguments generally go outside of
aes()
in a geom_smooth
But if you put the color argument inside
aes()
, it treats it as a label instead of
a color assignment
model_colors <- RColorBrewer::brewer.pal(3, "Set1")
cubic_plot3 <- scatter + geom_point() +
geom_smooth(method = nls, se = FALSE,
formula = y ~ intercept + linear*x +
quadratic*x*x + cubic*x*x*x,
method.args = list(start =
c(intercept = cubic_tidy$estimate[1],
linear = cubic_tidy$estimate[2],
quadratic = cubic_tidy$estimate[3],
cubic = cubic_tidy$estimate[4])),
aes(color = "Cubic", fill = "Cubic")) +
geom_smooth(method = lm, aes(color = "Linear",
fill = "Linear"),
se = FALSE) +
scale_color_manual(name = "Models",
values = model_colors) +
scale_fill_manual(name = "Models",
values = model_colors)
cubic_plot3
## `geom_smooth()` using formula 'y ~ x'
Calculate the confidence intervals based on your model with the
predict
function
fit
: predicted value
lwr
: lower CI limit
upr
: upper CI limit
gpa_cubic_predictions <- predict(object = cubic_model,
newdata = FirstYearGPA,
interval = "predict")
head(gpa_cubic_predictions)
## fit lwr upr
## 1 3.285010 2.460852 4.109168
## 2 3.460149 2.624765 4.295534
## 3 3.192314 2.368201 4.016426
## 4 3.100341 2.276226 3.924456
## 5 3.285010 2.460852 4.109168
## 6 3.010882 2.187038 3.834726
gpa_cubic_predict <- cbind(FirstYearGPA,
gpa_cubic_predictions)
geom_ribbon
shows a ribbon of color from
ymin
to ymax
These values are the 95% CIs (default) for each value of X
cubic_plot4 <- scatter + geom_point() +
geom_line(data = gpa_cubic_predict,
aes(x = HSGPA, y = fit)) +
geom_ribbon(data = gpa_cubic_predict,
aes(ymin = lwr, ymax = upr),
alpha = .2)
cubic_plot4
We have already used group_by()
to perform an operation
across several groups
e.g., calculate the mean GDP per capita for each country
One value of mean GDP per capita for each country
The nest()
function goes one step further and actually
splits the dataset up into chunks for each group
You end up with a nested table
You can run separate analyses and get separate results
GPA_nest_genderrace <- FirstYearGPA %>%
group_by(Male, White) %>%
nest()
GPA_nest_genderrace
## # A tibble: 4 × 3
## # Groups: Male, White [4]
## Male White data
## <int> <int> <list>
## 1 1 1 <tibble [84 × 8]>
## 2 0 1 <tibble [89 × 8]>
## 3 0 0 <tibble [28 × 8]>
## 4 1 0 <tibble [18 × 8]>
We use functions all the time when using R
lm()
is a function, ggplot()
is a
function, etc.We can also write functions to do things
Useful when you’d have to write the same code over and over
Just call the function instead
name_of_fxn <- function(input_args){
calculations_and_other_stuff
}
General structure of a function
input_args
are things you might give the function
calculations_and_other_stuff
are whatever calculations
you perform (mathematical operations, models, graphics, etc.)
The function returns the last thing it calculated, or you can tell it to return a specific thing by using
return(thing_i_want_returned)
This function has one argument, dataframe
The function uses the specified dataset to run the regression of HSGPA predicing GPA
lm_fxn <- function(dataframe){
lm(GPA ~ HSGPA, data = dataframe)
}
The map()
function applies a function to a set
of objects
data
is the name of the column in our nested dataframe
that contains the actual data
lm_fxn
is the function we just created
This will run the regression specified in lm_fxn
to
each of the nested dataframes in the nested dataset
We will have 4 separate outputs, one for each group
grouped_lm <- GPA_nest_genderrace %>%
mutate(model = map(data, lm_fxn))
Here is the nested dataframe
One column contains the data for each group
One column contains the regression model for each group
grouped_lm
## # A tibble: 4 × 4
## # Groups: Male, White [4]
## Male White data model
## <int> <int> <list> <list>
## 1 1 1 <tibble [84 × 8]> <lm>
## 2 0 1 <tibble [89 × 8]> <lm>
## 3 0 0 <tibble [28 × 8]> <lm>
## 4 1 0 <tibble [18 × 8]> <lm>
We can apply the tidy()
function to the nested
dataframe
map()
functiontidy_grouped_lm <- grouped_lm %>%
mutate(tidied = map(model, tidy))
tidy_grouped_lm
## # A tibble: 4 × 5
## # Groups: Male, White [4]
## Male White data model tidied
## <int> <int> <list> <list> <list>
## 1 1 1 <tibble [84 × 8]> <lm> <tibble [2 × 5]>
## 2 0 1 <tibble [89 × 8]> <lm> <tibble [2 × 5]>
## 3 0 0 <tibble [28 × 8]> <lm> <tibble [2 × 5]>
## 4 1 0 <tibble [18 × 8]> <lm> <tibble [2 × 5]>
Then we can unnest()
the dataframe to see values
out_tidy <- tidy_grouped_lm %>%
unnest(tidied)
out_tidy
## # A tibble: 8 × 9
## # Groups: Male, White [4]
## Male White data model term estimate std.error statistic p.value
## <int> <int> <list> <lis> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 <tibble [84 × 8]> <lm> (Int… 1.93 0.421 4.60 1.53e-5
## 2 1 1 <tibble [84 × 8]> <lm> HSGPA 0.366 0.122 3.00 3.60e-3
## 3 0 1 <tibble [89 × 8]> <lm> (Int… 1.14 0.390 2.93 4.33e-3
## 4 0 1 <tibble [89 × 8]> <lm> HSGPA 0.571 0.111 5.14 1.69e-6
## 5 0 0 <tibble [28 × 8]> <lm> (Int… -0.150 0.521 -0.288 7.76e-1
## 6 0 0 <tibble [28 × 8]> <lm> HSGPA 0.873 0.150 5.83 3.82e-6
## 7 1 0 <tibble [18 × 8]> <lm> (Int… 0.875 1.09 0.805 4.32e-1
## 8 1 0 <tibble [18 × 8]> <lm> HSGPA 0.574 0.321 1.79 9.26e-2
The dataframe includes all the coefficients
Here, just intercept and slope
But say that we really only want to look at slope
slopes_only <- out_tidy %>%
filter(term == "HSGPA")
slopes_only
## # A tibble: 4 × 9
## # Groups: Male, White [4]
## Male White data model term estimate std.error statistic p.value
## <int> <int> <list> <lis> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 <tibble [84 × 8]> <lm> HSGPA 0.366 0.122 3.00 3.60e-3
## 2 0 1 <tibble [89 × 8]> <lm> HSGPA 0.571 0.111 5.14 1.69e-6
## 3 0 0 <tibble [28 × 8]> <lm> HSGPA 0.873 0.150 5.83 3.82e-6
## 4 1 0 <tibble [18 × 8]> <lm> HSGPA 0.574 0.321 1.79 9.26e-2
est_grouped <- ggplot(data = slopes_only,
mapping = aes(x = as.factor(Male), y = estimate,
ymin = estimate - 2*std.error,
ymax = estimate + 2*std.error,
group = as.factor(White), color = as.factor(White))) +
geom_errorbar(position = position_dodge(width = 1)) +
labs(x = "Gender", y = "Estimate", color = "Race")
est_grouped
There are many types of interactivity
Filter data based on a variable
Zoom in on a plot
Add fit lines of various types
Anything that changes your plot from static
The very simplest way to introduce interactivity is with a package called plotly
Wrap your ggplot (or any other plot) in the ggplotly()
function:
ggplotly(scatter)
(This only works for HTML output, so I’m not running it here – see plotlyHTMLdemo file)
You can select certain portions of the plot, zoom in, mouse over to see data values, etc.
This is a very easy way to make your HTML documents
interactive, but just using plotly
is very limited
because the plot is static
From least to most flexible / interactive:
flexdashboard package
shinydashboard package
shiny elements in a markdown file
stand-alone shiny app
https://rmarkdown.rstudio.com/flexdashboard/
Done through a markdown document
Focus is on the layout and static objects
But you can add interactive objects using shiny
Need to specify runtime: shiny
in the YAML (indented
once, so it’s an output option)
Note that it’s no longer “Knit” but “Run”
See flexdashboarddemo file for an example
https://rstudio.github.io/shinydashboard/get_started.html
Done through a (non-markdown) R syntax file
Two functions contain most of your code: ui
and
server
ui
contains information about the look, including
input and output objects
server
constains information about any calculations
or graphics
At the end, run shinyApp(ui, server)
to run the app
See shinydashboarddemo file for an example
https://bookdown.org/yihui/rmarkdown/shiny-documents.html
Done within a markdown document, but not to make a whole dashboard (as in flexdashboard) but to add a small interactive element to a markdown document
Need to specify runtime: shiny
in the YAML (not
indented, so it’s a top-level option)
Note that it’s no longer “Knit” but “Run”
See shinyinmarkdowndemo file for an example
Start by selecting “New File” and then “Shiny Web App”
You can choose a single file or multiple file format
Single file: 1 file called “app.R” in a folder with the name of your application
Multiple file: 1 file called “ui.R” with the UI code and 1 file called “server.R” with the calculations code
Single file can be easier because it’s all in one place, but multiple file is easier to deal with for large projects
The template is already in place, just fill in your input, output, and calculations code
See app.R file for an example
There are a bunch of pre-defined input widgets (functions)
numericInput
radioButtons
sliderInput
fileInput
With a few exceptions, they end with “Input”
See http://shiny.rstudio.com/tutorial/written-tutorial/lesson3/ for a full list with examples
Each one requires at least 1) a label that you’ll use to refer to it in the program and 2) a text label that will show up in the app
Output functions are actually pairs of functions
One defines the object you’re outputting (in
server.R
if you have one)
These usually start with “render” such as renderPlot
or renderTable
Your code (e.g., a call to a named ggplot
) will go
inside
One defines where to put the object (in the ui.R
if
you have one)
These will match the render functions, such as
plotOutput
or tableOutput
Usually, the only argument is the named object from the server
See http://shiny.rstudio.com/tutorial/written-tutorial/lesson4/ for a full list with examples