Notice that all of the variables are type int in the dataframe. This probably works in this situation, but you can “coerce” the variables into different types, such as Smoke and Sex into factor variables.
What type of variable (nominal, ordinal, interval, ratio) are each variable?
Think about the criteria: category vs number, ordered vs not, meaningful 0
Are some ambiguous?
3 Central tendency
Based on the variable types you decided on above, estimate the appropriate measure(s) of central tendency for each variable
4 Dispersion
Based on the variable types you decided on above, estimate the appropriate measure(s) of dispersion for each variable
The variables Smoke and Sex are a little tricky
5 Plots
Based on the plots, what should you present for each variable?
Does what you presented do a good job representing each variable?
Note
Code is hidden, so you can look at it if you’d like, but don’t panic about not understanding it
Active
Code
ggplot(data = Pulse, aes(x = Active)) +geom_histogram(bins =30, color ="grey80") +geom_vline(xintercept =mean(Pulse$Active), color ="blue", linewidth =1.5) +geom_vline(xintercept =median(Pulse$Active), color ="red", linewidth =1.5, linetype ="dashed")
Rest
Code
ggplot(data = Pulse, aes(x = Rest)) +geom_histogram(bins =30, color ="grey80") +geom_vline(xintercept =mean(Pulse$Rest), color ="blue", linewidth =1.5) +geom_vline(xintercept =median(Pulse$Rest), color ="red", linewidth =1.5, linetype ="dashed")
Smoke
Code
ggplot(data = Pulse, aes(x = Smoke)) +geom_histogram(bins =30, color ="grey80") +geom_vline(xintercept =mean(Pulse$Smoke), color ="blue", linewidth =1.5) +geom_vline(xintercept =median(Pulse$Smoke), color ="red", linewidth =1.5, linetype ="dashed")
Sex
Code
ggplot(data = Pulse, aes(x = Sex)) +geom_histogram(bins =30, color ="grey80") +geom_vline(xintercept =mean(Pulse$Sex), color ="blue", linewidth =1.5) +geom_vline(xintercept =median(Pulse$Sex), color ="red", linewidth =1.5, linetype ="dashed")
Exercise
Code
ggplot(data = Pulse, aes(x = Exercise)) +geom_histogram(bins =30, color ="grey80") +geom_vline(xintercept =mean(Pulse$Exercise), color ="blue", linewidth =1.5) +geom_vline(xintercept =median(Pulse$Exercise), color ="red", linewidth =1.5, linetype ="dashed")
Hgt
Code
ggplot(data = Pulse, aes(x = Hgt)) +geom_histogram(bins =30, color ="grey80") +geom_vline(xintercept =mean(Pulse$Hgt), color ="blue", linewidth =1.5) +geom_vline(xintercept =median(Pulse$Hgt), color ="red", linewidth =1.5, linetype ="dashed")
Wgt
Code
ggplot(data = Pulse, aes(x = Wgt)) +geom_histogram(bins =30, color ="grey80") +geom_vline(xintercept =mean(Pulse$Wgt), color ="blue", linewidth =1.5) +geom_vline(xintercept =median(Pulse$Wgt), color ="red", linewidth =1.5, linetype ="dashed")
Source Code
---title: "BTS 510 Lab 3"format: html: embed-resources: true self-contained-math: true html-math-method: katex number-sections: true toc: true code-tools: true code-block-bg: true code-block-border-left: "#31BAE9"---```{r}#| label: setupset.seed(12345)library(tidyverse)library(Stat2Data)theme_set(theme_classic(base_size =16))```::: {.callout-note}* **tidyverse** really just for **ggplot2*** The data comes from the **Stat2Data** package, which I've loaded here* `theme_set(theme_classic(base_size = 16))` makes the plots a little simpler and makes their fonts a bit bigger -- this is purely my preference:::## Learning objectives* Understand the different **types** of variables that can be measured* Think about variables in terms of **central tendency** or **location*** Think about variables in terms of **dispersion** or **spread*** Start to think about variable **distributions** and **plots**## Read in data`Pulse` dataset from the **Stat2Data** package* A dataset with 232 observations on the following 7 variables. * `Active`: Pulse rate (beats per minute) after exercise * `Rest`: Resting pulse rate (beats per minute) * `Smoke`: 1=smoker or 0=nonsmoker * `Sex`: 1=female or 0=male * `Exercise`: Typical hours of exercise (per week) * `Hgt`: Height (in inches) * `Wgt`: Weight (in pounds)```{r}data(Pulse)head(Pulse)str(Pulse)```::: {.callout-note}Notice that all of the variables are type `int` in the dataframe. This probably works in this situation, but you can "coerce" the variables into different types, such as `Smoke` and `Sex` into `factor` variables.:::* What type of variable (*nominal, ordinal, interval, ratio*) are each variable? * Think about the criteria: category vs number, ordered vs not, meaningful 0 * Are some ambiguous?## Central tendency* Based on the *variable types* you decided on above, estimate the appropriate measure(s) of *central tendency* for each variable## Dispersion* Based on the *variable types* you decided on above, estimate the appropriate measure(s) of *dispersion* for each variable * The variables `Smoke` and `Sex` are a little tricky## Plots* Based on the plots, what should you present for each variable? * Does what you presented do a good job representing each variable?::: {.callout-note}Code is hidden, so you can look at it if you'd like, but don't panic about not understanding it:::* `Active````{r}#| code-fold: trueggplot(data = Pulse, aes(x = Active)) +geom_histogram(bins =30, color ="grey80") +geom_vline(xintercept =mean(Pulse$Active), color ="blue", linewidth =1.5) +geom_vline(xintercept =median(Pulse$Active), color ="red", linewidth =1.5, linetype ="dashed")```* `Rest````{r}#| code-fold: trueggplot(data = Pulse, aes(x = Rest)) +geom_histogram(bins =30, color ="grey80") +geom_vline(xintercept =mean(Pulse$Rest), color ="blue", linewidth =1.5) +geom_vline(xintercept =median(Pulse$Rest), color ="red", linewidth =1.5, linetype ="dashed")```* `Smoke````{r}#| code-fold: trueggplot(data = Pulse, aes(x = Smoke)) +geom_histogram(bins =30, color ="grey80") +geom_vline(xintercept =mean(Pulse$Smoke), color ="blue", linewidth =1.5) +geom_vline(xintercept =median(Pulse$Smoke), color ="red", linewidth =1.5, linetype ="dashed")```* `Sex````{r}#| code-fold: trueggplot(data = Pulse, aes(x = Sex)) +geom_histogram(bins =30, color ="grey80") +geom_vline(xintercept =mean(Pulse$Sex), color ="blue", linewidth =1.5) +geom_vline(xintercept =median(Pulse$Sex), color ="red", linewidth =1.5, linetype ="dashed")```* `Exercise````{r}#| code-fold: trueggplot(data = Pulse, aes(x = Exercise)) +geom_histogram(bins =30, color ="grey80") +geom_vline(xintercept =mean(Pulse$Exercise), color ="blue", linewidth =1.5) +geom_vline(xintercept =median(Pulse$Exercise), color ="red", linewidth =1.5, linetype ="dashed")```* `Hgt````{r}#| code-fold: trueggplot(data = Pulse, aes(x = Hgt)) +geom_histogram(bins =30, color ="grey80") +geom_vline(xintercept =mean(Pulse$Hgt), color ="blue", linewidth =1.5) +geom_vline(xintercept =median(Pulse$Hgt), color ="red", linewidth =1.5, linetype ="dashed")```* `Wgt````{r}#| code-fold: trueggplot(data = Pulse, aes(x = Wgt)) +geom_histogram(bins =30, color ="grey80") +geom_vline(xintercept =mean(Pulse$Wgt), color ="blue", linewidth =1.5) +geom_vline(xintercept =median(Pulse$Wgt), color ="red", linewidth =1.5, linetype ="dashed")```