Introduction to Biostatistics

1 Learning objectives

1.1 Learning objectives

  • Review the course topics and goals
  • Start using R and RStudio
  • Start using Quarto documents in R as a reproducible method to present analyses along with narrative text

2 Course topics and goals

2.1 What is statistics? Interdisciplinary

  • Mathematics
  • Probability
  • Uncertainty
  • Decision making
  • Programming / computer science
  • Study design and methodology
  • Integrated into science

2.2 What is statistics? People

  • William Sealy Gosset
    • Guinness Brewing Company: Making better beer
    • \(t\)-distribution and \(t\)-test
  • Ronald Fisher
    • Rothamsted Experimental Station: Agricultural questions
    • Analysis of variance (ANOVA) and \(F\)-distribution
    • Popularized the \(p\)-value and \(p < .05\) as a cut-off

2.3 What is statistics? Context

https://www.instagram.com/p/Bx8p0Pih6qs/

2.4 What is statistics? Context

  • This thing is 8 feet tall. Is that tall?
    • For a person: Yes
    • For a building: No
  • These two things are 12 inches different in height. Is that a lot?
    • For people: Yes
    • For buildings: No

2.5 When to think about statistics?

  • Study design
  • Data collection
  • Data analysis
  • Interpretation

Ronald Fisher: “To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.”

2.6 Topics

  • Reading in and manipulating data in R
  • Making plots in R
  • Descriptive statistics
  • Probability, sampling distributions, uncertainty
  • Comparing 2 means or proportions
  • Controlling for multiple comparisons
  • Comparing 3 or more means

2.7 What you’ll learn about

  • Programming (in R)
  • Probability and mathematics underlying statistical methods
  • Conducting (some) statistical analyses
  • Presenting results of statistical analyses
  • Thinking about issues in science related to statistics

3 R and RStudio

3.1 What is R?

  • Free, open source software
    • Windows, Mac, Unix, cloud
  • Object-oriented programming language
  • Built-in capabilities (“packages”)
  • Users can write their own packages and functions

3.2 What is R?

3.3 What does “object-oriented” mean?

  • Programming built on the idea of “objects”
  • Everything is an object
    • If you have an object’s name, you can do something with it
  • Objects: Datasets, variables, analyses, plots
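The kinds of objects listed above can be sketched in a few lines of R. This is only an illustration: the object names (heights, dat, fit) and all of the numbers are made up for the example, not course data.

```r
# A variable: a numeric vector stored under the name `heights`
# (made-up values, for illustration only)
heights <- c(150, 160, 170, 180, 190)

# A dataset: a data frame with two columns (weights are also made up)
dat <- data.frame(height = heights, weight = c(50, 58, 66, 79, 88))

# An analysis: a fitted linear regression is an object too
fit <- lm(weight ~ height, data = dat)

# Because each object has a name, you can do something with it
class(heights)  # "numeric"
class(dat)      # "data.frame"
class(fit)      # "lm"
summary(dat)    # descriptive summary of every column
coef(fit)       # intercept and slope of the fitted line
```

What a function such as summary() returns depends on the kind of object it is given, which is one practical consequence of everything being an object.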

3.4 What does “object-oriented” mean?

x <- c(1, 1, 2, 3, 5, 8)
x
[1] 1 1 2 3 5 8
  • Line 1
    • Define an object called x that is this set of numbers
    • <- is the assignment operator
  • Line 2
    • Print the object x

3.5 What is a function?

  • Pre-written piece of code that
    • Takes one or more inputs (or arguments)
    • Produces one or more outputs

3.6 What is a function?

mean(x)
[1] 3.333333
  • Function: mean()
  • Input: x
  • Output: mean of x

Documentation for the mean() function: run ?mean or help(mean) in the R console
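A short sketch of the "one or more inputs" idea, assuming the same x as above. The na.rm argument is part of mean() itself; the function name value_range is hypothetical, chosen only to show that users can write their own functions.

```r
x <- c(1, 1, 2, 3, 5, 8)

# Inputs are called arguments. Besides the data, mean() accepts options
# such as na.rm, which controls how missing values (NA) are handled.
y <- c(x, NA)
mean(y)                 # NA, because one value is missing
mean(y, na.rm = TRUE)   # drops the NA first, so this equals mean(x)

# A user-written function (hypothetical name): one input, one output
value_range <- function(v) {
  max(v) - min(v)
}
value_range(x)          # 8 - 1 = 7
```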

3.7 What is a package?

  • Collection of functions (that work together in some way)
    • dplyr package is a set of functions to manipulate data
    • ggplot2 package is a set of functions to create plots
    • stats package (built-in) is a set of functions to perform basic statistical operations

3.8 What is a package?

  • To use functions in a package, you need to
    • Install the package
      • Just once per machine / R install
    • Load or library() the package
      • Every time you use it
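The two steps above might look like this in practice. This is a minimal sketch: dplyr is used only as an example package, and the install line is commented out because it needs an internet connection and only has to be run once.

```r
x <- c(1, 1, 2, 3, 5, 8)

# Step 1 (once per machine / R install): download and install the package.
# Commented out so that re-running this script does not reinstall it.
# install.packages("dplyr")

# Step 2 (every new R session): load the package with library().
# requireNamespace() only checks whether the package is installed,
# so this sketch still runs on a machine without dplyr.
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
}

# The built-in stats package is loaded automatically, so sd() works right away
sd_x <- sd(x)

# package::function() names the package explicitly
sd_x2 <- stats::sd(x)
```

The package::function() form is handy when two loaded packages define functions with the same name.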

3.9 What is RStudio?

  • Integrated development environment (IDE) for R
  • Graphical user interface to use R
    • Manage installed packages
    • Connect to Git / GitHub
    • Easily see loaded datasets

3.10 What is RStudio?

4 Quarto and markdown

4.1 What is “markdown”?

  • “Markup” language to format plain text
    • HTML: Hyper-Text Markup Language
  • Headings, links, italics / bold, bullet points, equations
  • Output final, formatted document into multiple formats
    • HTML, Word, PDF documents or slides
  • Used across a wide variety of platforms, not just in R
    • GitHub, Jekyll, note-taking apps

4.2 markdown example

https://www.markdownguide.org/getting-started/#kicking-the-tires

4.3 markdown example: Organization

  • Organize document with headings
    • # Level 1 heading
    • ## Level 2 heading
    • ### Level 3 heading
  • Most documents can have a table of contents, which will use these headings
    • Slides have a table of contents accessed from bottom left

4.4 markdown example: Format text

  • *italics* turns into italics
  • **bold** turns into bold
  • Mathematical text in between pairs of $
    • $R^2_{multiple}$ turns into \(R^2_{multiple}\)
    • $\hat{Y} = b_0 + b_1 X_1$ turns into \(\hat{Y} = b_0 + b_1 X_1\)

4.5 markdown example: Code chunks

```{r, echo = TRUE}
x <- c(1, 1, 2, 3, 5, 8)
mean(x)
```
  • Intersperse text and code in the same document
    • Description of data and analysis
    • Analysis
    • Summarize the results

4.6 rmarkdown package

  • Implements markdown in R
  • Lets you create several document types
    • HTML or PDF or Word documents
    • Slides in HTML or PDF or PowerPoint
  • Does all the heavy lifting to convert your markdown document to the final, formatted version

4.7 Quarto

  • “Next gen” of R Markdown
  • Works with multiple languages (even in the same document): R, Python, Julia
  • Additional features
  • Install Quarto from https://quarto.org/docs/get-started/
    • More general info at https://quarto.org
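To see how headings, narrative text, and a code chunk sit together in one Quarto document, the sketch below uses R to write a minimal .qmd file to a temporary folder. The file name, title, and wording are made up for illustration. In practice you would normally create the file from the RStudio menus rather than from code.

```r
# Three backticks mark the start and end of a code chunk
fence <- strrep("`", 3)

qmd_lines <- c(
  "---",                              # YAML header: document-level options
  "title: \"My first analysis\"",
  "format: html",
  "---",
  "",
  "## Data",                          # level 2 heading
  "",
  "The numbers to analyze are stored in *x*.",
  "",
  paste0(fence, "{r}"),               # start of an R code chunk
  "x <- c(1, 1, 2, 3, 5, 8)",
  "mean(x)",
  fence,                              # end of the code chunk
  "",
  "## Summary",
  "",
  "The chunk above prints the **mean** of x."
)

# Write the document to a temporary folder (hypothetical file name)
qmd_path <- file.path(tempdir(), "first-analysis.qmd")
writeLines(qmd_lines, qmd_path)

# To produce the formatted HTML, open the file in RStudio and click Render,
# or run `quarto render first-analysis.qmd` in a terminal
# (both require Quarto to be installed).
```

Rendering runs the chunk and places its output directly under the code, so the narrative and the results cannot drift apart.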

5 In-class activities

5.1 In-class activities

  • Any questions on the course topics and goals?
  • Start using R and RStudio
  • Start using Quarto in R as a reproducible method to present analyses along with narrative text