BTS 510 Lab 9

set.seed(12345)
library(tidyverse)

library(Stat2Data)
theme_set(theme_classic(base_size = 16))

1 Learning objectives

- Describe maximum likelihood estimation for linear regression
- Describe hypothesis testing for linear regression coefficients, including the sampling distribution used and degrees of freedom

2 Data

FirstYearGPA data from the Stat2Data package: n = 219 subjects
- GPA: First-year college GPA on a 0.0 to 4.0 scale
- HSGPA: High school GPA on a 0.0 to 4.0 scale
- SATV: Verbal/critical reading SAT score
- SATM: Math SAT score
- Male: 1 = male, 0 = female
- HU: Number of credit hours earned in humanities courses in high school
- SS: Number of credit hours earned in social science courses in high school
- FirstGen: 1 = student is the first in their family to attend college, 0 = otherwise
- White: 1 = white student, 0 = other
- CollegeBound: 1 = attended a high school where >= 50% of students intended to go on to college, 0 = otherwise
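
A quick check that the data match the description above can be sketched as follows. This is a minimal sketch that assumes the Stat2Data package is installed; the particular checks are illustrative and not a required part of the lab.

```r
# Load the FirstYearGPA data shipped with the Stat2Data package
data("FirstYearGPA", package = "Stat2Data")

# Confirm the sample size and the variable names listed above
nrow(FirstYearGPA)
names(FirstYearGPA)

# Summarize the outcome and count the 1s in each demographic indicator
summary(FirstYearGPA$GPA)
colSums(FirstYearGPA[, c("Male", "FirstGen", "White")])
```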

3 Tasks

Question 1: How do the demographic variables (Male, FirstGen, and White) predict first-year college GPA (GPA)?

Question 2: How does HS GPA (HSGPA) predict first-year college GPA (GPA) over and above the demographic variables (Male, FirstGen, and White)?
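
One way to set up the two questions as nested linear models is sketched below. The object names `m1` and `m2` are placeholders chosen for illustration, and the sketch assumes the Stat2Data package is installed.

```r
# Load the data from the Stat2Data package
data("FirstYearGPA", package = "Stat2Data")

# Question 1: demographic predictors only (m1 is a hypothetical name)
m1 <- lm(GPA ~ Male + FirstGen + White, data = FirstYearGPA)

# Question 2: the same predictors plus HSGPA (m2 is a hypothetical name)
m2 <- lm(GPA ~ Male + FirstGen + White + HSGPA, data = FirstYearGPA)

# Coefficient tables with t statistics; each t test uses the model's
# residual degrees of freedom (n minus the number of coefficients)
summary(m1)
summary(m2)
```

Because `m1` uses a subset of the predictors in `m2`, the two models are nested, which is what makes a likelihood ratio comparison between them possible.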

Run the two models above.
What are the log-likelihoods of each model? What can you say about the models based on those values? What can’t you say?
Compare the two models using a likelihood ratio test (LRT). Report the results of the test. What can you say about the models based on the test?
Report the results for the better model (based on the LRT). Include all regression coefficients, R^2, test statistics, and p-values.
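
The log-likelihood and LRT steps can be sketched by hand with base R, as below. The names `m1`, `m2`, `ll1`, `ll2`, and the `lrt_*` objects are placeholders, and the sketch assumes the Stat2Data package is installed. The test statistic is twice the difference in log-likelihoods, compared to a chi-square distribution with degrees of freedom equal to the difference in the number of estimated parameters.

```r
# Load the data and fit the two nested models (hypothetical names m1, m2)
data("FirstYearGPA", package = "Stat2Data")
m1 <- lm(GPA ~ Male + FirstGen + White, data = FirstYearGPA)
m2 <- lm(GPA ~ Male + FirstGen + White + HSGPA, data = FirstYearGPA)

# Log-likelihood of each model at its maximum likelihood estimates
ll1 <- logLik(m1)
ll2 <- logLik(m2)
ll1
ll2

# LRT statistic: 2 * (log-likelihood of larger model - smaller model)
lrt_stat <- 2 * (as.numeric(ll2) - as.numeric(ll1))

# Degrees of freedom: difference in the number of estimated parameters
lrt_df <- attr(ll2, "df") - attr(ll1, "df")

# Upper-tail p-value from the chi-square reference distribution
lrt_p <- pchisq(lrt_stat, df = lrt_df, lower.tail = FALSE)
c(statistic = lrt_stat, df = lrt_df, p.value = lrt_p)

# R^2 for each model, for the final report
summary(m1)$r.squared
summary(m2)$r.squared
```

Note that `anova(m1, m2)` on two `lm` fits reports an F test by default rather than the chi-square LRT, so the two approaches can give slightly different p-values.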