Fully transition to matrix form for linear regression
Describe the matrix solution to least squares estimation
Data matrix
\[\begin{matrix} \textbf{X} \\ (n,p) \end{matrix} = \begin{bmatrix} X_{11} & X_{12} & \dots & X_{1p} \\ X_{21} & X_{22} & \dots & X_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ X_{n1} & X_{n2} & \dots & X_{np} \end{bmatrix}\]
Outcome variable
\[\begin{matrix} \underline{y} \\ (n,1) \end{matrix} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}\]
Predicted outcome variable
\[\begin{matrix} \underline{\hat{y}} \\ (n,1) \end{matrix} = \begin{bmatrix} \hat{Y}_1 \\ \hat{Y}_2 \\ \vdots \\ \hat{Y}_n \end{bmatrix}\]
\[\begin{matrix}\underline{\hat{y}} \\ (n,1) \end{matrix} = \begin{matrix} \textbf{X} \\ (n,p) \end{matrix} \; \begin{matrix} \underline{b} \\ (p,1) \end{matrix} + \begin{matrix} \underline{b}_0 \\ (n,1) \end{matrix}\] \[\begin{bmatrix}\hat{Y}_1 \\ \hat{Y}_2 \\ \vdots \\ \hat{Y}_n \end{bmatrix} = \begin{bmatrix} X_{11} & X_{12} & \dots & X_{1p} \\ X_{21} & X_{22} & \dots & X_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ X_{n1} & X_{n2} & \dots & X_{np} \end{bmatrix} \; \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_p \end{bmatrix} + \begin{bmatrix} b_0 \\ b_0 \\ \vdots \\ b_0 \end{bmatrix}\]
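As a quick numerical sketch of this prediction equation (the data values and variable names below are made up for illustration, using NumPy):

```python
import numpy as np

# Hypothetical data: n = 4 cases, p = 2 predictors
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])   # (n, p) data matrix
b = np.array([0.5, 1.5])     # vector of p slopes
b0 = 2.0                     # intercept

# y_hat = X b + b0, with the intercept added to every case
y_hat = X @ b + b0           # (n,) vector of predicted values
print(y_hat)                 # [5.5, 4.5, 9.5, 8.5]
```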
We previously discussed the partitioned variation-covariation matrix in its general form
\[\textbf{P}_{XX, YY} = \textbf{M}' \; \textbf{M} - \frac{1}{n} \textbf{M}' \; \textbf{E} \; \textbf{M} = \left[\begin{array}{c|c} \textbf{P}_{XX} & \textbf{P}_{XY} \\ \hline \textbf{P}_{YX} & \textbf{P}_{YY} \end{array}\right]\]
In linear regression, the variation-covariation matrix becomes:
\[\textbf{P} = \left[\begin{array}{c|c} \textbf{P}_{XX} & \underline{p}_{XY} \\ \hline \underline{p}_{YX} & SS_Y \end{array}\right] = \left[\begin{array}{cccc|c} SS_{x1} & SP_{x1,x2} & \dots & SP_{x1,xp} & SP_{x1,y} \\ SP_{x2,x1} & SS_{x2} & \dots & SP_{x2,xp} & SP_{x2,y} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ SP_{xp,x1} & SP_{xp,x2} & \dots & SS_{xp} & SP_{xp,y} \\ \hline SP_{y,x1} & SP_{y,x2} & \dots & SP_{y,xp} & SS_y \\ \end{array}\right]\]
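A small computational sketch of the covariation formula, assuming (as in the earlier definition) that \(\textbf{E}\) is the \(n \times n\) matrix of ones and that the hypothetical matrix `M` stacks the predictor columns and the outcome column:

```python
import numpy as np

# Hypothetical joint data matrix: columns [X1, X2, Y]
M = np.array([[1.0, 2.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 4.0, 8.0],
              [4.0, 3.0, 9.0]])
n = M.shape[0]
E = np.ones((n, n))                  # n x n matrix of ones

# P = M'M - (1/n) M'EM : sums of squares and cross-products about the means
P = M.T @ M - (1 / n) * M.T @ E @ M

# Check against the sample covariance matrix: P = (n - 1) * S
assert np.allclose(P, (n - 1) * np.cov(M, rowvar=False))
```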
We previously discussed the partitioned variance-covariance matrix in its general form
\[\textbf{S}_{XX, YY} = \frac{1}{(n-1)}\left(\textbf{M}' \; \textbf{M} - \frac{1}{n} \textbf{M}' \; \textbf{E} \; \textbf{M}\right) = \left[\begin{array}{c|c} \textbf{S}_{XX} & \textbf{S}_{XY} \\ \hline \textbf{S}_{YX} & \textbf{S}_{YY} \end{array}\right]\]
In linear regression, the variance-covariance matrix becomes:
\[\textbf{S} = \frac{1}{n-1} \; \textbf{P} = \left[\begin{array}{c|c} \textbf{S}_{XX} & \underline{s}_{XY} \\ \hline \underline{s}_{YX} & s_y^2 \end{array}\right] = \left[\begin{array}{cccc|c} s_{x1}^2 & s_{x1,x2} & \dots & s_{x1,xp} & s_{x1,y} \\ s_{x2,x1} & s_{x2}^2 & \dots & s_{x2,xp} & s_{x2,y} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ s_{xp,x1} & s_{xp,x2} & \dots & s_{xp}^2 & s_{xp,y} \\ \hline s_{y,x1} & s_{y,x2} & \dots & s_{y,xp} & s_y^2 \\ \end{array}\right]\]
We previously discussed the partitioned correlation matrix in its general form
\[\textbf{R}_{XX, YY} = \left[\begin{array}{c|c} \textbf{R}_{XX} & \textbf{R}_{XY} \\ \hline \textbf{R}_{YX} & \textbf{R}_{YY} \end{array}\right]\]
In linear regression, the correlation matrix becomes:
\[\textbf{R} = \left[\begin{array}{c|c} \textbf{R}_{XX} & \underline{r}_{XY} \\ \hline \underline{r}_{YX} & 1 \end{array}\right] = \left[\begin{array}{cccc|c} 1 & r_{x1,x2} & \dots & r_{x1,xp} & r_{x1,y} \\ r_{x2,x1} & 1 & \dots & r_{x2,xp} & r_{x2,y} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ r_{xp,x1} & r_{xp,x2} & \dots & 1 & r_{xp,y} \\ \hline r_{y,x1} & r_{y,x2} & \dots & r_{y,xp} & 1 \\ \end{array}\right]\]
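Continuing the sketch, the variance-covariance and correlation matrices are rescaled versions of \(\textbf{P}\) (same hypothetical data as above):

```python
import numpy as np

M = np.array([[1.0, 2.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 4.0, 8.0],
              [4.0, 3.0, 9.0]])      # columns [X1, X2, Y]
n = M.shape[0]
P = M.T @ M - (1 / n) * M.T @ np.ones((n, n)) @ M

S = P / (n - 1)                      # variance-covariance matrix
d = np.sqrt(np.diag(S))              # standard deviations
R = S / np.outer(d, d)               # correlation matrix
assert np.allclose(R, np.corrcoef(M, rowvar=False))
```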
Last time, we went through the least squares solution and the normal equations to solve for the regression coefficients in a model with a single predictor
\[b_1 = \frac{n \Sigma X Y - (\Sigma X) (\Sigma Y)}{n \Sigma X^2 - (\Sigma X)^2} = \frac{SP_{XY}}{SS_X} = \frac{s_{XY}}{s_X^2}\]
The regression coefficient \(b_1\) can be computed equivalently from the raw sums, from the covariation terms (\(SP_{XY}/SS_X\)), or from the covariance terms (\(s_{XY}/s_X^2\))
In the non-matrix approach, we could solve for coefficients in terms of covariation, covariance, or correlation (standardized solution)
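A quick numerical check of these equivalent forms for a single predictor (data values made up for illustration):

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 2.5, 4.0, 4.5, 6.0])
n = len(X)

# Raw-sum form
b1_sums = (n * np.sum(X * Y) - X.sum() * Y.sum()) / (n * np.sum(X**2) - X.sum()**2)
# Covariation form: SP_XY / SS_X
b1_sp = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean())**2)
# Covariance form: s_XY / s_X^2
b1_cov = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)

assert np.isclose(b1_sums, b1_sp) and np.isclose(b1_sp, b1_cov)
```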
There are several equivalent matrix formulations for solving for regression coefficients
In matrix form, the solution for the unstandardized coefficients based on the covariation matrix is:
\[\underline{b} = \textbf{P}^{-1}_{XX} \; \underline{p}_{XY}\]
Equivalently, based on the variance-covariance matrix, the solution for the unstandardized coefficients is:
\[\underline{b} = \textbf{S}^{-1}_{XX} \; \underline{s}_{XY}\]
For the solutions based on the covariation or the covariance:
\[b_0 = \overline{Y} - \underline{\overline{X}}'\;\underline{b}\]
\[=\overline{Y} - (b_1 \overline{X}_1 + b_2 \overline{X}_2 + \dots + b_p \overline{X}_p)\]
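A sketch of the covariance-based solution with the intercept, assuming (hypothetically) that the last column of `M` is the outcome:

```python
import numpy as np

M = np.array([[1.0, 2.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 4.0, 8.0],
              [4.0, 3.0, 9.0]])      # columns [X1, X2, Y]
S = np.cov(M, rowvar=False)          # variance-covariance matrix
S_XX = S[:-1, :-1]                   # predictor block
s_XY = S[:-1, -1]                    # predictor-outcome covariances

b = np.linalg.solve(S_XX, s_XY)      # slopes: S_XX^{-1} s_XY
xbar = M[:, :-1].mean(axis=0)        # predictor means
ybar = M[:, -1].mean()               # outcome mean
b0 = ybar - xbar @ b                 # intercept
```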
The matrix solution for the standardized regression coefficients (written \(\underline{b}^*\) here to distinguish them from the unstandardized coefficients):
\[\underline{b}^* = \textbf{R}^{-1}_{XX} \; \underline{r}_{XY}\]
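And a matching sketch for the standardized solution, using the same hypothetical data:

```python
import numpy as np

M = np.array([[1.0, 2.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 4.0, 8.0],
              [4.0, 3.0, 9.0]])      # columns [X1, X2, Y]
R = np.corrcoef(M, rowvar=False)     # correlation matrix
R_XX = R[:-1, :-1]                   # predictor correlations
r_XY = R[:-1, -1]                    # predictor-outcome correlations

b_star = np.linalg.solve(R_XX, r_XY) # standardized coefficients
```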
An alternative form of the solution uses the augmented data matrix
\(\begin{matrix} \textbf{X}_A \\ (n,p\color{blue}{+1}) \end{matrix} = \begin{bmatrix} \color{blue}{1} & X_{11} & X_{12} & \dots & X_{1p} \\ \color{blue}{1} & X_{21} & X_{22} & \dots & X_{2p} \\ \color{blue}{\vdots} & \vdots & \vdots & \ddots & \vdots \\ \color{blue}{1} & X_{n1} & X_{n2} & \dots & X_{np} \end{bmatrix}\)
Note: I use \(\textbf{X}_A\), but there is no standard notation distinguishing the raw data matrix from the augmented data matrix. Count the columns!
\[\begin{matrix}\underline{\hat{y}} \\ (n,1) \end{matrix} = \begin{matrix} \textbf{X}_A \\ (n,p\color{blue}{+1}) \end{matrix} \; \begin{matrix} \underline{b} \\ (p\color{blue}{+1},1) \end{matrix}\] \[\begin{bmatrix}\hat{Y}_1 \\ \hat{Y}_2 \\ \vdots \\ \hat{Y}_n \end{bmatrix} = \begin{bmatrix} \color{blue}{1} & X_{11} & X_{12} & \dots & X_{1p} \\ \color{blue}{1} & X_{21} & X_{22} & \dots & X_{2p} \\ \color{blue}{\vdots} & \vdots & \vdots & \ddots & \vdots \\ \color{blue}{1} & X_{n1} & X_{n2} & \dots & X_{np} \end{bmatrix} \; \begin{bmatrix} \color{blue}{b_0} \\ b_1 \\ b_2 \\ \vdots \\ b_p \end{bmatrix}\]
Adds the intercept (\(b_0\)) to the vector of regression coefficients
Vector of regression coefficients becomes: \(\begin{matrix} \underline{b} \\ (p\color{blue}{+1},1) \end{matrix} = \begin{bmatrix} \color{blue}{b_0} \\ b_1 \\ b_2 \\ \vdots \\ b_p \end{bmatrix}\)
Augmented data matrix (\(\textbf{X}_A\)) has a column of \(1\)s as the first column of the matrix
The solution to OLS regression using the augmented data matrix:
\[\underline{b} = \left(\textbf{X}'_A\textbf{X}_A\right)^{-1} \textbf{X}'_A \;\underline{y}\]
where \(\underline{b}\) is the \((p+1) \times 1\) vector of regression coefficients
Remember: this version includes the intercept in the vector of coefficients
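The augmented-matrix solution maps directly to code; a sketch with hypothetical data (in practice `np.linalg.lstsq` or a regression routine is numerically preferable to forming the cross-product matrix explicitly):

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])                      # (n, p) raw data matrix
y = np.array([3.0, 4.0, 8.0, 9.0])
n = X.shape[0]

X_A = np.column_stack([np.ones(n), X])          # augmented: leading column of 1s
b = np.linalg.solve(X_A.T @ X_A, X_A.T @ y)     # [b0, b1, ..., bp]
```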
Regression diagnostics measure the extent to which deviant cases affect the results of a regression analysis
There are several measures of leverage, with slight differences among them depending on the software package you’re using
They’re all based on the hat matrix
The hat matrix is an \(n \times n\) matrix
The values on the diagonal (one for each of the \(n\) subjects) are the leverage statistics
Substituting the solution for \(\underline{b}\) into the prediction equation \(\underline{\hat{y}} = \textbf{X}_A \, \underline{b}\):
\(\underline{\hat{y}} = \textbf{X}_A \left(\textbf{X}'_A\textbf{X}_A\right)^{-1} \textbf{X}'_A \;\underline{y}\)
\(\underline{\hat{y}} = \color{blue}{\textbf{X}_A \left(\textbf{X}'_A\textbf{X}_A\right)^{-1} \textbf{X}'_A}\)\(\underline{y}\)
Hat matrix: \(\textbf{H} = \textbf{X}_A \left(\textbf{X}'_A\textbf{X}_A\right)^{-1} \textbf{X}'_A\)
Why is it called that?
It “puts the hat on” \(Y\): it takes you from \(Y\) (observed) to \(\hat{Y}\) (predicted), since \(\underline{\hat{y}} = \textbf{H} \, \underline{y}\)
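A short sketch of the hat matrix and the leverage values on its diagonal (hypothetical data; in most software the leverages are available directly from the fitted model):

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])
y = np.array([3.0, 4.0, 8.0, 9.0])
n = X.shape[0]
X_A = np.column_stack([np.ones(n), X])

# H = X_A (X_A' X_A)^{-1} X_A' : the n x n hat matrix
H = X_A @ np.linalg.solve(X_A.T @ X_A, X_A.T)
y_hat = H @ y                                   # "puts the hat on" y
leverage = np.diag(H)                           # one leverage value per case
assert np.isclose(leverage.sum(), X_A.shape[1]) # leverages sum to p + 1
```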