Introduce the concept of composites and the statistical operations we can perform on them
Review linear regression
Summarize / review ordinary least squares estimation
All multivariate procedures (and most statistical procedures, in general) rely on composites of variables, also called linear combinations of variables
Statistical procedures create these linear combinations and then do something with them
A composite or linear combination is a way to combine multiple variables into a single variable
To make a composite, you need variables and weights
Usually:
In general, composites look like:
\[u_i = \Sigma a_j X_{ij} = a_1 X_{i1} + a_2 X_{i2} + \cdots + a_p X_{ip}\] for subject \(i\) across variables \(j\) = 1 to p
Remember calculating GPA?
Example: 18 units total, from a 5-unit A, a 4-unit B, a 4-unit C, and a 5-unit B
Variables: the grade points for each course (A = 4.0, B = 3.0, C = 2.0); weights: each course’s units divided by the 18 total
\(GPA = \frac{5}{18} (4.0) + \frac{4}{18} (3.0) + \frac{4}{18} (2.0) + \frac{5}{18} (3.0)\) \(= 1.111 + 0.667 + 0.444 + 0.833 \approx 3.06\)
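As a quick check, here is a minimal Python sketch of the same calculation, treating GPA as a composite with the grade points as variables and units-out-of-18 as weights:

```python
# Weighted GPA as a composite: grades are the variables, units/total are the weights
grades = [4.0, 3.0, 2.0, 3.0]   # a 5-unit A, a 4-unit B, a 4-unit C, a 5-unit B
units = [5, 4, 4, 5]            # 18 units total
weights = [u / sum(units) for u in units]

gpa = sum(w * g for w, g in zip(weights, grades))
print(round(gpa, 2))            # 3.06
```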
Predicted score for linear regression
\(\hat{Y}_{i} = b_1 X_{i1} + b_2 X_{i2} + b_3 X_{i3}\)
The general strategy in multivariate analysis is to create composites and to choose the weights so that the composites optimize some criterion
For example: Least squares criterion for linear regression
Composites are the basis for all multivariate analyses
Focus on the relationships between composites and between the sets of variables that form them
We will do all of this in matrix algebra
Statistics on a composite can be calculated from the corresponding statistics on the original variables, using the same weights
One common example: the mean of a composite is the composite of the means (shown below)
Subject 1: \(u_1 = a_1 X_{11} + a_2 X_{12} + a_3 X_{13} + \cdots + a_p X_{1p}\)
Subject 2: \(u_2 = a_1 X_{21} + a_2 X_{22} + a_3 X_{23} + \cdots + a_p X_{2p}\)
Subject \(n\): \(u_n = a_1 X_{n1} + a_2 X_{n2} + a_3 X_{n3} + \cdots + a_p X_{np}\)
Spreadsheet representation
\(\begin{array}{|c||c|c|c|c|} \hline Subject & X_1 & \cdots & X_j & X_p \\ \hline \hline 1 & X_{11} & \cdots & X_{1j} & X_{1p} \\ \hline 2 & X_{21} & \cdots & X_{2j} & X_{2p} \\ \hline 3 & X_{31} & \cdots & X_{3j} & X_{3p} \\ \hline \vdots & \vdots & \ddots & \vdots & \vdots \\ \hline n & X_{n1} & \cdots & X_{nj} & X_{np}\\ \hline \end{array}\)
Matrix representation
\(\textbf{X} = \begin{bmatrix} X_{11} & \cdots & X_{1j} & X_{1p} \\ X_{21} & \cdots & X_{2j} & X_{2p} \\ X_{31} & \cdots & X_{3j} & X_{3p} \\ \vdots & \ddots & \vdots & \vdots \\ X_{n1} & \cdots & X_{nj} & X_{np} \end{bmatrix}\)
Weight vector \(\underline{a}\)
\(\underline{a} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \\ \end{bmatrix}\)
Composite vector \(\underline{u}\)
\(\underline{u}= \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \\ \end{bmatrix} = \textbf{X} \underline{a} = \begin{bmatrix} X_{11} & \cdots & X_{1j} & X_{1p} \\ X_{21} & \cdots & X_{2j} & X_{2p} \\ \vdots & \ddots & \vdots & \vdots \\ X_{n1} & \cdots & X_{nj} & X_{np} \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \\ \end{bmatrix}\)
A composite is something like a weighted GPA or a predicted score in regression
If we wanted to get the mean of a composite, there are two equivalent ways to do that
The mean of a composite is the composite of the means (of the variables that went into the composite)
\[\overline{\textbf{U}} = \overline{\underline{X}} \; \underline{a}\]
Three \(X\)s:
\(\overline{\textbf{U}}=\begin{bmatrix} \overline{X}_1 & \overline{X}_2 & \overline{X}_3 \end{bmatrix} \; \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} = a_1 \; \overline{X}_1 + a_2 \; \overline{X}_2 + a_3 \; \overline{X}_3\)
\[\textbf{X} = \begin{bmatrix} 5 & 1 & 2 \\ 9 & 2 & 5 \\ 4 & 6 & 3 \\ 2 & 3 & 6 \\ \end{bmatrix} \hspace{2em} \underline{a} = \begin{bmatrix} 2 \\ 3 \\ 1 \\ \end{bmatrix}\]
Step 1: Get the vector of composites \(\underline{u} = \textbf{X}\underline{a}\)
\(\underline{u} = \color{OrangeRed}{\textbf{X}}\color{blue}{\underline{a}} = \color{OrangeRed}{\begin{bmatrix} 5 & 1 & 2 \\ 9 & 2 & 5 \\ 4 & 6 & 3 \\ 2 & 3 & 6 \\ \end{bmatrix}} \color{blue}{\begin{bmatrix} 2 \\ 3 \\ 1 \\ \end{bmatrix}} =\)
\(\begin{bmatrix} ({\color{OrangeRed}5} \times {\color{blue}2}) + ({\color{OrangeRed}1} \times {\color{blue}3}) + ({\color{OrangeRed}2} \times {\color{blue}1}) \\ ({\color{OrangeRed}9} \times {\color{blue}2}) + ({\color{OrangeRed}2} \times {\color{blue}3}) + ({\color{OrangeRed}5} \times {\color{blue}1}) \\ ({\color{OrangeRed}4} \times {\color{blue}2}) + ({\color{OrangeRed}6} \times {\color{blue}3}) + ({\color{OrangeRed}3} \times {\color{blue}1}) \\ ({\color{OrangeRed}2} \times {\color{blue}2}) + ({\color{OrangeRed}3} \times {\color{blue}3}) + ({\color{OrangeRed}6} \times {\color{blue}1}) \\ \end{bmatrix} = \begin{bmatrix} 15 \\ 29 \\ 29 \\ 19 \\ \end{bmatrix}\)
Step 2: Calculate the mean composite \(\overline{\textbf{U}}\) from \(\underline{u}\)
\(\overline{\textbf{U}} = \frac{1}{n}\:\color{OrangeRed}{\underline{1}'}\:\color{blue}{\underline{u}} = \frac{1}{4}\:\color{OrangeRed}{\begin{bmatrix} 1 & 1 & 1 & 1 \\ \end{bmatrix}}\:\color{blue}{\begin{bmatrix} 15 \\ 29 \\ 29 \\ 19 \\ \end{bmatrix}} =\)
\(\frac{1}{4}\:\begin{bmatrix} (\color{OrangeRed}{1} \times \color{blue}{15}) + (\color{OrangeRed}{1} \times \color{blue}{29}) + (\color{OrangeRed}{1} \times \color{blue}{29}) + (\color{OrangeRed}{1} \times \color{blue}{19}) \\ \end{bmatrix} =\)
\(\frac{1}{4}\:(92) = 23\)
Step 1: Get the mean vector of the variables \(\overline{\underline{x}} =\frac{1}{n}\:\underline{1}'\:\textbf{X}\)
\(\overline{\underline{x}} = \frac{1}{n}\: \color{OrangeRed}{\underline{1}'}\:\color{blue}{\textbf{X}} = \frac{1}{4}\:\color{OrangeRed}{\begin{bmatrix} 1 & 1 & 1 & 1 \\ \end{bmatrix}}\:\color{blue}{\begin{bmatrix} 5 & 1 & 2 \\ 9 & 2 & 5 \\ 4 & 6 & 3 \\ 2 & 3 & 6 \\ \end{bmatrix}} =\)
\(\frac{1}{4}\:\begin{bmatrix} ({\color{OrangeRed}1} \times {\color{blue}5}) + ({\color{OrangeRed}1} \times {\color{blue}9}) + ({\color{OrangeRed}1} \times {\color{blue}4}) + ({\color{OrangeRed}1} \times {\color{blue}2}) & ({\color{OrangeRed}1} \times {\color{blue}1}) + ({\color{OrangeRed}1} \times {\color{blue}2}) + ({\color{OrangeRed}1} \times {\color{blue}6}) + ({\color{OrangeRed}1} \times {\color{blue}3}) & ({\color{OrangeRed}1} \times {\color{blue}2}) + ({\color{OrangeRed}1} \times {\color{blue}5}) + ({\color{OrangeRed}1} \times {\color{blue}3}) + ({\color{OrangeRed}1} \times {\color{blue}6}) \\ \end{bmatrix} =\)
\(\frac{1}{4}\:\begin{bmatrix} 5 + 9 + 4 + 2 & 1 + 2 + 6 + 3 & 2 + 5 + 3 + 6 \\ \end{bmatrix} =\)
\(\frac{1}{4}\:\begin{bmatrix} 20 & 12 & 16 \\ \end{bmatrix} = \begin{bmatrix} 5 & 3 & 4 \\ \end{bmatrix}\)
Step 2: Calculate the mean composite \(\overline{\textbf{U}}\) from \(\overline{\underline{x}}\)
\(\overline{\textbf{U}} = \color{OrangeRed}{\overline{\underline{x}}} \; \color{blue}{\underline{a}} = \color{OrangeRed}{\begin{bmatrix} 5 & 3 & 4 \\ \end{bmatrix}} \color{blue}{\begin{bmatrix} 2 \\ 3 \\ 1 \\ \end{bmatrix}} =\)
\(\begin{bmatrix} ({\color{OrangeRed}5} \times {\color{blue}2}) + ({\color{OrangeRed}3} \times {\color{blue}3}) + ({\color{OrangeRed}4} \times {\color{blue}1}) \\ \end{bmatrix} = \begin{bmatrix} 10 + 9 + 4 \\ \end{bmatrix} = 23\)
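Both routes can be checked numerically; here is a minimal numpy sketch using the \(\textbf{X}\) and \(\underline{a}\) from this example:

```python
import numpy as np

X = np.array([[5, 1, 2],
              [9, 2, 5],
              [4, 6, 3],
              [2, 3, 6]])
a = np.array([2, 3, 1])

# Way 1: form the composite scores, then take their mean
u = X @ a
print(u, u.mean())        # [15 29 29 19] 23.0

# Way 2: take the mean of each variable, then form the composite of the means
x_bar = X.mean(axis=0)
print(x_bar, x_bar @ a)   # [5. 3. 4.] 23.0
```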
Variation of a single variable \(X\) (where \(\textbf{E}\) is the \(n \times n\) matrix of ones):
\[SS_X = \underline{x} ' \; \underline{x} - \frac{1}{n} \; \underline{x}' \; \textbf{E} \; \underline{x}\]
Variation of a composite:
\[SS_u = \underline{u} ' \; \underline{u} - \frac{1}{n} \; \underline{u}' \; \textbf{E} \; \underline{u}\]
Substitute in the expression for a composite (\(\underline{u} = \textbf{X} \underline{a}\) or \(\underline{u}' = \underline{a}' \textbf{X}'\)):
\[SS_u = \underline{a} ' \; \textbf{X}' \; \textbf{X} \; \underline{a} - \frac{1}{n} \; \underline{a} ' \; \textbf{X}' \; \textbf{E} \; \textbf{X} \; \underline{a}\]
Factor out terms: pre-multipliers get pre-factored, post-multipliers get post-factored:
\[SS_u = \underline{a}' \left(\textbf{X}' \; \textbf{X} - \frac{1}{n} \; \textbf{X}' \; \textbf{E} \; \textbf{X}\right) \; \underline{a}\]
Remember the variation-covariation matrix \(\textbf{P}\):
\[\textbf{P} = \textbf{X}' \; \textbf{X} - \frac{1}{n} \; \textbf{X}' \; \textbf{E} \; \textbf{X}\]
Substitute \(P\) into the expression for variation of a composite:
\[SS_u =\underline{a}' \; \textbf{P} \; \underline{a}\]
Variation of a composite \(\underline{u}\): \(SS_u =\underline{a}' \; \textbf{P} \; \underline{a}\)
Two important points:
Variance of a composite \(\underline{u}\):
\[s^2_u =\underline{a}' \; \textbf{S} \; \underline{a}\]
where \(\textbf{S}\) is the variance-covariance matrix:
\[\textbf{S} = \frac{1}{n-1} \; \left(\textbf{X}' \; \textbf{X} - \frac{1}{n} \; \textbf{X}' \; \textbf{E} \; \textbf{X}\right) = \frac{1}{n-1} \; \textbf{P}\]
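A short numpy sketch (reusing \(\textbf{X}\) and \(\underline{a}\) from the earlier example) showing that the quadratic forms \(\underline{a}' \textbf{P} \underline{a}\) and \(\underline{a}' \textbf{S} \underline{a}\) match the variation and variance computed from the composite scores themselves:

```python
import numpy as np

X = np.array([[5, 1, 2],
              [9, 2, 5],
              [4, 6, 3],
              [2, 3, 6]], dtype=float)
a = np.array([2, 3, 1], dtype=float)
n = X.shape[0]

# Variation-covariation matrix P and variance-covariance matrix S
E = np.ones((n, n))                       # n x n matrix of ones
P = X.T @ X - (1 / n) * (X.T @ E @ X)
S = P / (n - 1)

# Quadratic forms ...
SS_u = a @ P @ a
var_u = a @ S @ a

# ... agree with working from the composite scores directly
u = X @ a
print(SS_u, np.sum((u - u.mean()) ** 2))  # both equal 152.0
print(var_u, u.var(ddof=1))               # both equal 152/3, about 50.67
```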
Why do we care about the mean and variance of composites?
Statistical procedures create composites and then work with the composites’ means, variances, and covariances
Calculating the variance of a composite directly from the quadratic form \(\underline{a}' \; \textbf{S} \; \underline{a}\) is computationally easier than computing all of the composite scores first
Also, the quadratic form will be helpful later
. | Composite 1 | Composite 2 |
---|---|---|
Variables | \(\textbf{X}\) | \(\textbf{X}\) |
Weights | \(\underline{a}\) | \(\underline{c}\) |
Composite | \(\underline{u} = \textbf{X} \; \underline{a}\) | \(\underline{w} = \textbf{X} \; \underline{c}\) |
Mean of composite | \(\overline{U} = \overline{\underline{X}} \; \underline{a}\) | \(\overline{W} = \overline{\underline{X}} \; \underline{c}\) |
Variation of composite | \(\underline{a}' \; \textbf{P}_{XX} \; \underline{a}\) | \(\underline{c}' \; \textbf{P}_{XX} \; \underline{c}\) |
Variance of composite | \(\underline{a}' \; \textbf{S}_{XX} \; \underline{a}\) | \(\underline{c}' \; \textbf{S}_{XX} \; \underline{c}\) |
Covariation between composites | \(SP_{UW} = \underline{a}' \; \textbf{P}_{XX} \; \underline{c}\) | |
Covariance between composites | \(s_{UW} = \underline{a}' \; \textbf{S}_{XX} \; \underline{c}\) | |
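Here is a small numpy sketch of the last two rows of this table: the covariance between two composites built from the same \(\textbf{X}\) equals \(\underline{a}' \; \textbf{S}_{XX} \; \underline{c}\). The second weight vector \(\underline{c}\) is hypothetical, chosen only for illustration:

```python
import numpy as np

X = np.array([[5, 1, 2],
              [9, 2, 5],
              [4, 6, 3],
              [2, 3, 6]], dtype=float)
a = np.array([2, 3, 1], dtype=float)   # weights for composite u
c = np.array([1, 0, 2], dtype=float)   # hypothetical weights for composite w

S_XX = np.cov(X, rowvar=False)         # variance-covariance matrix of the Xs

u = X @ a
w = X @ c

# a' S_XX c equals the covariance between the two composite score vectors
print(a @ S_XX @ c)
print(np.cov(u, w)[0, 1])              # same value
```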
. | Comp 1 on Xs | Comp 2 on Ys |
---|---|---|
Variables | \(\textbf{X}\) | \(\textbf{Y}\) |
Weights | \(\underline{a}\) | \(\underline{d}\) |
Composite | \(\underline{u} = \textbf{X} \; \underline{a}\) | \(\underline{z} = \textbf{Y} \; \underline{d}\) |
Mean of composite | \(\overline{U} = \overline{\underline{X}} \; \underline{a}\) | \(\overline{Z} = \overline{\underline{Y}} \; \underline{d}\) |
Variation of comp | \(SS_U=\underline{a}' \; \textbf{P}_{XX} \; \underline{a}\) | \(SS_Z = \underline{d}' \; \textbf{P}_{YY} \; \underline{d}\) |
Variance of comp | \({s}^2_U=\underline{a}' \; \textbf{S}_{XX} \; \underline{a}\) | \({s}^2_Z = \underline{d}' \; \textbf{S}_{YY} \; \underline{d}\) |
Covariation between comp | \(SP_{UZ} = \underline{a}' \; \textbf{P}_{XY} \; \underline{d}\) | |
Covariance between comp | \(s_{UZ} = \underline{a}' \; \textbf{S}_{XY} \; \underline{d}\) | |
\[\textbf{M} = \begin{bmatrix} \textbf{X} & \textbf{Y} \end{bmatrix}\]
Order \((n, p+q)\): there are \(p\) X variables and \(q\) Y variables
\[\begin{array}{c|ccc|ccc|} Subjects & & Predictors & & & Outcomes & \\ & X_1 & \dots & X_p & Y_1 & \dots & Y_q \\ \hline 1 & X_{11} & \dots & X_{1p} & Y_{11} & \dots & Y_{1q} \\ \dots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ n & X_{n1} & \dots & X_{np} & Y_{n1} & \dots & Y_{nq} \\ \hline \end{array}\]
\[\textbf{P}_{XX, YY} = \textbf{M}' \; \textbf{M} - \frac{1}{n} \textbf{M}' \; \textbf{E} \; \textbf{M} = \left[\begin{array}{c|c} \textbf{P}_{XX} & \textbf{P}_{XY} \\ \hline \textbf{P}_{YX} & \textbf{P}_{YY} \end{array}\right]\] \[= \left[\begin{array}{c|c} \textbf{X}' \; \textbf{X} - \frac{1}{n} \textbf{X}' \; \textbf{E} \; \textbf{X} & \textbf{X}' \; \textbf{Y} - \frac{1}{n} \textbf{X}' \; \textbf{E} \; \textbf{Y}\\ \hline \textbf{Y}' \; \textbf{X} - \frac{1}{n} \textbf{Y}' \; \textbf{E} \; \textbf{X} & \textbf{Y}' \; \textbf{Y} - \frac{1}{n} \textbf{Y}' \; \textbf{E} \; \textbf{Y} \end{array}\right]\]
\[\textbf{P}_{XX, YY} = \] \[\left[\begin{array}{cccc|cccc} SS_{x1} & SP_{x1,x2} & \dots & SP_{x1,xp} & SP_{x1, y1} & SP_{x1,y2} & \dots & SP_{x1,yq} \\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ SP_{xp,x1} & SP_{xp,x2} & \dots & SS_{xp} & SP_{xp,y1} & SP_{xp,y2} & \dots & SP_{xp,yq} \\ \hline SP_{y1, x1} & SP_{y1,x2} & \dots & SP_{y1,xp} & SS_{y1} & SP_{y1,y2} & \dots & SP_{y1,yq}\\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ SP_{yq,x1} & SP_{yq,x2} & \dots & SP_{yq,xp} & SP_{yq,y1} & SP_{yq,y2} & \dots & SS_{yq} \end{array}\right]\]
\[\textbf{S}_{XX, YY} = \frac{1}{(n-1)}\left(\textbf{M}' \; \textbf{M} - \frac{1}{n} \textbf{M}' \; \textbf{E} \; \textbf{M}\right) = \left[\begin{array}{c|c} \textbf{S}_{XX} & \textbf{S}_{XY} \\ \hline \textbf{S}_{YX} & \textbf{S}_{YY} \end{array}\right]\] \[= \left[\begin{array}{c|c} \frac{1}{(n-1)}(\textbf{X}' \; \textbf{X} - \frac{1}{n} \textbf{X}' \; \textbf{E} \; \textbf{X}) & \frac{1}{(n-1)}(\textbf{X}' \; \textbf{Y} - \frac{1}{n} \textbf{X}' \; \textbf{E} \; \textbf{Y}) \\ \hline \frac{1}{(n-1)}(\textbf{Y}' \; \textbf{X} - \frac{1}{n} \textbf{Y}' \; \textbf{E} \; \textbf{X}) & \frac{1}{(n-1)}(\textbf{Y}' \; \textbf{Y} - \frac{1}{n} \textbf{Y}' \; \textbf{E} \; \textbf{Y}) \end{array}\right]\]
\[\textbf{S}_{XX, YY} =\] \[\left[\begin{array}{cccc|cccc} s^2_{x1} & s_{x1,x2} & \dots & s_{x1,xp} & s_{x1, y1} & s_{x1,y2} & \dots & s_{x1,yq} \\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ s_{xp,x1} & s_{xp,x2} & \dots & s^2_{xp} & s_{xp,y1} & s_{xp,y2} & \dots & s_{xp,yq} \\ \hline s_{y1, x1} & s_{y1,x2} & \dots & s_{y1,xp} & s^2_{y1} & s_{y1,y2} & \dots & s_{y1,yq}\\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ s_{yq,x1} & s_{yq,x2} & \dots & s_{yq,xp} & s_{yq,y1} & s_{yq,y2} & \dots & s^2_{yq} \end{array}\right]\]
\[\textbf{R}_{XX, YY} = \left[\begin{array}{c|c} \textbf{R}_{XX} & \textbf{R}_{XY} \\ \hline \textbf{R}_{YX} & \textbf{R}_{YY} \end{array}\right] =\] \[\left[\begin{array}{cccc|cccc} 1 & r_{x1,x2} & \dots & r_{x1,xp} & r_{x1, y1} & r_{x1,y2} & \dots & r_{x1,yq} \\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ r_{xp,x1} & r_{xp,x2} & \dots & 1 & r_{xp,y1} & r_{xp,y2} & \dots & r_{xp,yq} \\ \hline r_{y1, x1} & r_{y1,x2} & \dots & r_{y1,xp} & 1 & r_{y1,y2} & \dots & r_{y1,yq}\\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ r_{yq,x1} & r_{yq,x2} & \dots & r_{yq,xp} & r_{yq,y1} & r_{yq,y2} & \dots & 1 \end{array}\right]\]
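A brief numpy sketch of stacking the two sets into \(\textbf{M}\) and slicing the variance-covariance matrix into its four blocks; the data here are random placeholders, not from any real study:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data: p = 3 predictors and q = 2 outcomes for n = 20 subjects
n, p, q = 20, 3, 2
X = rng.normal(size=(n, p))
Y = rng.normal(size=(n, q))

# Stack the two sets side by side: M has order (n, p + q)
M = np.hstack([X, Y])

# Full variance-covariance matrix, then its four blocks
S = np.cov(M, rowvar=False)   # (p + q) x (p + q)
S_XX = S[:p, :p]
S_XY = S[:p, p:]
S_YX = S[p:, :p]
S_YY = S[p:, p:]
print(S_XY.shape)             # (3, 2): covariances of each X with each Y
```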
Also called OLS (ordinary least squares) regression, normal regression, or just “regression”
Data: scores on two variables (e.g., height and weight) for each of \(n\) subjects
Problem:
Find an equation that “best” summarizes the relationship between \(X\) and \(Y\)
\[\widehat{\text{weight}} = -253.94 + 5.8 \; \text{height}\]
Each observation has one outcome value (\(Y_i\)), one predicted value (\(\hat{Y}_i\)), and one residual (\(Y_i - \hat{Y}_i\))
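For instance, for a hypothetical person with height 70 and observed weight 160 (in the same units as the equation above), the predicted value and residual would be:
\[\hat{Y} = -253.94 + 5.8 (70) = 152.06 \qquad Y - \hat{Y} = 160 - 152.06 = 7.94\]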
Least squares criterion: choose \(b_0\) and \(b_1\) to minimize the sum of squared residuals, \(\Sigma (Y_i - \hat{Y}_i)^2\)
For linear regression, there is one value of \(b_0\) and one value of \(b_1\) that minimize the sum of squared residuals
Functions built from squared terms (like the sum of squared residuals, viewed as a function of \(b_0\) and \(b_1\)) look like a “U”, so they have a single minimum
Finding that minimum happens using calculus (which you don’t need to know)
But you need to understand what is going on in the process
The tangent line is a line that touches a curve at a single point
The tangent line is horizontal (\(slope= 0\)) at the minimum
We want to find the minimum of the sum of squared residuals
We find the tangent line by using calculus
State the function to be minimized
Differentiate (take the derivative of) the function, with respect to the constants of interest
Set those derivatives equal to 0 (these are the normal equations)
Solve the normal equations for the constants of interest
\[\Sigma(Y_i - \hat{Y}_i)^2 =\]
\[\Sigma(Y - (b_1 X + b_0))^2 =\] \[\Sigma(Y - b_1 X - b_0)^2 =\]
\[\Sigma(Y^2 + {b_0}^2 + {b_1}^2 X^2 - 2 b_0 Y - 2 b_1 X Y + 2 b_0 b_1 X ) =\] \[\Sigma Y^2 + \Sigma{b_0}^2 + \Sigma{b_1}^2 X^2 - \Sigma 2 b_0 Y - \Sigma 2 b_1 X Y + \Sigma 2 b_0 b_1 X =\]
\[\Sigma Y^2 + n{b_0}^2 + {b_1}^2 \Sigma X^2 - 2 b_0 \Sigma Y - 2 b_1 \Sigma X Y + 2 b_0 b_1 \Sigma X\]
For \(b_1\):
\[\frac{\partial \Sigma (Y - \hat{Y})^2}{\partial b_1} = 2 b_1 \Sigma X^2 - 2 \Sigma X Y + 2 b_0 \Sigma X\]
For \(b_0\):
\[\frac{\partial \Sigma (Y - \hat{Y})^2}{\partial b_0} = 2 n b_0 - 2 \Sigma Y + 2 b_1 \Sigma X\]
For \(b_1\):
\[2 b_1 \Sigma X^2 - 2 \Sigma X Y + 2 b_0 \Sigma X = 0\] \[\vdots\] \[b_1 = \frac{n \Sigma X Y - (\Sigma X) (\Sigma Y)}{n \Sigma X^2 - (\Sigma X)^2} = \frac{SP_{XY}}{SS_X} = \frac{s_{XY}}{{s_X}^2}\]
For \(b_0\):
\[2 n b_0 - 2 \Sigma Y + 2 b_1 \Sigma X = 0\] \[\vdots\] \[b_0 = \overline{Y} - b_1 \overline{X}\]
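The same four steps can be carried out symbolically; here is a minimal sympy sketch on a small made-up data set (the numbers are placeholders, chosen only so the algebra is easy to check):

```python
import sympy as sp

b0, b1 = sp.symbols('b0 b1')

# Placeholder data, for illustration only
X = [1, 2, 3, 4]
Y = [2, 3, 5, 6]
n = len(X)

# Step 1: the function to be minimized (sum of squared residuals)
SSE = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))

# Steps 2 and 3: differentiate with respect to each coefficient and set equal to 0
normal_equations = [sp.Eq(sp.diff(SSE, b), 0) for b in (b0, b1)]

# Step 4: solve the normal equations
print(sp.solve(normal_equations, (b0, b1)))   # {b0: 1/2, b1: 7/5}

# Compare with the closed-form results derived above
Sx, Sy = sum(X), sum(Y)
Sxy = sum(x * y for x, y in zip(X, Y))
Sxx = sum(x * x for x in X)
b1_hat = sp.Rational(n * Sxy - Sx * Sy, n * Sxx - Sx ** 2)
b0_hat = sp.Rational(Sy, n) - b1_hat * sp.Rational(Sx, n)
print(b1_hat, b0_hat)                         # 7/5 1/2
```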
The least squares solution gets more complex with more predictors (and thus more regression coefficients to solve for)
Two predictor regression: differentiating with respect to \(b_0\), \(b_1\), and \(b_2\) gives three normal equations that must be solved simultaneously
The multiple correlation is the correlation between \(Y\) and \(\hat{Y}\)
If you used least squares estimation, \(\hat{Y}\) has the maximum possible correlation with \(Y\) of any linear combination of the predictors
The square of the multiple correlation (\({R^2}_{multiple}\)) tells you the proportion of variation in \(Y\) that is accounted for by the set of predictors
\({R^2}_{multiple} = {r^2}_{Y\hat{Y}} = \frac{SS_{regression}}{SS_Y} = \frac{\text{predictable variation}}{\text{total variation}}\)
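A short numpy sketch, on simulated two-predictor data, showing that the squared correlation between \(Y\) and \(\hat{Y}\) matches \(SS_{regression}/SS_Y\):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: two predictors and one outcome (placeholder values)
n = 50
X = rng.normal(size=(n, 2))
Y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=2.0, size=n)

# Least squares fit with an intercept column
Xd = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(Xd, Y, rcond=None)
Y_hat = Xd @ b

# Multiple correlation = correlation between Y and Y-hat
R_mult = np.corrcoef(Y, Y_hat)[0, 1]

# R^2 also equals predictable variation over total variation
SS_Y = np.sum((Y - Y.mean()) ** 2)
SS_reg = np.sum((Y_hat - Y.mean()) ** 2)
print(R_mult ** 2, SS_reg / SS_Y)   # the two values agree
```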
Next week:
The predicted score in multiple regression is a composite or linear combination
From this scalar version of regression to the matrix version