Matrices have a lot of parts – how can we understand them?
Much of this is in service of assessing:
Ordinary least squares (OLS) estimation minimizes the sum of squared residuals to get regression coefficients
Other multivariate techniques use maximizing functions to find solutions
So why don’t we maximize the multiple correlation instead?
Cut out the “middle man” that is minimizing the sum of squared residuals
Two related reasons
Two unknowns, two equations = solvable for all unknowns:
Two unknowns, one equation = NOT solvable for all unknowns:
The OLS regression weights are unique
But it doesn’t work in the opposite direction
Could we get a unique solution by maximizing \(R^2_{multiple}\)?
Approach: we want to simultaneously maximize a function and constrain the solution
To constrain means that we set some part of the model to a value instead of estimating it
Regression
We will use this general approach for other methods, such as factor analysis
Vectors have an algebraic interpretation
Vectors also have a geometric interpretation
PCA, factor analysis: large number of variables reduced to a smaller number of dimensions
Regression diagnostics: distance between points in space
Every vector has a direction and a length
\(X\), \(Y\), and \(Z\) axes represent a 3 dimensional space
The three axes can be written as vectors:
Suppose a test measures these three uncorrelated abilities, one per axis
This composite is represented by a vector \(\underline{a}' = (1, 2, 2)\)
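As a small sketch (the individual's scores below are hypothetical), the composite score for one person is just the dot product \(\underline{a}'\underline{x}\) of the weight vector with that person's three ability scores:

```python
import numpy as np

a = np.array([1, 2, 2])          # composite weights from the example above
scores = np.array([10, 7, 12])   # hypothetical scores on the three uncorrelated abilities
composite = a @ scores           # a'x = 1*10 + 2*7 + 2*12
print(composite)                 # 48
```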
Data (i.e., vectors and matrices) can be represented geometrically
We have to define our geometric space
Anything in the space is a function of the axes
Linear independence: no vector can be written as a multiple or weighted sum of the others
Linear dependence (also called “collinearity”): at least one vector can be
\(X\) and \(Y\) are not orthogonal, but they are linearly independent
\(Z\) is not linearly independent of \(X\) and \(Y\): it is their element-wise sum (4 + 1 = 5 and 3 - 2 = 1)
Using all 3 axes would result in linear dependence
2 dimensions in a flat plane, so the basis can be any 2 linearly independent vectors
3 dimensions in a space, so the basis can be any 3 linearly independent vectors
Same for more dimensions, only it’s harder to imagine it
Use this in PCA and factor analysis to reduce many measures to fewer linearly independent vectors (factors)
\(\underline{z} = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}\)
The length of \(\underline{z}\) is:
\(length(\underline{z}) = ||\underline{z}|| = \sqrt{z^2_1 + z^2_2} = (\underline{z}'\underline{z})^{1/2}\)
Last expression generalizes to more than 2 dimensions
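A quick numeric check of this formula, using numpy with an arbitrary example vector:

```python
import numpy as np

z = np.array([4, 3])          # example 2-dimensional vector
length = np.sqrt(z @ z)       # (z'z)^(1/2)
print(length)                 # 5.0
print(np.linalg.norm(z))      # same result via numpy's built-in norm
```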
Standardizing variables changes the mean and SD: a variable with Mean = 5, SD = 2 (Variance = 4) becomes Mean = 0, SD = 1 (Variance = 1)
But standardization doesn’t change the shape of the distribution or the correlations among variables
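A minimal sketch of this in numpy (the data here are simulated, not from the course example):

```python
import numpy as np

rng = np.random.default_rng(seed=1)
x = rng.normal(loc=5, scale=2, size=1000)      # variable with mean ~5, SD ~2
z = (x - x.mean()) / x.std()                   # standardized: mean 0, SD 1

print(round(x.mean(), 2), round(x.std(), 2))   # ~5.0  ~2.0
print(round(z.mean(), 2), round(z.std(), 2))   # ~0.0  1.0

y = 0.5 * x + rng.normal(size=1000)            # a second, correlated variable
print(round(np.corrcoef(x, y)[0, 1], 3))       # correlation before standardizing x
print(round(np.corrcoef(z, y)[0, 1], 3))       # identical value after standardizing x
```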
To standardize, divide each element by the length of the vector:
The length of the vector \(\begin{bmatrix} \frac{4}{5} \\ \frac{3}{5} \end{bmatrix}\) is 1
Before normalizing, the two vectors have lengths 5 and 2.236; after dividing each by its length, both have length 1
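The same step in numpy, using the vector with length 5 from above, \((4, 3)'\):

```python
import numpy as np

v = np.array([4, 3])
u = v / np.linalg.norm(v)     # divide each element by the vector's length
print(u)                      # [0.8 0.6]
print(np.linalg.norm(u))      # 1.0
```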
The angle between vectors reflects their correlation
Angle > 90: \(r\) \(\rightarrow\) -1
Angle = 90: \(r\) = 0
Angle < 90: \(r\) \(\rightarrow\) +1
\(\textbf{A} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ \end{bmatrix} = \begin{bmatrix} 5 & 2 \\ 2 & 3 \\ \end{bmatrix}\)
Vector 1: \(\underline{a}_1 = \begin{bmatrix} 5 \\ 2 \end{bmatrix}\)
Vector 2: \(\underline{a}_2 = \begin{bmatrix} 2 \\ 3 \\ \end{bmatrix}\)
\(r\) approaches -1
\(r\) = 0
\(r\) approaches +1
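As a concrete check (a small sketch computing the cosine of the angle between \(\underline{a}_1\) and \(\underline{a}_2\) above; for mean-centered variable vectors this cosine equals \(r\)):

```python
import numpy as np

a1 = np.array([5, 2])
a2 = np.array([2, 3])

cos_angle = (a1 @ a2) / (np.linalg.norm(a1) * np.linalg.norm(a2))
angle_deg = np.degrees(np.arccos(cos_angle))
print(round(cos_angle, 3), round(angle_deg, 1))   # ~0.824  ~34.5
```

The angle is well under 90 degrees, consistent with a positive association between the two vectors.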
\[\textbf{Q} = \begin{matrix}\textbf{X} \\ (n,p) \end{matrix} \; \begin{matrix}\textbf{X}' \\ (p,n) \end{matrix}\]
The determinant uses all elements in the matrix to provide a summary of the relationships in the matrix
For a \(2 \times 2\) matrix, the determinant is straightforward:
\[\textbf{A} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}\] \[det(\textbf{A}) = \vert \textbf{A} \vert = a_{11}a_{22} - a_{12}a_{21}\]
With a \(2 \times 2\) matrix, it’s easy to see how the determinant relates to correlations between variables
Highly correlated variables
\(\textbf{R}_{XX} = \begin{bmatrix} 1 & 0.99 \\ 0.99 & 1 \end{bmatrix}\)
\(|\textbf{R}_{XX}| = 1 \times 1 - 0.99 \times 0.99 = 1 - 0.9801 = 0.0199\)
Moderately correlated variables
\(\textbf{R}_{XX} = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}\)
\(|\textbf{R}_{XX}| = 1 \times 1 - 0.5 \times 0.5 = 1 - 0.25 = 0.75\)
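A quick check of both determinants with numpy:

```python
import numpy as np

R_high = np.array([[1.0, 0.99],
                   [0.99, 1.0]])   # highly correlated variables
R_mod = np.array([[1.0, 0.5],
                  [0.5, 1.0]])     # moderately correlated variables

print(np.linalg.det(R_high))       # ~0.0199: near-collinear, little unique information
print(np.linalg.det(R_mod))        # 0.75
```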
What does this have to do with regression?
If there is linear dependence in \(\textbf{X}\), then \(\textbf{X}'\textbf{X}\) has a determinant of 0 and cannot be inverted, so the OLS weights cannot be computed
If there is linear dependency (or just highly correlated variables) in your regression, you will get an error message
The message varies depending on the program and the procedure
Rank of a matrix is related to the determinant
Maximum rank of a matrix = lesser of # of rows and # of columns
Linear dependence means there is less information in the matrix than there appears
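A sketch of how linear dependence shows up in rank and determinant; the matrix below is made up for illustration, with its third column equal to the sum of the first two:

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 1.0, 5.0],
              [2.0, 2.0, 4.0],
              [0.0, 3.0, 3.0]])        # column 3 = column 1 + column 2

print(np.linalg.matrix_rank(X))        # 2, not 3: less information than there appears
print(np.linalg.det(X.T @ X))          # ~0: X'X is singular and cannot be inverted
```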

| | Eigenvector | Eigenvalue |
|---|---|---|
| they are | a vector | a scalar |
| they are | reference axes | the amount of variance associated with that reference axis |
| also called | characteristic vector | characteristic root |
| also called | latent vector | latent root |
We want to maximize functions while also building in constraints
Expand the normal equations from least squares estimation
Solving the resulting homogeneous equations gives the eigenvectors and eigenvalues
Normed to unity: each eigenvector is scaled so that \(\underline{v}'\underline{v} = 1\)
Normed to the root: each eigenvector is scaled so that \(\underline{v}'\underline{v} = \lambda\), its eigenvalue (root)
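For reference, the defining equations: an eigenvector \(\underline{v}\) of a square matrix \(\textbf{A}\) and its eigenvalue \(\lambda\) satisfy

\[\textbf{A}\underline{v} = \lambda \underline{v} \quad \Longleftrightarrow \quad (\textbf{A} - \lambda\textbf{I})\,\underline{v} = \underline{0}\]

and this homogeneous system has a nontrivial solution only when \(\vert \textbf{A} - \lambda\textbf{I} \vert = 0\)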
Eigenvalues and eigenvectors divide up the variance in a matrix
If the variables in the matrix are more highly correlated, more of the total variance is concentrated in the first (largest) eigenvalue(s)
Example: the data matrix \(\textbf{A}\) below contains two variables, x1 and x2
x1 | x2 |
---|---|
7.555 | 23.265 |
2.406 | 16.416 |
1.756 | -8.710 |
2.262 | -11.823 |
3.502 | 12.694 |
7.592 | 28.863 |
8.825 | -0.250 |
2.757 | 9.012 |
2.434 | 7.180 |
5.606 | 34.126 |
Covariance matrix of \(\textbf{A}\)

| | x1 | x2 |
|---|---|---|
| x1 | 7.116 | 19.240 |
| x2 | 19.240 | 232.329 |
Correlation matrix of \(\textbf{A}\)

| | x1 | x2 |
|---|---|---|
| x1 | 1.000 | 0.473 |
| x2 | 0.473 | 1.000 |
Determinant of cov(\(\textbf{A}\)) \(\approx 1283.1\) (the product of its eigenvalues: \(233.960 \times 5.484\))
Determinant of cor(\(\textbf{A}\)) \(\approx 0.776\) (\(= 1 - 0.473^2\))
Rank of \(\textbf{A} = 2\)
Eigenvalues of cov(\(\textbf{A}\))

| | eigenvalue |
|---|---|
| 1 | 233.960 |
| 2 | 5.484 |

Eigenvectors of cov(\(\textbf{A}\))

| v1 | v2 |
|---|---|
| 0.085 | -0.996 |
| 0.996 | 0.085 |

Eigenvalues of cor(\(\textbf{A}\))

| | eigenvalue |
|---|---|
| 1 | 1.473 |
| 2 | 0.527 |

Eigenvectors of cor(\(\textbf{A}\))

| v1 | v2 |
|---|---|
| 0.707 | -0.707 |
| 0.707 | 0.707 |
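A sketch of how the values above could be reproduced in numpy, entering the ten rows of x1 and x2 as an array `A`; the results should match the tables up to rounding and the arbitrary sign of each eigenvector:

```python
import numpy as np

A = np.array([[7.555, 23.265], [2.406, 16.416], [1.756, -8.710],
              [2.262, -11.823], [3.502, 12.694], [7.592, 28.863],
              [8.825, -0.250], [2.757, 9.012], [2.434, 7.180],
              [5.606, 34.126]])

S = np.cov(A, rowvar=False)                 # covariance matrix of A
R = np.corrcoef(A, rowvar=False)            # correlation matrix of A

print(np.linalg.det(S), np.linalg.det(R))   # ~1283 and ~0.776
print(np.linalg.matrix_rank(A))             # 2

vals_S, vecs_S = np.linalg.eigh(S)          # eigenvalues/eigenvectors of cov(A)
vals_R, vecs_R = np.linalg.eigh(R)          # eigenvalues/eigenvectors of cor(A)
print(vals_S[::-1])                         # ~233.960, 5.484 (largest first)
print(vals_R[::-1])                         # ~1.473, 0.527
```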
Eigenvalues and eigenvectors are central to principal components analysis (PCA) and factor analysis (FA)
PCA and FA seek to reduce the dimension of a set of variables by finding a smaller set of axes that can represent all the variables