Multivariate: Mixed models

1 Goals

1.1 Goals

1.1.1 Goals of this section

  • Multiple measures of the same thing or related things as an outcome
    • Possibly over time
  • Want the variables separate: Not PCA / FA
  • In this section:
    • Last time: MANOVA and repeated measures ANOVA
    • Mixed models (this week)

1.1.2 Goals of this lecture

  • Mixed models as an approach to repeated measures
    • Focus on individual change
    • Fewer problems with missing data
    • Continuous and unevenly spaced time
    • Flexible with predictors (continuous and categorical)
  • Way more complex & interesting than we have time to talk about!
    • Take another class: Longitudinal, Multilevel, Categorical, SEM

2 Linear mixed model

2.1 Linear mixed model

2.1.1 Linear mixed model

  • Also known as random coefficient models, multilevel models, nested models, hierarchical linear models, random effects models

  • Developed in different disciplines

    • Random coefficient models from statistics and biostatistics
    • Multilevel models from education

2.1.2 Linear mixed model

  • Model for non-independent observations
    • Cross-sectional
      • Multiple schoolchildren with the same teacher
      • Employees who work in teams or workgroups
    • Longitudinal
      • Multiple observations from the same individual over time
  • Observations from same class/team/person are more similar to one another than observations from different classes/teams/persons

2.1.3 How not to do it

https://xkcd.com/2533

2.1.4 Non-independence

  • Non-independence means that there is some redundancy (or correlation) between observations
    • Effective sample size is smaller than the actual sample size
      • Collect 100 observations but we only have (for example) 72 obs’ worth of information, due to correlations between obs
  • Smaller effective sample size means standard error is underestimated if you ignore non-independence
    • How much the standard errors are underestimated depends on how much the observations are related to one another

2.1.5 Linear mixed model: Motivation

  • Linear mixed model (LMM): Extension of general linear model (GLM)
    • Partitions variation, just like ANOVA and regression
    • But more ways to partition and more control over the form

2.1.6 Linear mixed model: Motivation

  • How are observations related to one another?
    • Linear regression: They’re not (Independence)
    • Between-subjects ANOVA: They’re not (Independence)
    • Repeated-measures ANOVA: According to “compound symmetry”
    • LMM has two ways to do this
      • Random effects (this class)
      • Correlated residuals (not this class)

2.1.7 Random effects

  • Relationship between time and outcome
    • X axis = time
    • Y axis = outcome
  • Random effects: Relationship can be different for each person
    • Differences in intercept = random intercepts
    • Differences in change over time = random slopes
    • Can have one or other or both

2.1.8 Data: Executive functioning dataset

id sex tx wave  dlpfc    ef    age  age12
 1   1  0    1 -0.184 2.167 12.027  0.027
 1   1  0    2  1.129 1.806 13.058  1.058
 1   1  0    3 -0.840 1.444 14.074  2.074
 1   1  0    4  0.472 2.889 15.112  3.112
 2   1  0    1  0.801 0.722 12.089  0.089
 2   1  0    2  1.129 1.444 13.124  1.124
 2   1  0    3  0.801 1.806 13.997  1.997
 2   1  0    4  1.457 2.528 15.021  3.021
 3   1  1    1  0.472 3.250 11.953 -0.047
 3   1  1    2  1.129 3.250 13.048  1.048
 3   1  1    3  0.144 2.528 13.820  1.820
 3   1  1    4  0.144 2.528 15.058  3.058
 4   0  0    1  0.472 3.611 12.076  0.076
 4   0  0    2  0.472 4.333 12.845  0.845
 4   0  0    3  0.472 3.972 13.818  1.818
 4   0  0    4  0.472 3.972 14.931  2.931

2.1.9 Individual trajectories for first 4 people

2.1.10 All individual trajectories: Spaghetti plot

2.1.11 Assumptions

  • Linear regression assumes independence of observations
    • Definitely not true here
    • What should we do?
  • We can still use the regression lines for each person
    • Non-independence is only a problem for estimating standard errors
    • Can use the estimates of individual intercepts and slopes

2.1.12 So I just report 100 intercepts and slopes?

  • What do we do with all those regression lines?
    • Fixed effects: Average of intercepts and slopes
    • Random effects: Variance of intercepts and slopes

Important

  • You have control over the model: Everyone can have a different slope but they don’t have to
    • Both random intercepts and slopes: People can have different change over time
    • Only random intercepts: Everyone has the same change over time
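
A minimal lme4 sketch of this choice, using the ef_uni data and model formula from the output below (REML = FALSE matches the maximum likelihood fit shown there):

library(lme4)

# Random intercepts AND slopes: each person gets their own intercept and age12 slope
m_both <- lmer(dlpfc ~ 1 + age12 + tx + age12 * tx + (1 + age12 | id),
               data = ef_uni, REML = FALSE)

# Random intercepts only: people differ in level but share one slope for age12
m_int <- lmer(dlpfc ~ 1 + age12 + tx + age12 * tx + (1 | id),
              data = ef_uni, REML = FALSE)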

2.1.13 Mixed model: Output in R

Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
  method [lmerModLmerTest]
Formula: dlpfc ~ 1 + age12 + tx + age12 * tx + (1 + age12 | id)
   Data: ef_uni

     AIC      BIC   logLik deviance df.resid 
  3502.5   3543.9  -1743.3   3486.5     1296 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.89823 -0.50229 -0.02245  0.51797  2.80891 

Random effects:
 Groups   Name        Variance Std.Dev. Corr 
 id       (Intercept) 0.67110  0.8192        
          age12       0.05248  0.2291   -0.37
 Residual             0.48857  0.6990        
Number of obs: 1304, groups:  id, 342

Fixed effects:
             Estimate Std. Error        df t value Pr(>|t|)    
(Intercept)   0.58230    0.07792 341.09983   7.473 6.63e-13 ***
age12         0.11158    0.03059 335.26580   3.648 0.000307 ***
tx           -0.06801    0.10933 342.28563  -0.622 0.534346    
age12:tx      0.01185    0.04298 338.26053   0.276 0.782972    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
         (Intr) age12  tx    
age12    -0.550              
tx       -0.713  0.392       
age12:tx  0.391 -0.712 -0.552

2.1.14 Mixed model: Output in SPSS

2.1.15 Fixed effects: Means or averages

term        estimate std.error statistic      df p.value
(Intercept)    0.582     0.078     7.473 341.100   0.000
age12          0.112     0.031     3.648 335.266   0.000
tx            -0.068     0.109    -0.622 342.286   0.534
age12:tx       0.012     0.043     0.276 338.261   0.783
  • R: With the lme4 package alone, \(t\) statistics are reported without df or \(p\)-values
    • The lmerTest package adds Satterthwaite df and \(p\)-values (sketch below)
  • SPSS: \(t\)-tests
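
One way to produce a table like the one above (the exact packages used for the slide are an assumption; broom.mixed's tidy() gives this layout, and lmerTest supplies the Satterthwaite df and \(p\)-values):

library(lmerTest)     # its lmer() adds Satterthwaite df and p-values
library(broom.mixed)  # tidy() turns a fit into a data frame

m <- lmer(dlpfc ~ 1 + age12 + tx + age12 * tx + (1 + age12 | id),
          data = ef_uni, REML = FALSE)
tidy(m, effects = "fixed")  # term, estimate, std.error, statistic, df, p.value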

2.1.16 Fixed effects: Interpretation

  • \(Y = 0.582 + 0.112 (age12) - 0.068 (tx) + 0.012 (age12 \times tx)\)

    • tx = 0 (in \(\color{blue}{blue}\)): \(Y = 0.582 + 0.112 (age12)\)
      • Intercept: Expected dlpfc when age12 = 0, for group tx = 0
      • Slope: Change in dlpfc for 1 unit change in age12, for group tx = 0
    • tx = 1 (in \(\color{red}{red}\)): \(Y = 0.514 + 0.123 (age12)\)
      • Intercept: Expected dlpfc when age12 = 0, for group tx = 1
      • Slope: Change in dlpfc for 1 unit change in age12, for group tx = 1
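
The tx = 1 line is just the reference-group coefficients plus the tx terms; a quick check with the unrounded estimates from the R output:

# tx = 1 intercept and slope, from the unrounded fixed effects
0.58230 + (-0.06801)  # intercept: 0.514
0.11158 + 0.01185     # slope:     0.123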

2.1.17 Fixed effects: Figure

2.1.18 Random effects: Variances

term                   estimate
var__(Intercept)          0.671
cov__(Intercept).age12   -0.070
var__age12                0.052
var__Observation          0.489
  • R: No test statistics
  • SPSS: \(z\)-tests (\(p\)-value/2 for variances)
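
This table matches what broom.mixed reports for the random-effect parameters on the variance scale (assuming that is how it was made); dropping the scales argument gives SDs and correlations instead:

library(broom.mixed)

tidy(m, effects = "ran_pars", scales = "vcov")  # variances and covariances
tidy(m, effects = "ran_pars")                   # same, as SDs and correlations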

2.1.19 Random effects: Interpretation

  • Intercept variance: Variance of individual intercepts
  • Slope variance: Variance of individual slopes
  • Correlation between intercept and slope: Correlation between individual intercepts and slopes
    • How is a person’s intercept related to their slope?
  • Residual: Error
    • How well we do at predicting the individual trajectory
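
To see the individual intercepts and slopes that these variances summarize (m is the fitted model from above):

ranef(m)$id  # each person's deviation from the average intercept and age12 slope
coef(m)$id   # each person's own intercept and slope (fixed effect + deviation)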

2.1.20 Prediction interval: Fixed + random

  • Average effects with individual variation
    • What do typical individual effects look like?
    • Assuming normally distributed random effects: \(estimate \pm 1.96\times SD\)
  • Prediction intervals
    • Interval for likely values of individual intercepts and slopes
  • Not confidence intervals
    • Those reflect the sampling distribution of an estimate
    • You get those for fixed effects

2.1.21 Prediction interval: Fixed + random

  • Average intercept (for tx = 0) = \(0.582\)
    • \(1.96 \times SD = 1.96 \times 0.819 = 1.606\)
    • 95% of individual intercepts are in [\(-1.023\), \(2.188\)]
  • Average slope (for tx = 0) = \(0.112\)
    • \(1.96 \times SD = 1.96 \times 0.229 = 0.449\)
    • 95% of individual slopes are in [\(-0.337\), \(0.561\)]
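
The same arithmetic in R, using the fixed-effect estimates and random-effect SDs from the model output:

# 95% ranges of individual intercepts and slopes for tx = 0 (assumes normality)
0.58230 + c(-1, 1) * 1.96 * 0.8192  # intercepts: [-1.023, 2.188]
0.11158 + c(-1, 1) * 1.96 * 0.2291  # slopes:     [-0.337, 0.561]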

2.1.22 Means + individual variability

3 LMM: Some more details

3.1 Equations

3.1.1 Linear mixed model: Equations

\(Y = \underbrace{\textbf{X} \beta}_\text{fixed effects} + \underbrace{\textbf{Z} \gamma}_\text{random effects} + \underbrace{\epsilon}_\text{residual}\)

  • Fixed effects: Average effects of predictors
    • Like regression coefficients
  • Random effects: Individual variation around those averages
    • Variances and covariances
  • Residual: Error in predicting individual trajectories
    • Also a variance (but usually not interpreted)

3.1.2 Linear mixed model: Equations

  • Random \(\color{blue}{intercept}\) and \(\color{red}{slope}\)

    • \(Y_{ij} = \underbrace{\beta_{00} + \beta_{10} (age12_{ij}) + \beta_{01} (tx_i) + \beta_{11} (age12_{ij})(tx_i)}_\text{fixed effects} + \underbrace{\color{blue}{r_{0i}} + {\color{red}{r_{1i}}} (age12_{ij})}_\text{random effects} + \underbrace{e_{ij}}_\text{residual}\)
  • Random effects are normally distributed with mean 0 and variance-covariance matrix \(\textbf{G}\)
    • \(\gamma \sim N(0, \textbf{G})\)
    • \(\textbf{G} = \begin{bmatrix} {\sigma_{r_{0i}}}^2 & \sigma_{r_{0i}r_{1i}} \\ \sigma_{r_{0i}r_{1i}} & {\sigma_{r_{1i}}}^2 \\ \end{bmatrix}\)

3.1.3 Linear mixed model: Equations

  • Random \(\color{blue}{intercept}\) only

    • \(Y_{ij} = \underbrace{\beta_{00} + \beta_{10} (age12_{ij}) + \beta_{01} (tx_i) + \beta_{11} (age12_{ij})(tx_i)}_\text{fixed effects} + \underbrace{\color{blue}{r_{0i}}}_\text{random effects} + \underbrace{e_{ij}}_\text{residual}\)
  • Random effects are normally distributed with mean 0 and variance-covariance matrix \(\textbf{G}\)
    • \(\gamma \sim N(0, \textbf{G})\)
    • No random slopes, so \(\textbf{G} = \begin{bmatrix} {\sigma_{r_{0i}}}^2 \end{bmatrix}\)

3.1.4 Example data: Equations

  • Random \(intercept\) and \(slope\)

    • \(Y_{ij} = \underbrace{0.582 + 0.112 (age12_{ij}) - 0.068 (tx_i) + 0.012 (age12_{ij})(tx_i)}_\text{fixed effects} + \underbrace{r_{0i} + r_{1i} (age12_{ij})}_\text{random effects} + \underbrace{e_{ij}}_\text{residual}\)
  • Random effects are normally distributed with mean 0 and variance-covariance matrix \(\textbf{G}\)
    • \(\textbf{G} = \begin{bmatrix} {\sigma_{r_{0i}}}^2 & \sigma_{r_{0i}r_{1i}} \\ \sigma_{r_{0i}r_{1i}} & {\sigma_{r_{1i}}}^2 \\ \end{bmatrix} = \begin{bmatrix} 0.671 & -0.07 \\ -0.07 & 0.052 \\ \end{bmatrix}\)
    • Can convert variances and covariances into SDs and correlations for interpretation
      • e.g., \(\sqrt{0.671} = 0.819\)
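
For example, converting the unrounded variances and covariance from the R output back to the SDs and correlation it prints:

sqrt(0.67110)                             # intercept SD: 0.819
sqrt(0.05248)                             # slope SD:     0.229
-0.070 / (sqrt(0.67110) * sqrt(0.05248))  # intercept-slope correlation: -0.37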

3.2 Predictors

3.2.1 Adding predictors to the model

  • The example model has two predictors
    • Treatment (tx): Categorical (dummy code)
      • Time-invariant predictor: Same value at all times
    • Age (age12): Continuous
      • Time-varying predictor: Different value at each time
  • The two types of predictors enter the model in different ways
    • Peugh, J. L. (2010). A practical guide to multilevel modeling. Journal of School Psychology, 48(1), 85-112.

3.2.2 Data: Executive functioning dataset

id sex tx wave  dlpfc    ef    age  age12
 1   1  0    1 -0.184 2.167 12.027  0.027
 1   1  0    2  1.129 1.806 13.058  1.058
 1   1  0    3 -0.840 1.444 14.074  2.074
 1   1  0    4  0.472 2.889 15.112  3.112
 2   1  0    1  0.801 0.722 12.089  0.089
 2   1  0    2  1.129 1.444 13.124  1.124
 2   1  0    3  0.801 1.806 13.997  1.997
 2   1  0    4  1.457 2.528 15.021  3.021
 3   1  1    1  0.472 3.250 11.953 -0.047
 3   1  1    2  1.129 3.250 13.048  1.048
 3   1  1    3  0.144 2.528 13.820  1.820
 3   1  1    4  0.144 2.528 15.058  3.058
 4   0  0    1  0.472 3.611 12.076  0.076
 4   0  0    2  0.472 4.333 12.845  0.845
 4   0  0    3  0.472 3.972 13.818  1.818
 4   0  0    4  0.472 3.972 14.931  2.931

3.2.3 LMM = Multilevel model

  • LMM is also a multilevel model where
    • Level 1: Occasions
    • Level 2: Person
    • Multiple occasions are nested within each person (L1 w/in L2)
  • LMM can be re-written in terms of L1 and L2
    • Time-varying predictors go in L1 (occasions) part
    • Time-invariant predictors go in L2 (person) part

3.2.4 Multilevel models: Adding predictors

  • Level 1: Trajectories
    • \(Y_{ij} = \pi_{0i} + \pi_{1i} (age12_{ij}) + e_{ij}\)
  • Level 2: People
    • \(\pi_{0i} = \beta_{00} + \beta_{01} (tx_i) + r_{0i}\)
    • \(\pi_{1i} = \beta_{10} + \beta_{11} (tx_i) + r_{1i}\)
  • Combined: Put them together
    • \(Y_{ij} = \beta_{00} + \beta_{10} (age12_{ij}) + \beta_{01} (tx_i) + \beta_{11} (age12_{ij})(tx_i) + \\r_{0i} + r_{1i} (age12_{ij}) + e_{ij}\)
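
In lmer syntax, the combined equation is specified in one line: the time-varying L1 predictor (age12) enters directly, and the time-invariant L2 predictor (tx) predicts the intercept via its main effect and the slope via the interaction:

# beta_00: intercept; beta_10: age12; beta_01: tx; beta_11: age12:tx
# r_0i and r_1i: (1 + age12 | id); e_ij: residual
m <- lmer(dlpfc ~ 1 + age12 + tx + age12 * tx + (1 + age12 | id),
          data = ef_uni, REML = FALSE)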

3.3 Centering

3.3.1 Why center predictors?

  1. Interactions: Reduce collinearity
  2. Interactions and more generally: Improve interpretability
    • Intercept: Predicted outcome when \(X\) = 0
    • What if \(X\) = 0 doesn’t exist or is somewhere useless?
  3. Mixed models: Separate time-specific (L1) from person-specific (L2) relationships

3.3.2 Why does this matter so much for mixed models?

  • Level 1 (occasion) observations have two kinds of information
    • Occasion (L1)
    • Person (L2)
  • If you ask me one day if I’m depressed, that gives you information about
    • How depressed I am that day (occasion, L1)
    • How depressed I generally am (person, L2)

3.3.3 Centering is more complicated

  • Grand mean centering (GMC)
    • Center all observations at the grand mean of all observations
    • Doesn’t change the relationships among variables
  • Centering within cluster (CWC)
    • Center each person’s observations at the mean of that person
    • Does change the relationships among variables
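
A minimal dplyr sketch of both schemes, using the time-varying ef variable from the dataset (the names ef_gmc, ef_pm, and ef_cwc are made up for illustration):

library(dplyr)

ef_uni <- ef_uni %>%
  mutate(ef_gmc = ef - mean(ef, na.rm = TRUE)) %>%  # GMC: one grand mean for all
  group_by(id) %>%
  mutate(ef_pm  = mean(ef, na.rm = TRUE),           # person mean (often added as an L2 predictor)
         ef_cwc = ef - ef_pm) %>%                   # CWC: deviation from own mean
  ungroup()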

3.3.4 Figure: Uncentered

3.3.5 Figure: GMC

3.3.6 Figure: CWC

3.3.7 GMC vs CWC

  • Centering changes the context for the different clusters (L2: People)
    • GMC maintains mean differences between people on the L1 predictor
      • What is a person like compared to other people?
    • CWC eliminates differences between people on the L1 predictor
      • What are people like compared to their own mean?
  • Different contexts mean different interpretations for both level 1 and level 2 predictors

3.3.8 Centering predictors: Some references

  • Yaremych, H. E., Preacher, K. J., & Hedeker, D. (2021). Centering categorical predictors in multilevel models: Best practices and interpretation. Psychological Methods.

  • Rights, J. D., Preacher, K. J., & Cole, D. A. (2020). The danger of conflating level-specific effects of control variables when primary interest lies in level-2 effects. British Journal of Mathematical and Statistical Psychology, 73, 194-211.

  • Hamaker, E. L., & Muthén, B. (2020). The fixed versus random effects debate and how it relates to centering in multilevel modeling. Psychological Methods, 25(3), 365.

  • Hoffman, L. (2019). On the interpretation of parameters in multivariate multilevel models across different combinations of model specification and estimation. Advances in Methods and Practices in Psychological Science, 2(3), 288-311.

  • West, S. G., Ryu, E., Kwok, O. M., & Cham, H. (2011). Multilevel modeling: Current and future applications in personality research. Journal of Personality, 79(1), 2-50.

  • Enders, C. K., & Tofighi, D. (2007). Centering predictor variables in cross-sectional multilevel models: A new look at an old issue. Psychological Methods, 12(2), 121.

3.3.9 Centering time

  • “Time” is a special predictor
    • Center time variable so that 0 is at a meaningful point
    • Intercept: Expected value of outcome when \(X\) = 0
      • Age = 0?
      • Baseline?
      • Age centered at a specific age (e.g., age 12, as in the sketch below)
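
In the example data, time was centered so that 0 means age 12 (compare the age and age12 columns in the dataset above):

# age12 = 0 at age 12, so the intercept is the expected outcome at age 12
ef_uni$age12 <- ef_uni$age - 12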

3.4 Shape of change

3.4.1 Is it linear?

https://xkcd.com/605/

3.4.2 If not linear, then what?

https://xkcd.com/2048/

3.4.3 Shape of change: L1 equation

  • Linear change
    • \(Y_{ij} = \pi_{0i} + \pi_{1i} (age12_{ij}) + e_{ij}\)
  • Quadratic change
    • \(Y_{ij} = \pi_{0i} + \pi_{1i} (age12_{ij}) + \pi_{2i} (age12_{ij})^2 + e_{ij}\)
  • Logarithmic change
    • \(Y_{ij} = \pi_{0i} + \pi_{1i} (ln(age12_{ij})) + e_{ij}\)
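
Sketches of these shapes with lmer, keeping the random effects to an intercept and linear slope for simplicity; shifting age12 by 1 before taking the log is an ad hoc choice here, since age12 includes values at or below 0:

# Quadratic change: add a squared time term
m_quad <- lmer(dlpfc ~ age12 + I(age12^2) + (1 + age12 | id),
               data = ef_uni, REML = FALSE)

# Logarithmic change: transform the time variable
m_log <- lmer(dlpfc ~ log(age12 + 1) + (1 + age12 | id),
              data = ef_uni, REML = FALSE)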

3.4.4 Non-linear change

  • Many phases of development or change are non-linear
    • Increase followed by plateau / maintenance
    • Decrease to a set point
    • Sometimes reflect floor or ceiling effects
  • Non-linear change is inherently more complex
    • Straight lines are easy

3.5 Other stuff

3.5.1 Intraclass correlation (ICC)

  • Quantifies non-independence in repeated outcome
  • Use “random effects ANOVA” or “unconditional mixed model”
    • Like the “no predictors” model from logistic regression
  • Ratio of L1 and L2 variability:
    • \(ICC = \frac{\sigma_{r_{0i}}^2}{\sigma_{r_{0i}}^2 + \sigma_{e}^2}\)
    • Proportion of variance due to differences between people
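
A sketch of the unconditional model and the ICC in R:

# Unconditional (intercept-only) mixed model: no predictors, random intercept only
m0 <- lmer(dlpfc ~ 1 + (1 | id), data = ef_uni, REML = FALSE)

vc <- as.data.frame(VarCorr(m0))  # rows: id intercept variance, residual variance
vc$vcov[1] / sum(vc$vcov)         # ICC: between-person / (between + within)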

3.5.2 Variance explained and variance reduction

  • When you compare your model to the unconditional model
    • Variance explained
    • How much variance does my model explain?
    • Like \(R^2\)
  • When you compare your model to some other (simpler) model
    • Variance reduction
    • How much is (error) variance reduced by adding whatever you added?
    • Like \(R_{change}^2\)

3.5.3 Variance explained and variance reduction

  • Model 1 is simpler, Model 2 is more complex

    • The model you “care about” is Model 2
  • Reduction in variance =

\[\frac{\sigma_{e}^2(Model1) - \sigma_{e}^2(Model2)}{\sigma_{e}^2(Model1)}\]
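
In R, with two fitted lmer models m1 (simpler) and m2 (the model you care about):

# Proportional reduction in residual (L1) variance from Model 1 to Model 2
(sigma(m1)^2 - sigma(m2)^2) / sigma(m1)^2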

3.5.4 Missing data

  • Missing on outcome: OK (assuming MAR)
    • Uses all observations for a person to create trajectories
  • Missing on predictor: Case is dropped
    • Make sure no missing or use multiple imputation

3.5.5 Extensions of mixed models

  • Change in multiple variables at once
    • Baldwin et al. (2014): Complicated but possible
  • Non-normal outcomes
    • More difficult in unexpected ways
  • SEM framework (Latent growth models)
    • Growth as a predictor, simultaneous growth of multiple processes, and other more complex models

4 Summary

4.1 Summary

4.1.1 Summary of this week

  • Mixed models as an approach to repeated measures
    • Focus on individual change
    • Fewer problems with missing data
    • Continuous and unevenly spaced time
    • Flexible with predictors (continuous and categorical)
  • Way more complex & interesting than we have time to talk about!
    • Adding predictors, shape of change, multiple outcomes
    • Take Longitudinal or Multilevel models or Categorical or SEM

4.1.2 Summary: RM ANOVA vs mixed models

  • RM ANOVA focuses on group level differences in means at each time point
    • Only uses complete cases on the outcome
    • Categorical predictors only
  • LMM focuses on individual trajectories over time
    • Uses all observations available on the outcome
    • Continuous or categorical predictors
  • In general, I would always use some version of a mixed model