# Linear Mixed-Effects Modeling

**Note** (Aug. 23, 2010): a big selling point of 3dLME is its strong capability of running group analysis when the hemodynamic response function is modeled with multiple basis functions (e.g., TENT, SIN, etc.) in which all the individual regression coefficients can be brought to 3dLME without any summarizing process. Other than this, if you come to this page, read this discussion, and most likely 3dMEMA or even 3dttest is a better choice for you when possible.

Linear mixed-effects modeling is a totally different approach from the traditional AN(C)OVA. Check out this website (introduction to linear mixed-effects modeling) for theoretical considerations. You can still use the Matlab package GroupAna for group analysis with the number of fixed factors up to 4 and for unbalanced designs (unequal number of subjects across groups). You can also use 3dRegAna to handle missing data or designs with covariates. However if you don't have Matlab available or if your design falls beyond the capability of GroupAna/3dRegAna/Yourself, then this R package, Rpack, might be the program to try your luck. Unlike Matlab, R is an open source platform (a clone of S) with continuing support from a huge pool of statisticians and programmers. 3dLME adopts the linear mixed-effects modeling approach in R (similar to its counterparts in SAS, SPSS, and other statistical platforms) for FMRI group analysis.

Advantages:

a. Unbalanced designs

Like GroupAna, 3dLME can run analysis of unbalanced designs with unequal number of subjects across groups. However unlike GroupAna in which Type I sum of squares was adopted, the approach of linear mixed-effect modeling in 3dLME is quite different. And more options may be added in the future.

b. Missing data

Previously if a few beta values are missing from a subject, that subject may have to stay out of the whole group analysis. Now this is not an issue in 3dLME. Whatever partial information available can contribute to the analysis. Assumption: the missing data are missing at random (MAR).

c. Unlimited number of factors and covariates

The only upper bound is the modeling strategy, computer power and patience.

d. Bringing individual beta's instead of AUC to group analysis (still under development)

If each regressor is modeled with multiple basis functions at individual subject analysis level, the traditional approach for group analysis is through AUC (Area Under the Curve). However the AUC approach is very problematic in the following aspectis: (1) It does not consider the shape difference among hemodynamic response functions (HRF). It is possible to see no difference with AUC when the HRFs have different shapes. (2) The AUC approach is especially troublesome if the coefficients have different signs (some are positive while others negative). If you add up all the coefficients, they may cancel each other. If you simply exclude some black-sheep (e.g., neative) coefficients in your AUC approach, the analysis would be biased to say the least.

With 3dLME it is pretty easy to carry individual basis function coefficients or the re-assembled HRF's to group analysis without any summarizing, and handle any potential within-subject correlation across time points.

e. Model fine-tuning through various diagnostic methods (still under development)

Some model diagnostic tools will be added to improve a model or compare multiple models. Even if your design can be dealt with 3dANOVA2/3dANOVA3/GroupAna, 3dLME has the flexibility to model heteroscedasticity (different variances) across subject groups, or variance-covariance structures.

For example, different subject groups might have different within-group variability. Whether to model such a heterogeneity in residual variance structure could make a critical difference in terms of conclusions.

Bye, 3dRegAna! Well maybe not quite yet. If you can run a one- or two-sample or paired t test with 3dttest without the covariate(s), 3dRegAna is still a handy program for very simple cases by following some examples. However anything beyond that could get very tedious and unwieldy with 3dRegAna: If you ever tried to create a design matrix for 3dRegAna modeling random effects of subjects, you should know what I mean!

f. More flexibility in modeling residual correlation structure (still under development)

Sphericity (circularity) for a within-subject factor and homogeneity of variance for a between-subject factor are presumptions using traditional ANOVAs. Sometimes people correct F values (for main effects and interactions) if those presumptions are violated, but such a correction is just approximation. Even if your experiment design can be dealt with 3dANOVA2/3dANOVA3/GroupAna, 3dLME has the flexibility to model heteroscedasticity (different variances) across subject groups, or variance-covariance structures.

Downsides:

a. High computation cost

Simple designs should be done within an hour, but sophisticated ones may take a couple of days or even more.

b. Sometimes it's difficult to compare the results with the traditional ANOVA

You don't really need to grasp anything about R except for a few simple commands mentioned below unless you want to explore the R world further, but the following assumes that you've already installed R and a few packages.

## Running data analysis

(1) **This step is almost mandatory**. Otherwise it is going to fail on most machines due to memory limitations. Down-sample all input files close to the original EPI resolution using 3dresample. There is no point running group analysis on fine resolutions such as 1x1x1 mm^3. For example, if the original EPI resolution is 3.75 X 3.75 X 5 mm^3, resample to 3 X 3 X 3 mm^3

(2) Create a text file **model.txt** in the following format which stores all the information about your model and input files (3 fixed factors plus a covariate, age, in this example):

Data:Volume

Output:OutputFileName

MASK:../data/Mask+tlrc.BRIK

Model:Gender*Object*Modality+Age

COV:Age

RanEff:1 <-- a random intercept

VarStr:0 <-- replace "0" with "1~1|Gender" if you suspect difference variances across groups (genders)

CorStr:0

SS:sequential

Clusters:4 <-- number of parallel jobs

MFace-FFace

Male*Face*0*0-Female*Face*0*0

MVisual-MAudial

Male*0*Visual*0-Male*0*Audial*0

...... <--don'tput these dots in the file since this is meant to show incompleteness

Subj Gender Object Modality Age InputFile

Jim Male Face Visual 25 file1+tlrc

Carol Female House Audial 23 file2+tlrc

Karl Male House Visual 26 file3+tlrc

Casey Female Face Audial 24 file4+tlrc

...... <--don'tput these dots in the file since this is meant to show incompleteness

Line 1:** Data format** can be either "Volume" (input files with one sub-brick) or "Surface" (surface data in text format with one column in each input file).

Line 2: **Output File Name** can be any string. No suffix such as tlrc is needed.

Line 3: a group **mask** file can speed up the computation, but this is optional. If no mask is provided, leave the line simply as

MASK:

Line 4: **Model formula** can be combination of your fixed factors and covariates. For example a fully "loaded" model with 3 categorical factors and 2 covariates (**continuous** variables) can have a format of **Covariate1*****Covariate2*****FactorA*FactorB*FactorC**. I will discuss more options in the future. Don't worry about nesting if you have any (e.g. subjects nested within group factor FixedFactorA in the above example) since the program handles this internally. Don't worry about random effects either for now. The default mode includes a random intercept in the linear mixed-effects model, but I may expand this aspect in the future.

**Note**: a covariate is defined as a continuous variable here, NOT a variable of no interest.

One thing important about the model formula is that you have to be a little careful about the order of the factors appearing in the formula if the design is unbalanced. Due to the fact that the default type of sums of squares is set as SEQUENTIAL (type I) instead of others such as marginal (type III), the order has some effect on those F (not t) tests for main effects and interactions. Because of this sensitive (and controversial) issue, in the formula you may want to arrange the factors in an order of decreasing importance. As the F statistics for all main effects and interactions provided by 3dLME are sequential (type I in SAS terminology), *i.e*., the F for a fixed effect (*e.g*., Modality) is conditional on the preceding terms (*e.g*., Gender and Object) in the model formula. This means different order of the terms presented in the formula may lead to slightly different F values if the design is unbalanced. See more discussion on different types of F here. Because of this nature for sequential sums of squares, it is recommended to put the covariates first in the model formula so that the covariate(s) are properly controlled. Also the t tests for contrasts are also sequential.

If some fixed effects and/or interactions turn out to be insignificant in the fully "loaded" model, you can remove those insignificant terms and reanalyze the data with a more parsimonious model.

Line5: Covariates are listed here. If no covariate exists, leave the line simply as

COV:

Do this for multiple covariates

COV:covariate1*covarite2

The covariates do not have to be de-mean-ed in model.txt. However you have to be careful when interpreting the results (main effects, interactions, and contrasts). For example, if IQ is a covariate in the analysis, any conclusion by default should be interpreted as a person with THE mean IQ (e.g., 95) of this group of subjects, which may or may not coincide with the average IQ (e.g., 110) of some general population. Another example: Suppose we have 2 groups (old and young) with age as a covariate because we want to regress out the within-group variation of age in the analysis. The group difference should be interpreted as the difference between one group at the mean age of this group and the other group at its own mean age. This is the default mode in 3dLME, so if you want an effect not at the mean value of a covariate, modification and adjustment have to be done in 3dLME for such a special case.

Line 6: This line (TRUE or FALSE) tells 3dLME.R whether a random intercept is incorporated into the model. Typically you want to set it as TRUE for group analysis unless (1) each subject has only one input file; or (2) the random effect is statistically insignificant (usually tested through intra-class correlation coefficients (ICCs)).

Line 7: Heteroscedasticity across groups can be modeled through this line. Options are: 0 - nothing; 1 - different variances across groups.

Line 8: Temporal correlation can be handled through the correlation structure of the residuals. Options are: 0 - nothing; 1 - corAR1; 2 - corARMA(2,0); 3 - corARMA(1,1)

Line 9: Two choices for the type of sums of squares in the F tests of main effects and interactions: sequential (type I) and marginal (type III). Choose seqential unless you have strong reason to use marginal.

Line 10: this option allows for parallel computing. For example on a Mac OS X with 2 processors of Dual Core, you can run 4 parallel jobs.

Line 11 and below: Contrast specifications. Only pair-wise comparisons are currently allowed, but this may change in the future. The examples above are hopefully self-evident. Each specification contains a line for label and another for details.

Below the contrast section is the header for data specification: The 1st and last columns (and their column names) have to be **Subj** and **InputFile**, . The order of those fixed effects (factors and covariates) have to match up with the order in the model formula in line 4, but the factor and covariate names can be any strings. Also **InputFile** is NOT allowed to occur anywhere above this line, but OK down below. Those input files under column **Input****File** don't have to be in the current working directory as model.txt as long as their path is explicitly specified as part of the filenames.

Design matrix information and input files are listed below the header line in the *long* format with one beta per line. The level names under **Subj** and those **categorical factors** (Gender, Object and Modality above) can be any mixture of characters and numbers, but not pure numbers. The bigger (more columns/rows) the matrix, the longer the runtime. All the input files are supposed to contain *single* sub-brick.

(3) Execute the following command at the prompt in the directory where file **model.txt** exists:

3dLME.R MyOutput

You may want to put an "&" at the end of the above command line so that the execution runs in the background. Check the progress with the timer in file MyOut which stores stdout and stderr, and smile if you see "**Congratulations! You've got the result ...**" at the end of the file. Well, try to maintain your composure even if something like "Error in ... line 22 ... Execution halted" sticks on the screen.

If you run the analysis on a remote host computer, you can use the following command line so that you can safely log out at the local machine without terminating the running job:

nohup 3dLME.R MyOutput &

The file you are interested, OutputFileName+tlrc.*, is the output file that contains all the voxel-wise statistical (F only at this point) values in the conventional AFNI data format of multiple sub-bricks. Enjoy the thresholding...

## Dealing with multiple basis functions

Suppose we modeled the hemodynamic response (HDR) functions with 6 basis functions of tent in 3dDeconvolve and we want to compare whether two conditions have the same HDR. The null hypothesis in this case is that for each basis function the beta is the same across the two conditions, and we would have an F value (only one sub-brick) in the output. First we need to pre-process the data a little by contrasting the betas for each basis function. In other words, we use 3dcalc to obtain the difference between the two betas for each basis function, and then create model.txt as below. Notice (1) We specify a model of **Time-1** because we don't want to include an intercept in the model for this special null hypothesis (-1 in the model means no intercept). (2) The number 1 in CorStr indicates that we are going to model the residuals with an auto-regressive time series of order 1; Options for the correlation structure are: 0 - nothing; 1 - corAR1; 2 - corARMA(2,0); 3 - corARMA(1,1). (3) Formula "~order|Subj" shows the structure of the correlation model in which the primary covariate (order) is relative to the grouping factor (Subj).

If other types (e.g., SPMG2, SPMG3, sine, csplin, poly, etc.) of basis functions are adopted in 3dDeconvolve, use -iresp to get the HDR functions, and feed in the value at each time point as input for 3dLME.

Data:Volume

Output:OutputFileName

MASK:../data/Mask+tlrc.BRIK

Model:Time-1

COV:

RanEff:1

VarStr:0

CorStr:1~order|Subj

SS:sequential

Clusters:4

Subj Time order InputFile

S1 Diff0 0 S1Diff0+tlrc

S1 Diff1 1 S1Diff1+tlrc

S1 Diff2 2 S1Diff2+tlrc

S1 Diff3 3 S1Diff3+tlrc

S1 Diff4 4 S1Diff4+tlrc

S1 Diff5 5 S1Diff5+tlrc

S2 Diff0 0 S2Diff0+tlrc

S2 Diff1 1 S2Diff1+tlrc

S2 Diff2 2 S2Diff2+tlrc

...... <--don'tput these dots in the file since this is meant to show imcompletion

The output file contains: (1) an overall F value testing whether at least one effect from the basis functions is significant; (2) the effect of each basis function and the associated t-statistic. Contrasts among the effects of the basis functions can also be specified through model.txt.

## Theoretical background

A linear mixed-effects model (LME) is also called in literature hierarchical linear model or multi-level model in which a continuous dependent variable is modeled as the combination of fixed and random effects. The repeated-measures design structure we usually encounter in FMRI can be envisioned as a two-level model in the LME language:

Level of Data Variables in the LME model Examples

=================================================================================================================

Level 1 within-subject factors (fixed effects) task/stimulus categories

repeated measures dependent variable beta (regressionn coefficient)

------------------------------------------------------------------------------------------------------------

Level 2 subject variable (random factor) subject

unit of analysis subject-level covariates (fixed effects) age, IQ

between-subject factors (fixed effects) subject groups (sex, genotypes, ...)

## Related topics

2. Independent component analysis: 3dICA

## References

1. Introduction to linear mixed-effects modeling

2. R Project

## Acknowledgements

Thank Tom Ross, Daniel Glen, Rick Reynolds, Karsten Tabelow, and Max Kuhn for assistance during the development of the package.

Last modified 2013-03-15 15:10