# ANCOVA

**Note (August 19, 2010): although the approaches described here are still as valid as their assumptions, a better approach is to use either 3dtttest++ or mixed-effect meta-analysis with 3dMEMA in terms of modeling capability/flexibility and result presentation.**

FMRI group analysis with ANOVA assumes that all subjects are drawn from a completely randomized design. However, such randomness is not always achieved. In fact, in many situations the investigator does not have a set of homogeneous subjects, and the potential differences among them could distort the analysis. For example, in FMRI studies direct control of variability across subjects (e.g., the effect of subject age) is most likely unrealistic: the investigator cannot select individual subjects for an experiment at will. Instead, indirect (statistical) control is available through analysis of covariance (ANCOVA), achieved by measuring one or more concomitant variates in addition to the variates (factors) of primary interest. Such concomitant variates (nuisance, extraneous, or ancillary variables) are also called covariates; they are measured for the purpose of adjusting the measurements on the variates (factors). A covariate is supposed to have some cause-effect relation with the dependent variable (percent signal change). Potential covariates include age, cortical thickness, and behavioral data such as response time. Although covariates are mostly continuous variables, discrete variables (e.g., sex) are sometimes treated as covariates when the user has no specific interest in such a variable other than "regressing" it out of the analysis.

By accounting for the correlation between the covariate and the dependent variable, potential sources of bias (variability of the dependent variable that is attributable to the covariates) can be removed from the experiment, and such a group analysis gains statistical power by decreasing the error mean square against which the effects are tested. Covariance adjustment is especially effective when the regression on the covariates is linear, because it makes the groups more comparable.

There are basically **three assumptions** in ANCOVA: (1)
linearity of regression, i.e., a linear relation between percent signal
change and the covariate; (2) exact measurement of the covariate; (3) the
covariate is independent of all factors (independent variables) and
does not correlate highly with other covariates. Nonlinear
relationships can compromise the analysis, while inaccurate
measurement of the covariate contributes additional variability to the
data that is not taken into account by ANCOVA.
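Assumption (3) can be screened before any fitting. The sketch below is a hypothetical helper, not an AFNI program: a bash function, `pearson_r` (an illustrative name), that uses awk to compute the Pearson correlation between two columns of a plain-text table, for example a covariate column against the 0/1 group column used in example (2) below, or one covariate against another. The whitespace-separated table layout is an assumption; a correlation close to ±1 would suggest that the covariate is largely confounded with the factor or with the other covariate.

```bash
# Hypothetical helper (not part of AFNI): Pearson correlation between two
# columns of a whitespace-separated text table.
# Usage: pearson_r FILE COL_A COL_B   (columns are 1-based)
pearson_r() {
    awk -v a="$2" -v b="$3" '
        NF > 0 {                          # skip blank lines
            x = $a; y = $b; n++
            sx += x; sy += y
            sxx += x * x; syy += y * y; sxy += x * y
        }
        END {
            if (n < 2) { print "need at least 2 rows" > "/dev/stderr"; exit 1 }
            cxy = sxy - sx * sy / n       # n times the covariance
            cxx = sxx - sx * sx / n       # n times the variance of column a
            cyy = syy - sy * sy / n       # n times the variance of column b
            if (cxx <= 0 || cyy <= 0) { print "constant column" > "/dev/stderr"; exit 1 }
            printf "%.4f\n", cxy / sqrt(cxx * cyy)
        }' "$1"
}
```

For instance, `pearson_r subjects.txt 1 2` on a hypothetical file whose first column holds age and whose second column holds the group code prints a single number rounded to four decimals; the function exits with an error if either column is constant.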

Before a group analysis is implemented, the user should carefully
plan which contrasts and simple effects are of interest. To account
for subject variability in group analysis, it is recommended that the
user run a one-way ANCOVA for each contrast or simple effect
separately, similar to its counterpart, the two-sample *t* test or
one-sample *t* test with *3dttest*. The following examples illustrate how to implement such commonly encountered covariance analyses with *3dRegAna*. ANCOVA with more than one covariate can easily be worked out with *3dRegAna*
if the user understands the principle underlying ANCOVA. In fact, with
its flexibility regarding the number of predictor variables and
unbalanced designs, *3dRegAna* is a versatile program that can handle
most group analyses if the user knows how to set them up.

(1) Contrast with covariate effect removed

Suppose we want to test a contrast at group level with age effect removed from the analysis.
This is the counterpart of a one-sample *t* test.

The corresponding model for this analysis is

$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i, \quad i = 1, 2, \ldots, n,$$

where $Y_i$ is the contrast from subject $i$ and $X_i$ is the corresponding covariate. Parameter $\beta_1$ reflects the effect of the covariate on the dependent variable $Y_i$ (percent signal change) that we would like to remove from $Y_i$, and $\beta_0$ is the adjusted contrast after the covariate effect is removed. The sign of $\beta_1$ indicates the direction of the covariate effect on percent signal change: a positive $\beta_1$ means that a larger $X_i$ leads to a higher percent signal change, and a negative $\beta_1$ means that a larger $X_i$ decreases percent signal change.

First remove the mean from the covariate (the ages of the subjects in this example). Centering
the covariate is **mandatory** for this type of analysis; otherwise the mean age would interfere with the contrast ($\beta_0$ in the above model), because $\beta_0$ would then estimate the contrast at age zero rather than at the mean age of the group. The
demeaned age is listed as the only column in the following *3dRegAna* script. To verify the demeaning, this column should add up to 0.

```
3dRegAna \
    -rows 15 \
    -cols 1 \
    -workmem 1000 \
    -xydata 0.1 Contrast1+tlrc.BRIK \
    -xydata 7.1 Contrast2+tlrc.BRIK \
    ...
    -xydata -0.9 Contrast8+tlrc.BRIK \
    ...
    -xydata -5.9 Contrast11+tlrc.BRIK \
    -xydata 4.1 Contrast12+tlrc.BRIK \
    ...
    -xydata -3.9 Contrast15+tlrc.BRIK \
    -model 1 : 0 \
    -bucket 0 GroupContr \
    -brick 0 coef 0 "GroupContr" \
    -brick 1 tstat 0 "GroupContr t" \
    -brick 2 coef 1 "Age Effect" \
    -brick 3 tstat 1 "Age Effect t"
```
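Typing centered ages by hand invites arithmetic slips. As a minimal sketch, assuming a hypothetical plain-text file with one subject per line in the form `AGE DATASET`, the bash/awk function below, `xydata_centered` (an illustrative name, not an AFNI tool), subtracts the sample mean age and prints the corresponding -xydata lines, each ending in a continuation backslash, ready to be pasted into a script like the one above.

```bash
# Hypothetical helper (not part of AFNI): center a covariate and print
# 3dRegAna -xydata lines for pasting into a script.
# Input: one subject per line, "AGE DATASET".
# Usage: xydata_centered subjects.txt
xydata_centered() {
    awk '
        NF >= 2 { n++; age[n] = $1; dset[n] = $2; sum += $1 }
        END {
            if (n == 0) exit 1
            mean = sum / n                # sample mean of the covariate
            for (i = 1; i <= n; i++)      # trailing backslash continues the command
                printf " -xydata %.3f %s \\\n", age[i] - mean, dset[i]
        }' "$1"
}
```

Because the centered values are printed with three decimals, their sum may differ from 0 by a rounding amount; piping the output through `awk '{s += $2} END {print s}'` is a quick way to perform the "add up to 0" check described above.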

Please note:

(a) The -model option in 3dRegAna specifies a reduced model and is **only**
effective for the following output sub-bricks: the *F*-statistic and $R^2$ for the
specified model, and the *F*-statistic for each regression coefficient if
option -fcoef is used. The regression coefficients and their *t*
statistics are calculated from the **full** model using the least-squares principle, regardless of the model specified with the -model option.
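For reference, a full-versus-reduced comparison of this kind is conventionally summarized by the partial *F* statistic below. This is the standard textbook formula, not taken from the 3dRegAna source; we assume, without having checked, that the -model *F* sub-brick follows it. Here $\mathrm{SSE}_F$ and $\mathrm{SSE}_R$ are the residual sums of squares of the full and reduced models, $p_F$ and $p_R$ their numbers of parameters (intercept included), and $n$ the number of rows (subjects):

$$F = \frac{(\mathrm{SSE}_R - \mathrm{SSE}_F)/(p_F - p_R)}{\mathrm{SSE}_F/(n - p_F)}, \qquad F \sim F(p_F - p_R,\; n - p_F) \text{ under } H_0.$$

When the two models differ by a single coefficient, as with -model 1 : 0 in example (1) if the reduced model keeps only the intercept, this *F* equals the square of that coefficient's *t* statistic, with 1 and $n - 2 = 13$ degrees of freedom for the 15 subjects.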

(b) As the program was developed in the old days when computer memory was a big deal, option **-workmem**
(unit: MB) was provided to adapt to the user's computing environment.
With a miserable default value of 12 MB, most likely you'd like to
juice it up to something like 1000 MB (= 1 GB), otherwise your patience
will be significantly challenged.

(c) If the last few lines with option -brick are absent, the sub-brick names in the output file GroupContr+tlrc would be labeled
as "Coef #0", "Coef #0 t", etc. The following script can be used to make
them more self-revealing:

```
3drefit -sublabel 0 "GroupContr" \
        -sublabel 1 "GroupContr t" \
        -sublabel 2 "Age Effect" \
        -sublabel 3 "Age Effect t" \
        GroupContr+tlrc
```

(2) Comparing two groups with covariate effect removed

Suppose we would like to compare two groups (patient and normal) on
a condition or contrast with age effect removed from the analysis. This
is similar to a two-sample *t* test on the two groups. The model for this case is

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \epsilon_i, \quad i = 1, 2, \ldots, n,$$

where $\beta_0$ is the intercept of the model.

Centering the covariate, variable $X_{1i}$ in the above model, is not really necessary for this type of analysis, but if it is done, make sure that the **whole** covariate column in the 3dRegAna script (both groups together) adds up to 0. Demeaning $X_{1i}$ makes the interpretation of $\beta_0$ very revealing: if $X_{2i}$ is defined as below and the covariate mean is removed, $\beta_0$ is the effect of the patient group at the mean age (flip the 0's and 1's in the 2nd column if the normal group effect is desired). The coefficient $\beta_1$ is the slope of the fitted line for the patient group, representing the influence of the covariate on the dependent variable $Y_i$ (percent signal change) in that group.

The
0's and 1's in the second column differentiate the two groups, with 0
coding for patient and 1 for normal. Basically this defines a dummy
variable in the above model:

$$X_{2i} = \begin{cases} 0, & \text{when subject } i \text{ is a patient;} \\ 1, & \text{when subject } i \text{ is normal.} \end{cases}$$

Coefficient $\beta_2$ is thus the effect of the normal group
relative to the patient group, i.e., the contrast between the two groups
evaluated at $X_{1i} = 0$ (the mean age if the covariate is centered):
the amount by which the normal group is more active than patients if
positive, or less if negative.

The variable $X_{3i}$ in the above model is defined as the product of $X_{1i}$ and $X_{2i}$, and thus the third column in the script below models the interaction between age effect and group effect, which is meant to find out whether the effect of one variable depends on the specific value of the other.

In the *3dRegAna* script below, the 3rd column is simply the product of the first and second columns, and the coefficient $\beta_3$ reveals the interaction effect between age and group, namely the age slope of the normal group minus that of the patient group. When $\beta_3$ is positive, the normal-minus-patient difference increases with age; when negative, the group difference decreases with age. Without this variable in the model, ANCOVA would carry an assumption of homogeneity of regression, in which two parallel lines are separated vertically by the main effect of each group. Parallelism (equal slopes of the regression lines) is equivalent to having no interaction between the covariate and the factors. However, it is more appropriate to allow for a potential interaction, in which the relationship between age and hemodynamic response is allowed to differ between groups. In other words, each group has its own slope of the linear regression instead of a parallel fit, so the interaction between the covariate and the group factor is automatically accounted for.
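Writing the model out separately for each group may make these interpretations concrete. Substituting the dummy coding $X_{2i} \in \{0, 1\}$ and $X_{3i} = X_{1i} X_{2i}$ into the model gives one fitted line per group (a direct consequence of the model above, with $x$ denoting the centered age):

$$\begin{aligned}
\text{Patients } (X_{2i} = 0):&\quad \mathrm{E}[Y_i] = \beta_0 + \beta_1 x \\
\text{Normals } (X_{2i} = 1):&\quad \mathrm{E}[Y_i] = (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\, x \\
\text{Normal} - \text{Patient}:&\quad \beta_2 + \beta_3\, x
\end{aligned}$$

So $\beta_0$ and $\beta_1$ are the patient group's intercept and slope, $\beta_2$ is the group difference at the mean age ($x = 0$), and $\beta_3$ is the difference between the two slopes; setting $\beta_3 = 0$ recovers the parallel-lines (homogeneity of regression) model.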

The input files are the regression coefficients or contrast intensities (not statistics) from the individual subject analyses. As with *3dttest*, unequal sample sizes for the two groups are not a problem with *3dRegAna*.

```
# Sub-brick 6 ('Interaction') holds the difference of age effect
# between the Normal and Patient groups.
3dRegAna \
    -rows 30 \
    -cols 3 \
    -workmem 1000 \
    -xydata 0.1 0 0 patient/Pat1+tlrc.BRIK \
    -xydata 7.1 0 0 patient/Pat2+tlrc.BRIK \
    ...
    -xydata -0.9 0 0 patient/Pat8+tlrc.BRIK \
    ...
    -xydata -5.9 0 0 patient/Pat11+tlrc.BRIK \
    -xydata 4.1 0 0 patient/Pat12+tlrc.BRIK \
    ...
    -xydata -3.9 0 0 patient/Pat15+tlrc.BRIK \
    -xydata 2.1 1 2.1 normal/Norm1+tlrc.BRIK \
    ...
    -xydata -0.9 1 -0.9 normal/Norm3+tlrc.BRIK \
    -xydata 0.1 1 0.1 normal/Norm4+tlrc.BRIK \
    ...
    -xydata -3.9 1 -3.9 normal/Norm9+tlrc.BRIK \
    ...
    -xydata -8.9 1 -8.9 normal/Norm14+tlrc.BRIK \
    -xydata 0.1 1 0.1 normal/Norm15+tlrc.BRIK \
    -model 1 2 3 : 0 \
    -bucket 0 Pat_vs_Norm \
    -brick 0 coef 0 'PatEff' \
    -brick 1 tstat 0 'PatEff t' \
    -brick 2 coef 1 'Age Effect of Pat Group' \
    -brick 3 tstat 1 'Age Effect of Pat Group t-stat' \
    -brick 4 coef 2 'Norm-Pat' \
    -brick 5 tstat 2 'Norm-Pat t' \
    -brick 6 coef 3 'Interaction' \
    -brick 7 tstat 3 'Interaction t'
```
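The three columns of this script follow mechanically from each subject's age and group membership. A minimal sketch, assuming a hypothetical input file with one subject per line in the form `AGE GROUP DATASET` (GROUP coded 0 for patient and 1 for normal): the function `xydata_two_groups` (an illustrative name, not an AFNI tool) centers age on the mean of all subjects in both groups, as recommended above, and derives the product column from the first two so the interaction column cannot drift out of sync.

```bash
# Hypothetical helper (not part of AFNI): build the three regressor columns
# of the two-group ANCOVA for each subject:
#   column 1 = age centered on the mean of ALL subjects (both groups)
#   column 2 = group dummy (0 = patient, 1 = normal)
#   column 3 = column 1 times column 2 (age-by-group interaction)
# Input: one subject per line, "AGE GROUP DATASET".
# Usage: xydata_two_groups subjects.txt
xydata_two_groups() {
    awk '
        NF >= 3 {
            if ($2 != 0 && $2 != 1) {
                print "group code must be 0 or 1: " $0 > "/dev/stderr"
                bad = 1
                exit 1
            }
            n++; age[n] = $1; grp[n] = $2 + 0; dset[n] = $3; sum += $1
        }
        END {
            if (bad || n == 0) exit 1
            mean = sum / n                      # mean over both groups
            for (i = 1; i <= n; i++) {
                c = age[i] - mean               # column 1: centered age
                prod = (grp[i] == 1) ? c : 0    # column 3: explicit 0 avoids printing -0.000
                printf " -xydata %.3f %d %.3f %s \\\n", c, grp[i], prod, dset[i]
            }
        }' "$1"
}
```

The patient rows get a literal 0 in the third column instead of "age times 0", which in floating point can come out as a negative zero for negative centered ages.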

If the last few lines with option -brick are absent, the sub-brick names in the output file Pat_vs_Norm+tlrc would be labeled as "Coef #0", "Coef #1", etc. The following script can alternatively be used to make them more self-revealing:

```
3drefit -sublabel 0 "PatEff" \
        -sublabel 1 "PatEff t" \
        -sublabel 2 "Age Effect" \
        -sublabel 3 "Age Effect t" \
        -sublabel 4 "Norm-Pat" \
        -sublabel 5 "Norm-Pat t" \
        -sublabel 6 "Interaction" \
        -sublabel 7 "Interaction t" \
        Pat_vs_Norm+tlrc
```

(3) More than one covariate

Adding one more covariate works just like the first covariate: add one more column devoted specifically to that covariate (and increase -cols accordingly). In addition, if you want to consider potential interactions with the group and/or the first covariate, insert the appropriate product columns as well. Everything else should be self-evident.
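As an illustration only, suppose the second covariate is response time (one of the potential covariates listed earlier; the choice is hypothetical), centered and stored as $X_{4i}$, with its group interaction $X_{5i} = X_{4i} X_{2i}$. Extending the two-group model in the way just described would give

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + \beta_5 X_{5i} + \epsilon_i, \quad i = 1, 2, \ldots, n,$$

so each -xydata line would carry five values instead of three and -cols would become 5. An age-by-response-time column $X_{1i} X_{4i}$ could be appended in the same way if that interaction is of interest. Assumption (3), that covariates do not correlate highly with one another, applies to the new column as well.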


Last modified: July 19, 2005

Last modified 2011-03-02 10:23