================== Welcome to 3dMSS ==================
Program for Voxelwise Multilevel Smoothing Spline (MSS) Analysis
#+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Version 1.0.6, Sept 24, 2023
Author: Gang Chen (gangchen@mail.nih.gov)
Website - https://afni.nimh.nih.gov/gangchen_homepage
SSCC/NIMH, National Institutes of Health, Bethesda MD 20892, USA
#+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Introduction
------
Multilevel Smoothing-Spline (MSS) Modeling
The linearity assumption surrounding a quantitative variable in common
practice may be a reasonable approximation especially when the variable
is confined within a narrow range, but can be inappropriate under some
circumstances when the variable's effect is non-monotonic or tortuous.
As a more flexible and adaptive approach, multilevel smoothing splines
(MSS) offers a more powerful analytical tool for population-level
neuroimaging data analysis that involves one or more quantitative
predictors. More theoretical discussion can be found in
Chen, G., Nash, T.A., Cole, K.M., Kohn, P.D., Wei, S.-M., Gregory, M.D.,
Eisenberg, D.P., Cox, R.W., Berman, K.F., Shane Kippenhan, J., 2021.
Beyond linearity in neuroimaging: Capturing nonlinear relationships with
application to longitudinal studies. NeuroImage 233, 117891.
https://doi.org/10.1016/j.neuroimage.2021.117891
Chen, G., Taylor, P.A., Reynolds, R.C., Leibenluft, E., Pine, D.S.,
Brotman, M.A., Pagliaccio, D., Haller, S.P., 2023. BOLD Response is more
than just magnitude: Improving detection sensitivity through capturing
hemodynamic profiles. NeuroImage 277, 120224.
https://doi.org/10.1016/j.neuroimage.2023.120224
To be able to run 3dMSS, one needs to have the following R packages
installed: "gamm4" and "snow". To install these R packages, run the
following command at the terminal:
rPkgsInstall -pkgs "gamm4,snow"
Alternatively you may install them in R:
install.packages("gamm4")
install.packages("snow")
It is best to go through all the examples below to get hang of the MSS
scripting interface. Once the 3dMSS script is constructed, it can be run
by copying and pasting to the terminal. Alternatively (and probably better)
you save the script as a text file, for example, called MSS.txt, and execute
it with the following (assuming on tc shell),
nohup tcsh -x MSS.txt &
or,
nohup tcsh -x MSS.txt > diary.txt &
or,
nohup tcsh -x MSS.txt |& tee diary.txt &
The advantage of the latter commands is that the progression is saved into
the text file diary.txt and, if anything goes awry, can be examined later.
Example 1 --- simplest case: one group of subjects with a between-subject
quantitative variable that does not vary within subject. MSS analysis is
set up to model the trajectory or trend along age, and can be specified
through the option -mrr, which is solved via a model formuation of ridge
regression. Again, the following exemplary script assumes that 'age' is
a between-subjects variable (not varying within subject):
3dMSS -prefix MSS -jobs 16 \
-mrr 's(age,k=10)' \
-qVars 'age' \
-mask myMask.nii \
-bounds -2 2 \
-prediction @pred.txt \
-dataTable @data.txt
The part 's(age,k=10)' indicates that 'age' is modeled via a smooth curve.
The minimum number of samples should be 6 or more. 'k=10' inside the model
specification s() sets the number of knots. If the number of data samples (e.g.,
age) is less than 10, set k to the number of available samples (e.g., 8).
No empty space is allowed in the model formulation. With the option
-bounds, values beyond [-2, 2] will be treated as outliers and considered
as missing. If you want to set a range, choose one that make sense with
your specific input data.
The file pred.txt lists all the expl1anatory variables (excluding lower-level variables
such as subject) for prediction. The file should be in a data.frame format as below:
label age
time1 1
time2 2
time3 3
...
time8 8
time9 9
time10 10
...
The file data.txt stores the information for all the variables and input data in a
data.frame format. For example:
Subj age InputFile
S1 1 ~/alex/MSS/S1.nii
S2 2 ~/alex/MSS/S2.nii
...
In the output the first sub-brick shows the statistical evidence in the
form of chi-square distribution with 2 degrees of freedom (2 DFs do not mean
anything, just for the convenience of information coding). This sub-brick is
the statistical evidence for the trejectory of the group. If you want to
estimate the trend at the population level, use the option -prediction with a
table that codes the ages you would like to track the trend. In the output
there is one predicted value for each age plus the associated uncertainty
(standard error). For example, with 10 age values, there will be 10 predicted
values plus 10 standard errors. The sub-bricks for prediction and standard
errors are interleaved.
Example 2 --- Largely same as Example 1, but with 'age' as a within-subject
quantitative variable (varying within each subject). The model is better
specified by replacing the line of -mrr in Example 1 with the following
two lines:
-mrr 's(age,k=10)+s(Subj,bs="re")' \
-vt Subj 's(Subj)' \
The part 's(age,k=10)' indicates that 'age' is modeled via a smooth curve.
The minimum number of samples should be 6 or more. 'k=10' inside the model
specification s() sets the number of knots. If the number of data samples (e.g.,
age) is less than 10, set k to the number of available samples (e.g., 8).
The second term 's(Subj,bs="re")' in the model specification means that
each subject is allowed to have a varying intercept or random effect ('re').
To estimate the smooth trajectory through the option -prediction, the option
-vt has to be included in this case to indicate the varying term (usually
subjects). That is, if prediction is desirable, one has to explicitly
declare the variable (e.g., Subj) that is associated with the varying term
(e.g., s(Subj)). No empty space is allowed in the model formulation and the
the varying term.
The full script version is
3dMSS -prefix MSS -jobs 16 \
-mrr 's(age,k=10)+s(Subj,bs="re")' \
-vt Subj 's(Subj)' \
-qVars 'age' \
-mask myMask.nii \
-bounds -2 2 \
-prediction @pred.txt \
-dataTable @data.txt
All the rest remains the same as Example 1.
Alternatively, this model with varying subject-level intercept can be
specified with
-lme 's(age,k=10)' \
-ranEff 'list(Subj=~1)' \
which is solved through the linear mixed-effect (lme) platform. The -vt is
not needed when making prediction through the option -prediction. The two
specifications, -mrr and -lme, would render similar results, but the
runtime may differ depending on the amount of data and model complexity.
Example 3 --- two groups and one quantitative variable (age). MSS analysis is
set up to compare the trajectory or trend along age between the two groups,
which are quantitatively coded as -1 and 1. For example, if the two groups
are females and males, you can code females as -1 and males as 1. The following
script applies to the situation when the quantitative variable does not vary
within subject,
3dMSS -prefix MSS -jobs 16 \
-mrr 's(age,k=10)+s(age,k=10,by=grp)' \
-qVars 'age' \
-mask myMask.nii \
-bounds -2 2 \
-prediction @pred.txt \
-dataTable @data.txt
The part 's(age,k=10)' indicates that 'age' is modeled via a smooth curve.
The minimum number of samples should be 6 or more. 'k=10' inside the model
specification s() sets the number of knots. If the number of data samples (e.g.,
age) is less than 10, set k to the number of available samples (e.g., 8).
Use the script below when the quantitative variable varies within subject,
3dMSS -prefix MSS -jobs 16 \
-mrr 's(age,k=10)+s(age,k=10,by=grp)+s(Subj,bs="re")' \
-vt Subj 's(Subj)' \
-qVars 'age' \
-mask myMask.nii \
-bounds -2 2 \
-prediction @pred.txt \
-dataTable @data.txt
or an LME version:
3dMSS -prefix MSS -jobs 16 \
-lme 's(age,k=10)+s(age,k=10,by=grp)' \
-ranEff 'list(Subj=~1)' \
-qVars 'age' \
-mask myMask.nii \
-bounds -2 2 \
-prediction @pred.txt \
-dataTable @data.txt
Example 4 --- modeling hemodynamic response: this 3dMSS script is
intended to (1) assess the presence of HRF for one group or (2) compare
HRFs between two conditions for one group. For first case, each HRF at
the indiividual level is characterized at 14 time points with a time
resolution TR = 1.25s. In the second case, obtain the HRF contrast
between the two conditions. For either case, each individual should have
14 input files. Two covariates are considered: sex and age.
3dMSS -prefix output -jobs 16 \
-lme 'sex+age+s(TR,k=10)' \
-ranEff 'list(subject=~1)' \
-qVars 'sex,age,TR' \
-prediction @HRF.table \
-dataTable @smooth-HRF.table
The part 's(TR,k=10)' indicates that 'TR' is modeled via a smooth curve.
The minimum number of samples should be 6 or more. 'k=10' inside the model
specification s() sets the number of knots. If the number of data samples (e.g.,
TR) is less than 10, set k to the number of available samples (e.g., 8).
The output filename and number of CPUs for parallelization are
specified through -prefix and -jobs, respectively. The expression
s() in the model specification indicator '-lme' represents the
smooth function, and the term 's(TR)' codes the overall HRF profile groups.
The term 'list(subject=~1)' under the option '-ranEff'
indicates the random effects for the cross-individual variability in
intercept. The number of thin plate spline bases was set to the
default K = 10. The option '-qVars' identifies quantitative
variables (TR and age in this case plus dummy-coded sex and
group). The last two specifiers -prediction and -dataTable list one
table for HRF prediction and another for input data information,
respectively. The input file 'smooth-HRF.table' is structured in a
long data frame format:
subject age sex TR InputFile
s1 29 1 0 s1.Inc.b0.nii
s1 29 1 1 s1.Inc.b1.nii
s1 29 1 2 s1.Inc.b2.nii
s1 29 1 3 s1.Inc.b3.nii
s1 29 1 4 s1.Inc.b4.nii
...
The factor 'sex' is dummy-coded with 1s and -1s. The following
table as the input file 'HRF.table' provides the specifications for
predicted HRFs:
label age sex TR
time1 6.2 1 0.00
time2 6.2 1 0.25
time3 6.2 1 0.50
...
Example 5 --- modeling hemodynamic response: this 3dMSS script is
intended to (1) compares HRFs under one task condition between the
two groups of patients (PT) and healthy volunteer (HV) at the
population level, or (2) assess the interaction between group and
task condition (2 levels). For the second case, obtain the HRF
contrast at each time point. In either case, if the HRF is represented
with 14 time points with a time resolution TR = 1.25s, each individual
should have 14 input files. Two covariates are considered: sex and age.
3dMSS -prefix output -jobs 16 \
-lme 'sex+age+s(TR,k=10)+s(TR,k=10,by=group)' \
-ranEff 'list(subject=~1)' \
-qVars 'sex,age,TR,group' \
-prediction @HRF.table \
-dataTable @smooth-HRF.table
The part 's(age,k=10)' indicates that 'TR' is modeled via a smooth curve.
The minimum number of samples should be 6 or more. 'k=10' inside the model
specification s() sets the number of knots. If the number of data samples (e.g.,
TR) is less than 10, set k to the number of available samples (e.g., 8).
The output filename and number of CPUs for parallelization are
specified through -prefix and -jobs, respectively. The expression
s() in the model specification indicator '-lme' represents the
smooth function, and the two terms 's(TR)' and 's(TR,by=group)' code
the overall HRF profile and the HRF difference between the two
groups. The term 'list(subject=~1)' under the option '-ranEff'
indicates the random effects for the cross-individual variability in
intercept. The number of thin plate spline bases was set to the
default K = 10. The option '-qVars' identifies quantitative
variables (TR and age in this case plus dummy-coded sex and
group). The last two specifiers -prediction and -dataTable list one
table for HRF prediction and another for input data information,
respectively. The input file 'smooth-HRF.table' is structured in a
long data frame format:
subject age sex group TR InputFile
s1 29 1 1 0 s1.Inc.b0.nii
s1 29 1 1 1 s1.Inc.b1.nii
s1 29 1 1 2 s1.Inc.b2.nii
s1 29 1 1 3 s1.Inc.b3.nii
s1 29 1 1 4 s1.Inc.b4.nii
...
Both 'group' and 'sex' are dummy-coded with 1s and -1s. The following
table as the input file 'HRF.table' provides the specifications for
predicted HRFs:
label age sex group TR
g1.t1 6.2 1 1 0.00
g1.t2 6.2 1 1 0.25
g1.t3 6.2 1 1 0.50
...
g2.t1 3.5 -1 -1 0.00
g2.t2 3.5 -1 -1 0.25
g2.t3 3.5 -1 -1 0.50
...
Options in alphabetical order:
------------------------------
-bounds lb ub: This option is for outlier removal. Two numbers are expected from
the user: the lower bound (lb) and the upper bound (ub). The input data will
be confined within [lb, ub]: any values in the input data that are beyond
the bounds will be removed and treated as missing. Make sure the first number
less than the second. You do not have to use this option to censor your data!
-cio: Use AFNI's C io functions, which is default. Alternatively -Rio
can be used.
-dataTable TABLE: List the data structure with a header as the first line.
NOTE:
1) This option has to occur last in the script; that is, no other
options are allowed thereafter. Each line should end with a backslash
except for the last line.
2) The order of the columns should not matter except that the last
column has to be the one for input files, 'InputFile'. Each row should
contain only one input file in the table of long format (cf. wide format)
as defined in R. Input files can be in AFNI, NIfTI or surface format.
AFNI files can be specified with sub-brick selector (square brackets
[] within quotes) specified with a number or label.
3) It is fine to have variables (or columns) in the table that are
not modeled in the analysis.
4) When the table is part of the script, a backslash is needed at the end
of each line to indicate the continuation to the next line. Alternatively,
one can save the context of the table as a separate file, e.g.,
calling it table.txt, and then in the script specify the data with
'-dataTable @table.txt'. However, when the table is provided as a separate
file, do NOT put any quotes around the square brackets for each sub-brick,
otherwise the program would not properly read the files, unlike the
situation when quotes are required if the table is included as part of the
script. Backslash is also not needed at the end of each line, but it would
not cause any problem if present. This option of separating the table from
the script is useful: (a) when there are many input files so that
the program complains with an 'Arg list too long' error; (b) when
you want to try different models with the same dataset.
-dbgArgs: This option will enable R to save the parameters in a
file called .3dMSS.dbg.AFNI.args in the current directory
so that debugging can be performed.
-help: this help message
-IF var_name: var_name is used to specify the column name that is designated for
input files of effect estimate. The default (when this option is not invoked
is 'InputFile', in which case the column header has to be exactly as 'InputFile'
This input file for effect estimates has to be the last column.
-jobs NJOBS: On a multi-processor machine, parallel computing will speed
up the program significantly.
Choose 1 for a single-processor computer.
-lme FORMULA: Specify the fixed effect components of the model. The
expression FORMULA with more than one variable has to be surrounded
within (single or double) quotes. Variable names in the formula
should be consistent with the ones used in the header of -dataTable.
See examples in the help for details.
-mask MASK: Process voxels inside this mask only.
Default is no masking.
-mrr FORMULA: Specify the model formulation through multilevel smoothing splines.
Expression FORMULA with more than one variable has to be surrounded
within (single or double) quotes. Variable names in the formula
should be consistent with the ones used in the header of -dataTable.
The nonlinear trajectory is specified through the expression of s(x,k=?)
where s() indicates a smooth function, x is a quantitative variable with
which one would like to trace the trajectory and k is the number of smooth
splines (knots). The default (when k is missing) for k is 10, which is good
enough most of the time when there are more than 10 data points of x. When
there are less than 10 data points of x, choose a value of k slightly less
than the number of data points.
-prediction TABLE: Provide a data table so that predicted values could be generated for
graphical illustration. Usually the table should contain similar structure as the input
file except that 1) reserve the first column for effect labels which will be used for
sub-brick names in the output for those predicted values; 2) columns for those varying
smoothing terms (e.g., subject) and response variable (i.e., Y) should not be includes.
Try to specify equally-spaced values with a small for the quantitative variable of
modeled trajectory (e.g., age) so that smooth curves could be plotted after the
analysis. See Examples in the help for a couple of specific tables used for predictions.
-prefix PREFIX: Output file name. For AFNI format, provide prefix only,
with no view+suffix needed. Filename for NIfTI format should have
.nii attached (otherwise the output would be saved in AFNI format).
-qVars variable_list: Identify quantitative variables (or covariates) with
this option. The list with more than one variable has to be
separated with comma (,) without any other characters such as
spaces and should be surrounded within (single or double) quotes.
For example, -qVars "Age,IQ"
WARNINGS:
1) Centering a quantitative variable through -qVarsCenters is
very critical when other fixed effects are of interest.
2) Between-subjects covariates are generally acceptable.
However EXTREME caution should be taken when the groups
differ substantially in the average value of the covariate.
-ranEff FORMULA: Specify the random effect components of the model. The
expression FORMULA with more than one variable has to be surrounded
within (single or double) quotes. Variable names in the formula
should be consistent with the ones used in the header of -dataTable.
In the MSS context the simplest model is "list(Subj=~1)" in which the
varying or random effect from each subject is incorporated in the model.
Each random-effects factor is specified within parentheses per formula
convention in R.
-Rio: Use R's io functions. The alternative is -cio.
-sdiff variable_list: This option is used to specify a factor for group comparisons.
For example, if one wants to compare age trajectory between two groups through
"s(age,by=group)" in model specification, use "-sdiff 'group'" to generate
the predicted trajectory of group differences through the values provided in the
prediction table under the option -prediction. Currently it only allows for one group
comparison. Perform separate analyses if more than one group comparison is
desirable. " .
-show_allowed_options: list of allowed options
-vt var formulation: This option is for specifying varying smoothing terms. Two components
are required: the first one 'var' indicates the variable (e.g., subject) around
which the smoothing will vary while the second component specifies the smoothing
formulation (e.g., s(age,subject)). When there is no varying smoothing terms (e.g.,
no within-subject variables), do not use this option.