3dMSS


             ================== Welcome to 3dMSS ==================
       Program for Voxelwise Multilevel Smoothing Spline (MSS) Analysis
#+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Version 1.0.6, Sept 24, 2023
Author: Gang Chen (gangchen@mail.nih.gov)
Website - https://afni.nimh.nih.gov/gangchen_homepage
SSCC/NIMH, National Institutes of Health, Bethesda MD 20892, USA
#+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Introduction
------

 Multilevel Smoothing-Spline (MSS) Modeling

 The linearity assumption surrounding a quantitative variable in common
 practice may be a reasonable approximation especially when the variable
 is confined within a narrow range, but can be inappropriate under some
 circumstances when the variable's effect is non-monotonic or tortuous.
 As a more flexible and adaptive approach, multilevel smoothing splines
 (MSS) offers a more powerful analytical tool for population-level
 neuroimaging data analysis that involves one or more quantitative
 predictors. More theoretical discussion can be found in

 Chen, G., Nash, T.A., Cole, K.M., Kohn, P.D., Wei, S.-M., Gregory, M.D.,
 Eisenberg, D.P., Cox, R.W., Berman, K.F., Shane Kippenhan, J., 2021.
 Beyond linearity in neuroimaging: Capturing nonlinear relationships with
 application to longitudinal studies. NeuroImage 233, 117891.
 https://doi.org/10.1016/j.neuroimage.2021.117891

 Chen, G., Taylor, P.A., Reynolds, R.C., Leibenluft, E., Pine, D.S.,
 Brotman, M.A., Pagliaccio, D., Haller, S.P., 2023. BOLD Response is more
 than just magnitude: Improving detection sensitivity through capturing
 hemodynamic profiles. NeuroImage 277, 120224.
 https://doi.org/10.1016/j.neuroimage.2023.120224

 To be able to run 3dMSS, one needs to have the following R packages
 installed: "gamm4" and "snow". To install these R packages, run the
 following command at the terminal:

 rPkgsInstall -pkgs "gamm4,snow"

 Alternatively you may install them in R:

 install.packages("gamm4")
 install.packages("snow")

 It is best to go through all the examples below to get hang of the MSS
 scripting interface. Once the 3dMSS script is constructed, it can be run
 by copying and pasting to the terminal. Alternatively (and probably better)
 you save the script as a text file, for example, called MSS.txt, and execute
 it with the following (assuming on tc shell),

 nohup tcsh -x MSS.txt &

 or,

 nohup tcsh -x MSS.txt > diary.txt &

 or,

 nohup tcsh -x MSS.txt |& tee diary.txt &

 The advantage of the latter commands is that the progression is saved into
 the text file diary.txt and, if anything goes awry, can be examined later.

Example 1 --- simplest case: one group of subjects with a between-subject
  quantitative variable that does not vary within subject. MSS analysis is
  set up to model the trajectory or trend along age, and can be specified
  through the option -mrr, which is solved via a model formuation of ridge
  regression. Again, the following exemplary script assumes that 'age' is
  a between-subjects variable (not varying within subject):

   3dMSS -prefix MSS -jobs 16                     \
          -mrr 's(age,k=10)'                      \
          -qVars 'age'                            \
          -mask myMask.nii                        \
          -bounds  -2 2                           \
          -prediction @pred.txt                   \
          -dataTable  @data.txt

  The part 's(age,k=10)' indicates that 'age' is modeled via a smooth curve.
  The minimum number of samples should be 6 or more. 'k=10' inside the model
  specification s() sets the number of knots. If the number of data samples (e.g.,
  age) is less than 10, set k to the number of available samples (e.g., 8).

  No empty space is allowed in the model formulation. With the option
  -bounds, values beyond [-2, 2] will be treated as outliers and considered
  as missing. If you want to set a range, choose one that make sense with
  your specific input data.

   The file pred.txt lists all the expl1anatory variables (excluding lower-level variables
   such as subject) for prediction. The file should be in a data.frame format as below:

   label  age
   time1   1
   time2   2
   time3   3
    ...
   time8   8
   time9   9
   time10 10
    ...

   The file data.txt stores the information for all the variables and input data in a
   data.frame format. For example:

   Subj   age   InputFile
   S1      1   ~/alex/MSS/S1.nii
   S2      2   ~/alex/MSS/S2.nii
    ...

  In the output the first sub-brick shows the statistical evidence in the
  form of chi-square distribution with 2 degrees of freedom (2 DFs do not mean
  anything, just for the convenience of information coding). This sub-brick is
  the statistical evidence for the trejectory of the group. If you want to
  estimate the trend at the population level, use the option -prediction with a
  table that codes the ages you would like to track the trend. In the output
  there is one predicted value for each age plus the associated uncertainty
  (standard error). For example, with 10 age values, there will be 10 predicted
  values plus 10 standard errors. The sub-bricks for prediction and standard
  errors are interleaved.


Example 2 --- Largely same as Example 1, but with 'age' as a within-subject
  quantitative variable (varying within each subject). The model is better
  specified by replacing the line of -mrr in Example 1 with the following
  two lines:

          -mrr 's(age,k=10)+s(Subj,bs="re")'    \
          -vt Subj 's(Subj)'                    \

  The part 's(age,k=10)' indicates that 'age' is modeled via a smooth curve.
  The minimum number of samples should be 6 or more. 'k=10' inside the model
  specification s() sets the number of knots. If the number of data samples (e.g.,
  age) is less than 10, set k to the number of available samples (e.g., 8).

  The second term 's(Subj,bs="re")' in the model specification means that
  each subject is allowed to have a varying intercept or random effect ('re').
  To estimate the smooth trajectory through the option -prediction, the option
  -vt has to be included in this case to indicate the varying term (usually
  subjects). That is, if prediction is desirable, one has to explicitly
  declare the variable (e.g., Subj) that is associated with the varying term
  (e.g., s(Subj)). No empty space is allowed in the model formulation and the
  the varying term.

  The full script version is

   3dMSS -prefix MSS -jobs 16                     \
          -mrr 's(age,k=10)+s(Subj,bs="re")'      \
          -vt Subj 's(Subj)'                      \
          -qVars 'age'                            \
          -mask myMask.nii                        \
          -bounds  -2 2                           \
          -prediction @pred.txt                   \
          -dataTable  @data.txt

  All the rest remains the same as Example 1.

  Alternatively, this model with varying subject-level intercept can be
  specified with

          -lme 's(age,k=10)'                      \
          -ranEff 'list(Subj=~1)'                 \

  which is solved through the linear mixed-effect (lme) platform. The -vt is
  not needed when making prediction through the option -prediction. The two
  specifications, -mrr and -lme, would render similar results, but the
  runtime may differ depending on the amount of data and model complexity.


Example 3 --- two groups and one quantitative variable (age). MSS analysis is
  set up to compare the trajectory or trend along age between the two groups,
  which are quantitatively coded as -1 and 1. For example, if the two groups
  are females and males, you can code females as -1 and males as 1. The following
  script applies to the situation when the quantitative variable does not vary
  within subject,

  3dMSS -prefix MSS -jobs 16                      \
          -mrr 's(age,k=10)+s(age,k=10,by=grp)'   \
          -qVars 'age'                            \
          -mask myMask.nii                        \
          -bounds  -2 2                           \
          -prediction @pred.txt                   \
          -dataTable  @data.txt

  The part 's(age,k=10)' indicates that 'age' is modeled via a smooth curve.
  The minimum number of samples should be 6 or more. 'k=10' inside the model
  specification s() sets the number of knots. If the number of data samples (e.g.,
  age) is less than 10, set k to the number of available samples (e.g., 8).

  Use the script below when the quantitative variable varies within subject,

  3dMSS -prefix MSS -jobs 16                      \
          -mrr 's(age,k=10)+s(age,k=10,by=grp)+s(Subj,bs="re")' \
          -vt  Subj 's(Subj)'                     \
          -qVars 'age'                            \
          -mask myMask.nii                        \
          -bounds  -2 2                           \
          -prediction @pred.txt                   \
          -dataTable  @data.txt

  or an LME version:

  3dMSS -prefix MSS -jobs 16                      \
          -lme 's(age,k=10)+s(age,k=10,by=grp)'   \
          -ranEff 'list(Subj=~1)'                 \
          -qVars 'age'                            \
          -mask myMask.nii                        \
          -bounds  -2 2                           \
          -prediction @pred.txt                   \
          -dataTable  @data.txt


Example 4 --- modeling hemodynamic response: this 3dMSS script is
  intended to (1) assess the presence of HRF for one group or (2) compare
  HRFs between two conditions for one group. For first case, each HRF at
  the indiividual level is characterized at 14 time points with a time
  resolution TR = 1.25s. In the second case, obtain the HRF contrast
  between the two conditions. For either case, each individual should have
  14 input files. Two covariates are considered: sex and age.

    3dMSS -prefix output -jobs 16           \
        -lme 'sex+age+s(TR,k=10)'           \
        -ranEff 'list(subject=~1)'          \
        -qVars 'sex,age,TR'                 \
        -prediction @HRF.table              \
        -dataTable  @smooth-HRF.table

  The part 's(TR,k=10)' indicates that 'TR' is modeled via a smooth curve.
  The minimum number of samples should be 6 or more. 'k=10' inside the model
  specification s() sets the number of knots. If the number of data samples (e.g.,
  TR) is less than 10, set k to the number of available samples (e.g., 8).

  The output filename and number of CPUs for parallelization are
  specified through -prefix and -jobs, respectively. The expression
  s() in the model specification indicator '-lme' represents the
  smooth function, and the term 's(TR)' codes the overall HRF profile groups.
  The term 'list(subject=~1)' under the option '-ranEff'
  indicates the random effects for the cross-individual variability in
  intercept. The number of thin plate spline bases was set to the
  default K = 10. The option '-qVars' identifies quantitative
  variables (TR and age in this case plus dummy-coded sex and
  group). The last two specifiers -prediction and -dataTable list one
  table for HRF prediction and another for input data information,
  respectively. The input file 'smooth-HRF.table' is structured in a
  long data frame format:

  subject age sex    TR  InputFile
  s1      29   1     0   s1.Inc.b0.nii
  s1      29   1     1   s1.Inc.b1.nii
  s1      29   1     2   s1.Inc.b2.nii
  s1      29   1     3   s1.Inc.b3.nii
  s1      29   1     4   s1.Inc.b4.nii
  ...

  The factor 'sex' is dummy-coded with 1s and -1s. The following
  table as the input file 'HRF.table' provides the specifications for
  predicted HRFs:

  label   age   sex      TR
  time1   6.2     1      0.00
  time2   6.2     1      0.25
  time3   6.2     1      0.50
  ...


Example 5 --- modeling hemodynamic response: this 3dMSS script is
  intended to (1) compares HRFs under one task condition between the
  two groups of patients (PT) and healthy volunteer (HV) at the
  population level, or (2) assess the interaction between group and
  task condition (2 levels). For the second case, obtain the HRF
  contrast at each time point. In either case, if the HRF is represented
  with 14 time points with a time resolution TR = 1.25s, each individual
  should have 14 input files. Two covariates are considered: sex and age.

  3dMSS -prefix output -jobs 16                       \
        -lme 'sex+age+s(TR,k=10)+s(TR,k=10,by=group)' \
        -ranEff 'list(subject=~1)'                    \
        -qVars 'sex,age,TR,group'                     \
        -prediction @HRF.table                        \
        -dataTable  @smooth-HRF.table

  The part 's(age,k=10)' indicates that 'TR' is modeled via a smooth curve.
  The minimum number of samples should be 6 or more. 'k=10' inside the model
  specification s() sets the number of knots. If the number of data samples (e.g.,
  TR) is less than 10, set k to the number of available samples (e.g., 8).

  The output filename and number of CPUs for parallelization are
  specified through -prefix and -jobs, respectively. The expression
  s() in the model specification indicator '-lme' represents the
  smooth function, and the two terms 's(TR)' and 's(TR,by=group)' code
  the overall HRF profile and the HRF difference between the two
  groups. The term 'list(subject=~1)' under the option '-ranEff'
  indicates the random effects for the cross-individual variability in
  intercept. The number of thin plate spline bases was set to the
  default K = 10. The option '-qVars' identifies quantitative
  variables (TR and age in this case plus dummy-coded sex and
  group). The last two specifiers -prediction and -dataTable list one
  table for HRF prediction and another for input data information,
  respectively. The input file 'smooth-HRF.table' is structured in a
  long data frame format:

  subject age sex group TR  InputFile
  s1      29   1    1   0   s1.Inc.b0.nii
  s1      29   1    1   1   s1.Inc.b1.nii
  s1      29   1    1   2   s1.Inc.b2.nii
  s1      29   1    1   3   s1.Inc.b3.nii
  s1      29   1    1   4   s1.Inc.b4.nii
  ...

  Both 'group' and 'sex' are dummy-coded with 1s and -1s. The following
  table as the input file 'HRF.table' provides the specifications for
  predicted HRFs:

  label   age   sex   group   TR
  g1.t1   6.2     1      1    0.00
  g1.t2   6.2     1      1    0.25
  g1.t3   6.2     1      1    0.50
  ...
  g2.t1   3.5    -1     -1    0.00
  g2.t2   3.5    -1     -1    0.25
  g2.t3   3.5    -1     -1    0.50
  ...


Options in alphabetical order:
------------------------------

   -bounds lb ub: This option is for outlier removal. Two numbers are expected from
         the user: the lower bound (lb) and the upper bound (ub). The input data will
         be confined within [lb, ub]: any values in the input data that are beyond
         the bounds will be removed and treated as missing. Make sure the first number
         less than the second. You do not have to use this option to censor your data!

   -cio: Use AFNI's C io functions, which is default. Alternatively -Rio
         can be used.

   -dataTable TABLE: List the data structure with a header as the first line.

         NOTE:

         1) This option has to occur last in the script; that is, no other
         options are allowed thereafter. Each line should end with a backslash
         except for the last line.

         2) The order of the columns should not matter except that the last
         column has to be the one for input files, 'InputFile'. Each row should
         contain only one input file in the table of long format (cf. wide format)
         as defined in R. Input files can be in AFNI, NIfTI or surface format.
         AFNI files can be specified with sub-brick selector (square brackets
         [] within quotes) specified with a number or label.

         3) It is fine to have variables (or columns) in the table that are
         not modeled in the analysis.

         4) When the table is part of the script, a backslash is needed at the end
         of each line to indicate the continuation to the next line. Alternatively,
         one can save the context of the table as a separate file, e.g.,
         calling it table.txt, and then in the script specify the data with
         '-dataTable @table.txt'. However, when the table is provided as a separate
         file, do NOT put any quotes around the square brackets for each sub-brick,
         otherwise the program would not properly read the files, unlike the
         situation when quotes are required if the table is included as part of the
         script. Backslash is also not needed at the end of each line, but it would
         not cause any problem if present. This option of separating the table from
         the script is useful: (a) when there are many input files so that
         the program complains with an 'Arg list too long' error; (b) when
         you want to try different models with the same dataset.

   -dbgArgs: This option will enable R to save the parameters in a
         file called .3dMSS.dbg.AFNI.args in the current directory
          so that debugging can be performed.

   -help: this help message

   -IF var_name: var_name is used to specify the column name that is designated for
        input files of effect estimate. The default (when this option is not invoked
        is 'InputFile', in which case the column header has to be exactly as 'InputFile'
        This input file for effect estimates has to be the last column.

   -jobs NJOBS: On a multi-processor machine, parallel computing will speed
         up the program significantly.
         Choose 1 for a single-processor computer.

   -lme FORMULA: Specify the fixed effect components of the model. The
         expression FORMULA with more than one variable has to be surrounded
         within (single or double) quotes. Variable names in the formula
         should be consistent with the ones used in the header of -dataTable.
         See examples in the help for details.

   -mask MASK: Process voxels inside this mask only.
          Default is no masking.

   -mrr FORMULA: Specify the model formulation through multilevel smoothing splines.
         Expression FORMULA with more than one variable has to be surrounded
         within (single or double) quotes. Variable names in the formula
         should be consistent with the ones used in the header of -dataTable.
         The nonlinear trajectory is specified through the expression of s(x,k=?)
         where s() indicates a smooth function, x is a quantitative variable with
         which one would like to trace the trajectory and k is the number of smooth
         splines (knots). The default (when k is missing) for k is 10, which is good
         enough most of the time when there are more than 10 data points of x. When
         there are less than 10 data points of x, choose a value of k slightly less
         than the number of data points.

   -prediction TABLE: Provide a data table so that predicted values could be generated for
graphical illustration. Usually the table should contain similar structure as the input
file except that 1) reserve the first column for effect labels which will be used for
sub-brick names in the output for those predicted values; 2) columns for those varying
smoothing terms (e.g., subject) and response variable (i.e., Y) should not be includes.
Try to specify equally-spaced values with a small for the quantitative variable of
modeled trajectory (e.g., age) so that smooth curves could be plotted after the
analysis. See Examples in the help for a couple of specific tables used for predictions.

   -prefix PREFIX: Output file name. For AFNI format, provide prefix only,
         with no view+suffix needed. Filename for NIfTI format should have
         .nii attached (otherwise the output would be saved in AFNI format).

   -qVars variable_list: Identify quantitative variables (or covariates) with
         this option. The list with more than one variable has to be
         separated with comma (,) without any other characters such as
         spaces and should be surrounded within (single or double) quotes.
         For example, -qVars "Age,IQ"
         WARNINGS:
         1) Centering a quantitative variable through -qVarsCenters is
         very critical when other fixed effects are of interest.
         2) Between-subjects covariates are generally acceptable.
         However EXTREME caution should be taken when the groups
         differ substantially in the average value of the covariate.


   -ranEff FORMULA: Specify the random effect components of the model. The
         expression FORMULA with more than one variable has to be surrounded
         within (single or double) quotes. Variable names in the formula
         should be consistent with the ones used in the header of -dataTable.
         In the MSS context the simplest model is "list(Subj=~1)" in which the
         varying or random effect from each subject is incorporated in the model.
         Each random-effects factor is specified within parentheses per formula
         convention in R.

   -Rio: Use R's io functions. The alternative is -cio.

   -sdiff variable_list: This option is used to specify a factor for group comparisons.
         For example, if one wants to compare age trajectory between two groups through
         "s(age,by=group)" in model specification, use "-sdiff 'group'" to generate
         the predicted trajectory of group differences through the values provided in the
         prediction table under the option -prediction. Currently it only allows for one group
         comparison. Perform separate analyses if more than one group comparison is
         desirable. "  .

   -show_allowed_options: list of allowed options

   -vt var formulation: This option is for specifying varying smoothing terms. Two components
         are required: the first one 'var' indicates the variable (e.g., subject) around
         which the smoothing will vary while the second component specifies the smoothing
         formulation (e.g., s(age,subject)). When there is no varying smoothing terms (e.g.,
         no within-subject variables), do not use this option.