14.2.12. Chen et al. (2018a). A tail of two sides: Artificially doubled false positive …¶
Introduction¶
Here we present commands used in the following paper:
- Chen G, Cox RW, Glen DR, Rajendra JK, Reynolds RC, Taylor PA (2019). A tail of two sides: Artificially doubled false positive rates in neuroimaging due to the sidedness choice with t-tests. Human Brain Mapping 40:1037-1043.
Abstract: One-sided t-tests are widely used in neuroimaging data analysis. While such a test may be applicable when investigating specific regions and prior information about directionality is present,we argue here that it is often mis-applied, with severe consequences for false positive rate (FPR) control. Conceptually, a pair of one-sided t-tests conducted in tandem (e.g., to test separately for both positive and negative effects), effectively amounts to a two-sided t-test. However, replacing the two-sided test with a pair of one-sided tests without multiple comparisons correction essentially doubles the intended FPR of statements made about the same study; that is, the actual family-wise error (FWE) of results at the whole brain level would be 10% instead of the 5% intended by the researcher. Therefore, we strongly recommend that, unless otherwise explicitly justified, two-sided t-tests be applied instead of two simultaneous one-sided t-tests.
Study keywords: task-block, EPI, MPRAGE, human, control, adult, MNI space, nonlinear align
Main programs:
3dMEMA
, 3dClustSim
, 3dClusterize
, gen_group_command.py
,
3dmask_tool
Download scripts¶
To download, either:
... click the link(s) in the following table (perhaps Rightclick -> “Save Link As…”):
An example of group level analyses with two-tailed testing (using
3dMEMA
,3dClustSim
and3dClusterize
, among others)... or copy+paste into a terminal:
curl -O https://afni.nimh.nih.gov/pub/dist/doc/htmldoc/codex/fmri/media/2018a_ChenEtal/s.nimh_group_level_02_mema_bisided.tcsh
View scripts¶
These scripts were run on the NIH’s Biowulf HPC cluster, hence some of
the variables like $SLURM_CPUS_PER_TASK
, $SLURM_JOBID
, etc.
s.nimh_group_level_02_mema_bisided.tcsh
¶
1#!/bin/tcsh
2
3# ----------------------------------------------------------------------
4# Script: s.nimh_group_level_02_mema_bisided.tcsh
5# Run on: openfmri/ds001_R2.0.4
6# Date : May, 2018
7#
8# Take a group of subjects processed for task-based FMRI, run 3dMEMA
9# on their final statistical results. Clustering is also performed
10# here, using 3dClustSim (with the ACF blur estimation). Some
11# individual volumes of t-statistics and effect estimate maps are also
12# output. This provides an example of using the newest clusterizing
13# program in AFNI, 3dClusterize (an update version of 3dclust).
14#
15# Used as an example of proper two-sided testing in:
16#
17# A tail of two sides: Artificially doubled false positive rates in
18# neuroimaging due to the sidedness choice with t-tests
19#
20# by Chen G, Glen DR, Rajendra JK, Reynolds RC, Cox RW, Taylor PA.
21# Scientific and Statistical Computing Core, NIMH/NIH/DHHS, USA.
22#
23# ... as applied to an earlier processed dataset from *this* study:
24#
25# Some comments and corrections on FMRI processing with AFNI in
26# "Exploring the Impact of Analysis Software on Task fMRI Results"
27#
28# by Taylor PA, Chen G, Glen DR, Rajendra JK, Reynolds RC, Cox RW.
29# Scientific and Statistical Computing Core, NIMH/NIH/DHHS, USA.
30#
31# NOTE: The processing described in "Some comments..." includes sets
32# of processing commands correcting/improving upon some sets of
33# commands run in an earlier study by a different group (see therein
34# for more details). However, even the "NIMH" set of commands still
35# includes some non-ideal features, as described there, for the
36# purposes of comparison with the other group's processing stream.
37# Please read the associated text carefully before using/adapting
38# those scripts.
39#
40# ----------------------------------------------------------------------
41#
42# To run for a single subject, for example:
43#
44# tcsh -ef s.nimh_group_level_02_mema_bisided.tcsh
45#
46# On a cluster you might want to use scratch disk space for faster
47# writing, and a lot of memory and CPUs, depending on the
48# data/processing choices. Syntax might be:
49#
50# sbatch \
51# --partition=SOME_NAMES \
52# --cpus-per-task=32 \
53# --mem=64g \
54# --gres=lscratch:80 \
55# s.nimh_group_level_02_mema_bisided.tcsh
56#
57# ----------------------------------------------------------------------
58
59# ================ Set up paths and many params ======================
60
61set here = $PWD
62set p_0 = /data/NIMH_SSCC
63set path_main = ${p_0}/openfmri/ds001_R2.0.4
64set path_proc = ${path_main}/data_proc_nimh
65set odir = GROUP_LEVEL_nimh
66set path_mema = ${path_proc}/${odir}
67
68# Running 3dMEMA
69set script_mema = do_group_level_nimh.tcsh # generated command file name
70set omema = mema_results.nii.gz # output effect+stats filename
71set groupid = controls # volume label for stat result
72
73# Cluster parameters
74set csim_neigh = 1 # neighborhood; could be NN=1,2,3
75set csim_NN = "NN${csim_neigh}" # other form of neigh
76set csim_sided = "bisided" # test type; could be 1sided, 2sided or bisided
77set csim_pthr = 0.001 # voxelwise thr (was higher, 0.01, in orig study)
78set csim_alpha = 0.05 # nominal FWE
79set csim_pref = "clust_${csim_sided}" # prefix for outputting stuff
80
81# ================== Optional: Biowulf cluster =========================
82
83# This section for possibly running on Biowulf cluster.
84
85#### The following 2 lines could be uncommented to load these modules;
86#### that would likely not be necessary outside of NIH's Biowulf cluster.
87# source /etc/profile.d/modules.csh
88# module load afni R
89
90# set thread count if we are running SLURM
91if ( $?SLURM_CPUS_PER_TASK ) then
92 setenv OMP_NUM_THREADS $SLURM_CPUS_PER_TASK
93endif
94
95# Set temporary output directory; then requires using something like
96# this on the swarm command line: --sbatch '--gres=lscratch:50'.
97# These variables used again *after* afni_proc.py command, if running
98# on Biowulf.
99if( $?SLURM_JOBID )then
100 set tempdir = /lscratch/$SLURM_JOBID/${odir}
101 set usetemp = 1
102else
103 set tempdir = $path_mema
104 set usetemp = 0
105endif
106
107mkdir -p $tempdir
108
109
110# ====================== Voxelwise modeling ==========================
111
112# Create and execute 3dMEMA command. Note the inclusion of the
113# "-options -missing_data 0" flags, to avoid some numerical badness of
114# possible FOV edge effects, etc.
115gen_group_command.py \
116 -command 3dMEMA \
117 -write_script ${tempdir}/${script_mema} \
118 -prefix ${tempdir}/${omema} \
119 -dsets ${path_proc}/sub*/stats.sub*_REML+tlrc.HEAD \
120 -set_labels "$groupid" \
121 -subs_betas 'pumps_demean_vs_ctrl_demean#0_Coef' \
122 -subs_tstats 'pumps_demean_vs_ctrl_demean#0_Tstat' \
123 -options -missing_data 0
124
125tcsh -ef ${tempdir}/${script_mema}
126echo "++ MEMAed!"
127
128cd ${tempdir}
129echo "++ Now in the group stat dir: $PWD"
130
131# ====================== Make group mask ==========================
132
133# Make a group level mask for reporting results. This also determines
134# the volume over which multiple comparisons corrections for voxelwise
135# stats are done.
136# The "-frac 1.0" is to make an intersection mask.
1373dmask_tool \
138 -prefix mask.nii.gz \
139 -input `ls ${path_proc}/sub-*/mask_epi_anat.*.HEAD` \
140 -frac 1.0
141
142# ==================== Calc average smoothness =======================
143
144# Get the line from each blur estimate file that has our desired
145# parameters; we are running REML here and using the ACF smoothness
146# estimation, so the gpattern variable reflects both. For N subjects,
147# the output *1D file contains N rows of the 3 ACF parameters.
148set gpattern = 'err_reml ACF'
149grep -h "$gpattern" ${path_proc}/sub*/blur_est* \
150 | cut -d\ -f1-3 \
151 > group_ACF_ests.1D
152
153# Get the group average ACF parameters.
154set blur_est = ( `3dTstat -mean -prefix - group_ACF_ests.1D\'` )
155echo "++ The group average ACF params are: $blur_est"
156
157# ==================== Cluster simulations =======================
158
159# Simulations for FWE corrected cluster-size inference: make a cluster
160# table based on the simulations, and *that* gets parsed and applied
161# to the actual data.
1623dClustSim \
163 -both \
164 -mask mask.nii.gz \
165 -acf $blur_est \
166 -prefix ClustSim
167
168# Get the volume threshold value from the ClustSim results, based on
169# user's choice of parameters as set above. The "-verb 0" means that
170# just the threshold number of voxels is returned, so we can save that
171# as a variable.
172set clust_thrvol = `1d_tool.py -verb 0 \
173 -infile ClustSim.${csim_NN}_${csim_sided}.1D \
174 -csim_pthr $csim_pthr \
175 -csim_alpha "$csim_alpha"`
176
177# Get the statistic value equivalent to the desired voxelwise p-value,
178# for thresholding purposes. Using the same p-value and sidedness
179# that were selected in the ClustSim results. This program also gets
180# the number of degrees of freedom (DOF) from the header of the volume
181# containing the statistic. The "-quiet" means that only the
182# associated statistic value is returned, so we can save it to a
183# variable.
184#
185# Note: actually, you can do this calculation within 3dClusterize
186# directly now, specifying the threshold as a p-value.
187
188set voxstat_thr = `p2dsetstat -quiet \
189 -pval $csim_pthr \
190 "-${csim_sided}" \
191 -inset "${omema}[${groupid}:t]"`
192
193echo "++ The final cluster volume threshold is: $clust_thrvol"
194echo "++ The voxelwise stat value threshold is: $voxstat_thr"
195
196# ================== Make cluster maps =====================
197
198# Run the 'clusterize' program to make maps of the clusters (sorted by
199# size) and to output a cluster-masked map of the effect estimate (EE)
200# data, in this case %BOLD fluctuation values-- i.e., the stuff we
201# should report.
202#
203# The main inputs are: dataset contain the statistic volume to be
204# thresholded, a mask within which to find clusters, and the ClustSim
205# parameters+outputs. A common addition to this list of inputs is to
206# specify the location of the EE volume in the input dset, for
207# reporting the EE info within any found clusters.
208#
209# SPECS:
210# 3dClusterize \
211# -inset :input dset to get info from; like both stat and eff data
212# -ithr :str label of stat vol in inset; or, use vol index: 1
213# -idat :(opt) str label EE vol in inset; or, use vol index: 0
214# -mask :same mask input to 3dClustSim (here, whole brain)
215# -bisided :specify *type* of test; followed by threshold limit info
216# -NN :what neighborhood definition was used?
217# -clust_nvox :cluster size threshold from 3dClustSim
218# -pref_map :name for cluster map output vol
219# -pref_dat :name for cluster-masked EE output vol
220# > ${csim_pref}_report.txt :dump text table of cluster report to file
221#
222# Note: one-sided testing would require slightly different syntax:
223# '-1sided ...' would not be followed by 2 numbers, as used here for
224# '-bisided ...', but instead by specifying a tail (i.e.,
225# "RIGHT_TAIL") and then a single number (its threshold value). Also,
226# if you use the "p=..." form of specifying a threshold value in
227# 3dClusterize, then the syntax would be a bit different, just using
228# that single argument after even '-bisided ...', for example. Please
229# see the help file for a more full description of the option(s).
230
2313dClusterize \
232 -inset ${omema} \
233 -ithr "${groupid}:t" \
234 -idat "${groupid}:b" \
235 -mask mask.nii.gz \
236 -${csim_sided} -$voxstat_thr $voxstat_thr \
237 -NN ${csim_neigh} \
238 -clust_nvox ${clust_thrvol} \
239 -pref_map ${csim_pref}_map.nii.gz \
240 -pref_dat ${csim_pref}_EE.nii.gz \
241 > ${csim_pref}_report.txt
242
243echo "++ Clusterizing complete."
244
245cd $here
246
247# ===================================================================
248
249# Again, Biowulf-running considerations: if processing went fine and
250# we were using a temporary directory, copy data back.
251if( $usetemp && -d $tempdir )then
252 echo "Copying data from $tempdir to $path_mema"
253 mkdir -p $path_mema
254 \cp -pr $tempdir/* $path_mema/
255endif
256
257# ===================================================================
258
259echo "\n++ DONE with group level stats + clustering!\n"
260
261time ; exit 0