One-Way within-Subject ANOVA without Constant
A general structural model (effects form) of one-way within-subject (repeated-measures) ANOVA (3dANOVA2 -type 3) is
Yij = μ + αi + βj + εij    (I)
where
Yij: dependent variable – the regression coefficient (% signal change) from the individual-subject analysis;
μ: constant – grand mean;
αi: constants subject to Σαi = 0 – fixed effect of factor A at level i, i = 1, 2, ..., a;
βj: independent N(0, σp²) – random effect of subject j, j = 1, 2, ..., b (σp²: population, i.e., between-subject, variance);
εij: independent N(0, σ²) – random error, the within-subject variability or the interaction between the factor of interest and subject (σ²: variance of the sampling error).
The assumptions are:
E(Yij) = μ + αi, Var(Yij) = σp² + σ², Cov(Yij, Yi'j) = σp² (i ≠ i'), Cov(Yij, Yi'j') = 0 (j ≠ j');
the correlation between any two levels of factor A is σp²/(σp² + σ²).
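As a quick numerical sanity check on this covariance structure, here is a small simulation sketch (Python/NumPy; the effect sizes and variances are arbitrary illustrative values, not from any real data) that generates data from model (I) and compares the empirical correlation between two levels of factor A with σp²/(σp² + σ²):

    import numpy as np

    rng = np.random.default_rng(0)
    a, b = 4, 100000                  # levels of factor A, subjects (large b for a stable estimate)
    mu = 0.5                          # hypothetical grand mean
    alpha = np.array([0.2, -0.1, 0.3, -0.4])   # hypothetical effects, sum(alpha) = 0
    sigma_p, sigma = 1.0, 0.5         # assumed between- and within-subject SDs

    beta = rng.normal(0, sigma_p, b)               # subject effects beta_j
    eps = rng.normal(0, sigma, (a, b))             # errors eps_ij
    Y = mu + alpha[:, None] + beta[None, :] + eps  # Y_ij, shape (a, b)

    print(np.corrcoef(Y[0], Y[1])[0, 1])           # empirical correlation between two levels
    print(sigma_p**2 / (sigma_p**2 + sigma**2))    # theoretical value: 0.8 here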
However, in some special situations a constant term in model (I) is not desirable for group analysis. For example, suppose we model the hemodynamic response to a condition with a set of basis functions (e.g., tents, gamma variates, stick functions). The coefficients corresponding to that condition from the individual-subject analysis could be summed to obtain the average, or area under the curve (AUC), as a representative measure for group analysis. However, there are several pitfalls with this convenient strategy:
First, some coefficients may be negative, especially in the case of deconvolution (sticks as basis functions), and it is unsettling to have coefficients cancel each other in the sum. To avoid this problem it has been suggested to focus on the TRs around which the response function peaks, but this work-around is still problematic, as it is quite arbitrary to say the least.
Second, to make the summing legitimate for group analysis, the coefficients are assumed to be samples from independent random variables. This is a very strong assumption: most likely the coefficients are sequentially correlated, violating independence.
Third, what we really want to test is the following null hypothesis about the k coefficients
H0: c1 = 0, c2 = 0, ..., ck = 0 (II)
instead of
H0: Σci = 0 (III)
The acceptance regions of the two hypotheses are subtly different in the k-dimensional space (a k-dimensional sphere vs. the slab between two (k-1)-dimensional hyperplanes). One possible failure of testing (III) when contrasting two IRFs is that they might have the same average (or area under the curve) but different shapes.
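To make this concrete, here is a toy sketch (Python/NumPy/SciPy; the coefficient values are made up so that their sum is exactly zero while the individual coefficients are not). It tests (III) with a one-sample t-test on the summed coefficients, and (II) with a Hotelling T² statistic computed by hand, a standard joint test of a mean vector used here purely as an illustration rather than as the ANOVA route pursued below:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n, k = 20, 4                                  # subjects, coefficients per subject
    true_c = np.array([0.5, -0.5, 0.5, -0.5])     # made-up IRF: zero sum, nonzero components
    C = true_c + rng.normal(0, 0.3, (n, k))       # per-subject coefficient estimates

    # Test (III): one-sample t-test on the summed coefficients -- misses the effect
    print("sum test p =", stats.ttest_1samp(C.sum(axis=1), 0.0).pvalue)

    # Test (II): Hotelling T^2 that the whole mean vector is zero -- detects it
    cbar = C.mean(axis=0)
    S = np.cov(C, rowvar=False)                   # k x k sample covariance
    T2 = n * cbar @ np.linalg.solve(S, cbar)
    F = (n - k) / (k * (n - 1)) * T2              # F(k, n-k) under the null
    print("joint test p =", stats.f.sf(F, k, n - k))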
Another example is testing a lower-order simple effect or contrast. Suppose we have a 2×2 perceptual experiment: factor A (category) has 2 levels (animal and tool), and factor B (modality) also has 2 levels (picture and sound). There are 4 regression coefficients from each subject: c11 (animal picture), c12 (animal sound), c21 (tool picture), and c22 (tool sound). Suppose we want to test the effect of animal (or the contrast between animal and tool) regardless of picture or sound. As in the basis-function situation, the correct hypothesis is
H0: c11 = 0 and c12 = 0 (or c11 - c21 = 0 and c12 - c22 = 0)    (IV)
but not
H0: c11 + c12 = 0 (or c11 + c12 - c21 - c22 = 0) in a two-way within-subject ANOVA (this appears to be the analogue of the AUC test (III)); a numerical sketch of the distinction follows.
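In this sketch (hypothetical coefficient values, chosen so that the two animal-vs-tool contrasts cancel in the sum), the joint test of (IV) detects the effect while the summed contrast does not:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    n = 16                                        # subjects
    # hypothetical per-subject coefficients: columns c11, c12, c21, c22
    C = rng.normal([0.6, -0.4, 0.1, 0.1], 0.4, (n, 4))

    # the two animal-vs-tool contrasts for each subject: c11-c21 and c12-c22
    D = np.column_stack([C[:, 0] - C[:, 2], C[:, 1] - C[:, 3]])

    # the summed contrast c11+c12-c21-c22 cancels here and is not significant
    print("sum p =", stats.ttest_1samp(D.sum(axis=1), 0.0).pvalue)

    # hypothesis (IV): both contrasts jointly zero (Hotelling T^2 with k = 2)
    k = D.shape[1]
    dbar = D.mean(axis=0)
    T2 = n * dbar @ np.linalg.solve(np.cov(D, rowvar=False), dbar)
    F = (n - k) / (k * (n - 1)) * T2
    print("joint p =", stats.f.sf(F, k, n - k))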
Both hypotheses (II) and (IV) call for a new formulation of the one-way within-subject (repeated-measures) ANOVA:
Yij = μi + βj + εij    (V)
where all terms have the same meaning as in model (I) except the factor effects (constants) {μi, i = 1, 2, ..., a}, and the assumptions are the same as before except that E(Yij) = μi.
The constant is absent from model (V) because we do not want a common mean removed from each effect; instead we want to test the null hypothesis
H0: μ1 = 0, μ2 = 0, ..., and μa = 0 (VI)
rather than the main effect of factor A. Hypothesis (VI) is an extension of the one-sample t-test to the whole set of factor effects instead of one specific simple effect μi.
Solving (V) as a general linear model is straightforward, but we would like to analyze it in a more computationally economical way, by calculating sums of squares as in the traditional ANOVA approach.
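For reference, here is a minimal sketch of the GLM route on simulated numbers: the design matrix has one column per cell mean μi and one dummy column per subject, with no intercept. Note that when the subject effects are coded as fixed dummies the two blocks are collinear (each block's columns sum to the constant vector), so the least-squares fit is rank-deficient and the μi are identifiable only up to a constant shift absorbed by the βj:

    import numpy as np

    rng = np.random.default_rng(2)
    a, b = 4, 10
    mu_true = np.array([0.3, 0.1, -0.2, 0.4])     # hypothetical cell means mu_i
    Y = mu_true[:, None] + rng.normal(0, 1.0, b)[None, :] + rng.normal(0, 0.5, (a, b))

    # Design matrix for model (V): one column per mu_i, one dummy column per
    # subject, no intercept. lstsq returns the minimum-norm solution for the
    # rank-deficient system.
    X = np.zeros((a * b, a + b))
    for i in range(a):
        for j in range(b):
            X[i * b + j, i] = 1.0        # mu_i column
            X[i * b + j, a + j] = 1.0    # beta_j column
    y = Y.reshape(a * b)                 # row order matches i*b + j above

    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(np.linalg.matrix_rank(X))      # a + b - 1, not a + b
    print(coef[:a])                      # mu_i estimates, up to a constant shift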
Intuitively we would pick MSA* = (1/a)ΣiYi.² (hereafter a dot in place of an index denotes the mean over that index) as the mean squares for hypothesis (VI). With
Yi. = (1/b)ΣjYij = μi + β. + εi.,
Y.j = (1/a)ΣiYij = μ. + βj + ε.j,
Y.. = [1/(ab)]ΣiΣjYij = μ. + β. + ε..,
βj ~ N(0, σp²),
β. = (1/b)Σjβj ~ N(0, σp²/b),
εi. = (1/b)Σjεij ~ N(0, σ²/b),
ε.j = (1/a)Σiεij ~ N(0, σ²/a),
ε.. = [1/(ab)]ΣiΣjεij ~ N(0, σ²/(ab)),
the expected value of MSA* can be derived as follows:
E(MSA*)
= (1/a)E(ΣiYi.²)
= (1/a)E[Σi(μi + β. + εi.)²]
= (1/a)E[Σi(μi² + β.² + εi.² + 2μiβ. + 2μiεi. + 2β.εi.)]
= (1/a)Σi[μi² + E(β.²) + E(εi.²) + 2E(μiβ.) + 2E(μiεi.) + 2E(β.εi.)]
= (1/a)Σi(μi² + σp²/b + σ²/b)    (the cross terms vanish: μi is a constant, and β and ε are independent with zero means)
= (1/a)Σiμi² + (σp² + σ²)/b.
A similar derivation applies to MSS (the mean squares for subject) and MSAS (the mean squares for the interaction between factor A and subject), which are the same as in the traditional one-way within-subject ANOVA:
E(MSS) = aσp² + σ²,
E(MSAS) = σ².
Based on these three terms we could construct an infinite number of F tests for (VI), each built so that the numerator and denominator have equal expectations under (VI), for example
F1 = (abMSA* - MSS)/[(a-1)MSAS],
F2 = abMSA*/[MSS + (a-1)MSAS].
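The three expected mean squares, and the behavior of F1 and F2, are easy to check by Monte Carlo simulation; here is a sketch with arbitrary parameter values:

    import numpy as np

    rng = np.random.default_rng(3)
    a, b = 3, 12
    mu = np.array([0.4, -0.2, 0.1])   # hypothetical fixed effects mu_i; set to zeros for a null run
    sigma_p, sigma = 0.8, 0.5
    nsim = 20000

    ms = np.empty((nsim, 3))
    for s in range(nsim):
        Y = mu[:, None] + rng.normal(0, sigma_p, b)[None, :] + rng.normal(0, sigma, (a, b))
        Yi, Yj, Ybar = Y.mean(axis=1), Y.mean(axis=0), Y.mean()
        MSA = np.sum(Yi**2) / a                          # MSA* = (1/a) sum_i Yi.^2
        MSS = a * np.sum((Yj - Ybar)**2) / (b - 1)       # mean squares for subject
        R = Y - Yi[:, None] - Yj[None, :] + Ybar         # interaction residuals
        MSAS = np.sum(R**2) / ((a - 1) * (b - 1))        # mean squares for interaction
        ms[s] = MSA, MSS, MSAS

    print(ms[:, 0].mean(), np.sum(mu**2) / a + (sigma_p**2 + sigma**2) / b)  # E(MSA*)
    print(ms[:, 1].mean(), a * sigma_p**2 + sigma**2)                        # E(MSS)
    print(ms[:, 2].mean(), sigma**2)                                         # E(MSAS)

    MSA, MSS, MSAS = ms[:, 0], ms[:, 1], ms[:, 2]
    F1 = (a * b * MSA - MSS) / ((a - 1) * MSAS)
    F2 = (a * b * MSA) / (MSS + (a - 1) * MSAS)
    # With mu = 0, the empirical quantiles of F1 and F2 could be compared
    # against candidate F reference distributions (questions 2 and 3 below).
    print(np.percentile(F1, 95), np.percentile(F2, 95))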
My questions are:
(1) Are the above derivations correct?
(2) How should one choose among the infinitely many possible F tests for (VI)?
(3) How are the degrees of freedom determined for the combined terms in these F tests, such as the numerator in F1 and the denominator in F2?
Alternatively, hypothesis (VI) under model (V) is equivalent to testing, in model (I),
H0: μ = 0 and αi = 0, i = 1, 2, ..., a,
which we can break into two hypotheses:
H0: μ = 0
and
H0: αi = 0, i = 1, 2, ..., a.
Since
Y.. = [1/(ab)]ΣiΣjYij = μ + β. + ε..,
E(Y..²) = E[(μ + β. + ε..)²]
= E(μ² + β.² + ε..² + 2μβ. + 2με.. + 2β.ε..)
= μ² + σp²/b + σ²/(ab)
= μ² + (aσp² + σ²)/(ab),
under H0: μ = 0 we have E(abY..²) = aσp² + σ² = E(MSS). An appropriate test for μ = 0 is therefore F = abY..²/MSS ~ F(1, b-1); the second hypothesis, αi = 0 for all i, is the usual main effect of factor A, tested with F = MSA/MSAS ~ F(a-1, (a-1)(b-1)).
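A quick simulation (sketch with arbitrary variance values) can check that this statistic rejects at roughly the nominal 5% rate under μ = 0 when referred to F(1, b-1):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    a, b = 4, 15
    sigma_p, sigma = 1.0, 0.6
    nsim = 20000

    rej = 0
    for _ in range(nsim):
        # model (I) under H0: mu = 0 (with alpha_i = 0 as well, for simplicity)
        Y = rng.normal(0, sigma_p, b)[None, :] + rng.normal(0, sigma, (a, b))
        Ybar = Y.mean()
        Yj = Y.mean(axis=0)
        MSS = a * np.sum((Yj - Ybar)**2) / (b - 1)
        rej += a * b * Ybar**2 / MSS > stats.f.ppf(0.95, 1, b - 1)
    print(rej / nsim)   # close to 0.05 if F(1, b-1) is the right reference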
Another approach would start from the reduced model under null hypothesis (VI):
Yij = βj + εij    (VII)
The errors under models (V) and (VII) are, respectively,
εij = Yij - μi - βj,
εij = Yij - βj,
and the corresponding error sums of squares are
SSEF = ΣiΣj(Yij - Yi. - Y.j + Y..)²,
SSER = ΣiΣj(Yij - Y.j)².
But I could not go any further from here, and thus failed to work out an F formula for hypothesis (VI). One apparent obstacle: SSER - SSEF = bΣi(Yi. - Y..)², which carries only a-1 degrees of freedom, because the common mean of the μi is absorbed into the unconstrained βj; the full-versus-reduced comparison therefore seems to test only the equality of the μi, not (VI) itself. Any suggestions about this approach? The identity is verified numerically below.
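The identity SSER - SSEF = bΣi(Yi. - Y..)² is easy to check numerically (any a×b table will do; the numbers here are simulated):

    import numpy as np

    rng = np.random.default_rng(5)
    a, b = 3, 10
    Y = rng.normal(0.2, 1.0, (a, b))   # arbitrary a x b data table

    Yi, Yj, Ybar = Y.mean(axis=1), Y.mean(axis=0), Y.mean()
    SSEF = np.sum((Y - Yi[:, None] - Yj[None, :] + Ybar)**2)   # full model (V)
    SSER = np.sum((Y - Yj[None, :])**2)                        # reduced model (VII)
    print(SSER - SSEF, b * np.sum((Yi - Ybar)**2))             # equal: the usual SSA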