Resolution of Student's t-tests, ANOVA and analysis of variance components
Lessons in biostatistics
Abstract
Significance testing in comparisons is based on Student's t-tests for pairs and analysis of variance (ANOVA) for simultaneous comparison of several procedures. Access to the average, standard deviation and number of observations is sufficient for calculating the significance of differences using the Student's t-tests and the ANOVA. Once an ANOVA has been calculated, analysis of variance components from summary data becomes possible. Simple calculations based on summary data provide inference on significance testing. Examples are given from laboratory management and method comparisons. It is emphasized that the usual criteria of the underlying distribution of the raw data must be fulfilled.
Key words: analysis of variance; biostatistics; measurement comparisons
Introduction
Comparison of results from different experimental designs, between instruments and between methods is an everyday task in analytical chemistry and its applied sciences, e.g. laboratory medicine. Usually, statistical inference is based on original observations fed into statistical packages that deliver the requested statistics. The obvious procedure is to start with inspecting the raw data to determine the appropriate statistical methods to be used. However, sometimes the raw data may not be available whereas the central tendency (e.g. the average), the dispersion (e.g. the standard deviation) and the number of observations may be. Yet, it may be desirable to evaluate the significance of a difference between datasets. Typical situations may be related to laboratory management and scientific evaluation of reports. We describe how this can be accomplished if the datasets are independent and fulfil the requirements of Student's t-test or analysis of variance (ANOVA). Resolution of an ANOVA table to provide analysis of variance components is particularly discussed.

Methods

Student's t-test

There are two Student's t-tests; one evaluates pairs of results with something in common, known as the dependent test, tdep. The other compares the averages of independent results, tind.

A classic example of a dependent design is comparing the results obtained from the same individuals before and after a treatment. An independent design would be, for instance, comparing the results obtained in groups of healthy men and women. Thus, the tdep considers the difference between every pair of values, whereas the tind only considers the averages, the standard deviation and the number of observations in each group. Access to these intermediary quantities allows calculating the t-value.

To further understand the difference between the t-values and the formal prerequisites and conditions for their use, it may be helpful to consider how they are calculated.

The tdep considers the differences between paired measurements and is calculated by the different but equivalent formats of Equation 1 (Eq.1):

t_dep = d̄ / (s_d / √n) = (Σd_i / n) / (s_d / √n) = Σd_i / (s_d × √n)   (Eq.1)

If expressed in words, tdep is the average of the differences between observations, d̄, divided by its standard error, s_d/√n. Rearrangement of Eq.1 may facilitate calculations. The degrees of freedom, df, is n − 1.

The differences shall be normally distributed. If that distribution is far from normal then using the tdep cannot be justified and a non-parametric test should be applied, e.g. the Wilcoxon test.

The tdep is limited to comparing two sets of dependent individual data, e.g. results before and after an intervention. The datasets must be of equal sizes but need not be normally distributed. If the parameters of Eq.1 are known, i.e. the average difference between pairs and their standard error of the mean, then tdep can be calculated without direct access to the original results. It is unlikely that results of a comparison are reported in this way and the intermediary calculation of tdep does not have a given place in the arsenal.

Student's tind from intermediary data

Access to the average, standard deviation and number of observations but not the original observations allows evaluating the significance of the difference; i.e. when results are presented with only information about the central tendency and data dispersion. Provided the original datasets can be assumed to be normally distributed, the significance (tind) of a difference between the averages can be estimated according to Equation 2 (Eq.2). The tind considers the difference between the averages of two datasets in relation to the square root of the sum of their respective squared standard errors of the mean:

t_ind = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)   (Eq.2)

This standard calculation may be less known or recognized in the era of calculators and statistical packages. The averages and standard deviations are calculated or otherwise available from the original data sets and then entered into Eq.2. If the number of observations in the groups is similar and the standard deviations of the same magnitude, the degrees of freedom, df, is n1 + n2 − 2.

Consider, for instance, that the cholesterol concentration of two groups of healthy men from widely different environments was reported as the averages, standard deviations and numbers of observations (Table 1). The calculated tind using Eq.2 was:

t_ind = (5.0 − 4.4) / √(1.4²/100 + 1.3²/100) = 0.6 / ((1/10) × √(1.4² + 1.3²)) = (0.6 × 10) / √3.65 = 3.14

Although we do not know for certain if the original results were normally distributed – and independent – this may be a reasonable assumption considering that the averages and standard deviations of the data were originally provided.

Table 1. Results of cholesterol concentration measurements in two groups of men

                         Group 1   Group 2
Average                  5.0       4.4
Standard deviation       1.4       1.3
Number of observations   100       100
F-value                  1.16
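Both t statistics can be reproduced from the intermediary quantities alone. The following is a minimal sketch in Python, assuming SciPy is available for the p-values; the function names t_dep and t_ind are illustrative, and the result is checked against the Table 1 example:

```python
from math import sqrt
from scipy import stats

def t_dep(mean_diff, sd_diff, n):
    """Paired (dependent) Student's t from the average difference
    and the SD of the differences (Eq.1); df = n - 1."""
    return mean_diff / (sd_diff / sqrt(n))

def t_ind(mean1, sd1, n1, mean2, sd2, n2):
    """Independent Student's t from summary data (Eq.2);
    df = n1 + n2 - 2 when n and sd are similar in both groups."""
    t = (mean1 - mean2) / sqrt(sd1**2 / n1 + sd2**2 / n2)
    return t, n1 + n2 - 2

# Table 1: cholesterol concentrations in two groups of men
t, df = t_ind(5.0, 1.4, 100, 4.4, 1.3, 100)
p_two = 2 * stats.t.sf(abs(t), df)   # like Excel T.DIST.2T(t, df)
p_one = stats.t.sf(abs(t), df)       # like Excel T.DIST.RT(t, df)
print(round(t, 2), df, round(p_two, 3), round(p_one, 3))  # 3.14 198 0.002 0.001
```

For a paired design, t_dep would be used with df = n − 1, e.g. t_dep(0.2, 0.082, 4) for an average difference of 0.2 with SD 0.082 in four pairs.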
The variances were compared and evaluated by an F-test and found not significantly different. Thus, the df = 100 + 100 − 2 = 198.

The significance of a t-value (however generated) is evaluated by the same t-table, which is available in many textbooks and on the internet (1). The null hypothesis is that there is no difference between the groups. The null hypothesis is discarded if the t-value is above a critical value (usually corresponding to P = 0.05). A calculated t-value can be directly evaluated by the Excel functions T.DIST.2T(t,df) or T.DIST.RT(t,df), the functions referring to a two- or one-tail problem, respectively. The cholesterol values of the two groups are statistically different with P = 0.002 and P = 0.001 for the two- and one-tail problem, respectively. The "critical value" is obtained by the functions T.INV(probability;df) and T.INV.2T(probability;df) for one- and two-tail problems, respectively.

The variances of the distributions as well as the difference between their averages are the important quantities in evaluating the difference between the distributions. The variances are assumed to be equal in the estimation of the df. This can be tested using the "F-test". This test is designed to compare the dispersion (variance) of two datasets with the null hypothesis that there is no difference. The assumption in this case is that one of the variances is larger than the other; this is therefore a one-tailed problem. To fit tables and other calculations the larger of the two variances shall be in the numerator. Consequently, the calculated statistic, the F-value, is always above 1. The farther away from one it is, the larger is the probability that there is a difference between the variances.

To quantify the probability of a difference between the variances, a table should be consulted, but the table data can also be retrieved from Excel. As an example we can evaluate a possible difference between the variances reported in Table 1, F = 1.4² / 1.3² = 1.16. The corresponding probability P = 0.231 (one-tail) is obtained using the function F.DIST.RT(x,df1,df2). This should be compared with the critical F-value (Fcrit) for the desired significance level and the degrees of freedom of the individual datasets, obtained using the function F.INV.RT(p,df1,df2). Since Fcrit = 1.39 and the calculated F-value does not exceed it, the null hypothesis is not discarded and the variances are considered equal. The "RT" (right tail) in these functions limits the calculations to a one-tailed situation where only the upper limit of the right-skew distribution is considered.

If the F-test reveals that there is a high probability of a significant difference between the variances, then estimating the df according to Welch-Satterthwaite should be considered (2,3). A detailed discussion of this procedure is outside the scope of the present paper. The calculated df will not always be an integer and since only the df, not the t-value per se, is affected, the outcome may only indirectly have an effect on the inference. Accordingly, however, Excel offers two tind procedures, "assuming equal variances" and "unequal variances". It is safer to always use the latter; if the variances happen to be similar, the t-value and the degrees of freedom are anyway calculated correctly.

If the datasets are not normally distributed, non-parametric procedures should be used, e.g. the Mann-Whitney test.

ANOVA from intermediary results

If a specific quantity of a given sample is measured repeatedly on several occasions, e.g. using different instruments or on different days, it may be interesting to compare the averages in the groups or from the various occasions. The procedure of choice in this case is the ANOVA. The ANOVA reduces the risk of overestimating the significance of differences caused by chance, which may be an effect of repeated tind.

Since several groups/instruments are studied, observations are repeated in "two directions", within the groups and between the groups. Consequently, the ANOVA reports the variation within the groups and between the groups.

The ANOVA is calculated from the "sum of squares", i.e. the differences between observations and their averages, squared.
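The F-test of the Table 1 variances and the Welch-Satterthwaite df can likewise be evaluated from summary data. A sketch in Python, assuming SciPy is available; stats.f.sf and stats.f.isf play the roles of the Excel functions F.DIST.RT and F.INV.RT, and welch_df is an illustrative helper:

```python
from scipy import stats

sd1, n1, sd2, n2 = 1.4, 100, 1.3, 100        # Table 1
v1, v2 = sd1**2, sd2**2
F = max(v1, v2) / min(v1, v2)                # larger variance in the numerator
p = stats.f.sf(F, n1 - 1, n2 - 1)            # like Excel F.DIST.RT(x, df1, df2)
F_crit = stats.f.isf(0.05, n1 - 1, n2 - 1)   # like Excel F.INV.RT(0.05, df1, df2)

def welch_df(v1, n1, v2, n2):
    """Welch-Satterthwaite degrees of freedom (generally non-integer)."""
    num = (v1 / n1 + v2 / n2) ** 2
    den = (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
    return num / den

print(round(F, 2), round(p, 3), round(F_crit, 2))  # 1.16 0.231 1.39
```

With equal group sizes and similar variances, as here, welch_df returns a value close to n1 + n2 − 2.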
Essentially, this is the same principle as that of calculating the sample variance, i.e. the sum of squares Σ(x_i − x̄)² divided by the df (n − 1).

The following abbreviations are used to describe the calculations involved: SSb, SSw and SStot represent the sums of squares between groups, within groups and in total; MS represents the mean square obtained as SS/df; i denotes an individual group, x̄_i the average of the values in group i, x̄ the average of all observations, s(m) the standard deviation of the values in group m, m the number of groups, n_m the number of observations in group m and N the total number of observations. The symbol Σ is a conventional shorthand, interpreted as the sum of the terms in the adjacent parenthesis.

The stepwise resolution of the example in Table 2 is given in Equations 3, 4 and 5 (Eq.3-5), and also summarized in Table 3.

SS_b = Σ_{i=1..m} n_i × (x̄_i − x̄)² = 0.0294; df = m − 1 = 4; MS_b = SS_b / df = 0.0073   (Eq.3)

SS_w = Σ_{i=1..m} (n_i − 1) × s(i)² = 0.2222; df = N − m = 91; MS_w = SS_w / df = 0.0024   (Eq.4)

SS_tot = SS_w + SS_b = (N − 1) × s_tot² = 0.2516; df = N − 1 = 95   (Eq.5)

Eq.3-5 show that in the calculation of an ANOVA only the averages, the variances, the number of observations in each group and the number of groups are necessary.

The sum of squares may be difficult to visualize, but divided by the degrees of freedom the mean squares (MS) are created. These represent the variances within the groups (MSw) and between the groups (MSb). However, the latter also includes the variance emanating from within the groups, and a correction needs to be considered to estimate the "pure between-group variance". See below Equations 6 and 7 (Eq.6 and 7).
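Eq.3-5 can be turned directly into a summary-data ANOVA. A sketch in Python, assuming SciPy; the function name anova_from_summary and the three groups of measurements are hypothetical illustrations, and the result is verified against scipy.stats.f_oneway on the corresponding raw data:

```python
from scipy import stats
import statistics as st

def anova_from_summary(means, sds, ns):
    """One-way ANOVA (Eq.3-5) from group averages, SDs and group sizes."""
    m, N = len(means), sum(ns)
    grand = sum(n * x for n, x in zip(ns, means)) / N             # average of all observations
    ss_b = sum(n * (x - grand) ** 2 for n, x in zip(ns, means))   # Eq.3
    ss_w = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))         # Eq.4
    ms_b = ss_b / (m - 1)
    ms_w = ss_w / (N - m)
    F = ms_b / ms_w
    p = stats.f.sf(F, m - 1, N - m)
    return F, p

# hypothetical repeated measurements of one sample on three instruments
groups = [[1.40, 1.42, 1.45, 1.41], [1.44, 1.47, 1.46, 1.48], [1.39, 1.41, 1.40, 1.42]]
F, p = anova_from_summary([st.mean(g) for g in groups],
                          [st.stdev(g) for g in groups],
                          [len(g) for g in groups])
F_raw, p_raw = stats.f_oneway(*groups)   # identical result from the raw data
```

The agreement with f_oneway illustrates the point of Eq.3-5: the averages, SDs and group sizes carry all the information the ANOVA needs.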
Table 3. ANOVA of the example in Table 2

          df    SS       MS       F-value   P       F0.05
Between   4     0.0294   0.0073   3.01      0.022   2.47
Within    91    0.2222   0.0024
Total     95    0.2516

df – degrees of freedom. SS – sum of squares. MS – mean square. F0.05 – critical F-value at the 0.05 level.
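The F evaluation in Table 3 can be checked from the mean squares alone; a short Python sketch, assuming SciPy, mirroring the Excel functions named in the text:

```python
from scipy import stats

ms_b = 0.0294 / 4     # MSb = SSb / df, from Table 3
ms_w = 0.2222 / 91    # MSw = SSw / df, from Table 3
F = ms_b / ms_w
p = stats.f.sf(F, 4, 91)             # like Excel F.DIST.RT(F, df1, df2)
F_crit = stats.f.isf(0.05, 4, 91)    # critical F at the 0.05 level
print(round(F, 2), round(p, 3), round(F_crit, 2))  # 3.01 0.022 2.47
```

Since F exceeds F_crit (and P < 0.05), the group averages differ significantly, as Table 3 reports.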
MSb > MSw and thus the ratio MSb / MSw > 1 (µn represents the true averages of the groups). This ratio can be recognized as an F-test and used to evaluate the difference between the groups. The calculated F-value is evaluated in a common F-table or by the Excel function F.DIST.RT(F,df1,df2) or F…

[Figure: repeated measurements of the sample; y-axis Triglycerides (mmol/L), approximately 1.40 to 1.48]
Table 4. Analysis of variance components of the example in Table 2

                                Variance   SD         CV (%)
Pure between component          0.000255   0.016024   1.1
Ditto, adjusted for unbalance   0.000257   0.016024   1.1
Within component                0.002442   0.049414   3.5
Total                           0.002697   0.051947   3.7

SD – standard deviation. CV – coefficient of variation. The number of significant digits is exaggerated to visualize the effect of correction for an unbalanced design.
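The components in Table 4 follow from the mean squares. A sketch under the assumption of a (nearly) balanced design with a common group size n0 = N/m = 96/5 = 19.2 (inferred from the df in Table 3); the unbalance adjustment shown in Table 4 is omitted, so the last digits differ slightly:

```python
from math import sqrt

ms_b = 0.0294 / 4     # between-group mean square (Table 3)
ms_w = 0.2222 / 91    # within-group mean square (Table 3)
N, m = 96, 5          # from the df in Table 3: N - 1 = 95, m - 1 = 4
n0 = N / m            # common group size, balanced approximation

# "pure" between-group variance; set to 0 by convention if ms_b < ms_w
s2_between = (ms_b - ms_w) / n0 if ms_b > ms_w else 0.0
s2_total = s2_between + ms_w          # total variance of a single result
print(round(s2_between, 6), round(s2_total, 6), round(sqrt(s2_total), 4))
# 0.000256 0.002697 0.0519
```

The total variance and its square root agree with the "Total" row of Table 4 to the displayed precision.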
The corresponding standard deviation is √(s_b² + MS_w). The result of the example in Table 2 is summarized in Table 4.

If the MSb is smaller than MSw, their difference (Eq.6) would become negative and the sb cannot be calculated. In such cases the total variance is conventionally set equal to MSw.

A total variance can also be calculated directly from all the observations. However, this approach may over- or underestimate the intra-laboratory variance depending on the between- and within-group variances.

Discussion

Statistical software may produce results irrespective of the validity of the input data, or put another way, the chosen statistical procedure may not be "fit for purpose". It is therefore necessary to understand what is going on "behind the scene". As a bonus, procedures to estimate some test quantities become available. The same conditions regarding normality and equal variances will apply as when using raw data, but since the input data, particularly the standard deviation, already require normality this is usually not a major issue.

The intermediate calculation of an ANOVA may be justified since the results of repeated measurements of a particular quantity will vary randomly. This also applies to the situation when the same sample is measured repeatedly in different laboratories. The use of the "analysis of variance components" procedure can be of great help in finding the root cause of impaired quality of measurements (2,4,5). The use of intermediary data may be particularly useful in managing the quality of conglomerates of laboratories, where access to summary data would allow simple calculation of within- and between-laboratory imprecision and eventually a fair appreciation of the total imprecision.

Potential conflict of interest

None declared.
References
1. NIST/SEMATECH e-Handbook of Statistical Methods. Available at: http://www.itl.nist.gov/div898/handbook/prc/section4/prc44.htm. Accessed February 2nd 2017.
2. Kallner A, ed. Laboratory statistics. 1st ed. Waltham, MA: Elsevier Inc., 2014.
3. Clinical and Laboratory Standards Institute (CLSI). User verification of precision and estimation of bias. CLSI document EP15-A3. 3rd ed. Wayne, PA: CLSI, 2014.
4. Armitage P, Berry G, Matthews JSN, eds. Statistical methods in medical research. 4th ed. Malden, MA: Blackwell Science Ltd., 2008.
5. Cardinal RN. Graduate-level statistics for psychology and neuroscience. Available at: https://egret.psychol.cam.ac.uk/psychology/graduate/Guide_to_ANOVA.pdf. Accessed February 2nd 2017.