
Resolution of Students t-tests, ANOVA and analysis of variance components

The document discusses the application of Student's t-tests and ANOVA for significance testing in biostatistics, particularly when raw data is unavailable. It emphasizes the importance of using summary statistics such as averages, standard deviations, and sample sizes to evaluate differences between datasets. The paper also provides examples and calculations to illustrate the methodologies for analyzing variance components and assessing statistical significance in laboratory settings.


Special issue:

Lessons in biostatistics
Responsible writing in science

Resolution of Student's t-tests, ANOVA and analysis of variance components from intermediary data
Kallner Anders*
Department of Clinical Chemistry, Karolinska University Hospital, Stockholm, Sweden

*Corresponding author: [email protected]

Abstract
Significance testing in comparisons is based on Student's t-tests for pairs and analysis of variance (ANOVA) for simultaneous comparison of several procedures. Access to the average, standard deviation and number of observations is sufficient for calculating the significance of differences using the Student's t-tests and the ANOVA. Once an ANOVA has been calculated, analysis of variance components from summary data becomes possible. Simple calculations based on summary data provide inference on significance testing. Examples are given from laboratory management and method comparisons. It is emphasized that the usual criteria of the underlying distribution of the raw data must be fulfilled.
Key words: analysis of variance; biostatistics; measurement comparisons

Received: October 28, 2016 Accepted: April 4, 2017

Introduction

Comparison of results from different experimental designs, between instruments and between methods is an everyday task in analytical chemistry and its applied sciences, e.g. laboratory medicine. Usually, statistical inference is based on original observations fed into statistical packages that deliver the requested statistics. The obvious procedure is to start with inspecting the raw data to determine the appropriate statistical methods to be used. However, sometimes the raw data may not be available whereas the central tendency (e.g. the average), the dispersion (e.g. the standard deviation) and the number of observations may be. Yet, it may be desirable to evaluate the significance of a difference between datasets. Typical situations may be related to laboratory management and scientific evaluation of reports. We describe how this can be accomplished if the datasets are independent and fulfil the requirements of Student's t-test or analysis of variance (ANOVA). Resolution of an ANOVA table to provide analysis of variance components is particularly discussed.

Methods

Student's t-test

There are two Student's t-tests; one evaluates pairs of results with something in common, known as the dependent test, $t_{dep}$. The other compares the averages of independent results, $t_{ind}$.

A classic example of a dependent design is comparing the results obtained from the same individuals before and after a treatment. An independent design would be, for instance, comparing the results obtained in groups of healthy men and women. Thus, the $t_{dep}$ considers the difference between every pair of values, whereas the $t_{ind}$ only considers the averages, the standard deviation and number of observations in each group. Access to these

https://doi.org/10.11613/BM.2017.026 Biochemia Medica 2017;27(2):253–8



©Copyright by Croatian Society of Medical Biochemistry and Laboratory Medicine. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/) which permits users to read, download, copy, distribute, print, search, or link to the full texts of these articles in any medium or format and to remix, transform and build upon the material, provided the original work is properly cited and any changes properly indicated.
Kallner A. Intermediary data for comparison of measurements

intermediary quantities allows calculating the t-value.

To further understand the difference between the t-values and the formal prerequisites and conditions for their use, it may be helpful to consider how they are calculated.

The $t_{dep}$ considers the differences between paired measurements and is calculated by the different but equivalent formats of Equation 1 (Eq.1):

$$ t_{dep} = \frac{\bar{d}}{s_d / \sqrt{n}} = \frac{\sum d_i / n}{s_d / \sqrt{n}} = \frac{\sum d_i}{s_d \times \sqrt{n}} \qquad \text{(Eq.1)} $$

If expressed in words, $t_{dep}$ is the average of the differences between observations, $\bar{d}$, divided by its standard error ($s_d / \sqrt{n}$). Rearrangement of Eq.1 may facilitate calculations. The degrees of freedom, df, is n − 1.

The differences should be normally distributed. If that distribution is far from normal then using the $t_{dep}$ cannot be justified and a non-parametric test should be applied, e.g. the Wilcoxon test.

The $t_{dep}$ is limited to comparing two sets of dependent individual data, e.g. results before and after an intervention. The datasets must be of equal sizes but need not be normally distributed. If the parameters of Eq.1 are known, i.e. the average difference between pairs and their standard error of the mean, then $t_{dep}$ can be calculated without direct access to the original results. However, results of a comparison are rarely reported in this way, so the intermediary calculation of $t_{dep}$ has no established place in the statistical arsenal.

Student's $t_{ind}$ from intermediary data

Access to the average, standard deviation and number of observations but not the original observations allows evaluating the significance of the difference; i.e. when results are presented with only information about the central tendency and data dispersion. Provided the original datasets can be assumed to be normally distributed, the significance ($t_{ind}$) of a difference between the averages can be estimated according to Equation 2 (Eq.2). The $t_{ind}$ considers the difference between the averages of two datasets in relation to the square root of the sum of their respective squared standard errors of the mean.

$$ t_{ind} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \qquad \text{(Eq.2)} $$

This standard calculation may be less known or recognized in the era of calculators and statistical packages. The averages and standard deviations are calculated or otherwise available from the original data sets and then entered into Eq.2. If the number of observations in the groups is similar and the standard deviations are of the same magnitude, the degrees of freedom, df, is $n_1 + n_2 - 2$.

Consider, for instance, that the cholesterol concentration of two groups of healthy men from widely different environments was reported as the averages, standard deviation and number of observations (Table 1). The calculated $t_{ind}$ using Eq.2 was:

$$ t_{ind} = \frac{5.0 - 4.4}{\sqrt{\frac{1.4^2}{100} + \frac{1.3^2}{100}}} = \frac{0.6}{\frac{1}{10} \times \sqrt{1.4^2 + 1.3^2}} = \frac{0.6 \times 10}{\sqrt{3.65}} = 3.14 $$

Although we do not know for certain if the original results were normally distributed, and independent, this may be a reasonable assumption considering that the averages and standard deviations of the data were originally provided.

Table 1. Results of cholesterol concentration measurements in two groups of men

                          Group 1   Group 2
Average                   5.0       4.4
Standard deviation        1.4       1.3
F-value                   1.16
Number of observations    100       100
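As a minimal sketch (not part of the original paper), Eq.2 and the F-value listed in Table 1 can be reproduced in Python from the summary data alone. The function names `t_ind_from_summary` and `variance_ratio` are illustrative assumptions, and only the standard library is used, so no P-values are computed here.

```python
import math


def t_ind_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for two independent groups using Eq.2.

    df is n1 + n2 - 2, which presumes similar group sizes and
    standard deviations of the same magnitude.
    """
    se1_sq = sd1 ** 2 / n1  # squared standard error of the mean, group 1
    se2_sq = sd2 ** 2 / n2  # squared standard error of the mean, group 2
    t = (mean1 - mean2) / math.sqrt(se1_sq + se2_sq)
    return t, n1 + n2 - 2


def variance_ratio(sd1, sd2):
    """Return the F-value with the larger variance in the numerator."""
    var1, var2 = sd1 ** 2, sd2 ** 2
    return max(var1, var2) / min(var1, var2)


# Summary data from Table 1 (cholesterol, two groups of men).
t, df = t_ind_from_summary(5.0, 1.4, 100, 4.4, 1.3, 100)
f_value = variance_ratio(1.4, 1.3)

print(f"t_ind = {t:.2f}, df = {df}, F = {f_value:.2f}")
# prints: t_ind = 3.14, df = 198, F = 1.16
```

The t-value and F-value obtained this way still have to be evaluated against a table or a statistical function for the chosen significance level.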


The variances were compared and evaluated by an F-test and found not significantly different. Thus, the df = 100 + 100 − 2 = 198.

The significance of a t-value (however generated) is evaluated by the same t-table, which is available in many textbooks and on the internet (1). The null hypothesis is that there is no difference between the groups. The null hypothesis is discarded if the t-value is above a critical value (usually corresponding to P = 0.05). A calculated t-value can be directly evaluated by the Excel functions T.DIST.2T(t, df) or T.DIST.RT(t, df), the functions referring to a two- or one-tail problem, respectively. The cholesterol values of the two groups are statistically different with P = 0.002 and P = 0.001 for the two- and one-tail problem, respectively. The "critical value" is obtained by the functions T.INV(probability, df) and T.INV.2T(probability, df) for one- and two-tail problems, respectively.

The variances of the distributions as well as the difference between their averages are the important quantities in evaluating the difference between the distributions. The variances are assumed to be equal in the estimation of the df. This can be tested using the "F-test". This test is designed to compare the dispersion (variance) of two datasets with the null hypothesis that there is no difference. The assumption in this case is that one of the variances is larger than the other; this is therefore a one-tailed problem. To fit tables and other calculations the larger of the two variances shall be in the numerator. Consequently, the calculated statistic, the F-value, is never below 1. The farther away from one, the larger is the probability that there is a difference between the variances.

To quantify the probability of a difference between the variances, a table should be consulted but the table data can also be retrieved from Excel. As an example we can evaluate a possible difference between the variances reported in Table 1. The F-value is

$$ F = \left( \frac{1.4}{1.3} \right)^2 = 1.16 $$

The corresponding probability P = 0.231 (one-tail) is obtained using the function F.DIST.RT(x, df1, df2). The calculated F-value should be compared with the critical F-value ($F_{crit}$) for the desired significance level and degrees of freedom for the individual datasets, obtained using the function F.INV.RT(p, df1, df2). Since $F_{crit}$ = 1.39 exceeds the calculated F-value of 1.16, the null hypothesis is not discarded and the variances can be regarded as equal. The "RT" (right tail) in these functions limits the calculations to a one-tailed situation where only the upper limit of the right-skewed distribution is considered.

If the F-test reveals that there is a high probability of a significant difference between the variances, then estimating the df according to Welch-Satterthwaite should be considered (2,3). A detailed discussion of this procedure is outside the scope of the present paper. The calculated df will not always be an integer and since only the df, not the t-value per se, is affected, the outcome may only indirectly have an effect on the inference.

Excel accordingly offers two $t_{ind}$ procedures, "assuming equal variances" and "unequal variances". It is safer to always use the latter; if the variances happen to be similar, the t-value and the degrees of freedom are calculated correctly anyway.

If the datasets are not normally distributed, non-parametric procedures should be used, e.g. the Mann-Whitney test.

ANOVA from intermediary results

If a specific quantity of a given sample is measured repeatedly on several occasions, e.g. using different instruments or on different days, it may be interesting to compare the averages in the groups or from the various occasions. The procedure of choice in this case is the ANOVA. The ANOVA reduces the risk of overestimating the significance of differences caused by chance, which may be an effect of repeated $t_{ind}$.

Since several groups/instruments are studied, observations are repeated in "two directions", within the groups and between the groups. Consequently, the ANOVA reports the variation within the groups and between the groups.

The ANOVA is calculated from the "sum of squares", i.e. the differences between observations and their averages, squared. Essentially, this is the same
principle as that of calculating the sample variance, i.e. the sum of squares $\sum (x_i - \bar{x})^2$ divided by the df (n − 1).

The stepwise resolution of the example in Table 2 is given in Equations 3, 4 and 5 (Eq.3-5), and also summarized in Table 3.

$$ SS_b = \sum_{i=1}^{m} n_i \times (\bar{x}_i - \bar{\bar{x}})^2 = 0.0294; \quad df = m - 1 = 4; \quad MS_b = SS_b / df = 0.0073 \qquad \text{(Eq.3)} $$

$$ SS_w = \sum_{i=1}^{m} (n_i - 1) \times s(i)^2 = 0.2222; \quad df = N - m = 91; \quad MS_w = SS_w / df = 0.0024 \qquad \text{(Eq.4)} $$

$$ SS_{tot} = SS_w + SS_b = (N - 1) \times s_{tot}^2 = 0.2516; \quad df = N - 1 = 95 \qquad \text{(Eq.5)} $$

The following abbreviations are used to describe the calculations involved: $SS_b$, $SS_w$ and $SS_{tot}$ represent the sums of squares between groups, within groups and total, respectively; MS represents the mean square obtained as SS/df; $i$ the individual groups, $\bar{x}_i$ the average of the values in group $i$, $\bar{\bar{x}}$ the average of all observations, $s(i)$ the standard deviation of the values in group $i$, $s_{tot}$ the standard deviation of all observations taken together, $m$ the number of groups, $n_i$ the number of observations in group $i$ and $N$ the total number of observations. The symbol Σ is a conventional shorthand symbol, interpreted as the sum of the terms in the adjacent parenthesis.

Eq.3-5 show that in the calculation of an ANOVA only the averages, the variances and the number of observations in each group, and the number of groups, are necessary.

The sum of squares may be difficult to visualize but divided by the degrees of freedom the mean squares (MS) are created. These represent the variances within the groups ($MS_w$) and between the groups ($MS_b$). However, the latter also includes the variances emanating from within the groups and a correction needs to be considered to estimate the "pure between group variance". See below Equations 6 and 7 (Eq.6 and 7).

Table 2. Results from repeated measurements of a sample in five different laboratories

                          Lab 1   Lab 2   Lab 3   Lab 4   Lab 5
Number of observations    18      15      24      21      18
Average                   1.38    1.37    1.42    1.39    1.40
Standard deviation        0.040   0.050   0.060   0.050   0.040

Table 3. The ANOVA analysis based on data from Table 2

            df    SS       MS       F-value   P       F crit (P = 0.05)
Between     4     0.0294   0.0073   3.01      0.022   2.47
Within      91    0.2222   0.0024
Total       95    0.2516
df – degrees of freedom. SS – sum of squares. MS – mean square.
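The stepwise calculation of Eq.3-5 can be sketched in Python using only the summary data of Table 2. This is an illustration under stated assumptions rather than the author's implementation: the function name `anova_from_summary` and the dictionary keys are hypothetical, and the P-value is not computed because the Python standard library has no F-distribution; the F-value would be compared with a table or a statistical function instead.

```python
def anova_from_summary(ns, means, sds):
    """One-way ANOVA table entries from group sizes, averages and SDs."""
    m = len(ns)          # number of groups
    total_n = sum(ns)    # total number of observations, N
    grand_mean = sum(n * x for n, x in zip(ns, means)) / total_n

    ss_b = sum(n * (x - grand_mean) ** 2 for n, x in zip(ns, means))  # Eq.3
    ss_w = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))             # Eq.4
    df_b, df_w = m - 1, total_n - m
    ms_b, ms_w = ss_b / df_b, ss_w / df_w

    return {
        "ss_b": ss_b, "df_b": df_b, "ms_b": ms_b,
        "ss_w": ss_w, "df_w": df_w, "ms_w": ms_w,
        "ss_tot": ss_b + ss_w, "df_tot": total_n - 1,                 # Eq.5
        "f_value": ms_b / ms_w,
    }


# Summary data from Table 2 (five laboratories).
table3 = anova_from_summary(
    ns=[18, 15, 24, 21, 18],
    means=[1.38, 1.37, 1.42, 1.39, 1.40],
    sds=[0.040, 0.050, 0.060, 0.050, 0.040],
)

print(f"Between: df={table3['df_b']} SS={table3['ss_b']:.4f} MS={table3['ms_b']:.4f}")
print(f"Within:  df={table3['df_w']} SS={table3['ss_w']:.4f} MS={table3['ms_w']:.4f}")
print(f"Total:   df={table3['df_tot']} SS={table3['ss_tot']:.4f}")
print(f"F = {table3['f_value']:.2f}")
```

Run on the Table 2 data, the rounded values agree with Table 3 (SS 0.0294 and 0.2222, MS 0.0073 and 0.0024, F = 3.01).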


If the null hypothesis $H_0: \mu_1 = \mu_2 = \dots = \mu_m$ is false then $MS_b > MS_w$ and thus the ratio $MS_b / MS_w > 1$; $\mu_i$ represents the true average of group $i$. This ratio can be recognized as an F-test and used to evaluate the difference between the groups. The calculated F-value is evaluated in a common F-table or by the Excel function F.DIST.RT(F, df1, df2) or F.DIST(F, df1, df2, cumulative) and expressed as the probability of obtaining an F-value at least this large if the null hypothesis were true. Normally, a probability less than 5% (i.e. P < 0.05) is required for statistical significance.

The results of an ANOVA are conventionally reported in a table (Table 3); the underlying results are shown in Figure 1.

[Figure 1: plot of triglycerides (mmol/L) against laboratory number.]
Figure 1. Results from Table 2. The squares represent the averages, the continuous line the standard deviations, and the dotted error bars the standard error of the mean for each laboratory listed in Table 2.

Analysis of variance components

The ANOVA allows defining the between- (reproducibility) and within- (repeatability) group variances. The key elements of the ANOVA table are the MSs.

The $MS_w$ represents the within-group variance whereas the $MS_b$ is a composite measure of the "pure between" ($s_b^2$) and within-group variances. The necessary correction to isolate the pure between-group variance is:

$$ s_b^2 = \frac{MS_b - MS_w}{n} \qquad \text{(Eq.6)} $$

where $n$ is the number of observations in the groups. If the number of observations is the same in every group, the design is "balanced" and $n$ equals the average number of observations in the groups, whereas in an unbalanced design the number of observations differs between the groups and a correction needs to be included:

$$ n_0 = \bar{n} - \frac{s(n_i)^2}{N} \qquad \text{(Eq.7)} $$

where $\bar{n}$ is the average number of observations in the groups, $s(n_i)^2$ the variance of the number of observations over the groups and $N$ the total number of observations (2,4).

However, the correction by subtracting the relative variance of the number of observations over the groups is bound to be small and therefore the average number of observations in the groups is usually appropriate (Table 4). The combined variance is given by Eq.8.

Table 4. The analysis of variance components

                                                  Variance   SD         CV, %
Pure between component                            0.000255   0.016024   1.1
Pure between component, adjusted for unbalance    0.000257   0.016024   1.1
Within component                                  0.002442   0.049414   3.5
Total                                             0.002697   0.051947   3.7
SD – standard deviation. CV – coefficient of variation. The number of significant digits is exaggerated to visualize the effect of correction for an unbalanced design.
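A possible sketch of Eq.6 and Eq.7 applied to the summary data of Table 2 is given below. The helper names `mean_squares` and `pure_between_variance` are hypothetical, and returning zero when $MS_b$ is smaller than $MS_w$ is an assumption made for this illustration. Only the Python standard library is used.

```python
import math
import statistics

# Summary data from Table 2 (five laboratories).
ns = [18, 15, 24, 21, 18]
means = [1.38, 1.37, 1.42, 1.39, 1.40]
sds = [0.040, 0.050, 0.060, 0.050, 0.040]


def mean_squares(ns, means, sds):
    """Return (MS_b, MS_w) computed from summary data (Eq.3 and Eq.4)."""
    total_n = sum(ns)
    grand_mean = sum(n * x for n, x in zip(ns, means)) / total_n
    ss_b = sum(n * (x - grand_mean) ** 2 for n, x in zip(ns, means))
    ss_w = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    return ss_b / (len(ns) - 1), ss_w / (total_n - len(ns))


def pure_between_variance(ms_b, ms_w, ns, adjust=False):
    """Eq.6, optionally using n0 from Eq.7 for an unbalanced design."""
    total_n = sum(ns)
    n = total_n / len(ns)  # average number of observations per group
    if adjust:
        n -= statistics.variance(ns) / total_n  # Eq.7
    # Assumption for this sketch: a negative difference is reported as zero.
    return max(ms_b - ms_w, 0.0) / n


ms_b, ms_w = mean_squares(ns, means, sds)
pure = pure_between_variance(ms_b, ms_w, ns)
pure_adjusted = pure_between_variance(ms_b, ms_w, ns, adjust=True)
total = pure + ms_w  # sum of the between and within components

print(f"pure between variance:      {pure:.6f}")
print(f"adjusted for unbalance:     {pure_adjusted:.6f}")
print(f"adjusted SD:                {math.sqrt(pure_adjusted):.6f}")
print(f"within variance (MS_w):     {ms_w:.6f}")
print(f"total variance:             {total:.6f}")
```

With the Table 2 data the variances round to 0.000255, 0.000257, 0.002442 and 0.002697, matching the variance column of Table 4; the small difference between the first two values illustrates why the average group size is usually an adequate divisor.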

$$ s_b^2 + MS_w \qquad \text{(Eq.8)} $$

This quantity is also called intra-laboratory variance (3). The corresponding intra-laboratory standard deviation is $\sqrt{s_b^2 + MS_w}$. The result of the example in Table 2 is summarized in Table 4.

If the $MS_b$ is smaller than $MS_w$, their difference (Eq.6) would become negative and the $s_b$ cannot be calculated. In such cases the total variance is conventionally set equal to $MS_w$.

A total variance can also be calculated directly from all the observations. However, this approach may over- or underestimate the intra-laboratory variance depending on the between- and within-group variances.

Discussion

Statistical software may produce results irrespective of the validity of the input data or, put another way, the chosen statistical procedure may not be "fit for purpose". It is therefore necessary to understand what is going on "behind the scenes". As a bonus, procedures to estimate some test quantities without access to the original data become available. This may have practical consequences in laboratories' comparisons of results, particularly using Student's independent t-tests, ANOVA and analysis of variance components. The same limitations regarding normality and equal variances will apply as when using raw data but since the input data, particularly the standard deviation, already require normality this is usually not a major issue.

The intermediate calculation of an ANOVA may be justified since the results of repeated measurements of a particular quantity will vary randomly. This also applies to the situation when the same sample is measured repeatedly in different laboratories. The use of the "Analysis of variance components" procedure can be of great help in finding the root cause of impaired quality of measurements (2,4,5). The use of intermediary data may be particularly useful in managing the quality of conglomerates of laboratories where access to summary data would allow simple calculation of within- and between-laboratory imprecision and eventually a fair appreciation of the total imprecision.

Potential conflict of interest

None declared.

References
1. NIST/SEMATECH e-Handbook of Statistical Methods. Available at: http://www.itl.nist.gov/div898/handbook/prc/section4/prc44.htm. Accessed February 2nd 2017.
2. Kallner A, ed. Laboratory statistics. 1st ed. Waltham, MA: Elsevier Inc., 2014.
3. Clinical and Laboratory Standards Institute (CLSI). User verification of precision and estimation of bias. CLSI document EP15 3A. 3rd ed. Wayne, PA: CLSI, 2014.
4. Armitage P, Berry G, Matthews JSN, eds. Statistical methods in medical research. 4th ed. Malden, MA: Blackwell Science Ltd., 2008.
5. Cardinal RN. Graduate-level statistics for psychology and neuroscience. Available at: https://egret.psychol.cam.ac.uk/psychology/graduate/Guide_to_ANOVA.pdf. Accessed February 2nd 2017.
