
G Model

ACA 233236 No. of Pages 12

Analytica Chimica Acta xxx (2014) xxx–xxx

Contents lists available at ScienceDirect

Analytica Chimica Acta


journal homepage: www.elsevier.com/locate/aca

Review

Statistical comparison of the slopes of two regression lines: A tutorial


J.M. Andrade a,*, M.G. Estévez-Pérez b

a Department of Analytical Chemistry, University of A Coruña, Campus da Zapateira, E-15008 A Coruña, Spain
b Department of Statistics and Operations Research (Estadística e Investigación Operativa), University of A Coruña, Campus de Elviña, E-15008 A Coruña, Spain

H I G H L I G H T S

• Presents the fundamentals of the Student's t-test, with several associated problems.
• Major emphasis was placed on comparing two slopes.
• Some alternatives are discussed, including ad-hoc bootstrap.
• General rules are given to avoid common misunderstandings.

A R T I C L E  I N F O

Article history:
Received 10 March 2014
Received in revised form 25 April 2014
Accepted 29 April 2014
Available online xxx

Keywords:
Comparison of slopes
Calibration
Monte Carlo
Analysis of covariance
Quality control

A B S T R A C T

Comparing the slopes of two regression lines is an almost daily task in analytical laboratories. The usual procedure is based on a Student's t-test, although the literature differs on whether the standard errors of the slopes or the standard errors of the regressions should be employed to get a pooled standard error. In this work, fundamental concepts on the use of the Student's test were reviewed and Monte Carlo simulations were done to ascertain whether relevant differences arise when the two options are considered. It was concluded that for small sample sets (as is usual in analytical laboratories) the Student's t-test based on the standard error of the regression models must be used, and special attention must be paid to the equality of the model variances. Finally, alternative approaches were reviewed, with emphasis on a simple one based on the analysis of covariance (ANCOVA).

© 2014 Elsevier B.V. All rights reserved.

Contents

1. Introduction
2. Outline of the Student's t-test
   2.1. Comparison of two slopes by the Student's t-test
3. Comparing t_exp and t1_exp
   3.1. Analytical comparison of t_exp and t1_exp
   3.2. Simulation study
   3.3. Software
4. Results and discussion
   4.1. Monte Carlo simulations
   4.2. A practical example
5. The use of dummy variables to compare two slopes

* Corresponding author at: University of A Coruña, Analytical Chemistry, Campus da Zapateira, E-15008 A Coruña, Galicia, Spain. Tel.: +34 981167000; fax: +34 981167065.
E-mail address: [email protected] (J.M. Andrade).

http://dx.doi.org/10.1016/j.aca.2014.04.057
0003-2670/ ã 2014 Elsevier B.V. All rights reserved.

Please cite this article in press as: J.M. Andrade, M.G. Estévez-Pérez, Statistical comparison of the slopes of two regression lines: A tutorial, Anal.
Chim. Acta (2014), http://dx.doi.org/10.1016/j.aca.2014.04.057

6. Some alternatives to the use of the t-test
7. Conclusions
Acknowledgements
References

Dr. Jose Andrade has been a Professor of Analytical Chemistry at the University of A Coruña (Spain) since 2011. His interests are in multivariate data analysis in the environmental and petrochemical fields, FTIR applications to speed industrial quality control, and multivariate regression in atomic spectrometry. He edited a book for the RSC introducing the principles of multivariate calibration in atomic spectrometry and participated in dedicated training courses at several industrial laboratories and various universities.

Dr. Graciela Estévez-Pérez has been a Senior Lecturer of Statistics in the Department of Mathematics, University of A Coruña, since 1998. In 2001, she received her Ph.D. from the University of Santiago de Compostela. Her research interests focus on theoretical statistics (analysis of dependent data and nonparametric functional statistics) with applications in scientific fields such as Geophysics and Biology. More recently she created some statistical packages for the R statistical software and applied them to the treatment of ecological data.

1. Introduction

Analytical chemists have to decide almost daily on the statistical equality of the slopes of two regression straight lines (in short, calibration or, better, standardization [1] lines). This has many relevant practical implications in, for instance, quality control (to ascertain the stability of a device, to set a calibration time delay, etc.), method comparison studies (to assess the equivalence of two procedures) and method development (e.g., to choose a suitable setup), to cite but a few. It is also relevant to determine whether the standard additions method (SAM) should be considered the quantification procedure when measuring aliquots of unknown samples. This involves deciding on the existence of a matrix effect when measuring the analyte in the final aliquots obtained after a sample treatment procedure. If such an effect is demonstrated, the method becomes slow and more expensive because of the need to perform a dedicated calibration per sample, in case the sample treatment cannot be improved. Matrix effects are currently ascertained by testing the statistical equality of the slopes of aqueous- (let us call this the 'direct method', DM, for the purposes of this work) and SAM-based standardization lines. If they are not demonstrated to be statistically different, the laboratory translates this to 'equal slopes', although this is not strictly correct because all we can do is accept or reject a null hypothesis (denoted as H0; H0: the population slopes are statistically equal, which itself is never assured). In case SAM can be avoided, the laboratory throughput will improve and costs will be reduced.

The main issue seems trivial at first glance because, roughly, we are interested in comparing two means (each with its corresponding standard error) and, so, we would resort immediately to the well-known Student's (William Sealy Gosset, 1876–1937) t-test. However, this usual application is not risk-free.

Despite the huge importance this topic has for analytical laboratories, it has not been broadly discussed in the last 13 years or so in the analytical chemistry literature. Furthermore, it is our opinion that the routine (trivialized) use of the Student's test might suffer from a lack of training on the fundamental issues upon which it is rooted. The widespread use of computerized statistical modules may not overcome such a problem and, on the contrary, may increase it. The fundamentals and limitations behind the statistical tests analytical chemists need to apply are, of course, treated in most statistical textbooks. Disappointingly, many times the explanations are too specialized and focus on mathematical arguments, and because of the different jargons and backgrounds of the two sciences, interdisciplinary knowledge is not simple to acquire. However, it is relevant for analysts to understand why calculations are done the way they are and, not less important, to be aware of the limitations of the tests. Hence, it is expected that the notes presented here will help researchers in analytical laboratories become aware of the pitfalls that common practices may undergo.

Surveys made in January 2014 on the ISI Web of Science and SCOPUS using the keyword 'slope' in the title (it was assumed that this would reflect the main issue within the paper, although the limitation of this approach is acknowledged) yielded only two papers in Talanta (not relevant for the discussions here because they dealt with biases of slopes and intercepts when errors are present in both variables and with weighted least squares, without details on how the slopes are compared) and Analytica Chimica Acta (although they include 'slope comparison method' in the titles, they do not focus on the issues considered here); one in TrAC (from 1989) and 10 in Analytical Chemistry (of these, only one was partially related to the issue considered here), one in Analytical and Bioanalytical Chemistry, and none in Chemometrics and Intelligent Laboratory Systems. Analusis offered similar results but for a series of papers dealing with the slope-ranking method to study non-linearity, which is out of the scope of the present discussion, and a paper on method-comparison studies (where slopes are not compared). Opening the range of possible solutions using keywords like 'calibration lines slopes', 'slope comparison calibrations', 'regression lines slopes', 'comparison two slopes', and the like, led to some more results, although not as many as could be thought initially (far fewer papers remained relevant after refining the results using the abstracts). Further searches in other scientific areas showed that, seemingly, the most active field in comparing slopes is the medical (including psychology) field. Some pertinent studies amongst those mentioned above will be referred to in the last section of this work.

The main objective of this paper is to discuss several problems associated with the routine application of the classical and widely-used Student's t-test to compare the slopes of two regression lines. It is organized so that the Student's t-test to compare two means is reviewed first. Next, its application to the regression problem is discussed and a general overview of the inherent problems associated with its use is presented. Finally, some alternatives will be presented as a way to ameliorate some of the problems reviewed in the previous sections, with special emphasis on the use of ANCOVA (analysis of covariance) models.


2. Outline of the Student's t-test

A Student's t-test is any statistical hypothesis test in which the test statistic follows a Student's t distribution when the null hypothesis is supported. Most t-test statistics can be formulated as t_exp = (θ̂ − θ)/SE, θ being a population parameter, θ̂ an estimator of θ and SE the standard error of the estimator (or, equivalently, an estimate of the standard deviation of the estimator) [2].

The most frequently used t-test compares the means of two independent populations. For this, the t-test studies whether the difference between the population means is statistically zero or, stated in other words, whether the confidence interval associated with the subtraction of the population means includes zero. Assuming that the two data series come from normal populations with means μ1 and μ2 and variances σ1² and σ2², respectively, the t-test contrasts the hypothesis H0: μ1 = μ2 versus H1: μ1 ≠ μ2. The analysis must be based only on statistics derived from the experimental data; that is, on the sample means x̄1 and x̄2 (more specifically on their subtraction, D = x̄1 − x̄2, which estimates the true difference between the means) and on their sample variances (usually estimated by s1² and s2²). We know that D (= x̄1 − x̄2) also follows a normal distribution whose true mean is μ1 − μ2 and whose true variance is σ1²/n1 + σ2²/n2, n1 and n2 being the sample sizes of each data series. In symbols, D = x̄1 − x̄2 ~ N(μ1 − μ2, σ1²/n1 + σ2²/n2). This yields the general test posed by Welch [3] and depicted in Eq. (1). It is worth remembering that it is critical to verify that both series follow a normal distribution (in regression analysis the series will be the residuals of the calibration lines); otherwise, non-parametric statistics should be used. Unfortunately, many current applications lack this study.

t_exp = [(x̄1 − x̄2) − (μ1 − μ2)] / √(σ1²/n1 + σ2²/n2)   (1)

In most textbooks the theoretical means μ1 and μ2 do not appear in the previous equation because the null hypothesis assumes that their difference is zero (this is a usual assumption, but it may be another value) and, thus, their subtraction is not written explicitly. Here, we will maintain that criterion to simplify readability and to allow for a direct comparison with other texts. Note that 'sample' is used here not in a chemical sense (part of a physical material to be studied) but as a statistical term (a set of values extracted randomly from an overall population of values). Finally, recall that the denominator in Eq. (1) corresponds to the standard error of the difference, D.

The problem now is that we have to estimate simultaneously the averages and the variances of two a priori unknown populations, from a usually very limited number of laboratory experimental data. Even worse, it is a well-known fact that the true variance of a population is underestimated when small sample sizes are used [4,5] (as usually happens in analytical laboratories due to time and resource constraints).

In order to proceed, we need to estimate as accurately as possible the standard error of the difference and, also very important, the effective degrees of freedom. For this, common practice starts the application of the Student's t-test (or Welch's test) with a preliminary screening of the similarity of the population variances by the Fisher–Snedecor F-test. This is a simple test that consists of dividing the largest variance by the smallest and comparing the ratio to unity (this involves using one-tail tables to get the upper critical levels, as is common). The null hypothesis of the test is H0: the variances are statistically equal (σ1² = σ2²), and the test statistic F = s1²/s2² has a Fisher–Snedecor F distribution with n1 − 1 and n2 − 1 degrees of freedom (dof).

In case the null hypothesis of the F-test cannot be rejected, the equality of the population variances is assumed. Then, since s1² and s2² estimate the same quantity (σ² = σ1² = σ2²) in Eq. (1), it is reasonable to combine them into a unique estimator, s_pool, which is a weighted average of the two variances (Eq. (2)). Finally, the t-test statistic, whose expression can be seen in Eq. (3) (and which can be derived from Eq. (1) by assuming that the population variances are equal and substituting s_pool for that common value), follows a Student's t distribution with n1 + n2 − 2 degrees of freedom (dof) when the null hypothesis holds.

s²_pool = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)   (2)

t_exp = (x̄1 − x̄2) / [s_pool √(1/n1 + 1/n2)]   (3)

Welch tested the performance of this statistic and demonstrated that when n1 = n2 the test is very robust even when the population variances are different, although 'the greater the disparity between the n's the more likely this factor is to bias the test' [6]. Further, the experimental statistic strictly follows a Student's t only when either n1 = n2 or when the two variances are exactly the same (σ1² = σ2²).

If the null hypothesis of the F-test can be rejected, so that the variances are not statistically equal, there is no exact solution to the comparison of the two means (this is the so-called Fisher–Behrens problem) [7–9]. According to Satterthwaite [10], 'the exact distribution of a complex estimate of variance is too involved for everyday use. It is therefore proposed to use, as an approximation to the exact distribution, a chi-square distribution in which the number of degrees of freedom is chosen so as to provide good agreement between the two'. Such an approach was studied, among others, by Fisher and Behrens [6], Cochran, Cox and Edwards [11] and Welch and Satterthwaite themselves, by means of a statistic derived from Eq. (4), which is distributed approximately as a Student's t. The dof must be approximated from the variances and the number of experimental points using the Welch, Smith–Satterthwaite rule [2,3] (Eq. (5) below), an equivalent formulation [7–9,12] or, alternatively and more conveniently, applying Eq. (5b) to approximate the critical t-value [11,13,14]. In Eq. (5b), t1 and t2 are the Student's tabulated values for each series of data at a (100α)% significance level, two tails and ni − 1, i = 1, 2 dof. If the number of data points of each series is the same then t_critical = t1 = t2.

t_exp = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)   (4)

dof = [(s1²/n1) + (s2²/n2)]² / { [(s1²/n1)² / (n1 − 1)] + [(s2²/n2)² / (n2 − 1)] }   (5)

t_critical = (t1·s1² + t2·s2²) / (s1² + s2²)   (5b)

Later, Welch refined his earlier work and, in a small appendix to a paper by Aspin [15], proposed a more accurate estimation of the dof (Eq. (5c) below), so this test is also known as the Aspin–Welch test.

1/dof = c²/(n1 − 1) + (1 − c)²/(n2 − 1), with c = (s1²/n1) / (s1²/n1 + s2²/n2) and n1 ≤ n2   (5c)
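The screening-then-test recipe above can be sketched numerically as follows. This is a minimal illustration with invented data, not code from the paper; the helper name `compare_means` and the 5% screening level are our choices.

```python
import numpy as np
from scipy import stats

def compare_means(x1, x2, alpha=0.05):
    """F-test screening (Fisher-Snedecor), then pooled t-test (Eqs. 2-3)
    or Welch's test with Satterthwaite dof (Eqs. 4-5)."""
    n1, n2 = len(x1), len(x2)
    s1, s2 = np.var(x1, ddof=1), np.var(x2, ddof=1)  # sample variances
    # One-tailed F-test: largest variance over the smallest
    if s1 >= s2:
        p_F = stats.f.sf(s1 / s2, n1 - 1, n2 - 1)
    else:
        p_F = stats.f.sf(s2 / s1, n2 - 1, n1 - 1)
    d = np.mean(x1) - np.mean(x2)
    if p_F > alpha:
        # Variances statistically equal: pool them (Eq. 2) and use Eq. (3)
        s2_pool = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
        t = d / np.sqrt(s2_pool * (1 / n1 + 1 / n2))
        dof = n1 + n2 - 2
    else:
        # Fisher-Behrens situation: Welch statistic (Eq. 4)
        t = d / np.sqrt(s1 / n1 + s2 / n2)
        # Welch-Satterthwaite approximation to the dof (Eq. 5)
        dof = (s1 / n1 + s2 / n2) ** 2 / (
            (s1 / n1) ** 2 / (n1 - 1) + (s2 / n2) ** 2 / (n2 - 1))
    p = 2 * stats.t.sf(abs(t), dof)  # two-tailed p-value
    return t, dof, p
```

In the equal-variance branch the result coincides with `scipy.stats.ttest_ind(x1, x2, equal_var=True)`, and in the Welch branch with `equal_var=False`, which can serve as a cross-check.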


It is pedagogic to indicate that the Aspin–Welch test (Eq. (4)) reduces to Eq. (3) when n1 = n2 and that it can also be used when the variances are statistically equal. Both tests can be written as c·t_f, where c is a constant (different for each test) and t_f denotes a statistic whose distribution follows a Student's t with f degrees of freedom (f is also different for each test). The major problem in estimating them is that c and f are functions of n1, n2 and the ratio of the variances. However, the classical test in Eq. (3) strictly follows a Student's distribution only when the ratio of the variances is strictly one (statistically equal); otherwise it is severely biased, whereas the second criterion (Eq. (4)) is much less liable to bias [16].

As Welch indicated in his work [3], if it is known in advance that σ1 = σ2, then there can be no doubt that the classical test (Eq. (3)) is a better criterion than the second (Eq. (4)). If, however, there exists the possibility that σ1 and σ2 differ and n1 ≠ n2, then Eq. (3) may lead to a biased test and it will be safer to use Eq. (4). Nevertheless, in most cases it is possible to arrange by experimental work in the laboratory that n1 and n2 are equal or almost so and, hence, practically, serious errors will not often occur. It is also worth underlining that Eq. (4) does not yield an exact test for small samples [3]. Curiously, the problem stems from how Gosset (Student) developed his famous test [17]. Influenced by Pearson (with whom he collaborated), he studied the distribution of the probabilities using the Pearson type III curves [18]. When studying the problem of the comparison of two means with unequal population variances, the researchers took the same approach for convenience, although other possibilities may be possible as well, like the use of the Student's distribution itself (later on, the Pearson type VII curve) [16]; something that Welch himself acknowledged when discussing the choice of the criterion to be used [3].

Recall that both the t-test and the F-test take for granted that the 'populations' from which the means and the variances derive are normally distributed. Here, 'population' refers to the residuals of the experimental values (x_experimental − x_mean), which are the values that, strictly speaking, must follow a normal distribution. However, normality is an assumption that often cannot be verified and, many times, is something we impose on the data [6]. Indeed, some authors are somewhat reluctant about the routine use of the F-test due to the actually limited size of the samples handled in laboratories, which makes it difficult to ascertain whether a truly normal behavior occurs [19]. In addition, it has been recommended to decide on the significance of the F-test at the 99% probability level instead of the classical 95% one [20,21]. Substantial statistical research has also been conducted to evaluate the performance of the t-test when the normality assumption is violated. Such studies concluded that, for equal sample sizes, the test is highly robust [2,3]. Fortunately, this condition is usually fulfilled in most routine comparisons of the SAM and DM methods, because it is a simple and straightforward way of working (there might be a difference of at most one calibration solution, the unspiked sample).

2.1. Comparison of two slopes by the Student's t-test

Let us now briefly recall some important hints of linear regression. In analytical chemistry the linear (straight line, first-order polynomial) regression model attempts to describe a deterministic situation where an analytical signal (y) depends on a constant background (intercept, a) and on an analytical parameter (x, usually the concentrations of a series of calibration solutions) that causes such a signal through a multiplicative factor, the slope b; i.e., y = a + bx + ε, ε being the error or disturbance term associated with that model. The latter will be a random term (with a normal distribution) originated from different causes; e.g., instrumental random noise, human reproducibility, changes in the environmental conditions, material variability, etc. The true model parameters (denoted by Greek letters, α and β) are estimated from the experimental data {(xi, yi), i = 1, ..., n} by means of the classical unweighted least squares method (LS), leading to the well-known estimators (denoted by Roman letters): β̂ = b = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and α̂ = a = ȳ − b·x̄. It must be underlined that 'linear regression', a ubiquitous and ambiguous term used most frequently, does not necessarily imply straight-line regression (a cubic polynomial also yields a line).

Unfortunately, LS does not estimate the distribution of the errors (the only supposition on the distribution of the errors in LS is that they have common variance, independent of the concentrations [22]; i.e., they are homoscedastic). Further hypotheses and assumptions are required to gather additional information on the distribution of the parameter estimators and model residuals. Most notably, it is assumed that the analytical signals are normally distributed (recall that no error in x is allowed in the classical LS method). Under this condition, the estimators of the regression parameters are normally distributed: a being N(α, (σ/√n)·√(1 + x̄²/s_x²)) and b being N(β, σ/(√n·s_x)), where s_x is the standard deviation of the x values. Also, the residual variance (often known as the squared standard error of the regression), s²_(y/x), estimates σ² (the variance of the regression, or variance of the model) from the experimental data and is related to a chi-squared distribution with n − 2 dof. More comprehensive details and full mathematical treatments, which are out of the scope of this work, can be found elsewhere (e.g., [22–24]).

In the usual situation where the slopes of the regression lines obtained by SAM and DM must be compared, two samples of data are considered, {(x_i1, y_i1), i = 1, ..., n1} and {(x_j2, y_j2), j = 1, ..., n2}, each one used to estimate its corresponding regression line, y1 = α1 + β1·x1 + ε1 and y2 = α2 + β2·x2 + ε2, respectively. The comparison of the slopes of two regression lines, i.e. the resolution of the null hypothesis test H0: β1 = β2 ⇔ β1 − β2 = 0, can be performed by means of a Student's t-test statistic similar to Eqs. (3) and (4). In fact, note that the hypothesis of normality of the residuals in the regression models guarantees that b1 − b2 ~ N(β1 − β2, σ1²/(n1·s²_x1) + σ2²/(n2·s²_x2)), which yields the statistic Z = (b1 − b2) / √(σ1²/(n1·s²_x1) + σ2²/(n2·s²_x2)) (analogous to Eq. (1)) to compare the population slopes. Here, σi²/(ni·s²_xi) = var(bi), i = 1, 2, are the variances of the slopes (denoted by σ²_bi); and σi², i = 1, 2, denote the residual variances of the models.

Some time must be spent now on looking for a statistic (and its distribution) which is not a function of the population variances, because they cannot be known in advance and cannot be estimated accurately. Historically, this has been done as when comparing two means, i.e., comparing the regression variances (σi², i = 1, 2) using an F-test and generating two different statistics according to its result. However, a misunderstanding may arise here as we have several variances at our disposal: those of the slopes and those of the regression models. The usual practice is to compare the latter [8,13,19,22,25], i.e., to compare the regression variances by means of F_exp = s²_(y/x),large / s²_(y/x),small, which has an F(n1−2, n2−2) distribution. Miller also considered this option, although the equation he proposed for the t-test [26] was incorrect, likely because his source (a book by Edwards [25]) employed an unusual nomenclature, and his more recent textbooks did not discuss this issue.

If the null hypothesis of the F-test cannot be rejected, the regression variances, σ1² and σ2², estimated by s²_(y/x)1 and s²_(y/x)2, are considered equal. Then, as for the means difference test discussed above, we can pool the estimates of the error variances, weighting each by their degrees of freedom (Eq. (6)). Consequently, the t-test


Table 1
Conceptual similarities between the comparison of two mean values and the comparison of the slopes of two regression lines.

Null hypothesis
- Means comparison: $H_0\!: \mu_1 = \mu_2$
- Slopes comparison: $H_0\!: \beta_1 = \beta_2$

Statistic and its distribution
- Means comparison: $\bar{x}_1-\bar{x}_2 \sim N\left(\mu_1-\mu_2,\ \sqrt{\sigma_1^2/n_1+\sigma_2^2/n_2}\right)$, i.e., $\bar{x}_1-\bar{x}_2 \sim N\left(\mu_1-\mu_2,\ \sqrt{\mathrm{var}(\bar{x}_1)+\mathrm{var}(\bar{x}_2)}\right)$, $\sigma_1^2, \sigma_2^2$ being the variances of the random variables in each dataset.
- Slopes comparison: $b_1-b_2 \sim N\left(\beta_1-\beta_2,\ \sqrt{\sigma_1^2/\sum(x_{i1}-\bar{x}_1)^2+\sigma_2^2/\sum(x_{j2}-\bar{x}_2)^2}\right)$, i.e., $b_1-b_2 \sim N\left(\beta_1-\beta_2,\ \sqrt{\mathrm{var}(b_1)+\mathrm{var}(b_2)}\right)$, $\sigma_1^2, \sigma_2^2$ being the variances of the regression model in each group, i.e., the variances of the response variables in each dataset.

F-test
- Means comparison: $F_{\mu_1\mu_2} = s_1^2/s_2^2 \sim F_{n_1-1,\,n_2-1}$
- Slopes comparison: $F_{b_1b_2} = s_{(y/x)_1}^2/s_{(y/x)_2}^2 \sim F_{n_1-2,\,n_2-2}$

t-test assuming equal variances
- Means comparison: $t_{\mu_1\mu_2} = \dfrac{\bar{x}_1-\bar{x}_2}{s_{pool}\sqrt{1/n_1+1/n_2}}$, with $s_{pool}^2 = \dfrac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}$
- Slopes comparison: $t_{b_1b_2} = \dfrac{b_1-b_2}{s_{(y/x),pool}\sqrt{1/\sum(x_{i1}-\bar{x}_1)^2+1/\sum(x_{j2}-\bar{x}_2)^2}}$, with $s_{(y/x),pool}^2 = \dfrac{(n_1-2)s_{(y/x)_1}^2+(n_2-2)s_{(y/x)_2}^2}{n_1+n_2-4}$
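The slope-comparison workflow summarized in the right-hand column of Table 1 (an F-test on the residual variances, then the pooled t-test of Eq. (6)) can be sketched in a few lines. This is a minimal Python sketch of our own (the paper's computations were done with spreadsheets and R), applied to the Example 1 data that appears later in Table 5; the t value can be checked against the reported $t_{exp} \approx 4.07$. Note that from the raw data the F ratio comes out near 1.42, slightly below the 1.45 reported in Section 4.2, which appears to have been formed from the rounded standard errors.

```python
import math

def fit_line(x, y):
    """OLS fit of y = a + b*x; returns slope b, residual variance s2_yx, and ssx."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    ssx = sum((v - xb) ** 2 for v in x)
    b = sum((u - xb) * (v - yb) for u, v in zip(x, y)) / ssx
    a = yb - b * xb
    s2_yx = sum((v - (a + b * u)) ** 2 for u, v in zip(x, y)) / (n - 2)
    return b, s2_yx, ssx

def compare_slopes(x1, y1, x2, y2):
    """F-test of the residual variances, then the pooled t of Eq. (6)."""
    b1, s21, ssx1 = fit_line(x1, y1)
    b2, s22, ssx2 = fit_line(x2, y2)
    n1, n2 = len(x1), len(x2)
    f = max(s21, s22) / min(s21, s22)            # compare against F(n1-2, n2-2)
    s2_pool = ((n1 - 2) * s21 + (n2 - 2) * s22) / (n1 + n2 - 4)
    t = (b1 - b2) / math.sqrt(s2_pool * (1 / ssx1 + 1 / ssx2))
    return f, t, n1 + n2 - 4                     # t has n1 + n2 - 4 dof

# Example 1 of Table 5 (Cu calibration, DM vs. SAM responses):
x = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
y_dm = [0.003, 0.061, 0.123, 0.170, 0.215, 0.272]
y_sam = [0.109, 0.152, 0.191, 0.242, 0.295, 0.321]
f, t, dof = compare_slopes(x, y_dm, x, y_sam)
print(round(f, 2), round(t, 2), dof)  # F well below F(4,4) = 6.39; t close to 4.07, 8 dof
```

Because the F ratio does not exceed the tabulated F(4, 4) value, the residual variances can be pooled and the Eq. (6) statistic applies directly.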

statistic given by Eq. (6) follows a $t_{n_1+n_2-4}$ distribution when the null hypothesis is supported.

$$t_{exp} = \frac{b_1-b_2}{\sqrt{s_{(y/x),pool}^2\left(\dfrac{1}{\sum(x_{i,1}-\bar{x}_1)^2}+\dfrac{1}{\sum(x_{i,2}-\bar{x}_2)^2}\right)}},\quad \text{with } s_{(y/x),pool}^2 = \frac{(n_1-2)s_{(y/x)_1}^2+(n_2-2)s_{(y/x)_2}^2}{n_1+n_2-4} \quad (6)$$

On the contrary, other authors [20,27–29] considered the standard error of the slopes instead of the standard error of the regressions and proposed Eq. (7) as an alternative. Some of them [20] also proposed the statistic $F'_{exp} = s_{b,large}^2/s_{b,small}^2$ to decide whether $s_{b_1}^2$ and $s_{b_2}^2$ can be pooled.

$$t'_{exp} = \frac{b_1-b_2}{\sqrt{s_{b,pool}^2\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}},\quad \text{with } s_{b,pool}^2 = \frac{(n_1-2)s_{b_1}^2+(n_2-2)s_{b_2}^2}{n_1+n_2-4} \quad (7)$$

These two 'alternative' equations need some further comments and, indeed, most of this paper is devoted to analyzing the $t'_{exp}$ statistic. With regards to $F'_{exp}$, it cannot be used because, although the variance of the residuals ($s_{y/x}^2$, the squared standard error of the regression) has a distribution related to a chi-squared, the estimated variance of the slope ($s_b^2$) does not (in fact, its distribution is not known in advance and it is not possible to assure that it will follow an F distribution). Note that $F_{exp} = s_{(y/x),large}^2/s_{(y/x),small}^2$ is the quotient between two chi-squared-distributed quantities and, consequently, has a Snedecor's F distribution with the appropriate dof.

With respect to the statistic in Eq. (7), it assumes that $t'_{exp}$ has a Student's t-distribution with $n_1+n_2-4$ dof under $H_0\!: \beta_1 = \beta_2$. However, we could not find detailed justifications for that approach. We believe that this result, which is not mathematically correct (as the next section will demonstrate), was obtained as a plain generalization of the Student's t statistic, see Eqs. (2) and (3), to the case of comparing two slopes. Observe that Eq. (7) is obtained from Eqs. (2) and (3) by replacing $s_1^2$ and $s_2^2$ with $s_{b_1}^2$ and $s_{b_2}^2$, respectively. It is worth underlining that, although the $t'_{exp}$ statistic appears more intuitive given the concepts learned when studying the Student's t-test, it does not follow a $t_{n_1+n_2-4}$ distribution when $H_0$ is true.

The question of why the statistic in Eq. (7), based on the standard error of the slopes ($s_b^2$), cannot be used to test $H_0\!: \beta_1 = \beta_2$ is addressed in simple, conceptual terms here; an empirical demonstration will be given in the next section. Table 1 shows the analogy between the means-comparison problem and the slopes-comparison one.

Finally, we have to consider the situation where the null hypothesis of the F-test can be rejected. The residual variances of the models are not equal and there is no exact test to compare the slopes, as we face the same situation as when comparing two averages. In this case, all authors consider the same equation (Welch's test [3]) a suitable approach, and Eq. (8) is used. The statistic $t_{exp}$ follows a Student's t-distribution with $f$ dof, which can be calculated using the expression shown in Eq. (8) or with the equivalent of Eq. (5c) (which was applied less routinely), see Eq. (8b) [22]:

$$t_{exp} = \frac{b_1-b_2}{\sqrt{s_{b_1}^2+s_{b_2}^2}} = \frac{b_1-b_2}{\sqrt{\dfrac{s_{(y/x)_1}^2}{\sum(x_{i1}-\bar{x}_1)^2}+\dfrac{s_{(y/x)_2}^2}{\sum(x_{j2}-\bar{x}_2)^2}}},\quad \text{with } f = \frac{\left[\dfrac{s_{(y/x)_1}^2}{\sum(x_{i1}-\bar{x}_1)^2}+\dfrac{s_{(y/x)_2}^2}{\sum(x_{j2}-\bar{x}_2)^2}\right]^2}{\dfrac{\left(s_{(y/x)_1}^2/\sum(x_{i1}-\bar{x}_1)^2\right)^2}{n_1}+\dfrac{\left(s_{(y/x)_2}^2/\sum(x_{j2}-\bar{x}_2)^2\right)^2}{n_2}} \quad (8)$$

$$\frac{1}{dof} = \frac{c^2}{n_1-2}+\frac{(1-c)^2}{n_2-2},\quad \text{with } c = \frac{s_{(y/x)_1}^2/\sum(x_{i1}-\bar{x}_1)^2}{\left(s_{(y/x)_1}^2/\sum(x_{i1}-\bar{x}_1)^2\right)+\left(s_{(y/x)_2}^2/\sum(x_{j2}-\bar{x}_2)^2\right)}\ \text{and}\ n_1 \neq n_2 \quad (8b)$$

As happened when comparing the means, if $\sigma_1 = \sigma_2$ strictly, the statistic in Eq. (6) has a better sensitivity than Eq. (8); i.e., any real difference $\beta_1-\beta_2$ will be detected more frequently by the $t_{exp}$ of Eq. (6) than by that of Eq. (8). However, if $\sigma_1 \neq \sigma_2$, $n_1 \neq n_2$ and $n_1 s_{x_1}^2 \neq n_2 s_{x_2}^2$, the tests based on Eqs. (6) and (8) are both biased, but the bias of the former is higher than the bias of the latter. See [3] for a more detailed, technical discussion. Although the equations above seem hard to apply, they are not, because quantities like $s_{(y/x)_1}^2/\sum(x_{i1}-\bar{x}_1)^2$ are obtained immediately from any spreadsheet or statistical package under the heading 'squared standard error of the slope' ($s_{b_1}^2$).

Eventual arguments 'in favor' of the $t'_{exp}$ statistic might be: (i) it seems more intuitive (from a pedagogic perspective) and linked with the concepts learned when studying the Student's t-test; (ii) as $s_b$ is calculated by dividing $s_{y/x}$ by the sum of squares in the abscissa, they are related immediately; (iii) the sentence '... the choice of the criterion must depend on what sort of departure from

the hypothesis under test we are most interested in detecting ...' [3] suggests that Eq. (7) arose as a way to stress that the analyst is interested in comparing the slopes, regardless of whether one calibration happens to be more precise than another. In most situations the analytical procedure is not changed, only the calibration stage (SAM vs. DM), and thus it was assumed that the standard errors of the regressions would be comparable. Under this assumption, it might have appeared reasonable to use $s_b^2$ instead of $s_{y/x}^2$ in the calculations.

Arguments against the use of $t'_{exp}$ are: (i) $t'_{exp}$ does not follow a $t_{n_1+n_2-4}$ distribution when $H_0$ is true, and therefore it does not provide correct results when applied to the comparison of slopes, as the next section will demonstrate; (ii) Eq. (6) is supported by strict derivations (although some approximations are required), whereas Eq. (7) is a plain analogy to the original Student's t-test (as commented above); (iii) the classical t-test used to compare a mean against a given value and/or another mean considers the variance(s) of the random variable(s) under scrutiny. In regression analysis the random variables are the response variables (y), whose variances yield the variance of the models (i.e., $s_{y/x}^2$); therefore, those are the variances to be used, as the slope is not a variable but a parameter (a constant).

3. Comparing $t_{exp}$ and $t'_{exp}$

This section is dedicated to comparing the statistics given in Eqs. (6) and (7) that motivated the discussion in this paper. An analytical comparison of both statistics is made first, to test whether they are equivalent. Then, their behavior is shown through a complete simulation study.

3.1. Analytical comparison of $t_{exp}$ and $t'_{exp}$

A simple way to compare $t_{exp}$ and $t'_{exp}$ is by means of their squared quotient, because both expressions have the same numerator and their squared denominators are simpler to manage. If they yield statistically equal results, the ratio is expected to be around one. The ratio, after some algebraic treatment, is given in Eq. (9):

$$\frac{(t'_{exp})^2}{t_{exp}^2} = \frac{(b_1-b_2)^2\,/\,\left[s_{b,pool}^2(1/n_1+1/n_2)\right]}{(b_1-b_2)^2\,/\,\left[s_{(y/x),pool}^2\left(1/(n_1 s_{x_1}^2)+1/(n_2 s_{x_2}^2)\right)\right]} = \frac{\left[(n_1-2)s_{b_1}^2 ss_{x_1}+(n_2-2)s_{b_2}^2 ss_{x_2}\right]/(n_1+n_2-4)\cdot\left(1/ss_{x_1}+1/ss_{x_2}\right)}{\left[(n_1-2)s_{b_1}^2+(n_2-2)s_{b_2}^2\right]/(n_1+n_2-4)\cdot\left(1/n_1+1/n_2\right)} = \frac{n_1 n_2}{n_1+n_2}\left(1+\frac{(n_1-2)s_{b_1}^2(ss_{x_1}/ss_{x_2})+(n_2-2)s_{b_2}^2(ss_{x_2}/ss_{x_1})}{(n_1-2)s_{b_1}^2+(n_2-2)s_{b_2}^2}\right) \quad (9)$$

being $ss_{x_1} = \sum(x_{i,1}-\bar{x}_1)^2$ and $ss_{x_2} = \sum(x_{i,2}-\bar{x}_2)^2$ the squared sums of the explanatory variables $x_1$ and $x_2$, respectively.

As the second addend of Eq. (9) is non-negative, $(t'_{exp}/t_{exp})^2 > n_1 n_2/(n_1+n_2)$. In addition, when $n_1, n_2 \geq 3$, a situation that we will always find in practice, the previous quotient is greater than 1. That is, the alternative statistic ($t'_{exp}$, Eq. (7)) will always take values greater than $t_{exp}$, causing the test based on $t'_{exp}$ to be less conservative than the latter. This means that $t'_{exp}$ would reject $H_0\!: \beta_1 = \beta_2$ for slopes that, in reality, are statistically equal, and it would thus lead to more false positives.

In laboratories it is usual to set the two sample sizes equal ($n_1 = n_2 = n$) and, so, $(t'_{exp}/t_{exp})^2 = (n/2)\left(1+\left(s_{b_1}^2(ss_{x_1}/ss_{x_2})+s_{b_2}^2(ss_{x_2}/ss_{x_1})\right)/\left(s_{b_1}^2+s_{b_2}^2\right)\right) > n/2$. Even if the squared sums of the explanatory variables were equal ($ss_{x_1} = ss_{x_2}$), the ratio would result in $(t'_{exp}/t_{exp})^2 = n$; this is possible when the sample points of the two regressions ($x_1$ and $x_2$) are the same. Therefore, the larger the sample sizes are, the larger the difference between the statistics (and so, between the tests) becomes. As an example, consider two calibrations which differ in just a point, although with equal $ss_x$ values; then Eq. (9) yields a value of 5.5 for the ratio between the two squared statistics, i.e., $t'_{exp}$ would be about 2.3 times $t_{exp}$ and, accordingly, the contrast of $H_0$ might conclude on different decisions.

3.2. Simulation study

The equations above show that the difference between the two statistics depends on the sample sizes but also on the distribution of the calibration solutions (through the sum of squares of the explanatory variables, $ss_x$) and on the standard errors of the slopes. A complete simulation study with two different objectives has been undertaken. On one hand, the behavior of the tests has been compared in several situations, assuming that $t_{exp}$ and $t'_{exp}$ have a $t_{n_1+n_2-4}$ distribution when the null hypothesis ($\beta_1 = \beta_2$) is supported. On the other hand, the sample distribution of $t'_{exp}$, under $H_0\!: \beta_1 = \beta_2$, has been contrasted with $t_{n_1+n_2-4}$ using the Kolmogorov–Smirnov test.

To evaluate the performance of $t_{exp}$ and $t'_{exp}$ in the different situations which could emerge in laboratory practice, three scenarios were simulated: (A) coincident regression lines, (B) coincident slopes and (C) different slopes. Several true values for the parameters of the models ($a_i$, $b_i$ and $s$) were considered. In addition, in each scenario, three situations for the explanatory variables were considered; see Table 2 for details. The values used to make the simulations (sample sizes, slopes, intercepts and variances of the regressions) corresponded to particular atomic spectrometry calibrations carried out in our laboratory, but this does not reduce the validity of the conclusions explained hereinafter. For each scenario, each situation, and each combination of parameters, 1000 pairs of regression lines were generated considering the constraints given in Table 2. Then, for each pair of straight lines, $t_{exp}$ and $t'_{exp}$ were calculated. The average of the p-values over the R = 1000 trials and the acceptance proportions (i.e., the percentage of times, out of 1000, in which the null hypothesis was accepted) at the 5% level were computed.

In addition, the distribution of $t'_{exp}$ (under the assumption $H_0\!: \beta_1 = \beta_2$) was ascertained. For this, we considered the 1000 replicates of the $t'_{exp}$ statistic obtained for a particular scenario and situation, achieving in this way a random sample to estimate the $t'_{exp}$ distribution. Next, the Kolmogorov–Smirnov goodness-of-fit test was used to compare the distribution of $t'_{exp}$ with a t-distribution with $n_1+n_2-4$ dof ($H_0\!: t'_{exp} \sim t_{n_1+n_2-4}$), generating a p-value and a decision at the 5% level of significance. This process was repeated $R_{KS}$ = 100 times ($R_{KS}$ meaning repeats of the Kolmogorov–Smirnov test); that is, for each scenario and situation, 100 independent random samples (replicates) of the $t'_{exp}$ distribution (each of size 1000) were obtained and, for each replicate, the Kolmogorov–Smirnov test was applied. The average of the p-values over the $R_{KS}$ = 100 replicates and the acceptance proportions at the 5% significance level were computed.

3.3. Software

The simulation experiments were carried out using in-house routines written in the R language (R Development Core Team 2013) [30], which are available upon request.

4. Results and discussion

4.1. Monte Carlo simulations

The first part of the simulation study compared the criteria defined by Eqs. (6) and (7). In current laboratory practices where

Table 2
Different scenarios, situations and parameters considered in the simulation study. a = ordinate, b = slope, s = variance of the model, [I, L] = interval where the abscissas take values (the subscript, 1 or 2, indicates the calibration number).

Situation X1: the explanatory variables take the same values ($x_{1i} = x_{2j}$, $i = 1, \ldots, n_1$; $j = 1, \ldots, n_2$)
- Scenario A (coincident regression lines): $a_1 = a_2 = 0.15$; $b_1 = b_2 = 0.1$; $n_1 = n_2 \in \{3, 5, 7, 9\}$; $s \in \{0.01, 0.05, 0.1\}$; $[I_1, L_1] = [I_2, L_2] = [0, 5]$
- Scenario B (coincident slopes): $a_1 = 0.15$, $a_2 = 0.0$; $b_1 = b_2 = 0.1$; $n_1 = n_2 \in \{3, 5, 7, 9\}$; $s \in \{0.01, 0.05, 0.1\}$; $[I_1, L_1] = [I_2, L_2] = [0, 5]$
- Scenario C (different slopes): $a_1 = 0.15$, $a_2 = 0.0$; $b_1 = 0.1$, $b_2 = 0.06$; $n_1 = n_2 \in \{3, 5, 7, 9\}$; $s \in \{0.01, 0.05, 0.1\}$; $[I_1, L_1] = [I_2, L_2] = [0, 5]$

Situation X2: the explanatory variables take different values on the same interval ($x_{1i} \neq x_{2j}$)
- Scenarios A, B and C: same parameter values as in X1, except that $n_1, n_2 \in \{3, 5, 7, 9\}$ may differ; $[I_1, L_1] = [I_2, L_2] = [0, 5]$

Situation X3: the explanatory variables take different values on different intervals ($x_{1i} \neq x_{2j}$)
- Scenarios A, B and C: same parameter values as in X1, except that $n_1, n_2 \in \{3, 5, 7, 9\}$ may differ; $[I_1, L_1] = [0, 5]$ and $[I_2, L_2] = [0, 10]$
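The design of Table 2 can be made concrete with a short Monte Carlo sketch. The paper's routines were written in R; the following Python re-implementation is our own, simplified to scenario A, situation X1 ($n_1 = n_2 = 5$, $s$ = 0.05), and reproduces the qualitative behavior reported later in Table 3: the Eq. (6) test accepts H0 near the nominal 95%, while the Eq. (7) variant accepts far less often.

```python
import math, random

random.seed(1)

def fit(x, y):
    """OLS fit; returns slope, residual variance, and sum of squares of x."""
    n = len(x); xb = sum(x) / n; yb = sum(y) / n
    ssx = sum((v - xb) ** 2 for v in x)
    b = sum((u - xb) * (v - yb) for u, v in zip(x, y)) / ssx
    a = yb - b * xb
    s2 = sum((v - (a + b * u)) ** 2 for u, v in zip(x, y)) / (n - 2)
    return b, s2, ssx

def statistics(x1, y1, x2, y2):
    """Return the statistics of Eq. (6) and Eq. (7) for a pair of lines."""
    b1, s21, ssx1 = fit(x1, y1); b2, s22, ssx2 = fit(x2, y2)
    n1, n2 = len(x1), len(x2)
    s2p = ((n1 - 2) * s21 + (n2 - 2) * s22) / (n1 + n2 - 4)
    t = (b1 - b2) / math.sqrt(s2p * (1 / ssx1 + 1 / ssx2))          # Eq. (6)
    s2bp = ((n1 - 2) * s21 / ssx1 + (n2 - 2) * s22 / ssx2) / (n1 + n2 - 4)
    tp = (b1 - b2) / math.sqrt(s2bp * (1 / n1 + 1 / n2))            # Eq. (7)
    return t, tp

# Scenario A, situation X1: identical true lines, identical abscissas on [0, 5]
x = [0.0, 1.25, 2.5, 3.75, 5.0]
a_true, b_true, sigma = 0.15, 0.1, 0.05
tcrit = 2.447                      # two-tailed t(0.975, 6 dof); n1 + n2 - 4 = 6
acc_t = acc_tp = 0
R = 1000
for _ in range(R):
    y1 = [a_true + b_true * v + random.gauss(0, sigma) for v in x]
    y2 = [a_true + b_true * v + random.gauss(0, sigma) for v in x]
    t, tp = statistics(x, y1, x, y2)
    acc_t += abs(t) <= tcrit
    acc_tp += abs(tp) <= tcrit
print(acc_t / R, acc_tp / R)   # Eq. (6) stays near 0.95; Eq. (7) falls well below
```

The acceptance proportion for Eq. (7) lands around 0.7 for n = 5, matching the corresponding cell of Table 3.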

the slopes of two calibration procedures are to be compared, the most common situations are those denoted as X1 and X2 (see Table 2) and, so, only the results corresponding to them will be shown. However, simulations were made for all the situations, and the conclusions agreed with those presented next. The average p-values over the R = 1000 trials and the acceptance proportions at the 5% level derived from the Monte Carlo studies for the statistics given in Eqs. (6) and (7) are summarized in Tables 3 and 4, for situations X1 and X2 respectively. We will focus our explanations on the X1 situation because the results and conclusions for the other two situations (X2 and X3) were very similar and, therefore, they will not be repeated, for brevity.

Inspection of Tables 3 and 4 reveals that the test given by Eq. (6) provides very satisfactory results in scenarios A and B (where the slopes are equal), even when the noise has a large variance. In all cases, the average p-values move around 0.5 (please bear in mind that this is the average value of the p-value in any hypothesis test where H0 is true), and the empirical acceptance probabilities are close to the nominal one (0.95). It is also worth mentioning that this fact does not depend on the sample sizes (see Tables 3 and 4, different n values). In addition, we considered simulations with larger sample sizes (e.g., n = 10, 20, 50, although they are not included in the paper for brevity), and the results were totally similar to those shown above.

On the contrary, the behavior of the test given by Eq. (7) is very poor, and it worsens when n increases. It can be seen that the empirical acceptance probabilities dropped to levels close to 0.5 (instead of being around 0.95).

In scenario C, where the slopes are truly different by design, the p-values and the empirical acceptance probabilities are close to zero. This means that the null hypothesis is rejected, i.e., the slopes are found to be different, which in this case is correct, except for the cases where the noise had a large variance (sigma = 0.1), because in such cases it is hard to distinguish the regressions. The numerical study was also carried out with somewhat larger sample sizes (n = 10, 20, 50), and the results showed the consistency of the tests because, when n is large, both tests are able to differentiate the slopes (i.e., they improve their performance).

It is worth clarifying that, because $t'_{exp}$ will always be greater than $t_{exp}$ (as pointed out in Section 3.2), the p-values and the acceptance probabilities of H0 will be lower for the former statistic. This might suggest, in scenario C, a superiority of $t'_{exp}$ versus $t_{exp}$. Nevertheless, such 'superiority' is false because the distribution of $t'_{exp}$ was observed not to follow a Student's t-distribution with $n_1+n_2-4$ dof, as was assumed in the first part of the simulation study. This will be presented in more detail below.

To study whether the alternative statistic follows a t distribution, the analysis of the distribution of $t'_{exp}$, under the assumption $H_0\!: \beta_1 = \beta_2$, was carried out only for scenario A, for each of the three situations X1, X2 and X3 (Table 2). Note that the distributions of $t_{exp}$ and $t'_{exp}$ in scenarios A and B are identical because they are independent of $a_1$ and $a_2$ (the intercepts of the regression models). Moreover, it is not possible to study the distribution of the statistics under H0 in scenario C because that hypothesis is not supported (the slopes are different by design). As for the previous part of the simulation study, only the results obtained for situation X1 are presented.

In this part of the study, the Kolmogorov–Smirnov test was used to compare the distributions of both $t_{exp}$ and $t'_{exp}$ with a t-distribution with $n_1+n_2-4$ dof. Recall that this hypothesis is the basis of the Student's t-test to compare the slopes of two regression lines (see Section 2.1). The average of the p-values throughout the 100 $R_{KS}$ trials and the obtained acceptance proportions at the 5% significance level were computed. In all cases, the test based on Eq. (6) gave average p-values around 0.5 and empirical acceptance probabilities close to the nominal one (0.95). However, for the test based on Eq. (7), both the average p-values and the empirical acceptance probabilities were equal to 0 in all trials. Fig. 1 shows the experimentally estimated distributions of $t_{exp}$ and $t'_{exp}$ versus the t-distribution with $n_1+n_2-4$ dof for one of the 100 trials and three different possibilities for the variances of the statistics under comparison. As can be seen in these graphs, the distribution of the $t_{exp}$ statistic fits the t-distribution (with $n_1+n_2-4$ dof) well for each value of $n_1$, $n_2$ and $s$. However, the distribution of the $t'_{exp}$ statistic clearly departs from the t-distribution, the difference being much larger when $n_1$, $n_2$ and $s$ increase.

Additional simulation studies were carried out and similar conclusions were obtained in all cases. Besides, the biased character of the statistic given by Eq. (6) was observed throughout the simulations, as Welch pointed out [3]. Indeed, if the condition $\sigma_1 = \sigma_2$ is not satisfied strictly, the test in Eq. (6) is biased when either the sample sizes or the sums of squares of the explanatory variables are different.
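The distributional check above can be approximated without statistical tables. The paper uses the one-sample Kolmogorov–Smirnov test against the theoretical $t_{n_1+n_2-4}$ reference; as a dependency-free sketch of our own, the following Python code instead runs a two-sample comparison between a simulated $t_{exp}$ sample (which, per Section 4.1, behaves like the reference distribution) and a $t'_{exp}$ sample, using the standard large-sample 5% threshold:

```python
import math, random

random.seed(42)

def fit(x, y):
    n = len(x); xb = sum(x) / n; yb = sum(y) / n
    ssx = sum((v - xb) ** 2 for v in x)
    b = sum((u - xb) * (v - yb) for u, v in zip(x, y)) / ssx
    a = yb - b * xb
    s2 = sum((v - (a + b * u)) ** 2 for u, v in zip(x, y)) / (n - 2)
    return b, s2, ssx

def pair_stats(x, sigma):
    """One simulated pair of identical true lines; returns (t_exp, t'_exp)."""
    y1 = [0.15 + 0.1 * v + random.gauss(0, sigma) for v in x]
    y2 = [0.15 + 0.1 * v + random.gauss(0, sigma) for v in x]
    b1, s21, ssx = fit(x, y1); b2, s22, _ = fit(x, y2)
    n = len(x)
    s2p = ((n - 2) * (s21 + s22)) / (2 * n - 4)
    t = (b1 - b2) / math.sqrt(s2p * 2 / ssx)          # Eq. (6)
    s2bp = ((n - 2) * (s21 + s22) / ssx) / (2 * n - 4)
    tp = (b1 - b2) / math.sqrt(s2bp * 2 / n)          # Eq. (7)
    return t, tp

def ks_two_sample(a, b):
    """Maximum gap between the two empirical CDFs (two-sample KS statistic)."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            i += 1
        else:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

x = [0.0, 1.25, 2.5, 3.75, 5.0]
samples = [pair_stats(x, 0.05) for _ in range(1000)]
t_sample = [s[0] for s in samples]
tp_sample = [s[1] for s in samples]

d = ks_two_sample(t_sample, tp_sample)
d_crit = 1.358 * math.sqrt(2 / 1000)   # approximate 5% threshold, n = m = 1000
print(d > d_crit)                      # the two distributions clearly differ
```

The observed D statistic is several times the 5% threshold, mirroring the paper's finding that $t'_{exp}$ does not follow the assumed t-distribution.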

Table 3
Average p-values ('p-value') and acceptance proportions at nominal level 0.05 ('Accept 0.05') over 1000 trials for situation X1; 'sigma' is the variance of the model (see Table 2). Within each n, the three columns correspond to sigma = 0.01, 0.05 and 0.1.

Scenario A                  n = 3              n = 5              n = 7              n = 9
t_exp   p-value       0.499 0.508 0.494  0.490 0.497 0.514  0.503 0.499 0.481  0.507 0.499 0.488
t_exp   Accept 0.05   0.943 0.963 0.962  0.952 0.944 0.949  0.961 0.957 0.952  0.954 0.949 0.941
t'_exp  p-value       0.371 0.372 0.357  0.268 0.279 0.298  0.237 0.235 0.222  0.212 0.211 0.194
t'_exp  Accept 0.05   0.853 0.875 0.880  0.683 0.697 0.701  0.588 0.585 0.552  0.531 0.517 0.515

Scenario B                  n = 3              n = 5              n = 7              n = 9
t_exp   p-value       0.499 0.509 0.494  0.490 0.497 0.514  0.503 0.499 0.481  0.507 0.499 0.488
t_exp   Accept 0.05   0.943 0.963 0.962  0.952 0.944 0.949  0.961 0.957 0.952  0.954 0.949 0.941
t'_exp  p-value       0.371 0.372 0.357  0.268 0.279 0.298  0.237 0.235 0.222  0.212 0.211 0.194
t'_exp  Accept 0.05   0.853 0.875 0.880  0.683 0.697 0.701  0.588 0.585 0.552  0.531 0.517 0.515

Scenario C                  n = 3              n = 5              n = 7              n = 9
t_exp   p-value       0.018 0.224 0.401  0.000 0.138 0.340  0.000 0.092 0.308  0.000 0.066 0.293
t_exp   Accept 0.05   0.017 0.811 0.905  0.000 0.544 0.832  0.000 0.386 0.765  0.000 0.292 0.759
t'_exp  p-value       0.004 0.129 0.279  0.000 0.040 0.164  0.000 0.015 0.117  0.000 0.008 0.097
t'_exp  Accept 0.05   0.000 0.530 0.770  0.000 0.135 0.436  0.000 0.052 0.322  0.000 0.021 0.245
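The worsening of $t'_{exp}$ with growing n seen in Table 3 is exactly what Eq. (9) predicts. A quick numerical check of our own: when both calibrations share the same abscissas, $(t'_{exp}/t_{exp})^2 = n$ holds exactly, whatever the simulated responses.

```python
import math, random

random.seed(7)

def fit(x, y):
    n = len(x); xb = sum(x) / n; yb = sum(y) / n
    ssx = sum((v - xb) ** 2 for v in x)
    b = sum((u - xb) * (v - yb) for u, v in zip(x, y)) / ssx
    a = yb - b * xb
    s2 = sum((v - (a + b * u)) ** 2 for u, v in zip(x, y)) / (n - 2)
    return b, s2, ssx

n = 7
x = [i * 5 / (n - 1) for i in range(n)]   # same abscissas for both lines
y1 = [0.15 + 0.1 * v + random.gauss(0, 0.05) for v in x]
y2 = [0.15 + 0.1 * v + random.gauss(0, 0.05) for v in x]

b1, s21, ssx = fit(x, y1)
b2, s22, _ = fit(x, y2)

s2_pool = ((n - 2) * s21 + (n - 2) * s22) / (2 * n - 4)
t = (b1 - b2) / math.sqrt(s2_pool * (2 / ssx))              # Eq. (6)
s2b_pool = ((n - 2) * s21 / ssx + (n - 2) * s22 / ssx) / (2 * n - 4)
tp = (b1 - b2) / math.sqrt(s2b_pool * (2 / n))              # Eq. (7)

print(tp / t)   # equals sqrt(n) whenever the two designs share the same abscissas
```

Since the alternative statistic is inflated by the factor $\sqrt{n}$ in this common design, its false-positive rate necessarily grows with n, as the Table 3 columns show.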

Table 4
Average p-values ('p-value') and acceptance proportions at nominal level 0.05 ('Accept 0.05') over 1000 trials for situation X2; 'sigma' is the variance of the model (see Table 2). Within each n, the three columns correspond to sigma = 0.01, 0.05 and 0.1.

Scenario A                  n = 3              n = 5              n = 7              n = 9
t_exp   p-value       0.503 0.506 0.505  0.506 0.517 0.493  0.510 0.504 0.493  0.513 0.504 0.506
t_exp   Accept 0.05   0.949 0.955 0.960  0.951 0.952 0.953  0.951 0.946 0.939  0.950 0.949 0.942
t'_exp  p-value       0.370 0.372 0.367  0.292 0.297 0.276  0.252 0.242 0.234  0.219 0.221 0.215
t'_exp  Accept 0.05   0.870 0.882 0.890  0.679 0.711 0.669  0.592 0.586 0.569  0.552 0.513 0.524

Scenario B                  n = 3              n = 5              n = 7              n = 9
t_exp   p-value       0.503 0.506 0.505  0.506 0.517 0.493  0.510 0.504 0.493  0.513 0.504 0.506
t_exp   Accept 0.05   0.949 0.955 0.960  0.951 0.952 0.953  0.951 0.946 0.939  0.950 0.949 0.942
t'_exp  p-value       0.370 0.372 0.367  0.292 0.297 0.276  0.252 0.242 0.234  0.219 0.221 0.215
t'_exp  Accept 0.05   0.870 0.882 0.890  0.679 0.711 0.669  0.592 0.586 0.569  0.552 0.513 0.524

Scenario C                  n = 3              n = 5              n = 7              n = 9
t_exp   p-value       0.010 0.227 0.394  0.001 0.134 0.353  0.000 0.087 0.302  0.000 0.063 0.280
t_exp   Accept 0.05   0.007 0.805 0.921  0.000 0.533 0.838  0.000 0.396 0.785  0.000 0.288 0.755
t'_exp  p-value       0.003 0.130 0.270  0.000 0.038 0.172  0.000 0.016 0.114  0.000 0.009 0.083
t'_exp  Accept 0.05   0.000 0.535 0.778  0.000 0.123 0.465  0.000 0.065 0.301  0.000 0.019 0.229
Fig. 1. Estimated density of $t'_{exp}$ and $t_{exp}$ versus the t-distribution with $n_1+n_2-4$ dof for one of the 100 trials and three situations: (a) small variance (s = 0.01), (b) medium variance (s = 0.05) and (c) large variance (s = 0.1).
4.2. A practical example

Finally, two examples are shown (Table 5) where $t_{exp}$ and $t'_{exp}$ were calculated and yielded different conclusions. Example 1 studies whether electrothermal atomic absorption spectrometry (ETAAS) required a SAM standardization methodology instead of the DM one [31]. Example 2 was intended to check whether an ionization suppressor improved some flame atomic absorption spectrometry (FAAS) measurements. In both cases, the basic statistics of the first-order polynomial straight line were obtained first (this can be done straightforwardly using common spreadsheets). Next, the F-test was applied to compare the squared standard errors ($s_{y/x}^2$) of the two regressions in each example; the experimental values were clearly lower than the tabulated values (example 1: $F_{exp}$ = 1.45; example 2: $F_{exp}$ = 1.12) and so the null hypothesis could not be rejected (i.e., the variances of the two fits, direct calibration and standard additions, were not different), regardless of the significance level being considered (5% or 1%). Finally, Eqs. (6) and (7) were applied. It was found that the calculated t-statistics ($t_{exp}$ and, alternatively, $t'_{exp}$) were higher than the tabulated values (the p-values were lower than 0.05 or 0.01; one tail, 5% or 1%, respectively) and, therefore, the null hypothesis could be rejected (the slopes were not comparable). The exception occurred for example 2, where $t_{exp}$ and $t'_{exp}$ led to different conclusions (the latter being incorrect, as discussed above).

5. The use of dummy variables to compare two slopes

The Student's t-test can only be used to compare the slopes of two regression lines. When more than two slopes need to be

Table 5
Two examples where application of the alternative t-test formulations yielded different decisions when using the direct calibration method (DM) and the standard additions one (SAM) ($s_b$ = standard error of the slope, $s_a$ = standard error of the ordinate, r = correlation coefficient).

Example 1 (absorbance vs. concentration, mg Cu L-1):

Concentration   DM      SAM
0.00            0.003   0.109
10.0            0.061   0.152
20.0            0.123   0.191
30.0            0.170   0.242
40.0            0.215   0.295
50.0            0.272   0.321

Slope ± s_b:      DM 0.0053 ± 0.0001    SAM 0.0044 ± 0.0002
Intercept ± s_a:  DM 0.0082 ± 0.0043    SAM 0.1083 ± 0.0051
s_y/x:            DM 0.0059             SAM 0.0071
r:                DM 0.9986             SAM 0.9971
t_exp = 4.07 (p-value = 3.5 × 10^-3); t'_exp = 9.97 (p-value = 8.7 × 10^-6)

Example 2 (absorbance vs. concentration, mg L-1):

Concentration   DM      SAM
0.00            0.010   0.105
1.00            0.100   0.185
2.20            0.204   0.295
3.10            0.283   0.367
3.90            0.335   0.418
5.00            0.450   0.525
6.00            0.535   0.605

Slope ± s_b:      DM 0.0869 ± 0.0014    SAM 0.0832 ± 0.0014
Intercept ± s_a:  DM 0.0106 ± 0.0050    SAM 0.1050 ± 0.0046
s_y/x:            DM 0.0073             SAM 0.0069
r:                DM 0.9993             SAM 0.9994
t_exp = 1.96 (p-value = 7.6 × 10^-2); t'_exp = 5.18 (p-value = 4.0 × 10^-4)
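The conflicting decisions of Example 2 are easy to reproduce. A minimal Python sketch of our own, using the Example 2 data of Table 5: the correct statistic of Eq. (6) does not exceed the two-tailed critical value for 10 dof, while the flawed Eq. (7) statistic does.

```python
import math

def fit_line(x, y):
    """OLS fit; returns slope, residual variance s2_yx, and ssx."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    ssx = sum((v - xb) ** 2 for v in x)
    b = sum((u - xb) * (v - yb) for u, v in zip(x, y)) / ssx
    a = yb - b * xb
    s2 = sum((v - (a + b * u)) ** 2 for u, v in zip(x, y)) / (n - 2)
    return b, s2, ssx

x = [0.0, 1.0, 2.2, 3.1, 3.9, 5.0, 6.0]
y_dm = [0.010, 0.100, 0.204, 0.283, 0.335, 0.450, 0.535]
y_sam = [0.105, 0.185, 0.295, 0.367, 0.418, 0.525, 0.605]

b1, s21, ssx1 = fit_line(x, y_dm)
b2, s22, ssx2 = fit_line(x, y_sam)
n1 = n2 = 7

s2_pool = ((n1 - 2) * s21 + (n2 - 2) * s22) / (n1 + n2 - 4)
t = (b1 - b2) / math.sqrt(s2_pool * (1 / ssx1 + 1 / ssx2))        # Eq. (6)

s2b_pool = ((n1 - 2) * s21 / ssx1 + (n2 - 2) * s22 / ssx2) / (n1 + n2 - 4)
tp = (b1 - b2) / math.sqrt(s2b_pool * (1 / n1 + 1 / n2))          # Eq. (7)

crit = 2.228   # two-tailed t(0.975, 10 dof)
print(round(t, 2), round(tp, 2), t < crit < tp)   # ~1.96 and ~5.18: only Eq. (7) rejects H0
```

Accepting H0 with Eq. (6) (slopes comparable) is the correct decision here; the rejection produced by Eq. (7) is the false positive discussed in Section 3.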

compared the Analysis of Covariance procedure (ANCOVA) is To check that the residual variance is the same in the two
required. This is the analogous situation to the comparison of regressions, a common F-test statistic must be applied; if more
several averages, which needs the ANOVA test (ANalysis Of than two regressions are to be compared, other tests have to be
VAriance). Surprisingly, this old approach [23] has not been applied, like those of Levene, Bartlett or Cochran [4,5].
applied frequently in analytical chemistry, although it was The analysis of the model in Eq. (10) requires studying at least
revisited recently [22] and it is implemented in some commercial two hypotheses:
software. Good news is that if dedicated statistical commercial
software is not available, the procedure can be programmed 1. Test for coincidence. The exact coincidence of the two
straightforwardly in any popular spreadsheet. In addition, it allows regression lines can be evaluated testing the null hypothesis
us to circumvent elegantly the discussion on which variances must H0: b2 = b3 = 0 (this is a joint test for the two coefficients, with
be used to carry out the t statistic above. the corresponding theoretical values for the coefficients equal
ANCOVA tests H0 : b1 ¼ b2 ¼ ::: ¼ bk with the alternative to zero, which simplifies the calculation quite a lot). If H0
hypothesis that the regression lines were not derived from cannot be rejected it results that the ordinates and the slopes
populations with equal slopes. It can also be used to compare are statistically equal and, so, both regression lines are
only two standardization lines (H0: b1 = b2 versus H1: b1 6¼ b2). In identical. If H0 is rejected, we must continue with a test for
this case it results in an F-test value that is equal to the t-test squared. Thus the t-test and the ANCOVA will always return the same decision when there are two lines; that is, the t-test becomes a special case of ANCOVA. Furthermore, ANCOVA also tests whether the intercepts are equal for those regression lines which come from populations with equal slopes.
Here we present a user-oriented approach for the particular case of two regression lines. More technical details and how to compare several regression lines can be found elsewhere [23,32].
To test the hypothesis that two slopes are equal, that is, that the two regression lines are parallel, we need to calculate a unique statistical model which describes the relationship between the analytical signal (y) and its (major) cause (i.e., the analyte or the property being studied, x) for the two regression lines. The following model will do just that:

y = b0 + b1x + b2z + b3xz + e    (10)

There, e is the error or disturbance term associated with that model and z is an 'independent' dummy variable. This takes the value of zero for the data points associated with one regression (e.g., the SAM standardization) and the value of 1 for the other data points (e.g., the DM regression). When z = 0 only one of the calibrations is considered, say, y = b0 + b1x + e; whereas when z = 1 the regression will be y = (b0 + b2) + (b1 + b3)x + e.
ANCOVA is intended for situations where the variances of the analytical signal (y) are independent of the concentrations (homoscedasticity), and the errors are normal and independent.
… parallelism, if the interest of the analysis is equality of slopes, or a test for equality of intercepts, if the interest of the analysis is in comparing the intercepts.
2. Test for parallelism. This is the issue we need to address, in general. It consists of testing the significance of the interaction term xz in the model (Eq. (10)); thus, we must now test the null hypothesis H0: b3 = 0. If such a hypothesis cannot be rejected, the slopes agree for the two lines; i.e., they are parallel, although the intercepts may differ.

In practical terms, to compare both the coincidence and the parallelism of the two regression lines, the significance of the parameters must be tested using appropriate ANOVA tables or t-tests. An example of how to proceed is shown in Table 6. It concerns the comparison of two methods in ETAAS, with and without a mineralisation step. From the experimental results, the dummy and cross-product variables are calculated following the multiplication rules, and the multivariate regression is obtained.
The LS fit for the overall data (Eq. (10)) is y = 0.22851 (±0.00497) + 0.00437 (±0.00016)x + 0.02877 (±0.00702)z + 0.00069 (±0.00023)xz, where ± indicates the standard error associated with each coefficient. The Student's t-test for each coefficient is given in Table 6. The coefficient of the interaction x·z cannot be considered as zero and, therefore, the lines are not parallel. In addition, they do not have the same ordinate, as the coefficient multiplying the dummy variable revealed.
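The sub-models above (z = 0 and z = 1) can be checked numerically. The following Python/NumPy sketch, added here for illustration (it is not part of the original paper), fits Eq. (10) by least squares to the data of Table 6 and recovers the two individual calibration lines from the coefficients:

```python
import numpy as np

# Data of Table 6: ETAAS absorbances with (z = 0) and without (z = 1) a
# mineralisation step, measured at the same six concentrations.
x = np.array([0., 10., 20., 30., 40., 50.] * 2)
z = np.repeat([0.0, 1.0], 6)
y = np.array([0.227, 0.268, 0.317, 0.366, 0.411, 0.437,    # with mineralisation
              0.252, 0.311, 0.360, 0.412, 0.467, 0.502])   # without

# Design matrix of Eq. (10): y = b0 + b1*x + b2*z + b3*x*z + e
X = np.column_stack([np.ones_like(x), x, z, x * z])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

# When z = 0 the model reduces to y = b0 + b1*x (first calibration line);
# when z = 1 it becomes y = (b0 + b2) + (b1 + b3)*x (second line).
line1 = (b0, b1)
line2 = (b0 + b2, b1 + b3)
```

Reading off the coefficients, b0 and b1 give the 'with mineralisation' line, while b0 + b2 and b1 + b3 give the 'without mineralisation' line; b2 and b3 are therefore the differences between the intercepts and the slopes, respectively.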
Please cite this article in press as: J.M. Andrade, M.G. Estévez-Pérez, Statistical comparison of the slopes of two regression lines: A tutorial, Anal. Chim. Acta (2014), http://dx.doi.org/10.1016/j.aca.2014.04.057
Table 6
Example illustrating the use of a dummy variable to compare the coefficients of two regression lines.

                          Absorbance (y)   Concentration (x)   Dummy for codification (z)   xz
With mineralization       0.227            0                   0                            0
                          0.268            10                  0                            0
                          0.317            20                  0                            0
                          0.366            30                  0                            0
                          0.411            40                  0                            0
                          0.437            50                  0                            0
Without mineralization    0.252            0                   1                            0
                          0.311            10                  1                            10
                          0.360            20                  1                            20
                          0.412            30                  1                            30
                          0.467            40                  1                            40
                          0.502            50                  1                            50

Student's t-test for the coefficients:

                                       Intercept   x       z            xz
t_experimental                         45.97       26.63   4.09         3.01
t_tabulated (95%, 8 d.f., 2 tails)     2.31
Conclusion                                                 Reject H0    Reject H0
                                                           (not same    (not parallel)
                                                           ordinate)
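The decisions in Table 6 can be cross-checked against the classical t-test for two slopes discussed earlier. The sketch below (illustrative Python/NumPy, not taken from the paper) computes the pooled-variance statistic for the same data and the t statistic of the x·z interaction of Eq. (10), showing that both routes return the same value, as expected since the t-test is a special case of ANCOVA (F = t²):

```python
import numpy as np

x = np.array([0., 10., 20., 30., 40., 50.])
y1 = np.array([0.227, 0.268, 0.317, 0.366, 0.411, 0.437])   # with mineralisation
y2 = np.array([0.252, 0.311, 0.360, 0.412, 0.467, 0.502])   # without mineralisation

def fit(x, y):
    """Least-squares slope, residual sum of squares and Sxx for one line."""
    sxx = ((x - x.mean()) ** 2).sum()
    slope = ((x - x.mean()) * (y - y.mean())).sum() / sxx
    intercept = y.mean() - slope * x.mean()
    sse = ((y - intercept - slope * x) ** 2).sum()
    return slope, sse, sxx

slope1, sse1, sxx1 = fit(x, y1)
slope2, sse2, sxx2 = fit(x, y2)

# Classical t-test for two slopes with pooled residual variance, n1 + n2 - 4 d.f.
s2 = (sse1 + sse2) / (len(y1) + len(y2) - 4)
t_slopes = (slope2 - slope1) / np.sqrt(s2 * (1.0 / sxx1 + 1.0 / sxx2))

# ANCOVA route: t statistic of the x*z interaction coefficient in Eq. (10)
X = np.column_stack([np.ones(12), np.r_[x, x],
                     np.repeat([0.0, 1.0], 6), np.r_[np.zeros(6), x]])
yy = np.concatenate([y1, y2])
coef, *_ = np.linalg.lstsq(X, yy, rcond=None)
resid = yy - X @ coef
se3 = np.sqrt((resid @ resid / 8) * np.linalg.inv(X.T @ X)[3, 3])
t_interaction = coef[3] / se3
# Both statistics coincide, so the decision (compare against 2.31) is identical.
```

Small rounding differences with respect to the values printed in Table 6 are possible; the conclusion (reject parallelism at 95% confidence) is unchanged.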
Although ANCOVA constitutes a pragmatic alternative to the classical t-test, it is not problem-free. Indeed, its conventional use is to investigate whether the (two) population regression lines coincide, given that the lines are parallel [33] … and parallelism is exactly what analytical chemists want to test for(!). Thus, our common use of ANCOVA as a 'test' to check merely the slopes is somewhat 'rough' and, indeed, we are interested only in the part of ANCOVA that evaluates whether the difference of the slopes is statistically zero. Although ANCOVA also checks the equality of the ordinates, this is rarely of interest when the SAM and the DM procedures are compared.

6. Some alternatives to the use of the t-test

Crawford and Garthwaite [34] presented a tiered approach by which the slope of a regression line can be compared against a suite of other control regression lines. The idea was to develop a modified independent-samples t-test in which the line to be compared does not contribute to the estimate of the within-group variance of the other lines. This study is very appealing for quality control applications as it opens up the possibility of testing a calibration obtained at a given time against a historical series, or of comparing the slope of a performance qualification test against reference or historical values. A simple t-like test is selected after tiered comparisons among the variances of the regression line to be tested and the control lines.

As cited above, a practical limitation of ANCOVA is that it starts from considering that the regression lines are parallel. Rogosa [33] formulated an alternative ANCOVA which does not require that initial assumption. The procedure focuses on calculating the difference between the sample regression lines at any value of the explanatory variable (x). This alternative seems of little use for routine applications in laboratories because we are usually not interested in the average difference between the regressions (although it is true that a constant difference is a good indication of parallel lines, Rogosa's method would be too elaborate an option to get that information). In addition, the procedure is not optimal when the coefficient b3 in Eq. (10) is exactly zero (i.e., H0: b3 = 0).

Two serious problems in ANCOVA are lack of homoscedasticity (whose occurrence might impede the use of the LS criterion) and lack of normality (which would invalidate the use of the t-test). Three types of heteroscedasticity may occur in Eq. (10): (i) e depends on z but not on x (the variances of the two regressions are different), (ii) e depends on x but not on z, and (iii) e depends on both. According to Ng and Wilcox [35] there are two general strategies to handle violations of assumptions in regression analysis: to replace the t statistic with a robust alternative, and to avoid the use of LS by applying robust regression. The former can be addressed by (among other possibilities) quasi t-tests based on a heteroscedastic-consistent (HC) covariance matrix, among which the so-called HC4 stands out (see [35] for more details). The latter option includes bootstrap procedures, although it was found that a combination of HC4 and bootstrap yielded the best performance for relatively small samples (20, which is still too high for current laboratory practices).

Bootstrap, conceptually, is a method to obtain information about an unknown population by examining a small subset of its objects (i.e., a statistical sample or, simply, 'sample'). When the number of objects is very small (as is usually the case in laboratories) the non-parametric bootstrap offers relevant advantages over parametric approaches, one of them being that the probability distribution function of the population does not have to be assumed in advance. In bootstrap the data are resampled with replacement many times in order to generate empirical estimates of the statistics and use them to make inferences about the populations [36,37]. A complete discussion on bootstrapping is out of the scope of this paper (the references given in the previous sentence can introduce this issue). Here, we will only mention that bootstrapping is of particular usefulness when: (i) the theoretical distribution of a statistic of interest is complicated or unknown (bootstrapping is distribution-independent and it can be used to evaluate the properties of the distribution underlying the sample); and (ii) the sample size is insufficient for a straightforward and sound statistical inference (a situation that is very common in analytical chemistry, where it is therefore required to estimate statistics that are not distorted by the specific values derived from a given study).
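As a simplified illustration of the resampling idea (a sketch added here, not an example from the paper), the following Python/NumPy code applies a fixed-x (residual) bootstrap to the Table 6 data: the residuals of each line are resampled with replacement, added back to the fitted values, and the slope difference is recomputed many times to build its empirical distribution:

```python
import numpy as np

rng = np.random.default_rng(2014)   # fixed seed for reproducibility

# Table 6 data: two calibration lines measured at the same concentrations
x = np.array([0., 10., 20., 30., 40., 50.])
y1 = np.array([0.227, 0.268, 0.317, 0.366, 0.411, 0.437])   # with mineralisation
y2 = np.array([0.252, 0.311, 0.360, 0.412, 0.467, 0.502])   # without mineralisation

def ls_slope(x, y):
    """Least-squares slope of y on x."""
    return ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()

def boot_slope_diff(x, y1, y2, n_boot=5000):
    """Fixed-x (residual) bootstrap of the difference between two LS slopes."""
    fitted, residuals = [], []
    for y in (y1, y2):
        b = ls_slope(x, y)
        a = y.mean() - b * x.mean()
        fitted.append(a + b * x)
        residuals.append(y - (a + b * x))
    diffs = np.empty(n_boot)
    for k in range(n_boot):
        # resample residuals with replacement, keep the design points fixed
        yb1 = fitted[0] + rng.choice(residuals[0], size=x.size, replace=True)
        yb2 = fitted[1] + rng.choice(residuals[1], size=x.size, replace=True)
        diffs[k] = ls_slope(x, yb1) - ls_slope(x, yb2)
    return diffs

diffs = boot_slope_diff(x, y1, y2)
lo, hi = np.percentile(diffs, [2.5, 97.5])   # 95% percentile interval
```

If the 95% percentile interval excludes zero, the slopes are declared different. Other resampling schemes (e.g., pairs bootstrap) are possible and may be preferable when x is random rather than fixed by the calibration design.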
That the problem of comparing two slopes is far from being fully addressed is clear because 'there is no way of being reasonably certain that standard methods for comparing slopes yield accurate probability coverage' [38], in particular because standard methods rely on the assumptions of variance homogeneity and normality. However, as discussed above, the tests employed usually have low power to detect situations where these assumptions should be abandoned (the low number of data points available in chemical calibrations being an important drawback). Even under normal distributions, 'the power of tests for heteroscedasticity can be too low, and it is unknown how to be reasonably certain that the power is sufficiently high' [38]. Accordingly, Wilcox resorted to a bootstrap method, instead of relying on standard tests, to evaluate whether the interval for the difference between the slopes includes zero.

Bootstrap (randomization) was also considered by Hartmann et al. [39] when discussing method comparison, and it was presented as an alternative worth studying in order to address situations in analytical laboratories which might preclude the application of typical parametric tests (lack of normality, outliers, etc.). Similarly, Wehrens et al. [37] presented a case study to compare the intercepts of two regression lines in which their standard errors were considered. The example was generalized to compare the two slopes in a similar way. The authors warned about the different approach that would be required to apply permutation tests.

7. Conclusions

A detailed explanation of the Student's methodology to compare the slopes of two regression lines has been presented. The problem has been addressed considering two situations: when the variances of the (first-order polynomial) straight-line models are equal (texp) and when they are different (t'exp). Our simulations agreed with previous Welch's studies and, so, the general use of t'exp rather than texp is recommended, unless the variances of the models are definitely equal (in which case they will yield the same result). A potential alternative based on the standard errors of the slopes has also been investigated by means of a simulation study. The simulations revealed that this alternative must not be used because the distribution of the statistic (t1exp) is not known and it does not follow a t distribution with n1 + n2 − 4 degrees of freedom, as some authors assumed.

If the use of the standard errors of the slopes is preferred by the analysts, then decisions must be rooted in a bootstrap approach, as Wehrens et al. presented for the intercepts [37], since in effect the bootstrap methodology will reveal the distribution of the t1exp statistic. However, a more comprehensive study which ensures a significant improvement over the classical t-test would be required.

Finally, it was seen that a simple alternative to compare several slopes can be based on ANCOVA (analysis of covariance) models, although this approach is not problem-free.

Acknowledgements

The Galician Government, 'Xunta of Galicia', is acknowledged for its support to the QANAP group (Programa de Consolidación y Estructuración de Unidades de Investigación Competitiva, Ref: GRC2013-047). Financial support from the Xunta of Galicia to research projects 2012-PG226 and CN2012/130 is also acknowledged.

References

[1] J.M. Andrade-Garda, A. Carlosena-Zubieta, R. Soto-Ferreiro, J. Terán-Baamonde, M. Thompson, Classical linear regression by the least squares method, in: J.M. Andrade-Garda (Ed.), Basic Chemometric Techniques in Atomic Spectroscopy, 2nd ed., RSC, Cambridge, 2013.
[2] J.S. Milton, Statistical Methods in the Biological and Health Sciences, 3rd ed., McGraw Hill, Boston, 1999.
[3] B.L. Welch, The significance of the difference between two means when the population variances are unequal, Biometrika 29 (1938) 350–362.
[4] C. Mongay Fernández, Quimiometría, PUV, Valencia, 2005.
[5] G. Ramis Ramos, M.C. García Álvarez-Coque, Quimiometría, Editorial Síntesis, Madrid, 2001.
[6] B.L. Welch, The generalization of 'Student's' problem when several different population variances are involved, Biometrika 34 (1947) 28–35.
[7] M.C. Ortiz, L.A. Sarabia, M.S. Sánchez, A. Herrero, Quality of analytical measurements: statistical methods for internal validation, in: S.D. Brown, R. Tauler, B. Walczak (Eds.), Comprehensive Chemometrics: Chemical and Biochemical Data Analysis, vol. 1, Elsevier, Amsterdam, 2009.
[8] L. Sachs, Applied Statistics, Springer, New York, 1984.
[9] J.N. Miller, J.C. Miller, Statistics for Analytical Chemistry, 2nd ed., Ellis Horwood, London, 1988.
[10] F.E. Satterthwaite, An approximate distribution of estimates of variance components, Biometr. Bull. 2 (6) (1946) 110–114.
[11] L.L. Havilcek, R.D. Crain, Practical Statistics for the Physical Sciences, ACS, Washington DC, 1988.
[12] R.L. Anderson, Practical Statistics for Analytical Chemists, Van Nostrand Reinhold, New York, 1987.
[13] D.L. Massart, B.G.M. Vandeginste, L.M.C. Buydens, S. De Jong, P.J. Lewi, J. Smeyers-Verbeke, Chemometrics: a Textbook, Elsevier, Amsterdam, 1988.
[14] D.L. Massart, J. Smeyers-Verbeke, F.X. Rius, Method validation: software to compare the slopes of two calibration lines with different residual variance, Trends Anal. Chem. 8 (2) (1989) 49–51.
[15] A.A. Aspin, Tables for use in comparisons whose accuracy involves two variances, separately estimated, Biometrika 36 (1949) 290–296.
[16] E. Tanburn, Note on an approximation used by B.L. Welch, Biometrika 29 (1938) 361–362.
[17] Student, The probable error of a mean, Biometrika 6 (1) (1908) 1–25.
[18] E.S. Pearson, Biometrika 30 (3 and 4) (1939) 210–250.
[19] J.H. Zar, Biostatistical Analysis, Prentice Hall, New Jersey, 1996.
[20] J.W. Einax, M. Reichenbächer, Solution to quality assurance challenge 2, Anal. Bioanal. Chem. 384 (2006) 14–18.
[21] L. Cuadros Rodríguez, A.M. García Campaña, F. Alés Barrero, C. Jiménez Linares, M. Román Ceba, Validation of an analytical instrumental method by standard addition methodology, J. AOAC Int. 78 (1995) 471–476.
[22] M.C. Ortiz, S. Sánchez, L. Sarabia, Quality of analytical measurements: univariate regression, in: S.D. Brown, R. Tauler, B. Walczak (Eds.), Comprehensive Chemometrics: Chemical and Biochemical Data Analysis, vol. 1, Elsevier, Amsterdam, 2009.
[23] N.R. Draper, H. Smith, Applied Regression Analysis, John Wiley & Sons, New York, 1998.
[24] J.M. Vilar Fernández, Modelos estadísticos aplicados, Servicio de Publicaciones de la Universidad de A Coruña, A Coruña, 2003.
[25] A.L. Edwards, An Introduction to Linear Regression and Correlation, Freeman, New York, 1984.
[26] J.N. Miller, Errors in calibration graphs, Spectrosc. Int. 3 (1991) 45–47.
[27] P.C. Meier, R.E. Zünd, Statistical Methods in Analytical Chemistry, Wiley Interscience, New York, 1993.
[28] K.L. Wuensch, Comparing correlation coefficients, slopes and intercepts, February 2014. Available at http://core.ecu.edu/psyc/wuenschk/docs30/CompareCorrCoeff.pdf.
[29] R. Boqué, X. Rius, Profundizando en la calibración multivariante, in: R. Cela (Ed.), Quimiometría práctica, Universidad de Santiago de Compostela, Santiago de Compostela, 1994.
[30] R Development Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2013.
[31] S. Carballo, J. Terán, R.M. Soto, A. Carlosena, J.M. Andrade, D. Prada, Green approaches to determine metals in lubricating oils by electrothermal atomic absorption spectrometry (ETAAS), Microchem. J. 108 (2013) 74–80.
[32] P. Veldt Larsen, Comparing regression lines, Master of Applied Statistics, February 2014. Available at http://statmaster.sdu.dk/maskel/docs/sample/ST111/module09/module.pdf.
[33] D. Rogosa, Comparing nonparallel regression lines, Psychol. Bull. 88 (2) (1980) 307–321.
[34] J.R. Crawford, P.H. Garthwaite, Statistical methods for single-case studies in neuropsychology: comparing the slope of a patient's regression line with those of a control sample, Cortex 40 (2004) 533–548.
[35] M. Ng, R.R. Wilcox, Comparing the regression slopes of independent groups, Br. J. Math. Statist. Psychol. 63 (2010) 319–340.
[36] C.Z. Mooney, R.D. Duval, Bootstrapping: A Nonparametric Approach to Statistical Inference, Sage Publications, Newbury Park, California, 1993.
[37] R. Wehrens, H. Putter, L.M.C. Buydens, The bootstrap: a tutorial, Chemom. Intell. Lab. Syst. 54 (2000) 35–52.
[38] R.R. Wilcox, Comparing the slopes of two independent regression lines when there is complete heteroscedasticity, Br. J. Math. Statist. Psychol. 50 (1997) 309–317.
[39] C. Hartmann, J. Smeyers-Verbeke, W. Penninckx, Y. Vander Heyden, P. Vankeerberghen, D.L. Massart, Reappraisal of hypothesis testing for method validation: detection of systematic error by comparing the means of two methods or of two laboratories, Anal. Chem. 67 (1995) 4491–4499.