Guidance For Robustness/Ruggedness Tests in Method Validation
Laarbeeklaan 103, 1090 Brussel, Belgium
b Unilever Research Vlaardingen, P.O. Box 114, 3130 AC Vlaardingen, The Netherlands
Contents
1. Introduction
1.1 Definitions
1.2 Situating robustness in method development and validation
1.3 Objectives of a robustness evaluation
1.4 The steps in a robustness test
2. Selection of factors and levels
5. Determining responses
5.1 Responses measured in a robustness test
5.2 Corrected response results
1. Introduction
1.1 Definitions
The definition of robustness/ruggedness applied in this guideline is: "The robustness/ruggedness of an
analytical procedure is a measure of its capacity to remain unaffected by small, but
deliberate variations in method parameters and provides an indication of its reliability
during normal usage" [1].
Robustness can be described as the ability to reproduce the (analytical) method in different
laboratories or under different circumstances without the occurrence of unexpected
differences in the obtained result(s), and a robustness test as an experimental set-up to
evaluate the robustness of a method. The term ruggedness is frequently used as a synonym
[2-5]. Several definitions of robustness or ruggedness exist, but they are all closely
related [1,6-10]. The one most widely applied nowadays in the pharmaceutical field is the
definition given by the International Conference on Harmonisation of Technical Requirements
for the Registration of Pharmaceuticals for Human Use (ICH) [1], quoted above.
Only Ref. [9] distinguishes between the terms ruggedness and robustness; there, ruggedness
is defined as the degree of reproducibility of the test results obtained under a variety of
normal test conditions, such as different laboratories, analysts, instruments, lots of
reagents, elapsed assay times, assay temperatures, days, etc. That definition is not applied
here, since detailed guidelines already exist for the estimation of reproducibility and
intermediate precision [11,12]. The ICH guidelines [1] also recommend that "one consequence
of the evaluation of robustness should be that a series of system suitability parameters
(e.g. resolution tests) is established to ensure that the validity of the analytical
procedure is maintained whenever used".
The ICH guidelines do not yet require the assessment of the robustness of a method, but it
can be expected to become obligatory in the near future.
Robustness testing is nowadays best known and most widely applied in the pharmaceutical
field, because the regulatory authorities in that domain impose strict regulations that
require extensively validated methods. Therefore most definitions and existing
methodologies, e.g. those of the ICH, originate from that field, as can be seen above. This
does not, however, limit robustness testing to pharmaceutical analysis, and this guideline
is therefore not restricted to pharmaceutical methods.
1.2 Situating robustness in method development and validation
Robustness tests were originally introduced to avoid problems in interlaboratory studies and
to identify the potentially responsible factors [2]. This means that a robustness test was
performed at a late stage of method validation, since interlaboratory studies are performed
in the final stage. The robustness test was thus considered part of the method validation
related to the determination of precision (reproducibility) [3,13-16].
However, performing a robustness test late in the validation procedure involves the risk that
a method found not to be robust must be redeveloped and re-optimised. At that stage much
effort and money have already been spent on optimisation and validation, which one wants to
avoid wasting. For this reason, robustness testing has been shifting to earlier stages in the
life of the method. The Dutch Pharmacists Guidelines [6], the ICH Guidelines [7] and some
authors working in bio-analysis [17] consider robustness a method validation topic addressed
during the development and optimisation phase of a method, while others [18] consider it part
of the development of the analytical procedure.
The robustness test can be viewed as a part of method validation that is performed at the end
of method development or at the beginning of the validation procedure. The exact position
has relatively little influence on how it is performed.
1.3 Objectives of a robustness evaluation
The robustness test examines potential sources of variability in one or more responses of
the method. In the first instance, the quantitative aspects of the method (content
determinations, recoveries) are evaluated. Besides these, responses for which system
suitability test (SST) limits can be defined (e.g. resolution, tailing factors, capacity
factors, column efficiency in a chromatographic method) can also be evaluated
(see Section 5).
To examine potential sources of variability, a number of factors are selected from the
operating procedure (see Section 2.1) and examined in an interval (see Section 2.2) that
slightly exceeds the variations which can be expected when a method is transferred from one
instrument or laboratory to another. These factors are then examined in an experimental
design (see Section 3), and the effects of the factors on the response(s) of the method are
evaluated (see Section 6). In this way the factors that could impair the method performance
are discovered, and the analyst knows that they must be controlled more strictly during the
execution of the method.
Another aim of a ruggedness/robustness test may be to predict reproducibility or
intermediate precision estimates [9]. In this guideline this kind of ruggedness testing is not
considered.
The information gained from the robustness test can also be used to define SST limits (see
Section 7). SST limits can then be based on experimental evidence rather than set arbitrarily
on the basis of the analyst's experience.
1.4 The steps in a robustness test
The following steps can be identified: (a) identification of the factors to be tested, (b)
definition of the different levels for the factors, (c) selection of the experimental design,
(d) definition of the experimental protocol (complete experimental set-up), (e) definition of
the responses to be determined, (f) execution of the experiments and determination of the
responses of the method, (g) calculation of effects, (h) statistical and/or graphical analysis
of the effects, and (i) drawing chemically relevant conclusions from the statistical analysis
and, if necessary, taking measures to improve the performance of the method. These different
steps are schematically represented in Fig. 1 and are considered in more detail below. An
example of a worked-out robustness test case study is described in Section 8.
2. Selection of factors and levels
2.1. Selection of the factors
The factors to be investigated in a robustness test are related to the analytical procedure
(operational factors) and to the environmental conditions (environmental factors). The
operational factors are selected from the description of the analytical method (operating
procedure), whereas the environmental factors are not necessarily specified explicitly in the
analytical method.
The selected factors can be quantitative (continuous), qualitative (discrete) or mixture
factors. Table 1 lists factors that could be considered during robustness testing of
chromatographic (liquid, gas or thin-layer chromatography) or electrophoretic methods.
The list is not exhaustive, but gives an idea of the factors commonly examined. If a sample
preparation procedure (liquid-liquid extraction, solid-liquid extraction, ultrafiltration,
dialysis) is required before the chromatographic or electrophoretic analysis, factors from
this procedure should also be considered in the robustness test. The number of factors to be
examined increases further when the analytical procedure requires a pre- or post-column
derivatisation step.
Examples of quantitative factors are the pH of a solution, the temperature or the
concentration of a solution; of qualitative factors the batch of a reagent or the manufacturer
of a chromatographic column, and of a mixture factor the fraction of organic modifier in a
mobile phase.
The selected factors should represent those that are most likely to be changed when a
method is transferred between laboratories, analysts or instruments and that potentially could
influence the response(s) of the method.
2.1.1. Mixture-related factors
Mixtures of solvents are often used in analytical methods [5], e.g. mobile phases in
chromatography or buffers in electrophoresis are mixtures. In a mixture of p components
only p-1 can be changed independently. In HPLC analysis the mobile phase can contain,
besides the aqueous phase, one to three organic modifiers, yielding mixtures of two to four
components. In robustness testing both mixture variables and process variables (e.g. flow,
temperature, wavelength) need to be combined in the same experimental set-up. The simplest
procedure is to select at most p-1 components to be examined as factors in the experimental
design. These p-1 factors are then mathematically independent; they are called
mixture-related variables [19] and are treated in the design in the same way as the process
variables. The pth component is used as the adjusting component: its value is determined by
those of the p-1 mixture-related variables. The contributions of the different components in
the mixture are preferably expressed as volume fractions, and the solvent with the highest
fraction in the mixture is selected as the adjusting component (Example).
If one component of a mixture is found to be important, this means in practice that the
mixture composition as a whole is important. Since it is not possible to control only one of
the components of a mixture, the composition of the mixture as a whole should be more
strictly controlled.
Example. For a mobile phase containing methanol/acetonitrile/aqueous buffer with a
composition of 10:20:70 (V/V/V), the methanol (MeOH) content and the acetonitrile (ACN)
content can be selected as mixture-related variables and entered as factors in the design while
the buffer content is used as adjusting component. This latter component is not considered to
be a factor. The nominal levels (prescribed method conditions) of MeOH and of ACN are
then 0.10 and 0.20 respectively.
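The role of the adjusting component can be illustrated with a short Python sketch. It takes the nominal MeOH and ACN fractions from the example and derives the buffer fraction as the remainder up to 1; the half-interval `DELTA` is a hypothetical value chosen only for illustration, not a level recommended by this guideline.

```python
from itertools import product

# Nominal volume fractions from the example (MeOH/ACN/buffer 10:20:70 V/V/V)
NOMINAL = {"MeOH": 0.10, "ACN": 0.20}
# Hypothetical half-interval for the extreme levels (illustration only)
DELTA = 0.01


def adjusting_fraction(f_meoh: float, f_acn: float) -> float:
    """Fraction of the adjusting component (buffer): the remainder up to 1."""
    return round(1.0 - f_meoh - f_acn, 4)


# The two mixture-related variables are set independently at -/+ levels;
# the buffer fraction is not a design factor, it simply follows from them.
for s_meoh, s_acn in product((-1, 1), repeat=2):
    f_meoh = NOMINAL["MeOH"] + s_meoh * DELTA
    f_acn = NOMINAL["ACN"] + s_acn * DELTA
    f_buffer = adjusting_fraction(f_meoh, f_acn)
    print(f"MeOH {f_meoh:.2f}  ACN {f_acn:.2f}  buffer {f_buffer:.2f}")
```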
2.1.2. Quantitative factors
A set of factors can often be entered in the experimental design in different ways, and this
can lead to more or less physically meaningful information. Therefore, when setting up a
robustness test, the analyst should carefully consider how to define or formulate the factors.
As an example, consider the compounds of a buffer.
The composition of a buffer can be defined by the concentrations of its acidic (Ca) and basic
(Cb) compounds (Example (a)).
There are several possibilities to examine these two compounds in a design: as two different
factors, or combined to represent the pH and/or the ionic strength. If one wants to maximise
the information extracted from a robustness test, it may be preferable to choose the factors
in such a way that the effects have a physical meaning. In that case one should use the pH
and the ionic strength.
If the emphasis is only on measuring the robustness of the method, one could use the first
approach (Example (b)). This implies that when one of the two factors (Ca or Cb) is found to
be important, the other also needs to be strictly controlled, as was the case with mixtures.
In the second approach, the two concentrations are combined into one factor, the ratio
Cb/Ca. Depending on the variation introduced in this factor, one simulates a change in the
pH, in the ionic strength, or in both. If the molar ratio is kept constant while the
concentrations of Ca and Cb are changed, the factor examines a change in ionic strength. When
the ratio is changed, one introduces, depending on the kind of buffer used, either a change
in the pH (for instance for a NaH2PO4/H3PO4 buffer, where only one compound contributes to
the ionic strength) or a change in both the pH and the ionic strength (for instance for a
Na2HPO4/NaH2PO4 buffer, where both compounds contribute to the ionic strength) [5]
(Example (c)).
Example
(a) A phosphate buffer description extracted from a monograph of Ref. [20] is for instance:
Place 6.8 g potassium dihydrogen phosphate, 500 ml water and 1.8 ml phosphoric acid in
a 1000 ml volumetric flask. Adjust to volume with water and mix well. Render the
contents of the flask homogenous by shaking vigorously until all solids are dissolved.
(b) The factors Ca and Cb are then defined as the volume of H3PO4 per litre buffer and the
weight of NaH2PO4 per litre buffer, respectively.
(c) The combined factor is the ratio [NaH2PO4]/[H3PO4], which is then multiplied by a given
constant to define the extreme levels for this factor. For the buffer described above, two
possibilities exist. The first is
$$ \frac{a \cdot [\mathrm{NaH_2PO_4}]}{a \cdot [\mathrm{H_3PO_4}]} $$
where the ratio between [NaH2PO4] and [H3PO4] is kept constant; a = 1 represents the nominal
situation, while a < 1 or a > 1 are the extreme levels. In this situation the pH is kept
constant while the ionic strength changes.
A second possibility is
$$ \frac{a \cdot [\mathrm{NaH_2PO_4}]}{[\mathrm{H_3PO_4}]} $$
with again a = 1 the nominal level and a < 1 or a > 1 the extreme levels. In this situation
the ratio is changed, which means that the pH is changed.
It can also be remarked that
(i) for a Na2HPO4/NaH2PO4 buffer, where both compounds contribute to the ionic strength,
this latter approach changes both the pH and the ionic strength,
(ii) only one of the two above possibilities can be defined as a factor in an
experimental design.
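The pH part of this reasoning can be checked with the Henderson-Hasselbalch approximation, pH = pKa + log10(Cb/Ca). The sketch below assumes a pKa1 of about 2.15 for phosphoric acid and hypothetical concentrations; it ignores activity coefficients and does not model the ionic strength at all. It only shows that scaling both concentrations by a leaves the estimated pH unchanged, while scaling one of them shifts it by log10(a).

```python
import math

# Assumed value: first pKa of phosphoric acid at 25 degC (approximately 2.15)
PKA1 = 2.15


def buffer_ph(c_base: float, c_acid: float, pka: float = PKA1) -> float:
    """Henderson-Hasselbalch estimate: pH = pKa + log10(Cb / Ca)."""
    return pka + math.log10(c_base / c_acid)


# Hypothetical nominal concentrations (mol/L) of NaH2PO4 (base) and H3PO4 (acid)
cb, ca = 0.05, 0.03
a = 1.2  # hypothetical multiplier defining an extreme level

ph_nominal = buffer_ph(cb, ca)
ph_first = buffer_ph(a * cb, a * ca)  # first possibility: ratio kept constant
ph_second = buffer_ph(a * cb, ca)     # second possibility: ratio changed

print(f"nominal pH   {ph_nominal:.3f}")
print(f"a*Cb / a*Ca  {ph_first:.3f}")   # same as the nominal pH
print(f"a*Cb / Ca    {ph_second:.3f}")  # shifted by log10(a)
```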
Another regularly used way to describe the preparation of a buffer is to dissolve a given
amount of a salt, e.g. NaH2PO4, and then to adjust the pH by adding an acid (e.g. H3PO4) or a
base (e.g. NaOH). In this situation the pH should be chosen as a factor. The concentration of
the salt could also be examined as a second factor (Example).
Example. A buffer defined in this way is for instance a 10 mM phosphate buffer, pH 3.0. In
this situation the pH and the phosphate concentration (which represents the ionic strength)
could be examined as factors.
2.1.3. Qualitative factors
Qualitative factors are often also included. For chromatographic methods, for instance,
factors related to the column, such as the "column manufacturer", the "batch of the column"
or even different columns from the same batch, are examined. Columns from the same batch are
examined to evaluate whether characteristics unique to a single column, e.g. artefacts of the
column packing procedure, affect the results. Columns from different batches are used to
evaluate batch-to-batch variations, and columns from different manufacturers to examine
variations between manufacturers.
However, one should be aware that the absence of a significant effect for such a qualitative
factor does not mean that this factor never has an influence. Examining a very limited number
of representatives (e.g. a few columns) does not allow any conclusion to be drawn about the
total population. Only conclusions concerning the robustness of the method with respect to
the examined representatives can be made (see also Section 8).
When several qualitative factors are included in an experimental design, impossible factor
combinations should be avoided. An example is the combination of the factors "manufacturer
of column material" and "batch of material" in one two-level design (Section 3). Selecting
two levels for the manufacturer of the material gives manufacturers I and J. Selecting two
levels for the batch of material is then not possible, since one cannot define batches common
to both manufacturers I and J.
2.2. Selection of the factor levels
2.2.1 Quantitative and mixture factors
The factor levels are usually defined symmetrically around the nominal level prescribed in the
operating procedure. The interval chosen between the extreme levels represents the
(somewhat exaggerated) limits between which the factors are expected to vary when a
method is transferred. In most case studies the levels are defined by the analyst according to
his/her personal opinion. However, the selection of the levels can also be based on the
precision or the uncertainty [21] with which a factor can be set and reset. For instance, the
uncertainty in the factor "pH of a solution" will depend on the uncertainty of the pH meter
result and on the uncertainty related to the calibration of the pH meter. Suppose one knows,
for instance from a systematic determination of the uncertainties [21], that the pH varies,
at a 95% confidence level, within the interval pH ± 0.02. How to determine this is described
further in this Section. Because of the uncertainty in the pH, the nominal pH (pHnom) can be
expected to vary between pHnom - 0.02 and pHnom + 0.02. To select the extreme levels in a
robustness experiment, this interval is enlarged to represent possible variations between
instruments or laboratories. This is done by multiplying the uncertainty by a coefficient k,
which gives as extreme levels pHnom ± k × 0.02. The value k = 5 is proposed as default value.
Other values can be used when the analyst considers larger or smaller intervals feasible for
certain factors. Since the selected extreme levels are also subject to uncertainty, k = 2 is
the strictest condition for which extreme levels can be defined that are clearly different
from each other. To be clearly different from the nominal level, the extreme levels require
k ≥ 3.
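A minimal sketch of this level-selection rule, assuming the uncertainty of the factor is already known. The function name `extreme_levels` is illustrative; rejecting k < 3 is a design choice based on the statement that k ≥ 3 is needed for the extreme levels to differ clearly from the nominal level.

```python
def extreme_levels(nominal: float, uncertainty: float, k: float = 5.0):
    """Return (low, high) extreme levels: nominal -/+ k * uncertainty.

    k = 5 is the default proposed in the text; k >= 3 is required so that
    the extreme levels are clearly different from the nominal level.
    """
    if k < 3:
        raise ValueError("k >= 3 is needed to differ clearly from the nominal level")
    half_width = k * uncertainty
    return nominal - half_width, nominal + half_width


# Hypothetical nominal pH of 3.0 with the +/- 0.02 uncertainty used in the text
low, high = extreme_levels(3.0, 0.02)
print(f"pH levels: {low:.2f} / 3.00 / {high:.2f}")  # pH levels: 2.90 / 3.00 / 3.10
```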
Detailed Eurachem guidelines exist for quantifying the uncertainty in analytical measurements
[21]. They are quite tedious to apply, since they try to quantify all sources of variability
in a factor level, which is not always obvious. Therefore, for the purpose of robustness
testing, a simpler alternative is proposed [5].
Proposal. For each measured quantity one so-called absolute uncertainty is defined, which
quantifies the most obvious source of variation. For instance, (i) for a mass, consider the
last digit displayed by the balance, or a value specified by the manufacturer, as uncertain,
e.g. 0.1 mg for an analytical balance; (ii) for a volume, take the uncertainty in the internal
volume of the volumetric glassware specified by the manufacturer, e.g. 0.08 ml for a 100 ml
volumetric flask; (iii) for a pH value, use the last digit of the display or a value specified
by the manufacturer of the pH meter. When a quantity is calculated from a combination of
measured components (as is the case for a concentration, which is the quotient of a mass and
a volume), the following rules are applied: (i) the absolute uncertainty of a sum or a
difference is the sum of the absolute uncertainties of the terms, and (ii) the relative
uncertainty (i.e. the ratio of the absolute uncertainty to the value) of a product or a
quotient is the sum of the relative uncertainties of the terms (Example).
Example. Consider the determination of the uncertainty in the concentration of a solution.
Suppose a reagent solution with a nominal concentration of 100 mg L-1 is defined in the
operating procedure and is prepared in a 100 ml volumetric flask, so that 10 mg has to be
weighed. The concentration C is determined as C = m/V, where m is the mass weighed and V the
volume. To determine the uncertainty in C, the uncertainties in m and V are estimated first.
The absolute uncertainty in the mass is defined as 2 × 0.1 mg (the mass is obtained from the
difference of two weighings) and that in the volume as 0.08 ml. This gives relative
uncertainties of 0.2/10 = 0.02 and 0.08/100 = 0.0008, and thus 0.0208 for the concentration.
The absolute uncertainty in the concentration is then about 2 mg L-1 (0.0208 × 100 mg L-1),
and the extreme levels to be examined in a design would be about 90 and 110 mg L-1 (k = 5).
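The two propagation rules and the worked example can be reproduced with a few lines of Python. The helper names are hypothetical; the numbers are those of the example.

```python
def abs_unc_sum(*abs_uncs: float) -> float:
    """Sum or difference: the absolute uncertainties add."""
    return sum(abs_uncs)


def rel_unc_product(*rel_uncs: float) -> float:
    """Product or quotient: the relative uncertainties add."""
    return sum(rel_uncs)


# Example from the text: 100 mg/L prepared in a 100 ml volumetric flask
V_ML = 100.0                     # flask volume, ml
C_NOM = 100.0                    # nominal concentration, mg/L
MASS_MG = C_NOM * V_ML / 1000    # mass to weigh: 10 mg

u_mass = abs_unc_sum(0.1, 0.1)   # difference of two weighings, 0.1 mg balance
u_vol = 0.08                     # flask tolerance, ml

rel_c = rel_unc_product(u_mass / MASS_MG, u_vol / V_ML)
u_c = rel_c * C_NOM              # absolute uncertainty in C, mg/L
k = 5
print(f"relative uncertainty {rel_c:.4f}, absolute {u_c:.2f} mg/L")
print(f"extreme levels {C_NOM - k * u_c:.1f} / {C_NOM + k * u_c:.1f} mg/L")
```

Without rounding, the absolute uncertainty is 2.08 mg L-1 and the extreme levels are 89.6 and 110.4 mg L-1, which the example rounds to about 90 and 110 mg L-1.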
The coefficient k should also compensate for occasional sources of variability that were not
taken into account in the estimation of the absolute uncertainty.
A similar reasoning as for the quantitative factors is valid for mixture factors (Example).
Example. Consider a mobile phase 30:70 V/V MeOH/H2O prepared using graduated cylinders of
500 ml for MeOH (uncertainty in internal volume 1.88 ml) and of 1000 ml for water
(uncertainty in internal volume 5 ml). The fraction of methanol is calculated as
$$ f_{\mathrm{MeOH}} = \frac{V_{\mathrm{MeOH}}}{V_{\mathrm{MeOH}} + V_{\mathrm{H_2O}}} $$
(volumes considered additive for ease of calculation). When applying the alternative
estimates (not the Eurachem ones), only the uncertainties in the internal volumes are taken
into account. According to these rules, the absolute uncertainty in fMeOH is 0.004 and that
in fH2O is 0.01.
More detailed information about these uncertainties:

Component     Volume V   Absolute uncertainty (V)   Relative uncertainty (V)
MeOH          300 ml     1.88 ml                    6.27 × 10^-3
H2O           700 ml     5 ml                       7.14 × 10^-3
MeOH + H2O    1000 ml    6.88 ml                    6.88 × 10^-3

Component     Fraction f   Relative uncertainty (f)   Absolute uncertainty (f)
MeOH          0.3          1.32 × 10^-2               0.004
H2O           0.7          1.40 × 10^-2               0.01
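The same rules reproduce the uncertainty table for the mixture example. This sketch treats each fraction as the quotient of one component volume and the total volume, whose absolute uncertainty is the sum of the two cylinder uncertainties.

```python
# Graduated-cylinder example: 30:70 V/V MeOH/H2O, volumes assumed additive
V_MEOH, U_MEOH = 300.0, 1.88   # ml; absolute uncertainty of the 500 ml cylinder
V_H2O, U_H2O = 700.0, 5.0      # ml; absolute uncertainty of the 1000 ml cylinder

v_total = V_MEOH + V_H2O
u_total = U_MEOH + U_H2O       # sum rule: absolute uncertainties add
rel_total = u_total / v_total

results = {}
for name, v, u in (("MeOH", V_MEOH, U_MEOH), ("H2O", V_H2O, U_H2O)):
    f = v / v_total
    rel_f = u / v + rel_total  # quotient rule: relative uncertainties add
    results[name] = (f, rel_f, rel_f * f)
    print(f"{name}: f = {f:.1f}, relative {rel_f:.2e}, absolute {rel_f * f:.3f}")
```

Without intermediate rounding, the relative uncertainty in fMeOH comes out as 1.31 × 10^-2; the table's 1.32 × 10^-2 is the sum of the already-rounded values 6.27 × 10^-3 and 6.88 × 10^-3. The absolute uncertainties still round to 0.004 and 0.01.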
Therefore the Plackett-Burman designs are included in this guideline. For users inexperienced
with experimental design, Plackett-Burman designs are easier to construct than fractional
factorial designs. The latter are, however, also given for the sake of completeness (see
Selection of a fractional factorial design).
For a given number of factors, two options are presented for both the Plackett-Burman and
the fractional factorial designs. The first option consists of minimal designs, i.e. the
designs with the absolute minimal number of experiments for a given number of factors, while
the second option allows a more extensive statistical interpretation of the effects. The
recommended Plackett-Burman designs are described in Table 2. The smallest number of
factors to be examined in an experimental design was considered to be three. For statistical
reasons concerning effect interpretation, designs with fewer than eight experiments are not
used, while those with more than 24 experiments are considered impractical. The designs are
constructed as follows. The first rows of the designs with N = 8 to 24, as given by Plackett
and Burman [25], are listed below:
N = 8:  + + + - + - -
N = 12: + + - + + + - - - + -
N = 16: + + + + - + - + + - - + - - -
N = 20: + + - - + + + + - + - + - - - - + + -
N = 24: + + + + + - + - + + - - + + - - + - + - - - -
with N being the number of experiments and (+) and (-) the levels of the factors.
An example of a Plackett-Burman design for N = 12 is shown in Table 3. The first row of the
design is copied from the list above. The following N-2 rows (in the example, 10) are each
obtained by a cyclic permutation of one position (i.e. shifting the row one position to the
right) relative to the previous row. This means that the sign of the first factor (A) in the
second row is equal to that of the last factor (K) in the first row, and that the signs of the
following N-2 factors in the second row are equal to those of the first N-2 factors of the
first row. The third row is derived from the second one in an analogous way. This procedure
is repeated N-2 times, until all rows but one have been formed. The last (Nth) row consists
only of minus signs.
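The cyclic construction rule can be sketched in Python as follows. The generators are the first rows given by Plackett and Burman [25] for N = 8 to 24, coded as +1 and -1; the function name is illustrative.

```python
# First rows (generators) of the Plackett-Burman designs, N = 8 to 24 [25]
GENERATORS = {
    8: "+++-+--",
    12: "++-+++---+-",
    16: "++++-+-++--+---",
    20: "++--++++-+-+----++-",
    24: "+++++-+-++--++--+-+----",
}


def plackett_burman(n):
    """Build an N x (N-1) Plackett-Burman design with entries +1/-1.

    Row 1 is the generator; each following row is the previous row shifted
    cyclically one position to the right; the last row is all minus signs.
    """
    first = [1 if sign == "+" else -1 for sign in GENERATORS[n]]
    rows = [first]
    for _ in range(n - 2):
        prev = rows[-1]
        rows.append([prev[-1]] + prev[:-1])  # last sign moves to the front
    rows.append([-1] * (n - 1))
    return rows


for row in plackett_burman(8):
    print(" ".join("+" if x > 0 else "-" for x in row))
```

Each column of the resulting design contains N/2 plus and N/2 minus signs, and all columns are mutually orthogonal, so every factor effect can be estimated independently of the others.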
A Plackett-Burman design with N experiments can examine up to N-1 factors. After
determination of the number of real factors to be examined, the remaining columns in the
design are defined as dummy factors. A dummy factor is an imaginary factor for which the
change from one level to the other has no physical meaning.
In the minimal designs (Table 2a) the significance of effects is determined based on the
distribution of the factor effects themselves (see Section 6.2.2.3), while in the other designs
(Table 2b) the standard error on the effects is estimated from dummy factor effects (see
Section 6.2.2.2) [26,27]. The latter designs are therefore not always those with the smallest
number of experiments possible for a given number of factors, because a minimal number of
degrees of freedom to estimate the experimental error was taken into account, i.e. some
columns are needed for dummy factors. The Plackett-Burman designs in Table 2b are chosen
so that at least three dummy factors are included.
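A possible way to automate the choice of N, assuming only the rules stated in this section: N between 8 and 24, N - 1 columns available, and optionally at least three columns left as dummy factors. This hypothetical helper is not taken from Table 2, which remains the reference and may differ for some numbers of factors.

```python
# Run sizes of the Plackett-Burman designs considered in this guideline
SIZES = (8, 12, 16, 20, 24)


def choose_pb_size(n_factors: int, min_dummies: int = 0):
    """Smallest N whose N-1 columns hold the real factors plus the dummies.

    min_dummies=0 mimics a minimal design; min_dummies=3 mimics the rule
    that at least three dummy factors are included. Returns None when no
    design with at most 24 experiments is large enough.
    """
    if n_factors < 3:
        raise ValueError("the guideline considers at least three factors")
    for n in SIZES:
        if n - 1 >= n_factors + min_dummies:
            return n
    return None


print(choose_pb_size(7), choose_pb_size(7, min_dummies=3))  # 8 12
```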
For complex methods, e.g. with an extensive sample pretreatment and/or a post-column
derivatisation, it might be necessary to examine a large number of factors, which requires a
relatively large design that is tedious to perform. In such cases it may be more practical to
split the factors into two sets and evaluate them in two smaller designs that are easier to
execute. For instance, the factors of the derivatisation procedure are examined in one design
and those related to the analytical technique in a second. The most commonly used designs
consist of eight to sixteen experiments.
4 Experimental work
4.1 Execution of trials
Aliquots of the same test sample and standard(s) are examined at the different experimental
conditions. If the concentrations to be determined cover a large range (a factor of 100 or
more), several concentrations could be examined.
The design experiments are preferably performed in a random sequence. For practical
reasons, experiments may be blocked (sorted) by one or more factors. This means that, for a
blocked factor, all experiments at one level are performed first and those at the other level
afterwards. Within the blocks the experiments are randomised. Although blocking is often
used, it has some pitfalls: if drift (a time effect) occurs, the estimated effect(s) of the
blocked factor(s) will be affected by the drift [24,29,30]. If blocking is performed, at least
a minimal check for drift is therefore recommended. Drift means that a response measured at
constant conditions (e.g. nominal ones) changes (increases or decreases) as a function of time.
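A minimal sketch of a run order blocked by one factor and randomised within the blocks, using Python's `random` module. The list of levels is hypothetical; as stated above, the effect of the blocked factor remains exposed to drift, so a drift check is still needed.

```python
import random


def blocked_run_order(block_levels, seed=None):
    """Run order (0-based run indices) blocked by one factor.

    block_levels[i] is the level (-1 or +1) of the blocked factor in design
    run i. All runs at the first level encountered are performed first, then
    those at the other level; the order inside each block is random.
    """
    rng = random.Random(seed)
    order = []
    for level in dict.fromkeys(block_levels):  # unique levels, first-seen order
        block = [i for i, lv in enumerate(block_levels) if lv == level]
        rng.shuffle(block)
        order.extend(block)
    return order


# Hypothetical levels of the blocked factor in the 8 runs of a design
levels = [1, -1, -1, 1, -1, 1, 1, -1]
order = blocked_run_order(levels, seed=1)
print("execution order (run numbers):", [i + 1 for i in order])
```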
Blocking by external factors not tested in the design, such as days, is also possible. When a
design cannot be performed within one day, it can be executed in blocks on different days.
This kind of blocking can also cause a blocking effect, which is confounded with one or more
of the effects estimated for the design factors. Which effects are confounded in that case
depends on the order in which the design experiments are performed [30].
4.2 Minimising the influence of uncontrolled factors
A method can be subject to unavoidable drift. For instance, all HPLC columns age, and as a
consequence some responses drift as a function of time. A robustness test on a method with
drifting responses is still useful, since it will indicate whether or not other factors affect
the response. However, some of the estimated factor effects are corrupted when they are
calculated from the measured data without precautions such as (i) correcting for the drift
using replicated (nominal) experiments, (ii) confounding the time effects (due to the drift of
the response) with dummy factor effects, or (iii) confounding the time effects with
non-significant interactions in fractional factorial designs.
4.2.1 Using replicated experiments
A number of additional experiments, usually at nominal levels, can be added to the design
experiments to complete the experimental set-up. These replicate experiments are performed
before, at regular time intervals between, and after the robustness test experiments of the
Plackett-Burman design. The simplest possibility is to carry out two replicate experiments,
one before and one after the design experiments. These experiments allow one (i) to check
whether the method performs well at the beginning and at the end of the experiments, (ii) to
obtain a first estimate of drift, (iii) to correct the measured results for possible time
effects, such as drift, and occasionally (iv) to normalise the effects (see Section 6.1).
5. Determining responses
5.1 Responses measured in a robustness test
From the experiments performed, a number of responses can be determined. For
chromatographic methods, responses describing a quantity, such as the content of the main
substance and by-products, and/or peak areas or peak heights, are the most obvious.
Evaluating only the content can hide certain effects: when a factor has a similar effect on
the peak area/height of the sample and of the standard(s), this effect is no longer seen in
the content. Therefore, the evaluation of both peak areas/heights and contents, which can
indicate different factors as important, is to be preferred. An alternative to studying
areas/heights is to calculate the content from the area/height of the sample measured in the
different design experiments relative to the result(s) of the standard(s) measured at nominal
conditions.
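A small numerical illustration of why the content can hide an effect, assuming single-point external-standard calibration (content proportional to the sample-to-standard area ratio). All numbers are hypothetical: a factor lowers every peak area by 10 %, and the sample has the same concentration as the standard.

```python
def content(area_sample, area_standard, c_standard):
    """Single-point external standard: C_std * A_sample / A_std."""
    return c_standard * area_sample / area_standard


C_STD = 100.0          # hypothetical concentration of the standard
AREA_NOMINAL = 2000.0  # hypothetical area of sample and standard at nominal conditions

# Hypothetical design experiment: a factor lowers every peak area by 10 %
area_sample_design = 0.9 * AREA_NOMINAL
area_std_design = 0.9 * AREA_NOMINAL

# Standard injected under the same design conditions: the effect cancels out
c_same_conditions = content(area_sample_design, area_std_design, C_STD)
# Standard taken from the nominal experiment: the effect remains visible
c_vs_nominal = content(area_sample_design, AREA_NOMINAL, C_STD)
print(c_same_conditions, c_vs_nominal)
```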
For a separation method one should also consider one or more parameters describing the
quality of the separation, such as, for example, the resolution or the relative retention. The
evaluation of these separation parameters can also lead to system suitability test (SST) limits
as required by the ICH. When determining SST-limits, other responses such as capacity
factors or retention times, asymmetry factors and number of theoretical plates can also be
studied (see Section 8).
5.2 Corrected response results
If drift was checked, e.g. by replicate nominal experiments, corrected response results can
be calculated from the measured results. The corrected design results are calculated as
$$ y_{i,\mathrm{corrected}} = y_{i,\mathrm{measured}} - \left( \frac{(p+1-i)\, y_{\mathrm{nom,before}} + i\, y_{\mathrm{nom,after}}}{p+1} - y_{\mathrm{nom,begin}} \right) \qquad (1) $$
where i = 1, 2, ..., p and p is the number of design experiments between two consecutive
nominal experiments, yi,corrected is a corrected design result, yi,measured the corresponding
measured design result, ynom,begin the nominal result at the beginning of the experiments
(before the design), and ynom,before and ynom,after the nominal results measured before and
after the design result being corrected. Equation (1) is only valid if the experiments can be
assumed to have been performed at equidistant times.
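Equation (1) can be sketched as a function that subtracts from each measured design result the drift at that moment, i.e. the nominal result linearly interpolated between the two surrounding nominal experiments, minus the first nominal result. It assumes equidistant runs; the function name is hypothetical.

```python
def drift_corrected(y_measured, y_nom_before, y_nom_after, y_nom_begin):
    """Correct the p design results run between two nominal experiments.

    y_measured[i-1] is the result of the i-th design experiment (i = 1..p)
    after the nominal experiment y_nom_before. Assumes equidistant runs.
    """
    p = len(y_measured)
    corrected = []
    for i, y in enumerate(y_measured, start=1):
        # Nominal result linearly interpolated at the time of run i
        y_nom_i = ((p + 1 - i) * y_nom_before + i * y_nom_after) / (p + 1)
        corrected.append(y - (y_nom_i - y_nom_begin))
    return corrected


# Hypothetical drift of +1 unit per run on a true result of 50
print(drift_corrected([51, 52, 53, 54], 100, 105, 100))  # [50.0, 50.0, 50.0, 50.0]
```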
$$ E_X = \frac{\sum Y(+)}{N/2} - \frac{\sum Y(-)}{N/2} \qquad (2) $$
where X can represent (i) the real factors A, B, C, ...; or (ii) the dummy factors of
Plackett-Burman designs or the two-factor interactions of fractional factorial designs. EX is
the effect of X on response Y; ΣY(+) and ΣY(-) are the sums of the (corrected) responses where
X is at the extreme levels (+) and (-), respectively, and N is the number of experiments of the
design.
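Equation (2) is the difference between the mean response at the (+) level and the mean response at the (-) level of a factor. The sketch below applies it to the 8-run Plackett-Burman design with hypothetical responses in which only factor A (first column) has an influence.

```python
# Generator (first row) of the 8-run Plackett-Burman design, coded +1/-1
GEN8 = [1, 1, 1, -1, 1, -1, -1]


def pb8():
    """8 x 7 Plackett-Burman design: cyclic right shifts plus an all-minus row."""
    rows = [GEN8]
    for _ in range(6):
        rows.append([rows[-1][-1]] + rows[-1][:-1])
    rows.append([-1] * 7)
    return rows


def effect(levels, responses):
    """Mean response at the (+) level minus mean response at the (-) level."""
    plus = [y for x, y in zip(levels, responses) if x > 0]
    minus = [y for x, y in zip(levels, responses) if x < 0]
    return sum(plus) / len(plus) - sum(minus) / len(minus)


design = pb8()
# Hypothetical responses: baseline 50, factor A shifts the response by +/-2
responses = [50 + 2 * row[0] for row in design]
effects = [effect([row[j] for row in design], responses) for j in range(7)]
print(effects)  # [4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```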
The effects can also be normalised relative to the average nominal result ($\bar{Y}$), when
the response is not drifting, or relative to the nominal result measured before the design
experiments, when the response is drifting [26]. Normalised effects, much more than the raw
effect estimates, usually allow the user of the method to judge whether the influence of a
factor is important, even without statistical interpretation.
$$ \%E_X = \frac{E_X}{\bar{Y}} \times 100\% \qquad (3) $$
$$ t = \frac{|E_X|}{(SE)_e} \geq t_{\mathrm{critical}} \qquad (4) $$
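with (SE)e the standard error of an effect, which represents the experimental variability
within the design. For robustness experiments, (SE)e can be estimated in different ways
(see below). The statistic given in equation (4) can be rewritten as
$$ E_{\mathrm{critical}} = t_{\mathrm{critical}} \cdot (SE)_e \qquad (5) $$
or, normalised relative to the response value (Yn), as
$$ \%E_{\mathrm{critical}} = \frac{E_{\mathrm{critical}}}{Y_n} \times 100\% \qquad (6) $$
An effect is considered significant when |EX| ≥ Ecritical (or |%EX| ≥ %Ecritical).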
$$ (SE)_e = \sqrt{\frac{s_a^2}{n_a} + \frac{s_b^2}{n_b}} \qquad (7) $$
where $s_a^2$ and $s_b^2$ estimate the variances of the two sets of measurements, and $n_a$
and $n_b$ are the numbers of measurements in those sets. The standard error of an effect thus
becomes
$$ (SE)_e = \sqrt{\frac{s^2}{N/2} + \frac{s^2}{N/2}} = \sqrt{\frac{4 s^2}{N}} \qquad (8) $$
since $s_a^2$ and $s_b^2$ are estimated by the same variance, $s^2$, and $n_a = n_b = N/2$.
The variance $s^2$ can be determined from replicated experiments at nominal levels or from
duplicated design experiments. The tcritical is a tabulated t value at R degrees of freedom,
where R is the number of degrees of freedom with which $s^2$ is estimated. When replicated
nominal experiments are used, R = n - 1, with n the number of replicated experiments, while
for duplicated design experiments
$$ s^2 = \frac{\sum_{i=1}^{N} d_i^2}{2N} $$
and R = N, with $d_i$ the differences between the duplicated experiments.
With this criterion, only estimates of $s^2$ obtained under intermediate precision conditions
lead to relevant conclusions [27]. Intermediate precision conditions here means that at least
the factor time was varied, i.e. the measurements were performed on different days. Thus,
this criterion can only be used if intermediate precision estimates are available, which is
often not yet the case when a robustness test is performed. Moreover, it is not always
practically feasible to duplicate design experiments under intermediate precision conditions,
given the increased workload.
6.2.2.2 Estimation of error from dummy effects or from two-factor interaction effects
An estimate of $(SE)_e$ can also be obtained from dummy or interaction effects, i.e. effects that are considered negligible. The following equation is used:
$$(SE)_e = \sqrt{\frac{\sum E_{error}^2}{n_{error}}} \qquad (9)$$
where $\sum E_{error}^2$ is the sum of squares of the $n_{error}$ dummy or interaction effects. This $(SE)_e$ is then used in equation (4) or (5) to perform the statistical test, with $t_{critical}$ taken at $n_{error}$ degrees of freedom.
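Eq. (9) is easy to check against the case study of Section 8. The sketch below (function name hypothetical) uses the dummy effects of Table 11a and tabulated two-sided t-values at $n_{error}$ degrees of freedom; under these assumptions it reproduces the critical effects 1.919 (%MC), 4.694 (%RC2) and 1.604 (%RC2 without Dum3) listed in Table 11b.

```python
import math
from typing import Sequence


def se_from_negligible_effects(error_effects: Sequence[float]) -> float:
    """Eq. (9): (SE)_e = sqrt(sum(E_error^2) / n_error)."""
    return math.sqrt(sum(e * e for e in error_effects) / len(error_effects))


# Dummy effects from Table 11a, with tabulated t(0.975, n_error).
cases = [
    ("%MC", [-0.683, -0.750, -0.250], 3.182),        # Dum1, Dum2, Dum3; t(0.975, 3)
    ("%RC2", [-0.500, -0.167, -2.500], 3.182),       # Dum1, Dum2, Dum3; t(0.975, 3)
    ("%RC2 without Dum3", [-0.500, -0.167], 4.303),  # Dum1, Dum2; t(0.975, 2)
]
for label, dummies, t_crit in cases:
    print(label, round(t_crit * se_from_negligible_effects(dummies), 3))
# -> %MC 1.919, %RC2 4.694, %RC2 without Dum3 1.604
```

The large Dum3 effect on %RC2 roughly triples the critical effect, which is the situation discussed in Section 8.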
One should be aware of the low power of a statistical test with few degrees of freedom. If dummy factors are used, a design containing at least three dummy factors should be selected (Table 2). The minimal designs described above provide no, or too few, degrees of freedom to test the significance of the effects using the dummy effects; as a consequence, the power of the t-test to detect significant effects is low. In these cases the algorithm of Dong is preferred (see Section 6.2.2.3).
One should also take into account that in some situations the dummy effects may be affected by drift (see Section 4.2.2). Such dummy effects should be eliminated from the estimation of $(SE)_e$, since they no longer necessarily represent non-significant effects.
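$$s_0 = 1.5 \times \operatorname{median}|E_i| \qquad (10)$$
$$s_1^2 = \frac{1}{m}\sum E_i^2, \quad \text{summed over the } m \text{ effects with } |E_i| \leq 2.5\, s_0 \qquad (11)$$
$$ME = t_{(1-\alpha/2,\, df)} \times s_1 \qquad (12)$$
$$SME = t_{(1-\alpha^*/2,\, df)} \times s_1 \qquad (13)$$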
In the calculation of $s_1$ the effect of factor C (age of the reference solution) is left out, because this effect is larger than $2.5\, s_0$. Next, equations (12) and (13) are applied to define the ME and SME limits.
$$ME = t_{(0.975,\,6)} \times 0.830 = 2.45 \times 0.830 = 2.03$$
For the SME limit, the significance level is adapted using the adjustment defined by Sidak [41], which gives a corrected significance level $\alpha^* = 1 - (1 - 0.05)^{1/6} = 0.0085$.
$$SME = t_{(0.996,\,6)} \times 0.830 = 3.98 \times 0.830 = 3.30$$
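The Table 6 example can be reproduced with a few lines of Python. This sketch assumes the usual form of Dong's algorithm ($s_0 = 1.5 \times$ median of the absolute effects, followed by removal of the effects larger than $2.5\, s_0$) with df = m, as in the worked example; the t-values 2.45 and 3.98 are the tabulated ones quoted above, and the exponent 6 in the Sidak correction is taken from the example. The function names are hypothetical.

```python
import math
import statistics
from typing import Sequence


def dong_s1(effects: Sequence[float]) -> tuple[float, int]:
    """Robust error estimate s1 and its number of retained effects m (Eqs. 10-11)."""
    abs_effects = [abs(e) for e in effects]
    s0 = 1.5 * statistics.median(abs_effects)        # Eq. (10): initial estimate
    kept = [e for e in abs_effects if e <= 2.5 * s0]  # drop effects larger than 2.5 s0
    m = len(kept)
    s1 = math.sqrt(sum(e * e for e in kept) / m)      # Eq. (11)
    return s1, m


def sidak_alpha(alpha: float, k: int) -> float:
    """Sidak-corrected significance level alpha* = 1 - (1 - alpha)^(1/k)."""
    return 1 - (1 - alpha) ** (1 / k)


# |Effects| of Table 6 (seven-factor Plackett-Burman design)
effects_table6 = [0.075, 0.795, 0.860, 0.860, 0.970, 1.035, 4.945]
s1, m = dong_s1(effects_table6)
print(round(s1, 3), m)                 # -> 0.83 6  (factor C is excluded)
print(round(sidak_alpha(0.05, 6), 4))  # -> 0.0085
# Eqs. (12)-(13) with tabulated t(0.975, 6) = 2.45 and t(0.996, 6) = 3.98
me, sme = 2.45 * s1, 3.98 * s1
print(round(me, 2), round(sme, 2))     # -> 2.03 3.3
```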
Figure 3 presents the half-normal plot with the ME and SME limits included. It indicates that all effects are of the order of the random error, except the effect of factor C, which is significant when the factor levels are varied over the interval specified in the robustness test. In this example factor C clearly deviates from the straight line, but the interpretation is not always so obvious. In such cases the algorithm of Dong, or one of the other statistical interpretation methods, is a helpful tool.
$$\left[\, X(0) - \frac{[X(1) - X(-1)]\, E_{critical}}{2\,|E_X|},\;\; X(0) + \frac{[X(1) - X(-1)]\, E_{critical}}{2\,|E_X|} \,\right] \qquad (14)$$
where X(0), X(1) and X(-1) are the real values of factor X for the levels (0), (1) and (-1)
respectively.
Example. Suppose the factor pH of a buffer was examined in the interval 6.5-7.1, with the nominal pH being 6.8, and its influence on the resolution was evaluated. A significant effect ($E_X$ = 0.427) was found, with $E_{critical}$ = 0.370. Non-significant factor levels for pH are then estimated as
$$6.8 \pm \frac{(7.1 - 6.5) \times 0.370}{2 \times 0.427} = 6.8 \pm 0.26,$$
i.e. the interval 6.54-7.06. When the pH is controlled within this interval (e.g. within 6.6-7.0), no significant effect of the pH on the resolution will be found anymore.
Note that (i) such levels can be calculated only for quantitative factors, (ii) the extreme levels must be situated symmetrically around the nominal one, and (iii) a linear behaviour of the response as a function of the factor levels is assumed.
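Eq. (14) and the pH example translate directly into code. The sketch below assumes, as stated above, a linear response and extreme levels placed symmetrically around the nominal level; the function name is hypothetical.

```python
def non_significant_interval(x_nominal: float, x_high: float, x_low: float,
                             e_critical: float, e_x: float) -> tuple[float, float]:
    """Eq. (14): factor interval within which the effect should stay below E_critical.

    Only meaningful for quantitative factors with |E_X| > E_critical; otherwise the
    returned interval would be wider than the one examined in the robustness test.
    """
    half_width = (x_high - x_low) * e_critical / (2 * abs(e_x))
    return x_nominal - half_width, x_nominal + half_width


low, high = non_significant_interval(6.8, x_high=7.1, x_low=6.5, e_critical=0.370, e_x=0.427)
print(f"pH {low:.2f} to {high:.2f}")  # -> pH 6.54 to 7.06
```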
considered robust for its quantitative assay. In that case it can be expected that in none of the
points of the experimental domain, including those at which certain (system suitability)
responses have their worst result, there would be a problem with the quantitative response.
Of course, the hypothesis that the worst-case conditions do not affect the quantitative results can easily be verified in practice. This is discussed further in Section 8.
Besides the recommendation of the ICH guidelines, there are also practical reasons for defining SST-limits based on the results of a robustness test. Experience has shown that SST-limits selected independently of the results of a robustness test are frequently violated when the method is transferred, because they are chosen too strictly and rather arbitrarily, based on the experience of the analyst in the optimisation laboratory. On the other hand, it is not desirable either to choose as SST-limit the most extreme value that still allows a quantitative determination. For instance, when the operational conditions after method development give a resolution of about six, a resolution of two is not considered acceptable, even if quantification still seems possible, because it is important to keep the method at all times close to the conditions at which it was optimised and validated. It is therefore preferable to derive the system suitability limits from the robustness test, since that test examines the most extreme variations in the factors that are still probable under acceptable conditions.
These worst-case conditions are predicted from the calculated effects. For the resolution, the worst-case situation is the factor combination giving the lowest resolution; for responses like the capacity factor it is the combination causing the smallest result, while for the tailing factor it is usually the one resulting in the highest value. To define the worst-case conditions, only the statistically significant factors (at α = 0.05) and those that come close to significance (significant at α = 0.1) are considered. Factors not significant at α = 0.1 are considered negligible, and their effects are assumed to originate only from experimental error. As the experimental designs proposed in this guideline are saturated two-level designs, only linear effects of the retained factors are considered in the prediction of the worst-case situation. This is acceptable, since in robustness testing only a restricted domain of the response surface is considered. The factor-level combination for which the equation
$$Y = b_0 + \frac{E_{F_1}}{2} F_1 + \frac{E_{F_2}}{2} F_2 + \dots + \frac{E_{F_k}}{2} F_k \qquad (15)$$
predicts the worst result is derived. In Eq. (15), Y represents the response, $b_0$ the average design result, $E_{F_i}$ the effect of factor $F_i$ considered for the worst-case experiment, and $F_i$ the level of this factor (-1 or +1). Non-important factors are kept at their nominal level ($F_i = 0$). An example calculation is given in Section 8.
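The worst-case search of Eq. (15) only requires choosing, for each retained factor, the level that moves the response in the unfavourable direction. The sketch below is illustrative and rests on stated assumptions (function name hypothetical): it uses $b_0$ = 4.379, the mean of k(MC) in Table 10, the k(MC) effects of Table 11a, and treats as important the factors that have a non-zero level for k(MC) in Table 13a. It recovers that level combination and a predicted k(MC) of about 2.57, the value that appears for k(MC) in Table 13b.

```python
from typing import Mapping


def worst_case(b0: float, effects: Mapping[str, float], important: set[str],
               lower_is_worse: bool) -> tuple[dict[str, int], float]:
    """Eq. (15): set each important factor to the level (-1 or +1) that drives the
    response in the unfavourable direction; all other factors stay at 0 (nominal)."""
    levels: dict[str, int] = {}
    y = b0
    for factor, e in effects.items():
        if factor not in important:
            levels[factor] = 0  # non-important factor kept at nominal level
            continue
        raises = e > 0  # a positive effect means level +1 raises the response
        level = -1 if raises == lower_is_worse else +1
        levels[factor] = level
        y += e / 2 * level
    return levels, y


# k(MC): lower values are worse; effects from Table 11a
k_effects = {"pH": -0.547, "Column": 1.269, "Temp": -0.008, "%B begin": -0.869,
             "%B end": -0.347, "Flow": -0.592, "Wavelength": 0.047, "Buffer conc.": -0.019}
important = {"pH", "Column", "%B begin", "%B end", "Flow"}
levels, y_pred = worst_case(4.379, k_effects, important, lower_is_worse=True)
print(levels)
print(round(y_pred, 2))  # -> 2.57
```

For the tailing factor one would call the same function with `lower_is_worse=False`.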
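The SST-limit can be determined experimentally from the result of one or several experiments performed at these conditions, or it can be predicted. When the experiment is replicated, the SST-limit can be defined as the upper or lower limit of the one-sided 95% confidence interval [13] around the worst-case mean. For resolution and capacity factor, for instance, the lower limit would be chosen, while for the tailing factor it would be the upper one. The SST-limit is then
$$\bar{Y}_{worst\ case} - t_{(0.95,\, n-1)} \frac{s}{\sqrt{n}}$$
when it is the lower limit, and
$$\bar{Y}_{worst\ case} + t_{(0.95,\, n-1)} \frac{s}{\sqrt{n}}$$
when it is the upper one, with s the standard deviation of the n replicated worst-case results.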
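The one-sided confidence limit can be computed as below. This is a sketch under stated assumptions: the function name is hypothetical, the replicate results are those of Table 13b, t(0.95, 2) = 2.920 is a tabulated value supplied by hand, and the printed limits are computed here for illustration rather than quoted from the case study.

```python
import math
import statistics
from typing import Sequence


def sst_limit(results: Sequence[float], t_one_sided: float, lower: bool = True) -> float:
    """One-sided 95% confidence limit around the worst-case mean.

    t_one_sided is the tabulated t(0.95, n - 1); use lower=True for responses such as
    resolution or capacity factor, lower=False for the tailing factor.
    """
    mean = statistics.mean(results)
    half_width = t_one_sided * statistics.stdev(results) / math.sqrt(len(results))
    return mean - half_width if lower else mean + half_width


# Worst-case replicates from Table 13b; t(0.95, 2) = 2.920
rs_limit = sst_limit([4.870, 4.819, 4.702], t_one_sided=2.920, lower=True)
asf_limit = sst_limit([1.453, 1.483, 1.429], t_one_sided=2.920, lower=False)
print(round(rs_limit, 2), round(asf_limit, 2))  # -> 4.65 1.5
```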
If no significant effects occur for a response, its SST-limit can be determined analogously, but with the measurements executed at nominal conditions.
A less strict and easier alternative is to define the (average) worst-case result, $\bar{Y}_{worst\ case}$, as the SST-limit.
Finally, one could estimate the SST-limit from the theoretical model of Eq. (15) without
At first sight, Tables 3 and 9 seem to be different, but this is not the case. Table 3 was constructed, as described in Section 3, starting from the first line given by Plackett and Burman, whereas the design of Table 9 was generated by a statistical software package. This illustrates that when analysts use their own software to select or create a design, the order of rows and columns is not necessarily the one given in this guideline, although the final designs are equivalent.
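The cyclic construction mentioned for Table 3 can be sketched as follows. The first row used here is the commonly tabulated one for N = 12; whether Table 3 shifts the rows to the right or to the left is an assumption, so both variants are built (function names hypothetical). They differ in row order, yet both pass the same balance and orthogonality check, which is the sense in which differently ordered designs are equivalent.

```python
from itertools import combinations

# Commonly tabulated first row of the N = 12 Plackett-Burman design.
FIRST_ROW_N12 = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]


def plackett_burman(first_row: list[int], shift_right: bool = True) -> list[list[int]]:
    """The first row, N - 2 cyclically shifted copies of it, and a final row of -1."""
    rows = [list(first_row)]
    for _ in range(len(first_row) - 1):
        prev = rows[-1]
        rows.append([prev[-1]] + prev[:-1] if shift_right else prev[1:] + [prev[0]])
    rows.append([-1] * len(first_row))  # last experiment: all factors at level (-1)
    return rows


def is_orthogonal(design: list[list[int]]) -> bool:
    """Every column balanced (sum 0) and every pair of columns orthogonal."""
    cols = list(zip(*design))
    balanced = all(sum(c) == 0 for c in cols)
    pairwise = all(sum(a * b for a, b in zip(c1, c2)) == 0
                   for c1, c2 in combinations(cols, 2))
    return balanced and pairwise


design_r = plackett_burman(FIRST_ROW_N12, shift_right=True)
design_l = plackett_burman(FIRST_ROW_N12, shift_right=False)
print(len(design_r), len(design_r[0]), is_orthogonal(design_r), is_orthogonal(design_l))
# -> 12 11 True True
```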
Execution of the trials
No additional nominal experiments were added to the experimental set-up. For each of the 12 design runs, three injections were performed: (i) a blank injection, (ii) an injection of the reference solution, and (iii) an injection of the sample solution. This set-up assumes that, in practice, the sample and the standard used to determine the sample content are analysed under identical experimental conditions (see Section 5).
Responses determined
The responses determined in this robustness test are (i) the percent recoveries of MC, RC1 and RC2, (ii) the resolution (Rs) of the critical peak pair, MC and RC1, (iii) the capacity factor (k) of MC, (iv) the tailing or asymmetry factor (Asf) of MC, and (v) the analysis time, expressed as the retention time (tR) of the last eluting substance, RC2. Table 10 shows the results obtained in the design experiments for the studied responses.
Calculation of effects
The effects of the different factors on the responses are shown in Table 11a. Since no additional nominal experiments were performed in this case study, no normalised effects were calculated.
Graphical interpretation of effects
Normal probability and half-normal plots of the effects estimated for the different responses are shown in Fig. 4. They show that the interpretation of such plots is not always straightforward, and it is recommended to combine them with a statistical interpretation. The latter allows the critical effects to be drawn on the plots, as was done for instance in Figure 3. In Fig. 4 no critical effects were drawn, since they actually belong to the statistical interpretation of the effects.
In both types of plots, the visual identification of important effects becomes less evident as the total number of plotted effects decreases (i.e. for smaller designs), and it is not always obvious where to draw the line formed by the non-significant effects. Graphical interpretation becomes more useful when the number of estimated effects is large and only a limited number of them are expected to be significant. The plots can also be used to flag suspect dummy factor effects that are relatively large and may be outliers from the population of non-significant effects (cf. $E_{Dum3}$ on %RC2); such effects can occasionally be eliminated from the statistical interpretation (see below).
Statistical interpretation of effects
Two statistical interpretations were performed on this data set: (i) one in which the experimental error is estimated from the dummy effects (Section 6.2.2.2), and (ii) one which uses Dong's criterion (Section 6.2.2.3). The criterion based on an intermediate precision estimate (Section 6.2.2.1) was not used, since that kind of information was not available when the robustness test was performed. Note that in general only one statistical interpretation is applied; here two are given for comparison. The critical effects obtained with both interpretation methods are shown in Table 11b, and the significance of the factor effects according to both methods is shown in Table 12. When the dummy factor effects are used to estimate the experimental error, the quantitative results of the method, i.e. the percent recoveries of the substances, are not significantly affected by any of the examined factors.
When Dong's criterion is applied, some factors are found to be significant for %RC2, but only at the α = 0.1 level. This difference arises because one dummy effect (Dum3) is relatively large, which inflates the critical effect estimated from the dummy effects, whereas the limit from Dong's criterion is not affected. This illustrates that Dong's criterion is a more robust estimator of the experimental error when relatively large dummy factor effects occur.
Based on the graphical methods (Fig. 4), one could also have decided to remove Dum3 from the statistical interpretation, since it seems to be an outlier from the population of non-significant effects. After removal of Dum3 from the estimation of $(SE)_e$, the critical effects become comparable to those estimated with Dong's criterion (see Table 11b).
Evaluation of the robustness of the method
The assay of MC and its related compounds can be considered robust because (i) none of the factors studied has a significant effect (at the α = 0.05 level) on the recovery of the main and related compounds when the dummy effects are used to estimate the experimental error, (ii) with Dong's criterion none of the factor effects is significant at α = 0.05 either, (iii) the most extreme results obtained in the design (Table 10) are within the acceptance limits for the recovery (95-105%) applied in this case study, and (iv) the percent relative standard deviations of the design results are also considered acceptable for this method (1.2%, 1.5% and 1.6% for MC, RC1 and RC2, respectively).
The fact that several significant effects are found for responses such as resolution, capacity factor, retention time or asymmetry factor does not mean that the method should be considered non-robust or that it was not well optimised. When the quantitative aspect of the method is not influenced by the factors examined, the method can be considered robust. The standardisation needed to prevent factors from affecting responses such as the capacity factor would be so strict that the method could no longer be executed in practice; moreover, it would go beyond the original intention of the robustness test.
Derivation of system suitability limits from robustness test results
The worst-case factor-level combinations for the responses for which SST-limits were desired are shown in Table 13a. The worst-case situation for resolution is the factor combination giving the lowest resolution, for the capacity factor it is the one causing the smallest capacity factor, while for the tailing factor it is the combination resulting in the highest value. These worst-case conditions were predicted from the significances observed with the statistical interpretation using the dummy effects (Table 12a), as illustrated in the following example.
Example. Consider the resolution between MC and RC1. Significant effects at α = 0.10 are observed for the factors pH, column, temperature, %B end and buffer concentration (see Table 12a). To define the worst-case conditions, the non-significant factors %B begin, flow and wavelength are kept at nominal level, as described in Section 7. To define the worst-case levels for the significant factors, the estimated effects are considered (Table 11a). For pH the effect was estimated to be 0.427. This means that level (+1) gives a higher response than level (-1), as can be derived from the equation for effects (Eq. 2), and that the worst resolution is obtained at level (-1). For the factors column, temperature, %B end and buffer concentration, the worst-case levels are defined analogously as (-1), (-1), (+1) and (-1), respectively. This combination of levels is the one given in Table 13a as the predicted worst-case factor-level combination for Rs(MC-RC1).
The worst-case experiment for a given response was then carried out in three independent
replicates. The results and the system suitability limits derived from these experiments are
shown in Table 13b.
The SST-limits obtained with the two other approaches, namely taking the average worst-case result or estimating them from the theoretical model of the effects, are also shown. In this case study, the SST-limits calculated from the theoretical model are the least strict ones.
Remark
As mentioned in Section 7, if one doubts the hypothesis that the quantitative results of the method are not affected by the worst-case conditions of Table 13a, quantitative experiments (in this example, to determine the recovery of the substances) can be executed at these conditions to confirm it.
Acknowledgements
Y. Vander Heyden is a postdoctoral fellow of the Fund for Scientific Research (FWO)
Vlaanderen. B. Boulanger, P. Chiap, Ph. Hubert, G. Caliaro and J.M. Nivet (SFSTP
commission); P. Kiechle and C. Hartmann (Novartis, Basel, Switzerland) are thanked for
various discussions on the subject.
References
[1] ICH Harmonised Tripartite Guideline prepared within the Third International Conference
on Harmonisation of Technical Requirements for the Registration of Pharmaceuticals for
Human Use (ICH), Text on Validation of Analytical Procedures, 1994,
(http://www.ifpma.org/ich1.html).
[2] Youden, E.H. Steiner; Statistical Manual of the Association of Official Analytical Chemists; The Association of Official Analytical Chemists ed.; Arlington, 1975, p. 33-36, 70-71, 82-83.
[3] J.A. Van Leeuwen, L.M.C. Buydens, B.G.M. Vandeginste, G. Kateman, P.J.
Schoenmakers, M. Mulholland; RES, an expert system for the set-up and interpretation of a
ruggedness test in HPLC method validation. Part 1 : The ruggedness test in HPLC method
validation; Chemometrics and Intelligent Laboratory systems 10 (1991) 337-347.
[4] M. Mulholland; Ruggedness testing in analytical chemistry; TRAC, 7 (1988) 383-389.
[5] Y. Vander Heyden, F. Questier and D.L. Massart; Ruggedness testing of chromatographic methods: selection of factors and levels; Journal of Pharmaceutical and Biomedical Analysis 18 (1998) 43-56.
[6] F.J. van de Vaart et al.; Validation in Pharmaceutical and Biopharmaceutical Analysis; Het
Pharmaceutisch Weekblad 127 (1992) 1229-1235.
[7] ICH Harmonised Tripartite Guideline prepared within the Third International Conference
on Harmonisation of Technical Requirements for the Registration of Pharmaceuticals for
Human Use (ICH), Validation of Analytical Procedures : Methodology, 1996, 1-8
(http://www.ifpma.org/ich1.html).
[8] J. Caporal-Gautier, J.M. Nivet, P. Algranti, M. Guilloteau, M. Histe, M. Lallier, J.J. N'Guyen-Huu and R. Russotto; Guide de validation analytique, Rapport d'une commission SFSTP; STP Pharma Pratiques 2 (1992) 205-239.
[9] The United States Pharmacopeia, 23rd edition, National Formulary 18, United States Pharmacopeial Convention, 1995, Rockville, USA.
[10] Drugs Directorate Guidelines, Acceptable Methods; Health Protection Branch - Health
and Welfare Canada; 1992; 20-22.
[11] International Organisation for Standardisation (ISO); Accuracy (trueness and precision)
of measurement methods and results - Part 2 : Basic method for the determination of
repeatability and reproducibility of a standard measurement method; International Standard
ISO 5725-2:1994(E), First edition.
[12] International Organisation for Standardisation (ISO); Accuracy (trueness and precision)
of measurement methods and results - Part 3 : Intermediate measures of the precision of a
standard measurement method; International Standard ISO 5725-3:1994(E), First edition.
[13] D.L. Massart, B.G.M. Vandeginste, L.M.C. Buydens, S. De Jong, P.J. Lewi and J.
Smeyers-Verbeke; Handbook of Chemometrics and Qualimetrics: Part A, Elsevier,
Amsterdam, 1997.
[27] Y. Vander Heyden, C. Hartmann and D.L. Massart; L. Michel, P. Kiechle and F. Erni;
Ruggedness tests on an HPLC assay : comparison of tests at two and three levels by using
two-level Plackett-Burman designs; Analytica Chimica Acta 316 (1995) 15-26.
[28] Y. Vander Heyden, F. Questier and D.L. Massart, A ruggedness test strategy for procedure related factors: experimental set-up and interpretation; Journal of Pharmaceutical and Biomedical Analysis 17 (1998) 153-168.
[29] J.L. Goupy, Methods for experimental design, principles and applications for
physicists and chemists, Elsevier, Amsterdam, 1993, pp. 159-177, 421-427.
[30] Y. Vander Heyden, A. Bourgeois, D.L. Massart, Influence of the sequence of
experiments in a ruggedness test when drift occurs, Analytica Chimica Acta 347 (1997)
369-384.
[31] A. Nijhuis, H.C.M. van der Knaap, S. de Jong and B.G.M. Vandeginste, Strategy for
ruggedness tests in chromatographic method validation, Analytica Chimica Acta, 391 (1999)
187-202.
[32] K. Jones, Optimization of experimental data, International Laboratory 16, 9 (1986) 32-45.
[33] Y. Vander Heyden and D.L. Massart; Y. Zhu and J. Hoogmartens; J. De Beer;
Ruggedness tests on the HPLC assay of the United States Pharmacopeia XXIII for
tetracycline hydrochloride : comparison of different columns in an interlaboratory approach;
Journal of Pharmaceutical and Biomedical Analysis 14 (1996) 1313-1326.
[34] Y. Vander Heyden, C. Hartmann and D.L. Massart; P. Nuyten, A.M. Hollands and P. Schoenmakers; Ruggedness testing of a size exclusion chromatographic assay for low molecular mass polymers; Journal of Chromatography A 756 (1996) 89-106.
[35] Y. Vander Heyden, G.M.R. Vandenbossche, C. De Muynck, K. Strobbe, P. Van Aerde, J.P. Remon and D.L. Massart, Influence of process parameters on the viscosity of Carbopol 974P dispersions, Ph.D. thesis, personal communication.
[36] F. Dong, On the identification of active contrasts in unreplicated fractional factorials,
Statistica Sinica 3 (1993) 209-217.
[37] C. Daniel, Use of half-normal plots in interpreting factorial two-level experiments,
Technometrics 1 (1959) 311-341.
[38] D.A. Zahn, An empirical study of the half-normal plot, Technometrics 17 (1975) 201-211.
[39] R.V. Lenth, Quick and easy analysis of unreplicated factorials, Technometrics 31 (1989)
469-473.
[40] P.D. Haaland and M.A. O'Connell, Inference for effect-saturated fractional factorials,
Technometrics 37 (1995) 82-93.
[41] Z. Sidak, Rectangular confidence regions for the means of multivariate normal
distributions, Journal of the American Statistical Association 62 (1967) 626-633.
[42] Y. Vander Heyden, M. Jimidar, E. Hund, N. Niemeijer, R. Peeters, J. Smeyers-Verbeke, D.L. Massart and J. Hoogmartens; Determination of system suitability limits with a robustness test; Journal of Chromatography A 845 (1999) 145-154.
Table 1
Potential factors to be examined in the robustness testing of some analytical methods. HPLC = high performance liquid chromatography, TLC = thin layer chromatography and CE = capillary electrophoresis.
Method    Factors
1) HPLC   pH of the mobile phase
          Amount of the organic modifier
          Buffer concentration, salt concentrations or ionic strength
          Concentration of additives (ion pairing agents, competing amine)
          Flow rate
          Column temperature
          For gradient elution: initial mobile phase composition; final mobile phase composition; slope of the gradient
          Column factors: batch of stationary phase; manufacturer; age of the column
          Detector factors: wavelength (UV or fluorimetric detection); voltage (electrochemical detection)
          Integration factors: sensitivity
2) GC     Injection temperature
          Column temperature
          Detection temperature
          For temperature program: initial temperature; final temperature; slope of the temperature gradient
          Flow-rate of the gas
          For flow-program: initial flow; final flow; slope of the flow gradient
          Split-flow
          Type of liner
          Column factors: batch of stationary phase; manufacturer; age of the column
3) TLC    Eluent composition
          pH of the mobile phase
          Temperature
          Development distance
          Spot shape
          Spot size
          Batch of the plates
          Volume of sample
          Drying conditions (temperature, time)
          Conditions of spot visualisation (spraying of reagent, UV detection, dipping into a reagent)
4) CE     Electrolyte concentration
          Buffer pH
          Concentration of additives (organic solvents, chiral selectors, surfactants)
          Temperature
          Applied voltage
          Sample injection time
          Sample concentration
          Concentration of the liquids to rinse
          Rinse times
          Detector factors: wavelength (UV or fluorimetric detection)
          Factors related to the capillary: batch; manufacturer
          Integration factors
Table 2
Plackett-Burman designs applied in the guideline
(a) Minimal designs
No. of factors   Selected design                          No. of dummy factors   No. of experiments (N)
3-7              Plackett-Burman design for 7 factors     4-0                    8
8-11             Plackett-Burman design for 11 factors    3-0                    12
12-15            Plackett-Burman design for 15 factors    3-0                    16
16-19            Plackett-Burman design for 19 factors    3-0                    20
20-23            Plackett-Burman design for 23 factors    3-0                    24
(b) Designs with at least three dummy factors
No. of factors   Selected design                          No. of dummy factors   No. of experiments (N)
3-4              Plackett-Burman design for 7 factors     4-3                    8
5-8              Plackett-Burman design for 11 factors    6-3                    12
9-12             Plackett-Burman design for 15 factors    6-3                    16
13-16            Plackett-Burman design for 19 factors    6-3                    20
17-20            Plackett-Burman design for 23 factors    6-3                    24
Table 3
Plackett-Burman design for 11 factors (N = 12)
Exp.
1
2
3
4
5
6
7
8
9
10
11
12
Factors
Response
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
y1
y2
y3
y4
y5
y6
y7
y8
y9
y10
y11
y12
Table 4
Fractional factorial designs applied in the guideline
Legend : ng = not given
(a) Minimal designs
No. of
factors
Selected design
Generators
Expansion
generators
3
4
Full factorial: $2^3$
Half-fraction factorial: $2^{4-1}$
D = ABC
8
8
D = AB
E = AC
D = -AB
E = -AC
D = AB
E = AC
F = BC
D = -AB
E = -AC
F = -BC
D = AB
E = AC
F = BC
G = ABC
D = -AB
E = -AC
F = -BC
G = ABC
E=ABC,
F=BCD,
G=ABD
H=ACD
ng
16
9
10
11
12
13
14
15
ng
ng
ng
ng
ng
ng
ng
ng
ng
ng
ng
ng
ng
ng
16
16
16
16
16
16
16
Table 4 (continued)
(b) Designs for statistical interpretation of effects from two-factor interactions
No. of
factors
Selected design
Generators
3
4
Full factorial: $2^3$
Half-fraction factorial: $2^{4-1}$
D = ABC
8
8
E = ABCD
16
E=ABC,
F=BCD
16
E=ABC,
F=BCD,
G=ABD
16
E=ABC,
F=BCD,
G=ABD
H=ACD
16
Table 5
Rankits to draw a half-normal plot for the most frequently used screening designs (effect 1 indicates the smallest effect)
Effect   Design size
         N=8    N=12   N=16
1        0.09   0.06   0.04
2        0.27   0.17   0.12
3        0.46   0.29   0.21
4        0.66   0.41   0.29
5        0.90   0.53   0.38
6        1.21   0.67   0.47
7        1.71   0.81   0.57
8               0.98   0.67
9               1.19   0.78
10              1.45   0.89
11              1.91   1.02
12                     1.18
13                     1.36
14                     1.61
15                     2.04
Table 6
Effects from a seven-factor Plackett-Burman design (case study extracted from [15])
Factor (definition)             |Effect|   Rankit
Standard concentration (D)      0.075      0.09
Ionic strength buffer (A)       0.795      0.27
pH of buffer (G)                0.860      0.46
Flow mobile phase (B)           0.860      0.66
Mobile phase composition (E)    0.970      0.90
Detection wavelength (F)        1.035      1.21
Age of the standard (C)         4.945      1.71
Table 7
Composition of the mobile phase during the solvent gradient (% volume fractions); A =
0.25% ammonium acetate in water, B = acetonitrile, C = water
Solvent
A
0
50
13
50
Time (min)
15
50
25
43
43
25
25
25
25
25
17
50
22
50
Table 8
Factors and levels investigated in the robustness test.
Legend: Alltech = Alltech Hypersyl 3 µm BDS C18, and Prodigy = Phenomenex Prodigy 3 µm ODS (3) 100 Å C18.
Factor
1. The flow of the mobile phase
2. The pH of the buffer
3. The column temperature
Units
Limits
Level (-1)
Level (+1)
Nominal
ml/min
0.1
1.4
1.6
1.5
0.3
6.5
7.1
6.8
23
33
ambient
Alltech
Prodigy
Alltech
24
26
25
41
45
43
% m/v
0.025
0.225
0.275
0.25
nm
260
270
265
1
1
1
1
1
-1
-1
-1
-1
-1
1
-1
1
1
-1
-1
-1
1
1
-1
-1
1
1
-1
Abbreviations
1. pH
2. Column
3. Dum1, dum2, dum3
4. Temp
5. % B begin
6. % B end
7. Flow
8. Wavelength
9. Buffer conc.
1
-1
1
-1
1
1
-1
-1
1
1
-1
-1
-1
1
1
-1
-1
1
-1
1
1
-1
1
-1
1
-1
-1
1
-1
-1
-1
1
1
1
1
-1
1
-1
1
1
-1
1
1
1
-1
-1
-1
-1
-1
-1
-1
1
1
1
1
-1
1
-1
1
-1
H
Flow
1
1
-1
-1
1
-1
1
1
1
-1
-1
-1
-1
1
1
1
-1
-1
1
-1
1
1
-1
-1
-1
-1
1
-1
1
-1
1
1
-1
1
1
-1
Factor
pH of the buffer
Column manufacturer
Dummy variables
Column temperature
Percentage B in the mobile phase at the start of the gradient
Percentage B in the mobile phase at the end of the gradient
Flow of the mobile phase
Wavelength of the detector
Concentration of the buffer
Table 10
Results of the experiments for the studied responses
Exp.      %MC     %RC1    %RC2    Rs(MC-RC1)   k(MC)   Asf(MC)   tR(RC2)
1         101.6   100.9   101.4   5.691        3.800   0.813     11.500
2         101.7   101.2   102.7   7.484        5.083   1.031     13.000
3         101.6   101.7   101.3   5.770        4.000   1.453     9.833
4         101.9   103.0   102.9   5.025        3.167   1.549     9.483
5         101.8   99.3    99.1    5.440        3.800   1.458     10.317
6         101.1   99.9    101.7   5.711        5.817   0.861     12.567
7         101.1   100.8   101.4   5.932        5.250   0.836     12.083
8         101.6   100.2   98.8    4.962        3.200   1.059     8.417
9         98.4    97.1    101.8   5.427        3.367   0.977     9.200
10        99.7    100.5   99.3    6.344        5.350   0.853     13.800
11        99.7    98.6    98.7    6.715        4.783   0.920     13.317
12        102.3   101.1   103.1   5.186        4.933   1.412     11.150
Mean      101.0   100.4   101.0   5.807        4.379   1.102     11.222
RSD (%)   1.15    1.52    1.61
Table 11
(a) Effects of the factors on the different responses, and
(b) Critical effects obtained from the different statistical interpretation methods
(a)
Factors        %MC      %RC1     %RC2     Rs(MC-RC1)   k(MC)    Asf(MC)   tR(RC2)
pH             0.683    0.850    0.000    0.427        -0.547   0.204     0.039
Column         -0.450   -0.083   -0.300   1.011        1.269    -0.432    2.978
Dum1           -0.683   -0.917   -0.500   -0.154       -0.047   -0.065    -0.039
Temp           -0.717   -1.150   -0.367   0.408        -0.008   -0.103    -0.333
%B begin       -1.117   -0.617   -1.067   -0.226       -0.869   -0.147    -0.539
%B end         0.883    1.450    0.467    -0.584       -0.347   -0.013    -1.150
Dum2           -0.750   -1.150   -0.167   -0.198       -0.030   -0.003    -0.122
Flow           -0.017   -0.883   -0.300   0.031        -0.592   -0.146    -0.939
Wavelength     0.517    0.650    -0.533   0.041        0.047    0.067     0.084
Buffer conc.   -0.617   0.717    1.100    0.380        -0.019   0.029     0.022
Dum3           -0.250   -0.350   -2.500   0.106        0.036    -0.011    0.144
(b)
Critical effects               %MC     %RC1    %RC2    Rs(MC-RC1)   k(MC)   Asf(MC)   tR(RC2)
Dummy effects, α = 0.05        1.919   2.778   4.694   0.500        0.122   0.121     0.354
  without Dum3                                 1.604
Dummy effects, α = 0.1         1.419   2.054   3.471   0.370        0.090   0.090     0.262
  without Dum3                                 1.088
Dong's criterion, α = 0.05     1.476   1.939   1.307   0.691        0.084   0.228     0.545
Dong's criterion, α = 0.1      1.205   1.582   1.064   0.562        0.067   0.186     0.440
%RC1
%RC2
a
(Without Dum3)
pH
Column
Dum1
Temp
%B begin
%B end
Dum2
Flow
Wavelength
Buffer conc.
Dum3
(b)
Factors
pH
Column
Dum1
Temp
%B begin
%B end
Dum2
Flow
Wavelength
Buffer conc.
Dum3
- (**)a
- (*)a
Rs(MCRC1)
**
*
**
*
**
-
k(MC) Asf(MC)
*
*
*
*
*
-
*
*
**
*
*
-
tR(RC2)
*
**
*
*
*
-
%RC1
-
%RC2
-
Rs(MCRC1)
-
k(MC) Asf(MC)
tR(RC2)
-
Table 13
(a) Predicted worst-case factor-level combinations for the different responses, and
(b) results at these conditions together with the derived SST-limits.
(a)
Factors        Rs(MC-RC1)   k(MC)   Asf(MC)
pH             -1           +1      +1
Column         -1           -1      -1
Temp           -1           0       -1
%B begin       0            +1      -1
%B end         +1           +1      0
Flow           0            +1      -1
Wavelength     0            0       0
Buffer conc.   -1
(b)
Run                        Rs(MC-RC1)   k(MC)      Asf(MC)
1                          4.870        2.800      1.453
2                          4.819        2.800      1.483
3                          4.702        2.817      1.429
Mean                       4.797        2.806      1.455
Std. dev. (s)              0.0861       9.81E-03   0.0271
n                          3            3          3
Predicted from Eq. (15)                 2.57       1.62
Figure captions
Figure 1: Schematic representation of the different steps in a robustness test
Figure 2: (a) Normal probability plot, and (b) Half-normal probability plot
Figure 3: Half-normal probability plot for the effects of Table 6 with identification of the
critical effects ME and SME.
Figure 4: Normal probability and half-normal plots for the factor effects on the responses of
Table 11a.
Figure 1 (flowchart): identification of factors; definition of responses; definition of additional experiments, e.g. at nominal level; experimental set-up; execution of experiments; determination of responses (SST responses, e.g. resolution, tailing factor, capacity factor; quantitative aspect responses, e.g. content); calculation of effects; statistical/graphical interpretation of effects; drawing conclusions, giving advice for measures to be taken, defining worst-case conditions for SST-responses and determining SST-limits.
Figure 2: (a) Normal probability plot (expected normal value versus observed effect), and (b) half-normal probability plot for the effects estimated from a N=12 Plackett-Burman design.
Figure 3
Half-normal probability plot for the effects of Table 6 with identification of the critical effects
ME and SME.
Figure 4: Normal probability and half-normal plots for the factor effects on the responses of Table 11a (panels: %MC, %RC1, %RC2, Rs(MC-RC1), k'(MC), Asf(MC) and tR(RC2); each response is shown as a normal probability plot of the effects and as a half-normal plot against the rankits).
47