Sampling of Discrete Materials-A New Introduction To The
www.elsevier.com/locate/chemolab
Abstract
The purpose of a theory of sampling is to answer two questions: How should one select a sample?—How much material should be
selected? Parts I (qualitative approach), II and III (quantitative approach) of this series propose answers to these two fundamental questions.
These answers are not entirely new (answers have been formulated since 1950), but a scientific theory is a living structure that has to be kept
up to date. At a course given in Brasilia in 1998, pointed questions were raised which suggested that the introduction to the qualitative
approach had to be clarified. Part I represents the most updated introduction to theory of sampling (TOS). More than 200 scientific papers,
books, lectures and courses on sampling theory—and practice—have been published or offered to the public by the author over a period of 50
years. A brief, chronological account of the development history of TOS is presented for the first time in part IV—with a comprehensive
literature survey as part V.
© 2004 Elsevier B.V. All rights reserved.
Keywords: Theory of sampling; Discrete materials; Heterogeneity; Sampling errors; Accuracy; Bias; Reproducibility; Representativity; Quality control; Quality
assurance; Chemometrics
People in charge of sampling, analysis, quality control and chemometrics, in industry and research,
often overlook this extremely important point.

The title refers to "discrete" materials. Matter is essentially discrete, or discontinuous, beginning
right at the atomic scale. Continuity is a mathematical concept, which does not apply to actual matter.
With particulate solids the discrete elements are fragments, with liquids and gases they are molecules
and ions. The term "discrete materials" covers matter from every possible source: ores and minerals;
cements; agricultural products; products of animal origin such as bones from which gelatine is
extracted, human food and cattle feed. In fact, I began writing this paper at the time of the Belgian
chicken and British mad cow disease disputes, and the early editorial work associated with it ran
concurrently with the foot-and-mouth crisis affecting European farming. These are poignant reminders of
the important role played by proper sampling from only the last 5 years.

Other examples are the materials processed and produced in the chemical, pharmaceutical, and oil
industries; materials involved in environmental control; the potability of the water we drink; the
quality of the air we breathe; the refuse we discard; bodily fluids analyzed clinically; all the
materials measured (assayed) in biological, pharmaceutical or medical research; in soil contamination
studies, and so on. The list above is not exhaustive: it covers everything except compact solids such
as mineral deposits.

Quality Control (QC), as well as traceability, have become vital necessities in all these fields,
whether for technical or for commercial purposes. Quality estimation of any batch of matter by one form
or another of analysis begins in most, if not all cases, with several sampling stages. Indeed, analysis
is practically always carried out on assay portions weighing only a few centigrams.¹

¹ Exception: up to 100 g when analyzing precious metals by fire assay.

To obtain analytical samples of matter as small as this from batches that may weigh anything from
kilograms to several tons (or even thousands of tons, e.g. when sampling coal or iron ore in loading
seagoing vessels), to achieve such an extreme mass reduction, these batches must undergo a certain
number of sampling stages, alternating with essential size reduction stages when dealing with solids.
Sampling is nothing other than a mass reduction and is indeed a major part of quality control, which is
very often not recognized. For the present purpose, we can therefore summarize this technical context
as follows:

Quality estimation = Sampling + Analysis

There are of course many other significant aspects to "quality", for example a modern multivariate
approach, for which see e.g. the recent comprehensive textbook by [1] and the extensive references
quoted therein. However, even this powerful chemometric approach still needs to take account of the
sampling issues, which cannot be substituted by data analysis alone. Another purpose of the present
work is therefore to contribute to the sampling aspect of the science of quality in the broadest
possible context.

2. Factors at stake

In biology, medicine and pharmacy, the factors at stake are human health and life: our health and our
lives! The problem is serious. It is, however, almost completely ignored. The same factors arise, e.g.
in the control of food or beverages; in the environmental control of air or water; control of discarded
household refuse or industrial effluents.

In industry, the factor at stake is money, whether sampling for technical or for commercial purposes.
Often huge amounts of money: money you don’t make because your products are inferior or money you lose
because your production facilities do not work as well as they could. In both cases the loss can very
often be traced back to poor quality control and, more often than not, to an unnecessarily large
sampling error. Examples of such errors generating losses in the million-dollar range are trivial; in
the gold mining industry, losses in the billion-dollar range are reported. The stakes are high, the
problem is serious.

3. Sampling: a technique or a science?

Sampling is both a technique and a science. Sampling is fundamentally a mass reduction achieved by the
appropriate technical means, but this operation must respect the batch composition as closely as
possible. We will see that there is no such thing as exact sampling: sampling itself generates errors
and these errors must be controlled. Sampling science is nothing other than a theory of the sampling
errors. The most detrimental sampling errors occur when the scientific aspects of sampling are ignored;
when sampling is regarded as a mere handling technique, the tools of which are a shovel or a scoop and
a few bags or containers.

Even though the present sampling theory has been developed for materials of mineral origin, it is
universally valid for all types of discrete material: with particulate solids, the size of the
constituents (the fragments, the grains) is expressed in mm or µm. With liquids/gases the size of ions
and molecules is expressed in angstroms. The difference between particulate solids, liquids and gases
is not a difference in essence but a mere difference in scale. The same theoretical rules apply to all
phases. In part I, we will only deal with the qualitative aspects of sampling theory. The corresponding
quantitative aspects will be summarized in parts II and III.

Below follow the essential minimum of definitions and descriptions of the relationships of the elements
of the theory of sampling. This tutorial must be brief and comprehensive within given limits, and a
strict axiomatic description is therefore necessary.
P. Gy / Chemometrics and Intelligent Laboratory Systems 74 (2004) 7–24 9
4. Basic definitions and notations

• Constituents and lot to be sampled
  o Constituent elements or constituents, F: In a given material, constituents are defined as the
    smallest operative elements that can be regarded as autonomous and unalterable during the physical,
    chemical and mechanical conditions of sampling: with particulate solids these are fragments (hence
    the notation F); with liquids and gases they are molecules and ions.
  o Lot, L: From a theoretical standpoint, the batch or lot of discrete material to be sampled and
    analyzed can be regarded as a set of units. We must take into consideration two kinds of sets and
    two kinds of units.
  o Set: The set can either be:
    - a population of non-ordered units (e.g., heap of stationary material), or
    - a series of ordered units (e.g., sequence of elementary cross-sections of a flowing stream. The
      order here is chronological; it may alternatively be spatial).
    Different mathematical laws apply to these two kinds of sets. It is a grave mistake, which can be
    very costly, to use wrong laws or formulas. Unfortunately, most people do!
  o Unit, U: The unit taken into consideration can either be:
    - a single constituent F (solid fragment, molecule, ion), or
    - a group, G, of neighbouring constituents, i.e. an increment (a key concept).
  o Increment, I: group of neighbouring constituents extracted from lot L at the same time, in a single
    operation (pass) of the sampling tool or device.
• Samples vs. specimens; sampling vs. specimen-taking
  o Sample, S: subset of lot L, the elements of which have been selected in a correct way (defined
    below). By definition, samples are reliable.
  o Specimen: subset of lot L, the elements of which have been selected in a non-correct way (defined
    below). No safe decision can be made on the basis of analyses carried out on specimens. By
    definition specimens are unreliable and therefore dangerous.
  o Sampling: mass reduction of lot L by selection of a certain subset of units, with the purpose (not
    always fulfilled) of obtaining a true, reliable sample S (when the conditions of sampling
    correctness are respected). Sampling provides either samples or specimens.
  o Specimen-taking: the operation of extraction of unreliable specimens.

5. Definitions based on selection conditions

Non-destructive sampling can be achieved only by selection of a certain number of constituents of L
that, when gathered, will make up a certain subset: either a true sample S or a specimen.

The selection issue, perhaps surprisingly, falls within the mathematical province of the calculus of
probability. According to the distribution of the selection probabilities of the constituents, a
selection, a sampling or a sample is defined so as to be:

• Non-probabilistic: when certain constituents of the lot have a zero probability of being selected
  (Fig. 1).
• Probabilistic: when all constituents of the lot have a non-zero probability of being selected for the
  benefit of
Fig. 1. When sampling, e.g. from a truckload with a shovel, only about the top 15% or so of the entire lot is directly accessible. This illustration is a generic
example of very many practical sampling situations in all sciences, especially if it is recognised that the exact same situation holds over very many scales, e.g.
from railroad cars to analytical vials. The sampling process is non-probabilistic, however, and can consequently never produce a representative sample.
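The contrast between non-probabilistic and probabilistic selection can be made concrete with a small simulation. The following is only a sketch under stated assumptions: the lot is modelled as 10,000 equal-mass units, the grade profile (component A enriched toward the bottom of the load) is hypothetical, and only the "top 15%" figure comes from the caption of Fig. 1. The function name `simulate` and all parameters are illustrative.

```python
import random

def simulate(seed=0, n=10_000, top_fraction=0.15, sample_size=500):
    """Compare a grab from the accessible top layer with a uniform random selection."""
    rng = random.Random(seed)
    # Hypothetical segregated lot: index 0 is the top of the truckload and the
    # grade rises linearly from 0.02 at the top to 0.10 at the bottom (assumption).
    lot = [0.02 + 0.08 * (i / (n - 1)) for i in range(n)]
    true_grade = sum(lot) / n

    # Non-probabilistic: units below the top 15% have zero selection probability.
    accessible = lot[: int(top_fraction * n)]
    grab = rng.sample(accessible, sample_size)

    # Probabilistic: every unit has the same non-zero selection probability.
    uniform = rng.sample(lot, sample_size)

    return true_grade, sum(grab) / sample_size, sum(uniform) / sample_size

true_grade, grab_mean, uniform_mean = simulate()
print(f"lot grade {true_grade:.4f}, shovel grab {grab_mean:.4f}, uniform {uniform_mean:.4f}")
```

Under these assumptions the grab from the top layer stays far below the lot grade no matter how many units are taken, whereas the uniform selection scatters around it.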
Fig. 4. Example of true fractional shovelling with a sampling rate (in this case) of 1:5. The key issue is to select the sample in a random fashion only after
completion of shovelling.
Fig. 5. Example of degenerate fractional shovelling, here also with a sampling rate 1:5. In this case, the sample necessarily must be pre-selected, which invites
selection bias, etc.
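The difference between the two captions above lies in when the sample heap is chosen. A minimal sketch of the true variant follows, assuming that successive shovelfuls are dealt to the heaps in fixed rotation (the dealing rule is an assumption of this sketch; the captions only state the 1:5 rate and the random choice after completion). The function name is hypothetical.

```python
import random

def true_fractional_shovelling(shovelfuls, n_heaps=5, seed=None):
    """Deal shovelfuls into n_heaps in rotation, then pick one heap at random."""
    heaps = [[] for _ in range(n_heaps)]
    for k, scoop in enumerate(shovelfuls):
        heaps[k % n_heaps].append(scoop)  # shovelful k goes to heap k mod n_heaps
    # The sample heap is drawn only after all shovelling is complete, so the
    # operator cannot know in advance which shovelfuls will end up in the sample.
    chosen = random.Random(seed).randrange(n_heaps)
    return chosen, heaps[chosen]

chosen, sample = true_fractional_shovelling(list(range(100)), n_heaps=5, seed=42)
print(chosen, len(sample))
```

In the degenerate variant of Fig. 5 the index `chosen` would be fixed before shovelling starts, which is exactly what invites selection bias.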
tees the uniformity of the selection probability. The safest operator is the one who does not know
about the material he is splitting.

Until recently, shovelling, true or otherwise, used to be restricted to hand shovels and to relatively
small weights (a few tons maximum in the year 2000), but the author has extended this technique to
mechanical shovels with much higher capacities, which opens up new possibilities for deterministic
fractional shovelling of much higher weights. For instance, in a big harbour in the Pacific area, the
author was acting as a referee between seller and buyer of a 16,000-metric ton lot of valuable
concentrate, whose results did not match. It was possible to rent front-end loaders that were used as
mechanical shovels. A three-stage deterministic fractional shovelling with a splitting rate of 1/20
(5-ton loader) was implemented; followed by a 1/10 (5-ton loader) and finally a 1/10 (0.5-ton loader)
splitting, thus obtaining an 800-ton primary sample, an 80-ton secondary sample and an 8-ton tertiary
sample, which was sent to the laboratory for further reduction. This large-scale operation was correct
and commendable as the operators, though very competent, were not acquainted with mineralogy and had
been instructed to take their loads in sequence. The whole operation lasted only 2 days and cost a very
small fraction of the total expenses (which were formidable at the time).

• Incremental sampling: This is the preferred method for sampling flowing streams of particulate
  solids, for liquids and multi-phase media, i.e. so-called "one-dimensional objects", the sampling of
  which will be developed in full in part III below. In this introduction, we only need to present the
  three principal possibilities actually implemented in the industrial and technological practice and
  outline their respective properties. They are illustrated in Figs. 6–9.

The operation of taking all of the stream for a fraction of the time is based on the mathematical model
of point integration of a curve, which will be fully developed and illustrated in part III. It
generates errors specific to the incremental sampling of one-dimensional flowing streams. It is the
only one that is probabilistic and easily rendered correct (Fig. 7).

Fig. 8 illustrates the taking of a fraction of the stream for the whole of the time. It is obviously
not probabilistic.
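The mass reduction achieved by the three-stage fractional shovelling described above can be retraced with a few lines of arithmetic. This is a sketch only: the lot mass, splitting rates and loader capacities are those quoted in the text, while the number of loader cycles per stage additionally assumes that every load is exactly full, which is an idealisation introduced here.

```python
from fractions import Fraction

lot_mass_t = Fraction(16_000)
# (splitting rate, loader capacity in tons) for the three stages quoted in the text
stages = [(Fraction(1, 20), Fraction(5)),
          (Fraction(1, 10), Fraction(5)),
          (Fraction(1, 10), Fraction(1, 2))]

masses, scoops = [], []
incoming = lot_mass_t
for rate, capacity in stages:
    scoops.append(incoming / capacity)  # loader cycles, assuming full loads
    incoming = incoming * rate          # mass retained as the sample of this stage
    masses.append(incoming)

print([float(m) for m in masses])      # sample masses in tons per stage
print([int(s) for s in scoops])        # loader cycles per stage
print(masses[-1] / lot_mass_t)         # overall sampling ratio
```

The three retained masses reproduce the 800-ton, 80-ton and 8-ton samples of the text, for an overall sampling ratio of 1/2000.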
Fig. 6. Developing the mathematical model behind practical sampling of a stream of particulate matter. Preview of Part III, here only used to facilitate a
backdrop for illustrating the fundamental concept of "cross-stream sampling", which is shown here in three different alternatives: stopped-belt, uni-directional
and bi-directional sampling.
Fig. 7. The principle of correct cross-stream sampling: taking all of the flow for a fraction of the time; shown here are two correct increments.
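The "point integration of a curve" that underlies this kind of cross-stream sampling can be illustrated numerically. The sketch below rests on several assumptions that are not in the text: the flow rate is taken as constant, the grade function `grade(t)` is invented for illustration, and the increments are placed systematically with a random start. It compares the mean grade over the whole stream with the estimate obtained from Q point increments.

```python
import math
import random

def grade(t):
    # Hypothetical grade function a(t): a slow drift plus a periodic fluctuation.
    return 0.05 + 0.01 * t + 0.005 * math.sin(2 * math.pi * 7 * t)

def lot_grade(T=1.0, steps=100_000):
    # Reference value: fine midpoint-rule integration of a(t) over [0, T],
    # divided by T (valid as the lot grade only for a constant flow rate).
    dt = T / steps
    return sum(grade((k + 0.5) * dt) for k in range(steps)) * dt / T

def systematic_estimate(Q, T=1.0, rng=None):
    # Q point increments at a constant interval, with a random starting point.
    rng = rng or random.Random(0)
    interval = T / Q
    start = rng.uniform(0, interval)
    return sum(grade(start + q * interval) for q in range(Q)) / Q

a_L = lot_grade()
for Q in (5, 20, 80):
    estimate = systematic_estimate(Q)
    print(Q, f"{(estimate - a_L) / a_L:+.4%}")  # relative error of the Q-point estimate
```

With this particular curve the relative error shrinks as Q grows; other curves, in particular periodic ones in phase with the sampling interval, can behave very differently.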
Fig. 9 illustrates the taking of a fraction of the stream for a fraction of the time. It also is
obviously neither probabilistic nor correct.

8. Sampling errors: the Global Estimation Error (GEE)

• Component A: Let A be a certain component of interest. This must be a physical entity that can be
  isolated or observed without any alteration of the material, e.g. a mineral, as opposed to a
  "chemical component" such as a metal in the same mineral, the proportion of which is estimated by
  analysis. In the size analysis of an aggregate for example, component A can be a certain size
  fraction of interest, e.g. the material which remains between, say, the 10- and 5-mm screens. In a
  moisture analysis, component A is the free, adsorbed water as opposed to the constitutional water
  that belongs to the lattice structure of the minerals present.
• Grade a: mass proportion, defined as follows (for solids):

  \[ a = \frac{\text{Mass of component } A \text{ in a given object}}{\text{Total mass of dry solids in the same object}} \]

  (when a is the moisture content of the object, this proportion is usually expressed with the mass of
  wet solids in the denominator instead of that of dry solids).
  o Grade a_L of lot L: true, well-definable proportion of component A in lot L. This proportion is
    always unknown. It is also sometimes called the "critical content of L". The purpose of the entire
    sampling evaluation process is to estimate a_L.
  o Grade a_S of sample S: true, well-definable proportion of component A in sample S. This proportion
    is always unknown. The grade a_S is an estimator of a_L.
• Analytical result a_R: this is an estimate of a_S and hence, in turn, also an estimator for a_L. The
  analytical result is used as the final estimate of a_L.

Relative errors are often easier to use and to compare than absolute errors, and this fact will be used
extensively in the exposition of the theory of sampling in all of what follows below. Several essential
definitions now need to be introduced (below a star introduces the name and notation of a new sampling
error):

* Global Estimation Error (GEE): this is the relative difference between the analytical result a_R and
  the actual, well-defined but unknown, value of a_L:

  \[ \mathrm{GEE} = \frac{a_R - a_L}{a_L} \tag{1} \]

• Properties of GEE: Above, it was mentioned that quality estimation is a sequence of two
  error-generating groups of operations: sampling and analysis, which implies:

  \[ \mathrm{GEE} = \mathrm{TSE} + \mathrm{TAE} \tag{2} \]

  (Global Estimation Error = Total Sampling Error + Total Analytical Error)

* Total Sampling Error (TSE), defined as follows:

  \[ \mathrm{TSE} = \frac{a_S - a_L}{a_L} \tag{3} \]

* Total Analytical Error (TAE), defined as follows:

  \[ \mathrm{TAE} = \frac{a_R - a_S}{a_L} \tag{4} \]

• Properties of TSE: For all purposes, sampling operations can in general be broken up into two main
  stages, primary and secondary, which generate two main groups of errors: the primary error
  (extraction of a laboratory sample S_1 from the lot L) and the secondary error (extraction of an
  assay portion from sample S_1) (potential subdivision of the sampling process in more stages than
  two, if needed in practice,
Fig. 8. Example of non-probabilistic, hence incorrect sampling of a one-dimensional body or stream. It is incorrect only to take a fraction (of the cross-
section) of the flow, even though extracted all of the time. This has a critical bearing on many currently popular PAT schemes (Process Analytical
Technologies), which often suffer from exactly this incorrectness.
Fig. 9. Another example of non-probabilistic, hence incorrect sampling of a one-dimensional body or stream. This is but a variant of the situation depicted in
Fig. 8, in which sampling is only taking place for a fraction of the time from a fraction of the stream.
is trivial and not needed for the present introductory theoretical understanding):

  \[ \mathrm{TSE} = \mathrm{PSE} + \mathrm{SSE} \tag{5} \]

  (Total Sampling Error = Primary Sampling Error + Secondary Sampling Error)

* Primary Sampling Error (PSE), a random error defined as follows:

  \[ \mathrm{PSE} = \frac{a_{S_1} - a_L}{a_L} \tag{6} \]

* Secondary Sampling Error (SSE), a random error defined as follows:

  \[ \mathrm{SSE} = \frac{a_{S_2} - a_{S_1}}{a_L} \tag{7} \]

• Additivity of sampling and analytical errors. From the above definitions, we can easily deduce the
  following relationships, well-known from statistics:

  \[ \mathrm{GEE} = \mathrm{PSE} + \mathrm{SSE} + \mathrm{TAE} \qquad \text{(Additivity of Errors)} \tag{8} \]

  The three random error components of GEE are independent of each other with the following
  consequences (the mean, or expected value of a random error is the bias):

  \[ m(\mathrm{GEE}) = m(\mathrm{PSE}) + m(\mathrm{SSE}) + m(\mathrm{TAE}) \qquad \text{(Additivity of Biases)} \tag{9} \]

  \[ \sigma^2(\mathrm{GEE}) = \sigma^2(\mathrm{PSE}) + \sigma^2(\mathrm{SSE}) + \sigma^2(\mathrm{TAE}) \qquad \text{(Additivity of Variances)} \tag{10} \]

• Remark: Experience shows that sampling biases and variances, especially those resulting from the
  practical implementation of the theoretical model, can be much, much larger than analytical biases
  and variances. In our activities as a consultant and troubleshooter we have met primary sampling
  biases as large as 1000% (relative) and secondary sampling biases as large as 50%, whereas analytical
  biases do not usually exceed 0.1–1%. Exceptions are especially in the domain of trace and ultra-trace
  concentrations, etc.
• Sampling vs. analysis, first conclusions: In view of the above definitions, we hope that the reader
  will accept the following first conclusions:
  ▪ Sampling requires at least as much care (and investment) as analysis, which is unfortunately all
    too often overlooked. Why? Analytical chemistry is taught extensively at practically all
    universities worldwide, but the theory of sampling is not. (There exist only a few notable
    exceptions that the author is aware of, such as at Lappeenranta Technological University, Finland
    and Aalborg University Esbjerg, Denmark, as well as at the Powder Science and Technology group
    [POSTEC], Porsgrunn in Norway). Sampling would seem to fall in a no-man’s land between several
    university departments and thus remains ignored by practically all. When confronted, all
    "responsible" authorities give the same, universal answer: that sampling is somebody else’s
    problem. Whose problem? Analysts (chemical, other) are all brought up academically to be totally
    convinced that their task first begins when the proverbial "laboratory sample" is received and then
    processed as best as they can, which usually is very well. When specifically asked however, they
    usually don’t know how it has been obtained and, which is worse, they usually don’t care. This is
    both a professional as well as a scientific insult to the integrity of the overall quality
    estimation process!
  ▪ Proper sampling of discrete materials and more specifically particulate solids is, however, the
    object of a complete theory that has never been seriously contested in 50 years! More than 120
    articles in various European languages and 9 books in French and English have been published on the
    theory of sampling by this author (clandestine translations into Russian and Japanese also exist).
    Various other authors have also written books and numerous articles about this theory. Is the
    theory ignored because it is new? Hardly: the first paper was written in 1950! A chronology of
    sampling theory development (for the first time ever) is presented in part IV of this series.
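The error definitions and their additivity, Eqs. (1) to (10), can be checked numerically. The sketch below makes two assumptions that go beyond the text: the grade a_S in Eq. (4) is identified with the grade a_S2 of the assay portion, and the numerical grades and standard deviations are invented for illustration. The first part verifies that the three relative errors add up to GEE; the second part draws three independent (here Gaussian, by assumption) random errors and compares the variance of their sum with the sum of their variances.

```python
import random
import statistics

def relative_errors(a_L, a_S1, a_S2, a_R):
    """Relative errors of Eqs. (1)-(8); every error is divided by the lot grade a_L."""
    pse = (a_S1 - a_L) / a_L   # Eq. (6): lot -> laboratory sample S1
    sse = (a_S2 - a_S1) / a_L  # Eq. (7): laboratory sample -> assay portion S2
    tae = (a_R - a_S2) / a_L   # Eq. (4), taking a_S = a_S2 (assumption)
    gee = (a_R - a_L) / a_L    # Eq. (1)
    return {"PSE": pse, "SSE": sse, "TSE": pse + sse, "TAE": tae, "GEE": gee}

# Hypothetical grades (mass fractions), for illustration only.
errs = relative_errors(a_L=0.0500, a_S1=0.0520, a_S2=0.0515, a_R=0.0517)
print({name: round(value, 4) for name, value in errs.items()})

def simulated_variances(n=20_000, sigmas=(0.10, 0.03, 0.005), seed=1):
    """Draw independent PSE, SSE, TAE values and return their variances and that of the sum."""
    rng = random.Random(seed)
    draws = [[rng.gauss(0.0, s) for _ in range(n)] for s in sigmas]
    total = [sum(parts) for parts in zip(*draws)]  # GEE = PSE + SSE + TAE, Eq. (8)
    return [statistics.pvariance(d) for d in draws], statistics.pvariance(total)

parts, total = simulated_variances()
print(sum(parts), total)  # close to each other when the errors are independent, Eq. (10)
```

The sum in Eq. (8) telescopes exactly because all errors share the denominator a_L; the variance identity of Eq. (10), by contrast, holds only because the three errors are independent.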
  ▪ It is meaningless and misleading to express an analytical result with three or four digits (which
    the user must assume to be significant) when the "samples" (alas all too often just unreliable
    specimens) have been taken without any respect for the rules derived from proper sampling theory,
    with the consequence that the second or even the first digit is likely to be suspect due to
    unrecognized sampling errors.
  ▪ It is therefore critical that "someone", preferably the analyst in charge of quality control (or a
    similar company, or institutional, authority, for the sake of consistency), should be responsible
    for proper sampling. But he (or she) should of course be taught at least the fundamentals of the
    theory of sampling and preferably a complete course.
  ▪ The same person should be in charge of sampling both in "the field", e.g. in the industrial
    facilities involved where the primary sampling takes place, as well as in the laboratory where the
    ultimate assay portions are extracted from the laboratory sample before being submitted to the
    analytical process. Sampling is a scientific, holistic process approach, certainly not a
    fragmented, local handling problem with fragmented responsibility sharing between several persons!

Etienne Roth wrote in the preface of [11]: "Analysts should refuse to give results whenever they are
not satisfied that the samples they process are truly representative of the batch they are supposed to
represent." The critical concept of representative sample is defined in Section 18. Unfortunately,
Etienne Roth’s advice is widely ignored.

9. Sampling errors: the Relative Sampling Error (TSE)

At every sampling stage, we shall now define:

  ▪ Relative Sampling Error (TSE):

    \[ \mathrm{TSE} = \frac{a_S - a_L}{a_L} \tag{11} \]

  ▪ Absolute Sampling Error (ASE):

    \[ \mathrm{ASE} = a_L \,\mathrm{TSE} \simeq a_R \,\mathrm{TSE} \tag{12} \]

(in the following, the sign ≃ means "practically equal to"). TSE is a random error (a few exceptions
will be pointed out in Section 12). As such, TSE is characterized by its pertinent statistical
distribution law and moments.

10. Breaking up the relative sampling error, the components of TSE

Two major classes and several subclasses of error may occur at each sampling stage:

* Correct Sampling Errors (CSE): these are tied to the mathematical model that is based on the
  assumption that sampling is correct (uniform selecting probability P). As a consequence of the
  material constitution or structure, the correct sampling errors are structural and therefore
  inevitable. Due to this, sampling is first of all a science that falls in the province of the
  calculus of probability. Due to the existence of both the constitutional and the distributional
  heterogeneities, as defined earlier (homogeneity never exists in the real world), CSE never cancels
  out, i.e. CSE is never zero.
* Incorrect Sampling Errors (ISE): These errors result from not respecting the assumptions of the
  sampling model that sampling is correct; ISE are thus circumstantial. They can and therefore must be
  avoided as much as possible, which implies, indeed requires, proper knowledge of the sampling theory
  by the personnel in charge of sampling. The components of ISE result from the fact that sampling in
  practice is also very much a technique. ISE cancels when sampling is carried out correctly, and only
  then.

  \[ \mathrm{TSE} = \mathrm{CSE} + \mathrm{ISE} \tag{13} \]

In the quantitative approach, we always have to assume that all sampling is probabilistic. We cannot
take non-probabilistic sampling into consideration for the simple reason that the errors it generates,
by definition, cannot be dealt with from a proper theoretical standpoint. They are unpredictable and
thus outside the realm of statistics and probability calculus.

The two additive TSE errors are independent of each other in probability, which entails:

  \[ m(\mathrm{TSE}) = m(\mathrm{CSE}) + m(\mathrm{ISE}) \tag{14} \]

  \[ \sigma^2(\mathrm{TSE}) = \sigma^2(\mathrm{CSE}) + \sigma^2(\mathrm{ISE}) \tag{15} \]

11. Breaking up the Correct Sampling Error (CSE)

We must distinguish between the so-called "zero-dimensional model" (valid for populations) and the
opposing "one-dimensional model" (valid for time series and extensive one-dimensional bodies).

• First Case: "zero-dimensional model". All sampling errors result from the existence of one form or
  another of heterogeneity. We defined two main forms of heterogeneity above, the constitutional
  heterogeneity
  and the distributional heterogeneity. The mathematical sampling models assume that all constituents
  are submitted to the selection process:

    C1: with a uniform probability P of being selected (sampling is correct),
    C2: one by one independently (all selected elements are independent from one another).

  Condition C1 is fulfilled by the underlying assumption. As regards C2, we must distinguish between
  two cases and define two components of CSE:

  Fundamental Sampling Error (FSE): When condition C2 is fulfilled, the Correct Sampling Error (CSE)
  is limited to its incompressible minimum that we shall call the Fundamental Sampling Error (FSE). The
  fundamental error is the consequence of the sole constitutional heterogeneity.

  Condition C2 is fulfilled:

  \[ \mathrm{CSE} = \mathrm{FSE} \tag{16} \]

  Grouping and Segregation Error (GSE): In all practical sampling, however, condition C2 is never
  fulfilled (except in ideal experimental studies): we are never in a position to extract the elements
  (fragments, particles) one by one. What we actually do is to extract increments I, made of
  neighbouring elements having a uniform probability P of being selected: the first condition is
  fulfilled but the second is not. Always working, as we do in the field of gravity,² neighboring
  elements (fragments, molecules and ions) can never be assumed to be independent from one another.
  They are on the contrary often spatially correlated with one another, for example through
  differential gravity segregation, surface stickiness or similar. A correlation then exists between an
  element’s "personality"³ and its position within the volume domain occupied by lot L. In this case, a
  second error is added to FSE, namely the Grouping and Segregation Error (GSE). This error is a
  consequence of the distributional heterogeneity that is itself a function of the constitutional
  heterogeneity.

  General case, condition C2 is not fulfilled (zero-dimensional model):

  \[ \mathrm{CSE} = \mathrm{FSE} + \mathrm{GSE} \tag{17} \]

  ² Exception: work in zero gravity, aboard a space station. This has been done in studies or
  experiments requiring a total absence of segregation.
  ³ We here loosely use the term "personality" to denote the set of physical properties such as size,
  density, shape, which are correlated with the chemical composition of the element.

• Second Case: the "one-dimensional model". At the scale of increments, the zero-dimensional model
  applies locally. An additional error, specific to one-dimensional sampling, is also generated
  however. This is discussed in full and illustrated in Part III. Here it suffices to define this
  error, PSE, in a first brief overview fashion:

  Point Selection Error (PSE): When sampling a lot L flowing between time t = 0 and time t = T_L, one
  first selects a certain number Q of instants t_q on the time axis; these will be taken as the
  locations of point-increments, I_q. The selection of these points is usually correct. The integral of
  the grade function a(t), to which the grade a_L to be evaluated is proportional, is replaced, in the
  sample, by a set of Q estimates a(t_q) obtained by extracting a set of material-increments, and
  weighing, preparing and assaying them (see Fig. 6). This substitution is what generates the Point
  Selection Error (PSE), which is also additive.

  General case, condition C2 is not fulfilled (one-dimensional model):

  \[ \mathrm{CSE} = \mathrm{PSE} + (\mathrm{FSE} + \mathrm{GSE}) \tag{18} \]

12. Breaking up the Incorrect Sampling Error (ISE)

We will illustrate this section with the example of a mechanical cross-stream sampler operating at the
discharge of a conveying/feeding device (particulate solids), or of a piping system (liquids or
multimedia systems). These errors can be easily carried over to other scenarios. The mathematical model
deals with extension-less point-increments. The sampling operation materializes the point-increments of
the model and transforms these into groups of whole components. This operation can be broken up into
three independent steps:

• Increment Delimitation: definition of a geometrical domain around each point-increment, often loosely
  called "the sample" (noun) when no confusion can arise,
• Increment Extraction: extraction ("sampling", verb) of the elements present in this domain,
• Increment and Sample Processing: physical gathering of the increments and associated operations such
  as transportation, crushing or grinding, homogenizing, drying, etc.

These three operations are, of course, also liable to generate errors. Theoretical as well as
experimental research, the results of which can be found in dedicated textbooks [7–10], have shown how
it is possible to define, and consequently to cancel the ISE errors.

* Incorrect Delimitation Error (IDE): when taking an increment I, a correct selection (uniform
  probability of selection P of all constituents making up the stream) is
  achieved when each strand of the stream is collected during the same time interval T_I, and only
  then. This condition involves:
  o Cutter geometry: the sides of a straight-line cutter must always be parallel; those of a rotating
    cutter must be radial.
  o Cutter velocity, which must remain constant during the whole stream traverse.
  An error IDE is generated whenever one of these conditions is not fulfilled. IDE is zero when these
  conditions are met. Increment delimitation is a purely geometrical operation.

* Incorrect Extraction Error (IEE): The elements contained in the model increment that is delimited by
  the cutter edges and is assumed to be correctly delimited, must of course be materially extracted.
  More precisely, all the elements whose centers of gravity fall completely within the model-increment
  must all be correctly incorporated in the actual physical increment. To achieve this, two conditions
  must be simultaneously fulfilled:
  o The first condition involves cutter geometry: the cutter width W should be at least equal to three
    times the diameter of the coarsest fragments, or 10 mm, whichever is greater.
  o The second condition involves cutter velocity, which must not exceed 0.6 m/s.
  When sampling high flow-rates and/or high-velocity streams (over several hundreds t/h and velocities
  over 2 m/s) a safety factor must be invoked. Example⁴: for a 20,000 t/h (5.5 t/s) stream of coarse
  iron ore discharged by a belt conveyor running at a velocity of 4 m/s, a 200-mm cutter opening had to
  be selected. This factor was defined on the basis of experience.

* Incorrect Processing Errors (IPE): IPE is in general a sum of six components:
  o CONTAMINATION of increments, and samples,

    fragments and creation of fines as a result of breakage of the former during rough handling, etc.
  o INVOLUNTARY FAULTS committed by the operator (focus is on the adjective involuntary). Most sampling
    operators, although acting in good faith, lack even the most elementary TOS qualifications and
    hence are likely to make mistakes because of ignorance, clumsiness or negligence.
  o DELIBERATE FAULTS committed by the operator or by any person having access to the samples (focus is
    here on the adjective deliberate). Sampling, being the weakest link in the chain leading to the
    analytical result, is the most vulnerable to fraud. Salting of gold ores is a classical example;
    two other sensitive areas would be commercial sampling and (politically oriented) environmental
    protection, e.g. where uranium is concerned. The author has personally witnessed all these forms of
    fraud, including tampering with the analytical results.

Incorrect Sampling Error (ISE). Recapitulation in the most general case

  \[ \mathrm{ISE} = \mathrm{IDE} + \mathrm{IEE} + \mathrm{IPE} \tag{19} \]

Remark: To give the reader an example of the practical importance of the above rules we saw that, with
cross-stream samplers, cutter velocity must remain constant during the whole traverse. This rule is not
arbitrary. It results from simple geometrical considerations and cannot be scientifically disputed.
There are (at least) four ways of driving a sampling cutter: electric, pneumatic, hydraulic and manual.
It has been shown experimentally that only the electric drive can assure a constant velocity (and then
only when the motor is adequately dimensioned). This observation has been repeatedly outlined in
sampling publications for more than 30 years. In spite of this, as far as we are
by foreign material, e.g. dust; rust, material aware, some standards still go on recommending pneumatic
present in the sampling circuit, or belonging to a or hydraulic drives; manufacturers go on proposing these to
former sample (cross-contamination) and not their clients and the samples so obtained go on being
removed by cleaning, etc. incorrect and therefore biased.
o LOSS of material belonging to increments or
sample, e.g. dust; material remaining in the Recapitulation in the most general, one-dimensional
sampling circuit at the end of an operation and case: TSE is the sum of six components:
not recovered while cleaning, etc.
o ALTERATION in chemical composition: e.g., TSE ¼ CSE þ ISE
loss of molecules of water belonging to the lattice ¼ PSE þ ðFSE þ GSEÞ
structure, due to overheating upon drying, etc. þ ðIDE þ IEE þ IPEÞ
o ALTERATION in physical composition (this con-
cerns more specifically humidity and size distribu-
tion). For example, addition of external water (rain,
sprays); loss of moisture by exposure to an unavoid- 13. Properties of the Total Sampling Error (TSE)
able heat source (sun, kiln or stack); loss of coarse definitions based on sampling results
4
This is a world record, still valid, that the author established in Brazil The definitions to follow are central for a proper under-
in the 1980s (designed in 1976). standing of the theory of sampling. We will define and
review each of the fundamental concepts: "exact", "accurate", "reproducible" and "representative" sampling in detail.

14. Distribution law of the Total Sampling Error (TSE)

* In most cases, TSE is a random variable. Notable exceptions are when money is at stake, but also when the protection of the environment is a political issue. TSE can then be due to a deliberate, non-random alteration of $a_S$ (addition or subtraction of component A, addition of foreign material, tampering with the results, etc.), in which case it obviously does not follow any statistical law. Such non-random "errors" have been met with not infrequently by the author in the precious metal and uranium industries, as well as in commercial sampling of many commodities.
* When $a_L$ is larger than, say, 1 ppm, the pertinent distribution law is normal, or may at least often be well approximated by a Gaussian distribution. When $a_L$ is smaller than 1 ppm (trace concentrations), the distribution often becomes asymmetrical. According to the case it can often be approximated by a lognormal or a Poisson distribution, etc.
* Whenever the sampling error is a random variable, with an assumed distribution law, it can be characterized by its statistical moments (expected value, variance, mean square).

15. The unrealistic concept of sampling exactness

A sampling or a selection could be said to be (focus is on the conditional):
* Exact: if the total sampling error TSE was zero, irrespective of the composition of the material being sampled.

$$\mathrm{TSE} = 0 \qquad \text{(unrealistic definition of an "exact sample")} \qquad (21)$$

We saw earlier that, for mathematical reasons, the fundamental sampling error FSE, and with it TSE, can never be identically zero. It is a fatal misuse of terminology to postulate (as is often heard) that a particular sampling is "exact". Wishful thinking is probably the biggest danger in sampling. The use of the terms "exact" and "exactness" should be discontinued completely where sampling is concerned.

16. Definitions based on properties of the mean, m(TSE): concept of sampling accuracy

The following section constitutes the new theoretical derivation, mentioned in the abstract, which allows the theory a more unified basis; this update is published here for the first time.

In all former publications, accuracy was defined by the following property of the mean of the relative sampling error:

$$m(\mathrm{TSE}) \le m_0 \qquad \text{(former definition, no longer valid, of an "accurate sample")}$$

The difficulty was in assigning a specific value for $m_0$. We observe that we have, in fact, no control on the structural Correct Sampling Error (CSE). We now base our new definition of "accuracy" on the properties of the additional, purely technical and circumstantial Incorrect Sampling Error (ISE), which we can control and thus cancel. We now define that a selection or a sample is:

* "Accurate": when the mean m(ISE) is zero:

$$m(\mathrm{ISE}) = 0 \;\Rightarrow\; m(\mathrm{TSE}) = m(\mathrm{CSE}) + m(\mathrm{ISE}) = m(\mathrm{CSE}) \qquad (22)$$

This is then the new definition of an "accurate sample". TSE is now reduced to its structural minimum; this amounts to "forgetting" the structural bias m(CSE).

Experience and computer simulations show that m(CSE) is, in very many situations, negligible ($10^{-8}$, relative, with iron ore sampling for example), but there is an important caveat: this simplification does not hold for the domain of trace concentrations. To all other practical intents and purposes, therefore, correct sampling is accurate when:

$$m(\mathrm{TSE}) = m(\mathrm{CSE}) \simeq 0 \qquad (\simeq\ \text{means "practically equal to"}) \qquad (23)$$

* "Biased": when the mean m(ISE) is non-zero:

$$m(\mathrm{TSE}) \simeq m(\mathrm{ISE}) \lessgtr 0 \qquad (\lessgtr\ \text{means "significantly different from"}) \qquad (24)$$

This is then our new definition of a "biased sampling", or a "biased sample".

* "Bias": Bias is defined as the algebraic value of m(ISE):

$$m(\mathrm{TSE}) \simeq m(\mathrm{ISE}) \qquad (25)$$

This is also the new definition of the "incorrect sampling bias".

Remark: The random component of the sampling errors, represented by the variance, tends to reduce when averaging large numbers of data, but the systematic component, represented by the bias, does not. It is therefore of the utmost importance to guarantee sampling accuracy by respecting all of the conditions of its correctness.
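The remark on averaging in Section 16 can be illustrated numerically. The sketch below is not part of the original paper: it assumes, purely for illustration, that the relative sampling error is normally distributed with a hypothetical bias of 0.02 and a hypothetical standard deviation of 0.05, and it shows that the average of many such errors settles near the bias rather than near zero.

```python
import random
import statistics


def mean_error(n, bias, sigma, seed=0):
    """Average n simulated relative sampling errors drawn from N(bias, sigma)."""
    rng = random.Random(seed)
    return statistics.fmean(rng.gauss(bias, sigma) for _ in range(n))


# Hypothetical values, chosen only for illustration (not from the paper).
BIAS = 0.02   # systematic component, playing the role of m(ISE)
SIGMA = 0.05  # random component, standard deviation of the error

for n in (10, 1_000, 100_000):
    # The scatter of the average shrinks as n grows; its offset does not.
    print(n, round(mean_error(n, BIAS, SIGMA), 4))
```

Setting `BIAS` to zero in the same sketch makes the averages approach zero, which is the situation that correct sampling is meant to secure.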
Accuracy is the major quality of a sampling, or a sample

17. Definitions based on properties of the variance, $\sigma^2(\mathrm{TSE})$: concept of reproducibility

Following the same line of argument as in Section 16, we will now define that a selection, a sampling or a sample is:

* "Reproducible" (or "sufficiently reproducible"): when the variance $\sigma^2(\mathrm{TSE})$ is smaller than, or equal to, a certain value $\sigma_0^2$ regarded as the maximum acceptable:

$$\sigma^2(\mathrm{TSE}) = \sigma^2(\mathrm{CSE}) + \sigma^2(\mathrm{ISE}) \le \sigma_0^2 \qquad \text{(definition of a "reproducible sample")} \qquad (26)$$

When the sampling is carried out correctly, then, but only then:

$$\sigma^2(\mathrm{ISE}) = 0 \;\Rightarrow\; \sigma^2(\mathrm{TSE}) = \sigma^2(\mathrm{CSE}) \le \sigma_0^2 \qquad (27)$$

* "Non-reproducible" (or "insufficiently reproducible"): when condition (26) is not fulfilled:

$$\sigma^2(\mathrm{TSE}) > \sigma_0^2 \qquad \text{(definition of a "non-reproducible sample")} \qquad (28)$$

18. Definitions based on properties of the mean square, $r^2(\mathrm{TSE})$: concept of "representativeness"

We can further develop this line of argument. We can now also define that a selection, a sampling or a sample is (see Section 19):

* "Representative" (or "sufficiently representative"): when $r^2(\mathrm{TSE})$ is smaller than, or equal to, a certain value $r_0^2$ regarded as the maximum acceptable:

below are meaningless or simply wrong and therefore inherently dangerous. Examples, to be found in many places in literature or even in international standards, include:
  o ISO Standard on oil sampling: This standard defines "representativeness" as a sole property of the variance, which is our definition of "reproducibility", thus assuming, without any justification, that the sampling is unbiased. This confusion is frequent.
  o Standards on analysis: "The assay must be carried out on representative samples". But these standards fail to give any definition of a representative sample.
  o Book on sampling, published in 1995 (for reasons of politeness we refrain from referencing this source): "Representative sample: this is a sample that is typical of the lot". But what is a typical sample? This is not defined!
  o Other examples are legion, and are found within very many domain-specific sciences, e.g. the geological, botanical, zoological, ecological, medical and environmental sciences. In any science where descriptions, or analyses, of samples composed of heterogeneous, or "composite", "compound" materials are on the agenda, the need for an unambiguous definition of a representative sample is essential. The present theory of sampling is universal.

19. The only practical question: how to obtain a representative sample?

According to its statistical definition, the mean square, $r^2$, is the sum of two terms, the square of the bias and the variance. For the total sampling error (TSE), this translates into:

$$r^2(\mathrm{TSE}) = m^2(\mathrm{TSE}) + \sigma^2(\mathrm{TSE}) \qquad (31)$$

The mean square $r^2(\mathrm{TSE})$ of the total sampling error is therefore minimum only when both its components $m^2(\mathrm{TSE})$ and $\sigma^2(\mathrm{TSE})$ are minimized.
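As a minimal sketch of Eq. (31), assuming a list of observed relative sampling errors is available, the three moments can be estimated and the identity checked numerically. The error values and both thresholds below are invented for illustration, and the function names are hypothetical. Note also that the paper defines these properties for the sampling process, so values computed from a few observations are only estimates.

```python
import statistics


def error_moments(errors):
    """Return (mean, variance, mean square) of a list of relative errors.

    The population variance is used so that
    mean square = mean**2 + variance holds up to rounding.
    """
    m = statistics.fmean(errors)
    variance = statistics.pvariance(errors, mu=m)
    mean_square = statistics.fmean(e * e for e in errors)
    return m, variance, mean_square


def classify(errors, sigma0_sq, r0_sq):
    """Compare the estimated variance and mean square with the two thresholds."""
    _, variance, mean_square = error_moments(errors)
    return {
        "reproducible": variance <= sigma0_sq,      # variance criterion, Eq. (26)
        "representative": mean_square <= r0_sq,     # mean-square criterion, Section 18
    }


# Invented relative errors and thresholds, for illustration only.
errors = [0.03, -0.01, 0.04, 0.02, 0.00, 0.05]
m, variance, mean_square = error_moments(errors)
print(f"m = {m:.4f}, variance = {variance:.6f}, mean square = {mean_square:.6f}")
print(classify(errors, sigma0_sq=0.0005, r0_sq=0.0005))
```

With these invented numbers the errors pass the variance test but fail the mean-square test, because the squared bias adds to the variance; this is the situation in which a sampling is reproducible without being representative.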
A sample is therefore representative when it is at the same time correct (which entails maximum accuracy = control of bias) and reproducible (control of variance). It is essential to realize that both of these qualifiers are attributes of the sampling process, not of the sampled material by itself. It is not possible to ascertain whether a particular sample (increment) is representative or not, based solely on the characteristics of the sample itself.

a very short period. In a case between South American mines and a European smelter, the author was asked to act as an expert witness and found out that, due to a (deliberately?) biased hand sampling, the mines had lost $7 million in 3 years. This was easily fixed. We therefore reach this important conclusion:

There can be no compromise where sampling correctness is concerned
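Part of what correctness means for a cross-stream cutter was stated numerically in the IDE and IEE rules: constant cutter velocity, a cutter width of at least three times the diameter of the coarsest fragments or 10 mm (whichever is greater), and a cutter velocity not exceeding 0.6 m/s. The following sketch checks only these three rules. The class and function names are hypothetical, and the safety factor that the text requires for high flow-rates and high-velocity streams is deliberately left out because the paper gives no formula for it.

```python
from dataclasses import dataclass

MIN_WIDTH_MM = 10.0      # absolute minimum cutter width (IEE rule)
WIDTH_FACTOR = 3.0       # width >= 3 x diameter of the coarsest fragments (IEE rule)
MAX_VELOCITY_M_S = 0.6   # maximum cutter velocity (IEE rule)


@dataclass
class Cutter:
    """Hypothetical description of a cross-stream cutter."""
    width_mm: float
    velocity_m_s: float
    constant_velocity: bool  # e.g. an adequately dimensioned electric drive


def correctness_faults(cutter, top_size_mm):
    """Return the list of stated rules that the cutter violates (empty if none)."""
    faults = []
    required_width = max(WIDTH_FACTOR * top_size_mm, MIN_WIDTH_MM)
    if cutter.width_mm < required_width:
        faults.append(f"IEE: width {cutter.width_mm} mm is below {required_width} mm")
    if cutter.velocity_m_s > MAX_VELOCITY_M_S:
        faults.append(f"IEE: velocity {cutter.velocity_m_s} m/s exceeds {MAX_VELOCITY_M_S} m/s")
    if not cutter.constant_velocity:
        faults.append("IDE: cutter velocity is not constant during the traverse")
    return faults


# Illustrative values only.
print(correctness_faults(Cutter(30.0, 0.5, True), top_size_mm=8.0))
print(correctness_faults(Cutter(9.0, 0.8, False), top_size_mm=2.0))
```

An empty list means only that these particular rules are met; it says nothing about cutter geometry (parallel or radial edges) or about the processing errors (IPE), which have to be assessed separately.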
conditions. They can manufacture a correct sampler or propose a non-probabilistic sampling method. It is meaningless to ask them to design an accurate sampler delivering bias-free samples. This is a language they cannot understand!
* The users of analytical results are first of all (or rather "should be") interested in the properties of the sampling errors. They can seldom express their wishes in terms of mean, variance or mean square as delineated by the theory of sampling. Here again, this is a language they don't use or understand. What is wanted however is a straight "recipe" to obtain "good, reliable samples" on which safe decisions can be made, i.e. what was termed "representative samples" above.
* Role of sampling theory: One of the prime purposes of the Theory of Sampling, TOS, is to devise mathematical relationships between the sampling conditions (defined in Section 5) and the sampling results (defined in Sections 13–18). The reader is kindly referred to Refs. [18–20] in English, or to Refs. [17–19] in French, for a complete exposé of the theory of sampling.
* Sampling theory builds a bridge between the qualities the users expect (or demand), often expressed in a loose way, and the properties of the methods that can be implemented or of the equipment the manufacturers can realize. These wishes are not always compatible and someone has to be responsible for finding a compromise acceptable to both parties. This is the role of the sampling theoretician and the sampling consultant.
* The role of University should be to teach "everybody" at least the necessary minimum of the concepts of the TOS. Unfortunately, so far (allowing for the few, important recent Scandinavian exceptions mentioned above), there has barely ever been in universities anyone to teach this curriculum. But this is a gap which has to be filled as soon as possible. "Someone" has to teach the would-be teachers in the universities not yet conscious of the current predicament. This is one major objective of the recently founded organizations:

International Sampling Institute (ISI)
International Sampling Forum (ISF)

ISI was created in France (1999) by a group of sampling specialists and consultants, while ISF acts on a global scale within academe; ISF is led by a virtual board of international directors from 2002. ISF, in particular, will make it its objective to reach out to the world university communities on all matters of TOS and practical proper sampling. Collaboration between these two bodies was instrumental in organizing the First World Conference on Sampling (WCSB1) in 2003, the proceedings of which contain this tutorial.

23. Sampling and standardization: a warning

The reader should be informed that in the year 2000, 99% of the standards on sampling already published, or being prepared by ISO committees or national organizations, give recommendations that are either arbitrary, or contrary to theory, or both! With fewer than a handful of exceptions such as Francois Clin (France) and Ralph Holmes (CSIRO, Australia), people in charge of writing standards right up until very recently have deliberately chosen to ignore the existence of the Theory of Sampling, which, however, has been scientifically accepted worldwide. But things now finally seem to be moving:

Those of us who are traveling a lot by air must be very grateful to IATA for the 'Customer Acceptance Manual' that defines the tests to be undergone by any new aircraft, more specifically the acceptance flight tests, that have been designed by highly qualified specialists, NOT by an ISO technical committee such as those we see at work in sampling. (P.G. quoting from "Air France review")

It is one thing to standardize the insulation of electrical appliances (which standards are very good at) and quite another to standardize what is a complex and practically ignored science. Standards organizations should limit their activities to the first problem and ask specialists to deal with the second.

According to an ISO officer, "the role of Technical Committees (TC) is limited to describing the practices on which commerce has based itself for a long time". This calls for two remarks:

* This author contests the logical validity of this philosophy. There is an obvious conflict of interests between sampling standardization and commerce. The financial stakes are usually so high that some standards have obviously been written in favor of one of the parties involved, such as the producer of a given commodity or a sampling equipment manufacturer, whose equipment does not respect the rules set by the theory. Some standards seem to have been deliberately biased by the TC members themselves.
* Most standards committees are deaf to theoretical arguments. Decisions are made, texts are adopted, by a ballot, which amounts to asking a score of people picked on the street (not even at random) to write recommendations about scientific matters. It amounts, in fact, to asking laymen to answer the question: "Is Newton's first law valid?" Answer by: "Yes" or "No"!

The author believes that the whole philosophy of national and international standardization of sampling has to be contemplated thoroughly again, and perhaps completely reinvented. This is another reason that both ISI and ISF, like IATA, work only with highly qualified sampling specialists and scientifically accomplished university representatives.
24. Interrelationships between sampling errors: condensed recapitulation of part I

light enough (small enough) to be submitted integrally to analysis.

* Sampling is a science, emphatically not a simple picking technique.
* A theory of sampling (TOS) does indeed exist. It can be contested (which it has never been), but it cannot continue to be ignored. It explains the generation of sampling errors and proposes practical solutions. It should be taught worldwide, especially at technical and other universities. At this moment, it is not, but a certain momentum has at last begun to be discernible.
* Most standards on sampling are inadequate because they ignore the very existence of TOS. Some are deliberately biased. This practice must be discontinued as quickly as possible. Something must be done about this, and it will: ISF, ISI.
* Correct sampling alone provides accurate, bias-free, reliable samples. All sampling must therefore be carried out correctly. There can be no arguments against!
* Non-correct sampling, i.e. non-probabilistic or probabilistic but incorrect sampling, can provide nothing but potentially biased, therefore unreliable specimens on the basis of which no safe decision can ever be made, especially when human life, health, scientific insight, or money are at stake.

26. References

References to Pierre Gy's own publications in the first four parts of this series are collected in part V; they are referred to sequentially in the first four parts. A few external references appear in the individual parts, respectively.

External reference for part I
[1] H. Martens, M. Martens, Multivariate Analysis of Quality. An Introduction, Wiley, Chichester, ISBN: 0471974285, 2001, 445 pp.