Chemometrics and Intelligent Laboratory Systems 74 (2004) 7 – 24

www.elsevier.com/locate/chemolab

Sampling of discrete materials—a new introduction to the


theory of sampling
I. Qualitative approach
Pierre Gy
Res. de Luynes, 14 Avenue Jean de Noailles, F-06400 Cannes, France
Received 22 August 2003; received in revised form 6 April 2004; accepted 28 May 2004
Available online 15 September 2004

Abstract

The purpose of a theory of sampling is to answer two questions: How should one select a sample?—How much material should be
selected? Parts I (qualitative approach), II and III (quantitative approach) of this series propose answers to these two fundamental questions.
These answers are not entirely new (answers have been formulated since 1950), but a scientific theory is a living structure that has to be kept
up to date. At a course given in Brasilia in 1998, pointed questions were raised which suggested that the introduction to the qualitative
approach had to be clarified. Part I represents the most updated introduction to theory of sampling (TOS). More than 200 scientific papers,
books, lectures and courses on sampling theory—and practice—have been published or offered to the public by the author over a period of 50
years. A brief, chronological account of the development history of TOS is presented for the first time in part IV—with a comprehensive
literature survey as part V.
© 2004 Elsevier B.V. All rights reserved.

Keywords: Theory of sampling; Discrete materials; Heterogeneity; Sampling errors; Accuracy; Bias; Reproducibility; Representativity; Quality control; Quality
assurance; Chemometrics

1. Introduction

The accuracy of many analytical data reports is a mirage because unwitting negligence and false cost consciousness have ensured that a sample of powder taken with cursory swiftness has been examined with costly precision. Kaye, Illinois Institute of Technology, 1967

Chemometricians process analytical data, more often than not huge amounts of data. Are these data reliable? If Kaye is right, which fully agrees with the author's extensive experience, we are entitled to have our doubts. If the data are biased as a consequence of systematic sampling errors, what becomes of the chemometricians' conclusions? We have every reason to be cautious that these conclusions may be biased too. If the data are uncertain, for example as a consequence of high random sampling errors (high sampling variances), the efficiency of statistical tests will be reduced by the high residual variances. It will invariably be more difficult and/or more costly to reach safe and reliable conclusions. Few chemometricians are aware of these facts. Below it is shown that there is no such thing as a "constant sampling bias", which is the basis for many current complacent, but false sampling understandings. This tutorial is intended to highlight that a complete theory of sampling is in fact at hand, and has been for 50 years!

The heart of the matter of proper sampling is that the question of "how much?" cannot be dissociated from the question of "how?". Indeed, quantitative development of sampling theory assumes explicitly that a certain number of conditions have, by being respected, successfully suppressed the sampling bias. These conditions are presented in the qualitative approach in Part I.

E-mail addresses: [email protected], [email protected].
0169-7439/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.chemolab.2004.05.012

People in charge of sampling, analysis, quality control and chemometrics, in industry and research, often overlook this extremely important point.

The title refers to "discrete" materials. Matter is essentially discrete, or discontinuous, beginning right at the atomic scale. Continuity is a mathematical concept, which does not apply to actual matter. With particulate solids the discrete elements are fragments, with liquids and gases they are molecules and ions. The term "discrete materials" covers matter from every possible source: ores and minerals; cements; agricultural products; products of animal origin such as bones from which gelatine is extracted, human food and cattle feed. In fact, I began writing this paper at the time of the Belgian chicken and British mad cow disease disputes, and the early editorial work associated with it ran concurrently with the foot-and-mouth crisis affecting European farming. These are poignant reminders of the important role played by proper sampling from only the last 5 years.

Other examples are the materials processed and produced in the chemical, pharmaceutical, and oil industries; materials involved in environmental control; the potability of the water we drink; the quality of the air we breathe; the refuse we discard; bodily fluids analyzed clinically; all the materials measured (assayed) in biological, pharmaceutical or medical research; in soil contamination studies, and so on. The list above is not exhaustive; it covers everything except compact solids such as mineral deposits.

Quality Control (QC), as well as traceability, have become vital necessities in all these fields, whether for technical or for commercial purposes. Quality estimation of any batch of matter by one form or another of analysis begins in most, if not all cases, with several sampling stages. Indeed, analysis is practically always carried out on assay portions weighing only a few centigrams.¹

¹ Exception: up to 100 g when analyzing precious metals by fire assay.

To obtain analytical samples of matter as small as this from batches that may weigh anything from kilograms to several tons (or even thousands of tons, e.g. when sampling coal or iron ore in loading seagoing vessels), to achieve such an extreme mass reduction, these batches must undergo a certain number of sampling stages, alternating with essential size reduction stages when dealing with solids. Sampling is nothing other than a mass reduction and is indeed a major part of quality control, which is very often not recognized. For the present purpose, we can therefore summarize this technical context as follows:

Quality estimation = Sampling + Analysis

There are of course many other significant aspects to "quality", for example a modern multivariate approach, for which see e.g. the recent comprehensive textbook by [1] and the extensive references quoted herein. However, even this powerful chemometric approach still needs to take account of the sampling issues, which cannot be substituted by data analysis alone. Another purpose of the present work is therefore to contribute to the sampling aspect of the science of quality in the broadest possible context.

2. Factors at stake

In biology, medicine and pharmacy, the factors at stake are human health and life: our health and our lives! The problem is serious. It is, however, almost completely ignored. The same factors arise, e.g. in the control of food or beverages; in the environmental control of air or water; control of discarded household refuse or industrial effluents.

In industry, the factor at stake is money, whether sampling for technical or for commercial purposes. Often huge amounts of money: money you don't make because your products are inferior or money you lose because your production facilities do not work as well as they could. In both cases the loss can very often be traced back to poor quality control and, more often than not, to an unnecessarily large sampling error. Examples of such errors generating losses in the million-dollar range are trivial; in the gold mining industry, losses in the billion-dollar range are reported. The stakes are high, the problem is serious.

3. Sampling: a technique or a science?

Sampling is both a technique and a science. Sampling is fundamentally a mass reduction achieved by the appropriate technical means, but this operation must respect the batch composition as best as possible. We will see that there is no such thing as exact sampling: sampling itself generates errors and these errors must be controlled. Sampling science is nothing other than a theory of the sampling errors. The most detrimental sampling errors occur when the scientific aspects of sampling are ignored; when sampling is regarded as a mere handling technique, the tools which are a shovel or a scoop and a few bags or containers.

Even though the present sampling theory has been developed for materials of mineral origin, it is universally valid for all types of discrete material: with particulate solids, the size of the constituents (the fragments, the grains) is expressed in mm or μm. With liquids/gases the size of ions and molecules is expressed in angstroms. The difference between particulate solids, liquids and gases is not a difference in essence but a mere difference in scale. The same theoretical rules apply to all phases. In part I, we will only deal with the qualitative aspects of sampling theory. The corresponding quantitative aspects will be summarized in parts II and III.

Below follow the essential minimum of definitions and descriptions of the relationships of the elements of the theory of sampling. This tutorial must be brief and comprehensive within given limits, and a strict axiomatic description is therefore necessary.
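As a rough numerical illustration of the mass reduction discussed above (lots ranging from kilograms to thousands of tons, assay portions of a few centigrams), the following minimal sketch computes the overall reduction ratio. The lot masses, the 0.05 g assay mass and the function name `mass_reduction_ratio` are illustrative assumptions, not values or notation taken from the paper.

```python
# Illustrative only: the lot masses and the 0.05 g assay portion are assumptions
# chosen to match the orders of magnitude quoted in the text.

GRAMS_PER_KG = 1_000.0
GRAMS_PER_TONNE = 1_000_000.0


def mass_reduction_ratio(lot_mass_g: float, assay_mass_g: float) -> float:
    """Return how many times heavier the lot is than the assay portion."""
    if lot_mass_g <= 0 or assay_mass_g <= 0:
        raise ValueError("masses must be positive")
    return lot_mass_g / assay_mass_g


ASSAY_MASS_G = 0.05  # "a few centigrams" (assumed here: 5 cg)

# Hypothetical lots spanning the range mentioned in the text.
EXAMPLE_LOTS_G = {
    "1 kg laboratory batch": 1 * GRAMS_PER_KG,
    "5 t truckload": 5 * GRAMS_PER_TONNE,
    "5000 t ship cargo": 5_000 * GRAMS_PER_TONNE,
}

if __name__ == "__main__":
    for name, lot_mass_g in EXAMPLE_LOTS_G.items():
        ratio = mass_reduction_ratio(lot_mass_g, ASSAY_MASS_G)
        print(f"{name}: lot is {ratio:.1e} times heavier than the assay portion")
```

Under these assumed figures the reduction spans many orders of magnitude, which is why it is normally achieved in several sampling stages rather than in one step.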

4. Basic definitions and notations

• Constituents and lot to be sampled
  o Constituent elements or constituents, F: In a given material, constituents are defined as the smallest operative elements that can be regarded as autonomous and unalterable during the physical, chemical and mechanical conditions of sampling: with particulate solids these are fragments (hence the notation F); with liquids and gases they are molecules and ions.
  o Lot, L: From a theoretical standpoint, the batch or lot of discrete material to be sampled and analyzed can be regarded as a set of units. We must take into consideration two kinds of sets and two kinds of units.
  o Set: The set can either be:
    - a population of non-ordered units (e.g., heap of stationary material), or
    - a series of ordered units (e.g., sequence of elementary cross-sections of a flowing stream. The order here is chronological; it may alternatively be spatial).
    Different mathematical laws apply to these two kinds of sets. It is a grave mistake, which can be very costly, to use wrong laws or formulas. Unfortunately, most people do!
  o Unit, U: The unit taken into consideration can either be:
    - a single constituent F (solid fragment, molecule, ion), or
    - a group, G of neighbouring constituents, i.e. an increment (a key concept)
  o Increment, I: group of neighbouring constituents extracted from lot L at the same time, in a single operation (pass) of the sampling tool or device.
• Samples vs. specimens; sampling vs. specimen-taking
  o Sample, S: subset of lot L, the elements of which have been selected in a correct way (defined below). By definition, samples are reliable.
  o Specimen: subset of lot L, the elements of which have been selected in a non-correct way (defined below). No safe decision can be made on the basis of analyses carried out on specimens. By definition specimens are unreliable and therefore dangerous.
  o Sampling: mass reduction of lot L by selection of a certain subset of units, with the purpose (not always fulfilled) of obtaining a true, reliable sample S (when the conditions of sampling correctness are respected). Sampling provides either samples or specimens.
  o Specimen-taking: the operation of extraction of unreliable specimens.

Fig. 1. When sampling, e.g. from a truckload with a shovel, only about the top 15% or so of the entire lot is directly accessible. This illustration is a generic example of very many practical sampling situations in all sciences, especially if it is recognised that the exact same situation holds over very many scales, e.g. from railroad cars to analytical vials. The sampling process is non-probabilistic, however, and can consequently never produce a representative sample.

5. Definitions based on selection conditions

Non-destructive sampling can be achieved only by selection of a certain number of constituents of L that, when gathered, will make up a certain subset: either a true sample S or a specimen.

The selection issue, perhaps surprisingly, falls within the mathematical province of the calculus of probability. According to the distribution of the selection probabilities of the constituents, a selection, a sampling or a sample is defined so as to be:

• Non-probabilistic: when certain constituents of the lot have a zero probability of being selected (Fig. 1).
• Probabilistic: when all constituents of the lot have a non-zero probability of being selected for the benefit of

the sample. A probabilistic selection in turn can be either correct or incorrect, which is defined as follows:
  o Correct: when both the following conditions are fulfilled simultaneously:
    - All constituents making up the lot have an equal probability P of being selected.
    - The integrity of the selected constituents (increments, sample) is duly respected.
  o Incorrect: when at least one of these two conditions is not fulfilled. In this case, instead of being uniform, the selection probability $P_i$ of a given element $F_i$ becomes a function of its physical properties such as, e.g. density, size, shape, etc. (Fig. 3).
  o Non-correct: when it is either non-probabilistic or probabilistic incorrect. Examples of non-correct sampling are given in Figs. 1–3.

Correct sampling alone can provide reliable samples. Non-correct sampling provides nothing but unreliable specimens.

Fig. 2. A better way to sample from a lot (here, a so-called "Big Bag" of 1000 kg of grain) is by coring. This sampling process is almost (but not quite) probabilistic, since the bottom layer cannot be assessed because of a design flaw of the tip of the coring instrument. Coring by "thiefs" or spears of this type is used extensively in many industrial sectors dealing with particulate matter. While being much better than the situation depicted in Fig. 1, alas this sampling procedure will not lead to fully representative samples either.

6. Concepts of homogeneity and heterogeneity of a set

• Homogeneity: A set of objects is said to be homogeneous when all objects making up the set are strictly identical. Focus is on the adverb strictly.
• Heterogeneity: A set of objects is said to be heterogeneous when the condition of homogeneity is not fulfilled.
• Constitutional property: intrinsic property of a set of constituents. With solids, this property can be altered, e.g. by crushing, grinding, milling, agglomeration. Mixing or homogenizing does not alter constitutional properties.
• Distributional property: property of a set of adjoining increments made of groups of neighboring constituents filling up the domain occupied by the set. This property depends on the distribution of the constituents throughout this domain, which accounts for its name. Distributional properties can be modified at will or altered unknowingly by, e.g. mixing or segregation, etc.

6.1. Constitutional properties

• Constitutional homogeneity: A set of constituents would be said to be constitutionally homogeneous if all elements had a strictly identical constitution. Focus is on the conditional (would/if): This case can be theoretically defined but can never be observed in practical sampling. Beware of appearances: A glass of pure water indeed looks homogeneous. We know, however, that water is made of different constituents (H2O, H+, OH−, O2−, to say nothing of their isotopic counterparts). Sampling of a constitutionally homogeneous lot by selection of elements would, by definition, be an exact process. But the constitution of matter is never homogeneous with the consequence that sampling is never exact. Indeed, all sampling errors have their roots in this inhomogeneous property of matter. When designing a sampling process, device or system, it is a very dangerous mistake to assume that the constitution of any lot is homogeneous. This mistake is unfortunately very frequent!
• Constitutional heterogeneity: A lot L, regarded as a set of constituents, is said to be constitutionally heterogeneous when the elements do not have a strictly identical composition. The constitution of any material object is therefore always heterogeneous. This is the general sampling case we have to master.

6.2. Distributional properties

• Distributional homogeneity: A lot L, viewed as set of potential increments, would be said to be distributionally homogeneous if all increments had a strictly identical composition. The sampling of a distributionally homogeneous lot would, by definition, be an exact process. This case can also be defined but can never be observed in the real world; indeed, with a set of elements that is constitutionally heterogeneous, it is impossible to observe a strictly homogeneous distribution.
• Distributional heterogeneity: A set of potential increments (groups of neighbouring constituents) is said to

be distributionally heterogeneous when these groups do not have a strictly identical composition. This again is the general sampling case we have to master. Distributional heterogeneity can be quantified: in the quantitative approach we will see that distributional heterogeneity can be increased, either purposefully by gravity concentration or spontaneously by gravity segregation. Much more important however, it can also be reduced purposefully by, e.g. mixing or blending.

Fig. 3. Example of a probabilistic sampling procedure, which is incorrect however. A chemical analyser (e.g., atomic absorbance) is "sampling" aliquots through a siphon from a flask of, in this case, Gold cyanide ions which are very dense. However, this setup is incorrect because the level of the sample extraction point is kept at a constant height above the bottom of the flask, assuming (incorrectly in this case) that there is no density gradient in the flask. This assumption may seem natural as there is no visible iridescence segregation observable. Fig. 5B shows the hidden density gradient revealed, however: Analyte concentrations will change with time at this fixed sampling height due to differential gravitational deposition. A simple continuous stirring would help greatly.

7. Sampling vs. specimen-taking: processes and methods

Mass reduction can be achieved in many ways, for instance by picking (or grab sampling), by splitting or by incremental sampling. The first method is very crude, the latter two are more elaborate than the first. According to the way they are implemented, they can provide either samples or specimens.

• Picking; grab sampling: specimen-taking method characterized by the fact that sampling is regarded as a mere handling technique without any theoretical considerations at all. It consists in taking "suitably" small portions of material from a "conveniently" accessible part of the lot in a (usually) non-correct way, which can either be: strictly non-probabilistic (Fig. 1), almost (but not quite) probabilistic (Fig. 2), or else probabilistic but incorrect (Fig. 3).

• Splitting: a lot L can be split into a certain number N of fractions (potential increments or potential samples) of equal bulk (usually for high-rate splitting) or unequal bulk (usually for low-rate splitting). One or several fractions are then selected, at random or not, to make up a sample or a specimen (Figs. 4 and 5). There are a number of variants of this technique such as the ancient and historic, but now obsolete coning and quartering method.

With low splitting rates (< 1/2 or 1/4), true splitting becomes expensive and may be replaced by deterministic shovelling (called "degenerate" in our former publications). The danger with this kind of shovelling arises (in commercial operations where splitting is very popular) from the possibility of a dishonest operator selecting visually those fractions of the lot that are likely to profit his employer. For instance, the size or colour of fragments is often correlated with their chemical composition and the operator may to some extent choose to feed the predetermined sample with material of favourable composition. With true splitting, the correctness is warranted by a posteriori random selection of the actual sample. When the sample is pre-determined, this safeguard is no longer in place. The necessary uniformity of the selection probability requires an absence of correlation between the choice of the material fed to the pre-determined sample and its composition. When there is no possibility of visual selection or when the operator is not interested in the result, the crucial selection is done blindly which guarantees the uniformity of the selection probability. The safest operator is the one who does not know about the material he is splitting.

Fig. 4. Example of true fractional shovelling with a sampling rate (in this case) of 1:5. The key issue is to select the sample in a random fashion only after completion of shovelling.

Fig. 5. Example of degenerate fractional shovelling, here also with a sampling rate 1:5. In this case, the sample necessarily must be pre-selected, which invites selection bias, etc.

Until recently, shovelling, true or otherwise, used to be restricted to hand shovels and to relatively small weights (a few tons maximum in the year 2000), but the author has extended this technique to mechanical shovels with much higher capacities, which opens up new possibilities for deterministic fractional shovelling of much higher weights. For instance, in a big harbour in the Pacific area, the author was acting as a referee between seller and buyer of a 16,000-metric ton lot of valuable concentrate, whose results did not match. It was possible to rent front-end loaders that were used as mechanical shovels. A three-stage deterministic fractional shovelling with a splitting rate of 1/20 (5-ton loader) was implemented; followed by a 1/10 (5-ton loader) and finally a 1/10 (0.5-ton loader) splitting, thus obtaining an 800-ton primary sample, an 80-ton secondary sample and an 8-ton tertiary sample, which was sent to the laboratory for further reduction. This large-scale operation was correct and commendable as the operators, though very competent, were not acquainted with mineralogy and had been instructed to take their loads in sequence. The whole operation lasted only 2 days and cost a very small fraction of the total expenses (which were formidable at the time).

• Incremental sampling: This is the preferred method for sampling flowing streams of particulate solids, for liquids and multi-phase media, i.e. so-called "one-dimensional objects" the sampling of which will be developed in full in part III below. In this introduction, we only need to present the three principal possibilities actually implemented in the industrial and technological practice and outline their respective properties. They are illustrated in Figs. 6–9.

Fig. 6. Developing the mathematical model behind practical sampling of a stream of particulate matter. Preview of Part III, here only used to facilitate a backdrop for illustrating the fundamental concept of "cross-stream sampling", which is shown here in three different alternatives: stopped-belt, uni-directional and bi-directional sampling.

The operation of taking all of the stream for a fraction of the time is based on the mathematical model of point integration of a curve, which will be fully developed and illustrated in part III. It generates errors specific to the incremental sampling of one-dimensional flowing streams. It is the only one that is probabilistic and easily rendered correct (Fig. 7).

Fig. 8 illustrates the taking of a fraction of the stream for the whole of the time. It is obviously not probabilistic.
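The contrast between the first two possibilities above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the author's model: the stream is a hypothetical grid of unit-mass elements indexed by time and by lateral "lane", the grade is assumed to rise linearly across the stream (lateral segregation) with a small random fluctuation, and all function names and parameter values are invented for illustration. In practice the starting time of a systematic cross-stream scheme would be drawn at random; it is fixed here so that the example is reproducible.

```python
import random
from statistics import fmean


def make_stream(n_times=1000, n_lanes=10, seed=0):
    """Build a hypothetical stream as grade[t][lane] for unit-mass elements.

    Assumed model: the grade rises linearly across the stream (lateral
    segregation) with a small random fluctuation on top.
    """
    rng = random.Random(seed)
    return [
        [0.10 + 0.02 * lane + rng.uniform(-0.01, 0.01) for lane in range(n_lanes)]
        for _ in range(n_times)
    ]


def lot_grade(stream):
    """True grade of the whole lot (all elements have equal mass here)."""
    return fmean(g for row in stream for g in row)


def cross_stream_estimate(stream, period, start):
    """All of the stream for a fraction of the time: full cross-sections."""
    increments = stream[start::period]
    return fmean(g for row in increments for g in row)


def partial_stream_estimate(stream, lanes):
    """A fraction of the stream for the whole of the time: fixed lanes only."""
    return fmean(row[lane] for row in stream for lane in lanes)


def relative_error(estimate, true_value):
    """Relative difference between an estimate and the true value."""
    return (estimate - true_value) / true_value


if __name__ == "__main__":
    stream = make_stream()
    a_lot = lot_grade(stream)
    cross = cross_stream_estimate(stream, period=50, start=7)
    partial = partial_stream_estimate(stream, lanes=[0])
    print(f"lot grade:                {a_lot:.4f}")
    print(f"cross-stream increments:  {cross:.4f} "
          f"(relative error {relative_error(cross, a_lot):+.1%})")
    print(f"fixed lane, all the time: {partial:.4f} "
          f"(relative error {relative_error(partial, a_lot):+.1%})")
```

In this toy model every element has a chance of ending up in a cross-stream increment, so the estimate stays close to the lot grade, whereas elements outside the fixed lane have zero selection probability and the estimate inherits the lateral segregation as a large bias.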

Fig. 7. The principle of correct cross-stream sampling: taking all of the flow for a fraction of the time; shown here are two correct increments.

Fig. 9 illustrates the taking of a fraction of the stream for a fraction of the time. It also is obviously neither probabilistic nor correct.

Fig. 8. Example of non-probabilistic, hence incorrect sampling of a one-dimensional body or stream. It is incorrect only to take a fraction (of the cross-section) of the flow, even though extracted all of the time. This has a critical bearing on many currently popular PAT schemes (Process Analytical Technologies), which often suffer from exactly this incorrectness.

8. Sampling errors: the Global Estimation Error (GEE)

• Component A: Let A be a certain component of interest. This must be a physical entity that can be isolated or observed without any alteration of the material, e.g. a mineral (as opposed to the entity a "chemical component", e.g. such as a metal in the same mineral), the proportion of which is estimated by analysis. In the size analysis of an aggregate for example, component A can be a certain size fraction of interest, e.g. the material which remains between, say, the 10- and 5-mm screens. In a moisture analysis, component A is the free, adsorbed water as opposed to the constitutional water that belongs to the lattice structure of the minerals present.
• Grade a: mass proportion, defined as follows (for solids):

$$a = \frac{\text{Mass of component A in a given object}}{\text{Total mass of dry solids in the same object}}$$

  (when a is the moisture content of the object, this proportion is usually expressed with the mass of wet solids in the denominator instead of that of dry solids).
  o Grade $a_L$ of lot L: true, well-definable proportion of component A in lot L. This proportion is always unknown. It is also sometimes called the "critical content of L". The purpose of the entire sampling evaluation process is to estimate $a_L$.
  o Grade $a_S$ of sample S: true, well-definable proportion of component A in sample S. This proportion is always unknown. The grade $a_S$ is an estimator of $a_L$.
• Analytical result $a_R$: this is an estimate of $a_S$ and hence, in turn, also an estimator for $a_L$. The analytical result is used as the final estimate of $a_L$.

Relative errors are often easier to use and to compare than absolute errors, and this fact will be used extensively in the exposition of the theory of sampling in all of what follows below. Several essential definitions now need to be introduced (below a star introduces the name and notation of a new sampling error):

* Global Estimation Error (GEE): this is the relative difference between the analytical result $a_R$ and the actual, well-defined but unknown, value of $a_L$

$$\mathrm{GEE} = \frac{a_R - a_L}{a_L} \tag{1}$$

• Properties of GEE: Above, it was mentioned that quality estimation is a sequence of two error-generating groups of operations: sampling and analysis, which implies:

$$\text{Global Estimation Error GEE} = \text{Total Sampling Error TSE} + \text{Total Analytical Error TAE} \tag{2}$$

* Total Sampling Error (TSE), defined as follows:

$$\mathrm{TSE} = \frac{a_S - a_L}{a_L} \tag{3}$$

* Total Analytical Error (TAE) defined as follows:

$$\mathrm{TAE} = \frac{a_R - a_S}{a_L} \tag{4}$$

• Properties of TSE: For all purposes, sampling operations can in general be broken up into two main stages, primary and secondary, which generate two main groups of errors: the primary error, i.e. extraction of a laboratory sample $S_1$ from the lot L; and the secondary error, i.e. extraction of an assay portion from sample $S_1$ (potential subdivision of the sampling process in more stages than two, if needed in practice,

is trivial and not needed for the present introductory theoretical understanding):

$$\text{Total Sampling Error TSE} = \text{Primary Sampling Error PSE} + \text{Secondary Sampling Error SSE} \tag{5}$$

* Primary Sampling Error (PSE), a random error defined as follows:

$$\mathrm{PSE} = \frac{a_{S_1} - a_L}{a_L} \tag{6}$$

* Secondary Sampling Error (SSE), a random error defined as follows:

$$\mathrm{SSE} = \frac{a_{S_2} - a_{S_1}}{a_L} \tag{7}$$

• Additivity of sampling and analytical errors. From the above definitions, we can easily deduce the following relationships, well-known from statistics:

$$\mathrm{GEE} = \mathrm{PSE} + \mathrm{SSE} + \mathrm{TAE} \quad \text{(Additivity of Errors)} \tag{8}$$

The three random error components of GEE are independent of each other with the following consequences (the mean, or expected value of a random error is the bias):

$$m(\mathrm{GEE}) = m(\mathrm{PSE}) + m(\mathrm{SSE}) + m(\mathrm{TAE}) \quad \text{(Additivity of Biases)} \tag{9}$$

$$\sigma^2(\mathrm{GEE}) = \sigma^2(\mathrm{PSE}) + \sigma^2(\mathrm{SSE}) + \sigma^2(\mathrm{TAE}) \quad \text{(Additivity of Variances)} \tag{10}$$

• Remark: Experience shows that sampling biases and variances, especially those resulting from the practical implementation of the theoretical model, can be much, much larger than analytical biases and variances. In our activities as a consultant and troubleshooter we have met primary sampling biases as large as 1000% (relative) and secondary sampling biases as large as 50%, whereas analytical biases do not usually exceed 0.1–1%. Exceptions are especially in the domain of trace and ultra-trace concentrations, etc.

Fig. 9. Another example of non-probabilistic, hence incorrect sampling of a one-dimensional body or stream. This is but a variant of the situation depicted in Fig. 8, in which sampling is only taking place for a fraction of the time from a fraction of the stream.

• Sampling vs. analysis: first conclusions. In view of the above definitions, we hope that the reader will accept the following first conclusions:
  ▪ Sampling requires at least as much care (and investment) as analysis, which is unfortunately all too often overlooked. Why? Analytical chemistry is taught extensively at practically all universities worldwide, but the theory of sampling is not. (There exist only a few notable exceptions that the author is aware of, such as at Lappeenranta Technological University, Finland and Ålborg University Esbjerg, Denmark, as well as at the Powder Science and Technology group [POSTEC], Porsgrunn in Norway). Sampling would seem to fall in a no-man's land between several university departments and thus remains ignored by practically all. When confronted, all "responsible" authorities give the same, universal answer: that sampling is somebody else's problem. Whose problem? Analysts (chemical, other) are all brought up academically to be totally convinced that their task first begins when the proverbial "laboratory sample" is received and then processed as best as they can, which usually is very well. When specifically asked however, they usually don't know how it has been obtained and, which is worse, they usually don't care. This is both a professional as well as a scientific insult to the integrity of the overall quality estimation process!
  ▪ Proper sampling of discrete materials and more specifically particulate solids is, however, the object of a complete theory that has never been seriously contested in 50 years! More than 120 articles in various European languages and 9 books in French and English have been published on the theory of sampling by this author (clandestine translations into Russian and Japanese also exist). Various other authors have also written books and numerous articles about this theory. Is the theory ignored because it is new? Hardly: the first paper was written in 1950! A chronology of sampling theory development (for the first time ever) is presented in part IV of this series.
■ It is meaningless and misleading to express an analytical result with three or four digits (which the user must assume to be significant) when the “samples” (alas all too often just unreliable specimens) have been taken without any respect for the rules derived from proper sampling theory, with the consequence that the second or even the first digit is likely to be suspect due to unrecognized sampling errors.
■ It is therefore critical that “someone”, preferably the analyst in charge of quality control (or a similar company, or institutional, authority, for the sake of consistency), should be responsible for proper sampling. But he (or she) should of course be taught at least the fundamentals of the theory of sampling and preferably a complete course.
■ The same person should be in charge of sampling both in “the field”, e.g. in the industrial facilities involved where the primary sampling takes place, as well as in the laboratory where the ultimate assay portions are extracted from the laboratory sample before being submitted to the analytical process. Sampling is a scientific, holistic process approach, certainly not a fragmented, local handling problem with fragmented responsibility sharing between several persons!

Etienne Roth wrote in the preface of [11]: “Analysts should refuse to give results whenever they are not satisfied that the samples they process are truly representative of the batch they are supposed to represent.” The critical concept of representative sample is defined in Section 18. Unfortunately, Etienne Roth’s advice is widely ignored.

9. Sampling errors: the Relative Sampling Error (TSE)

At every sampling stage, we shall now define:

■ Relative Sampling Error (TSE):

\[ \mathrm{TSE} = \frac{a_S - a_L}{a_L} \tag{11} \]

■ Absolute Sampling Error (ASE):

\[ \mathrm{ASE} = a_L \, \mathrm{TSE} \approx a_R \, \mathrm{TSE} \tag{12} \]

(in the following, the sign ≈ means “practically equal to”). TSE is a random error (a few exceptions will be pointed out in Section 12). As such, TSE is characterized by its pertinent statistical distribution law and moments.

10. Breaking up the relative sampling error, the components of TSE

Two major classes and several subclasses of error may occur at each sampling stage:

* Correct Sampling Errors (CSE): these are tied to the mathematical model that is based on the assumption that sampling is correct (uniform selecting probability P). As a consequence of the material constitution or structure, the correct sampling errors are structural and therefore inevitable. Due to this, sampling is first of all a science that falls in the province of the calculus of probability. Due to the existence of both the constitutional and the distributional heterogeneities, as defined earlier (homogeneity never exists in the real world), CSE never cancels out, i.e. CSE is never zero.
* Incorrect Sampling Errors (ISE): These errors result from not respecting the assumption of the sampling model that sampling is correct; ISE are thus circumstantial. They can and therefore must be avoided as much as possible, which implies, indeed requires, proper knowledge of the sampling theory by the personnel in charge of sampling. The components of ISE result from the fact that sampling in practice is also very much a technique. ISE cancels when sampling is carried out correctly, and only then.

\[ \mathrm{TSE} = \mathrm{CSE} + \mathrm{ISE} \tag{13} \]

In the quantitative approach, we always have to assume that all sampling is probabilistic. We cannot take non-probabilistic sampling into consideration for the simple reason that the errors it generates, by definition, cannot be dealt with from a proper theoretical standpoint. They are unpredictable and thus outside the realm of statistics and probability calculus.
The two additive TSE errors are independent of each other in probability, which entails:

\[ m(\mathrm{TSE}) = m(\mathrm{CSE}) + m(\mathrm{ISE}) \tag{14} \]

\[ \sigma^2(\mathrm{TSE}) = \sigma^2(\mathrm{CSE}) + \sigma^2(\mathrm{ISE}) \tag{15} \]

11. Breaking up the Correct Sampling Error (CSE)

We must distinguish between the so-called “zero-dimensional model” (valid for populations) and the opposing “one-dimensional model” (valid for time series and extensive one-dimensional bodies).

□ First Case: “zero-dimensional model”. All sampling errors result from the existence of one form or another of heterogeneity. We defined two main forms of heterogeneity above, the constitutional heterogeneity
and the distributional heterogeneity. The mathematical sampling models assume that all constituents are submitted to the selection process:

C1: with a uniform probability P of being selected (sampling is correct),
C2: one by one independently (all selected elements are independent from one another).

Condition C1 is fulfilled by the underlying assumption. As regards C2, we must distinguish between two cases and define two components of CSE:

Fundamental Sampling Error (FSE): When condition C2 is fulfilled, the Correct Sampling Error (CSE) is limited to its incompressible minimum that we shall call the Fundamental Sampling Error (FSE). The fundamental error is the consequence of the sole constitutional heterogeneity.

\[ \text{Condition C2 is fulfilled:}\quad \mathrm{CSE} = \mathrm{FSE} \tag{16} \]

Grouping and Segregation Error (GSE): In all practical sampling, however, condition C2 is never fulfilled (except in ideal experimental studies): we are never in a position to extract the elements (fragments, particles) one by one. What we actually do is to extract increments, I, made of neighbouring elements having a uniform probability P of being selected: the first condition is fulfilled but the second is not. Always working, as we do in the field of gravity,² neighboring elements (fragments, molecules and ions) can never be assumed to be independent from one another. They are on the contrary often spatially correlated with one another, for example through differential gravity segregation, surface stickiness or similar. A correlation then exists between an element’s “personality”³ and its position within the volume domain occupied by lot L. In this case, a second error is added to FSE, namely the Grouping and Segregation Error (GSE). This error is a consequence of the distributional heterogeneity that is itself a function of the constitutional heterogeneity.

² Exception: work in zero gravity, aboard a space station. This has been done in studies or experiments requiring a total absence of segregation.
³ We here loosely use the term “personality” to denote the set of physical properties such as size, density, shape, which are correlated with the chemical composition of the element.

General case, condition C2 is not fulfilled:

\[ \text{Zero-dimensional model:}\quad \mathrm{CSE} = \mathrm{FSE} + \mathrm{GSE} \tag{17} \]

□ Second Case: the “one-dimensional model”. At the scale of increments, the zero-dimensional model applies locally. An additional error, specific to one-dimensional sampling, is also generated however. This is discussed in full and illustrated in Part III. Here it suffices to define this error, PSE, in a first brief overview fashion:

Point Selection Error (PSE): When sampling a lot L flowing between time t = 0 and time t = T_L, one first selects a certain number Q of instants t_q on the time axis; these will be taken as the locations of point-increments, I_q. The selection of these points is usually correct. The integral of the grade function a(t), to which the grade a_L to be evaluated is proportional, is replaced, in the sample, by a set of Q estimates a(t_q) obtained by extracting a set of material-increments, and weighing, preparing and assaying them (see Fig. 6). This substitution is what generates the Point Selection Error (PSE), which is also additive.

General case, condition C2 is not fulfilled:

\[ \text{One-dimensional model:}\quad \mathrm{CSE} = \mathrm{PSE} + (\mathrm{FSE} + \mathrm{GSE}) \tag{18} \]

12. Breaking up the Incorrect Sampling Error (ISE)

We will illustrate this section with the example of a mechanical cross-stream sampler operating at the discharge of a conveying/feeding device (particulate solids), or of a piping system (liquids or multimedia systems). These errors can be easily carried over to other scenarios. The mathematical model deals with extension-less point-increments. The sampling operation materializes the point-increments of the model and transforms these into groups of whole components. This operation can be broken up into three independent steps:

□ Increment Delimitation: definition of a geometrical domain around each point-increment, often loosely called “the sample” (noun) when no confusion can arise,
□ Increment Extraction: extraction, i.e. “sampling” (verb), of the elements present in this domain,
□ Increment and Sample Processing: physical gathering of the increments and associated operations such as transportation, crushing or grinding, homogenizing, drying, etc.

These three operations are, of course, also liable to generate errors. Theoretical as well as experimental research, the results of which can be found in dedicated textbooks [7–10], have shown how it is possible to define, and consequently to cancel the ISE errors.

* Incorrect Delimitation Error (IDE): when taking an increment I, a correct selection (uniform probability of selection P of all constituents making up the stream) is
achieved when each strand of the stream is collected during the same time interval T_I, and only then. This condition involves:
  o Cutter geometry: the sides of a straight-line cutter must always be parallel; those of a rotating cutter must be radial.
  o Cutter velocity, which must remain constant during the whole stream traverse.
An error IDE is generated whenever one of these conditions is not fulfilled. IDE is zero when these conditions are met. Increment delimitation is a purely geometrical operation.

* Incorrect Extraction Error (IEE): The elements contained in the model increment that is delimited by the cutter edges and is assumed to be correctly delimited, must of course be materially extracted. More precisely, all the elements whose centers of gravity fall completely within the model-increment must all be correctly incorporated in the actual physical increment. To achieve this, two conditions must be simultaneously fulfilled:
  o The first condition involves cutter geometry: the cutter width W should be at least equal to three times the diameter of the coarsest fragments, or 10 mm, whichever is greater.
  o The second condition involves cutter velocity, which must not exceed 0.6 m/s.
When sampling high flow-rates and/or high-velocity streams (over several hundred t/h and velocities over 2 m/s) a safety factor must be invoked. Example⁴: for a 20,000 t/h (5.5 t/s) stream of coarse iron ore discharged by a belt conveyor running at a velocity of 4 m/s, a 200-mm cutter opening had to be selected. This factor was defined on the basis of experience.

⁴ This is a world record, still valid, that the author established in Brazil in the 1980s (designed in 1976).

* Incorrect Processing Errors (IPE): IPE is in general a sum of six components:
  o CONTAMINATION of increments, and samples, by foreign material, e.g. dust; rust, material present in the sampling circuit, or belonging to a former sample (cross-contamination) and not removed by cleaning, etc.
  o LOSS of material belonging to increments or sample, e.g. dust; material remaining in the sampling circuit at the end of an operation and not recovered while cleaning, etc.
  o ALTERATION in chemical composition: e.g., loss of molecules of water belonging to the lattice structure, due to overheating upon drying, etc.
  o ALTERATION in physical composition (this concerns more specifically humidity and size distribution). For example, addition of external water (rain, sprays); loss of moisture by exposure to an unavoidable heat source (sun, kiln or stack); loss of coarse fragments and creation of fines as a result of breakage of the former during rough handling, etc.
  o INVOLUNTARY FAULTS committed by the operator (focus is on the adjective involuntary). Most sampling operators, although acting in good faith, lack even the most elementary TOS qualifications and hence are likely to make mistakes because of ignorance, clumsiness or negligence.
  o DELIBERATE FAULTS committed by the operator or by any person having access to the samples (focus is here on the adjective deliberate). Sampling, being the weakest link in the chain leading to the analytical result, is the most vulnerable to fraud. Salting of gold ores is a classical example; two other sensitive areas would be commercial sampling and (politically oriented) environmental protection, e.g. where uranium is concerned. The author has personally witnessed all these forms of fraud, including tampering with the analytical results.

Incorrect Sampling Error (ISE). Recapitulation in the most general case

\[ \mathrm{ISE} = \mathrm{IDE} + \mathrm{IEE} + \mathrm{IPE} \tag{19} \]

Remark: To give the reader an example of the practical importance of the above rules we saw that, with cross-stream samplers, cutter velocity must remain constant during the whole traverse. This rule is not arbitrary. It results from simple geometrical considerations and cannot be scientifically disputed. There are (at least) four ways of driving a sampling cutter: electric, pneumatic, hydraulic and manual. It has been shown experimentally that only the electric drive can assure a constant velocity (and then only when the motor is adequately dimensioned). This observation has been repeatedly outlined in sampling publications for more than 30 years. In spite of this, as far as we are aware, some standards still go on recommending pneumatic or hydraulic drives; manufacturers go on proposing these to their clients and the samples so obtained go on being incorrect and therefore biased.

Recapitulation in the most general, one-dimensional case: TSE is the sum of six components:

\[ \mathrm{TSE} = \mathrm{CSE} + \mathrm{ISE} = \mathrm{PSE} + (\mathrm{FSE} + \mathrm{GSE}) + (\mathrm{IDE} + \mathrm{IEE} + \mathrm{IPE}) \]

13. Properties of the Total Sampling Error (TSE): definitions based on sampling results

The definitions to follow are central for a proper understanding of the theory of sampling. We will define and
review each of the fundamental concepts: “exact”, “accurate”, “reproducible” and “representative” sampling in detail.

14. Distribution law of the Total Sampling Error (TSE)

□ In most cases, TSE is a random variable. Notable exceptions are when money is at stake, but also when the protection of the environment is a political issue. TSE can then be due to a deliberate, non-random, alteration of a_S (addition or subtraction of component A, addition of foreign material, tampering with the results, etc.), in which case it obviously does not follow any statistical law. Such non-random “errors” have been met with not infrequently by the author in the precious metal and uranium industries, as well as in commercial sampling of many commodities.
□ When a_L is larger than, say 1 ppm, the pertinent distribution law is normal, or may at least often be well approximated by a Gaussian distribution. When a_L is smaller than 1 ppm (trace concentrations), the distribution often becomes asymmetrical. According to the case it can often be approximated by a lognormal or a Poisson distribution, etc.
□ Whenever the sampling error is a random variable, with an assumed distribution law, it can be characterized by its statistical moments (expected value, variance, mean square).

15. The unrealistic concept of sampling exactness

A sampling or a selection could be said to be (focus is on the conditional):
□ Exact: If the total sampling error TSE was zero, irrespective of the composition of the material being sampled.

\[ \mathrm{TSE} = 0 \qquad \text{(unrealistic definition of an “exact sample”)} \tag{21} \]

We saw earlier that, for mathematical reasons, the fundamental sampling error FSE, and with it TSE, can never be identically zero. It is a fatal misuse of terminology to postulate (as is often heard) that a particular sampling is “exact”. Wishful thinking is probably the biggest danger in sampling. The use of the terms “exact” and “exactness” should be discontinued completely where sampling is concerned.

16. Definitions based on properties of the mean, m(TSE): concept of sampling accuracy

The following section constitutes the new theoretical derivation, mentioned in the abstract, which allows the theory a more unified basis; this update is published here for the first time.
In all former publications, accuracy was defined by the following property of the mean of the relative sampling error:

\[ m(\mathrm{TSE}) \le m_0 \qquad \text{(former definition of an “accurate sample”, no longer valid)} \]

The difficulty was in assigning a specific value for m_0. We observe that we have, in fact, no control on the structural Correct Sampling Error (CSE). We now base our new definition of “accuracy” on the properties of the additional, purely technical and circumstantial Incorrect Sampling Error (ISE), which we can control and thus cancel. We now define that a selection or a sample is:
□ “Accurate”: when the mean m(ISE) is zero:

\[ m(\mathrm{ISE}) = 0 \;\rightarrow\; m(\mathrm{TSE}) = m(\mathrm{CSE}) + m(\mathrm{ISE}) = m(\mathrm{CSE}) \tag{22} \]

This is then the new definition of an “accurate sample”. TSE is now reduced to its structural minimum; this amounts to “forgetting” the structural bias m(CSE).
Experience and computer simulations show that m(CSE) is, in very many situations, negligible (10⁻⁸, relative, with iron ore sampling for example), but there is an important caveat: This simplification does not hold for the domain of trace concentrations. To all other practical intents and purposes therefore, correct sampling is accurate when:

\[ m(\mathrm{TSE}) = m(\mathrm{CSE}) \approx 0 \qquad (\approx\ \text{“practically equal to”}) \tag{23} \]

□ “Biased”: when the mean m(ISE) is non-zero:

\[ m(\mathrm{TSE}) \approx m(\mathrm{ISE}) \ne 0 \qquad (\ne\ \text{“significantly different from”}) \tag{24} \]

This is then our new definition of a “biased sampling”, or a “biased sample”.
□ “Bias”: Bias is defined as the algebraic value of m(ISE):

\[ m(\mathrm{TSE}) \approx m(\mathrm{ISE}) \tag{25} \]

This is also the new definition of the “incorrect sampling bias”.
Remark: The random component of the sampling errors, represented by the variance, tends to reduce when averaging large numbers of data but the systematic component, represented by the bias, does not. It is therefore of the utmost importance to guarantee sampling accuracy by respecting all of the conditions of its correctness.
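
The remark on averaging can be illustrated numerically. The following is only a minimal sketch, assuming that the relative sampling errors are independent and normally distributed; the bias (5%) and standard deviation (10%) are hypothetical values chosen for illustration, not figures from the theory, and the function name is likewise an illustrative choice.

```python
import random
import statistics


def simulate_mean_error(n_samples, bias, sigma, n_trials=2000, seed=0):
    """Average n_samples simulated relative errors, repeated n_trials times.

    Returns (mean, standard deviation) of those averages.
    Assumption: errors are independent draws from Normal(bias, sigma).
    """
    rng = random.Random(seed)
    averages = []
    for _ in range(n_trials):
        errors = [rng.gauss(bias, sigma) for _ in range(n_samples)]
        averages.append(statistics.fmean(errors))
    return statistics.fmean(averages), statistics.stdev(averages)


# Hypothetical values: 5% relative bias, 10% relative standard deviation.
m1, s1 = simulate_mean_error(1, bias=0.05, sigma=0.10)
m100, s100 = simulate_mean_error(100, bias=0.05, sigma=0.10)

print(f"single samples:  mean error {m1:.4f}, spread {s1:.4f}")
print(f"averages of 100: mean error {m100:.4f}, spread {s100:.4f}")
```

Under these assumptions the spread of the averages shrinks roughly with the square root of the number of samples, while the mean error stays close to the imposed bias however many samples are averaged.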

Accuracy is the major quality of a sampling, or a sample

17. Definitions based on properties of the variance, σ²(TSE): concept of reproducibility

Following the same line of argument as in Section 16, we will now define that a selection, a sampling or a sample is:

□ “Reproducible” (or “sufficiently reproducible”): when the variance σ²(TSE) is smaller than, or equal to, a certain value σ₀² regarded as the maximum acceptable:

\[ \sigma^2(\mathrm{TSE}) = \sigma^2(\mathrm{CSE}) + \sigma^2(\mathrm{ISE}) \le \sigma_0^2 \qquad \text{(DEFINITION of a “reproducible sample”)}^5 \tag{26} \]

When the sampling is carried out correctly, then (but only then):

\[ \sigma^2(\mathrm{ISE}) = 0 \;\rightarrow\; \sigma^2(\mathrm{TSE}) = \sigma^2(\mathrm{CSE}) \le \sigma_0^2 \tag{27} \]

□ “Non-reproducible” (or “insufficiently reproducible”) when condition (26) is not fulfilled:

\[ \sigma^2(\mathrm{TSE}) > \sigma_0^2 \qquad \text{(DEFINITION of a “non-reproducible sample”)} \tag{28} \]

⁵ The value of σ²(CSE) will be estimated in Part II (Quantitative approach, zero-dimensional model).

18. Definitions based on properties of the mean square, r²(TSE): concept of “representativeness”

We can further develop this line of argument. We can now also define that a selection, a sampling or a sample is (see Section 19):
□ “Representative” (or “sufficiently representative”): when r²(TSE) is smaller than, or equal to, a certain value r₀² regarded as the maximum acceptable:

\[ r^2(\mathrm{TSE}) \le r_0^2 \qquad \text{(DEFINITION of a “representative sample”)} \tag{29} \]

□ “Non-representative” (or “insufficiently representative”): the opposite case, i.e. when

\[ r^2(\mathrm{TSE}) > r_0^2 \qquad \text{(DEFINITION of a “non-representative sample”)} \tag{30} \]

Remark (concerning the concept of “representativeness”): As far as the author is aware, definition (29) is the only published objective, scientific definition of “a representative sample” (sic). Subjective definitions, such as exemplified below, are meaningless or simply wrong and therefore inherently dangerous. Examples, to be found in many places in literature or even in international standards, include:

  o ISO Standard on oil sampling: This standard defines “representativeness” as a sole property of the variance, which is our definition of “reproducibility”, thus assuming, without any justification, that the sampling is unbiased. This confusion is frequent.
  o Standards on analysis: “The assay must be carried out on representative samples”. But these standards fail to give any definition of a representative sample.
  o Book on sampling, published in 1995 (for reasons of politeness we refrain from referencing this source): “Representative sample: this is a sample that is typical of the lot”. But what is a typical sample? This is not defined!
  o Other examples are legion, and found within very many domain-specific sciences, e.g. the geological, botanical, zoological, ecological, medical, environmental sciences. In any science where descriptions, or analyses of samples composed of heterogeneous, or “composite”, “compound” materials are on the agenda, the need for an unambiguous definition of a representative sample is essential. The present theory of sampling is universal.

19. The only practical question: how to obtain a representative sample?

According to its statistical definition, the mean square, r², is the sum of two terms, the square of the bias and the variance. For the total sampling error (TSE), this translates into:

\[ r^2(\mathrm{TSE}) = m^2(\mathrm{TSE}) + \sigma^2(\mathrm{TSE}) \tag{31} \]

The mean square r²(TSE) of the total sampling error is therefore minimum only when both its components m²(TSE) and σ²(TSE) are minimized.

□ Maximizing the accuracy: The first term m²(TSE) is minimum (and this minimum is practically zero) when the sampling is accurate, which can only come about as a consequence of it being carried out correctly, and only then. This problem is addressed in full in the qualitative approach in Refs. [17–20].
□ Maximizing the reproducibility: The second term σ²(TSE) is minimum when the sampling is reproducible. The degree of reproducibility of sampling is an increasing function of the sample size or, more generally, of the sample quantitative properties: sample mass, increment mass, number of increments, etc. This problem is addressed in more detail in part II (zero-dimensional model) and part III (one-dimensional model) of this series. Full details on the quantitative approach are to be found in Refs. [17–20].
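
Relation (31), together with definitions (26) to (30), can be checked on a set of observed relative sampling errors, for instance from a replication experiment. The sketch below is illustrative only: the function name, the example error values and the thresholds σ₀² and r₀² are all hypothetical assumptions, and the population variance is used so that the decomposition holds exactly for the data at hand.

```python
from statistics import fmean, pvariance


def sampling_quality(relative_errors, sigma0_sq, r0_sq):
    """Estimate m, sigma^2 and r^2 of TSE and apply definitions (26)-(30).

    sigma0_sq and r0_sq are the user-chosen maximum acceptable variance
    and mean square (hypothetical thresholds, problem-dependent).
    """
    m = fmean(relative_errors)                  # bias, m(TSE)
    var = pvariance(relative_errors, mu=m)      # variance, sigma^2(TSE)
    mean_square = m ** 2 + var                  # relation (31)
    return {
        "bias": m,
        "variance": var,
        "mean_square": mean_square,
        "reproducible": var <= sigma0_sq,       # definition (26)
        "representative": mean_square <= r0_sq  # definition (29)
    }


# Hypothetical relative errors from a correct and from a biased procedure.
correct = [0.02, -0.01, 0.03, 0.00, 0.01]
biased = [e + 0.05 for e in correct]

q_correct = sampling_quality(correct, sigma0_sq=4e-4, r0_sq=4e-4)
q_biased = sampling_quality(biased, sigma0_sq=4e-4, r0_sq=4e-4)
print(q_correct)
print(q_biased)
```

In this illustration both procedures have the same variance and would both count as reproducible, but only the first satisfies the mean-square criterion: reproducibility alone says nothing about bias.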

A sample is therefore representative when it is at the same time correct (which entails maximum accuracy = control of bias) and reproducible (control of variance). It is essential to realize that both of these qualifiers are attributes of the sampling process, not of the sampled material by itself. It is not possible to ascertain whether a particular sample (increment) is representative or not, based solely on the characteristics of the sample itself.

20. Cost of obtaining representative samples

All decisions made on samples, or specimens, are likely to have important scientific, financial, health or social impacts.

□ Short-term reasoning: Ill-informed people reason in the short term. At first glance, the most economical way to get a “sample” (actually an unreliable specimen, but most people cannot see the difference) from any lot is to follow the principle “catch whatever you can in the cheapest possible way”. The very popular “picking” or “grab-sampling” methods are based on this principle. Increments are taken from any accessible part of the lot (see above: e.g., a shovelful on top of a truck- or wagon-load; a scoop-full on top of a drum or of a bag; a spoonful from the top of a lab sample bottle, etc.). Unfortunately, such methods are blatantly non-probabilistic and provide nothing but unreliable specimens on the basis of which absolutely no safe decisions can ever be made. They are all the more dangerous, as they are attractive as a consequence of their being cheap. But only in the very short term! There is no way to develop a theory of non-probabilistic sampling, but experience shows how large the biases they introduce can be (up to 1000% according to our own experience, but there really is no limit). When sampling for commercial as well as for technical purposes, biases always cost money to someone: the market value of a product is ill-estimated; the parameters of a transformation process, set on the basis of assays, do not ensure optimal conditions; losses are undervalued, etc. When sampling for scientific purposes it is from the outset senseless and false to build theories based on samples which are in principle non-representative, etc.

□ Long-term reasoning: People conscious of the theory of sampling reason in the long term: they insist on getting unbiased samples warranted by a problem-dependent correct sampling. When the so-called “samples” (true specimens) are biased, financial losses can reach the million-dollar range per year. In a world-class copper mine of Papua New Guinea (now out of operation due to “political unrest”), the “sampling” of blast hole cuttings was very crude and provided heavily biased specimens. The losses generated were calculated and amounted to $8 million per year. The cost of a “correct” (and therefore accurate) sampling system subsequently designed by the author, which provided unbiased samples, was paid back in a very short period. In a case between South American mines and a European smelter, the author was asked to act as an expert witness and found out that due to a (deliberately?) biased hand sampling, the mines had lost $7 million in 3 years. This was easily fixed. We therefore reach this important conclusion:

There can be no compromise where sampling correctness is concerned

21. On the notion of “precision”

Based on our experience, authors in very many scientific disciplines, as well as in sampling, use the words “precise” and “precision” in a subjective, unscientific and most confusing way. Analysis of this usage strongly suggests some underlying implicit assumptions, the tenor of which would seem to be that precision is:

  o sometimes a property of the mean, which corresponds to our definition of accuracy,
  o or a property of the variance, which corresponds to our concept of reproducibility;
  o or else precision is viewed as a subjective, global property defining a “good, reliable” sample, which corresponds to our definition of “representativeness” (implying both accuracy and reproducibility).

There is, however, no room for subjective or ambiguous definitions where objective, scientific counterparts are available. This is why we cannot speak of, far less define, a useful concept of precision. Precision is an extremely ill-defined notion. According to our statistical definitions, there is one (and only one) “global” property, which is based on the properties of the mean square, r², of the total sampling error (TSE), see Section 19 above, i.e. the sum of:

  o The square of the mean m(TSE), and of
  o The square of the standard deviation σ(TSE), i.e. the variance σ²(TSE)

As far as we are concerned, we shall use the word “precision” with its loose, subjective meaning of a global, undefined good quality, e.g. in the expression “precision instrument” only. We shall never speak of a precise sample. Chemometricians often speak of “precise data”, or equivalently of “imprecise, or uncertain data”. One wonders what is meant here?

22. Relationships: selecting conditions, sampling results, professional responsibilities

□ Manufacturers of sampling equipment and designers of sampling methods can only act upon the selecting
conditions. They can manufacture a correct sampler or propose a non-probabilistic sampling method. It is meaningless to ask them to design an accurate sampler delivering bias-free samples. This is a language they cannot understand!
□ The users of analytical results are first of all (or rather “should be”) interested in the properties of the sampling errors. They can seldom express their wishes in terms of mean, variance or mean square as delineated by the theory of sampling. Here again, this is a language they don’t use or understand. What is wanted however is a straight “recipe” to obtain “good, reliable samples” on which safe decisions can be made, i.e. what was termed “representative samples” above.
□ Role of sampling theory: One of the prime purposes of the Theory of Sampling, TOS, is to devise mathematical relationships between the sampling conditions (defined in Section 5) and the sampling results (defined in Sections 13–18). The reader is kindly referred to Refs. [18–20] in English, or to Refs. [17–19] in French, for a complete exposé of the theory of sampling.
□ Sampling theory builds a bridge between the qualities the users expect (or demand), often expressed in a loose way, and the properties of the methods that can be implemented or of the equipment the manufacturers can realize. These wishes are not always compatible and someone has to be responsible for finding a compromise acceptable to both parties. This is the role of the sampling theoretician and the sampling consultant.
□ The role of University should be to teach “everybody” at least the necessary minimum of the concepts of the TOS. Unfortunately, so far (allowing for the few, important recent Scandinavian exceptions mentioned above), there has barely ever been in universities anyone to teach this curriculum. But this is a gap which has to be filled as soon as possible. “Someone” has to teach the would-be teachers in the universities not yet conscious of the current predicament. This is one major objective of the recently founded organizations:

International Sampling Institute (ISI)
International Sampling Forum (ISF)

ISI was created in France (1999) by a group of sampling specialists and consultants, while ISF acts on a global scale within academe; ISF is led by a virtual board of international directors from 2002. ISF, in particular, will make it its objective to reach out to the world university communities on all matters of TOS and practical proper sampling. Collaboration between these two bodies was instrumental in organizing the First World Conference on Sampling (WCSB1) in 2003, the proceedings of which contain this tutorial.

23. Sampling and standardization: a warning

The reader should be informed that in the year 2000, 99% of the standards on sampling already published, or being prepared by ISO committees or national organizations, give recommendations that are either arbitrary, or contrary to theory, or both! With fewer than a handful of exceptions such as Francois Clin (France) and Ralph Holmes (CSIRO, Australia), people in charge of writing standards right up until very recently have deliberately chosen to ignore the existence of the Theory of Sampling, which, however, has been scientifically accepted worldwide. But things now finally seem to be moving:

Those of us who are traveling a lot by air must be very grateful to IATA for the ‘Customer Acceptance Manual’ that defines the tests to be undergone by any new aircraft, more specifically the acceptance flight tests, that have been designed by highly qualified specialists, NOT by an ISO technical committee such as those we see at work in sampling. (P.G. quoting from “Air France review”)

It is one thing to standardize the insulation of electrical appliances (which standards are very good at) and quite another to standardize what is a complex and practically ignored science. Standards organizations should limit their activities to the first problem and ask specialists to deal with the second.
According to an ISO officer, “the role of Technical Committees (TC) is limited to describing the practices on which commerce has based itself for a long time”. This calls for two remarks:

□ This author contests the logical validity of this philosophy. There is an obvious conflict of interests between sampling standardization and commerce. The financial stakes are usually so high that some standards have obviously been written in favor of one of the parties involved, such as the producer of a given commodity or a sampling equipment manufacturer, whose equipment does not respect the rules set by the theory. Some standards seem to have been deliberately biased by the TC members themselves.
□ Most standards committees are deaf to theoretical arguments. Decisions are made, texts are adopted, by a ballot, which amounts to asking a score of people picked on the street (not even at random) to write recommendations about scientific matters. It amounts, in fact, to asking laymen to answer the question: “Is Newton’s first law valid?” Answer by: “Yes” or “No”!

The author believes that the whole philosophy of national and international standardization of sampling has to be contemplated thoroughly again, and perhaps completely reinvented. This is another reason that both ISI and ISF, like IATA, work only with highly qualified sampling specialists and scientifically accomplished university representatives.

24. Interrelationships between sampling errors: condensed recapitulation of part I

Below follow, in the most condensed overview format possible, the central tenets of the qualitative introduction to the theory of sampling as outlined above.

24.1. Purpose of sampling

The sole purpose of sampling is to reduce progressively the mass of a lot L to that of an assay portion, light enough (small enough) to be submitted integrally to analysis.

25. Conclusions, part I: the qualitative approach

• Sampling is a serious problem, which has to be dealt with seriously. At this moment, it is deliberately, or unwittingly, mostly ignored by a majority of people who in fact may well have serious sampling problems.

• Sampling is a science, emphatically not a simple picking technique.
• A theory of sampling (TOS) does indeed exist. It can be contested (which it has never been), but it cannot continue to be ignored. It explains the generation of sampling errors and proposes practical solutions. It should be taught worldwide, especially at technical and other universities. At this moment, it is not, but a certain momentum has at last begun to be discernible.

• Most standards on sampling are inadequate because they ignore the very existence of TOS. Some are deliberately biased. This practice must be discontinued as quickly as possible. Something must be done about this, and it will be, through ISF and ISI.
• Correct sampling alone provides accurate, bias-free, reliable samples. All sampling must therefore be carried out correctly. There can be no arguments against!
• Non-correct sampling, i.e. non-probabilistic or probabilistic but incorrect sampling, can provide nothing but potentially biased, therefore unreliable specimens on the basis of which no safe decision can ever be made, especially when human life, health, scientific insight, or money are at stake.

26. References

References to Pierre Gy’s own publications in the first four parts of this series are collected in part V; they are referred to sequentially in the first four parts. A few external references appear in the individual parts, respectively.

External reference for part I

[1] H. Martens, M. Martens, Multivariate Analysis of Quality. An Introduction, Wiley, Chichester, ISBN: 0471974285, 2001, 445 pp.
