In the Classroom

Using Candy Samples To Learn about Sampling Techniques and Statistical Data Evaluation
Larissa S. Canaes, Marcel L. Brancalion, Adriana V. Rossi, and Susanne Rath*
Department of Analytical Chemistry, Institute of Chemistry, State University of Campinas, 13084-971 Campinas, SP, Brazil;
*[email protected]

The first step in any chemical analysis is to obtain an analytical sample of the bulk material. The reliability of an analytical result therefore often depends on the quality of the original sample. The sample must have the same chemical and physical properties as the raw material, so that it represents well the material that will be analyzed; in this sense, sampling is directly responsible for the accuracy of the analytical results. The best way to sample would be to obtain large samples at random from the total population, based on the idea that as the sample size approaches the population size, the errors decrease to zero (1). In practice, factors such as measurement costs and the facilities needed to handle huge amounts of bulk material make it impractical to select large, essentially unlimited samples. A typical random sample is usually far smaller than desired, raising concerns about how accurately it represents the bulk material. This doubt can be resolved by statistical analysis of the data (2).

It is therefore important for students to understand the challenges involved in sampling, the first step of any chemical analysis. Nevertheless, classroom experiments usually present students with homogeneous samples, so they tend to believe that sampling and statistical analysis are not problems they have to deal with. Some references describe ways to show students how random sampling works and how a sample can be representative (1, 3), while others propose exercises that emphasize the statistical analysis of data (4–6). However, these exercises usually require extended laboratory periods or are very theoretical.

In 2000, Ross (7) proposed a simple, fast, and didactic classroom exercise using colored candies that easily demonstrates the effects of sample size and particle size on sampling. However, that paper does not address statistics, which are explored more fully by Vitha and Carr (2).

Inspired by these papers (2, 7), we developed and implemented a more complete classroom exercise for undergraduate and beginning graduate students that explores both sampling and statistics. It is an easy, interesting exercise that takes about 1.5 hours and demonstrates the effects involved in sampling (sample amount, particle size, and the representativeness of the sample relative to the bulk material). The exercise also includes a simple statistical treatment of commonly used parameters (mean, median, standard deviation, errors, quartiles, and confidence limits), presentation of results in graphs (histogram and box-and-whisker plot), and related tests (normality, outliers, significance) using both parametric and nonparametric statistical methods.

Procedure

Materials
For the sampling exercise, we used sugar-coated, round chocolate candies available in several colors, sizes, and types. All of the candies were purchased at a local market; the quantities given below should be adequate for a class of 10–35 students. The variety of sizes, shapes, and colors is important for representing heterogeneity in the data.

• 10 packages (104 g) of M&M¹ candies
• 1 package (98 g) of M&M candies with peanuts
• 2 packages (35.2 g) of M&M Minis²
• 1 package (80 g) of Confeti³ chocolate candies
• Disposable gloves
• Plastic tray or paper plates
• Paper cups (50 and 200 mL capacities)

The gloves, plastic tray or paper plates, and paper cups were used for all sample handling, for hygiene, so that the candies could be eaten at the end of the experiment.
This exercise has been used from 2000 to the present with students in both an undergraduate course in classical analytical chemistry and a graduate course, Statistics in Analytical Chemistry.

Data Acquisition
In the first step, the students were divided into ten groups, and each group was responsible for acquiring data from one bag of regular M&M candies. Each group sorted the candies by color, counted them, and reported the number of candies of each color. The students were then asked to compile the data and begin the statistical evaluation.

Parametric and Nonparametric Approaches
To show the students the theoretical and practical differences between parametric and nonparametric approaches, the raw data were evaluated with both statistical methods, comparing the parameters, the graphical representations, and the tests each one provides. The results were also compared statistically with the average bag composition provided on the manufacturer's Web site.¹

Sample Amount Effect: Part I
Next, all the M&M candies from the 10 bags, already sorted by color, were combined in a large plastic tray to simulate the population of a "bulk material". The students obtained the composition of this bulk material by aggregating the data from all 10 bags. The groups then collected "samples" of the bulk material using cups of two sizes (50 and 200 mL); each group sampled five times. Next, the bulk material was reduced by quartering, so that the students could evaluate how representative each kind of sampling is. For quartering, the candies were spread uniformly inside a circular tray and divided radially into quarters, and two opposite quarters were combined. This process was repeated two more times. The data were treated statistically in the same way as before.
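The quartering step lends itself to a quick simulation. The Python sketch below is illustrative only and rests on two assumptions that are not taken from the exercise: a hypothetical bulk composition (`BULK_COUNTS`, 1195 candies with color fractions close to the Table 2 means), and an idealized, perfectly mixed layer in which each candy falls into one of the four radial quarters at random. Keeping two opposite quarters then retains about half of the candies per pass, so three passes keep roughly one-eighth of the bulk.

```python
import random
from collections import Counter

# Hypothetical bulk material (1195 candies). These counts are an assumption,
# chosen only so that the color fractions roughly match the Table 2 means.
BULK_COUNTS = {"blue": 203, "brown": 154, "green": 147,
               "orange": 273, "red": 167, "yellow": 251}


def percentages(candies):
    """Percentage of each color in a list of candies."""
    counts = Counter(candies)
    return {c: 100.0 * counts[c] / len(candies) for c in BULK_COUNTS}


def quarter_once(candies, rng):
    """One idealized quartering pass: each candy lands in one of four radial
    quarters at random, and two opposite quarters (0 and 2) are kept."""
    quarters = [[], [], [], []]
    for candy in candies:
        quarters[rng.randrange(4)].append(candy)
    return quarters[0] + quarters[2]


bulk = [c for c, k in BULK_COUNTS.items() for _ in range(k)]
true_pct = percentages(bulk)

rng = random.Random(2008)   # fixed seed so the run is repeatable
sample = bulk
for _ in range(3):          # quartering performed three times in total
    sample = quarter_once(sample, rng)

print(f"candies retained: {len(sample)} of {len(bulk)}")
for color, pct in percentages(sample).items():
    rel_err = 100.0 * (pct - true_pct[color]) / true_pct[color]
    print(f"{color:7s} {pct:5.1f}%  (relative error {rel_err:+.1f}%)")
```

Because a perfectly mixed layer is assumed, each pass here behaves like random subsampling; size segregation of the kind discussed under the particle size effect would break that assumption in a real tray.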

© Division of Chemical Education • www.JCE.DivCHED.org • Vol. 85 No. 8 August 2008 • Journal of Chemical Education 1083

Sample Amount Effect: Part II
All the candies were returned to the tray, and some different candies from another manufacturer (purple in color; 0.99% of the total) were added to the mix. In our study, Confeti³ candies were used because they are very similar to M&Ms in size. The two sampling procedures used in Part I were also used in Part II, but only the purple candies were counted and analyzed statistically, because this condition simulates an analyte present at low concentration while also showing the effect of using different sampling methods.

Particle Size Effect
The last observations explored the influence of particle size on sampling, using a simple visual exercise. Students combined candies of different sizes (two bags of M&M peanut candies, one bag of M&M Minis, and some of the regular M&Ms used before) in a large glass jar and mixed them thoroughly, in order to observe qualitatively the size distribution after different mixing procedures.

Data Acquisition
After counting and compiling the raw data, the students were asked to express the data as the percentage of candies of each color. We instruct students to pay attention to significant figures and to note that the smallest possible unit is one candy. Because each bag contains about one hundred candies, the number of candies of each color has two significant figures, so the percentage of candies of each color must be expressed with at most two significant digits, without decimals. Table 1 reports the results of a typical classroom exercise, which were used for the statistical treatments.
In this exercise, each bag was considered a sample drawn from a total population (the manufacturer's production line), the color was the property being measured, and the counts were the measurements, giving ten replicates, one per bag.

The Parametric Approach
To introduce the concepts important to a parametric approach, the ideas of random (or indeterminate), systematic (or determinate), and gross errors were presented, and the instructor discussed how the different types of error affect the final results. In the present experiment, a gross error is exemplified by accidentally dropping some or all of the candies (loss of sample); as a consequence, the experiment must be restarted with a new bag. In some instances, a measurement in a random sample from a population appears to lie an abnormal distance from the other values. Such measurements, called outliers, may be related to human error and must be removed or corrected because they degrade the precision and accuracy of the results. In a sense, this definition leaves it up to the analyst to decide what will be considered abnormal. The students were asked about possible outliers. It was explained that before abnormal observations can be singled out, normal observations must be characterized, and that a statistical test for identifying outliers should be used. The students nevertheless pointed out some possible outliers in Table 1; after they evaluated the data with Dixon's Q-test (8) at the 95% confidence level (P = 0.05), no values were rejected.
Next, the students were asked to represent the results in a simple way that retains the sampling information, that is, the average frequency of each color and the distribution about the mean. For this, they used the arithmetic mean (x̄) and its confidence interval at two confidence levels (95 and 99%), and these statistical concepts were presented (8). The calculated parameters (the mean, the standard deviation s, the relative standard deviation RSD or coefficient of variation CV, and Student's t values) are shown in Table 2, based on the data in Table 1. Note that the mean, s, and RSD are presented with three significant figures: the total number of candies was higher than 1000 (four significant figures), and the number of candies of each color was given with three significant figures. The confidence intervals were rounded.
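The Table 2 statistics can be recomputed from the Table 1 percentages with a short Python sketch (standard library only; Python 3.8 or later for `statistics.fmean`). The dictionary and function names are illustrative, and the critical t values (2.262 and 3.250 for 9 degrees of freedom) are copied from standard tables rather than computed. Recomputed this way, the means, standard deviations, and confidence intervals agree with Table 2, while the t statistics for brown, green, and orange come out near 1.53, 4.23, and 0.73 instead of the printed 1.4, 4.9, and 0.84; the significance decisions are the same.

```python
import math
import statistics

# Percentages of each color in bags 1-10, transcribed from Table 1.
TABLE1 = {
    "blue":   [29, 23, 27, 15, 11, 10, 15, 10, 18, 12],
    "brown":  [13, 11, 14, 9, 19, 15, 14, 13, 11, 10],
    "green":  [13, 12, 13, 15, 11, 11, 10, 12, 14, 12],
    "orange": [10, 16, 20, 26, 23, 24, 28, 30, 24, 27],
    "red":    [18, 8, 8, 16, 12, 20, 11, 17, 17, 13],
    "yellow": [17, 30, 18, 19, 24, 20, 22, 18, 16, 26],
}
# Manufacturer's values from Table 2, treated as the true value mu.
MU = {"blue": 14.3, "brown": 14.3, "green": 14.3,
      "orange": 21.4, "red": 14.3, "yellow": 21.4}
# Two-tailed critical t for 9 degrees of freedom, copied from standard tables.
T_CRIT = {0.05: 2.262, 0.01: 3.250}


def parametric_summary(values, mu):
    """Mean, s, RSD, confidence half-widths, and one-sample t against mu."""
    n = len(values)
    mean = statistics.fmean(values)
    s = statistics.stdev(values)              # sample standard deviation (n - 1)
    half_width = {p: t * s / math.sqrt(n) for p, t in T_CRIT.items()}
    t_stat = abs(mean - mu) * math.sqrt(n) / s
    return {
        "mean": mean,
        "s": s,
        "rsd": 100.0 * s / mean,
        "ci": half_width,                     # interval is mean ± half_width[P]
        "t": t_stat,
        "significant": t_stat > T_CRIT[0.05],
    }


for color, values in TABLE1.items():
    r = parametric_summary(values, MU[color])
    print(f"{color:7s} mean={r['mean']:.1f} s={r['s']:.2f} RSD={r['rsd']:.1f} "
          f"CI95=±{r['ci'][0.05]:.1f} CI99=±{r['ci'][0.01]:.1f} "
          f"t={r['t']:.2f} significant at P=0.05: {r['significant']}")
```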

Table 1. Comparison of the Dispersion of Colors in the Candy Samples


Candies Classified by Color, %

Groups (Bags) Candies per Bag Blue Brown Green Orange Red Yellow
1 127 29 13 13 10 18 17
2 127 23 11 12 16 8 30
3 128 27 14 13 20 8 18
4 119 15 9 15 26 16 19
5 114 11 19 11 23 12 24
6 118 10 15 11 24 20 20
7 115 15 14 10 28 11 22
8 119 10 13 12 30 17 18
9 115 18 11 14 24 17 16
10 114 12 10 12 27 13 26
Total 1196
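Dixon's Q test, which the students applied to these data, compares the gap between a suspect value and its nearest neighbor with the range of the data. The Python sketch below assumes the simple gap-over-range form of Q for all sample sizes, as many introductory texts do (Dixon's original procedure switches to modified ratios for larger samples), and an assumed critical value of 0.466 for n = 10 at P = 0.05; check that value against the table used in class.

```python
# Simple Dixon's Q test (gap / range) applied to each color column of Table 1.
TABLE1 = {
    "blue":   [29, 23, 27, 15, 11, 10, 15, 10, 18, 12],
    "brown":  [13, 11, 14, 9, 19, 15, 14, 13, 11, 10],
    "green":  [13, 12, 13, 15, 11, 11, 10, 12, 14, 12],
    "orange": [10, 16, 20, 26, 23, 24, 28, 30, 24, 27],
    "red":    [18, 8, 8, 16, 12, 20, 11, 17, 17, 13],
    "yellow": [17, 30, 18, 19, 24, 20, 22, 18, 16, 26],
}
# Assumed two-sided critical value for n = 10 at the 95% confidence level.
Q_CRIT_N10 = 0.466


def dixon_q(values):
    """Return (Q, suspect value) for whichever extreme value is more isolated."""
    x = sorted(values)
    spread = x[-1] - x[0]
    if spread == 0:
        return 0.0, x[0]
    q_low = (x[1] - x[0]) / spread      # gap between the two lowest values
    q_high = (x[-1] - x[-2]) / spread   # gap between the two highest values
    return (q_low, x[0]) if q_low >= q_high else (q_high, x[-1])


for color, values in TABLE1.items():
    q, suspect = dixon_q(values)
    verdict = "reject" if q > Q_CRIT_N10 else "retain"
    print(f"{color:7s} suspect={suspect:2d}  Q={q:.3f}  -> {verdict}")
```

For these ten bags the largest Q is 0.4 (the brown value of 19), below the assumed critical value, which is consistent with the statement that no values were rejected.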


In the next exercise, the students had to verify whether a significant difference existed between the mean for each color (n = 10 bags) and the value supplied by the manufacturer (taken as the true value, μ), in order to evaluate whether the data obtained from the sampling experiment accurately represent the bulk material, in this case a large population described by the manufacturer's values. Significance testing was introduced, and a null hypothesis (H0) was formulated: the two population means are equal. It is important to emphasize that accepting a hypothesis does not mean that it is true, only that we have no evidence to believe otherwise. Thus, hypothesis tests are usually stated in terms of both a condition that is doubted (the null hypothesis) and a condition that is believed (the alternative hypothesis). In our study, the alternative hypothesis is that the two population means are not equal. The students are asked to test the hypothesis using a t-test (8) for each color at a significance level of 0.05. The significance level, P, defines the sensitivity of the test: P = 0.05 means that we will wrongly reject the null hypothesis 5% of the time when it is in fact true. The choice of P is somewhat arbitrary, although in practice a value of 0.05 is commonly used in analytical chemistry.
The comparisons were made using the critical values of t for 9 degrees of freedom, 2.26 and 3.25 for significance levels of P = 0.05 and P = 0.01, respectively (8). At both significance levels, a significant difference between the samples evaluated and the population (as described by the manufacturer) was observed only for the green candies, the color that also had the lowest standard deviation and, as a consequence, the narrowest confidence interval (see Table 2). For the green candies, the null hypothesis is rejected. This is understandable statistically: a data distribution with a smaller dispersion means greater precision around the arithmetic mean, so any reference value that is not very close to the mean falls outside the confidence interval derived from the Gaussian distribution. The opposite effect can be observed for data with larger dispersion (see the data for the blue and orange candies).
New information was then introduced to the students: bags 1–3 and bags 4–10 came from sample batches A and B, respectively. The students were asked whether a significant difference existed between the two sample batches for each color (see Table 3). In this case, we compared the two sample means x̄A and x̄B, which correspond to sample batches A and B,

Table 2. Parametric Approach to Analyzing the Data

                                               Candies Classified by Color, %
Data Parameters (a)                            Blue   Brown  Green  Orange  Red    Yellow
Values supplied by the manufacturer (b)        14.3   14.3   14.3   21.4    14.3   21.4
Mean, x̄ (n = 10)                               17.0   12.9   12.3   22.8    14.0   21.0
Standard deviation, s                          7.06   2.88   1.49   6.03    4.22   4.47
RSD (c)                                        41.5   22.3   12.1   26.4    30.1   21.3
Confidence interval (d), P = 0.05              17±5   13±2   12±1   23±4    14±3   21±3
Confidence interval (d), P = 0.01              17±7   13±3   12±2   23±6    14±4   21±5
Student's t = |x̄ − μ|√n/s (t statistic)        1.2    1.4    4.9    0.84    0.23   0.28
Is there a significant difference? (e)         No     No     Yes    No      No     No

(a) Based on the data reported in Table 1.
(b) These mean values are reported at the manufacturer's Web site: http://global.mms.com/br/about/products/milkchocolate.jsp (accessed Jun 2008).
(c) RSD is the relative standard deviation, given by 100(s/x̄).
(d) The confidence interval is given by x̄ − ts/√n < μ < x̄ + ts/√n, where t is the critical value of Student's t. Confidence levels of 95% and 99% correspond to P = 0.05 and P = 0.01, respectively.
(e) Comparison of the mean value supplied by the manufacturer with the mean value of the ten bags evaluated in this experiment.

Table 3. Parametric Comparison of the Two Random Samples of Candies

                                           Candies Classified by Color, %
Parameters                                 Blue   Brown  Green  Orange  Red    Yellow
Sample batch A mean, x̄A (n = 3)            26     13     13     15      11     22
Sample batch B mean, x̄B (n = 7)            13     13     12     26      15     21
Sample batch A standard deviation, sA      3.0    1.5    0.58   5.0     5.8    7.2
Sample batch B standard deviation, sB      3.0    3.4    1.8    2.5     3.2    3.5
Is there a significant difference?         Yes    No     No     Yes     No     No

Note: Significance of the difference between the mean values of the two sample batches (x̄A − x̄B), evaluated at P = 0.05.


respectively. Taking the null hypothesis that the two means are equal, we need to test whether (x̄A − x̄B) differs significantly from zero. First, the F-test was applied to compare the standard deviations (8). The standard deviations of the two batches did not differ significantly, which allowed a pooled estimate of the standard deviation to be calculated from the two individual standard deviations, sA and sB. The value of t was then calculated and compared with the critical value of t for 8 degrees of freedom [(nA + nB) − 2].
Typical statistical tests incorporate assumptions about the underlying normal (Gaussian) distribution of the data and hence rely on distribution parameters. Strictly speaking, statistical values such as means, standard deviations, and confidence limits apply to large populations. In analytical chemistry, we generally deal with small sets of data, sometimes fewer than five results, and in some instances we are interested in methods that do not require the assumption of normally distributed data. Methods that make no assumptions about the shape of a data set's distribution are called nonparametric or distribution-free methods.

A Nonparametric Approach
A nonparametric approach to data analysis uses the same data (Table 1); however, instead of the mean, the students are now asked to calculate the median for each color over the 10 bags. In addition, they calculate the lower and upper quartiles, as well as the smallest (minimum) and greatest (maximum) values in the distribution. These five values are then represented in a simple visual way by a box-and-whisker plot (8) (Figure 1), which immediately shows the center, the spread, and the overall range of the distribution. A box-and-whisker plot consists of a rectangle (the box) with two lines (the whiskers) extending from opposite edges of the box, and a further line inside the box, parallel to those edges. The ends of the whiskers indicate the range of the data, the edges of the box from which the whiskers protrude represent the upper and lower quartiles, and the line crossing the box represents the median. Accompanied by a numerical scale, a box-and-whisker plot is a graphical representation of the five-number summary: the data set is described by its extremes, its lower and upper quartiles, and its median (see Table 4). The plot shows at a glance the spread and the symmetry of the data (8). After considering these results, no values were rejected.
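Both procedures described in this section can be sketched with Python's standard library: the batch comparison (an F test on the two variances, then a t test with a pooled standard deviation and nA + nB − 2 = 8 degrees of freedom) and the five-number summary behind the box-and-whisker plots. This is an illustrative sketch, not the procedure used in the original exercise: the critical values (F = 7.26 for 2 and 6 degrees of freedom, F = 39.33 for 6 and 2, and t = 2.306 for 8, all two-tailed at P = 0.05) are copied from standard tables, and quartiles are taken as medians of the lower and upper halves, one of several conventions in use. Applied to the Table 1 data, this convention reproduces the Table 4 quartiles and the Yes/No pattern of Table 3; it gives a median of 14.5 for the red candies, where Tables 4 and 5 list 13.

```python
import math
import statistics

# Percentages from Table 1; bags 1-3 form batch A and bags 4-10 batch B.
TABLE1 = {
    "blue":   [29, 23, 27, 15, 11, 10, 15, 10, 18, 12],
    "brown":  [13, 11, 14, 9, 19, 15, 14, 13, 11, 10],
    "green":  [13, 12, 13, 15, 11, 11, 10, 12, 14, 12],
    "orange": [10, 16, 20, 26, 23, 24, 28, 30, 24, 27],
    "red":    [18, 8, 8, 16, 12, 20, 11, 17, 17, 13],
    "yellow": [17, 30, 18, 19, 24, 20, 22, 18, 16, 26],
}
# Two-tailed critical values at P = 0.05, copied from standard tables.
F_CRIT = {(2, 6): 7.260, (6, 2): 39.33}   # keyed by (df numerator, df denominator)
T_CRIT_8DF = 2.306


def compare_batches(values, n_a=3):
    """F test on the variances, then a pooled t test on the batch means.
    Returns (F significant?, t statistic, t significant?)."""
    a, b = values[:n_a], values[n_a:]
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    # The larger variance goes in the numerator of F.
    if var_a >= var_b:
        f, dfs = var_a / var_b, (len(a) - 1, len(b) - 1)
    else:
        f, dfs = var_b / var_a, (len(b) - 1, len(a) - 1)
    df = len(a) + len(b) - 2                  # 8 degrees of freedom here
    s_pooled = math.sqrt(((len(a) - 1) * var_a + (len(b) - 1) * var_b) / df)
    t = abs(statistics.fmean(a) - statistics.fmean(b)) / (
        s_pooled * math.sqrt(1 / len(a) + 1 / len(b)))
    return f > F_CRIT[dfs], t, t > T_CRIT_8DF


def five_number_summary(values):
    """(minimum, lower quartile, median, upper quartile, maximum), with the
    quartiles taken as medians of the lower and upper halves of the data."""
    x = sorted(values)
    half = len(x) // 2
    return (x[0], statistics.median(x[:half]), statistics.median(x),
            statistics.median(x[-half:]), x[-1])


for color, values in TABLE1.items():
    f_sig, t, t_sig = compare_batches(values)
    print(f"{color:7s} F significant: {f_sig}  t = {t:.2f}  batches differ: {t_sig}  "
          f"five-number summary: {five_number_summary(values)}")
```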
Figure 1. Box-and-whisker plots for the fractions (%) of each color of candies (horizontal axis: color of candy; vertical axis: fraction of candy, %). The black squares inside the boxes represent the means.

Table 4. Comparison of the Five-Number Summary for Each Color

Sample Color   Minimum, %   Lower Quartile, %   Median, %   Upper Quartile, %   Maximum, %
Blue           10           11                  15          23                  29
Brown          9            11                  13          14                  19
Green          10           11                  12          13                  15
Orange         10           20                  24          27                  30
Red            8            11                  13          17                  20
Yellow         16           18                  19          24                  30

The comparison of the results obtained by the parametric and nonparametric approaches is shown in Table 5. The differences between the means and the medians are not significant, indicating that the data could have been drawn from a normal distribution, which makes sense for this sampling exercise. One way to test this hypothesis is a χ² test; unfortunately, this method is reliable only with at least 50 data points.

Table 5. Comparison of Results Obtained by Parametric and Nonparametric Approaches

Sample Color   Parametric (Mean), %   Nonparametric (Median), %
Blue           17.0                   15
Brown          12.9                   13
Green          12.3                   12
Orange         22.8                   24
Red            14.0                   13
Yellow         21.0                   19

Sample Amount Effect
At this point, the students are told how the operations involved in sampling and analysis are related. The concepts of primary sample (bulk sample), reduced sample, subsample, laboratory sample, and test sample are discussed. It is pointed out that the term "sample" implies the existence of sampling error, which arises from a lack of homogeneity in the population. Because sampling error is always combined with analytical error, it must be isolated by the statistical procedure of analysis of variance (9). In our experiment, all the candies from the 10 bags are assumed to represent the bulk sample, and the task is now to obtain the laboratory sample from this bulk sample.
In this part of the exercise, the students discussed and simulated different sampling conditions (using containers of different sizes and reducing by quartering) from a bulk sample of known composition, which was obtained by mixing all the candies in a large container. The data collected and the basic statistical parameters calculated are presented in Table 6 as percentages, with the relative error of each sampling procedure shown in parentheses. The same statistical treatments applied before were also used in this case.
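The effect of sample amount can also be explored by simulation. The Python sketch below is not part of the original exercise: it assumes a hypothetical bulk composition (`BULK_COUNTS`, chosen to approximate the Table 2 means) and models each cupful as a simple random sample drawn without replacement, which ignores any segregation in the tray. Under these assumptions the spread of the signed relative errors, 100(x − μ)/μ, shrinks roughly as 1/√n, so cupfuls of about 180 candies scatter much less than cupfuls of about 42.

```python
import random
import statistics
from collections import Counter

# Hypothetical bulk of 1195 candies; the counts are an assumption chosen to
# roughly match the Table 2 means.
BULK_COUNTS = {"blue": 203, "brown": 154, "green": 147,
               "orange": 273, "red": 167, "yellow": 251}
BULK = [c for c, k in BULK_COUNTS.items() for _ in range(k)]
TRUE_PCT = {c: 100.0 * k / len(BULK) for c, k in BULK_COUNTS.items()}


def scoop(n, rng):
    """Model one cupful as a simple random sample of n candies (no replacement)."""
    counts = Counter(rng.sample(BULK, n))
    return {c: 100.0 * counts[c] / n for c in BULK_COUNTS}


def relative_error_sd(n, trials=2000, seed=1):
    """Standard deviation of the signed relative errors 100(x - mu)/mu,
    pooled over all colors and all simulated cupfuls of size n."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        pct = scoop(n, rng)
        errors.extend(100.0 * (pct[c] - TRUE_PCT[c]) / TRUE_PCT[c]
                      for c in BULK_COUNTS)
    return statistics.pstdev(errors)


# Roughly 42 candies per 50 mL cup and 180 per 200 mL cup, as stated in the text.
for n in (42, 180):
    print(f"n = {n:3d}: SD of relative errors = {relative_error_sd(n):.1f}%")
```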


Table 6. Dispersion of Candies Reported by Color for Each Sample


                                  Candies Classified by Color, % (Relative Error, %)
Sampling Method   Total Candies   Blue       Brown      Green      Orange       Red         Yellow
Observed means of the population (μ)
                  1196            17.0       12.9       12.3       22.8         14.0        21.0
Sampling with a small container (50 mL cup)
1                 40              15 (−12)   8 (−38)    15 (−22)   10 (−56)     20 (−43)    32 (−52)
2                 49              16 (−5.9)  16 (24)    4 (−67)    33 (45)      6 (−133)    25 (19)
3                 43              18 (5.9)   12 (−7.0)  19 (54)    19 (−17)     21 (50)     11 (−47)
4                 42              17 (0)     12 (−7.0)  17 (38)    21 (−7.9)    9 (−35)     24 (14)
5                 35              14 (−18)   6 (−53)    20 (63)    6 (−74)      23 (64)     31 (48)
Sampling with a large container (200 mL cup)
1                 177             15 (−12)   11 (−15)   16 (30)    22 (−3.5)    11 (−21)    25 (19)
2                 153             18 (5.9)   12 (−7.0)  11 (−11)   20 (−12)     14 (0)      25 (19)
3                 185             24 (41)    15 (16)    10 (−19)   19 (−17)     13 (−7.1)   19 (−9.5)
4                 191             15 (−12)   17 (32)    12 (−2.4)  20 (−12)     17 (21)     19 (−9.5)
5                 192             19 (12)    9 (−30)    15 (22)    24 (5.3)     12 (−14)    21 (0)
Reducing by quartering
1                 265             16 (5.9)   14 (8.5)   12 (2.4)   23 (0.0088)  13 (7.1)    22 (4.7)

Note: Values in parentheses are the relative errors, calculated as 100(x − μ)/μ, where x is the percentage found in the sample and μ is the known value for the bulk sample.

Table 7. Comparison of the Percentage of Purple Candies in Each Sample


Purple Candies or “Analyte”, % (Relative Error)
Samples Actual Values (µ) Sampled with a 50 mL Cup Sampled with a 200 mL Cup Reduced by Quartering
1 0.99 2.5 (152) 1.0 (1.0) 0.98 (–1.0)
2 0.99 0.0 (–100) 1.9 (92) —
3 0.99 1.9 (92) 0.49 (–50) —
Note: Values in parentheses are the relative errors, calculated as 100(x − μ)/μ, where x is the percentage of purple candies found in the sample and μ is the known value for the bulk sample with purple candies added.

These experiments show that the number of candies sampled influences the relative errors. With the small sampling cup, the relative errors varied from −133 to 64%, whereas with the larger cup they ranged from −30 to 41%. Thus, the larger container gave relative errors about three times smaller than the smaller one. These numbers show the students how sampling improves when the sample amount increases from approximately 42 to 180 candies per collection. For reduction by quartering, even though the procedure was performed only once (in contrast to the five replicates of the other samplings), the relative errors observed were 0.0088 to 8.5%, still smaller than those obtained with the other sampling procedures. This is because reduction by quartering yields a larger sample (265 candies in this case) and because the method was developed to optimize sampling, resulting in smaller errors (8).
Another discussion topic is the dispersion of the data points obtained when the same sampling procedure is repeated. The values were presented graphically (Figure 2), so that students can observe that the dispersion is usually greater with the smaller sampling container than with the larger one (blue and brown were exceptions).

Figure 2. Percentage of candies of each color sampled using the small container (50 mL) and the larger container (200 mL) (horizontal axis: color of candy; vertical axis: fraction of candy, %). The solid bar represents the expected value of each color, as provided by the manufacturer.¹

In the same way, the students also simulated a sample with an analyte present at low concentration by adding purple candies, 0.99% of the total, to the material. Once more, the effect of using different sampling procedures was evaluated, as summarized in Table 7.
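Why small samples perform so poorly for the purple "analyte" can be seen from a short binomial calculation. This Python sketch treats each candy as purple with probability 0.0099 (the 0.99% in the text) and ignores the finite size of the bulk, which is an approximation. For a sample of n candies it reports the expected number of purple candies, the chance of finding none, and the relative error of the smallest nonzero result, a single purple candy. For about 42 candies, fewer than half a purple candy is expected and the chance of finding none is roughly two in three, so a result of 0.0% (relative error −100%) is quite likely.

```python
P_PURPLE = 0.0099   # fraction of purple candies in the bulk (0.99%, from the text)


def relative_error(found_pct, true_pct):
    """Signed relative error, 100(x - mu)/mu, in percent."""
    return 100.0 * (found_pct - true_pct) / true_pct


def prob_no_purple(n, p=P_PURPLE):
    """Binomial approximation: probability that n candies contain no purple one."""
    return (1.0 - p) ** n


for n in (42, 180, 265):    # approximate sample sizes mentioned in the text
    one_candy = 100.0 / n   # smallest nonzero percentage a sample of n can report
    print(f"n = {n:3d}: expected purple candies = {n * P_PURPLE:.2f}, "
          f"P(no purple) = {prob_no_purple(n):.2f}, "
          f"one purple candy reads as {one_candy:.2f}% "
          f"(relative error {relative_error(one_candy, 100 * P_PURPLE):+.0f}%)")
```

The same `relative_error` helper reproduces the Table 7 entries, for example 2.5% against 0.99% gives about +152%.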


The students could again note the reduction in sampling error as the sample amount increases, shown by the decrease in relative errors from the small cup to the larger cup to quartering. As the amount sampled becomes larger, it better represents the bulk sample, up to the limit of the entire sample, which gives the actual value for the material.

Particle Size Effect
In this exercise, students easily observe the effect of different particle sizes on the sampling of solid materials. When all the different candies were mixed together inside the jar, a size distribution developed in which the smaller candies accumulated at the bottom of the jar, while the bigger candies were more evident at the top. Figure 3 shows a photograph of this phenomenon.

Figure 3. The size gradient formed by the different-sized candies mixed inside a glass jar.

After this visual experiment, the students were questioned about the sampling errors that might result from this effect, namely, size segregation in real samples. The students were also asked about possible ways of eliminating this type of error. In real applications, for example, the simplest procedure is a sequence of three unit operations carried out before sampling: grinding, homogenization (by mixing), and separation of the sample into defined particle-size ranges with sieves of different mesh.

Conclusions
This classroom experiment with candies has been used for six consecutive semesters in an undergraduate chemistry course and in a graduate chemistry course. It successfully introduced undergraduate students to important concepts of statistics and sampling techniques. The approach is easy to implement and engages students in learning in a stimulating, lucid, and concrete (as well as tasty) way, because all the statistical data used were obtained by the students themselves.

Notes
1. M&M candies have chocolate interiors; some also have peanuts or almonds in the center. For more information, see the manufacturer's Web page: http://global.mms.com/br/about/products/milkchocolate.jsp (accessed Jun 2008).
2. As the name implies, M&M Minis are smaller than the conventional version.
3. Confeti candies are manufactured by Kraft Foods Brazil S. A.

Acknowledgments
The authors are grateful to all the students who participated in the exercises, and thank C. H. Collins for language assistance.

Literature Cited
1. Cohen, R. D. J. Chem. Educ. 1992, 69, 200–203.
2. Vitha, M. F.; Carr, P. W. J. Chem. Educ. 1997, 74, 998–1000.
3. Cohen, R. D. J. Chem. Educ. 1991, 68, 902–903.
4. Salzsieder, J. C. J. Chem. Educ. 1995, 72, 623–625.
5. Carter, D. W. J. Chem. Educ. 1985, 62, 497–498.
6. Spencer, R. D. J. Chem. Educ. 1984, 61, 555–563.
7. Ross, M. R. J. Chem. Educ. 2000, 77, 1015–1016.
8. Miller, J. C.; Miller, J. N. Statistics for Analytical Chemistry, 3rd ed.; Ellis Horwood PTR Prentice Hall: New York, USA, 1994.
9. Horwitz, W. Pure Appl. Chem. 1990, 62, 1193–1208.

Supporting JCE Online Material
http://www.jce.divched.org/Journal/Issues/2008/Aug/abs1083.html
Abstract and keywords
Full text (PDF)
Links to cited URLs and JCE articles
Color figures
