M&M J Chem Educ
The first step in any chemical analysis is to obtain an analytical sample of the bulk material. In effect, the reliability of an analytical result is often conditioned by the quality of the original sample. The sample must have the same chemical and physical properties as the raw material, so that it represents well the material that will be analyzed. In these cases, sampling is directly responsible for the accuracy of the analytical results. The best way to sample would be to obtain large samples at random from the total population, based on the idea that as the sample size approaches the population size the errors decrease to zero (1). In practice, some factors, such as measurement costs and facilities for manipulating huge amounts of the bulk material, make it impractical to select large, essentially unlimited samples. A typical random sample is usually far smaller than desired, raising concerns about how accurately the sample really represents the bulk material. This doubt can be answered by statistical analysis of the data (2).

These facts justify the need for students to understand all the challenges involved in sampling techniques, the first step of any chemical analysis. In spite of that, students in classroom experiments are usually presented with homogeneous samples, so they tend to believe that sampling and statistical analyses are not problems that they have to deal with. Some references do describe ways to show students how random sampling works and how it can be representative (1, 3), while others propose different exercises emphasizing statistical analysis of data (4–6). However, these exercises usually require extended laboratory periods or are very theoretical.

In 2000, Ross (7) proposed a simple, fast, and didactic classroom exercise using colored candies, which could easily demonstrate the effect of sample and particle size in sampling. However, that paper does not address statistics; this is more fully explored by Vitha and Carr (2).

Inspired by these papers (2, 7), we developed and implemented a more complete classroom exercise for undergraduate and beginning graduate students to explore both sampling and statistics. It is an easy, interesting exercise that takes about 1.5 hours to demonstrate the effects involved in sampling techniques (sample amount, particle size, and the representativeness of the sample in relation to the bulk material). This exercise also includes a simple statistical approach to commonly used parameters (mean, median, standard deviation, errors, quartiles, and confidence limits), presentation of results, graphs (histogram and box-and-whisker plot), and related tests (normality, outliers, significance) using parametric and nonparametric statistical methods.

Procedure

Materials

For the sampling exercise, we used sugar-coated, round chocolate candies available in several colors, sizes, and types. All of the candies were purchased at a local market; the quantities given should be adequate for undertaking this activity in a class of 10–35 students. The variety of sizes, shapes, and colors is important for representing heterogeneity in data.

• 10 packages (104 g) of M&M candies (see Note 1)
• 1 package (98 g) of M&M candies with peanuts
• 2 packages (35.2 g) of M&M Minis (see Note 2)
• 1 package (80 g) of Confeti chocolate candies (see Note 3)
• Disposable gloves
• Plastic tray or paper plates
• Paper cups (50 and 200 mL capacities)

The gloves, plastic tray or paper plates, and paper cups were used in all the sample manipulation with hygiene in mind, so that the samples could be eaten at the end of the experiment. This exercise has been developed and used with students in both an undergraduate classical analytical chemistry course and a graduate course, Statistics in Analytical Chemistry, from 2000 to the present.

Data Acquisition

In the first step, students were divided into ten groups and each group was responsible for data acquisition from one bag of the regular M&M candies. Each group counted the candies, separating and reporting them by color. The students were asked to compile the data and to start the statistical evaluation.

Parametric and Nonparametric Approaches

In order to show the students the theoretical and practical differences between parametric and nonparametric approaches, the raw data were statistically evaluated by comparison of parameters, representations, and by the application of tests from both statistical methods. The results were also statistically compared to an average composition of the bags, provided at the manufacturer's Web site (see Note 1).

Sample Amount Effect: Part I

Next, all the M&M candies from the 10 bags that had been sorted by color were put together in a large plastic tray to simulate the population of a "bulk material". The students obtained this composition by aggregating the data of all 10 bags. Then, the groups collected "samples" of the bulk material, using two different sizes of cups (50 and 200 mL). Each group sampled five times. In the next step, a reducing procedure by quartering was used to enable the students to evaluate how representative each kind of sampling is. For the quartering procedure, the candies were uniformly spread inside a circular tray and then divided radially into quarters. The opposite quarters were combined. This process was repeated two more times. The statistical treatment of data was the same as used before.
© Division of Chemical Education • www.JCE.DivCHED.org • Vol. 85 No. 8 August 2008 • Journal of Chemical Education 1083
In the Classroom
Table 1. Number of candies per bag and fraction of each color (color columns in % of the candies in the bag)

Group (Bag)   Candies per Bag   Blue   Brown   Green   Orange   Red   Yellow
1             127               29     13      13      10       18    17
2             127               23     11      12      16        8    30
3             128               27     14      13      20        8    18
4             119               15      9      15      26       16    19
5             114               11     19      11      23       12    24
6             118               10     15      11      24       20    20
7             115               15     14      10      28       11    22
8             119               10     13      12      30       17    18
9             115               18     11      14      24       17    16
10            114               12     10      12      27       13    26
Total         1196
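The descriptive statistics named in the introduction (mean, standard deviation, confidence limits, median, quartiles) can be computed directly from these data. The sketch below makes two assumptions: the confidence limits use the critical t value of 2.26 quoted in the article for 9 degrees of freedom, and the quartiles use the "inclusive" convention of Python's `statistics` module, since the article does not say which convention the students used.

```python
import math
import statistics

# Color composition (% of candies in each bag) for bags 1-10, from Table 1.
TABLE_1 = {
    "Blue":   [29, 23, 27, 15, 11, 10, 15, 10, 18, 12],
    "Brown":  [13, 11, 14, 9, 19, 15, 14, 13, 11, 10],
    "Green":  [13, 12, 13, 15, 11, 11, 10, 12, 14, 12],
    "Orange": [10, 16, 20, 26, 23, 24, 28, 30, 24, 27],
    "Red":    [18, 8, 8, 16, 12, 20, 11, 17, 17, 13],
    "Yellow": [17, 30, 18, 19, 24, 20, 22, 18, 16, 26],
}

T_CRIT_95 = 2.26  # critical t for 9 degrees of freedom at P = 0.05, as quoted in the article


def parametric_summary(values, t_crit=T_CRIT_95):
    """Mean, sample standard deviation, and confidence limits for one color."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    half_width = t_crit * sd / math.sqrt(n)
    return {"mean": mean, "sd": sd,
            "ci_low": mean - half_width, "ci_high": mean + half_width}


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile, maximum."""
    # Quartile convention is an assumption; other conventions shift q1 and q3 slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": statistics.median(values),
            "q3": q3, "max": max(values)}


for color, values in TABLE_1.items():
    p = parametric_summary(values)
    f = five_number_summary(values)
    print(f"{color:7s} mean = {p['mean']:5.1f}  s = {p['sd']:4.1f}  "
          f"95% CI = [{p['ci_low']:5.1f}, {p['ci_high']:5.1f}]  median = {f['median']}")
```

The five values returned by `five_number_summary` are the ones a box-and-whisker plot displays.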
In the next exercise the students had to verify whether a significant difference existed between the means reported for each color (n = 10 bags) and the values supplied by the manufacturer (used as the actual values, μ), in order to evaluate whether the data obtained from the sampling experiment accurately represent the bulk material or, in this case, a large population, denoted by the provided values. Significance testing was introduced and a null hypothesis (H0) was formulated: the two population means are equal. It is important to emphasize that to accept a hypothesis does not mean that it is true, only that we do not have evidence to believe otherwise. Thus hypothesis tests are usually stated in terms of both a condition that is doubted (null hypothesis) and a condition that is believed (alternative hypothesis). In our study the alternative hypothesis would be that the two population means are not equal. The students are asked to test the hypothesis using a t-test (8) for each color and a significance level of 0.05. The significance level, P, defines the sensitivity of the test. A value of P = 0.05 means that we inadvertently reject the null hypothesis 5% of the time when it is in fact true. The choice of P is somewhat arbitrary, although in practice a value of 0.05 is commonly used in analytical chemistry.

The comparisons were made by using the critical values of t for 9 degrees of freedom, 2.26 and 3.25, for significance levels of P = 0.05 and P = 0.01, respectively (8). Significant differences between the samples evaluated and the population (supplied by the manufacturer) were observed at both significance levels only for the green candies, the color that also presented the lowest standard deviation and, as a consequence, the narrowest confidence interval (see Table 2). The rejection of the null hypothesis for the green candies is statistically understandable, since a data distribution with a smaller dispersion means greater precision near the arithmetic mean. In this way, any value that is not very close to the mean will not be contained inside the confidence interval provided by the Gaussian distribution curve. The opposite effect could be observed for bigger dispersions of data (see the data for the blue and orange candies).

New information was then introduced to the students: bags 1–3 and bags 4–10 came from sample batches A and B, respectively. Now it was asked whether a significant difference existed between the two sample batches of the candies in relation to each color (see Table 3). In this case, we compared two sample means, x̄A and x̄B, which correspond to sample batches A and B, respectively. Taking the null hypothesis that the two means are equal, we need to test whether (x̄A − x̄B) differs significantly from zero. First the F-test was applied for the comparison of standard deviations (8). The standard deviations of the two samples did not differ significantly, which allows calculation of a pooled estimate of standard deviation from the two individual standard deviations, sA and sB. In turn, the value of t was obtained and compared with the critical value of t, using 8 degrees of freedom [(nA + nB) − 2].

Table 3 (standard deviation rows)

                                        Blue   Brown   Green   Orange   Red   Yellow
Sample batch A standard deviation, sA   3.0    1.5     0.58    5.0      5.8   7.2
Sample batch B standard deviation, sB   3.0    3.4     1.8     2.5      3.2   3.5

Note: Significance of the comparison of the mean values between the two sample batches (x̄A − x̄B) evaluated at P = 0.05.

Typical statistical tests incorporate assumptions about the underlying normal (Gaussian) distribution of data, and hence rely on distribution parameters. Statistical values such as means, standard deviations, and confidence limits are, strictly speaking, defined for a large population size. In analytical chemistry, we generally deal with small sets of data, sometimes fewer than five results, and in some instances we are interested in methods that do not require the assumption of normally distributed data. Methods that make no assumptions about the shape of a data set's distribution are called nonparametric or distribution-free methods.

A Nonparametric Approach

A nonparametric approach to data analysis uses the same data (Table 1); however, instead of the mean, the students are now asked to calculate the median for each color from the 10 bags. In addition, the lower and upper quartiles should be calculated, as well as the smallest (minimum) and the greatest (maximum) values in the distribution. Values categorized into these five rankings are then represented in a simple visual way by a box-and-whisker plot (8) (Figure 1), where the immediate visuals are the center, the spread, and the overall range of distribution. A box-and-whisker plot consists of a rectangle (the box) with two

[Figure 1: box-and-whisker plot; y-axis: Fraction of Candy (%)]
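The significance tests described in this section (a one-sample t-test against a reference value μ, the F ratio of the two batch variances, and a two-sample t-test with a pooled standard deviation) can be sketched as follows. This is an illustrative sketch, not the authors' worksheet: the function names are hypothetical, the manufacturer's values μ are not reproduced here and must be supplied by the reader, and the critical value 2.31 for 8 degrees of freedom comes from standard t tables rather than from the article.

```python
import math
import statistics

# Color composition (% per bag) from Table 1; bags 1-3 are batch A, bags 4-10 batch B.
TABLE_1 = {
    "Blue":   [29, 23, 27, 15, 11, 10, 15, 10, 18, 12],
    "Brown":  [13, 11, 14, 9, 19, 15, 14, 13, 11, 10],
    "Green":  [13, 12, 13, 15, 11, 11, 10, 12, 14, 12],
    "Orange": [10, 16, 20, 26, 23, 24, 28, 30, 24, 27],
    "Red":    [18, 8, 8, 16, 12, 20, 11, 17, 17, 13],
    "Yellow": [17, 30, 18, 19, 24, 20, 22, 18, 16, 26],
}

T_CRIT_8_DOF = 2.31  # two-tailed, P = 0.05, from standard tables (not stated in the article)


def one_sample_t(values, mu):
    """|t| for H0: the mean of `values` equals the reference value mu."""
    n = len(values)
    return abs(statistics.mean(values) - mu) * math.sqrt(n) / statistics.stdev(values)


def f_ratio(a, b):
    """Variance ratio with the larger variance in the numerator, so F >= 1."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return max(va, vb) / min(va, vb)


def pooled_two_sample_t(a, b):
    """Return (|t|, degrees of freedom) using a pooled standard deviation."""
    na, nb = len(a), len(b)
    dof = na + nb - 2
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / dof
    diff = abs(statistics.mean(a) - statistics.mean(b))
    return diff / math.sqrt(pooled_var * (1 / na + 1 / nb)), dof


for color, values in TABLE_1.items():
    batch_a, batch_b = values[:3], values[3:]
    t, dof = pooled_two_sample_t(batch_a, batch_b)
    verdict = "significant" if t > T_CRIT_8_DOF else "not significant"
    print(f"{color:7s} F = {f_ratio(batch_a, batch_b):5.2f}  "
          f"|t| = {t:5.2f}  ({dof} d.o.f., {verdict} at P = 0.05)")
```

The F ratio printed for each color still has to be compared with a tabulated critical F value before the pooled standard deviation is used. `one_sample_t(TABLE_1["Green"], mu)` gives the statistic to compare with 2.26 or 3.25 once the manufacturer's value for green is substituted for `mu`.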
It is possible to observe in these experiments that the number of candies sampled influences the values of the relative errors. Whereas with the small sampling cup the relative errors varied from −133 to 64%, with the larger cup the values ranged from −30 to 41%. Thus, the larger container resulted in a relative error about three times smaller than that provided by the smaller one. These numbers show the students the improvement in sampling caused by increasing the sample amount from ap-

In the same way, the students also simulated a sample with an analyte present in low concentration, by the addition of 0.99% purple candies to the total material. Once more, the effect of using different sampling procedures was evaluated, as summarized in Table 7.

[Figure: y-axis: Fraction of Candy (%); legend: Sample Size]
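The dependence of the relative error on the sample amount can be explored with a short simulation. Everything numerical in this sketch is an assumption: the bulk composition, the number of candies each cup holds, and the sign convention for the relative error, (measured − true)/true, are illustrative choices, because the article does not spell out its formula or the cup contents.

```python
import random
from collections import Counter


def relative_error_pct(measured, true):
    """Relative error in percent, 100 * (measured - true) / true (assumed convention)."""
    return 100 * (measured - true) / true


def color_percent(sample, color):
    """Percentage of `color` in a list of candy colors."""
    return 100 * Counter(sample)[color] / len(sample)


def error_range(bulk, color, sample_size, n_samples=5, seed=0):
    """Smallest and largest relative error for `color` over repeated random samples."""
    rng = random.Random(seed)
    true_pct = color_percent(bulk, color)
    errors = [
        relative_error_pct(color_percent(rng.sample(bulk, sample_size), color), true_pct)
        for _ in range(n_samples)
    ]
    return min(errors), max(errors)


# Hypothetical bulk with about 1% purple "analyte"; counts are not from the article.
bulk = (["orange"] * 270 + ["yellow"] * 250 + ["blue"] * 200
        + ["other"] * 468 + ["purple"] * 12)

for cup in (25, 100):  # hypothetical cup capacities, in candies
    low, high = error_range(bulk, "orange", cup)
    print(f"{cup:3d} candies per sample: relative error from {low:.0f}% to {high:.0f}%")
```

Calling `error_range(bulk, "purple", cup)` repeats the comparison for the low-concentration component, where small samples often contain no purple candy at all. When the sample size equals the bulk size the error is exactly zero, which is the limiting case mentioned in the introduction.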
Conclusions
This classroom experiment with candies has been used
for six consecutive semesters in a chemistry course for under-
graduates, and in a graduate chemistry course. It successfully
introduced the undergraduate students to important concepts
of statistics and sampling techniques. This approach is easy to
implement and engages students in learning in a stimulating
way that is lucid and concrete, as well as tasty, because all the
statistical data used were obtained by the students themselves.
Notes
1. M&M candies have chocolate interiors; some also have peanuts
or almonds in the center. For more information, see the manufacturer’s
Web page: http://global.mms.com/br/about/products/milkchocolate.jsp
(accessed Jun 2008).
2. As the name implies, M&M Minis are smaller-sized than the
conventional version.
3. Confeti candies are manufactured by Kraft Foods Brazil S. A.