Math1040skittlestermproject Tomac-Dylan

The document summarizes a student's term project analyzing data from 29 bags of Skittles candy. Key findings include: - Red Skittles were the most common color overall, followed by orange, yellow, green and purple. However, in the student's individual bag, red was the least common and yellow the most. - The mean number of candies per bag was 59.45 with a standard deviation of 2.38, indicating consistent bag sizes. - Confidence intervals and hypothesis tests were conducted on the data, including the proportion of yellow candies, mean number of candies per bag, and standard deviation of candies per bag. - The student concluded that statistical analysis can provide meaningful insights

Math 1040 Skittles Term Project

Dylan Tomac
7/09/16
Math 1040-11 Ping Yu



Introduction:
The goal of this project is to catalog data from 29 bags of Skittles, one from each
student in my 1040 class. Each student was required to buy a standard size bag of
Skittles and report the number of each color (along with the total number) to the
Professor in order to compile a reasonable amount of data to pull from. Once the data
was compiled we created several charts and made observations about the candy. Our
findings are as follows.

Number of Candies by Color:
Upon seeing the total number of Candies I assumed that the ratios of each color
would reflect my own findings in a standard size bag. But it seems that my bag might
have been an anomaly when comparing the two data collections.

In the overall data, the ratio of each color seems to be reasonably predictable. In
order from greatest to least, it went Red, Orange, Yellow, Green and Purple. Seeing
as Red is the primary color of the Skittles Brand, I expected it to have the highest
frequency, followed by the remaining colors in color wheel order: Orange, Yellow,
Green and Purple.













But in my individual bag, Red was actually the least common and Yellow was the most
common. This poses an interesting perspective on data and how a small sample can
misrepresent the overall population.

The Number of Candies per Bag:


I was impressed to find the mean of the candies per bag to be 59.45 and the
standard deviation to be 2.38. In the mass production of Candy, I for some reason
expected there to be a broader variance in the number of candies from bag to bag.
It's reassuring that if I bought 29 bags of Skittles, the mean would be around 59
candies, which is close to what I found within my own bag of 62 Candies.

To better visualize my observations from above, please refer to the Histogram and
Box Plot below. Also note that when using a Box Plot you can see (and confirm) that
53 Candies is an outlier within the data:
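
As a side note on how a Box Plot flags a value such as 53, here is a minimal Python
sketch of the usual 1.5 x IQR rule. The quartiles used below are hypothetical
placeholders chosen for illustration; the class's actual quartiles are not given in
this write-up, and the function names are my own.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences used by a standard box plot."""
    iqr = q3 - q1  # interquartile range
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value, q1, q3, k=1.5):
    """True if value lies outside the fences built from q1 and q3."""
    low, high = tukey_fences(q1, q3, k)
    return value < low or value > high


# Hypothetical quartiles, NOT taken from the class data.
Q1_EXAMPLE = 58
Q3_EXAMPLE = 61

low_fence, high_fence = tukey_fences(Q1_EXAMPLE, Q3_EXAMPLE)
print(f"fences: {low_fence} to {high_fence}")
print(f"53 flagged as outlier: {is_outlier(53, Q1_EXAMPLE, Q3_EXAMPLE)}")
print(f"62 flagged as outlier: {is_outlier(62, Q1_EXAMPLE, Q3_EXAMPLE)}")
```

With the real first and third quartiles from the class data in place of the
placeholders, the same check would confirm (or not) what the Box Plot shows.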












Reflection:
The data in this project was very categorical and aligned nicely with Pie Charts and
different types of Bar Graphs due to its use of color. The Candy could be split up in
different Color groups and organized nicely into the aforementioned graphs.

While Pie and Bar graphs frequently work well with categorical data, they do not
pair well with more quantitative data like how much babies weigh at birth. This type
of data works better in Stem-and-Leaf plots, Dot plots, and some value bar charts
where the data needs a more specific representation of the overall picture.

Part II

Confidence Interval Estimates:


Confidence Intervals are used to express the degree of uncertainty within a given
sample statistic. CIs are mostly stated at a 95% confidence level, but can also be
stated at other levels like 90% and 99%. Lower confidence levels may also be used,
but within our class we encountered these three variations.

See Attached Paper for the following:
1. 99% Confidence Interval for True Proportion of Yellow Candies
2. 95% Confidence Interval for True Mean number of Candies
3. 98% Confidence Interval for Standard Deviation number of Candies per Bag
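
The attached paper is not reproduced here, so as a rough sketch, the second and third
intervals can be recomputed in Python from the summary statistics reported earlier
(n = 29, mean = 59.45, s = 2.38). The critical values are hard-coded from standard t,
chi-square and z tables (28 degrees of freedom where applicable), and the function
names are my own. The proportion interval is given only as a function, because the
count of yellow candies is not stated in this write-up.

```python
import math

N_BAGS = 29      # bags in the class sample (from the report)
MEAN = 59.45     # mean candies per bag (from the report)
STD_DEV = 2.38   # sample standard deviation (from the report)

# Critical values read from standard tables, not computed here.
T_95 = 2.048            # t, df = 28, two-sided 95%
CHI2_RIGHT_98 = 48.278  # chi-square, df = 28, area 0.01 to the right
CHI2_LEFT_98 = 13.565   # chi-square, df = 28, area 0.99 to the right
Z_99 = 2.576            # z, two-sided 99%


def mean_interval(mean, s, n, t_crit):
    """Confidence interval for a population mean (sigma unknown)."""
    margin = t_crit * s / math.sqrt(n)
    return mean - margin, mean + margin


def std_dev_interval(s, n, chi2_right, chi2_left):
    """Confidence interval for a population standard deviation."""
    sum_sq = (n - 1) * s ** 2
    return math.sqrt(sum_sq / chi2_right), math.sqrt(sum_sq / chi2_left)


def proportion_interval(successes, total, z_crit):
    """Confidence interval for a population proportion."""
    p_hat = successes / total
    margin = z_crit * math.sqrt(p_hat * (1 - p_hat) / total)
    return p_hat - margin, p_hat + margin


mean_ci = mean_interval(MEAN, STD_DEV, N_BAGS, T_95)
sd_ci = std_dev_interval(STD_DEV, N_BAGS, CHI2_RIGHT_98, CHI2_LEFT_98)
print(f"95% CI for mean: {mean_ci[0]:.2f} to {mean_ci[1]:.2f}")
print(f"98% CI for std dev: {sd_ci[0]:.2f} to {sd_ci[1]:.2f}")
```

Under these assumptions the mean interval comes out to roughly 58.5 to 60.4 candies
and the standard deviation interval to roughly 1.8 to 3.4. The attached paper may
differ slightly through rounding. For the yellow proportion, the call would be
proportion_interval(yellow_count, total_count, Z_99) with the class totals.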

Hypothesis Tests:
The purpose of a Hypothesis Test is to determine if enough statistical evidence exists
to support or reject a certain belief, hypothesis or claim about a parameter. In most
situations you would compare the sample data to the hypothesis about the overall
population, and then judge whether the hypothesis is reasonable or should be rejected.

See Attached Paper for the following:
1. 0.05 Significance Level to test the claim that 20% of Skittles candies are Red
2. 0.01 Significance Level to test the claim that the mean number of Skittles in
a bag is 55.
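
Since the attached paper is not included here, the second test can be sketched in
Python with the classical (critical value) approach, using the summary statistics
reported earlier (n = 29, mean = 59.45, s = 2.38). The critical value is hard-coded
from a standard t table, and the function names are my own. The proportion test is
given only as a function, because the count of red candies is not stated in this
write-up.

```python
import math


def t_statistic(sample_mean, claimed_mean, s, n):
    """Test statistic for a one-sample t test on a mean."""
    return (sample_mean - claimed_mean) / (s / math.sqrt(n))


def z_statistic_proportion(p_hat, claimed_p, n):
    """Test statistic for a one-sample z test on a proportion."""
    std_error = math.sqrt(claimed_p * (1 - claimed_p) / n)
    return (p_hat - claimed_p) / std_error


# Two-tailed critical t for alpha = 0.01 and df = 28, from a standard table.
T_CRIT_01 = 2.763

# H0: mean = 55, H1: mean != 55, using the values from the report.
t_stat = t_statistic(59.45, 55, 2.38, 29)
reject_mean_claim = abs(t_stat) > T_CRIT_01

print(f"t = {t_stat:.2f}, critical value = {T_CRIT_01}")
print(f"reject the claim that the mean is 55: {reject_mean_claim}")
```

With these numbers the test statistic is about 10.07, far beyond the critical value,
so this sketch rejects the claim that the mean is 55. For the red candies, the
sample proportion and total count from the class data would go into
z_statistic_proportion with claimed_p = 0.20.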

Reflection:
A Confidence Interval is generally used when stating a claim relating to a given
sample with a certain degree of uncertainty. For example, suppose you randomly sample
a 48-pack of batteries, measure the run time, and calculate that the 95% confidence
interval is 13-14 hours. This would indicate that you are 95% confident that the
mean for the entire population of the batteries will fall within that range.

A Hypothesis Test is used to determine if a given Hypothesis is true when discussing
a population and a sample. You use the sample to compare and evaluate if the data
reflects your Hypothesis about the population parameter. For example, if you
sampled a neighborhood in a given city to see how many houses have garages, you
would then use this sample to draw and test a Hypothesis about the overall
population of the homes within the city.

In both tests human error could be very likely when inputting and recording data
points. When working on several problems I could have recorded data wrong from
the beginning, or even left out a number or read the incorrect probability from the
table. I found that when thoroughly checking my data and using the correct tools,
my findings usually came out accurate. But it required attention and proofing on
every problem.

For improving the sampling method, it would be beneficial to have a tighter control
on the actual samples. Having 29 different people count Skittles at various times and
locations can introduce a large possibility for error. Having a smaller number of
people count the data or having everyone in the same room when counting could
dramatically decrease the possibility of skewed data.

As far as conclusions, I have found that it's far easier than I first thought to create
a statistical study on a given subject, especially if the data is easily accessible like
the Skittles were. I was also surprised to find how much information you could get from
a sample, and the conclusions you could draw from a solid Hypothesis.






Reflective Writing and e-Portfolio


What I have learned
Before this class started I was quite nervous about how well I would understand
and absorb the material we covered. Even now at the end of this course I'd say
that my knowledge is amateur at best, but I can say with confidence that this
project not only helped expand my knowledge but also helped me understand
how important statistics can be. I never really knew how it could be applied, at
least technically speaking. It was extremely empowering to look back and see
that I was able to take a set of data, create graphs, draw conclusions, run tests and
form my own Hypothesis on what it means. It was difficult at times to understand the
different processes, but knowing that they can apply so directly in my life gives
the material purpose and meaning.
Not all of the material was easy to understand and I'd say that I had the most
difficulty with Hypothesis Testing and the use of Classical Method vs P-Value
Method. I'm not sure which one was more recommended, but I plan to study
them both excessively before the final in hopes of being fluent! Difficulties aside,
I really did enjoy this class even though it was challenging and sometimes very
frustrating. I feel that it will be far more applicable and useful than some of the
items I learned in 1010. That being said, I'm very happy that I chose to take Math
1040 and not Math 1030, and I'd recommend it to anyone who has to decide
between the two.
