Statistics Project
Report introduction
For our project in MATH 1040 Introduction to Statistics, each student in the class had to buy a 2.17-ounce
individual-size bag of Skittles and count the number of candies of each color in the bag. The class data
were collected, and we used them for several exercises covering different aspects of statistics as we
learned them in class.
In the first part of the project, we determined the proportion of each color of candy and created a
Pareto chart and a pie chart using the total number of candies of each color from the class data given
to us by the teacher. We then compared our personal data with the data from the whole class and
noted any similarities or differences between the two.
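The proportions behind the Pareto and pie charts are simple ratios of each color's count to the total. A minimal Python sketch, assuming the counts are stored in a dictionary keyed by color name (a hypothetical layout, not necessarily how the class data were recorded):

```python
def color_proportions(counts):
    """Given a dict mapping color -> number of candies, return a list of
    (color, count, proportion, pie_angle_degrees) tuples sorted from the most
    to the least frequent color, which is the bar order of a Pareto chart."""
    total = sum(counts.values())
    # Pareto order: largest count first.
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    # Each pie slice spans (proportion * 360) degrees.
    return [(color, n, n / total, 360 * n / total) for color, n in ordered]
```

Calling it once with your own bag's counts and once with the class totals gives the two sets of proportions to compare.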
Next, we used the Skittles data to create statistical summaries: the mean, the standard deviation, and the
5-number summary. We constructed a frequency histogram of the total number of candies per bag as well
as a box plot. Individually, I also wrote a paragraph about the implications of different qualitative and
quantitative methods of analysis.
At the end of the project, we worked with confidence intervals. We found three different confidence
intervals: for the population proportion, the population mean, and the population standard deviation. I
wrote an analysis of what each confidence interval means. We also worked with hypothesis tests,
explaining their general purpose and meaning.
To finish the project, we wrote a reflective paper explaining what we had learned from this project and
how it will affect my work in other classes and in my personal and professional life.
ORGANIZING AND DISPLAYING CATEGORICAL DATA: COLORS
In this project, the sample is the class data. Since not everyone in the class lives in the same city, and
some of the students live out of state, the population is all 2.17-ounce bags of Skittles in the United
States. Because Skittles are also made at manufacturing plants overseas, the population can reasonably
be extended only to bags in the United States distribution network.
ORGANIZING AND DISPLAYING QUANTITATIVE DATA: THE NUMBER OF CANDIES PER BAG
Using the total number of candies in each bag in the class sample, we calculated the mean, standard
deviation, and 5-number summary. The sample mean was 59.5 candies per bag, with a standard deviation
of 1.72 candies.
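These statistics can be checked with Python's standard `statistics` module. A minimal sketch, assuming the per-bag totals are in a list; the quartile rule used here (medians of the lower and upper halves) is an assumption, since calculators and software packages differ slightly in how they compute Q1 and Q3:

```python
import statistics


def five_number_summary(values):
    """Min, Q1, median, Q3, max. Q1 and Q3 are the medians of the lower and
    upper halves of the sorted data; when n is odd, the overall median is
    left out of both halves. Requires at least two values."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + (n % 2):]
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])


def summarize(candies_per_bag):
    """Summary statistics for a list of candies-per-bag totals."""
    return {
        "n": len(candies_per_bag),
        "mean": statistics.mean(candies_per_bag),
        # Sample standard deviation (divides by n - 1).
        "sd": statistics.stdev(candies_per_bag),
        "five_number": five_number_summary(candies_per_bag),
    }
```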
A confidence interval for a population proportion uses a sample proportion to give, at a specified level
of confidence, a range of plausible values for the proportion of the population that has a given
characteristic.
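For reference, the general form of this interval is written here with the same symbols as the calculation below, where $\hat{p} = x/n$ is the sample proportion and $z_{\alpha/2}$ is the critical value for confidence level $1-\alpha$:

$$\hat{p} - E < p < \hat{p} + E, \qquad E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

This normal-approximation interval is usually considered appropriate only when the sample contains enough successes and failures; a common rule of thumb is at least 5 of each.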
99% CONFIDENCE INTERVAL ESTIMATE FOR THE POPULATION PROPORTION OF YELLOW CANDY
Sample proportion: $\hat{p} = \dfrac{364}{1786} \approx 0.204$
$\alpha = 0.01$
Critical value (99%): $z_{\alpha/2} = z_{0.005} = 2.575$
Margin of error: $E = 2.575\sqrt{\dfrac{0.204(1-0.204)}{1786}} \approx 0.0246$
Lower limit: $0.204 - 0.0246 \approx 0.179$
Upper limit: $0.204 + 0.0246 \approx 0.229$
99% Confidence Interval Estimate: (0.179, 0.229)
In terms of the Skittles, we are 99% confident that the proportion of yellow candies among all Skittles in
the population is between 0.179 and 0.229 (about 18% to 23%).
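The hand calculation above can be checked with a short Python sketch of the normal-approximation interval. The function name `proportion_ci` is hypothetical; passing `z=2.575` reproduces the table value used above, while leaving it as `None` computes the critical value with `statistics.NormalDist`:

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.99, z=None):
    """Normal-approximation (Wald) confidence interval for a population
    proportion. Returns (p_hat, margin_of_error, (lower, upper))."""
    p_hat = successes / n
    alpha = 1 - confidence
    if z is None:
        # Critical value z_{alpha/2} from the standard normal distribution.
        z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin, (p_hat - margin, p_hat + margin)
```

`proportion_ci(364, 1786, 0.99, z=2.575)` carries p̂ at full precision, so its limits can differ from the hand-rounded ones in the third decimal place.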
95% CONFIDENCE INTERVAL ESTIMATE FOR THE POPULATION MEAN NUMBER OF SKITTLES PER BAG
Upper bound: $\bar{x} + t_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}} = 59.5 + 2.00 \cdot \dfrac{1.72}{\sqrt{29}} = 60.13$
Margin of error: $E = t_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}} = 2.00 \cdot \dfrac{1.72}{\sqrt{29}} = 0.638$
We are 95% confident that the population mean number of candies per bag is between 58.8 and 60.1.
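A matching sketch for the t-based interval. Python's standard library has no t distribution, so the critical value is passed in as a number read from a t table; `mean_ci` is a hypothetical helper name:

```python
from math import sqrt


def mean_ci(x_bar, s, n, t_crit):
    """t-based confidence interval for a population mean.
    t_crit is the critical value t_{alpha/2} for n - 1 degrees of freedom,
    looked up in a t table. Returns (margin_of_error, (lower, upper))."""
    margin = t_crit * s / sqrt(n)
    return margin, (x_bar - margin, x_bar + margin)
```

Calling `mean_ci(59.5, 1.72, 29, 2.00)` uses the same inputs as the calculation above and returns the same margin of error up to rounding.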
HYPOTHESIS TEST
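1. Testing the claim that 20% of all Skittles are red.
n = 1786, x = 330, p̂ = 0.185, level of significance α = 0.05
H0: p = 0.20, H1: p ≠ 0.20
z = −1.609, p-value = 0.108
Because the p-value (0.108) is greater than the level of significance (0.05), we fail to reject the null
hypothesis. There is not enough evidence to warrant rejection of the claim that 20% of all Skittles are red.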
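The test statistic and two-tailed p-value can be reproduced with a short sketch using the normal approximation; `one_proportion_z_test` is a hypothetical helper name, and `statistics.NormalDist` supplies the standard normal CDF:

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Two-tailed z test of H0: p = p0 against H1: p != p0, using the
    normal approximation. Returns (z, p_value)."""
    p_hat = successes / n
    # The standard error uses the claimed proportion p0, not p_hat.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value
```

`one_proportion_z_test(330, 1786, 0.20)` gives z ≈ −1.609 and a p-value of about 0.108.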
2. Testing the claim that the mean number of candies per bag is 55.
n = 30, x̄ = 59.5, s = 1.72, level of significance α = 0.01
H0: μ = 55, H1: μ ≠ 55
t = 14.330, p-value ≈ 1.085 × 10⁻¹⁴ < 0.01
Because the p-value is less than the level of significance, we reject the null hypothesis. There is
sufficient evidence to warrant rejection of the claim that the mean number of candies in a bag of
Skittles is 55.
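The t statistic above can be checked with a short sketch. Because the standard library has no t distribution, this version compares |t| with the critical value t(0.005) = 2.756 for 29 degrees of freedom, taken from a t table (an assumption of this sketch, not a value stated in the report), instead of computing the p-value:

```python
from math import sqrt


def one_sample_t(x_bar, s, n, mu0):
    """Test statistic for H0: mu = mu0 (n - 1 degrees of freedom)."""
    return (x_bar - mu0) / (s / sqrt(n))


# Two-tailed critical value for alpha = 0.01 and df = 29, from a t table.
T_CRIT_DF29_ALPHA01 = 2.756

t = one_sample_t(59.5, 1.72, 30, 55)
reject = abs(t) > T_CRIT_DF29_ALPHA01
print(f"t = {t:.3f}, reject H0: {reject}")
```

Since |t| is far larger than the critical value, the sketch rejects H0, which agrees with the very small p-value.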
REFLECTION
The purpose of collecting data from a sample and calculating statistics from it is to estimate
characteristics (parameters) of the overall population. One issue when using sample statistics to
estimate population parameters is how well the sample represents the population. A confidence
interval helps address this issue by giving a range of values that is likely to contain the population
parameter. Each interval is constructed with a level of confidence, such as 95%, 98%, or 99%. A higher
level of confidence produces a wider interval: we can be more confident that it contains the parameter,
but the estimate is less precise.
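The trade-off between confidence and precision can be seen by recomputing the yellow-candy margin of error at several confidence levels for the same sample. A minimal sketch; `proportion_margin` is a hypothetical helper name:

```python
from math import sqrt
from statistics import NormalDist


def proportion_margin(p_hat, n, confidence):
    """Margin of error z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p_hat * (1 - p_hat) / n)


# Same sample (364 yellow out of 1786), three confidence levels.
for level in (0.95, 0.98, 0.99):
    e = proportion_margin(364 / 1786, 1786, level)
    print(f"{level:.0%}: interval width = {2 * e:.4f}")
```

The printed widths grow as the confidence level rises from 95% to 99%, even though the data do not change.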
One possible source of error is data entry. Everyone submitted their number of Skittles of each color
and their total, and these numbers could have been mistyped or miscalculated when the data were
compiled. Another possible error is that a student who did not buy a bag of Skittles could have
submitted false information. The sampling method could be improved by requiring all students to buy
their bag of Skittles and bring it to class on a specific date, and then counting each color and the total
in person so that everyone participating could see each person's results.
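Some data-entry mistakes can be caught automatically by checking that each student's color counts add up to the total they reported. A minimal sketch, assuming each submission is stored as a dictionary with one key per color plus a `total` key (a hypothetical layout), and assuming the five colors of Original Skittles:

```python
# Assumed color keys; adjust to match how the class data were actually recorded.
COLORS = ("red", "orange", "yellow", "green", "purple")


def find_inconsistent_records(records, colors=COLORS):
    """Return the indices of records whose color counts do not add up to
    the reported total, or that contain a negative count."""
    bad = []
    for i, rec in enumerate(records):
        counts = [rec[c] for c in colors]
        if any(n < 0 for n in counts) or sum(counts) != rec["total"]:
            bad.append(i)
    return bad
```

A check like this cannot detect made-up counts that happen to add up correctly, which is why counting the bags in class remains the stronger safeguard.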
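The conclusions drawn from our hypothesis tests are as follows. For red candies, the p-value (0.108) was
greater than the 0.05 level of significance, so we failed to reject the null hypothesis: the data are
consistent with the claim that 20% of Skittles are red, although failing to reject does not prove that
claim. For the mean number of candies, the test statistic (t = 14.330) was very large and the p-value was
far below 0.01, so the data do not support the claim that the mean is 55. This agrees with the 95%
confidence interval of 58.8 to 60.1 candies per bag, which does not contain 55.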