
Mathproject

Uploaded by

api-237360764
Copyright
© Attribution Non-Commercial (BY-NC)

Brittany Allen
MATH 1040-008
R. Christensen

Statistics Term Project

Part 1

As a group, we selected the body measurements data set for our population.

Part 2

As a group, we selected Age as our categorical variable. The population proportions for the values of our categorical variable are represented below as a pie chart constructed from the population frequency distribution:

[Pie chart: Relative Frequency of Ages (n = 507)]

Relative Frequency of Ages (n = 507)

Age (years)   Frequency   Relative Frequency
0 - 9               0     0
10 - 19            26     0.051282051
20 - 29           271     0.534516765
30 - 39           118     0.232741617
40 - 49            69     0.136094675
50 - 59            16     0.031558185
60 - 69             7     0.013806706
Total             507     1

As a group, we chose to take two samples (size 30 < n < 35) from our categorical data set: a simple random sample and a systematic sample.

Simple Random Sample (n = 33)

The simple random sample was obtained utilizing the Excel Data Analysis function for Sampling, inputting the entire population of ages listed in the data set, and generating a random sample of 33 ages. The sample list of ages was then placed into the following frequency distributions, one sorted by class and the other sorted in frequency order from highest to lowest.

Frequency of Ages (n = 33), sorted by class

Age (years)   Frequency
0 - 9               0
10 - 19             3
20 - 29            18
30 - 39             7
40 - 49             5
50 - 59             0
60 - 69             0
Total              33

Frequency of Ages (n = 33), sorted by frequency

Age (years)   Frequency   Relative Frequency
20 - 29            18     0.545454545
30 - 39             7     0.212121212
40 - 49             5     0.151515152
10 - 19             3     0.090909091
0 - 9               0     0
60 - 69             0     0
50 - 59             0     0
Total              33
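The sampling and tallying steps described above can be sketched in Python. This is only an illustration, not the Excel procedure the group used: the raw age list is not reproduced here, so the population is rebuilt from the class counts in the population frequency table, the seed is a hypothetical value chosen for repeatability, and drawing without replacement is an assumption.

```python
import random
from collections import Counter

# Class counts copied from the population frequency table (n = 507).
POPULATION_COUNTS = {
    "0 - 9": 0, "10 - 19": 26, "20 - 29": 271, "30 - 39": 118,
    "40 - 49": 69, "50 - 59": 16, "60 - 69": 7,
}

def build_population(counts):
    """Expand class counts into one class label per person (stand-in for raw ages)."""
    return [label for label, count in counts.items() for _ in range(count)]

def frequency_table(sample, classes):
    """Return (class, frequency, relative frequency) rows in class order."""
    tally = Counter(sample)
    n = len(sample)
    return [(label, tally[label], tally[label] / n) for label in classes]

population = build_population(POPULATION_COUNTS)
rng = random.Random(2024)              # hypothetical seed, for repeatability only
sample = rng.sample(population, 33)    # assumption: sampling without replacement

by_class = frequency_table(sample, list(POPULATION_COUNTS))
# Highest frequency first, as in a Pareto chart.
by_frequency = sorted(by_class, key=lambda row: row[1], reverse=True)

for label, freq, rel in by_frequency:
    print(f"{label:>8}  {freq:>3}  {rel:.3f}")
```

Because the sample is random, a different seed produces different class counts; the counts printed here will not generally match the table above.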

The distribution tables above were used to construct the following bar graph, Pareto chart, and pie chart:

[Bar graph: Frequency of Ages (n = 33); x-axis: Ages (years), y-axis: Frequency]

[Pareto chart: Relative Frequency of Ages (n = 33); x-axis: Ages (years), y-axis: Relative Frequency]

[Pie chart: Relative Frequency of Ages (n = 33)]

Systematic Sample (n = 33)

The systematic sample was obtained by using the raw data age list and sampling every 14th age from the list. This procedure utilized Excel to assign a number in consecutive order to each age in the entire population data set. Utilizing the MOD function in Excel, each consecutive number in the data set was assigned a MOD number (the remainder after dividing the assigned number by 14). The MOD numbers were then sorted in ascending order, and all ages with a MOD number of 0 were used as the sample set of every 14th age. The sample list of ages was then placed into the following frequency distributions, one sorted by class and the other sorted in frequency order from highest to lowest.
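The numbering and MOD steps can be sketched in Python as follows. This is an illustration of the technique rather than the spreadsheet itself; the function name and the short list of ages are hypothetical.

```python
def systematic_sample(values, k):
    """Keep every k-th value, numbering the list from 1 as in the spreadsheet."""
    numbered = enumerate(values, start=1)   # assign 1, 2, 3, ... in list order
    # A remainder (MOD) of 0 marks every k-th position.
    return [value for position, value in numbered if position % k == 0]

# Hypothetical ages, used only to show which positions are selected.
ages = [21, 34, 19, 45, 28, 30, 52, 23, 27, 38, 41, 25, 29, 33, 26, 60,
        22, 31, 24, 36, 47, 20, 35, 29, 44, 23, 39, 28, 32, 26]

print(systematic_sample(ages, 14))  # the 14th and 28th ages: [33, 28]
```

Filtering on the remainder gives the same result as sorting by the MOD number and keeping the rows where it is 0, without reordering the list.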

Frequency of Ages (n = 33), sorted by class

Age (years)   Frequency   Relative Frequency
0 - 9               0     0
10 - 19             2     0.060606061
20 - 29            21     0.636363636
30 - 39             4     0.121212121
40 - 49             5     0.151515152
50 - 59             1     0.03030303
60 - 69             0     0
Total              33     1

Frequency of Ages (n = 33), sorted by frequency

Age (years)   Frequency   Relative Frequency
20 - 29            21     0.636363636
40 - 49             5     0.151515152
30 - 39             4     0.121212121
10 - 19             2     0.060606061
50 - 59             1     0.03030303
0 - 9               0     0
60 - 69             0     0
Total              33     1

The above distribution tables from the sample were used to construct the following bar graph, Pareto chart, and pie chart:

[Bar graph: Frequency of Ages (n = 33); x-axis: Ages (years), y-axis: Frequency]

[Pareto chart: Relative Frequency of Ages (n = 33); x-axis: Ages (years), y-axis: Relative Frequency]

[Pie chart: Relative Frequency of Ages (n = 33)]

Comparison of the Results

Both samples appear close to symmetric, with a bell-shaped appearance resembling a normal distribution. The greatest frequency of ages in both samples falls in the 20 - 29 age class, with a relative frequency of 55% in the random sample and 64% in the systematic sample. This class is the center of both sample data sets. There is minimal variation between the two data sets, and there are no apparent outliers. The sample data sets represent the population data, which has a similar distribution whose largest, central class is also ages 20 - 29, with a relative frequency of 54%.

Part 3

As a group, we selected Weight as our quantitative variable. Using the population data set values, the following population mean and population standard deviation were computed:

Population (Weight)
Mean                  69.15
Standard Deviation    13.35

As a group, we chose to take two samples (size 30 < n < 35) from our quantitative data set: a simple random sample and a systematic sample.

Simple Random Sample (n = 33)

The simple random sample was obtained utilizing the Excel Data Analysis function for Sampling, inputting the entire population of weights listed in the data set, and generating a random sample of 33 weights. The following sample statistics of sample mean and sample standard deviation were computed. Following this information is a frequency distribution constructed from the sample data as well as the five-number summary used to create a frequency histogram and box plot.

Random Sample Weights
Mean                  67.96
Standard Deviation    14.94

Weights (kilograms)   Frequency   Relative Frequency
40 - 49                   2       0.060606061
50 - 59                  10       0.303030303
60 - 69                   9       0.272727273
70 - 79                   4       0.121212121
80 - 89                   4       0.121212121
90 - 99                   3       0.090909091
100 - 109                 1       0.03030303
110 - 119                 0       0
Total                    33       1

Five-Number Summary
Minimum    44.8
Q1         56.8
Median     63.9
Q3         77.3
Maximum    102.3
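The sample statistics and five-number summary reported above can be computed along these lines in Python. This is a minimal sketch: the weights below are hypothetical, not the project's sample, and the "inclusive" quartile method is an assumption (other quartile conventions give slightly different Q1 and Q3).

```python
import statistics

def summarize(values):
    """Return the mean, sample standard deviation, and five-number summary."""
    # Assumption: quartiles use the "inclusive" interpolation method.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),   # sample SD, divides by n - 1
        "five_number": (min(values), q1, median, q3, max(values)),
    }

# Hypothetical weights in kilograms, for illustration only.
weights = [48.0, 55.0, 58.0, 62.0, 66.0, 71.0, 75.0, 83.0, 95.0]
summary = summarize(weights)
print(summary["five_number"])  # (48.0, 58.0, 66.0, 75.0, 95.0)
```

The five values returned are the ones a box plot draws: the whisker ends, the box edges, and the median line.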

[Frequency histogram: Simple Random Sample Weights (kilograms); x-axis: Weights (kilograms), y-axis: Frequency]

[Box plot: Simple Random Sample Weights (kilograms) (n = 33)]

Systematic Sample (n = 33)

The systematic sample was obtained by using the raw data weight list and sampling every 14th weight from the list. This procedure utilized Excel to assign a number in consecutive order to each weight in the entire population data set. Utilizing the MOD function in Excel, each consecutive number in the data set was assigned a MOD number (the remainder after dividing the assigned number by 14). The MOD numbers were then sorted in ascending order, and all weights with a MOD number of 0 were used as the sample set of every 14th weight. The following sample statistics of sample mean and sample standard deviation were computed. Following this information is a frequency distribution constructed from the sample data as well as the five-number summary used to create a frequency histogram and box plot.

Systematic Sample Weights
Mean                  68.39
Standard Deviation    12.52

Weights (kilograms)   Frequency   Relative Frequency
40 - 49                   0       0
50 - 59                  11       0.333333333
60 - 69                   8       0.242424242
70 - 79                   6       0.181818182
80 - 89                   6       0.181818182
90 - 99                   2       0.060606061
100 - 109                 0       0
110 - 119                 0       0
Total                    33       1

Five-Number Summary
Minimum    51.8
Q1         58
Median     65.9
Q3         75.7
Maximum    94.3

[Frequency histogram: Systematic Sample Weights (kilograms); x-axis: Weights (kilograms), y-axis: Frequency]

[Box plot: Systematic Sample Weights (kilograms) (n = 33)]

Comparison of the Results

Both samples appear close to bell shaped and approximately normally distributed, though slightly skewed to the right. The greatest frequency of weights in both samples falls in the 50 - 59 kilogram class. The sample means are very close at 67.96 and 68.39 kilograms, and the sample standard deviations of 14.94 and 12.52 kilograms differ only slightly. The sample means and standard deviations are representative of the population mean of 69.15 kilograms and standard deviation of 13.35 kilograms.

Part 4

Using the sample mean, standard deviation, and number of ages in the categorical samples from the information in Part 2, 95% confidence intervals were constructed:

                       Simple Random Sample   Systematic Sample
Mean                   28.9                   30.1
Standard Deviation     8.52                   8.71
Number                 33                     33
Error                  1.48                   1.52
Confidence Level       95%                    95%
Confidence Interval:
Ages (years)           25.88 < μ < 31.92      27.01 < μ < 33.19

Using the sample mean, standard deviation, and number of weights in the quantitative samples from the information in Part 3, 95% confidence intervals were constructed:

                       Simple Random Sample   Systematic Sample
Mean                   67.96                  68.4
Standard Deviation     14.94                  12.53
Number                 33                     33
Error                  2.6                    2.18
Confidence Level       95%                    95%
Confidence Interval:
Weights (kilograms)    62.66 < μ < 73.26      63.96 < μ < 72.84
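The intervals above are consistent with a t-based interval of the form mean ± t × s/√n, where the "Error" row is the standard error s/√n. The sketch below reproduces them in Python under that assumption; the critical value 2.037 (two-tailed 95%, 32 degrees of freedom) is taken from a standard t table and is not stated in the report, and the function name is illustrative.

```python
import math

# Assumption: two-tailed 95% t critical value for df = n - 1 = 32, from a t table.
T_CRITICAL_DF32 = 2.037

def mean_confidence_interval(mean, stdev, n, t_critical):
    """Return (standard error, margin of error, lower bound, upper bound)."""
    standard_error = stdev / math.sqrt(n)
    margin = t_critical * standard_error
    return standard_error, margin, mean - margin, mean + margin

# (mean, standard deviation, n) as reported in the Part 4 tables.
samples = {
    "ages, simple random": (28.9, 8.52, 33),
    "ages, systematic": (30.1, 8.71, 33),
    "weights, simple random": (67.96, 14.94, 33),
    "weights, systematic": (68.4, 12.53, 33),
}

intervals = {}
for name, (mean, stdev, n) in samples.items():
    se, margin, low, high = mean_confidence_interval(mean, stdev, n, T_CRITICAL_DF32)
    intervals[name] = (se, low, high)
    print(f"{name}: SE = {se:.2f}, {low:.2f} < mu < {high:.2f}")
```

Each printed interval can be compared directly with the corresponding row of the tables above.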

Interpretation of the results:

The sample mean is the best point estimate of the population mean. The purpose of a confidence interval is to construct a range of values that, with a stated level of confidence (in this case 95%), we estimate contains the true value of the population mean.

Categorical: We are 95% confident that the interval from 25.88 years to 31.92 years (simple random sample) and the interval from 27.01 years to 33.19 years (systematic sample) contain the true value of the population mean. Both 95% confidence intervals did capture the population mean of the ages, which was 30.18 years.

Quantitative: We are 95% confident that the interval from 62.66 kilograms to 73.26 kilograms (simple random sample) and the interval from 63.96 kilograms to 72.84 kilograms (systematic sample) contain the true value of the population mean. Both 95% confidence intervals did capture the population mean of the weights, which was 69.15 kilograms.

Part 5

Hypothesis test for the population proportion:

Claim: The population proportion of ages 20-29 years is greater than 50%.
Symbolic form: p > 0.5
H0: p = 0.5 (null)
H1: p > 0.5 (alternative)

With a simple random sample of n = 33, and with np ≥ 5 and nq ≥ 5, the requirements for approximating the binomial distribution with a normal distribution are satisfied. In our simple random sample of 33 people, 18 were aged 20-29 years, a proportion of 54.5%. The test statistic, calculated from the normal sampling distribution test statistic formula, was z = 0.52, with a P-value of 0.3008. A significance level of 0.05 was chosen. Because the P-value of 0.3008 is greater than the significance level of 0.05, we fail to reject the null hypothesis and conclude that there is not sufficient sample evidence to support the claim that the population proportion of ages 20-29 years is greater than 50%. This conclusion is actually a type II error: in the population, the proportion of ages 20-29 is 53.5% (271 of 507), so the null hypothesis is false and we failed to reject it.

Hypothesis test for the population mean:

Claim: The mean body weight of the population is greater than 70 kilograms.
Symbolic form: μ > 70 kilograms
H0: μ = 70 kilograms (null)
H1: μ > 70 kilograms (alternative)

With a simple random sample of n = 33 (n > 30), the requirement for the normal distribution is satisfied. Our simple random sample of 33 people had a mean weight of 67.96 kilograms with a standard deviation of 14.94 kilograms. Treating the population standard deviation as unknown for the purpose of this project, the test statistic was calculated from the t sampling distribution test statistic formula and was found to be t = -0.78, with a P-value of 0.7807. A significance level of 0.05 was chosen. Because the P-value of 0.7807 is greater than the significance level of 0.05, we fail to reject the null hypothesis and conclude that there is not sufficient sample evidence to support the claim that the mean body weight of the population is greater than 70 kilograms.

If a type I error had been encountered in performing these hypothesis tests, we would have made the mistake of rejecting a true null hypothesis. So, if in reality the proportion of ages 20-29 years was 0.5, and the sample evidence led us to conclude it was > 0.5, that would be a type I error. Likewise, if in reality the mean body weight was 70 kilograms, and the sample evidence led us to conclude it was > 70 kilograms, that would be a type I error.

Part 6

The term project required for introduction to statistics involved many different concepts learned throughout the semester.
This included starting by interpreting and organizing data. Once that was completed, we chose a quantitative variable and a categorical variable to organize into different forms. This included putting the data onto different types of graphs based on samples and populations. After all of our information was combined and organized, we created confidence intervals and ran hypothesis tests based on our findings. Applying the knowledge I have learned throughout the semester to a project such as this one allowed me to learn different concepts and to see how statistics can be used in the real world. The first thing that I learned was how to use Excel better. Knowing how Excel works allowed me to organize the data we were given, to create different types of graphs such as bar graphs, pie charts, and box plots, and to see how calculations such as the mean, standard deviation, and confidence intervals can be done. I felt I gained the most knowledge by applying the data and organizing it into different types of graphs. It allowed me to look at the data differently. It also helped me to understand why certain graphs are used and how effective they can be depending on the setting. The math skills that I have applied and learned will help me through many different courses throughout my school career. By knowing how to use Excel, create graphs, and interpret graphs, I will be able to present information in a more professional manner. Lastly, confidence intervals will be helpful in the future when trying to determine how confident I can be in conclusions drawn from information I've gathered. This project helped change my perspective on using statistics in the real world, as it opened my eyes to multiple situations where it would help to know how to organize data, interpret graphs, create confidence intervals, and do hypothesis testing.
This particular project focused on ages and weights. Other data that could have been selected came from other tests run in the health fields that could be important to analyze more in the future. I work at a job where gathering data, organizing it, and interpreting it would really help in terms of looking for current trends and helping to improve sales. Because I've been able to take what I've learned and apply it to my current job, I have become a better and more efficient employee. Knowing how to do each of the parts we did throughout the project will change how I view and interpret statistics in the future.
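The two hypothesis tests in Part 5 can be checked with a short Python sketch. It assumes right-tailed tests as stated in H1. The function names are illustrative, and the t-distribution tail area is obtained here by numerically integrating the density with Simpson's rule, which is a substitute for the table or software lookup the group would have used.

```python
import math
from statistics import NormalDist

def proportion_z_test(successes, n, p0):
    """Right-tailed one-proportion z test; returns (z, p_value)."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return z, 1 - NormalDist().cdf(z)

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_right_tail(t, df, steps=2000):
    """P(T > t), integrating the density from 0 to |t| with Simpson's rule."""
    h = abs(t) / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                    # area between 0 and |t|
    return 0.5 - area if t >= 0 else 0.5 + area

def mean_t_test(sample_mean, stdev, n, mu0):
    """Right-tailed one-sample t test; returns (t, p_value)."""
    t = (sample_mean - mu0) / (stdev / math.sqrt(n))
    return t, t_right_tail(t, n - 1)

# Values reported in Part 5.
z, p_z = proportion_z_test(18, 33, 0.5)
t, p_t = mean_t_test(67.96, 14.94, 33, 70)
print(f"proportion test: z = {z:.2f}, P-value = {p_z:.4f}")
print(f"mean test:       t = {t:.2f}, P-value = {p_t:.4f}")
```

Both P-values exceed the 0.05 significance level, which matches the decision in Part 5 to fail to reject the null hypothesis in each test.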
