Numerical Measures HANDOUT With Answers

Statistics Sector 1: Numerical Measures

Aims
• To be able to calculate measures of average and measures of spread.
• To understand the difference between measures of average and measures of spread.
• To be able to calculate standard deviation and variance.
• To understand when different numerical measures are appropriate.
• To calculate estimates for the mean and standard deviation of grouped data.

Introduction
Population: a collection of all possible items (people, objects) about which we wish to gather information.
Sample: a portion, or part, of the population of interest.
One of the major purposes of statistics is to examine samples of data and make inferences about the population from which the samples are drawn.
A variable is a characteristic being considered.
A parameter is a numerical property of a population and a statistic is a property of a sample.

Types of Data
Qualitative: when the variable is non-numeric, e.g. hair colour, gender, ethnic group.
Quantitative: when the variable can be reported numerically.
• Discrete: can only assume certain values, usually arising from counting something, e.g. shoe size, number of cars passing a point.
• Continuous: can assume any value within a specific range, e.g. height, weight, temperature.
Continuous or discrete? e.g. age [rest of handwritten note illegible]

Example 1
Packets of a particular type of sweet are known to have a mean of 100 grams. The number of sweets in a packet is approximately 30 and the sweets come in any one of five flavours. The weights of 50 packets are taken and the mean is found to be 98.3 grams.
From the above passage, identify:
a population: all packets of this particular type of sweet
a parameter: population mean of 100 grams
a sample: 50 packets
a qualitative variable: flavour
a continuous variable: weight
a discrete variable: number of sweets

Measures of Average
• The mode or modal value is the value that occurs most frequently. There can be more than one mode or it may not exist.
• The median is the middle value of ordered data; it is the $\frac{n+1}{2}$th term.
• The (arithmetic) mean or average is the sum of the values divided by the number of values ($\bar{x} = \frac{\sum x}{n}$). Remember the mean may be a value that cannot occur, such as $\bar{x}$ = 2.43 children. The sample mean is denoted $\bar{x}$ and the population mean is denoted by $\mu$.

Measures of Spread
• The range is the difference between the highest and the lowest values.
• The interquartile range (IQR) is the difference between the upper and lower quartiles, $Q_3 - Q_1$.
  o The lower quartile, $Q_1$, is the median of the ordered values to the left of the median, or the $\frac{1}{4}(n+1)$th term.
  o The upper quartile, $Q_3$, is the median of the ordered values to the right of the median, or the $\frac{3}{4}(n+1)$th term.
  o The median is also the second quartile, $Q_2$.
• The standard deviation (where the sample standard deviation is denoted $s$ and the population standard deviation is denoted by $\sigma$) and the variance (where the sample variance is denoted $s^2$ and the population variance is denoted by $\sigma^2$) are measures of the average deviation of the values from their mean.

Example 2
At a doctor's surgery they record the number of patients who are late for appointments each week. The records for the first 12 weeks are recorded below.
14  23  18  37  a  21  16  b  32  28  19  26
Unfortunately on two of the weeks the number of lates was incorrectly recorded; however, they do know that a < 12 and b > 40.
Calculate the median and IQR of the 12 values.
Answer:
Ordered values: a  14  16 | 18  19  21 | 23  26  28 | 32  37  b
Median = (21 + 23)/2 = 22
$Q_1$ = (16 + 18)/2 = 17
$Q_3$ = (28 + 32)/2 = 30
IQR = 30 − 17 = 13
To do this on a graphical calculator you must enter any number less than 12 for a and any number bigger than 40 for b.

Example 3
The following table gives the number of complaints recorded by a telecoms company on each day for a period of 50 days.
Number of complaints     0   1   2   3   4   5   6   7
Number of days           1   5   7  14   9  10   ?   ?
Cumulative frequency     1   6  13  27  36  46   ?  50
(The entries for 6 and 7 complaints are illegible in this copy; together they account for the remaining 4 of the 50 days.)
a) Calculate the range, IQR and mode of the data.
b) Calculate the mean and standard deviation of the data.
Answer:
a) Range = 7 − 0 = 7 complaints
   Mode = 3 complaints
   $Q_1$ = $\frac{1}{4}(n+1)$ = $\frac{1}{4}(51)$ = 12.75th value = 2
   $Q_3$ = $\frac{3}{4}(n+1)$ = $\frac{3}{4}(51)$ = 38.25th value = 5
   IQR = 5 − 2 = 3
b) Mean = [illegible]
   s = 1.85

Standard Deviation and Variance
The standard deviation is a measure of the average deviation of the values from their mean. The formula depends on whether the values represent a population or a sample. A sample is a set of values selected from the population. Standard deviation is based on the sum of squares, so it should never be negative.

Standard Deviation and Variance of a Sample
For a random sample of n values, $x_1, x_2, \ldots, x_n$, the standard deviation, denoted by $s$, is given as
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
The sample variance, $s^2$, is the square of the sample standard deviation. $s^2$ is an unbiased estimator of $\sigma^2$.
This is given in the formulae booklet as:
For a random sample $X_1, X_2, \ldots, X_n$ of n independent observations from a distribution having mean $\mu$ and variance $\sigma^2$, $S^2$ is an unbiased estimator of $\sigma^2$, where $S^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$.
Remember: standard deviation = $\sqrt{\text{variance}}$, or variance = (standard deviation)$^2$.

Example 4
Find the variance and standard deviation of the number of children per family from the data below:
Number of children    0   1   2   3   4   5   6
Number of families    9   4   6   5   2   0   1
Answer:
Standard deviation = s = 1.59 (3 sf)
Variance = $s^2$ = (1.59326)$^2$ = 2.54 (3 sf)

Example 5
The response time, X minutes, is the time it takes between an emergency call being answered and the ambulance arriving at the scene. The value of X was recorded on a random sample of 50 occasions. The results are summarised below, where $\bar{x}$ denotes the sample mean.
$\sum x$ = [illegible]    $\sum (x - \bar{x})^2$ = 42.62
Find the values for the mean and standard deviation of this sample of 50 response times.
Answer:
$\bar{x} = \frac{\sum x}{n}$ = [illegible]
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{42.62}{49}}$ = 0.933 (3 sf)

Example 6
The mean blood cholesterol level of the adult residents of a particular country has been found to be 5.8 millimoles per litre.
Monica is a researcher who believes that daily consumption of yogurt can reduce blood cholesterol level. She selected a sample of 80 residents who consumed yogurt daily and measured the blood cholesterol level, X, of each resident, obtaining the following results:
$\sum x = 452.8$ and $\sum x^2 = 2596.4$
Find the values for the mean and standard deviation of this sample of 80 residents. Comment on the values obtained.
Answer:
$\bar{x} = \frac{\sum x}{n} = \frac{452.8}{80}$ = 5.66
$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} = \sqrt{\frac{2596.4 - \frac{452.8^2}{80}}{79}}$ = 0.652 (3 sf)
[The handwritten comment is illegible in this copy.]

Estimating the Mean and Standard Deviation
A grouped frequency distribution is a list of continuous classes together with their corresponding frequencies. As we do not know the exact values of all the data, we can only estimate the mean and standard deviation using the midpoints of each group.

Example 7
A class of 35 pupils all complete a small puzzle as an aptitude test. Each pupil was timed to the nearest second and the times are recorded below.
[Table: time to complete puzzle (x seconds) against frequency, with a handwritten midpoint column. The first row reads 20 ≤ x < 40, frequency 6, midpoint 30; the remaining rows are illegible in this copy.]
Remember you need to enter the midpoints as the x values.
Answer (partly legible): standard deviation = 23.4; the mean is illegible in this copy.

Choice of Numerical Measures

Mode
Advantages
• Easy to find.
• Can be used with numerical or non-numerical data.
Disadvantages
• May not be unique or may not exist. For example, if there are two modes such as 2 and 17 it would be inappropriate to use this as a measure of average.
• Difficult to estimate for grouped data.
• The value may be unrepresentative, especially if it is zero or the data has a wide range.
Hint: You are often asked to explain why the mode is NOT appropriate in exam questions.

Median
Advantages
• Can sometimes be found when some of the data is missing.
• Useful when there are outliers (unusually small/large values).
Disadvantages
• Difficult to estimate for grouped data.

Mean
Advantages
• Takes into account all the values.
• Provides a basis for further analysis.
Disadvantages
• If the sample is small it may be unduly affected by outliers or incorrect values.
• Can only be calculated if you know all the data.
Hint: As a general rule statisticians favour the use of the mean as a measure of average, but if it is not appropriate they will use the median. Very rarely will they use the mode; this is nearly always the least appropriate measure of average in exam questions.

Range
Advantages
• Easy to calculate.
Disadvantages
• Depends only on the extreme values.
• Not appropriate for large data sets.
• Cannot be calculated if either of the extreme values is unknown.

Interquartile Range
Advantages
• Can sometimes be found when some of the data is missing.
• Useful when there are outliers (unusually small/large values).
Disadvantages
• Difficult to estimate from grouped data.

Standard Deviation
Advantages
• Takes into account all the values.
• Provides a basis for further analysis.
Disadvantages
• If the sample is small it may be unduly affected by outliers or incorrect values.
• Can only be calculated if you know all the data.
• Difficult to calculate from large data sets.
Hint: As a general rule statisticians favour the use of the standard deviation as a measure of spread, but if it is not appropriate they use the IQR or, for small sets (n < 10) with no outliers, the range.

Example 10
The length of time that customers are put on hold, in minutes, is recorded by a call centre for one hour. The results were as follows (the spacing between values has been lost in this copy):
19903553020a10128
The value of a is unknown.
Give a reason why, for these values:
a) The mode is not an appropriate measure of average.
Answer: The mode is zero, so it is unrepresentative of the data.
b) The standard deviation is not an appropriate measure of spread.
Answer: There is an unknown value, so the standard deviation cannot be calculated.

Example 11
The table below shows an extract from the Purchased quantities of household food & drink survey by Government Office Region and Country published by DEFRA in 2015.

Purchased quantities of household food & drink by Government Office Region and Country
Averages per person per week
Region          Description  Units  2006  2007  2008  2009  2010  2011  2012  2013  2014
South East      Bread        g       667   664   632   636   584   569   575   565   529
West Midlands   Bread        g       796   738   705   713   684   674   688   648   610

Comment on the average consumption of bread over time and by region.
Answer: The average consumption of bread per person per week has decreased over time for both the South East and the West Midlands.
Other areas in the survey include London, East Midlands, North East, North West, Yorkshire and Humber, East, South West. What results would you expect from these areas? Have a look at the data set in Excel and see if your predictions are correct.

Linear Scaling
Linear scaling is most often used when the units change or if a question gives you information involving differences.
mean(aX + b) = a mean(X) + b
SD(aX + b) = |a| SD(X)
Var(aX + b) = a² Var(X)
If a number, b, is added to each piece of data then the mean increases by b but the standard deviation and variance remain unchanged. If each piece of data is multiplied by a positive number a, the mean and standard deviation are both multiplied by a (the variance is multiplied by a²).

Example 12
The time, x seconds, in excess of 30 seconds, which a sample of buses waits at the bus stop, has a mean of 18 and a standard deviation of 15.
Find the mean and standard deviation of the times that this sample of buses actually waits at the stop.
Answer:
Actual mean = 18 + 30 = 48
Standard deviation = 15 (unchanged)

Past Exam Question
2 Before leaving for a tour of the UK during the summer of 2008, Eduardo was told that the UK price of a 1.5-litre bottle of spring water was about 50p.
Whilst on his tour, Eduardo noted the prices, x pence, which he paid for 1.5-litre bottles of spring water from 12 retail outlets. He then subtracted 50p from each price and his resulting differences, in pence, were
−18  −11  1  15  7  −1  17  −16  18  −3  0  9
(a) (i) Calculate the mean and the standard deviation of these differences. (2 marks)
Answer: mean = 1.5, s = 12.26
(ii) Hence calculate the mean and the standard deviation of the prices, x pence, paid by Eduardo. (2 marks)
Answer: $\bar{x}$ = 1.5 + 50 = 51.5p, s = 12.26p
(b) Based on an exchange rate of €1.22 to £1, calculate, in euros, the mean and the standard deviation of the prices paid by Eduardo. (3 marks)
Answer: £1 = €1.22
$\bar{x}$ = £0.515 × 1.22 = €0.63
s = £0.1226 × 1.22 = €0.15
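
The linear scaling rules can be checked numerically against the past exam question above. The following is a minimal sketch in Python using the standard library `statistics` module; the helper name `linear_scale` is hypothetical, and the first difference is read as −18, an assumption based on the worked answer of 51.5p for the mean price.

```python
from statistics import mean, stdev

# Eduardo's differences in pence (price minus 50p).
# Assumption: the first value is -18 (the scan is unclear at this point).
diffs = [-18, -11, 1, 15, 7, -1, 17, -16, 18, -3, 0, 9]


def linear_scale(m, s, a, b):
    """Mean and standard deviation of aX + b, given those of X."""
    return a * m + b, abs(a) * s


# (a)(i) mean and sample standard deviation (divisor n - 1) of the differences
diff_mean = mean(diffs)
diff_sd = stdev(diffs)

# (a)(ii) prices in pence: add 50 to every value (a = 1, b = 50)
price_mean, price_sd = linear_scale(diff_mean, diff_sd, 1, 50)

# (b) convert pence to pounds, then scale by 1.22 to get euros (b = 0)
euro_mean, euro_sd = linear_scale(price_mean / 100, price_sd / 100, 1.22, 0)

# Cross-check: scaling the summary statistics agrees with recomputing
# the standard deviation from the actual prices.
prices = [d + 50 for d in diffs]
direct_price_sd = stdev(prices)

print(round(diff_mean, 2), round(diff_sd, 2))
print(round(price_mean, 2), round(price_sd, 2))
print(round(euro_mean, 2), round(euro_sd, 2))
```

Adding 50 shifts the mean only, while multiplying by 1.22 scales both the mean and the standard deviation, which is why part (b) needs no return to the raw data.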

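Returning to Example 4, the sample variance and standard deviation of a frequency table can be computed directly from the sums $\sum fx$ and $\sum fx^2$. The sketch below assumes the frequencies 9, 4, 6, 5, 2, 0, 1 for 0 to 6 children as read from the table; the function name `freq_table_stats` is hypothetical and is not part of the handout.

```python
from math import sqrt


def freq_table_stats(values, freqs):
    """Return n, mean, sample variance and sample standard deviation
    for data given as a frequency table (divisor n - 1)."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    mean = sum_fx / n
    variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
    return n, mean, variance, sqrt(variance)


children = [0, 1, 2, 3, 4, 5, 6]
families = [9, 4, 6, 5, 2, 0, 1]

n, m, var, sd = freq_table_stats(children, families)
print(n, round(m, 2), round(var, 2), round(sd, 2))
```

The same function can be used for grouped data by passing the class midpoints as `values`, in which case the results are estimates rather than exact values.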