Lecture Week 2 Statistics

Uploaded by

smkamran.mba
Keele University

Keele University International College


Welcome!
Lecture Week 2: Statistics
What is Statistics?
 Statistics is concerned with scientific methods for collecting, summarising, presenting and analysing data, as well as drawing valid conclusions and making reasonable decisions on the basis of such analyses (Spiegel 1992)

3
What is Statistics?
 Examples in business where Statistics would be used
 Launch of a new product or service

 Monitoring sales

 Measuring customer satisfaction

 Trying to solve a quality problem

4
Learning Outcomes
In this lecture we will cover

 Summarising raw data using Excel

 Data 1: qualitative data (t-shirt sizes)

 Data 2: quantitative data (waiting times for payment)

 Measures of location

 Mean

 Median

 Mode.

 Measures of spread

 Range

 Standard deviation

 Inter quartile range

Key idea: we will compute these statistics and then focus on interpretation.
5
Data Types - Level of Measurement
There are four levels of measurement

Levels of Measurement
 Categorical / Qualitative: Nominal, Ordinal
 Quantitative (Interval vs. Ratio): Discrete, Continuous

6
Categorical Data
 Categorical/Qualitative data (i.e. categories or labels)

 The data are not numbers (If they are numbers, then these are just labels)
➢ Nominal Data (Unordered Categorical)

 Hair colour

 Daily newspaper

➢ Ordinal Data (Ordered Categorical)

 none, mild, moderate, severe (pain)

 Degree class: First, 2:1, 2:2, third etc.

7
Quantitative Data
 Quantitative data (i.e. Measurements or Counts)

 The data are numbers, i.e. the numbers correspond to amounts (quantities)

 Interval scale: numerical data with no absolute zero

 Negative values are meaningful

 For example, temperature in °C is measured on an interval scale, since -5°C is a meaningful value

 Ratio scale: data that have all the characteristics of interval data, but with an absolute zero

 Negative values are impossible

 For example, weight is measured on a ratio scale, since you cannot have a negative weight

8
Quantitative Data
 Continuous data
 Can take any value in a given range

 Example: Height, weight, age

 Discrete data
 Generally, only whole numbers, Counts

 Example: shoe size, price

9
Measures of Location
 3 Measures of Location: Mean, Median, Mode

10
Measures of Location
Mean:

 The mean, also called the ‘average’, is found by adding the values and dividing the total by the number of values

 Suitable for quantitative data only

11
Measures of Location
Median:

 The median is
 the middle value of an ordered list if the number of values is odd

 or the mean of the two middle values if the number of values is even

 Suitable for ordinal and quantitative data

12
Measures of Location
Mode:

 This is the most frequently occurring value

 It can be found by inspection; sometimes the ‘modal class’ or class with the
highest frequency is quoted for grouped data

 Suitable for categorical and quantitative data

13
Data 1: Raw Data Showing Sizes of t-shirts Sold
 Size

14
Data 1: Raw Data Showing Sizes of t-shirts Sold
 We can summarise the data in the form of a frequency table, which shows the count of each size of t-shirt sold

15
Recap: Small 37, Medium 39, Large 26, X Large 18, Total 120

Exercise 1

Work out the percentage (to 1dp) for each size.

Small

Medium

Large

X Large

CHECK

16
(Enhanced) Frequency table shows the sizes of t-shirt sold

Size Frequency %
Small 37 31
Medium 39 33
Large 26 22
XLarge 18 15
Grand Total 120 100

The modal class is Medium
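
As a rough check on Exercise 1, the percentages can be recomputed from the recap counts. This is a minimal Python sketch; the variable names are illustrative only. Working to 1 dp gives slightly different figures from the whole-number percentages in the table, which is also why the rounded column adds up to 101 rather than 100.

```python
# Frequencies from the recap slide
counts = {"Small": 37, "Medium": 39, "Large": 26, "XLarge": 18}
total = sum(counts.values())  # 120 shirts in total

# Percentage of total sales for each size, rounded to 1 decimal place
percentages = {size: round(100 * n / total, 1) for size, n in counts.items()}
print(percentages)

# The modal class is the size with the highest frequency
modal_class = max(counts, key=counts.get)
print(modal_class)
```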

17
Pie Chart showing the sizes of t-shirt sold

[Pie chart with one slice for each size: Small, Medium, Large, XLarge]

18
Bar Chart Showing the Sizes of t-shirts Sold
[Bar chart: Number of Shirts Sold (vertical axis) against Size of Shirts (horizontal axis: Small, Medium, Large, XLarge)]

19
How do we Present data?
 For the t-shirts data we have
 A frequency table (counts only)

 A frequency table (counts and percentages)

 A pie chart

 A bar chart

 We must decide which one best meets our purposes.

 N.B. In some cases a simple sentence might be best.

20
Data 2: Waiting times for payment of invoices
 Data

21
First Idea: Frequency table showing waiting time for payment
 Data

 Exercise 2
 We need to simplify this. So, group the data into the categories
 5 – 9, 10 – 14, 15 – 19, 20 – 24, 25 – 29, 30 – 34, 35 - 39

22
A grouped frequency table showing waiting times for payment

 We can see that the modal class is 15 – 19 days as it is the class with the
highest frequency

23
An Excel Chart showing waiting times for payment
[Bar chart: Frequency (vertical axis) against Number of Days (horizontal axis: 5 – 9, 10 – 14, 15 – 19, 20 – 24, 25 – 29, 30 – 34, 35 – 39)]

24
Measures of Location

25
Measures of Location/ Central Tendency

Measures of Location / Central Tendency: Mean, Median, Mode

26
Measures of Location
Mean:
 Mean (‘Average’) = $\frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$
 In Excel: =AVERAGE(cell range)

Median:
 Arrange the data in order of size and take the middle one (if n is odd) or the average of the middle two (if n is even)
 Position of the median = $\frac{n+1}{2}$
 In Excel: =MEDIAN(cell range)

Mode:
 The value which occurs most often or, in frequency data, the one with the highest frequency

27
Measures of Location
 Exercise 3

 In a small business the salaries (in £000s) are


15, 15, 17, 19, 44
Note: the data are already arranged in order

 Find the mean, median and mode

28
Measures of Location
 Exercise 3

 In a small business the salaries (in £000s) are


15, 15, 17, 19, 44
Note: the data are already arranged in order

 Find the mean, median and mode

 Answer mean = 22, median = 17, mode = 15
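
The Exercise 3 answers can be checked with Python's standard `statistics` module. This is only a sketch alongside the Excel and calculator methods used in the module; the variable names are illustrative.

```python
from statistics import mean, median, mode

salaries = [15, 15, 17, 19, 44]  # salaries in £000s, from Exercise 3

salary_mean = mean(salaries)      # (15 + 15 + 17 + 19 + 44) / 5
salary_median = median(salaries)  # middle value of the ordered list
salary_mode = mode(salaries)      # most frequently occurring value

print(salary_mean, salary_median, salary_mode)
```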

29
What is the effect of an extreme value (Outlier)?
 Recap

 15, 15, 17, 19, 44 mean = 22 median = 17

 15, 15, 17, 19, 54 mean = 24 median = 17

 15, 15, 17, 19, 64 mean = 26 median = 17

 Conclusion

 The effect of the extreme value is to distort the mean

 The median is unchanged
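
The recap above can be reproduced with a short loop. This is a minimal sketch using Python's `statistics` module; the names `base` and `results` are illustrative. It replaces the largest salary with 44, 54 and 64 in turn and recomputes both measures.

```python
from statistics import mean, median

base = [15, 15, 17, 19]  # the four salaries that stay fixed (£000s)

results = []
for top_salary in (44, 54, 64):
    data = base + [top_salary]
    # Store (extreme value, mean, median) for each version of the data
    results.append((top_salary, mean(data), median(data)))

for row in results:
    print(row)
```

The mean moves with the extreme value while the median stays at 17.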

30
Mean for grouped frequency data

 Our data is 7 7 8 8 8 9

Mean = $\frac{7 + 7 + 8 + 8 + 8 + 9}{6} = 7.83$ (to 2 dp)

 The mean could be written

Mean = $\frac{(2 \times 7) + (3 \times 8) + (1 \times 9)}{6}$

where 2, 3 and 1 are the frequencies and 7, 8 and 9 are the x-values

 Leading to the formula

Mean = $\frac{\sum fx}{\sum f}$, where Σf is the total frequency
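
The frequency form of the mean can be checked numerically. This sketch assumes the x-values 7, 8, 9 and frequencies 2, 3, 1 from the example; the variable names are illustrative.

```python
x_values = [7, 8, 9]
frequencies = [2, 3, 1]

# Sum of f * x over all distinct values, and the total frequency
sum_fx = sum(f * x for f, x in zip(frequencies, x_values))
sum_f = sum(frequencies)

grouped_mean = sum_fx / sum_f
print(sum_fx, sum_f, round(grouped_mean, 2))
```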

31
Exercise 4
 100 hotel guests fill in a questionnaire. The ratings for food are
[Table of ratings (x values) and their frequencies (f values) not reproduced]
(5 = very good, 1 = very poor)

 Calculate the mean rating: Mean = $\frac{\sum fx}{\sum f}$

 CHECK: Does the answer look sensible?

32
Exercise 4
 100 hotel guests fill in a questionnaire. The ratings for food are
[Table of ratings (x values) and their frequencies (f values) not reproduced]
(5 = very good, 1 = very poor)

 Calculate the mean rating: Mean = $\frac{\sum fx}{\sum f}$

 Answer 3.66

 CHECK: Does the answer look sensible?

33
What happens when x is not a single value?
 Example:

Salary (K)   10-20   20-30   30-40   40-50   (x values)
Frequency    30      34      12      4       (f values)

 This is a frequency table showing salaries

 Calculate an estimate of the mean.

 Solution

 Use the midpoints as the x values, i.e. 15, 25, 35, 45

 Mean = $\frac{\sum fx}{\sum f} = \frac{(30 \times 15) + (34 \times 25) + (12 \times 35) + (4 \times 45)}{80} = 23.75$

 CHECK: Does the answer look sensible?
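
The midpoint method can be sketched in Python as follows. It uses the salary classes 10-20, 20-30, 30-40 and 40-50 (in £000s) with frequencies 30, 34, 12 and 4, as in the calculation above; the variable names are illustrative. The result is only an estimate, because each class is represented by its midpoint.

```python
# Class boundaries (salaries in £000s) and their frequencies
classes = [(10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [30, 34, 12, 4]

# Use each class midpoint as its representative x value
midpoints = [(low + high) / 2 for low, high in classes]

sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
sum_f = sum(frequencies)
estimated_mean = sum_fx / sum_f
print(midpoints, estimated_mean)
```

As a sense check, the estimate of 23.75 falls in the 20-30 class, which holds the most salaries.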

34
Mean: Pros & Cons
 The mean uses all the data

BUT

 The mean is affected by extreme values

 The mean can only be calculated for quantitative data

 The mean is not usually one of the data values

35
Median: Pros & Cons
 The median is not affected by outliers

 The median can be calculated for quantitative and ordinal data

 The median is usually one of the data values

BUT

 The median does not use all the data

36
Mode: Pros & Cons
 The mode is not affected by outliers

 The mode can be calculated for quantitative and categorical data

 The mode is always one of the data values.

BUT

 The mode is hardly ever used due to its instability and is not as
representative of the data as the Mean or Median.

37
Measures of Dispersion

38
Measures of Dispersion/Spread

Measures of Dispersion / Spread: Range, Quartiles, Variance, Standard Deviation

39
Measures of Dispersion/Spread: Definitions
Range
 The range is the difference between the largest and smallest values

 Suitable for ordinal and quantitative data

Range = Highest Value − Lowest Value

 Example

 Set 1: 3, 4, 6, 7 Range = 7 − 3 = 4

 Set 2: 1, 4, 6, 9 Range = 9 − 1 = 8

 N.B. The range depends only on the two extreme values, so can easily be distorted

 e.g. 12, 53, 54, 55, 57, 58, 58, 59, 98
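
A short Python sketch of the range calculation, including the distorted example. Set 1 is taken to be 3, 4, 6, 7, which matches the stated range 7 − 3 and the later comparison slide. The helper name `value_range` is illustrative.

```python
def value_range(data):
    """Range = highest value minus lowest value."""
    return max(data) - min(data)

set_1 = [3, 4, 6, 7]
set_2 = [1, 4, 6, 9]
distorted = [12, 53, 54, 55, 57, 58, 58, 59, 98]

print(value_range(set_1), value_range(set_2))

# With the two extreme values included the range is large...
full_range = value_range(distorted)
# ...but the remaining seven values are tightly clustered
inner_range = value_range(sorted(distorted)[1:-1])
print(full_range, inner_range)
```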

40
Measures of Dispersion/Spread: Definitions
Quartiles

 The quartiles, together with the median, divide the data set into quarters

 Suitable for ordinal and quantitative data

Variance

 The variance is an average of the squared differences about the mean

 Suitable for quantitative data only

Standard Deviation

 The standard deviation is simply the square root of the variance

41
Standard deviation (sd)

[Diagram: values x1, x2, x3, x4 on a number line, with their distances from the mean]

We base our measure on the squares of the distances from the mean.

Population standard deviation = $\sqrt{\frac{\sum (x - \text{mean})^2}{n}}$

42
The population is ALL the items under consideration.
e.g. Annual sales figure for each branch of a business in the UK.

Population standard deviation = $\sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
In Excel: =STDEVP(cell range)

The sample is just SOME items under consideration.


e.g. Annual sales figure for branches in towns beginning with C.
(Cambridge, Cardiff, Chelmsford, Chester, etc)

Sample standard deviation = $\sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
In Excel: =STDEV(cell range)
43
Standard Deviation: Pros & Cons
 The standard deviation uses all the data

 Approx. 95% of all values lie within ±2 standard deviations of the mean (for roughly bell-shaped data)

BUT

 The standard deviation can only be calculated for quantitative data

 The standard deviation can be affected by outliers

44
Example

Calculate the sample standard deviation of 3, 4, 6, 7. (Key word: sample)

Solution

The mean is 5. Use this in the following table.

x    x − mean      (x − mean)²
3    3 − 5 = −2    (−2)² = 4
4    4 − 5 = −1    (−1)² = 1
6    6 − 5 = 1     1² = 1
7    7 − 5 = 2     2² = 4
                   Sum = 10

Sample sd = $\sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{10}{4 - 1}} = 1.83$ (2dp)
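
The same working can be followed step by step in Python. This is a sketch only; the variable names are illustrative, and `statistics.stdev` (divisor n − 1) is used purely as a cross-check on the hand calculation.

```python
from math import sqrt
from statistics import stdev

data = [3, 4, 6, 7]
n = len(data)
x_bar = sum(data) / n  # mean = 5

# Squared distance of each value from the mean: 4, 1, 1, 4
squared_deviations = [(x - x_bar) ** 2 for x in data]
sum_of_squares = sum(squared_deviations)  # 10

# Sample sd divides by n - 1; population sd would divide by n
sample_sd = sqrt(sum_of_squares / (n - 1))
population_sd = sqrt(sum_of_squares / n)

print(round(sample_sd, 2), round(population_sd, 2))
```

The sample figure is always the larger of the two, because it divides by n − 1 rather than n.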
45
Comparison

Set 1 3, 4, 6, 7 has mean = 5 and sample sd =1.83

Set 2 1, 4, 6, 9 has mean = 5 and sample sd = 3.37

Each sample has a mean of 5


Set 1 has less variability than Set 2. i.e. It is more consistent

Key concept: The greater the spread, the higher the standard deviation
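
The comparison can be reproduced with `statistics.stdev`, which uses the sample (n − 1) formula. A minimal sketch, with illustrative variable names:

```python
from statistics import mean, stdev

set_1 = [3, 4, 6, 7]
set_2 = [1, 4, 6, 9]

# Both sets share the same mean, so only the spread differs
mean_1, mean_2 = mean(set_1), mean(set_2)
sd_1, sd_2 = stdev(set_1), stdev(set_2)

print(mean_1, round(sd_1, 2))
print(mean_2, round(sd_2, 2))
```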

46
Exercise 5

A sample of 9 interviewees complete a task in the following times (in minutes).

7, 8, 8, 9, 10, 12, 14, 17, 23.

Calculate the mean and standard deviation. USE YOUR CALCULATOR.

You should use your calculator in statistical mode. (It is far quicker than using the formulae!)

47
Exercise 5

A sample of 9 interviewees complete a task in the following times (in minutes).

7, 8, 8, 9, 10, 12, 14, 17, 23.

Calculate the mean and standard deviation. USE YOUR CALCULATOR.

You should use your calculator in statistical mode. (It is far quicker than using the formulae!)

Answer: Mean = 12, Sample sd (using σn−1) = 5.24 (2dp)
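
If a calculator is not to hand, the Exercise 5 answer can be checked in Python. This sketch uses `statistics.stdev`, which applies the sample (n − 1) formula, corresponding to the σn−1 key; the variable names are illustrative.

```python
from statistics import mean, stdev

times = [7, 8, 8, 9, 10, 12, 14, 17, 23]  # task times in minutes

mean_time = mean(times)
sample_sd = stdev(times)  # sample standard deviation, divisor n - 1

print(mean_time, round(sample_sd, 2))
```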

48
Your calculator and statistical mode

IMPORTANT REMINDER

For a small sample of data you should be able to compute the mean and standard deviation by using the statistical mode on your calculator.

A set of instructions for some of the more commonly used University-approved Casio calculators can be found online

49
Inter quartile range (IQR)

[Diagram: the ordered data divided into four 25% sections by the lower quartile Q1, the median and the upper quartile Q3]

50
Example
Find the inter quartile range of
6, 15, 16, 18, 19, 20, 33
Note: the data are already arranged in order

Solution

Q1 is the $\frac{n+1}{4}$th value.
Since n = 7, Q1 is the 2nd value, which is 15.

Q3 is the $\frac{3(n+1)}{4}$th value.
Since n = 7, Q3 is the 6th value, which is 20.

IQR = Q3 − Q1 = 20 − 15 = 5

Note: definitions of quartiles do vary slightly (e.g. in Excel), but you should use the definitions given here in this module
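
The (n + 1)/4 and 3(n + 1)/4 position rules can be sketched in Python as below. The helper names are hypothetical. The lecture example only involves whole-number positions; interpolating linearly when a position is fractional is an assumption made here, not something the slides specify. Python's `statistics.quantiles` with its default "exclusive" method follows the same (n + 1) convention and is used as a cross-check.

```python
from statistics import quantiles

def value_at_position(ordered, position):
    """Value at a 1-based position in ordered data.

    Fractional positions are linearly interpolated (an assumption;
    the lecture example only uses whole-number positions).
    """
    whole = int(position)
    fraction = position - whole
    if fraction == 0:
        return ordered[whole - 1]
    return ordered[whole - 1] + fraction * (ordered[whole] - ordered[whole - 1])

def interquartile_range(data):
    ordered = sorted(data)
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least 3 values for these position rules")
    q1 = value_at_position(ordered, (n + 1) / 4)      # lower quartile
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)  # upper quartile
    return q1, q3, q3 - q1

data = [6, 15, 16, 18, 19, 20, 33]
q1, q3, iqr = interquartile_range(data)
print(q1, q3, iqr)

# Cross-check against the standard library's (n + 1)-based quartiles
library_q1, library_median, library_q3 = quantiles(data, n=4)
print(library_q1, library_median, library_q3)
```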
51
Mean and standard deviation from a frequency table

Here are the formulae.

Mean = $\frac{\sum fx}{\sum f}$

Standard deviation = $\sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}}$

or $\sqrt{\frac{\sum fx^2}{\sum f} - (\text{mean})^2}$

52
Example (from slide 36)

Salary (K)   10-20   20-30   30-40   40-50   (x values)
Frequency    30      34      12      4       (f values)

Calculate an estimate of the mean and standard deviation.

Solution (for the x values use the midpoints)

x      f     fx     fx²
15     30    450    6750
25     34    850    21250
35     12    420    14700
45     4     180    8100
Total  80    1900   50800

VERY IMPORTANT: fx² = f.x.x

continued …

53
Recap: ∑f = 80, ∑fx = 1900, ∑fx² = 50800

Mean = $\frac{\sum fx}{\sum f} = \frac{1900}{80} = 23.75$

Standard deviation = $\sqrt{\frac{\sum fx^2}{\sum f} - (\text{mean})^2} = \sqrt{\frac{50800}{80} - (23.75)^2} = 8.42$ (to 2dp)
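
The whole frequency-table calculation can be checked in Python. This sketch uses the midpoints 15, 25, 35, 45 and frequencies 30, 34, 12, 4 from the example; the variable names are illustrative. It computes the standard deviation both ways to show that the two formulae agree.

```python
from math import sqrt

midpoints = [15, 25, 35, 45]   # x values (class midpoints, £000s)
frequencies = [30, 34, 12, 4]  # f values

sum_f = sum(frequencies)
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
sum_fx2 = sum(f * x * x for f, x in zip(frequencies, midpoints))  # fx² = f.x.x

grouped_mean = sum_fx / sum_f

# Shortcut form: sqrt(sum(fx²) / sum(f) - mean²)
sd_shortcut = sqrt(sum_fx2 / sum_f - grouped_mean ** 2)

# Definition form: sqrt(sum(f * (x - mean)²) / sum(f))
sd_definition = sqrt(
    sum(f * (x - grouped_mean) ** 2 for f, x in zip(frequencies, midpoints)) / sum_f
)

print(sum_f, sum_fx, sum_fx2)
print(grouped_mean, round(sd_shortcut, 2), round(sd_definition, 2))
```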

54
Summary

We can summarise raw data using Excel.

We can calculate appropriate measures of location and spread and use these to compare data sets.

Workshop

In the workshop you can start work on Workshop 0 Introduction and Workshop 1 Statistics which is on Moodle.

You should complete the work soon after the workshop.

Next week we will cover Probability and also complete an Introduction to using Excel.

55
Global Sustainability
Institution of the Year
International Green Gown Awards 2021

Information Classification: Restricted
