Lecture Week 2 Statistics
Summarising, presenting and analysing data, as well as drawing valid conclusions
What is Statistics?
Examples in business where Statistics would be used
Launch of a new product or service
Monitoring sales
Learning Outcomes
In this lecture we will cover
Measures of location
Mean
Median
Mode.
Measures of spread
Range
Standard deviation
Key idea: we will compute these statistics and then focus on their interpretation
Data Types - Level of Measurement
There are four levels of measurement, in two broad groups:
➢ Categorical / Qualitative (nominal and ordinal)
➢ Quantitative (interval vs. ratio)
Categorical Data
Categorical/Qualitative data (i.e. categories or labels)
The data are not numbers (if numbers are used, they are just labels)
➢ Nominal Data (Unordered Categorical)
Hair colour
Daily newspaper
Quantitative Data
Quantitative data (i.e. Measurements or Counts)
The data are numbers, i.e. the numbers correspond to amounts (quantities)
Ratio scale: data that have all the characteristics of interval data, but with an absolute zero
Quantitative Data
Continuous data
Can take any value in a given range
Discrete data
Generally only whole numbers, e.g. counts
Measures of Location
There are 3 measures of location: the mean, the median and the mode
Measures of Location
Mean:
Measures of Location
Median:
The median is the middle value of an ordered list if the number of values is odd,
or the mean of the two middle values if the number of values is even
Measures of Location
Mode:
The mode is the value that occurs most often. It can be found by inspection; for grouped data, the 'modal class' (the class with the highest frequency) is sometimes quoted instead
Data 1: Raw Data Showing Sizes of T-shirts Sold
We can summarise the data in a frequency table, which shows the count of each size of T-shirt sold
Exercise 1
Complete the frequency table for Small, Medium, Large and X Large, and check your counts:
Small 37, Medium 39, Large 26, X Large 18, Total 120
(Enhanced) frequency table showing the sizes of T-shirts sold
Size          Frequency    %
Small         37           31
Medium        39           33
Large         26           22
XLarge        18           15
Grand Total   120          100
(Percentages are rounded to the nearest whole number, so the rounded values add up to 101.)
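The percentage column can be reproduced from the counts with a minimal Python sketch (Python is our choice here; the lecture uses Excel). One assumption to flag: the table appears to round halves up (Medium is 32.5%, shown as 33), whereas Python's built-in `round` rounds halves to even and would give 32, so the sketch uses a small hypothetical helper, `round_half_up`.

```python
import math

def round_half_up(value):
    """Round to the nearest whole number, with halves rounded up (e.g. 32.5 -> 33)."""
    return math.floor(value + 0.5)

# Counts from the frequency table above
counts = {"Small": 37, "Medium": 39, "Large": 26, "XLarge": 18}
total = sum(counts.values())  # grand total

# Percentage of the grand total for each size
percentages = {size: round_half_up(100 * f / total) for size, f in counts.items()}

print(f"{'Size':<8}{'Freq':>6}{'%':>5}")
for size, f in counts.items():
    print(f"{size:<8}{f:>6}{percentages[size]:>5}")
print(f"{'Total':<8}{total:>6}{sum(percentages.values()):>5}")
```

The last line shows the rounded percentages summing to 101, which is why a reported "100" in the total row refers to the unrounded values.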
Pie chart showing the sizes of T-shirts sold
[Pie chart with segments for Small, Medium, Large and XLarge]
Bar chart showing the sizes of T-shirts sold
[Bar chart: number of shirts sold (vertical axis) by size of shirt (horizontal axis): Small, Medium, Large, XLarge]
How do we present data?
For the T-shirt data we have:
A frequency table (counts only)
A pie chart
A bar chart
Data 2: Waiting times for payment of invoices
First idea: a frequency table showing waiting times for payment
Exercise 2
We need to simplify this, so group the data into the classes
5–9, 10–14, 15–19, 20–24, 25–29, 30–34, 35–39
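The grouping step can be sketched in Python as follows. The waiting times below are hypothetical (the lecture's Data 2 is not reproduced here), and whole-number days are assumed; the class limits are the ones given in Exercise 2. The hypothetical helper `class_label` assigns each value to its class, and the modal class is the one with the highest count.

```python
from collections import Counter

# Hypothetical waiting times in days (NOT the lecture's Data 2)
waiting_times = [7, 12, 15, 16, 18, 19, 21, 23, 27, 31, 36, 17, 14, 9]

# Class limits from Exercise 2: 5-9, 10-14, ..., 35-39
classes = [(low, low + 4) for low in range(5, 40, 5)]

def class_label(days):
    """Return the class label (e.g. '15-19') for a whole-number waiting time."""
    for low, high in classes:
        if low <= days <= high:
            return f"{low}-{high}"
    raise ValueError(f"{days} is outside the classes 5-39")

freq = Counter(class_label(d) for d in waiting_times)

for low, high in classes:
    label = f"{low}-{high}"
    print(f"{label:>6}: {freq[label]}")

# Modal class: the class with the highest frequency
modal_class = max(freq, key=freq.get)
print("Modal class:", modal_class)
```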
A grouped frequency table showing waiting times for payment
We can see that the modal class is 15–19 days, as it is the class with the highest frequency
An Excel chart showing waiting times for payment
[Bar chart: frequency (vertical axis) against number of days (horizontal axis), in the classes 5–9, 10–14, 15–19, 20–24, 25–29, 30–34, 35–39]
Measures of Location
Measures of Location/Central Tendency
Measures of Location
Mean:
Mean ('average') = $\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$
In Excel: =AVERAGE(cell range)
Median:
Arrange the data in order of size and take the middle value (if n is odd), or the average of the two middle values (if n is even)
Median = the $\frac{n+1}{2}$th value in the ordered list
In Excel: =MEDIAN(cell range)
Mode:
The value which occurs most often or, for frequency data, the one with the highest frequency
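As a cross-check on these definitions, here is a minimal Python sketch using the standard-library `statistics` module (our choice of tool; the lecture itself uses Excel and a calculator). It uses the data 7, 7, 8, 8, 8, 9 from the frequency-data slide; `median_by_position` is a hypothetical helper that applies the $\frac{n+1}{2}$th-value rule directly.

```python
import math
import statistics

data = [7, 7, 8, 8, 8, 9]  # data set from the frequency-data slide

def median_by_position(values):
    """Median via the (n + 1)/2 position rule on the ordered list (positions start at 1)."""
    ordered = sorted(values)
    pos = (len(ordered) + 1) / 2
    lower = ordered[math.floor(pos) - 1]  # same value as upper when n is odd
    upper = ordered[math.ceil(pos) - 1]
    return (lower + upper) / 2

mean = statistics.mean(data)      # comparable to Excel =AVERAGE(...)
median = statistics.median(data)  # comparable to Excel =MEDIAN(...)
mode = statistics.mode(data)      # the most frequent value

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
print("median by position rule:", median_by_position(data))
```

With n = 6 the median position is 3.5, so the rule averages the 3rd and 4th ordered values (both 8).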
Measures of Location
Exercise 3
What is the effect of an extreme value (Outlier)?
Recap
Conclusion
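The effect of an extreme value can be illustrated with a short Python sketch. The five values are hypothetical (not from the lecture): replacing the largest value with an outlier pulls the mean a long way, while the median stays the same.

```python
import statistics

# Hypothetical data, for illustration only
values = [10, 12, 13, 14, 16]
with_outlier = values[:-1] + [160]  # replace the largest value with an extreme one

for label, data in [("original", values), ("with outlier", with_outlier)]:
    print(f"{label:>12}: mean = {statistics.mean(data)}, median = {statistics.median(data)}")
```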
Mean for grouped frequency data
Our data are 7, 7, 8, 8, 8, 9
$\text{Mean} = \frac{7 + 7 + 8 + 8 + 8 + 9}{6} = 7.83$ (to 2 dp)
The mean could be written
$\text{Mean} = \frac{(2 \times 7) + (3 \times 8) + (1 \times 9)}{6}$
where 2, 3 and 1 are the frequencies and 7, 8 and 9 are the x-values, leading to the formula
$\text{Mean} = \frac{\sum fx}{\sum f}$
where $\sum f$ is the total frequency
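The formula $\frac{\sum fx}{\sum f}$ translates directly into a few lines of Python. This minimal sketch uses the x-values and frequencies from this slide and checks the result against the mean of the expanded raw data.

```python
# x-values and their frequencies from the slide
x = [7, 8, 9]
f = [2, 3, 1]

sum_f = sum(f)                                 # total frequency
sum_fx = sum(fi * xi for xi, fi in zip(x, f))  # 2*7 + 3*8 + 1*9
mean = sum_fx / sum_f

# Cross-check: expand back to the raw data 7, 7, 8, 8, 8, 9
raw = [xi for xi, fi in zip(x, f) for _ in range(fi)]
print(round(mean, 2), round(sum(raw) / len(raw), 2))
```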
Exercise 4
100 hotel guests fill in a questionnaire. The ratings for food (5 = very good, 1 = very poor) are
x values   f values
Answer: 3.66
What happens when x is not a single value?
Example:
x values   f values
Solution
Use the midpoint of each class as its x value
Mean: Pros & Cons
The mean uses all the data
BUT it is affected by extreme values (outliers)
Median: Pros & Cons
The median is not affected by outliers
BUT it does not use all the data
Mode: Pros & Cons
The mode is not affected by outliers
BUT
The mode is rarely used because it can be unstable, and it is usually less representative of the data than the mean or median.
Measures of Dispersion
Measures of Dispersion/Spread
Range, Quartiles, Variance, Standard Deviation
Measures of Dispersion/Spread: Definitions
Range
The range is the difference between the largest and smallest values
Example
N.B. The range depends only on the two extreme values, so it can easily be distorted by outliers
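A minimal Python sketch of the range, using the data set from the interquartile range example later in this lecture. The second call uses a hypothetical change to the largest value to show that only the two extremes matter.

```python
def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

data = [6, 15, 16, 18, 19, 20, 33]  # data set from the interquartile range example
print(data_range(data))

# Hypothetical change: only the largest value is altered, yet the range changes a lot
print(data_range(data[:-1] + [80]))
```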
Measures of Dispersion/Spread: Definitions
Quartiles
The quartiles, together with the median, divide the data set into quarters
Variance
Standard Deviation
Standard deviation (sd)
[Diagram: the values x1, x2, x3, x4 and their distances from the mean]
The population is ALL the items under consideration,
e.g. the annual sales figure for each branch of a business in the UK.
Population standard deviation = $\sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
In Excel: =STDEVP(cell range)
Sample standard deviation = $\sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
In Excel: =STDEV(cell range)
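The two formulas differ only in the divisor ($n$ versus $n - 1$). A minimal Python sketch of both, checked against the standard-library `statistics.pstdev` and `statistics.stdev`; the function names are our own, and the data are the 7, 7, 8, 8, 8, 9 values used earlier.

```python
import math
import statistics

def population_sd(values):
    """sqrt(sum((x - mean)^2) / n), comparable to Excel =STDEVP(...)."""
    m = sum(values) / len(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / len(values))

def sample_sd(values):
    """sqrt(sum((x - mean)^2) / (n - 1)), comparable to Excel =STDEV(...)."""
    m = sum(values) / len(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / (len(values) - 1))

data = [7, 7, 8, 8, 8, 9]
print(population_sd(data), statistics.pstdev(data))
print(sample_sd(data), statistics.stdev(data))
```

Dividing by $n - 1$ gives a slightly larger value, and the difference shrinks as $n$ grows.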
Standard Deviation: Pros & Cons
The standard deviation uses all the data
For roughly bell-shaped (normal) data, approx. 95% of all values lie within 2 standard deviations of the mean
BUT
Example (data: 3, 4, 6, 7, so mean = 5)
x    x − mean      (x − mean)²
3    3 − 5 = −2    (−2)² = 4
4    4 − 5 = −1    (−1)² = 1
6    6 − 5 = 1     1² = 1
7    7 − 5 = 2     2² = 4
                   Sum = 10
Sample sd = $\sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{10}{4 - 1}} = 1.83$ (2 dp)
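The table above can be rebuilt row by row with a minimal Python sketch, which follows the same steps: find the mean, the deviations, their squares and the sum, then divide by $n - 1$ and take the square root.

```python
import math

data = [3, 4, 6, 7]
mean = sum(data) / len(data)

total = 0.0  # running sum of squared deviations
print("x  x-mean  (x-mean)^2")
for x in data:
    dev = x - mean
    total += dev ** 2
    print(f"{x}  {dev:6.1f}  {dev ** 2:10.1f}")

sample_sd = math.sqrt(total / (len(data) - 1))
print(f"Sum = {total:.0f}, sample sd = {sample_sd:.2f}")
```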
Comparison
Key concept: the greater the spread, the higher the standard deviation
Exercise 5
Answer
Your calculator and statistical mode
IMPORTANT REMINDER
For a small sample of data you should be able to compute the mean
and standard deviation by using the statistical mode on your
calculator.
Inter quartile range (IQR)
[Diagram: the lower quartile (Q1), the median and the upper quartile (Q3) split the ordered data into four parts]
Example
Find the interquartile range of
6, 15, 16, 18, 19, 20, 33
Note: the data are already arranged in order.
Solution
Q1 is the $\frac{n+1}{4}$th value. Since n = 7, Q1 is the second value, which is 15
Q3 is the $\frac{3(n+1)}{4}$th value. Since n = 7, Q3 is the sixth value, which is 20
IQR = Q3 − Q1 = 20 − 15 = 5
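A minimal Python sketch of the $(n+1)$ position rule. One assumption goes beyond the lecture: when a position is not a whole number, the sketch interpolates linearly between the neighbouring ordered values (the worked example only needs whole-number positions). As far as we are aware, this matches the standard library's `statistics.quantiles(..., method="exclusive")`, which uses the same $(n+1)$ convention. `value_at_position` and `iqr` are hypothetical helper names.

```python
import math

def value_at_position(ordered, pos):
    """Value at a 1-based position in an ordered list,
    interpolating linearly when pos lies between two positions."""
    lower = math.floor(pos)
    upper = math.ceil(pos)
    frac = pos - lower
    return ordered[lower - 1] + frac * (ordered[upper - 1] - ordered[lower - 1])

def iqr(values):
    """Return (Q1, Q3, Q3 - Q1) using the (n+1)/4 and 3(n+1)/4 positions."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least 3 values for this position rule")
    q1 = value_at_position(ordered, (n + 1) / 4)
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)
    return q1, q3, q3 - q1

q1, q3, spread = iqr([6, 15, 16, 18, 19, 20, 33])
print(q1, q3, spread)
```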
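For frequency data:
$\text{Mean} = \frac{\sum fx}{\sum f}$
$\text{Standard deviation} = \sqrt{\frac{\sum f (x - \bar{x})^2}{\sum f}}$ or $\sqrt{\frac{\sum fx^2}{\sum f} - (\text{mean})^2}$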
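The two forms of the standard deviation formula give the same answer. A minimal Python sketch checks this on the frequency data from earlier in the lecture (x = 7, 8, 9 with f = 2, 3, 1).

```python
import math

# Frequency data from the earlier slide
x = [7, 8, 9]
f = [2, 3, 1]

sum_f = sum(f)
mean = sum(fi * xi for xi, fi in zip(x, f)) / sum_f

# Form 1: sqrt( sum f(x - mean)^2 / sum f )
sd_deviations = math.sqrt(sum(fi * (xi - mean) ** 2 for xi, fi in zip(x, f)) / sum_f)

# Form 2: sqrt( sum f*x*x / sum f - mean^2 )
sd_shortcut = math.sqrt(sum(fi * xi * xi for xi, fi in zip(x, f)) / sum_f - mean ** 2)

print(round(sd_deviations, 4), round(sd_shortcut, 4))
```

The second form is usually quicker by hand because it avoids computing each deviation from the mean.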
Example (from slide 36)
Solution
VERY IMPORTANT: for the x values, use the midpoints of the classes; fx² = f × x × x
x (midpoint)   f     fx     fx²
15             30    450    6750
25             34    850    21250
35             12    420    14700
45             4     180    8100
Total          80    1900   50800
continued …
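Recap: ∑f = 80, ∑fx = 1900, ∑fx² = 50800
$\text{Mean} = \frac{\sum fx}{\sum f} = \frac{1900}{80} = 23.75$
$\text{Standard deviation} = \sqrt{\frac{\sum fx^2}{\sum f} - (\text{mean})^2} = \sqrt{\frac{50800}{80} - (23.75)^2} = \sqrt{635 - 564.0625} = \sqrt{70.9375} = 8.42$ (2 dp)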
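The grouped calculation can be reproduced with a minimal Python sketch, using the class midpoints and frequencies from the table above. Note that fx² is computed as f × x × x, not (fx)².

```python
import math

# Class midpoints (x) and frequencies (f) from the grouped example
midpoints = [15, 25, 35, 45]
freqs = [30, 34, 12, 4]

sum_f = sum(freqs)
sum_fx = sum(f * x for x, f in zip(midpoints, freqs))
sum_fx2 = sum(f * x * x for x, f in zip(midpoints, freqs))  # f.x.x, not (fx)^2

mean = sum_fx / sum_f
sd = math.sqrt(sum_fx2 / sum_f - mean ** 2)

print(sum_f, sum_fx, sum_fx2, mean, round(sd, 2))
```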
Summary
Workshop
Global Sustainability
Institution of the Year
International Green Gown Awards 2021