
Lecture 3 - Numerical Summary - Part 1

This document discusses descriptive statistics and measures of central tendency. It covers the mean, median, and mode. The mean is the average value and can be calculated from raw data or a frequency table. It is sensitive to outliers. The median is the middle value when data is arranged in order. The mode is the most frequent value. Examples are provided for calculating the mean, median, and mode from raw data and frequency tables. Advantages and disadvantages of each measure are outlined.

Uploaded by

linhmilumilu

Lecture 4

BUSINESS STATISTICS
Advanced Educational Program

DESCRIPTIVE STATISTICS:
Numerical summaries

Reading materials:
Chap 4 (Keller)


Outline

• Measures of center:
- Mean, median, mode
- Selection of measures of location
• Measures of dispersion (spread):
- Range, quartile range, quartile deviation, variance, standard deviation
• Empirical rule (general case: Chebyshev's law)
• Coefficient of skewness
• Coefficient of variation

Measures of center and spread

Measures of center

• A measure of center or location shows where the center of the data is
• Three most useful measures of location:
- Arithmetic mean/average
- Median
- Mode

Arithmetic mean from raw data

• Arithmetic mean from population:
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$
• Arithmetic mean from sample:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Where: $X_i$, $x_i$ - the value of each item
$N$, $n$ - total number of items

Arithmetic mean from frequency table

• Apply this formula for the sample:
$$\bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i}$$
Where: $x_i$ - the value of class i
$f_i$ - frequency of class i
$k$ - number of classes
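The two formulas can be sketched in Python as follows. This is only an illustration: the function names are made up for this sketch, and the numbers are borrowed from the median and mode examples elsewhere in this lecture rather than from a dedicated mean example.

```python
def mean_raw(values):
    """Arithmetic mean of raw data: sum of the items divided by their count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


def mean_grouped(class_values, frequencies):
    """Arithmetic mean from a frequency table: sum(x_i * f_i) / sum(f_i)."""
    if len(class_values) != len(frequencies):
        raise ValueError("each class value needs exactly one frequency")
    total = sum(frequencies)
    if total == 0:
        raise ValueError("the frequencies must not sum to zero")
    weighted_sum = sum(x * f for x, f in zip(class_values, frequencies))
    return weighted_sum / total


# Illustrative data taken from other slides of this lecture.
raw = [11, 11, 13, 14, 17]
x = [8, 12, 16, 17, 19]   # class values x_i
f = [3, 7, 12, 8, 5]      # frequencies f_i

print(mean_raw(raw))                  # 66 / 5 = 13.2
print(round(mean_grouped(x, f), 2))   # 531 / 35, about 15.17
```

Expanding the frequency table back into 35 raw observations and calling `mean_raw` on them gives the same result, which is a quick way to check that the two formulas agree.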


Advantages and disadvantages of arithmetic mean

• Advantages:
– Easy to understand and calculate
– The value of every item is included => representative of the whole set of data
• Disadvantages:
– Sensitive to outliers

Mean is sensitive to outliers

Sample: (43; 38; 37; ...; 27; 34) => $\bar{x} = 33.5$
Contaminated sample: (43; 38; 37; ...; 27; 1934) => $\bar{x} = 71.5$
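A rough check on the size of this effect, taking the two reported means at face value and assuming the samples differ only in the last observation (34 replaced by 1934). The sample size $n$ is not stated on the slide; it is inferred here from the shift in the mean:

```latex
\bar{x}_{\text{contaminated}} - \bar{x} = \frac{1934 - 34}{n} = \frac{1900}{n}
\qquad\Rightarrow\qquad
71.5 - 33.5 = 38 = \frac{1900}{n}
\qquad\Rightarrow\qquad
n = \frac{1900}{38} = 50
```

Under these assumptions, a single wrong value among about 50 observations is enough to more than double the mean.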


Median

• Median is the value of the observation which is located in the middle of the data set
• Steps to find median:
1. Arrange the observations in order of size (normally ascending order)
2. Find the number of observations and hence the middle observation
3. The median is the value of the middle observation

Calculate median from raw data

• If the data has an odd number of observations:
– Middle observation: the $\frac{n+1}{2}$th
$$\text{Median} = x_{\left(\frac{n+1}{2}\right)}$$
• If the data has an even number of observations:
– There are two observations located in the middle, the $\frac{n}{2}$th and the $\left(\frac{n}{2}+1\right)$th, and
$$\text{Median} = \frac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2}$$

Example

• E.g. 1. Raw data: 11, 11, 13, 14, 17 => find median
• E.g. 2. Raw data: 11, 11, 13, 14, 16, 17 => find median

Advantages and disadvantages of median

• Advantages:
– Easy to understand and calculate
– Not affected by outlying values => thus can be used when the mean would be misleading
• Disadvantages:
– Value of one observation => fails to reflect the whole data set
– Not easy to use in other analysis


Mode

• Mode is the value which occurs most frequently in the data set
• Steps to find mode
1. Draw a frequency table for the data
2. Identify the mode as the most frequent value

Example to calculate mode

X     Frequency
8     3
12    7
16    12
17    8
19    5
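The two steps can be sketched in Python as follows. The helper name `modes` is an assumption of this sketch; it returns a list because a data set may have more than one most frequent value.

```python
from collections import Counter


def modes(frequency_table):
    """Return every value that shares the highest frequency in the table."""
    if not frequency_table:
        raise ValueError("the frequency table is empty")
    highest = max(frequency_table.values())
    return [value for value, freq in frequency_table.items() if freq == highest]


# Step 1 is already done for the example above: the table is given.
table = {8: 3, 12: 7, 16: 12, 17: 8, 19: 5}
print(modes(table))          # [16]

# Starting from raw data, Counter builds the frequency table first.
raw = [11, 11, 13, 14, 17]
print(modes(Counter(raw)))   # [11]
```

In the example table the value 16 has the highest frequency (12), so the mode is 16.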


Mean, median and mode in normal and skewed distributions

Bimodal and multimodal data

[Figure panels: Bimodal (two modes); Multimodal (several modes)]


Which measure of centre is best?
• The mean is generally the most commonly used measure
– Sensitive to extreme values
• If the data are skewed or extreme values are present, the median is better, e.g. real estate prices
• The mode is generally best for categorical data, e.g. restaurant service quality (below): the mode is "Very good" (ordinal data)

Rating          # customers
Excellent       20
Very good       50
Good            30
Satisfactory    12
Poor            10
Very Poor       6
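For category labels like these, a mean cannot be computed directly, but the mode can be read straight from the counts. A minimal sketch using the table above; the variable names are illustrative:

```python
# Counts per rating, copied from the restaurant service quality table.
ratings = {
    "Excellent": 20,
    "Very good": 50,
    "Good": 30,
    "Satisfactory": 12,
    "Poor": 10,
    "Very Poor": 6,
}

# The mode is the label with the largest count.
mode_rating = max(ratings, key=ratings.get)
total = sum(ratings.values())
share = ratings[mode_rating] / total

print(mode_rating)       # Very good
print(total)             # 128
print(f"{share:.1%}")    # 39.1%
```

The modal rating accounts for 50 of the 128 customers, so it is the most common answer but not a majority.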
