Chapter 03. Numerical Measures
Descriptive Statistics:
Numerical Measures
Content of Chapter 3
2
Measures of central tendency
• Measures of central tendency are also called measures of central location.
• What does “center location” mean? How do we understand it correctly?
3
3.1 Measures of central tendency -1
The figures show where most values tend to occur, that is, the location of
a typical value.
6
Measures of central tendency -2
• Mean
• Weighted mean
• Median
• Mode
• Percentiles
• Quartiles
7
Measure of mean -1
8
Measure of mean -2
When the dataset is grouped data:
9
Mean - Grouped data -3
Class midpoint of class i:
$x_i^* = \dfrac{x_{\min}^{(i)} + x_{\max}^{(i)}}{2}$
10
Measure of mean -4
Example 1:
The number of new orders received by a company on each of
the last 25 working days was recorded as follows (a sample):
3, 0, 1, 4, 4, 4, 2, 5, 3, 6, 4, 5, 1, 4, 2, 3, 0, 2, 0, 5,
4, 2, 3, 3, 1
ANS:
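A minimal sketch of the calculation for Example 1, assuming the sample mean is the sum of the observations divided by n. The names `orders` and `sample_mean` are illustrative and not part of the slides.

```python
# Orders received on each of the last 25 working days (Example 1).
orders = [3, 0, 1, 4, 4, 4, 2, 5, 3, 6, 4, 5, 1, 4, 2, 3, 0, 2, 0, 5,
          4, 2, 3, 3, 1]


def sample_mean(values):
    """Return the arithmetic mean: sum of the values divided by their count."""
    if not values:
        raise ValueError("at least one observation is required")
    return sum(values) / len(values)


n = len(orders)          # number of observations
total = sum(orders)      # sum of all observations
mean_orders = sample_mean(orders)
print(f"n = {n}, sum = {total}, mean = {mean_orders:.2f}")
```

With these 25 observations the sum is 71, so the sample mean is 71/25 = 2.84 orders per day.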
11
Measure of mean -5
Example 1 (cont):
The number of new orders received by a company on each of
the last 25 working days was recorded as follows:
12
Weighted Mean -1
13
Weighted Mean -2
• Construction Wages:
Worker      Wage ($/hour)   Time (hours)
Carpenter   21.60           520
14
Weighted Mean -2
• Sol:
Worker      Wage ($/hour)   Time (hours)   Wage × time
Carpenter   21.60           520            11232.0
Total                       1590           31873.7
FYI, the equally weighted (simple) mean = $21.21
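A short sketch of the weighted mean, assuming the usual definition sum(w_i * x_i) / sum(w_i) with hours as the weights. Only the carpenter row and the column totals survive on the slide, so the crew figure below is computed from the totals alone. The function name `weighted_mean` is illustrative.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# The carpenter row from the table: 520 hours at $21.60 per hour.
carpenter_cost = 21.60 * 520

# Only the column totals are given for the whole crew, so the weighted
# mean wage is the total cost divided by the total hours.
total_hours = 1590
total_cost = 31873.7
crew_weighted_mean = total_cost / total_hours
print(f"carpenter cost = {carpenter_cost:.1f}")
print(f"weighted mean wage = {crew_weighted_mean:.2f} $/hour")
```

From the totals, the weighted mean wage is about $20.05 per hour, lower than the simple mean of $21.21 quoted on the slide.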
15
Weighted Mean -3
• Sol:
16
Mean - Grouped data -1
Suppose we are given a frequency distribution summarizing
a sample of 65 customer satisfaction ratings for a consumer
product. Determine the sample mean.
17
Mean - Grouped data -2
Satisfaction rating   Frequency ($f_i$)   Class midpoint ($x_i^*$)   $f_i x_i^*$
36-38                 4                   37                         4(37) = 148
39-41                 15                  40                         600
42-44                 25                  43                         1075
45-47                 19                  46                         874
48-50                 2                   49                         98
Sample mean: $\bar{x} = \dfrac{\sum f_i x_i^*}{n} = 43$.
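The grouped-data calculation can be sketched as follows, assuming each class is represented by its midpoint (lower limit plus upper limit, divided by 2) and weighted by its frequency. The names `classes` and `grouped_mean` are illustrative.

```python
# (lower limit, upper limit, frequency) for each satisfaction-rating class.
classes = [(36, 38, 4), (39, 41, 15), (42, 44, 25), (45, 47, 19), (48, 50, 2)]


def grouped_mean(classes):
    """Approximate the mean of grouped data from class midpoints."""
    n = sum(freq for _, _, freq in classes)
    if n == 0:
        raise ValueError("total frequency must be positive")
    # Each class contributes frequency * midpoint to the weighted sum.
    weighted_sum = sum(freq * (low + high) / 2 for low, high, freq in classes)
    return weighted_sum / n


n_ratings = sum(freq for _, _, freq in classes)
mean_rating = grouped_mean(classes)
print(f"n = {n_ratings}, grouped mean = {mean_rating}")
```

The weighted sum is 148 + 600 + 1075 + 874 + 98 = 2795, and 2795/65 = 43.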
18
Mean - Grouped data -3
Example 2: Marks obtained by 50 students are given
19
Discussion - measure of mean -1
Discussion: the mean value is located at the
center.
20
Discussion - measure of mean -2
Discussion: The mean doesn’t always locate the center
of the data accurately. Observe the histograms below
where we display the mean in the distributions.
21
Median -1
Median: The middle value in an ordered data set.
22
Median -2
Step 3: Determine the median. There are two cases:
• If n is odd, write n = 2m + 1 and solve for m. The median
is the value at position (m + 1) in the ordered data.
• If n is even, write n = 2m and solve for m. The median is
the average of the values at positions m and (m + 1) in
the ordered data.
23
Median -3
Example 1:
26 18 27 12 14 27 19 (7 observations)
12 14 18 19 26 27 27 (in ascending order)
Median = 19
24
Median -4
Example 2:
• For an even number of observations:
26 18 27 12 14 27 30 19 (8 observations)
12 14 18 19 26 27 27 30 (in ascending order)
Median = (19 + 26)/2 = 22.5
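The odd and even cases can be sketched in a few lines. This is an illustrative helper, not code from the slides; positions in the slides are counted from 1, while Python list indexes start at 0.

```python
def median(values):
    """Return the median using the odd/even position rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one observation is required")
    m = n // 2
    if n % 2 == 1:
        # n = 2m + 1: the value at position m + 1, which is index m.
        return ordered[m]
    # n = 2m: average of positions m and m + 1, which are indexes m - 1 and m.
    return (ordered[m - 1] + ordered[m]) / 2


odd_example = [26, 18, 27, 12, 14, 27, 19]        # 7 observations
even_example = [26, 18, 27, 12, 14, 27, 30, 19]   # 8 observations
print(median(odd_example), median(even_example))
```

For the 7 observations the median is the 4th ordered value, 19. For the 8 observations it is the average of the 4th and 5th ordered values, (19 + 26)/2 = 22.5.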
25
Mode -1
The third measure of the central tendency of a
population or sample is the mode. The mode is the
value that occurs with the highest frequency.
26
Mode -2
What is mode?
27
Mode -3
If only one data value occurs with the greatest frequency, the data
set is unimodal. For instance,
28
Mode -4
1, 3, 3, 3, 4, 4, 6, 6, 6, 9.
• Modes = 3 and 6. Both occur three times, so the data set is bimodal.
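A small sketch for finding every mode of a data set, assuming a mode is any value whose frequency equals the highest frequency. The helper name `modes` is illustrative.

```python
from collections import Counter


def modes(values):
    """Return all values that occur with the highest frequency, sorted."""
    if not values:
        raise ValueError("at least one observation is required")
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


two_mode_data = [1, 3, 3, 3, 4, 4, 6, 6, 6, 9]
# Daily orders from Example 1 of the mean slides.
orders = [3, 0, 1, 4, 4, 4, 2, 5, 3, 6, 4, 5, 1, 4, 2, 3, 0, 2, 0, 5,
          4, 2, 3, 3, 1]
print(modes(two_mode_data), modes(orders))
```

The first data set has two modes, 3 and 6. The orders data has a single mode, 4, which occurs six times.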
29
Mode -5
Mode = 450
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
3, 0, 1, 4, 4, 4, 2, 5, 3, 6, 4, 5, 1, 4, 2, 3, 0, 2, 0, 5,
4, 2, 3, 3, 1
31
Examples
32
Comparison of the Mean, Median, and Mode -7
ANS1: Hint:
Ordered data:
The median value:
Explain:
33
Comparison of the Mean, Median, and Mode -8
ANS 2: Hint:
The mode value:
Explain:
34
Comparison of the Mean, Median, and Mode -1
35
Comparison of the Mean, Median, and Mode -2
36
Comparison of the Mean, Median, and Mode -3
37
Comparison of the Mean, Median, and Mode -4
The median may be a better indicator of the most
typical value if a data set has an outlier. An
outlier is an extreme value that differs greatly
from other values.
38
Comparison of the Mean, Median, and Mode -5
Example 1: Suppose that in a small town of 50 people, one
person earns $5,000,000 per year and the other 49 each earn
$30,000. Which is the better measure of the “center”: the
mean or the median?
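A quick numerical check of Example 1, using Python's standard `statistics` module. The variable names are illustrative.

```python
import statistics

# One person earns $5,000,000 per year; the other 49 each earn $30,000.
incomes = [5_000_000] + [30_000] * 49

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)
print(f"mean = {mean_income}, median = {median_income}")
```

The mean is $129,400, which is far above what 49 of the 50 residents earn, while the median is $30,000. The single outlier pulls the mean upward but leaves the median unchanged, so the median is the better measure of the center here.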
39
Percentile -1
40
Percentile -2
41
Percentile -3
Step 1: Order the dataset from smallest to largest.
42
Percentile -4
Example: A researcher has obtained the number of
hours worked per week during the summer for a
sample of fifteen students:
40 25 35 30 20 40 30 20 40 10 30 20 10 5 20
43
Percentile -5
Step 1: Order the data: 5 10 10 20 20 20 20 25 30 30 30 35 40 40 40
44
Percentiles -6
• Example: The 85th percentile for the starting
salary data
– Step 1. Arrange the data in ascending order. 3710,
3755, 3850, 3880, 3880, 3890, 3920, 3940, 3950,
4050, 4130, 4325.
– Step 2. i = (p/100)n = (85/100)12 = 10.2
– Step 3.
45
80th Percentile
• Example: Apartment Rents
i = (p/100)n = (80/100)70 = 56
Averaging the 56th and 57th data values:
80th Percentile = (535 + 549)/2 = 542
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
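A sketch of the index rule used above. The integer case, i = (p/100)n followed by averaging the i-th and (i+1)-th values, is taken from the apartment rents slide. The non-integer case, rounding i up to the next position, is an assumption, since that step did not survive on the slides. The helper name `percentile` is illustrative.

```python
import math


def percentile(values, p):
    """Return the p-th percentile (0 < p < 100) by the index rule i = (p/100)n."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    i = p * n / 100
    if i == int(i):
        # Integer index: average the i-th and (i+1)-th ordered values.
        k = int(i)
        return (ordered[k - 1] + ordered[k]) / 2
    # Assumed rule for a non-integer index: round up to the next position.
    return ordered[math.ceil(i) - 1]


rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]
salaries = [3710, 3755, 3850, 3880, 3880, 3890, 3920, 3940, 3950,
            4050, 4130, 4325]

rent_p80 = percentile(rents, 80)
salary_p85 = percentile(salaries, 85)
print(rent_p80, salary_p85)
```

For the rents, i = 56 is an integer, so the result is (535 + 549)/2 = 542, matching the slide. For the starting salaries, i = 10.2, and under the assumed round-up rule the 85th percentile is the 11th ordered value, 4130. Statistical libraries often interpolate differently and may return slightly different values.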
47
Quartiles -1
48
Quartiles -2
50
Quartiles -4
• Example: Apartment Rents
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
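A sketch of the quartiles of the apartment rents, assuming Q1, Q2 and Q3 are the 25th, 50th and 75th percentiles located with the index i = (p/100)n. Rounding a non-integer index up is an assumption, and the helper name `quartiles` is illustrative.

```python
import math


def quartiles(values):
    """Return (Q1, Q2, Q3) as the 25th, 50th and 75th percentiles."""
    ordered = sorted(values)
    n = len(ordered)
    result = []
    for p in (25, 50, 75):
        i = p * n / 100
        if i == int(i):
            # Integer index: average the i-th and (i+1)-th ordered values.
            k = int(i)
            result.append((ordered[k - 1] + ordered[k]) / 2)
        else:
            # Assumed rule: round a non-integer index up to the next position.
            result.append(ordered[math.ceil(i) - 1])
    return tuple(result)


rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

q1, q2, q3 = quartiles(rents)
print(q1, q2, q3)
```

With n = 70 the indexes are 17.5, 35 and 52.5, which gives Q1 = 445 (18th value), Q2 = 475 (average of the 35th and 36th values) and Q3 = 525 (53rd value).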
51
3.2 Measures of Variability (-1)
To introduce the idea of variability, consider this
example. The following data are recorded:
Dataset01: 1, 2, 3, 3, 5, 4.
mean = 3, median = 3, mode = 3.
Dataset02: 2, 3, 3, 3, 3, 4.
mean = 3, median = 3, mode = 3.
Their dotplots:
They have the same center, but what about their spreads?
Which one has more spread?
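One way to answer the question numerically is to compute the range and the sample variance of both data sets. This is a sketch using the standard `statistics` module, and the helper name `summarize` is illustrative.

```python
import statistics

dataset01 = [1, 2, 3, 3, 5, 4]
dataset02 = [2, 3, 3, 3, 3, 4]


def summarize(values):
    """Return center and spread measures for a small data set."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
    }


summary01 = summarize(dataset01)
summary02 = summarize(dataset02)
print(summary01)
print(summary02)
```

Both data sets have mean, median and mode equal to 3, but Dataset01 has range 4 and sample variance 2.0, while Dataset02 has range 2 and sample variance 0.4. Dataset01 is therefore more spread out.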
52
3.2 Measures of Variability (-2)
53
3.2 Measures of Variability (-3)
54
Measures of Variability (-4)
55
Measures of Variability (-5)
56
Range (-1)
57
Range (-2)
• Example: The starting salary data
3710, 3755, 3850, 3880, 3880, 3890, 3920, 3940, 3950, 4050,
4130, 4325.
– Data: 3510, 3755, 3850, 3880, 3880, 3890, 3920, 3940, 3950,
4050, 4130, 4825.
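A sketch of the range for both salary lists, assuming the range is the largest value minus the smallest value (the definition slide did not survive extraction). The second list differs from the first only in its smallest and largest values.

```python
def data_range(values):
    """Return the largest value minus the smallest value."""
    if not values:
        raise ValueError("at least one observation is required")
    return max(values) - min(values)


salaries = [3710, 3755, 3850, 3880, 3880, 3890, 3920, 3940, 3950,
            4050, 4130, 4325]
salaries_changed = [3510, 3755, 3850, 3880, 3880, 3890, 3920, 3940, 3950,
                    4050, 4130, 4825]

range_original = data_range(salaries)
range_changed = data_range(salaries_changed)
print(range_original, range_changed)
```

The range is 4325 - 3710 = 615 for the first list and 4825 - 3510 = 1315 for the second. Changing only the two extreme values more than doubles the range, which shows how sensitive it is to extremes.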
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
62
Variance and standard deviation (-2)
The variance is the average of the squared differences
between each data value and the mean.
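The variance is denoted $s^2$ for a sample and $\sigma^2$ for a population:
$s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$ (sample), $\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N}$ (population)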
63
Variance and Standard Deviation -3
$s^2 = \dfrac{\sum x_i^2 - n\bar{x}^2}{n - 1}$
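A sketch comparing the definitional sample variance, the sum of squared deviations from the mean divided by n - 1, with the shortcut formula above. The starting salary data from the range slide is used as the example, and the function names are illustrative.

```python
import math


def sample_variance(values):
    """Definitional form: sum of squared deviations divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_variance_shortcut(values):
    """Shortcut form: (sum of x_i^2 - n * mean^2) / (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return (sum(x * x for x in values) - n * mean ** 2) / (n - 1)


salaries = [3710, 3755, 3850, 3880, 3880, 3890, 3920, 3940, 3950,
            4050, 4130, 4325]

var_definition = sample_variance(salaries)
var_shortcut = sample_variance_shortcut(salaries)
std_dev = math.sqrt(var_definition)
print(f"s^2 = {var_definition:.2f}, shortcut = {var_shortcut:.2f}, s = {std_dev:.2f}")
```

Both forms give s² = 301850/11 ≈ 27440.91, so s ≈ 165.65. The shortcut avoids computing each deviation, but with floating-point numbers it can lose precision when the mean is large relative to the spread.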
64
Coefficient of Variation
• The coefficient of variation (CV) is a measure of relative
variability. It is the ratio of the standard deviation to the
mean.
$CV = \dfrac{s}{\bar{x}} \times 100\%$
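A sketch of the coefficient of variation for the starting salary data, using the sample standard deviation from the standard `statistics` module. The function name is illustrative.

```python
import statistics


def coefficient_of_variation(values):
    """Return the sample standard deviation as a percentage of the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100


salaries = [3710, 3755, 3850, 3880, 3880, 3890, 3920, 3940, 3950,
            4050, 4130, 4325]

cv_salaries = coefficient_of_variation(salaries)
print(f"CV = {cv_salaries:.1f}%")
```

Here s ≈ 165.65 and the mean is 3940, so the CV is about 4.2%: the standard deviation is roughly 4.2% of the mean.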
65
Distribution Shape: Skewness -1
• An important measure of the shape of a
distribution is called skewness.
$\text{Skewness} = \dfrac{3(\bar{x} - \text{Median})}{s}$
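A sketch of the skewness formula above, applied to the starting salary data as an illustration. The slides do not report a skewness value for this data set, and the function name is illustrative.

```python
import statistics


def pearson_skewness(values):
    """Return 3 * (mean - median) / s, with s the sample standard deviation."""
    s = statistics.stdev(values)
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return 3 * (statistics.mean(values) - statistics.median(values)) / s


salaries = [3710, 3755, 3850, 3880, 3880, 3890, 3920, 3940, 3950,
            4050, 4130, 4325]

skew_salaries = pearson_skewness(salaries)
skew_symmetric = pearson_skewness([1, 2, 3, 4, 5])
print(f"{skew_salaries:.3f}, {skew_symmetric}")
```

For the salaries the mean (3940) exceeds the median (3905), so the skewness is positive, about 0.63, which indicates a right skew. A symmetric data set such as 1, 2, 3, 4, 5 has mean equal to median and skewness 0.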
66
Distribution Shape: Skewness -2
67
Distribution Shape: Skewness -3
• Moderately Skewed Left
– Skewness is negative.
– Mean will usually be less than the median.
Figure: relative frequency histogram of a moderately left-skewed distribution (skewness = -0.31).
60
Distribution Shape: Skewness -4
Figure: relative frequency histogram of a right-skewed distribution (skewness = 1.25).
61
Distribution Shape: Skewness -5
Figure: relative frequency histogram of a symmetric distribution (skewness = 0).
62
Boxplot -1
A box plot presents quantitative data by dividing it into
four sections that each contain approximately 25% of the
observations.
63
Boxplot -2
64
Boxplot -3
66
Boxplot -4
How to draw a boxplot?
Step 1: Calculate the three-number summary {Q1; Q2; Q3}.
Step 2: Detect outliers.
Step 3: Draw the box plot, either horizontally or vertically:
• a box from Q1 to Q3;
• a line through the box at the median (Q2);
• a lower whisker from the smallest value that is not an outlier to Q1,
and an upper whisker from Q3 to the largest value that is not an outlier.
67
Boxplot -5
Example: Consider the data: 27, 89, 63, 61, 78, 87, 74, 72,
54, 88, 62, 81, 78, 73, 63, 56, 83, 86, 83, 93.
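A sketch of the numbers needed to draw this box plot. Two assumptions are made because the corresponding slides did not survive: quartiles are located with the index i = (p/100)n (averaging two values when i is an integer, rounding up otherwise), and outliers are values more than 1.5 × IQR below Q1 or above Q3. The function names are illustrative.

```python
import math


def percentile(values, p):
    """p-th percentile by the index rule i = (p/100)n (assumed convention)."""
    ordered = sorted(values)
    i = p * len(ordered) / 100
    if i == int(i):
        k = int(i)
        return (ordered[k - 1] + ordered[k]) / 2
    return ordered[math.ceil(i) - 1]


def boxplot_summary(values, k=1.5):
    """Return quartiles, outliers and whisker ends (1.5 * IQR rule assumed)."""
    q1, q2, q3 = (percentile(values, p) for p in (25, 50, 75))
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in values if lower_fence <= x <= upper_fence]
    outliers = sorted(x for x in values if x < lower_fence or x > upper_fence)
    return {
        "Q1": q1, "Q2": q2, "Q3": q3,
        "outliers": outliers,
        "whisker_low": min(inside),    # smallest value that is not an outlier
        "whisker_high": max(inside),   # largest value that is not an outlier
    }


data = [27, 89, 63, 61, 78, 87, 74, 72, 54, 88,
        62, 81, 78, 73, 63, 56, 83, 86, 83, 93]
summary = boxplot_summary(data)
print(summary)
```

Under these assumptions Q1 = 62.5, Q2 = 76 and Q3 = 84.5, so IQR = 22 and the fences are 29.5 and 117.5. The value 27 is the only outlier, and the whiskers run from 54 to 93.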
67
Boxplot -6
The shape of a box plot indicates whether the
distribution of the dataset is symmetric or skewed.
When the median is in the middle of the box,
and the whiskers are about the same length on
both sides of the box, the distribution is
symmetric.
67
Boxplot -7
When the median is closer to the bottom of the box,
and the whisker is shorter on the lower end of the
box, the distribution is positively skewed
(skewed right).
67
Boxplot -8
When the median is closer to the top of the box,
and the whisker is shorter on the upper end of
the box, the distribution is negatively
skewed (skewed left).
67
End of Chapter 3
69