Lecture 3 Statistics
Data Description
1. Measurement of “Central Tendency”
2. Measurement of “Variation”
3. Measurement of “Position”
4. Distribution shape
5. Exploratory Data
Measurement of “Central Tendency”
• Mean or arithmetic average
• Median
• Mid-range
• Mode
• Weighted Mean
• Other means: harmonic mean, geometric mean, and quadratic mean
The Mean (arithmetic average)
$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum X}{n}$
The Mean (arithmetic average)
$\bar{X} = \frac{\sum (f \cdot X)}{n}$
Here $f$ is the frequency for the corresponding value of $X$, and $n = \sum f$.
The Mean (arithmetic average)
For ungrouped Frequency Distribution
The scores for 25 students on a 4-point quiz are given in the table. Find the mean score.
Score (X)   Frequency (f)   f·X
0           2
1           4
2           12
3           4
4           3
$\bar{X} =$
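The quiz table can be checked with a short Python sketch. The variable names are illustrative and not part of the lecture; the scores and frequencies are taken from the table above.

```python
# Mean of an ungrouped frequency distribution: sum(f * X) / n, with n = sum(f).
scores = [0, 1, 2, 3, 4]          # X values from the quiz table
frequencies = [2, 4, 12, 4, 3]    # f values from the quiz table

n = sum(frequencies)                                  # total number of students
fx = [f * x for f, x in zip(frequencies, scores)]     # the f·X column
mean_score = sum(fx) / n

print("f·X column:", fx)
print("n =", n, "| sum of f·X =", sum(fx), "| mean =", mean_score)
```

Filling in the f·X column this way gives a total of 52 for 25 students, so the mean score is 52 / 25 = 2.08.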
The Mean (arithmetic average)
For a grouped Frequency Distribution
$\bar{X} = \frac{\sum (f \cdot X_m)}{n}$
Here $X_m$ is the corresponding class midpoint.
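A minimal Python sketch of the grouped-data mean follows. As an assumption for illustration, it reuses the class boundaries and frequencies from the median table that appears later in this lecture; the variable names are hypothetical.

```python
# Mean of a grouped frequency distribution: sum(f * X_m) / n.
# Boundaries and frequencies are reused from the median table in this lecture.
boundaries = [(15.5, 20.5), (20.5, 25.5), (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
frequencies = [3, 5, 4, 3, 2]

# X_m: the midpoint of each class is the average of its two boundaries.
midpoints = [(lower + upper) / 2 for lower, upper in boundaries]

n = sum(frequencies)
grouped_mean = sum(f * xm for f, xm in zip(frequencies, midpoints)) / n

print("midpoints:", midpoints)
print("n =", n, "| grouped mean =", round(grouped_mean, 2))
```

Because every value in a class is represented by its midpoint, the result is an approximation of the mean of the raw data.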
The Mean (arithmetic average)
For grouped Frequency Distribution
Table with class midpoints, $X_m$.
$\bar{X} =$
The Median
The median is the midpoint of the data array. The
symbol for the median is MD.
When there is an even number of values in the data
set, the median is obtained by taking the average of
the two middle numbers.
Steps in computing the median of a data array:
o Step 1 Arrange the data in order.
o Step 2 Select the middle point.
Ex 1: The number of children with asthma during a specific year in seven
local districts is shown. Find the median. 253, 125, 328, 417, 201, 70, 90
Ex 2: Six customers purchased these numbers of magazines: 1, 7, 3, 2, 3, 4.
Find the median.
Ex 3: Find the median for the daily vehicle pass charge for five U.S. National
Parks. The costs are $25, $15, $15, $20, and $15.
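The two steps can be sketched in Python and applied to the three examples. The helper name `median_of` is hypothetical; the standard library's `statistics.median` is used only as a cross-check.

```python
import statistics

def median_of(values):
    """Return the median by sorting and picking the middle point."""
    ordered = sorted(values)          # Step 1: arrange the data in order
    n = len(ordered)
    mid = n // 2                      # Step 2: select the middle point
    if n % 2 == 1:
        return ordered[mid]           # odd count: the single middle value
    # even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

asthma = [253, 125, 328, 417, 201, 70, 90]   # Ex 1
magazines = [1, 7, 3, 2, 3, 4]               # Ex 2
park_fees = [25, 15, 15, 20, 15]             # Ex 3

for name, data in [("Ex 1", asthma), ("Ex 2", magazines), ("Ex 3", park_fees)]:
    print(name, "MD =", median_of(data), "| check:", statistics.median(data))
```

Ex 2 has an even number of values, so its median is the average of the two middle values after sorting.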
The Median
For an ungrouped Frequency Distribution
Class boundaries   Frequency (f)   Cumulative frequency
15.5 - 20.5        3               3
20.5 - 25.5        5               8
25.5 - 30.5        4               12
30.5 - 35.5        3               15
35.5 - 40.5        2               17
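The formula for this slide did not survive in the text. One common approach, assumed here rather than taken from the lecture, locates the class in which the cumulative frequency first reaches n/2 and interpolates linearly inside it: MD = L + ((n/2 − cf) / f) · w, where L is the lower boundary of that class, cf the cumulative frequency before it, f its frequency, and w its width. The function name is hypothetical.

```python
def grouped_median(boundaries, frequencies):
    """Interpolated median of a grouped frequency distribution (a sketch)."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("the distribution has no observations")
    half = n / 2
    cumulative = 0                      # cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, frequencies):
        if cumulative + f >= half:      # this is the median class
            width = upper - lower
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies and boundaries do not match")

boundaries = [(15.5, 20.5), (20.5, 25.5), (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
frequencies = [3, 5, 4, 3, 2]

print("MD =", grouped_median(boundaries, frequencies))
```

With n = 17, the halfway point 8.5 falls in the class 25.5 - 30.5 (cumulative frequency 8 before it, 12 after it), giving 25.5 + (0.5 / 4) · 5 = 26.125.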
The Mode
The value that occurs most often in a data set is called the mode.
A data set that has only one value that occurs with the greatest frequency
is said to be unimodal.
If a data set has two values that occur with the same greatest frequency,
both values are considered to be the mode and the data set is said to be
bimodal.
If a data set has more than two values that occur with the same greatest
frequency, each value is used as the mode, and the data set is said to be
multimodal.
When no data value occurs more than once, the data set is said to have
no mode.
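These cases can be sketched with a small Python helper. The function name `modes` is hypothetical, and the bimodal list is made-up illustration data; the other lists come from the median examples in this lecture.

```python
from collections import Counter

def modes(values):
    """Return the list of modes; an empty list means 'no mode'."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                      # no value occurs more than once
    return sorted(v for v, c in counts.items() if c == highest)

def describe(values):
    """Classify a data set by its number of modes."""
    found = modes(values)
    labels = {0: "no mode", 1: "unimodal", 2: "bimodal"}
    return labels.get(len(found), "multimodal")

magazines = [1, 7, 3, 2, 3, 4]               # Ex 2 from the median slide
asthma = [253, 125, 328, 417, 201, 70, 90]   # Ex 1: every value is distinct
two_peaks = [1, 2, 2, 3, 3]                  # hypothetical bimodal data

for data in (magazines, asthma, two_peaks):
    print(data, "->", modes(data), describe(data))
```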
If the data set contains one extremely large value or one extremely
small value, a higher or lower midrange value will result and may not be
a typical description of the middle.
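The sensitivity of the midrange to a single extreme value can be illustrated in Python. The sketch assumes the usual definition, midrange = (lowest value + highest value) / 2, which does not appear in the surviving slide text. The appended value 2000 is a hypothetical outlier, not lecture data.

```python
import statistics

def midrange(values):
    """Average of the lowest and highest values."""
    return (min(values) + max(values)) / 2

asthma = [253, 125, 328, 417, 201, 70, 90]   # data from the median example
with_outlier = asthma + [2000]               # hypothetical extreme value

print("midrange:", midrange(asthma), "->", midrange(with_outlier))
print("median:  ", statistics.median(asthma), "->", statistics.median(with_outlier))
```

The single added value moves the midrange from 243.5 to 1035.0, while the median moves only from 201 to 227.0.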
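The Weighted Mean
$\bar{X} = \frac{w_1 X_1 + w_2 X_2 + \cdots + w_n X_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum wX}{\sum w}$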
A 100 100 75 80 80
B 100 80 75 90 80
C 80 100 100 90 100
D 50 80 100 75 85
F 0 100 0 75 80
Distribution Shapes
Frequency distributions can appear
in many shapes.
Example: Find the sample variance and standard deviation for the amount of
European auto sales for a sample of 6 years shown. The data are in millions of
dollars.
11.2, 11.9, 12.0, 12.8, 13.4, 14.3
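A worked sketch of this example in Python, assuming the usual sample definitions: s² is the sum of squared deviations from the mean divided by n − 1, and s is its square root. `statistics.variance` from the standard library serves as a cross-check.

```python
import math
import statistics

sales = [11.2, 11.9, 12.0, 12.8, 13.4, 14.3]   # millions of dollars

n = len(sales)
mean = sum(sales) / n
squared_deviations = [(x - mean) ** 2 for x in sales]

variance = sum(squared_deviations) / (n - 1)   # sample variance: divide by n - 1
std_dev = math.sqrt(variance)                  # sample standard deviation

print("mean =", round(mean, 2))
print("s^2  =", round(variance, 3), "| check:", round(statistics.variance(sales), 3))
print("s    =", round(std_dev, 2))
```

The mean is 12.6, the squared deviations sum to 6.38, and dividing by n − 1 = 5 gives a variance of about 1.28 and a standard deviation of about 1.13.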
Sample Variance for Grouped Data
• For grouped data, use the class midpoints as the observed values in the
different classes.
– f : class frequency
– $X_m$ : class midpoint
– n : total frequency
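The formula that these symbols belong to did not survive on this slide. They fit the standard shortcut formula for the sample variance of grouped data, written out here as an assumption; it is consistent with the f·X and f·X² columns used in the worked table later in the lecture.

```latex
s^2 = \frac{n \sum \left( f \cdot X_m^2 \right) - \left( \sum f \cdot X_m \right)^2}{n (n - 1)},
\qquad
s = \sqrt{s^2}
```

This is algebraically equivalent to $\sum f \, (X_m - \bar{X})^2 / (n - 1)$, where $\bar{X} = \sum (f \cdot X_m) / n$ is the grouped mean.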
Sample Variance for Grouped Data
Find the variance and standard deviation of the grouped data below:
$s^2 =$
$s =$
Sample Variance for Ungrouped Data
For ungrouped data, use the same formula with the class midpoints, $X_m$,
replaced with the actual observed $X$ values.
Sample Variance for Ungrouped Data
Find the variance and standard deviation of the ungrouped data below.
Class (X)   Frequency (f)   f·X   f·X²
5           2
7           2
9           5
11          8
13          2
15          4
17          1
$s^2 =$
$s =$
* Replace $X_m$ with $X$
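The empty columns can be filled in with a short Python sketch. It assumes the shortcut formula s² = [n·Σ(f·X²) − (Σ f·X)²] / [n(n − 1)], which matches the f·X and f·X² column headings but is not written out in the surviving slide text. Variable names are illustrative.

```python
import math

values = [5, 7, 9, 11, 13, 15, 17]     # X column
frequencies = [2, 2, 5, 8, 2, 4, 1]    # f column

n = sum(frequencies)                                      # total frequency
fx = [f * x for f, x in zip(frequencies, values)]         # f·X column
fx2 = [f * x * x for f, x in zip(frequencies, values)]    # f·X² column

# Shortcut formula for the sample variance (assumed, see lead-in).
variance = (n * sum(fx2) - sum(fx) ** 2) / (n * (n - 1))
std_dev = math.sqrt(variance)

print("f·X  =", fx, "sum =", sum(fx))
print("f·X² =", fx2, "sum =", sum(fx2))
print("s^2 =", round(variance, 2), "| s =", round(std_dev, 2))
```

Here n = 24, Σ f·X = 260 and Σ f·X² = 3048, so s² = 5552 / 552, roughly 10.06, and s is roughly 3.17.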
Coefficient of Variation
• The coefficient of variation, denoted by CVar, is the standard deviation
divided by the mean.
• The result is expressed as a percentage.
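As a sketch, the definition translates to CVar = (s / mean) · 100%. The Python below applies it to the European auto sales sample from the variance example; the function name is hypothetical, and the sample standard deviation (`statistics.stdev`) is assumed.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    return statistics.stdev(values) / statistics.mean(values) * 100

sales = [11.2, 11.9, 12.0, 12.8, 13.4, 14.3]   # millions of dollars
cvar = coefficient_of_variation(sales)

print(f"CVar = {cvar:.2f}%")
```

Because the units cancel, CVar lets the spread of data sets measured in different units or on different scales be compared directly.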