Basic Stat - Chapter 3 Statistical Description of Data
STATISTICAL DESCRIPTION OF DATA
3.1 Measures of Central Tendency
Introduction
Typical value (center of data)
Introduction
– It should be rigidly defined, which means that it should have a definite value.
– Median
– Mode
The Summation Notation
• Let X be a variable with n observations X_1, X_2, …, X_n. Their sum is written
$$\sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n$$
– ∑ is the summation notation.
– i is the index of summation; i = 1 is the lower limit (starting point) and n is the upper limit (ending point) of the summation.
– X_i is each term of the sum.
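The Summation Notation …
$$\sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n$$
$$\sum_{i=1}^{n} X_i Y_i = X_1 Y_1 + X_2 Y_2 + \cdots + X_n Y_n$$
$$\sum_{i=1}^{n} X_i^2 = X_1^2 + X_2^2 + \cdots + X_n^2$$
$$\sum_{i=1}^{n} C X_i = C \sum_{i=1}^{n} X_i = C X_1 + C X_2 + \cdots + C X_n$$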
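A quick numerical check of the four summation rules, using small made-up lists (the values of X, Y and C are arbitrary illustrations, not data from the course):

```python
# Hypothetical values used only to illustrate the summation rules.
X = [2, 5, 7, 1]   # X_1, ..., X_n
Y = [3, 0, 4, 6]   # Y_1, ..., Y_n
C = 10             # an arbitrary constant

sum_x = sum(X)                              # sum of X_i
sum_xy = sum(x * y for x, y in zip(X, Y))   # sum of the products X_i * Y_i
sum_x2 = sum(x ** 2 for x in X)             # sum of the squares X_i^2
sum_cx = sum(C * x for x in X)              # sum of C * X_i, equal to C * sum(X_i)

print(sum_x, sum_xy, sum_x2, sum_cx)        # 15 40 79 150
```

Note that the sum of squares (79) is not the same as the square of the sum (15² = 225).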
The Mean
• The mean is the most commonly used measure of central tendency. There are different types of mean:
– Arithmetic mean
– Weighted mean
– Geometric mean
– Harmonic mean
The Arithmetic Mean
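• It is computed by adding all the values in the data set and dividing by the number of observations in it:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
• If the data are given as an ungrouped frequency distribution, where value X_i occurs f_i times and n = ∑ f_i, the mean is
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{n}$$
• If we have a grouped frequency distribution, the mean is given by the formula
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i m_i}{n}, \quad \text{where } m_i = \frac{LCB_i + UCB_i}{2}$$
is the class mark of class i, LCB/UCB is the lower/upper class boundary, and k is the number of classes.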
The Arithmetic Mean …
• Example 1: The following data are the weights (in kg) of eight youths: 32, 37, 41, 39, 36, 43, 48, and 36. Calculate the arithmetic mean of their weights.
(Ans: 312/8 = 39 kg)
• Example 3: The ages (in years) of 20 women who attended health education at Jimma Health Center in 1986 are summarized in the table. What is the mean age of these women? (Ans: 670/20 = 33.5 years)
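A minimal Python sketch of the raw-data and grouped-data formulas. The helper names `arithmetic_mean` and `grouped_mean` are our own, and the grouped data at the end are hypothetical (the Example 3 table is not reproduced here):

```python
def arithmetic_mean(values):
    """Sum of all observations divided by the number of observations."""
    return sum(values) / len(values)


def grouped_mean(boundaries, freqs):
    """Mean of a grouped frequency distribution.

    boundaries: list of (LCB, UCB) pairs; freqs: class frequencies.
    Each class is represented by its class mark m = (LCB + UCB) / 2.
    """
    marks = [(lcb + ucb) / 2 for lcb, ucb in boundaries]
    n = sum(freqs)
    return sum(f * m for f, m in zip(freqs, marks)) / n


weights = [32, 37, 41, 39, 36, 43, 48, 36]   # Example 1, in kg
print(arithmetic_mean(weights))              # 39.0

# Hypothetical grouped data (not from the slides): three classes of width 10.
print(grouped_mean([(9.5, 19.5), (19.5, 29.5), (29.5, 39.5)], [2, 5, 3]))  # 25.5
```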
Properties of Arithmetic Mean …
• It can be computed for any set of numerical data; it always exists and is unique.
• The sum of the deviations of the observations about the mean is zero, i.e., $\sum_{i=1}^{n} (X_i - \bar{X}) = 0$.
• The sum of squared deviations of all observations about the mean is a minimum, i.e., $\sum_{i=1}^{n} (X_i - \bar{X})^2 \le \sum_{i=1}^{n} (X_i - A)^2$ for any constant A.
• If a constant is added to all observations, the new mean is the old mean plus the constant.
• If all observations are multiplied by a constant, the new mean is the old mean multiplied by the constant.
• If a wrong value was recorded and this is discovered later, the corrected mean is
$$\bar{X}_{corr} = \bar{X}_{wrong} + \frac{X_{corr} - X_{wrong}}{n}$$
where X_wrong is the wrongly recorded value and X_corr is the correct value.
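The properties above can be checked numerically on the Example 1 weights. This is only an illustration; the "wrong" value 48 and the "correct" value 40 in the last step are hypothetical:

```python
X = [32, 37, 41, 39, 36, 43, 48, 36]   # Example 1 weights (kg)
n = len(X)
xbar = sum(X) / n                       # 39.0


def ssd(a):
    """Sum of squared deviations of X about the point a."""
    return sum((x - a) ** 2 for x in X)


dev_sum = sum(x - xbar for x in X)      # deviations about the mean sum to 0
print(dev_sum)                          # 0.0
print(ssd(xbar), ssd(38), ssd(40))      # 172.0 180 180

print(sum(x + 5 for x in X) / n)        # 44.0  (old mean + 5)
print(sum(3 * x for x in X) / n)        # 117.0 (3 * old mean)

# Hypothetical correction: suppose 48 had been recorded instead of the true value 40.
x_wrong, x_corr = 48, 40
print(xbar + (x_corr - x_wrong) / n)    # 38.0
```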
Weighted Mean
• Weighted mean is calculated when certain values in a data set are more
important than the others.
$$\bar{X}_w = \frac{\sum_{i=1}^{k} w_i x_i}{\sum_{i=1}^{k} w_i}$$
where w_i is the weight attached to the value x_i.
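A minimal sketch of the weighted mean. The grade points and credit hours are hypothetical numbers chosen only to show how the weights enter:

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights w_i."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Hypothetical example: grade points earned in three courses, weighted by credit hours.
grade_points = [4, 3, 2]
credit_hours = [3, 2, 5]
print(weighted_mean(grade_points, credit_hours))   # 2.8
```

With equal weights the weighted mean reduces to the ordinary arithmetic mean.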
Geometric Mean
Example 2: A man gets three annual raises in his salary. At the end of the first year he gets an increase of 4%, at the end of the second year an increase of 6%, and at the end of the third year an increase of 9% of his salary. What is the average percentage increase over the three periods?
Solution:
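The slide's worked solution is not shown. As a minimal sketch, assuming the usual definition GM = (x_1 x_2 ⋯ x_n)^{1/n} for n positive values, applied to the growth factors 1 + r rather than to the raw percentages, the average raise can be computed as follows (the helper `geometric_mean` is our own; Python's `statistics.geometric_mean` gives the same value):

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


raises = [0.04, 0.06, 0.09]              # yearly raises from Example 2
factors = [1 + r for r in raises]        # growth factors 1.04, 1.06, 1.09
avg_raise = geometric_mean(factors) - 1  # average rate per year
print(f"{avg_raise:.2%}")                # 6.31%
```

Applying 6.31% three times reproduces the same total growth as the actual raises, which is why the geometric rather than the arithmetic mean is used for rates of change.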
Properties of geometric mean
– Its calculation is not easy.
– It involves all observations during computation.
– It may not be defined if even a single observation is negative.
– If any observation is zero, its value becomes zero.
Harmonic Mean
• Note: The simple harmonic mean (SHM) is used for equal distances, equal costs, and equal rates.
Harmonic Mean
Example 1: A motorist travels for three days, covering 480 km per day. On the first day he travels 10 hours at a speed of 48 km/h, on the second day 12 hours at 40 km/h, and on the third day 15 hours at 32 km/h. What is the average speed?
Solution: Since the distance covered each day is equal (10 × 48 = 12 × 40 = 15 × 32 = 480 km), we use the SHM.
Example 2: A driver travels for 3 days. On the 1st day he drives for 10 h at a speed of 48 km/h, on the 2nd day for 12 h at 45 km/h, and on the 3rd day for 15 h at 40 km/h. What is the average speed?
Solution: Since the distances covered on the three days are not equal (480, 540, and 600 km), we use the weighted harmonic mean (WHM), taking the distances as weights (w_i).
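A minimal sketch of both harmonic means applied to the motorist and driver examples, assuming the standard forms SHM = n / ∑(1/x_i) and WHM = ∑w_i / ∑(w_i/x_i); the helper names are our own:

```python
def simple_hm(rates):
    """Simple harmonic mean: n / sum(1 / x_i)."""
    return len(rates) / sum(1 / x for x in rates)


def weighted_hm(rates, weights):
    """Weighted harmonic mean: sum(w_i) / sum(w_i / x_i)."""
    return sum(weights) / sum(w / x for w, x in zip(weights, rates))


# Motorist example: equal daily distances (480 km), so the SHM applies.
print(round(simple_hm([48, 40, 32]), 2))               # 38.92

# Driver example: unequal distances, used as weights.
speeds = [48, 45, 40]
hours = [10, 12, 15]
distances = [s * h for s, h in zip(speeds, hours)]     # [480, 540, 600]
print(round(weighted_hm(speeds, distances), 2))        # 43.78
```

In both cases the result equals total distance divided by total time (1440/37 and 1620/37 km/h), which is a useful sanity check.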
Properties of harmonic mean
• It is based on all observations in a distribution.
• It is used in situations where small weights are given to larger observations and larger weights to smaller observations.
• It is difficult to calculate and understand.
• It is an appropriate measure of central tendency in situations where the data are ratios, speeds, or rates.
Relation between AM, GM, and HM
• If all the values in a data set are the same, then all three means (AM, GM, and HM) are identical.
• As the variability in the data increases, the difference among these means also increases.
• For positive data, the arithmetic mean is never less than the GM, which in turn is never less than the HM; equality holds only when all values are equal.
– AM ≥ GM ≥ HM
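A small check of the ordering using Python's standard `statistics` module on the Example 1 weights (all positive), plus a constant data set where the three means coincide:

```python
import statistics

data = [32, 37, 41, 39, 36, 43, 48, 36]    # Example 1 weights (kg), all positive
am = statistics.fmean(data)
gm = statistics.geometric_mean(data)
hm = statistics.harmonic_mean(data)
print(f"AM={am:.3f}  GM={gm:.3f}  HM={hm:.3f}")

# With no variability at all, the three means coincide.
same = [5.0, 5.0, 5.0]
print(statistics.fmean(same), statistics.geometric_mean(same), statistics.harmonic_mean(same))
```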
Median
• Example: The systolic blood pressures of seven persons were 113, 124, 124, 132, 146, 151, and 170. What is the median systolic blood pressure? (Ans: 132)
• Six men with high cholesterol participated in a study to investigate the effects of diet on cholesterol level. At the beginning of the study, their cholesterol levels (mg/dL) were as follows: 366, 327, 274, 292, 274, and 230. What is the median cholesterol level? (Ans: 283)
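A minimal sketch of the median for raw data: sort, then take the middle value (odd n) or the average of the two middle values (even n). The hand-written `median` helper is our own and is compared with `statistics.median`:

```python
import statistics


def median(values):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2


sbp = [113, 124, 124, 132, 146, 151, 170]     # n = 7 (odd)
chol = [366, 327, 274, 292, 274, 230]         # n = 6 (even)
print(median(sbp), median(chol))              # 132 283.0
print(statistics.median(sbp), statistics.median(chol))   # same results from the standard library
```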
Median …
– If the data are given as an ungrouped frequency distribution, the median is the ((n + 1)/2)th ordered observation, i.e., the value whose less-than cumulative frequency is the first to reach or exceed (n + 1)/2; for even n, it is the average of the (n/2)th and (n/2 + 1)th observations.
• Example: Forty-five students were taken to the field, and their performance was evaluated using a 60 m pure speed test. The times were recorded in seconds, and the results are summarized in the table. What is the median performance of these students? (Ans: 19 secs)
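The sketch below locates the median of an ungrouped frequency distribution from the less-than cumulative frequencies, taking the ((n + 1)/2)th ordered observation when n is odd and averaging the two middle observations when n is even. The times and frequencies are hypothetical, since the slide's table is not reproduced here:

```python
def median_from_freq(values, freqs):
    """Median of an ungrouped frequency distribution (values in ascending order)."""
    n = sum(freqs)

    def kth(k):
        # value of the k-th ordered observation (1-based), found from cumulative frequencies
        cum = 0
        for v, f in zip(values, freqs):
            cum += f
            if cum >= k:
                return v
        raise ValueError("k exceeds the total frequency")

    if n % 2 == 1:                                # odd n: the ((n + 1) / 2)-th observation
        return kth((n + 1) // 2)
    return (kth(n // 2) + kth(n // 2 + 1)) / 2    # even n: average the two middle ones


# Hypothetical times (seconds) and frequencies, NOT the table from the slide.
times = [16, 17, 18, 19, 20]
counts = [3, 6, 9, 5, 2]        # n = 25
print(median_from_freq(times, counts))   # 18
```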
Median …
• Example: Fifty students were taken to the field, and their performance was evaluated using a 100 m pure speed test. The times were recorded in seconds, and the results are summarized in the table. What is the median performance of these students? (Ans: 20.81 secs)
Mode
• There can be only one mode (unimodal), e.g., 25, 27, 22, 25, 18
• There can be two modes (bimodal), e.g., 25, 27, 22, 27, 25, 18, 20
• There can be more than two modes (multimodal), e.g., 25, 27, 22, 27, 25, 18, 20, 19, 22, 17
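Python's `statistics.multimode` returns every most-frequent value, in order of first appearance, which makes it convenient for checking the three mode examples:

```python
import statistics

print(statistics.multimode([25, 27, 22, 25, 18]))                       # [25]
print(statistics.multimode([25, 27, 22, 27, 25, 18, 20]))               # [25, 27]
print(statistics.multimode([25, 27, 22, 27, 25, 18, 20, 19, 22, 17]))   # [25, 27, 22]
```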
Mode…
• Quartiles are three points that divide an array into four parts in such a way that each part contains an equal number of elements.
– First quartile (Q1): 25% of the observations lie at or below it.
• Example: Find the median, lower quartile and upper quartile of the
following numbers.
a) 12, 5, 22, 30, 7, 36, 14, 42, 15, 53, 25
b)
13 23.5 39
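For part (a), a sketch using `statistics.quantiles` with `method="exclusive"`, which places the p-th quantile at position (n + 1)p of the sorted data. This is one common convention; other textbook rules can give slightly different quartiles:

```python
import statistics

data = [12, 5, 22, 30, 7, 36, 14, 42, 15, 53, 25]   # part (a)
# "exclusive" locates the p-th quantile at position (n + 1) * p of the sorted data
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
print(q1, q2, q3)    # 12.0 22.0 36.0  (lower quartile, median, upper quartile)
```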
Quantiles
Quantiles …
• Deciles are nine points that divide an array into 10 parts in such a way that each part contains an equal number of elements.
– The nine deciles are denoted by D1, D2, …, D9
– Second decile (D2): 20% of the observations lie at or below it, and so on.
Quantiles …
Compute the following quantities for the score distribution below:

Score    Number of students
25-29    1
30-34    1
35-39    1
40-44    3
45-49    3
50-54    6
55-59    4
60-64    3
65-69    2
70-74    1

● First quartile (Ans: 44.92)
● Ninth decile (Ans: 65.75)
● Forty-fifth percentile (Ans: 51.38)

Remark:
Q1 = P25
Q2 = D5 = P50 = Median
Q3 = P75
D1 = P10; D2 = P20; …; D9 = P90
3.2 Measures of Dispersion
Introduction
– Central tendency measures do not reveal the variability present in the data.
Range (R)
• Denoted by R
R = max − min
Relative Range (RR)
• The relative range is the ratio of the difference to the sum of the two extreme values in a data set.
• Denoted by RR or CR
$$RR = \frac{\max - \min}{\max + \min}$$
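A short sketch computing the range and the relative range for the Example 1 weights:

```python
data = [32, 37, 41, 39, 36, 43, 48, 36]                    # Example 1 weights (kg)
r = max(data) - min(data)                                  # range R = max - min
rr = (max(data) - min(data)) / (max(data) + min(data))     # relative range (unitless)
print(r, rr)    # 16 0.2
```

The range keeps the unit of the data (kg), while the relative range is a pure number and can be compared across data sets.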
Properties of range
• Interquartile range: IQR = Q3 − Q1; semi-interquartile range: SIR = (Q3 − Q1)/2
• The SIR is often used with skewed data, as it is insensitive to extreme scores.
Coefficient of Quartile Deviation
• The ratio of the difference to the sum of the two extreme quartiles of a data set, denoted by CQD:
$$CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
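A sketch of the quartile-based measures on the part (a) numbers from the quartile example. It assumes the (n + 1)p positional rule (`method="exclusive"`) for the quartiles; another quartile convention would change the values slightly:

```python
import statistics

data = [12, 5, 22, 30, 7, 36, 14, 42, 15, 53, 25]
q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")   # 12.0, 36.0
iqr = q3 - q1                 # interquartile range
sir = iqr / 2                 # semi-interquartile range
cqd = (q3 - q1) / (q3 + q1)   # coefficient of quartile deviation
print(iqr, sir, cqd)          # 24.0 12.0 0.5
```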
Properties of IQR
Mean Absolute Deviation (MAD)
• Measures the 'average' distance of each observation from the mean of the data
• Gives an equal weight to each observation
• Generally more sensitive than the range or interquartile range, since a
change in any value will affect it
• The mean absolute deviation of a set of n numbers is
$$MAD = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$
Coefficient of Mean Deviation (CMD)
$$CMD = \frac{MAD}{\bar{x}}$$
– All values are used in the calculation.
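A minimal sketch of the MAD and CMD formulas, again on the Example 1 weights:

```python
data = [32, 37, 41, 39, 36, 43, 48, 36]              # Example 1 weights (kg)
xbar = sum(data) / len(data)                         # 39.0
mad = sum(abs(x - xbar) for x in data) / len(data)   # mean absolute deviation
cmd = mad / xbar                                     # coefficient of mean deviation
print(mad, round(cmd, 4))   # 3.75 0.0962
```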
• The coefficient of variation (CV) of a data set is defined as the ratio of the standard deviation to the mean:
$$CV = \frac{s}{\bar{x}} \times 100\%$$
– All values are used in the calculation.
– The value of the CV is independent of the unit in which the measurements were taken, so it is a dimensionless number.
– For comparing data sets with different units or widely different means, one should use the coefficient of variation instead of the standard deviation.
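A sketch of the CV for raw data. It assumes the sample standard deviation (divisor n − 1, as computed by `statistics.stdev`); with the population formula the value would be slightly smaller:

```python
import statistics

data = [32, 37, 41, 39, 36, 43, 48, 36]   # Example 1 weights (kg)
s = statistics.stdev(data)                # sample standard deviation (n - 1 divisor)
xbar = statistics.fmean(data)             # 39.0
cv = s / xbar * 100                       # coefficient of variation in percent
print(round(cv, 2))                       # 12.71
```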
Coefficient of Variation
Example: Last semester, the students of the Biology and Chemistry Departments took the Stat 273 course. At the end of the semester, the following information was recorded.
Standard Score
• Relatively speaking:
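Coefficient of variation for group 1:
$$CV_1 = \frac{S_1}{\bar{x}_1} \times 100\% = \frac{1.2}{10.4} \times 100\% = 11.54\%$$
Coefficient of variation for group 2:
$$CV_2 = \frac{S_2}{\bar{x}_2} \times 100\% = \frac{1.3}{11.9} \times 100\% = 10.92\%$$
Z-score of Person A:
$$Z_A = \frac{x_A - \bar{x}_1}{S_1} = \frac{9.2 - 10.4}{1.2} = -1.00$$
Z-score of Person B:
$$Z_B = \frac{x_B - \bar{x}_2}{S_2} = \frac{9.3 - 11.9}{1.3} = -2.00$$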
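The worked example can be reproduced from the group summaries alone (means 10.4 and 11.9, standard deviations 1.2 and 1.3). The helper names `cv` and `z_score` are our own:

```python
def cv(sd, mean):
    """Coefficient of variation in percent."""
    return sd / mean * 100


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


print(round(cv(1.2, 10.4), 2), round(cv(1.3, 11.9), 2))                      # 11.54 10.92
print(round(z_score(9.2, 10.4, 1.2), 2), round(z_score(9.3, 11.9, 1.3), 2))  # -1.0 -2.0
```

Person A lies one standard deviation below the group 1 mean, while Person B lies two standard deviations below the group 2 mean.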
– If a distribution has a longer tail to the left of the central maximum than to the right, it is said to be skewed to the left, or to have negative skewness.
Skewness
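– For a moderately skewed distribution, the following empirical relation approximately holds among the three commonly used measures of central tendency:
mean − mode ≈ 3(mean − median)
$$\text{skewness} = \frac{\text{mean} - \text{mode}}{\text{standard deviation}} \quad \text{or} \quad \alpha_3 = \frac{m_3}{m_2^{3/2}}$$
where $m_2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$ and $m_3 = \frac{\sum (x_i - \bar{x})^3}{n-1}$ are the second and third central moments.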
Kurtosis
– The normal distribution, which is neither very highly peaked nor flat-topped, is called mesokurtic.
Kurtosis
• Measures of kurtosis
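$$\alpha_4 = \frac{m_4}{m_2^{2}}, \quad m_4 = \frac{\sum (x_i - \bar{x})^4}{n-1}, \quad m_2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$$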
Skewness and Kurtosis (Example)
Score 61 64 67 70 73
Number of students 5 18 42 27 8
Can we say the distribution is skewed? What is the shape of the distribution?
(Ans: mean=67.45, m2=8.61, m3=-2.72, m4=201.39)
$$\text{skewness} = \frac{m_3}{m_2^{3/2}} = \frac{-2.72}{8.61^{3/2}} \approx -0.11$$
• The distribution is negatively skewed.
$$\text{kurtosis} = \frac{m_4}{m_2^{2}} = \frac{201.39}{8.61^{2}} \approx 2.71$$
(computed with the unrounded m_2 ≈ 8.6136)
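A sketch that recomputes the answers from the frequency table. The slide's values (m2 = 8.61, m3 = −2.72, m4 = 201.39) are matched when the central moments are divided by n − 1 rather than n, so that divisor is used here; with n the results shift slightly:

```python
scores = [61, 64, 67, 70, 73]
freqs = [5, 18, 42, 27, 8]
n = sum(freqs)                                           # 100
mean = sum(f * x for x, f in zip(scores, freqs)) / n     # 67.45


def moment(k, divisor):
    """k-th central moment of the frequency table, using the given divisor."""
    return sum(f * (x - mean) ** k for x, f in zip(scores, freqs)) / divisor


m2, m3, m4 = (moment(k, n - 1) for k in (2, 3, 4))
skewness = m3 / m2 ** 1.5
kurtosis = m4 / m2 ** 2
print(round(mean, 2), round(m2, 2), round(m3, 2), round(m4, 2))  # 67.45 8.61 -2.72 201.39
print(round(skewness, 2), round(kurtosis, 2))                    # -0.11 2.71
```

Under the usual convention that the normal (mesokurtic) curve has α4 = 3, a value of about 2.71 suggests a slightly flatter-than-normal (platykurtic) shape.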