Intro - Stat CH 3
The Mean
The mean, also known as the arithmetic average, is the sum of the observations divided by
the number of observations.
Let X be a variable which takes the values x1, x2, x3, …, xn in a sample of size n drawn from a
population of size N (n < N). The arithmetic mean (A.M.) of this set of observations is the sum of
all values in the series divided by the number of items in the series.
For raw data:
\[ \bar{X} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \]
Example: The following table gives the wages paid to 125 workers in a factory. Calculate the
arithmetic mean of the wages.
Wages (in birr): 200 210 220 230 240 250 260
No. of workers: 5 15 32 42 15 12 4
Solution:
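A minimal sketch of this calculation, assuming the usual frequency form of the mean, sum(f*x) / sum(f), since each wage occurs with a given number of workers. The function name `frequency_mean` is only illustrative.

```python
# Wages (in birr) and number of workers, copied from the table above.
wages = [200, 210, 220, 230, 240, 250, 260]
workers = [5, 15, 32, 42, 15, 12, 4]

def frequency_mean(values, freqs):
    """Arithmetic mean of a simple frequency distribution: sum(f*x) / sum(f)."""
    total = sum(f * x for x, f in zip(values, freqs))
    return total / sum(freqs)

mean_wage = frequency_mean(wages, workers)
print(sum(workers))  # 125 workers in total
print(mean_wage)     # 227.92
```

Under this assumption the mean wage comes out at 227.92 birr.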
Example: The following table gives the marks of 58 students in a probability course. Calculate the
average mark of this group.
Solution:
∑ f_i = 4 + 8 + 11 + … + 2 = 58
Merits of arithmetic mean:
All observations are involved in its calculation.
The mean is used in computing other statistics, such as the variance.
It is unique: a set of data has only one mean.
It can be used for further statistical treatment, such as comparison of means and tests
of means.
Demerits of arithmetic mean:
The mean cannot be computed for an open-ended frequency
distribution.
The mean is affected by extremely high or low values, called outliers, and
may not be the appropriate average to use in these situations.
It cannot be computed for qualitative data (intelligence, honesty, beauty),
which cannot be measured quantitatively.
Properties of arithmetic mean:
1. The sum of the deviations of a set of items from their mean is always zero, i.e.
\[ \sum_{i=1}^{n} (x_i - \bar{x}) = 0. \]
2. The effect of transforming the original series on the mean:
a) If a constant k is added to (or subtracted from) every observation, the new mean
is the old mean plus (or minus) k.
b) If every observation is multiplied by a constant k, the new mean is
k times the old mean.
3. If x̄1 is the mean of n1 observations, x̄2 is the mean of n2 observations, …, and
x̄k is the mean of nk observations, then the mean of all the observations in all groups,
often called the combined mean, is given by:
\[ \bar{x}_c = \frac{\bar{x}_1 n_1 + \bar{x}_2 n_2 + \dots + \bar{x}_k n_k}{n_1 + n_2 + \dots + n_k} = \frac{\sum_{i=1}^{k} \bar{x}_i n_i}{\sum_{i=1}^{k} n_i} \]
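The first two properties (deviations from the mean sum to zero; adding or multiplying by a constant shifts or scales the mean) can be checked numerically. The sketch below uses made-up values and a made-up constant k; neither comes from the text.

```python
from statistics import mean

data = [4, 8, 15, 16, 23, 42]  # illustrative values, not from the text
k = 3                          # illustrative constant
m = mean(data)                 # 18

# Property 1: the deviations from the mean sum to zero.
deviation_sum = sum(x - m for x in data)

# Property 2a: adding k to every observation shifts the mean by k.
shifted_mean = mean([x + k for x in data])

# Property 2b: multiplying every observation by k scales the mean by k.
scaled_mean = mean([x * k for x in data])

print(deviation_sum, shifted_mean, scaled_mean)  # 0 21 54
```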
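Example: In a class there are 30 females and 70 males. If the females averaged 60 in an examination
and the males averaged 72, find the mean for the entire class.
Solution:
Females: x̄1 = 60, n1 = 30
Males: x̄2 = 72, n2 = 70
\[ \bar{x}_c = \frac{\bar{x}_1 n_1 + \bar{x}_2 n_2}{n_1 + n_2} = \frac{60(30) + 72(70)}{30 + 70} = \frac{6840}{100} = 68.4 \]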
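The combined mean of the class can be sketched in a few lines. The helper name `combined_mean` is an illustrative choice; it simply weights each group mean by its group size.

```python
def combined_mean(means, sizes):
    """Combined mean of several groups: sum(mean_i * n_i) / sum(n_i)."""
    return sum(m * n for m, n in zip(means, sizes)) / sum(sizes)

# Females: mean 60, n = 30; males: mean 72, n = 70 (values from the example).
class_mean = combined_mean([60, 72], [30, 70])
print(class_mean)  # 68.4
```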
Exercise: The mean monthly salary paid to 77 employees in a company was $78. The mean salary
of 32 of them was $45 and that of another 25 was $82. What was the mean salary of the remaining
employees?
B. Weighted Mean:
One of the limitations of the arithmetic mean is that it gives equal importance (weight) to all the
items in the series.
The weighted mean is the mean of a data set in which the data values do not all have the same
relative importance. For example, salaries paid should be weighted according to their relative
importance. Weights are assigned to each item in proportion to its relative importance.
Let X1, X2, …, Xn be the values of the items of a series and W1, W2, …, Wn their
corresponding weights. Then the weighted mean, denoted x̄w, is defined as:
\[ \bar{x}_w = \frac{\sum_{i=1}^{n} X_i W_i}{\sum_{i=1}^{n} W_i} \]
where x̄w is the weighted mean, Wi are the weights attached to the values
of the variable and Xi are the values of the variable.
Example: Suppose a student has secured the following marks in three tests: mid-term test = 30,
laboratory = 25 and final exam = 20. The simple arithmetic mean is (30 + 25 + 20)/3 = 25.
However, this is misleading if the three tests carry different weights on the basis of their relative
importance. Assume that the weights assigned to the three tests are 2, 3 and 5 points respectively.
On the basis of this information, the weighted mean is
\[ \bar{x}_w = \frac{\sum_{i=1}^{3} X_i W_i}{\sum_{i=1}^{3} W_i} = \frac{W_1 X_1 + W_2 X_2 + W_3 X_3}{W_1 + W_2 + W_3} = \frac{60 + 75 + 100}{2 + 3 + 5} = 23.5 \text{ marks.} \]
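The same weighted-mean calculation can be written as a short sketch. The function name `weighted_mean` is an illustrative choice; the marks and weights are the ones from the example.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(W_i * X_i) / sum(W_i)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

marks = [30, 25, 20]   # mid-term, laboratory, final exam
weights = [2, 3, 5]    # weights assigned to the three tests

simple = sum(marks) / len(marks)
weighted = weighted_mean(marks, weights)
print(simple)    # 25.0
print(weighted)  # 23.5
```

The weighted result is lower than the simple mean because the lowest mark (the final exam) carries the largest weight.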
The geometric mean is the nth root of the product of n positive values. If X1, X2, …, Xn are n
positive values, then their geometric mean is
\[ G.M = (X_1 X_2 \cdots X_n)^{1/n}. \]
When the number of observations is more than two, extracting the root directly may be tedious;
in that case the calculation can be simplified by taking logarithms (for example, to base ten) of both sides:
\[ G.M = \sqrt[n]{x_1 x_2 \cdots x_n} = (x_1 x_2 \cdots x_n)^{1/n} \]
\[ \log(G.M) = \frac{1}{n}\log(x_1 x_2 \cdots x_n) = \frac{1}{n}(\log x_1 + \log x_2 + \dots + \log x_n) = \frac{1}{n}\sum_{i=1}^{n} \log x_i \]
\[ G.M = \text{Antilog}\left(\frac{1}{n}\sum_{i=1}^{n} \log x_i\right) \]
This shows that the logarithm of the G.M is the mean of the logarithms of the individual observations.
The geometric mean is best for reporting average inflation, percentage change over time, ratios,
growth rates and positively skewed data (such as income data), because these types of data are
expressed as fractions.
Example: The ratios of prices in 1999 to those in 2000 for 4 commodities were 0.9, 1.25, 1.75
and 0.85. Find the average price ratio by means of the geometric mean.
Solution:
\[ G.M = \text{antilog}\left(\frac{\sum \log x_i}{n}\right) = \text{antilog}\left(\frac{\log 0.9 + \log 1.25 + \log 1.75 + \log 0.85}{4}\right) = 1.14 \]
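A minimal sketch of the logarithm route to the geometric mean, using base-ten logarithms. The function name `geometric_mean_log` is illustrative, and the direct nth-root computation is included only as a cross-check.

```python
import math

ratios = [0.9, 1.25, 1.75, 0.85]  # price ratios from the example

def geometric_mean_log(values):
    """Geometric mean via logs: antilog of the mean of log10(x_i)."""
    mean_log = sum(math.log10(x) for x in values) / len(values)
    return 10 ** mean_log

gm = geometric_mean_log(ratios)
direct = math.prod(ratios) ** (1 / len(ratios))  # nth root of the product

print(round(gm, 2))  # 1.14
```

Both routes agree up to floating-point rounding; the logarithm form is mainly a convenience for hand calculation.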
Exercise: Calculate the geometric mean of the annual percentage growth rates of profits of a business
corporation for the years 2000 to 2005, given the data 50, 72, 54, 82 and 93.
The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the individual
values. It is calculated by dividing the number of observations by the sum of the reciprocals of the
observations. If x1, x2, x3, …, xn are n values, then their harmonic mean is
\[ H.M = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} \]
The harmonic mean is used to average rates rather than simple values, that is, when the values are
expressed as value per unit. Since speed is expressed in km/hour, the harmonic mean is usually
appropriate for calculating average speed, provided each speed applies to the same distance.
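A short sketch of why the harmonic mean suits average speed. The speeds and the leg length below are made-up values, and the sketch assumes each speed is held over the same distance; the total-distance over total-time figure is computed as a cross-check.

```python
import math

def harmonic_mean(values):
    """Harmonic mean: n divided by the sum of reciprocals."""
    return len(values) / sum(1 / x for x in values)

speeds = [40, 60]   # km/h on two legs (illustrative values)
leg_km = 120        # assumed equal distance for each leg

hm = harmonic_mean(speeds)

# Cross-check: total distance divided by total travel time.
total_time = sum(leg_km / s for s in speeds)     # 3 h + 2 h
true_average = leg_km * len(speeds) / total_time

print(true_average)  # 48.0
```

The arithmetic mean of the two speeds would be 50 km/h, which overstates the true average of 48 km/h in this equal-distance setting.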
Exercise: A man travels from Adama to Hosanna by a car and takes 4 hours to cover the whole
distance. In the first hour he travels at a speed of 50 km/hr, in the second hour his speed is 64
km/hr, in third hour his speed is 80 km/hr and in the fourth hour he travels at the speed of 55 km/hr.
Find the average speed of the motorist.
For simple frequency data, the harmonic mean is calculated by using the following formula:
\[ H.M = \text{Reciprocal of } \frac{\sum \left(\frac{f_i}{x_i}\right)}{n} = \frac{n}{\sum \left(\frac{f_i}{x_i}\right)} \]
where n is the total number of observations.
For any set of positive observations, the A.M, G.M and H.M are related to each other by
\[ A.M \ge G.M \ge H.M \]
Note:
The sign '=' holds if and only if all the observations are identical.
If the observations in the data set take the values a, ar, ar^2, ar^3, …, ar^(n-1), each with
a single frequency, then
\[ (G.M)^2 = A.M \times H.M \]
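The ordering of the three means (A.M at least G.M, G.M at least H.M) and the geometric-progression identity (G.M)^2 = A.M × H.M can be checked on a small example. The progression below (a = 2, r = 3) is an illustrative choice, and the sketch assumes Python 3.8 or later for `statistics.geometric_mean`.

```python
import math
from statistics import mean, geometric_mean, harmonic_mean

gp = [2, 6, 18, 54]  # a, ar, ar^2, ar^3 with a = 2, r = 3 (illustrative)

am = mean(gp)            # 20
gm = geometric_mean(gp)  # about 10.39
hm = harmonic_mean(gp)   # about 5.4

ordering_holds = am >= gm >= hm
identity_holds = math.isclose(gm ** 2, am * hm)

print(ordering_holds, identity_holds)  # True True
```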
Median:
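For n observations arranged in increasing order:
\[ \text{Median} = \begin{cases} \left(\frac{n+1}{2}\right)^{th} \text{ element} & \text{if } n \text{ is odd,} \\ \frac{1}{2}\left[\left(\frac{n}{2}\right)^{th} + \left(\frac{n}{2} + 1\right)^{th}\right] \text{ element} & \text{if } n \text{ is even.} \end{cases} \]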
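For ungrouped data the median is the middle element of the ordered values when n is odd, and the average of the two middle elements when n is even. A minimal sketch of that rule follows; the function name and the sample lists are illustrative.

```python
def median(values):
    """Median of ungrouped data using the odd/even position rule."""
    ordered = sorted(values)          # data must be in increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # the ((n+1)/2)th element, 1-based
    # average of the (n/2)th and (n/2 + 1)th elements, 1-based
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_case = median([7, 3, 9, 5, 1])   # ordered: 1 3 5 7 9
even_case = median([7, 3, 9, 5])     # ordered: 3 5 7 9
print(odd_case, even_case)           # 5 6.0
```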
In the case of a continuous frequency distribution, we first locate the median class by cumulating
the frequencies until the (N/2)th point is reached. Finally, the median is calculated by the following
formula:
\[ \text{Median} = LC_b + \left(\frac{\frac{N}{2} - C_f}{f}\right) w \]
where Cf is the less-than cumulative frequency of the class preceding (one before) the median class,
f is the frequency of the median class, LCb is the lower class boundary of the median class, w is
the class width and
\[ N = \sum_{i=1}^{k} f_i. \]
Remark: The median class is the class with the smallest cumulative frequency (less-than type)
greater than or equal to N/2.
To calculate the median in this case, based on the provided cumulative frequencies, the median is the
value of the (N/2)th = (143/2)th = 71.5th item, which lies in the class (1,200-1,400). Thus (1,200-1,400) is the
median class. To determine the median in this class, we use the interpolation formula as follows:
\[ \text{Median} = LC_b + \left(\frac{\frac{N}{2} - C_f}{f_{mc}}\right) w = 1200 + \left(\frac{71.5 - 43}{30}\right) 200 = 1390 \]
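The interpolation step can be sketched as a small function. The numbers are those quoted in the example (N = 143, lower boundary 1200, preceding cumulative frequency 43, class frequency 30, width 200); the function and parameter names are illustrative.

```python
import math

def grouped_median(lower_boundary, n_total, cf_before, f_class, width):
    """Median of grouped data: LCb + ((N/2 - Cf) / f) * w."""
    return lower_boundary + (n_total / 2 - cf_before) / f_class * width

med = grouped_median(lower_boundary=1200, n_total=143,
                     cf_before=43, f_class=30, width=200)
print(round(med, 2))  # 1390.0
```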
\[ \text{Mode} = \hat{X} = l_o + \left(\frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)}\right) w \]
where lo is the lower value of the class in which the mode lies (the modal class), f1 is the frequency
of the modal class, f0 is the frequency of the class preceding the modal class, f2 is the frequency
of the class succeeding the modal class and w is the width of the class interval.
\[ \text{Mode} = 60 + \left(\frac{12 - 8}{(12 - 8) + (12 - 9)}\right) 10 = 60 + \left(\frac{4}{4 + 3}\right) 10 = 65.7 \]
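A minimal sketch of the grouped-data mode formula, evaluated with the figures from the calculation above (lower value 60, f1 = 12, f0 = 8, f2 = 9, width 10). The function name `grouped_mode` is illustrative.

```python
def grouped_mode(lower, f1, f0, f2, width):
    """Mode of grouped data: lo + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * w."""
    return lower + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * width

mode = grouped_mode(lower=60, f1=12, f0=8, f2=9, width=10)
print(round(mode, 1))  # 65.7
```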
Advantages of mode:
The mode is not affected by extreme values in the distribution.
The mode can be calculated for open-ended frequency distributions.
It is the only measure of central tendency that can be used for qualitative data, for
example in describing the opinions of people about a certain phenomenon.
Disadvantages of mode:
The mode is not a rigidly defined measure, as there are several methods for calculating its
value.
It is difficult to locate the modal class in the case of a multi-modal frequency distribution.
The mode is not suitable for algebraic manipulation.
When a data set contains more than one mode, such values are difficult to interpret and
compare.
Measures of location (positional measures): These indicate where a specific data value falls
within a given data set. The most common positional measures include quartiles, deciles and
percentiles.
Quantiles are measures which divide a given set of data into equal subdivisions. They are obtained
by the same procedure as the median, but the data must be arranged in increasing order.
QUARTILES: Quartiles are measures which divide the ordered data into four equal parts by
three points: Q1, the first (lower) quartile, is the value below which 25% of the observations lie;
Q2, the second quartile, is the value below (or above) which 50% of the observations lie; and Q3,
the third (upper) quartile, is the value below which 75% of the arranged items lie (or above which
25% lie).
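For ungrouped data, the ith quartile is the value of the item which is at the
\[ \left(\frac{i(n+1)}{4}\right)^{th} \text{ position, } i = 1, 2, 3, \]
that is, Q1 is the ((n+1)/4)th item, Q2 is the (2(n+1)/4)th item and Q3 is the (3(n+1)/4)th item.
For grouped data or a continuous frequency distribution, quartiles can be obtained by using
\[ Q_i = L_o + \left(\frac{\frac{i\,n}{4} - cf}{f_{Q_i}}\right) w, \quad i = 1, 2, 3 \]
where n = the sum of the frequencies of all classes = ∑ fi, Lo = the lower class boundary of the
ith quartile class, cf = the cumulative frequency of the class before the ith quartile class,
f_Qi = the frequency of the ith quartile class and w is the class width.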
Note: To find the ith quartile class, compute i·n/4 and search for the minimum less-than cumulative
frequency greater than or equal to this value.
DECILES: Deciles are measures which divide a given ordered data set into ten equal parts, each
containing an equal number of elements. There are nine points, denoted D1, D2, D3, …, D9 and
called the first, the second, …, the ninth decile respectively.
For ungrouped data, the ith decile is the value of the item which is at the
\[ \left(\frac{i(n+1)}{10}\right)^{th} \text{ position, } i = 1, 2, \dots, 9. \]
For grouped data or a continuous frequency distribution, deciles can be obtained by using
\[ D_i = L_o + \left(\frac{\frac{i\,n}{10} - cf}{f_{D_i}}\right) w, \quad i = 1, 2, 3, \dots, 9 \]
where n = the sum of the frequencies of all classes = ∑ fi, Lo is the lower class boundary of the
ith decile class, cf is the cumulative frequency of the class before the ith decile class, f_Di is
the frequency of the ith decile class and w is the class width.
Note: To find the ith decile class, compute i·n/10 and search for the minimum less-than cumulative
frequency greater than or equal to this value.
PERCENTILES: Percentiles are measures with 99 points which divide a given ordered data set
into 100 equal parts, each containing an equal number of elements. They are denoted by P1, P2, …, P99
and known as the 1st, 2nd, …, 99th percentiles respectively.
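For ungrouped data, the ith percentile Pi is the value of the item at the
\[ \left(\frac{i(n+1)}{100}\right)^{th} \text{ position, } i = 1, 2, \dots, 99. \]
For grouped data or a continuous frequency distribution, percentiles can be obtained by using
\[ P_i = L_o + \left(\frac{\frac{i\,n}{100} - cf}{f_{P_i}}\right) w, \quad i = 1, 2, 3, \dots, 99. \]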
Note: To find the ith percentile class, compute i·n/100 and search for the minimum less-than
cumulative frequency greater than or equal to this value; the class corresponding to this
cumulative frequency is the ith percentile class.
Class interval: <5   5 – 10   10 – 15   15 – 20   20 – 25   25 – 30   30 – 35   35 – 40
Frequency:      2    5        7         13        21        16        8         3
Compute the median, the mode, the 72nd percentile, the 2nd decile and the 1st quartile.
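A). To find the median class, compute n/2 = 75/2 = 37.5. The smallest less-than cumulative
frequency greater than or equal to 37.5 is 48, so the median class is 20 – 25.
\[ \text{Median} = LC_b + \left(\frac{\frac{n}{2} - C_f}{f_{mc}}\right) w = 20 + \left(\frac{37.5 - 27}{21}\right) 5 = 22.5 \]
Thus 50% of the companies earned an annual profit of 22.5 thousand birr or less.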
Note that, in the above example, the 2nd quartile is equal to the median value of the profit earned
by the 75 companies.
B). The highest frequency in the given data set is 21, so the modal class is 20 – 25.
\[ \text{Mode} = l_o + \left(\frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)}\right) w = 20 + \left(\frac{21 - 13}{(21 - 13) + (21 - 16)}\right) 5 = 23.08 \]
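C). To find the 72nd percentile class, compute 72n/100 = 72 × 75/100 = 54; the 72nd percentile class
is 25 – 30.
\[ P_{72} = 25 + \left(\frac{54 - 48}{16}\right) 5 = 26.875 \]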
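D). To find the 2nd decile class, compute 2n/10 = 2 × 75/10 = 15; the 2nd decile class is 15 – 20.
\[ D_2 = L_o + \left(\frac{\frac{2n}{10} - cf}{f_{D_2}}\right) w = 15 + \left(\frac{15 - 14}{13}\right) 5 = 15.385 \]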
E). To find the 1st quartile class, compute n/4 = 75/4 = 18.75.
\[ Q_1 = L_o + \left(\frac{\frac{n}{4} - cf}{f_{Q_1}}\right) w = 15 + \left(\frac{18.75 - 14}{13}\right) 5 = 16.827 \]
It shows that 25% of the companies earn a profit of 16.827 thousand birr or less annually.
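The grouped-data quartile, decile and percentile formulas share one pattern: locate the class whose less-than cumulative frequency first reaches i·n/parts, then interpolate inside it. A sketch of that pattern on the frequency table above follows. The function name `grouped_quantile` is illustrative, and the lower boundary of 0 for the "<5" class is an assumption (it does not affect the quantiles computed here).

```python
# Lower class boundaries and frequencies from the table above.
# ASSUMPTION: the "<5" class is treated as starting at 0.
lower_bounds = [0, 5, 10, 15, 20, 25, 30, 35]
freqs = [2, 5, 7, 13, 21, 16, 8, 3]
width = 5

def grouped_quantile(i, parts):
    """i-th quantile when the data are divided into `parts` equal parts
    (parts=2 median, 4 quartiles, 10 deciles, 100 percentiles)."""
    n = sum(freqs)
    target = i * n / parts
    cum = 0  # less-than cumulative frequency of the classes before this one
    for lo, f in zip(lower_bounds, freqs):
        if cum + f >= target:  # first class whose cumulative frequency reaches target
            return lo + (target - cum) / f * width
        cum += f
    raise ValueError("target exceeds the total frequency")

median = grouped_quantile(1, 2)    # 22.5
p72 = grouped_quantile(72, 100)    # 26.875
q1 = grouped_quantile(1, 4)        # about 16.827
d2 = grouped_quantile(2, 10)       # second decile, same procedure
```

Using a single function for all three families makes it easy to see that the median is both the 2nd quartile and the 50th percentile.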