
Intro - Stat CH 3


INTRODUCTION TO STATISTICS LECTURE NOTE 2024

CHAPTER THREE

MEASURE OF CENTRAL TENDENCY

In the previous chapter, we saw how useful information can be obtained from raw data by organizing them into a frequency distribution and then presenting the data using various graphs. When we want to compare groups of data sets, it is good to have a single value that is a good representative of each group. That single value, the statistical measure which describes the middle or center of a set of data, is called an average or a measure of central tendency.

3.2 Objective of measure of central tendency

• To determine a single value around which the other values concentrate.
• To facilitate comparison.
• To support further statistical analysis.

Types of measures of central tendency:

The most commonly used measures of central tendency are:

• The Mean (Arithmetic, Geometric and Harmonic)
• The Mode
• The Median
• Quantiles (Quartiles, Deciles and Percentiles)

The choice among these averages depends on which best fits the property under discussion.

Important characteristics of a good average (Measures of Central Tendency)

• It is easy to calculate and understand.
• It is based on all the observations during computation.
• It is rigidly defined.
• It is not affected by extreme values, that is, by a few very small or very large items in the data set.

The Mean

The mean, also known as the arithmetic average, is the sum of the observations divided by the number of observations.

Arithmetic Mean (AM) of Individual Series:

Let X be a variable which takes the values x1, x2, x3, …, xn in a sample of size n drawn from a population of size N (n < N). The A.M. of this set of observations is the sum of all values in the series divided by the number of items in the series.

Their arithmetic mean is

$$\bar{X} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \quad \text{(for raw data)}$$

Arithmetic mean of discrete frequency distribution:

PREPARED BY: ABDULMENAN M. (MSc) Page 1 of 14



In a discrete frequency distribution, the arithmetic mean becomes:

$$\bar{X} = \frac{\sum f_i x_i}{\sum f_i}$$

where $\sum f_i x_i$ is the sum of the products of the observations with their respective frequencies and $\sum f_i = n$ is the sum of the frequencies.

Example: The following table gives the wages paid to 125 workers in a factory. Calculate the
arithmetic mean of the wages.

Wages (in birr):   200   210   220   230   240   250   260
No. of workers:      5    15    32    42    15    12     4

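Solution:

$$\sum f_i x_i = (200 \times 5) + (210 \times 15) + (220 \times 32) + \cdots + (260 \times 4) = 28490$$

$$n = \sum f_i = 5 + 15 + 32 + \cdots + 4 = 125$$

$$\bar{X} = \frac{\sum f_i x_i}{\sum f_i} = \frac{28490}{125} = 227.92 \text{ birr}$$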
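The arithmetic in this example can be checked with a short Python sketch. The variable names are illustrative only; the data are the wages and worker counts from the table.

```python
# Wages (in birr) and number of workers, taken from the example table.
wages = [200, 210, 220, 230, 240, 250, 260]
workers = [5, 15, 32, 42, 15, 12, 4]

# Sum of the frequencies (n) and sum of the products f_i * x_i.
total_workers = sum(workers)
total_wages = sum(x * f for x, f in zip(wages, workers))

# Arithmetic mean of a discrete frequency distribution.
mean_wage = total_wages / total_workers

print(total_workers, total_wages, mean_wage)
```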

Arithmetic Mean of Grouped Data (Continuous frequency distribution):

The simple arithmetic mean for a continuous frequency distribution is given by

$$\bar{X} = \frac{\sum f_i m_i}{\sum f_i}$$

where $m_i$ is the midpoint of the ith class interval.

Example: The following table gives the marks of 58 students in a probability course. Calculate the average mark of this group.

Marks:             0-10   10-20   20-30   30-40   40-50   50-60   60-70
No. of students:      4       8      11      15      12       6       2

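Solution:

Midpoints: $m_i$ = (0+10)/2 = 5, (10+20)/2 = 15, …, (60+70)/2 = 65

$$\sum f_i = 4 + 8 + 11 + \cdots + 2 = 58$$

$$\sum f_i m_i = (5 \times 4) + (15 \times 8) + \cdots + (65 \times 2) = 1940$$

$$\bar{X} = \frac{\sum f_i m_i}{\sum f_i} = \frac{1940}{58} = 33.45 \text{ marks}$$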
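As a rough check of the grouped-data calculation, the following Python sketch computes the class midpoints and the mean. The names `class_limits` and `students` are chosen here for illustration.

```python
# Class limits and frequencies from the marks example.
class_limits = [(0, 10), (10, 20), (20, 30), (30, 40),
                (40, 50), (50, 60), (60, 70)]
students = [4, 8, 11, 15, 12, 6, 2]

# Midpoint (class mark) of each class interval.
midpoints = [(low + high) / 2 for low, high in class_limits]

n = sum(students)                                         # sum of f_i
total = sum(m * f for m, f in zip(midpoints, students))   # sum of f_i * m_i
mean_mark = total / n

print(midpoints, n, total, round(mean_mark, 2))
```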

Merits of arithmetic mean:

• It is easy to calculate and understand.



• All observations are involved in its calculation.
• The mean is used in computing other statistics, such as the variance.
• It is unique: a set of data has only one mean.
• It can be used for further statistical treatment, such as comparison of means and tests of means.

Demerits of arithmetic mean:

• The mean cannot be computed for a frequency distribution with open-ended classes.
• The mean is affected by extremely high or low values, called outliers, and may not be the appropriate average to use in these situations.
• It cannot be computed for qualitative data (intelligence, honesty, beauty), which cannot be measured quantitatively.

Special properties of Arithmetic mean:

1. The sum of the deviations of a set of items from their mean is always zero, i.e. $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$.
2. The effect of transforming the original series on the mean:
   a) If a constant k is added to (or subtracted from) every observation, then the new mean is the old mean plus (or minus) k.
   b) If every observation is multiplied by a constant k, then the new mean is k times the old mean.
3. If $\bar{x}_1$ is the mean of n1 observations, $\bar{x}_2$ is the mean of n2 observations, …, and $\bar{x}_k$ is the mean of nk observations, then the mean of all the observations in all groups, often called the combined mean, is given by:

$$\bar{x}_c = \frac{\bar{x}_1 n_1 + \bar{x}_2 n_2 + \cdots + \bar{x}_k n_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum_{i=1}^{k} \bar{x}_i n_i}{\sum_{i=1}^{k} n_i}$$
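Properties 1 and 2 can be illustrated numerically. The sketch below uses a small hypothetical data set and a hypothetical constant k = 3; neither comes from the notes.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

data = [4.0, 7.0, 10.0, 15.0]   # hypothetical observations
k = 3.0                          # hypothetical constant

old_mean = mean(data)

# Property 1: deviations from the mean sum to zero.
deviation_sum = sum(x - old_mean for x in data)

# Property 2a: adding k to every observation shifts the mean by k.
shifted_mean = mean([x + k for x in data])

# Property 2b: multiplying every observation by k scales the mean by k.
scaled_mean = mean([x * k for x in data])

print(old_mean, deviation_sum, shifted_mean, scaled_mean)
```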

Example: In a class there are 30 females and 70 males. If the females averaged 60 in an examination and the males averaged 72, find the mean for the entire class.

Solution:

         Females             Males
Mean     $\bar{x}_1$ = 60    $\bar{x}_2$ = 72
Size     n1 = 30             n2 = 70

$$\bar{x}_c = \frac{\bar{x}_1 n_1 + \bar{x}_2 n_2}{n_1 + n_2} = \frac{(60 \times 30) + (72 \times 70)}{30 + 70} = 68.40$$
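The combined mean is a size-weighted average of the group means. A minimal sketch, with `combined_mean` as an illustrative helper name:

```python
def combined_mean(means, sizes):
    """Combined mean of several groups from their means and sizes."""
    weighted_total = sum(m * n for m, n in zip(means, sizes))
    return weighted_total / sum(sizes)

# Females: mean 60, n = 30. Males: mean 72, n = 70.
class_mean = combined_mean([60, 72], [30, 70])
print(class_mean)
```

The same function accepts any number of groups, as long as the two lists have the same length.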

Exercise: The mean monthly salary paid to 77 employees in a company was $78. The mean salary
of 32 of them was $45 and of the other 25 was $82. What was the mean salary of the remaining?

B. Weighted Mean:

One of the limitations of the arithmetic mean is that it gives equal importance (weight) to all the
items in the Series.

The weighted mean is the mean of a data set in which the data values do not all have the same relative importance. For example, salaries paid should be weighted according to their relative importance. Weights are assigned to each item in proportion to its relative importance.

Let X1, X2, …, Xn be the values of the items of a series and W1, W2, …, Wn their corresponding weights. Then the weighted mean, denoted $\bar{x}_w$, is defined as:

$$\bar{x}_w = \frac{\sum_{i=1}^{n} X_i W_i}{\sum_{i=1}^{n} W_i}$$

where $\bar{x}_w$ is the weighted mean, $W_i$ are the weights attached to the values of the variable and $X_i$ are the values of the variable.

Example: Suppose a student has secured the following marks in three tests: Mid-term test = 30, Laboratory = 25 and Final exam = 20. The simple arithmetic mean is (30+25+20)/3 = 25. However, this is misleading if the three tests carry different weights on the basis of their relative importance. Assume that the weights assigned to the three tests are 2, 3 and 5 points respectively. On the basis of this information, the weighted mean is

$$\bar{x}_w = \frac{\sum X_i W_i}{\sum W_i} = \frac{W_1 X_1 + W_2 X_2 + W_3 X_3}{W_1 + W_2 + W_3} = \frac{60 + 75 + 100}{2 + 3 + 5} = 23.5 \text{ marks}$$
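The contrast between the simple and the weighted mean in this example can be reproduced with a few lines of Python. The list names are illustrative.

```python
# Marks and weights for mid-term, laboratory and final exam.
marks = [30, 25, 20]
weights = [2, 3, 5]

simple_mean = sum(marks) / len(marks)

# Weighted mean: sum of X_i * W_i divided by sum of W_i.
weighted_mean = sum(x * w for x, w in zip(marks, weights)) / sum(weights)

print(simple_mean, weighted_mean)
```

The weighted mean is lower than the simple mean here because the lowest mark (the final exam) carries the largest weight.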

C. Geometric Mean (G.M):

The geometric mean is the nth root of the product of n positive values. If X1, X2, …, Xn are n positive values, then their geometric mean is

$$G.M = (X_1 X_2 \cdots X_n)^{1/n}$$

When there are more than two observations, extracting the nth root directly can be tedious. In that case the calculation can be simplified by taking common logarithms (base 10) of both sides:

$$G.M = \sqrt[n]{x_1 x_2 \cdots x_n} = (x_1 x_2 \cdots x_n)^{1/n}$$

$$\log(G.M) = \frac{1}{n} \log(x_1 x_2 \cdots x_n) = \frac{1}{n} (\log x_1 + \log x_2 + \cdots + \log x_n) = \frac{1}{n} \sum_{i=1}^{n} \log x_i$$

$$G.M = \text{Antilog}\left(\frac{1}{n} \sum_{i=1}^{n} \log x_i\right)$$


This shows that the logarithm of the G.M is the mean of the logarithms of the individual observations.

The geometric mean is best for reporting average inflation, percentage change over time, ratios, growth rates and positively skewed data (such as income data), because these types of data are expressed as fractions.

Example: The ratios of prices in 1999 to those in 2000 for 4 commodities were 0.9, 1.25, 1.75 and 0.85. Find the average price ratio by means of the geometric mean.

Solution:

$$G.M = \text{Antilog}\left(\frac{\sum \log x_i}{n}\right) = \text{Antilog}\left(\frac{\log 0.9 + \log 1.25 + \log 1.75 + \log 0.85}{4}\right) = 1.14$$
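The logarithm method can be sketched in Python as follows. Taking the antilog of a base-10 logarithm is written here as `10 ** value`; the direct nth-root calculation is included only as a cross-check.

```python
import math

ratios = [0.9, 1.25, 1.75, 0.85]   # price ratios from the example
n = len(ratios)

# Mean of the base-10 logarithms, then the antilog.
mean_log = sum(math.log10(x) for x in ratios) / n
gm_by_logs = 10 ** mean_log

# Cross-check: nth root of the product of the values.
gm_direct = math.prod(ratios) ** (1 / n)

print(round(gm_by_logs, 2), round(gm_direct, 2))
```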

Exercise: Calculate the geometric mean of the annual percentage growth rates of profits of a business corporation from the year 2000 to 2005, given the data 50, 72, 54, 82 and 93.

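Geometric mean for ungrouped and grouped frequency distribution:

For an ungrouped (discrete) frequency distribution, the geometric mean is obtained by

$$G.M = \sqrt[n]{x_1^{f_1} x_2^{f_2} \cdots x_k^{f_k}} = \text{Antilog}\left(\frac{1}{n} \sum_{i=1}^{k} f_i \log x_i\right)$$

where $n = \sum f_i$ and k is the number of distinct values.

For a continuous frequency distribution,

$$G.M = \sqrt[n]{m_1^{f_1} m_2^{f_2} \cdots m_k^{f_k}} = \text{Antilog}\left(\frac{1}{n} \sum_{i=1}^{k} f_i \log m_i\right)$$

where $n = \sum f_i$, k is the number of classes and $m_i$ is the class mark (midpoint) of the ith class.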
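A minimal sketch of the frequency-weighted formula is given below. The function name `geometric_mean_freq` and the small data set are hypothetical; for a continuous distribution the class marks would be passed in place of the values.

```python
import math

def geometric_mean_freq(values, freqs):
    """G.M of a frequency distribution via base-10 logarithms."""
    n = sum(freqs)
    mean_log = sum(f * math.log10(x) for x, f in zip(values, freqs)) / n
    return 10 ** mean_log

# Hypothetical data: 2 occurs once, 4 occurs twice, 8 occurs once.
# Product = 2 * 4**2 * 8 = 256 and n = 4, so the G.M should be 4.
gm = geometric_mean_freq([2, 4, 8], [1, 2, 1])
print(gm)
```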

Properties of geometric mean:

• It is not as easy to calculate as the arithmetic mean.
• It involves all observations during computation.
• The geometric mean can only be found for positive values.
• If any value in the data set is zero, the geometric mean is zero.

D. Harmonic mean (H.M):

The Harmonic mean is the reciprocal of the arithmetic mean of the reciprocal of each single
value. It is calculated by dividing the number of observations by the sum of reciprocal of the
observation. If x1, x2, x3,..xn are n values, then their harmonic mean is


$$H.M = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$

Harmonic mean is used to calculate the average value when the values are expressed as value/unit.
Since the speed is expressed as km/hour, harmonic mean is used for the calculation of average
speed.

Example: Find the harmonic mean of the values 2, 3 and 6.

$$H.M = \frac{3}{\frac{1}{2} + \frac{1}{3} + \frac{1}{6}} = 3$$
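The calculation can be sketched in Python. The second call uses hypothetical speeds (40 km/hr and 60 km/hr over two equal distances) to illustrate the use of the harmonic mean for averaging rates; those figures are not from the notes.

```python
def harmonic_mean(values):
    """Number of observations divided by the sum of their reciprocals."""
    return len(values) / sum(1 / x for x in values)

hm_example = harmonic_mean([2, 3, 6])

# Hypothetical trip: equal distances covered at 40 km/hr and 60 km/hr.
average_speed = harmonic_mean([40, 60])

print(hm_example, average_speed)
```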

The harmonic mean is used to average rates rather than simple values. It is usually
appropriate in averaging kilometers per hour.

Exercise: A man travels from Adama to Hosanna by a car and takes 4 hours to cover the whole
distance. In the first hour he travels at a speed of 50 km/hr, in the second hour his speed is 64
km/hr, in third hour his speed is 80 km/hr and in the fourth hour he travels at the speed of 55 km/hr.
Find the average speed of the motorist.

Harmonic mean for ungrouped and grouped frequency distribution:

For simple frequency data, the harmonic mean is calculated by using the following formula:

$$H.M = \text{Reciprocal of } \frac{\sum \left(\frac{f_i}{x_i}\right)}{n} = \frac{n}{\sum \left(\frac{f_i}{x_i}\right)}$$

where n is the total number of observations.

For a continuous frequency distribution:

$$H.M = \text{Reciprocal of } \frac{\sum \left(\frac{f_i}{m_i}\right)}{n} = \frac{n}{\sum \left(\frac{f_i}{m_i}\right)}$$

where n is the total number of observations and $m_i$ is the class mark of the ith class interval.

Properties of harmonic mean:

• It is based on all observations in a distribution.
• It is used in situations where a small weight is given to larger observations and a larger weight to smaller observations.
• It is difficult to calculate and understand.
• It is an appropriate measure of central tendency in situations where the data are ratios, speeds or rates.

Relationship among A.M, G.M, and H.M:

For any set of positive observations, the A.M, G.M and H.M are related to each other as follows:

A.M ≥ G.M ≥ H.M


Note:

• The sign '=' holds if and only if all the observations are identical.
• If the observations in the data set take the values $a, ar, ar^2, ar^3, \ldots, ar^{n-1}$, each with a frequency of one, then

$$(G.M)^2 = A.M \times H.M$$
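Both statements can be checked numerically. The sketch below uses a hypothetical geometric progression with a = 2, r = 3 and n = 4, which is not taken from the notes.

```python
import math

# Hypothetical geometric progression: a, ar, ar^2, ar^3 with a = 2, r = 3.
values = [2 * 3 ** i for i in range(4)]   # [2, 6, 18, 54]
n = len(values)

am = sum(values) / n
gm = math.prod(values) ** (1 / n)
hm = n / sum(1 / x for x in values)

print(am, gm, hm)
print(gm ** 2, am * hm)
```

For this progression A.M = 20 and H.M = 5.4, so the product A.M × H.M is 108, which matches the square of the G.M up to floating-point rounding.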

Median:

Median for ungrouped data:


The median is defined as the value of the middle item (or the mean of the values of the two middle items) when the data are arranged in ascending or descending order of magnitude. It is the value such that, in a set of observations, 50% of the observations are above it and 50% are below it. Hence the median is a positional average.

$$\text{Median} = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ element, if } n \text{ is odd.}$$

$$\text{Median} = \frac{\left(\frac{n}{2}\right)^{\text{th}} \text{ element} + \left(\frac{n}{2} + 1\right)^{\text{th}} \text{ element}}{2}, \text{ if } n \text{ is even.}$$
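The two cases can be sketched as a small Python function. The sample lists are hypothetical; note that Python lists are 0-based, so the kth ordered element is `ordered[k - 1]`.

```python
def median(values):
    """Median of raw data using the odd/even position rules."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th element.
        return ordered[mid]
    # Even n: mean of the (n / 2)th and (n / 2 + 1)th elements.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_case = median([7, 3, 9, 5, 11])   # hypothetical data, n = 5
even_case = median([7, 3, 9, 5])      # hypothetical data, n = 4
print(odd_case, even_case)
```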

Median for Continuous Frequency Distribution

In the case of a continuous frequency distribution, we first locate the median class by cumulating the frequencies until the $\left(\frac{N}{2}\right)^{\text{th}}$ point is reached. Finally, the median is calculated by the following formula:

$$\text{Median} = LC_b + \left(\frac{\frac{N}{2} - Cf}{f}\right) \times w$$

where Cf is the less than cumulative frequency of the class preceding (one before) the median class, f is the frequency of the median class, $LC_b$ is the lower class boundary of the median class, w is the class width and $N = \sum_{i=1}^{k} f_i$.

Remark: The median class is the class with the smallest cumulative frequency (less than type) greater than or equal to $\frac{N}{2}$.


Example: Calculate median for the following frequency distribution.

Monthly wages (in birr)   No. of workers   LCF
800-1000                  18               18
1000-1200                 25               43
1200-1400                 30               73
1400-1600                 34               107
1600-1800                 26               133
1800-2000                 10               143
Total                     143
Solution:

Based on the cumulative frequencies provided, the median is the value of the $\frac{N}{2} = \frac{143}{2} = 71.5^{\text{th}}$ item, which lies in the class 1,200-1,400. Thus 1,200-1,400 is the median class. To determine the median in this class, we use the interpolation formula as follows:

$$\text{Median} = LC_b + \left(\frac{\frac{N}{2} - Cf}{f_{mc}}\right) \times w = 1200 + \frac{71.5 - 43}{30} \times 200 = 1390 \text{ birr}$$

Advantage of median:

• The value of the median is easy to understand and may be calculated for any type of data.
• Extreme values in the data set do not affect the calculation of the median.
• The median may be calculated for an open-ended distribution.
• It is unique: like the mean, there is only one median for a given set of data.

Disadvantage of median:

• The value of the median is affected more by sampling variations, that is, it is affected by the number of observations rather than the values of the observations.
• Since the median is a positional average, arranging the data in ascending or descending order of magnitude is time consuming when the number of observations is large.

Mode ($\hat{X}$): The mode is another measure of central tendency. The mode is the value that occurs most frequently in the data set.

Note: In the case of a discrete frequency distribution or raw data, the mode is the value of the variable corresponding to the maximum frequency. This method can be used conveniently if there is only one value with the highest concentration of observations.
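For raw data, finding the mode amounts to counting how often each value occurs. The sketch below uses Python's `collections.Counter` and hypothetical data; it returns every value that shares the maximum frequency, so a list with more than one entry signals that the data set has several modes.

```python
from collections import Counter

def modes(values):
    """All values that occur with the maximum frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

single_mode = modes([2, 3, 3, 5, 3, 7, 5])   # hypothetical raw data
two_modes = modes([1, 1, 2, 2, 3])           # hypothetical bimodal data
print(single_mode, two_modes)
```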


In the case of grouped data, the mode is determined by the following formula:

$$\text{Mode} = \hat{X} = l_o + \left(\frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)}\right) \times w$$

where $l_o$ is the lower limit of the modal class (the class in which the mode lies), $f_1$ is the frequency of the modal class, $f_0$ is the frequency of the class preceding the modal class, $f_2$ is the frequency of the class succeeding the modal class and w is the class width.

Example: Let us take the following frequency distribution:

Class intervals:   30-40   40-50   50-60   60-70   70-80   80-90   90-100
Frequency:             4       6       8      12       9       7        4

Solution: We can see from the frequency row of the table that the maximum frequency, 12, lies in the class interval 60-70, so this is the modal class. Applying the formula given earlier, we get:

$$\text{Mode} = 60 + \frac{12 - 8}{(12 - 8) + (12 - 9)} \times 10 = 60 + \frac{4}{4 + 3} \times 10 = 65.7$$

Advantage of mode:

• The mode is not affected by extreme values in the distribution.
• The mode can be calculated for an open-ended frequency distribution.
• It is the only measure of central tendency that can be used for qualitative data, for example in describing the opinion of people about a certain phenomenon.

Disadvantage of mode:

• The mode is not a rigidly defined measure, as there are several methods for calculating its value.
• It is difficult to locate the modal class in the case of a multi-modal frequency distribution.
• The mode is not suitable for algebraic manipulation.
• When a data set contains more than one mode, such values are difficult to interpret and compare.


Measure of location (positional measures): They indicate where a specific data value falls within the given data set. The most common positional measures include quartiles, deciles and percentiles.

Quantiles are measures which divide a given set of data into equal subdivisions. They are obtained by the same procedure as the median, but the data must be arranged in increasing order.

• Quantile is the collective name for quartiles, deciles and percentiles.

QUARTILES: Quartiles are measures which divide the ordered data into four equal parts by three points: Q1, the first (lower) quartile, is the value below which 25% of the observations lie; Q2, the second quartile, is the value below (or above) which 50% of the observations lie; and Q3, the third (upper) quartile, is the value below which 75% of the arranged items lie (or above which 25% lie).

For ungrouped data, the ith quartile is the value of the item at the $i \times \left(\frac{n+1}{4}\right)^{\text{th}}$ position, i.e.

$$Q_i = \text{value of the } i \times \left(\frac{n+1}{4}\right)^{\text{th}} \text{ item, where } i = 1, 2, 3$$

• Q1 is the value of the $\left(\frac{n+1}{4}\right)^{\text{th}}$ ordered observation.
• Q2 is the value of the $2\left(\frac{n+1}{4}\right)^{\text{th}}$ ordered observation.
• Q3 is the value of the $3\left(\frac{n+1}{4}\right)^{\text{th}}$ ordered observation.

In the case of a continuous frequency distribution, quartiles are obtained by applying the formula

$$Q_i = L_o + \left(\frac{\frac{i n}{4} - cf}{f_{Q_i}}\right) \times w, \quad i = 1, 2, 3$$

where $n = \sum f_i$ is the sum of the frequencies of all classes, $L_o$ is the lower class boundary of the ith quartile class, cf is the cumulative frequency of the class before the ith quartile class, $f_{Q_i}$ is the frequency of the ith quartile class and w is the class width.

Note: To find the ith quartile class, compute $\frac{i n}{4}$ and search for the minimum less than cumulative frequency greater than or equal to this value.

DECILES: Deciles are measures which divide a given ordered data set into ten equal parts, each part containing an equal number of elements. There are nine points, denoted by D1, D2, D3, …, D9 and called the first, the second, …, the ninth decile respectively.

For ungrouped data, the ith decile is the value of the item at the $i \times \left(\frac{n+1}{10}\right)^{\text{th}}$ position, i.e.

$$D_i = \text{value of the } i \times \left(\frac{n+1}{10}\right)^{\text{th}} \text{ item, where } i = 1, 2, 3, \ldots, 9$$

For grouped data or a continuous frequency distribution, deciles can be obtained by using

$$D_i = L_o + \left(\frac{\frac{i n}{10} - cf}{f_{D_i}}\right) \times w, \quad i = 1, 2, 3, \ldots, 9$$

where $n = \sum f_i$ is the sum of the frequencies of all classes, $L_o$ is the lower class boundary of the ith decile class, cf is the cumulative frequency of the class before the ith decile class, $f_{D_i}$ is the frequency of the ith decile class and w is the class width.

Note: To find the ith decile class, compute $\frac{i n}{10}$ and search for the minimum less than cumulative frequency greater than or equal to this value.

PERCENTILES: Percentiles are measures having 99 points which divide a given ordered data set into 100 equal parts, each part containing an equal number of elements. They are denoted by P1, P2, …, P99 and known as the 1st, 2nd, …, 99th percentiles respectively.

For ungrouped data, the ith percentile is the value of the item at the $i \times \left(\frac{n+1}{100}\right)^{\text{th}}$ position, i.e.

$$P_i = \text{value of the } i \times \left(\frac{n+1}{100}\right)^{\text{th}} \text{ item, where } i = 1, 2, 3, \ldots, 99$$

• P1 is the value of the $\left(\frac{n+1}{100}\right)^{\text{th}}$ ordered observation.
• P2 is the value of the $2\left(\frac{n+1}{100}\right)^{\text{th}}$ ordered observation.
• P99 is the value of the $99\left(\frac{n+1}{100}\right)^{\text{th}}$ ordered observation.

For grouped (continuous) data, percentiles can be obtained by using

$$P_i = L_o + \left(\frac{\frac{i n}{100} - cf}{f_{P_i}}\right) \times w, \quad i = 1, 2, 3, \ldots, 99$$

Note: To find the ith percentile class, compute $\frac{i n}{100}$ and search for the minimum less than cumulative frequency greater than or equal to this value. The class corresponding to this cumulative frequency is the ith percentile class.
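For ungrouped data, the position rules for quartiles, deciles and percentiles share one pattern: the ith point sits at position i(n+1)/m, with m = 4, 10 or 100. The sketch below is one possible implementation. The notes do not say what to do when the position is not a whole number, so linear interpolation between the two neighbouring ordered values is an assumption made here, as are the function name and the sample data.

```python
def positional_value(values, i, parts):
    """Value at position i * (n + 1) / parts in the ordered data (1-based)."""
    ordered = sorted(values)
    n = len(ordered)
    position = i * (n + 1) / parts
    whole = int(position)
    fraction = position - whole
    # Positions outside 1..n are clamped to the smallest or largest value.
    if whole < 1:
        return ordered[0]
    if whole >= n:
        return ordered[-1]
    below, above = ordered[whole - 1], ordered[whole]
    # Assumption: interpolate linearly for fractional positions.
    return below + fraction * (above - below)

data = [14, 2, 10, 6, 4, 12, 8, 16, 18]   # hypothetical data, n = 9

q1 = positional_value(data, 1, 4)       # position 2.5
q2 = positional_value(data, 2, 4)       # position 5
d5 = positional_value(data, 5, 10)      # position 5
p90 = positional_value(data, 90, 100)   # position 9
print(q1, q2, d5, p90)
```

Here Q2 and D5 fall on the same position, which is also the position of the median.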

Example: The following frequency distribution shows the profit (in thousands of birr) earned by 75 companies during 2003 – 2004.

Class interval:   <5   5 – 10   10 – 15   15 – 20   20 – 25   25 – 30   30 – 35   35 – 40
Frequency:         2        5         7        13        21        16         8         3

Compute:

A) The median, and verify that it is equal to Q2.
B) The mode.
C) The 72nd percentile.
D) The second decile.
E) The value above which 75% of the profits earned by the companies lie.

Solution: A). The value of median can be calculated by using formula


$$\text{Median} = LC_b + \left(\frac{\frac{n}{2} - Cf}{f_{mc}}\right) \times w$$

To find the median class, compute $\frac{n}{2} = \frac{75}{2} = 37.5$.

The median class is 20 - 25, so $LC_b$ = 20, Cf = 27, w = 5 and $f_{mc}$ = 21. Then

$$\text{Median} = 20 + \frac{37.5 - 27}{21} \times 5 = 22.5$$

Thus 50% of the companies earned an annual profit of 22.5 thousand birr or less.

Note that the 2nd quartile uses $\frac{2n}{4} = 37.5$ in the quartile formula, which selects the same class and gives Q2 = 22.5, equal to the median profit earned by the 75 companies.

B) The highest frequency in the data set is 21, so the modal class is 20-25.

Here $l_o$ = 20, $f_0$ = 13, $f_1$ = 21, $f_2$ = 16 and w = 5.

$$\text{Mode} = l_o + \left(\frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)}\right) \times w = 20 + \frac{21 - 13}{(21 - 13) + (21 - 16)} \times 5 = 23.08$$

C) The 72nd percentile of the profit earned by the 75 companies is computed as follows:

$$P_{72} = L_o + \left(\frac{\frac{72 n}{100} - cf}{f_{P_{72}}}\right) \times w$$

To find the 72nd percentile class, compute $\frac{72 n}{100} = \frac{72 \times 75}{100} = 54$. The class that contains P72 is 25 – 30, so $L_o$ = 25, cf = 48 and $f_{P_{72}}$ = 16.

$$P_{72} = 25 + \frac{54 - 48}{16} \times 5 = 26.875$$

It shows that 72% of the companies earn a profit of 26.875 thousand birr or less.

D) The second decile of the profit earned by the 75 companies is computed as follows:

$$D_2 = L_o + \left(\frac{\frac{2 n}{10} - cf}{f_{D_2}}\right) \times w$$

To find the 2nd decile class, compute $\frac{2 n}{10} = \frac{2 \times 75}{10} = 15$.

The class for D2 is 15 – 20, so $L_o$ = 15, cf = 14 and $f_{D_2}$ = 13.


$$D_2 = 15 + \frac{15 - 14}{13} \times 5 = 15.385$$

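E) The value above which 75% of the profits lie is the first quartile. To find the 1st quartile class, compute $\frac{n}{4} = \frac{75}{4} = 18.75$.

Then Q1 lies in the class 15-20, so $L_o$ = 15, cf = 14, $f_{Q_1}$ = 13 and w = 5.

The value of Q1 is computed by

$$Q_1 = L_o + \left(\frac{\frac{n}{4} - cf}{f_{Q_1}}\right) \times w = 15 + \frac{18.75 - 14}{13} \times 5 = 16.827$$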

It shows that only 25% of the companies earn a profit of 16.827 thousand birr or less annually, so 75% of them earn more than this value.
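The median, quartile, decile and percentile formulas for grouped data differ only in the divisor (2, 4, 10 or 100), so the worked example can be checked with one function. This is a sketch under stated assumptions: the function name `grouped_quantile` is illustrative, and the open class "<5" is treated as 0 – 5 so that every class has width 5.

```python
from itertools import accumulate

def grouped_quantile(lower_bounds, freqs, width, i, parts):
    """Lo + ((i * n / parts - cf) / f) * w for the class containing the point."""
    n = sum(freqs)
    target = i * n / parts
    cumulative = list(accumulate(freqs))   # less than cumulative frequencies
    # First class whose cumulative frequency reaches the target.
    k = next(idx for idx, cf in enumerate(cumulative) if cf >= target)
    cf_before = cumulative[k - 1] if k > 0 else 0
    return lower_bounds[k] + (target - cf_before) / freqs[k] * width

# Profit classes; "<5" is assumed to mean 0 - 5.
lower_bounds = [0, 5, 10, 15, 20, 25, 30, 35]
freqs = [2, 5, 7, 13, 21, 16, 8, 3]

median = grouped_quantile(lower_bounds, freqs, 5, 1, 2)
q2 = grouped_quantile(lower_bounds, freqs, 5, 2, 4)
p72 = grouped_quantile(lower_bounds, freqs, 5, 72, 100)
d2 = grouped_quantile(lower_bounds, freqs, 5, 2, 10)
q1 = grouped_quantile(lower_bounds, freqs, 5, 1, 4)

print(median, q2, p72, round(d2, 3), round(q1, 3))
```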
