
Stat Chapter 3

Uploaded by

abd2369ked

Unit 3 Tujuba A.

Central Tendency & Dispersion

UNIT 3

3. Measures of Central Tendency and Measures of Dispersion

Measures of central tendency: When two or more data sets are to be compared, the data usually have
to be condensed. For comparison, however, condensing a data set into a frequency distribution and
a visual presentation is not enough. It is then necessary to summarize the data set in a single
value. Such a value usually lies somewhere in the center and represents the entire data set, and
hence it is called a measure of central tendency or an average.

A measure of central tendency is a single value that describes the characteristics of the
entire mass of data. It gives information about the location of the center of the distribution of
data values.

Central tendency refers to the measures used to determine the center of a distribution of data. It
is used to identify a single value that best represents an entire data set. The most common
measures of central tendency are the mean, the median, and the mode. Each of these measures
locates the central point using a different method. The choice of measure depends on the type of
statistical data used.

Measures of central tendency are important:

• To determine a single value around which the other values in the data concentrate
• To facilitate comparison among sets of data
• To summarize or reduce the size of the data

Central Tendency includes

1. Mean (Average)
Averages are statistical constants that enable us to comprehend in a single value the significance
of the whole. An average gives us an idea about the concentration of the values in the central part
of the distribution. Loosely speaking, an average of a statistical series is the value of the
variable that is representative of the entire distribution. Average refers to the central value of
the given statistical data. The main purposes of the mean are the following:

• The main purpose of the average is to give a bird's-eye view (summary) of the
statistical data.

1 of 24 Statistics for Management I July 2021


• The average removes all the unnecessary details of the data and gives a concise picture of the
large body of data under investigation.
• The average is also of great use for comparison (i.e., the comparison of two or more groups in
which the units of the variables are the same) and for further analysis of the data.
• Averages are very useful for computing various other statistical measures such as
dispersion, skewness, and kurtosis.

Prerequisites (desirable qualities) of a good average: An average is considered good if:
• It utilizes all the values given in the data
• It is not much affected by extreme values
• It can be calculated in almost all cases
• It can be used in further statistical analysis of the data
• It avoids giving misleading results
• It is rigidly defined (unique)
• It is based on all observations under investigation
• It is easily understood and simple to compute
• It is suitable for further mathematical treatment and is mathematically defined
• It is little affected by fluctuations of sampling and not highly affected by extreme values

Average or Mean can be classified/categorized as follows


i. Arithmetic Mean
ii. Weighted Arithmetic Mean
iii. Combined mean
iv. Geometric Mean
v. Harmonic Mean

i. Arithmetic Mean
Arithmetic mean: the sum of the measurements of the items divided by the total number of items.
Equivalently, it is the number obtained by adding the values of all the items of a series and
dividing the total by the number of items. It is usually denoted by $\bar{x}$.

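Arithmetic Mean for individual series

Suppose $x_1, x_2, \dots, x_n$ are observed values in a sample of size n from a population of size N,
n < N. Then the arithmetic mean of the sample, denoted by $\bar{x}$, is given by

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

If we take the entire population, the mean is denoted by $\mu$ and is given by

$$\mu = \frac{x_1 + x_2 + \dots + x_N}{N} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

where N stands for the total number of observations in the population.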

Example: There are six classrooms in Future Generation Hope Kindergarten School of Ambo
town. The class sizes of these classrooms are 26, 20, 25, 18, 20, and 23. A researcher
writing a report about schools in her town wants to come up with a figure to describe the typical
kindergarten class size in this school.
$$\bar{x} = \frac{26 + 20 + 25 + 18 + 20 + 23}{6} = \frac{132}{6} = 22$$
Therefore, the average kindergarten class size in this school is 22.
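
The class-size example can be checked with a few lines of Python. This is only a minimal sketch; the function name `arithmetic_mean` and the variable names are illustrative and do not come from the course text.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)

# Class sizes of the six classrooms in the example.
class_sizes = [26, 20, 25, 18, 20, 23]

mean_class_size = arithmetic_mean(class_sizes)
print(mean_class_size)  # 132 / 6 = 22.0
```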

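Arithmetic mean for discrete data arranged in a frequency distribution

When the numbers $x_1, x_2, \dots, x_m$ occur with frequencies $f_1, f_2, \dots, f_m$ respectively, the
mean can be expressed in a more compact form as:

$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_m x_m}{f_1 + f_2 + \dots + f_m} = \frac{1}{n}\sum_{i=1}^{m} f_i x_i, \qquad n = \sum_{i=1}^{m} f_i$$

Example: Calculate the arithmetic mean of the pulse rates (beats per minute) of eleven students:
60 60 71 68 71 72 71 76 72 80 80

$$\bar{x} = \frac{\sum x_i}{n} = \frac{60 + 60 + 71 + 68 + 71 + 72 + 71 + 76 + 72 + 80 + 80}{11} = \frac{781}{11} = 71$$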

In this case, there are two 60's, one 68, three 71's, two 72's, one 76, and two 80's. The number
of times each number occurs is called its frequency, and the frequency is usually denoted by f.
The information in the sentence above can be written in a table, as follows.

Value, x_i       60   68   71   72   76   80   Total
Frequency, f_i    2    1    3    2    1    2      11

The formula for the arithmetic mean for data of this type is

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{1}{n}\sum_{i=1}^{m} f_i x_i$$
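
The frequency-table form of the mean can be verified against the pulse-rate data with a short Python sketch. The function name `frequency_mean` is an illustrative choice, not something defined in the text.

```python
def frequency_mean(values, freqs):
    """Mean of discrete data given as (value, frequency) pairs: sum(f*x) / sum(f)."""
    n = sum(freqs)
    return sum(f * x for x, f in zip(values, freqs)) / n

# Pulse-rate table from the example: distinct values and how often each occurs.
pulse_values = [60, 68, 71, 72, 76, 80]
pulse_freqs = [2, 1, 3, 2, 1, 2]

pulse_mean = frequency_mean(pulse_values, pulse_freqs)
print(pulse_mean)  # 781 / 11 = 71.0
```

The result matches the value obtained from the raw list of eleven observations, as it should, because the table is only a compact way of writing the same data.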

Arithmetic Mean for Grouped Continuous Frequency Distribution

If data are given in the form of a continuous frequency distribution, the sample mean can be
computed as

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{k} f_i x_i$$

where $x_i$ is the class mark of the ith class, i = 1, 2, ..., k,
$f_i$ is the frequency of the ith class, and k is the number of classes.

Note that $\sum_{i=1}^{k} f_i = n$ = the total number of observations.
Example 3.5

The following frequency table gives the height (in inches) of 100 students in a college.
Calculate the mean.

Class boundary    60-62   62-64   64-66   66-68   68-70   70-72   Total
Frequency (f_i)       5      18      42      20       8       7     100
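
The text does not work this example out, so the following Python sketch shows one way to do it under the grouped-data formula above: each class is represented by its class mark (the midpoint of its boundaries). The names `grouped_mean` and `height_classes` are illustrative.

```python
def grouped_mean(class_boundaries, freqs):
    """Mean of a grouped distribution, using class marks (midpoints) as x_i."""
    class_marks = [(lower + upper) / 2 for lower, upper in class_boundaries]
    n = sum(freqs)
    return sum(f * x for x, f in zip(class_marks, freqs)) / n

# Example 3.5: heights (in inches) of 100 students.
height_classes = [(60, 62), (62, 64), (64, 66), (66, 68), (68, 70), (70, 72)]
height_freqs = [5, 18, 42, 20, 8, 7]

height_mean = grouped_mean(height_classes, height_freqs)
print(round(height_mean, 2))  # sum(f*x) = 6558, n = 100, so about 65.58 inches
```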

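Properties of the Arithmetic Mean

• The algebraic sum of the deviations of a set of numbers $x_1, x_2, \dots, x_n$ from their mean
$\bar{x}$ is always zero, i.e. $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$.

• The sum of squares of deviations from the mean is the least, i.e.
$\sum (x_i - \bar{x})^2 \le \sum (x_i - a)^2$ for any number a.

• If the mean of $x_1, x_2, \dots, x_n$ is $\bar{x}$, then
a) The mean of $x_1 \pm k, x_2 \pm k, \dots, x_n \pm k$ will be $\bar{x} \pm k$.
b) The mean of $k x_1, k x_2, \dots, k x_n$ will be $k\bar{x}$.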
Merits of Arithmetic Mean
• The arithmetic mean has a rigidly defined mathematical formula, so its value is always
definite and unique. It can be calculated for any set of numerical data.
• It is calculated based on all observations.
• It is simple to calculate and easy to understand.
• It doesn't need the arrangement of data in increasing or decreasing order.
• The arithmetic means of many samples from the same population do not fluctuate
considerably.
• It affords a good standard of comparison.
Demerits of Arithmetic Mean
• It can't be calculated for data that are not quantifiable.
• It is highly affected by extreme (abnormal) values in the series.
• It can be a number that does not exist in the series.
• It can't be calculated for grouped continuous data with open-ended classes.

ii. Weighted Arithmetic Mean


In calculating the simple arithmetic mean, all items are assumed to be of equal importance
(each value in the data set has equal weight). When the observations have different weights, we
use a weighted average. Weights are assigned to each item in proportion to its relative
importance.
If $x_1, x_2, \dots, x_n$ represent the values of the items and $w_1, w_2, \dots, w_n$ are the
corresponding weights, then the weighted mean $\bar{x}_w$ is given by

$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum w_i x_i}{\sum w_i}$$

Example: In 2002/03, the average salaries of elementary school teachers in three cities were Birr
24,000, 20,000, and 30,000. There were 600, 400, and 800 elementary school teachers in these
cities, respectively. Find the weighted average salary of all the elementary school teachers in
the three cities.

Example: A student’s final mark in Mathematics, Physics, Chemistry, and Biology are
respectively A, B, D, and C. If the respective credits received for these courses are 4, 4, 3, and 2,
determine the approximate average mark the student has got for the course.
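
The salary example is not solved in the text. The sketch below applies the weighted-mean definition to it, taking the number of teachers in each city as the weight; the function and variable names are illustrative.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w*x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Average salary (Birr) in each city, weighted by the number of teachers there.
city_salaries = [24_000, 20_000, 30_000]
city_teachers = [600, 400, 800]

avg_salary = weighted_mean(city_salaries, city_teachers)
print(round(avg_salary, 2))  # 46,400,000 / 1,800, about 25777.78 Birr
```

Note that the unweighted mean of the three city averages would be about 24,666.67 Birr, which understates the overall average because the city with the highest salary also has the most teachers.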

iii. Combined Mean


Combined Mean: When a set of observations is divided into k groups and $\bar{x}_1$ is the mean of
the $n_1$ observations of group 1, $\bar{x}_2$ is the mean of the $n_2$ observations of group 2, ...,
$\bar{x}_k$ is the mean of the $n_k$ observations of group k, then the combined mean, denoted by
$\bar{x}_c$, of all observations taken together is given by

$$\bar{x}_c = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \dots + n_k\bar{x}_k}{n_1 + n_2 + \dots + n_k}$$

This is a special case of the weighted mean. In this case, the sample sizes are the weights.
Example: In the previous year there were two sections taking the Statistics course. At the end of
the semester, the two sections got average marks of 70 and 78. There were 45 and 50 students in
the sections, respectively. Find the mean mark for all the students.
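
As a rough check of this example, which the text leaves unsolved, the combined mean can be computed directly from the group means and group sizes. The names below are illustrative.

```python
def combined_mean(group_means, group_sizes):
    """Combined mean of k groups: sum(n_i * mean_i) / sum(n_i)."""
    total = sum(n * m for m, n in zip(group_means, group_sizes))
    return total / sum(group_sizes)

# Two Statistics sections: average marks and number of students.
section_means = [70, 78]
section_sizes = [45, 50]

overall_mean = combined_mean(section_means, section_sizes)
print(round(overall_mean, 2))  # (3150 + 3900) / 95, about 74.21
```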


iv. Geometric Mean


The geometric mean, like the arithmetic mean, is a calculated average. It is used when observed
values are measured as ratios, percentages, proportions, indices, or growth rates.
The geometric mean for individual series: The geometric mean, G.M., of an individual series
of positive numbers $x_1, x_2, \dots, x_n$ is defined as the nth root of their product.

$$G.M. = \sqrt[n]{x_1 \cdot x_2 \cdots x_n} = \text{antilog}\left(\frac{1}{n}\sum_{i=1}^{n} \log x_i\right)$$

Example: Find the G.M. of a) 3 and 12 b) 2, 4 and 8
Solution: a) $G.M. = \sqrt{3 \times 12} = \sqrt{36} = 6$; b) $G.M. = \sqrt[3]{2 \times 4 \times 8} = \sqrt[3]{64} = 4$
Geometric mean for discrete data arranged in a frequency distribution: When the numbers
$x_1, x_2, \dots, x_m$ occur with frequencies $f_1, f_2, \dots, f_m$ respectively, the geometric mean is
obtained by

$$G.M. = \sqrt[n]{x_1^{f_1} \cdot x_2^{f_2} \cdots x_m^{f_m}} = \text{antilog}\left(\frac{1}{n}\sum_{i=1}^{m} f_i \log x_i\right), \qquad n = \sum_{i=1}^{m} f_i$$

Example: Compute the geometric mean of the following values: 3, 3, 4, 4, 4, 5, 6 and 6.
Solution

Values       3   4   5   6
Frequency    2   3   1   2

$$G.M. = \sqrt[8]{3^2 \times 4^3 \times 5^1 \times 6^2} = \sqrt[8]{103680} = 4.236$$

The geometric mean for the given data is 4.236.
The geometric mean for continuous grouped frequency distributions: The above formula can also be
used when the frequency distribution is grouped and continuous. In that case the class marks of
the class intervals are taken as $x_i$.
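
The log (antilog) form of the geometric mean is convenient to compute, since it avoids multiplying many numbers together. The following Python sketch uses natural logarithms, which is an implementation choice (any base gives the same result); the function name is illustrative.

```python
import math

def geometric_mean(values, freqs=None):
    """Geometric mean via logs: exp((1/n) * sum(f_i * ln x_i)). Values must be positive."""
    if freqs is None:
        freqs = [1] * len(values)          # individual series: every frequency is 1
    n = sum(freqs)
    log_sum = sum(f * math.log(x) for x, f in zip(values, freqs))
    return math.exp(log_sum / n)

gm_a = geometric_mean([3, 12])                      # sqrt(36) = 6
gm_b = geometric_mean([2, 4, 8])                    # cube root of 64 = 4
gm_table = geometric_mean([3, 4, 5, 6], [2, 3, 1, 2])

print(round(gm_a, 3), round(gm_b, 3), round(gm_table, 3))  # 6.0 4.0 4.236
```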
Properties of geometric mean
• It is less affected by extreme values.
• It takes each and every observation into consideration.
• If the value of one observation is zero, the geometric mean becomes zero.

v. Harmonic Mean
It is a suitable measure of central tendency when the data pertain to speed, rate and time. The
harmonic mean of n values is defined as n divided by the sum of their reciprocals.
Harmonic mean for individual series: If $x_1, x_2, \dots, x_n$ are n observations, then the harmonic
mean is given by the following formula:

$$H.M. = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}}$$
Example: A car travels 25 miles at 25 mph, 25 miles at 50 mph, and 25 miles at 75 mph. Find
the harmonic mean of the three velocities.
Solution

$$H.M. = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3}} = \frac{3}{\frac{1}{25} + \frac{1}{50} + \frac{1}{75}} = \frac{3}{\frac{11}{150}} = \frac{450}{11} = 40.9$$
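
The car example can be reproduced in Python, and it also shows why the harmonic mean is the right average for speeds over equal distances: it equals total distance divided by total time. The function name below is illustrative.

```python
def harmonic_mean(values):
    """n divided by the sum of the reciprocals of the values."""
    return len(values) / sum(1 / x for x in values)

speeds_mph = [25, 50, 75]          # each speed is held for the same distance (25 miles)
hm_speed = harmonic_mean(speeds_mph)

# Cross-check: average speed = total distance / total time.
total_distance = 25 * len(speeds_mph)
total_time = sum(25 / v for v in speeds_mph)
avg_speed = total_distance / total_time

print(round(hm_speed, 1), round(avg_speed, 1))  # 40.9 40.9
```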
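Harmonic mean for discrete data arranged in a frequency distribution: If the data are arranged in
the form of a frequency distribution, then

$$H.M. = \frac{n}{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \dots + \frac{f_m}{x_m}}, \qquad \text{where } n = \sum_{k=1}^{m} f_k$$

Harmonic mean for continuous grouped frequency distributions: When the frequency distribution is
grouped and continuous, the class marks of the class intervals are taken as $x_i$ and the above
formula can be used as

$$H.M. = \frac{n}{\sum_{i=1}^{m} \frac{f_i}{x_i}}, \qquad \text{where } n = \sum_{i=1}^{m} f_i$$

and $x_i$ is the class mark of the ith class.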


Properties of the harmonic mean
• It is unique for a given set of data.
• It considers every observation.
• It is difficult to calculate and understand.
• It is the appropriate measure of central tendency in situations where the data are ratios,
speeds or rates.

Relations among different means


i. If all the observations are positive, the relationship among the three means is
given as: $\bar{x} \ge GM \ge HM$
ii. For two observations, $GM = \sqrt{\bar{x} \cdot HM}$
iii. $\bar{x} = GM = HM$ if all observations are positive and have equal value.
Uses of Averages in Different Situations

• A.M. is an appropriate average for all situations where there are no extreme values in
the data.
• G.M. is an appropriate average for calculating the average percent increase in sales,
population, production, etc. It is one of the best averages for the construction of index
numbers.
• H.M. is an appropriate average for calculating the average rate of increase of profits of a
firm, the average speed of a journey, or the average price at which articles are sold.


2. MEDIAN
The median is the midpoint of the values after they have been ordered from the smallest to the
largest. Equivalently, the median is a number that divides the data set into two equal parts: each
item in one part is no more than this number, and each item in the other part is no less than this
number.

The median is a positional average: the number of items below it is equal to the number of items
above it, so it occupies the central position. If the values are arranged in ascending or
descending order of magnitude, the median is the middle value when the number of values is odd,
and the average of the two middle values when the number of values is even.

Because it separates the higher half of a data sample from the lower half, the median is an
appropriate average for highly skewed distributions, e.g. the distribution of wages or incomes.

For example, in the data set [1, 2, 3, 6, 7, 8, 9], the median is 6, which is both the fourth
largest and the fourth smallest number in the sample. When the data set has an even number of
values, the two middle values are added and the sum is divided by 2. For example, in the data set
[1, 2, 3, 5, 6, 7, 8, 9], the median is (5 + 6)/2 = 5.5. To determine the median, the numbers must
first be sorted in order of magnitude. The median is less affected by outliers and skewed data
than the mean, which makes it a better measure of central tendency for such data.
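
The odd and even cases described above can be written as a small Python function. This is a plain sketch of the rule as stated in the text; the function name is illustrative.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)           # the data must be ordered first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]            # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: mean of the two middle values

print(median([1, 2, 3, 6, 7, 8, 9]))       # 6
print(median([1, 2, 3, 5, 6, 7, 8, 9]))    # 5.5
```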

Median for continuous grouped frequency distribution

In the case of a continuous frequency distribution, the class corresponding to the cumulative
frequency just greater than N/2 is called the median class, and the value of the median is
obtained by the following formula:

$$\text{Median} = L_m + \left(\frac{\frac{N}{2} - f_b}{f}\right) h$$

where $L_m$ is the lower limit of the median class,
f is the frequency of the median class,
h is the magnitude (width) of the median class,
$f_b$ is the cumulative frequency of the class preceding the median class, and $N = \sum f_i$.

Example: Find the median wage of the following distribution.

Wages (in Birr)   2000-3000   3000-4000   4000-5000   5000-6000   6000-7000
No. of workers            3           5          20          10           5
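
The text gives no solution for the wage example. The following Python sketch applies the grouped-median formula to it, locating the median class as the first class whose cumulative frequency exceeds N/2. The function and variable names are illustrative.

```python
def grouped_median(lower_limits, width, freqs):
    """Median of a continuous frequency distribution with equal class widths."""
    n = sum(freqs)
    half = n / 2
    cf_before = 0                              # cumulative frequency before the current class
    for lower, f in zip(lower_limits, freqs):
        if cf_before + f > half:               # first class whose cum. freq. exceeds N/2
            return lower + (half - cf_before) * width / f
        cf_before += f
    raise ValueError("empty frequency distribution")

wage_lower_limits = [2000, 3000, 4000, 5000, 6000]
worker_counts = [3, 5, 20, 10, 5]

median_wage = grouped_median(wage_lower_limits, 1000, worker_counts)
print(median_wage)  # N = 43, median class 4000-5000: 4000 + (21.5 - 8) * 1000 / 20 = 4675.0
```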
Merits of Median
• It is simple and easy to compute and understand
• Median is not influenced by extreme values because it is a positional average
• Median can be calculated for distributions with open-ended intervals
• Median can be located even if the data are incomplete
• Median can be located even for qualitative factors such as ability and honesty
• It is capable of further algebraic treatment
• It can be determined by inspection for arrayed data
• It can also be found graphically
• It indicates the value of the middle item

Demerits of Median
• It may not be a representative value, as it ignores extreme values
• It can't be determined precisely when it falls between two values
• A slight change in the series may bring a drastic change in the median value
• In the case of an even number of observations or a continuous series, the median is an
estimated value rather than a value in the series
• It is not suitable for further mathematical treatment, except for its use in the mean deviation
• It is not useful in cases where large weights are to be given to extreme values

3. THE MODE

Mode is defined as the most frequently occurring value in the data. The mode is not affected by
extreme values in the data. It is the only measure of central tendency that can be used for
qualitative (nominal) data. For example, a statement about the opinion of an average person
probably refers to the most frequently expressed opinion, which is the modal opinion.

Mode in the case of ungrouped data: The value that occurs most frequently in the data is called
the mode. If two or more values occur the same number of times, and more frequently than the
other values, then there is more than one mode. Data having one mode form a unimodal
distribution, data having two modes a bimodal distribution, and data having more than two
modes a multimodal distribution.

Mode in the case of discrete grouped data: The value that has the largest frequency in the set of
data is called the mode.

Mode in the case of continuous grouped data: The mode lies in the class that carries the highest
frequency. This class is called the modal class. The formula used to compute the value of the
mode is given after the ungrouped example below.

Example: Find the mode for the following exam result (10%) of 15 students

3,8,6,5,8,7,8,6,7,4,7,5,7,9,
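
The mode of ungrouped data can be found by counting how often each value occurs. The sketch below uses only the 14 scores that actually appear in the list above (the statement mentions 15 students, so one score seems to be missing from the text); the result therefore holds for the listed values only. Names are illustrative.

```python
from collections import Counter

def modes(values):
    """All values that occur with the highest frequency (one or more modes)."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

# The 14 scores as printed in the text (assumption: the 15th score is unknown).
scores = [3, 8, 6, 5, 8, 7, 8, 6, 7, 4, 7, 5, 7, 9]

print(modes(scores))  # [7]  (7 occurs four times among the listed scores)
```

Returning a list rather than a single number keeps the bimodal and multimodal cases visible instead of silently picking one of several modes.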

For a grouped frequency distribution the mode is given by

$$\text{Mode} = L_0 + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) \times CW$$

where $L_0$ is the lower class boundary of the modal class,

$\Delta_1$ is the difference between the frequency of the modal class and that of the preceding class,

$\Delta_2$ is the difference between the frequency of the modal class and that of the following class,

CW is the class width of the modal class.

Example 3.19:

Find the mode for the frequency distribution given below.

Class interval   Frequency
3-6                  4
6-9                  8
9-12                10
12-15                3
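
Example 3.19 is stated without a solution, so the Python sketch below applies the grouped-mode formula to it. Treating a missing neighbouring class as having frequency 0 is an assumption of this sketch, as are the function and variable names.

```python
def grouped_mode(lower_boundaries, width, freqs):
    """Mode of a grouped distribution: L0 + d1 / (d1 + d2) * CW for the modal class."""
    k = freqs.index(max(freqs))                          # modal class = highest frequency
    before = freqs[k - 1] if k > 0 else 0                # assumption: 0 if no preceding class
    after = freqs[k + 1] if k < len(freqs) - 1 else 0    # assumption: 0 if no following class
    d1 = freqs[k] - before
    d2 = freqs[k] - after
    return lower_boundaries[k] + d1 * width / (d1 + d2)

class_lowers = [3, 6, 9, 12]
class_freqs = [4, 8, 10, 3]

mode_value = grouped_mode(class_lowers, 3, class_freqs)
print(round(mode_value, 2))  # modal class 9-12: 9 + 2 / (2 + 7) * 3, about 9.67
```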
Merits of mode
1. Mode is readily comprehensible and easy to calculate. Like the median, the mode can be
located in some cases merely by inspection.
2. Mode is not at all affected by extreme values.
3. The mode can be conveniently located even if the frequency distribution has class
intervals of unequal magnitude, provided the modal class and the classes preceding and
succeeding it are of the same magnitude. Open-ended classes also do not pose any problem
in the location of the mode.

Demerits of mode

1. Mode is ill-defined. It is not always possible to find a clearly defined mode. In some
cases we may come across a distribution having two modes, which is called bimodal. If a
distribution has more than two modes, it is said to be multimodal.
2. It is not based upon all the observations.
3. It is not capable of further mathematical treatment.
4. Compared with the mean, the mode is affected to a greater extent by fluctuations of
sampling.

2. QUARTILES, DECILES AND PERCENTILES


The median is the value of the middle item, which divides the data into two equal parts and is
found by arranging the data in increasing or decreasing order of magnitude. Quantiles are
measures that divide a given set of data into approximately equal subdivisions and are obtained
by the same procedure as the median. They are averages of position (measures of non-central
location). Some of these are quartiles, deciles, and percentiles.

i. QUARTILES
Quartiles are values that divide the ordered data set into four approximately equal parts. Hence
there are three quartiles, denoted by $Q_1$, $Q_2$ and $Q_3$. The first quartile ($Q_1$) is also called
the lower quartile and the third quartile ($Q_3$) is the upper quartile. The second quartile ($Q_2$)
is the median.

• The first quartile Q1 is the value that marks the first quarter of the given ordered data.
• The second quartile Q2 is the value that divides the given ordered data into two equal
parts.
• The third quartile Q3 is the value that marks the third quarter of the given ordered data.

Equivalently, the first quartile (Q1) is the value of the item that divides the lower half of the
distribution into two equal parts, and the third quartile (Q3) is the value of the item that
divides the upper half of the distribution into two equal parts. That is, Q3 is the value of the
$\frac{3(n+1)}{4}$th item in the series.

For raw (ungrouped) data, first arrange the n observations in increasing order of
magnitude. Then the ith quartile is given by

$$Q_i = \left[\frac{i(n+1)}{4}\right]^{\text{th}} \text{ value of the ordered data}, \qquad i = 1, 2, 3$$

• Quartiles for individual series:

Let $x_1, x_2, \dots, x_n$ be n ordered observations. The ith quartile is the value of the item at the
[i(n+1)/4]th position, i = 1, 2, 3.

That is, after arranging the data in ascending order, Q1, Q2 and Q3 are obtained by:

$$Q_1 = \left(\frac{n+1}{4}\right)^{\text{th}} \text{item}, \quad Q_2 = \left(\frac{2(n+1)}{4}\right)^{\text{th}} \text{item} \quad \text{and} \quad Q_3 = \left(\frac{3(n+1)}{4}\right)^{\text{th}} \text{item}$$

• Quartiles for discrete data arranged in a frequency distribution:

In this case also, we follow the same procedure as for the median. That is, we construct the
less-than cumulative frequency distribution and apply the quartile formula for individual series.

• Quartiles in continuous data:

For continuous data, use the following formulas, where L, w, $f_{Q_i}$ and CF are defined in the
same way as for the median: L is the lower boundary of the quartile class, w is its width,
$f_{Q_i}$ is its frequency, and CF is the cumulative frequency of the class preceding it.

$$Q_1 = L + \left(\frac{\frac{n}{4} - CF}{f_{Q_1}}\right) w, \quad Q_2 = L + \left(\frac{\frac{2n}{4} - CF}{f_{Q_2}}\right) w, \quad Q_3 = L + \left(\frac{\frac{3n}{4} - CF}{f_{Q_3}}\right) w$$

The class in question is the one that includes the (i × n/4)th value. That is, the class with the
minimum cumulative frequency greater than or equal to i × n/4 is the class of the ith quartile.

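ii. DECILES
Deciles are values that divide the data into approximately ten equal parts, denoted by
$D_1, D_2, \dots, D_9$.
• Deciles for individual series:

Let $x_1, x_2, \dots, x_n$ be n ordered observations. The ith decile is the value of the item at the
[i(n+1)/10]th position, i = 1, 2, ..., 9.

That is, after arranging the data in ascending order, D1, D2, ..., D9 are obtained by:

$$D_1 = \left(\frac{n+1}{10}\right)^{\text{th}} \text{item}, \quad D_2 = \left(\frac{2(n+1)}{10}\right)^{\text{th}} \text{item}, \quad \dots \quad \text{and} \quad D_9 = \left(\frac{9(n+1)}{10}\right)^{\text{th}} \text{item}$$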

• Deciles for discrete data arranged in a frequency distribution

In this case also, we follow the same procedure as for the median. That is, we construct the
less-than cumulative frequency distribution and apply the decile formula for individual series.

• Deciles for continuous data: Apply the following formula and follow the procedure used for
quartiles of continuous data.

$$D_i = L + \left(\frac{\frac{i \cdot n}{10} - CF}{f_{D_i}}\right) w, \qquad i = 1, 2, \dots, 9$$

The symbols are defined in the same way as for quartiles of continuous data.
iii. PERCENTILES
Percentiles are values that divide the data into approximately one hundred equal parts, denoted
by $P_1, P_2, \dots, P_{99}$.

• Percentiles for individual series:

Let $x_1, x_2, \dots, x_n$ be n ordered observations. The ith percentile is the value of the item
at the [i(n+1)/100]th position, i = 1, 2, ..., 99.

That is, after arranging the data in ascending order, P1, P2, ..., P99 are obtained by:

$$P_1 = \left(\frac{n+1}{100}\right)^{\text{th}} \text{item}, \quad P_2 = \left(\frac{2(n+1)}{100}\right)^{\text{th}} \text{item}, \quad \dots \quad \text{and} \quad P_{99} = \left(\frac{99(n+1)}{100}\right)^{\text{th}} \text{item}$$

• Percentiles for discrete data arranged in a frequency distribution:

In this case also, we follow the same procedure as for the median. That is, we construct the
less-than cumulative frequency distribution and apply the percentile formula for individual
series.

• Percentiles for continuous data: Apply the following formula.

$$P_i = L + \left(\frac{\frac{i \cdot n}{100} - CF}{f_{P_i}}\right) w, \qquad i = 1, 2, \dots, 99$$

The symbols are defined in the same way as for quartiles or deciles of continuous data.

Interpretations
• $Q_i$ is the value below which (i × 25) percent of the observations in the series are found
(where i = 1, 2, 3). For instance, $Q_3$ is the value below which 75 percent of the
observations in the given series are found.
• $D_i$ is the value below which (i × 10) percent of the observations in the series are found
(where i = 1, 2, ..., 9). For instance, $D_4$ is the value below which 40 percent of the values
in the series are found.
• $P_i$ is the value below which i percent of the total observations are found (where i = 1, 2,
3, ..., 99). For example, 60 percent of the observations in a given series are below $P_{60}$.
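Example: Calculate $Q_1$, $Q_2$, $Q_3$, $D_4$, $D_9$, $P_{40}$ and $P_{90}$ for the following table.

x    10   11   12   13   14   15   16   17   18
f     2    8   25   48   65   40   20    9    2

Solution: The given data are already arranged in increasing order, so we only need to
construct the cumulative frequency table before calculating the required values.

x            10   11   12   13    14    15    16    17    18
f             2    8   25   48    65    40    20     9     2
Cum. freq.    2   10   35   83   148   188   208   217   219

The total number of observations is n = 219, which is odd. The median is therefore

$$\tilde{x} = \left(\frac{n+1}{2}\right)^{\text{th}} \text{value} = \left(\frac{220}{2}\right)^{\text{th}} \text{value} = 110^{\text{th}} \text{ value} = 14$$

$$Q_1 = \left(\frac{n+1}{4}\right)^{\text{th}} \text{value} = \left(\frac{220}{4}\right)^{\text{th}} \text{value} = 55^{\text{th}} \text{ value} = 13$$

$$Q_2 = \left(\frac{2(n+1)}{4}\right)^{\text{th}} \text{value} = \left(\frac{440}{4}\right)^{\text{th}} \text{value} = 110^{\text{th}} \text{ value} = 14 = \tilde{x}$$

$$Q_3 = \left(\frac{3(n+1)}{4}\right)^{\text{th}} \text{value} = \left(\frac{660}{4}\right)^{\text{th}} \text{value} = 165^{\text{th}} \text{ value} = 15$$

$$D_4 = \left(\frac{4(n+1)}{10}\right)^{\text{th}} \text{value} = \left(\frac{880}{10}\right)^{\text{th}} \text{value} = 88^{\text{th}} \text{ value} = 14$$

$$D_9 = \left(\frac{9(n+1)}{10}\right)^{\text{th}} \text{value} = \left(\frac{1980}{10}\right)^{\text{th}} \text{value} = 198^{\text{th}} \text{ value} = 16$$

$$P_{40} = \left(\frac{40(n+1)}{100}\right)^{\text{th}} \text{value} = \left(\frac{8800}{100}\right)^{\text{th}} \text{value} = 88^{\text{th}} \text{ value} = 14$$

$$P_{90} = \left(\frac{90(n+1)}{100}\right)^{\text{th}} \text{value} = \left(\frac{19800}{100}\right)^{\text{th}} \text{value} = 198^{\text{th}} \text{ value} = 16$$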
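
The positional rule used in this example can be sketched in Python: build the less-than cumulative frequencies, compute the position i(n+1)/parts, and return the first value whose cumulative frequency reaches that position. The function name is illustrative, and the sketch assumes the position is a whole number, as it is for every quantile in this example (fractional positions would need an interpolation rule that is not covered here).

```python
from bisect import bisect_left
from itertools import accumulate

def positional_quantile(values, freqs, i, parts):
    """i-th of `parts` quantiles of a discrete frequency table, position i(n+1)/parts."""
    n = sum(freqs)
    cumulative = list(accumulate(freqs))     # less-than cumulative frequencies
    position = i * (n + 1) / parts           # position in the ordered data
    # Index of the first cumulative frequency that is >= position.
    return values[bisect_left(cumulative, position)]

x = [10, 11, 12, 13, 14, 15, 16, 17, 18]
f = [2, 8, 25, 48, 65, 40, 20, 9, 2]

quartiles = [positional_quantile(x, f, i, 4) for i in (1, 2, 3)]
d4, d9 = positional_quantile(x, f, 4, 10), positional_quantile(x, f, 9, 10)
p40, p90 = positional_quantile(x, f, 40, 100), positional_quantile(x, f, 90, 100)

print(quartiles, d4, d9, p40, p90)  # [13, 14, 15] 14 16 14 16
```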


Example 3: Values of fecundity (rate of reproduction) of 50 fish of one species are given
below. Based on the data, find $Q_1$, $D_4$ and $P_7$.

Rate of reproduction   1-10   11-20   21-30   31-40   41-50   51-60   61-70   71-80   Total
f                         3      11       7       4      15       0       7       3      50

Solution: First find the class boundaries and the cumulative frequency distribution.

Rate of reproduction   1-10   11-20   21-30   31-40   41-50   51-60   61-70   71-80   Total
f                         3      11       7       4      15       0       7       3      50
Cum. freq.                3      14      21      25      40      40      47      50

$Q_1$: the (n/4)th value = 12.5th value, which lies in the class 10.5 – 20.5.

$$Q_1 = L + \left(\frac{\frac{n}{4} - CF}{f_{Q_1}}\right) w = 10.5 + \left(\frac{12.5 - 3}{11}\right) \times 10 = 19.1$$

$D_4$: the (4n/10)th value = 20th value, which lies in the class 20.5 – 30.5.

$$D_4 = L + \left(\frac{\frac{4n}{10} - CF}{f_{D_4}}\right) w = 20.5 + \left(\frac{20 - 14}{7}\right) \times 10 = 29.1$$

$P_7$: the (7n/100)th value = 3.5th value, which lies in the class 10.5 – 20.5.

$$P_7 = L + \left(\frac{\frac{7n}{100} - CF}{f_{P_7}}\right) w = 10.5 + \left(\frac{3.5 - 3}{11}\right) \times 10 = 11$$
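
The three results of this example follow one pattern, so a single Python function can cover quartiles, deciles and percentiles of grouped data. The sketch below picks the class with the minimum cumulative frequency greater than or equal to i × n / parts, as the text describes; the function and variable names are illustrative.

```python
from bisect import bisect_left
from itertools import accumulate

def grouped_quantile(lower_boundaries, width, freqs, i, parts):
    """L + ((i*n/parts - CF) / f) * w for the class that contains the (i*n/parts)-th value."""
    n = sum(freqs)
    cumulative = list(accumulate(freqs))
    target = i * n / parts
    k = bisect_left(cumulative, target)            # first class with cum. freq. >= target
    cf_before = cumulative[k - 1] if k > 0 else 0  # cumulative frequency of preceding class
    return lower_boundaries[k] + (target - cf_before) * width / freqs[k]

# Class boundaries 0.5-10.5, 10.5-20.5, ..., 70.5-80.5 for the fecundity data.
lowers = [0.5, 10.5, 20.5, 30.5, 40.5, 50.5, 60.5, 70.5]
fish_freqs = [3, 11, 7, 4, 15, 0, 7, 3]

q1 = grouped_quantile(lowers, 10, fish_freqs, 1, 4)
d4 = grouped_quantile(lowers, 10, fish_freqs, 4, 10)
p7 = grouped_quantile(lowers, 10, fish_freqs, 7, 100)

print(round(q1, 1), round(d4, 1), round(p7, 1))  # 19.1 29.1 11.0
```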

3. MEASURES OF DISPERSION (VARIATION)

Introduction
The term dispersion is generally used in two senses. Firstly, dispersion refers to the variation of
the items among themselves. If the values of all the items of a series are the same, there is no
variation among the items of the series. Secondly, dispersion refers to the variation of the
items around an average. If the difference between the values of the items and the average is
large, the dispersion will be high; on the other hand, if the difference between the values of the
items and the average is small, the dispersion will be low. Thus, dispersion is defined as the
scatteredness or spread of the individual items in a given series.

Measures of variation have the following purposes:

• To judge the reliability of measures of central tendency
• To control variability itself
• To compare two or more groups of numbers in terms of their variability
• To make further statistical analysis

The Variance, Standard Deviation, and Coefficient of Variation

1. Variance and Standard Deviation

Like the mean deviation, the variance is based on all observations in a set of data. But the
variance is the average of the squared deviations from the mean. Recall that the sum of squared
deviations is a minimum only when taken from the mean. Squared deviations are easier to
manipulate mathematically than absolute deviations. Thus, if we average the squared deviations
from the mean and take the square root of the result (to compensate for the fact that the
deviations were squared), we obtain the standard deviation. This overcomes the limitation of the
mean deviation.

a. Population Variance ($\sigma^2$)
If we divide the sum of squared deviations by the number of values in the population, we get
the population variance. This variance is the "average squared deviation from the mean".
• For Ungrouped Data

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{1}{N}\left[\sum x_i^2 - N\mu^2\right]$$

where $\mu$ is the population arithmetic mean and N is the total number of observations in the
population.

• For Discrete Data Arranged in a Frequency Distribution and for Continuous Grouped Data

$$\sigma^2 = \frac{\sum f_i (x_i - \mu)^2}{N} = \frac{1}{N}\left[\sum f_i x_i^2 - N\mu^2\right]$$

where $\mu$ is the population arithmetic mean, $x_i$ is the value or class mark of the ith class,
$f_i$ is the frequency of the ith class and $N = \sum f_i$.

b. Sample Variance ($S^2$)
To derive the sample variance, replace the population mean by the sample mean in the formula
above. However, one of the major uses of a statistic is to estimate the corresponding parameter,
and with the divisor n this formula gives a biased estimate of the population variance.
To offset this, the sum of the squares of the deviations is divided by one less than the sample
size.
• For Ungrouped Data

$$S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{1}{n - 1}\left[\sum x_i^2 - n\bar{x}^2\right]$$

where $\bar{x}$ is the sample arithmetic mean and n is the total number of observations in the sample.

• For Discrete Data Arranged in a Frequency Distribution and for Grouped Data

If the values $x_i$ have frequencies $f_i$ (i = 1, 2, ..., m), then the sample variance is given by:

$$S^2 = \frac{1}{n - 1}\sum_{i=1}^{m} f_i (x_i - \bar{x})^2 = \frac{1}{n - 1}\left[\sum_{i=1}^{m} f_i x_i^2 - n\bar{x}^2\right]$$

where $\bar{x}$ is the sample arithmetic mean, $x_i$ is the value or class mark of the ith class,
$f_i$ is the frequency of the ith class and $n = \sum f_i$.
2. The Standard Deviation
There is a problem with variances. Recall that the deviations were squared, which means that the
units were also squared. To get back to the units of the original data values, the square
root must be taken.
• Population Standard Deviation ($\sigma$)
$\sigma = \sqrt{\sigma^2}$, where $\sigma^2$ is the population variance.
• Sample Standard Deviation (S)
$S = \sqrt{S^2}$, where $S^2$ is the sample variance.

Example: Find the sample variance and standard deviation for the frequency distribution
of heights (in cm) of students at AU given below.

Heights in cms 150 152 154 156 158 160 162 164 166
Number of students 28 40 52 100 60 48 32 20 7

Solution: Compute the columns $f_i x_i$, $x_i^2$ and $f_i x_i^2$:

x_i    f_i    f_i x_i    x_i^2     f_i x_i^2
150     28      4200     22500       630000
152     40      6080     23104       924160
154     52      8008     23716      1233232
156    100     15600     24336      2433600
158     60      9480     24964      1497840
160     48      7680     25600      1228800
162     32      5184     26244       839808
164     20      3280     26896       537920
166      7      1162     27556       192892
Sum    387     60674    224916      9518252

Thus, $n = \sum f_i = 387$, $\sum f_i x_i = 60674$, $\sum x_i^2 = 224916$ and $\sum f_i x_i^2 = 9518252$,
so that $\bar{x} = 60674/387 \approx 156.78$ and

$$S^2 = \frac{1}{n - 1}\left[\sum f_i x_i^2 - n\bar{x}^2\right] = \frac{1}{386}\left[9518252 - 387\left(\frac{60674}{387}\right)^2\right] \approx 14.92, \qquad S = \sqrt{S^2} \approx 3.86$$

Example: Calculate the sample variance and standard deviation of the blood glucose
level, in milligrams per deciliter, for 60 patients shown below.

Class limit 55 – 63 64 – 72 73 – 81 82 – 90 91 – 99 100 – 108 109 –117

Frequency 9 5 12 17 7 6 4

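Solution: In a continuous frequency distribution, $x_i$ is the class mark representing the ith class.

Class limit    x_i    f_i    f_i x_i    f_i x_i^2
55 – 63         59      9       531        31329
64 – 72         68      5       340        23120
73 – 81         77     12       924        71148
82 – 90         86     17      1462       125732
91 – 99         95      7       665        63175
100 – 108      104      6       624        64896
109 – 117      113      4       452        51076
Total                  60      4998       430476

Here $n = \sum f_i = 60$ and $\bar{x} = \frac{1}{n}\sum f_i x_i = \frac{4998}{60} = 83.3$, so that

$$S^2 = \frac{1}{n - 1}\left[\sum f_i x_i^2 - n\bar{x}^2\right] = \frac{1}{59}\left[430476 - 60(83.3)^2\right] \approx 239.71$$

$$S = \sqrt{S^2} = 15.48$$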
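
The blood-glucose calculation can be reproduced with a short Python sketch that uses the computational form of the sample variance, with class marks taken as the midpoints of the class limits. The function and variable names are illustrative.

```python
from math import sqrt

def grouped_sample_variance(class_marks, freqs):
    """S^2 = (sum(f*x^2) - n * mean^2) / (n - 1) for a frequency distribution."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(class_marks, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(class_marks, freqs))
    mean = sum_fx / n
    return (sum_fx2 - n * mean ** 2) / (n - 1)

class_limits = [(55, 63), (64, 72), (73, 81), (82, 90), (91, 99), (100, 108), (109, 117)]
glucose_freqs = [9, 5, 12, 17, 7, 6, 4]
class_marks = [(low + high) / 2 for low, high in class_limits]   # 59, 68, ..., 113

variance = grouped_sample_variance(class_marks, glucose_freqs)
std_dev = sqrt(variance)

print(round(variance, 2), round(std_dev, 2))  # 239.71 15.48
```

Dividing by n - 1 rather than n is what makes this the sample variance; for a full population the divisor would be N.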


3. Range
The range (R) is the difference between the largest value (L) and the smallest value (S) in a distribution. Thus,
Range (R) = L – S

a. Coefficient of Range: It is a relative measure of the range, used in the
comparative study of dispersion:
Coefficient of Range $= \frac{L - S}{L + S}$

In the case of a continuous series, the range is the difference between the upper limit of the highest
class and the lower limit of the lowest class.
Range: Evaluation
The range is simple to understand and easy to calculate. However, it is not based on all the
observations in the distribution and is unduly affected by extreme values. A change in any
value other than the minimum or maximum does not affect the range. It cannot be calculated
for open-ended frequency distributions.
Example: The amounts spent (in Birr) by a group of 10 students in the school canteen are as
follows: 110, 117, 129, 197, 190, 100, 100, 178, 255, 790.
Find the range and the coefficient of range.

Solution: R = L – S = 790 – 100 = 690 Birr
Coefficient of Range $= \frac{L - S}{L + S} = \frac{790 - 100}{790 + 100} = \frac{690}{890} \approx 0.775$

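Example 2: Find the range and its coefficient from the following data.
Size 10 – 20 20 – 30 30 – 40 40 – 50 50 – 100
Frequency 2 3 5 4 2
Solution: R = L – S = 100 – 10 = 90
Coefficient of Range $= \frac{L - S}{L + S} = \frac{100 - 10}{100 + 10} = \frac{90}{110} \approx 0.818$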
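The two range examples can be reproduced with the sketch below. It assumes the usual definition of the coefficient of range, $(L - S)/(L + S)$; the helper name `range_and_coefficient` is hypothetical.

```python
def range_and_coefficient(largest, smallest):
    """Return (range, coefficient of range) for the given extreme values."""
    value_range = largest - smallest
    coefficient = value_range / (largest + smallest)
    return value_range, coefficient


# Ungrouped data: canteen spending of 10 students
spending = [110, 117, 129, 197, 190, 100, 100, 178, 255, 790]
r1, c1 = range_and_coefficient(max(spending), min(spending))

# Continuous series: upper limit of the highest class, lower limit of the lowest class
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 100)]
r2, c2 = range_and_coefficient(max(upper for _, upper in classes),
                               min(lower for lower, _ in classes))

print(r1, round(c1, 3))
print(r2, round(c2, 3))
```

Only the two extreme values enter the calculation, which is why the single outlier of 790 dominates the first result.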

b. Quartile Deviation
It is based on the lower quartile $Q_1$ and the upper quartile $Q_3$. The difference $Q_3 - Q_1$ is called
the inter-quartile range. Half of the inter-quartile range is called the semi-inter-quartile range or
the quartile deviation (QD). Its formula is
$QD = \frac{Q_3 - Q_1}{2}$


c. Coefficient of Quartile Deviation

A relative measure of dispersion based on the quartile deviation is called the coefficient of
quartile deviation. It is a pure number, free of any units of measurement, so it can be used to compare
the dispersion of two or more sets of data. It is defined as
Coefficient of QD $= \frac{Q_3 - Q_1}{Q_3 + Q_1}$

 Computation of Quartile Deviation of Ungrouped Data


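Example: Find the quartile deviation of the daily wages of 7 persons given
below: 120, 70, 150, 100, 190, 170, 250.
Solution:
Arranging the data in ascending order, we get 70, 100, 120, 150, 170, 190, 250.
Here n = 7, so
$Q_1$ = value of the $\frac{n+1}{4}$th item = value of the 2nd item = 100
$Q_3$ = value of the $\frac{3(n+1)}{4}$th item = value of the 6th item = 190
$QD = \frac{Q_3 - Q_1}{2} = \frac{190 - 100}{2} = 45$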

Merits of QD

 It is well-defined, easy to compute, and simple to understand.
 It helps in studying the middle 50% of the items in the series.
 It is not affected by extreme items.
 It is useful in measuring variation in the case of open-ended distributions.


Demerits of QD

 It is not based on all the items (it ignores 50% of the items, i.e., the first 25% and the last
25%).
 It is greatly influenced by sampling fluctuations.
 It is not amenable to algebraic manipulations.

Example: The wheat production (in kg) of 20 acres is given as: 1120, 1240, 1320, 1040, 1080,
1200, 1440, 1360, 1680, 1730, 1785, 1342, 1960, 1880, 1755, 1720, 1600, 1470, 1750, and 1885.
Based on these data, find the quartile deviation and the coefficient of quartile deviation.
Solution: Arrange the observations in ascending order: 1040, 1080, 1120, 1200,
1240, 1320, 1342, 1360, 1440, 1470, 1600, 1680, 1720, 1730, 1750, 1755, 1785, 1880, 1885,
1960.

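Here n = 20.

$Q_1$ = value of the $\frac{n+1}{4}$th item = value of the (5.25)th item
= 5th item + 0.25(6th item – 5th item)
= 1240 + 0.25(1320 – 1240) = 1240 + 20 = 1260 kg

$Q_3$ = value of the $\frac{3(n+1)}{4}$th item = value of the (15.75)th item
= 15th item + 0.75(16th item – 15th item)
= 1750 + 0.75(1755 – 1750) = 1750 + 3.75 = 1753.75 kg

a. Quartile Deviation (QD)
$QD = \frac{Q_3 - Q_1}{2} = \frac{1753.75 - 1260}{2} = 246.875$ kg

b. Coefficient of Quartile Deviation (CQD)
$CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{1753.75 - 1260}{1753.75 + 1260} = \frac{493.75}{3013.75} \approx 0.164$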
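The positional rule used in this example (the $k$th quartile is the value of the $\frac{k(n+1)}{4}$th item, with linear interpolation between neighbouring items) can be sketched as follows. The function name `quartile` is illustrative, and other textbooks and software use slightly different positional conventions, so results may not match library defaults.

```python
def quartile(sorted_data, k):
    """k-th quartile as the value of the k(n+1)/4-th item (1-based), interpolated."""
    n = len(sorted_data)
    position = k * (n + 1) / 4
    # Whole part of the position, clamped to a valid 1-based item number
    item = max(1, min(int(position), n))
    fraction = position - item
    lower = sorted_data[item - 1]
    if item == n or fraction <= 0:
        return lower
    upper = sorted_data[item]
    return lower + fraction * (upper - lower)


wheat = sorted([1120, 1240, 1320, 1040, 1080, 1200, 1440, 1360, 1680, 1730,
                1785, 1342, 1960, 1880, 1755, 1720, 1600, 1470, 1750, 1885])

q1 = quartile(wheat, 1)
q3 = quartile(wheat, 3)
qd = (q3 - q1) / 2            # quartile deviation
cqd = (q3 - q1) / (q3 + q1)   # coefficient of quartile deviation
print(q1, q3, qd, round(cqd, 3))
```

Applied to the seven daily wages 70, 100, 120, 150, 170, 190, 250, the same function picks the 2nd and 6th items, since the positions are whole numbers and no interpolation is needed.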


 Computation of QD for a Frequency Distribution


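Computation in the case of a discrete series:
Example: The tax authority collected the following amounts of tax from different firms in a
particular market.
Amount of Taxes (in '000)    10    11    12    13    14
No. of Firms (f)              3    12    18    12     3
cf                            3    15    33    45    48
Here n = Σf = 48.
$Q_1$ = value of the $\frac{n+1}{4}$th item = value of the (12.25)th item = 11, the first value whose cf (15) reaches 12.25
$Q_3$ = value of the $\frac{3(n+1)}{4}$th item = value of the (36.75)th item = 13, the first value whose cf (45) reaches 36.75
$QD = \frac{Q_3 - Q_1}{2} = \frac{13 - 11}{2} = 1$ (thousand)
Coefficient of QD $= \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{13 - 11}{13 + 11} = \frac{2}{24} \approx 0.083$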
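For a discrete series, a quartile is read off the cumulative frequency column: it is the first value whose cumulative frequency reaches the required position. The sketch below assumes the total frequency is n = Σf = 48 (the sum of the "No of Firms" row) and uses the $\frac{k(n+1)}{4}$ positional rule; `discrete_quartile` is a hypothetical helper name.

```python
from itertools import accumulate


def discrete_quartile(values, freqs, k):
    """First value whose cumulative frequency reaches the k(n+1)/4-th position."""
    n = sum(freqs)
    position = k * (n + 1) / 4
    for value, cf in zip(values, accumulate(freqs)):
        if cf >= position:
            return value
    return values[-1]  # position beyond the last item: fall back to the largest value


# Tax amounts (in thousands) and number of firms from the example
taxes = [10, 11, 12, 13, 14]
firms = [3, 12, 18, 12, 3]

tax_q1 = discrete_quartile(taxes, firms, 1)
tax_q3 = discrete_quartile(taxes, firms, 3)
tax_qd = (tax_q3 - tax_q1) / 2
tax_cqd = (tax_q3 - tax_q1) / (tax_q3 + tax_q1)
print(tax_q1, tax_q3, tax_qd, round(tax_cqd, 3))
```

With these data the $n/4$ and $(n+1)/4$ conventions select the same values, so the result does not depend on which one is used.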

 Computation of QD for a Continuous Series


Example 6: Calculate the quartile deviation and the coefficient of quartile deviation from the following
distribution:
Size         5 – 7    8 – 10    11 – 13    14 – 16    17 – 19
Frequency     14        24        38         20          4

Solution: Calculation of the quartile deviation and the coefficient of quartile deviation. The class
limits are first converted to class boundaries by subtracting 0.5 from each lower limit and adding 0.5
to each upper limit.

Size (class boundaries)    Frequency (f)    Cumulative Frequency (cf)
4.5 – 7.5                       14                  14
7.5 – 10.5                      24                  38
10.5 – 13.5                     38                  76
13.5 – 16.5                     20                  96
16.5 – 19.5                      4                 100
                            Σf = 100

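Using the standard interpolation formula $Q_k = L + \frac{kN/4 - cf}{f} \times h$, where $L$ is the lower boundary of
the quartile class, $cf$ is the cumulative frequency of the preceding class, $f$ is the frequency of the
quartile class, $h$ is the class width, and $N = \Sigma f$:

$Q_1$: $N/4 = 25$ falls in the class 7.5 – 10.5, so $Q_1 = 7.5 + \frac{25 - 14}{24} \times 3 = 8.875$

$Q_3$: $3N/4 = 75$ falls in the class 10.5 – 13.5, so $Q_3 = 10.5 + \frac{75 - 38}{38} \times 3 \approx 13.421$

$QD = \frac{Q_3 - Q_1}{2} = \frac{13.421 - 8.875}{2} \approx 2.273$

Coefficient of QD $= \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{13.421 - 8.875}{13.421 + 8.875} \approx 0.204$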
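For a continuous series, a quartile is usually interpolated inside the class that contains the $\frac{kN}{4}$th item: $Q_k = L + \frac{kN/4 - cf}{f} \times h$, with $L$ the lower class boundary, $cf$ the cumulative frequency before the class, $f$ the class frequency, and $h$ the class width. The sketch below applies that standard formula to the class boundaries in the table above; the formula and the name `grouped_quartile` are assumptions, since the notes' own formula did not survive extraction.

```python
def grouped_quartile(boundaries, freqs, k):
    """Interpolated k-th quartile of a continuous frequency distribution."""
    total = sum(freqs)
    target = k * total / 4          # position kN/4
    cf_before = 0                   # cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and cf_before + f >= target:
            width = upper - lower
            return lower + (target - cf_before) / f * width
        cf_before += f
    raise ValueError("quartile position lies beyond the last class")


# Class boundaries and frequencies from Example 6
class_boundaries = [(4.5, 7.5), (7.5, 10.5), (10.5, 13.5), (13.5, 16.5), (16.5, 19.5)]
class_freqs = [14, 24, 38, 20, 4]

g_q1 = grouped_quartile(class_boundaries, class_freqs, 1)
g_q3 = grouped_quartile(class_boundaries, class_freqs, 3)
g_qd = (g_q3 - g_q1) / 2
g_cqd = (g_q3 - g_q1) / (g_q3 + g_q1)
print(round(g_q1, 3), round(g_q3, 3), round(g_qd, 3), round(g_cqd, 3))
```

The interpolation assumes observations are spread evenly within each class, which is the usual working assumption for grouped data.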
