5. Lecture Note 05_ Measures of Dispersion (2)
Measures of Dispersion
The essential purpose of statistical averages is to summarize a large mass of data. These averages
serve to locate the ‘center’ of a distribution, but they do not reveal how the items or the
observations are spread out or scattered on each side of the center. This characteristic or property
of a distribution is commonly referred to as the ‘dispersion’, ‘scatter’ or ‘variation’.
It is just as important to measure this property of a distribution as to locate the central values. If
the dispersion is small, it indicates high uniformity of the observations in the distribution. Absence
of dispersion in the data indicates perfect uniformity. This situation arises when all observations in
the distribution are identical. If this were the case, description of any single observation would
suffice.
A measure of dispersion serves two purposes. First, it is one of the most important
quantities used to characterize a frequency distribution. Second, it affords a basis of comparison
between two or more frequency distributions. The study of dispersion derives its importance from
the fact that various distributions may have exactly the same average but differ substantially
in variability.
The frequently used measures of dispersion are:
1. The range
2. The quartile deviation
3. The mean (or average) deviation
4. The variance
5. The standard deviation
The above measures are sometimes classified as absolute measures. They are absolute in the
sense that they are expressed in the same units as the original data, such as dollar, taka,
meter, kilogram etc. When two sets of data are expressed in different units,
however, the absolute measures are not comparable. In that case relative measures are used.
The relative measures are usually expressed in the form of coefficients and are
pure numbers, independent of the units of measurement. These measures are:
1. Coefficient of range
2. Coefficient of quartile deviation
3. Coefficient of mean deviation
4. Coefficient of variation.
The Range
The simplest and the crudest measure of dispersion is the range. It is defined as the difference
between the largest and the smallest values in the distribution. The symbol R is used for the range.
$$R = \text{highest value} - \text{lowest value}$$
For grouped data, the range is taken to be the difference between the upper class limit of the
highest class and the lower class limit of the lowest class.
Although the range is meaningful, it is of little use because of its marked instability, particularly
when it is based on a small sample. If there is one extreme value in a distribution,
the range will appear to be large, when in fact removal of this value may reveal
an otherwise compact distribution with extremely low dispersion.
Prepared by: Suman Biswas, Assistant Professor, Department of Statistics, Islamic University, Kushtia-7003
Example: A testing lab wishes to test two experimental brands of outdoor paint to see how long
each will last before fading. The testing lab makes 6 gallons of each paint to test. Since different
chemical agents are added to each group and only six cans are involved, these two groups
constitute two small populations. The results (in months) are shown.
Group A Group B
10 35
60 45
50 30
30 35
40 40
20 25
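As a rough illustration, the mean and the range of the two paint groups can be computed as in the sketch below. The data are copied from the table above; the variable and function names are chosen here for illustration only.

```python
# Lasting times in months for the two paint groups, from the table above.
group_a = [10, 60, 50, 30, 40, 20]
group_b = [35, 45, 30, 35, 40, 25]


def data_range(values):
    """Range R = highest value - lowest value."""
    return max(values) - min(values)


mean_a = sum(group_a) / len(group_a)
mean_b = sum(group_b) / len(group_b)
range_a = data_range(group_a)
range_b = data_range(group_b)

print("Group A: mean =", mean_a, "range =", range_a)
print("Group B: mean =", mean_b, "range =", range_b)
```

Both groups have the same mean (35 months), yet the range of Group A (50) is much larger than that of Group B (20), which is exactly the kind of difference an average alone cannot reveal.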
The Mean Deviation
The mean deviation is an average of absolute deviations of individual observations from the
central value of a series.
If $x_1, x_2, \ldots, x_n$ form a sample of observations, the formula for computing the average or mean
deviation from the arithmetic mean is
$$MD(\bar{x}) = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n} = \frac{\sum_{i=1}^{n} |d_i|}{n}$$
where $d_i = x_i - \bar{x}$ stands for the deviation of an individual observation from the mean,
and $|\;|$ means that the signs of the deviations, whether positive or negative, are ignored.
Example: Compute the mean deviation from the mean using the data in the following table.
Physician No. of visits ($x_i$)
1 5
2 0
3 1
4 4
5 7
6 0
7 12
8 2
9 0
10 20
11 3
12 5
13 6
Total 65
Solution: To compute the average deviation from the arithmetic mean, proceed as follows:
(i) Compute the arithmetic mean, which in this case is
$$\bar{x} = \frac{\text{total number of visits}}{\text{total number of physicians}} = \frac{65}{13} = 5$$
(ii) Obtain the absolute deviation $|d_i|$ of each $x_i$ value from the arithmetic mean $\bar{x}$.
Physician No. of visits ($x_i$) $|d_i| = |x_i - \bar{x}|$
1 5 0
2 0 5
3 1 4
4 4 1
5 7 2
6 0 5
7 12 7
8 2 3
9 0 5
10 20 15
11 3 2
12 5 0
13 6 1
Total 65 50
(iii) Sum the absolute deviations to compute $\sum_{i=1}^{n} |d_i|$.
(iv) Divide the quantity $\sum_{i=1}^{n} |d_i|$ by $n$.
(v) Thus, the mean deviation from the arithmetic mean is:
$$MD(\bar{x}) = \frac{\sum_{i=1}^{n} |d_i|}{n} = \frac{50}{13} = 3.85$$
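The steps above can be sketched in a few lines of Python. The visit counts are those of the 13 physicians in the table; the function name `mean_deviation` is an illustrative choice, not part of the notes.

```python
# Number of visits for the 13 physicians, from the table above.
visits = [5, 0, 1, 4, 7, 0, 12, 2, 0, 20, 3, 5, 6]


def mean_deviation(values, center=None):
    """Average absolute deviation about `center` (the arithmetic mean by default)."""
    n = len(values)
    if center is None:
        center = sum(values) / n                      # step (i)
    abs_devs = [abs(x - center) for x in values]      # step (ii)
    return sum(abs_devs) / n                          # steps (iii) and (iv)


md = mean_deviation(visits)
print(round(md, 2))  # 3.85
```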
If a grouped frequency distribution is constructed, as is usually done with large samples, the
mean deviation is
$$MD(\bar{x}) = \frac{\sum_{i=1}^{k} f_i |x_i - \bar{x}|}{n}$$
where
$MD(\bar{x})$ = mean deviation about the mean
$k$ = number of classes
$x_i$ = midpoint of the $i$th class
$f_i$ = frequency of the $i$th class
$n = \sum_{i=1}^{k} f_i$
(iii) Sum the weighted absolute deviations to compute $\sum_{i=1}^{k} f_i |d_i|$.
(iv) Divide the quantity $\sum_{i=1}^{k} f_i |d_i|$ by $n = \sum f_i$.
(v) Thus, the mean deviation from the arithmetic mean is:
$$MD(\bar{x}) = \frac{\sum_{i=1}^{k} f_i |d_i|}{\sum f_i} = \frac{556.4}{50} = 11.1$$
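A minimal sketch of the grouped formula follows. It does not reproduce the 556.4/50 calculation, whose frequency table is not shown here; instead it assumes the small grouped table (values 3, 5, 7, 8, 9 with frequencies 2, 3, 2, 2, 1) used in the grouped variance example in these notes. The function name is hypothetical.

```python
# Assumed illustration data: class values (midpoints) and their frequencies.
midpoints = [3, 5, 7, 8, 9]
freqs = [2, 3, 2, 2, 1]


def grouped_mean_deviation(midpoints, freqs):
    """MD about the mean for grouped data: sum(f_i * |x_i - mean|) / n."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    weighted_abs_devs = sum(f * abs(x - mean) for x, f in zip(midpoints, freqs))
    return weighted_abs_devs / n


md_grouped = grouped_mean_deviation(midpoints, freqs)
print(md_grouped)
```

For these assumed data the mean is 6 and the weighted absolute deviations sum to 18, so the mean deviation is 18/10 = 1.8.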
Usually, the mean deviation is computed as the arithmetic mean of the absolute values of the
deviations from the typical value of a distribution. The typical value may be the arithmetic mean,
median, mode or even an arbitrary value. The median is sometimes preferred as the typical value in
computing the average deviation, because the sum of the absolute deviations from
the median is never larger than the sum from any other value. In practice, however, the arithmetic
mean is generally used. If the distribution is symmetrical, the mean is identical with the median
and the same average deviation is obtained.
N.B. To calculate the mean deviation from the median or mode, compute the median or mode of the
distribution in the usual manner and use it in place of the mean. The other steps remain the
same as in calculating the mean deviation from the mean.
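The remark about the median can be checked on the physicians' visit data used earlier. The sketch below is only an illustration; `mean_deviation_about` is a name introduced here.

```python
import statistics

# Number of visits for the 13 physicians (same data as the earlier example).
visits = [5, 0, 1, 4, 7, 0, 12, 2, 0, 20, 3, 5, 6]


def mean_deviation_about(values, center):
    """Average absolute deviation of `values` about an arbitrary `center`."""
    return sum(abs(x - center) for x in values) / len(values)


md_mean = mean_deviation_about(visits, statistics.mean(visits))      # center = 5
md_median = mean_deviation_about(visits, statistics.median(visits))  # center = 4

print(round(md_mean, 2), round(md_median, 2))  # 3.85 3.77
```

Here the deviation about the median (49/13) is slightly smaller than the deviation about the mean (50/13), in line with the property stated above.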
The Variance and Standard Deviation (Ungrouped data)
Instead of ignoring the signs of the deviations from the mean, as in the computation of an average
deviation, each deviation may be squared and the results added. The sum of squares can be
regarded as a measure of the total dispersion of the distribution. Dividing the sum of squares by
the number of observations gives the average of the squared deviations, a measure called the
variance of the distribution. If the observations are all from a population, the resulting variance
is referred to as the population variance. As a formula, the variance of the population observations
$x_1, x_2, \ldots, x_N$, commonly denoted by $\sigma^2$, is
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$
where $\mu$ is the mean of all the observations and $N$ is the total number of observations in the
population.
Because of the operation of squaring, the variance is expressed in squared units (e.g., km², kg², taka²
etc.) and not in the original units (e.g., km, kg, taka etc.). It is therefore necessary to extract the
positive square root to restore the original unit. The measure of dispersion thus obtained is called
the population standard deviation and is usually denoted by $\sigma$. Thus
$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} = \sqrt{\text{Variance}(x)}$$
Thus, by definition, the standard deviation is the positive square root of the mean of the squared
deviations of the observations from their arithmetic mean.
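As an illustration, the population formulas can be applied to the two paint groups from the range example, which the notes treat as two small populations. This is a sketch; the helper name is chosen here, and the result is cross-checked against `statistics.pvariance` from the Python standard library.

```python
import math
import statistics

# Lasting times in months for the two paint groups (treated as populations).
group_a = [10, 60, 50, 30, 40, 20]
group_b = [35, 45, 30, 35, 40, 25]


def population_variance(values):
    """sigma^2 = sum((x_i - mu)^2) / N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


var_a = population_variance(group_a)
var_b = population_variance(group_b)
sd_a = math.sqrt(var_a)   # population standard deviation of Group A
sd_b = math.sqrt(var_b)   # population standard deviation of Group B

# Cross-check against the standard library implementation.
assert math.isclose(var_a, statistics.pvariance(group_a))
assert math.isclose(var_b, statistics.pvariance(group_b))

print(round(var_a, 1), round(sd_a, 1))  # 291.7 17.1
print(round(var_b, 1), round(sd_b, 1))  # 41.7 6.5
```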
In many statistical applications, we deal with a sample rather than a population. Thus, while a set
of population observations yields a population variance, a set of sample observations yields a
sample variance. If $x_1, x_2, \ldots, x_n$ represent a set of sample observations of size $n$, then the
sample variance, denoted by $s^2$, is expressed as
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
where $\bar{x}$ is the mean of the sample observations and $n$ is the total number of
observations in the sample.
The sample variance can also be computed from the formula
$$s^2 = \frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n(n - 1)}$$
The square root of the sample variance $s^2$ is the sample standard deviation, usually denoted by $s$.
Thus
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
or
$$s = \sqrt{s^2} = \sqrt{\frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n(n - 1)}}$$
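The two forms of the sample variance agree because the sum of squared deviations can be rewritten using $\sum x_i = n\bar{x}$. A short derivation, added here as a sketch:

```latex
\begin{align*}
\sum (x_i - \bar{x})^2
  &= \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2 \\
  &= \sum x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 \\
  &= \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}
   = \frac{n\sum x_i^2 - \left(\sum x_i\right)^2}{n} \\[4pt]
s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}
  &= \frac{n\sum x_i^2 - \left(\sum x_i\right)^2}{n(n - 1)}
\end{align*}
```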
Example: The sizes of 10 households are given in the following table:
Family #       1  2  3  4  5  6  7  8  9  10
Size ($x_i$)   3  3  4  4  5  5  6  6  7  7
Compute the sample variance and standard deviation.
Solution: The quantities needed for computing the variance and standard deviation are shown in
the following table:
Family #   Size ($x_i$)   $(x_i - \bar{x})$   $(x_i - \bar{x})^2$   $x_i^2$
1          3              -2                  4                     9
2          3              -2                  4                     9
3          4              -1                  1                     16
4          4              -1                  1                     16
5          5              0                   0                     25
6          5              0                   0                     25
7          6              1                   1                     36
8          6              1                   1                     36
9          7              2                   4                     49
10         7              2                   4                     49
Total      50             -                   20                    270
Here, $\bar{x} = \frac{\sum x_i}{n} = \frac{50}{10} = 5$.
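Using the totals in the table, the sample variance and standard deviation can be obtained from either formula. The sketch below computes both forms and confirms that they agree; the variable names are illustrative.

```python
import math

# Household sizes for the 10 families in the table above.
sizes = [3, 3, 4, 4, 5, 5, 6, 6, 7, 7]
n = len(sizes)
mean = sum(sizes) / n                                   # 50 / 10 = 5

# Definitional form: sum of squared deviations divided by n - 1.
s2_definition = sum((x - mean) ** 2 for x in sizes) / (n - 1)

# Computational form: (n * sum(x^2) - (sum(x))^2) / (n * (n - 1)).
sum_x = sum(sizes)                                      # 50
sum_x2 = sum(x * x for x in sizes)                      # 270
s2_shortcut = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

s = math.sqrt(s2_definition)

print(round(s2_definition, 2), round(s2_shortcut, 2), round(s, 2))  # 2.22 2.22 1.49
```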
ungrouped data. With divisors $N$ and $n - 1$, the formulas for computing the variance for a
population and a sample are, respectively,
$$\sigma^2 = \frac{\sum f_i (x_i - \mu)^2}{N} \quad \text{and} \quad s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1}$$
Example: Compute the variance and standard deviation for the following grouped data:
$x_i$    $f_i$
3        2
5        3
7        2
8        2
9        1
Total    10
Solution: The quantities needed for computing the variance and standard deviation of the
data are given below:
$x_i$    $f_i$   $f_i x_i$   $(x_i - \bar{x})$   $(x_i - \bar{x})^2$   $f_i (x_i - \bar{x})^2$
3        2       6           -3                  9                     18
5        3       15          -1                  1                     3
7        2       14          1                   1                     2
8        2       16          2                   4                     8
9        1       9           3                   9                     9
Total    10      60          -                   -                     40
Here, $\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{60}{10} = 6$.
So the variance is $s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1} = \frac{40}{9} = 4.44$, and the
standard deviation is $s = \sqrt{4.44} = 2.11$.
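The grouped calculation can be sketched as follows, with the values and frequencies taken from the table above. The variable names are illustrative.

```python
import math

# Values and frequencies from the grouped data table above.
values = [3, 5, 7, 8, 9]
freqs = [2, 3, 2, 2, 1]

n = sum(freqs)                                              # 10
mean = sum(f * x for x, f in zip(values, freqs)) / n        # 60 / 10 = 6

# Sample variance for grouped data: sum(f_i * (x_i - mean)^2) / (n - 1).
sum_sq = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))  # 40
s2 = sum_sq / (n - 1)
s = math.sqrt(s2)

print(round(s2, 2), round(s, 2))  # 4.44 2.11
```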
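$$s^2 = \frac{n \sum f_i x_i^2 - \left(\sum f_i x_i\right)^2}{n(n - 1)} = \frac{32(23805) - (867)^2}{32 \times 31} = \frac{10071}{992} \approx 10.15$$
$$\therefore\ s = \sqrt{s^2} = \sqrt{10.15} \approx 3.19$$
Hence, mean = 27.1 mm, $s^2 \approx 10.15\ \text{mm}^2$ and $s \approx 3.19$ mm.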
Example 3.24 [From Bluman’s Book: Chapter 03]
Properties of variance
(i) A change of origin has no effect on the variance: if $y_i = x_i + c$ for a constant $c$, then $s_y^2 = s_x^2$.
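Property (i) can be checked numerically. The sketch below shifts the household sizes from the earlier example by an arbitrary constant (the value 100 is an assumption made purely for illustration) and compares the sample variances using `statistics.variance` from the standard library.

```python
import math
import statistics

# Household sizes from the earlier ungrouped example.
sizes = [3, 3, 4, 4, 5, 5, 6, 6, 7, 7]

c = 100                              # arbitrary change of origin (illustrative)
shifted = [x + c for x in sizes]     # y_i = x_i + c

var_x = statistics.variance(sizes)   # sample variance of x
var_y = statistics.variance(shifted) # sample variance of y

# The mean moves by c, but the variance is unchanged.
print(statistics.mean(sizes), statistics.mean(shifted))
print(math.isclose(var_x, var_y))    # True
```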
Uses of the Variance and Standard Deviation
1. Variances and standard deviations can be used to determine the spread of the data. If the
variance or standard deviation is large, the data are more dispersed. This information is useful
in comparing two (or more) data sets to determine which is more (most) variable.
2. The measures of variance and standard deviation are used to determine the consistency of a
variable. For example, in the manufacture of fittings, such as nuts and bolts, the variation in the
diameters must be small, or the parts will not fit together.
3. The variance and standard deviation are used to determine the number of data values that fall
within a specified interval in a distribution. For example, Chebyshev’s theorem (explained later)
shows that, for any distribution, at least 75% of the data values will fall within 2 standard
deviations of the mean.
4. Finally, the variance and standard deviation are used quite often in inferential statistics.
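The 75% figure in point 3 comes from Chebyshev's bound, which states that at least $1 - 1/k^2$ of the values lie within $k$ standard deviations of the mean for any $k > 1$. The sketch below evaluates the bound for $k = 2$ and, as an illustration only, compares it with the physicians' visit data from the mean deviation example.

```python
import statistics


def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2


bound_2 = chebyshev_lower_bound(2)   # 0.75

# Illustration with the physicians' visit data.
visits = [5, 0, 1, 4, 7, 0, 12, 2, 0, 20, 3, 5, 6]
mean = statistics.mean(visits)
sd = statistics.stdev(visits)
inside = [x for x in visits if abs(x - mean) <= 2 * sd]
share_inside = len(inside) / len(visits)

print(bound_2, round(share_inside, 3))  # 0.75 0.923
```

In this data set 12 of the 13 values fall within two standard deviations of the mean, comfortably above the guaranteed minimum of 75%.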
Thus, a value of 33 percent (say) for the CV implies that the standard deviation of the sample values is
33 percent of the mean of the same distribution.
As an illustration of the use of the CV as descriptive statistics, let us suppose that we wish to obtain
some insight into whether height is more variable than the weight in the same population. For this
purpose, for instance, we have the following data obtained from 150 children in a community.
Height Weight
Mean 40 inch 10 kg
SD 5 inch 2 kg
CV 0.125 0.200
Examination of the respective standard deviations does not tell us in any meaningful way which
characteristic has more variability than the other, because they are in different units. If we now
compute the coefficient of variation, the results become comparable, because the coefficient of
variation is dimensionless.
Thus, since the coefficient of variation for weight is greater than that for height, we would tend
to conclude that weight has more variability than height in the population.
Again, even when two variables in the same population are measured in the same unit, the standard
deviation may fail to provide a correct picture of their relative variability. Suppose that the blood
pressure of a group of patients was measured at two levels, systolic and diastolic, both in
the same unit. The results were as follows:
Systolic Diastolic
Mean 130 mm Hg 60 mm Hg
SD 15 mm Hg 8 mm Hg
CV 0.115 0.133
As the data show, the systolic pressure is more variable in absolute terms (SD = 15 mm Hg) than the
diastolic pressure (SD = 8 mm Hg). However, in relative terms, as measured by the CV, the diastolic
pressure has the greater variability.
This shows that the relative variability is of more concern than absolute variation, hence the
importance of the coefficient of variation.
The coefficient of variation may be helpful in comparing the relative variation in several data sets
that have different means and different standard deviations.
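The CV values in the two tables above can be reproduced with a one-line function, taking the CV as the standard deviation divided by the mean (which is what the tabulated values correspond to). The function name is an illustrative choice.

```python
def coefficient_of_variation(sd, mean):
    """CV as a pure number: standard deviation divided by the mean."""
    return sd / mean


# Height and weight of 150 children (different units).
cv_height = coefficient_of_variation(sd=5, mean=40)
cv_weight = coefficient_of_variation(sd=2, mean=10)

# Systolic and diastolic blood pressure (same unit, different means).
cv_systolic = coefficient_of_variation(sd=15, mean=130)
cv_diastolic = coefficient_of_variation(sd=8, mean=60)

print(cv_height, cv_weight)                            # 0.125 0.2
print(round(cv_systolic, 3), round(cv_diastolic, 3))   # 0.115 0.133
```

Multiplying by 100 expresses the same quantities as percentages, for example 12.5% for height and 20% for weight.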
Example 3.25, 3.26 [From Bluman’s Book: Chapter 03]
Coefficient of Range
The coefficient of range is a relative measure corresponding to range and is obtained by the
following formula:
$$\text{Coefficient of range} = \frac{L - S}{L + S} \times 100$$
where $L$ and $S$ are respectively the largest and the smallest observations in the data set. The
coefficient of range is rarely used as a measure of dispersion because of its inherent difficulties in
interpretation.
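As a small illustration, the coefficient of range for the two paint groups from the range example can be computed as below. The function name is introduced here for the sketch.

```python
# Lasting times in months for the two paint groups.
group_a = [10, 60, 50, 30, 40, 20]
group_b = [35, 45, 30, 35, 40, 25]


def coefficient_of_range(values):
    """(L - S) / (L + S) * 100, with L the largest and S the smallest value."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest) * 100


coef_a = coefficient_of_range(group_a)   # (60 - 10) / (60 + 10) * 100
coef_b = coefficient_of_range(group_b)   # (45 - 25) / (45 + 25) * 100

print(round(coef_a, 2), round(coef_b, 2))  # 71.43 28.57
```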
Coefficient of Mean Deviation
The third relative measure is the coefficient of mean deviation. As the mean deviation can be
computed from mean, median, mode or from any arbitrary value, a general formula for computing
coefficient of mean deviation may be put as follows:
$$\text{Coefficient of mean deviation} = \frac{\text{Mean deviation from } A}{A} \times 100 = \frac{MD(A)}{A} \times 100$$
where $A$ is the mean, median, mode or any other arbitrary value.
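The general formula can be tried on the physicians' visit data from the mean deviation example, once with $A$ equal to the mean and once with $A$ equal to the median. This is a sketch; the function name is hypothetical.

```python
import statistics

# Number of visits for the 13 physicians.
visits = [5, 0, 1, 4, 7, 0, 12, 2, 0, 20, 3, 5, 6]


def coefficient_of_mean_deviation(values, center):
    """MD(A) / A * 100, where A = center (must be non-zero)."""
    md = sum(abs(x - center) for x in values) / len(values)
    return md / center * 100


coef_about_mean = coefficient_of_mean_deviation(visits, statistics.mean(visits))
coef_about_median = coefficient_of_mean_deviation(visits, statistics.median(visits))

print(round(coef_about_mean, 2), round(coef_about_median, 2))  # 76.92 94.23
```

The two results differ noticeably, which is why the choice of $A$ should always be stated alongside the coefficient.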
Coefficient of Quartile Deviation
The coefficient of quartile deviation is computed from the first and the third quartiles using the
following formula:
$$\text{Coefficient of quartile deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1} \times 100$$
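A minimal sketch of this coefficient for the household sizes used earlier is given below. Quartiles can be defined in several slightly different ways; the code relies on the default ("exclusive") method of `statistics.quantiles` in the Python standard library, which is an assumption and may differ from the convention used elsewhere in these notes.

```python
import statistics

# Household sizes from the ungrouped variance example.
sizes = [3, 3, 4, 4, 5, 5, 6, 6, 7, 7]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(sizes, n=4)

coef_qd = (q3 - q1) / (q3 + q1) * 100

print(q1, q3, coef_qd)  # 3.75 6.25 25.0
```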
Comparing the Measures of Dispersion
Like the measures of averages, a measure of dispersion should also satisfy certain criteria in order
to be reckoned as an ideal measure. From this point of view, a measure of dispersion should be
• Rigidly defined
• Easy to comprehend
• Based on all the observations
• Affected less due to sampling fluctuation
• Less affected by extreme values and
• Amenable to algebraic treatment
A brief overview of the advantages, disadvantages and limitations of these measures is given
below:
The range is easy to compute and is a common way to describe dispersion. It is especially useful in
situations where the purpose of investigation is only to find out the extent of extreme variations.
The range, however, has certain disadvantages and drawbacks that tend to limit its usefulness as a
measure of variability. Since it depends solely on the highest and lowest values, it is highly sensitive
to the presence of unusual and extreme values in the series. Furthermore, the range does not
provide measurement of dispersion of the items relative to the central value. It tends to increase
as the size of the sample increases. Moreover, the range cannot be used meaningfully with nominal
or ordinal data. It is restricted to interval data, where it is meaningful to talk about the
largest and the smallest values. As it is based on only the two extreme observations, it is not
suitable for algebraic treatment.
The average or mean deviation possesses many of the desirable properties of an ideal measure
indicated at the outset. It takes into account each and every item in the distribution and shows the
scatter of the items around the measure of central tendency. It is often found that more than half of
the observations lie within one average deviation of the mean $\bar{x}$. Its
chief advantage is that it helps us to understand the standard deviation, which is one
of the most important measures of dispersion.
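The remark about the share of observations lying within one average deviation of the mean is an empirical tendency rather than a theorem, but it can be checked on any data set. The sketch below does so for the physicians' visit data; the variable names are illustrative.

```python
# Number of visits for the 13 physicians.
visits = [5, 0, 1, 4, 7, 0, 12, 2, 0, 20, 3, 5, 6]

n = len(visits)
mean = sum(visits) / n
md = sum(abs(x - mean) for x in visits) / n          # mean deviation, 50 / 13

# Observations lying in the interval [mean - MD, mean + MD].
within = [x for x in visits if abs(x - mean) <= md]
share_within = len(within) / n

print(len(within), round(share_within, 3))  # 7 0.538
```

For these data 7 of the 13 observations, a little over half, fall within one mean deviation of the mean.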
One of the drawbacks of the average deviation is the ambiguity about the measure of central
tendency to be used for its computation. In order to avoid confusion, it is necessary to state clearly
whether the mean or the median is used in computing the average deviation.
Because of its high degree of accuracy and precision, standard deviation is the most prominently
used measure of dispersion. It is based on all the observations, highly amenable to further algebraic
treatment and is considerably less affected due to sampling fluctuations.
The quartile deviation has a special utility in measuring variation in the case of open-end
distribution. It has an advantage that it is less affected by extreme values in the data set. The chief
disadvantage is that it ignores 50 percent of its observations in the computation, 25 percent from
the upper tail and 25 percent from the lower tail. Furthermore, no algebraic manipulation is
possible with quartile deviation. It is also less affected by sampling variability.
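The different sensitivity of these measures to extreme values can be illustrated by recomputing them after dropping the largest observation. The sketch below uses the physicians' visit data and takes the quartile deviation as $(Q_3 - Q_1)/2$, with quartiles from the default method of `statistics.quantiles`; both the choice of data and the quartile convention are assumptions made for illustration.

```python
import statistics

# Physicians' visit data, with and without the extreme value 20.
visits = [5, 0, 1, 4, 7, 0, 12, 2, 0, 20, 3, 5, 6]
trimmed = [x for x in visits if x != 20]


def summary(values):
    """Range, sample standard deviation and quartile deviation of `values`."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),
        "qd": (q3 - q1) / 2,
    }


with_extreme = summary(visits)
without_extreme = summary(trimmed)

# How much each measure drops when the extreme value is removed.
change = {name: with_extreme[name] - without_extreme[name] for name in with_extreme}

for name in ("range", "sd", "qd"):
    print(name, round(with_extreme[name], 2), round(without_extreme[name], 2))
```

Removing a single observation cuts the range from 20 to 12 and the standard deviation from about 5.66 to about 3.57, while the quartile deviation moves only from 3.0 to 2.75.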
The coefficient of variation is a dimensionless measure and because of this, it is regarded as the
most commonly used measure of relative variation.