Chapter 5 - Descriptive Analysis

Uploaded by

nadiope mark
Copyright
© © All Rights Reserved
Chapter 5:
Descriptive analysis using measures of central tendency and measures of dispersion
Revision question
The performance of a sample of BBA 2.2 students in Business Statistics is given below:
24 12 24 10 11 34 10 11 92 10
21 58 10 35 82 67 17 56 12 21
62 12 11 44 87 12 15 13 25 79
28 96 86 17 33 53 48 12 63 35
14 16 16 82 62 15 99 15 16 91
98 17 14 19 97 48 19 18 47 23
a) Construct a frequency distribution table for the data above.
b) Construct a bar diagram and interpret your results.
c) Construct a line graph and interpret your results.
d) Construct a pie chart and interpret your results.
e) Calculate the arithmetic mean, GM, HM, median, mode, range, interquartile range, standard deviation, coefficient of skewness and coefficient of kurtosis, and interpret your results.

Measures of Central Tendency
[Figure: histogram and frequency polygon for continuous data (no segmentation of data into groups). Horizontal axis: Age, with class boundaries from 11.5 to 71.5; vertical axis: Frequency.]
Measures of Central Tendency - Introduction
• In the previous chapter we discussed how to organize raw data in tabular or graphical form for inspection. Looking at a histogram, for example, we often realize that the data tend to cluster around some central value and that, as we move away from the central value in either direction, the frequency of observations tends to decrease. Thus it seems reasonable to use the central value as the typical value, that is, the value which summarises the data.
• The measure of the tendency of the data to concentrate at certain values, usually somewhere around the centre of the distribution, is called a measure of central tendency or an average. The average is the typical value around which other figures congregate.
• There are several different kinds of averages, each with certain characteristics, advantages and disadvantages.
• The most frequently encountered types of averages are:

1. Computed averages
a) The arithmetic mean
b) The geometric mean
c) The harmonic mean
2. Positional averages
a) The median
b) The mode
Population and sample sizes
• If population data are involved, we let $N$ denote the number of observations, and the observations themselves are labeled $x_1, x_2, \dots, x_N$.
• With sample data, $n$ is used to denote the number of observations, which are labeled $x_1, x_2, \dots, x_n$.
The Arithmetic mean
• The arithmetic mean, or simply the mean, is the most commonly used measure of central tendency.
• In everyday use, however, this term is erroneously regarded as synonymous with the term average. The mean is just one of the many averages.
Ungrouped data
The mean for ungrouped data is computed by adding all the numerical observations and dividing the sum by the number of observations in the set.
• Suppose there are $N$ population observations with numerical values denoted by $x_1, x_2, \dots, x_N$. The population mean, customarily denoted by the Greek letter $\mu$ (read as "mu"), is given as
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
• For a sample of $n$ observations $x_1, x_2, \dots, x_n$, the sample mean, denoted by $\bar{x}$ (read as "x bar"), is given by
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$


Ungrouped data - Example

The following are the lengths (in cm) of a sample of six garment blanks chosen at random from
a large batch of similar blanks: 54.5, 55.0, 55.7, 51.8, 54.2, 52.4.
What is the mean length of the sample of garments?
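The question above can be checked with a few lines of Python. This is only a sketch of the ungrouped-mean calculation; the function name `arithmetic_mean` is an illustrative choice, not something from the slides.

```python
# Lengths (cm) of the six garment blanks given in the example.
lengths = [54.5, 55.0, 55.7, 51.8, 54.2, 52.4]

def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)

sample_mean = arithmetic_mean(lengths)
print(f"Sample mean length: {sample_mean:.2f} cm")
```

The sum of the six lengths is 323.6, so the sample mean is 323.6 / 6, roughly 53.93 cm.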
Grouped data
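• Suppose that we have data grouped into $K$ classes, with frequencies $f_1, f_2, \dots, f_K$. Let the midpoints (class marks) of these classes be $m_1, m_2, \dots, m_K$, respectively.
• For a population of $N$ observations, the mean is estimated as
$$\mu = \frac{\sum_{i=1}^{K} f_i m_i}{N}$$
• For a sample of $n$ observations, the mean is estimated as
$$\bar{x} = \frac{\sum_{i=1}^{K} f_i m_i}{n}$$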


Grouped mean Example
The table below shows the age distribution of 50 MOHA permanent workers.
Calculate the arithmetic mean.

Class limits    Frequency, f
42-48           8
49-55           8
56-62           13
63-69           7
70-76           6
77-83           5
84-90           3
Solution
Class limits    Frequency, f    Class mark, m    fm
42-48           8               45               360
49-55           8               52               416
56-62           13              59               767
63-69           7               66               462
70-76           6               73               438
77-83           5               80               400
84-90           3               87               261
Total           50                               3104

μ= (360+ 416+767+ 462+ 438+ 400 + 261)/50


μ= 3104/50
μ=62.08
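The same grouped-mean calculation can be sketched in Python. The tuple layout (lower limit, upper limit, frequency) and the name `grouped_mean` are assumptions made for this illustration.

```python
# (lower class limit, upper class limit, frequency) from the table above.
classes = [
    (42, 48, 8), (49, 55, 8), (56, 62, 13), (63, 69, 7),
    (70, 76, 6), (77, 83, 5), (84, 90, 3),
]

def grouped_mean(classes):
    """Estimate the mean as sum(f * m) / sum(f), with m the class mark."""
    total_f = sum(f for _, _, f in classes)
    total_fm = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return total_fm / total_f

mean_age = grouped_mean(classes)
print(f"Estimated mean age: {mean_age:.2f}")
```

The class mark is taken as the midpoint of the class limits, which reproduces the fm column and the total of 3104.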
Exercises for grouped data
Compute the arithmetic mean, Geometric mean, Harmonic mean, median, mode

Values Frequency
140-150 17
151-160 29
161-170 42
171-180 72
181-190 84
191-200 107
201-210 49
211-220 34
221-230 31
231-240 16
241-250 12
The geometric mean
While the arithmetic mean is obtained by finding the sum of a set of $n$ values and then dividing by $n$, the geometric mean is the $n$th root of the product of a set of $n$ values. If we let $GM$ symbolize the geometric mean, then
$$GM = \sqrt[n]{x_1 \cdot x_2 \cdot x_3 \cdots x_n}$$
The geometric mean is used mainly for averaging series of ratios or percentages. It cannot be computed if any value in the data set is negative or zero.
Ungrouped data
It is much easier to work in logarithms when computing the geometric mean. The formula is
$$\log GM = \frac{1}{n}\sum_{i=1}^{n} \log x_i$$
From logarithms, $GM = \operatorname{antilog}\left(\frac{1}{n}\sum_{i=1}^{n} \log x_i\right)$.
Geometric mean - Example

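Find the GM of the following values: 300, 290, 350, 300

Solution

log 300 = 2.47712
log 290 = 2.46240
log 350 = 2.54407
log 300 = 2.47712
Total   = 9.96071

$\log GM = 9.96071 / 4 \approx 2.49018$, so $GM = \operatorname{antilog}(2.49018) \approx 309.2$.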
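The logarithm method can be sketched in Python as follows. The four values are the ones whose logarithms are tabulated above; the function name `geometric_mean` is chosen here for illustration.

```python
import math

# The four values whose base-10 logarithms are listed in the solution.
values = [300, 290, 350, 300]

def geometric_mean(values):
    """Antilog of the mean of the base-10 logarithms."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log

gm = geometric_mean(values)
print(f"Geometric mean: {gm:.2f}")
```

Working in logarithms avoids multiplying many large numbers together, which is the same reason the hand method uses log tables.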

Read about Geometric Mean for Grouped Data


The harmonic mean
• The harmonic mean is often used for computing average speeds. To compute the harmonic mean, denoted by $HM$, we
• add the reciprocals of the values, then
• divide the number of observations by the sum, that is
$$HM = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3} + \dots + \frac{1}{x_n}}$$
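A short Python sketch of the harmonic mean follows. The two speeds are hypothetical values invented for this illustration (a trip made at 40 km/h one way and 60 km/h back over the same distance); they do not come from the slides.

```python
# Hypothetical speeds (km/h) over two legs of equal distance.
speeds = [40, 60]

def harmonic_mean(values):
    """Number of observations divided by the sum of their reciprocals."""
    if any(v <= 0 for v in values):
        raise ValueError("harmonic mean needs strictly positive values")
    return len(values) / sum(1 / v for v in values)

hm = harmonic_mean(speeds)
print(f"Average speed over the whole trip: {hm:.1f} km/h")
```

The result is lower than the arithmetic mean of 50 km/h because more time is spent travelling on the slower leg.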
Ungrouped data
• Find the harmonic mean of
Revision question
Find the arithmetic mean, geometric mean, and harmonic mean of the following set of values of X
Exercises for grouped data
Compute the Arithmetic mean, Geometric mean, harmonic mean

Values Frequency
140-150 17
151-160 29
161-170 42
171-180 72
181-190 84
191-200 107
201-210 49
211-220 34
221-230 31
231-240 16
241-250 12
The median
The median conveys the notion of being the middlemost value, dividing the distribution into two halves: 50% of the values lie on each side of the median. The median is said to be a positional average since it is located rather than computed.

Ungrouped data
• The median of a set of observations is defined as the middle value when the observations are arranged in order of magnitude, from the smallest to the largest or vice versa.
• Let $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$ be the ordered observations.
• Arrange the data in ascending order.
• Determine n, the number of elements in the set. Is n odd or even?
• Find the value of the median: take the middle value if n is odd, or add the two middle values and divide the sum by two if n is even.

Example
Suppose we want to find the median expenditure of customers during Christmas Eve shopping at Target Supermarket, served by the shop attendant at counter No. TS01. The following were her records (in Australian dollars):
350

• First arrange the data in ascending or descending order of magnitude.
• Determine n, the number of elements in the set. Is n odd or even?
• Find the value of the median: take the middle value if n is odd, or add the two middle values and divide the sum by two if n is even.
• In ascending order, we get
• …, 350, which gives Median = 135
Revision question
Find the median of the following numbers.
1. 6, 5, 2, 8, 9, 4
2. 2, 1, 8, 3, 5
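The odd/even rule described above can be sketched in Python and applied to the two lists in this revision question. The function name `median` is an illustrative choice.

```python
def median(values):
    """Middle value of the sorted data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle values

first = median([6, 5, 2, 8, 9, 4])   # even number of observations
second = median([2, 1, 8, 3, 5])     # odd number of observations
print(first, second)
```

For the first list the sorted data are 2, 4, 5, 6, 8, 9, so the median is the mean of 5 and 6; for the second the sorted data are 1, 2, 3, 5, 8 and the middle value is 3.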
Grouped data
For data grouped into a frequency distribution table, the position of the median must be approximated since the values of the individual data elements are no longer specified. Since there are $n$ values in the data set, by common practice we specify that the median corresponds to the $\frac{n}{2}$th item.
The first step is to find the median class, that is, the class which contains the middle or $\frac{n}{2}$th item. We cumulate frequencies until we reach the class for which the cumulative frequency is equal to or greater than $\frac{n}{2}$. The value of the median within that class is found by interpolation. Thus, the formula for the median becomes
$$\text{Median} = L + \left(\frac{\frac{n}{2} - F}{f}\right) W$$
Where
$L$ is the lower-class boundary of the median class
$F$ is the cumulative frequency of the class interval preceding the median class
$f$ is the frequency of the median class
$W$ is the class size/width of the median class
Grouped data:

Compute the median for the following distribution.

Grade    Frequency
40-49    5
50-59    18
60-69    27
70-79    15
80-89    6
Solution

Grade    Class boundary    Frequency    Cumulative frequency
40-49    39.5-49.5         5            5
50-59    49.5-59.5         18           23
60-69    59.5-69.5         27           50
70-79    69.5-79.5         15           65
80-89    79.5-89.5         6            71
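Solution

Since $n = 71$, $\frac{n}{2} = 35.5$.
The median class is 60-69, the first class whose cumulative frequency (50) is at least 35.5.
From the formula of the median, the lower-class boundary is $L = 59.5$, the preceding cumulative frequency is $F = 23$, the median-class frequency is $f = 27$ and the class width is $W = 10$.
Then applying the formula, we get:
$$\text{Median} = 59.5 + \left(\frac{35.5 - 23}{27}\right) \times 10 \approx 64.13$$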
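The interpolation can be sketched in Python using the class boundaries from the solution table. The function name `grouped_median` and the tuple layout are assumptions for this illustration, and the median position is taken as n/2.

```python
# (lower class boundary, upper class boundary, frequency) from the solution table.
classes = [
    (39.5, 49.5, 5),
    (49.5, 59.5, 18),
    (59.5, 69.5, 27),
    (69.5, 79.5, 15),
    (79.5, 89.5, 6),
]

def grouped_median(classes):
    """Interpolate the median inside the class that holds the n/2-th item."""
    n = sum(f for _, _, f in classes)
    target = n / 2
    cumulative = 0  # cumulative frequency before the current class
    for lower, upper, f in classes:
        if cumulative + f >= target:
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("no classes given")

median_grade = grouped_median(classes)
print(f"Estimated median grade: {median_grade:.2f}")
```

The loop stops at the 60-69 class, where the cumulative frequency first reaches 35.5, and interpolates 12.5/27 of the way across the class width of 10.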
Exercises for grouped data

Compute the Median

Values Frequency
140-150 17
151-160 29
161-170 42
171-180 72
181-190 84
191-200 107
201-210 49
211-220 34
221-230 31
231-240 16
241-250 12
The mode
• The word mode comes from the French la mode, meaning "fashion"; in the context of a frequency distribution, it means the most common value.
• The mode is therefore defined as the value that occurs more frequently than any other in the data set.

A set of data may have a single mode, in which case it is said to be unimodal; it may have two modes, which makes it bimodal; or it may have several modes and be called multimodal.
Modes
Ungrouped data
Example
Find the mode for the following data sets;

Solution
a) Since is the value that occurs most often ( times) is the mode.
b) This set of values has two modes, . They both occur four times. It is an example of a bimodal
case.
c) This set of values has no mode because there is no one particular value that occurs more often
than any other
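The three cases in this solution (one mode, two modes, no mode) can be illustrated with a small Python sketch. The data sets below are hypothetical, since the original values were not preserved, and the function name `modes` is chosen for illustration.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means there is no mode."""
    counts = Counter(values)
    top = max(counts.values())
    # If every distinct value occurs equally often, no value stands out.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)

# Hypothetical data sets, one for each case.
unimodal = modes([2, 3, 3, 5])       # one value occurs most often
bimodal = modes([1, 1, 2, 2, 3])     # two values tie for most frequent
no_mode = modes([4, 5, 6])           # every value occurs once
print(unimodal, bimodal, no_mode)
```

Treating "all values equally frequent" as having no mode follows case (c) above; other conventions exist, so this is a choice made for the sketch.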
Mode for Grouped data
In grouped data:
• First, determine the modal class, that is, the class with the highest frequency.
• Secondly, estimate the mode using the following formula:
$$\text{Mode} = L + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) W$$
• Where:
• $L$ = the lower-class boundary of the modal class
• $\Delta_1 = f_m - f_1$
• $\Delta_2 = f_m - f_2$
• $W$ = the size of the modal class
• $f_1$ = frequency of the class preceding the modal class
• $f_2$ = frequency of the class succeeding the modal class
• $f_m$ = frequency of the modal class.
Example of the mode

Calculate the modal age for the age distribution of 228 patients below

Class interval Number of women


15-19 6
20-24 19
25-29 50
30-34 57
35-39 48
40-44 27
45-49 21
Total 228
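Solution Grouped Mode

Modal class = 30-34 (the class with the highest frequency, 57)

$L$ = 29.5
$f_m$ = 57
$\Delta_1$ = 57 − 50 = 7 (modal frequency minus the preceding frequency)
$\Delta_2$ = 57 − 48 = 9 (modal frequency minus the succeeding frequency)
$W$ = (34 − 30) + 1 = 5

Therefore, the mode is
$$\text{Mode} = 29.5 + \left(\frac{7}{7 + 9}\right) \times 5 \approx 31.7$$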
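The grouped-mode estimate can be sketched in Python. The sketch assumes integer class limits with a gap of 1 between classes, so that each lower boundary is the lower limit minus 0.5; the function name `grouped_mode` is illustrative.

```python
# (lower class limit, upper class limit, frequency) from the age table above.
classes = [
    (15, 19, 6), (20, 24, 19), (25, 29, 50), (30, 34, 57),
    (35, 39, 48), (40, 44, 27), (45, 49, 21),
]

def grouped_mode(classes):
    """Estimate the mode by interpolating inside the modal class."""
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))                    # first class with the highest frequency
    lo, hi, f_m = classes[i]
    f_1 = freqs[i - 1] if i > 0 else 0             # frequency of the preceding class
    f_2 = freqs[i + 1] if i < len(freqs) - 1 else 0  # frequency of the succeeding class
    d1, d2 = f_m - f_1, f_m - f_2
    lower_boundary = lo - 0.5                      # assumes integer limits, gap of 1
    width = hi - lo + 1
    return lower_boundary + d1 / (d1 + d2) * width

modal_age = grouped_mode(classes)
print(f"Estimated modal age: {modal_age:.1f}")
```

With the values above this gives 29.5 + (7/16) × 5, which rounds to the 31.7 reported in the solution.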
Exercises for grouped data
Compute the Mode

Values Frequency
140-150 17
151-160 29
161-170 42
171-180 72
181-190 84
191-200 107
201-210 49
211-220 34
221-230 31
231-240 16
241-250 12
Relationship between the mean, median and mode
For a symmetric frequency distribution which is bell shaped, the mean, median and mode coincide.
Relationship between the mean, median and mode
• But for skewed distribution with a peak and a long tail, these measures do not coincide.
• Starting from the peak of the distribution and moving towards the longer tail, they appear in the following
order : mode, median and mean.
Certain numerical relationships exist among the averages
• For any series of positive values, except one whose observations are all identical, the arithmetic mean is always greater than the geometric mean, which in turn is greater than the harmonic mean.
• For a symmetrical and unimodal distribution,

Mean = Median = Mode

• For moderately skewed distributions, the relationship between the mean, median and mode can be expressed approximately as

Mean − Mode ≈ 3(Mean − Median)

• Of the three, the mean is generally the most useful measure of central tendency. It always exists, is unique and reliable, takes into account all observations and is easy to manipulate.
Characteristics of measures of central tendency
The arithmetic mean
• The arithmetic mean uses all observations in the data set, and it is therefore affected by extreme values, particularly if the extreme values fall on the same side of the distribution. When this occurs, the mean may be less representative of the set than any of the other averages.
• The arithmetic mean is unique, and it can always be determined if the individual values of the variable are available.
• As a computed average, the arithmetic mean lends itself to further algebraic manipulation.
• The arithmetic mean is typical in the sense that it is the centre of gravity, balancing the values on either side of it.
Geometric mean
• It is affected by all items in the series, but it gives less weight to extremely high values than does the arithmetic mean.
• It is strictly determined for positive values but cannot be used to average negative values or values with a zero term.
• It is adapted to averaging rates of change, ratios between measures and ratios of price change.
• It is also capable of algebraic manipulation.
Harmonic mean
• It is also affected by all observations. However, since the reciprocals are averaged, it gives more weight to the smaller values. This is just the opposite of the arithmetic mean.
• It is capable of algebraic manipulation.
• It is adapted to averaging time rates and price movements. It is also useful when the observations are expressed inversely to what is required in the average.
Characteristics of measures of central tendency
Median
• It is a positional average that is affected by the number of items but not by the value of each item.
• Extreme deviations from the central part of the distribution affect the median much less than they affect the mean.
• The median, strictly speaking, is indeterminate for an even number of cases, although by general agreement it is the mean of the two central values of the data set.
• The median, unlike the mean, does not lend itself to algebraic treatment.
• It is meaningless for completely qualitative data but meaningful as long as the data can be ranked, such as grades A, B, C, D and E. The median is the most suitable average to describe observations that are scored rather than computed or measured.
Mode
• The modal value is determined by the items at the point of greatest concentration and is not affected by the remaining values of the data set.
• The true mode is difficult to compute but readily located from the frequency distribution.
• The mode does not lend itself to algebraic manipulation.
• The mode is unaffected by extreme values and may not exist for a given data set.
• The mode is meaningless unless the distribution includes a large number of observations and possesses a distinct central tendency.
Measures of Dispersion
Exercises for grouped data
Compute the mean absolute deviation

Values Frequency
140-150 17
151-160 29
161-170 42
171-180 72
181-190 84
191-200 107
201-210 49
211-220 34
221-230 31
231-240 16
241-250 12
Measures of Dispersion, Skewness and Kurtosis
Measures of Dispersion, Skewness and Kurtosis - Introduction

• An average, a single value adopted to represent the central tendency of a series, is indeed a very useful and powerful measure.
• However, the use of a single value to describe a distribution conceals many important facts. Decision making often demands the revelation of these concealed characteristics of the distribution.
• For one thing, not all observations in a series have the same value as the derived average. Almost without exception, the items in a distribution depart from the central value, although the degree of departure varies from one series to another.
• Thus, a measure of dispersion or variation is needed to give a more complete description of the chief characteristics of a distribution, or to make possible an effective comparison of two or more distributions.
• For example, a company manufacturing electric bulbs will be interested not only in the average life of the bulbs but also in how consistent their performance is.
• That is, a mean life of 1000 hours will not be satisfactory, even if it is achieved, if in fact a very high proportion of bulbs last only up to 300 hours.
Measures of Dispersion, Skewness and Kurtosis - Introduction
• A second consideration is that distribution shapes differ from one data set to another. Some are symmetrical, others are not. Hence, to describe a distribution we also need a measure of symmetry or asymmetry, that is, a description of the balance or lack of balance on both sides of the central tendency. The descriptive statistic for this characteristic is called the measure of skewness.

• Finally, there are differences in the degree of peakedness among different distributions. This property is called kurtosis. To measure kurtosis is to describe the pattern of scatter of observations among the classes near the central value, as compared with the scatter of observations near both ends of the distribution.

• As in the central tendency, there are many different summary measures for dispersion, namely;
1. The Range
2. The Interquartile range
3. The Index of dispersion
4. The mean absolute deviation
5. The variance and standard deviation
6. The coefficient of variation
The range
The range is defined as the difference between the largest and the smallest values in the data set. Let $R$ stand for the range.
Then $R = x_{\max} - x_{\min}$, where $x_{\max}$ and $x_{\min}$ are the largest and smallest values of $x$ respectively.
In the grouped data case, the range is the difference between the upper-class boundary of the highest class interval and the lower-class boundary of the lowest class interval.
The range is the simplest measure of dispersion and it is easy to determine.
Its chief disadvantage is that it uses only two values in the data set and therefore ignores the way the remaining data values vary.

Example
Find the range of the following values

Now
The range
• The range, though simple, may be used quite fruitfully as a measure of dispersion for many
purposes. It is perhaps most useful when one wants to know only the extent of the extreme
dispersion under ‘ordinary’ conditions. If either the largest or smallest item is unusual, the
range reveals nothing about the ordinary distribution of the items. The range can also be
used to advantage when the same sample size is used repeatedly, as is often true in
manufacturing quality control. In this case, comparisons between ranges are not affected by
differences in sample size, so it is easy to see whether dispersion is getting worse, staying the
same, or getting better. Finally, the range is the only measure of dispersion that people
without statistical training can immediately understand.
• The range however has some serious defects. It can be unduly influenced by one unusual
value in the sample. Also, the range is in no way a measure of the scatter of the intervening
items relative to the typical value. Finally, the range is highly sensitive to the size of a sample.
The range tends to increase, though not proportionately, as the size of the sample increases.
For this reason we cannot interpret the range properly without knowing the number of
observations included in the data.
Interquartile range
• The range as noted above is subject to the chance of erratic changes in the extreme items and it fails to
take into account the scatter within the range. To overcome these limitations, at least partially, one of
the measures suggested is called interquartile range.

• Fractiles, percentiles, quartiles, and deciles.


• The idea of fractiles is concerned with division of the values of a variable into equal fractions and is
related to the cumulative frequency of the values of a variable.
• If we denote a cumulative proportion for a sample as $S$, then the $S$ fractile is the lowest observed value such that the cumulative proportion corresponding to that value is at least equal to $S$.
• Fractiles are often expressed in terms of percentages rather than decimal fractions. When we do this, a fractile is called a percentile.
• In terms of percentiles, the whole range of values of a variable is divided into 100 equal parts by 99 percentile values.
• For example, the 0.25 fractile may be referred to as the 25th percentile. The percentiles of a variable, say $x$, are denoted as $P_1, P_2, \dots, P_{99}$.
Interquartile range

• There are two sets of fractiles or percentiles that are used most frequently.
• The first is the 0.25, 0.50 and 0.75 fractiles.
• These three fractiles divide the whole distribution into four equal parts or quarters; thus they are often referred to as quartiles and are denoted as $Q_1$, $Q_2$ and $Q_3$. Here $Q_1$ is the 25th percentile and is defined as that value of a variable for which 25% of the values are less than or equal to it and 75% of the values are greater than or equal to it. $Q_2$ and $Q_3$ are defined in a similar manner.
• The second set of frequently used fractiles is the 0.10, 0.20, ..., 0.90 fractiles.
• These nine fractiles divide the range of the variable into ten equal parts.
• Thus, they are referred to as deciles and are denoted as $D_1, D_2, \dots, D_9$. The first decile, $D_1$, is equal to $P_{10}$: it is the value of the variable such that 10% of the values are less than or equal to it and 90% of the values are greater than or equal to it. The other deciles can be defined in a similar manner.
Interquartile range
As mentioned earlier the range is often affected by extreme values. One way of getting rid of this problem is to cut off the
top and bottom quarters by considering only the quartiles. We then get the range of what is left. This range is what we call
the Interquartile range.
To determine the interquartile range, we begin by locating the two quartiles $Q_1$ and $Q_3$. The process of finding quartile values parallels that of locating the median. To obtain quartiles from an array of a sample, we first locate the rank of the items; by a common convention,
$Q_1$ is the $\frac{n+1}{4}$th item and
$Q_3$ is the $\frac{3(n+1)}{4}$th item.
We then read off these values from the array. If fractional ranks occur, we make a linear interpolation between the values corresponding to the two observations between which the fraction falls. The interquartile range, denoted by $IQR$, is simply the difference between $Q_3$ and $Q_1$, that is
$$IQR = Q_3 - Q_1$$
A low interquartile range indicates a small variation among the central 50% of the items, and a high interquartile range means that the variation among the central 50% of the items is large.
Interquartile range
Algebraic interpolation for quartile values from grouped data can be made by the same procedure as that for the median, since the median is identical with the second quartile.
$$Q_1 = L_{Q_1} + \left(\frac{\frac{n}{4} - F_{Q_1}}{f_{Q_1}}\right) W$$
$$Q_3 = L_{Q_3} + \left(\frac{\frac{3n}{4} - F_{Q_3}}{f_{Q_3}}\right) W$$
Where $L_{Q_1}$ and $L_{Q_3}$ are the lower class boundaries of the first and third quartile classes, respectively; $n$ is the total number of observations; $W$ is the width of the class containing the quartile of interest; $f_{Q_1}$ and $f_{Q_3}$ are the first and third quartile class frequencies respectively; and $F_{Q_1}$ and $F_{Q_3}$ are the cumulative frequencies before the first quartile class and the third quartile class, respectively.
Example
Class Class mark Frequency CF
2-3 2.5 2 2
4-5 4.5 3 5
6-7 6.5 5 10
8-9 8.5 9 19
10-11 10.5 7 26
12-13 12.5 4 30
14-15 14.5 1 31
16-17 16.5 1 32
Total 32

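The range is given as $R = 17.5 - 1.5 = 16$, the upper boundary of the highest class minus the lower boundary of the lowest class.

Substitute the values in the formulas for $Q_1$ and $Q_3$ to compute the estimated first and third quartiles. With $n = 32$, the $\frac{n}{4} = 8$th item falls in class 6-7 and the $\frac{3n}{4} = 24$th item falls in class 10-11:
$$Q_1 = 5.5 + \left(\frac{8 - 5}{5}\right) \times 2 = 6.7$$
$$Q_3 = 9.5 + \left(\frac{24 - 19}{7}\right) \times 2 \approx 10.93$$
Thus, the interquartile range is $Q_3 - Q_1 \approx 10.93 - 6.7 = 4.23$.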
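The grouped quartile interpolation can be sketched in Python. The helper name `grouped_fractile` is hypothetical, and the sketch assumes integer class limits with a gap of 1, so each lower boundary is the lower limit minus 0.5 and the quartile positions are n/4 and 3n/4.

```python
# (lower class limit, upper class limit, frequency) from the example table.
classes = [
    (2, 3, 2), (4, 5, 3), (6, 7, 5), (8, 9, 9),
    (10, 11, 7), (12, 13, 4), (14, 15, 1), (16, 17, 1),
]

def grouped_fractile(classes, p):
    """Interpolate the value below which a proportion p of the observations lie."""
    n = sum(f for _, _, f in classes)
    target = p * n
    cumulative = 0  # cumulative frequency before the current class
    for lo, hi, f in classes:
        if cumulative + f >= target:
            lower_boundary = lo - 0.5   # assumes integer limits, gap of 1
            width = hi - lo + 1
            return lower_boundary + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("p must lie between 0 and 1")

q1 = grouped_fractile(classes, 0.25)
q3 = grouped_fractile(classes, 0.75)
iqr = q3 - q1
print(f"Q1 = {q1:.2f}, Q3 = {q3:.2f}, IQR = {iqr:.2f}")
```

Calling the same helper with p = 0.5 gives the grouped median, which reflects the remark that the median is the second quartile.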
Interquartile range
• A similar procedure could be used to compute other percentiles or other fractiles and other ranges,
such as the interdecile range, could be used.
• Compared to the range, the interquartile range is influenced less by the extreme small or large
values, because these are computed from intermediate values of a distribution.
• However, like the range, the interquartile range is based on the values of only two observations
without paying any attention to the other values of the variable.
• Thus, the measure is still quite crude in measuring dispersion, since it ignores the general distribution pattern of the variable.
• Furthermore, since dispersion refers to the tendency of the individual values to deviate from the central tendency, a good dispersion measure should somehow be related to an appropriate average.
• This measure, like the range, does not take this into account.
Index of dispersion

The index of dispersion is a measure of variation for nominal or ordinal variables suggested by Hammond K. R. and Householder J. E. (1962) in their text (Knopf, New York, pp. 136-142).
The index is simply the ratio of the number of different pairs that could be made out of the data at hand to the maximum number of unique pairings that could be created if cases were evenly spread over all available categories. Denoted by $D$, the index of dispersion is expressed as
$$D = \frac{k\left(N^2 - \sum_{i=1}^{k} f_i^2\right)}{N^2 (k - 1)}$$
Where $N$ is the number of scores, $k$ is the number of categories of the variable into which data might be classified, and $f_i$ is the frequency of the cases in the $i$th category. If all scores were in a single category of a variable that has several possible categories, then there is maximum concentration or minimum variability, and $D$ would equal 0. On the other hand, if cases were evenly distributed among the possible categories, there would be maximum variability, the numerator and denominator of the ratio would be the same, and $D$ would equal 1. $D$ therefore varies from 0 to 1 and is a useful measure of variation for nominal or ordinal variables.
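Example
Satisfaction with work
     Category              Frequency, f    f²
1    Very satisfied        2               4
2    Satisfied             7               49
3    Neutral               3               9
4    Dissatisfied          9               81
5    Very dissatisfied     5               25
6    None of the above     4               16
     Total                 30              184

With $N = 30$ scores in $k = 6$ categories, $D$ is computed as
$$D = \frac{6\,(30^2 - 184)}{30^2\,(6 - 1)} = \frac{4296}{4500} \approx 0.95$$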
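The index can be sketched in Python using the ratio k(N² − Σf²) / (N²(k − 1)), which is the form assumed here from the description of the index and the table totals. The function name `index_of_dispersion` is illustrative.

```python
# Category frequencies from the satisfaction-with-work table.
frequencies = [2, 7, 3, 9, 5, 4]

def index_of_dispersion(frequencies):
    """D = k (N^2 - sum f^2) / (N^2 (k - 1)); 0 = fully concentrated, 1 = evenly spread."""
    k = len(frequencies)
    n = sum(frequencies)
    sum_sq = sum(f * f for f in frequencies)
    return k * (n * n - sum_sq) / (n * n * (k - 1))

d = index_of_dispersion(frequencies)
print(f"Index of dispersion: {d:.3f}")
```

As a sanity check on the two extremes described above, `index_of_dispersion([10, 0, 0])` gives 0 and `index_of_dispersion([5, 5, 5])` gives 1.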
A possible application of the index is as a measure of the concept ‘division of labour’ which refers to
the difference or variability among individuals in their sustenance activity. The more evenly spread
they are among the different possible occupations, the greater the division of labour. The amount of
division of labour could be expressed for different groups of people (for example, men and women in
Mean absolute deviation
The mean of the absolute deviations from the mean is called the mean absolute deviation, or simply the average deviation. It is obtained by dividing the sum of the absolute deviations by the number of deviations, which is the same as the number of observations. Thus, the mean absolute deviation from sample observations is
$$MAD = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$
Where $\bar{x}$ is the mean of the variable and $|x_i - \bar{x}|$ signifies that all differences between the individual values and the mean are in absolute terms; that is, only the magnitude of the deviations, and not their sign, is considered.
Example
A secretary typed five one page letters. The number of minutes spent on these letters were

Find the sample mean absolute deviation


Solution
= = =8
Therefore

=
=
=
The mean absolute deviation is useful in situations where no elaborate analysis is required. It is
introduced here as a logical stepping stone to the variance, which is a superior measure of
dispersion.
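A minimal Python sketch of the mean absolute deviation follows. The five typing times are hypothetical: the original values were not preserved, so they were chosen here only so that their mean is 8 minutes, as in the solution above.

```python
# Hypothetical typing times (minutes) for five letters; their mean is 8.
minutes = [5, 8, 11, 7, 9]

def mean_absolute_deviation(values):
    """Average of the absolute deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

mad = mean_absolute_deviation(minutes)
print(f"Mean absolute deviation: {mad:.1f} minutes")
```

For these illustrative values the absolute deviations are 3, 0, 3, 1 and 1, so the mean absolute deviation is 8 / 5.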
Variance and standard deviation
Ungrouped data
The variance
The variance of a set of observations is defined as the mean of the squares of the deviations of the individual observations from their mean. It tells us how the observations vary or are spread out about the mean.
The population variance of $N$ observations with mean $\mu$ is denoted by $\sigma^2$:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
The population standard deviation, $\sigma$, is the positive square root of the variance.
The sample variance, denoted by $s^2$, is defined as
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
An equivalent formula more convenient for computation is
$$s^2 = \frac{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n - 1)}$$
The sample standard deviation, $s$, is the positive square root of the variance. Note that $n - 1$ is the divisor of $s^2$. The $n - 1$ accounts for the
Example
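Find the variance, standard deviation, coefficient of skewness and coefficient of kurtosis using sample moments about the mean of
2, 15, 7, 29, 11, 33, 5, 15, 3, 23

Solution
i        x      x²
1        2      4
2        15     225
3        7      49
4        29     841
5        11     121
6        33     1089
7        5      25
8        15     225
9        3      9
10       23     529
Total    143    3117

From the table above, $n = 10$, $\sum x_i = 143$ and $\sum x_i^2 = 3117$.

$$s^2 = \frac{10(3117) - 143^2}{10(9)} = \frac{10721}{90} \approx 119.12$$

So the standard deviation is the square root, $s = \sqrt{119.12} \approx 10.91$.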
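The computation can be sketched in Python using the sums from the table and the computational form of the sample variance (divisor n − 1). The function name `sample_variance` is illustrative; the standard library's `statistics.variance` is used only as a cross-check.

```python
import math
import statistics

# The ten observations from the solution table.
data = [2, 15, 7, 29, 11, 33, 5, 15, 3, 23]

def sample_variance(values):
    """Computational form: (n * sum(x^2) - (sum x)^2) / (n * (n - 1))."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

variance = sample_variance(data)
std_dev = math.sqrt(variance)
library_variance = statistics.variance(data)  # definition-based cross-check
print(f"s^2 = {variance:.2f}, s = {std_dev:.2f}")
```

Both routes give the same value because the computational formula is an algebraic rearrangement of the definition.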


Grouped data
When the data are grouped into a frequency distribution table, the formulas for the variance and standard deviation are modified in the same way as for the averages discussed in the previous chapter.
For computational purposes, the sample variance of $n$ observations is
$$s^2 = \frac{n\sum_{i=1}^{K} f_i m_i^2 - \left(\sum_{i=1}^{K} f_i m_i\right)^2}{n(n - 1)}$$
Where $m_i$ are the midpoints or class marks, $K$ is the number of classes and $f_i$ is the class frequency.
The standard deviation is then given by the positive square root of the variance obtained from the formula above.
Interpreting variance or standard deviation
Large values of the standard deviation and variance indicate more dispersion of the values of $x$ about the mean, and small values indicate that the values of $x$ are clustered about the mean.
Example
Compute the sample variance, standard deviation, coefficient of skewness and coefficient of kurtosis using sample moments about the mean from the grouped data
Solution
Class interval    Class mark, m    Frequency, f    fm     m²     fm²
0-4               2                6               12     4      24
5-9               7                25              175    49     1225
10-14             12               11              132    144    1584
15-19             17               7               119    289    2023
20-24             22               1               22     484    484
Total                              50              460           5340

From the table, $n = 50$, $\sum f_i m_i = 460$ and $\sum f_i m_i^2 = 5340$.
Therefore
$$s^2 = \frac{50(5340) - 460^2}{50(49)} = \frac{55400}{2450} \approx 22.61$$
and $s = \sqrt{22.61} \approx 4.76$.
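A Python sketch of the grouped variance follows, using the class marks and frequencies from the table. The function name `grouped_sample_variance` and the (class mark, frequency) layout are assumptions for this illustration.

```python
import math

# (class mark m, frequency f) from the solution table.
classes = [(2, 6), (7, 25), (12, 11), (17, 7), (22, 1)]

def grouped_sample_variance(classes):
    """(n * sum(f m^2) - (sum f m)^2) / (n * (n - 1)) for grouped data."""
    n = sum(f for _, f in classes)
    sum_fm = sum(f * m for m, f in classes)
    sum_fm2 = sum(f * m * m for m, f in classes)
    return (n * sum_fm2 - sum_fm ** 2) / (n * (n - 1))

variance = grouped_sample_variance(classes)
std_dev = math.sqrt(variance)
print(f"s^2 = {variance:.2f}, s = {std_dev:.2f}")
```

The intermediate sums reproduce the table totals of 460 and 5340.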
Relative dispersion and coefficient of variation
So far we have been dealing with absolute measures of dispersion in a distribution. However, there are times when the problem is to compare the scatter in one distribution with that of another. If the items in one distribution differ in magnitude as well as in units of measurement from the items in another distribution, it becomes difficult to compare the degree of scatter in the two. The most common method of comparing amounts of different magnitudes and units is to reduce them to a comparable basis first.
Quite often, a standard deviation is expressed as a percentage of the mean of the data from which it was computed. This new value, which is unitless, is called the coefficient of variation. Denoting it by $CV$, we have
$$CV = \frac{s}{\bar{x}} \times 100\%$$
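A short Python sketch shows how the coefficient of variation puts two series on a comparable basis. Both series below are hypothetical values made up for this illustration, and the function name `coefficient_of_variation` is likewise illustrative.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical series in different units and magnitudes.
prices_shs = [9000, 9500, 10000, 10500, 11000]   # price per kg
sales_kg = [120, 200, 150, 260, 170]             # monthly sales

cv_prices = coefficient_of_variation(prices_shs)
cv_sales = coefficient_of_variation(sales_kg)
print(f"CV of prices: {cv_prices:.1f}%, CV of sales: {cv_sales:.1f}%")
```

Because the measure is unitless, rescaling a series (for example, converting prices to another currency) leaves its coefficient of variation unchanged.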

Example

Suppose that Mr. Otto, a Nakasero butcher, keeps records on monthly sales of his high quality beef and summarises his records as
follows;

Beef prices (Shs)


Beef sales (Kgs)

From the table above, the coefficients of variation indicate that, on relative terms, there was greater variation in the prices of beef
Measures of skewness
• In a given data set, when the plotted frequency distribution curve is not symmetrical, we say it is skewed. The measure of skewness indicates the degree to which the data set deviates from symmetry and also the direction in which it is skewed.
• Skewness is brought about by the presence of extreme values on one side, or in one of the tails, of the distribution, thereby elongating that tail. When the extreme values occur in the upper or right tail, the distribution is said to be positively skewed. When the extreme values occur in the lower or left tail, the distribution is negatively skewed.
• As discussed in the previous chapter, the mean is the measure most affected by the presence of extreme values in one tail of a distribution. It is pulled substantially in the direction of the extreme values. The mode is unaffected by the extreme values, while the median, which is affected by the number but not the value of the extreme values, is pulled in the direction of the extreme values but not as far as the mean. The median moves about two-thirds as far as the mean in the direction of the extreme values.
• Several measures of skewness are available but we shall look at only two;
1. Pearsonian measure of skewness
2. An alternative measure of skewness involves using sample moments about the mean
Pearsonian measure of skewness
• This measure is based on the relationship between the mean, the median and the mode. Denoting the Pearsonian
coefficient of skewness by $Sk$, it is defined as

$$Sk = \frac{\bar{x} - \text{mode}}{s}$$

• Or, using the mean and median,

$$Sk = \frac{3(\bar{x} - \text{median})}{s}$$

• For moderately skewed distributions, both should yield approximately the same result because of the empirical relationship

$$\bar{x} - \text{mode} = 3(\bar{x} - \text{median})$$

• However, because the mode is an unstable measure, the formula involving the mean and median is usually preferred in
actual application.
• $Sk$ is nearly always within the limits of $\pm 1$, even though theoretically it could range in value from $-3$ to $+3$.
• When the distribution is symmetrical, the mean, median and mode coincide, so $Sk = 0$.
• For a positively skewed distribution, the mean is greater than the median and $Sk$ is a positive value. The reverse is true for
negatively skewed distributions.
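As an illustration, the mean-and-median form of the Pearsonian coefficient can be computed as in the sketch below. The three small data sets are hypothetical, the function name is illustrative, and the standard deviation is assumed to use divisor $n$.

```python
from statistics import mean, median, pstdev

def pearson_skewness(values):
    """Pearsonian coefficient from the mean and median: 3(mean - median) / s."""
    return 3 * (mean(values) - median(values)) / pstdev(values)

# Hypothetical data sets chosen to show the sign of the coefficient.
right_tailed = [2, 3, 3, 4, 4, 5, 20]        # extreme value in the upper tail
symmetric = [1, 2, 3, 4, 5]                  # mean and median coincide
left_tailed = [-20, -5, -4, -4, -3, -3, -2]  # mirror image of right_tailed

sk_right = pearson_skewness(right_tailed)    # positive
sk_sym = pearson_skewness(symmetric)         # zero
sk_left = pearson_skewness(left_tailed)      # negative
```

The extreme value 20 pulls the mean above the median, so the coefficient is positive; mirroring the data flips the sign.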
Sample moments about the mean as measures of skewness
An alternative measure of skewness involves using sample moments about the mean. The general formula for the $r$-th moment
about the mean, which we denote by $M_r(x)$, is

$$M_r(x) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^r}{n}$$

In particular

1. The first moment about the mean (the mean deviation) is always zero, that is,

$$M_1(x) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})}{n} = 0$$

2. The second moment about the mean is the variance

$$M_2(x) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} = s^2$$

3. The third moment about the mean is used as a measure of skewness and is given, for ungrouped data, as

$$M_3(x) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n}$$

Note that the third moment about the mean can be positive or negative.
For computational purposes, we can use an equivalent formula for the numerator in the above equation, given as

$$\sum_{i=1}^{n} (x_i - \bar{x})^3 = \sum x_i^3 - 3\bar{x}\sum x_i^2 + 3\bar{x}^2\sum x_i - n\bar{x}^3$$

Measure of skewness (ungrouped data):

$$\alpha_3 = \frac{M_3(x)}{s^3}$$
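The moments about the mean for ungrouped data can be checked numerically. The sketch below uses a small hypothetical data set; the function names are illustrative, and all moments use divisor $n$. It also confirms that the expanded (computational) form of the third-moment numerator agrees with the definition.

```python
def central_moment(values, r):
    """r-th sample moment about the mean, with divisor n."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** r for x in values) / n

def third_moment_numerator(values):
    """Expanded form of sum((x - mean)^3), avoiding individual deviations."""
    n = len(values)
    x_bar = sum(values) / n
    sum_x = sum(values)
    sum_x2 = sum(x ** 2 for x in values)
    sum_x3 = sum(x ** 3 for x in values)
    return sum_x3 - 3 * x_bar * sum_x2 + 3 * x_bar ** 2 * sum_x - n * x_bar ** 3

def moment_skewness(values):
    """Scale-free skewness: third moment divided by the cube of s."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

data = [1, 2, 3, 4, 10]  # hypothetical observations with mean 4

m1 = central_moment(data, 1)   # first moment: zero
m2 = central_moment(data, 2)   # variance: 10
m3 = central_moment(data, 3)   # third moment: 36
skew = moment_skewness(data)   # positive, since 10 lies in the upper tail
```

Here the deviations are $-3, -2, -1, 0, 6$, so the cubed deviations sum to 180 and the third moment is $180/5 = 36$.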
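In the case of grouped data, the third moment about the mean is

$$M_3(x) = \frac{\sum_{i=1}^{K} f_i (m_i - \bar{x})^3}{\sum_{i=1}^{K} f_i}$$

The computational formula for the numerator is

$$\sum_{i=1}^{K} f_i (m_i - \bar{x})^3 = \sum f_i m_i^3 - 3\bar{x}\sum f_i m_i^2 + 3\bar{x}^2\sum f_i m_i - n\bar{x}^3$$

where $K$ is the number of classes, $f_i$ is the frequency, $m_i$ is the midpoint or class mark, and $n = \sum f_i$.
Assuming a moderately skewed distribution, a measure of skewness which is independent of the scale is given by

$$\alpha_3 = \frac{M_3(x)}{s^3}$$
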
In conclusion, a few observations may be mentioned. It is quite common to encounter positively skewed distributions in
economic and business data, particularly in production and price series, which can only be as small as zero but can be
indefinitely large. It is believed that positive skewness is produced by multiplicative forces. For instance, income distribution is
usually positively skewed because it is affected by a large number of factors such as education, sex, family background, and so
forth, which can be thought of as combining multiplicatively instead of additively. Negatively skewed distributions are quite
rare and it is often difficult to furnish a rational explanation for their existence.
Grouped data

$$M_3(x) = \frac{\sum_{i=1}^{K} f_i (m_i - \bar{x})^3}{\sum_{i=1}^{K} f_i}$$
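For grouped data the same calculation works from class marks and frequencies. The sketch below uses a hypothetical frequency table with four classes; the function names are illustrative, and the mean is taken as the frequency-weighted average of the class marks.

```python
def grouped_central_moment(midpoints, frequencies, r):
    """r-th moment about the mean from class marks m_i and frequencies f_i."""
    n = sum(frequencies)
    x_bar = sum(f * m for m, f in zip(midpoints, frequencies)) / n
    return sum(f * (m - x_bar) ** r for m, f in zip(midpoints, frequencies)) / n

def grouped_skewness(midpoints, frequencies):
    """Scale-free skewness: third moment divided by the cube of s."""
    m2 = grouped_central_moment(midpoints, frequencies, 2)
    m3 = grouped_central_moment(midpoints, frequencies, 3)
    return m3 / m2 ** 1.5

# Hypothetical frequency table: class marks and their frequencies.
class_marks = [5, 15, 25, 35]
freqs = [4, 3, 2, 1]

m2 = grouped_central_moment(class_marks, freqs, 2)  # 100
m3 = grouped_central_moment(class_marks, freqs, 3)  # 600
alpha_3 = grouped_skewness(class_marks, freqs)      # 600 / 1000 = 0.6
```

Most observations fall in the lower classes and the frequencies thin out towards the upper classes, so the coefficient is positive.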
Measures of Kurtosis
• The shape of the frequency distribution is primarily indicated by its skewness and kurtosis. The term kurtosis
describes the curvature or sharp-pointedness of the curve of the frequency distribution.
• Kurtosis measures from fractiles
• Using the fractiles, the coefficient of kurtosis, denoted by $Kur$, is defined as the ratio of the semi-interquartile range
(that is, half of the value of the interquartile range) to the interdecile range, namely

$$Kur = \frac{\tfrac{1}{2}(Q_3 - Q_1)}{D_9 - D_1}$$

• The value of $Kur$ ranges approximately from $0$ to $0.5$.
• If $Kur$ is close to $0$, the distribution is said to be leptokurtic, that is, it has a sharp peak and long, heavy tails.
• If it is close to $0.26$ from either side, the distribution is said to be mesokurtic, that is, similar in shape to the standardised
normal curve.
• If $Kur$ is close to $0.5$, the distribution is said to be platykurtic, that is, it has a flat peak and short, light tails.
• For the standardised normal variable, a normal variable with zero mean and unit standard deviation, $Kur \approx 0.263$.
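The fractile-based coefficient of kurtosis can be sketched with Python's standard `statistics` module. The function names are illustrative. The sketch assumes the default (exclusive) interpolation rule of `statistics.quantiles`, which may differ slightly from the hand method used for quartiles and deciles elsewhere in these notes.

```python
from statistics import NormalDist, quantiles

def fractile_kurtosis(values):
    """Semi-interquartile range divided by the interdecile range."""
    q = quantiles(values, n=4)    # [Q1, Q2, Q3]
    d = quantiles(values, n=10)   # [D1, ..., D9]
    return 0.5 * (q[2] - q[0]) / (d[8] - d[0])

def normal_fractile_kurtosis():
    """The same ratio computed from the standard normal distribution."""
    z = NormalDist()  # zero mean, unit standard deviation
    iqr = z.inv_cdf(0.75) - z.inv_cdf(0.25)
    idr = z.inv_cdf(0.90) - z.inv_cdf(0.10)
    return 0.5 * iqr / idr

# Hypothetical, evenly spread data: the integers 1 to 99 (a flat distribution).
evenly_spread = list(range(1, 100))

kur_flat = fractile_kurtosis(evenly_spread)   # 0.5 * 50 / 80 = 0.3125
kur_normal = normal_fractile_kurtosis()       # about 0.263
```

The flat data set gives a larger ratio than the normal curve, which places it on the platykurtic side of the normal benchmark.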
Sample moments as kurtosis measure
Kurtosis is also measured using the fourth sample moment about the mean. For ungrouped data, the fourth moment is

$$M_4(x) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n}$$

The computational formula for the numerator in the case of ungrouped data is

$$\sum_{i=1}^{n} (x_i - \bar{x})^4 = \sum x_i^4 - 4\bar{x}\sum x_i^3 + 6\bar{x}^2\sum x_i^2 - 4\bar{x}^3\sum x_i + n\bar{x}^4$$

Similarly, in the case of grouped data, this formula becomes

$$\sum_{i=1}^{K} f_i (m_i - \bar{x})^4 = \sum f_i m_i^4 - 4\bar{x}\sum f_i m_i^3 + 6\bar{x}^2\sum f_i m_i^2 - 4\bar{x}^3\sum f_i m_i + n\bar{x}^4$$

where again $K$ is the number of class intervals, $m_i$ are the class marks, and $n = \sum f_i$.
A measure of kurtosis, denoted by $\alpha_4$, which is independent of the scale is given by

$$\alpha_4 = \frac{M_4(x)}{s^4}$$

If $\alpha_4 = 3$, the distribution is mesokurtic. The bell-shaped normal curve is usually taken as the standard and has $\alpha_4 = 3$.
If $\alpha_4 < 3$, the distribution is platykurtic.
If $\alpha_4 > 3$, the distribution is leptokurtic.
Sometimes we compute $\alpha_4 - 3$ and refer to it as a measure of excess kurtosis. For the normal curve the value is zero; it is positive for leptokurtic and
negative for platykurtic distributions.
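The moment-based kurtosis measure and its classification against the normal benchmark of 3 can be sketched as follows. The data sets are hypothetical, the function names are illustrative, and moments use divisor $n$. The tolerance argument is an assumption added because a computed value will rarely equal 3 exactly.

```python
def central_moment(values, r):
    """r-th sample moment about the mean, with divisor n."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** r for x in values) / n

def moment_kurtosis(values):
    """Scale-free kurtosis: fourth moment divided by the squared variance."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

def classify_kurtosis(alpha_4, tol=1e-9):
    """Compare the coefficient with 3, the value for the normal curve."""
    if abs(alpha_4 - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if alpha_4 > 3 else "platykurtic"

# Hypothetical data sets.
spread_out = [1, 2, 3, 4, 10]
peaked = [0, 0, 0, 0, 0, 0, 0, 0, -5, 5]  # most values at the centre, two far out

a4_spread = moment_kurtosis(spread_out)   # 278.8 / 100 = 2.788
a4_peaked = moment_kurtosis(peaked)       # 125 / 25 = 5.0
excess_peaked = a4_peaked - 3             # excess kurtosis: 2.0
```

The second data set concentrates most values at the centre with two distant observations, which is the pattern that pushes the coefficient above 3.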
