Unit 4 Descriptive Statistics
Written By:
Miss Sumbal Asghar
Reviewed By:
Dr. Rizwan Akram Rana
Introduction
Measures of central tendency estimate the typical or central value of a dataset, while
measures of dispersion describe the spread of the data, or its variation around a central
value. Two distinct samples may have the same mean or median but completely different
levels of variability, and vice versa. A proper description of a set of data should
therefore include both of these characteristics. There are various methods that can be
used to measure the dispersion of a dataset. In this unit you will study the range,
quartiles, quartile deviation, mean or average deviation, standard deviation and variance.
Two measures of shape (skewness and kurtosis) and a discussion of the coefficient of
variation are also included in this unit.
Objectives
After reading this unit, you will be able to:
1. state the basic purpose of measures of central tendency.
2. define range.
3. determine the range of a given data set.
4. write down the formulas for determining quartiles.
5. define mean or average deviation.
6. determine variance and standard deviation.
7. define the normal curve.
8. explain skewness and kurtosis.
There is dispersion when there is dissimilarity among the data values. The greater the
dissimilarity, the greater the degree of dispersion will be.
A measure of dispersion enables us to compare two or more series with regard to their
variability. It is also regarded as a means of determining uniformity or consistency: a
high degree of variation means little consistency or uniformity, whereas a low degree of
variation means greater uniformity or consistency within the data set. Commonly used
measures of dispersion are the range, quartile deviation, mean deviation, variance, and
standard deviation.
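To illustrate why a central value alone is not enough, here is a minimal Python sketch using two hypothetical series of scores (invented for illustration) that share the same mean but differ greatly in spread, measured here by the range described in 4.1.1. The helper name score_range is an assumption of this sketch.

```python
from statistics import mean

# Two hypothetical series of scores with the same mean
series_a = [48, 49, 50, 51, 52]
series_b = [10, 30, 50, 70, 90]

def score_range(scores):
    """Highest value minus lowest value."""
    return max(scores) - min(scores)

print(mean(series_a), mean(series_b))                # both means equal 50
print(score_range(series_a), score_range(series_b))  # 4 80
```

Series A is far more uniform (consistent) than series B, even though their means are identical.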
4.1.1 Range
The range is the simplest measure of spread and is the difference between the highest and
lowest scores in a data set. In other words, the range is the distance between the largest
score and the smallest score in the distribution. We can calculate the range as:
Range = Highest value of the data – Lowest value of the data
For example, if the lowest and highest marks scored in a test are 22 and 95 respectively, then
Range = 95 – 22 = 73
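The same calculation can be sketched in Python. The list of marks below is hypothetical; only its lowest (22) and highest (95) values come from the example above, and the function name data_range is illustrative.

```python
def data_range(scores):
    """Range = highest value of the data - lowest value of the data."""
    if not scores:
        raise ValueError("the range is undefined for an empty data set")
    return max(scores) - min(scores)

# Hypothetical test marks whose lowest is 22 and highest is 95
marks = [22, 47, 58, 63, 71, 88, 95]
print(data_range(marks))  # 73
```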
The range is the easiest measure of dispersion to compute and is useful when you wish to
evaluate the whole of a dataset quickly. However, it is not considered a good measure of
dispersion because it uses only the two extreme values and ignores all other information
about the spread. Outliers, whether extremely low or extremely high values, can
considerably affect the range.
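A short sketch of this sensitivity to outliers, using hypothetical marks: adding a single extreme low score changes the range dramatically, even though the other five scores are unchanged.

```python
# Hypothetical marks for a small group of students
marks = [60, 62, 65, 67, 70]
print(max(marks) - min(marks))  # 10

# The same group plus one student with an extreme low score of 2
marks_with_outlier = marks + [2]
print(max(marks_with_outlier) - min(marks_with_outlier))  # 68
```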
4.1.2 Quartiles
The values that divide a data set into four equal parts are called quartiles and are
denoted by Q1, Q2, and Q3. Q1 is called the lower quartile and Q3 the upper quartile;
25% of the scores are less than Q1 and 75% of the scores are less than Q3. Q2 is the median.
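For N scores arranged in order, the formulas for the positions of the quartiles are:
$$Q_1 = \left(\frac{N + 1}{4}\right)\text{th score}$$
$$Q_2 = \left(\frac{2(N + 1)}{4}\right)\text{th} = \left(\frac{N + 1}{2}\right)\text{th score}$$
$$Q_3 = \left(\frac{3(N + 1)}{4}\right)\text{th score}$$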
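As a worked illustration of these position formulas, assume a hypothetical set of N = 11 scores arranged in order:
$$Q_1 = \left(\frac{11 + 1}{4}\right)\text{th} = 3\text{rd score}, \quad Q_2 = \left(\frac{11 + 1}{2}\right)\text{th} = 6\text{th score}, \quad Q_3 = \left(\frac{3(11 + 1)}{4}\right)\text{th} = 9\text{th score}$$
When N + 1 is not a multiple of 4 the position is fractional; a common convention, not stated in this unit, is to interpolate between the two neighbouring scores.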
To calculate the quartile deviation from ungrouped data, the following steps are used.
i) Arrange the test scores from highest to lowest.
ii) Assign a serial number to each score, starting with serial number 1 for the
lowest score.
iii) Determine the first quartile (Q1) using the formula Q1 = (N + 1)/4. Use the value
obtained to locate the serial number of the score that falls at Q1.
iv) Determine the third quartile (Q3) using the formula Q3 = 3(N + 1)/4. Locate the
serial number corresponding to the answer obtained; the test score opposite this
number is Q3.
v) Subtract Q1 from Q3 and divide the difference by 2. The result is the quartile
deviation, QD = (Q3 – Q1)/2.
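The steps above can be sketched in Python as follows. Sorting from lowest to highest is used in place of step i, since numbering from the lowest score gives the same serial numbers. The function names and the scores are hypothetical, and linear interpolation for fractional serial numbers is an assumption: the steps do not say how to handle them.

```python
import math

def score_at(position, ascending):
    """Score at a serial number, where serial number 1 is the lowest score.
    Fractional serial numbers are linearly interpolated (an assumption)."""
    n = len(ascending)
    position = min(max(position, 1), n)  # keep within serial numbers 1..N
    whole = math.floor(position)
    fraction = position - whole
    value = ascending[whole - 1]
    if fraction:
        value += fraction * (ascending[whole] - ascending[whole - 1])
    return value

def quartile_deviation(scores):
    # Steps i-ii: order the scores so that serial number 1 is the lowest score
    ascending = sorted(scores)
    n = len(ascending)
    q1 = score_at((n + 1) / 4, ascending)      # step iii: Q1 = (N + 1)/4
    q3 = score_at(3 * (n + 1) / 4, ascending)  # step iv: Q3 = 3(N + 1)/4
    return (q3 - q1) / 2                       # step v: (Q3 - Q1)/2

# Hypothetical test scores (N = 11)
scores = [59, 35, 70, 44, 63, 48, 78, 52, 41, 66, 55]
print(quartile_deviation(scores))  # 11.0
```

Here Q1 is the 3rd score (44) and Q3 is the 9th score (66), so QD = (66 – 44)/2 = 11.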
$$\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$$
where σ is the Greek letter "sigma".
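A minimal Python sketch of this formula, using hypothetical scores. It divides by n, as the formula does (the population standard deviation); Python's standard library offers the same calculation as statistics.pstdev, while statistics.stdev divides by n – 1 instead.

```python
import math
from statistics import pstdev

def population_sd(scores):
    """sigma = sqrt( sum((X - mean)^2) / n )"""
    n = len(scores)
    mean = sum(scores) / n
    return math.sqrt(sum((x - mean) ** 2 for x in scores) / n)

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean = 5
print(population_sd(scores))       # 2.0
print(pstdev(scores))              # standard-library cross-check
```

The squared deviations from the mean of 5 are 9, 1, 1, 1, 0, 0, 4 and 16; their sum is 32, 32/8 = 4, and the square root of 4 is 2.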
4.1.6 Variance
The variance of a set of scores is denoted by σ² and is defined as
$$\sigma^2 = \frac{\sum (X - \bar{X})^2}{n}$$
where X̄ is the mean, n is the number of data values, X stands for each of the scores,
and Σ means "add up all the values".
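The variance formula can be followed step by step in a short Python sketch with hypothetical scores: find the mean, take each deviation X – X̄, square it, add the squares, and divide by n.

```python
from statistics import pvariance

scores = [6, 8, 10, 12, 14]              # hypothetical scores
n = len(scores)
mean = sum(scores) / n                   # X-bar = 10.0
deviations = [x - mean for x in scores]  # X - X-bar
squared = [d ** 2 for d in deviations]   # (X - X-bar)^2
variance = sum(squared) / n              # sum of squares divided by n

print(deviations)         # [-4.0, -2.0, 0.0, 2.0, 4.0]
print(squared)            # [16.0, 4.0, 0.0, 4.0, 16.0]
print(variance)           # 8.0
print(pvariance(scores))  # standard-library cross-check
```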
A distribution of data may occur naturally in many possible forms, with many possible
values for the standard deviation (any positive number). A standard normal curve has a
mean of 0 and a standard deviation of 1. The larger the standard deviation, the flatter
and wider the curve; the smaller the standard deviation, the taller and narrower the
curve. A standard normal distribution is given below.
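The link between standard deviation and flatness can be checked with Python's statistics.NormalDist (Python 3.8 or later). This sketch compares the height of the curve at the mean for a few illustrative standard deviations; the peak height equals 1/(σ√(2π)), so doubling σ halves it (roughly 0.80, 0.40 and 0.20 for σ = 0.5, 1 and 2).

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
standard = NormalDist(mu=0, sigma=1)
print(standard.mean, standard.stdev)

# Height of the curve at the mean for increasing standard deviations
peaks = {sigma: NormalDist(mu=0, sigma=sigma).pdf(0) for sigma in (0.5, 1, 2)}
for sigma, height in peaks.items():
    print(f"sigma = {sigma}: height at the mean = {height:.4f}")
```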
a) Skewness
Skewness tells us about the amount and direction of asymmetry in a data set. It is a
measure of symmetry, or more precisely of the lack of it. A distribution or data set is
symmetric if it looks the same to the left and right of the central point. If the bulk of the
data is at the left, i.e. the peak is towards the left and the right tail is longer, we say that
the distribution is skewed right or positively skewed.
On the other hand, if the bulk of the data is towards the right, in other words the peak is
towards the right and the left tail is longer, we say that the distribution is skewed left or
negatively skewed. If the skewness is equal to zero, the data are perfectly symmetrical,
but this is quite unlikely with real-world data.
Source: Google Images
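This unit does not give a formula for skewness. As an assumption, the sketch below uses one common definition, the moment coefficient of skewness m3 / m2^(3/2), where mk is the average of (X – X̄)^k, without any small-sample correction. The data sets are hypothetical and chosen so that the sign of the result matches the direction of the longer tail.

```python
def moment(values, k):
    """k-th moment about the mean: sum((X - mean)^k) / n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n

def skewness(values):
    """Moment coefficient of skewness: m3 / m2^(3/2)."""
    return moment(values, 3) / moment(values, 2) ** 1.5

right_skewed = [1, 2, 3, 4, 10]  # longer right tail
symmetric = [1, 2, 3, 4, 5]
left_skewed = [0, 6, 7, 8, 9]    # longer left tail

for name, data in [("right", right_skewed), ("symmetric", symmetric), ("left", left_skewed)]:
    print(name, round(skewness(data), 3))
```

The right-skewed set gives a positive value, the symmetric set gives zero, and the left-skewed set gives a negative value.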
b) Kurtosis
Kurtosis is a parameter that describes the shape of a distribution. It tells us how peaked
the graph of a set of data is and how high the graph rises around the mean. In other words,
kurtosis measures the shape of the distribution, i.e. the fatness of the tails; it focuses on
how the values are arranged around the mean. Heaviness or lightness in the tails makes
the data look more peaked or less peaked. Kurtosis is measured against the standard
normal distribution, which has a kurtosis of 3. For this reason kurtosis is often reported
as excess kurtosis (kurtosis minus 3), so that the normal distribution has an excess
kurtosis of zero. A negative value then means that there is less data in the tails than in a
normal distribution, and a positive value means that there is more data in the tails.
There are three types of kurtosis: mesokurtic, platykurtic, and leptokurtic. If the
distribution has an excess kurtosis of zero, the graph is nearly normal; such a distribution
is called mesokurtic. If the distribution has negative excess kurtosis, it is called
platykurtic. An example of a platykurtic distribution is the uniform distribution, which has
as much data in each tail as it does in the peak. If the distribution has positive excess
kurtosis, it is called leptokurtic. Such a distribution has the bulk of its data in the peak.
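As with skewness, the unit gives no formula, so this sketch assumes the common moment-based definition of excess kurtosis, m4 / m2² – 3, which is zero for a normal distribution. The two hypothetical data sets show the sign convention: evenly spread (uniform-like) values give a negative result (platykurtic), while values concentrated at the centre with a few far-out values give a positive result (leptokurtic).

```python
def moment(values, k):
    """k-th moment about the mean: sum((X - mean)^k) / n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n

def excess_kurtosis(values):
    """m4 / m2^2 - 3, so that a normal distribution scores about 0."""
    return moment(values, 4) / moment(values, 2) ** 2 - 3

uniform_like = list(range(1, 11))           # 1, 2, ..., 10
peaked = [-5, 0, 0, 0, 0, 0, 0, 0, 0, 5]    # bulk at the centre, two tail values

print(round(excess_kurtosis(uniform_like), 3))  # negative: platykurtic
print(round(excess_kurtosis(peaked), 3))        # positive: leptokurtic
```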
4.3 Co-Efficient of Variation
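The coefficient of variation is another useful statistic for measuring the dispersion of a
data set. It expresses the standard deviation as a percentage of the mean:
$$\text{C.V.} = \frac{s}{\bar{x}} \times 100$$
where s is the standard deviation and x̄ is the mean. The coefficient of variation is
invariant with respect to the scale of the data: because it is a ratio, changing the units of
measurement (for example, from centimetres to metres) does not change it, so it can be
used to compare the variability of data sets measured in different units. The standard
deviation, on the other hand, is not scale invariant; it changes with the units of measurement.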
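A minimal sketch of scale invariance, assuming s is the sample standard deviation (statistics.stdev) and using hypothetical heights: the same measurements expressed in centimetres and in metres have very different standard deviations but the same coefficient of variation. The mean must be non-zero for the ratio to be defined.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """C.V. = (s / mean) * 100, with s the sample standard deviation."""
    return stdev(values) / mean(values) * 100

heights_cm = [150, 160, 170, 180, 190]     # hypothetical heights
heights_m = [h / 100 for h in heights_cm]  # the same heights in metres

print(stdev(heights_cm), stdev(heights_m))    # SD changes with the unit
print(coefficient_of_variation(heights_cm))
print(coefficient_of_variation(heights_m))    # same C.V., up to rounding
```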
4.5 Activities
Take a piece of cardboard and cut it into 4x4 pieces, then:
i) Cut one piece into the shape of a standard normal distribution and write its name on it.
ii) Cut one piece into a negatively skewed shape and write its name on it.
iii) Cut one piece into a positively skewed shape and write its name on it.
iv) Cut one piece into a symmetrical (unskewed) shape and write its name on it.
v) Cut one piece into a mesokurtic shape and write its name on it.
vi) Cut one piece into a platykurtic shape and write its name on it.
vii) Cut one piece into a leptokurtic shape and write its name on it.