Quartiles
The Quartile Deviation is a simple way to estimate the spread of a distribution about a measure of its
central tendency (usually the median). It gives you an idea of the range within which the central
50% of your sample data lies. Based on the quartile deviation, the Coefficient of Quartile
Deviation can be defined, which makes it easy to compare the spread of two or more different
distributions. Since both of these topics are based on the concept of quartiles, we’ll first understand how
to calculate the quartiles of a dataset before working with the direct formulae.
Quartiles
A median divides a given dataset (which is already sorted) into two equal halves. Similarly, the quartiles
divide a given dataset into four equal parts. Therefore, there are three quartiles for a given
distribution, and the second quartile is equal to the median itself. We’ll deal with the other two
quartiles in this section.
The first quartile (also called the lower quartile or the 25th percentile), denoted by Q1, is
the value that lies halfway between the lowest value and the median of the distribution (when it
is already sorted in ascending order). Hence, 25% of the data lies at or below it.
Similarly, the third quartile (also called the upper quartile or the 75th percentile), denoted
by Q3, is the value that lies halfway between the median and the highest value in
the distribution (when it is already sorted in ascending order). Therefore, 75% of the data lies
at or below it, and the remaining 25% lies above it.
For a better understanding, look at the representation below for a Gaussian Distribution –
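The Quartile Deviation, Q_d, is half the difference between the upper and lower quartiles:
$Q_d = \frac{Q_3 - Q_1}{2}$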
The Quartile Deviation doesn’t take into account the extreme points of the distribution. Thus,
only the dispersion or spread of the central 50% of the data is considered.
If the scale of the data is changed, Q_d also changes in the same ratio.
It is the best measure of dispersion for open-ended distributions (whose extreme classes have no
fixed limits).
Also, it is less affected by sampling fluctuations in the dataset than the range (another
measure of dispersion).
Since it depends solely on the central values of the distribution, if these values are abnormal
or inaccurate in an experiment, the result is affected drastically.
Q1 = lower quartile
Q3 = upper quartile
$Q_1 = \left[\frac{n+1}{4}\right]$th item
$Q_2 = \left[\frac{n+1}{2}\right]$th item
$Q_3 = \left[\frac{3(n+1)}{4}\right]$th item
It is important to note here that students need to arrange the given data values in ascending order
before estimating the quartiles.
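The positional rule above can be sketched in Python as follows. The function name `quartile_by_position` and the sample values are hypothetical, and the linear interpolation used when the position is fractional is an assumption: the text does not say how a fractional position should be resolved.

```python
def quartile_by_position(values, r):
    """Return the r-th quartile (r = 1, 2, 3) as the [r(n+1)/4]-th item of the sorted data."""
    if r not in (1, 2, 3):
        raise ValueError("r must be 1, 2 or 3")
    data = sorted(values)              # the rule only works on sorted data
    n = len(data)
    position = r * (n + 1) / 4         # 1-based position, possibly fractional
    whole = int(position)              # item just below the position
    fraction = position - whole
    if whole < 1:                      # position falls before the first item
        return data[0]
    if whole >= n:                     # position falls on or after the last item
        return data[-1]
    below, above = data[whole - 1], data[whole]
    # Assumption: interpolate linearly between the two neighbouring items.
    return below + fraction * (above - below)


# Hypothetical sample with n = 7, so the positions 2, 4 and 6 are whole numbers.
sample = [3, 7, 8, 5, 12, 14, 21]
q1 = quartile_by_position(sample, 1)
q2 = quartile_by_position(sample, 2)
q3 = quartile_by_position(sample, 3)
print(q1, q2, q3)
```

For this sample the sorted data is 3, 5, 7, 8, 12, 14, 21, so the 2nd, 4th and 6th items are picked.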
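For grouped data, the quartiles can be calculated using the following formula:
$Q_r = l_1 + \frac{r\left(\frac{N}{4}\right) - c}{f}\,(l_2 - l_1)$
Here,
Q_r = rth quartile
l_1 = the lower limit of the quartile class
l_2 = the upper limit of the quartile class
f = the frequency of the quartile class
c = the cumulative frequency of the class preceding the quartile class
N = the total number of observations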
Based on the quartiles, a relative measure of dispersion, known as the Coefficient of Quartile Deviation,
can be defined for any distribution. It is formally defined as:
Coefficient of Quartile Deviation $= \frac{Q_3 - Q_1}{Q_3 + Q_1}$
Since it involves a ratio of two quantities of the same dimensions, it is unitless. Thus, it can act as a
suitable parameter for comparing two or more different datasets which may or may not involve
quantities with the same dimensions.
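A minimal Python sketch of this point is given below. It assumes the usual definitions, quartile deviation (Q3 − Q1)/2 and coefficient (Q3 − Q1)/(Q3 + Q1). The function names and the quartile values (the same lengths expressed in metres and in centimetres) are hypothetical and chosen only for illustration.

```python
def quartile_deviation(q1, q3):
    """Half the distance between the lower and upper quartiles."""
    return (q3 - q1) / 2


def coefficient_of_quartile_deviation(q1, q3):
    """Unitless ratio of the quartile difference to the quartile sum."""
    return (q3 - q1) / (q3 + q1)


# Hypothetical quartiles of the same measurements in two different units.
q1_m, q3_m = 2.0, 6.0          # metres
q1_cm, q3_cm = 200.0, 600.0    # centimetres (scale changed by a factor of 100)

qd_m = quartile_deviation(q1_m, q3_m)
qd_cm = quartile_deviation(q1_cm, q3_cm)
coeff_m = coefficient_of_quartile_deviation(q1_m, q3_m)
coeff_cm = coefficient_of_quartile_deviation(q1_cm, q3_cm)

print(qd_m, qd_cm)        # the deviation scales with the unit
print(coeff_m, coeff_cm)  # the coefficient does not
```

The quartile deviation changes by the same factor as the scale, while the coefficient stays the same, which is why the coefficient is the quantity to compare across datasets.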
So, now let’s go through the solved examples below to get a better idea of how to apply these concepts
to various distributions.
Statistics is a tool that helps us understand data, its frequency, and the distribution of its trends.
The difference between the third quartile and the first quartile of a frequency distribution is known
as the interquartile range. It is important because many measures of deviation can be calculated
within this range, which help to assess the characteristics of the data. When we divide the
interquartile range by two, the result is known as the quartile deviation or semi-interquartile range.
Question 1: The number of vehicles sold by a major Toyota Showroom in a day was recorded for 10
working days. The data is given as –
Day Frequency
1 20
2 15
3 18
4 5
5 10
6 17
7 21
8 19
9 25
10 28
Find the Quartile Deviation and its coefficient for the given discrete distribution case.
Solution: We first need to sort the frequency data given to us before proceeding with the quartiles
calculation –
Sorted Data – 5, 10, 15, 17, 18, 19, 20, 21, 25, 28
n(number of data points) = 10
Now, to find the quartiles, we use the logic that the first quartile lies halfway between the lowest value
and the median; and the third quartile lies halfway between the median and the largest value.
Using the values for Q1 and Q3, now we can calculate the Quartile Deviation and its coefficient as follows
–
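The worked values for this question are not reproduced in the text, so the following is only a sketch of the halfway logic described above. It assumes Q1 is the median of the lower half of the sorted data and Q3 is the median of the upper half. The positional [(n+1)/4]th-item rule given earlier can produce different values for the same data. The function name `quartiles_by_halves` is hypothetical.

```python
from statistics import median


def quartiles_by_halves(values):
    """Return (Q1, Q2, Q3), taking Q1 and Q3 as the medians of the two halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd number of points the median itself is left out of both halves.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)


# Vehicles sold per day, from the table in Question 1.
sales = [20, 15, 18, 5, 10, 17, 21, 19, 25, 28]
q1, q2, q3 = quartiles_by_halves(sales)

quartile_deviation = (q3 - q1) / 2
coefficient = (q3 - q1) / (q3 + q1)

print(q1, q2, q3)            # 15 18.5 21
print(quartile_deviation)    # 3.0
print(round(coefficient, 4)) # 0.1667
```

Under this convention the lower half is 5, 10, 15, 17, 18 and the upper half is 19, 20, 21, 25, 28, so Q1 = 15 and Q3 = 21.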
Question 2:
For the following grouped data, calculate the Quartile Deviation and its coefficient.
Marks No. of Students
0-10 10
10-20 20
20-30 30
30-40 50
40-50 40
50-60 30
Solution: For the case of a grouped-data distribution, we can find the quartiles through the following
steps –
⇒ Construct a cumulative frequency table for the given data alongside the given distribution
⇒ From the total number of data values, estimate the groups/classes of the Lower and Upper Quartiles
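⇒ Use the following formulae to then calculate the quartiles:
$Q_1 = LB + w \times \frac{\frac{1}{4}n - f_c}{f}$ and $Q_3 = LB + w \times \frac{\frac{3}{4}n - f_c}{f}$
where, LB – the lower bound of the class in which the respective quartile lies
w – the class width
n – the total number of data values
f_c – the cumulative frequency up to (but not including) that class
f – the frequency corresponding to that particular class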
For the given data, we can form the required table with the cumulative frequency as –
Marks No. of Students Cumulative Frequency
0-10 10 10
10-20 20 30
20-30 30 60
30-40 50 110
40-50 40 150
50-60 30 180
Since the total number of students is 180, the first quartile must lie at the position of 180/4 = 45th
student. Similarly, the third quartile must lie at the position of 180×3/4 = 135th student. By the
distribution of our data into groups, we can note that the first quartile will lie in the 20-30 marks range.
Calculation –
$Q_1 = LB + w \times \frac{\frac{1}{4}n - f_c}{f}$
Here, LB = 20; w = 10
f_c = 30; f = 30; n = 180
Thus, $Q_1 = 20 + 10 \times \frac{\frac{1}{4} \times 180 - 30}{30}$
$= 20 + \frac{15}{30} \times 10$
$= 25$
Similarly, the third quartile will lie in the 40-50 marks range. Calculation –
$Q_3 = LB + w \times \frac{\frac{3}{4}n - f_c}{f}$
Here, LB = 40; w = 10
f_c = 110; f = 40; n = 180
Thus, $Q_3 = 40 + 10 \times \frac{\frac{3}{4} \times 180 - 110}{40}$
$= 40 + \frac{25}{40} \times 10$
$= 46.25$
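The two grouped-data calculations above can be checked with a short Python sketch. The function name `grouped_quartile` and the tuple layout of the classes are hypothetical. The logic follows the steps listed in the solution: accumulate frequencies until the class containing the r(n/4)th value is found, then interpolate within that class.

```python
def grouped_quartile(classes, r):
    """Return the r-th quartile (r = 1, 2, 3) of grouped data.

    classes: list of (lower_bound, upper_bound, frequency) in ascending order.
    """
    if r not in (1, 2, 3):
        raise ValueError("r must be 1, 2 or 3")
    n = sum(freq for _, _, freq in classes)   # total number of data values
    target = r * n / 4                        # position of the quartile
    cumulative = 0                            # cumulative frequency before the class
    for lower, upper, freq in classes:
        if cumulative + freq >= target:
            width = upper - lower
            # LB + w * (r*n/4 - f_c) / f
            return lower + width * (target - cumulative) / freq
        cumulative += freq
    raise ValueError("classes must contain at least one observation")


# Marks distribution from Question 2.
marks = [(0, 10, 10), (10, 20, 20), (20, 30, 30),
         (30, 40, 50), (40, 50, 40), (50, 60, 30)]

q1 = grouped_quartile(marks, 1)
q3 = grouped_quartile(marks, 3)
print(q1, q3)  # 25.0 46.25
```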
Using the values of Q1 and Q3, we can now calculate the Quartile Deviation and its coefficient as
follows:
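Substituting Q1 = 25 and Q3 = 46.25 gives the worked arithmetic below. It assumes the usual definitions: the quartile deviation as half the interquartile range, and the coefficient as the ratio of the quartile difference to the quartile sum.

```latex
Q_d = \frac{Q_3 - Q_1}{2} = \frac{46.25 - 25}{2} = \frac{21.25}{2} = 10.625

\text{Coefficient of Quartile Deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{46.25 - 25}{46.25 + 25} = \frac{21.25}{71.25} \approx 0.298
```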