Data Management 2

Topics:
o Measures of Variability
o Measures of Position
o Box-and-Whisker Plot
Learning objectives:
1. Determine and apply the measures of variability and position.
2. Create a box-and-whisker plot.
Lesson Proper:

Measures of Variability (Dispersion, Spread, or Scattering)

Let us first compare two sets of data.

          Set A              Set B
Data      2, 3, 3, 4, 8      2, 3, 3, 3, 9
Mean      $\bar{x} = 4$      $\bar{x} = 4$
Median    $\tilde{x} = 3$    $\tilde{x} = 3$
Mode      $\hat{x} = 3$      $\hat{x} = 3$

The two sets contain different values, yet their mean, median, and mode are the same. The sets can still be told apart by using the measures of variability (also called dispersion, spread, or scattering). These descriptive measures describe how spread out, scattered, dispersed, or variable the data are.

Like the measures of central tendency (mean, median, and mode), the measures of variability come in different types, such as the range, the variance, and the standard deviation, among others. The larger a measure of variability, the more spread out, dispersed, scattered, or variable the data are. Similarly, the smaller the measure, the closer the data are to one another.

Ungrouped Data
1. Range
The range is the difference between the highest value (HV) and the lowest value (LV).
$$R = HV - LV$$
The range of Set A is $R = 8 - 2 = 6$, while the range of Set B is $R = 9 - 2 = 7$. This implies that the data in Set A are closer together than the data in Set B.

The range is not as reliable as the other measures of variability because it ignores the data between the highest and lowest values. The other measures take these data into account.

For instance, consider the two sets below. Both have the same range ($R = 8 - 2 = 6$), but the data between the lowest value and the highest value do not behave the same way.

Set C            Set D
2, 2, 2, 2, 8    2, 3, 4, 6, 8
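
The range computations above can be checked with a short Python sketch. The helper name `data_range` is an assumption made for illustration and is not part of the module.

```python
# Range of an ungrouped data set: highest value minus lowest value.
def data_range(values):
    return max(values) - min(values)

# The four sets used in the discussion above.
sets = {
    "A": [2, 3, 3, 4, 8],
    "B": [2, 3, 3, 3, 9],
    "C": [2, 2, 2, 2, 8],
    "D": [2, 3, 4, 6, 8],
}

ranges = {name: data_range(values) for name, values in sets.items()}
for name, r in ranges.items():
    print(f"Set {name}: range = {r}")
```

Sets A, C, and D share the same range even though their middle values differ, which is why the range alone is a weak measure of spread.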

2. Variance
Variance of Population
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
Where:
$\sigma$ is the lowercase Greek letter sigma
$\sum$ (uppercase Greek letter sigma) means summation
$\sigma^2$ is the population variance
$x$ is each datum
$\mu$ (lowercase Greek letter mu) is the mean of the population
$N$ is the population size

Variance of Sample
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
Where:
$\sum$ (uppercase Greek letter sigma) means summation
$s^2$ is the sample variance
$x$ is each datum
$\bar{x}$ is the mean of the sample
$n$ is the sample size

Use the formula for the variance of a population if the data come from the whole population. Likewise, use the formula for the variance of a sample if the data come from a sample.

Let us determine the variance of each set, assuming the data come from a sample.

          Set A              Set B
Data      2, 3, 3, 4, 8      2, 3, 3, 3, 9
Mean      $\bar{x} = 4$      $\bar{x} = 4$
Median    $\tilde{x} = 3$    $\tilde{x} = 3$
Mode      $\hat{x} = 3$      $\hat{x} = 3$

Set A
$x$              $x - \bar{x}$      $(x - \bar{x})^2$
2                $2 - 4 = -2$       $(-2)^2 = 4$
3                $3 - 4 = -1$       $(-1)^2 = 1$
3                $3 - 4 = -1$       $(-1)^2 = 1$
4                $4 - 4 = 0$        $(0)^2 = 0$
8                $8 - 4 = 4$        $(4)^2 = 16$
$\bar{x} = 4$                       $\sum (x - \bar{x})^2 = 22$

There are five data points (i.e., 2, 3, 3, 4, and 8), so $n = 5$.
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{22}{5 - 1} = \frac{22}{4} = \frac{11}{2} = 5.5$$
The variance of Set A is 5.5. Let us also determine the variance of Set B.
Set B
$x$              $x - \bar{x}$      $(x - \bar{x})^2$
2                $2 - 4 = -2$       $(-2)^2 = 4$
3                $3 - 4 = -1$       $(-1)^2 = 1$
3                $3 - 4 = -1$       $(-1)^2 = 1$
3                $3 - 4 = -1$       $(-1)^2 = 1$
9                $9 - 4 = 5$        $(5)^2 = 25$
$\bar{x} = 4$                       $\sum (x - \bar{x})^2 = 32$

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{32}{5 - 1} = \frac{32}{4} = 8$$
The variance of Set B is 8, compared with 5.5 for Set A. Since the variance of Set B is larger than the variance of Set A, the data in Set B are more spread out, dispersed, scattered, or variable. Likewise, the data in Set A are closer to one another than the data in Set B.
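
The tabular computation of the sample variance can be mirrored step by step in Python. This is a minimal sketch; the function name `sample_variance` is an assumption chosen for illustration.

```python
def sample_variance(values):
    """Sample variance: sum of squared deviations divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    # One squared deviation per datum, as in the third column of the tables.
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)

set_a = [2, 3, 3, 4, 8]
set_b = [2, 3, 3, 3, 9]

var_a = sample_variance(set_a)
var_b = sample_variance(set_b)
print(f"Variance of Set A: {var_a}")
print(f"Variance of Set B: {var_b}")
```

Dividing by `n` instead of `n - 1` in the last line of the function would give the population variance.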
3. Standard Deviation
Standard Deviation of Population
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
Where:
$\sigma$ (lowercase Greek letter sigma) is the population standard deviation
$\sum$ (uppercase Greek letter sigma) means summation
$x$ is each datum
$\mu$ (lowercase Greek letter mu) is the mean of the population
$N$ is the population size

Standard Deviation of Sample
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Where:
$\sum$ (uppercase Greek letter sigma) means summation
$s$ is the sample standard deviation
$x$ is each datum
$\bar{x}$ is the mean of the sample
$n$ is the sample size

Use the formula for the standard deviation of a population if the data come from the whole population. Likewise, use the formula for the standard deviation of a sample if the data come from a sample.
Standard deviation and variance are related: the square root of the variance is the standard deviation, and the square of the standard deviation is the variance.

Therefore:
Set A
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{5.5} \approx 2.35$$

Set B
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{8} \approx 2.83$$
Similarly, the results reveal that the data in Set B are more dispersed or scattered than the data in Set A.
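
As a cross-check, Python's standard `statistics` module provides `stdev` (sample, divides by $n - 1$) and `pstdev` (population, divides by $N$). The sketch below assumes, as the text does, that both sets are samples, and shows the population figure only for contrast.

```python
import statistics

set_a = [2, 3, 3, 4, 8]
set_b = [2, 3, 3, 3, 9]

# Sample standard deviations (divisor n - 1), matching the worked solution.
s_a = statistics.stdev(set_a)
s_b = statistics.stdev(set_b)

# Population standard deviation (divisor N), shown only for comparison.
sigma_a = statistics.pstdev(set_a)

print(f"s (Set A) = {s_a:.2f}")
print(f"s (Set B) = {s_b:.2f}")
print(f"sigma (Set A, if treated as a population) = {sigma_a:.2f}")
```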

Grouped Data

Let us consider the same problem used in the discussion of measures of central tendency. The ages of the first 50 persons who entered the mall were tallied, as shown below. This time, determine the range, variance, and standard deviation of their ages.

Age        Frequency
10 – 19    5
20 – 29    20
30 – 39    10
40 – 49    7
50 – 59    8
Total      n = 50

1. Range
$$R = (\text{upper class boundary of the highest class interval}) - (\text{lower class boundary of the lowest class interval})$$

To solve, first determine the class boundaries.

Age        Frequency    Boundaries
10 – 19    5
20 – 29    20           19.5 – 29.5
30 – 39    10
40 – 49    7
50 – 59    8
Total      n = 50

For example, consider the class 20 – 29. Its lower boundary is the average of its lower limit and the upper limit of the class below it:
$$\frac{20 + 19}{2} = \frac{39}{2} = 19.5$$
Similarly, its upper boundary is the average of its upper limit and the lower limit of the class above it:
$$\frac{29 + 30}{2} = \frac{59}{2} = 29.5$$
Therefore, the class boundaries are 19.5 – 29.5. From the previous discussions, the class interval is 10 (e.g., 40 − 30 = 10). Simply add or subtract the class interval to determine the other boundaries. For example:

 19.5 – 29.5
 +10    +10
 29.5 – 39.5

 19.5 – 29.5
 −10    −10
  9.5 – 19.5

Continue the same process to determine the remaining class boundaries.

Age        Frequency    Boundaries
10 – 19    5            9.5 – 19.5
20 – 29    20           19.5 – 29.5
30 – 39    10           29.5 – 39.5
40 – 49    7            39.5 – 49.5
50 – 59    8            49.5 – 59.5
Total      n = 50

The lower boundary of the lowest class is 9.5, and the upper boundary of the highest class is 59.5.
$$R = 59.5 - 9.5 = 50$$
The range is 50.
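
The boundary construction and the grouped range can be sketched in Python as follows. The function name `class_boundaries` is hypothetical, and the sketch assumes every pair of adjacent classes is separated by the same gap (here, 1 unit between 19 and 20).

```python
# Class limits from the age table, in ascending order.
class_limits = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59)]

def class_boundaries(limits):
    """Extend each class by half the gap between adjacent classes."""
    # Gap between the upper limit of one class and the lower limit of the next.
    gap = limits[1][0] - limits[0][1]
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in limits]

boundaries = class_boundaries(class_limits)

# Range = upper boundary of the highest class - lower boundary of the lowest.
grouped_range = boundaries[-1][1] - boundaries[0][0]

for (lower, upper), (lb, ub) in zip(class_limits, boundaries):
    print(f"{lower} - {upper}: {lb} - {ub}")
print(f"Range = {grouped_range}")
```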
2. Variance
Variance of Population
$$\sigma^2 = \frac{\sum f(x - \mu)^2}{N}$$
Where:
$\sigma$ is the lowercase Greek letter sigma
$\sum$ (uppercase Greek letter sigma) means summation
$f$ is the frequency of the class
$\sigma^2$ is the population variance
$x$ is the class mark of each class
$\mu$ (lowercase Greek letter mu) is the mean of the population
$N$ is the population size

Variance of Sample
$$s^2 = \frac{\sum f(x - \bar{x})^2}{n - 1}$$
Where:
$\sum$ (uppercase Greek letter sigma) means summation
$f$ is the frequency of the class
$s^2$ is the sample variance
$x$ is the class mark of each class
$\bar{x}$ is the mean of the sample
$n$ is the sample size

In the previous discussion (measures of central tendency), we already computed the mean, $\bar{x} = 33.1$. Likewise, the class marks were already determined there. The class mark ($x$) is the average of the lower limit and the upper limit of each class. We will use these values to complete the table first.
Age        f     x       $x - \bar{x}$             $(x - \bar{x})^2$         $f(x - \bar{x})^2$
10 – 19    5     14.5    $14.5 - 33.1 = -18.6$     $(-18.6)^2 = 345.96$      $5(345.96) = 1,729.8$
20 – 29    20    24.5    $24.5 - 33.1 = -8.6$      $(-8.6)^2 = 73.96$        $20(73.96) = 1,479.2$
30 – 39    10    34.5    1.4                       1.96                      19.6
40 – 49    7     44.5    11.4                      129.96                    909.72
50 – 59    8     54.5    21.4                      457.96                    3,663.68
Total      n = 50                                                            $\sum f(x - \bar{x})^2 = 7,802$

$$\sum f(x - \bar{x})^2 = 1,729.8 + 1,479.2 + 19.6 + 909.72 + 3,663.68 = 7,802$$

$$s^2 = \frac{\sum f(x - \bar{x})^2}{n - 1} = \frac{7,802}{50 - 1} = \frac{7,802}{49} \approx 159.22$$
The variance of the sample is approximately 159.22.

3. Standard Deviation
Standard Deviation of Population
$$\sigma = \sqrt{\frac{\sum f(x - \mu)^2}{N}}$$
Where:
$\sigma$ (lowercase Greek letter sigma) is the population standard deviation
$\sum$ (uppercase Greek letter sigma) means summation
$f$ is the frequency of the class
$x$ is the class mark of each class
$\mu$ (lowercase Greek letter mu) is the mean of the population
$N$ is the population size

Standard Deviation of Sample
$$s = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}}$$
Where:
$\sum$ (uppercase Greek letter sigma) means summation
$s$ is the sample standard deviation
$f$ is the frequency of the class
$x$ is the class mark of each class
$\bar{x}$ is the mean of the sample
$n$ is the sample size

We already know that the variance and the standard deviation are related, so we can determine the standard deviation directly from the variance.

$$s = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}} = \sqrt{159.22} \approx 12.62$$
This gives a standard deviation of approximately 12.62.
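
The grouped-data variance and standard deviation for the age table can be reproduced with the sketch below. It assumes the data are a sample (divisor $n - 1$) and uses class marks in place of the individual ages, so the results are approximations of the true values. Variable names are illustrative.

```python
import math

# Age classes and their frequencies from the table.
class_limits = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59)]
frequencies = [5, 20, 10, 7, 8]

# Class mark: average of the lower and upper limits of each class.
class_marks = [(lower + upper) / 2 for lower, upper in class_limits]

n = sum(frequencies)
mean = sum(f * x for f, x in zip(frequencies, class_marks)) / n

# Sum of f * (x - mean)^2 over all classes (last column of the table).
sum_sq = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, class_marks))

variance = sum_sq / (n - 1)
std_dev = math.sqrt(variance)

print(f"mean = {mean:.1f}")
print(f"sum of f(x - mean)^2 = {sum_sq:.2f}")
print(f"sample variance = {variance:.2f}")
print(f"sample standard deviation = {std_dev:.2f}")
```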

Measures of Location (Position)

Measures of location (or position) are used to locate the relative position of a data value in the data set. These include standard scores, percentiles, deciles, and quartiles.

1. Standard Scores
A standard score, or z-score, tells how many standard deviations a data value is above or below the mean of a specific distribution of values. If the standard score is zero, the data value is the same as the mean. If it is positive, the data value is above the mean. If it is negative, the data value is below the mean.

It is obtained by subtracting the mean from the value and dividing the result by the standard deviation. The symbol for a standard score is $z$. The formula is:
$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$$
The formula for samples is:
$$z = \frac{x - \bar{x}}{s}$$
The formula for populations is:
$$z = \frac{x - \mu}{\sigma}$$
The z-score represents the number of standard deviations a data value falls above or below the mean.

Example. Angelo’s score in Literature is 88, compared with a class mean of 80 and a standard deviation of 3. His score in Mathematics is 90, with a class mean of 95 and a standard deviation of 5. In which subject did Angelo perform better?

Solution:
                      Literature    Mathematics
Score/value (x)       88            90
Mean                  80            95
Standard Deviation    3             5

Literature:
$$z = \frac{x - \bar{x}}{s} = \frac{88 - 80}{3} = \frac{8}{3} = 2.\overline{6} \approx 2.67$$
Mathematics:
$$z = \frac{x - \bar{x}}{s} = \frac{90 - 95}{5} = \frac{-5}{5} = -1.0$$
The results show that Angelo’s score in Literature is 2.67 standard deviations above the mean, while his score in Mathematics is 1 standard deviation below the mean ($z = -1.0$). This implies that he performed better in Literature than in Mathematics.
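
The comparison in this example can be reproduced with a small Python sketch. The function name `z_score` is an assumption for illustration; the sample formula is used because the solution uses $\bar{x}$ and $s$.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations a value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev

z_literature = z_score(88, 80, 3)
z_mathematics = z_score(90, 95, 5)

print(f"Literature:  z = {z_literature:.2f}")
print(f"Mathematics: z = {z_mathematics:.2f}")

# The subject with the higher z-score is the better relative performance.
better = "Literature" if z_literature > z_mathematics else "Mathematics"
print(f"Better relative performance: {better}")
```

Comparing z-scores rather than raw scores matters here: the raw Mathematics score (90) is higher than the raw Literature score (88), yet it sits below its class mean.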

Illustration of the result in Literature ($\bar{x} = 80$ at $z = 0$):

z         -3     -2     -1     0      1      2      2.67     3
s         -3s    -2s    -1s    0s     1s     2s     2.67s    3s
Scores    71     74     77     80     83     86     88       89

Illustration of the result in Mathematics ($\bar{x} = 95$ at $z = 0$):

z         -3     -2     -1     0      1      2      3
s         -3s    -2s    -1s    0s     1s     2s     3s
Scores    80     85     90     95     100    105    110

Ungrouped Data
A group of students obtained the following scores in their Statistics quiz:
4, 9, 7, 14, 10, 8, 12, 15, 6, 11
Determine the 1st and 3rd quartiles, the 3rd and 7th deciles, and the 25th and 75th percentiles.

2. Quartiles
Quartiles divide the group into 4 parts or quarters ($Q_1, Q_2, Q_3, Q_4$). Each part is equivalent to 1/4 or 25% of the data. The position of the $k$th quartile is
$$Q_k = \frac{k}{4}(n + 1)$$
where $k$ is the partition ($k = 1, 2, 3, 4$) and $n$ is the number of terms. Round the result to the nearest whole number.

First, arrange the scores in ascending order. Then solve for the quartiles.
4, 6, 7, 8, 9, 10, 11, 12, 14, 15
There are 10 scores ($n = 10$).

First Quartile ($Q_1$):
$$Q_1 = \frac{1}{4}(10 + 1) = \frac{1}{4}(11) = \frac{11}{4} = 2.75 \approx 3$$
Third Quartile ($Q_3$):
$$Q_3 = \frac{3}{4}(10 + 1) = \frac{3}{4}(11) = \frac{33}{4} = 8.25 \approx 8$$
From the results, $Q_1$ is the 3rd term, which is 7, and $Q_3$ is the 8th term, which is 12.

The position of the second quartile ($Q_2$) equals the difference between the positions of $Q_3$ and $Q_1$:
$$Q_2 = Q_3 - Q_1 = 8.25 - 2.75 = 5.5 \approx 6$$
Note that this is a difference of positions, not of scores. The difference between the scores at $Q_3$ and $Q_1$ is the interquartile range, which is a different quantity. The position of $Q_2$ can also be found with the formula:
$$Q_2 = \frac{2}{4}(10 + 1) = \frac{1}{2}(11) = \frac{11}{2} = 5.5 \approx 6$$
This implies that the 2nd quartile is the 6th term, which is 10. The first 6 terms (4, 6, 7, 8, 9, 10) make up roughly the lowest 50% of the scores.

4, 6, 7, 8, 9, 10, 11, 12, 14, 15
($Q_1$ = 7, the 3rd term; $Q_2$ = 10, the 6th term; $Q_3$ = 12, the 8th term)

This means that the first 3 terms (4, 6, 7) make up roughly the lowest 25% of the scores. Similarly, the first 8 terms (4, 6, 7, 8, 9, 10, 11, 12) make up roughly the lowest 75% of the scores.
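
The position formula and the rounding step can be sketched in Python as follows. The helper names are hypothetical. Python's built-in `round` rounds halves to the nearest even number, so the sketch uses an explicit half-up rule to match the module's rounding of 5.5 to 6.

```python
scores = [4, 9, 7, 14, 10, 8, 12, 15, 6, 11]
ordered = sorted(scores)
n = len(ordered)

def quartile_position(k, n):
    """Position of the k-th quartile: (k / 4) * (n + 1)."""
    return k * (n + 1) / 4

def round_half_up(x):
    """Round a positive number to the nearest whole number, halves going up."""
    return int(x + 0.5)

results = {}
for k in (1, 2, 3):
    position = quartile_position(k, n)
    rank = round_half_up(position)
    # Ranks are 1-based, list indexes are 0-based.
    results[k] = (position, rank, ordered[rank - 1])
    print(f"Q{k}: position {position}, rank {rank}, score {ordered[rank - 1]}")
```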

Interpolation can be used to get the exact value at a fractional position. For example, $Q_3$ is not actually the 8th term but the 8.25th term. What is the score at the 8.25th term?
Interpolation
The 3rd quartile ($Q_3$) is at the 8.25th term, which lies between the 8th term and the 9th term (specifically, a quarter of the way past the 8th term). First, get the difference between the scores of the 9th term and the 8th term:
$$14 - 12 = 2$$
Then multiply this difference (2) by the excess beyond the 8th term (0.25):
$$2(0.25) = 0.5$$
Then add the result to the score of the 8th term:
$$12 + 0.5 = 12.5$$
Therefore, the 3rd quartile ($Q_3$) is at the 8.25th term, with a score of 12.5. This is the actual position and the actual score. The same can be done for the other quartiles.
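
The interpolation steps above generalize to any fractional position. The sketch below is one way to write them; `value_at_position` is a hypothetical helper that assumes a sorted list and a position between 1 and $n$. For comparison, it also calls `statistics.quantiles` with `method="exclusive"`, which, to the best of my knowledge, uses the same $(n + 1)$ positions with linear interpolation.

```python
import statistics

def value_at_position(ordered, position):
    """Linearly interpolate the value at a 1-based, possibly fractional position."""
    lower_rank = int(position)           # whole part, e.g. 8 for 8.25
    fraction = position - lower_rank     # excess, e.g. 0.25
    lower_value = ordered[lower_rank - 1]
    if fraction == 0:
        return lower_value
    upper_value = ordered[lower_rank]    # the next term
    return lower_value + fraction * (upper_value - lower_value)

ordered = [4, 6, 7, 8, 9, 10, 11, 12, 14, 15]
n = len(ordered)

q1 = value_at_position(ordered, 1 * (n + 1) / 4)   # position 2.75
q2 = value_at_position(ordered, 2 * (n + 1) / 4)   # position 5.5
q3 = value_at_position(ordered, 3 * (n + 1) / 4)   # position 8.25
print(q1, q2, q3)

# Cross-check against the standard library.
library_quartiles = statistics.quantiles(ordered, n=4, method="exclusive")
print(library_quartiles)
```

With interpolation, the second quartile is 9.5 rather than the rounded-position value of 10.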

3. Deciles
Deciles divide the group into 10 parts ($D_1, D_2, \ldots, D_9, D_{10}$). Each part is equivalent to 1/10 or 10% of the data. The position of the $k$th decile is
$$D_k = \frac{k}{10}(n + 1)$$
where $k$ is the partition ($k = 1, 2, 3, \ldots, 10$) and $n$ is the number of terms. Round the result to the nearest whole number.

From the problem, we are to find the 3rd and 7th deciles.
4, 6, 7, 8, 9, 10, 11, 12, 14, 15

Third Decile ($D_3$):
$$D_3 = \frac{3}{10}(10 + 1) = \frac{3}{10}(11) = \frac{33}{10} = 3.3 \approx 3$$
Seventh Decile ($D_7$):
$$D_7 = \frac{7}{10}(10 + 1) = \frac{7}{10}(11) = \frac{77}{10} = 7.7 \approx 8$$
From the results, the 3rd decile is the 3rd term, which is 7. The 7th decile is the 8th term, which is 12. Interpolation can be used to get the score at the actual position.

4, 6, 7, 8, 9, 10, 11, 12, 14, 15
($D_3$ = 7, the 3rd term; $D_7$ = 12, the 8th term)

Roughly speaking, the 1st quartile is equal to the "2.5th decile", the 2nd quartile is equal to the 5th decile, and the 3rd quartile is equal to the "7.5th decile".

4. Percentiles
Percentiles divide the group into 100 parts ($P_1, P_2, P_3, \ldots, P_{99}, P_{100}$). Each part is equivalent to 1/100 or 1% of the data. The position of the $k$th percentile is
$$P_k = \frac{k}{100}(n + 1)$$
where $k$ is the partition ($k = 1, 2, 3, \ldots, 99, 100$) and $n$ is the number of terms. Round the result to the nearest whole number.

From the given, we are to find the 25th and 75th percentiles.
4, 6, 7, 8, 9, 10, 11, 12, 14, 15

25th Percentile ($P_{25}$):
$$P_{25} = \frac{25}{100}(10 + 1) = \frac{1}{4}(11) = \frac{11}{4} = 2.75 \approx 3$$
75th Percentile ($P_{75}$):
$$P_{75} = \frac{75}{100}(10 + 1) = \frac{3}{4}(11) = \frac{33}{4} = 8.25 \approx 8$$
The 25th percentile is the 3rd term, which is 7. The 75th percentile is the 8th term, which is 12. Interpolation can also be used to get the actual score at the actual position.

4, 6, 7, 8, 9, 10, 11, 12, 14, 15
($P_{25}$ = 7, the 3rd term; $P_{75}$ = 12, the 8th term)

Furthermore, the 50th percentile is the same as the 2nd quartile or the 5th decile, which is the 6th term. And the 6th term is 10.

Quartile    Decile        Percentile
            $D_1$         $P_{10}$
            $D_2$         $P_{20}$
$Q_1$       $D_{2.5}$     $P_{25}$
            $D_3$         $P_{30}$
            $D_4$         $P_{40}$
$Q_2$       $D_5$         $P_{50}$
            $D_6$         $P_{60}$
            $D_7$         $P_{70}$
$Q_3$       $D_{7.5}$     $P_{75}$
            $D_8$         $P_{80}$
            $D_9$         $P_{90}$
$Q_4$       $D_{10}$      $P_{100}$
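
Quartiles, deciles, and percentiles share one position formula, $\frac{k}{\text{parts}}(n + 1)$, which is why the rows of the equivalence table line up. The sketch below illustrates this with exact fractions; the function names `position` and `nearest_rank_value` are assumptions for illustration, and the rounding is half-up as in the worked solutions.

```python
from fractions import Fraction

ordered = [4, 6, 7, 8, 9, 10, 11, 12, 14, 15]
n = len(ordered)

def position(k, parts, n):
    """Position of the k-th fractile when the data are split into `parts` parts."""
    return Fraction(k, parts) * (n + 1)

def nearest_rank_value(ordered, k, parts):
    """Return (position, rounded rank, score at that rank)."""
    pos = position(k, parts, len(ordered))
    rank = int(pos + Fraction(1, 2))     # round half up
    return float(pos), rank, ordered[rank - 1]

d3 = nearest_rank_value(ordered, 3, 10)
d7 = nearest_rank_value(ordered, 7, 10)
p25 = nearest_rank_value(ordered, 25, 100)
p75 = nearest_rank_value(ordered, 75, 100)
print("D3:", d3)
print("D7:", d7)
print("P25:", p25)
print("P75:", p75)

# Equivalent rows of the table share the same position.
same_q1 = position(1, 4, n) == position(25, 100, n)
same_q2 = position(2, 4, n) == position(5, 10, n) == position(50, 100, n)
same_q3 = position(3, 4, n) == position(75, 100, n)
print(same_q1, same_q2, same_q3)
```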

Grouped Data
The computation of quartiles, deciles, and percentiles for grouped data follows the same procedure as the computation of the median of grouped data.
Problem. The scores of 50 students in Statistics are shown in the table below.
Score      Frequency
41 – 45    9
36 – 40    13
31 – 35    15
26 – 30    10
21 – 25    3
Total      50

Determine the following:
a. $Q_2$
b. $D_4$
c. $P_{67}$

1. Quartile
$$Q_k = lb + \left( \frac{\frac{kn}{4} - cfb}{f} \right) i$$
Where:
$Q_k$ is the $k$th quartile
$lb$ is the lower boundary of the class that contains the quartile
$k$ is the quartile number ($k = 1, 2, 3, 4$)
$n$ is the total frequency
$cfb$ is the cumulative frequency before the class
$f$ is the frequency of the class
$i$ is the class interval

Solution. To solve for the 2nd quartile ($Q_2$), first complete the table with the cumulative frequency ($cf$).
Score      Frequency    $cf$
41 – 45    9            50
36 – 40    13           41
31 – 35    15           28
26 – 30    10           13
21 – 25    3            3
Total      50
Then solve for $\frac{kn}{4}$:
$$\frac{kn}{4} = \frac{2(50)}{4} = \frac{100}{4} = 25$$
The class 31 – 35 holds the 14th to 28th ranks, which include the 25th rank. Therefore, the 2nd quartile lies in this class.
The lower boundary ($lb$) is halfway between 30 and 31, which is 30.5. The frequency ($f$) of the class is 15. The cumulative frequency before the class ($cfb$) is 13. The class interval is 5 (e.g., 31 − 26 or 35 − 30). Therefore:
$$\begin{aligned}
Q_2 &= 30.5 + \left( \frac{\frac{2(50)}{4} - 13}{15} \right) 5 \\
    &= 30.5 + \left( \frac{25 - 13}{15} \right) 5 \\
    &= 30.5 + \left( \frac{12}{15} \right) 5 \\
    &= 30.5 + \left( \frac{4}{5} \right) 5 \\
    &= 30.5 + 4 \\
    &= 34.5
\end{aligned}$$
The 2nd quartile is 34.5.
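
The table-completion and class-location steps for $Q_2$ can be sketched in Python. The classes are listed in ascending order here so that the cumulative frequency builds from the lowest class, and `locate_class` is a hypothetical helper name.

```python
from itertools import accumulate

# Score classes in ascending order with their frequencies.
class_limits = [(21, 25), (26, 30), (31, 35), (36, 40), (41, 45)]
frequencies = [3, 10, 15, 13, 9]

# Cumulative frequency (cf) column.
cumulative = list(accumulate(frequencies))
n = cumulative[-1]

def locate_class(target_rank):
    """Index of the first class whose cumulative frequency reaches the rank."""
    for index, cf in enumerate(cumulative):
        if target_rank <= cf:
            return index
    raise ValueError("rank exceeds the total frequency")

target = 2 * n / 4                      # kn/4 for the 2nd quartile
index = locate_class(target)
lower_limit, upper_limit = class_limits[index]
cfb = cumulative[index - 1] if index > 0 else 0
f = frequencies[index]

print(f"cf column: {cumulative}")
print(f"kn/4 = {target}, class {lower_limit} - {upper_limit}, cfb = {cfb}, f = {f}")
```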

2. Decile
$$D_k = lb + \left( \frac{\frac{kn}{10} - cfb}{f} \right) i$$
Where:
$D_k$ is the $k$th decile
$lb$ is the lower boundary of the class that contains the decile
$k$ is the decile number ($k = 1, 2, 3, \ldots, 9, 10$)
$n$ is the total frequency
$cfb$ is the cumulative frequency before the class
$f$ is the frequency of the class
$i$ is the class interval

Score      Frequency    $cf$
41 – 45    9            50
36 – 40    13           41
31 – 35    15           28
26 – 30    10           13
21 – 25    3            3
Total      50
To solve for the 4th decile, start with $\frac{kn}{10}$:
$$\frac{kn}{10} = \frac{4(50)}{10} = 4(5) = 20$$
The 20th rank also belongs to the class 31 – 35, which holds the 14th to 28th ranks. As in the solution for the quartile earlier, $lb$, $cfb$, $f$, and $i$ have the same values. Hence:
$$\begin{aligned}
D_4 &= 30.5 + \left( \frac{\frac{4(50)}{10} - 13}{15} \right) 5 \\
    &= 30.5 + \left( \frac{20 - 13}{15} \right) 5 \\
    &= 30.5 + \left( \frac{7}{15} \right) 5 \\
    &= 30.5 + \frac{7}{3} \\
    &\approx 30.5 + 2.33 \\
    &\approx 32.83
\end{aligned}$$
The 4th decile is approximately 32.83.

3. Percentile
$$P_k = lb + \left( \frac{\frac{kn}{100} - cfb}{f} \right) i$$
Where:
$P_k$ is the $k$th percentile
$lb$ is the lower boundary of the class that contains the percentile
$k$ is the percentile number ($k = 1, 2, 3, \ldots, 99, 100$)
$n$ is the total frequency
$cfb$ is the cumulative frequency before the class
$f$ is the frequency of the class
$i$ is the class interval

Score      Frequency    $cf$
41 – 45    9            50
36 – 40    13           41
31 – 35    15           28
26 – 30    10           13
21 – 25    3            3
Total      50
To solve for the 67th percentile, start with $\frac{kn}{100}$:
$$\frac{kn}{100} = \frac{67(50)}{100} = \frac{3,350}{100} = 33.5$$
The rank 33.5 falls in the class 36 – 40, since this class holds the 29th to 41st ranks.
$$lb = \frac{36 + 35}{2} = \frac{71}{2} = 35.5$$
$n = 50$
$cfb = 28$
$f = 13$
$i = 5$
$$\begin{aligned}
P_{67} &= 35.5 + \left( \frac{\frac{67(50)}{100} - 28}{13} \right) 5 \\
       &= 35.5 + \left( \frac{33.5 - 28}{13} \right) 5 \\
       &= 35.5 + \left( \frac{5.5}{13} \right) 5 \\
       &= 35.5 + \frac{27.5}{13} \\
       &\approx 35.5 + 2.12 \\
       &\approx 37.62
\end{aligned}$$
The 67th percentile is approximately 37.62.
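
The three grouped-data formulas differ only in the divisor (4, 10, or 100), so a single function can cover them. The sketch below is one possible way to write it; `grouped_fractile` is a hypothetical name, and the lower boundary is taken as the lower limit minus 0.5, which assumes whole-number class limits as in this problem.

```python
from itertools import accumulate

# Score classes in ascending order (lower limits only) with frequencies.
lower_limits = [21, 26, 31, 36, 41]
frequencies = [3, 10, 15, 13, 9]
class_interval = 5

n = sum(frequencies)
cumulative = list(accumulate(frequencies))

def grouped_fractile(k, parts):
    """lb + ((k * n / parts - cfb) / f) * i for the class containing the rank."""
    target = k * n / parts
    # First class whose cumulative frequency reaches the target rank.
    index = next(i for i, cf in enumerate(cumulative) if target <= cf)
    lb = lower_limits[index] - 0.5
    cfb = cumulative[index - 1] if index > 0 else 0
    f = frequencies[index]
    return lb + (target - cfb) / f * class_interval

q2 = grouped_fractile(2, 4)
d4 = grouped_fractile(4, 10)
p67 = grouped_fractile(67, 100)
print(f"Q2  = {q2:.2f}")
print(f"D4  = {d4:.2f}")
print(f"P67 = {p67:.2f}")
```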

Box-and-Whisker Plot

Three steps in creating a box-and-whisker plot:
1. Calculate the values of the five-number summary.
2. Draw and translate data sets to and from a box-and-whisker plot.
3. Interpret the shape of the box-and-whisker plot.
To illustrate, let us consider the scenario below.
Suppose Maria’s scores in Calculus quizzes are 15, 2, 8, 9, 5, 5, 13, 10, 7, 9, and 4. Draw a box-and-whisker plot and interpret the result.
Step 1. Five-number summary
The five-number summary is composed of:
a. Least value;
b. Greatest value;
c. First Quartile ($Q_1$);
d. Second Quartile ($Q_2$); and
e. Third Quartile ($Q_3$)
First, arrange the given scores in increasing order:
2, 4, 5, 5, 7, 8, 9, 9, 10, 13, 15
The least value is 2 and the greatest value is 15. With $n = 11$, the quartile positions follow from $Q_k = \frac{k}{4}(n + 1)$:
$$Q_1 = \frac{1}{4}(11 + 1) = \frac{1}{4}(12) = \frac{12}{4} = 3$$
The 3rd term is 5 (lower quartile).
$$Q_2 = \frac{2}{4}(11 + 1) = \frac{1}{2}(12) = \frac{12}{2} = 6$$
The 6th term is 8 (median).
$$Q_3 = \frac{3}{4}(11 + 1) = \frac{3}{4}(12) = \frac{36}{4} = 9$$
The 9th term is 10 (upper quartile).

2, 4, 5, 5, 7, 8, 9, 9, 10, 13, 15
(L = 2; $Q_1$ = 5, the 3rd term; $Q_2$ = 8, the 6th term; $Q_3$ = 10, the 9th term; H = 15)
Step 2. Draw and translate data sets to and from a box-and-whisker plot.
a) On the number line, draw congruent vertical lines positioned at $Q_1$, $Q_2$, and $Q_3$.

b) Horizontally connect the endpoints of the $Q_1$ and $Q_3$ lines to create the box.

c) Connect the least and greatest values to the box. These lines are called whiskers.

Step 3. Interpret the shape of the box-and-whisker plot.

Observations.
• The data in the third quarter (from the median to the upper quartile) are the most densely concentrated.
• The upper whisker is more widely spread than the other sections.
This shows that the data are slightly skewed to the right, because the data to the right of the box are more spread out than the rest.
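
The five-number summary from Step 1 and the section widths behind the observations in Step 3 can be sketched in Python. The names below are illustrative, and the quartile rule is the rounded $(n + 1)$ position method used in this module.

```python
scores = [15, 2, 8, 9, 5, 5, 13, 10, 7, 9, 4]
ordered = sorted(scores)
n = len(ordered)

def quartile(k):
    """Score at the rounded position (k / 4) * (n + 1)."""
    position = k * (n + 1) / 4
    rank = int(position + 0.5)          # round half up
    return ordered[rank - 1]

summary = {
    "least": ordered[0],
    "Q1": quartile(1),
    "Q2": quartile(2),
    "Q3": quartile(3),
    "greatest": ordered[-1],
}
print(summary)

# Width of each section of the plot: lower whisker, Q1 to Q2, Q2 to Q3, upper whisker.
points = list(summary.values())
widths = [b - a for a, b in zip(points, points[1:])]
print(widths)
```

The narrowest section is the one from the median to the upper quartile, and the widest is the upper whisker, which is consistent with the slight right skew noted in the observations.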

Adapted from the module created by Anthony L. Madrazo
