
LEC 3 - STATISTICAL DESCRIPTION (GROUPED DATA ANALYSIS)

The document provides an overview of grouped data analysis in statistics, detailing how to construct a frequency distribution table (FDT) and compute measures of central location and variation. It outlines the steps for creating an FDT, including determining the range, number of classes, class width, and constructing the table with relevant columns. Additionally, it explains how to calculate the mean, median, mode, variance, and standard deviation for grouped data.

Uploaded by

Ronald Bernardo

EDA (Eng’g Data Analysis)

Reference: Intro to Statistics by R. Walpole Lecture Notes 3

STATISTICAL DESCRIPTION OF DATA


(GROUPED DATA ANALYSIS)
Topic Learning Outcomes (TLO)

After completing these topics, students can:

1. Construct a frequency distribution table (FDT).
2. Compute and interpret measures of central location, variation, and position for
grouped data.

GROUPED DATA ANALYSIS


 Sometimes statisticians are confronted with the problem of disseminating large
masses of statistical data in compact form. Important characteristics of a large
mass of data can be readily assessed by grouping the data into different classes
and then determining the number of observations that fall in each of the classes.
Such arrangement, in tabular form, is called a frequency distribution. Data that
are presented in the form of a frequency distribution are called grouped data.

FREQUENCY DISTRIBUTION TABLE (FDT)


 In statistics, numerical information may be treated as ungrouped or grouped data.
In grouped data analysis, tabular presentation is very important. This tabular
presentation of data is called the frequency distribution table.

STEPS in the construction of an FDT


1. Determine the Range of the data, R = highest value – lowest value

2. Determine the number of classes, K. A class is a grouping or category.

Statisticians generally recommend between 5 and 15 classes.
The desired number of classes may be estimated from either of the formulas
below, where N is the number of observations and log is the base-10 logarithm.

$K = \sqrt{N}$

or

$K = 1 + 3.3 \log N$

3. Determine the class width, C, the size of each class

$C = \frac{R}{K} = \frac{\text{Range}}{\text{desired number of classes}}$

The value of C must have the same form as the original set of data. If the
data are whole numbers, round C to a whole number; if the data are given to
one decimal place, round C to one decimal place. In either case, round so
that $C \cdot K \ge R$.


4. Enumerate the different classes. The lowest class must include the lowest
value, and the highest class must include the highest value. Make a
frequency tally.

5. Construct the Frequency Distribution Table with columns for the Classes,
the True Class Boundaries (TCB), the Class Marks (midpoints), the Frequencies,
the Cumulative Frequencies, and the Relative Cumulative Frequencies.

6. Draw the Frequency Histogram and the Cumulative Frequency Polygons
(Ogives).
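Steps 1 to 3 can be sketched in Python. This is a minimal illustration, not part of the original notes: the function names `suggested_classes` and `class_width` are hypothetical, the logarithm is assumed to be base 10, and the class width is assumed to be rounded up to a whole number (whole-number data) so that C·K ≥ R.

```python
import math

def suggested_classes(n):
    """Return the two rule-of-thumb class counts: sqrt(N) and 1 + 3.3 log10(N)."""
    return math.sqrt(n), 1 + 3.3 * math.log10(n)

def class_width(values, k):
    """Return (R, C): the range and the class width rounded up to a whole number."""
    r = max(values) - min(values)   # Step 1: R = highest value - lowest value
    c = math.ceil(r / k)            # Step 3: C = R / K, rounded up (assumption)
    return r, c

# Illustrative values only: 60 observations ranging from 1 to 25.
k_sqrt, k_log = suggested_classes(60)
print(round(k_sqrt, 2), round(k_log, 2))
print(class_width([1, 25], 7))
```

Both rules only suggest a starting value for K; the final choice still has to produce classes that contain the lowest and the highest observation.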

A. MEASURES OF CENTRAL LOCATION


a. Mean

$$\mu = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}$$ where k = number of class intervals

b. Median

$$M_e = L_{me} + C\left(\frac{\frac{N}{2} - f_{cfb}}{f_{me}}\right)$$

c. Mode

$$M_O = L_{mo} + C\left(\frac{f_{mo} - f_b}{2f_{mo} - f_b - f_a}\right)$$

Where:
$f_i$ frequency of the ith class
$X_i$ class mark (midpoint) of the ith class
$L_{me}$ lower boundary of the median class
$C$ class width
$N$ total number of observations
$f_{cfb}$ cumulative frequency of the class before the median class
$f_{me}$ frequency of the median class
$L_{mo}$ lower boundary of the modal class
$f_{mo}$ frequency of the modal class
$f_b$ frequency of the class just before the modal class
$f_a$ frequency of the class just after the modal class

B. MEASURES OF VARIATION

RANGE = Highest Value – Lowest Value

1. Population Variance, $\sigma_G^2$

$$\sigma_G^2 = \frac{\sum_{i=1}^{k} f_i X_i^2}{N} - \mu^2 = \frac{N \sum_{i=1}^{k} f_i X_i^2 - \left(\sum_{i=1}^{k} f_i X_i\right)^2}{N^2}$$

2. Population Standard Deviation, $\sigma_G$

$$\sigma_G = \sqrt{\sigma_G^2}$$


3. Sample Variance, $S_G^2$

$$S_G^2 = \frac{n \sum_{i=1}^{k} f_i X_i^2 - \left(\sum_{i=1}^{k} f_i X_i\right)^2}{n(n-1)}$$

4. Sample Standard Deviation, $S_G$

$$S_G = \sqrt{S_G^2}$$
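The two forms of the population variance are algebraically the same. A short derivation, assuming the grouped mean $\mu = \sum f_i X_i / N$ with $N = \sum f_i$, may make the step explicit. The last line relates the sample and population formulas when both are computed from the same table, so that $n = N$.

```latex
\begin{aligned}
\sigma_G^2 &= \frac{\sum_{i=1}^{k} f_i X_i^2}{N} - \mu^2
            = \frac{\sum_{i=1}^{k} f_i X_i^2}{N}
              - \frac{\left(\sum_{i=1}^{k} f_i X_i\right)^2}{N^2} \\[4pt]
           &= \frac{N \sum_{i=1}^{k} f_i X_i^2
              - \left(\sum_{i=1}^{k} f_i X_i\right)^2}{N^2} \\[4pt]
S_G^2      &= \frac{N}{N-1}\,\sigma_G^2 \qquad (n = N)
\end{aligned}
```

The sample variance is therefore always slightly larger than the population variance computed from the same frequency table.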

Sample problem
The following numbers represent the total number of projects undertaken by 60
different contractors in CAR for the past five years.

12 6 8 23 6 7 25 7 5 3 4 3
18 10 14 7 19 9 6 8 4 10 6 25
18 13 8 10 8 8 14 7 6 5 6 6
19 6 17 14 6 17 9 8 3 4 21 22
12 8 24 7 2 8 7 7 16 5 22 1

Set up a frequency distribution table and determine the measures of central location
and the standard deviation, treating the data as a population.

Construction of FDT

Step 1 Range, R = 25 – 1 = 24
Step 2 Number of classes, K
$K = \sqrt{N} = \sqrt{60} = 7.75$
or $K = 1 + 3.3 \log N = 1 + 3.3 \log 60 = 6.87$

Try: K = 8
Step 3 Class width, $C = \frac{R}{K} = \frac{24}{8} = 3$; check: C(K) = 3(8) = 24 = R = 24, OK
Step 4 Write the different classes. The lower limit of the lowest class may be
less than or equal to the lowest value in the set of observations. The upper
limit of each class equals its lower limit plus (C – 1). Leave a gap of one
unit between successive classes, so that the lower limits of successive
classes differ by the class width, C, and the upper limits of successive
classes also differ by C.

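Classes (K = 8, C = 3)    Frequency tally

 1 – 3     Lowest class
 4 – 6
 7 – 9
10 – 12
13 – 15
16 – 18
19 – 21
22 – 24    Highest class

In each class, the first number is the lower limit and the second is the upper limit.
The highest value, 25, is not included in any class, so change the number of classes.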


Repeat Step 2

Let: Number of classes, K = 7

Step 3
Class width, $C = \frac{R}{K} = \frac{24}{7} = 3.43$, say 4. Since the
observations are whole numbers, the class width is rounded up to a
whole number.
Check: C(K) = 4(7) = 28 > R = 24, OK

Step 4 Since the product of C and K is greater than the range, R, the
lower limit of the lowest class may start at a value lower than
the lowest value in the set of observations.
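The two trials can be compared with a short Python sketch. The helper names `make_classes` and `covers` are hypothetical and assume whole-number data with a one-unit gap between classes. The sketch lists the class limits for each trial and checks whether the lowest class contains the lowest value and the highest class contains the highest value.

```python
def make_classes(lowest_limit, width, k):
    """Return k (lower limit, upper limit) pairs; upper limit = lower limit + (C - 1)."""
    return [(lowest_limit + i * width, lowest_limit + i * width + width - 1)
            for i in range(k)]

def covers(classes, lowest, highest):
    """True when the lowest class holds `lowest` and the highest class holds `highest`."""
    (first_ll, first_ul), (last_ll, last_ul) = classes[0], classes[-1]
    return first_ll <= lowest <= first_ul and last_ll <= highest <= last_ul

trial_1 = make_classes(1, 3, 8)   # K = 8, C = 3, starting at the lowest value
trial_2 = make_classes(0, 4, 7)   # K = 7, C = 4, starting below the lowest value

print(trial_1[-1], covers(trial_1, 1, 25))   # highest class of the first trial
print(trial_2[-1], covers(trial_2, 1, 25))   # highest class of the second trial
```

With whole-number data, C·K = R is not enough: the classes span only C·K distinct values, while the data span R + 1 values, which is why the first trial misses 25.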

Classes      Frequency Tally, $f_i$

 0 – 3        5    (lowest value is included)
 4 – 7       22
 8 – 11      13
12 – 15       6
16 – 19       7
20 – 23       4
24 – 27       3    (highest value is included)
Sum          60
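The tally can be reproduced from the raw data with a few lines of Python. This is a sketch under the choices made above (lowest lower limit 0, C = 4, K = 7); the variable names are illustrative.

```python
# The 60 observations from the sample problem, row by row.
data = [
    12, 6, 8, 23, 6, 7, 25, 7, 5, 3, 4, 3,
    18, 10, 14, 7, 19, 9, 6, 8, 4, 10, 6, 25,
    18, 13, 8, 10, 8, 8, 14, 7, 6, 5, 6, 6,
    19, 6, 17, 14, 6, 17, 9, 8, 3, 4, 21, 22,
    12, 8, 24, 7, 2, 8, 7, 7, 16, 5, 22, 1,
]

lowest_limit, width, k = 0, 4, 7
counts = [0] * k
for x in data:
    # Whole-number data: class index is the offset from the lowest limit, divided by C.
    counts[(x - lowest_limit) // width] += 1

for i, f in enumerate(counts):
    ll = lowest_limit + i * width
    print(f"{ll:2d} - {ll + width - 1:2d}  {f:3d}")
print("Sum", sum(counts))
```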

Step no. 5 Construct the Frequency Distribution Table (FDT)

a) Class Mark ($X_i$). It is the middle value of a class. In the class 4 – 7, the class
mark is $\frac{4+7}{2} = 5.5$. In the class 16 – 19, the class mark is
$\frac{16+19}{2} = 17.5$.

b) True Class Boundaries (TCB). They are often described as true limits
because these are more precise expressions of class limits. For whole
number form of observations, subtract one-half of the gap between
classes from the lower limit of the class and add the same to the upper
limit of the class. Note: the gap between classes is 1. The lower
boundary (LB) of a class is 0.5 less than its lower limit (LL), and its upper
boundary (UB) is 0.5 more than its upper limit (UL). In the class 8 – 11, the
lower boundary (LB) is 8 - 0.5 = 7.5 and the upper boundary (UB) is 11 +
0.5 = 11.5.

c) Cumulative Frequency (CF). There are two kinds of cumulative frequency
for a class. The less-than cumulative frequency (<CF) of a class is found by
adding the frequency of the class and the frequencies of all lower classes.
In the example, the less-than cumulative frequency of the class 12 – 15 is
6 + (5 + 22 + 13) = 46. The greater-than cumulative frequency (>CF) is found
by adding the frequency of the class and the frequencies of all upper
classes. In the example, the greater-than cumulative frequency of the class
12 – 15 is 6 + (7 + 4 + 3) = 20.

d) The relative cumulative frequency is the cumulative frequency expressed
as a percentage of the total number of observations, N.

e) Histogram. A histogram is a bar-graph-like representation of a frequency
distribution.

Frequency Distribution Table, FDT

Classes     True Class          Class Mark   Frequency   Cumulative Frequency, CF    Relative Cumulative
            Boundaries (TCB)    (midpoint)               (No. of Observations)       Frequency (%)
LL    UL    LB       UB         $X_i$        $f_i$       Less Than   Greater Than    Less Than   Greater Than
 0     3    –0.5      3.5        1.5           5           5           60              8.33       100.00
 4     7     3.5      7.5        5.5          22          27           55             45.00        91.67
 8    11     7.5     11.5        9.5          13          40           33             66.67        55.00
12    15    11.5     15.5       13.5           6          46           20             76.67        33.33
16    19    15.5     19.5       17.5           7          53           14             88.33        23.33
20    23    19.5     23.5       21.5           4          57            7             95.00        11.67
24    27    23.5     27.5       25.5           3          60            3            100.00         5.00
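The derived columns of the table follow mechanically from the class limits and the frequencies. The sketch below assumes whole-number data, so each boundary lies 0.5 beyond the class limit. The list names are illustrative, not from the notes.

```python
from itertools import accumulate

limits = [(0, 3), (4, 7), (8, 11), (12, 15), (16, 19), (20, 23), (24, 27)]
freqs = [5, 22, 13, 6, 7, 4, 3]
n = sum(freqs)

bounds = [(ll - 0.5, ul + 0.5) for ll, ul in limits]          # true class boundaries
marks = [(ll + ul) / 2 for ll, ul in limits]                   # class marks
less_cf = list(accumulate(freqs))                              # less-than CF
greater_cf = list(accumulate(reversed(freqs)))[::-1]           # greater-than CF
less_pct = [round(100 * cf / n, 2) for cf in less_cf]          # relative <CF, %
greater_pct = [round(100 * cf / n, 2) for cf in greater_cf]    # relative >CF, %

for row in zip(limits, bounds, marks, freqs, less_cf, greater_cf, less_pct, greater_pct):
    print(*row)
```

Reversing the frequencies before accumulating, then reversing the result, gives the greater-than column without a second loop.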

Step 6 Draw the frequency histogram and the cumulative frequency
polygon (ogive)

• The Frequency Histogram is a graphical representation of
the distribution of the data. It looks like a bar chart in which the base
of each bar spans the true class boundaries (the class width) and the height
of each bar represents the frequency of the class. It
shows the skewness of the distribution of the data.

[Figure: FREQUENCY HISTOGRAM. Horizontal axis: True Class Boundaries (–0.5, 3.5, 7.5, 11.5, 15.5, 19.5, 23.5, 27.5). Vertical axis: Frequency (0 to 20). Annotation: Modal Reading.]

• The frequency histogram shows that the distribution of the
observations is not symmetrical; it is skewed to the right (positively
skewed). The Pearsonian Coefficient of Skewness (SK) is computed
by the formula

$$SK = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$$

• The cumulative frequency polygon is drawn by plotting the
cumulative frequency against the class mark.

[Figure: FREQUENCY OGIVES. Horizontal axis: Class Mark (1.5, 5.5, 9.5, 13.5, 17.5, 21.5, 25.5). Vertical axis: Cumulative Frequency (10 to 60). Curves: Less than Ogive and Greater than Ogive. Annotation: Median Reading.]

Measures of Central Location

True Class Boundaries   Frequency   Class Mark
LB       UB             $f_i$       $X_i$        $f_i X_i$    $f_i X_i^2$
–0.5      3.5             5          1.5            7.5          11.25
 3.5      7.5            22          5.5          121.0         665.50
 7.5     11.5            13          9.5          123.5        1173.25
11.5     15.5             6         13.5           81.0        1093.50
15.5     19.5             7         17.5          122.5        2143.75
19.5     23.5             4         21.5           86.0        1849.00
23.5     27.5             3         25.5           76.5        1950.75

$\sum_{i=1}^{k} f_i = 60$, $\sum_{i=1}^{k} f_i X_i = 618$, $\sum_{i=1}^{k} f_i X_i^2 = 8887$


1. Mean

$$\mu = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i} = \frac{618}{60} = 10.3 \text{ contracts}$$

2. Median

• The median class is not necessarily the class located in the
middle of the table. The median class is the class or interval that contains
the median. In the example, the number of observations is 60,
so one-half of 60 is 30. Count this number of observations
either from the lowest class or from the highest class towards
the middle of the FDT. The 30th observation is located in the
class 8 – 11. Thus, the median class is the class 8 – 11.

$$M_e = L_{me} + C\left(\frac{\frac{N}{2} - f_{cfb}}{f_{me}}\right) = 7.5 + 4\left(\frac{\frac{60}{2} - 27}{13}\right) = 8.42 \approx 8.4 \text{ contracts}$$

3. Mode

• The modal class is the class with the highest
frequency. In the example, the class with the highest
frequency is the class 4 – 7. Thus, the modal class is the class 4 – 7.

$$M_O = L_{mo} + C\left(\frac{f_{mo} - f_b}{2f_{mo} - f_b - f_a}\right) = 3.5 + 4\left(\frac{22 - 5}{2(22) - 5 - 13}\right) = 6.1 \text{ contracts}$$
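The three measures of central location can be checked with a small Python sketch. The function names are hypothetical. The sketch assumes equal class widths, a single modal class, and a frequency of 0 for a missing neighbour when the modal class is the first or the last class.

```python
def grouped_mean(marks, freqs):
    """Mean = sum(f_i * X_i) / sum(f_i)."""
    return sum(f * x for f, x in zip(freqs, marks)) / sum(freqs)

def grouped_median(lower_bounds, freqs, width):
    """Median = L + C * (N/2 - CF before the median class) / f of the median class."""
    half = sum(freqs) / 2
    cum = 0  # cumulative frequency of the classes before the current one
    for lb, f in zip(lower_bounds, freqs):
        if f > 0 and cum + f >= half:
            return lb + width * (half - cum) / f
        cum += f

def grouped_mode(lower_bounds, freqs, width):
    """Mode = L + C * (f_mo - f_b) / (2 f_mo - f_b - f_a)."""
    i = max(range(len(freqs)), key=freqs.__getitem__)    # index of the modal class
    f_b = freqs[i - 1] if i > 0 else 0                   # class just before
    f_a = freqs[i + 1] if i < len(freqs) - 1 else 0      # class just after
    return lower_bounds[i] + width * (freqs[i] - f_b) / (2 * freqs[i] - f_b - f_a)

lower_bounds = [-0.5, 3.5, 7.5, 11.5, 15.5, 19.5, 23.5]
marks = [1.5, 5.5, 9.5, 13.5, 17.5, 21.5, 25.5]
freqs = [5, 22, 13, 6, 7, 4, 3]

print(round(grouped_mean(marks, freqs), 2))
print(round(grouped_median(lower_bounds, freqs, 4), 2))
print(round(grouped_mode(lower_bounds, freqs, 4), 2))
```

Here mode < median < mean, which is consistent with the right-skewed histogram.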

Measures of Variation
1. As a population

$$\sigma_G^2 = \frac{\sum_{i=1}^{k} f_i X_i^2}{N} - \mu^2 = \frac{N \sum_{i=1}^{k} f_i X_i^2 - \left(\sum_{i=1}^{k} f_i X_i\right)^2}{N^2}$$

$$\sigma_G^2 = \frac{60(8887) - (618)^2}{(60)^2} = 42.03$$

Standard Deviation, $\sigma_G = \sqrt{42.03} = 6.48$ contracts

2. As a sample

$$S_G^2 = \frac{n \sum_{i=1}^{k} f_i X_i^2 - \left(\sum_{i=1}^{k} f_i X_i\right)^2}{n(n-1)}$$

$$S_G^2 = \frac{60(8887) - (618)^2}{(60)(59)} = 42.74$$

Standard Deviation, $S_G = \sqrt{42.74} = 6.54$ contracts
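The same arithmetic can be verified in Python. This sketch uses the computational forms of the two variance formulas directly; the variable names are illustrative.

```python
import math

freqs = [5, 22, 13, 6, 7, 4, 3]
marks = [1.5, 5.5, 9.5, 13.5, 17.5, 21.5, 25.5]

n = sum(freqs)                                            # 60 observations
sum_fx = sum(f * x for f, x in zip(freqs, marks))         # sum of f_i * X_i
sum_fx2 = sum(f * x * x for f, x in zip(freqs, marks))    # sum of f_i * X_i^2

numerator = n * sum_fx2 - sum_fx ** 2
pop_var = numerator / n ** 2              # treats the 60 contractors as a population
sample_var = numerator / (n * (n - 1))    # treats them as a sample

print(round(pop_var, 2), round(math.sqrt(pop_var), 2))
print(round(sample_var, 2), round(math.sqrt(sample_var), 2))
```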


Pearsonian Coefficient of Skewness, SK

1. As a population

$$SK = \frac{3(\text{mean} - \text{median})}{\text{population standard deviation}}$$

$$SK = \frac{3(\text{mean} - \text{median})}{\sigma_G} = \frac{3(10.3 - 8.4)}{6.48} = \frac{5.7}{6.48} = 0.880$$

2. As a sample

$$SK = \frac{3(\text{mean} - \text{median})}{\text{sample standard deviation}}$$

$$SK = \frac{3(\text{mean} - \text{median})}{S_G} = \frac{3(10.3 - 8.4)}{6.54} = \frac{5.7}{6.54} = 0.872$$
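As a quick check, the coefficient can be computed with a one-line Python function. The name `pearson_skewness` is hypothetical, and the inputs are the rounded values used in the computation above.

```python
def pearson_skewness(mean, median, std_dev):
    """SK = 3 (mean - median) / standard deviation."""
    return 3 * (mean - median) / std_dev

sk_population = pearson_skewness(10.3, 8.4, 6.48)   # uses sigma_G
sk_sample = pearson_skewness(10.3, 8.4, 6.54)       # uses S_G
print(round(sk_population, 3), round(sk_sample, 3))
```

A positive SK means the mean exceeds the median, which agrees with the right-skewed shape of the histogram.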

MEASURES OF POSITION

Fractiles or Quantiles

• These are measures of location that describe the position of
certain noncentral pieces of data relative to the entire set of data. They are values
below which a specified fraction or percentage of the observations in a given
set must fall.

• The most commonly used fractiles are the Percentiles, Deciles, and
Quartiles.

1. Percentiles
• Percentiles are values that divide a set of observations into 100 equal
parts. These values, denoted by P1, P2, ..., P45, ..., P99, are such
that 1% of the data falls below P1, 45% of the data falls below P45,
and 99% falls below P99.

$$P_k = TCB_{LB} + C\left(\frac{\frac{kN}{100} - CF_<}{f_k}\right)$$

2. Deciles
• Deciles are values that divide a set of observations into 10 equal
parts. These values, denoted by D1, D2, ..., D7, ..., D9, are such
that 10% of the data falls below D1, 30% falls below D3, ..., and 90%
falls below D9.

$$D_k = TCB_{LB} + C\left(\frac{\frac{kN}{10} - CF_<}{f_k}\right)$$

3. Quartiles
• Quartiles are values that divide a set of observations into 4 equal
parts. These values, denoted by Q1, Q2, and Q3, are such that 25%
of the data falls below Q1, 50% falls below Q2, and 75% falls below
Q3.

$$Q_k = TCB_{LB} + C\left(\frac{\frac{kN}{4} - CF_<}{f_k}\right)$$

Where:
$k$ = index of the desired percentile, decile, or quartile
$N$ = number of observations or total frequency
$C$ = class width
$TCB_{LB}$ = lower boundary of the class that contains $P_k$, $D_k$, or $Q_k$
$CF_<$ = the less-than cumulative frequency of the class
that precedes the class containing $P_k$, $D_k$, or $Q_k$
$f_k$ = the frequency of the class containing $P_k$, $D_k$, or $Q_k$
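Because the three formulas differ only in the divisor (100, 10, or 4), one function can serve all of them. The Python sketch below is an illustration, not the notes' own method: `grouped_quantile` and its `parts` parameter are hypothetical names, and the frequencies are those of the contractor example worked out earlier.

```python
def grouped_quantile(k, parts, lower_bounds, freqs, width):
    """Return the k-th of `parts` quantiles: TCB_LB + C * (kN/parts - CF<) / f_k."""
    target = k * sum(freqs) / parts   # location of the quantile, in observations
    cum = 0                           # less-than CF of the classes before the current one
    for lb, f in zip(lower_bounds, freqs):
        if f > 0 and cum + f >= target:
            return lb + width * (target - cum) / f
        cum += f
    raise ValueError("k must not exceed parts")

lower_bounds = [-0.5, 3.5, 7.5, 11.5, 15.5, 19.5, 23.5]
freqs = [5, 22, 13, 6, 7, 4, 3]

p25 = grouped_quantile(25, 100, lower_bounds, freqs, 4)
d4 = grouped_quantile(4, 10, lower_bounds, freqs, 4)
q1 = grouped_quantile(1, 4, lower_bounds, freqs, 4)
q2 = grouped_quantile(2, 4, lower_bounds, freqs, 4)
print(round(p25, 2), round(d4, 2), round(q1, 2), round(q2, 2))
```

Since 25/100 = 1/4, `p25` and `q1` locate the same observation, and `q2` is the median.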

Sample Problem

From the example given above, determine P25, P85, D4, D6, Q1, Q2, and Q3.

Solution: Using a portion of the constructed Frequency Distribution Table

Portion of Frequency Distribution Table, FDT

True Class Boundaries (TCB)   Frequency   Cumulative Frequency, CF (No. of Observations)
LB       UB                   $f_i$       Less Than   Greater Than
–0.5      3.5                   5           5           60
 3.5      7.5                  22          27           55
 7.5     11.5                  13          40           33
11.5     15.5                   6          46           20
15.5     19.5                   7          53           14
19.5     23.5                   4          57            7
23.5     27.5                   3          60            3
Sum, N = 60


a. P25 ($P_k$ with k = 25)

Determine the location of the percentile:

$$\text{Location of } P_k = \frac{kN}{100} = \frac{25(60)}{100} = 15 \text{ observations}$$

Count 15 observations starting from the lowest class towards the middle of the
FDT. The 15th observation is in the second class (3.5 to 7.5).

Determine the values of $f_k$, $TCB_{LB}$, and $CF_<$ from the portion of the FDT above:

$TCB_{LB} = 3.5$, $f_k = 22$, $CF_< = 5$ (less-than cumulative frequency of the preceding class, –0.5 to 3.5), C = 4, N = 60

Solve for the value of the percentile:

$$P_k = TCB_{LB} + C\left(\frac{\frac{kN}{100} - CF_<}{f_k}\right)$$

$$P_{25} = 3.5 + 4\left(\frac{\frac{25(60)}{100} - 5}{22}\right) = 5.32 \text{ contracts}$$

Interpretation: 25% (or 15 contractors) of the 60 contractors have fewer
than 5.32 contracts for the past five years.


b. P85 ($P_k$ with k = 85)

Determine the location of the percentile:

$$\text{Location of } P_k = \frac{kN}{100} = \frac{85(60)}{100} = 51 \text{ observations}$$

Count 51 observations starting from the lowest class towards the middle of the
FDT. The 51st observation is in the fifth class (15.5 to 19.5).

Determine the values of $f_k$, $TCB_{LB}$, and $CF_<$ from the portion of the FDT above:

$TCB_{LB} = 15.5$, $f_k = 7$, $CF_< = 46$ (less-than cumulative frequency of the preceding class, 11.5 to 15.5), C = 4, N = 60

Solve for the value of the percentile:

$$P_k = TCB_{LB} + C\left(\frac{\frac{kN}{100} - CF_<}{f_k}\right)$$

$$P_{85} = 15.5 + 4\left(\frac{\frac{85(60)}{100} - 46}{7}\right) = 18.36 \text{ contracts}$$

Interpretation: 85% (or 51 contractors) of the 60 contractors have fewer
than 18.36 contracts for the past five years.


c. D4 ($D_k$ with k = 4)

Determine the location of the decile:

$$\text{Location of } D_k = \frac{kN}{10} = \frac{4(60)}{10} = 24 \text{ observations}$$

Count 24 observations starting from the lowest class towards the middle of the
FDT. The 24th observation is in the second class (3.5 to 7.5).

Determine the values of $f_k$, $TCB_{LB}$, and $CF_<$ from the portion of the FDT above:

$TCB_{LB} = 3.5$, $f_k = 22$, $CF_< = 5$ (less-than cumulative frequency of the preceding class, –0.5 to 3.5), C = 4, N = 60

Solve for the value of the decile:

$$D_k = TCB_{LB} + C\left(\frac{\frac{kN}{10} - CF_<}{f_k}\right)$$

$$D_4 = 3.5 + 4\left(\frac{\frac{4(60)}{10} - 5}{22}\right) = 6.95 \text{ contracts}$$

Interpretation: 40% (or 24 contractors) of the 60 contractors have fewer
than 6.95 contracts for the past five years.


d. D6 ($D_k$ with k = 6)

Determine the location of the decile:

$$\text{Location of } D_k = \frac{kN}{10} = \frac{6(60)}{10} = 36 \text{ observations}$$

Count 36 observations starting from the lowest class towards the middle of the
FDT. The 36th observation is in the third class (7.5 to 11.5).

Determine the values of $f_k$, $TCB_{LB}$, and $CF_<$ from the portion of the FDT above:

$TCB_{LB} = 7.5$, $f_k = 13$, $CF_< = 27$ (less-than cumulative frequency of the preceding class, 3.5 to 7.5), C = 4, N = 60

Solve for the value of the decile:

$$D_k = TCB_{LB} + C\left(\frac{\frac{kN}{10} - CF_<}{f_k}\right)$$

$$D_6 = 7.5 + 4\left(\frac{\frac{6(60)}{10} - 27}{13}\right) = 10.27 \text{ contracts}$$

Interpretation: 60% (or 36 contractors) of the 60 contractors have fewer
than 10.27 contracts for the past five years.


e. Q1 ($Q_k$ with k = 1)

Determine the location of the quartile:

$$\text{Location of } Q_k = \frac{kN}{4} = \frac{1(60)}{4} = 15 \text{ observations}$$

Count 15 observations starting from the lowest class towards the middle of the
FDT. The 15th observation is in the second class (3.5 to 7.5).

Determine the values of $f_k$, $TCB_{LB}$, and $CF_<$ from the portion of the FDT above:

$TCB_{LB} = 3.5$, $f_k = 22$, $CF_< = 5$ (less-than cumulative frequency of the preceding class, –0.5 to 3.5), C = 4, N = 60

Solve for the value of the quartile:

$$Q_k = TCB_{LB} + C\left(\frac{\frac{kN}{4} - CF_<}{f_k}\right)$$

$$Q_1 = 3.5 + 4\left(\frac{\frac{1(60)}{4} - 5}{22}\right) = 5.32 \text{ contracts} = P_{25}$$

Interpretation: 25% (or 15 contractors) of the 60 contractors have fewer
than 5.32 contracts for the past five years.


f. Q2 ($Q_k$ with k = 2)

Determine the location of the quartile:

$$\text{Location of } Q_k = \frac{kN}{4} = \frac{2(60)}{4} = 30 \text{ observations}$$

Count 30 observations starting from the lowest class towards the middle of the
FDT. The 30th observation is in the third class (7.5 to 11.5).

Determine the values of $f_k$, $TCB_{LB}$, and $CF_<$ from the portion of the FDT above:

$TCB_{LB} = 7.5$, $f_k = 13$, $CF_< = 27$ (less-than cumulative frequency of the preceding class, 3.5 to 7.5), C = 4, N = 60

Solve for the value of the quartile:

$$Q_k = TCB_{LB} + C\left(\frac{\frac{kN}{4} - CF_<}{f_k}\right)$$

$$Q_2 = 7.5 + 4\left(\frac{\frac{2(60)}{4} - 27}{13}\right) = 8.42 \text{ contracts} = \text{Median}$$

Interpretation: 50% (or 30 contractors) of the 60 contractors have fewer
than 8.42 contracts for the past five years.


g. Q3 ($Q_k$ with k = 3)

Determine the location of the quartile:

$$\text{Location of } Q_k = \frac{kN}{4} = \frac{3(60)}{4} = 45 \text{ observations}$$

Count 45 observations starting from the lowest class towards the middle of the
FDT. The 45th observation is in the fourth class (11.5 to 15.5).

Determine the values of $f_k$, $TCB_{LB}$, and $CF_<$ from the portion of the FDT above:

$TCB_{LB} = 11.5$, $f_k = 6$, $CF_< = 40$ (less-than cumulative frequency of the preceding class, 7.5 to 11.5), C = 4, N = 60

Solve for the value of the quartile:

$$Q_k = TCB_{LB} + C\left(\frac{\frac{kN}{4} - CF_<}{f_k}\right)$$

$$Q_3 = 11.5 + 4\left(\frac{\frac{3(60)}{4} - 40}{6}\right) = 14.83 \text{ contracts}$$

Interpretation: 75% (or 45 contractors) of the 60 contractors have fewer
than 14.83 contracts for the past five years.
