LEC 3 - STATISTICAL DESCRIPTION (GROUPED DATA ANALYSIS)
Or
$K = 1 + 3.3 \log N$
$C = \dfrac{R}{K} = \dfrac{\text{Range}}{\text{desired number of classes}}$
The value of C must have the same form (precision) as the original set of
data. If the data are whole numbers, round C to a whole number; if the data
are given to the first decimal place, round C to the first decimal place.
In either case, round so that $C \cdot K \ge R$.
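The class-count and class-width rules can be sketched in code. This is a minimal illustration, not part of the original notes: the function names are hypothetical, `log` is assumed to be base 10, and rounding C upward (rather than to the nearest value) is an assumption made so that $C \cdot K \ge R$ always holds.

```python
import math

def class_count_estimates(n):
    """Return the square-root and logarithmic estimates of K for n observations."""
    return math.sqrt(n), 1 + 3.3 * math.log10(n)

def class_width(data_range, k, decimals=0):
    """Round R / K up to `decimals` places so that C * K >= R (assumed rule)."""
    scale = 10 ** decimals
    # The small tolerance keeps exact quotients such as 24 / 8 from being
    # pushed up a step by floating-point noise.
    return math.ceil(data_range / k * scale - 1e-9) / scale

k_sqrt, k_log = class_count_estimates(60)
print(round(k_sqrt, 2), round(k_log, 2))       # 7.75 6.87
print(class_width(24, 8), class_width(24, 7))  # 3.0 4.0
```

Either estimate of K is only a starting point; the trial value is adjusted until the classes cover every observation.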
EDA (Eng’g Data Analysis)
Reference: Intro to Statistics by R. Walpole Lecture Notes 3
4. Enumerate the different classes. The lowest class must include the lowest
value, and the highest class must include the highest value. Make a
frequency tally.
B. MEASURES OF VARIATION
1. Population Variance
$$\sigma_G^2 = \frac{\sum_{i=1}^{k} f_i X_i^2}{N} - \mu^2 = \frac{N \sum_{i=1}^{k} f_i X_i^2 - \left(\sum_{i=1}^{k} f_i X_i\right)^2}{N^2}$$
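The two forms of the grouped population variance are algebraically equivalent. A short derivation follows, assuming the grouped mean $\mu = \sum f_i X_i / N$ with $N = \sum f_i$. The definitional starting point $\sum f_i (X_i - \mu)^2 / N$ is the standard one but is not stated in these notes.

```latex
\begin{aligned}
\sigma_G^2 &= \frac{\sum_{i=1}^{k} f_i (X_i - \mu)^2}{N}
            = \frac{\sum f_i X_i^2 - 2\mu \sum f_i X_i + \mu^2 \sum f_i}{N} \\
           &= \frac{\sum f_i X_i^2}{N} - 2\mu^2 + \mu^2
            = \frac{\sum f_i X_i^2}{N} - \mu^2 \\
           &= \frac{\sum f_i X_i^2}{N} - \frac{\left(\sum f_i X_i\right)^2}{N^2}
            = \frac{N \sum f_i X_i^2 - \left(\sum f_i X_i\right)^2}{N^2}
\end{aligned}
```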
Sample problem
The following numbers represent the total number of projects undertaken by 60
different contractors in CAR for the past five years.
12 6 8 23 6 7 25 7 5 3 4 3
18 10 14 7 19 9 6 8 4 10 6 25
18 13 8 10 8 8 14 7 6 5 6 6
19 6 17 14 6 17 9 8 3 4 21 22
12 8 24 7 2 8 7 7 16 5 22 1
Set up a frequency distribution table (FDT), then determine the measures of central
location and the standard deviation, treating the data as a population.
Construction of FDT
Step 1 Range, $R = 25 - 1 = 24$
Step 2 Number of classes, $K$
$K = \sqrt{N} = \sqrt{60} = 7.75$
or $K = 1 + 3.3 \log N = 1 + 3.3 \log 60 = 6.87$
Try: $K = 8$
Step 3 Class width, $C = \dfrac{R}{K} = \dfrac{24}{8} = 3$; check: $C \cdot K = 3(8) = 24 = R$, OK.
Step 4 Write the different classes. The lower limit of the lowest class may
be less than or equal to the lowest value in the set of observations.
The upper limit of each class equals its lower limit plus (C - 1), which
leaves a unit gap between succeeding classes. The lower limits of
succeeding classes differ by the class width, C, and so do the upper
limits.
1 – 3 Lowest Class
4 – 6
7 – 9
10 – 12
13 – 15
16 – 18
19 – 21
22 – 24 Highest Class
The highest value, 25, is not included in the highest class, so
change the number of classes (try K = 7).
Step no. 3 Class width, $C = \dfrac{R}{K} = \dfrac{24}{7} = 3.43$, say 4. Since the
observations are whole numbers, the class width is rounded to a
whole number.
Check: $C \cdot K = 4(7) = 28 > R = 24$, OK.
Step no. 4 Since the product of C and K is greater than the range, R, the
lower limit of the lowest class may start at a value lower than
the lowest value in the set of observations.
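The tally for the revised classes can be sketched as follows. This is an illustrative helper, not the notes' own procedure: `frequency_table` is a hypothetical name, and starting the lowest class at 0 is an assumption inferred from the lowest true class boundary of -0.5 used later in the notes.

```python
from collections import Counter

# Raw data from the sample problem (60 contractors).
projects = [
    12, 6, 8, 23, 6, 7, 25, 7, 5, 3, 4, 3,
    18, 10, 14, 7, 19, 9, 6, 8, 4, 10, 6, 25,
    18, 13, 8, 10, 8, 8, 14, 7, 6, 5, 6, 6,
    19, 6, 17, 14, 6, 17, 9, 8, 3, 4, 21, 22,
    12, 8, 24, 7, 2, 8, 7, 7, 16, 5, 22, 1,
]

def frequency_table(values, lowest_limit, width, n_classes):
    """Return (lower limit, upper limit, frequency) for each class."""
    highest_limit = lowest_limit + width * n_classes - 1
    if min(values) < lowest_limit or max(values) > highest_limit:
        raise ValueError("classes do not cover every observation")
    # Integer division maps each whole-number value to its class index.
    tally = Counter((v - lowest_limit) // width for v in values)
    return [
        (lowest_limit + i * width, lowest_limit + (i + 1) * width - 1, tally[i])
        for i in range(n_classes)
    ]

table = frequency_table(projects, lowest_limit=0, width=4, n_classes=7)
for lower, upper, freq in table:
    print(f"{lower:2d} - {upper:2d}: {freq}")
```

Calling the same helper with a lowest limit of 1, a width of 3, and 8 classes raises the error, which mirrors the rejected first trial: the highest class would end at 24 and miss the value 25.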
a) Class Mark ($X_i$). It is the middle value of a class. In the class 4 – 7, the class
mark is $\frac{4+7}{2} = 5.5$; in the class 16 – 19, the class mark is $\frac{16+19}{2} = 17.5$.
b) True Class Boundaries (TCB). They are often described as true limits
because they are more precise expressions of the class limits. For
observations in whole-number form, subtract one-half of the gap between
classes from the lower limit of the class and add the same amount to the
upper limit. Here the gap between classes is 1, so the lower
boundary (LB) of a class is 0.5 less than its lower limit (LL), and its upper
boundary (UB) is 0.5 more than its upper limit (UL). In the class 8 – 11, the
lower boundary is 8 - 0.5 = 7.5 and the upper boundary is 11 + 0.5 = 11.5.
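The class mark and true class boundary rules reduce to two one-line computations. The sketch below uses hypothetical function names and assumes the gap between classes is passed in explicitly (1 for whole-number data).

```python
def class_mark(lower_limit, upper_limit):
    """Middle value of a class."""
    return (lower_limit + upper_limit) / 2

def true_class_boundaries(lower_limit, upper_limit, gap=1):
    """Widen each class limit by half the gap between succeeding classes."""
    half_gap = gap / 2
    return lower_limit - half_gap, upper_limit + half_gap

print(class_mark(4, 7))              # 5.5
print(class_mark(16, 19))            # 17.5
print(true_class_boundaries(8, 11))  # (7.5, 11.5)
```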
d) The relative cumulative frequency is just the percent form of the cumulative
frequency.
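As a small sketch of the cumulative columns, the class frequencies from the FDT of the sample problem (5, 22, 13, 6, 7, 4, 3) can be accumulated in both directions and then expressed as percentages. The variable names are illustrative only.

```python
from itertools import accumulate

freqs = [5, 22, 13, 6, 7, 4, 3]  # class frequencies from the FDT
total = sum(freqs)

# Less-than CF: running total from the lowest class upward.
less_than = list(accumulate(freqs))
# Greater-than CF: running total from the highest class downward.
greater_than = list(accumulate(reversed(freqs)))[::-1]
# Relative cumulative frequency: the less-than CF in percent form.
relative_less_than = [round(100 * cf / total, 1) for cf in less_than]

print(less_than)     # [5, 27, 40, 46, 53, 57, 60]
print(greater_than)  # [60, 55, 33, 20, 14, 7, 3]
print(relative_less_than)
```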
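[Figure: frequency histogram. Vertical axis: Frequency (0 to 20); horizontal axis: true class boundaries -0.5, 3.5, 7.5, 11.5, 15.5, 19.5, 23.5, 27.5. The modal reading is marked on the histogram.]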
[Figure: FREQUENCY OGIVES. Less-than ogive with the median reading marked. Vertical axis: cumulative frequency (20 to 60 shown); horizontal axis: Class Mark (1.5, 9.5, 13.5, 17.5, 21.5 shown).]
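Reading the median off a less-than ogive amounts to linear interpolation between two plotted points. The sketch below assumes the less-than cumulative frequencies are plotted against the upper true class boundaries, starting from zero at the lowest boundary; that plotting convention and the function name are assumptions, not taken from the figure.

```python
# Points of the less-than ogive: (true class boundary, cumulative frequency).
boundaries = [-0.5, 3.5, 7.5, 11.5, 15.5, 19.5, 23.5, 27.5]
less_than_cf = [0, 5, 27, 40, 46, 53, 57, 60]

def read_less_than_ogive(target_cf):
    """Return the boundary-axis value where the ogive reaches target_cf."""
    points = list(zip(boundaries, less_than_cf))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < target_cf <= y1:
            # Straight-line interpolation inside the segment.
            return x0 + (x1 - x0) * (target_cf - y0) / (y1 - y0)
    raise ValueError("target is outside the range of the ogive")

# The median reading is taken at half of the total frequency.
median_reading = read_less_than_ogive(less_than_cf[-1] / 2)
print(round(median_reading, 2))  # 8.42
```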
1. Mean
$$\mu = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i} = \frac{618}{60} = 10.3 \text{ contracts}$$
2. Median
3. Mode
Measure of Variation
1. As a population
$$\sigma_G^2 = \frac{\sum_{i=1}^{k} f_i X_i^2}{N} - \mu^2 = \frac{N \sum_{i=1}^{k} f_i X_i^2 - \left(\sum_{i=1}^{k} f_i X_i\right)^2}{N^2}$$
$$\sigma_G^2 = \frac{60(8887) - (618)^2}{(60)^2} = 42.03$$
2. As a sample
$$S_G^2 = \frac{n \sum_{i=1}^{k} f_i X_i^2 - \left(\sum_{i=1}^{k} f_i X_i\right)^2}{n(n - 1)}$$
$$S_G^2 = \frac{60(8887) - (618)^2}{(60)(59)} = 42.74$$
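The sums 618 and 8887 and both variances can be reproduced from the class marks and frequencies of the FDT. This is a minimal sketch with illustrative variable names; the standard deviations are simply the square roots of the variances.

```python
import math

class_marks = [1.5, 5.5, 9.5, 13.5, 17.5, 21.5, 25.5]
freqs = [5, 22, 13, 6, 7, 4, 3]

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, class_marks))       # sum of f_i * X_i
sum_fx2 = sum(f * x * x for f, x in zip(freqs, class_marks))  # sum of f_i * X_i^2

mean = sum_fx / n
population_variance = (n * sum_fx2 - sum_fx ** 2) / n ** 2
sample_variance = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))

print(sum_fx, sum_fx2)                 # 618.0 8887.0
print(mean)                            # 10.3
print(round(population_variance, 2))   # 42.03
print(round(sample_variance, 2))       # 42.74
print(math.sqrt(population_variance), math.sqrt(sample_variance))
```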
2. As a sample
$$SK = \frac{3(\text{mean} - \text{median})}{\text{sample standard deviation}}$$
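As an illustration of the skewness formula, the sketch below plugs in values from the worked example in these notes: the grouped mean 10.3, the grouped median 8.42, and the sample variance 42.74. The function name is hypothetical, and combining these three figures here is an assumption; the notes do not show this computation.

```python
import math

def skewness_sk(mean, median, std_dev):
    """SK = 3 * (mean - median) / standard deviation."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return 3 * (mean - median) / std_dev

sample_std_dev = math.sqrt(42.74)
sk = skewness_sk(10.3, 8.42, sample_std_dev)
print(round(sk, 2))
```

A positive result indicates that the mean lies above the median, that is, the distribution has a longer tail toward the higher values.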
MEASURES OF POSITION
Fractiles or Quantiles
The most common fractiles used are the Percentiles, Deciles, and
Quartiles.
1. Percentiles
Percentiles are values that divide a set of observations into 100 equal
parts. These values, denoted by P1, P2, ..., P45, ..., P99, are such
that 1% of the data falls below P1, 45% of the data falls below P45,
and 99% falls below P99.
$$P_k = TCB_{LB} + C\left(\frac{\frac{kN}{100} - CF_<}{f_k}\right)$$
2. Deciles
Deciles are values that divide a set of observations into 10 equal
parts. These values, denoted by D1, D2, ..., D7, ..., D9, are such
that 10% of the data falls below D1, 30% falls below D3, ..., and 90%
falls below D9.
$$D_k = TCB_{LB} + C\left(\frac{\frac{kN}{10} - CF_<}{f_k}\right)$$
3. Quartiles
Quartiles are values that divide a set of observations into 4 equal
parts. These values, denoted by Q1, Q2, and Q3, are such that 25%
of the data falls below Q1, 50% falls below Q2, and 75% falls below
Q3.
$$Q_k = TCB_{LB} + C\left(\frac{\frac{kN}{4} - CF_<}{f_k}\right)$$
Where:
$N$ = number of observations or total frequency
$C$ = class width
$TCB_{LB}$ = lower boundary of the class that contains $P_k$, $D_k$, or $Q_k$
$CF_<$ = the less-than cumulative frequency of the class that precedes the class containing $P_k$, $D_k$, or $Q_k$
$f_k$ = the frequency of the class containing $P_k$, $D_k$, or $Q_k$
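The three fractile formulas differ only in the divisor (100, 10, or 4), so one function can cover them all. The sketch below is an assumed, generic implementation; `grouped_fractile` and its parameter names are hypothetical, and the inputs are the lower true class boundaries and frequencies of the FDT from the sample problem.

```python
def grouped_fractile(k, parts, lower_boundaries, freqs, width):
    """Return the k-th fractile when the data are split into `parts` equal parts.

    parts = 100 for percentiles, 10 for deciles, 4 for quartiles.
    """
    if not 0 < k <= parts:
        raise ValueError("k must satisfy 0 < k <= parts")
    n = sum(freqs)
    target = k * n / parts  # position of the fractile, kN / parts
    cf_before = 0           # less-than CF of the preceding class (CF_<)
    for lb, f in zip(lower_boundaries, freqs):
        if cf_before + f >= target:
            # lb is TCB_LB and f is f_k for the class containing the fractile.
            return lb + width * (target - cf_before) / f
        cf_before += f
    raise ValueError("fractile position exceeds the total frequency")

lower_boundaries = [-0.5, 3.5, 7.5, 11.5, 15.5, 19.5, 23.5]
freqs = [5, 22, 13, 6, 7, 4, 3]
median = grouped_fractile(2, 4, lower_boundaries, freqs, width=4)
print(round(median, 2))  # 8.42
```

The loop locates the class by comparing the target position with the running less-than cumulative frequency, which is the same counting procedure used by hand.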
Sample Problem
From the example given above, determine P25, P85, D4, D6, Q1, Q2, and Q3.
a. $P_{25}$ ($k = 25$)
Since $\frac{kN}{100} = \frac{25(60)}{100} = 15$, count 15 observations starting from the lowest class.
The 15th observation falls in the second class.

TCB LB   TCB UB   Frequency f_i   CF Less Than   CF Greater Than   Remarks
 -0.5      3.5          5               5               60          CF_< = 5
  3.5      7.5         22              27               55          TCB_LB = 3.5, f_k = 22
  7.5     11.5         13              40               33
 11.5     15.5          6              46               20
 15.5     19.5          7              53               14
 19.5     23.5          4              57                7
 23.5     27.5          3              60                3
C = 4; Sum, N = 60

$$P_{25} = 3.5 + 4\left(\frac{\frac{25(60)}{100} - 5}{22}\right) = 5.32 \text{ contracts}$$
b. $P_{85}$ ($k = 85$)
Since $\frac{kN}{100} = \frac{85(60)}{100} = 51$, count 51 observations starting from the lowest class.
The 51st observation falls in the fifth class.

TCB LB   TCB UB   Frequency f_i   CF Less Than   CF Greater Than   Remarks
 -0.5      3.5          5               5               60
  3.5      7.5         22              27               55
  7.5     11.5         13              40               33
 11.5     15.5          6              46               20          CF_< = 46
 15.5     19.5          7              53               14          TCB_LB = 15.5, f_k = 7
 19.5     23.5          4              57                7
 23.5     27.5          3              60                3
C = 4; Sum, N = 60

$$P_{85} = 15.5 + 4\left(\frac{\frac{85(60)}{100} - 46}{7}\right) = 18.36 \text{ contracts}$$
c. $D_4$ ($k = 4$)
Since $\frac{kN}{10} = \frac{4(60)}{10} = 24$, count 24 observations starting from the lowest class.
The 24th observation falls in the second class.

TCB LB   TCB UB   Frequency f_i   CF Less Than   CF Greater Than   Remarks
 -0.5      3.5          5               5               60          CF_< = 5
  3.5      7.5         22              27               55          TCB_LB = 3.5, f_k = 22
  7.5     11.5         13              40               33
 11.5     15.5          6              46               20
 15.5     19.5          7              53               14
 19.5     23.5          4              57                7
 23.5     27.5          3              60                3
C = 4; Sum, N = 60

$$D_4 = 3.5 + 4\left(\frac{\frac{4(60)}{10} - 5}{22}\right) = 6.95 \text{ contracts}$$
d. $D_6$ ($k = 6$)
Since $\frac{kN}{10} = \frac{6(60)}{10} = 36$, count 36 observations starting from the lowest class.
The 36th observation falls in the third class.

TCB LB   TCB UB   Frequency f_i   CF Less Than   CF Greater Than   Remarks
 -0.5      3.5          5               5               60
  3.5      7.5         22              27               55          CF_< = 27
  7.5     11.5         13              40               33          TCB_LB = 7.5, f_k = 13
 11.5     15.5          6              46               20
 15.5     19.5          7              53               14
 19.5     23.5          4              57                7
 23.5     27.5          3              60                3
C = 4; Sum, N = 60

$$D_6 = 7.5 + 4\left(\frac{\frac{6(60)}{10} - 27}{13}\right) = 10.27 \text{ contracts}$$
e. $Q_1$ ($k = 1$)
Since $\frac{kN}{4} = \frac{1(60)}{4} = 15$, count 15 observations starting from the lowest class.
The 15th observation falls in the second class.

TCB LB   TCB UB   Frequency f_i   CF Less Than   CF Greater Than   Remarks
 -0.5      3.5          5               5               60          CF_< = 5
  3.5      7.5         22              27               55          TCB_LB = 3.5, f_k = 22
  7.5     11.5         13              40               33
 11.5     15.5          6              46               20
 15.5     19.5          7              53               14
 19.5     23.5          4              57                7
 23.5     27.5          3              60                3
C = 4; Sum, N = 60

$$Q_1 = 3.5 + 4\left(\frac{\frac{1(60)}{4} - 5}{22}\right) = 5.32 \text{ contracts}$$
f. $Q_2$ ($k = 2$)
Since $\frac{kN}{4} = \frac{2(60)}{4} = 30$, count 30 observations starting from the lowest class.
The 30th observation falls in the third class.

TCB LB   TCB UB   Frequency f_i   CF Less Than   CF Greater Than   Remarks
 -0.5      3.5          5               5               60
  3.5      7.5         22              27               55          CF_< = 27
  7.5     11.5         13              40               33          TCB_LB = 7.5, f_k = 13
 11.5     15.5          6              46               20
 15.5     19.5          7              53               14
 19.5     23.5          4              57                7
 23.5     27.5          3              60                3
C = 4; Sum, N = 60

$$Q_2 = 7.5 + 4\left(\frac{\frac{2(60)}{4} - 27}{13}\right) = 8.42 \text{ contracts} = \text{Median}$$
g. $Q_3$ ($k = 3$)
Since $\frac{kN}{4} = \frac{3(60)}{4} = 45$, count 45 observations starting from the lowest class.
The 45th observation falls in the fourth class.

TCB LB   TCB UB   Frequency f_i   CF Less Than   CF Greater Than   Remarks
 -0.5      3.5          5               5               60
  3.5      7.5         22              27               55
  7.5     11.5         13              40               33          CF_< = 40
 11.5     15.5          6              46               20          TCB_LB = 11.5, f_k = 6
 15.5     19.5          7              53               14
 19.5     23.5          4              57                7
 23.5     27.5          3              60                3
C = 4; Sum, N = 60

$$Q_3 = 11.5 + 4\left(\frac{\frac{3(60)}{4} - 40}{6}\right) = 14.83 \text{ contracts}$$
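The seven requested values can be cross-checked in one pass. The sketch below is only a verification aid under stated assumptions: it locates the containing class with a binary search on the less-than cumulative frequencies instead of counting by hand, and the names `fractile` and `requested` are illustrative.

```python
from bisect import bisect_left
from itertools import accumulate

lower_boundaries = [-0.5, 3.5, 7.5, 11.5, 15.5, 19.5, 23.5]
freqs = [5, 22, 13, 6, 7, 4, 3]
C = 4
less_than_cf = list(accumulate(freqs))  # [5, 27, 40, 46, 53, 57, 60]
N = less_than_cf[-1]

def fractile(k, parts):
    """k-th fractile for percentiles (100), deciles (10), or quartiles (4)."""
    target = k * N / parts
    # First class whose less-than CF reaches the target position.
    idx = bisect_left(less_than_cf, target)
    cf_before = less_than_cf[idx - 1] if idx > 0 else 0
    return lower_boundaries[idx] + C * (target - cf_before) / freqs[idx]

requested = {
    "P25": (25, 100), "P85": (85, 100),
    "D4": (4, 10), "D6": (6, 10),
    "Q1": (1, 4), "Q2": (2, 4), "Q3": (3, 4),
}
results = {name: round(fractile(k, parts), 2) for name, (k, parts) in requested.items()}
for name, value in results.items():
    print(name, value)
```

Note that P25 and Q1 coincide, as they must, since both mark the point below which 25% of the observations fall.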