Statistics
Introduction
Statistics is studied here under two headings :
• Measures of central tendency
1. Mean, median, mode
2. Quartile, decile
• Measures of dispersion
1. Range
2. Mean deviation
3. $\sigma$, $\sigma^2$
• Eg. Suppose Virat Kohli and Rohit Sharma both average 50 runs.
The distribution of Virat’s runs looks like this :
[Figure: Virat’s runs plotted on a 0 – 50 – 100 scale]
While Rohit’s runs are distributed like this :
[Figure: Rohit’s runs plotted on a 0 – 50 – 100 scale]
Although both have the same average, Virat is more consistent: his runs are less scattered
about the mean. This degree of scatter is what measures of dispersion quantify.
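To make the idea of consistency concrete, here is a minimal sketch using two made-up score lists. The numbers are hypothetical and chosen only for illustration (they are not real scores of either player); both lists average 50 but differ in spread. The sketch relies on `mean` and `pstdev` from Python's standard `statistics` module.

```python
from statistics import mean, pstdev

# Hypothetical innings scores, invented purely for illustration.
virat_runs = [45, 50, 55, 48, 52]   # clustered near 50
rohit_runs = [0, 100, 10, 90, 50]   # widely scattered around 50

# Both lists share the same average ...
virat_mean = mean(virat_runs)
rohit_mean = mean(rohit_runs)

# ... but their population standard deviations differ sharply.
virat_sd = pstdev(virat_runs)
rohit_sd = pstdev(rohit_runs)

print(virat_mean, rohit_mean)
print(round(virat_sd, 2), round(rohit_sd, 2))
```

The smaller standard deviation belongs to the more consistent list, which is the sense in which a measure of dispersion separates two series with equal means.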
Types of data
(a) Ungrouped data : Eg. runs scored are 10, 20, 30, 40, 50. These values are denoted by 𝑥𝑖 .
(b) Grouped data :
i. Discrete frequency distribution : For every data value there is a corresponding frequency
(denoted by $f_i$).
$x_i$ | 10 | 20 | 30 | 40 | 50
$f_i$ | 5 | 2 | 5 | 1 | 4
ii. Continuous frequency distribution :
Class | 0 – 10 | 10 – 20 | 20 – 30 | 30 – 40 | 40 – 50
$f_i$ | 5 | 1 | 3 | 4 | 2
Here the runs are grouped into intervals: 5 scores lie in the 0 – 10 interval, 3 scores lie
in the 20 – 30 interval, and so on. These intervals are called class intervals. The
assumption is that the data in each class is centred at the middle of the class interval,
i.e., the 2 scores in the 40 – 50 class are each treated as 45 runs.
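The midpoint assumption can be sketched in a few lines. The example below takes the class intervals and frequencies from the continuous table, replaces each class by its midpoint, and uses those midpoints to compute a grouped mean. Computing the mean here is only an illustration of how the midpoints are used; the variable names are my own.

```python
# Class intervals and frequencies from the continuous frequency distribution.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 1, 3, 4, 2]

# Each class is represented by its midpoint (e.g. 40-50 becomes 45).
midpoints = [(lower + upper) / 2 for lower, upper in classes]

total = sum(freqs)  # N, the total frequency
grouped_mean = sum(f * m for f, m in zip(freqs, midpoints)) / total

print(midpoints)
print(grouped_mean)
```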
Types of distribution
• Symmetric distribution : A distribution is a symmetric distribution if the values of the
mean, mode and median coincide. In a symmetric distribution, frequencies are
symmetrically distributed on both sides of the centre point of the frequency curve.
• Asymmetric distribution : A distribution which is not symmetric is called a skewed
distribution. In a moderately asymmetric distribution, the interval between the mean and
median is approximately one-third of the interval between the mean and the mode, i.e. we
have the following empirical relation between them :
Mean − Median ≈ (1/3)(Mean − Mode), i.e. Mode ≈ 3 Median − 2 Mean
Partition values
Quartile
[Figure: number line from 0 to N with the points Q1, Q2 and Q3 marked]
• Quartiles divide the distribution into four equal parts. $Q_1$ is the lower quartile, $Q_3$
is the upper quartile and $Q_2$ is the middle quartile, which is the same as the median.
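• Lower quartile :
i. Discrete series : $Q_1 = \text{size of } \left(\frac{N+1}{4}\right)^{\text{th}} \text{ item}$
ii. Continuous series : $Q_1 = l + \frac{\frac{N}{4} - C}{f} \times h$
• Upper quartile :
i. Discrete series : $Q_3 = \text{size of } \left[\frac{3(N+1)}{4}\right]^{\text{th}} \text{ item}$
ii. Continuous series : $Q_3 = l + \frac{\frac{3N}{4} - C}{f} \times h$
Here $l$ is the lower limit of the quartile class, $C$ is the cumulative frequency of the class
preceding it, $f$ is the frequency of the quartile class and $h$ is the class width.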
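• Eg. 10, 20, 30, 40, 50, 60, 70, 80
$N = 8$
$\therefore Q_1 = \left(\frac{N+1}{4}\right)^{\text{th}} \text{ term} = \frac{9}{4}\text{th term} = 2.25\text{th term}$
The 2.25th term lies between the 2nd term (20) and the 3rd term (30). Taking the midpoint of
the two gives $Q_1 \approx \frac{20 + 30}{2} = 25$; strict linear interpolation would give
$20 + 0.25 \times (30 - 20) = 22.5$.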
• For a continuous distribution, the first step is to identify the $Q_1$ class (i.e. the first
class with c.f. $> \frac{N}{4}$) or the $Q_3$ class (i.e. the first class with c.f. $> \frac{3N}{4}$),
as asked in the question.
Q4. Find Q3
$x_i$ | 5 | 4 | 9 | 12 | 15 | 6 | 10
$f_i$ | 8 | 6 | 12 | 8 | 6 | 9 | 10
A4. Ans. 10 (hint : arrange data in ascending order)
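The hint can be followed mechanically: sort the values, accumulate the frequencies, and read off the item at the required position. The sketch below does this for the table in Q4. The helper name `item_at` is hypothetical, and rounding a fractional position up to the next whole item is an assumption of this sketch (the positions needed here are whole numbers, so the choice does not affect the result).

```python
import math

def item_at(values, freqs, position):
    """Return the size of the item at a 1-based position in the sorted series."""
    pairs = sorted(zip(values, freqs))   # arrange the data in ascending order
    rank = math.ceil(position)           # assumption: round fractional positions up
    cumulative = 0
    for value, freq in pairs:
        cumulative += freq               # running cumulative frequency
        if cumulative >= rank:
            return value
    raise ValueError("position exceeds the total frequency")

values = [5, 4, 9, 12, 15, 6, 10]
freqs = [8, 6, 12, 8, 6, 9, 10]
n_total = sum(freqs)                     # N = 59

q1 = item_at(values, freqs, (n_total + 1) / 4)       # 15th item
q3 = item_at(values, freqs, 3 * (n_total + 1) / 4)   # 45th item
print(q1, q3)
```

With N = 59 the upper quartile is the 45th item, and the cumulative frequency first reaches 45 at the value 10, in agreement with the stated answer.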
Decile
• Deciles divide the total frequency N into ten equal parts.
$D_j = l + \frac{\frac{N \times j}{10} - C}{f} \times h, \quad j = 1, 2, \ldots, 9$
• Eg. If $j = 5$, then $D_5 = l + \frac{\frac{N}{2} - C}{f} \times h$. Hence $D_5$ is also known as the median.
Q5. Find D5
$x_i$ | 5 | 4 | 9 | 12 | 15 | 6 | 10
$f_i$ | 8 | 6 | 12 | 8 | 6 | 9 | 10
A5. Ans. 9
Hint : Use the formula $D_5 = 5\left(\frac{N+1}{10}\right)^{\text{th}}$ term
Percentile
• Percentiles divide the total frequency N into hundred equal parts.
$P_k = l + \frac{\frac{N \times k}{100} - C}{f} \times h$, where $k = 1, 2, 3, \ldots, 99$
• Eg. For ungrouped data
$P_{10} = \text{value of } 10\left(\frac{N+1}{100}\right)^{\text{th}} \text{ term}$
For continuous data
$P_{10} = l + \frac{10 \times \frac{N}{100} - C}{f} \times h$
Q6. The marks obtained by 50 students are given below. If 70% of the students pass the test,
find the minimum marks needed to pass the exam.
Marks | 0 – 10 | 10 – 20 | 20 – 30 | 30 – 40 | 40 – 50 | 50 – 60
No. of students | 3 | 5 | 9 | 12 | 18 | 3
A6. Ans. 28 [or 27.78] (Hint : the value of $P_{30}$ is required)
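The continuous-series formula for quartiles, deciles and percentiles has the same shape in every case, so one helper can cover all three. The sketch below is one possible way to write it; the function name `partition_value` and its `k, parts` arguments are my own, and it assumes the partition class is the first class whose cumulative frequency exceeds k·N/parts.

```python
def partition_value(classes, freqs, k, parts):
    """Apply l + ((k*N/parts - C) / f) * h to the first class whose
    cumulative frequency exceeds k*N/parts."""
    n_total = sum(freqs)
    target = k * n_total / parts
    cumulative = 0                        # C: c.f. of the classes before this one
    for (lower, upper), f in zip(classes, freqs):
        if cumulative + f > target:
            h = upper - lower             # class width
            return lower + (target - cumulative) / f * h
        cumulative += f
    raise ValueError("k must be smaller than parts")

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
students = [3, 5, 9, 12, 18, 3]

p30 = partition_value(classes, students, 30, 100)   # 30th percentile
q1 = partition_value(classes, students, 1, 4)       # lower quartile
print(round(p30, 2), q1)
```

For Q6 the target is 30 × 50 / 100 = 15, which falls in the 20 – 30 class (l = 20, C = 8, f = 9, h = 10), giving 20 + (7/9) × 10 ≈ 27.78.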
Measures of dispersion
• The degree to which numerical data tends to spread about an average value is called the
dispersion of the data.
• Four measures of dispersion are : (1) Range, (2) Mean Deviation, (3) Standard deviation,
(4) Square deviation
Range
• It is the difference between the values of extreme items in a series.
$\text{Range} = X_{max} - X_{min}$
• The coefficient of range (scatter) $= \frac{X_{max} - X_{min}}{X_{max} + X_{min}}$
• Range is a measure of dispersion, not of central tendency. It is widely used in statistical
series relating to quality control in production.
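As a quick illustration, the range and its coefficient can be computed for the ungrouped runs 10, 20, 30, 40, 50 used earlier in these notes. This is only a sketch; the variable names are my own.

```python
runs = [10, 20, 30, 40, 50]   # ungrouped data from the "Types of data" example

x_max, x_min = max(runs), min(runs)
data_range = x_max - x_min                          # X_max - X_min
coeff_of_range = (x_max - x_min) / (x_max + x_min)  # relative measure, unit-free

print(data_range, round(coeff_of_range, 4))
```

Because only the two extreme items enter the calculation, changing any of the middle values leaves both results unchanged.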
Inter quartile range
• The inter-quartile range is found by taking the difference between the third and first
quartiles and is given by the formula :
Inter-quartile range $= Q_3 - Q_1$
where $Q_1$ = first quartile or lower quartile and $Q_3$ = third quartile or upper quartile
Percentile range
• This is measured by the following formula
Percentile range = 𝑃90 − 𝑃10
where 𝑃90 = 90th percentile and 𝑃10 = 10th percentile
• Percentile range is considered better than range as well as inter quartile range
Quartile deviation or semi – quartile range
• It is one-half of the difference between the third quartile and the first quartile, i.e.
$Q.D. = \frac{Q_3 - Q_1}{2}$
• Coefficient of quartile deviation $= \frac{Q_3 - Q_1}{Q_3 + Q_1}$
where $Q_3$ is the third or upper quartile and $Q_1$ is the first or lower quartile
Mean deviation
• The arithmetic average of the deviations (all taken positive) from the mean, median or
mode is known as mean deviation.
• Formulae :
Mean deviation about mean : $M.D.(\bar{x}) = \frac{\sum |x_i - \bar{x}|}{n}$
Mean deviation about median : $M.D.(\text{median}) = \frac{\sum |x_i - M|}{n}$, where $M$ is the median
• Unless stated otherwise, mean deviation (M.D.) stands for mean deviation about the median.
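A short sketch of both formulae, written as a single helper that takes the centre as an argument. The first list is the ungrouped runs used earlier in these notes; the second list is hypothetical, added only to show a case where the mean and median differ. The function name is my own.

```python
from statistics import mean, median

def mean_deviation(data, centre):
    """Average of the absolute deviations of the data from a chosen centre."""
    return sum(abs(x - centre) for x in data) / len(data)

runs = [10, 20, 30, 40, 50]      # ungrouped data used earlier in these notes
skewed = [10, 20, 30, 40, 100]   # hypothetical list with one large score

md_mean = mean_deviation(runs, mean(runs))
md_median = mean_deviation(runs, median(runs))
skewed_md_mean = mean_deviation(skewed, mean(skewed))
skewed_md_median = mean_deviation(skewed, median(skewed))

print(md_mean, md_median)
print(skewed_md_mean, skewed_md_median)
```

For the symmetric list the two values coincide. For the skewed list the deviation about the median is the smaller one, which is consistent with the median minimising the sum of absolute deviations.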
Q14. Find the variance and standard deviation of the following frequency distribution :
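$x_i$ | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16
$f_i$ | 4 | 4 | 5 | 15 | 8 | 5 | 4 | 5
A14. Calculation of variance and standard deviation :
$x_i$ | $f_i$ | $f_i x_i$ | $x_i - \bar{X} = x_i - 9$ | $(x_i - \bar{X})^2$ | $f_i (x_i - \bar{X})^2$
2 | 4 | 8 | -7 | 49 | 196
4 | 4 | 16 | -5 | 25 | 100
6 | 5 | 30 | -3 | 9 | 45
8 | 15 | 120 | -1 | 1 | 15
10 | 8 | 80 | 1 | 1 | 8
12 | 5 | 60 | 3 | 9 | 45
14 | 4 | 56 | 5 | 25 | 100
16 | 5 | 80 | 7 | 49 | 245
Total | $N = \sum f_i = 50$ | $\sum f_i x_i = 450$ | | | $\sum f_i (x_i - \bar{X})^2 = 754$
Here $N = 50$, $\sum f_i x_i = 450$
$\bar{X} = \frac{\sum f_i x_i}{N} = \frac{450}{50} = 9$
We have $\sum f_i (x_i - \bar{X})^2 = 754$
$\therefore Var(X) = \frac{1}{N}\left[\sum f_i (x_i - \bar{X})^2\right] = \frac{754}{50} = 15.08$
$S.D. = \sqrt{Var(X)} = \sqrt{15.08} = 3.88$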
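The table calculation for Q14 can be reproduced directly from the definition. The sketch below uses only the values and frequencies given in the question; the variable names are my own.

```python
import math

# Values and frequencies from Q14.
x = [2, 4, 6, 8, 10, 12, 14, 16]
f = [4, 4, 5, 15, 8, 5, 4, 5]

n_total = sum(f)                                        # N
mean_x = sum(fi * xi for fi, xi in zip(f, x)) / n_total
# Variance: frequency-weighted mean of squared deviations from the mean.
variance = sum(fi * (xi - mean_x) ** 2 for fi, xi in zip(f, x)) / n_total
std_dev = math.sqrt(variance)

print(n_total, mean_x, round(variance, 2), round(std_dev, 2))
```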
Alternate method : Use the formula
$\sigma^2 = \frac{\sum f_i x_i^2}{N} - \left(\frac{\sum f_i x_i}{N}\right)^2$
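As a check, the alternate formula can be applied to the data of Q14. The only new quantity needed is the sum of $f_i x_i^2$, computed term by term below; the result agrees with the value 15.08 obtained from the table.

```latex
\begin{align*}
\sum f_i x_i^2 &= 4(4) + 4(16) + 5(36) + 15(64) + 8(100) + 5(144) + 4(196) + 5(256) \\
               &= 16 + 64 + 180 + 960 + 800 + 720 + 784 + 1280 = 4804 \\
\sigma^2 &= \frac{4804}{50} - \left(\frac{450}{50}\right)^2 = 96.08 - 81 = 15.08
\end{align*}
```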
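• Short cut method : Variance is defined using deviations from $\bar{x}$, but it can also be
computed using deviations from any number A (an assumed mean).
i. Just like $\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{N}$
Similarly, $\sigma^2 = \frac{\sum d^2}{N}$ (this form holds only when $A = \bar{x}$)
ii. Just like $\sigma^2 = \frac{\sum x_i^2}{N} - \left(\frac{\sum x_i}{N}\right)^2$
Similarly, $\sigma^2 = \frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2$ (this form holds for any A)
where, $d = x_i - A$ = deviation from the assumed mean A
$f$ = frequency of the item (for a frequency distribution, use $\sum f d^2$ and $\sum f d$ in place of $\sum d^2$ and $\sum d$)
$N = \sum f$ = sum of frequencies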
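The claim that the assumed mean A can be any number is easy to test numerically. The sketch below applies the two-term formula (mean of d² minus the square of the mean of d, weighted by frequency) to the Q14 data for several choices of A. The function name `variance_about` is my own.

```python
def variance_about(x, f, assumed_mean):
    """Variance from deviations d = x - A, using sum(f d^2)/N - (sum(f d)/N)^2."""
    n_total = sum(f)
    d = [xi - assumed_mean for xi in x]
    mean_d2 = sum(fi * di ** 2 for fi, di in zip(f, d)) / n_total
    mean_d = sum(fi * di for fi, di in zip(f, d)) / n_total
    return mean_d2 - mean_d ** 2

# Values and frequencies from Q14 (true mean 9, variance 15.08).
x = [2, 4, 6, 8, 10, 12, 14, 16]
f = [4, 4, 5, 15, 8, 5, 4, 5]

results = {a: variance_about(x, f, a) for a in (0, 8, 9, 12)}
for a, var in results.items():
    print(a, round(var, 2))
```

Every choice of A gives the same variance. In practice A is picked near the centre of the data so that the deviations d stay small and the hand arithmetic is lighter.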
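• Standard deviation for continuous series :
$\sigma^2 = h^2\left[\frac{\sum f_i \left(\frac{x_i - A}{h}\right)^2}{N} - \left(\frac{\sum f_i \left(\frac{x_i - A}{h}\right)}{N}\right)^2\right] = h^2\left[\frac{\sum f_i u_i^2}{N} - \left(\frac{\sum f_i u_i}{N}\right)^2\right]$
where $u_i = \frac{x_i - A}{h}$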
Q15. If $\sum_{i=1}^{18} (x_i - 8) = 9$ and $\sum_{i=1}^{18} (x_i - 8)^2 = 45$, then find the standard deviation of $x_1, x_2, \ldots, x_{18}$
A15. Ans. $\frac{3}{2}$
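The answer to Q15 follows from the short cut formula with assumed mean $A = 8$, so that $d_i = x_i - 8$ and $N = 18$. A worked version of that step:

```latex
\begin{align*}
\sigma^2 &= \frac{\sum d_i^2}{N} - \left(\frac{\sum d_i}{N}\right)^2
          = \frac{45}{18} - \left(\frac{9}{18}\right)^2
          = \frac{5}{2} - \frac{1}{4} = \frac{9}{4} \\
\sigma &= \sqrt{\frac{9}{4}} = \frac{3}{2}
\end{align*}
```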
Q16. Calculate the mean and standard deviation for the following data :
Wages upto (in Rs.) | 15 | 30 | 45 | 60 | 75 | 90 | 105 | 120
No. of workers | 12 | 30 | 65 | 107 | 157 | 202 | 222 | 230
A16.
Class interval | Cumulative frequency | Mid-values $x_i$ | Frequency $f_i$ | $u_i = \frac{x_i - 67.5}{15}$ | $f_i u_i$ | $f_i u_i^2$
0 – 15 | 12 | 7.5 | 12 | -4 | -48 | 192
15 – 30 | 30 | 22.5 | 18 | -3 | -54 | 162
30 – 45 | 65 | 37.5 | 35 | -2 | -70 | 140
45 – 60 | 107 | 52.5 | 42 | -1 | -42 | 42
60 – 75 | 157 | 67.5 | 50 | 0 | 0 | 0
75 – 90 | 202 | 82.5 | 45 | 1 | 45 | 45
90 – 105 | 222 | 97.5 | 20 | 2 | 40 | 80
105 – 120 | 230 | 112.5 | 8 | 3 | 24 | 72
Total | | | $\sum f_i = 230$ | | $\sum f_i u_i = -105$ | $\sum f_i u_i^2 = 733$
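Here $A = 67.5$, $h = 15$, $N = 230$, $\sum f_i u_i = -105$ and $\sum f_i u_i^2 = 733$
$\therefore \text{Mean} = A + h\left(\frac{1}{N}\sum f_i u_i\right) = 67.5 + 15\left(\frac{-105}{230}\right) = 67.5 - 6.85 = 60.65$
and $Var(x) = h^2\left[\frac{1}{N}\sum f_i u_i^2 - \left(\frac{1}{N}\sum f_i u_i\right)^2\right]$
$\Rightarrow Var(x) = 225\left[\frac{733}{230} - \left(\frac{-105}{230}\right)^2\right] = 225[3.18696 - 0.20841] \approx 670.17$
$\therefore S.D. = \sqrt{Var(x)} = \sqrt{670.17} \approx 25.89$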
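The step-deviation working for Q16 can be sketched in code, starting from the "wages upto" cumulative counts. The sketch assumes, as the table does, equal class widths of 15 and an assumed mean A = 67.5; the variable names are my own. Because it carries full precision, its results may differ in the last digits from values obtained with hand-rounded intermediate steps.

```python
import math

upper_limits = [15, 30, 45, 60, 75, 90, 105, 120]
cumulative = [12, 30, 65, 107, 157, 202, 222, 230]

h = 15      # class width
A = 67.5    # assumed mean (midpoint of the 60-75 class)

# Recover class frequencies from the cumulative ("upto") counts.
freqs = [c - p for c, p in zip(cumulative, [0] + cumulative[:-1])]
midpoints = [u - h / 2 for u in upper_limits]
u = [(m - A) / h for m in midpoints]          # step deviations

n_total = sum(freqs)
sum_fu = sum(fi * ui for fi, ui in zip(freqs, u))
sum_fu2 = sum(fi * ui ** 2 for fi, ui in zip(freqs, u))

mean_wage = A + h * sum_fu / n_total
variance = h ** 2 * (sum_fu2 / n_total - (sum_fu / n_total) ** 2)
std_dev = math.sqrt(variance)

print(freqs)
print(sum_fu, sum_fu2)
print(round(mean_wage, 2), round(variance, 2), round(std_dev, 2))
```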
$\sigma^2 = h^2\left[\frac{\sum f_i u_i^2}{N} - \left(\frac{\sum f_i u_i}{N}\right)^2\right] = 4\left[\frac{71}{50} - \left(\frac{-21}{50}\right)^2\right]$
$= 4[1.42 - 0.1764] = 4.97$
Coefficient of standard deviation
• To compare the dispersion of two frequency distributions, a relative measure of standard
deviation is computed. It is known as the coefficient of standard deviation and is given by
Coefficient of S.D. $= \frac{\sigma}{\bar{x}}$, where $\bar{x}$ is the A.M. (arithmetic mean)
• Also, coefficient of variation = coefficient of S.D. × 100 $= \frac{\sigma}{\bar{x}} \times 100$
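As an illustration, both coefficients can be evaluated for the distribution of Q14, whose mean is 9 and variance is 15.08. This is a minimal sketch; the function name is my own.

```python
import math

def coefficient_of_sd(std_dev, arithmetic_mean):
    """Relative dispersion: standard deviation divided by the arithmetic mean."""
    return std_dev / arithmetic_mean

# Mean and variance taken from the worked answer to Q14.
mean_x = 9.0
variance = 15.08

coeff_sd = coefficient_of_sd(math.sqrt(variance), mean_x)
coeff_percent = coeff_sd * 100   # the same quantity expressed as a percentage

print(round(coeff_sd, 4), round(coeff_percent, 2))
```

Since the coefficient is a ratio, it is free of units, which is what allows two distributions measured on different scales to be compared.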
Square deviation
• Root mean square deviation :
$S = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i (x_i - A)^2}$
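Q18. The mean square deviation of a set of n observations $x_1, x_2, \ldots, x_n$ about a point c is defined as
$\frac{1}{n}\sum_{i=1}^{n}(x_i - c)^2$. The mean square deviations about -2 and 2 are 18 and 10 respectively. The
standard deviation of this set of observations is
(A) 3 (B) 2 (C) 1 (D) None of these
A18. Ans. (A)
$\frac{1}{n}\sum (x_i + 2)^2 = 18$ and $\frac{1}{n}\sum (x_i - 2)^2 = 10$
$\Rightarrow \sum (x_i + 2)^2 = 18n$ and $\sum (x_i - 2)^2 = 10n$
$\Rightarrow \sum (x_i + 2)^2 + \sum (x_i - 2)^2 = 28n$ and $\sum (x_i + 2)^2 - \sum (x_i - 2)^2 = 8n$
$\Rightarrow 2\sum x_i^2 + 8n = 28n$ and $8\sum x_i = 8n$
$\Rightarrow \frac{1}{n}\sum x_i^2 = 10$ and $\bar{x} = \frac{1}{n}\sum x_i = 1$
$\therefore \sigma = \sqrt{10 - 1^2} = 3$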
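The answer to Q18 can be sanity-checked on a concrete data set. The two observations below are hypothetical, chosen only because they satisfy both given conditions; any set with mean 1 and mean of squares 10 would do. The helper name is my own.

```python
from statistics import pstdev

def mean_square_deviation(data, c):
    """(1/n) * sum of (x_i - c)^2, as defined in Q18."""
    return sum((x - c) ** 2 for x in data) / len(data)

# Hypothetical data set chosen to satisfy both given conditions.
data = [-2, 4]

msd_about_minus2 = mean_square_deviation(data, -2)
msd_about_plus2 = mean_square_deviation(data, 2)
sd = pstdev(data)   # population standard deviation

print(msd_about_minus2, msd_about_plus2, sd)
```

The check works because the mean square deviation about c equals $\sigma^2 + (\bar{x} - c)^2$, so the two given values fix both the mean and the variance.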
Q19. For 2 data sets, each of size 5, the variances are given to be 4 and 5 and the corresponding
means are given to be 2 and 4 respectively. What is the variance of the combined data set?
A19. $n_1 = 5$, $n_2 = 5$, $\sigma_1^2 = 4$, $\sigma_2^2 = 5$, $\bar{x}_1 = 2$, $\bar{x}_2 = 4$
$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2} = \frac{10 + 20}{10} = 3$
$d_1 = \bar{x}_1 - \bar{x} = -1$
$d_2 = \bar{x}_2 - \bar{x} = 1$
$\sigma^2 = \frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}$
$\Rightarrow \sigma^2 = \frac{5(4 + 1) + 5(5 + 1)}{10} = \frac{11}{2}$
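The combined-variance formula used in A19 can be wrapped in a small function and cross-checked against raw data. In the sketch below the function name is my own, and the two raw data sets are hypothetical: they were constructed only so that their sizes, means and variances match the values given in Q19.

```python
from statistics import pvariance

def combined_variance(n1, mean1, var1, n2, mean2, var2):
    """Variance of two pooled data sets from their sizes, means and variances."""
    total = n1 + n2
    combined_mean = (n1 * mean1 + n2 * mean2) / total
    d1 = mean1 - combined_mean   # shift of the first mean from the combined mean
    d2 = mean2 - combined_mean   # shift of the second mean from the combined mean
    return (n1 * (var1 + d1 ** 2) + n2 * (var2 + d2 ** 2)) / total

from_formula = combined_variance(5, 2, 4, 5, 4, 5)

# Hypothetical raw data with the same summary statistics as Q19.
set_one = [-1, 1, 2, 3, 5]          # mean 2, variance 4
set_two = [1.5, 1.5, 4, 6.5, 6.5]   # mean 4, variance 5
from_raw_data = pvariance(set_one + set_two)

print(from_formula, from_raw_data)
```

Both routes give 11/2, which suggests the formula only needs the summary statistics of each set, not the observations themselves.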