Statistics
Career After +2
Introduction
Before going into details, let us consider an example.
Consider the marks obtained by 20 students of a class in
a math test, where the maximum marks are 50. The
scores of the 20 students are as follows
8, 2, 6, 17, 16, 2, 6, 18, 20, 15, 14, 8, 7, 9, 25, 5, 7, 8, 10, 12.
Now the marks obtained can be shown or collected in
different ways for their analysis:
(a) In the illustration above, the marks of the 20 students are written in no definite pattern, i.e., we have simply asked each student and written down his/her marks. Data collected and presented in this way is known as ungrouped data.
(b) The other way of showing the data is to record how many students obtained each particular mark.
For example, in the above data:
Marks obtained    No. of Students
2                 2
8                 3
6                 2
and so on.
In this method of collection and representation, we count how many times a particular observation (in this case, a mark) occurs. This count is the frequency of that observation, and the resulting table is known as the discrete frequency distribution of the data.
(c) Another way is to divide the given data into small groups or intervals, with each observation placed in the interval in which it lies.
Let the intervals be 0-10, 10-20, 20-30, 30-40, 40-50, i.e., we have divided 50 marks into 5 intervals of width 10 each. Here the upper limit of each interval is excluded, so a mark of 10 is counted in 10-20 and not in 0-10.
Now the marks 2, 2, 5, 6, 6, 7, 7, 8, 8, 8, 9 come under the interval 0-10, the marks 10, 12, 14, 15, 16, 17, 18 come under 10-20, and so on. In tabular form:
Class Interval    Frequency
0 – 10            11 (as the marks of 11 students lie in this interval)
10 – 20           7 (as the marks of 7 students lie in this interval)
20 – 30           2
and so on.
This method is known as the continuous frequency distribution method.
(d) Alternatively, we could have designed the ranges in such a way that both the upper and lower limits are included (inclusive distribution).
Class Interval    Frequency
0 – 10            12 (as the marks of 12 students lie in this interval)
11 – 20           7 (as the marks of 7 students lie in this interval)
21 – 30           1
and so on.
So we have seen a few ways to represent the collected
data.
After the data is collected it is analyzed as per
requirements.
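The four ways of presenting the 20 marks can be tried out with a short Python sketch. It is only an illustration: the helper names `exclusive_classes` and `inclusive_classes` are made up for this example, and the class width of 10 and the maximum of 50 are taken from the text above.

```python
from collections import Counter

# (a) Ungrouped data: the marks exactly as they were written down.
marks = [8, 2, 6, 17, 16, 2, 6, 18, 20, 15, 14, 8, 7, 9, 25, 5, 7, 8, 10, 12]

# (b) Discrete frequency distribution: how many students obtained each mark.
discrete = Counter(marks)

def exclusive_classes(data, width=10, top=50):
    """Count marks in classes 0-10, 10-20, ...; the upper limit is excluded."""
    counts = {(lo, lo + width): 0 for lo in range(0, top, width)}
    for m in data:
        lo = min(m // width * width, top - width)  # a mark equal to top joins the last class
        counts[(lo, lo + width)] += 1
    return counts

def inclusive_classes(data, bounds):
    """Count marks in classes such as 0-10, 11-20, ...; both limits are included."""
    return {(lo, hi): sum(lo <= m <= hi for m in data) for lo, hi in bounds}

# (c) Continuous (exclusive) and (d) inclusive frequency distributions.
continuous = exclusive_classes(marks)
inclusive = inclusive_classes(marks, [(0, 10), (11, 20), (21, 30), (31, 40), (41, 50)])

print(discrete[2], discrete[8], discrete[6])
print(continuous)
print(inclusive)
```

Whichever grouping is used, the frequencies must add up to the 20 students, which is a quick check on any hand-made table.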
Definitions
We use the word statistics in two senses:
In the plural, statistics means data.
In the singular, it is the science which deals with the collection, presentation, analysis and interpretation of numerical data.
(a) Class Interval: Each small group in which we place the observations is known as a class interval. For example, 0 – 10, 10 – 20 etc. are class intervals.
(b) Class Frequency: The number of observations which come under a class interval is the frequency of that class interval. For example, in the above marks distribution of the students, 7 is the class frequency of the class interval 10 – 20.
(c) Limit: Each interval has two limits. For example in
class interval 0 – 10, 0 is lower limit and 10 is
upper limit.
(d) Class Mark or mid value: It is the mid-point of a class interval. For example, for the class interval 10 – 20 the class mark is $\frac{10 + 20}{2} = \frac{30}{2} = 15$.
(f) Cumulative frequency: The cumulative frequency of a class interval is the sum of its own frequency and the frequencies of all the preceding class intervals.
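As a small sketch of definitions (d) and (f), the class marks and cumulative frequencies of a table can be computed as follows. The classes and frequencies are borrowed from the marks table of worked example 4 later in this text; the variable names are illustrative only.

```python
from itertools import accumulate

# Class intervals (lower limit, upper limit) and their class frequencies.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [12, 18, 27, 20, 17, 6]

# Class mark = mid-point of the class interval.
class_marks = [(lower + upper) / 2 for lower, upper in classes]

# Cumulative frequency = running total of the frequencies up to and including each class.
cumulative = list(accumulate(freqs))

for (lower, upper), mark, f, cf in zip(classes, class_marks, freqs, cumulative):
    print(f"{lower}-{upper}: class mark {mark}, frequency {f}, cumulative frequency {cf}")
```

The last cumulative frequency always equals the total number of observations.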
Measures
The terms we calculate and use for analysing data are
broadly divided into 2 major categories or measures:
i) Measures of central tendency: In this category we
have the terms -
(a) Mean (b) Median (c) Mode
ii) Measures of dispersion: In this category we have
terms -
(a) Range (b) Deviation
Note: The values of the mean, median and mode lie within the range of the data and are taken as the true or standard values of the data, or as values very close to them. That is why the mean, median and mode are known as measures of central tendency.
The values of the range, deviation etc., on the other hand, do not represent any typical observation. With them we try to see how close to or how far from the true or standard value our data lies, that is, how widely the data is spread. That is why these parameters are known as measures of dispersion.
Mean
Arithmetic Mean: is a numerical value obtained by dividing the sum of all observations by the total number of observations. So if x1, x2, x3, ..., xn are n observations, then the arithmetic mean of these n observations, which is denoted by $\bar{x}$, is given by
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad \text{(i)}$$
where $\sum_{i=1}^{n} x_i$ (read as summation $x_i$) is a notation for the sum $x_1 + x_2 + \dots + x_n$.
If the observations $x_i$, $i = 1, 2, 3, \dots, n$, occur with frequencies $f_i$, then
$$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + \dots + x_n f_n}{f_1 + f_2 + f_3 + \dots + f_n} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i} = \frac{\sum_{i=1}^{n} x_i f_i}{N}$$
where $N = \sum_{i=1}^{n} f_i$ is the total number of observations.
Change of Origin Method for calculating mean.
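Let A be any assumed number (usually taken in the middle of the series) and $d_i$ be the deviation about A, i.e., $d_i = x_i - A$.
The arithmetic mean $\bar{x}$ is given by
$$\bar{x} = \frac{1}{N}\sum f_i x_i = \frac{1}{N}\sum f_i (A + d_i) \qquad [\text{since } d_i = x_i - A]$$
$$= \frac{1}{N}\sum f_i \cdot A + \frac{1}{N}\sum f_i d_i = A + \frac{1}{N}\sum f_i d_i \qquad [\text{since } \sum f_i = N]$$
Here A is called the Assumed Mean and $d_i$ is called the deviation about A.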
Shift of Origin and Change of Scale Method for
calculating mean. (Step-Deviation Method).
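In the case of a continuous distribution having equal class intervals, say of width h, we use
$$u_i = \frac{x_i - A}{h} \qquad \text{(i)}$$
$$x_i = A + h u_i \qquad \text{(ii)}$$
$$\bar{x} = \frac{1}{N}\sum f_i x_i = \frac{1}{N}\sum f_i (A + h u_i) = A \cdot \frac{1}{N}\sum f_i + h \cdot \frac{1}{N}\sum f_i u_i = A + h \cdot \frac{\sum f_i u_i}{N} \qquad [\text{since } \sum f_i = N]$$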
Group mean or combined mean:
If two groups with means $\bar{x}_1$ and $\bar{x}_2$ have $n_1$ and $n_2$ observations respectively, then the combined mean of the two groups is given by
$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$
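The combined-mean formula can be checked numerically with a short Python sketch. The two groups below are hypothetical, chosen only for illustration, and `combined_mean` is a name invented for this example.

```python
def combined_mean(n1, mean1, n2, mean2):
    """Mean of two groups taken together, from their sizes and their means."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)

# Hypothetical groups, not taken from the text.
group1 = [2, 4, 6]
group2 = [10, 20]
mean1 = sum(group1) / len(group1)
mean2 = sum(group2) / len(group2)

combined = combined_mean(len(group1), mean1, len(group2), mean2)

# Mean computed directly from all five observations, for comparison.
pooled = sum(group1 + group2) / len(group1 + group2)
print(combined, pooled)
```

Both values agree, because $n_1 \bar{x}_1$ and $n_2 \bar{x}_2$ are simply the sums of the observations in each group.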
Weighted Mean:
When a weight, instead of a frequency, is given for each observation, the mean of this set of observations is known as the weighted mean.
$$\bar{x}_W = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + \dots + w_n x_n}{w_1 + w_2 + w_3 + \dots + w_n}$$
For the sake of understanding, consider the weight as the importance, expressed as a number, given to that observation.
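A minimal Python sketch of the weighted mean follows. The marks and weights are hypothetical (three papers, with the third counted three times as important as the first), and the function name `weighted_mean` is illustrative.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

# Hypothetical marks in three papers and the importance given to each paper.
paper_marks = [70, 80, 90]
paper_weights = [1, 2, 3]

result = weighted_mean(paper_marks, paper_weights)
equal = weighted_mean(paper_marks, [1, 1, 1])   # equal weights give the ordinary mean
print(result, equal)
```

With equal weights the weighted mean reduces to the ordinary arithmetic mean, which is why frequencies can be viewed as a special kind of weight.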
Some facts about arithmetic mean
(a) If every observation is increased or decreased by the same number, then the arithmetic mean of the new observations also increases or decreases by the same number.
(b) If every observation is multiplied or divided by the same non-zero constant, then the arithmetic mean of the new set of observations is obtained by multiplying or dividing the initial arithmetic mean by the same constant respectively.
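Both facts can be verified on any data set. The sketch below uses the eight observations of example 1 of this text; the shift of 5 and the factor of 2 are arbitrary choices made for illustration.

```python
from statistics import fmean

data = [15, 20, 25, 10, 5, 30, 12, 8]

base = fmean(data)                        # mean of the original observations
shifted = fmean([x + 5 for x in data])    # every observation increased by 5
scaled = fmean([2 * x for x in data])     # every observation multiplied by 2

print(base, shifted, scaled)
```

The shifted mean exceeds the original mean by exactly 5, and the scaled mean is exactly twice the original mean.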
Examples related to Mean
1. Find the mean of the data given below:
(i) 15, 20, 25, 10, 5, 30, 12, 8
(ii) The same data, if the frequencies of the observations are 2, 4, 3, 5, 7, 6, 1, 8 respectively.
Sol: (i) We know mean $= \frac{x_1 + x_2 + \dots + x_n}{n}$. Here the total number of observations is n = 8 and x1, x2, x3, ... are 15, 20, 25, 10, 5, ...
$$\text{mean} = \frac{15 + 20 + 25 + 10 + 5 + 30 + 12 + 8}{8} = \frac{125}{8} = 15.625$$
(ii)
xi      fi      fixi
15      2       30
20      4       80
25      3       75
10      5       50
5       7       35
30      6       180
12      1       12
8       8       64
$\sum f_i = 36$, $\sum f_i x_i = 526$
$$\text{mean} = \frac{\sum f_i x_i}{\sum f_i} = \frac{526}{36} \approx 14.61$$
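The arithmetic of example 1 can be reproduced with a few lines of Python. This is only a sketch; `frequency_mean` is a name made up for the illustration.

```python
def frequency_mean(values, freqs):
    """Mean of a discrete frequency distribution: sum(f * x) / sum(f)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

values = [15, 20, 25, 10, 5, 30, 12, 8]
freqs = [2, 4, 3, 5, 7, 6, 1, 8]

simple = sum(values) / len(values)          # part (i): each observation counted once
weighted = frequency_mean(values, freqs)    # part (ii): each observation counted f times

print(simple, round(weighted, 2))
```

The two answers differ because in part (ii) the smaller observations (5, 8, 10) carry the larger frequencies and pull the mean down.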
Sol: We know that $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ ..................... (i)
Here $\bar{x} = 50$, n = 80.
From (i), $50 = \frac{\sum_{i=1}^{80} x_i}{80}$
or $\sum_{i=1}^{80} x_i = 80 \times 50 = 4{,}000$
4. Calculate the arithmetic mean of marks from the following table.
Marks:            0-10   10-20   20-30   30-40   40-50   50-60
No. of students:  12     18      27      20      17      6
Sol:
Variate    Freq = f    Mid value = x    u = (x – 25)/10    f.u
0-10       12          5                -2                 -24
10-20      18          15               -1                 -18
20-30      27          25               0                  0
30-40      20          35               1                  20
40-50      17          45               2                  34
50-60      6           55               3                  18
N = 100, $\sum f u = 30$
$$\bar{x} = A + h \cdot \frac{\sum f u}{N} = 25 + \frac{10(30)}{100} = 25 + 3 = 28 \text{ marks.}$$
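The step-deviation working of example 4 can be cross-checked against the direct formula with the following sketch. The function name `step_deviation_mean` is illustrative; A = 25 and h = 10 are the values used in the solution above.

```python
def step_deviation_mean(class_marks, freqs, A, h):
    """Mean by shift of origin (A) and change of scale (h): A + h * sum(f*u) / N."""
    N = sum(freqs)
    u = [(x - A) / h for x in class_marks]
    return A + h * sum(f * ui for f, ui in zip(freqs, u)) / N

class_marks = [5, 15, 25, 35, 45, 55]    # mid values of 0-10, 10-20, ..., 50-60
freqs = [12, 18, 27, 20, 17, 6]

by_step_deviation = step_deviation_mean(class_marks, freqs, A=25, h=10)
direct = sum(f * x for x, f in zip(class_marks, freqs)) / sum(freqs)

print(by_step_deviation, direct)
```

Any other choice of A gives the same mean; a value near the middle of the series is preferred only because it keeps the hand calculation small.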
Median
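It is the value of the observation which lies in the middle of the observations when they are arranged in order. How this mid-point or median is located depends on whether the total number of observations is odd or even. The median is found as follows.
(i) Ungrouped Data (frequency not given): First arrange the data in ascending or descending order. If the total number of observations is n, then:
If n is odd, median = value of the $\left(\frac{n+1}{2}\right)$th observation,
and if n is even, median = $\frac{1}{2}\left[\text{value of the } \frac{n}{2}\text{th item} + \text{value of the } \left(\frac{n}{2} + 1\right)\text{th item}\right]$
(ii) For discrete series (frequency is given): First we arrange the observations in ascending or descending order and calculate their cumulative frequencies. Then median = value of the $\left(\frac{n+1}{2}\right)$th observation, where n is the total (last cumulative) frequency.
(iii) For grouped or continuous frequency distribution.
(a) If the series is in ascending order, then
$$\text{median} = l + \frac{\frac{n}{2} - c}{f} \times i,$$
where l = lower limit of the median class (the class in which the $\frac{n}{2}$th item lies), c = cumulative frequency of the class preceding the median class, f = frequency of the median class and i = width of the class interval.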
(ii) x: 10, 15, 20, 25, 30, 35
f: 7, 18, 19, 6, 8, 24
(iii) class interval: 0-10, 10-20, 20-30, 30-40, 40-50
frequency : 4 8 11 7 9
Sol: (i) Here the total number of observations is 7, which is odd. To locate the median, the data must be arranged in either ascending or descending order, but here the data is already given in ascending order, i.e., 6, 11, 13, 15, 17, 21, 24.
median = value of the $\frac{n+1}{2}$th item = value of the $\frac{7+1}{2}$th item = value of the 4th item = 15
(ii)
x       f       c.f
10      7       7
15      18      25
20      19      44
25      6       50
30      8       58
35      24      82
Now the $\frac{n+1}{2}$th item = $\frac{82+1}{2}$th = $\frac{83}{2}$th = 41.5th item.
The $\frac{n+1}{2}$th item (which is the 41.5th) falls in the 3rd row of the cumulative frequency table (cumulative frequency 44), which corresponds to the 3rd value of the variable (x1, x2, ...).
So the median of this data = 20
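(iii)
C.I       Frequency    C.F
0-10      4            4
10-20     8            12
20-30     11           23
30-40     7            30
40-50     9            39
N = 39
$$\text{median} = l + \frac{\frac{N}{2} - c}{f} \times i$$
Now $\frac{N}{2} = \frac{39}{2} = 19.5$, which lies in the class 20-30 (cumulative frequency 23). So l = 20, c = 12, f = 11 and i = 10, and
$$\text{median} = 20 + \frac{19.5 - 12}{11} \times 10 \approx 26.82$$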
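The three median rules can be sketched in Python as below. The function names are invented for this illustration, the data is that of the three parts of this example (part (iii) uses the frequencies 4, 8, 11, 7, 9 tabulated in the solution), and `median_discrete` assumes the values are already in ascending order.

```python
def median_raw(data):
    """Median of ungrouped data: middle item, or mean of the two middle items."""
    xs = sorted(data)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                      # the ((n + 1) / 2)th item, counting from 1
    return (xs[mid - 1] + xs[mid]) / 2      # mean of the (n/2)th and (n/2 + 1)th items

def median_discrete(values, freqs):
    """Value whose cumulative frequency first reaches the ((n + 1) / 2)th item."""
    target = (sum(freqs) + 1) / 2
    cumulative = 0
    for x, f in zip(values, freqs):
        cumulative += f
        if cumulative >= target:
            return x

def median_grouped(classes, freqs):
    """l + (N/2 - c) / f * i for the class in which the (N/2)th item lies."""
    half = sum(freqs) / 2
    c = 0                                   # cumulative frequency of preceding classes
    for (l, upper), f in zip(classes, freqs):
        if f > 0 and c + f >= half:
            return l + (half - c) / f * (upper - l)
        c += f

m1 = median_raw([6, 11, 13, 15, 17, 21, 24])
m2 = median_discrete([10, 15, 20, 25, 30, 35], [7, 18, 19, 6, 8, 24])
m3 = median_grouped([(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)], [4, 8, 11, 7, 9])
print(m1, m2, round(m3, 2))
```

For a discrete series with an even total, `median_discrete` simply returns the value in which the ((n + 1)/2)th position falls, which is a simplification of the rule stated above.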
Mode
It is the value of the observation which occurs the maximum number of times in the data. Generally we locate this value just by inspecting the data.
(i) For individual series or raw data: The mode is the observation which occurs the maximum number of times.
(ii) For discrete data (with frequency): The mode is the observation having the highest frequency.
(iii) For grouped data with continuous frequency distribution:
$$\text{Mode} = l_1 + \frac{f_m - f_1}{2 f_m - f_1 - f_2} \times i$$
(iv) Relation between Mean, Median and Mode (an empirical relation, which holds approximately for moderately skewed distributions):
Mode = 3 Median – 2 Mean.
6. Find the mode of data given below
(i) 2, 2, 3, 5, 7, 8, 9, 2, 2, 9
(ii) x: 10, 15, 20, 25, 30, 35
f: 7, 18, 16, 14, 12, 10
(iii) class interval: 0-10, 10-20, 20-30, 30-40, 40-50
frequency : 4 8 11 7 8
Sol: (i) We know the mode is the observation which occurs the maximum number of times in the data. Here 2 occurs four times, which is the maximum number of occurrences.
So the mode of this data is 2.
(ii) In this data the frequency of the observation 15 is the maximum (18), so the mode of this data is 15.
(iii) Here the frequency of the C.I. 20-30 is the maximum, so 20-30 is the modal class. For a continuous frequency distribution,
$$\text{mode} = l_1 + \frac{f_m - f_1}{2 f_m - f_1 - f_2} \times i,$$
where
l1 = lower limit of the modal class, here l1 = 20
fm = frequency of the modal class, here fm = 11
f1 = frequency of the class preceding the modal class, here f1 = 8
f2 = frequency of the class succeeding the modal class, here f2 = 7
i = width of the class interval, here i = 10.
$$\text{mode} = 20 + \frac{11 - 8}{22 - 8 - 7} \times 10 = 20 + \frac{3}{7} \times 10 \approx 24.29$$
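The three parts of this example can be reproduced with a short Python sketch. The function names are illustrative, and treating a missing neighbouring class as having frequency 0 (when the modal class is the first or last one) is an assumption of this sketch, not something stated in the text.

```python
from collections import Counter

def mode_raw(data):
    """Observation that occurs the maximum number of times."""
    return Counter(data).most_common(1)[0][0]

def mode_discrete(values, freqs):
    """Observation with the highest frequency."""
    return values[freqs.index(max(freqs))]

def mode_grouped(classes, freqs):
    """l1 + (fm - f1) / (2*fm - f1 - f2) * i for the modal class."""
    m = freqs.index(max(freqs))                        # position of the modal class
    fm = freqs[m]
    f1 = freqs[m - 1] if m > 0 else 0                  # assumption: 0 if no preceding class
    f2 = freqs[m + 1] if m + 1 < len(freqs) else 0     # assumption: 0 if no succeeding class
    l1, upper = classes[m]
    return l1 + (fm - f1) / (2 * fm - f1 - f2) * (upper - l1)

a = mode_raw([2, 2, 3, 5, 7, 8, 9, 2, 2, 9])
b = mode_discrete([10, 15, 20, 25, 30, 35], [7, 18, 16, 14, 12, 10])
c = mode_grouped([(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)], [4, 8, 11, 7, 8])
print(a, b, round(c, 2))
```

If two observations or classes share the highest frequency, these helpers return the first one found; the text does not say how such ties should be treated.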
(i) Quartiles: The value of the variate midway between the first variate and the median is known as the First or lower quartile and is denoted by Q1. The value of the variate midway between the last variate and the median is called the Third or upper quartile and is denoted by Q3. The median is known as the second quartile and is denoted by Q2.
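The methods of finding the values of Q1 and Q3 are similar to those of the median. In the case of ungrouped data, when arranged in ascending or descending order of magnitude, Q1 and Q3 can be obtained by the following formulae:
$$Q_1 = \left(\frac{n+1}{4}\right)\text{th variate}; \qquad Q_3 = \left(\frac{3(n+1)}{4}\right)\text{th variate.}$$
For a continuous frequency distribution, Q1 and Q3 are given by the following formulae:
$$Q_1 = l + \frac{h}{f}\left(\frac{N}{4} - C\right); \qquad Q_3 = l + \frac{h}{f}\left(\frac{3N}{4} - C\right),$$
where l = lower limit of the class in which the quartile lies, h = width of that class, f = frequency of that class, C = cumulative frequency of the class preceding it and N = total frequency.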
(ii) Deciles: These are the values which divide the total frequency into ten equal parts and are denoted by Di (i = 1, 2, ..., 9).
(iii) Percentiles: These are the values which divide the total frequency into 100 equal parts and are denoted by Pi (i = 1, 2, ..., 99).
These are calculated by the following formulae:
(a) For Individual Series
Di = the value of the $i \cdot \left(\frac{n+1}{10}\right)$th item (where i = 1, 2, ..., 9)
Pi = the value of the $i \cdot \left(\frac{n+1}{100}\right)$th item (where i = 1, 2, ..., 99)
where n denotes the total number of items.
8. Calculate quartiles from the following data:
Marks:     0–10   10–20   20–30   30–40   40–50   50–60   60–70   70–80
Students:  5      7       8       12      28      22      10      8
Sol:
Marks      No. of students    Cumulative frequency
0 – 10     5                  5
10 – 20    7                  12
20 – 30    8                  20
30 – 40    12                 32
40 – 50    28                 60
50 – 60    22                 82
60 – 70    10                 92
70 – 80    8                  100
N = 100
Q1 = value of the $\frac{N}{4}$th item = 25th item.
Q1 lies in the group 30 – 40.
$$Q_1 = l + \frac{h}{f}\left(\frac{N}{4} - C\right) = 30 + \frac{10}{12}(25 - 20) \approx 34.17$$
Q3 = value of the $\frac{3N}{4}$th item = 75th item.
Q3 lies in the group 50 – 60.
$$Q_3 = l + \frac{h}{f}\left(\frac{3N}{4} - C\right) = 50 + \frac{10}{22}(75 - 60) \approx 56.82$$
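The quartile working above can be sketched as one general Python helper. The name `quantile_grouped` and its `fraction` parameter are hypothetical conveniences for this illustration: a fraction of 1/4 gives Q1 and 3/4 gives Q3.

```python
def quantile_grouped(classes, freqs, fraction):
    """l + (h / f) * (fraction * N - C) for the class holding the (fraction * N)th item."""
    N = sum(freqs)
    target = fraction * N
    C = 0                                   # cumulative frequency of preceding classes
    for (l, upper), f in zip(classes, freqs):
        if f > 0 and C + f >= target:
            h = upper - l                   # width of the class
            return l + h / f * (target - C)
        C += f

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
students = [5, 7, 8, 12, 28, 22, 10, 8]

q1 = quantile_grouped(classes, students, 1 / 4)
q3 = quantile_grouped(classes, students, 3 / 4)
print(round(q1, 2), round(q3, 2))
```

The same helper with fractions i/10 or i/100 would give grouped deciles and percentiles by the analogous formula, although the text above states those formulae only for individual series.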
Measures of Dispersion
There are four prominent measures of dispersion. They
are:
i. Range.
ii. Quartile Deviation.
iii. Mean Deviation.
iv. Standard Deviation.
(i) Range: It is the difference between the values of the highest and lowest observations in the given data.
Range = highest – lowest
(ii) Quartile Deviation: It is half of the difference between the upper and lower quartiles, i.e., $\frac{Q_3 - Q_1}{2}$.
It is also called the Semi-Inter Quartile Range.
9. Find the range and quartile deviation of the sample of observations 9, 13, 23, 11, 15, 17, 25, 18, 14, 24, 20.
Sol: (i) We see that the largest observation is 25 and the smallest observation is 9.
Range = 25 – 9 = 16.
(ii) After arranging in ascending order we get the observations as: 9, 11, 13, 14, 15, 17, 18, 20, 23, 24, 25.
n = 11, hence Q1 = value of the $\frac{n+1}{4}$th item = value of the 3rd item = 13 and Q3 = value of the $\frac{3(n+1)}{4}$th item = value of the 9th item = 23.
$$\text{Quartile Deviation} = \frac{Q_3 - Q_1}{2} = \frac{23 - 13}{2} = 5.$$
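A short Python sketch of this example is given below. The helper `item` is hypothetical; when the position (n + 1)/4 is not a whole number it interpolates linearly between the two neighbouring items, which is an assumption of this sketch, since the text only treats the whole-number case. It is meant for samples of at least three observations.

```python
def item(sorted_data, position):
    """Value at a 1-based position; fractional positions are linearly interpolated."""
    k = int(position)
    fraction = position - k
    if fraction == 0:
        return sorted_data[k - 1]
    return sorted_data[k - 1] + fraction * (sorted_data[k] - sorted_data[k - 1])

sample = [9, 13, 23, 11, 15, 17, 25, 18, 14, 24, 20]
ordered = sorted(sample)
n = len(ordered)

data_range = max(sample) - min(sample)
q1 = item(ordered, (n + 1) / 4)          # 3rd item for n = 11
q3 = item(ordered, 3 * (n + 1) / 4)      # 9th item for n = 11
quartile_deviation = (q3 - q1) / 2

print(data_range, q1, q3, quartile_deviation)
```

The range uses only the two extreme observations, while the quartile deviation ignores the lowest and highest quarters of the data and is therefore less affected by extreme values.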
(iii) Mean Deviation: It is the mean of the absolute values of all the deviations. Here the deviations may be calculated by taking the mean, median or mode as the reference value, and accordingly it is known as the mean deviation about the mean, the mean deviation about the median or the mean deviation about the mode respectively.
$$\text{Mean deviation} = \frac{\sum f_i |d_i|}{\sum f_i}$$
where $d_i = x_i - A$ (i = 1, 2, 3, ..., n) are the respective deviations of each observation from the reference parameter A. Here A can be the mean, median or mode of the given data.
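10. Find the mean deviation of 2, 4, 6, 8, 10, 12, 14 about the mean.
Sol: Mean deviation about the mean is given by
$$\text{m.d.} = \frac{\sum_{i=1}^{n} |d_i|}{n}, \text{ where } d_i = x_i - \bar{x}$$
So we first find the mean of the given set of observations, which is
$$\bar{x} = \frac{2 + 4 + 6 + 8 + 10 + 12 + 14}{7} = \frac{56}{7} = 8$$
Then, to find the deviations ($d_i$, i = 1 to 7), we subtract the mean from each given observation, and for the mean deviation we take the mean of the absolute deviations:
$$\text{mean deviation} = \frac{\sum_{i=1}^{n} |d_i|}{n} = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$
$$= \frac{|x_1 - \bar{x}| + |x_2 - \bar{x}| + |x_3 - \bar{x}| + |x_4 - \bar{x}| + |x_5 - \bar{x}| + |x_6 - \bar{x}| + |x_7 - \bar{x}|}{7}$$
$$= \frac{|2 - 8| + |4 - 8| + |6 - 8| + |8 - 8| + |10 - 8| + |12 - 8| + |14 - 8|}{7}$$
$$= \frac{6 + 4 + 2 + 0 + 2 + 4 + 6}{7} = \frac{24}{7} \approx 3.43$$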
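The calculation of this example can be checked with a minimal Python sketch. The function name `mean_deviation` and its optional `about` parameter (the reference value A) are illustrative; when `about` is omitted the mean of the data is used.

```python
def mean_deviation(data, about=None):
    """Mean of the absolute deviations |x - A|; A defaults to the mean of the data."""
    A = sum(data) / len(data) if about is None else about
    return sum(abs(x - A) for x in data) / len(data)

observations = [2, 4, 6, 8, 10, 12, 14]
md_about_mean = mean_deviation(observations)
print(round(md_about_mean, 2))
```

Without the absolute values the deviations about the mean would cancel out and add up to zero, which is why the signs are dropped before averaging.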
Standard Deviation
(iv) Standard Deviation – The root mean square deviation, when the deviations are taken from the arithmetic mean $\bar{x}$ instead of an arbitrary value 'a', is called the Standard Deviation (S.D.) and is generally denoted by $\sigma$ (read as sigma),
i.e. $$\sigma = \sqrt{\frac{1}{N}\sum f_i (x_i - \bar{x})^2}$$ where $\bar{x}$ is the Arithmetic Mean of the distribution.
The square of the standard deviation is called the Variance.
Methods to Calculate the Standard Deviation
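Direct Method - We know that
$$\sigma^2 = \frac{1}{N}\sum f_i (x_i - \bar{x})^2 = \frac{1}{N}\sum f_i \left(x_i^2 - 2 x_i \bar{x} + \bar{x}^2\right)$$
$$= \frac{1}{N}\sum f_i x_i^2 - 2\bar{x} \cdot \frac{1}{N}\sum f_i x_i + \bar{x}^2 \cdot \frac{1}{N}\sum f_i$$
$$= \frac{1}{N}\sum f_i x_i^2 - 2\bar{x} \cdot \bar{x} + \bar{x}^2 \qquad \left[\text{since } \frac{1}{N}\sum f_i x_i = \bar{x} \text{ and } \sum f_i = N\right]$$
$$= \frac{1}{N}\sum f_i x_i^2 - \bar{x}^2; \quad \text{or} \quad \sigma = \sqrt{\frac{\sum f_i x_i^2}{N} - \left(\frac{\sum f_i x_i}{N}\right)^2}$$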
11. Find the standard deviation of the data 0, 3, 6, 7, 10, 10
Sol: mean $= \frac{0 + 3 + 6 + 7 + 10 + 10}{6} = \frac{36}{6} = 6$;
x       x – 6    (x – 6)²
0       -6       36
3       -3       9
6       0        0
7       1        1
10      4        16
10      4        16
Total            78
The calculation above is for the data 0, 3, 6, 7, 10, 10, for which the mean is 6.
$$\text{S.D.} = \sqrt{\frac{78}{6}} = \sqrt{13} \approx 3.606$$
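Sol:
x       f       f.x     $x - \bar{x}$    $(x - \bar{x})^2$    $f.(x - \bar{x})^2$
1       6       6       -3               9                    54
2       12      24      -2               4                    48
3       18      54      -1               1                    18
4       26      104     0                0                    0
5       16      80      1                1                    16
6       10      60      2                4                    40
7       8       56      3                9                    72
Total   96      384                                           248
Here $\bar{x} = \frac{\sum f x}{N} = \frac{384}{96} = 4$;
Now $\sigma = \sqrt{\frac{1}{N}\sum f (x - \bar{x})^2} = \sqrt{\frac{248}{96}} \approx 1.6$ (nearly).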
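The table above can be recomputed in Python, both from the definition of the standard deviation and from the sum-of-squares form obtained in the Direct Method. This is a sketch with illustrative variable names, using the x and f columns of the table.

```python
from math import sqrt

values = [1, 2, 3, 4, 5, 6, 7]
freqs = [6, 12, 18, 26, 16, 10, 8]

N = sum(freqs)
mean = sum(f * x for x, f in zip(values, freqs)) / N

# Definition: mean of the squared deviations from the arithmetic mean.
var_definition = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / N

# Direct Method form: sum(f * x^2) / N minus the square of the mean.
var_direct = sum(f * x * x for x, f in zip(values, freqs)) / N - mean ** 2

sd = sqrt(var_definition)
print(N, mean, round(var_definition, 4), round(var_direct, 4), round(sd, 2))
```

Both forms give the same variance; the second one avoids computing each deviation and is convenient when the mean is not a whole number.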
13. Find the arithmetic mean and standard deviation
from the following data:
Size of item : 10 11 12 13 14 15 16
Frequency : 2 7 11 15 10 4 1
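Sol:
Size of the item x    Freq f    d = x – 13    f.d    f.d²
10                    2         -3            -6     18
11                    7         -2            -14    28
12                    11        -1            -11    11
13                    15        0             0      0
14                    10        1             10     10
15                    4         2             8      16
16                    1         3             3      9
Total                 50                      -10    92
Here a = 13.
Now A.M. $= \bar{x} = a + \frac{\sum f d}{N} = 13 + \frac{-10}{50} = 12.8$, which is a fraction.
$$\text{S.D.} = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} = \sqrt{\frac{92}{50} - \left(\frac{-10}{50}\right)^2} = \sqrt{1.84 - 0.04} = \sqrt{1.80} = 1.342.$$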
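The working of this example, with deviations taken from an assumed value a, can be sketched in Python as follows. The function name `assumed_mean_sd` is made up for the illustration; a = 13 is the value chosen in the solution.

```python
from math import sqrt

def assumed_mean_sd(values, freqs, a):
    """Mean and S.D. from deviations d = x - a about an assumed value a."""
    N = sum(freqs)
    d = [x - a for x in values]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))
    mean = a + sum_fd / N
    sd = sqrt(sum_fd2 / N - (sum_fd / N) ** 2)
    return mean, sd

sizes = [10, 11, 12, 13, 14, 15, 16]
freqs = [2, 7, 11, 15, 10, 4, 1]

mean, sd = assumed_mean_sd(sizes, freqs, a=13)
print(round(mean, 3), round(sd, 3))
```

The standard deviation does not depend on the choice of a: any other assumed value changes the intermediate sums but leaves both results unchanged.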