
Statistics

Uploaded by

vighneshmanoj
Copyright
© © All Rights Reserved
HIGHER MATH

Statistics

Career After +2

Every Year 1 Crore Students Trust Us for Test Prep

https://hitbullseye.com/courses.php    1800-572-7346
Statistics

Introduction
Before going into details, let us consider an example.
Consider the marks obtained by 20 students of a class in
a math test where the maximum marks are 50. The scores
of the 20 students are as follows:
8, 2, 6, 17, 16, 2, 6, 18, 20, 15, 14, 8, 7, 9, 25, 5, 7, 8, 10, 12.
The marks obtained can be recorded in different ways
for analysis:
(a) In the list above, the marks of the 20 students are
written in no definite pattern, i.e., we have simply
asked each student and written down his/her marks.
Data collected and presented in this way is known
as ungrouped data.
(b) Another way of showing the data is to record how
many students obtained each particular mark.
For example, in the above data:
Marks obtained    No. of students
2                 2
8                 3
6                 2
and so on.
In this method of collection and representation, we count
how many times a particular observation (in this case, a
mark) occurs. That count is the frequency of the
observation, and the resulting table is known as the
discrete frequency distribution of the data.
(c) Another way is to divide the given data into small
groups or intervals, with each observation placed in
the interval in which it lies.
Let the intervals be 0-10, 10-20, 20-30, 30-40, 40-50, i.e.,
we have divided 50 marks into 5 intervals of width 10
each. Each interval includes its lower limit and excludes
its upper limit, so a mark of 10 is counted in 10-20.
The marks 8, 2, 6, 2, 6, 8, 7, 9, 5, 7, 8 come under the
interval 0-10, the marks 17, 16, 18, 15, 14, 10, 12 come
under 10-20, and so on. In tabular form:
Class Interval    Frequency
0 – 10            11 (the marks of 11 students lie in this interval)
10 – 20           7 (the marks of 7 students lie in this interval)
20 – 30           and so on.
This method is known as the continuous frequency
distribution method.
(d) Alternatively, we could have designed the ranges in
such a way that both the upper and lower limits are
included (inclusive distribution).
Class Interval    Frequency
0 – 10            12 (the marks of 12 students lie in this interval)
11 – 20           7 (the marks of 7 students lie in this interval)
21 – 30           and so on.
So we have seen a few ways to represent the collected
data.
After the data is collected it is analyzed as per
requirements.
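The four representations described above can be reproduced with a few lines of Python. This is only a minimal sketch: the helper names `exclusive_classes` and `inclusive_classes` are made up for illustration and are not part of the original notes.

```python
from collections import Counter

# (a) Ungrouped data: the 20 marks exactly as listed above.
marks = [8, 2, 6, 17, 16, 2, 6, 18, 20, 15, 14, 8, 7, 9, 25, 5, 7, 8, 10, 12]

# (b) Discrete frequency distribution: how often each mark occurs.
discrete = Counter(marks)

def exclusive_classes(data, width, top):
    """Count observations in classes [lower, lower + width).

    Assumes every observation is below `top`; a mark equal to `top`
    would need its own class and is not handled in this sketch.
    """
    counts = {(lo, lo + width): 0 for lo in range(0, top, width)}
    for x in data:
        lo = (x // width) * width
        counts[(lo, lo + width)] += 1
    return counts

def inclusive_classes(data, bounds):
    """Count observations in classes where both limits are included."""
    return {(lo, hi): sum(lo <= x <= hi for x in data) for lo, hi in bounds}

# (c) Continuous (exclusive) distribution with classes 0-10, 10-20, ...
exclusive = exclusive_classes(marks, width=10, top=50)

# (d) Inclusive distribution with classes 0-10, 11-20, 21-30, ...
inclusive = inclusive_classes(
    marks, [(0, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
)

print(discrete[2], discrete[8], discrete[6])
print(exclusive)
print(inclusive)
```

Every class count in either grouped table must add up to 20, the number of students, which is a convenient check on a hand-built table.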

Definitions
We use the word statistics in two senses:
In the plural, statistics means data.
In the singular, it is the science which deals with the
collection, presentation, analysis and interpretation of
numerical data.
(a) Class Interval: The small group in which we place
each observation is known as a class interval. For
example, 0 – 10, 10 – 20, etc. are class intervals.
(b) Class Frequency: The number of observations
which fall in a class interval is the frequency of
that class interval. For example, in the marks
distribution above, 7 is the class frequency of the
class interval 10 – 20.
(c) Limit: Each interval has two limits. For example, in
the class interval 0 – 10, 0 is the lower limit and 10
is the upper limit.
(d) Class Mark or mid value: It is the mid-point of a
class interval. For example, for the class interval
10 – 20 the class mark is
$$\frac{10 + 20}{2} = \frac{30}{2} = 15$$
(e) Frequency: It is the number of times an
observation occurs in the data.
(f) Cumulative frequency: The cumulative frequency of
a class is the sum of its own frequency and the
frequencies of all preceding classes.
Measures
The quantities we calculate and use for analysing data
are broadly divided into 2 major categories or measures:
i) Measures of central tendency: In this category we
have the terms -
(a) Mean (b) Median (c) Mode
ii) Measures of dispersion: In this category we have the
terms -
(a) Range (b) Deviation
Note: The values of the mean, median and mode always
lie within the range of the data and serve as typical or
standard values for the whole data set. That is why mean,
median and mode are known as measures of central
tendency.
The values of range, deviation, etc. do not locate a
typical observation. Instead they tell us how close to, or
how far from, the central value the observations are
spread. That is why these parameters are known as
measures of dispersion.
Mean
Arithmetic Mean: It is the numerical value obtained by
dividing the sum of all observations by the total number
of observations. If $x_1, x_2, x_3, \dots, x_n$ are n
observations, then their arithmetic mean, denoted by
$\bar{x}$, is given by
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad \text{(i)}$$
where $\sum_{i=1}^{n} x_i$ (read as "summation $x_i$") is a notation
for the sum of all the observations considered.
Note: This mean is for ungrouped data.
If the data is grouped, i.e., the observations
$x_1, x_2, x_3, \dots, x_n$ have frequencies $f_1, f_2, \dots, f_n$
respectively, then the mean is given by
$$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + \dots + x_n f_n}{f_1 + f_2 + f_3 + \dots + f_n} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i} = \frac{\sum_{i=1}^{n} x_i f_i}{N}, \quad i = 1, 2, 3, \dots, n$$
where $N = \sum_{i=1}^{n} f_i$ is the total number of observations.
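As a quick check of the two formulas above, the following sketch computes the mean of the 20 marks from the Introduction twice: once directly from the raw list and once from the (mark, frequency) table. The function names are illustrative only and do not come from the notes.

```python
from collections import Counter

def mean(values):
    """Ungrouped formula: sum of the observations divided by their number."""
    return sum(values) / len(values)

def grouped_mean(xs, fs):
    """Frequency formula: sum(x_i * f_i) / sum(f_i)."""
    return sum(x * f for x, f in zip(xs, fs)) / sum(fs)

marks = [8, 2, 6, 17, 16, 2, 6, 18, 20, 15, 14, 8, 7, 9, 25, 5, 7, 8, 10, 12]

# Build the discrete frequency table: distinct marks and their frequencies.
table = Counter(marks)
xs = list(table)
fs = [table[x] for x in xs]

print(mean(marks))           # 10.75
print(grouped_mean(xs, fs))  # 10.75
```

Both routes divide the same total (215) by the same count (20), which is why grouping by frequency does not change the mean.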

Change of Origin Method for calculating mean.
Let A be any assumed number (usually taken in the
middle of the series) and $d_i$ be the deviation about A,
i.e., $d_i = x_i - A$.
The arithmetic mean $\bar{x}$ is given by
$$\bar{x} = \frac{1}{N}\sum f_i x_i = \frac{1}{N}\sum f_i (A + d_i) \qquad [\because d_i = x_i - A]$$
$$= \frac{1}{N}\sum f_i \cdot A + \frac{1}{N}\sum f_i d_i$$
$$= A + \frac{1}{N}\sum f_i d_i \qquad [\because \sum f_i = N]$$
Here A is called the Assumed Mean and $d_i$ is called the
deviation about A.
Shift of Origin and Change of Scale Method for
calculating mean (Step-Deviation Method).
In the case of a continuous distribution with equal class
intervals of width h, we use
$$u_i = \frac{x_i - A}{h} \qquad \text{(i)}$$
$$x_i = A + h u_i \qquad \text{(ii)}$$
Here A and h are both arbitrary constants (with h not zero).
$$\bar{x} = \frac{1}{N}\sum f_i x_i = \frac{1}{N}\sum f_i (A + h u_i)$$
$$= \frac{1}{N}\sum f_i A + \frac{1}{N}\sum f_i h u_i = A \cdot \frac{1}{N}\sum f_i + h \cdot \frac{1}{N}\sum f_i u_i$$
$$= A + h \cdot \frac{\sum f_i u_i}{N} \qquad [\because \sum f_i = N]$$
Group mean or combined mean:
If two groups with means $\bar{x}_1$ and $\bar{x}_2$ have $n_1$ and $n_2$
observations respectively, then the combined
mean of the two groups is given by
$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$
Weighted Mean:
When a weight, instead of a frequency, is given for each
observation, the mean of the set of observations is
known as the weighted mean.
$$\bar{x}_W = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + \dots + w_n x_n}{w_1 + w_2 + w_3 + \dots + w_n}$$
where $w_1, w_2, w_3, \dots, w_n$ are the weights of the observations
$x_1, x_2, x_3, \dots, x_n$ respectively.
For the sake of understanding, think of a weight as the
importance, expressed as a number, given to that
observation.
Some facts about arithmetic mean
(a) If every observation is increased or decreased by the
same number, then the arithmetic mean of the new
observations also increases or decreases by the same
number.
(b) If every observation is multiplied or divided by the
same non-zero constant, then the arithmetic mean
of the new set of observations is obtained by
multiplying or dividing the initial arithmetic mean
by the same constant.
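Both facts can be checked numerically. The sketch below uses exact fractions so that the comparisons are not disturbed by rounding; the data list is the one used in Example 1, and the constant `c = 4` is an arbitrary value chosen here for illustration.

```python
from fractions import Fraction

def mean(values):
    """Arithmetic mean as an exact fraction."""
    return Fraction(sum(values), len(values))

data = [15, 20, 25, 10, 5, 30, 12, 8]
c = 4  # hypothetical constant, not taken from the notes

shifted = [x + c for x in data]   # fact (a): add the same number to each value
scaled = [x * c for x in data]    # fact (b): multiply each value by the same number

print(mean(data))                        # 125/8
print(mean(shifted) == mean(data) + c)   # True
print(mean(scaled) == mean(data) * c)    # True
```

The same comparison with subtraction or division in place of addition or multiplication behaves the same way, since those are the special cases of a negative shift and a reciprocal scale.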
Examples related to Mean
1. Find the mean of the data given below
(i) 15, 20, 25, 10, 5, 30, 12, 8
(ii) if the frequencies of the observations in the above
data are 2, 4, 3, 5, 7, 6, 1, 8 respectively.
Sol: (i) We know that
$$\text{mean} = \frac{x_1 + x_2 + \dots + x_n}{n}$$
Here the total number of observations is n = 8 and
$x_1, x_2, x_3, \dots$ are 15, 20, 25, 10, 5, ...
$$\therefore \text{mean} = \frac{15 + 20 + 25 + 10 + 5 + 30 + 12 + 8}{8} = \frac{125}{8} = 15.625$$
(ii)
$x_i$    $f_i$    $f_i x_i$
15     2      30
20     4      80
25     3      75
10     5      50
5      7      35
30     6      180
12     1      12
8      8      64
$\sum f_i = 36$, $\sum f_i x_i = 526$
We know that when frequencies are given, the mean is
$$\text{mean} = \frac{\sum f_i x_i}{\sum f_i} = \frac{526}{36} = 14.611$$
2. The average marks of 80 students were found to be 50.
Later it was discovered that a score of 35 had been
misread as 65. Find the corrected mean of the 80
students.
Sol: We know that
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \qquad \text{(i)}$$
Here $\bar{x} = 50$ and n = 80,
$$\therefore \text{from (i)} \quad 50 = \frac{\sum_{i=1}^{80} x_i}{80} \quad \text{or} \quad \sum_{i=1}^{80} x_i = 80 \times 50 = 4000$$
i.e., the total marks of the whole class, as first
recorded, are 4,000.
Now we deduct the misread score of 65 from the total
marks and add the correct score of 35, i.e.,
(4000 – 65) + 35 = 3970.
$$\therefore \text{Corrected mean} = \frac{3970}{80} = 49.625$$
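The correction procedure in this example (recover the total, swap the misread value, divide again) can be written as a small function. This is a sketch with a made-up function name, not a routine from the notes.

```python
def corrected_mean(reported_mean, n, wrong_value, right_value):
    """Recover the recorded total, replace the misread value, divide by n."""
    recorded_total = reported_mean * n
    corrected_total = recorded_total - wrong_value + right_value
    return corrected_total / n

# Example 2: mean 50 over 80 students, 35 was misread as 65.
print(corrected_mean(50, 80, wrong_value=65, right_value=35))  # 49.625
```

Equivalently, the mean shifts by (35 – 65)/80 = –0.375, which gives the same 49.625 without recomputing the total.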

3. If the arithmetic mean of 10 observations is 120 and
that of 8 observations is 90, find the combined mean of
the 18 observations.
Sol: We know that the combined mean of two groups is
given by
$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$
Here $n_1 = 10$, $\bar{x}_1 = 120$ and $n_2 = 8$, $\bar{x}_2 = 90$.
$$\therefore \bar{x} = \frac{10 \times 120 + 8 \times 90}{10 + 8} = \frac{1200 + 720}{18} = \frac{1920}{18} = 106.67 \text{ (approx.)}$$
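A minimal sketch of the combined-mean formula follows. Accepting a list of (size, mean) pairs, so that more than two groups can be passed, is an extension assumed here for convenience; the notes state the formula for two groups only.

```python
def combined_mean(groups):
    """Combined mean of groups given as (size, mean) pairs."""
    total = sum(n * m for n, m in groups)   # n1*x1 + n2*x2 + ...
    count = sum(n for n, _ in groups)       # n1 + n2 + ...
    return total / count

# Example 3: 10 observations with mean 120 and 8 observations with mean 90.
result = combined_mean([(10, 120), (8, 90)])
print(round(result, 2))  # 106.67
```

The weighted mean has the same shape: passing (weight, value) pairs instead of (size, mean) pairs evaluates the weighted-mean formula.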

4. Calculate the arithmetic mean of marks from the
following table.
Marks:            0-10  10-20  20-30  30-40  40-50  50-60
No. of students:  12    18     27     20     17     6
Sol: Take the assumed mean A = 25 and the class width h = 10.
Marks    Freq. f    Mid value x    u = (x – 25)/10    f.u
0-10     12         5              -2                 -24
10-20    18         15             -1                 -18
20-30    27         25             0                  0
30-40    20         35             1                  20
40-50    17         45             2                  34
50-60    6          55             3                  18
N = 100                                               $\sum fu = 30$
$$\bar{x} = A + h \cdot \frac{\sum f u}{N} = 25 + \frac{10(30)}{100} = 25 + 3 = 28 \text{ marks.}$$
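The step-deviation calculation in this example can be sketched as below and compared with the direct frequency formula. The function name is illustrative; A = 25 and h = 10 are the values used in the solution.

```python
def step_deviation_mean(mids, freqs, A, h):
    """Mean via u = (x - A) / h, then A + h * sum(f * u) / N."""
    us = [(x - A) / h for x in mids]
    N = sum(freqs)
    return A + h * sum(f * u for f, u in zip(freqs, us)) / N

mids = [5, 15, 25, 35, 45, 55]      # class marks of 0-10, 10-20, ..., 50-60
freqs = [12, 18, 27, 20, 17, 6]

shortcut = step_deviation_mean(mids, freqs, A=25, h=10)
direct = sum(f * x for f, x in zip(freqs, mids)) / sum(freqs)

print(shortcut)  # 28.0
print(direct)    # 28.0
```

Any other choice of A gives the same mean; picking A near the middle of the table simply keeps the numbers in the f.u column small.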

Median
It is the value of the observation which lies in the
middle of the observations when they are arranged in
order. How this middle value is located depends on
whether the total number of observations is odd or even.
The median is found as follows:
(i) Ungrouped data (frequency not given):
First arrange the data in ascending or descending
order. Let the total number of observations be n.
If n is odd, median = value of the $\left(\frac{n+1}{2}\right)$th observation,
and if n is even, median =
$\frac{1}{2}\left[\text{value of the } \left(\frac{n}{2}\right)\text{th item} + \text{value of the } \left(\frac{n}{2} + 1\right)\text{th item}\right]$
(ii) Discrete series (frequency given): First arrange
the observations in ascending or descending order
and calculate their cumulative frequencies.
Then median = value of the $\left(\frac{n+1}{2}\right)$th observation, where n is
the total frequency (the last cumulative frequency). The
median is the observation within whose cumulative
frequency this position falls.
(iii) Grouped or continuous frequency distribution:
(a) If the series is in ascending order, then
$$\text{median} = l + \frac{\frac{n}{2} - c}{f} \times i$$
where
l = lower limit of the median class
n = sum of all the frequencies
i = width of the median class
f = frequency of the median class
c = cumulative frequency of the class preceding the
median class
The median class is the class interval in which the
$\frac{n}{2}$th item lies.
(b) If the series is in descending order, then
$$\text{median} = u - \frac{\frac{n}{2} - c}{f} \times i$$
where u = upper limit of the median class.
Note: In the case of an inclusive series, first convert the
series to an exclusive series.
Now we will discuss an example based on the median.
5. Find the median of the data given below
(i) 6, 11, 13, 15, 17, 21, 24
(ii) x: 10, 15, 20, 25, 30, 35
     f: 7, 18, 19, 6, 8, 24
(iii) class interval: 0-10, 10-20, 20-30, 30-40, 40-50
      frequency:      4     8      11     7      9
Sol: (i) Here the total number of observations is 7, which
is odd. To locate the median, the data must be arranged
in ascending or descending order; here the data is
already given in ascending order, i.e.,
6, 11, 13, 15, 17, 21, 24.
$\therefore$ median = value of the $\left(\frac{n+1}{2}\right)$th item = value of the
$\left(\frac{7+1}{2}\right)$th item = value of the 4th item = 15
(ii)
x     f     c.f.
10    7     7
15    18    25
20    19    44
25    6     50
30    8     58
35    24    82
Now the $\left(\frac{n+1}{2}\right)$th item = $\left(\frac{82+1}{2}\right)$th = $\left(\frac{83}{2}\right)$th = 41.5th item.
The 41.5th item falls within the cumulative frequency 44
(the 3rd row of the cumulative frequency table, which
covers items 26 to 44), and this corresponds to the 3rd
value of the variable x.
$\therefore$ median of this data = 20
(iii)
C.I.     Frequency    C.F.
0-10     4            4
10-20    8            12
20-30    11           23
30-40    7            30
40-50    9            39
N = 39
$$\text{median} = l + \frac{\frac{N}{2} - c}{f} \times i$$
Now $\frac{N}{2} = \frac{39}{2} = 19.5$, which corresponds to the class
interval 20-30 as per the cumulative frequency table.
$\therefore$ 20-30 is the median class, with l = 20, c = 12, f = 11
and i = 10.
$$\therefore \text{Median} = 20 + \left(\frac{19.5 - 12}{11}\right) \times 10 = 26.818$$
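The three median rules used in Example 5 can be sketched as follows. The function names are illustrative; the grouped version assumes equal-width classes listed in ascending order, and it uses the frequencies 4, 8, 11, 7, 9 that appear in the solution table.

```python
def median_ungrouped(values):
    """Middle item if n is odd, mean of the two middle items if n is even."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                  # the (n + 1)/2-th item, counting from 1
    return (data[mid - 1] + data[mid]) / 2

def median_discrete(xs, fs):
    """First x whose cumulative frequency reaches position (n + 1) / 2."""
    position = (sum(fs) + 1) / 2
    cumulative = 0
    for x, f in sorted(zip(xs, fs)):
        cumulative += f
        if cumulative >= position:
            return x

def median_grouped(lowers, width, fs):
    """l + ((n/2 - c) / f) * i for equal-width classes in ascending order."""
    half = sum(fs) / 2
    c = 0                                 # cumulative frequency before this class
    for l, f in zip(lowers, fs):
        if c + f >= half:                 # this is the median class
            return l + (half - c) / f * width
        c += f

print(median_ungrouped([6, 11, 13, 15, 17, 21, 24]))                    # 15
print(median_discrete([10, 15, 20, 25, 30, 35], [7, 18, 19, 6, 8, 24]))  # 20
print(round(median_grouped([0, 10, 20, 30, 40], 10, [4, 8, 11, 7, 9]), 3))  # 26.818
```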
Mode
It is the value of the observation which occurs the
maximum number of times in the data. Generally we locate
this value just by inspecting the data.
(i) For individual series or raw data: The mode is the
observation which occurs the maximum number of times.
(ii) For discrete data (with frequency): The mode is the
observation having the highest frequency.
(iii) For grouped data with continuous frequency
distribution:
$$\text{Mode} = l_1 + \left(\frac{f_m - f_1}{2 f_m - f_1 - f_2}\right) \times i$$
Here we first locate the modal class, which is the class
interval with the highest frequency. Then
$l_1$ = lower limit of the modal class
$f_m$ = frequency of the modal class
$f_1$ = frequency of the class preceding the modal class
$f_2$ = frequency of the class succeeding the modal class
i = size of the class interval of the modal class
(iv) Relation between Mean, Median and Mode:
Mode = 3 Median – 2 Mean.
This is an empirical relation which holds approximately
for moderately skewed distributions.
6. Find the mode of the data given below
(i) 2, 2, 3, 5, 7, 8, 9, 2, 2, 9
(ii) x: 10, 15, 20, 25, 30, 35
     f: 7, 18, 16, 14, 12, 10
(iii) class interval: 0-10, 10-20, 20-30, 30-40, 40-50
      frequency:      4     8      11     7      8
Sol: (i) We know that the mode is the observation which
occurs the maximum number of times in the data.
Here 2 occurs four times, which is the maximum.
$\therefore$ mode of this data is 2
(ii) In this data the frequency of the observation 15 is
the maximum (18). $\therefore$ mode of this data is 15
(iii) Here the frequency of the class interval 20-30 is
the maximum.
$\therefore$ 20-30 is the modal class.
For a continuous frequency distribution, the mode is
$$l_1 + \left(\frac{f_m - f_1}{2 f_m - f_1 - f_2}\right) \times i$$
where
$l_1$ = lower limit of the modal class, here $l_1$ = 20
$f_m$ = frequency of the modal class, here $f_m$ = 11
$f_1$ = frequency of the class preceding the modal class,
here $f_1$ = 8
$f_2$ = frequency of the class succeeding the modal class,
here $f_2$ = 7
i = width of the class interval, here i = 10
$$\text{mode} = 20 + \left(\frac{11 - 8}{22 - 8 - 7}\right) \times 10 = 20 + \left(\frac{3}{7}\right) \times 10 = 24.29 \text{ (approx.)}$$
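The mode rules of this example can be sketched in Python. The function name `grouped_mode` is illustrative. Treating a missing neighbouring class as having frequency 0 is an assumption made here so that the function also works when the modal class is the first or last one; the notes do not discuss that case.

```python
from collections import Counter

def grouped_mode(lowers, width, fs):
    """l1 + (fm - f1) / (2*fm - f1 - f2) * i for equal-width classes."""
    m = max(range(len(fs)), key=fs.__getitem__)   # index of the modal class
    fm = fs[m]
    f1 = fs[m - 1] if m > 0 else 0                # assumed 0 if no preceding class
    f2 = fs[m + 1] if m + 1 < len(fs) else 0      # assumed 0 if no succeeding class
    # Note: the denominator is zero when fm == f1 == f2; not handled in this sketch.
    return lowers[m] + (fm - f1) / (2 * fm - f1 - f2) * width

# (i) Raw data: the most common value and how often it occurs.
value, count = Counter([2, 2, 3, 5, 7, 8, 9, 2, 2, 9]).most_common(1)[0]
print(value, count)  # 2 4

# (ii) Discrete data: the x with the highest frequency.
xs, fs = [10, 15, 20, 25, 30, 35], [7, 18, 16, 14, 12, 10]
print(xs[fs.index(max(fs))])  # 15

# (iii) Grouped data.
mode = grouped_mode([0, 10, 20, 30, 40], 10, [4, 8, 11, 7, 8])
print(round(mode, 2))  # 24.29
```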

7. In a certain distribution, the mean and median are
24 and 26 respectively. Find the value of the mode.
Sol: We know that mode = 3 median – 2 mean
$\therefore$ mode = 3(26) – 2(24) = 78 – 48 = 30
Quartiles, Deciles, Percentiles
These are the values of the variate which divide the
total frequency into a number of equal parts.
(i) Quartiles: Just as the median divides the total
frequency into two equal parts (when the data is arranged
in ascending or descending order of magnitude), the
quartiles divide the total frequency into four equal parts.
The value of the variate midway between the first variate
and the median is known as the First or lower quartile and
is denoted by $Q_1$. The value of the variate midway between
the last variate and the median is called the Third or
upper quartile and is denoted by $Q_3$. The median is known
as the second quartile and is denoted by $Q_2$.
The methods of finding the values of $Q_1$ and $Q_3$ are
similar to those of the median. In the case of ungrouped
data arranged in ascending order of magnitude, $Q_1$ and
$Q_3$ can be obtained by the following formulae:
$$Q_1 = \left(\frac{n+1}{4}\right)\text{th variate}; \quad Q_3 = \left(\frac{3(n+1)}{4}\right)\text{th variate.}$$
For a continuous frequency distribution, $Q_1$ and $Q_3$ are
given by the following formulae:
$$Q_1 = l + \frac{h}{f}\left(\frac{N}{4} - C\right); \quad Q_3 = l + \frac{h}{f}\left(\frac{3N}{4} - C\right)$$
where
l = lower limit of the class in which the particular
quartile lies.
h = width of the class in which the particular quartile lies.
f = frequency of the class in which the particular quartile
lies.
C = cumulative frequency of the class preceding the
class in which the particular quartile lies.
(ii) Deciles: These are the values which divide the total
frequency into ten equal parts and are denoted by
$D_i$ (i = 1, 2, ..., 9).
(iii) Percentiles: These are the values which divide the
total frequency into 100 equal parts and are denoted by
$P_i$ (i = 1, 2, ..., 99).
These are calculated by the following formulae:
(a) For Individual Series
$D_i$ = the value of the $i\left(\frac{n+1}{10}\right)$th item (where i = 1, 2, ..., 9)
$P_i$ = the value of the $i\left(\frac{n+1}{100}\right)$th item (where i = 1, 2, ..., 99)
where n denotes the total number of items.
8. Calculate the quartiles from the following data:
Marks:     0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80
Students:  5     7      8      12     28     22     10     8
Sol:
Marks      No. of students    Cumulative frequency
0 – 10     5                  5
10 – 20    7                  12
20 – 30    8                  20
30 – 40    12                 32
40 – 50    28                 60
50 – 60    22                 82
60 – 70    10                 92
70 – 80    8                  100
N = 100
$Q_1$ = $\left(\frac{N}{4}\right)$th item = 25th item.
$\therefore$ $Q_1$ lies in the group 30 – 40.
$$\therefore Q_1 = l + \frac{h}{f}\left(\frac{N}{4} - C\right) = 30 + \frac{10}{12}(25 - 20) = 34.17$$
$Q_3$ = $\left(\frac{3N}{4}\right)$th item = 75th item.
$\therefore$ $Q_3$ lies in the group 50 – 60.
$$\therefore Q_3 = l + \frac{h}{f}\left(\frac{3N}{4} - C\right) = 50 + \frac{10}{22}(75 - 60) = 56.82$$
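The grouped quartile formula can be sketched as a single function that takes the fraction of the total frequency to locate (1/4 for the first quartile, 3/4 for the third). The function name and the `fraction` parameter are illustrative; equal-width classes in ascending order are assumed.

```python
def grouped_quantile(lowers, width, fs, fraction):
    """l + (h / f) * (fraction * N - C) for equal-width ascending classes."""
    target = fraction * sum(fs)      # e.g. N/4 for Q1, 3N/4 for Q3
    C = 0                            # cumulative frequency before this class
    for l, f in zip(lowers, fs):
        if C + f >= target:          # the quantile lies in this class
            return l + width / f * (target - C)
        C += f

lowers = [0, 10, 20, 30, 40, 50, 60, 70]
students = [5, 7, 8, 12, 28, 22, 10, 8]

q1 = grouped_quantile(lowers, 10, students, 0.25)
q3 = grouped_quantile(lowers, 10, students, 0.75)
print(round(q1, 2))  # 34.17
print(round(q3, 2))  # 56.82
```

With `fraction=0.5` the same function reproduces the grouped median formula. Using fractions such as i/10 or i/100 would be the natural analogue for deciles and percentiles of grouped data, although the notes give those formulae for individual series only.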

Measures of Dispersion
There are four prominent measures of dispersion. They
are:
i. Range.
ii. Quartile Deviation.
iii. Mean Deviation.
iv. Standard Deviation.
(i) Range: It is the difference between the values of the
highest and lowest observations in the given data.
Range = highest – lowest
(ii) Quartile Deviation: It is half of the difference
between the upper and lower quartiles, i.e., $\frac{Q_3 - Q_1}{2}$.
It is also called the Semi-Interquartile Range.
9. Find the range and quartile deviation of the sample
of observations 9, 13, 23, 11, 15, 17, 25, 18, 14, 24,
20.
Sol: (i) We see that the largest observation is 25 and
the smallest observation is 9.
$\therefore$ Range = 25 – 9 = 16.
(ii) After arranging in ascending order we get the
observations as: 9, 11, 13, 14, 15, 17, 18, 20, 23, 24, 25.
Here n = 11, hence $Q_1$ = value of the $\left(\frac{n+1}{4}\right)$th item = value
of the 3rd item = 13 and $Q_3$ = value of the $\left(\frac{3(n+1)}{4}\right)$th item
= value of the 9th item = 23.
$$\therefore \text{Quartile Deviation} = \frac{Q_3 - Q_1}{2} = \frac{23 - 13}{2} = 5.$$
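A sketch of the same calculation in Python follows. The helper `item` and the linear interpolation it applies when a position such as (n + 1)/4 is not a whole number are assumptions added here; in this example both positions are whole numbers, so the interpolation is never used.

```python
def item(sorted_data, position):
    """Value at a 1-based position; fractional positions are interpolated.

    The interpolation rule is an assumption of this sketch, and very
    small samples (position below 1 or above n) are not handled.
    """
    lower = int(position)
    frac = position - lower
    value = sorted_data[lower - 1]
    if frac:
        value += frac * (sorted_data[lower] - sorted_data[lower - 1])
    return value

def range_and_quartile_deviation(values):
    data = sorted(values)
    n = len(data)
    q1 = item(data, (n + 1) / 4)        # lower quartile
    q3 = item(data, 3 * (n + 1) / 4)    # upper quartile
    return data[-1] - data[0], (q3 - q1) / 2

sample = [9, 13, 23, 11, 15, 17, 25, 18, 14, 24, 20]
print(range_and_quartile_deviation(sample))  # (16, 5.0)
```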
(iii) Mean Deviation: It is the mean of the absolute
values of all the deviations. The deviations may be
measured from the mean, median or mode as the
reference value, and accordingly the result is known as
the mean deviation about the mean, the mean deviation
about the median or the mean deviation about the mode
respectively.
$$\text{Mean deviation} = \frac{\sum f_i |d_i|}{\sum f_i}$$
where $d_i = x_i - A$ (i = 1, 2, 3, ..., n) are the respective
deviations of the observations from the reference
value A. Here A can be the mean, median or mode
of the given data. For ungrouped data every $f_i$ is 1, so
the denominator is simply n.
10. Find the mean deviation of 2, 4, 6, 8, 10, 12, 14
about the mean.
Sol: The mean deviation about the mean is given by
$$\text{m.d.} = \frac{\sum_{i=1}^{n} |d_i|}{n}, \quad \text{where } d_i = x_i - \bar{x}$$
So we first find the mean of the given set of
observations, which is
$$\bar{x} = \frac{2 + 4 + 6 + 8 + 10 + 12 + 14}{7} = \frac{56}{7} = 8$$
Then, to find the deviations ($d_i$, i = 1 to 7), we
subtract the mean from each given observation, and for
the mean deviation we take the mean of the absolute
deviations:
$$\text{mean deviation} = \frac{\sum_{i=1}^{n} |d_i|}{n} = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$
$$= \frac{|x_1 - \bar{x}| + |x_2 - \bar{x}| + |x_3 - \bar{x}| + |x_4 - \bar{x}| + |x_5 - \bar{x}| + |x_6 - \bar{x}| + |x_7 - \bar{x}|}{7}$$
$$= \frac{|2 - 8| + |4 - 8| + |6 - 8| + |8 - 8| + |10 - 8| + |12 - 8| + |14 - 8|}{7}$$
$$= \frac{6 + 4 + 2 + 0 + 2 + 4 + 6}{7} = \frac{24}{7} \approx 3.43$$
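The mean deviation of this example can be checked with a short function. The optional `about` parameter is an illustrative name for the reference value A; when it is omitted the deviations are taken about the mean, as in the example.

```python
def mean_deviation(values, about=None):
    """Mean of the absolute deviations from a reference value.

    If `about` is not given, the arithmetic mean is used as reference.
    """
    if about is None:
        about = sum(values) / len(values)
    return sum(abs(x - about) for x in values) / len(values)

data = [2, 4, 6, 8, 10, 12, 14]
md = mean_deviation(data)
print(round(md, 2))  # 3.43
```

Passing the median or the mode as `about` gives the mean deviation about the median or about the mode respectively.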

Standard Deviation
(iv) Standard Deviation: The root mean square
deviation, when the deviations are taken from the
arithmetic mean rather than from an arbitrary value 'a',
is called the Standard Deviation (S.D.) and is generally
denoted by $\sigma$ (read as sigma), i.e.,
$$\sigma = \sqrt{\frac{1}{N}\sum f_i (x_i - \bar{x})^2}$$
where $\bar{x}$ is the Arithmetic Mean of the distribution.
The square of the standard deviation is called the
Variance.
Methods to Calculate the Standard Deviation
Direct Method: We know that
$$\sigma^2 = \frac{1}{N}\sum f_i (x_i - \bar{x})^2 = \frac{1}{N}\sum f_i \left(x_i^2 - 2 x_i \bar{x} + \bar{x}^2\right)$$
$$= \frac{1}{N}\sum f_i x_i^2 - 2\bar{x} \cdot \frac{1}{N}\sum f_i x_i + \bar{x}^2 \cdot \frac{1}{N}\sum f_i$$
$$= \frac{1}{N}\sum f_i x_i^2 - 2\bar{x} \cdot \bar{x} + \bar{x}^2 \qquad \left[\because \bar{x} = \frac{\sum f_i x_i}{N} \text{ and } N = \sum f_i\right]$$
$$= \frac{1}{N}\sum f_i x_i^2 - \bar{x}^2$$
or
$$\sigma = \sqrt{\frac{\sum f_i x_i^2}{N} - \left(\frac{\sum f_i x_i}{N}\right)^2}$$
11. Find the standard deviation of the data 0, 3, 6, 7, 10,
10.
Sol:
$$\text{mean} = \frac{0 + 3 + 6 + 7 + 10 + 10}{6} = \frac{36}{6} = 6$$
x      x – 6    $(x - 6)^2$
0      -6       36
3      -3       9
6      0        0
7      1        1
10     4        16
10     4        16
Total           78
For the data 0, 3, 6, 7, 10, 10, whose mean is 6,
$$\text{S.D.} = \sqrt{\frac{78}{6}} = \sqrt{13} = 3.606 \text{ (approx.)}$$

12. Compute the standard deviation of the following data:
x: 1   2   3   4   5   6   7
f: 6   12  18  26  16  10  8
Sol:
x      f     f.x    $x - \bar{x}$    $(x - \bar{x})^2$    $f.(x - \bar{x})^2$
1      6     6      -3       9          54
2      12    24     -2       4          48
3      18    54     -1       1          18
4      26    104    0        0          0
5      16    80     1        1          16
6      10    60     2        4          40
7      8     56     3        9          72
Total  96    384                        248
Here
$$\bar{x} = \frac{\sum f x}{N} = \frac{384}{96} = 4$$
Now
$$\sigma = \sqrt{\frac{1}{N}\sum f (x - \bar{x})^2} = \sqrt{\frac{248}{96}} = 1.6 \text{ (nearly).}$$
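The direct definition of the standard deviation and the expanded form derived earlier should give the same number. The sketch below, with illustrative function names, checks this on the data of Examples 11 and 12.

```python
from math import isclose, sqrt

def sd_direct(xs, fs):
    """sqrt( (1/N) * sum f * (x - mean)^2 )."""
    N = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / N
    return sqrt(sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / N)

def sd_shortcut(xs, fs):
    """sqrt( sum f*x^2 / N - (sum f*x / N)^2 )."""
    N = sum(fs)
    mean_of_squares = sum(f * x * x for x, f in zip(xs, fs)) / N
    mean = sum(f * x for x, f in zip(xs, fs)) / N
    return sqrt(mean_of_squares - mean ** 2)

# Example 12: values 1..7 with the given frequencies.
xs = [1, 2, 3, 4, 5, 6, 7]
fs = [6, 12, 18, 26, 16, 10, 8]
print(round(sd_direct(xs, fs), 2))    # 1.61
print(round(sd_shortcut(xs, fs), 2))  # 1.61

# Example 11: ungrouped data, so every frequency is 1.
raw = [0, 3, 6, 7, 10, 10]
print(round(sd_direct(raw, [1] * len(raw)), 3))  # 3.606
```

The expanded form avoids computing each deviation, but for values far from zero it subtracts two large, nearly equal numbers, so the direct form is usually the safer choice in floating-point code.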
13. Find the arithmetic mean and standard deviation
from the following data:
Size of item:  10  11  12  13  14  15  16
Frequency:     2   7   11  15  10  4   1
Sol:
Size of item x    Freq. f    d = x – 13    f.d    $f.d^2$
10                2          -3            -6     18
11                7          -2            -14    28
12                11         -1            -11    11
13                15         0             0      0
14                10         1             10     10
15                4          2             8      16
16                1          3             3      9
Total             50                       -10    92
Here the assumed mean is a = 13.
$$\text{A.M.} = \bar{x} = a + \frac{\sum f d}{N} = 13 + \left(\frac{-10}{50}\right) = 12.8$$
$$\text{S.D.} = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} = \sqrt{\frac{92}{50} - \left(\frac{-10}{50}\right)^2}$$
$$= \sqrt{1.84 - 0.04} = \sqrt{1.80} = 1.342$$
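The assumed-mean calculation of this last example can be sketched as one function returning both the mean and the standard deviation. The function name is illustrative; `a` is the assumed mean, 13 in the solution.

```python
from math import sqrt

def mean_and_sd_assumed(xs, fs, a):
    """Mean and S.D. from deviations d = x - a about an assumed mean a."""
    N = sum(fs)
    sum_fd = sum(f * (x - a) for x, f in zip(xs, fs))
    sum_fd2 = sum(f * (x - a) ** 2 for x, f in zip(xs, fs))
    mean = a + sum_fd / N
    sd = sqrt(sum_fd2 / N - (sum_fd / N) ** 2)
    return mean, sd

sizes = [10, 11, 12, 13, 14, 15, 16]
freqs = [2, 7, 11, 15, 10, 4, 1]

mean, sd = mean_and_sd_assumed(sizes, freqs, a=13)
print(round(mean, 1))  # 12.8
print(round(sd, 3))    # 1.342
```

Changing `a` changes the intermediate sums but not the final mean or standard deviation, which is what makes the choice of assumed mean a matter of convenience.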
