
Statistics: Ankon Gopal Banik Umme Hasunat Toafia Kanij Fatema Koli Puja Sutradhar

STATISTICS

Ankon Gopal Banik


Umme Hasunat Toafia
Kanij Fatema Koli
Puja Sutradhar
Prepared according to the syllabus of the 3rd semester of the Department of CSE, Gono Bishwabidyalay.
Without the help of our respected teacher, Mr. Dipto Ahmed, this work would have been quite
impossible.

Contents
Chapter 01: Introduction
Chapter 02: Collection and presentation of data
Chapter 03: Measure of central tendency and location
Chapter 04: Measures of dispersion
Chapter 06: Probability

Chapter 01: Introduction

Statistics is the science which deals with the methods of collecting, classifying, presenting,
comparing and interpreting numerical data collected to throw some light on any sphere of inquiry.

Variable: A characteristic whose value varies from unit to unit is called a variable.
Variables are of two types, they are -
1. Qualitative variable: A variable which cannot normally be measured by a numerical figure
is known as a qualitative variable. For example, occupation of a person, type of disease of a
patient, qualification of a doctor, type of computer, experience of a computer scientist, rank
of a nurse etc.
2. Quantitative variable: A variable which is measured by a numerical figure is known as a
quantitative variable. For example, age of a patient, height of a patient, blood pressure level of
a patient, sugar level of a patient, duration of disease of a patient, number of patients admitted
on different days etc.
Quantitative variables are of two types, such as-
a) Discrete variable: A variable which takes only integral values is known as a discrete
variable. For example, number of children ever born to a mother, number of dead children of a
mother, number of patients admitted per day in a hospital, number of doctors/nurses in a
hospital, number of computers in a computer lab, number of expert programmers etc.
b) Continuous variable: A variable which can take integral as well as fractional
values is known as a continuous variable. For example, age of a patient, height of a patient, blood
pressure level of a patient, sugar level of a patient, time to develop programs, time to send
emails, daily consumption of electricity etc.
The classification of variables can be displayed as follows -
[Figure: classification of variables]

A variable, discrete or continuous, can again be classified as -
(i) Random variable: If the values of a variable are observed from different units
selected by a random process, then the variable is known as a random variable. For
example, let there be N = 500 patients in a hospital, of whom n = 100 patients are
selected by a random process for investigation. If a variable is observed
from those 100 selected patients, then it is a random variable.
(ii) Non-random variable: If a variable is observed from all units in the population,
then it is known as a non-random variable.

•••Importance of statistics: Statistical methods depict a clearer picture of any problem
affecting the welfare of mankind. The importance of statistics can thus be pointed out as follows -
1. Statistics of birth, death, marriage, migration etc. are important for planning the welfare of
people.
2. Statistics of family planning adoption, service provided for family planning activities, current
fertility level, child mortality level, level of education etc. are important for sound population
policy.
3. Statistics of child health care, health care of pregnant mothers, neonatal and post neonatal
death etc. are needed for planning improved family planning activities.
4. Statistics of current number of doctors, nurses, health care units, hospitals, clinics, system of
service provided in hospitals/clinics etc. are needed for planning efficient hospital
management.
5. Statistics of current academic institutes, trained students and the ability of trained teachers are
needed to develop the educational system of the country.
6. Statistics of current status of digital system will help in planning future IT system.
7. Statistics of production, import, distribution are also needed to improve health care system.

•••Limitations of statistics: Despite its wide application in medical science, population science,
the industrial sector and the administrative sector, statistics has some limitations which restrict
its scope and utility. These limitations are-
1. Qualitative characters are very prominent in medical and biological science and in
administrative science. But, except for some classification by qualitative character, such a
variable cannot be used for mathematical treatment. However, a qualitative variable can be used
for mathematical treatment if its values are scored by assigning appropriate numbers.
2. Statistics makes statements about groups of population units. So, it needs a mass of data
instead of data from a single unit of the population.
3. In analyzing bio-statistical data, statistical inference is drawn with a certain level of accuracy
based on probability theory. Nothing is concluded with 100% confidence.
4. Statistics cannot prove anything. It plays an auxiliary role in summarizing facts.
5. Though it can be reduced by efficient planning of the survey, a statistical measurement shows
an error due to the difference between the true population value and the value estimated from
the sample survey result.

Chapter 02: Collection and presentation of data

Population: By population we mean the totality of units which are under investigation according to
a pre-determined objective and are available in a specified area during a specified time period.
For example, suppose we want to study the performance of computer centers working in Dhaka city.
Here each computer center of any area is a unit and all computer centers of the city constitute the
population of computer centers. The total number of units is denoted by 'N'.
Population is of two types-
1. Finite population: If a population has a definite number of units, it is called a finite population.
For example, the population of mobile operators' servicing centers of an area, the
population of nurses working in different private clinics etc.
2. Infinite population: A population whose number of units is not known to the
researcher is known as an infinite population.

Unit: A unit in a statistical analysis is one member of a set of entities being studied.
Common examples of a unit would be a single person, animal, plant, manufactured item, or country
that belongs to a larger collection of such entities being studied.

Sample: A representative part of the population units, selected for investigation, is known as a
sample. The sample size is denoted by n (n ≤ N).
For example, if we want to know the condition of the fish in a pond, we cannot check all the fish,
but we can take some of them to check.

Sampling unit: The population units which are to be investigated for a pre-determined objective
are known as sampling units. For example, when studying a group of college students, a single
student could be a sampling unit.

Data: Any measurement of one or more variables recorded either from population units or from
sample units is known as data.
For example, the number of computers of a selected group of computer centers are as follows -
Number of computers: 5, 8, 7, 12, 6, 10, 8, 7, 11, 12, 10, 9.

*** The following data represent the number of workers in different small scale industries in the
country-
16 25 28 32 26 25 25 20 20 22 24 26 28 30 35 32 17 20 22 22 24 25 26 28 20 18 26 28
30 30 32 34 31 36 30 35 28 27 21 24 20 18 15 15 15 18 20 22 36 26 21 23 24 26 28 30
32 34 28 27 15 20 19 26 16 24 20 18 20 20 24 27 25 25 25 26 20 21 20 28 17 30 32 33
30 28 26 24 26 24 20 18 19 18 15 16 23 18 15 17 18 20 20 18 18 19 20 21 27 25 26 19
29 20 24 26 27 29 30 32 34 28 30 27 26 28 28 33
I. Prepare a frequency table of discontinuous type (ungrouped).
II. Find number of industries having 25 workers or more.
III. Find number of industries having 20 workers or less.
IV. Prepare a frequency table (grouped).
Solve: I.

Title: The distribution of industries by number of workers

Number of   Tally marks            Frequency             c.f. from   c.f. from
workers                            (No. of industries)   top         bottom
15          |||| |                 6                     6           128
16          |||                    3                     9           122
17          |||                    3                     12          119
18          |||| |||| |            11                    23          116
19          ||||                   4                     27          105
20          |||| |||| |||| |       17                    44          101
21          ||||                   4                     48          84
22          ||||                   4                     52          80
23          ||                     2                     54          76
24          |||| ||||              9                     63          74
25          |||| |||               8                     71          65
26          |||| |||| |||          13                    84          57
27          |||| |                 6                     90          44
28          |||| |||| |            11                    101         38
29          ||                     2                     103         27
30          |||| ||||              9                     112         25
31          |                      1                     113         16
32          |||| |                 6                     119         15
33          ||                     2                     121         9
34          |||                    3                     124         7
35          ||                     2                     126         4
36          ||                     2                     128         2
Total                              N = 128

II. There are 65 industries having 25 or more workers.

III. There are 44 industries having 20 workers or less.

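IV. We have N = 128, smallest value $X_{(1)} = 15$ and largest value $X_{(N)} = 36$.
∴ Range, $R = X_{(N)} - X_{(1)} = 36 - 15 = 21$
Number of classes, $K = 1 + 3.3 \log_{10} 128 \approx 8$
Class interval $= \dfrac{R}{K} = \dfrac{21}{8} \approx 3$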

Title: The distribution of industries by number of workers

Class interval   Tally marks                        Frequency, f          c.f. from   c.f. from
of workers                                          (No. of industries)   top         bottom
15 - 18          |||| |||| ||                       12                    12          128
18 - 21          |||| |||| |||| |||| |||| |||| ||   32                    44          116
21 - 24          |||| ||||                          10                    54          84
24 - 27          |||| |||| |||| |||| |||| ||||      30                    84          74
27 - 30          |||| |||| |||| ||||                19                    103         44
30 - 33          |||| |||| |||| |                   16                    119         25
33 - 36          |||| ||                            7                     126         9
36+              ||                                 2                     128         2
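The counting in parts II to IV can be sketched in Python. The sketch starts from the ungrouped frequencies tabulated in part I rather than from the raw data; the variable names, and the choice to round the class width up to a whole number, are illustrative assumptions and not part of the original solution.

```python
import math
from itertools import accumulate

# Ungrouped frequency table from part I: number of workers -> number of industries.
freq = {15: 6, 16: 3, 17: 3, 18: 11, 19: 4, 20: 17, 21: 4, 22: 4,
        23: 2, 24: 9, 25: 8, 26: 13, 27: 6, 28: 11, 29: 2, 30: 9,
        31: 1, 32: 6, 33: 2, 34: 3, 35: 2, 36: 2}

values = sorted(freq)
counts = [freq[v] for v in values]
n = sum(counts)

# c.f. from top: running total starting at the smallest value.
cf_top = dict(zip(values, accumulate(counts)))
# c.f. from bottom: running total starting at the largest value.
cf_bottom = dict(zip(reversed(values), accumulate(reversed(counts))))

at_least_25 = cf_bottom[25]   # part II: industries with 25 workers or more
at_most_20 = cf_top[20]       # part III: industries with 20 workers or less

# Part IV: number of classes K = 1 + 3.3 log10(N), then the class width R / K.
k = round(1 + 3.3 * math.log10(n))
width = math.ceil((values[-1] - values[0]) / k)   # assumption: round the width up

# Assign each value to the class [lower, lower + width) that contains it.
grouped = {}
for v, c in zip(values, counts):
    lower = values[0] + ((v - values[0]) // width) * width
    grouped[lower] = grouped.get(lower, 0) + c

print(n, at_least_25, at_most_20, k, width)
print(grouped)
```

Each key of `grouped` is the lower limit of a class, so the key 15 stands for the class 15 - 18 and the key 36 for the open class 36+.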

***The following data represent the production of tea (in kg) per day in a tea garden-

8.8 4.8 6.2 8.4 10.0 12.5 6.7 3.8 8.8 10.0 8.5 3.6 5.4 8.8
9.6 9.7 10.4 11.6 12.8 8.9 10.2 14.0 6.6 7.2 8.6 5.8 6.7 9.7
7.2 11.2 4.8 5.0 5.6 6.6 7.2 3.4 7.6 8.8 9.0 10.2 6.7 11.0
11.4 12.8 13.4 8.6 4.7 7.2 9.8 10.4 6.5 7.8 9.7 5.8 6.7 11.5
10.4 13.0 14.4 10.4 5.5 5.6 6.8 7.4 9.4 12.0 12.5 11.2 14.0 10.0
6.8 9.5 5.8 4.7 6.6 5.2 9.2 10.2 9.0 8.0 7.4 10.0 8.2 9.5

I. Prepare a frequency table to represent the production of tea on different days.
II. On how many days is the production 5.0 kg and above?
III. On how many days is the production less than 9.8 kg?
Solve: I. We have N = 84
Lowest production, $X_{(1)} = 3.4$ kg; highest production, $X_{(N)} = 14.4$ kg
Range of production, $R = X_{(N)} - X_{(1)} = 14.4 - 3.4 = 11.0$ kg
Number of classes, $K = 1 + 3.3 \log_{10} 84 \approx 7$
Class interval $= \dfrac{R}{K} = \dfrac{11.0}{7} \approx 1.6$

Title: The distribution representing tea production (in kg) on different days

Class interval   Tally marks            Frequency, f    c.f. from   c.f. from
of production                           (No. of days)   top         bottom
3.4 – 5.0        |||| ||                7               7           84
5.0 – 6.6        |||| |||| |            11              18          77
6.6 – 8.2        |||| |||| |||| |||     18              36          66
8.2 – 9.8        |||| |||| |||| ||||    20              56          48
9.8 – 11.4       |||| |||| ||||         15              71          28
11.4 – 13.0      |||| |||               8               79          13
13.0 – 14.6      ||||                   5               84          5
Total                                   N = 84

II. On 77 days the production is 5.0 kg and above (about 92% of the days).

III. On 56 days the production is less than 9.8 kg (about 67% of the days).

Uses of graphs and diagrams:


• Graphs and diagrams elucidate the main features of data set.
• These are more effective than the tables as these create a vivid and more lasting
impression in the mind.
• It is often valuable in suggesting an appropriate method of analysis and in explaining
the conclusion founded upon the analysis.
• Graphs and diagrams are helpful to pinpoint the gross errors in statistical record.

Difference between graph and diagram:


• Graphs are usually drawn on graph paper, whereas a diagram is constructed on plain
paper.
• Graphs are helpful for studying the relationship between two correlated variables,
whereas a diagram is used specially to represent the value of an attribute. It does not
depict the relationship or association of two or more attributes.
• Bars, rectangles, squares, cubes etc. are used for diagrammatic presentation, while
points or lines of different kinds are used in graphical representation of data.
• Graphs are usually used to depict time series data, while a diagram is not used for
the same purpose. It is used in depicting categorical and geographical data. Graphs
fail to depict such types of data.
• Graphs are more precise and accurate than diagrams and are quite helpful to the
statistician to estimate the rate of change of one variable for a unit change in another
variable. On the other hand, diagrams furnish only approximate information relating
to figures and do not add anything to the meaning of data.
• Graphs are helpful to indicate further statistical analysis, while diagrams cannot
do so.

Different graphs and diagram:


❖ Line diagram: This diagram is used to represent time series data, where time is
plotted on the horizontal axis (X-axis) and the value of the variable against time is plotted on
the vertical axis (Y-axis) with an appropriate scale. Each plotted point is joined to the X-axis
by a straight line parallel to the Y-axis, so there is one line for each point of time. The
resultant diagram is called a line diagram.

Example: The following data represent the number of dead patients of a hospital in different
years-
Year                    2006   2007   2008   2009   2010   2011
No. of dead patients    725    680    550    650    540    500

Line diagram of the number of dead patients of a hospital in different years-

❖ Bar diagram: This diagram is similar to the line diagram except that the value of a variable is
shown by a rectangle instead of a line.
This is specially used to present the values of different levels of a qualitative variable, where
the levels of the qualitative variable are plotted on the X-axis and the values of the different
levels are plotted on the Y-axis. The value of each level is shown by a rectangle parallel to the
Y-axis. This rectangle is also called a bar and hence the diagram is called a bar diagram. The
distance from bar to bar should be half of the width of the bar.

*** The following data represent the monthly rainfall (in mm) in August and September
of 1998 at different Meteorological stations of Bangladesh-
Rainfall (mm)   Chittagong   Noakhali   Comilla   Dhaka   Rajshahi   Khulna
August          1194         897        336       552     268        258
September       224          313        281       246     310        300

Solve: Bar diagram representing the rainfall of two months in 1998 in different
Meteorological stations of Bangladesh-

[Figure: grouped bar diagram; X-axis: station (Chittagong, Noakhali, Comilla, Dhaka, Rajshahi, Khulna); Y-axis: rainfall in mm, scale 0 to 1400; bars: August, September]

❖ Pie Diagram: This diagram is also used to present the values of different levels of a
qualitative variable, where the values are transformed to angles and the angles are drawn within
a circle. The total angle of a circle is 360°. Accordingly, the value of a particular level of the
qualitative variable is transformed to an angle which bears the same proportion to the total
angle of 360° as the value bears to the total. All the angles, except one, are drawn within the
circle one by one; the remaining sector gives the last angle.

***The following data represent the number of diabetic patients visiting different centers
in a city in a week-
Center               A                      B                       C                      D                      Total
No. of patients      1500                   3660                    1890                   2575                   9625
% of patients        15.6                   38.0                    19.6                   26.8                   100.0
Angle (in degrees)   1500×360/9625 = 56.1   3660×360/9625 = 136.9   1890×360/9625 = 70.7   2575×360/9625 = 96.3   360.0

Solve: Pie diagram representing the number of diabetic patients visiting different centers
in a city in a week-

[Figure: pie diagram with sectors A, B, C and D]
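The conversion from counts to angles can be sketched as follows, using the patient counts from the table. The dictionary and variable names are illustrative only.

```python
# Patients visiting each center in a week (values from the table).
patients = {"A": 1500, "B": 3660, "C": 1890, "D": 2575}
total = sum(patients.values())

# Each sector's angle is proportional to its share of the total of 360 degrees.
angles = {center: count * 360 / total for center, count in patients.items()}

for center, angle in angles.items():
    print(f"Center {center}: {angle:.1f} degrees")
```

Because the shares add up to the whole, the four angles add up to 360 degrees, which is a quick check on the arithmetic.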

❖ Histogram: This diagram is used to represent the frequencies of continuous classes of a
frequency distribution of a quantitative variable. In this diagram class intervals are plotted on
the X-axis and frequencies are plotted on the Y-axis with an appropriate scale.

***The following is the frequency distribution of S. GGT of some persons-

Class interval of S. GGT (U/L)   15-20   20-25   25-30   30-35   35-40   Total
No. of persons, f                4       6       8       5       2       25

Solve: Histogram of distribution of S. GGT of some persons-

❖ Frequency Curve: This graph is also used to represent the frequency of a continuous class of
the quantitative variable, where class intervals are plotted in X-axis and frequencies are
plotted in Y-axis with appropriate scales.

***The following is the frequency distribution of S. Sodium of some persons-

Class interval of S. Sodium (mmol/l)   136-138   138-140   140-142   142-144
No. of persons, f                      5         12        6         2

Solve: Frequency curve of the distribution of S. Sodium-

[Figure: frequency curve; X-axis: S. Sodium (mmol/l), 136 to 144; Y-axis: number of persons]

Chapter 03: Measure of Central Tendency and Location

Measure of central tendency: A measure which reflects the complete data set and tends to lie at
its center is known as a measure of central tendency. Some measures of central tendency are
described below-

Arithmetic mean: Let $x_1, x_2, \ldots, x_n$ be n observations recorded from any statistical
investigation. Then the arithmetic mean (A.M.) is defined by-
\[ \text{A.M.}, \ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]

For example, let us consider that the following are the incomes per day (in taka) of some laborers
selected by a random process-
125.50, 100.00, 110.00, 115.50, 90.00, 95.00, 110.00, 125.00, 120.00, 115.00
The mean income of these n = 10 laborers is-
\[ \text{A.M.}, \ \bar{x} = \frac{1}{n} \sum x_i = \frac{1106}{10} = 110.60 \text{ taka} \]

Again, let us consider that the income (in taka) data are recorded from 100 randomly selected
laborers and they are classified as follows-

Income (in taka), xi:   90.00   95.00   100.00   110.00   115.00   120.00   125.00
No. of laborers, fi:    5       18      32       25       12       5        3
The average income of these 100 laborers is calculated by-
\[ \text{A.M.}, \ \bar{x} = \frac{1}{N} \sum_{i=1}^{k} f_i x_i = \frac{10465.00}{100} = 104.65 \text{ taka}; \quad N = \sum_{i=1}^{k} f_i = 100 \]
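Both calculations above can be reproduced with a short Python sketch. It uses `statistics.mean` from the standard library for the ungrouped data and the frequency-weighted sum for the classified data; the variable names are illustrative.

```python
from statistics import mean

# Ungrouped data: daily income (in taka) of the 10 laborers.
incomes = [125.50, 100.00, 110.00, 115.50, 90.00,
           95.00, 110.00, 125.00, 120.00, 115.00]
simple_mean = mean(incomes)

# Classified data: income values x_i with the number of laborers f_i.
values = [90, 95, 100, 110, 115, 120, 125]
freqs = [5, 18, 32, 25, 12, 5, 3]
n = sum(freqs)
grouped_mean = sum(f * x for f, x in zip(freqs, values)) / n

print(f"{simple_mean:.2f} taka, {grouped_mean:.2f} taka")
```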

***The following are the numbers of customers of a bank who are served within 10 minutes of
their arrival on different days-

Class interval of customers   50-60   60-70   70-80   80-90   90-100   Total
No. of days, f                15      18      12      10      5        60
a. Calculate the average number of customers served within 10 minutes of arrival.
b. Find the percentage of days on which fewer than 80 customers are served within 10 minutes of
arrival.
Solve:

Class interval   No. of days   Mid-value   fx     c.f
of customers     f             x
50-60            15            55          825    15
60-70            18            65          1170   33
70-80            12            75          900    45
80-90            10            85          850    55
90-100           5             95          475    60
Total            60                        4220

a. The average number of customers served per day is-
\[ \bar{x} = \frac{1}{N} \sum f x = \frac{4220}{60} = 70.33 \approx 70 \]

b. The percentage of days on which fewer than 80 customers are served is-
\[ \frac{45}{60} \times 100 = 75.00\% \]
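As a sketch of the same working in Python, the mid-value of each class is taken as the average of its two limits, and the percentage in part b is read off from the classes whose upper limit does not exceed 80. The names used are illustrative.

```python
# Class limits and the number of days in each class (from the table).
classes = [(50, 60), (60, 70), (70, 80), (80, 90), (90, 100)]
days = [15, 18, 12, 10, 5]

# Mid-value of a class = (lower limit + upper limit) / 2.
mids = [(low + high) / 2 for low, high in classes]

n = sum(days)
total_fx = sum(f * x for f, x in zip(days, mids))
average = total_fx / n

# Part b: days falling in classes that lie entirely below 80 customers.
below_80 = sum(f for (low, high), f in zip(classes, days) if high <= 80)
pct_below_80 = 100 * below_80 / n

print(total_fx, round(average, 2), pct_below_80)
```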

***The following data represent the production of garments (in 1000 pieces) of different industries
in a day and the working hours of each industry per day.

Industry            1     2     3     4
Production, Xi      5.0   3.0   5.5   4.0
Working hours, Wi   10    8     12    10
Calculate the average production per day of the industries.
Solve: Taking the working hours as weights, the average production is-
\[ \bar{x} = \frac{1}{\sum w_i} \sum_{i=1}^{4} w_i x_i = \frac{180}{40} = 4.5 \]
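A minimal sketch of this weighted mean, with the working hours used as weights as in the solution above (variable names are illustrative):

```python
production = [5.0, 3.0, 5.5, 4.0]   # x_i, in 1000 pieces
hours = [10, 8, 12, 10]             # w_i, working hours used as weights

# Weighted mean = sum(w_i * x_i) / sum(w_i).
weighted_sum = sum(w * x for w, x in zip(hours, production))
weighted_mean = weighted_sum / sum(hours)

print(weighted_sum, sum(hours), weighted_mean)
```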

Properties of arithmetic mean:
1. The sum of the deviations of the observations from the mean is zero.
Proof: Let $x_i$ be the mid-value of the i-th class and $f_i$ the frequency of that class. Then the
arithmetic mean is
\[ \bar{x} = \frac{1}{N} \sum_{i=1}^{k} f_i x_i \]
The mean of the deviations from $\bar{x}$, taken with their signs, is
\[ \frac{1}{N} \sum f_i (x_i - \bar{x}) = \frac{1}{N} \left[ \sum f_i x_i - \bar{x} \sum f_i \right] = \frac{1}{N} \sum f_i x_i - \bar{x} = \bar{x} - \bar{x} = 0 \]

2. The mean square deviation is minimum if the deviations are measured from the arithmetic mean.
Proof: We have $\bar{x} = \frac{1}{N} \sum f_i x_i$, where $x_i$ is the mid-value of the i-th class and
$f_i$ is the frequency of that class. Let A be an arbitrary value. The mean square deviation of
$x_i$ from A is measured by
\[ M.S = \frac{1}{N} \sum f_i (x_i - A)^2 \]
This M.S is minimum if A = $\bar{x}$. Setting the derivative with respect to A to zero, we have
\[ \frac{\partial\, M.S}{\partial A} = -\frac{2}{N} \sum f_i (x_i - A) = 0 \]
Or, $\sum f_i x_i - NA = 0$
Or, $\frac{1}{N} \sum f_i x_i = A$
∴ $\bar{x} = A$
Since the second derivative, $\frac{\partial^2 M.S}{\partial A^2} = 2$, is positive, this value of A gives a minimum.
Thus the mean square deviation is minimum if the deviations are measured from the arithmetic mean.
3. The arithmetic mean depends on change of origin and scale.
Proof: Let $x_i$ be the mid-value of the i-th class (i = 1, 2, ….. k) of a frequency distribution,
where $f_i$ is the frequency of that class.
We have,
\[ \bar{x} = \frac{1}{N} \sum_{i=1}^{k} f_i x_i \]
Let us make a transformation of $x_i$ to $z_i$, where
\[ z_i = \frac{x_i - a}{h} \ \Rightarrow \ x_i = a + h z_i \]
Now,
\[ \frac{1}{N} \sum f_i x_i = \frac{1}{N} \sum f_i a + \frac{h}{N} \sum f_i z_i \]
Or, $\bar{x} = a + h \bar{z}$; where $\bar{z} = \frac{1}{N} \sum_{i=1}^{k} f_i z_i$

Thus the arithmetic mean depends on change of origin and scale. Usually, the middle or
approximately middle value of $x_i$ is taken as 'a' and h is the width of the class interval.
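The three properties can be checked numerically. The sketch below reuses the income distribution of 100 laborers from the earlier example; the choice a = 110 and h = 5 for the change of origin and scale is an illustrative assumption, and a numerical check of this kind is not a substitute for the proofs.

```python
values = [90, 95, 100, 110, 115, 120, 125]   # x_i
freqs = [5, 18, 32, 25, 12, 5, 3]            # f_i
n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, values)) / n

# Property 1: the deviations from the mean add up to zero.
sum_of_deviations = sum(f * (x - mean) for f, x in zip(freqs, values))

# Property 2: the mean square deviation about A is smallest at A = mean.
def mean_square_deviation(a):
    return sum(f * (x - a) ** 2 for f, x in zip(freqs, values)) / n

# Property 3: with z_i = (x_i - a) / h, the mean is recovered as a + h * z_bar.
a, h = 110, 5   # illustrative origin and scale
z_mean = sum(f * (x - a) / h for f, x in zip(freqs, values)) / n
mean_from_z = a + h * z_mean

print(round(sum_of_deviations, 9))
print(mean_square_deviation(mean) < mean_square_deviation(mean + 1))
print(round(mean_from_z, 2))
```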

***The average blood pressure (mm Hg) of a group of people is 103. The distribution of people by
blood pressure is shown below-

Class interval of blood pressure   80-90   90-100   100-110   110-120   120-130
No. of patients, fi                10      f2       18        12        5
Mid-value, xi                      85      95       105       115       125
Find the number of patients (f2) who have blood pressure in the range 90-100 mm Hg.
Solve: Given that, $\bar{x} = 103$
We have,
\[ \bar{x} = \frac{85 \times 10 + 95 f_2 + 105 \times 18 + 115 \times 12 + 125 \times 5}{45 + f_2} \]
Or, $103 (45 + f_2) = 95 f_2 + 4745$
Or, $8 f_2 = 4745 - 4635 = 110$
∴ $f_2 = 13.75 \approx 14$
The number of patients having blood pressure 90-100 is 14.
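The missing frequency comes from a linear equation, which the following sketch solves exactly with `fractions.Fraction`. It takes the mean as 103, the value used in the solution, and the mid-value of the first class as 85; the variable names are illustrative.

```python
from fractions import Fraction

target_mean = 103
known = {85: 10, 105: 18, 115: 12, 125: 5}   # mid-value -> known frequency
unknown_mid = 95                             # mid-value of the class 90-100

known_n = sum(known.values())                          # 45
known_fx = sum(x * f for x, f in known.items())        # 4745

# mean * (known_n + f2) = known_fx + unknown_mid * f2, solved for f2.
f2 = Fraction(known_fx - target_mean * known_n, target_mean - unknown_mid)

print(f2, float(f2), round(f2))
```

The exact solution is a fraction, so the answer is rounded to the nearest whole number of patients.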

***Find the arithmetic mean of the first n natural numbers.
Solve: The first n natural numbers are 1, 2, 3, ……… n
We have, $1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2}$
∴ Arithmetic mean $= \frac{1}{n} (1 + 2 + 3 + \cdots + n) = \frac{n(n+1)}{2n} = \frac{1}{2}(n + 1)$

***Find the weighted average of the first n natural numbers, where the weights are the respective
values of the numbers.
Solve: We have, xi : 1, 2, 3, ………., n
wi : 1, 2, 3, ………., n
The weighted mean of x is-
\[ \bar{x} = \frac{1^2 + 2^2 + 3^2 + \cdots + n^2}{1 + 2 + 3 + \cdots + n} = \frac{n(n+1)(2n+1)/6}{n(n+1)/2} = \frac{1}{3}(2n + 1) \]
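Both closed forms can be checked against direct summation for a few values of n. This is a sketch with hypothetical function names, using exact fractions so that no rounding is involved.

```python
from fractions import Fraction

def mean_first_n(n):
    """Arithmetic mean of 1, 2, ..., n by direct summation."""
    return Fraction(sum(range(1, n + 1)), n)

def weighted_mean_first_n(n):
    """Weighted mean of 1, 2, ..., n with each number used as its own weight."""
    numbers = range(1, n + 1)
    return Fraction(sum(x * x for x in numbers), sum(numbers))

for n in (1, 5, 10, 100):
    assert mean_first_n(n) == Fraction(n + 1, 2)
    assert weighted_mean_first_n(n) == Fraction(2 * n + 1, 3)

print(mean_first_n(10), weighted_mean_first_n(10))
```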

***For the following frequency distribution the average is 13.2. Find f1.

xi:   5    10     15     20
fi:   f1   f1+2   f1-1   2f1

Solve: Given,
\[ 13.2 = \frac{5 f_1 + 10 (f_1 + 2) + 15 (f_1 - 1) + 40 f_1}{f_1 + f_1 + 2 + f_1 - 1 + 2 f_1} \]
Or, $13.2 (5 f_1 + 1) = 70 f_1 + 5$
Or, $4 f_1 = 8.2$
∴ $f_1 \approx 2$

Merit and demerit of arithmetic mean:


Merits:
1. It is rigidly defined and easy to understand.
2. It has a definite formula and is easy to apply.
3. It is suitable for further mathematical treatment.
4. It is least affected by sampling fluctuation.
Demerits:
1. It is very much affected by extreme values.
2. The arithmetic mean may not be representative of the data set.
3. The arithmetic mean can neither be found by inspection nor located graphically.
4. It is not used to calculate average value if the variable is qualitative in nature.
5. It is not suitable measure of central tendency in case of extremely skewed distribution.
6. The arithmetic mean does not provide exact value if the classes are open in case of
frequency distribution.

Geometric mean: Let $x_1, x_2, \ldots, x_n$ be a set of observations recorded in a statistical
investigation. Then the G.M. of these n observations is defined by
\[ \text{G.M.} = (x_1 x_2 \cdots x_n)^{1/n} \]
Or, $\log \text{G.M.} = \frac{1}{n} \log (x_1 x_2 \cdots x_n)$
∴ $\text{G.M.} = \text{Anti log} \left[ \frac{1}{n} \sum_{i=1}^{n} \log x_i \right]$

Let $x_i$ be the mid-value of the i-th class of a frequency distribution and $f_i$ be the
corresponding frequency (i = 1, 2, ……., k).
Then the geometric mean is defined by
\[ \text{G.M.} = \left( x_1^{f_1} x_2^{f_2} \cdots x_k^{f_k} \right)^{1/N}; \quad \text{where } N = f_1 + f_2 + \cdots + f_k \]
Or, $\log \text{G.M.} = \frac{1}{N} \log \left( x_1^{f_1} x_2^{f_2} \cdots x_k^{f_k} \right)$
Or, $\log \text{G.M.} = \frac{1}{N} \sum_{i=1}^{k} f_i \log x_i$
∴ $\text{G.M.} = \text{Anti log} \left[ \frac{1}{N} \sum_{i=1}^{k} f_i \log x_i \right]$

***The following data represent the rate of change of production of rice in different years
compared to the production of 1980. Calculate the geometric mean of the rate of change of
production.

Class interval      No. of years   Mid-value   fi log xi
of rate of change   fi             xi
100-125             3              112.5       6.15346
125-150             2              137.5       4.27661
150-175             4              162.5       8.84341
175-200             5              187.5       11.36501
200-225             3              212.5       6.98208
225-250             2              237.5       4.75133
250-275             2              262.5       4.83826
Total               21                         47.21016

Solve: We have,
\[ \log \text{G.M.} = \frac{1}{N} \sum_{i=1}^{k} f_i \log x_i = \frac{47.21016}{21} = 2.248103 \]
∴ G.M. = Anti log 2.248103 = 177.05
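The logarithmic working can be sketched in Python with `math.log10`. The mid-values are the midpoints of the seven classes 100-125 to 250-275; the variable names are illustrative.

```python
import math

mids = [112.5, 137.5, 162.5, 187.5, 212.5, 237.5, 262.5]   # x_i
years = [3, 2, 4, 5, 3, 2, 2]                              # f_i
n = sum(years)

# log G.M. = (1/N) * sum(f_i * log10(x_i)); G.M. is the antilog of that value.
log_gm = sum(f * math.log10(x) for f, x in zip(years, mids)) / n
gm = 10 ** log_gm

print(n, round(log_gm, 4), round(gm, 2))
```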

Properties of geometric mean:

1. Let $x_{11}, x_{12}, \ldots, x_{1 n_1}$ be a set of $n_1$ observations and let the geometric mean of
this set of observations be $G_1$. Also, consider that $x_{21}, x_{22}, \ldots, x_{2 n_2}$ are another
$n_2$ observations, the geometric mean of which is $G_2$. Then the geometric mean of both sets of
observations is given by
\[ G = \text{Anti log} \left[ \frac{n_1 \log G_1 + n_2 \log G_2}{n_1 + n_2} \right] \]
Proof:
The geometric mean of all observations is defined by
\[ G = \left( x_{11} x_{12} \cdots x_{1 n_1} \, x_{21} x_{22} \cdots x_{2 n_2} \right)^{1/(n_1 + n_2)} \]
\[ \log G = \frac{1}{n_1 + n_2} \log \left( x_{11} x_{12} \cdots x_{1 n_1} \, x_{21} x_{22} \cdots x_{2 n_2} \right) = \frac{1}{n_1 + n_2} \left[ \log (x_{11} x_{12} \cdots x_{1 n_1}) + \log (x_{21} x_{22} \cdots x_{2 n_2}) \right] \]
We know, $\log G_1 = \frac{1}{n_1} \log (x_{11} x_{12} \cdots x_{1 n_1})$
Or, $n_1 \log G_1 = \log (x_{11} x_{12} \cdots x_{1 n_1})$
Similarly, $n_2 \log G_2 = \log (x_{21} x_{22} \cdots x_{2 n_2})$
Then,
\[ \log G = \frac{1}{n_1 + n_2} \left[ n_1 \log G_1 + n_2 \log G_2 \right] \]
\[ \therefore \ G = \text{Anti log} \left[ \frac{n_1 \log G_1 + n_2 \log G_2}{n_1 + n_2} \right] \]
In a similar way, if the sample observations are divided into k groups, where the geometric
mean of the $n_i$ observations of the i-th group is $G_i$ (i = 1, 2, …. k), then the geometric mean
of all observations is given by
\[ \log G = \frac{n_1 \log G_1 + n_2 \log G_2 + \cdots + n_k \log G_k}{n_1 + n_2 + \cdots + n_k} \]

2. A.M. ≥ G.M.
Proof: Let there be two positive observations $x_1$ and $x_2$. The arithmetic and geometric means of
these two observations are, respectively,
\[ \text{A.M.} = \frac{1}{2} (x_1 + x_2); \quad \text{G.M.} = \sqrt{x_1 x_2} \]
Now,
\[ \text{A.M.} - \text{G.M.} = \frac{1}{2} (x_1 + x_2) - \sqrt{x_1 x_2} = \frac{1}{2} \left( x_1 + x_2 - 2 \sqrt{x_1 x_2} \right) = \frac{1}{2} \left( \sqrt{x_1} - \sqrt{x_2} \right)^2 \ge 0 \]
∴ A.M. ≥ G.M.; the equality sign holds good if $x_1 = x_2$.
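Both properties can be checked numerically. In the sketch below the two groups of observations are made-up illustrative values, not data from the text, and `statistics.geometric_mean` requires Python 3.8 or later.

```python
import math
from statistics import geometric_mean, mean

# Illustrative observations (not from the text).
group1 = [2.0, 8.0, 4.0]
group2 = [1.0, 3.0, 9.0, 27.0]
n1, n2 = len(group1), len(group2)
g1, g2 = geometric_mean(group1), geometric_mean(group2)

# Property 1: combine the group geometric means through their logarithms.
combined_log = (n1 * math.log10(g1) + n2 * math.log10(g2)) / (n1 + n2)
combined = 10 ** combined_log
direct = geometric_mean(group1 + group2)

# Property 2: the arithmetic mean is never smaller than the geometric mean.
am_all = mean(group1 + group2)

print(round(combined, 6), round(direct, 6), am_all >= direct)
```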

Uses of geometric mean:


I. If data are expressed in rates, ratios or percentages, geometric mean is suitable to calculate
the average rate or ratio or percentage.
II. It is used to calculate rate of growth of population and to find the average rate of increase
or decrease in economic activities such as economic growth of the country or turnover of a
business.
III. Geometric mean is used to calculate index number.

Merits and demerits of geometric mean:


Merits:
I. Geometric mean has definite formula to calculate.
II. It is less affected by extreme value.
III. It is not affected by sampling fluctuation.
IV. It is amenable to further mathematical treatment.

Demerits:
I. It is neither easy to understand nor easy to calculate.
II. The geometric mean is not calculated if any of the observations is zero, since the result is
found to be zero.
III. It is not calculated if some observations are negative: an odd number of negative
observations makes the product negative, which can give an imaginary value of the geometric mean.

***The arithmetic mean and geometric mean of two observations are 15 and 9 respectively. Find
the two observations.
Solve: Let x and y be the two observations. Then,
\[ \frac{x + y}{2} = 15 \quad \text{and} \quad \sqrt{xy} = 9 \]
Or, x + y = 30 and xy = 81
We know, $(x - y)^2 = (x + y)^2 - 4xy = 30^2 - 4 \times 81 = 576$
∴ x – y = ± 24
We have, x + y = 30……………………..(i)
x – y = 24……………………...(ii)
Solving (i) and (ii), we get x = 27, y = 3
Again, x + y = 30……………………..(iii)
x – y = -24…………………….(iv)
Solving (iii) and (iv), we get x = 3, y = 27
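The same steps can be written as a small function. The function name is hypothetical, and the sketch assumes A.M. ≥ G.M. ≥ 0 so that the square root is real.

```python
import math

def observations_from_means(am, gm):
    """Recover two observations from their arithmetic and geometric means."""
    total = 2 * am        # x + y
    product = gm ** 2     # x * y
    # (x - y)^2 = (x + y)^2 - 4xy
    difference = math.sqrt(total ** 2 - 4 * product)
    return (total + difference) / 2, (total - difference) / 2

x, y = observations_from_means(15, 9)
print(x, y)
```

The function returns the larger observation first; swapping the two gives the second solution found above.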

Harmonic mean (H.M): Let $x_1, x_2, \ldots, x_n$ be a set of n observations recorded in any statistical
investigation. Then the harmonic mean of this set of observations is defined by
\[ H = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} \]
This H is the reciprocal of the arithmetic mean of the reciprocals of the $x_i$'s (i = 1, 2, …. n).

***A train moves the first 50 km at a speed of 60 km/hour, the second 50 km at a speed of 75 km/hour,
the third 50 km at a speed of 65 km/hour and the fourth 50 km at a speed of 80 km/hour. What is the
average speed of the train throughout the journey?
Solution: Given x1 = 60, x2 = 75, x3 = 65, x4 = 80.
The train covers the same distance at each step, so the distance can be ignored in calculating the
average speed. The average speed is given by
\[ H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}; \quad n = 4 \]
\[ H = \frac{4}{\frac{1}{60} + \frac{1}{75} + \frac{1}{65} + \frac{1}{80}} = 69.10 \text{ km/h} \]

***A plane moves the first 800 km at a speed of 600 km/hour, the second 400 km at a speed of 800
km/hour and the last 200 km at a speed of 500 km/hour. Find the average speed of the plane.
Solution: Let the speeds be x1 = 600, x2 = 800 and x3 = 500 and the distances covered by the plane be
d1 = 800, d2 = 400 and d3 = 200. The times taken to cover the distances are
\[ t_1 = \frac{d_1}{x_1} = \frac{800}{600}, \quad t_2 = \frac{d_2}{x_2} = \frac{400}{800} \quad \text{and} \quad t_3 = \frac{d_3}{x_3} = \frac{200}{500} \]
The average speed of the plane is
\[ H = \frac{\text{Total distance}}{\text{Total time}} = \frac{800 + 400 + 200}{\frac{800}{600} + \frac{400}{800} + \frac{200}{500}} \approx 626.87 \text{ km/h} \]
The above average speed is known as the weighted harmonic mean, where the weights are the distances
($d_i$) covered and the speeds are $x_1, x_2, \ldots, x_k$ (k = 3).
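A sketch of the weighted harmonic mean as total distance over total time follows. The function name is illustrative; with equal distances it reduces to the simple harmonic mean, which is shown using the train example.

```python
def average_speed(distances, speeds):
    """Weighted harmonic mean of the speeds, with distances as weights."""
    total_distance = sum(distances)
    total_time = sum(d / s for d, s in zip(distances, speeds))
    return total_distance / total_time

# Plane: unequal distances, so the distances act as weights.
plane = average_speed([800, 400, 200], [600, 800, 500])

# Train: four equal legs of 50 km, so the result equals the simple harmonic mean.
train = average_speed([50, 50, 50, 50], [60, 75, 65, 80])

print(f"plane: {plane:.2f} km/h, train: {train:.2f} km/h")
```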

Harmonic mean from frequency table: Similar to the weighted harmonic mean, this mean can also
be calculated from a frequency distribution, where the weight or the importance of any value
(mid-value) is the frequency of that value. Let $x_i$ be the mid-value of the i-th class of a
frequency distribution and $f_i$ be the corresponding frequency (i = 1, 2, ……., k).

Then the harmonic mean of the distribution is given by-
\[ H = \frac{N}{\sum_{i=1}^{k} \frac{f_i}{x_i}}; \quad N = \sum_{i=1}^{k} f_i \]

***The following data represent the distribution of workers in a garments industry according to
the rate of bonus they received during a festival.

Class interval of         No. of workers   Mid-value   fi/xi    c.f.
rate of bonus (in %)      fi               xi                   (from bottom)
<60                       60               55          1.0909   300
60-70                     75               65          1.1538   240
70-80                     125              75          1.6667   165
80-90                     25               85          0.2941   40
90-100                    15               95          0.1579   15
Total                     300                          4.3634
Find the average rate of bonus of all workers of the industry. How many workers received bonus
at the rate of 70% and above?
Solve: The average rate of bonus is-
\[ H = \frac{N}{\sum_{i=1}^{k} \frac{f_i}{x_i}} = \frac{300}{4.3634} = 68.75\% \]
There are 165 workers who received bonus at the rate of 70% and above.
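The calculation from the frequency table can be sketched as below. The mid-value 55 for the open class "<60" is taken from the table; the variable names are illustrative.

```python
mids = [55, 65, 75, 85, 95]          # x_i, mid-values from the table
workers = [60, 75, 125, 25, 15]      # f_i

n = sum(workers)
# H = N / sum(f_i / x_i)
sum_f_over_x = sum(f / x for f, x in zip(workers, mids))
hm = n / sum_f_over_x

# Workers in the classes 70-80, 80-90 and 90-100 (mid-values above 70).
at_least_70 = sum(f for x, f in zip(mids, workers) if x > 70)

print(n, round(sum_f_over_x, 4), round(hm, 2), at_least_70)
```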

Uses of harmonic mean:


i. The harmonic mean is used to calculate average of data if these are expressed in rates or
ratios or in relatives.
ii. It is used to calculate the average speed of any vehicle.
iii. It is used to calculate average growth rate or the average rate of profit in any business.

Merits and demerits of harmonic mean:


Merits
i. The harmonic mean is calculated by a definite formula.
ii. It is calculated using all observations.
iii. It is amenable to further mathematical treatment.
iv. It is not affected much by one or two extreme values.
v. It is not much affected by sampling fluctuation.
Demerits
i. It is neither easy to calculate nor easy to understand.
ii. It is not suitable in economic analysis.
iii. It is not calculated if any observation is zero.
iv. It is not a representative measure of central tendency unless it is justified to give higher
weight to the smaller observations.

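Theorem: A.M. ≥ G.M. ≥ H.M.
Proof: The theorem is proved here for a series of two values.
However, the theorem is true for any number of observations.
Let the observations be $x_1$ and $x_2$, both positive. Then
\[ \text{A.M.} = \frac{1}{2} (x_1 + x_2), \quad \text{G.M.} = \sqrt{x_1 x_2}, \quad \text{H.M.} = \frac{2}{\frac{1}{x_1} + \frac{1}{x_2}} = \frac{2 x_1 x_2}{x_1 + x_2} \]
\[ \text{A.M.} - \text{G.M.} = \frac{1}{2} (x_1 + x_2) - \sqrt{x_1 x_2} = \frac{1}{2} \left( x_1 + x_2 - 2 \sqrt{x_1 x_2} \right) = \frac{1}{2} \left( \sqrt{x_1} - \sqrt{x_2} \right)^2 \ge 0 \]
=> A.M. ≥ G.M. If x1 = x2, A.M. = G.M.

Again,
\[ \text{G.M.} - \text{H.M.} = \sqrt{x_1 x_2} - \frac{2 x_1 x_2}{x_1 + x_2} = \frac{\sqrt{x_1 x_2}}{x_1 + x_2} \left[ x_1 + x_2 - 2 \sqrt{x_1 x_2} \right] = \frac{\sqrt{x_1 x_2}}{x_1 + x_2} \left( \sqrt{x_1} - \sqrt{x_2} \right)^2 \ge 0 \]
=> G.M. ≥ H.M. If x1 = x2, G.M. = H.M.

∴ A.M. ≥ G.M. ≥ H.M.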

***Find the arithmetic mean, geometric mean and harmonic mean of the series
1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
And show that A.M. > G.M. > H.M.
Solution: The arithmetic mean (A.M.) is given by
\[ \text{A.M.} = \frac{1}{10} (1 + 2 + 3 + \cdots + 10) = \frac{55}{10} = 5.5 \quad [\because n = 10] \]
The geometric mean (G.M.) is given by
\[ \log \text{G.M.} = \frac{1}{10} \left[ \log 1 + \log 2 + \log 3 + \cdots + \log 10 \right] = \frac{6.5598}{10} = 0.65598 \]
G.M. = Anti log 0.65598 = 4.53

The harmonic mean (H.M.) is given by
\[ \text{H.M.} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} = \frac{10}{1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{10}} = \frac{10}{2.92897} = 3.41 \]
∴ A.M. > G.M. > H.M.
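The three means can also be obtained from Python's standard `statistics` module, as a cross-check on the hand calculation (`geometric_mean` requires Python 3.8 or later):

```python
from statistics import geometric_mean, harmonic_mean, mean

series = list(range(1, 11))   # 1, 2, ..., 10

am = mean(series)
gm = geometric_mean(series)
hm = harmonic_mean(series)

print(f"A.M. = {am:.2f}, G.M. = {gm:.2f}, H.M. = {hm:.2f}")
print(am > gm > hm)
```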

Median: The median is that value of the series which divides the array of the series into two equal
parts such that half of the observations are below it and the other half are above it. For
example, let x: 1, 3, 7, 12, 13. Here, 7 is the median as it is the middle value of the array.

***The following data represent the distribution of cows of a dairy farm according to the amount
of milk (in kg) given per day:

Class interval of milk (kg)    No. of cows    c.f. from top    c.f. from bottom
<10                            10             10               132
10-15                          25             35               122
15-20                          48             83               97
20-25                          21             104              49
25-30                          16             120              28
30+                            12             132              12
I. Find the median amount of milk given by the cows.
II. How many cows give 15 kg or more milk?
Solve: (i) In the given distribution N = 132, ∴ $\frac{N}{2} = 66$, and h = 5.
The class for which c.f. ≥ 66 is 15-20, where
Lower limit of the class is l = 15
Frequency of this class is f = 48
And the c.f. preceding the class is c = 35
We know,
$Me = l + \frac{h}{f}\left(\frac{N}{2} - c\right) = 15 + \frac{5}{48}(66 - 35) = 18.23$ kg

(ii) The c.f. calculated from the bottom shows that 97 cows give 15 kg or more milk.
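
The median formula for grouped data can be sketched in code as follows. This is only an illustration: the function name is hypothetical, the class frequencies are taken so that they agree with the cumulative frequencies in the table (10, 35, 83, 104, 120, 132), and the open first class is given an assumed lower limit of 5, which does not affect the result.

```python
def grouped_median(lower_limits, width, freqs):
    """Me = l + (h / f) * (N/2 - c), where c is the c.f. before the median class."""
    total = sum(freqs)
    half = total / 2
    cum = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if cum + f >= half:
            return lower + (width / f) * (half - cum)
        cum += f
    raise ValueError("frequency table is empty")

lower_limits = [5, 10, 15, 20, 25, 30]   # 5 is an assumed limit for the class "<10"
freqs = [10, 25, 48, 21, 16, 12]
median_milk = grouped_median(lower_limits, 5, freqs)
print(round(median_milk, 2))
```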

Advantages and disadvantages of median:


Advantages
I. It is rigidly defined.
II. It is easy to calculate and easy to understand.
III. It can be found out from graph.

IV. Median is a suitable measure of central tendency if the frequency distribution is prepared
with upper end open classes.
V. Median is the only average to be used to deal with qualitative variable.
VI. It is not affected by extreme value in the series.
Disadvantages
I. It is not based on all observations.
II. It is not suitable for further mathematical treatment.
III. Median is not found out properly from ungrouped data if number of observations is even.
IV. It is affected more by sampling fluctuations.
V. Median is not found out from a frequency table with lower end open classes.
VI. If there are several groups of observations and median of each group is available, then the
median of the combined observations is not available by combining the medians of different
groups.

Uses of median:
I. It is a good measure of central tendency for a markedly skewed distribution such as income
distribution.
II. It is used to find the average wage rate of a group of workers.
III. It is used to find average value, of a set of observations related to qualitative variable such
as intelligence, health condition, socioeconomic condition etc.
IV. Median is a good measure of central tendency if the characteristic under study is in ranks or
scores.

***Two frequencies of two classes are missing but median value of the distribution is known as
Me = 80, where the frequency distribution is related to the number of packages received in
different days in a post office. Find the missing frequencies.

Class interval of packages    No. of days    c.f.
<50                           15             15
50-60                         f2             f2 + 15
60-70                         39             f2 + 54
70-80                         42             f2 + 96
80-90                         48             f2 + 144
90-100                        30             f2 + 174
100+                          f7             f2 + f7 + 174
Total                         246

Solve: Me = 80
f2 + f7 = 246 - 174 = 72

$Me = l + \frac{h}{f}\left(\frac{n}{2} - c\right)$

Here, $\frac{n}{2} = 123$

Since Me = 80, the median class is either 70-80 or 80-90.

Since c is the c.f. of the class preceding the median class, $\left(\frac{n}{2} - c\right)$ cannot be zero or negative, and hence
the median class cannot be 80-90.
Therefore,
l = 70
f = 42
c = f2 + 54
Thus,
$80 = 70 + \frac{10}{42}[123 - (f_2 + 54)]$
Or, $10 = \frac{10}{42}[123 - (f_2 + 54)]$
Or, 42 = 123 - f2 - 54
∴ f2 = 27
We have,
f2 + f7 = 72
∴ f7 = 72 - 27 = 45
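
The same algebra can be written as a short calculation. The sketch below rearranges the median formula for the unknown frequency; the function and parameter names are illustrative assumptions, not notation from the text.

```python
def missing_frequency(median, lower, width, freq, half_total, known_cum):
    # From Me = l + (h / f) * (n/2 - (f2 + known_cum)), solve for f2
    return half_total - known_cum - (median - lower) * freq / width

f2 = missing_frequency(median=80, lower=70, width=10, freq=42,
                       half_total=123, known_cum=54)
f7 = (246 - 174) - f2  # the two missing frequencies add up to 72
print(f2, f7)
```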

Mode: Mode is that value of the variable which occurs most frequently in the series of observations
of the variable. For example, let us consider the ages (in year) of some children investigated in a
small locality:
5, 2, 2, 8, 7, 6, 5, 4, 3, 4, 5, 2, 2
In the above example, the age 2 years is recorded 4 times (the maximum number of times). So, 2 years is the mode of
the distribution of ages of the children.

***The following data represent the distribution of female workers in different garments
industries according to their monthly salary (in taka).

Class interval of salary    No. of female workers    c.f.
<600                        10                       10
600-700                     40                       50
700-800                     65                       115
800-900                     250                      365
900-1000                    175                      540
1000-1100                   82                       622
1100-1200                   50                       672
I. Find the maximum salary, on an average, of the major group of female workers.
II. How many female workers have a salary less than 1000.00 taka?
Solve: (i) The major group of female workers has 250 members, whose salary lies in the class 800-900, where
l = 800
h = 100
f1 = 250
f0 = 65
f2 = 175
Their salary, on an average, is given by the mode,
$Mo = l + \frac{h(f_1 - f_0)}{2f_1 - f_0 - f_2} = 800 + \frac{100(250 - 65)}{2 \times 250 - 65 - 175} = 871.15$ taka

(ii) From the c.f. it is observed that the salary of 540 female workers is less than 1000.00 taka.

***Find the mode of the following frequency distribution:

Class interval:  5-10   10-15   15-20   20-25   25-30   30-35
Frequency:       12     25      24      25      18      10

Solve: Since 25 is the frequency of two classes, there are two modes of the given distribution.
The first mode (modal class 10-15) is-
$Mo = l + \frac{h(f_1 - f_0)}{2f_1 - f_0 - f_2} = 10 + \frac{5(25 - 12)}{2 \times 25 - 12 - 24} = 14.64$

The second mode (modal class 20-25) is-
$Mo = l + \frac{h(f_1 - f_0)}{2f_1 - f_0 - f_2} = 20 + \frac{5(25 - 24)}{2 \times 25 - 24 - 18} = 20.625$

The given distribution is bi-modal.
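
The mode formula for grouped data can be checked with a few lines of code. This is a minimal sketch with an illustrative function name; f1 is the frequency of the modal class, f0 that of the preceding class and f2 that of the following class, as in the formula above.

```python
def grouped_mode(lower, width, f1, f0, f2):
    # Mo = l + h * (f1 - f0) / (2*f1 - f0 - f2)
    return lower + width * (f1 - f0) / (2 * f1 - f0 - f2)

salary_mode = grouped_mode(800, 100, 250, 65, 175)   # salary example
first_mode = grouped_mode(10, 5, 25, 12, 24)         # modal class 10-15
second_mode = grouped_mode(20, 5, 25, 24, 18)        # modal class 20-25
print(round(salary_mode, 2), round(first_mode, 2), second_mode)
```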

Merits and demerits of mode:
Merits
I. It is easy to understand and easy to calculate.
II. It has a definite formula if it is calculated from frequency distribution.
III. It can be found out easily from graph.
IV. It can be calculated if frequency distribution is with upper end open classes.
V. It is not affected by extreme value unless it occurs frequently.
VI. It is obtained by inspection from raw data.
Demerits
I. It is not rigidly defined, especially if the data are in raw form.
II. It is not based on all the observations of a distribution.
III. It is affected to a greater extent by an extreme value if the extreme value occurs most frequently.
IV. Mode is ill defined if the modal class is the first or last class of the frequency distribution.
V. It is also ill defined if maximum frequency occurs repeatedly.
VI. Modes of different sets of observations cannot be combined to get mode of combined
observations of all sets.

Uses of mode:
I. Mode is used to handle economic data such as daily sales in a shop, daily output of an
industry, daily wages of workers, daily export of a company etc. to know the value of the
variable with maximum frequency.
II. It is also used in market research, where a businessman or company needs to know the type
or quality of commodities which are most frequently demanded.
III. It is also used in analyzing data related to weather, where mode provides number of days
with maximum temperature during a summer, number of days having maximum rainfall
during a rainy season, number of days having maximum or minimum humidity etc.
IV. It is also used to handle social data. Mode provides information related to maximum
number of road accidents in days, maximum number of suicide cases in days, maximum
number of people killed due to illegal agents in days etc.

Chapter 04: Measures of dispersion

The term dispersion means the scatteredness of observations from some central value.
Data set-1: x1i : 4, 5, 6, 7, 6, 8; $\bar{x}_1 = 6$, n1 = 6
Data set-2: x2i : 2, 4, 6, 8, 4, 12; $\bar{x}_2 = 6$, n2 = 6
The amount of scatteredness can be evaluated by absolute deviations as follows:
$|x_{1i} - \bar{x}_1|$ : 2, 1, 0, 1, 0, 2
$|x_{2i} - \bar{x}_2|$ : 4, 2, 0, 2, 2, 6

Measures of dispersion
Let us consider the age of some children as follows-
x (in years) : 2, 5, 4, 3, 4, 5, 2, 7; 𝑥̅ = 4years, n = 8
Consider another set of observations which indicate electric failure in different days as follows-
Electric failure, y (in hours) : 2, 5, 4, 3, 4, 5, 2, 7; 𝑦̅ = 4hours, n = 8
The absolute deviation of x from $\bar{x}$ is,
$|x - \bar{x}|$ : 2, 1, 0, 1, 0, 1, 2, 3
Mean deviation = $\frac{10}{8} = 1.25$ years

Again, the absolute deviation of y from $\bar{y}$ is,
$|y - \bar{y}|$ : 2, 1, 0, 1, 0, 1, 2, 3
Mean deviation = $\frac{10}{8} = 1.25$ hours

Measures of dispersion can be divided into two groups:

1. Absolute measure of dispersion: It is a measure which provides information on the average
deviation or scatteredness of observations, where the measure depends on the unit of the variable
under study. Such measures are-
I. Range,
II. Mean Deviation (M.D),
a) Mean Deviation from Mean (M.D (mean)),
b) Mean Deviation from Median (M.D (median)),
c) Mean Deviation from Mode (M.D (mode)),
III. Standard Deviation (S.D),
IV. Semi-interquartile Range or Quartile Deviation (Q.D).
2. Relative measure of dispersion: It is a measure which depicts the average amount of
scatteredness of observations but is free of the unit of the variable under study. It measures the
percentage variation of observations from some central value. The measures are-
I. Coefficient of Range,
II. Coefficient of Mean Deviation,
III. Coefficient of Standard Deviation,
IV. Coefficient of Variation (C.V),
V. Coefficient of Quartile Deviation.

Range
Consider a set of observations of size n; where the observations can be arranged in ascending order
as follows:
x(1) ≤ x(2) ≤ x(3) ≤ ……………. ≤ x(n)
Here x(1) = the lowest observation in the series, and x(n) = the highest observation in the series.
Then range, R, is defined by, R = x(n) - x(1)

For example, let us consider the total annual rainfall (in mm) recorded at some meteorological
stations in Bangladesh in 1998, where the rainfall data are as follows:
3863, 3914, 4672, 4139, 4435, 4245, 3216, 2518, 3368, 4388, 2312, 1819, 2200, 2858, 2548, 1490,
1994, 3217, 2852, 2601, 2391, 1636, 1540, 2365, 3139.
Here, n = 25
x(25) = 4672, the highest amount of rainfall,
x(1) = 1490, the lowest amount of rainfall.
Therefore, the range of rainfall is,
R = x(25) - x(1) = 4672 - 1490 = 3182 mm
This R is an absolute measure of dispersion. The corresponding relative measure of dispersion is the
coefficient of range and is given by-
Coefficient of range = $\frac{x_{(n)} - x_{(1)}}{x_{(n)} + x_{(1)}}$

This coefficient is multiplied by 100 to express the result in percentage. In our given example, the
coefficient of range is,
Coefficient of range = $\frac{4672 - 1490}{4672 + 1490} \times 100\% = 51.64\%$
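
The range and the coefficient of range of the rainfall data can be recomputed with a short script. This is a minimal sketch; the function names are illustrative and not part of the original notes.

```python
rainfall = [3863, 3914, 4672, 4139, 4435, 4245, 3216, 2518, 3368, 4388,
            2312, 1819, 2200, 2858, 2548, 1490, 1994, 3217, 2852, 2601,
            2391, 1636, 1540, 2365, 3139]

def data_range(xs):
    # Highest observation minus lowest observation
    return max(xs) - min(xs)

def coefficient_of_range(xs):
    # (x(n) - x(1)) / (x(n) + x(1)), expressed as a percentage
    highest, lowest = max(xs), min(xs)
    return (highest - lowest) / (highest + lowest) * 100

r = data_range(rainfall)
cr = coefficient_of_range(rainfall)
print(r, round(cr, 2))
```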

***Find the range and coefficient of range for the following frequency distribution-

Class interval   5-10   10-15   15-20   20-25   25-30   30-35
Frequency        18     20      48      52      36      26

Solve: R = 35 - 5 = 30
Coefficient of range = $\frac{35 - 5}{35 + 5} \times 100\% = 75.0\%$

MERITS AND DEMERITS OF RANGE

Merits:
I. Range is calculated easily, since it is based on only two observations.
II. It is easily understood.
III. It is rigidly defined.
Demerits:
I. Range is not based on all observations and hence does not depict the variability of all
observations.
II. The amount of range is affected by extreme values.
III. It is affected much by sampling fluctuations.
IV. It is not calculated from frequency distribution with open-end classes.

USE OF RANGE
I. Range is used for statistical quality control of industrial products,
II. It is used to measure the variation of data where small variations are observed in the data set.
Such data with small variations are (a) stock market data throughout the day, (b) rate of
exchange of money, (c) rate of interest in call money,
III. It is used in weather forecast to estimate the difference between maximum and minimum
temperature or between maximum rainfall and minimum rainfall during rainy season.
IV. It is used in quoting interest rate and security prices at the stock exchange.

MEAN DEVIATION
One of the important absolute measures of dispersion is the mean deviation, since it is based on all
observations. The deviation can be measured from the mean, median or mode. Let us consider a set
of observations as follows:
x1, x2, x3, ……………. xn
Let the mean, median and mode of this set of observations be $\bar{x}$, Me and Mo, respectively. Then the
mean deviation from mean is defined by-
M.D. (mean) = $\frac{1}{n}\sum |x_i - \bar{x}|$

Similarly, the mean deviations from median and mode are given, respectively, by-
M.D. (median) = $\frac{1}{n}\sum |x_i - Me|$ and
M.D. (mode) = $\frac{1}{n}\sum |x_i - Mo|$

The corresponding relative measures of dispersion are the coefficients of mean deviation, given by-
Coefficient of M.D. (mean) = $\frac{\text{M.D. (mean)}}{\text{mean}} \times 100\%$
Coefficient of M.D. (median) = $\frac{\text{M.D. (median)}}{\text{median}} \times 100\%$
Coefficient of M.D. (mode) = $\frac{\text{M.D. (mode)}}{\text{mode}} \times 100\%$

The mean deviation from mean, median and mode can also be found from a frequency distribution. The
formulae are-
M.D. (mean) = $\frac{1}{N}\sum_{i=1}^{k} f_i |X_i - \bar{X}|$
M.D. (median) = $\frac{1}{N}\sum_{i=1}^{k} f_i |X_i - Me|$ and
M.D. (mode) = $\frac{1}{N}\sum_{i=1}^{k} f_i |X_i - Mo|$

where Xi is the mid-value of the i-th class of a frequency distribution and fi is the corresponding
frequency, $\bar{X}$ is the mean of the distribution, N = $\sum f_i$ is the total frequency, i = 1, 2, ………… k. The coefficient of mean deviation is
calculated by the formula shown above.

***The monthly average temperatures (°C) in different months of 1998 recorded at the Dhaka
Meteorological station are-
12.7, 16.1, 18.3, 22.9, 25.3, 28.1, 26.4, 26.8, 26.3, 25.4, 20.6, 14.8
Find the percentage variation in average temperature over the months.
Solution: The percentage variation in the average temperature is
found by the coefficient of mean deviation, where the deviation can be measured from the mean or the
median. The mean and median of the given set of observations are-
Mean, $\bar{x} = \frac{1}{n}\sum x = \frac{263.70}{12} = 21.97$ °C

The temperature data in ascending order are as follows:
12.7, 14.8, 16.1, 18.3, 20.6, 22.9, 25.3, 25.4, 26.3, 26.4, 26.8, 28.1
As n is even,
Me = value of $\frac{1}{2}\left[\frac{n}{2}\text{th} + \left(\frac{n}{2} + 1\right)\text{th}\right]$ observation
= value of $\frac{1}{2}$[6th + 7th] observation; n = 12
= $\frac{1}{2}[22.9 + 25.3] = 24.10$ °C

Now, the absolute deviations of the temperature data from the mean and the median are shown below-

$|x_i - \bar{x}|$   9.27   5.87   3.67   0.93   3.33   6.13   4.43   4.83   4.33   3.43   1.37   7.17
$|x_i - Me|$        11.4   8.0    5.8    1.2    1.2    4.0    2.3    2.7    2.2    1.3    3.5    9.3

∴ M.D. (mean) = $\frac{1}{n}\sum |x_i - \bar{x}| = \frac{54.76}{12} = 4.56$ °C
And, M.D. (median) = $\frac{1}{n}\sum |x_i - Me| = \frac{52.9}{12} = 4.41$ °C
∴ Coefficient of M.D. (mean) = $\frac{\text{M.D. (mean)}}{\text{mean}} \times 100\% = \frac{4.56}{21.97} \times 100\% = 20.76\%$
And, Coefficient of M.D. (median) = $\frac{\text{M.D. (median)}}{\text{median}} \times 100\% = \frac{4.41}{24.1} \times 100\% = 18.29\%$

The average temperature varies by 20.76% from the mean temperature. It varies by 18.29%
from the median.
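
The mean deviations of the temperature data can be recomputed as in the sketch below. The helper name mean_deviation is illustrative, not from the original notes; the mean and median come from Python's standard statistics module. Small differences in the last decimal place relative to hand calculation are due to rounding of intermediate values.

```python
import statistics

def mean_deviation(xs, center):
    # Average absolute deviation of the observations from a chosen central value
    return sum(abs(x - center) for x in xs) / len(xs)

temps = [12.7, 16.1, 18.3, 22.9, 25.3, 28.1, 26.4, 26.8, 26.3, 25.4, 20.6, 14.8]

mean = statistics.mean(temps)
median = statistics.median(temps)

md_mean = mean_deviation(temps, mean)
md_median = mean_deviation(temps, median)

# Relative measures, expressed in percent
coef_mean = md_mean / mean * 100
coef_median = md_median / median * 100
print(round(md_mean, 2), round(md_median, 2))
print(round(coef_mean, 2), round(coef_median, 2))
```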

***The following data represent the distribution of the amount of fertilizer (in kg) sold in a shop on
different days during the boro rice season-

Class interval of amount of fertilizer   100-150    150-200    200-250   250-300   300-350   350+
No. of days, $f_i$                       15         17         18        10        8         12
c.f.                                     15         32         50        60        68        80
Mid-value, $X_i$                         125        175        225       275       325       375
$f_i X_i$                                1875       2975       4050      2750      2600      4500
$f_i |X_i - \bar{X}|$                    1640.625   1009.375   168.75    406.25    725.00    1687.50
$f_i |X_i - Me|$                         1458.30    802.74     50.04     527.80    822.24    1833.36
$f_i |X_i - Mo|$                         1208.40    519.52     349.92    694.40    955.52    2033.28
Find the percentage of variation in the amount of fertilizer sold on different days with respect to the mean,
median and mode.
Solve: Mean, $\bar{X} = \frac{1}{N}\sum f_i X_i = \frac{18750}{80} = 234.375$ kg
Median, $Me = l + \frac{h}{f}\left(\frac{N}{2} - c\right) = 200 + \frac{50}{18}(40 - 32) = 222.22$ kg
Mode, $Mo = l + \frac{h(f_1 - f_0)}{2f_1 - f_0 - f_2} = 200 + \frac{50(18 - 17)}{2 \times 18 - 17 - 10} = 205.56$ kg

M.D. (mean) = $\frac{1}{N}\sum f_i |X_i - \bar{X}| = \frac{5637.5}{80} = 70.47$ kg
Coefficient of M.D. (mean) = $\frac{\text{M.D. (mean)}}{\text{mean}} \times 100\% = \frac{70.47}{234.375} \times 100\% = 30.07\%$

Therefore, the daily sale is scattered by around 30% from the mean sale.

M.D. (median) = $\frac{1}{N}\sum f_i |X_i - Me| = \frac{5494.48}{80} = 68.68$ kg
Coefficient of M.D. (median) = $\frac{\text{M.D. (median)}}{\text{median}} \times 100\% = \frac{68.68}{222.22} \times 100\% = 30.91\%$

The daily sale is dispersed by around 31% from the median amount of sale.

M.D. (mode) = $\frac{1}{N}\sum f_i |X_i - Mo| = \frac{5761.04}{80} = 72.013$ kg
Coefficient of M.D. (mode) = $\frac{\text{M.D. (mode)}}{\text{mode}} \times 100\% = \frac{72.013}{205.56} \times 100\% = 35.03\%$

The daily sale is scattered by around 35% from the modal amount of sale.
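
For grouped data the mean deviation weights each absolute deviation by its class frequency. The sketch below recomputes the fertilizer example from the mid-values and frequencies in the table; the function name is hypothetical, and the mid-value 375 for the open class 350+ is taken from the table as given.

```python
def grouped_mean_deviation(mids, freqs, center):
    # (1/N) * sum of f_i * |X_i - center|
    total = sum(freqs)
    return sum(f * abs(x - center) for x, f in zip(mids, freqs)) / total

mids = [125, 175, 225, 275, 325, 375]
freqs = [15, 17, 18, 10, 8, 12]

total = sum(freqs)
mean = sum(f * x for x, f in zip(mids, freqs)) / total
median = 200 + 50 / 18 * (total / 2 - 32)            # median class 200-250
mode = 200 + 50 * (18 - 17) / (2 * 18 - 17 - 10)     # modal class 200-250

md_mean = grouped_mean_deviation(mids, freqs, mean)
md_median = grouped_mean_deviation(mids, freqs, median)
md_mode = grouped_mean_deviation(mids, freqs, mode)
print(round(md_mean, 2), round(md_median, 2), round(md_mode, 2))
```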

USES OF MEAN DEVIATION:


I. Mean deviation is frequently used in studying the distribution of personal wealth of a
nation.
II. It is also used in analysis related to forecasting business cycles.

Variance:
Let x1, x2, …………………….. xn be a set of observations recorded in any statistical investigation. Then the
variance of x is defined by-
$V(x) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n}\left[\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right] = \frac{SS(x)}{n}$, where SS(x) = sum of squares of x.

It is called the mean square deviation about the mean. This mean square deviation is minimum when it is
taken about the arithmetic mean (a property of the arithmetic mean).
The variance of x from a frequency table is calculated by-
$V(x) = \frac{1}{N}\sum_{i=1}^{k} f_i (X_i - \bar{X})^2 = \frac{1}{N}\left[\sum f_i X_i^2 - \frac{(\sum f_i X_i)^2}{N}\right]$, where $N = \sum_{i=1}^{k} f_i$ = total frequency.
Here, $X_i$ is the mid-value of the i-th class of a frequency distribution and $f_i$ is the corresponding
frequency.

Standard deviation (𝝈):


The positive square root of the variance is known as the standard deviation. Thus-
Standard deviation, $\sigma = \sqrt{V(x)} = \sqrt{\frac{1}{N}\left[\sum f_i X_i^2 - \frac{(\sum f_i X_i)^2}{N}\right]}$

The formula of standard deviation for raw data is-
$\sigma = \sqrt{V(x)} = \sqrt{\frac{1}{n}\left[\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right]}$

By standard deviation we measure the average distance of the observations from the mean, and hence
we consider only the positive square root of the variance.

***The following observations represent the prices of mango sold in different markets in a city-
Price of mango (in taka): 45.40, 50.65, 50.00, 45.00, 46.00, 48.00, 47.00, 55.00, 50.00, 54.00
Find the standard deviation of the price of mango.
Solution: The variance of prices is calculated by-
$V(x) = \sigma^2 = \frac{1}{n}\left[\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right]$; n = 10
$= \frac{1}{10}\left[24221.5825 - \frac{(491.05)^2}{10}\right]$
$= \frac{1}{10}[24221.5825 - 24113.0103]$
= 10.8572 (taka)²
∴ Standard deviation of price is-
$\sigma = \sqrt{V(x)} = \sqrt{10.8572} = 3.30$ taka

This indicates that the price of mango varies from the mean price by about ±3.30 taka on an average.
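
The raw-data formula for variance and standard deviation translates directly into code. The following is a minimal sketch applied to the listed mango prices; the function names are illustrative, and the divisor n (not n − 1) follows the definition used in this chapter.

```python
import math

def variance(xs):
    # V(x) = (1/n) * [sum of x^2 - (sum of x)^2 / n]
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / n

def standard_deviation(xs):
    # Positive square root of the variance
    return math.sqrt(variance(xs))

prices = [45.40, 50.65, 50.00, 45.00, 46.00, 48.00, 47.00, 55.00, 50.00, 54.00]
var_price = variance(prices)
sd_price = standard_deviation(prices)
print(round(sum(prices), 2), round(var_price, 4), round(sd_price, 2))
```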

***The distribution of working hours of some female workers in different garments industries
is shown below-

Class interval of working hours   8-9      9-10       10-11     11-12     Total
No. of industries, fi             50       165        45        25        285
Mid-value, Xi                     8.5      9.5        10.5      11.5
fi Xi                             425.0    1567.5     472.5     287.5     2752.5
fi Xi²                            3612.5   14891.25   4961.25   3306.25   26771.25
Calculate the standard deviation of working hours of workers in different industries.

Solve: The standard deviation of working hours is given by $\sigma = \sqrt{V(x)}$,
where $V(x) = \frac{1}{N}\left[\sum f_i X_i^2 - \frac{(\sum f_i X_i)^2}{N}\right]$
$= \frac{1}{285}\left[26771.25 - \frac{(2752.5)^2}{285}\right]$
$= \frac{1}{285}[26771.25 - 26583.36]$
= 0.6593 (hour)²

∴ $\sigma = \sqrt{0.6593} = 0.81$ hour

PROPERTIES OF STANDARD DEVIATION:


I. The variance, and hence the standard deviation, is zero if all observations under study are the same.
For example, let x : 2, 2, 2, 2, 2, the mean of which is 2, so that the deviations are $x_i - \bar{x}$ : 0, 0, 0, 0,
0. Therefore, the mean deviation from mean and the mean square deviation from mean are zero. This
implies that the standard deviation is zero.
II. The standard deviation is smaller when the variation among the observations is smaller. For example,
let us consider two sets of observations as follows:
x : 1, 2, 3, 4, 5 and y : 2, 4, 6, 8, 10
where $\bar{x} = 3$ and $\bar{y} = 6$
The squares of the deviations of the observations from the mean are-

$(x - \bar{x})^2$ : 4, 1, 0, 1, 4
$(y - \bar{y})^2$ : 16, 4, 0, 4, 16
Therefore, $V(x) = \frac{1}{n}\sum(x - \bar{x})^2 = \frac{10}{5} = 2$ and $V(y) = \frac{1}{n}\sum(y - \bar{y})^2 = \frac{40}{5} = 8$.

It is observed that V(y) > V(x).

***The distribution of mothers by their number of ever born children is shown below. Show
that, for this distribution, σ ≥ M.D (mean).

Class interval of ever born child   No. of mothers, fi   Mid-value, Xi   fi Xi   fi Xi²    fi |Xi − X̄|
<2                                  15                   1.5             22.5    33.75     33.0
2-3                                 52                   2.5             130.0   325.0     62.4
3-4                                 105                  3.5             367.5   1286.25   21.0
4-5                                 44                   4.5             198.0   891.0     35.2
5-6                                 26                   5.5             143.0   786.5     46.8
6+                                  12                   6.5             78.0    507.0     33.6
Total                               254                                  939     3829.5    232.0

Solve: $\bar{X} = \frac{1}{N}\sum f_i X_i = \frac{939.0}{254} = 3.7$
M.D (mean) = $\frac{1}{N}\sum f_i |X_i - \bar{X}| = \frac{232.0}{254} = 0.91$
$\sigma^2 = \frac{1}{N}\sum f_i (X_i - \bar{X})^2 = \frac{1}{N}\left[\sum f_i X_i^2 - \frac{(\sum f_i X_i)^2}{N}\right]$
$= \frac{1}{254}\left[3829.5 - \frac{(939)^2}{254}\right]$
$= \frac{1}{254}[3829.5 - 3471.34]$
= 1.41

∴ $\sigma = \sqrt{\sigma^2} = \sqrt{1.41} = 1.19$
∴ σ > M.D (mean)

USES OF STANDARD DEVIATION:


I. It is a good measure to study the formation of observations in a distribution.
II. It is used to estimate lower limit and upper limit of the observations in a distribution (𝑥̅ ±
𝜎). For normal distribution 95% observations fall in the limits 𝑥̅ ± 2𝜎.
III. The relative measure corresponding to standard deviation is used to compare the dispersion
of two or more distributions.
IV. The standard deviation of an estimate is used to have an idea of the precision of estimate.

COEFFICIENT OF VARIATION (C.V):


This is a relative measure of dispersion corresponding to standard deviation. The coefficient of
standard deviation is defined by-
Coefficient of standard deviation = $\frac{\sigma}{\bar{x}}$

This coefficient is multiplied by 100 to get the percentage variation of a set of
observations. This percentage variation of a set of observations is called the coefficient of
variation and is given by-
C.V = $\frac{\sigma}{\bar{x}} \times 100\%$

This measure is free of the unit of the variable under study. Hence it is a measure used to compare the dispersion
of two or more distributions.

***Calculate the percentage variation of the following two sets of observations and
compare the formation of the two sets of observations.
Set-1, x1i: 4, 8, 10, 12, 18, 8
Set-2, x2i: 10, 10, 11, 11, 12, 6
Solve: For the 1st set of observations,
$\bar{x}_1 = \frac{1}{n_1}\sum x_{1i} = \frac{60}{6} = 10$
$\sigma_1^2 = \frac{1}{n_1}\sum (x_{1i} - \bar{x}_1)^2 = \frac{1}{n_1}\left[\sum x_{1i}^2 - \frac{(\sum x_{1i})^2}{n_1}\right]$
$= \frac{1}{6}\left[712 - \frac{(60)^2}{6}\right]$
= 18.67
∴ σ1 = 4.32
C.V1 = $\frac{\sigma_1}{\bar{x}_1} \times 100\% = \frac{4.32}{10} \times 100\% = 43.2\%$

For the 2nd set of observations,
$\bar{x}_2 = \frac{1}{n_2}\sum x_{2i} = \frac{60}{6} = 10$
$\sigma_2^2 = \frac{1}{n_2}\sum (x_{2i} - \bar{x}_2)^2 = \frac{1}{n_2}\left[\sum x_{2i}^2 - \frac{(\sum x_{2i})^2}{n_2}\right]$
$= \frac{1}{6}\left[622 - \frac{(60)^2}{6}\right]$
= 3.67
∴ σ2 = 1.91
C.V2 = $\frac{\sigma_2}{\bar{x}_2} \times 100\% = \frac{1.91}{10} \times 100\% = 19.1\%$

It is observed that C.V2 < C.V1. This implies that though the means of the two sets of observations are
the same, the first set of observations is more scattered from the mean than the second
set. The second set of observations is more homogeneous.

***The cumulative grade point averages (CGPA) of two students in different semesters are shown below-

Student   CGPA in semesters
          1     2     3     4     5     6     7     8
A         2.5   2.5   3.0   3.5   3.5   4.0   3.5   3.5
B         2.5   3.0   4.0   4.0   4.0   2.0   2.5   4.0
Which student would you consider better throughout the course of studies?
Solve: For student A-
$\bar{x}_1 = \frac{1}{n_1}\sum x_{1i} = \frac{26}{8} = 3.25$
$\sigma_1^2 = \frac{1}{n_1}\sum (x_{1i} - \bar{x}_1)^2 = \frac{1}{n_1}\left[\sum x_{1i}^2 - \frac{(\sum x_{1i})^2}{n_1}\right]$
$= \frac{1}{8}\left[86.5 - \frac{(26)^2}{8}\right]$
= 0.25
∴ $\sigma_1 = \sqrt{0.25} = 0.5$
C.V1 = $\frac{\sigma_1}{\bar{x}_1} \times 100\% = \frac{0.5}{3.25} \times 100\% = 15.38\%$

For student B-
$\bar{x}_2 = \frac{1}{n_2}\sum x_{2i} = \frac{26}{8} = 3.25$
$\sigma_2^2 = \frac{1}{n_2}\sum (x_{2i} - \bar{x}_2)^2 = \frac{1}{n_2}\left[\sum x_{2i}^2 - \frac{(\sum x_{2i})^2}{n_2}\right]$
$= \frac{1}{8}\left[89.5 - \frac{(26)^2}{8}\right]$
= 0.625
∴ $\sigma_2 = \sqrt{0.625} = 0.79$
C.V2 = $\frac{\sigma_2}{\bar{x}_2} \times 100\% = \frac{0.79}{3.25} \times 100\% = 24.31\%$

It is observed that the average CGPA of both students is the same but the C.V of A is less than the C.V of B (C.V1 <
C.V2). This implies that student A is better than B throughout the course of studies. The result of A
is more homogeneous in all semesters.
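
The comparison of the two students can be reproduced with a small function for the coefficient of variation. This is a sketch with an illustrative function name; it keeps full precision for σ, so the second value may differ slightly in the second decimal place from a hand calculation that rounds σ first.

```python
import math

def coefficient_of_variation(xs):
    # C.V = (sigma / mean) * 100, with sigma computed using divisor n
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return math.sqrt(var) / mean * 100

student_a = [2.5, 2.5, 3.0, 3.5, 3.5, 4.0, 3.5, 3.5]
student_b = [2.5, 3.0, 4.0, 4.0, 4.0, 2.0, 2.5, 4.0]

cv_a = coefficient_of_variation(student_a)
cv_b = coefficient_of_variation(student_b)
print(round(cv_a, 2), round(cv_b, 2))
print("A is more consistent" if cv_a < cv_b else "B is more consistent")
```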

***The production of jute goods (in tons) on different days of the first and second half of the year is
shown below-

Class interval of production                 2.0-2.5   2.5-3.0   3.0-3.5   3.5-4.0   4.0-4.5
No. of days, f1i, in the first half of year  12        48        70        35        15
No. of days, f2i, in the last half of year   5         38        80        50        7
In which part of the year is the production level more homogeneous?
Solve:

Class interval of production   f1i   f2i   Mid-value Xi   f1i Xi   f1i Xi²    f2i Xi   f2i Xi²
2.0-2.5                        12    5     2.25           27.00    60.75      11.25    25.3125
2.5-3.0                        48    38    2.75           132.00   363.00     104.50   287.375
3.0-3.5                        70    80    3.25           227.50   739.375    260.00   845.00
3.5-4.0                        35    50    3.75           131.25   492.1875   187.50   703.125
4.0-4.5                        15    7     4.25           63.75    270.9375   29.75    126.4375
Total                          180   180                  581.50   1926.25    593.00   1987.25
For the first half of the year-
$\bar{X}_1 = \frac{1}{N_1}\sum f_{1i} X_i = \frac{581.50}{180} = 3.23$ tons
$\sigma_1^2 = \frac{1}{N_1}\left[\sum f_{1i} X_i^2 - \frac{(\sum f_{1i} X_i)^2}{N_1}\right]$
$= \frac{1}{180}\left[1926.25 - \frac{(581.50)^2}{180}\right]$
= 0.2649

∴ $\sigma_1 = \sqrt{0.2649} = 0.51$ tons
C.V1 = $\frac{\sigma_1}{\bar{X}_1} \times 100\% = \frac{0.51}{3.23} \times 100\% = 15.79\%$

For the second half of the year-
$\bar{X}_2 = \frac{1}{N_2}\sum f_{2i} X_i = \frac{593.00}{180} = 3.29$ tons
$\sigma_2^2 = \frac{1}{N_2}\left[\sum f_{2i} X_i^2 - \frac{(\sum f_{2i} X_i)^2}{N_2}\right]$
$= \frac{1}{180}\left[1987.25 - \frac{(593.00)^2}{180}\right]$
= 0.1869

∴ $\sigma_2 = \sqrt{0.1869} = 0.43$ tons
C.V2 = $\frac{\sigma_2}{\bar{X}_2} \times 100\% = \frac{0.43}{3.29} \times 100\% = 13.07\%$

The results show that in the second part of the year the production is, on an average, higher and it is
more homogeneous over days, since C.V2 < C.V1.

Chapter 06: Probability
Probability: If in any random experiment the n outcomes are exhaustive, mutually exclusive and
equally likely, and m of these are favorable to an event A, then the probability of A is defined by-
$P(A) = \frac{m}{n}$

Let $\bar{A}$ be the complementary event to A. The number of outcomes favorable to $\bar{A}$ is n - m. Then the probability
of $\bar{A}$ is given by-
$P(\bar{A}) = \frac{n - m}{n} = 1 - \frac{m}{n} = 1 - P(A)$

∴ $P(A) + P(\bar{A}) = 1$

***In a family there are 3 male and 2 female members. A family work is to be finished by any two
of them. Find the probability that-
a) The work will be finished by one male and one female member.
b) The work will be finished by either two males or two females.
Solve: (a) Let A be the event that the work will be finished by one male and one female member.
Since the work is to be finished by any two members, it can be finished in-
n = $^5C_2$ = 10 ways

The work can be finished by one male and one female member in-
m = $^3C_1 \times {}^2C_1$ = 6 ways
∴ $P(A) = \frac{m}{n} = \frac{6}{10} = 0.6$

(b) Let $\bar{A}$ be the event that the work will be finished by either two males or two females. Here $\bar{A}$ is the
complementary event of A, since the statement of $\bar{A}$ is the opposite of the statement of A.

∴ $P(\bar{A}) = 1 - P(A) = 1 - 0.6 = 0.4$

***Two unbiased dice are thrown once. Find the probability that
(i) Both dice show the same number,
(ii) The first die shows an even number,
(iii) Both dice show even numbers,
(iv) The sum of the upper faces of the dice is 8 or more,
(v) The sum of the upper faces of the dice is above 10,
(vi) The sum of the upper faces of the dice is less than 7,
(vii) The second die shows the number 5 or more.
Solution: The sample space of the experiment is-
S: { 11 21 31 41 51 61
     12 22 32 42 52 62
     13 23 33 43 53 63
     14 24 34 44 54 64
     15 25 35 45 55 65
     16 26 36 46 56 66 } ; n = 36
(i) Let A be the event that both dice show the same number. Favorable cases to A, m = 6
∴ $P(A) = \frac{m}{n} = \frac{6}{36} = \frac{1}{6}$

(ii) Let B be the event that the first die shows an even number. Favorable cases to B, m = 18
∴ $P(B) = \frac{m}{n} = \frac{18}{36} = \frac{1}{2}$

(iii) Let C be the event that both dice show even numbers. Favorable cases to C, m = 9
∴ $P(C) = \frac{m}{n} = \frac{9}{36} = \frac{1}{4}$

(iv) Let D be the event that the sum of the upper faces of the dice is 8 or more. Favorable cases to D, m = 15
∴ $P(D) = \frac{m}{n} = \frac{15}{36} = \frac{5}{12}$

(v) Let E be the event that the sum of the upper faces of the dice is above 10. Favorable cases to E, m = 3
∴ $P(E) = \frac{m}{n} = \frac{3}{36} = \frac{1}{12}$

(vi) Let F be the event that the sum of the upper faces of the dice is less than 7. Favorable cases to F, m = 15
∴ $P(F) = \frac{m}{n} = \frac{15}{36} = \frac{5}{12}$

(vii) Let G be the event that the second die shows the number 5 or more. Favorable cases to G, m = 12
∴ $P(G) = \frac{m}{n} = \frac{12}{36} = \frac{1}{3}$

***Three biased coins are tossed once. It is known that each coin shows head with probability $P(H) = \frac{2}{3}$
[$P(T) = \frac{1}{3}$]. Find the probability that-
(i) At least two heads will appear,
(ii) At most two heads will appear,
(iii) There will be no head,
(iv) There will be exactly two heads.
Solution: The sample space of the experiment is-
S: {HHH, HHT, HTT, HTH, TTH, THH, THT, TTT}
(i) Let A be the event that at least two heads will appear. Favorable points to
A: {HHH, HHT, HTH, THH}
∴ P(A) = P(HHH) + P(HHT) + P(HTH) + P(THH)
$= \frac{2}{3} \cdot \frac{2}{3} \cdot \frac{2}{3} + \frac{2}{3} \cdot \frac{2}{3} \cdot \frac{1}{3} + \frac{2}{3} \cdot \frac{1}{3} \cdot \frac{2}{3} + \frac{1}{3} \cdot \frac{2}{3} \cdot \frac{2}{3}$
$= \frac{20}{27}$
(ii) Let B be the event that at most two heads will appear. Then $\bar{B}$ is the event
showing 3 heads.
$P(\bar{B}) = P(HHH) = \frac{2}{3} \cdot \frac{2}{3} \cdot \frac{2}{3} = \frac{8}{27}$
∴ $P(B) = 1 - P(\bar{B}) = 1 - \frac{8}{27} = \frac{19}{27}$
(iii) Let C be the event that there will be no head. Favorable points to C: {TTT}
∴ $P(C) = P(TTT) = \frac{1}{3} \cdot \frac{1}{3} \cdot \frac{1}{3} = \frac{1}{27}$
(iv) Let D be the event that there will be exactly two heads. Favorable points to D: {HHT, HTH, THH}
∴ P(D) = P(HHT) + P(HTH) + P(THH)
$= \frac{2}{3} \cdot \frac{2}{3} \cdot \frac{1}{3} + \frac{2}{3} \cdot \frac{1}{3} \cdot \frac{2}{3} + \frac{1}{3} \cdot \frac{2}{3} \cdot \frac{2}{3}$
$= \frac{12}{27} = \frac{4}{9}$
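
Because the coins are biased, the sample points are not equally likely, so each point must be weighted by its own probability. The sketch below enumerates the 8 sample points and sums the weights of the favorable ones; the names FACE_PROB, point_probability and prob are illustrative, and the tosses are assumed to be independent.

```python
from fractions import Fraction
from itertools import product

FACE_PROB = {"H": Fraction(2, 3), "T": Fraction(1, 3)}

def point_probability(point):
    # Probability of one sample point, e.g. ("H", "T", "H"), assuming independent tosses
    p = Fraction(1)
    for face in point:
        p *= FACE_PROB[face]
    return p

def prob(event):
    # Sum the probabilities of all sample points favorable to the event
    return sum((point_probability(pt) for pt in product("HT", repeat=3) if event(pt)),
               Fraction(0))

p_at_least_two = prob(lambda pt: pt.count("H") >= 2)
p_at_most_two = prob(lambda pt: pt.count("H") <= 2)
p_no_head = prob(lambda pt: pt.count("H") == 0)
p_exactly_two = prob(lambda pt: pt.count("H") == 2)
print(p_at_least_two, p_at_most_two, p_no_head, p_exactly_two)
```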

Permutation: It is a technique to arrange r objects taken together from n distinct objects. The
total number of such arrangements is denoted by $^nP_r$,
where
$^nP_r = \frac{n!}{(n - r)!}$

If n objects are arranged taking all n together, the total number of arrangements is
$^nP_n = \frac{n!}{(n - n)!} = \frac{n!}{0!} = n(n - 1)(n - 2)(n - 3) \cdots 3 \cdot 2 \cdot 1$

***In a box there are 10 books. The books are to be arranged in 2 shelves each of which can
contain 5 books. Find the number of arrangements of the books.
Solution: Let us consider that the books are numbered by 1, 2, ………..., 10. We need to find the
value of $^{10}P_5$, where
$^{10}P_5 = \frac{10!}{(10 - 5)!} = 30240$

The other rules of permutation are

Rule: The number of different permutations of n different objects, taken r at a time with repetition,
is $n^r$.

In particular, taking all n at a time with repetition gives $n^n$ permutations.

***In a box there are 3 balls numbered 1, 2, 3. Find the number of arrangements of balls-
(i) Taken 2 together with repetition,
(ii) Taken 3 at a time with repetition.
Solution: (i) Given n = 3, r = 2. The number of arrangements with repetition is
$n^r = 3^2 = 9$

These arrangements are-
11, 12, 13, 21, 22, 23, 31, 32, 33
(ii) Again, with n = 3 and r = 3, the number of arrangements with repetition is
$n^n = 3^3 = 27$

The arrangements are-
111, 112, 113, 121, 131, 122, 133, 123, 132
211, 212, 213, 221, 231, 222, 233, 223, 232
311, 312, 313, 321, 331, 322, 333, 323, 332

***Find the number of arrangements of the letters in the word 'STATISTICS' taken all together.

Solution: Given n = 10, n1 = 3 (S), n2 = 3 (T), n3 = 2 (I), n4 = 1 (A), n5 = 1 (C).
The number of arrangements of the letters is-
$\frac{n!}{n_1!\, n_2!\, n_3!\, n_4!\, n_5!} = \frac{10!}{3!\,3!\,2!\,1!\,1!} = 50400$

***In an office there are 5 chairs with handles and 5 others without handles. In how many ways
can these chairs be arranged for sitting?
Solution: Given, n = 5 + 5 = 10 objects, n1 = 5, n2 = 5
The number of arrangements of these chairs is-
$\frac{n!}{n_1!\, n_2!} = \frac{10!}{5!\,5!} = 252$
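
The rule for arranging objects that are not all distinct, n!/(n1! n2! ...), can be sketched as below. The function name is hypothetical; the string "HHHHHNNNNN" is just an illustrative encoding of 5 chairs with handles (H) and 5 without (N).

```python
from collections import Counter
from math import factorial

def arrangements_with_repeats(items):
    # n! divided by the factorial of the count of each repeated item
    count = factorial(len(items))
    for repeats in Counter(items).values():
        count //= factorial(repeats)
    return count

word_arrangements = arrangements_with_repeats("STATISTICS")
chair_arrangements = arrangements_with_repeats("HHHHHNNNNN")
print(word_arrangements, chair_arrangements)
```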
𝑛1 ! 𝑛2 !

Combination: It is a technique to group n distinct objects taken r (r ≤ n) at a time.

The number of combinations of n objects taken r at a time is given by-
$^nC_r = \frac{n!}{r!\,(n - r)!}$

In this technique the order of arrangement of the objects is not considered. Thus it is
different from permutation.

***In an industry there are 4 engineers, 2 technicians and 3 machine operators. A committee of
3 is to be formed to run the machines of the industry efficiently. In how many ways can the
committee be formed?
Solution: Given n = 4 + 2 + 3 = 9. A committee of r = 3 members is to be formed.
This can be done in-
$^nC_r = {}^9C_3 = \frac{9!}{3!\,(9 - 3)!} = 84$ ways

***In a box there are 3 red balls, 2 white balls and 3 black balls.
(i) In how many ways can 3 balls be drawn from the box?
(ii) In how many ways can one ball of each color be drawn?
Solution: Total number of balls in the box is n = 8
(i) Three balls from 8 balls can be drawn in-
$^nC_r = {}^8C_3 = \frac{8!}{3!\,(8 - 3)!} = 56$ ways.

(ii) Drawing one ball of each color means drawing 1 red, 1 white and 1 black ball from the box.
The number of ways to draw one ball of each color is-
$^3C_1 \times {}^2C_1 \times {}^3C_1 = 3 \times 2 \times 3 = 18$

***In a packet there are 6 books, three of which are on mathematics and 3 are on statistics. Two
books are taken at random. Find the probability that-
(i) The drawn books are on mathematics,
(ii) The drawn books are on statistics,
(iii) One of the drawn books is on mathematics and the other one is on statistics.
Solution: Two books can be drawn in n = $^6C_2$ = 15 ways.

(i) Let A be the event that the drawn books are on mathematics. Two mathematics books can be
drawn from 3 mathematics books in m = $^3C_2$ = 3 ways.
∴ $P(A) = \frac{m}{n} = \frac{3}{15} = \frac{1}{5}$

(ii) Let B be the event that the drawn books are on statistics. Two statistics books can be drawn
from 3 statistics books in m = $^3C_2$ = 3 ways.
∴ $P(B) = \frac{m}{n} = \frac{3}{15} = \frac{1}{5}$

(iii) Let C be the event that one of the drawn books is on mathematics and the other one is on
statistics. The event C can occur in m = $^3C_1 \times {}^3C_1$ = 9 ways.
∴ $P(C) = \frac{m}{n} = \frac{9}{15} = \frac{3}{5}$

***From a pack of 52 cards two cards are drawn at random. Find the probability that the cards
are-
(i) Aces.
(ii) Kings,
(iii) Spades,
(iv) One spade and one club,
(v) Of same color,
(vi) Of same number.
Solution: Two cards from 52 cards can be taken in n = 52𝐶2 = 1326ways.

(i) Let A be the event that the two cards are aces.
There are 4 aces. Two aces can be taken from 4 aces in m = 4𝐶2 = 6ways.
𝑚 6 1
∴ P(A) = = = 221
𝑛 1326

48 | P a g e
(ii) Let B be the event that the cards are kings.
There are 4 kings. Two kings can be taken in m = 4𝐶2 = 6ways.
𝑚 6 1
∴ P(B) = = =
𝑛 1326 221

(iii) Let C be the event that the cards are spades.
There are 13 spades. Two spades can be taken in $m = {}^{13}C_{2} = 78$ ways.
∴ $P(C) = \frac{m}{n} = \frac{78}{1326} = \frac{1}{17}$

(iv) Let D be the event that one card is a spade and the other is a club.
There are 13 spades and 13 clubs. One spade and one club can be taken in
$m = {}^{13}C_{1} \times {}^{13}C_{1} = 169$ ways.
∴ $P(D) = \frac{m}{n} = \frac{169}{1326} = \frac{13}{102}$

(v) Let E be the event that the cards are of the same color.
There are 26 black cards and 26 red cards. Two black cards can be drawn in ${}^{26}C_{2} = 325$ ways.
Similarly, two red cards can be drawn in 325 ways.
The number of cases favorable to E is $m = 325 + 325 = 650$.
∴ $P(E) = \frac{m}{n} = \frac{650}{1326} = \frac{325}{663} = \frac{25}{51}$

(vi) Let F be the event that the cards are of the same number.
Each suit has 13 cards, one of each number, so each number appears on 4 cards (one per suit). Two
cards of any one number can be drawn in ${}^{4}C_{2} = 6$ ways. Since there are 13 numbers, two
cards of the same number can be drawn in $m = 13 \times 6 = 78$ ways.
∴ $P(F) = \frac{m}{n} = \frac{78}{1326} = \frac{1}{17}$
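
Because there are only 1326 possible pairs, all six answers can also be checked by brute force. The following Python sketch builds a deck and counts the favorable pairs directly; the deck representation and the helper `prob` are illustrative choices, not part of the original notes.

```python
from fractions import Fraction
from itertools import combinations

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spade", "club", "heart", "diamond"]
black = {"spade", "club"}

deck = [(rank, suit) for rank in ranks for suit in suits]
pairs = list(combinations(deck, 2))   # every unordered pair of cards


def prob(event):
    """Exact probability that a random pair of cards satisfies `event`."""
    favorable = sum(1 for a, b in pairs if event(a, b))
    return Fraction(favorable, len(pairs))


p_aces = prob(lambda a, b: a[0] == "A" and b[0] == "A")
p_kings = prob(lambda a, b: a[0] == "K" and b[0] == "K")
p_spades = prob(lambda a, b: a[1] == "spade" and b[1] == "spade")
p_spade_club = prob(lambda a, b: {a[1], b[1]} == {"spade", "club"})
p_same_color = prob(lambda a, b: (a[1] in black) == (b[1] in black))
p_same_number = prob(lambda a, b: a[0] == b[0])

print(p_aces, p_kings, p_spades, p_spade_club, p_same_color, p_same_number)
# 1/221 1/221 1/17 13/102 25/51 1/17
```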

***In a family 3 babies are born. The birth of a boy or a girl is equiprobable. Find the probability
that-
(i) All three babies are boys,
(ii) There are exactly two boys,
(iii) There are at least two boys,
(iv) There are at most two boys.
Solution: The births of 3 babies can occur in $n = 2^{3} = 8$ ways.

(i) Let A be the event that all 3 babies are boys. Out of 3 babies, all 3 can be boys in $m = {}^{3}C_{3} = 1$ way.
∴ $P(A) = \frac{m}{n} = \frac{1}{8}$

(ii) Let B be the event that there are exactly 2 boys. Out of 3 babies, 2 can be boys in
$m = {}^{3}C_{2} = 3$ ways.
∴ $P(B) = \frac{m}{n} = \frac{3}{8}$

(iii) Let C be the event that there are at least 2 boys. The number of boys is either 2 or 3. Two boys
can be born in ${}^{3}C_{2} = 3$ ways and 3 boys can be born in ${}^{3}C_{3} = 1$ way. Therefore, $m = 3 + 1 = 4$.
∴ $P(C) = \frac{m}{n} = \frac{4}{8} = \frac{1}{2}$

(iv) Let D be the event that there are at most 2 boys. The number of boys is then 0, 1 or 2,
but not 3. The complementary event $\bar{D}$ is that there are 3 boys.
$P(\bar{D}) = \frac{1}{8}$ [from (i)]
∴ $P(D) = 1 - P(\bar{D}) = 1 - \frac{1}{8} = \frac{7}{8}$
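
With only 8 equally likely outcomes, the whole sample space can be listed and counted. The Python sketch below does this with `itertools.product`; the helper `prob` and the use of "B" and "G" for boy and girl are illustrative choices.

```python
from fractions import Fraction
from itertools import product

# Every birth order of 3 babies, e.g. ('B', 'G', 'B'); all 8 are equally likely.
outcomes = list(product("BG", repeat=3))


def prob(condition):
    """Exact probability that the number of boys satisfies `condition`."""
    favorable = [o for o in outcomes if condition(o.count("B"))]
    return Fraction(len(favorable), len(outcomes))


p_all_boys = prob(lambda boys: boys == 3)
p_exactly_two = prob(lambda boys: boys == 2)
p_at_least_two = prob(lambda boys: boys >= 2)
p_at_most_two = prob(lambda boys: boys <= 2)

print(p_all_boys, p_exactly_two, p_at_least_two, p_at_most_two)  # 1/8 3/8 1/2 7/8
```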

***An urn contains 4 red and 3 white balls. Two balls are drawn one after another (a) with
replacement, (b) without replacement. Find the probability that-
(1) Both balls are white,
(2) One ball is white and the other is red.
Solution: The sample space for drawing two balls one after another is S = {WW, WR, RW, RR},
where R = red ball, W = white ball.
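(1) Let A be the event that both balls are white. Favorable outcome for A: {WW}
(a) $P(A) = P(WW) = \frac{{}^{3}C_{1}}{{}^{7}C_{1}} \cdot \frac{{}^{3}C_{1}}{{}^{7}C_{1}} = \frac{9}{49}$

(b) $P(A) = P(WW) = \frac{{}^{3}C_{1}}{{}^{7}C_{1}} \cdot \frac{{}^{2}C_{1}}{{}^{6}C_{1}} = \frac{6}{42} = \frac{1}{7}$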

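(2) Let B be the event that one ball is white and the other is red. Favorable outcomes for B: {WR, RW}.
(a) $P(B) = P(RW) + P(WR) = \frac{{}^{4}C_{1}}{{}^{7}C_{1}} \cdot \frac{{}^{3}C_{1}}{{}^{7}C_{1}} + \frac{{}^{3}C_{1}}{{}^{7}C_{1}} \cdot \frac{{}^{4}C_{1}}{{}^{7}C_{1}} = \frac{12}{49} + \frac{12}{49} = \frac{24}{49}$

(b) $P(B) = P(RW) + P(WR) = \frac{{}^{4}C_{1}}{{}^{7}C_{1}} \cdot \frac{{}^{3}C_{1}}{{}^{6}C_{1}} + \frac{{}^{3}C_{1}}{{}^{7}C_{1}} \cdot \frac{{}^{4}C_{1}}{{}^{6}C_{1}} = \frac{12}{42} + \frac{12}{42} = \frac{24}{42} = \frac{4}{7}$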
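
The only difference between cases (a) and (b) is whether the urn is restored after the first draw. The Python sketch below makes that explicit by multiplying the draw-by-draw probabilities; the function `p_sequence` and the `results` dictionary are hypothetical names introduced for illustration.

```python
from fractions import Fraction


def p_sequence(colors, replace):
    """Probability of drawing the given color sequence, e.g. "RW"."""
    counts = {"R": 4, "W": 3}           # urn contents from the problem
    remaining = sum(counts.values())
    p = Fraction(1)
    for color in colors:
        p *= Fraction(counts[color], remaining)
        if not replace:                  # without replacement the urn shrinks
            counts[color] -= 1
            remaining -= 1
    return p


results = {}
for replace in (True, False):
    p_both_white = p_sequence("WW", replace)
    p_one_each = p_sequence("RW", replace) + p_sequence("WR", replace)
    results[replace] = (p_both_white, p_one_each)
    print("with" if replace else "without", "replacement:", p_both_white, p_one_each)
# with replacement: 9/49 24/49
# without replacement: 1/7 4/7
```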

***An urn contains 6 red and 4 black balls. Three balls are taken at random from the urn. Find
the probability that-
(a) All three are red,
(b) Exactly two balls are red,
(c) Exactly one ball is red.
Solution: The urn contains 6 + 4 = 10 balls. Three balls can be taken from the urn in $n = {}^{10}C_{3} = 120$ ways.

(a) Let A be the event that all 3 balls are red. Three red balls can be drawn in $m = {}^{6}C_{3} = 20$ ways.
∴ $P(A) = \frac{m}{n} = \frac{20}{120} = \frac{1}{6}$

(b) Let B be the event that two balls are red and 1 ball is black. Two red balls and 1 black ball can be
drawn in $m = {}^{6}C_{2} \times {}^{4}C_{1} = 60$ ways.
∴ $P(B) = \frac{m}{n} = \frac{60}{120} = \frac{1}{2}$

(c) Let C be the event that one ball is red and the other two are black. One red and 2 black balls can be
drawn in $m = {}^{6}C_{1} \times {}^{4}C_{2} = 36$ ways.
∴ $P(C) = \frac{m}{n} = \frac{36}{120} = \frac{3}{10}$
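
All three parts follow one pattern: choose k red balls from 6 and the remaining 3 - k from the 4 black balls. The Python sketch below wraps that pattern in a small function; `p_red` and its parameter names are illustrative, not from the original notes.

```python
from fractions import Fraction
from math import comb


def p_red(k, red=6, black=4, drawn=3):
    """Probability of exactly k red balls when `drawn` balls are taken."""
    favorable = comb(red, k) * comb(black, drawn - k)
    return Fraction(favorable, comb(red + black, drawn))


probs = {k: p_red(k) for k in range(4)}
print(probs[3], probs[2], probs[1])  # 1/6 1/2 3/10
```

Including the case of no red ball (k = 0, probability 1/30), the four probabilities sum to 1, which confirms that no case has been missed or counted twice.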

***The letters of the word 'MATHEMATICS' are arranged at random. Find the probability that the
vowels occupy only odd positions.
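Solution: The word 'MATHEMATICS' has 11 letters, in which M, A and T each occur twice. The number
of distinct arrangements is
$n = \frac{11!}{2!\,2!\,2!\,1!\,1!\,1!\,1!\,1!} = 4989600$

Of the 11 letters, 4 are vowels (A, A, E, I). These 4 vowels must occupy 4 of the 6 odd positions
(1st, 3rd, 5th, 7th, 9th and 11th), and the positions can be chosen in ${}^{6}C_{4} = 15$ ways.

Since A occurs twice, the 4 vowels can be arranged among themselves in $\frac{4!}{2!} = 12$ ways. The
remaining 7 consonants (M, M, T, T, H, C, S) can be arranged among themselves in
$\frac{7!}{2!\,2!} = 1260$ ways.
Therefore, the 4 vowels can be placed in odd positions only in
$m = 15 \times 12 \times 1260 = 226800$ ways.
Hence the required probability is
$\frac{m}{n} = \frac{226800}{4989600} = \frac{1}{22} \approx 0.0455$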
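
As a cross-check, the count can be redone in Python with repeated letters treated as indistinguishable in both the total and the favorable arrangements. This is a sketch under that assumption; the helper `distinct_arrangements` is an illustrative name.

```python
from fractions import Fraction
from math import comb, factorial

word = "MATHEMATICS"
vowels = [ch for ch in word if ch in "AEIOU"]          # A, E, A, I
consonants = [ch for ch in word if ch not in "AEIOU"]  # M, T, H, M, T, C, S
odd_positions = (len(word) + 1) // 2                   # positions 1, 3, ..., 11


def distinct_arrangements(letters):
    """Number of distinct orderings of a list with repeated letters."""
    total = factorial(len(letters))
    for ch in set(letters):
        total //= factorial(letters.count(ch))
    return total


n = distinct_arrangements(list(word))
m = (comb(odd_positions, len(vowels))
     * distinct_arrangements(vowels)
     * distinct_arrangements(consonants))
p = Fraction(m, n)

print(n, m, p)  # 4989600 226800 1/22
```

The same value follows from positions alone: the vowels take 4 of the 11 places, and 15 of the 330 possible sets of 4 places lie entirely in odd positions, giving 15/330 = 1/22.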
