IME602 Notes 01

The document discusses various statistical concepts and methods for collecting, organizing, and analyzing numeric data. It provides examples of situations that involve uncertainty, such as predicting newspaper sales or machine failure rates. It then outlines the main steps in statistical analysis: data collection, data scrutiny, and data presentation through tables, diagrams, histograms, or other pictorial representations. The goal is to understand a given phenomenon and make informed decisions using appropriate statistical methodology.

Page 1 of 65
CHAPTER 1
INTRODUCTION TO DATA COLLECTION, ANALYSIS AND INTERPRETATION

Think for a moment that you are a newspaper vendor in the city of Kanpur selling "Dainik Jagaran".
You get your supplies from the publisher every day in the morning and you sell your papers to
customers inside the campus of IIT Kanpur. Your procurement or purchase cost is Rs. 2.00 per paper
and you sell at Rs. 2.50 per paper. What you cannot sell has to be disposed of to the kabariwala for
Rs. 0.25. The decision you wish to make relates to the number of copies you should procure every
morning so that your profit is maximized. This would hardly be a problem if you could exactly
predict the number of copies you will be able to sell during the day. But in reality there is always
uncertainty about the daily demand for newspapers.
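The vendor's decision can be sketched numerically. The sketch below assumes a hypothetical demand distribution (the text gives only the costs), so the numbers are illustrative, not part of the original example:

```python
# Newspaper vendor's expected profit, assuming a HYPOTHETICAL demand
# distribution (the chapter specifies only the costs).
# Cost: Rs. 2.00 per paper, selling price: Rs. 2.50, salvage: Rs. 0.25.

def expected_profit(order_qty, demand_probs):
    """Expected daily profit for a given order quantity.

    demand_probs maps each possible demand level to its probability.
    """
    cost, price, salvage = 2.00, 2.50, 0.25
    total = 0.0
    for demand, p in demand_probs.items():
        sold = min(order_qty, demand)          # can sell at most the demand
        unsold = order_qty - sold              # leftover goes to the kabariwala
        profit = sold * price + unsold * salvage - order_qty * cost
        total += p * profit
    return total

# Hypothetical demand: 100, 110 or 120 copies, each with probability 1/3.
demand = {100: 1 / 3, 110: 1 / 3, 120: 1 / 3}
best = max(range(90, 131), key=lambda q: expected_profit(q, demand))
```

With these illustrative numbers the low salvage value makes over-ordering expensive, so the best order quantity turns out to be the lowest demand level.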
Let us consider another example, where you are functioning as the production manager in a certain
manufacturing unit. A few machines in the production line are liable to fail from time to time, and
you as the production manager have to replace the defective part(s) of the machine(s) to restart the
production process. If you knew beforehand the failure profile of all the machines, then planning the
inventory of machine spare parts would never be a problem. But as failure rates of machines are
uncertain, you have to strike a fine balance and decide whether to stock up an inventory of spare
parts (which may be quite costly) or to wait for the spare part(s) until a failure occurs, thereby losing
valuable production and labour hours.
Yet a third example can be constructed where, for the sake of illustration, you are the marketing
manager of a firm selling detergent powder. You are interested in finding out the response to a new
product your company has started selling in the market. Depending on the feedback of the
consumers, you would recommend whether any packaging changes in shape, size, etc. are needed.
But here also you are not sure how many customers you should talk to before you send your
recommendation to your boss. Surveying fewer customers would not give you a true picture of the
market, but it would definitely be less costly for the company. On the other hand, increasing the
number of customers from whom you gather data would definitely give you a much better idea of
their requirements, but at a higher cost for the marketing survey. Here also we face uncertainty with
respect to the number of customers we would like to survey.
Thus we see that in all spheres of management we face situations where there are uncertainties, and
the study of such uncertainty can be undertaken if we are aware of how to tackle these different types
of uncertainty. Statistics provides appropriate methodology to collect, organize and analyze numeric
data so as to understand the given phenomenon for the purpose of making some decision.
STATISTICS: The word STATISTICS is derived from the Italian word stato, which means state, and
statista refers to a person involved with the affairs of the state. Nowadays, STATISTICS (in the plural
sense) is the study of qualitative and quantitative data from our surroundings, be it the environment or
any system, so as to draw meaningful conclusions about that environment or system. It also means (in
the singular sense) the body of methods that are meant for the treatment of such data.
In statistics we must express facts in numerical terms, and there are various methods through which we
can do this. Before we attempt to discuss the various terms we must be sure what the main steps or
methods employed by statistics are. The main steps which constitute the study of statistics are:

1) Method of collection of data (primary or secondary)
2) Scrutiny of data
3) Presentation of data (non frequency data, frequency data)
i) Non-frequency data: Consider the case where the values of one or more variables, like the
population of India, the price of petroleum, etc., may be given for different periods of time. For
instance, we may be interested in knowing the population change over time or the change in
production of petroleum over time. Data of this type are called time series data or historical data.
Alternatively, values of one or more variables may be given for different individuals in a group
for the same period of time. Instead of considering the group as such, we may be more interested
in studying the way the values of the variable(s) change from individual to individual in that
group. Data of this type are called spatial series data.

Time series representation of data



Spatial series representation of data


ii) Frequency data
In this case we shall have the data again on one or more variables for different individuals
maybe for different periods of time or for different points. But now we are more interested in
the characteristic(s) of the group rather than the individuals in that group. In studying the IQ
level of students in a school we may be interested in such group characteristics as the
percentage of students with IQ higher than 130 or the percentage of students with average IQ
less than 90, etc.
[Figure: Line chart of BSE30 (Close), 3-Jan-94 to 24-Jun-94; x-axis: Time, y-axis: Value in Rs.]
[Figure: Bar chart of fertiliser consumption (in tonnes) for selected Indian states, 1999-2000; states from Andhra Pradesh to Assam on the x-axis, fertiliser consumption on the y-axis]


a) Tabular representation: In this representation we present the data by means of tables. The
tables should have a number of parts, like title, stub, caption, body, footnote.

India at a glance
Structure of the economy (% of GDP)   1983   1993   2002   2003
Agriculture                           36.6   31.0   22.7   22.2
Industry                              25.8   26.3   26.6   26.6
Manufacturing                         16.3   16.1   15.6   15.8
Services                              37.6   42.8   50.7   51.2
Private consumption                   71.8   37.4   65.0   64.9
Central government consumption        10.6   11.4   12.5   12.8
Import of goods and services           8.1   10.0   15.6   16.0
Gross domestic savings                17.6   22.5   24.2   22.2
Interest payments                      0.4    1.3    0.7   18.3
Note: 2003 refers to 2003-2004; data are preliminary. Gross domestic savings figures are taken
directly from India's central statistical organization.

b) Diagrammatic representation: This is most commonly used for representing time series. The
line diagram is a graph showing the relationship of the given variable with time. There may be
three types of line diagram, for the scales used for the two co-ordinate axes may both be
arithmetic (or natural) scales, or one of them may be arithmetic and the other logarithmic, or
both may be logarithmic. A line diagram where the vertical scale is logarithmic but the
horizontal scale is of the ordinary arithmetic type is called a ratio chart or semi-logarithmic
chart. When both the vertical and the horizontal axes are logarithmic, the chart is called a
doubly-logarithmic chart.
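The defining property of the ratio chart can be checked numerically: on a logarithmic vertical scale, a series growing at a constant percentage rate plots as a straight line. A minimal sketch (the 10% growth series is an invented illustration):

```python
import math

# A ratio chart (semi-logarithmic chart) plots log(value) against time.
# Equal vertical distances then represent equal RATIOS, so a series
# growing at a constant percentage rate appears as a straight line.
years = list(range(6))
values = [100 * (1.10 ** t) for t in years]   # 10% growth per period

log_values = [math.log10(v) for v in values]
# Successive differences on the log scale (rounded to absorb float noise):
steps = [round(log_values[t + 1] - log_values[t], 9) for t in range(5)]
# All steps are equal, i.e. the plotted line is straight on the log scale.
```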

c) Line diagram representation
d) Bar diagram (histogram) representation

Bar diagram (histogram) representation



Bar diagram (histogram) representation


Bar diagram (histogram) representation (with intervals)


[Figure: Column chart of World Population (projected mid 2004) for 1950-2000; x-axis: Year, y-axis: Population]
[Figure: Bar chart of the same World Population data with Year on the vertical axis and Population on the horizontal axis]
[Figure: Histogram of the number of countries by GDP in 1000 US$ (year 2002), with class intervals 10 to 15, 15 to 20, ..., 45 to 50]

Bar diagram (histogram) representation


Bar diagram (histogram) representation


e) Pictorial diagram representation
In this type we can represent the data more vividly, and in many cases it is a popular method of
representing the data. Here a suitable symbol is first chosen to represent a definite
number/quantity of units of the variable. Against each data point or observation the symbol is
repeated proportionally, so that we get an idea of the quantity of the variable at that data point.


[Figure: Column chart of Height (in cms) and Weight (in kgs) for the individuals Ram, Shyam, Rahim, Praveen, Saikat, Govind and Alan]
[Figure: Bar chart of the same Height/Weight data with Individual on the vertical axis]

State in USA   Pictorial representation of number of universities (each [ symbol represents 5 universities)
Alabama        [ [ [ [ [ [
Alaska         [
Arizona        [ [ [
Arkansas       [ [ [ [
Colorado       [ [ [ [ [
Connecticut    [ [ [ [ [
Delaware       [
Kansas         [ [ [ [ [ [

f) Statistical map representation: Suppose, for example, we are interested in showing
diagrammatically the regional seismicity in Alaska for earthquakes of all magnitudes reported
between 01/01/1960 and 11/09/2002. The colour code shown indicates the depth of the event:
blue: 0 < h <= 33 km, green: 33 < h <= 75 km, red: 75 < h <= 125 km and yellow: h > 125 km.
The larger circles are earthquakes of M 7.0 and higher from 1900 to 11/09/2002. The colour of a
circle indicates the depth of the event, as above. The star indicates the location of the 03/11/2002
event.




g) Divided bar diagram representation: Consider that we have the time spent in hours by a
student appearing for the CBSE examination in preparing for Mathematics, Physics, Chemistry
and Biology. We collect the student's preparation pattern for a five-day period and want to
represent the data thus obtained. In that case we would use the divided bar diagram as illustrated
below.

Divided bar diagram


h) Stacked column diagram representation: The method of depicting the data is almost similar
to the divided bar diagram representation, except that here we represent the percentage-wise
figures for the variables at each data point. Consider that we are finding the consumption in
rupees for the four main categories of food for a family in the months of January to June.
Remembering that the total amount spent in each month is different, we can depict the
percentage-wise consumption of food for the four categories.
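The percentage-wise figures behind a stacked column diagram can be computed as below; the monthly spends are hypothetical numbers, not taken from the text:

```python
# Converting absolute monthly spends (Rs.) into the percentage-wise figures
# plotted in a stacked column diagram. The spend values are HYPOTHETICAL.
spend = {
    "January":  {"Rice": 500, "Wheat": 300, "Vegetables": 150, "Cereals": 50},
    "February": {"Rice": 450, "Wheat": 350, "Vegetables": 120, "Cereals": 80},
}

def to_percentages(month_spend):
    """Convert absolute spends to percentages of the month's total."""
    total = sum(month_spend.values())
    return {cat: 100.0 * amt / total for cat, amt in month_spend.items()}

shares = {month: to_percentages(s) for month, s in spend.items()}
# Each month's percentages sum to 100, whatever the month's total spend was,
# which is exactly why the stacked columns are all the same height.
```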

Stacked column diagram
[Figure: Divided bar diagram of time spent (in hours, 0 to 8) in preparation for Mathematics, Physics, Chemistry and Biology, for Monday to Friday]



i) Pie diagram/chart representation: When the values of a variable are given for a number of
categories, as in spatial series, we may be interested in a comparison of the categories or in the
contribution of each category to the total. Here the proportions or percentages of the various
categories, rather than their absolute values, will be the principal subject of study.



j) Textual representation: In textual representation of data we depict the information through
text. We will give an example to make this point clear. Consider that for the year 2004-2005 we
know the number of post graduate students who have registered in different engineering courses
at IIT Kanpur. The figures are 83 in Aerospace, 88 in Chemical, 139 in Civil, 222 in Electrical,
176 in Mechanical and 115 in Computer Science. Given this data we may be required to utilize
this information to answer some queries.
[Figure: Stacked column diagram of percentage-wise consumption of food (Rice, Wheat, Vegetables, Cereals) for January to June]
[Figure: Pie chart of median marks in JMET (2003) for the Verbal, Quantitative, Analytical and Data Interpretation sections]


k) Stem and leaf representation: The stem and leaf representation is a quick way of looking at a
data set. It contains the information of a histogram but avoids the loss of information that
results from aggregating the data into intervals. The stem and leaf display is based on the
tallying principle and also uses the decimal base of our number system. In the stem and leaf
representation, the stem is the number without its rightmost digit (the leaf). The stem is written
to the left of a vertical line separating the stem from the leaves. Suppose we have the numbers
105, 106, 107, 109, 100, 108. Then using the stem and leaf representation we would depict the
numbers as 10 | 567908.
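The stem-and-leaf construction described above can be sketched in a few lines; `stem_and_leaf` is an illustrative helper, not a standard library function:

```python
from collections import defaultdict

# Minimal stem-and-leaf display: the stem is the number without its
# rightmost digit; the leaf is that rightmost digit.
def stem_and_leaf(numbers):
    stems = defaultdict(list)
    for n in numbers:
        stems[n // 10].append(n % 10)   # split into stem and leaf
    # Join the leaves of each stem in the order the data arrived.
    return {stem: "".join(str(d) for d in leaves)
            for stem, leaves in sorted(stems.items())}

# The chapter's example: 105, 106, 107, 109, 100, 108  ->  10 | 567908
display = stem_and_leaf([105, 106, 107, 109, 100, 108])
```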

l) Box plot representation: The box plot is also called the box and whisker plot. A box plot is a set of
five summary measures of the distribution of the data: the median, lower quartile, upper
quartile, smallest observation and largest observation.

Box plot representation



[Figure: Box plot showing the box from LQ to UQ with the median inside, and whiskers extending to X and Y]

Here:
UQ - LQ = Inter-quartile range (IQR)
X = Smallest observation within 1.5(IQR) of LQ
Y = Largest observation within 1.5(IQR) of UQ
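The five summary measures and the 1.5(IQR) fences can be computed as in the sketch below, applied to the thirty family sizes tabulated later in the chapter. Tukey's hinges are used for the quartiles; other quartile conventions give slightly different values:

```python
import statistics

# Five summary measures behind a box plot, using Tukey's hinges for the
# quartiles (median of each half of the sorted data).
def five_number_summary(data):
    s = sorted(data)
    n = len(s)
    half = n // 2
    lower = s[:half]
    upper = s[half + (n % 2):]   # drop the median itself when n is odd
    return {"min": s[0],
            "LQ": statistics.median(lower),
            "median": statistics.median(s),
            "UQ": statistics.median(upper),
            "max": s[-1]}

def whisker_fences(summary):
    """The 1.5(IQR) fences on either side of the box (limits for X and Y)."""
    iqr = summary["UQ"] - summary["LQ"]
    return summary["LQ"] - 1.5 * iqr, summary["UQ"] + 1.5 * iqr

# The thirty family sizes used in the frequency-table example below.
fns = five_number_summary([2, 6, 3, 4, 4, 5, 3, 6, 4, 4, 5, 3, 2, 3, 6,
                           5, 4, 4, 4, 3, 2, 4, 5, 6, 7, 4, 4, 5, 3, 3])
```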

4) Analysis of data through statistical models/methods
5) Conclusions from results obtained
6) Modification of statistical models/methods depending on answers obtained

Definition
1) A quantitative variable can be described by a number for which arithmetic operations such as
averaging make sense.
2) A qualitative (or categorical) variable simply records a quality, e.g., good, bad, right, wrong,
etc.
As already discussed, statistics deals with measurements, some being qualitative and others
quantitative. The measurements are the actual numerical values of a variable. (Qualitative variables
could be described by numbers, although such a description might be arbitrary, e.g., good = 1, bad =
0, right = 1, wrong = 0, etc.)
Generally there are four scales of measurement which are:
1) Nominal scale: In this scale numbers are used simply as labels for groups or classes. If we are
dealing with a data set which consists of the colours blue, red, green and yellow, then we can
designate blue = 3, red = 4, green = 5 and yellow = 6. The numbers merely stand for the category
to which a data point belongs; nothing is sacrosanct about the number assigned to each category.
This scale is used for qualitative rather than quantitative data.
2) Ordinal Scale: In this scale of measurement, data elements may be ordered according to relative
size or quality. For example a customer or a buyer can rank a particular characteristics of a car as
good, average, bad and while doing so he/she can assign some numeric value which may be as
follows, characteristic good = 10, average = 5 and bad = 0.
3) Interval Scale: In the interval scale, differences between measurements are meaningful, so we
can specify intervals of equal width for the characteristic we are measuring and assign each data
point to a particular interval. Consider that we are measuring the ages of school-going students
between classes 5 and 12 in the city of Kanpur. We may form the intervals 10-12 years, 12-14
years, ....., 18-20 years. Now when we have a data point, i.e., the age of a student, we put it under
one particular interval; e.g., if the student's age is 11 years, we immediately put it under the
interval 10-12 years.
4) Ratio Scale: If two measurements are in ratio scale, then we can take ratios of measurements. The
ratio scale represents the reading for each recorded data in a way which enables us to take a ratio
of the readings in order to depict it either pictorially or in figures. Examples of ratio scale are
measurements of weight, height, area, length etc.

Population: Consists of the set of all measurements in which the investigator is interested. An example
can be all the students in the city of Kanpur. The population is also called the universe.
Sample: A subset of measurements selected from the population. Sampling from the population is
often done randomly, such that every possible sample of n elements has an equal chance of being
selected. A sample selected in this way is called a simple random sample or just a random sample. An
example can be the students of the Kendriya Vidyalaya inside the IIT Kanpur campus.

Tally number: By tally number we mean the tally we give depending upon the number of times that
particular value of the variable occurs in the total universe or sample.
Frequency (absolute frequency): By frequency (absolute frequency) we mean the number of data
points which fall within a given class, or for a given value, in a frequency distribution. It denotes the
number of occurrences of a particular outcome. We denote the frequency (absolute frequency) by f_i.

Cumulative frequency: The cumulative frequency corresponding to the upper boundary of any class
interval or value in a frequency distribution is the total absolute frequency of all values less (greater)
than that boundary for the class or value. We denote the less-than and greater-than type cumulative
frequencies by F_n(≤) = Σ_{i ≤ n} f_i and F_n(≥) = Σ_{i ≥ n} f_i respectively.
Consider we have the following data related to the size in number of thirty families.

2, 6, 3, 4, 4, 5, 3, 6, 4, 4, 5, 3, 2, 3, 6, 5, 4, 4, 4, 3, 2, 4, 5, 6, 7, 4, 4, 5, 3, 3

If we convert the data given above such that we are interested in finding out the number of families
having 2 or 3 or 4 members, then we can do it very easily by tally numbers as shown below

Number of members (Value)   Tally number   Frequency (f_i)   Cumulative frequency   Cumulative frequency
                                                             (less than type)       (more than type)
2                           |||            3                 3                      30
3                           |||| ||        7                 10                     27
4                           |||| ||||      10                20                     20
5                           ||||           5                 25                     10
6                           ||||           4                 29                     5
7                           |              1                 30                     1
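The frequency and cumulative-frequency columns of the table can be recomputed mechanically, as a check:

```python
from collections import Counter
from itertools import accumulate

# The thirty family sizes given above.
data = [2, 6, 3, 4, 4, 5, 3, 6, 4, 4, 5, 3, 2, 3, 6, 5, 4, 4, 4, 3,
        2, 4, 5, 6, 7, 4, 4, 5, 3, 3]

freq = Counter(data)                         # absolute frequencies f_i
values = sorted(freq)                        # distinct family sizes, in order
f = [freq[v] for v in values]

cum_less = list(accumulate(f))               # less-than type: running sum
cum_more = list(accumulate(f[::-1]))[::-1]   # more-than type: sum from the top
```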

Ogives






Measure of central tendency: The three measures of central tendency are the mean, median and mode.
Under mean we have the arithmetic mean (AM), the geometric mean (GM) and the harmonic mean
(HM).
[Figure: Ogives — cumulative frequency of less-than and greater-than type plotted against the number of members (2 to 7)]
[Figure: Cumulative frequency chart, less-than and greater-than type, over the class intervals 145.95-152.95 to 180.95-187.95]

CHAPTER 2
INTRODUCTION TO MEASURE THEORY AND PROBABILITY THEORY
If one delves into history in order to find the origins of probability, then in all probability (remember
the word is not being used in its puritan mathematical sense, but more in its everyday English
connotation of quite likely) its origin lies in games of chance or gambling. Historically, the first
serious attempt (as far as the author is aware, in his limited knowledge of this field) at a rigorous
definition of probability is due to Laplace in his seminal work Théorie analytique des probabilités
(1812). He gave the so-called classical definition of the probability of an event that can occur in a
finite number of ways as the proportion of the number of favourable outcomes to the total number of
all possible outcomes, provided that all the outcomes have an equally likely chance of occurrence.

The moment one wants to extend the classical definition of probability as proposed by Laplace to
understand the probability of events which have infinitely many outcomes, it becomes difficult. The
concept of equal likelihood of certain events is a key idea in the definition proposed by Laplace.
Accordingly, if μ is a well-defined measure (e.g., length, area, volume) and Ω is some region of space,
then the probability that a point selected at random from Ω lies in a subset A of Ω is the ratio
μ(A)/μ(Ω). Hence, rather than taking a purely mathematical viewpoint of probability, one looks at it
from a geometrical point of view. This leads to problems, in the sense that one can define this
geometric measure in any way one may wish, which leads to different answers. Joseph Bertrand cites
a number of problems in geometric probability in his book Calcul des probabilités (1889) in which the
final result depends on the method of solution. This led to a slowdown in the development of
probability theory, and only in 1933 did A. N. Kolmogorov, in his work titled Foundations of the
Theory of Probability, give us the definition of probability from the axiomatic viewpoint, where it is
defined as a normed measure on sets, the sets themselves representing the random events.
We have been talking about events, hence the next logical question which arises is how do we define
an event for any experiment we perform. Let us consider few simple examples which should make
this clear.
Example 2.1
Suppose you toss an unbiased coin and there are only two possibilities of the outcome which is a head
(H) or a tail (T). Here one defines the event as either a head or a tail or a combination of heads and
tails depending on the experiment one is interested to perform.

Example 2.2
As the next example, consider that you are interested in studying the working life in hours of a piece
of electrical equipment, whose life can be any number between 0 and +∞. Thus an event can be
described as an outcome where we want to know whether the electrical equipment will function for a
minimum of 22 hours.
Example 2.3
As a next example, consider measuring the heights of children in class II of a certain school. Here the
heights can theoretically be any positive real number.
Example 2.4
As a final example, consider that you are playing a game on a roulette wheel, which is a circular disk
marked with different numbers such that there are 38 equal sectors. As the roulette wheel is rotated, a
ball is simultaneously rolled in the opposite direction along the edge of the wheel. Once the wheel
stops, the ball comes to rest in any one of the 38 slots. Depending on many different instances, games
can be designed whereby the event is defined by the outcome which is desired.

Thus by definition we describe a random experiment such that (i) all the outcomes are known in
advance, (ii) any performance of the experiment results in an outcome which is not known in advance,
and finally (iii) the experiment can be repeated under identical conditions.

If we consider this definition of the random experiment, then invariably three important issues come
to mind: (i) the definition of the universal set, i.e., Ω, which is the set of all possible outcomes of the
experiment one can think of, including the universal set as well as the null set, (ii) the σ-field
associated with Ω, which has the properties of being closed under countable unions and complements
and of containing the null set, and (iii) the probability measure defined on this σ-field.

A random experiment is an experiment whose outcome cannot be predicted with certainty. The set of
all possible outcomes of a random experiment is called the sample space and is denoted by Ω. The
elements of Ω are called the sample points, or simply cases, and are denoted by ω. An event is a
subset of the sample space, i.e., a collection of sample points, and is denoted by A, or B, or any other
letter, such that A ⊆ Ω. One should note that the number of elementary points ω inside the set A or B
may be finite or infinite, depending on the way one defines the event or class of elementary points.
Thus, notation-wise, we define A = {ω_i : ω_i ∈ A}, or more specifically A = ∪_{i: ω_i ∈ A} {ω_i} ⊆ Ω.
Suppose there are two dice, each with faces 1, 2, ....., 6, and they are rolled simultaneously. This
rolling of the two dice constitutes a random experiment, such that we have
Ω = {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6), (2,1), (2,2), (2,3), (2,4), (2,5), (2,6), (3,1), (3,2), (3,3), (3,4),
(3,5), (3,6), (4,1), (4,2), (4,3), (4,4), (4,5), (4,6), (5,1), (5,2), (5,3), (5,4), (5,5), (5,6), (6,1), (6,2), (6,3),
(6,4), (6,5), (6,6)}. Thus a typical sample point for this sample space is denoted by (i, j), where i
denotes the number shown on the first die and j the number shown on the second die, i.e., 1 ≤ i, j ≤ 6.
Let A denote the event (here A ⊆ Ω) that the total score is at least 8; then one can obviously, without
much difficulty, say that the set A is: A = {(2,6), (3,5), (4,4), (5,3), (6,2), (3,6), (4,5), (5,4), (6,3),
(4,6), (5,5), (6,4), (5,6), (6,5), (6,6)}. Thus the subset A of Ω consists of 15 distinct elements of the
so-called universal set Ω. Remember that if we instead define the experiment as getting a face number
less than or equal to 4 when one rolls only one die, then the universal set is Ω = {1, 2, 3, 4, 5, 6} and
A = {1, 2, 3, 4}, such that the elementary points or sample points are (1), (2), (3), (4), (5) and (6),
while in the initial case the elementary points or sample points are couplets (i, j), 1 ≤ i, j ≤ 6.
Example 2.5
Suppose a coin is tossed repeatedly till the first head appears; then the sample space is Ω = {H, TH,
TTH, TTTH, ...}. Let A denote the event that at most 3 tosses are needed to get the first head; then
A = {H, TH, TTH}. On the other hand, if B denotes the event that at least 5 tosses are needed to get
the first head, then B = {TTTTH, TTTTTH, TTTTTTH, ...}. One can easily see that the number of
constituent elements in A or B, or whatever the case may be, depends on the experiment, and the
number of elements can be finite or infinite.

Classical definition
Suppose the sample space is finite and assume that all the sample points are equally likely. Then,
according to the classical definition, the probability of an event A is defined as
P(A) = n(A)/n(Ω) = n(A)/n, where n = n(Ω) is the total number of sample points and n(A) is the
number of sample points contained in event A.

Example 2.6
Find P(A) from the example above, and also find the probability that when one rolls two dice the sum
of the numbers appearing on the two dice is a perfect square. For the first case we have n(A) = 15 and
n = 36, hence P(A) = 15/36. For the second case we have P(A) = 7/36, as A = {(1,3), (2,2), (3,1),
(3,6), (4,5), (5,4), (6,3)}.
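Both probabilities in Example 2.6 can be verified by brute-force enumeration of the 36 equally likely outcomes:

```python
from itertools import product

# Enumerate the sample space of two dice: all ordered pairs (i, j), 1 <= i, j <= 6.
omega = list(product(range(1, 7), repeat=2))

at_least_8 = [w for w in omega if sum(w) >= 8]
# Sums range from 2 to 12, so the only reachable perfect squares are 4 and 9.
perfect_square = [w for w in omega if sum(w) in (4, 9)]

p_a = len(at_least_8) / len(omega)           # 15/36
p_square = len(perfect_square) / len(omega)  # 7/36
```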

Example 2.7
In a club there are 10 members, of whom 5 are Asians and the rest are Americans. A committee of 3
members has to be formed and these 3 members are to be chosen randomly. Find the probability that
there will be at least one Asian and at least one American in the committee. Here the total number of
cases is 10C3. Note that there is at least one Asian and at least one American iff the committee
consists of either 2 Asians and 1 American or 1 Asian and 2 Americans. Hence the number of
favourable cases as stated in the problem equals 5C2 x 5C1 + 5C1 x 5C2, thus
P(A) = (5C2 x 5C1 + 5C1 x 5C2)/10C3 = 5/6.
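The counting in Example 2.7 can be checked directly with binomial coefficients:

```python
from math import comb

# Example 2.7: committees of 3 drawn from 5 Asians and 5 Americans.
total = comb(10, 3)                                    # all possible committees
# At least one of each nationality: 2 Asians + 1 American, or 1 Asian + 2 Americans.
favourable = comb(5, 2) * comb(5, 1) + comb(5, 1) * comb(5, 2)
p = favourable / total                                 # 100/120 = 5/6
```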

Example 2.8
Four letters marked 1, 2, 3 and 4 are placed at random in four envelopes also numbered 1, 2, 3 and 4,
such that each envelope receives exactly one letter. Find the probability that exactly two letters go to
the correct envelope. Here the total number of cases is 4! = 24. The favourable cases are enumerated
in the table given below; in the body of the table we show the letter number placed in each envelope.

Case   Envelope 1   Envelope 2   Envelope 3   Envelope 4
1      1            2            4            3
2      1            4            3            2
3      1            3            2            4
4      4            2            3            1
5      3            2            1            4
6      2            1            3            4

Hence P(A) = 6/24 = 0.25.
In the setup of the above problem, if one is required to find the probability that all the letters go to the
wrong envelope, then the corresponding probability is 3/8.
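Both answers in Example 2.8 can be confirmed by enumerating all 4! = 24 placements:

```python
from itertools import permutations
from fractions import Fraction

# All placements of letters 1..4 into envelopes 1..4 (one letter per envelope).
cases = list(permutations(range(1, 5)))

def fixed_points(perm):
    """Number of letters landing in their own envelope."""
    return sum(1 for env, letter in enumerate(perm, start=1) if env == letter)

exactly_two = sum(1 for p in cases if fixed_points(p) == 2)
none_correct = sum(1 for p in cases if fixed_points(p) == 0)  # derangements

p_two = Fraction(exactly_two, len(cases))    # 6/24 = 1/4
p_none = Fraction(none_correct, len(cases))  # 9/24 = 3/8
```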
Remark
One would, after some thought, understand that there are several deficiencies in the classical
definition of probability, some of which are as follows:
1. The definition is only applicable when the sample space is finite, which may not always be the
case.
2. Even with a finite sample space, not all sample points ω need be equally likely, and assuming so
induces an element of circularity in the definition, since equal likelihood is itself a probabilistic
notion.
In order to overcome these and other problems we take recourse to the axiomatic definition of
probability, which we define below for the benefit of the reader.

Axiomatic definition
A discrete set is one whose members can be arranged in the form of a sequence, which may be finite
or countably infinite. To be clearer in explaining this concept, let us define, though briefly, the basic
notions of finite and infinite sets/sequences.

Finite set: You are able to initiate the process of counting and also able to terminate the counting
process.
Infinite set (countably infinite): You are able to initiate the process of counting but unable to
terminate the counting process; say you are counting the positive integers Z+, i.e., {1, 2, 3, ...}.
Infinite set (uncountably infinite): You are neither able to initiate the process of counting nor able to
terminate the counting process; say you are attempting to count the real numbers between 0 and 1.

Suppose the sample space is discrete, with Ω = {ω_1, ω_2, ω_3, ...}, and let p_1, p_2, p_3, ... be
non-negative real numbers (obviously between 0 and 1, both inclusive) associated with the
corresponding ω_i ∈ Ω, such that Σ_{i ≥ 1} p_i = 1, and with each ω_i we associate P(ω_i) = p_i.
Then, according to the axiomatic definition of probability, we have P(A) = Σ_{i: ω_i ∈ A} p_i. If one
tries to get an illustrative feel for this, one can refer to the diagram given below:

P(e
1
)
e
1

e
i
0 1
P(e
i
)
O





e
i
e
7
e
278

A
e
2
C

Remarks
1. The above definition is valid for every choice of the non-negative quantities p_i, as long as they
satisfy the property Σ_{i ≥ 1} p_i = 1. In practice, for each i the quantity p_i measures the
statistician's heuristic belief in favour of the occurrence of ω_i.
2. We now examine how the classical definition follows as a special case of the axiomatic definition.
Suppose Ω is finite, say Ω = {ω_1, ω_2, ..., ω_n}, and let the experimental setup exhibit sufficient
symmetry; then p_1 = p_2 = ... = p_n = 1/n. According to the axiomatic definition, for any event A
we have P(A) = Σ_{i: ω_i ∈ A} p_i = Σ_{i: ω_i ∈ A} 1/n = n(A)/n, which is the same as the classical
definition of probability.

Example 2.9
The sample space of the experiment where we toss a coin repeatedly and stop as soon as the first H appears can be expressed as \( e_i = (\underbrace{T, T, \ldots, T}_{(i-1)\ \text{times}}, H) \), i = 1, 2, 3, .... Let \( p_1 = \frac{1}{2} \), \( p_2 = \left(\frac{1}{2}\right)^2 \), ..., \( p_i = \left(\frac{1}{2}\right)^i \), so that

\[ \sum_i p_i = \frac{1}{2} + \frac{1}{2^2} + \cdots = 1. \]

As before, let A denote the event that at most 3 tosses are needed to produce the first head. Then A = {e_1, e_2, e_3}, and following the axiomatic definition we obtain P(A) = p_1 + p_2 + p_3 = 1/2 + (1/2)^2 + (1/2)^3 = 7/8.

Example 2.10
In the above example, find the probability that at least 5 tosses are needed to produce the first head. [Answer: p_5 + p_6 + ... = 1/16]
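To get an illustrative feel for Examples 2.9 and 2.10, the sums above can be checked numerically. A minimal sketch; the truncation of the infinite sum at 60 terms is an arbitrary choice of ours, since the neglected tail is smaller than 1e-17:

```python
# Probabilities p_i = (1/2)^i that the first head appears on toss i.
p = [0.5 ** i for i in range(1, 60)]   # truncate the infinite sequence

total = sum(p)             # should be (essentially) 1
at_most_3 = sum(p[:3])     # Example 2.9: P(A) = p_1 + p_2 + p_3
at_least_5 = sum(p[4:])    # Example 2.10: first head needs >= 5 tosses

print(round(total, 6))      # -> 1.0
print(at_most_3)            # -> 0.875   (= 7/8)
print(round(at_least_5, 6)) # -> 0.0625  (= 1/16)
```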

Theorem 2.1

1. For any event A, 0 ≤ P(A) ≤ 1.
In the diagram given below, any of the sets A, B or C either has no elements inside it, in which case n(A) = 0, or there are some e_i's which are elements of the set, so that P(A) > 0. The more we understand that probability is in essence a relative frequency, the better we understand that under no circumstances can probability ever exceed 1. Hence 0 ≤ P(A) ≤ 1 holds true.
[Diagram: Ω containing the disjoint regions A, B and C.]
2. P(Ω) = 1 and P(φ) = 0.
From the above diagram we can easily deduce that if the set of elements inside A is null, i.e., A = φ, then there do not exist any sample points inside the event marked A, so its probability is 0. On the other hand, if the event A includes the whole sample space, then, as the sample points are mutually exclusive and exhaustive, the statement P(e_1) + P(e_2) + ... = 1 leads to the fact that P(Ω) = 1.

3. If A ⊂ B, then P(A) ≤ P(B).
In the case shown in the diagram below, there is at least one elementary sample point e_k such that e_k ∈ B but e_k ∉ A, i.e., B = A ∪ {e_k}. Now we already know that P(e_k) ≥ 0, hence the fact that P(B) = P(A) + P(e_k) means that P(A) ≤ P(B) is true.
[Diagram: Ω containing the set B, with A drawn as a subset inside B.]


4. P(A∪B) = P(A) + P(B) − P(A∩B).
As shown in the diagram below, suppose the set A∩B ≠ φ, so there is at least one elementary sample point in the region marked A∩B, which immediately implies that P(A∩B) > 0. If we simply add the areas of A and B, the region A∩B is counted twice, so the actual area covered is A + B − (A∩B). Decomposing A∪B into pieces that are mutually exclusive, in the sense that their elementary sample points are not common, the probabilities can be added, i.e., P(A∪B) = P(A) + P(B) − P(A∩B).
[Diagram: Ω containing the overlapping sets A and B.]



5. P(A^C) = 1 − P(A).
From the diagram below we can immediately make out that A ∪ A^C = Ω, and the sets A and A^C are mutually exclusive and exhaustive, so we can find the probability in the way given in point # 4 above; we only need to remember that A∩A^C = φ. Thus P(A) + P(A^C) − P(A∩A^C) = P(Ω) = 1, i.e., P(A) + P(A^C) = 1.
[Diagram: Ω partitioned into A and its complement A^C.]




Note:
The diagrammatic representations of the statements under Theorem 2.1 are given alongside the corresponding statements, for the convenience of the reader and a better understanding of the concepts discussed.

Proof 2.1
Since the p_i involved in the axiomatic definition are non-negative (by basic and intuitive assumption) with sum unity, # 1, # 2 and # 3 are obvious.
Let us now take # 4. We have:

\[ P(A \cup B) = \sum_{i \,:\, e_i \in A} p_i + \sum_{i \,:\, e_i \in B} p_i - \sum_{i \,:\, e_i \in A \cap B} p_i = P(A) + P(B) - P(A \cap B). \]

Now consider # 5.
We know that A ∪ A^C = Ω and A ∩ A^C = φ. Then:
P(Ω) = P(A ∪ A^C) = P(A) + P(A^C) − P(A ∩ A^C) = P(A) + P(A^C), hence P(A^C) = 1 − P(A).

Example 2.11
Extend the formula in Theorem 2.1 (point # 4) to the case of 3 events.
Generalize further to the case of n events. A good example is when n letters numbered 1, 2, ..., n are placed at random into envelopes numbered 1, 2, ..., n, such that each envelope receives exactly one letter. Find the probability that all the letters go to the wrong envelopes.

Solution 2.11
With three events A, B and C, let D = B∪C, then P(A∪B∪C) = P(A∪D), i.e.,
P(A∪B∪C) = P(A∪D) = P(A) + P(D) − P(A∩D)
= P(A) + P(B∪C) − P(A∩(B∪C))
= P(A) + P(B∪C) − P{(A∩B) ∪ (A∩C)}
= P(A) + P(B) + P(C) − P(B∩C) − P(A∩B) − P(A∩C) + P(A∩B∩C)

In general, by the method of induction or otherwise, we can show that:

\[ P(A_1 \cup A_2 \cup \cdots \cup A_n) = S_1 - S_2 + S_3 - S_4 + \cdots + (-1)^{n-1} S_n, \]

where:
\( S_1 = P(A_1) + P(A_2) + \cdots + P(A_n) \)
\( S_2 = P(A_1 \cap A_2) + P(A_1 \cap A_3) + \cdots + P(A_{n-1} \cap A_n) \)
\( S_3 = P(A_1 \cap A_2 \cap A_3) + \cdots + P(A_{n-2} \cap A_{n-1} \cap A_n) \)
...
\( S_n = P(A_1 \cap A_2 \cap \cdots \cap A_n) \)

Note: for i = 1, 2, ..., n, the number of terms involved in S_i is \( {}^{n}C_i \). For a better understanding of this one can refer to W. Feller (An Introduction to Probability Theory and Its Applications, Volume I).

Remember that for i = 1, 2, ..., n, if we let A_i denote the event that letter i goes into envelope i, then the required probability, using De Morgan's law, \( (A \cup B)^C = A^C \cap B^C \), is as follows:

\[ P(A_1^C \cap A_2^C \cap \cdots \cap A_n^C) = P[(A_1 \cup A_2 \cup \cdots \cup A_n)^C] = 1 - P(A_1 \cup A_2 \cup \cdots \cup A_n) = 1 - [S_1 - S_2 + S_3 - S_4 + \cdots + (-1)^{n-1} S_n]. \]

Note that for any i we have \( P(A_i) = \frac{(n-1)!}{n!} = \frac{1}{n} \), and for any combination (i, j) with i < j we have \( P(A_i \cap A_j) = \frac{(n-2)!}{n!} = \frac{1}{n(n-1)} \), such that \( S_2 = {}^{n}C_2 \, \frac{(n-2)!}{n!} = \frac{1}{2!} \). In a similar manner we can show through simple mathematics that \( S_i = \frac{1}{i!} \), i = 1, 2, ..., n. Using this we have the required probability as

\[ 1 - \left[ \frac{1}{1!} - \frac{1}{2!} + \frac{1}{3!} - \cdots + (-1)^{n-1} \frac{1}{n!} \right] = \frac{1}{2!} - \frac{1}{3!} + \cdots + (-1)^n \frac{1}{n!} \to \exp(-1), \text{ as } n \to \infty. \]

In particular, if n = 4, then the above equals \( \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} = \frac{3}{8} \).

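The derangement probability derived above can be verified numerically. A small sketch; the function name is ours, not from the text:

```python
from math import factorial, exp

def derangement_prob(n):
    # P(no letter in its own envelope) = sum_{i=2}^{n} (-1)^i / i!
    return sum((-1) ** i / factorial(i) for i in range(2, n + 1))

print(round(derangement_prob(4), 6))    # -> 0.375  (= 3/8, as computed above)
print(round(derangement_prob(20), 6))   # converges quickly to exp(-1)
print(round(exp(-1), 6))
```

Already for n around 10 the value agrees with exp(−1) to six decimal places, illustrating the limit stated above.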
Theorem 2.2: Boole's Inequality
If A_i, for i = 1, 2, ..., n, are events in A, then the following holds:

\[ P\left( \bigcup_{i=1}^{n} A_i \right) \le \sum_{i=1}^{n} P(A_i). \]


Proof 2.2
First let us illustrate this with a simple Venn diagram for ease of understanding, but the proof will be given mathematically.
[Diagram: Ω containing overlapping sets A_1, A_2, A_4, ..., A_i, ..., A_{n−1}, A_n.]
Consider there are only two events, i = 1, 2, namely A_1 and A_2. In that case we can write
\( A_1 \cup A_2 = A_1 \cup (A_2 \cap A_1^C) \), where the events A_1 and (A_2 ∩ A_1^C) both belong to A and they are disjoint.
Thus we would have: \( P(A_1 \cup A_2) = P(A_1) + P(A_2 \cap A_1^C) \).
Hence \( P(A_1 \cup A_2) = P(A_1) + P(A_2 \cap A_1^C) \le P(A_1) + P(A_2) \), since \( (A_2 \cap A_1^C) \subset A_2 \) and the probability function P is monotone (increasing). Or, better still, for a clear understanding we can state that \( 0 \le P(A_2 \cap A_1^C) \le P(A_2) \) always holds, hence \( P(A_1 \cup A_2) \le P(A_1) + P(A_2) \).
Now consider the number of events to be n, i.e., the events are A_1, A_2, ..., A_n, such that one can write
\( A_1 \cup A_2 \cup \cdots \cup A_n = (A_1 \cup \cdots \cup A_{n-1}) \cup \{ A_n \cap (A_1 \cup \cdots \cup A_{n-1})^C \} \).
Here again we divide the whole collection into two separate classes of events, one being \( (A_1 \cup \cdots \cup A_{n-1}) \) and the other being \( \{ A_n \cap (A_1 \cup \cdots \cup A_{n-1})^C \} \), which are disjoint (the same logic can be used here as for i = 1, 2).
Thus:
\( P(A_1 \cup \cdots \cup A_n) = P(A_1 \cup \cdots \cup A_{n-1}) + P\{ A_n \cap (A_1 \cup \cdots \cup A_{n-1})^C \} \le P(A_1 \cup \cdots \cup A_{n-1}) + P(A_n) \).
Hence, using similar logic repeatedly, we obtain:
\( P(A_1 \cup \cdots \cup A_n) \le P(A_1 \cup \cdots \cup A_{n-1}) + P(A_n) \)
\( \le P(A_1 \cup \cdots \cup A_{n-2}) + P(A_{n-1}) + P(A_n) \)
...
\( \le P(A_1) + P(A_2) + \cdots + P(A_n) \), i.e., \( P\left( \bigcup_{i=1}^{n} A_i \right) \le \sum_{i=1}^{n} P(A_i) \).

Example 2.12 [Use of Boole's inequality]
Consider there are 3 Indians and 5 Americans, and you need to form a committee of 4 from them, taking any number of Indians or Americans. Furthermore, it is also known to you that 2 out of the 5 Americans are of PIO origin and hold dual citizenship. So, at most how many such committees can you form which have at most two members from each of the above mentioned countries?
Solution 2.12
[Diagram: Ω containing the sets A_{1,1}, A_{1,2}, A_{1,3} and A_2.]
Remember that A_{1,1} has no one who is both Indian as well as American, i.e., none is a PIO, while A_{1,2} has one common member who is a PIO, and in the last scenario A_{1,3} has two members in common, who are the two PIOs. With this background, if you need to find the probability of having at least one PIO, then one can easily conclude that the corresponding probability would be less than that formed using A_{1,1} and A_2, as the intersection of A_{1,1} and A_2 is φ.

Theorem 2.3: Bonferroni's Inequality
If A_i, for i = 1, 2, ..., k, are events in A, then:
(a) \( P\left( \bigcap_{i=1}^{k} A_i \right) \ge 1 - \sum_{i=1}^{k} P(A_i^C) \), and
(b) \( P\left( \bigcap_{i=1}^{k} A_i \right) \ge \sum_{i=1}^{k} P(A_i) - (k - 1) \)


Proof 2.3
Part (a): Before deriving the result one should recollect that \( A_1 \cap A_2 \cap \cdots \cap A_k = \{ A_1^C \cup A_2^C \cup \cdots \cup A_k^C \}^C \). This is true as De Morgan's law, \( (A \cup B)^C = A^C \cap B^C \) or \( (A \cap B)^C = A^C \cup B^C \), holds. Also remember that \( P(A) = 1 - P(A^C) \) and that Boole's inequality holds, thus:
\( P[A_1 \cap \cdots \cap A_k] = P[\{ A_1^C \cup \cdots \cup A_k^C \}^C] \)
\( = 1 - P\{ A_1^C \cup \cdots \cup A_k^C \} \)
\( \ge 1 - [P(A_1^C) + P(A_2^C) + \cdots + P(A_k^C)] \)
Hence: \( P\left( \bigcap_{i=1}^{k} A_i \right) \ge 1 - \sum_{i=1}^{k} P(A_i^C) \).
Part (b): Let us consider the result proved in part (a), i.e., \( P\left( \bigcap_{i=1}^{k} A_i \right) \ge 1 - \sum_{i=1}^{k} P(A_i^C) \). We again utilize the simple fact that \( P(A_i^C) = 1 - P(A_i) \) for i = 1, 2, ..., k and use that in part (a), which leads to the following:
\( P\left( \bigcap_{i=1}^{k} A_i \right) \ge 1 - \sum_{i=1}^{k} [1 - P(A_i)] = 1 - k + \sum_{i=1}^{k} P(A_i) \)
Hence: \( P\left( \bigcap_{i=1}^{k} A_i \right) \ge \sum_{i=1}^{k} P(A_i) - (k - 1) \).

Theorem 2.4: Poincaré's Theorem
If A_i, for i = 1, 2, ..., m, are events in A, then

\[ P\left( \bigcup_{i=1}^{m} A_i \right) = \sum_{i=1}^{m} P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \sum_{i<j<k} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{m-1} P(A_1 \cap A_2 \cap \cdots \cap A_m). \]

Proof 2.4
We use one of the most widely used methods of proof, utilized on many occasions, which is the method of induction.
First consider the existence of only two events, A_1 and A_2. Then we can write the following, which is very intuitive and simple to understand: \( A_1 \cup A_2 = A_1 \cup \{ (A_1 \cup A_2) - A_1 \} \), which can be represented diagrammatically as given below:
[Diagram: Ω containing the overlapping sets A_1 and A_2.]
Thus in the probability sense we can write this as:
\( P(A_1 \cup A_2) = P(A_1) + P\{ (A_1 \cup A_2) - A_1 \} - P[A_1 \cap \{ (A_1 \cup A_2) - A_1 \}] \)
\( = P(A_1) + P\{ (A_1 \cup A_2) - A_1 \} \), as the third term vanishes since the two sets are disjoint
\( = P(A_1) + P(A_2) - P(A_1 \cap A_2) \), since \( (A_1 \cup A_2) - A_1 = A_2 - (A_1 \cap A_2) \) and we know that \( P(B - A) = P(B) - P(A) \) for \( A \subset B \).
Thus for n = 2 we have proved that Poincaré's Theorem holds. Now assume that it holds for some n.


Such that we have the following as true:

\[ P\left( \bigcup_{i=1}^{n} A_i \right) = \sum_{i=1}^{n} P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \sum_{i<j<k} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n-1} P(A_1 \cap \cdots \cap A_n). \]

Now consider \( \bigcup_{i=1}^{n+1} A_i = \left( \bigcup_{i=1}^{n} A_i \right) \cup A_{n+1} \), and since we now have two events, this immediately means that:

\[ P\left( \bigcup_{i=1}^{n+1} A_i \right) = P\left( \bigcup_{i=1}^{n} A_i \right) + P(A_{n+1}) - P\left( \left( \bigcup_{i=1}^{n} A_i \right) \cap A_{n+1} \right). \]

Furthermore, we can apply the result which we have taken as true for n separately to \( \bigcup_{i=1}^{n} A_i \) and to \( \bigcup_{i=1}^{n} (A_i \cap A_{n+1}) \), and noting that \( (A_1 \cap A_{n+1}) \cup (A_2 \cap A_{n+1}) \cup \cdots \) intersections collapse to terms of the form \( A_{i_1} \cap \cdots \cap A_{i_k} \cap A_{n+1} \), we can re-arrange and obtain, for m = n + 1:

\[ P\left( \bigcup_{i=1}^{m} A_i \right) = \sum_{i=1}^{m} P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \sum_{i<j<k} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{m-1} P(A_1 \cap \cdots \cap A_m). \]

Example 2.13
Show that \( P(A_1 \cup A_2 \cup \cdots \cup A_n) \le \sum_{i=1}^{n} P(A_i) \).
The result holds trivially for n = 1, since then the two sides are equal.
Now for n = 2:
LHS = \( P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2) \le P(A_1) + P(A_2) \) = RHS, as \( P(A_1 \cap A_2) \ge 0 \).
Suppose it holds true for n = k; then for n = (k + 1) we have:
LHS = \( P(A_1 \cup \cdots \cup A_{k+1}) = P(B \cup A_{k+1}) \), where \( B = A_1 \cup A_2 \cup \cdots \cup A_k \)
\( \le P(B) + P(A_{k+1}) \), since the result holds true for n = 2
\( \le P(A_1) + \cdots + P(A_k) + P(A_{k+1}) \) = RHS, since the result holds true for n = k.
Hence the result follows by induction.
Recall that \( P(A_1 \cup A_2 \cup \cdots \cup A_n) = S_1 - S_2 + S_3 - S_4 + \cdots + (-1)^{n-1} S_n \), such that
\( P(A_1 \cup \cdots \cup A_n) \le S_1 \), \( P(A_1 \cup \cdots \cup A_n) \ge S_1 - S_2 \), \( P(A_1 \cup \cdots \cup A_n) \le S_1 - S_2 + S_3 \), and so on.
For the overall probability of a system containing a very large number of events, the above can be used
to get an approximate value.
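The alternating truncation bounds above can be illustrated by computing S_1, S_2, S_3 exactly for a small collection of events; the sample space and events below are an arbitrary example of ours:

```python
from itertools import combinations

# Eight equally likely outcomes; probabilities by counting.
omega = set(range(1, 9))
P = lambda E: len(E) / len(omega)
A = [{1, 2, 3}, {2, 3, 4, 5}, {5, 6}, {1, 6, 7}]

def S(k):
    # S_k = sum of P over all intersections of the events taken k at a time
    return sum(P(set.intersection(*c)) for c in combinations(A, k))

exact = P(set().union(*A))          # P(A1 U A2 U A3 U A4)
s1, s2, s3 = S(1), S(2), S(3)
print(exact, s1, s1 - s2, s1 - s2 + s3)
```

For these events exact = 0.875, while the successive truncations 1.5, 0.875, 0.875 bound it from above, below, and above, as stated.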

Mutually exclusive and exhaustive events
1. Consider events A_1, A_2, ..., A_n. They are called mutually exclusive if no two of the events can occur together, i.e., \( P(A_i \cap A_j) = 0 \) for every i ≠ j.
2. Consider events A_1, A_2, ..., A_n. They are mutually exhaustive if at least one of them must occur, i.e., \( P(A_1 \cup A_2 \cup \cdots \cup A_n) = 1 \).

Example 2.14
Suppose a fair die with faces marked 1, 2, ..., 6 is rolled. Then we know that Ω = {1, 2, 3, 4, 5, 6}. Define the events A_1 = {1, 2}, A_2 = {3, 4, 5, 6} and A_3 = {3, 5}. Then it can be easily seen and verified that:
1. The events A_2 and A_3 are neither mutually exclusive nor exhaustive
2. The events A_1 and A_3 are mutually exclusive, but not exhaustive
3. The events A_1, A_2 and A_3 are not mutually exclusive, but they are exhaustive
4. The events A_1 and A_2 are mutually exclusive as well as exhaustive
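The four claims can be verified mechanically; a small sketch with helper functions of our own naming:

```python
from itertools import combinations

omega = {1, 2, 3, 4, 5, 6}
A1, A2, A3 = {1, 2}, {3, 4, 5, 6}, {3, 5}

def mutually_exclusive(*events):
    # no two of the events share a sample point
    return all(not (E & F) for E, F in combinations(events, 2))

def exhaustive(*events):
    # together the events cover the whole sample space
    return set().union(*events) == omega

print(mutually_exclusive(A2, A3), exhaustive(A2, A3))          # -> False False
print(mutually_exclusive(A1, A3), exhaustive(A1, A3))          # -> True False
print(mutually_exclusive(A1, A2, A3), exhaustive(A1, A2, A3))  # -> False True
print(mutually_exclusive(A1, A2), exhaustive(A1, A2))          # -> True True
```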

Theorem 2.5
Let the events B_1, B_2, ..., B_n be mutually exclusive and exhaustive events. Then: (i) \( \sum_{i=1}^{n} P(B_i) = 1 \), and
(ii) for any event A we have \( P(A) = \sum_{i=1}^{n} P(A \cap B_i) \)
1


(i) Since the events B_1, B_2, ..., B_n are mutually exclusive, the following is true: \( P(B_1 \cup B_2 \cup \cdots \cup B_n) = \sum_{i=1}^{n} P(B_i) \), since the terms S_2, S_3, ..., S_n in Poincaré's formula all vanish. Now, as the events B_1, B_2, ..., B_n are also mutually exhaustive, we can immediately conclude that \( \sum_{i=1}^{n} P(B_i) = P(B_1 \cup B_2 \cup \cdots \cup B_n) = P(\Omega) = 1 \).
(ii) Let \( B = B_1 \cup B_2 \cup \cdots \cup B_n \); thus for any event A (A ⊂ Ω) we would have
\( A = (A \cap B) \cup (A \cap B^C) \). This would immediately imply that:
\( P(A) = P[(A \cap B) \cup (A \cap B^C)] = P(A \cap B) + P(A \cap B^C) \), the two pieces being disjoint.
Now \( (A \cap B^C) \subset B^C \), such that
\( P(A \cap B^C) \le P(B^C) = 1 - P(B) = 1 - P(B_1 \cup B_2 \cup \cdots \cup B_n) = 1 - 1 = 0 \), since B_1, B_2, ..., B_n are exhaustive. We thus have \( P(A \cap B^C) = 0 \), hence
\( P(A) = P(A \cap B) = P[A \cap (B_1 \cup B_2 \cup \cdots \cup B_n)] \)
\( = P[(A \cap B_1) \cup (A \cap B_2) \cup \cdots \cup (A \cap B_n)] \)
\( = P[C_1 \cup C_2 \cup \cdots \cup C_n] \), where \( C_i = A \cap B_i \), for i = 1, 2, ..., n.
Remember, since the events B_1, B_2, ..., B_n are mutually exclusive, so are the events C_1, C_2, ..., C_n.
Hence:
\( P[C_1 \cup C_2 \cup \cdots \cup C_n] = \sum_{i=1}^{n} P(C_i) = \sum_{i=1}^{n} P(A \cap B_i) \), hence \( P(A) = \sum_{i=1}^{n} P(A \cap B_i) \).


Note:
1. We explain how the mutual exclusiveness of the B_i's implies that of the C_i's.
Assume that the B_i's are mutually exclusive. Then for any i < j we have the following:
\( P[C_i \cap C_j] = P[(A \cap B_i) \cap (A \cap B_j)] = P[A \cap (B_i \cap B_j)] \le P(B_i \cap B_j) = 0 \), since \( A \cap B_i \cap B_j \subset B_i \cap B_j \). Hence \( P[C_i \cap C_j] = 0 \).
2. In the above proofs and elsewhere we have used the fact that if A_1, A_2, ..., A_n are mutually exclusive, then \( P(A_1 \cup A_2 \cup \cdots \cup A_n) = \sum_{i=1}^{n} P(A_i) \), which follows from the fact that:
\( P(A_1 \cup A_2 \cup \cdots \cup A_n) = S_1 - S_2 + \cdots + (-1)^{n-1} S_n \), and as \( S_2 = \cdots = S_n = 0 \), hence
\( P(A_1 \cup A_2 \cup \cdots \cup A_n) = S_1 = \sum_{i=1}^{n} P(A_i) \).

Conditional Probability
Let us think of the experiment, or example as you may say, of tossing an unbiased coin or an unbiased die. Now we know that if we are interested in finding the probability that a head will come up, or that the number 4 will appear, then we can very simply say that those values are 1/2 or 1/6, as the case may be, depending on the example. But what would your answer be if someone asks what is the probability that a head will appear in the 5th toss given that exactly two heads have already appeared, or, for that matter, what is the probability that the number appearing on the face of the die is even given that the numbers appearing in the last three throws have been odd? In these types of examples, whenever we have to answer such a question, it becomes apparent that the answer depends on whatever has been the outcome before. The branch of statistics which covers this is known as Bayesian analysis, and the person credited with it is Thomas Bayes. Thomas Bayes was the son of London Presbyterian minister Joshua Bayes, and was perhaps born in Hertfordshire. In 1719 he enrolled at the University of Edinburgh to study logic and theology. On his return around 1722 he assisted his father at the latter's non-conformist chapel in London before moving to Tunbridge Wells, Kent, around 1734. There he became minister of the Mount Sion chapel until 1752.
So with this simple introduction it is now the opportune moment for us to give a brief idea about conditional probability and Bayes' theorem. Let A and B be two events such that P(B) > 0. Here it should be noted that we mentioned nothing about P(A). Then the conditional probability of A given B is as follows:

\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)}. \]


[Diagram: Ω containing the overlapping sets A and B, illustrating \( P(A \mid B) = P(A \cap B)/P(B) \).]

Consider as an example Ω = {1, 2, 3, 4, 5, 6}, A = {2}, B = {2, 4, 6}; then A∩B = {2}, hence

\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{1/6}{3/6} = \frac{1}{3}. \]
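The same computation in a few lines of Python, counting sample points on the fair die:

```python
omega = {1, 2, 3, 4, 5, 6}
A, B = {2}, {2, 4, 6}
P = lambda E: len(E) / len(omega)

# Conditional probability: P(A | B) = P(A ∩ B) / P(B)
p_A_given_B = P(A & B) / P(B)
print(round(p_A_given_B, 4))   # -> 0.3333  (= 1/3)
```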

Theorem 2.6: Bayes' Theorem
Let B_1, B_2, ..., B_n be mutually exclusive and exhaustive events, such that \( P(B_i) > 0 \), i = 1, 2, ..., n. Then
(a) \( P(A) = \sum_{i=1}^{n} P(A \mid B_i) \, P(B_i) \)
(b) Further note that if P(A) > 0, then for any j:

\[ P(B_j \mid A) = \frac{P(A \mid B_j) \, P(B_j)}{\sum_{i=1}^{n} P(A \mid B_i) \, P(B_i)}. \]
Proof 2.6
(a) \( P(A) = \sum_{i=1}^{n} P(A \cap B_i) = \sum_{i=1}^{n} \frac{P(A \cap B_i)}{P(B_i)} \, P(B_i) = \sum_{i=1}^{n} P(A \mid B_i) \, P(B_i) \)
(b) For any j, \( P(B_j \mid A) = \frac{P(B_j \cap A)}{P(A)} \); now we know that
\( P(B_j \cap A) = \frac{P(A \cap B_j)}{P(B_j)} \, P(B_j) = P(A \mid B_j) \, P(B_j) \).
Using the result of (a) above we have:

\[ P(B_j \mid A) = \frac{P(A \mid B_j) \, P(B_j)}{\sum_{i=1}^{n} P(A \mid B_i) \, P(B_i)}. \]

[Diagram: Ω partitioned into B_1, B_2, ..., B_i, ..., B_n, with the event A cutting across the partition.]

Example 2.15
Let A and B be any two events such that 0 < P(A) < 1 and 0 < P(B) < 1. Then prove/disprove the following:
(i) P(A|B) + P(A^C|B) = 1 (ii) P(A|B) + P(A|B^C) = 1
(i) Since the events A and A^C are mutually exclusive and exhaustive, using the concepts of the theorem above we have
\( P(B) = P(B \cap A) + P(B \cap A^C) \), which is evident as A and A^C are exclusive and exhaustive
\( 1 = \frac{P(B \cap A)}{P(B)} + \frac{P(B \cap A^C)}{P(B)} \)
\( 1 = P(A \mid B) + P(A^C \mid B) \), which proves the first statement.
(ii) We now show the falsity of (ii) using a counterexample. Suppose a fair die with faces 1, 2, 3, 4, 5, 6 is rolled once. With this background define the following events: (i) A: the number 2 is obtained, and (ii) B: an even number is obtained. Then using the classical definition we know that P(A|B) = 1/3 and P(A|B^C) = 0, and in that case P(A|B) + P(A|B^C) = 1/3 ≠ 1.
Note: Two events A and B are called independent if P(A∩B) = P(A)·P(B). In general, for r events A_1, A_2, ..., A_r, we say that these are mutually independent if each is independent of all and any of the others. The definition of mutual or joint independence of these events is given in terms of the following equations:
\( P(A_i \cap A_j) = P(A_i) \, P(A_j) \) for \( 1 \le i < j \le r \)
\( P(A_i \cap A_j \cap A_k) = P(A_i) \, P(A_j) \, P(A_k) \) for \( 1 \le i < j < k \le r \)
...
\( P(A_1 \cap A_2 \cap \cdots \cap A_r) = P(A_1) \, P(A_2) \cdots P(A_r) \)
Just note that the total number of such equations is:

\[ \binom{r}{2} + \binom{r}{3} + \cdots + \binom{r}{r} = 2^r - \binom{r}{1} - \binom{r}{0} = 2^r - r - 1. \]

Incidentally, for the mutual independence of r events (r > 2), it is not enough that the events be pair-wise independent. This may be illustrated by taking the simple case of three events A_1, A_2 and A_3.
Example 2.16

Let us suppose that for an experiment the sample space consists of four sample points only, \( e_1, e_2, e_3 \) and \( e_4 \). Let \( P(e_i) = p_i = \frac{1}{4} \) for i = 1, 2, 3, 4, as would be the case in two throws of a perfect coin. Now consider three events defined on this sample space, A_1, A_2 and A_3, such that A_1 = {e_1, e_2}, A_2 = {e_1, e_3} and A_3 = {e_1, e_4}. Then:
\( P(A_1 \cap A_2) = \frac{1}{4} = P(A_1) \, P(A_2) \); \( P(A_1 \cap A_3) = \frac{1}{4} = P(A_1) \, P(A_3) \) and \( P(A_2 \cap A_3) = \frac{1}{4} = P(A_2) \, P(A_3) \)
Hence we say the three events are pairwise independent. But for mutual independence of the events it is necessary that \( P(A_1 \cap A_2 \cap A_3) = P(A_1) \, P(A_2) \, P(A_3) \). However we have
\( P(A_1 \cap A_2 \cap A_3) = \frac{1}{4} \) and \( P(A_1) \, P(A_2) \, P(A_3) = \frac{1}{8} \), thus \( P(A_1 \cap A_2 \cap A_3) \ne P(A_1) \, P(A_2) \, P(A_3) \).
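Example 2.16 can be checked directly by counting sample points, identifying e_1, ..., e_4 with the outcomes 1, ..., 4:

```python
# Four equally likely sample points, as in two throws of a fair coin.
omega = {1, 2, 3, 4}
P = lambda E: len(E) / len(omega)
A1, A2, A3 = {1, 2}, {1, 3}, {1, 4}

# Pairwise independence: P(X ∩ Y) = P(X) P(Y) for every pair.
pairwise = all(
    abs(P(X & Y) - P(X) * P(Y)) < 1e-12
    for X, Y in [(A1, A2), (A1, A3), (A2, A3)]
)
# Mutual independence additionally requires the triple product condition.
mutual = abs(P(A1 & A2 & A3) - P(A1) * P(A2) * P(A3)) < 1e-12

print(pairwise, mutual)   # -> True False
```

Here P(A_1 ∩ A_2 ∩ A_3) = 1/4 while P(A_1)P(A_2)P(A_3) = 1/8, confirming the failure of mutual independence.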
Few results
(i) Let \( P(B \cap C) > 0 \); then \( P(A \cap B \mid C) = P(B \mid C) \, P(A \mid B \cap C) \).
(ii) If \( \{B_i\} \) is a sequence of disjoint events in A, and \( B = \bigcup_{i=1}^{\infty} B_i \), then provided \( P(B_i) > 0 \) for each i we have \( P(A \cap B) = \sum_{i=1}^{\infty} P(B_i) \, P(A \mid B_i) \).
(iii) Theorem of total probability: If the sequence of events \( \{B_i\} \) in A forms a partition of the sample space (i.e., if the events are exhaustive as well as mutually exclusive), then provided \( P(B_i) > 0 \) for each i, we have \( P(A) = \sum_{i=1}^{\infty} P(B_i) \, P(A \mid B_i) \).
(iv) Let A_i, i = 1, 2, ..., r, be events in A and let S_k be the sum of the probabilities of the intersections of the r events taken k at a time. Thus we have \( S_1 = \sum_{i=1}^{r} P(A_i) \), \( S_2 = \sum_{i<j} P(A_i \cap A_j) \), and so on.
Then

(a) The probability that exactly m of the events A_1, A_2, ..., A_r will occur is given by:

\[ P_{[m]} = S_m - \binom{m+1}{m} S_{m+1} + \binom{m+2}{m} S_{m+2} - \cdots \pm \binom{r}{m} S_r. \]

(b) The probability that at least m of the events A_1, A_2, ..., A_r will occur is given by:

\[ P_m = S_m - \binom{m}{m-1} S_{m+1} + \binom{m+1}{m-1} S_{m+2} - \cdots \pm \binom{r-1}{m-1} S_r. \]

(c) If there are m events marked A_1, A_2, ..., A_m in A, then
\( P(A_1 \cap A_2 \cap \cdots \cap A_m) = P(A_1) \, P(A_2 \mid A_1) \, P(A_3 \mid A_1 \cap A_2) \cdots P(A_m \mid A_1 \cap A_2 \cap \cdots \cap A_{m-1}) \),
provided \( P(A_1 \cap A_2 \cap \cdots \cap A_{m-1}) > 0 \), which would imply that \( P(A_1), P(A_1 \cap A_2), \ldots, P(A_1 \cap A_2 \cap \cdots \cap A_{m-2}) \) are also positive.
(d) Let C be an event in A. Then, under the conditions of Bayes' theorem, together with the condition \( P(A_i \cap B) > 0 \) for each i, we would have

\[ P(C \mid B) = \frac{\sum_{i=1}^{\infty} P(A_i) \, P(B \mid A_i) \, P(C \mid A_i \cap B)}{\sum_{i=1}^{\infty} P(A_i) \, P(B \mid A_i)}. \]

Now if we say two events A and B are independent, then we have \( P(A \mid B) = \frac{P(A \cap B)}{P(B)} \), which implies that \( P(A \mid B) = \frac{P(A) \, P(B)}{P(B)} = P(A) \). Hence the conditional probability of A given B is
equal to the unconditional probability of A.

Example 2.17
If A and B are independent events, then show that so are: (i) A^C and B, (ii) A and B^C, (iii) A^C and B^C.
Since B and B^C are mutually exclusive and exhaustive, we have:
\( P(A) = P(A \cap B) + P(A \cap B^C) \). Now, as A and B are independent,
\( P(A) = P(A) \, P(B) + P(A \cap B^C) \)
\( P(A \cap B^C) = P(A) \{ 1 - P(B) \} = P(A) \, P(B^C) \), hence A and B^C are independent.
Since A and A^C are mutually exclusive and exhaustive events, we have
\( P(B) = P(B \cap A) + P(B \cap A^C) \). Now, as A and B are independent,
\( P(B) = P(B) \, P(A) + P(B \cap A^C) \)
\( P(B \cap A^C) = P(B) \{ 1 - P(A) \} = P(B) \, P(A^C) \), hence B and A^C are independent.
Finally, we can write \( P(A^C \cap B^C) = P[(A \cup B)^C] \), i.e.,
\( P(A^C \cap B^C) = 1 - P(A \cup B) = 1 - P(A) - P(B) + P(A \cap B) \)
\( = 1 - P(A) - P(B) + P(A) \, P(B) \), as A and B are independent
\( = \{ 1 - P(A) \} - P(B) \{ 1 - P(A) \} = \{ 1 - P(A) \} \{ 1 - P(B) \} \)
\( = P(A^C) \, P(B^C) \). Hence A^C and B^C are independent.

Example 2.18
Can two events be simultaneously mutually exclusive and independent? Justify your answer.

Solution 2.18
Let us assume the statement to be true, in the sense that A and B are two events which are simultaneously mutually exclusive and independent. One would immediately notice that both (i) \( P(A \cap B) = 0 \) and (ii) \( P(A \cap B) = P(A) \, P(B) \) would hold as per the statement of the above example. This would imply that \( P(A) \, P(B) = 0 \), which would mean that at least one of the events, A or B, or both, is an impossible event, i.e., the set is φ, i.e., null. Hence, except in trivial situations where at least one of A and B is an impossible event, they cannot simultaneously be mutually exclusive and independent. This is expected, since A and B being mutually exclusive, the very occurrence of one of them implies the non-occurrence of the other. For the benefit of the reader it should be mentioned here that this concept of exclusiveness and independence can be extended to the case of more than two events, say A_1, A_2, ..., A_n, where, as expected, both \( P(A_1 \cap A_2 \cap \cdots \cap A_n) = 0 \) and \( P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1) \, P(A_2) \cdots P(A_n) \) would have to hold simultaneously for the problem statement to hold true.

Example 2.19
Give an example of an event which is independent of itself.

Solution 2.19
Suppose A is an event independent of itself; then we can write \( P(A \cap A) = P(A) \, P(A) \). This would imply that \( P(A) = P(A \cap A) = \{ P(A) \}^2 \), i.e., \( P(A) = 0 \) or \( P(A) = 1 \); that is, A is either the impossible event, P(A) = 0, or the certain/sure event, P(A) = 1.

Example 2.20
Give an example to illustrate that pair-wise independent events may not be mutually independent; that is, even if we have \( P(A_i \cap A_j) = P(A_i) \, P(A_j) \) for all \( 1 \le i \ne j \le n \), yet we may not have \( P(A_i \cap A_j \cap A_k) = P(A_i) \, P(A_j) \, P(A_k) \) for all \( 1 \le i \ne j \ne k \le n \), or more such combinations of i, j, k, l and so on and so forth. Let us prove this with a simple example. Let a box contain four balls marked 1, 2, 3 and 4. One ball is drawn at random from the box. Here, as per the setting of the example, Ω = {1, 2, 3, 4}. Now consider the events A_1 = {1, 4}, A_2 = {2, 4} and A_3 = {3, 4}. Then we have
\( A_1 \cap A_2 = A_1 \cap A_3 = A_2 \cap A_3 = A_1 \cap A_2 \cap A_3 = \{4\} \). Now using the classical definition we have:
\( P(A_1 \cap A_2) = \frac{1}{4} = \frac{1}{2} \cdot \frac{1}{2} = P(A_1) \, P(A_2) \), and on similar lines we can write \( P(A_1 \cap A_3) = P(A_1) \, P(A_3) \) and \( P(A_2 \cap A_3) = P(A_2) \, P(A_3) \). Thus the events are pair-wise independent. However, let us see the independence property when we take more than two events at a time: we have \( P(A_1 \cap A_2 \cap A_3) = \frac{1}{4} \) while \( P(A_1) \, P(A_2) \, P(A_3) = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{8} \). This shows that \( P(A_1 \cap A_2 \cap A_3) \ne P(A_1) \, P(A_2) \, P(A_3) \). So whenever (n ≥ 3) events are described as independent, one ought to interpret that as mutually independent.
Consequently, mutual independence implies pair-wise independence, but the converse may not necessarily be true, as already illustrated with the simple example.

Example 2.21
In an election a politician is contesting from three constituencies, and her chances of being elected from these three constituencies are 0.12, 0.20 and 0.21 respectively. Then find the chances that the politician will be elected from (a) at least one constituency, (b) exactly two constituencies. Also state the assumptions you would like to make in solving this problem.

Solution 2.21
First let us state the simple assumption which we have to consider in order to keep this problem simple and not create unnecessary complication. One can assume that the results from the three constituencies are independent of one another. So, for i = 1, 2, 3, let A_i denote the event that the politician wins from the i-th constituency.
(a) As per the question, in this part we are required to find \( P(A_1 \cup A_2 \cup A_3) \). Hence
\( P(A_1 \cup A_2 \cup A_3) = 1 - P[(A_1 \cup A_2 \cup A_3)^C] \), using the concept of the complementary set
\( = 1 - P(A_1^C \cap A_2^C \cap A_3^C) \), using De Morgan's law
\( = 1 - P(A_1^C) \, P(A_2^C) \, P(A_3^C) \), since the events are jointly independent
\( = 1 - [1 - P(A_1)][1 - P(A_2)][1 - P(A_3)] \)
\( = 1 - [1 - 0.12][1 - 0.20][1 - 0.21] = 0.44384 \)
(b) Here in this part of the problem the required probability is:
\( P(A_1 \cap A_2 \cap A_3^C) + P(A_1 \cap A_2^C \cap A_3) + P(A_1^C \cap A_2 \cap A_3) \)
\( = 0.12 \times 0.20 \times (1 - 0.21) + 0.12 \times (1 - 0.20) \times 0.21 + (1 - 0.12) \times 0.20 \times 0.21 = 0.07608 \)
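The two answers can be evaluated numerically; a short sketch under the same independence assumption:

```python
# Chances of winning each of the three constituencies (assumed independent).
p = [0.12, 0.20, 0.21]
q = [1 - x for x in p]   # chances of losing each one

# (a) at least one: complement of losing all three
at_least_one = 1 - q[0] * q[1] * q[2]
# (b) exactly two: win two, lose the remaining one, summed over the three cases
exactly_two = p[0] * p[1] * q[2] + p[0] * q[1] * p[2] + q[0] * p[1] * p[2]

print(round(at_least_one, 5))   # -> 0.44384
print(round(exactly_two, 5))    # -> 0.07608
```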

Example 2.22
Consider n double throws of a fair die with faces {1, 2, 3, 4, 5, 6}. Then find the probability that each of the configurations {1, 1}, {2, 2}, ..., {6, 6} will appear at least once.

Solution 2.22
For i = 1, 2, ..., 6, let A_i denote the event that the configuration {i, i} never occurs. Also let B be the event that each of the configurations {1, 1}, {2, 2}, ..., {6, 6} appears at least once. Then it is obvious that B^C = A_1 ∪ A_2 ∪ A_3 ∪ A_4 ∪ A_5 ∪ A_6. Hence we can write the following:
\( P(B) = 1 - P(B^C) = 1 - P(A_1 \cup A_2 \cup \cdots \cup A_6) = 1 - S_1 + S_2 - S_3 + S_4 - S_5 + S_6 \),
where: \( S_1 = P(A_1) + P(A_2) + \cdots + P(A_6) \), \( S_2 = P(A_1 \cap A_2) + P(A_1 \cap A_3) + \cdots + P(A_5 \cap A_6) \), and so on. Assuming independence of the double throws and applying the classical definition of probability, for every i, \( P(A_i) = \left( \frac{35}{36} \right)^n \), hence \( S_1 = {}^{6}C_1 \left( \frac{35}{36} \right)^n \). Similarly, for i < j we have \( P(A_i \cap A_j) = \left( \frac{34}{36} \right)^n \), and hence \( S_2 = {}^{6}C_2 \left( \frac{34}{36} \right)^n \). Extending this simple logic we have \( S_i = {}^{6}C_i \left( \frac{36 - i}{36} \right)^n \) for i = 1, 2, ..., 6, and using these formulae we
can find our answer.
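Putting the formula for S_i together, here is a sketch that evaluates P(B) for a given n (the function name is ours). Note that for n = 1 the formula correctly gives probability 0, since a single double throw can show at most one of the six configurations:

```python
from math import comb

def p_all_doubles_appear(n):
    # P(B) = 1 - sum_{i=1}^{6} (-1)^(i-1) * C(6, i) * ((36 - i)/36)^n,
    # by inclusion-exclusion over the events "double {i,i} never occurs".
    return 1 - sum(
        (-1) ** (i - 1) * comb(6, i) * ((36 - i) / 36) ** n
        for i in range(1, 7)
    )

for n in (1, 50, 100, 500):
    print(n, round(p_all_doubles_appear(n), 4))
```

As expected, the probability is 0 at n = 1 and increases towards 1 as the number of double throws grows.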

Example 2.23
In an examination each question has four alternatives, answer of which only one is correct. If a
student knows the correct alternative, then he/she is definitely able to identify and answer it correctly.
Other-wise he/she picks up one of the four alternatives at random, and then marks it. Given that a
student has identified the correct alternative, what is the conditional probability that he/she actually
knew it, assuming 70% of the students know the correct alternative to the question under
consideration.

Solution 2.23
To begin, let us (for our convenience) define the following events:
A = the event that the student identifies the correct alternative
B_1 = the event that the student knows the correct alternative
B_2 = the event that the student does not know the correct alternative
Hence from the information given in the problem one knows that P(B_1) = 0.7, P(B_2) = 0.3, P(A|B_1) = 1 and P(A|B_2) = 1/4. As B_1 and B_2 are mutually exclusive and exhaustive events, we apply Bayes' theorem, thus:

\[ P(B_1 \mid A) = \frac{P(A \mid B_1) \, P(B_1)}{P(A \mid B_1) \, P(B_1) + P(A \mid B_2) \, P(B_2)} = \frac{0.7 \times 1}{0.7 \times 1 + 0.25 \times 0.3} = \frac{28}{31} \approx 0.903. \]
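The computation as a short sketch:

```python
# Prior and likelihoods from the problem statement.
p_know = 0.7                   # P(B1)
p_correct_given_know = 1.0     # P(A | B1)
p_correct_given_guess = 0.25   # P(A | B2), one of four alternatives at random

# Total probability, then Bayes' theorem.
p_correct = (p_correct_given_know * p_know
             + p_correct_given_guess * (1 - p_know))
p_know_given_correct = p_correct_given_know * p_know / p_correct

print(round(p_know_given_correct, 4))   # -> 0.9032  (= 28/31)
```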

Example 2.24
From a box containing a white and b black balls, one ball is drawn at random and transferred to another box which already has c white and d black balls. After this first step, another ball is drawn from the second box, kept aside, and its colour noted. First, find the probability that the ball drawn from the second box is white. Next, given that the ball drawn from the second box is white, what is the conditional probability that the ball transferred from the first box to the second box is also white?

Solution 2.24
As is the case for most problems in conditional probability, we need to define a few events so that formulating the problem helps us conceptualize it and solve it easily. So, as before, let us define the following events:
A = the ball drawn from the second box is white
B1 = the ball transferred from the first box to the second box is white
B2 = the ball transferred from the first box to the second box is black
With this set of predefined notation one can easily note that P(B1) = a/(a + b) and P(B2) = b/(a + b), and also that B1 and B2 are mutually exclusive and exhaustive events for this example. After the transfer the second box holds (c + d + 1) balls, so P(A|B1) = (c + 1)/(c + d + 1) and P(A|B2) = c/(c + d + 1). Thus, using Bayes' theorem, we have:
P(B1|A) = P(A|B1)P(B1) / [P(A|B1)P(B1) + P(A|B2)P(B2)],
and substituting the expressions above one can easily solve and find the answer. Remember this concept can be extended to more than two different colours of balls. Also the problem can be made more interesting if we consider more than two boxes.
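Exact rational arithmetic makes the substitution painless; the following sketch (the function name is ours) returns both requested probabilities:

```python
from fractions import Fraction

def transfer_probs(a: int, b: int, c: int, d: int):
    """(P(A), P(B1|A)) for the transfer problem: one ball moved at random
    from box 1 (a white, b black) to box 2 (c white, d black), then one
    ball drawn at random from the enlarged second box."""
    p_b1 = Fraction(a, a + b)               # white ball transferred
    p_b2 = Fraction(b, a + b)               # black ball transferred
    p_a_b1 = Fraction(c + 1, c + d + 1)     # P(A | B1)
    p_a_b2 = Fraction(c, c + d + 1)         # P(A | B2)
    p_a = p_a_b1 * p_b1 + p_a_b2 * p_b2     # total probability
    return p_a, p_a_b1 * p_b1 / p_a         # Bayes' theorem
```

With a = b = c = d = 1, for instance, the draw is white with probability 1/2 and the posterior probability that the transferred ball was white is 2/3.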

Example 2.25
In a factory there are three machines producing 50%, 30% and 20% of the total output. It is also known that 5% of the items produced by machine # 1 are defective, and the corresponding figures for machine # 2 and machine # 3 are 3% and 2% respectively. Now an item is drawn at random from the production line; find the probability that it is defective. In the next part of the problem, given that the item picked up for inspection is defective, find the conditional probability that it was produced by machine # 1.

Solution 2.25
Again define the following events:
A = the item picked up is defective
B1 = the item is produced by machine # 1
B2 = the item is produced by machine # 2
B3 = the item is produced by machine # 3
Then the following information is given in the problem: P(B1) = 0.5, P(A|B1) = 0.05, P(B2) = 0.3, P(A|B2) = 0.03, P(B3) = 0.2 and P(A|B3) = 0.02. Since by our definition and set-up of the problem B1, B2 and B3 are mutually exclusive and exhaustive, we can easily use the formula for total probability, which is:
P(A) = Sum_{i=1 to 3} P(A|B_i)P(B_i) = 0.05 x 0.5 + 0.03 x 0.3 + 0.02 x 0.2 = 0.038.
Now applying Bayes' theorem we have:
P(B1|A) = P(A|B1)P(B1) / Sum_{i=1 to 3} P(A|B_i)P(B_i) = 0.025/0.038, which is approximately 0.658.

Example 2.26
Consider two boxes: the first contains a white and b black balls, while the second box has c white and d black balls. A coin (it may be fair or unfair) is tossed. If we get a head (H), a ball is drawn at random from the first box; if we get a tail (T), it is drawn from the second box. First find the probability that the ball drawn is white. Given that the ball drawn is white, find the conditional probability that it came from the first box.

Solution 2.26
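The written solution is left as an exercise here; the computation follows the same Bayes pattern as the previous examples, as this hypothetical sketch (names and test values are ours) indicates:

```python
from fractions import Fraction

def coin_box_probs(a, b, c, d, p_head=Fraction(1, 2)):
    """(P(white), P(box 1 | white)) when a head selects box 1
    (a white, b black) and a tail selects box 2 (c white, d black);
    p_head need not be 1/2, matching the unfair-coin remark."""
    p_w1 = Fraction(a, a + b)
    p_w2 = Fraction(c, c + d)
    p_white = p_head * p_w1 + (1 - p_head) * p_w2   # total probability
    return p_white, p_head * p_w1 / p_white         # Bayes' theorem
```

For a fair coin with box 1 holding 1 white and 1 black ball and box 2 holding 1 white and 3 black, P(white) = 3/8 and P(box 1 | white) = 2/3.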

Example 2.27
From a box containing 5 white balls and 5 black balls, 3 balls are drawn at random and without
replacement and placed into an empty second box. Then a ball is drawn at random from the second
box. Find the probability that the ball is white. Given that the ball is white, find the conditional
probability that all the three balls drawn from the first box were also white.

Solution 2.27
Let us define the events as follows:
A = the ball finally drawn is white
B = all three balls drawn from the first box are white
Then from obvious symmetry we know that P(A) = 1/2.
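The symmetry claim P(A) = 1/2, and the follow-up conditional probability, can be verified exactly by conditioning on K, the number of white balls among the three transferred (K follows a hypergeometric law); the sketch below is ours:

```python
from fractions import Fraction
from math import comb

p_a = Fraction(0)          # P(final ball white)
p_a_and_b = Fraction(0)    # P(final ball white AND all three transferred white)
for k in range(4):         # k = white balls among the 3 transferred
    p_k = Fraction(comb(5, k) * comb(5, 3 - k), comb(10, 3))
    p_white_k = Fraction(k, 3)       # draw uniformly from the 3 transferred
    p_a += p_k * p_white_k
    if k == 3:
        p_a_and_b = p_k * p_white_k  # drawing white is certain when k = 3
```

This gives P(A) = 1/2 and P(B|A) = (1/12)/(1/2) = 1/6.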

CHAPTER 3
RANDOM VARIABLE AND PROBABILITY DISTRIBUTION

For the outcomes of a random experiment we always try to express them in quantitative terms. Generally the outcome of a random experiment can be either quantitative or qualitative in character. This means we are generally interested in a function that takes a definite real value corresponding to each outcome of an experiment. Since this is associated with a random experiment, one would naturally like to make probability statements about the function.
We are basically defining a numerical character that may vary from one element of the sample space to another and about which probability statements may be made. This numerical character is called a random variable (r.v.) and is denoted by a letter such as X, Y or Z. Thus a r.v. is by definition a finite real-valued measurable function defined on a probability space (Omega, A, P). A function X on Omega may be called a random variable if it is real valued and if the inverse image under X of each Borel set M of the real line is measurable (i.e., belongs to A), i.e., P[{w : X(w) belongs to M}] = P[X^(-1)(M)].
Example 3.1
Consider the following game being played between two players A and B where a coin is tossed.
A random experiment is an experiment whose outcome cannot be predicted with certainty. The set of all possible outcomes of a random experiment is called the sample space, generally denoted by the symbol Omega. The elements of Omega are called the sample points, and typically we use the symbol w to specify any sample point in the sample space, so that Omega = Union_i w_i holds. Now the occurrence of a set of sample points possessing a certain characteristic specifies an event A, i.e., A = {w_1, w_2, ...}.
Example 3.2
Consider a simple example of a random experiment of rolling three unbiased dice simultaneously, and the event that the sum of the numbers appearing on the dice adds up to 10. As we can easily make out, the sample space consists of all possible combinations of the faces of the three dice, i.e., Omega = {(1,1,1), (1,1,2), ..., (1,2,6), ..., (6,1,1), ..., (6,6,6)}. A typical sample point w_{i,j,k}, i, j, k = 1, 2, ..., 6, can be any of the 216 (6 x 6 x 6) ways in which the faces can appear. The event of having a sum of 10 is the set A = {(1,3,6), (1,6,3), (3,1,6), (3,6,1), (6,1,3), (6,3,1), (1,4,5), (1,5,4), (4,1,5), (4,5,1), (5,1,4), (5,4,1), (2,2,6), (2,6,2), (6,2,2), ...}, and it consists of 27 elements, each denoted by w_{i,j,k}.
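The count of 27 favourable triples is easy to confirm by brute-force enumeration:

```python
from itertools import product

# All 6*6*6 = 216 ordered outcomes of three dice
omega = list(product(range(1, 7), repeat=3))
# Event A: the three faces sum to 10
event_a = [w for w in omega if sum(w) == 10]
```

Under the classical definition, P(A) = 27/216 = 1/8.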

Example 3.3
Now suppose you have two unbiased coins which you toss together repeatedly, up to a maximum of three times, stopping as soon as both coins show heads. Each round has the outcomes HH, HT, TH, TT, so the sample space is Omega = {(HH), (x1, HH), (x1, x2, HH), (x1, x2, x3)}, where each x_i belongs to {HT, TH, TT} and denotes a round in which both heads did not appear. If you now define the event A as getting both heads within the three rounds, then A = {(HH), (x1, HH), (x1, x2, HH)}, with each x_i belonging to {HT, TH, TT}.

CHAPTER 4
EXPECTATION, VARIANCE, MOMENTS AND QUANTILES

It may be noted that the specification of the probability distribution of a random variable X is generally done using the concept of a probability function (whether discrete or continuous) or using the concept of the distribution function, F. But apart from these two specifications, other measures of the distribution will be required, which will be considered under the concept of moments or quantile functions.

Expectation
If X is the random variable defined on the probability space (Omega, A, P) and F is its distribution function, then the mathematical expectation of X exists iff Integral |x| dF(x) = Integral |X| dP < infinity, and the mathematical expectation is given by E(X) = Integral x dF(x) = Integral_Omega X dP. The latter integral is with respect to the probability space (Omega, A, P), while the former is with respect to the induced distribution on the real line.
Now if X has a discrete distribution with mass points x_i (i = 1, 2, ..., m) and the corresponding probabilities are P[X = x_i] = p_i (i = 1, 2, ..., m), and if the expectation exists, then we have E(X) = Sum_{i=1 to m} x_i p_i. On the other hand, if X has an absolutely continuous distribution with pdf f(x) = dF(x)/dx, and if E(X) exists, then the expectation is given by E(X) = Integral x f(x) dx = Integral x dF(x), both integrals running from -infinity to +infinity.
The expectation can be looked at as a weighted average of the values of x, where the weight attached to x is the probability dF(x) assigned to the infinitesimal interval between x - (1/2)dx and x + (1/2)dx. Another question which is important to understand is the existence of the expected value, i.e., the integrability of X. Consider for example the discrete case such that X takes a countable set
of values x_i with positive probabilities p_i (i = 1, 2, ...); then Sum_{i=1 to infinity} x_i p_i, even if convergent, may assume a different value for a change in the order in which the terms x_i p_i are taken. But this would mean that the average value of X is dependent on the order in which the terms occur in the series. Now, if the average value is to serve as a measure of a feature of the distribution, then the order of the terms should have nothing to do with it. Hence, to make the sum Sum_{i=1 to infinity} x_i p_i independent of the order of the terms, we require that the series be absolutely convergent, i.e., Sum_{i=1 to infinity} |x_i| p_i < infinity. In a similar way, for the continuous case, Integral |x| dF(x) < infinity should hold true for the average of X to exist.

Example 4.1
Consider that you roll two unbiased dice; then what is the expected value of the sum of the numbers which come up?

Solution 4.1
Let us consider X and Y as the random variables which denote the numbers which come up on the first and the second die respectively. Then if we define the probability space as (Omega, A, P), the expected value is given by
E(X + Y) = Sum_i Sum_j (x_i + y_j) P[X = x_i, Y = y_j] = 7.
We will explain the concept of the joint probability P[X = x_i, Y = y_j] later. In terms of the induced space (R^1, B^1, P_X) we have the expected value of the sum as
E(X + Y) = 2 x (1/36) + 3 x (2/36) + ... + 7 x (6/36) + ... + 11 x (2/36) + 12 x (1/36) = 7.
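The value 7 can be confirmed by direct enumeration over the 36 equally likely outcomes:

```python
from fractions import Fraction

# E(X + Y) for two fair dice, each of the 36 outcomes having probability 1/36
expected_sum = sum(Fraction(i + j, 36)
                   for i in range(1, 7) for j in range(1, 7))
```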
In general, consider that X is a mapping from Omega to R^1, while g is a mapping from R^1 to R^1, and let f(w) = g[X(w)]. This defines a function f on Omega in terms of X and g, and we write f = g(X). Now suppose M is a subset of R^1; then we have {w : g[X(w)] belongs to M} = {w : X(w) belongs to g^(-1)(M)}, i.e., (gX)^(-1)(M) = X^(-1)(g^(-1)(M)).

(Diagram: f = g(X) maps Omega to R^1 via X, and then R^1 to R^1 via g.)
Thus the inverse image under g(X) of every Borel set M is a measurable set, implying that g(X) is itself a r.v. on (Omega, A, P).
Now one can define, and hence find, the expectation of g(X), i.e., E[g(X)] = Integral g(X) dP = Integral g(x) dF_X(x).


Properties
1) If X is a r.v. which equals c, a finite number, with probability one, then E(X) = c.
2) If c is a finite real number and if E(X) exists, then E(cX) = cE(X).
3) If X_1, X_2, ..., X_n are r.v.s such that all are defined on (Omega, A, P), then E(X_1 + X_2 + ... + X_n) exists if E(X_i) exists for i = 1, 2, ..., n, and we have E(X_1 + X_2 + ... + X_n) = E(X_1) + E(X_2) + ... + E(X_n).
4) If c_1, c_2, ..., c_n are finite real numbers and if E(X_i) exists for i = 1, 2, ..., n, then E(c_1 X_1 + c_2 X_2 + ... + c_n X_n) = E(c_1 X_1) + E(c_2 X_2) + ... + E(c_n X_n) = c_1 E(X_1) + c_2 E(X_2) + ... + c_n E(X_n).
5) If c_1, c_2, ..., c_n and d_1, d_2, ..., d_n are finite real numbers and if E(X_i) exists for i = 1, 2, ..., n, then E[(c_1 + d_1 X_1) + (c_2 + d_2 X_2) + ... + (c_n + d_n X_n)] = E(c_1 + d_1 X_1) + E(c_2 + d_2 X_2) + ... + E(c_n + d_n X_n) = (c_1 + d_1 E(X_1)) + (c_2 + d_2 E(X_2)) + ... + (c_n + d_n E(X_n)).
6) If E(X) exists, then |E(X)| <= E(|X|).
7) If E(X) exists and a and b are real numbers such that a <= X <= b, then a <= E(X) <= b.
8) If E(X) and E(Y) both exist and if X >= Y a.e., then E(X) >= E(Y).
9) If E(X) exists and if P(A) > 0 and E(X I_A) exists, then E(X I_A)/P(A), also denoted by E(X|A), is called the conditional expectation of X given A. Note here that E(X I_A) will exist as E(X) exists; moreover we have (X I_A)(w) = X(w) if w belongs to A, and 0 otherwise.
10) Let {A_i} be a measurable partition of the sample space Omega such that P(A_i) > 0 for i = 1, 2, ..., n. Let E(X) exist; then E(X) = Sum_i P(A_i) E(X|A_i).
11) Let {A_i} be a measurable partition of the sample space Omega such that P(A_i) > 0 for i = 1, 2, ..., n, and X(w) = k_i for w belonging to A_i. Let E(X) exist; then E(X) = Sum_i P(A_i) E(X|A_i) = Sum_i P(A_i) k_i.
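Properties 10 and 11 can be checked numerically; the partition of a fair die's sample space used below is a hypothetical choice of ours for illustration:

```python
from fractions import Fraction

omega = list(range(1, 7))           # fair die, X(w) = w
p = Fraction(1, 6)
partition = ([1, 2], [3, 4, 5, 6])  # one measurable partition of omega

e_x = sum(w * p for w in omega)     # E(X) computed directly
e_via_partition = Fraction(0)
for block in partition:
    p_block = len(block) * p                         # P(A_i)
    e_cond = sum(w * p for w in block) / p_block     # E(X | A_i)
    e_via_partition += p_block * e_cond              # property 10
```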

Moments
Now we already know that if X is a r.v., then any Borel measurable function of X, say g(X), is also a measurable function. Here we will discuss a few things about the expectation of a few typical Borel measurable functions g(X). A typical example of g(X) is X^r, r = 1, 2, .... Now if the expected value of X^r exists then it is called the r-th moment about zero and is denoted by mu'_r, i.e.,
mu'_r = Sum_i x_i^r P[X = x_i] = E(X^r), or mu'_r = Integral x^r dF(x) = Integral x^r f(x) dx = E(X^r),
provided of course that E(|X|^r) < infinity.
We can easily deduce the following:
mu'_0 = E(X^0) = Sum_i x_i^0 P[X = x_i] = Sum_i P[X = x_i] = 1 for X being a discrete r.v., or
mu'_0 = E(X^0) = Integral x^0 dF(x) = Integral x^0 f(x) dx = Integral f(x) dx = 1 for X being a continuous r.v.
mu'_1 = E(X^1) = Sum_i x_i^1 P[X = x_i] = Sum_i x_i P[X = x_i] = E(X) for X being a discrete r.v., or
mu'_1 = E(X^1) = Integral x^1 dF(x) = Integral x^1 f(x) dx = Integral x f(x) dx = E(X) for X being a continuous r.v.

Theorem
If mu'_s exists, then mu'_r will necessarily exist for r < s.

We may also define the r-th moment about any fixed point, say a, such that the function g(X) now is (X - a)^r. Then we denote the r-th moment of g(X) as mu'_r(a), i.e.,
mu'_r(a) = Sum_i (x_i - a)^r P[X = x_i], or mu'_r(a) = Integral (x - a)^r dF(x) = Integral (x - a)^r f(x) dx.
If a = mu'_1 = E(X) = mu, then E[(X - mu)^r] is termed the r-th central moment of X and is denoted by mu_r(mu) = mu_r; thus
mu_r = mu_r(mu) = Sum_i (x_i - mu)^r P[X = x_i], or
mu_r = mu_r(mu) = Integral (x - mu)^r dF(x) = Integral (x - mu)^r f(x) dx,
provided of course the summation or the integral exists.
We can easily deduce the following:
mu'_0(a) = E[(X - a)^0] = Sum_i (x_i - a)^0 P[X = x_i] = Sum_i P[X = x_i] = 1 for X being a discrete r.v., or
mu'_0(a) = E[(X - a)^0] = Integral (x - a)^0 dF(x) = Integral (x - a)^0 f(x) dx = Integral f(x) dx = 1 for X being a continuous r.v.
mu_0 = mu'_0(mu) = E[(X - mu)^0] = Sum_i (x_i - mu)^0 P[X = x_i] = Sum_i P[X = x_i] = 1 for X being a discrete r.v., or
mu_0 = mu'_0(mu) = E[(X - mu)^0] = Integral (x - mu)^0 dF(x) = Integral (x - mu)^0 f(x) dx = Integral f(x) dx = 1 for X being a continuous r.v.

Theorem
If mu'_s(a) exists, then mu'_r(a) will necessarily exist for r < s.
If mu_s = mu'_s(mu) exists, then mu_r = mu'_r(mu) will necessarily exist for r < s.
If mu'_r exists, then mu'_r(a) will also exist, and we have
mu'_r(a) = mu'_r - C(r,1) a mu'_{r-1} + C(r,2) a^2 mu'_{r-2} - ... + (-1)^(r-1) C(r,r-1) a^(r-1) mu'_1 + (-1)^r C(r,r) a^r mu'_0.
Now remember that:
(x - a)^r = C(r,0) a^0 x^r - C(r,1) a^1 x^(r-1) + C(r,2) a^2 x^(r-2) - ... + (-1)^(r-1) C(r,r-1) a^(r-1) x^1 + (-1)^r C(r,r) a^r x^0,
i.e.,
E[(X - a)^r] = C(r,0) E[X^r] - C(r,1) a E[X^(r-1)] + C(r,2) a^2 E[X^(r-2)] - ... + (-1)^r C(r,r) a^r E[X^0],
i.e.,
mu'_r(a) = mu'_r - C(r,1) a mu'_{r-1} + C(r,2) a^2 mu'_{r-2} - ... + (-1)^r a^r mu'_0.
If mu'_r exists, then mu_r will also exist, and putting a = mu'_1 we have
mu_r = mu'_r - C(r,1) mu'_1 mu'_{r-1} + C(r,2) (mu'_1)^2 mu'_{r-2} - ... + (-1)^(r-1) C(r,r-1) (mu'_1)^(r-1) mu'_1 + (-1)^r C(r,r) (mu'_1)^r mu'_0.
If mu'_r(a) exists, then mu'_r will also exist, and we have
mu'_r = E[{(X - a) + a}^r] = mu'_r(a) + C(r,1) a mu'_{r-1}(a) + C(r,2) a^2 mu'_{r-2}(a) + ... + C(r,r-1) a^(r-1) mu'_1(a) + a^r mu'_0(a).
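The raw-to-central conversion can be verified numerically; the sketch below (the function name is ours) applies mu_r = Sum_k C(r,k) (-mu'_1)^(r-k) mu'_k and checks it against a fair die:

```python
from fractions import Fraction
from math import comb

def central_from_raw(raw):
    """Central moments [mu_1, ..., mu_r] from the raw moments
    raw = [mu'_1, ..., mu'_r] (about zero), using
    mu_r = sum_k C(r, k) (-mu'_1)**(r-k) * mu'_k with mu'_0 = 1."""
    mu1 = raw[0]
    raw0 = [Fraction(1)] + list(raw)   # prepend mu'_0 = 1
    return [sum(comb(r, k) * (-mu1) ** (r - k) * raw0[k]
                for k in range(r + 1))
            for r in range(1, len(raw0))]

# Check against a fair die: raw moments are 7/2, 91/6 and 441/6
die_raw = [Fraction(7, 2), Fraction(91, 6), Fraction(441, 6)]
die_central = central_from_raw(die_raw)
```

For the die this yields mu_1 = 0, mu_2 = 35/12, and mu_3 = 0 (the distribution is symmetric about its mean).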



CHAPTER 5
DISCRETE DISTRIBUTION

For any r.v. X the domain space Omega is generally discrete or continuous, and likewise the distribution will be termed a discrete distribution or a continuous distribution. The following discrete distributions will be considered here:
1 Binomial distribution: if X is the r.v. which has the Binomial distribution, then one denotes it by X~B(n,p), where n and p are the parameters. Its pmf is f(x) = C(n,x) p^x (1-p)^(n-x), x = 0, 1, ..., n.
2 Negative Binomial distribution: if X is the r.v. which has the Negative Binomial distribution, then one denotes it by X~NB(r,p), where r and p are the parameters. Its pmf is f(x) = C(r+x-1, r-1) p^r (1-p)^x, x = 0, 1, 2, ....
3 Uniform Discrete distribution: if X is the r.v. which has the Uniform Discrete distribution, then one denotes it by X~UD(a,b), where a and b are the parameters such that a < b. Its pmf is f(x) = 1/n for each of the n admissible values of x.
4 Geometric distribution, X~G(p)
5 Hypergeometric distribution
6 Poisson distribution
7 Logarithmic distribution


Uniform Distribution
Let us consider the example where chits are kept in an urn, each of which is marked with only one number between 1 and 10, and where each number is a different integer. Thus each number 1, 2, 3, ..., 9, 10 appears only once in the urn. A person draws one chit at a time, notes down the number and returns the chit into the urn before the second drawing. If the person continues to carry out this experiment, then the distribution of the number which appears when one draws a chit is said to be the uniform discrete distribution with parameters 1 and 10. Now in case we have n discrete numbers between a and b, such that the numbers can be represented by {a, a+k, a+2k, ..., a+(n-1)k}, and one continues drawing numbers at random, with replacement, from these n numbers between a and b = a+(n-1)k, then the resulting distribution is called a uniform discrete distribution, with pmf f(x) = 1/n for x = a, a+k, ..., b. The notation for the uniform discrete distribution is of the form X~UD(a,b).
The figure referred to here shows two different uniform discrete distributions: X~UD(1,15), where a = 1, k = 1 and n = 15, and X~UD(2,29), where a = 2, k = 3 and n = 10. For each pmf we draw the cumulative distribution function also, to give a clear understanding of the distribution.
Now to check the required properties of the uniform discrete distribution, we check that f(x) >= 0, that the probabilities add to one, and the left continuity of the cdf F. First note that Sum_x f(x) = Sum_{i=0 to n-1} 1/n = 1.
Next let us find its expected value and variance:
E(X) = Sum_x x f(x) = (1/n) Sum_{i=0 to n-1} {a + ik} = a + (n-1)k/2 = (a+b)/2, and
V(X) = E(X^2) - [E(X)]^2 = (n^2 - 1)k^2/12.

Bernoulli Trials
Consider a sequence of trials satisfying the following properties:
(Figure: pmf f(x) and cdf F(x) for X~UD(1,15) and X~UD(2,29).)

a. Each trial has two possible outcomes, termed success and failure.
b. The trials are independent.
c. The probability of success remains the same for all trials.
Such a sequence of trials is called a sequence of Bernoulli trials. The concept of Bernoulli trials gets its name from the Swiss mathematician Jacob Bernoulli.
Binomial Distribution
Let us consider a sequence of n Bernoulli trials, each with success probability p (what we mean by success is defined by the experimenter and depends on the experiment designed for the same). Then obviously one can immediately say that the corresponding probability of failure is q = (1 - p). What we are required to find out is the probability distribution of X, where X is the number of successes in these n trials. One can immediately decipher that the realized values of X, denoted by x, may be 0, 1, 2, ..., n. Thus for x = 0, 1, 2, ..., n we need to consider P[X = x], such that a general functional form can be given for the distribution.
One should note that the probability of having successes in the first x trials and failures in the remaining (n - x) trials can be deduced if we pay attention to the particular instance when the initial x outcomes are successes and the later (n - x) are failures, i.e.,
p . p ... p (x times) . (1-p) . (1-p) ... (1-p) ((n - x) times),
in which case the corresponding probability is p^x (1-p)^(n-x). Now, as per the initial assumption, all the trials are independent; moreover the sequence of x successes and (n - x) failures can be achieved in C(n,x) different orders, so that the probability is given by
P[X = x] = C(n,x) p^x q^(n-x), x = 0, 1, ..., n.
In other words the pmf of X is given by f(x) = P[X = x] = C(n,x) p^x (1-p)^(n-x), x = 0, 1, ..., n.
Note that f(x) is non-negative for every value of x, and it is a legitimate pmf as
Sum_{x=0 to n} C(n,x) p^x q^(n-x) = C(n,0) p^0 q^n + C(n,1) p^1 q^(n-1) + ... + C(n,n) p^n q^0 = (p + q)^n = 1.
The random variable X considered in this example is said to follow the Binomial distribution with parameters n and p. To make this more illustrative we give below the pmfs for the following instances: (a) f1(x), which is X~B(20, 0.4), (b) f2(x), which is X~B(40, 0.2), and (c) f3(x), which is X~B(30, 0.5).
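A direct computation confirms both the normalization argument (p + q)^n = 1 and the mean np for the first of these examples:

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """pmf of X ~ B(n, p): C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

f1 = [binom_pmf(x, 20, 0.4) for x in range(21)]   # X ~ B(20, 0.4)
```

The 21 probabilities sum to 1 (up to floating-point error) and the mean works out to np = 8.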

Negative Binomial Distribution
Consider a sequence of Bernoulli trials repeated till the r-th success is obtained, where r is a pre-assigned positive integer. Then with this information find the probability distribution of X, which is the number of failures preceding the r-th success. Clearly the possible values of X are 0, 1, 2, .... Thus for x = 0, 1, 2, ... one has the following:
f_X(x) = P[X = x] = P[there are exactly (r-1) successes in the first (r+x-1) trials and the (r+x)-th trial is a success]
= P[there are exactly (r-1) successes in the first (r+x-1) trials] P[the (r+x)-th trial is a success].
We are able to write the last step as the trials are independent; hence
f_X(x) = P[X = x] = C(r+x-1, r-1) p^(r-1) q^x . p = C(r+x-1, r-1) p^r q^x.
Hence the pmf of X is given by f_X(x) = C(r+x-1, r-1) p^r q^x, x = 0, 1, 2, .... In order to prove that this is a legitimate pmf, we show that:
Sum_{x=0 to infinity} C(r+x-1, r-1) p^r q^x = p^r (1 - q)^(-r) = 1.
Thus the random variable X considered in this example is said to follow the Negative Binomial distribution with parameters r and p. In a similar manner one can go on to find the expectation and the variance of the distribution.
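The pmf, and the standard mean rq/p (quoted here as a known result rather than derived in the notes), can be checked numerically by truncating the infinite sum:

```python
from math import comb

def nb_pmf(x: int, r: int, p: float) -> float:
    """pmf of X ~ NB(r, p): failures before the r-th success,
    f(x) = C(r + x - 1, r - 1) p^r q^x."""
    return comb(r + x - 1, r - 1) * p ** r * (1 - p) ** x

# Truncated sums for r = 3, p = 0.5; the tail beyond x = 200 is negligible
probs = [nb_pmf(x, 3, 0.5) for x in range(201)]
```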
Geometric Distribution
In the above example, i.e., for the Negative Binomial distribution, if one considers r = 1, we are interested in the pmf of the random variable pertaining to the case when one wants the probability of the number of failures preceding the 1st success. Thus we have a sequence of Bernoulli trials repeated till the 1st success is obtained. Hence the r.v. X considered in this example follows the Geometric distribution, such that
f_X(x) = P[X = x] = q^x p, x = 0, 1, 2, ....
It is a legitimate pmf, as Sum_{x=0 to infinity} q^x p = p/(1 - q) = 1.

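As a quick check of the geometric pmf above (the r = 1 case of the negative binomial), the choice p = 0.3 and the truncation point are ours:

```python
def geom_pmf(x: int, p: float) -> float:
    """pmf of X ~ G(p): failures before the first success, f(x) = q^x p."""
    return (1 - p) ** x * p

probs = [geom_pmf(x, 0.3) for x in range(500)]   # tail beyond 500 negligible
```

The truncated sum is 1 to within floating-point error and the mean matches the standard result q/p.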

DISCRETE DISTRIBUTIONS

Binomial Distribution: X~B(n,p)
  f(x) = C(n,x) p^x q^(n-x), x = 0, 1, ..., n
  E(X) = np, V(X) = npq, MGF: (q + p e^t)^n

Uniform Distribution (Discrete): X~DU(a,b)
  f(x) = 1/n, x = a, a+k, ..., b, where b = a+(n-1)k
  E(X) = (a+b)/2, V(X) = (n^2 - 1)k^2/12, MGF: (1/n) Sum_{i=0 to n-1} e^(t(a+ik))

Negative Binomial Distribution: X~NB(r,p)
  f(x) = C(r+x-1, r-1) p^r q^x, x = 0, 1, 2, ...
  E(X) = rq/p, V(X) = rq/p^2, MGF: p^r (1 - q e^t)^(-r)

Geometric Distribution: X~G(p)
  f(x) = q^x p, x = 0, 1, 2, ...
  E(X) = q/p, V(X) = q/p^2, MGF: p/(1 - q e^t)

Hyper-Geometric Distribution: X~HG(N, n, p)
  f(x) = C(Np, x) C(N - Np, n - x) / C(N, n), x = 0, 1, 2, ..., n
  E(X) = np, V(X) = npq (N - n)/(N - 1)

Poisson Distribution: X~P(lambda)
  f(x) = e^(-lambda) lambda^x / x!, x = 0, 1, 2, ...
  E(X) = lambda, V(X) = lambda, MGF: e^(lambda(e^t - 1))



Distribution and pmf

Binomial Distribution: X~B(n,p)
  f(x) = C(n,x) p^x (1-p)^(n-x), x = 0, 1, 2, ..., n

Uniform Distribution (Discrete): X~DU(a,b)
  f(x) = 1/n, x = a, a+k, ..., b, where b = a+(n-1)k

Negative Binomial Distribution: X~NB(r,p)
  f(x) = C(r+x-1, r-1) p^r (1-p)^x, x = 0, 1, 2, ...

Geometric Distribution: X~G(p)
  f(x) = (1-p)^x p, x = 0, 1, 2, ...

Hyper-Geometric Distribution: X~HG(N, n, p)
  f(x) = C(Np, x) C(N - Np, n - x) / C(N, n), x = 0, 1, 2, ..., n

Poisson Distribution: X~P(lambda)
  f(x) = e^(-lambda) lambda^x / x!, x = 0, 1, 2, ...



15-08-2010_1300IST
