Note - 1 PDF
Probability
Basic Concepts
Random Experiment: An experiment is said to be a random experiment if its outcome cannot
be predicted with certainty.
Example: If a coin is tossed, we cannot say whether a head or a tail will appear, so tossing a
coin is a random experiment.
Sample Space: The set of all possible outcomes of an experiment is called the sample space. It
is denoted by 'S' and its number of elements is n(S).
Example: In throwing a die, the number that appears on top is any one of 1, 2, 3, 4, 5, 6. So here
S = {1, 2, 3, 4, 5, 6} and n(S) = 6.
Similarly, in the case of a coin, S = {Head, Tail} or {H, T} and n(S) = 2.
The elements of the sample space are called sample points or event points.
Event: Every subset of a sample space is an event. It is denoted by 'E'.
Example: In throwing a die, S = {1, 2, 3, 4, 5, 6}; the appearance of an even number is the
event E = {2, 4, 6}.
Clearly E is a subset of S.
Simple event: An event consisting of a single sample point is called a simple event.
Example: In throwing a die, S = {1, 2, 3, 4, 5, 6}, so each of {1}, {2}, {3}, {4}, {5} and {6} is a simple
event.
Compound event: A subset of the sample space which has more than one element is called a
compound (or mixed) event.
Example: In throwing a die, the event of an odd number appearing is a compound event,
because E = {1, 3, 5} has 3 elements.
Equally likely events: Events are said to be equally likely, if we have no reason to believe that
one is more likely to occur than the other.
Example: When a die is thrown, all six faces {1, 2, 3, 4, 5, 6} are equally likely to come up.
Exhaustive events: A set of events is exhaustive when it accounts for every possible outcome of the experiment.
Approaches of Probability
Classical approach
Frequency approach
Subjective approach
Axiomatic approach
P(A) ≥ 0;
P(S) = 1;
P(A ∪ B) ≤ P(A) + P(B)
Sure event: Let 'S' be a sample space. If an event E contains every outcome of S, i.e. E = S, then E is called a sure
event.
Example: In a throw of a die, S = {1, 2, 3, 4, 5, 6}.
Let E1 = event of getting a number less than 7.
So E1 is a sure event. So we can say that in a sure event n(E) = n(S).
Mutually exclusive or disjoint events: Two or more events are mutually exclusive if they cannot occur
simultaneously, that is, no two of them can occur together.
Addition theorem: If A and B are any two events, then the probability of occurrence of at least one of the events
A and B is given by:
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Ex.: The probability that a contractor will get a contract is 2/3 and the probability that he
will get another contract is 5/9. If the probability of getting at least one contract is 4/5, what is
the probability that he will get both the contracts?
Sol.: Here P(A) = 2/3, P(B) = 5/9,
P(A ∪ B) = 4/5, P(A ∩ B) = ?
By the addition theorem of probability:
P(A ∪ B) = P(A) + P(B) − P(A ∩ B), so 4/5 = 2/3 + 5/9 − P(A ∩ B)
or 4/5 = 11/9 − P(A ∩ B)
or P(A ∩ B) = 11/9 − 4/5 = (55 − 36)/45
P(A ∩ B) = 19/45
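The rearranged addition theorem above can be checked with exact fractions; a minimal Python sketch (the variable names are ours, not from the notes):

```python
from fractions import Fraction

# Given probabilities from the worked example above
p_a = Fraction(2, 3)        # P(A): gets the first contract
p_b = Fraction(5, 9)        # P(B): gets the other contract
p_a_or_b = Fraction(4, 5)   # P(A ∪ B): gets at least one contract

# Addition theorem rearranged: P(A ∩ B) = P(A) + P(B) - P(A ∪ B)
p_a_and_b = p_a + p_b - p_a_or_b
print(p_a_and_b)  # 19/45
```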
Multiplication theorem:
Let A and B be two independent events. Then multiplication theorem states that,
P[AB]= P[A]. P[B].
Note: P[AB] can also be represented by P[A and B] or P[A∩B].
Example:
Let a problem in statistics be given to two students whose probabilities of solving it are 1/5 and
5/7.
What is the probability that both solve the problem?
Solution:
Let A = event that the first student solves the problem,
B = event that the second student solves the problem.
It is given that P[A] = 1/5 and P[B] = 5/7.
Since A and B are independent, using the multiplication theorem,
P[A ∩ B] = P[A] · P[B] = (1/5)(5/7) = 1/7.
Note: when events are not independent, the conditional probability P[A|B] = P[A ∩ B]/P[B] is used instead;
for instance, if P[A ∩ B] = 1/10 and P[B] = 6/10, then P[A|B] = (1/10)/(6/10) = 1/6.
Bayes Theorem:
Statement: Let E1, E2, …, En be n mutually exclusive and exhaustive events and B be any event with P(B) > 0. Then
P(Ei | B) = P(B | Ei) P(Ei) / Σ_{i=1}^{n} P(B | Ei) P(Ei)
Part of the challenge in applying Bayes' theorem involves recognizing the types of problems
that warrant its use. You should consider Bayes' theorem when the following conditions exist.
▪ The sample space is partitioned into a set of mutually exclusive events { A1, A2, . . . , An }.
▪ Within the sample space, there exists an event B, for which P(B) > 0.
▪ The analytical goal is to compute a conditional probability of the form: P( A k | B ).
▪ You know at least one of the two sets of probabilities described below.
• P( Ak ∩ B ) for each Ak
• P( Ak ) and P( B | Ak ) for each Ak
Sample Problem
Bayes' theorem can be best understood through an example. This section presents an example
that demonstrates how Bayes' theorem can be applied effectively to solve statistical problems.
Example 1
Marie is getting married tomorrow, at an outdoor ceremony in the desert. In recent years, it
has rained only 5 days each year. Unfortunately, the weatherman has predicted rain for
tomorrow. When it actually rains, the weatherman correctly forecasts rain 90% of the time.
When it doesn't rain, he incorrectly forecasts rain 10% of the time. What is the probability that
it will rain on the day of Marie's wedding?
Solution: The sample space is defined by two mutually exclusive events: it rains or it does not
rain. Additionally, a third event occurs when the weatherman predicts rain. Notation for these
events appears below.
Event A1: it rains on Marie's wedding; P(A1) = 5/365.
Event A2: it does not rain on Marie's wedding; P(A2) = 360/365.
Event B: the weatherman predicts rain; P(B | A1) = 0.9 and P(B | A2) = 0.1.
We want to know P( A1 | B ), the probability it will rain on the day of Marie's wedding, given a
forecast for rain by the weatherman. The answer can be determined from Bayes' theorem, as
shown below.
P(A1 | B) = P(A1) P(B | A1) / [ P(A1) P(B | A1) + P(A2) P(B | A2) ]
P( A1 | B ) = 0.111
Note the somewhat unintuitive result. Even when the weatherman predicts rain, it rains
only about 11% of the time. Despite the weatherman's gloomy prediction, there is a good
chance that Marie will not get rained on at her wedding.
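The wedding calculation can be reproduced directly from the numbers in the problem statement; a short Python sketch:

```python
# Bayes' theorem for Marie's wedding example.
# A1: it rains (5 rainy days out of 365), A2: it does not rain.
p_a1 = 5 / 365
p_a2 = 360 / 365
p_b_given_a1 = 0.90   # forecaster predicts rain when it actually rains
p_b_given_a2 = 0.10   # forecaster predicts rain when it does not

numerator = p_a1 * p_b_given_a1
denominator = p_a1 * p_b_given_a1 + p_a2 * p_b_given_a2
p_a1_given_b = numerator / denominator
print(round(p_a1_given_b, 3))  # 0.111
```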
Let us say P(Fire) means how often there is fire, and P(Smoke) means how often we see smoke,
then:
P(Fire|Smoke) means how often there is fire when we can see smoke
P(Smoke|Fire) means how often we can see smoke when there is fire
Bayes Theorem Example: Three machines, M1, M2 and M3, produce 45%, 30% and 25%,
respectively, of the total parts produced in a factory. The percentages of defective production
for these machines are 3%, 4% and 5%, respectively.
a) If we choose a part randomly, calculate the probability that it is defective.
b) Suppose now that we choose a part randomly and it is defective. Calculate the probability
that it was produced by M2.
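Since the notes leave this exercise unsolved, here is a sketch of both parts using the law of total probability and Bayes' theorem (dictionary names are ours):

```python
# Total probability and Bayes for the three-machine example.
shares = {"M1": 0.45, "M2": 0.30, "M3": 0.25}   # production shares
defect = {"M1": 0.03, "M2": 0.04, "M3": 0.05}   # defective rates

# (a) P(defective) by the law of total probability
p_d = sum(shares[m] * defect[m] for m in shares)

# (b) P(M2 | defective) by Bayes' theorem
p_m2_given_d = shares["M2"] * defect["M2"] / p_d
print(round(p_d, 4), round(p_m2_given_d, 4))  # 0.038 0.3158
```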
Random Variable: Probability distributions
Random Variable
A random variable is a function that maps the sample space to the real numbers. By convention, random
variables are written as upper-case Roman letters from the end of the alphabet, like X.
For example, define the random variable X to be the sum of the two dice. For every element in
the sample space, we can specify the value of X.
S={
(1; 1) = 2 (1; 2) = 3 (1; 3) = 4 (1; 4) = 5 (1; 5) = 6 (1; 6) = 7
(2; 1) = 3 (2; 2) = 4 (2; 3) = 5 (2; 4) = 6 (2; 5) = 7 (2; 6) = 8
(3; 1) = 4 (3; 2) = 5 (3; 3) = 6 (3; 4) = 7 (3; 5) = 8 (3; 6) = 9
(4; 1) = 5 (4; 2) = 6 (4; 3) = 7 (4; 4) = 8 (4; 5) = 9 (4; 6) = 10
(5; 1) = 6 (5; 2) = 7 (5; 3) = 8 (5; 4) = 9 (5; 5) = 10 (5; 6) = 11
(6; 1) = 7 (6; 2) = 8 (6; 3) = 9 (6; 4) = 10 (6; 5) = 11 (6; 6) = 12
}
If we know the probabilities of a set of events, we can calculate the probabilities that a random
variable defined on that set of events takes on certain values. For example:
P(X = 2) = P({(1,1)}) = 1/36
P(X = 5) = P({(1,4), (2,3), (3,2), (4,1)}) = 1/9
P(X = 7) = P({(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)}) = 1/6
P(X = 12) = P({(6,6)}) = 1/36
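These probabilities can be obtained by enumerating the 36 equally likely outcomes; a minimal Python sketch:

```python
from collections import Counter
from fractions import Fraction

# Enumerate the 36 equally likely outcomes of two dice and tally X = sum.
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
prob = {x: Fraction(c, 36) for x, c in counts.items()}

print(prob[2], prob[5], prob[7], prob[12])  # 1/36 1/9 1/6 1/36
```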
The expression for P(X = 5) should be familiar, since we calculated it above as the probability of
the event that the two dice sum to five. Much of the theory of probability is concerned with
defining functions of random variables
and calculating the likelihood with which they take on their values.
So now we know something about what a random variable is. Let us now look at it a bit more
closely. Random variables can be broadly classified into two types:
.Discrete r.v. ---- these take only countably many (typically integer) values
.Continuous r.v. ---- these can take any value in an interval
Discrete r.v.
Probability mass function:
p(x) = P(X = x)
Ex: Toss a coin twice; S = {HH, HT, TH, TT}.
X = number of heads ∈ {0, 1, 2}
p(0) = P({TT}) = 1/4
p(1) = P({TH, HT}) = 2/4
p(2) = P({HH}) = 1/4
5. Probability Distribution
Define a mapping between the set of all events and the set of real numbers. If every event has
one and only one corresponding number, we call this mapping a function. Write a function as
f(event) = number. For example
f((3; 6)) = 42
f((2; 4); (5; 6)) = 3.75
f() = 99
We can define any arbitrary function we want, but some classes of functions are more
interesting than others. One class of functions that is interesting is that for which the following
axioms of probability hold.
.P(S) = 1
.P(A) ≥ 0
.P(A ∪ B) ≤ P(A) + P(B) for all A and B
A function that obeys all three of these conditions for all A and B is called a probability
function or (equivalently) a probability distribution. By convention we write the name of a
probability distribution with a P.
Statements (5) and (6) are easily proved corollaries to the axioms of probability. They will be
true for any P that satisfies the above conditions.
Any random variable has a probability associated with each of its possible values
according to the probability function as defined above. The pair
{X, p(x)}
is called its probability distribution.
The cumulative distribution function (cdf) is
F(x) = P(X ≤ x) = Σ_{t ≤ x} p(t) if X is discrete
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt if X is continuous
In summary: a discrete r.v. is described by its pmf and its cdf, while a continuous r.v. is described by its pdf and its cdf.
6. Expectation Value
Once we know the probability distribution of a random variable we can use it to predict
the average outcome of functions of that variable. This is done using expectation values. The
expectation value of a random variable X is defined to be
E[X] = Σ_x x p(x) if X is discrete
E[X] = ∫ x f(x) dx if X is continuous
The X defined in the previous section has the following mean value:
E[X] = 2P(X = 2) + 3P(X = 3) + 4P(X = 4) + ⋯ + 12P(X = 12) = 7
You can think of expectation values as taking a weighted average of the values of X
where more likely values get a higher weight than less likely values.
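The weighted-average view of E[X] can be checked for the two-dice sum with exact fractions:

```python
from fractions import Fraction

# E[X] for the sum of two dice: each of the 36 outcomes has weight 1/36.
outcomes = [d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7)]
expectation = sum(Fraction(x, 36) for x in outcomes)
print(expectation)  # 7
```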
7. Variance
Once we know the probability distribution of a random variable we can use it to predict
the variance of that variable. This is done using expectation values, as
V(X) = E[X²] − {E[X]}²
where
E[X] = Σ_x x p(x) and E[X²] = Σ_x x² p(x) if X is discrete
E[X] = ∫ x f(x) dx and E[X²] = ∫ x² f(x) dx if X is continuous
1. BERNOULLI:
Bernoulli trials are trials with 2 outcomes, success and failure, occurring with
probabilities p and q = 1 − p respectively. Examples:
A coin is tossed
A die is tossed (success = a particular face appearing)
We write an examination (pass or fail)
Its probability mass function is given by
p(x) = p if x = 1
p(x) = q if x = 0
E(X) = p
VAR(X) = p(1 − p) = pq.
2. BINOMIAL:
The random variable X denoting the number of successes in a fixed Number of
independent Bernoulli trials is called a binomial random variable and its distribution is Binomial
distribution as defined below
p(x) = nCx p^x q^(n−x), x = 0, 1, …, n
E(X)=np
VAR(X)=np(1-p)=npq.
Example: A bag contains 50 balls, of which 35 are red and 15 are black. Five times a ball is
randomly selected, its colour is noted and it is replaced. Find the probability that a black ball
is selected exactly 2 times.
Solution:
Here, every time we draw a ball it is a Bernoulli trial, as we have only 2 possibilities.
So n = 5; p = 15/50; q = 1 − p = 35/50; x = 2.
P(X = 2) = nCx p^x q^(n−x)
= 5C2 (15/50)² (35/50)³
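Evaluating the binomial expression numerically, a short Python sketch:

```python
from math import comb

# Binomial: n = 5 draws with replacement, p = P(black) = 15/50, x = 2.
n, x = 5, 2
p = 15 / 50
q = 1 - p
p_x2 = comb(n, x) * p**x * q**(n - x)
print(round(p_x2, 4))  # 0.3087
```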
GEOMETRIC:
The random variable X denoting the number of Bernoulli trials required to achieve the first
success is called a geometric random variable and its distribution is geometric distribution.
P(X = x) = p q^(x−1) for x = 1, 2, 3, …
= 0 otherwise
Example: A bag contains 50 balls, of which 35 are red and 15 are black. A ball is
randomly selected; if it is red it is replaced and we select again, continuing until we get a black
ball for the first time. Find the probability that the first black ball is obtained on the 7th
selection.
Solution:
Here, every time we draw a ball it is a Bernoulli trial, as we have only 2 possibilities.
So x = 7; p = 15/50; q = 1 − p = 35/50.
P[X = 7] = (15/50)(35/50)^(7−1)
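The geometric probability above evaluates to a small number, as one would expect after six red draws in a row; a quick check:

```python
# Geometric: first black ball on draw x = 7, with p = P(black) = 15/50.
p = 15 / 50
q = 1 - p
x = 7
p_x7 = p * q**(x - 1)
print(round(p_x7, 5))  # 0.03529
```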
4. POISSON:
The random variable X whose pmf is,
P(X = x) = e^(−λ) λ^x / x! for x = 0, 1, 2, 3, …
= 0 otherwise
E(X) = VAR(X) = λ.
Example: A bag contains 50 balls, of which 35 are red and 15 are black. 20 times a ball
is randomly selected, its colour is noted and it is replaced. Find the (approximate) probability
that a black ball is selected exactly 2 times.
Solution:
Here, every time we draw a ball it is a Bernoulli trial, as we have only 2 possibilities. Since
the number of trials is large, we approximate the binomial by a Poisson distribution.
So n = 20; p = 15/50; λ = np = 6; x = 2.
P(X = 2) = e^(−λ) λ^x / x! = e^(−6) 6² / 2!
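Evaluating the Poisson expression numerically:

```python
from math import exp, factorial

# Poisson approximation: n = 20, p = 15/50, so lam = n * p = 6; x = 2.
lam = 20 * (15 / 50)
x = 2
p_x2 = exp(-lam) * lam**x / factorial(x)
print(round(p_x2, 4))  # 0.0446
```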
A bag contains 50 balls, 20 blue and 30 red. Four balls are taken one after another with
replacement. What is the probability of getting 2 from each category?
Binomial
A bag contains 500 balls, 200 blue and 300 red. Four balls are taken one after another
with replacement. What is the probability of getting 2 from each category?
Poisson
A bag contains 50 balls, 20 blue and 30 red. Balls are taken one after another with
replacement until we get a blue one. What is the probability of getting blue on the fourth selection?
Geometric
Ex:
Cards Chosen from a pack.
Number of defective bulbs.
Number of rainy days.
Number of breads from a bakery with two types.
Number of students who understood Binomial.
1. UNIFORM:
A random variable X is uniformly distributed on [a, b] if its pdf is
f(x) = 1/(b − a) for a ≤ x ≤ b
= 0 else
Its cdf is
F(x) = 0 for x < a
= (x − a)/(b − a) for a ≤ x ≤ b
= 1 for x > b
E(X) = (a + b)/2
V(X) = (b − a)²/12
Example:
If a wheel is spun and then allowed to come to rest, the point on the circumference of the
wheel that is located opposite a certain fixed marker could be considered the value of a random
variable X that is uniformly distributed over the circumference of the wheel. One could then compute
the probability that X will fall in any given arc.
If we assume that it is uniform in the interval [3, 6], we obtain f(x) = 1/3 for 3 ≤ x ≤ 6 and 0 elsewhere.
2. EXPONENTIAL:
A random variable X is said to be exponentially distributed if its pdf is given by
f(x) = λ e^(−λx) for x ≥ 0
= 0 otherwise
where λ is the parameter. Its cdf is
F(x) = 0 for x < 0
= 1 − e^(−λx) for x ≥ 0
E(X) = 1/λ.
V(X) = 1/λ².
The exponential distribution is useful for representing the lifetimes of items, for modelling interarrival
times when arrivals are completely random, and for service times which are highly variable.
The exponential distribution has a property called the memoryless property, given by
P(X > s + t | X > s) = P(X > t)
This is why we are able to use the exponential distribution to model lifetimes.
Example:
Let us assume that a company is manufacturing burettes whose lifetime is assumed to be
exponential with average life, 950 days. What is the probability that it is in working condition for up
to 1000 days.
Solution:
It is given that X = lifetime of the burette is exponential with average life 950 days,
i.e. E(X) = 1/λ = 950, so λ = 1/950.
P[lifetime is up to 1000 days] = P[0 < X < 1000] = ∫₀¹⁰⁰⁰ λ e^(−λx) dx
= ∫₀¹⁰⁰⁰ (1/950) e^(−x/950) dx
= [−e^(−x/950)]₀¹⁰⁰⁰ = 1 − e^(−1000/950) ≈ 0.651.
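The burette calculation, with the rate λ taken as the reciprocal of the 950-day mean, can be checked numerically:

```python
from math import exp

# Exponential lifetime: mean 950 days, so rate lam = 1/950 per day.
lam = 1 / 950
t = 1000
# P(0 < X < t) = F(t) = 1 - exp(-lam * t)
p = 1 - exp(-lam * t)
print(round(p, 3))  # 0.651
```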
3. NORMAL:
A normal variable X with mean µ( -∞ < µ < ∞ ) and variance σ2> 0 has a normal
distribution if its pdf is,
f(x) = (1/(σ√(2π))) exp[ −(1/2) ((x − µ)/σ)² ], −∞ < x < ∞
A normal distribution is used when we are having a sum of many random variables. A
normal random variable with µ = 0 and σ = 1 is called a standard normal r.v. Its curve is
symmetrically distributed about the average µ = 0.
We standardize a normal distribution by
Z = (X − µ)/σ
For example, P(Z < 1) ≈ 0.5 + 0.34 = 0.84.
Note:
1. The same question may have the following variations:
If several independent exponentially distributed random variables with the same parameter Ѳ are
added, we obtain the sum as a gamma r.v. with parameters β and Ѳ.
4. ERLANG:
E(X) = 1/Ѳ.
V(X) = 1/kѲ2.
6. WEIBULL:
The r.v. X has a Weibull distribution if its pdf has the form
f(x) = (β/α) ((x − v)/α)^(β−1) exp[ −((x − v)/α)^β ] for x ≥ v
= 0 else
α → scale parameter.
β → shape parameter.
v → location parameter.
E(X) = v + α Γ[(1/β) + 1]
1. The k-th raw moment:
mk = E(X^k)
= Σ_x x^k p(x) if X is discrete
= ∫_{−∞}^{∞} x^k f(x) dx if X is continuous
2. The k-th central moment (about the mean m):
m°k = E[(X − m)^k]
= Σ_x (x − m)^k p(x) if X is discrete
= ∫_{−∞}^{∞} (x − m)^k f(x) dx if X is continuous
3. var(X) = m°2 = E[(X − m)²] = E(X²) − [E(X)]² = m2 − m1²
4. var(aX + b) = a² var(X)
Examples:
The data that follows are 55 smiling times, in seconds, of an eight-week old baby.
What is the probability that a randomly chosen eight-week old baby smiles between 2
and 18 seconds?
Solution:
Let us consider the baby smiling example. From the data we can calculate the mean
smiling time to be 11.49 seconds and the standard deviation to be 6.23 seconds.
Since this is an entirely spontaneous activity which could be termed completely random,
we can use Uniform distribution to approximate this.
So, we need to form the uniform distribution.
From the table, the smallest value is 0.7 and the largest is 22.8. So, if we assume that
smiling times, in seconds, are uniformly distributed on (0, 23), then by the definition
of the uniform distribution
f(x) = 1/23 for 0 ≤ x ≤ 23
= 0 else
So, P(2 < X < 18) = ∫₂¹⁸ f(x) dx = ∫₂¹⁸ (1/23) dx = (18 − 2)/23 = 16/23.
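The uniform-interval probability reduces to a ratio of lengths, which exact fractions make explicit:

```python
from fractions import Fraction

# Uniform(0, 23): P(2 < X < 18) = (18 - 2) / (23 - 0)
a, b = 0, 23
p = Fraction(18 - 2, b - a)
print(p)  # 16/23
```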
EX2:
If jobs arrive every 15 seconds on average = 4 per minute, what is the probability of
waiting less than or equal to 30 seconds, i.e 0.5 min?
Solution:
If X = waiting time between arrivals, then X is exponential with rate λ = 4 per minute, and
P(waiting ≤ 0.5 min) = P(X ≤ 0.5) = ∫₀^0.5 4 e^(−4x) dx = 1 − e^(−2) = 0.86
Ex3:
Accidents occur according to a Poisson distribution at an average of 4 per week.
1. Calculate the probability of more than 5 accidents in any one week.
2. What is the probability that at least two weeks will elapse between accidents?
Solution:
(i) X = number of accidents, Poisson with mean 4.
P(X > 5) = 1 − P(X ≤ 5)
= 1 − {P(X = 0) + P(X = 1) + ⋯ + P(X = 5)}, where e.g. P(X = 0) = e^(−4) 4⁰/0! = e^(−4)
(ii) T = time between occurrences, exponential with rate 4 per week (mean 1/4 week).
P(T ≥ 2) = ∫₂^∞ 4 e^(−4t) dt = e^(−8) = 0.00034
Ex4:
Find the area under normal curve that lies [i] to the right of z=1.84 and [ii]
between z=-1.97 and z=0.86
Ex5:
X is normal with mean 50 and sd = 10. Find the probability that X lies between 45 and 62.
Ex6:
Suppose the diameter of a certain car component follows the normal distribution with
X ~ N(10, 3). Find the proportion of these components that have diameter larger than 13.4 mm.
Equivalently, if we randomly select one of these components, find the probability that its
diameter will be larger than 13.4 mm.
Solution:
P(X > 13.4) = P(X − 10 > 13.4 − 10)
= P( (X − 10)/3 > (13.4 − 10)/3 )
= P(Z > 1.13) = 1 − 0.8708 = 0.1292
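The same probability can be computed without a table using Python's standard library; note the small difference from 0.1292 because the table calculation rounds z = 3.4/3 to 1.13:

```python
from statistics import NormalDist

# X ~ N(mu = 10, sigma = 3); find P(X > 13.4).
x_dist = NormalDist(mu=10, sigma=3)
p = 1 - x_dist.cdf(13.4)
print(round(p, 4))
```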
Ex7:
A bag of cookies is underweight if it weighs less than 500 grams. The filling process
dispenses cookies with weight that follows the normal distribution with mean 510
grams and standard deviation 4 grams.
a. What is the probability that a randomly selected bag is underweight?
b. If you randomly select 5 bags, what is the probability that exactly 2 of them will be
underweight?
Solution:
(a)=0.0062;
(b) 0.0004
[use binomial with probability obtained in (a).]
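Both answers can be reproduced with the standard library; part (b) plugs the probability from part (a) into the binomial formula, exactly as the hint suggests:

```python
from math import comb
from statistics import NormalDist

# (a) P(bag underweight) with weight ~ N(510, 4)
p_under = NormalDist(mu=510, sigma=4).cdf(500)

# (b) P(exactly 2 of 5 bags underweight): binomial with p from (a)
p_two = comb(5, 2) * p_under**2 * (1 - p_under)**3
print(round(p_under, 4), round(p_two, 4))  # 0.0062 0.0004
```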
Ex8:
The top 5% of applicants (as measured by GRE scores) will receive scholarships. If GRE ~
N(500, 100²), how high does your GRE score have to be to qualify for a scholarship?
Solution:
Z = (X - 500)/100
P(Z ≥ z) = .05
Note that P(Z ≥ z) = 1 - F(z) .
If 1 - F(z) = .05, then F(z) = .95.
Looking at Normal Table , F(z) = .95 for z = 1.65 (approximately).
Hence, z = 1.65.
To find the equivalent x, compute
x = (z * 100) + 500 = (1.65 * 100) + 500 = 665.
Thus, your GRE score needs to be 665 or higher to qualify for a scholarship.
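Inverting the cdf directly gives the cutoff without a table lookup; the small difference from 665 comes from the table's rounded z = 1.65:

```python
from statistics import NormalDist

# GRE ~ N(500, 100); the scholarship cutoff is the 95th percentile.
cutoff = NormalDist(mu=500, sigma=100).inv_cdf(0.95)
print(round(cutoff, 1))
```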
EX9:
If a random variable X has the following moment-generating function:
M_X(t) = (3/4 + (1/4) e^t)^20
find the pmf of the distribution.
4 4
Solution:
Comparing the given moment generating function with that of a binomial random
variable, we see that X must be a binomial random variable with n = 20 and p = ¼.
Therefore, the p.m.f. of X is:
p(x) = 20Cx (1/4)^x (3/4)^(20−x), x = 0, 1, …, 20
EX10:
Let
fX(x) = c x² for |x| ≤ 1, and 0 otherwise.
(a) Find c. (b) Find E(X) and Var(X). (c) Find P(X > 1/2).
Soln:
(a) For f to be a pdf, ∫_{−∞}^{∞} f(x) dx = 1, which for the given function is
∫_{−1}^{1} c x² dx = 1, i.e. (2/3) c = 1, giving c = 3/2.
(b) E(X) = ∫_{−1}^{1} x f(x) dx = ∫_{−1}^{1} (3/2) x³ dx = (3/2) [x⁴/4]_{−1}^{1} = 0
E(X²) = ∫_{−1}^{1} x² f(x) dx = ∫_{−1}^{1} (3/2) x⁴ dx = (3/2) [x⁵/5]_{−1}^{1} = 3/5
and so Var(X) = E(X²) − [E(X)]² = 3/5.
(c) P(X > 1/2) = ∫_{1/2}^{1} f(x) dx = ∫_{1/2}^{1} (3/2) x² dx = (1/2) [x³]_{1/2}^{1} = 7/16.
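The integrals above can be verified numerically; the midpoint-rule helper below is our own addition, used only to check the closed-form answers:

```python
# Numerically verify the results for f(x) = (3/2) x^2 on [-1, 1].
def integrate(g, lo, hi, n=100_000):
    """Midpoint rule with n subintervals (simple checking helper)."""
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h) for i in range(n)) * h

def f(x):
    return 1.5 * x * x

total = integrate(f, -1, 1)                     # should be 1
mean = integrate(lambda x: x * f(x), -1, 1)     # should be 0
ex2 = integrate(lambda x: x * x * f(x), -1, 1)  # E[X^2] = 3/5
tail = integrate(f, 0.5, 1)                     # P(X > 1/2) = 7/16
print(round(total, 4), round(mean, 4), round(ex2, 4), round(tail, 4))
```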
EX11: A discrete r.v. takes five values with pmf values k, 2k, 3k, k and 3k. Find k.
Soln: Σ_x P(x) = k + 2k + 3k + k + 3k = 1
So 10k = 1, giving k = 0.1.
Ex12: Let X ~ Geometric(p). Find E[1/2^X].
Soln:
We know that the pmf of a geometric r.v. is
P(X = k) = p q^(k−1) for k = 1, 2, 3, …; 0 otherwise (where q = 1 − p).
So,
E[1/2^X] = Σ_{k=1}^{∞} (1/2^k) P_X(k) = Σ_{k=1}^{∞} (1/2^k) p q^(k−1)
= (p/2) Σ_{k=1}^{∞} (q/2)^(k−1) = (p/2) · 1/(1 − q/2) = p/(2 − q) = p/(1 + p)
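The closed form p/(1 + p) can be checked against a truncated version of the series, for an arbitrary p (here 0.3, our choice):

```python
# Check E[1/2^X] = p / (1 + p) for X ~ Geometric(p) by truncating the series.
p = 0.3
q = 1 - p
series = sum((0.5**k) * p * q**(k - 1) for k in range(1, 200))
closed_form = p / (1 + p)
print(round(series, 6), round(closed_form, 6))
```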
Joint Distributions (Bivariate)
Continuous case: f(x, y) ≥ 0 on the state space, satisfying
∫∫ f(x, y) dx dy = 1
Example-Joint Distribution:
Air Conditioner Maintenance
A company that services air conditioner units in residences and office blocks is
interested in how to schedule its technicians in the most efficient manner.
The random variable X, taking the values 1, 2, 3 and 4, is the service time in hours.
The random variable Y, taking the values 1, 2 and 3, is the number of air
conditioner units.
Definition: Two random variables are said to have joint probability density function f(x, y)
if f(x, y) ≥ 0 and ∫∫ f(x, y) dx dy = 1.
Definition: Let X and Y denote two random variables with joint probability density function f(x, y).
Then
the marginal density of X is
fX(x) = ∫_{−∞}^{∞} f(x, y) dy
and the marginal density of Y is
fY(y) = ∫_{−∞}^{∞} f(x, y) dx
Let X and Y denote two random variables with joint probability density function f(x, y)
and marginal densities fX(x), fY(y). Then
the conditional density of Y given X = x is
fY|X(y|x) = f(x, y) / fX(x)
and the conditional density of X given Y = y is
fX|Y(x|y) = f(x, y) / fY(y)
Bivariate-Example:
If X and Y are jointly distributed as given by
Covariance:
The covariance of two random variables X and Y is
σXY = E[(X − µX)(Y − µY)]
which for application purposes can be simplified into
σXY = E(XY) − µX µY
Two ball pens are selected at random from a bag containing 3 blue, 2 red and 3 green
pens. If X is the number of blue pens selected and Y is the number of red pens selected,
find
(i)the joint distribution of X and Y
(ii) P[(X, Y) ∈ A], where A is the region {(x, y) : x + y ≤ 1}
(iii) Covariance of X and Y
Sampling
Population is the total collection of objects under a statistical study. For example, if we study
M.Pharm A, our population is M.Pharm A. If we study Pharmacy, the entire lot of Pharmacy
students is our population. So, our interested range of objects is what is technically called a
population. A population can be finite or infinite, based on whether we can enumerate every
object or not.
Sampling: here we collect and analyze data from only a part of the population, the sample.
.Law of statistical regularity: when a sample of reasonably large size is selected from a
population, it is likely to possess the characteristics of the population.
.Law of large numbers: the larger the size of the sample, the more accurate the results are
going to be in representing the population.
The procedures employed in selecting a sample from the population are called sampling
techniques. They are broadly classified into two types, Random and Non-random.
Universe:
The word universe as used in statistics denotes the aggregate from which the sample is to
be taken.
Ex:- If in the year 1999, there are 2,00,000 students in Delhi University and a sample of 5,000
students is taken to study their attitude towards semester system, then 2,00,000 constitutes the
universe and 5,000 the sample size.
Finite Universe:-
A finite universe is one in which the number of items is determinable, such as the number of
students in Delhi University (or) in India.
Infinite Universe:-
An infinite universe is that in which the number of items cannot be determined, such as
the number of stars in the sky.
For example, though wheat varies to a limited extent in colour, protein content, length, weight,
etc., it can always be identified as wheat; similarly, apples of the same tree may vary in size, colour,
taste, weight, etc., but they can always be identified as apples.
Methods of sampling:-
The various methods of sampling can be grouped under two broad heads:
Probability sampling:-
Probability sampling methods are those in which every item in the universe has a known
chance (or) probability of being chosen for the sample. This implies that the selection of sample
items is independent of the person making the study.
i. stratified sampling
ii. systematic sampling and
iii. cluster sampling
Non – Probability Sampling methods are those which do not provide every item in the
universe with a known chance of being included in the sample.
1. Judgment sampling
2. convenience sampling and
3. quota sampling
Sampling Methods
1. Judgment sampling:
In this method of sampling, the choice of sample items depends exclusively on the
judgment of the investigator.
For example, if sample of ten students is to be selected from a class of 60 for analyzing
the spending habits of students, the investigator would select 10 students who, in his opinion, are
representative of the class.
2. Quota sampling:
In a quota sample, quotas are set up according to some specified characteristics, such as so
many in each of several income groups, so many in each age group, so many with certain political or
religious affiliations, and so on. Each interviewer is then told to interview a certain number of
persons, which constitutes his quota; within the quota, the selection of sample items depends on
personal judgment.
For example, in a radio listening survey, the interviewers may be told to interview 500
people living in a certain area and that out of every 100 persons interviewed 60 are to be
housewives, 25 farmers and 15 children under the age of 15 within these quotas the interviewer
is free to select the people to be interviewed.
Convenience sampling:-
Convenience sampling is often used for making pilot studies, questions may be tested and
preliminary information may be obtained by the chunk before the final sampling design is
decided upon.
1. Stratified Sampling:-
This process first stratifies (groups) the population into different strata (groups) based on
a known characteristic. Then it employs one of the following methods to get the sample.
.Proportional: here the number of objects selected from a stratum is in the same
proportion as the stratum is to the population.
.Non-proportional: here we select the same number of elements from every
stratum.
For example let us say we have to select 100 students from a college having 5000 students. We
first stratify the college into 4 classes, say, based on their year. Let us assume that
Proportional:
Strata I(I year): This contains 10 % of the population. So select 10% of 100= 10 students
from this strata.
StrataII(II year): This contains 20 % of the population. So select 20% of 100= 20 students
from this strata.
Strata III(III year): This contains 40 % of the population. So select 40% of 100= 40
students from this strata.
Strata IV(IV year): This contains 30 % of the population. So select 30% of 100= 30
students from this strata.
Non-Proportional:
We need 100 students and we have 4 strata. So select 100/4 = 25 students from each
class.
2. Systematic sampling:-
This method is popularly used in those cases where a complete list of the population from
which sample is to be drawn is available. The list may be prepared in alphabetical, geographical,
numerical or some other order. The items are serially numbered. The first item is selected at
random generally by following the lottery method. Subsequent items are selected by taking
every Kth item from the list, where 'K' refers to the sampling interval or sampling ratio, i.e. the
ratio of the population size to the size of the sample.
Ex:- In a class there are 96 students with roll nos. from 1 to 96. It is desired to take a sample of 10
students. Use the systematic sampling method to select the sample.
K = N/n = 96/10 = 9.6 ≈ 10
From 1 to 96 roll nos. the first student between 1 and K, (ie) 1 and 10, will be selected at
random and then we will go on taking every Kth student. Suppose the first student comes out to
be 4th. The sample would then consist of the following roll nos. 4,14,24,34,44,54,64,74,84,94.
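The selection rule above can be sketched in Python; the circular wrap-around (taking roll numbers modulo N) is our own addition, to keep every selected roll number within 1–96 even when the random start is near K:

```python
import random

# Systematic sample of n = 10 from roll numbers 1..N = 96, with K = 10.
N, n = 96, 10
K = round(N / n)
start = random.randint(1, K)  # first student chosen at random from 1..K
# Circular systematic sampling: every Kth roll number, wrapping past N.
sample = [((start - 1 + i * K) % N) + 1 for i in range(n)]
print(sample)
```

With start = 4 this reproduces the sample 4, 14, 24, ..., 94 from the example.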
3. Multi-stage sampling:-
Under this method, random selection is made of primary, intermediate and final units
from a given population or stratum. There are several stages in which the sampling process is
carried out. At first, the first-stage units are sampled by some suitable method, such as simple
random sampling. Then a sample of second-stage units is selected from each of the selected first-stage
units, again by some suitable method which may be the same as (or) different from the method
employed for the first-stage units. Further stages may be added as required.
Ex:- suppose we want to take a sample of 5,000 households from the state of U.P.
At the first stage, the state may be divided into a no. of district and a few districts selected
at random.
At the second stage, each district may be subdivided into a no. of villages and a sample of
villages may be taken at random.
At the third stage, a number of households may be selected from each of the villages
selected at the second stage.
Statistical Inference refers to the process of selecting and using a sample statistic to draw
inferences about a population parameter. It is concerned with using probability concepts to deal
with uncertainty in decision making.
Statistical Inference treats two different classes of problems namely hypothesis testing
and estimation.
Hypothesis Testing:-
Hypothesis testing is used to test some hypothesis about the parent population from which the
sample is drawn. It must be noted that tests of hypothesis also include tests of significance.
Estimation:-
The estimation theory deals with defining estimators for unknown population parameters
on the basis of sample study.
Parameter and Statistics:-
The statistical constants of the population, namely the mean µ and variance σ², are usually
referred to as parameters.
Statistical measures computed from sample observations alone, e.g. the mean (𝑥̅ ) and variance
(S²), are usually referred to as statistics.
Sampling Distribution:-
If we select a definite number of independent random samples from a given population and
calculate some statistic like the mean and S.D. from each sample, we shall get a series of values of the
statistic; these values obtained from different samples, put in the form of a frequency distribution,
are called the "Sampling Distribution".
Ex: if we draw 100 samples from a given population and calculate their means and S.D.s, we
shall get a series of 100 means and S.D.s that form a sampling distribution.
Standard Error:-
The standard deviation of sampling distribution of a statistic is known as its standard
error and it is denoted by S.E.
Null and Alternative Hypothesis:-
For applying the test of significance we first set up a hypothesis, a definite statement
about the population parameter. Such a hypothesis is usually a hypothesis of no difference, and
it is denoted by H0, the null hypothesis.
Any hypothesis which is complementary to the null hypothesis is called an alternative
hypothesis, usually denoted by H1.
If we want to test the null hypothesis that the population has a specified mean µ0 (say),
i.e. H0: µ = µ0, then the alternative hypothesis could be H1: µ ≠ µ0 (two-tailed), or H1: µ > µ0 or H1: µ < µ0 (one-tailed).
Errors in sampling:-
The main objective in sampling theory is to draw valid inferences about the population
parameters on the basis of the sample results. In practice, we decide to accept (or) to reject the
lot after examining a sample from it. As such, we have two types of errors.
(i) type I error and (ii) type II error
Type I Error:-
A type I error is committed by rejecting the null hypothesis when it is true. The
probability of committing a type I error is denoted by α,where
α = prob (type I error)
= prob. (rejecting H0/when H0 is true)
Type II Error:-
A type II error is committed by accepting the null hypothesis when it is false. The
probability of committing a type II error is denoted by β,
Where β = prob. (type II error)
= prob. (accepting H0/when H0 is false)
              Accept H0         Reject H0
H0 is true    Correct decision  Type I error
H0 is false   Type II error     Correct decision
Critical Region:-
A region corresponding to a statistic in the sample space S which leads to the rejection of
H0 is called the Critical Region (or) Rejection Region.
The region which leads to the acceptance of H0 is called the acceptance region.
Level of Significance:-
The probability α that a random value of the statistic 't' belongs to the critical region is
known as the level of significance. In other words, the level of significance is the size of the type I
error. The levels of significance usually employed in testing of hypotheses are 5% and 1%.
One tailed and two tailed tests:-
A test is two-tailed if the critical region lies in both tails of the sampling distribution
(as with H1: μ ≠ μ0); it is one-tailed if the critical region lies entirely in one tail
(as with H1: μ > μ0 or H1: μ < μ0).
Test of Significance
Let us now discuss the various situations where we have to apply different tests of
significance. For the sake of convenience and clarity these situations may be summed up under
the following 3 heads:
1. test of significance for attributes
2. test of significance for variables (large samples)
3. test of significance for variables (small samples)
Tests of significance for attributes:
i. Test for number of successes
ii. Test for proportion of successes
iii. Test for difference between proportions
Each test begins by setting up the hypotheses H0 and H1.
LARGE SAMPLES
If the size of the sample n>30, then the sample is called large samples.
There are 3 important tests to test the significance of large samples.
i. test of significance for single mean
ii. test of significance for difference of mean
iii. test of significance for difference of S.D
Each large-sample test (for a single mean, for the difference between two means, and for
the difference between two S.D.s) follows the same steps:
i. hypothesis setting
ii. computing the test statistic
iii. comparing with the table value
iv. result
SMALL SAMPLES
Defn:
When the size of the sample (n) is less than 30, then the sample is called a small sample.
The following are some important tests for small samples.
i. student’s t-test
ii. F-test
iii. χ²-test (Chi-square test)
Degrees of Freedom:-
Degrees of freedom is the no. of independent observations in a set.
By degrees of freedom we mean the no. of classes in which the values can be assigned
arbitrarily (or) at will without violating the restrictions (or) limitations placed.
Degrees of freedom = no. of groups – no. of constraints
Student’s t-Distribution:-
Defn:- The t-distribution is commonly called student’s t-distribution (or) simply student’s
distribution.
The t-test takes four common forms:
i. Single mean: t = (x̄ − μ)/(S/√n) (or) (x̄ − μ)/(σ/√n); table value at D.O.F = n − 1
ii. Difference between two means (independent samples): t = (x̄1 − x̄2)/(S√(1/n1 + 1/n2));
table value at D.O.F = n1 + n2 − 2
iii. Difference between two means (dependent samples, or paired t-test): t = d̄√n/S,
where d̄ = Σd/n and S = √(Σ(d − d̄)²/(n − 1)); table value at D.O.F = n − 1
iv. Testing an observed correlation coefficient: t = r√(n − 2)/√(1 − r²); table value at D.O.F = n − 2
In each case, first set up the hypotheses, then compute t and compare it with the table value.
The t-distribution is used when sample size is 30 (or) less and the population standard
deviation is unknown.
The t-statistic is defined as:
t = (x̄ − μ)/(S/√n), where S = √(Σ(x − x̄)²/(n − 1))
The layout of a 2×2 contingency table:
A        a        b        a + b
B        c        d        c + d
Total    a + c    b + d    N
Sampling distribution
The distribution of a statistic(measure) obtained from repeated random sampling from a
population is a sampling distribution.
If we draw 10 random samples from a population and calculate the mean of each
sample, then the sequence of the 10 means is a sampling distribution of means.
Likewise for other measures.
Consider throwing a die. The mean of a single throw is (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5.
When we increase the sample size to 2, 3, ..., the distribution of the sample mean moves
towards normality.
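This movement towards normality can be illustrated with a quick simulation (a minimal sketch; the sample sizes and the number of samples per size are arbitrary choices):

```python
import random
import statistics

random.seed(1)

def sample_means(sample_size, num_samples=10_000):
    """Means of repeated samples of dice throws: a sampling distribution of means."""
    return [
        statistics.mean(random.randint(1, 6) for _ in range(sample_size))
        for _ in range(num_samples)
    ]

# As the sample size grows, the means concentrate around 3.5
# and their spread shrinks roughly like sigma/sqrt(n).
for n in (1, 2, 5, 30):
    means = sample_means(n)
    print(n, round(statistics.mean(means), 3), round(statistics.stdev(means), 3))
```

Plotting a histogram of `sample_means(30)` would show the familiar bell shape centred at 3.5.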
Student's t-distribution (or simply the t-distribution) is any member of a family of
continuous probability distributions that arises when estimating the mean of a normally
distributed population in situations where the sample size is small and the
population standard deviation is unknown. The statistic is
t = (x̄ − μ)/(s/√n)
Definition. If Z ~ N(0, 1) and U ~ χ2(r) are independent, then the random variable
T = Z/√(U/r) follows a t-distribution with r degrees of freedom. We write T ~ t(r).
t Test:
The t-test is based on the assumption that we are comparing means.
The test statistic is defined by
t = |x̄ − μ|/(S/√n), where S = √(Σ(x − x̄)²/(n − 1))
Example:
A manufacturer of a kind of bulb claims that his bulbs have a mean life of 25 months
with a standard deviation of 5 months. A random sample of 6 bulbs gave the following
lifetimes. Is the claim valid?
24, 26, 30, 20, 20, 18
Step 1: H0: There is no significant difference between the sample mean and the
population mean.
Step 2: D.O.F = n − 1 = 6 − 1 = 5; LOS = 5% = 0.05.
Step 3: x̄ = (24 + 26 + 30 + 20 + 20 + 18)/6 = 23; S = √(Σ(x − x̄)²/(n − 1)) = √(102/5) ≈ 4.52;
t = (23 − 25)/(4.52/√6) ≈ −1.08.
Step 4: |t| = 1.08 < 2.571, the table value for 5 D.O.F at 5% LOS, so accept H0: the claim is valid.
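The arithmetic of this example can be checked with a short script (a sketch only; it computes x̄, S with the n − 1 divisor, and t):

```python
import math
import statistics

lifetimes = [24, 26, 30, 20, 20, 18]
mu0 = 25                          # claimed mean life in months

xbar = statistics.mean(lifetimes)
s = statistics.stdev(lifetimes)   # sample S.D. with n - 1 divisor
n = len(lifetimes)
t = (xbar - mu0) / (s / math.sqrt(n))

print(xbar, round(s, 2), round(t, 2))
```

|t| ≈ 1.08 is well below the 5% table value 2.571 for 5 degrees of freedom, so the claim stands.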
Paired t test:
The t-statistic for the paired t-test is
t = d̄/(Sd/√n)
where d = X1 − X2, d̄ is the average of the deviations, and Sd is the standard deviation of
the deviations.
Z test:
Is the proportion of babies born male different from 0.50? In a sample of 200 babies, 96
were male.
H0: p = 0.50
Ha: p ≠ 0.50
p̂ = 96/200 = 0.48, p0 = 0.50, n = 200
z = (p̂ − p0)/√(p0(1 − p0)/n) = (0.48 − 0.50)/√(0.25/200) ≈ −0.566
P(z < −0.566) = 0.2843
Because this is a two-tailed test we must take into account both the left and right tails.
To do so, we multiply the value above by two (p = 0.2843 + 0.2843 = 0.5686).
Since our p-value (0.5686) is > 0.05, accept the null hypothesis.
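The same test can be scripted (a sketch; the helper names are my own, and the normal CDF is built from `math.erf`):

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def one_proportion_z(successes, n, p0):
    """z statistic for a one-sample test of a proportion."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se

z = one_proportion_z(96, 200, 0.50)
p_value = 2 * normal_cdf(-abs(z))   # two-tailed
print(round(z, 3), round(p_value, 4))
```

The p-value ≈ 0.57 exceeds 0.05, matching the conclusion above (table-based hand values round slightly differently).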
Pooled sample proportion. Since the null hypothesis states that P1=P2, we use a pooled
sample proportion (p) to compute the standard error of the sampling distribution.
p = (p1 * n1 + p2 * n2) / (n1 + n2)
where p1 is the sample proportion from population 1, p2 is the sample proportion from
population 2, n1 is the size of sample 1, and n2 is the size of sample 2.
Standard error. Compute the standard error (SE) of the sampling distribution difference
between two proportions.
SE = sqrt{ p * ( 1 - p ) * [ (1/n1) + (1/n2) ] }
where p is the pooled sample proportion, n1 is the size of sample 1, and n2 is the size of
sample 2.
Test statistic. The test statistic is a z-score (z) defined by the following equation.
z = (p1 - p2) / SE
where p1 is the proportion from sample 1, p2 is the proportion from sample 2, and SE is
the standard error of the sampling distribution.
P-value. The P-value is the probability of observing a sample statistic as extreme as the
test statistic. Since the test statistic is a z-score, use the Normal Distribution table to
assess the probability associated with the z-score.
Tests for two proportions :Example
Suppose the Acme Drug Company develops a new drug, designed to prevent colds. The
company states that the drug is equally effective for men and women. To test this claim,
they choose a simple random sample of 100 women and 200 men from a population of
100,000 volunteers.
At the end of the study, 38% of the women caught a cold; and 51% of the men caught a
cold. Based on these findings, can we reject the company's claim that the drug is equally
effective for men and women? Use a 0.05 level of significance.
State the hypotheses. The first step is to state the null hypothesis and an alternative
hypothesis.
Null hypothesis: P1 = P2
Alternative hypothesis: P1 ≠ P2
Note that these hypotheses constitute a two-tailed test. The null hypothesis will be
rejected if the proportion from population 1 is too big or if it is too small.
Using sample data, we calculate the pooled sample proportion (p) and the standard
error (SE). Using those measures, we compute the z-score test statistic (z).
p = (p1 * n1 + p2 * n2) / (n1 + n2) = [(0.38 * 100) + (0.51 * 200)] / (100 + 200) = 140/300 =
0.467
SE = sqrt{ p * ( 1 - p ) * [ (1/n1) + (1/n2) ] } = sqrt{ 0.467 * 0.533 * [ (1/100) + (1/200) ] } = 0.061
z = (p1 - p2) / SE = (0.38 - 0.51) / 0.061 = -2.13
Since we have a two-tailed test, the P-value is the probability that the z-score is less
than -2.13 or greater than 2.13.
We use the Normal Distribution table to find P(z < -2.13) = 0.017, and P(z > 2.13)
= 0.017. Thus, the P-value = 0.017 + 0.017 = 0.034.
Since the P-value (0.034) is less than the significance level (0.05), reject the null
hypothesis.
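The pooled two-proportion z statistic for this example can be computed directly (the function name is my own):

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Pooled z test statistic for the difference of two proportions."""
    p = (p1 * n1 + p2 * n2) / (n1 + n2)               # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))   # standard error
    return (p1 - p2) / se

z = two_proportion_z(0.38, 100, 0.51, 200)
print(round(z, 2))  # about -2.13
```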
Estimation:
In studying a population, estimation is the process of approximating the parameter
values of the population using the statistic values of a sample.
There are two types of Estimations
Point Estimation and Interval Estimation.
Point estimation tries to get a single value for the parameter, whereas interval
estimation gives a confidence interval.
Methods: Least squares and Maximum likelihood.
Properties of Estimators:
Unbiasedness: Expected value of estimator equals parameter.
Consistency: Estimator approaches parameter as n gets larger.
Efficiency: Of two estimators, the one with smaller variance is the more efficient.
Sufficiency: Conveys as much information as is possible about parameter from the
sample
Procedure:
Step1: Form the Hypothesis
Step2: Find the standard error
Step3: Find Difference/S.E
Step4:
If step3 value <1.96, Accept H0, else reject H0 at 5% LOS.
If step3 value <2.58, Accept H0, else reject H0 at 1% LOS.
The pooled proportion is p = (n1p1 + n2p2)/(n1 + n2) or p = (x1 + x2)/(n1 + n2),
where x1, x2 are the numbers of occurrences in two samples of sizes n1 and n2.
The standard error for the difference of proportions is
S.E(p1 − p2) = √(pq(1/n1 + 1/n2))
If |p1 − p2|/S.E < 1.96, accept H0 at 5% LOS
If |p1 − p2|/S.E < 2.58, accept H0 at 1% LOS
In a random sample of 1000 persons from a town A, 400 are consumers of wheat,
whereas in another random sample of size 800 from a town B, 400 consumed wheat. Is
there a significant difference in wheat consumption between the towns?
Solution:
p = (n1p1 + n2p2)/(n1 + n2) = (1000 × 0.4 + 800 × 0.5)/(1000 + 800) = 800/1800 = 4/9, so q = 5/9
S.E(p1 − p2) = √(pq(1/n1 + 1/n2)) = √((4/9)(5/9)(1/1000 + 1/800)) = 0.024
|p1 − p2|/S.E = |0.4 − 0.5|/0.024 = 4.17 > 2.58,
Hence reject H0 at 1% LOS
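A numeric check of the wheat example (keeping the unrounded S.E. gives z ≈ 4.24 rather than the hand-rounded 4.17; the conclusion is unchanged):

```python
import math

n1, x1 = 1000, 400   # town A
n2, x2 = 800, 400    # town B
p1, p2 = x1 / n1, x2 / n2

p = (x1 + x2) / (n1 + n2)                          # pooled proportion = 4/9
se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
z = abs(p1 - p2) / se
print(round(p, 4), round(se, 4), round(z, 2))
```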
Correlation and Regression
Correlation:
Measures the degree of linear association between two interval-scaled variables; it is the
analysis of the relationship between two quantitative outcomes, e.g., height and weight.
Scatter plot: a plot of the paired (x, y) values that gives a visual impression of the
association.
Karl Pearson’s correlation coefficient:
N XY − X Y
Karl Peason’s, r =
N X 2 − ( X ) N Y 2 − ( Y )
2 2
Correlation
Spearman’s correlation coefficient(ranks)
Spearman's rank correlation coefficient or Spearman's rho, is a measure of statistical
dependence between two variables based on ranks or relative values
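Karl Pearson's r can be coded directly from the raw sums in the formula above (the height/weight pairs below are invented for illustration):

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's correlation coefficient from raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    num = n * sxy - sx * sy
    den = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    return num / den

heights = [150, 160, 165, 170, 180]   # hypothetical data
weights = [52, 58, 63, 67, 75]
print(round(pearson_r(heights, weights), 3))
```

With no tied values, Spearman's rho is this same formula applied to the ranks of the observations.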
Regression:
Regression follows correlation in modelling the relationship between the two
correlated variables: the dependence of the dependent variable Y on the independent
variable X.
Relationship is summarized by a regression equation.
y = a + bx
a = intercept on the y-axis
b = regression coefficient (slope)
The line of regression is the line which gives the best estimate to the value of one
variable for any specific value of the other variable. Thus the line of regression is the line
of “best fit” and is Obtained by the principle of least squares.
This principle consists in minimizing the sum of the squares of the deviations of the
actual values of y from their estimate values given by the line of best fit
Procedure:
Step1: Write the normal equations for the regression line y = mx + c as
Σy = mΣx + nc
Σxy = mΣx² + cΣx
Step2: Form the regression table to get the values.
Step3: Substitute the values in normal equations, solve them to find ‘m’ and ‘c’ to fit the
line
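The two normal equations can be solved in closed form for m and c (a minimal sketch; the sample data are made up and lie exactly on y = 2x + 1):

```python
def fit_line(xs, ys):
    """Least-squares fit of y = m*x + c by solving the normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # From: sum(y) = m*sum(x) + n*c  and  sum(xy) = m*sum(x^2) + c*sum(x)
    m = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    c = (sy - m * sx) / n
    return m, c

m, c = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(m, c)  # 2.0 1.0
```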
Test for correlation:
The test for correlation is a t-test with t = r√(n − 2)/√(1 − r²),
where r is the correlation and D.O.F = n − 2.
Correlation - Other Methods:
Covariance-based and concurrent-deviation methods.
Concurrent deviation: rc = ±√(±(2c − n)/n), where c is the number of concurrent
deviations among n pairs of deviations.
Example: with c = 2 and n = 7, rc = −√((7 − 2×2)/7) = −√(3/7) ≈ −0.65
Others:
Poisson Process:
For s < t, N(t) - N(s) is the number of events that occur in the interval (s, t].
A counting process has independent increments if, for any 0<s<t<u<v, N(t)-N(s)
is independent of N(v)-N(u).
That is, the number of events that occur in non overlapping intervals are independent
random variables.
A counting process has stationary increments if, for any s < t, the distribution of N(t) −
N(s) depends only on the length of the time interval, t − s.
A Poisson process is a counting process with N(0) = 0 in which the number of events in
any interval of length t follows a Poisson distribution with mean λt (therefore, it has
stationary increments), i.e.,
P[N(t + s) − N(s) = n] = e^(−λt)(λt)^n / n!,  n = 0, 1, ...
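The mean-λt property can be checked by simulating the process from its exponential inter-arrival times (λ = 2 and t = 3 below are arbitrary choices):

```python
import random

random.seed(7)

def poisson_count(rate, t):
    """Number of events of a rate-`rate` Poisson process in (0, t]."""
    elapsed, n = 0.0, 0
    while True:
        elapsed += random.expovariate(rate)   # exponential inter-arrival gap
        if elapsed > t:
            return n
        n += 1

rate, t = 2.0, 3.0
counts = [poisson_count(rate, t) for _ in range(20_000)]
mean_count = sum(counts) / len(counts)
print(round(mean_count, 2))  # close to rate * t = 6
```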
Chebyshev’s theorem:
Chebyshev’s theorem will show you how to use the mean and the standard deviation to
find the percentage of the total observations that fall within a given interval about the
mean.
The fraction of any set of numbers lying within k standard deviations of the mean of
those numbers is at least 1 − (1/k²), where k = (the within number)/(the standard deviation).
Use Chebyshev's theorem to find what percent of the values will fall between 123 and
179 for a data set with mean of 151 and standard deviation of 14.
Solution:
We subtract 151-123 and get 28, which tells us that 123 is 28 units below the mean.
We subtract 179-151 and also get 28, which tells us that 179 is 28 units above the mean.
Those two together tell us that the values between 123 and 179 are all within 28 units
of the mean. Therefore the "within number" is 28.
So we find the number of standard deviations, k, which the "within number", 28,
amounts to, by dividing it by the standard deviation: k = 28/14 = 2.
Then 1 − (1/k²) = 1 − (1/2²) = 1 − 1/4 = 3/4, so at least 75% of the values will fall between
123 and 179.
Suppose µ = 39 and σ = 5; find the percentage of values that lie within 29 and 49 of the
mean.
29 ——————– 39 ——————– 49
First, we notice the distance between the mean (39) and 49 is 10, and it is kσ. Therefore
kσ = 10 implies k = 2, so at least 1 − (1/2²) = 75% of the values lie between 29 and 49.
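Both examples can be verified with a small helper (a hypothetical function; it assumes the interval is symmetric about the mean):

```python
def chebyshev_lower_bound(mean, sd, lower, upper):
    """Minimum fraction of values inside [lower, upper], by Chebyshev's theorem."""
    within = upper - mean                       # the "within number"
    assert within == mean - lower, "interval must be symmetric about the mean"
    k = within / sd
    return 1 - 1 / k ** 2

print(chebyshev_lower_bound(151, 14, 123, 179))  # 0.75
print(chebyshev_lower_bound(39, 5, 29, 49))      # 0.75
```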
Discrete Simulation:
We sort of observe the system through a set of variables called state Variables and when the
state variables change their values only at countable number of points in time, its called
Discrete Simulation.
Most business processes can be described as a sequence of separate, discrete, events. For
example, a truck arrives at a warehouse, goes to an unloading gate, unloads, and then departs.
To simulate this, discrete event modeling is often chosen.
Key Aspects:
Entities − These are the representation of real elements like the parts of machines.
Simulation Executive − It is responsible for controlling the advance time and executing
discrete events.
Random Number Generator − It helps to simulate different data coming into the
simulation model.
Results & Statistics − It validates the model and provides its performance measures.
A queue is the combination of all entities in the system being served and those waiting
for their turn.
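The warehouse example can be sketched as a tiny discrete-event simulation (all parameters are invented; one unloading gate, fixed service time, exponential arrival gaps):

```python
import random

random.seed(3)

def simulate_queue(num_trucks, mean_gap, service_time):
    """Average waiting time of trucks at a single unloading gate."""
    arrivals, t = [], 0.0
    for _ in range(num_trucks):
        t += random.expovariate(1 / mean_gap)   # random gap to next arrival
        arrivals.append(t)
    gate_free, total_wait = 0.0, 0.0
    for arrive in arrivals:                     # process events in time order
        start = max(arrive, gate_free)          # wait while the gate is busy
        total_wait += start - arrive
        gate_free = start + service_time
    return total_wait / num_trucks

print(round(simulate_queue(1000, mean_gap=10.0, service_time=4.0), 2))
```

The state variables (`gate_free` and the waiting times) change only at the countable arrival and departure events, which is what makes this a discrete simulation.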
Central Limit Theorem:
Given a sequence of random variables with a mean and a variance, the CLT says that the
sample average has a distribution which is approximately Normal, and gives the new
mean and variance:
X1, X2, ..., Xn i.i.d. with mean = μ, SD = σ;
X̄n ≈ N(μ, σ²/n)
Notice that nothing at all need be assumed about the P, CDF, or PDF associated with X,
which could have any distribution from which a mean and variance can be derived.
Example:
The lifetime of a certain type of bulb for each plant of a company is a random variable
with mean 1200 hrs and standard deviation 250 hrs. Using central limit theorem, find
the probability that the average lifetime of 60 bulbs of the company exceeds 1250 hrs.
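Under the CLT, X̄ ≈ N(1200, 250²/60), so the required probability is P(Z > (1250 − 1200)/(250/√60)). A quick check (the normal CDF is built from `math.erf`):

```python
import math

mu, sigma, n = 1200, 250, 60
se = sigma / math.sqrt(n)        # S.E. of the mean of 60 bulbs

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z = (1250 - mu) / se
p_exceeds = 1 - normal_cdf(z)
print(round(z, 2), round(p_exceeds, 3))  # about 1.55 and 0.061
```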
Estimation:
Point Estimation-Methods:
Method of Moments
A point estimator is any function T (Y1,Y2, ..,YN ) of a sample. Any statistic/Measure is a point estimator.
Assume that Y1, Y2, ..., YN are i.i.d. N(μ, σ²) random variables. The sample mean (or average) is
Ȳ = (1/N) Σ_{i=1}^{N} Yi
Let T be the time that is needed for a specific task in a factory to be completed. In order to estimate the
mean and variance of T, we observe a random sample T1,T2,⋯⋯,T6. Thus, Ti's are i.i.d. and have the
same distribution as T. We obtain the following values (in minutes):
18,21,17,16,24,20.
Find the values of the sample mean, the sample variance, and the sample standard deviation for the
observed sample.
T̂ = (18 + 21 + 17 + 16 + 24 + 20)/6 = 19.33
Ŝ² = (1/(6 − 1)) Σ_{k=1}^{6} (Tk − 19.33)² = 8.67
Ŝ = √8.67 = 2.94
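The same values follow from the standard library (the `statistics` module uses the n − 1 divisor for the sample variance):

```python
import statistics

times = [18, 21, 17, 16, 24, 20]
mean = statistics.mean(times)
var = statistics.variance(times)   # n - 1 divisor
sd = statistics.stdev(times)
print(round(mean, 2), round(var, 2), round(sd, 2))  # 19.33 8.67 2.94
```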
Maximum Likelihood:
The likelihood of a set of data is the probability of obtaining that particular set of data, given the
probability distribution model.
This expression contains the unknown model parameters. The values of these parameters that maximize
the sample likelihood are known as the Maximum Likelihood Estimates or MLEs.
The maximum likelihood estimate (mle) of θ is that value of θ that maximises likelihood(θ).
It is defined as
L(θ) = Π_{i=1}^{n} f(xi | θ)
log L(θ) = Σ_{i=1}^{n} log f(xi | θ)
Example:
Suppose we wish to find the maximum likelihood estimate (mle) of θ for a Binomial distribution:
pk(k, θ) = nCk θ^k (1 − θ)^(n−k)
log pk(k, θ) = log(nCk) + k log(θ) + (n − k) log(1 − θ)
∂ log pk(k, θ)/∂θ = 0 gives k/θ − (n − k)/(1 − θ) = 0
k − kθ = nθ − kθ, so θ = k/n
MLE Example:
Consider a sample 0, 1, 0, 0, 1 from a binomial distribution, with the form P[X=0] = (1 − p),
P[X=1] = p. Find the maximum likelihood estimate of p.
Soln:
L(p) = (1 − p)³p²
log L(p) = log[(1 − p)³p²] = 3 log(1 − p) + 2 log p
∂ log L(p)/∂p = 0 means −3/(1 − p) + 2/p = 0, i.e. (−3p + 2 − 2p)/(p(1 − p)) = 0, so p = 2/5
That is, p = 2/5 is the value of p that makes this sample most likely, if we believe the population
to be Binomially distributed.
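The likelihood (1 − p)³p² corresponds to five Bernoulli observations with two successes; a grid search over p confirms that the log-likelihood peaks at the analytic MLE p = 2/5 (a sketch):

```python
import math

def log_likelihood(p, sample):
    """Log-likelihood of a Bernoulli sample under parameter p."""
    return sum(math.log(p if x == 1 else 1 - p) for x in sample)

sample = [0, 1, 0, 0, 1]                     # two successes in five trials
grid = [i / 1000 for i in range(1, 1000)]    # p in (0, 1)
best = max(grid, key=lambda p: log_likelihood(p, sample))
print(best)  # 0.4
```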
Interval Estimation:
Let xi, i = 1, 2, ..., n be a random sample of size n from f(x, θ). If T1(x) and T2(x) are any two
statistics such that T1(x) ≤ T2(x) and
P[T1(x) ≤ θ ≤ T2(x)] = 1 − α,
where α is the level of significance, then the random interval (T1(x), T2(x)) is called a
100(1 − α)% confidence interval for θ.
Here, T1 is called lower confidence limit and T2 is called upper confidence limit. (1-α) is called the
confidence coefficient.
Consider a sample 3.7, 5.3, 4.7, 3.3, 5.3, 5.1, 5.1, 4.9, 4.2, 5.7 .
If we go for a 90% confidence interval, then α = 0.10. The confidence interval for the average is
given by
X̄ − t_(α/2, dof) S/√n ≤ μ ≤ X̄ + t_(α/2, dof) S/√n
So,
4.73 − 1.83 × 0.766/√10 ≤ μ ≤ 4.73 + 1.83 × 0.766/√10
4.73 − 0.4433 ≤ μ ≤ 4.73 + 0.4433
4.2867 ≤ μ ≤ 5.1733
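The interval can be reproduced in code (the t table value 1.833 for 9 d.o.f at the 5% level is taken as given rather than computed):

```python
import math
import statistics

data = [3.7, 5.3, 4.7, 3.3, 5.3, 5.1, 5.1, 4.9, 4.2, 5.7]
xbar = statistics.mean(data)
s = statistics.stdev(data)           # n - 1 divisor
n = len(data)
t_crit = 1.833                       # t(alpha/2 = 0.05, dof = 9) from a t-table

half_width = t_crit * s / math.sqrt(n)
print(round(xbar, 2), round(s, 3),
      round(xbar - half_width, 3), round(xbar + half_width, 3))
```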