This document discusses basic concepts of probability including: 1) Random experiments, sample spaces, events, simple and compound events, equally likely events, and exhaustive events. 2) Classical, frequency, subjective, and axiomatic approaches to probability. 3) The classical definition of probability as the number of favorable outcomes divided by the total number of possible outcomes. 4) Sure events, mutually exclusive events, the addition theorem of probability, and problems applying the addition theorem.


Probability

Basic Concepts
Random Experiment: An experiment is said to be a random experiment if its outcome cannot
be predicted with certainty.
Example: If a coin is tossed, we cannot say whether a head or a tail will appear, so it is a
random experiment.
Sample Space: The set of all possible outcomes of an experiment is called the sample space. It
is denoted by 'S', and its number of elements is n(S).
Example: In throwing a die, the number that appears on top is one of 1, 2, 3, 4, 5, 6. So here
S = {1,2,3,4,5,6} and n(S) = 6.
Similarly, in the case of a coin, S = {Head, Tail} or {H,T} and n(S) = 2.
The elements of the sample space are called sample points or event points.
Event: Every subset of a sample space is an event. It is denoted by 'E'.
Example: In throwing a die with S = {1,2,3,4,5,6}, the appearance of an even number is the
event E = {2,4,6}.
Clearly E is a subset of S.
Simple event: An event consisting of a single sample point is called a simple event.
Example: In throwing a die, S = {1,2,3,4,5,6}, so each of {1}, {2}, {3}, {4}, {5} and {6} is a
simple event.
Compound event: A subset of the sample space which has more than one element is called a
compound (or mixed) event.
Example: In throwing a die, the appearance of an odd number is a compound event,
because E = {1,3,5} has three elements.
Equally likely events: Events are said to be equally likely if we have no reason to believe that
one is more likely to occur than another.

Example: When a die is thrown, all six faces {1,2,3,4,5,6} are equally likely to come up.
Exhaustive events: A set of events is exhaustive if together they account for every possible
outcome of the experiment.

Approaches to Probability
 Classical approach
 Frequency approach
 Subjective approach
 Axiomatic approach, which is built on the axioms:
 P(A) >= 0
 P(S) = 1
 P(A ∪ B) <= P(A) + P(B)

Classical definition of probability:


If 'S' is the sample space, then the probability of occurrence of an event 'E' is defined as:
P(E) = n(E)/n(S) = (number of elements in 'E') / (number of elements in sample space 'S')
Example: Find the probability of getting a tail in a toss of a coin.
Solution:
Sample space S = {H,T} and n(S) = 2
Event E = {T} and n(E) = 1
Therefore P(E) = n(E)/n(S) = 1/2
Note: This definition does not apply if
(a) the events are not equally likely, or
(b) the possible outcomes are infinite.
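The classical definition can be checked mechanically by enumerating a finite sample space of equally likely outcomes. A minimal sketch in Python (the function name is my own, not from the notes):

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """P(E) = n(E) / n(S) for a finite sample space of equally likely outcomes."""
    favorable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favorable), len(sample_space))

# Tossing a coin: S = {H, T}, E = {T}
print(classical_probability({"T"}, {"H", "T"}))             # 1/2
# Throwing a die: S = {1..6}, E = even numbers
print(classical_probability({2, 4, 6}, set(range(1, 7))))   # 1/2
```

Using exact fractions avoids floating-point rounding in these small examples.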

Sure event: Let 'S' be a sample space. If E = S, then E is called a sure event.
Example: In a throw of a die, S = {1,2,3,4,5,6}.
Let E1 = event of getting a number less than 7. Then E1 = S, so E1 is a sure event.
So we can say that for a sure event, n(E) = n(S).

Mutually exclusive or disjoint events: Two or more events are mutually exclusive if they
cannot occur simultaneously, that is, no two of them can occur together.

Addition Theorem of Probability :

If 'A' and 'B' are any two events, then the probability of occurrence of at least one of the events
'A' and 'B' is given by:

P(A or B) = P(A) + P(B) - P(A and B)

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)


Problems based on addition theorem of probability:
Working rule:
(i) A ∪ B denotes the event of occurrence of at least one of the events 'A' or 'B'.
(ii) A ∩ B denotes the event of occurrence of both the events 'A' and 'B'.
(iii) P(A ∪ B) or P(A+B) denotes the probability of occurrence of at least one of the
events 'A' or 'B'.
(iv) P(A ∩ B) or P(AB) denotes the probability of occurrence of both the events 'A' and
'B'.

--------------------x-------------------x---------------------x--------------------x---------------x-----

Ex.: The probability that a contractor will get a contract is 2/3 and the probability that he
will get another contract is 5/9. If the probability of getting at least one contract is 4/5, what is
the probability that he will get both contracts?
Sol.: Here P(A) = 2/3, P(B) = 5/9,
P(A ∪ B) = 4/5, P(A ∩ B) = ?
By the addition theorem of probability:
P(A ∪ B) = P(A) + P(B) - P(A ∩ B), so 4/5 = 2/3 + 5/9 - P(A ∩ B)
or 4/5 = 11/9 - P(A ∩ B)
or P(A ∩ B) = 11/9 - 4/5 = (55 - 36)/45
P(A ∩ B) = 19/45
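The rearranged addition theorem above can be verified with exact fractions; a quick sketch:

```python
from fractions import Fraction

p_a, p_b = Fraction(2, 3), Fraction(5, 9)
p_a_or_b = Fraction(4, 5)

# P(A ∩ B) = P(A) + P(B) - P(A ∪ B), rearranged from the addition theorem
p_a_and_b = p_a + p_b - p_a_or_b
print(p_a_and_b)  # 19/45
```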
Multiplication theorem:
Let A and B be two independent events. Then multiplication theorem states that,
P[AB]= P[A]. P[B].
Note: P[AB] can also be represented by P[A and B] or P[A∩B].

Example:
Let a problem in statistics be given to two students whose probability of solving it are 1/5 and
5/7.
What is the probability that both solve the problem.
Solution:
Let A= event that the first person solves the problem.
B= event that the second person solves the problem.
It is given that P[A]=1/ 5; P[B]=5/7.
Since A and B are independent, using multiplication theorem

P[AB]= P[A]. P[B]. = 1/5*5/7= 1/7.


Conditional probability:
The probability of an event A given that another event B has occurred is called conditional
probability. Let A and B be two events with P(B) > 0. Then

P(A|B) = P(A ∩ B) / P(B)
Example:
Let a file contain 10 papers numbered 1 to 10. A paper is selected at random. What is the
probability that it is 10, given that it is at least 5?
Solution:
From the problem we can see that,
Sample space ={1,2,3,4,5,6,7,8,9,10}

A- Event that number is 10 ={10}.


B- Event that number is at least 5 ={5,6,7,8,9,10}.

A∩B={10}.

P[A]= 1/10; P[B] =6/10; P[ A∩B] =1/10.

Therefore,

P(A|B) = P(A ∩ B) / P(B) = (1/10) / (6/10) = 1/6
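For equally likely outcomes, the conditional probability reduces to counting within the conditioning event; a sketch of the papers example:

```python
from fractions import Fraction

sample_space = set(range(1, 11))          # papers numbered 1..10
a = {10}                                  # event: the number is 10
b = {n for n in sample_space if n >= 5}   # event: the number is at least 5

# P(A|B) = n(A ∩ B) / n(B) when all outcomes are equally likely
p_a_given_b = Fraction(len(a & b), len(b))
print(p_a_given_b)  # 1/6
```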
Bayes' Theorem:
Statement: Let E1, E2, …, En be n mutually exclusive and exhaustive events and let B be any
event with P(B) > 0. Then

P(Ei | B) = P(B | Ei) P(Ei) / [ Σ_{i=1}^{n} P(B | Ei) P(Ei) ]

When to Apply Bayes' Theorem

Part of the challenge in applying Bayes' theorem involves recognizing the types of problems
that warrant its use. You should consider Bayes' theorem when the following conditions exist.
▪ The sample space is partitioned into a set of mutually exclusive events { A1, A2, . . . , An }.
▪ Within the sample space, there exists an event B, for which P(B) > 0.
▪ The analytical goal is to compute a conditional probability of the form: P( A k | B ).
▪ You know at least one of the two sets of probabilities described below.
• P( Ak ∩ B ) for each Ak
• P( Ak ) and P( B | Ak ) for each Ak

Sample Problem

Bayes' theorem can be best understood through an example. This section presents an example
that demonstrates how Bayes' theorem can be applied effectively to solve statistical problems.

Example 1
Marie is getting married tomorrow, at an outdoor ceremony in the desert. In recent years, it
has rained only 5 days each year. Unfortunately, the weatherman has predicted rain for
tomorrow. When it actually rains, the weatherman correctly forecasts rain 90% of the time.
When it doesn't rain, he incorrectly forecasts rain 10% of the time. What is the probability that
it will rain on the day of Marie's wedding?

Solution: The sample space is defined by two mutually-exclusive events - it rains or it does not
rain. Additionally, a third event occurs when the weatherman predicts rain. Notation for these
events appears below.

▪ Event A1. It rains on Marie's wedding.


▪ Event A2. It does not rain on Marie's wedding
▪ Event B. The weatherman predicts rain.

In terms of probabilities, we know the following:

▪ P( A1 ) = 5/365 =0.0136985 [It rains 5 days out of the year.]


▪ P( A2 ) = 360/365 = 0.9863014 [It does not rain 360 days out of the year.]
▪ P( B | A1 ) = 0.9 [When it rains, the weatherman predicts rain 90% of the time.]
▪ P( B | A2 ) = 0.1 [When it does not rain, the weatherman predicts rain 10% of the time.]

We want to know P( A1 | B ), the probability it will rain on the day of Marie's wedding, given a
forecast for rain by the weatherman. The answer can be determined from Bayes' theorem, as
shown below.
P( A1 | B ) = P( A1 ) P( B | A1 ) / [ P( A1 ) P( B | A1 ) + P( A2 ) P( B | A2 ) ]

P( A1 | B ) = (0.014)(0.9) / [ (0.014)(0.9) + (0.986)(0.1) ]

P( A1 | B ) = 0.111

Note the somewhat unintuitive result. Even when the weatherman predicts rain, it rains
only about 11% of the time. Despite the weatherman's gloomy prediction, there is a good
chance that Marie will not get rained on at her wedding.
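The computation can be reproduced directly; a sketch using the exact day counts (the rounded values in the note give the same answer to three decimals):

```python
p_rain = 5 / 365             # P(A1): it rains on the wedding day
p_dry = 360 / 365            # P(A2): it does not rain
p_forecast_given_rain = 0.9  # P(B | A1)
p_forecast_given_dry = 0.1   # P(B | A2)

# Bayes' theorem: P(A1 | B) = P(A1)P(B|A1) / [P(A1)P(B|A1) + P(A2)P(B|A2)]
numerator = p_rain * p_forecast_given_rain
denominator = numerator + p_dry * p_forecast_given_dry
p_rain_given_forecast = numerator / denominator
print(round(p_rain_given_forecast, 3))  # 0.111
```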

Bayes Theorem: Example 2:

Let us say P(Fire) means how often there is fire, and P(Smoke) means how often we see smoke,
then:

P(Fire|Smoke) means how often there is fire when we can see smoke
P(Smoke|Fire) means how often we can see smoke when there is fire

So the formula tells us "forwards" P(Fire|Smoke) when we know "backwards" P(Smoke|Fire).

• If dangerous fires are rare (1%)


• but smoke is fairly common (10%) due to barbecues,
• and 90% of dangerous fires make smoke
• We can then discover the probability of dangerous Fire when there is Smoke:
• P(Fire|Smoke) = P(Fire) P(Smoke|Fire) / P(Smoke)
• = (1% x 90%) / 10%
• = 9%
• So it is still worth checking out any smoke to be sure.

Bayes' Theorem Example: Three machines, M1, M2 and M3, produce 45%, 30% and 25%,
respectively, of the total parts produced in a factory. The percentages of defective production
of these machines are 3%, 4% and 5%, respectively.
a) If we choose a part randomly, calculate the probability that it is defective.
b) Suppose now that we choose a part randomly and it is defective. Calculate the probability
that it was produced by M2.
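The two parts can be answered with the law of total probability and then Bayes' theorem; a sketch of the computation:

```python
priors = {"M1": 0.45, "M2": 0.30, "M3": 0.25}        # production shares P(Mi)
defect_rates = {"M1": 0.03, "M2": 0.04, "M3": 0.05}  # P(defective | Mi)

# (a) Law of total probability: P(D) = sum over i of P(Mi) P(D | Mi)
p_defective = sum(priors[m] * defect_rates[m] for m in priors)
print(round(p_defective, 4))  # 0.038

# (b) Bayes' theorem: P(M2 | D) = P(M2) P(D | M2) / P(D)
p_m2_given_defective = priors["M2"] * defect_rates["M2"] / p_defective
print(round(p_m2_given_defective, 4))  # 0.3158
```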
Random Variable: Probability distributions

Random Variable
A random variable is a function that maps the sample space to the real numbers. By convention,
random variables are written as upper-case Roman letters from the end of the alphabet, like X.

For example, define the random variable X to be the sum of the two dice. For every element in
the sample space, we can specify the value of X.
S={
(1; 1) = 2 (1; 2) = 3 (1; 3) = 4 (1; 4) = 5 (1; 5) = 6 (1; 6) = 7
(2; 1) = 3 (2; 2) = 4 (2; 3) = 5 (2; 4) = 6 (2; 5) = 7 (2; 6) = 8
(3; 1) = 4 (3; 2) = 5 (3; 3) = 6 (3; 4) = 7 (3; 5) = 8 (3; 6) = 9
(4; 1) = 5 (4; 2) = 6 (4; 3) = 7 (4; 4) = 8 (4; 5) = 9 (4; 6) = 10
(5; 1) = 6 (5; 2) = 7 (5; 3) = 8 (5; 4) = 9 (5; 5) = 10 (5; 6) = 11
(6; 1) = 7 (6; 2) = 8 (6; 3) = 9 (6; 4) = 10 (6; 5) = 11 (6; 6) = 12
}

If we know the probabilities of a set of events, we can calculate the probabilities that a random
variable defined on that set of events takes on certain values. For example:
P(X = 2) = P((1; 1)) = 1/36
P(X = 5) = P((1; 4), (2; 3), (3; 2), (4; 1)) = 1/9
P(X = 7) = P((1; 6), (2; 5), (3; 4), (4; 3), (5; 2), (6; 1)) = 1/6
P(X = 12) = P((6; 6)) = 1/36
The expression for P(X = 5) should be familiar, since we calculated it above as the probability of
the event that the two dice sum to five. Much of the theory of probability is concerned with
defining functions of random variables
and calculating the likelihood with which they take on their values.

So now we know something about what a random variable is. Let us look at it a bit more
closely. Random variables can be broadly classified into two types:
. Discrete r.v. ---- these take only countably many values (e.g., integers)
. Continuous r.v. ---- these can take any value in an interval

Discrete r.v.
 probability mass function
 p(x) = P(X = x)
 Ex: Toss a coin twice: S = {HH, HT, TH, TT}
 X = number of heads, taking values {0, 1, 2}
 p(0) = P{TT} = 1/4
 p(1) = P{TH, HT} = 2/4
 p(2) = P{HH} = 1/4
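The pmf above can be built by enumerating the sample space; a small sketch:

```python
from fractions import Fraction
from itertools import product

# Two coin tosses; X = number of heads
sample_space = list(product("HT", repeat=2))   # [('H','H'), ('H','T'), ('T','H'), ('T','T')]

def pmf(x):
    """p(x) = P(X = x) over the equally likely outcomes."""
    favorable = [s for s in sample_space if s.count("H") == x]
    return Fraction(len(favorable), len(sample_space))

print([pmf(x) for x in (0, 1, 2)])  # [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
```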

5. Probability Distribution
Define a mapping between the set of all events and the set of real numbers. If every event has
one and only one corresponding number, we call this mapping a function. Write a function as
f(event) = number. For example
f((3; 6)) = 42
f((2; 4), (5; 6)) = 3.75
f(∅) = 99
We can define any arbitrary function we want, but some classes of functions are more
interesting than others. One class of functions that is interesting is that for which the following
axioms of probability hold:
. P(S) = 1
. P(A) ≥ 0
. P(A ∪ B) ≤ P(A) + P(B) for all A and B

A function that obeys all three of these conditions for all A and B is called a probability
function or (equivalently) a probability distribution. By convention we write the name of a
probability distribution with a P.
Several useful corollaries are easily proved from the axioms of probability; they hold for any P
that satisfies the above conditions.

Any random variable has a probability associated with each of its possible values
according to the probability function defined above. The pair
{X, p(x)}
is called its probability distribution.

If X is a discrete random variable, the distribution has:

X - integer values
. p(x) - probability mass function, which gives the probability of each value of X:
. p(x) = P{X = x}

If X is a continuous random variable, the distribution has:

X - any value
. f(x) - probability density function, which gives the probability of each interval in which X
can take values:
. P(a ≤ X ≤ b) = ∫_a^b f(x) dx
CDF: Cumulative distribution function:
 The cumulative distribution function (CDF) gives the cumulative probability up to a
given x-value:

F(x) = P(X ≤ x) = Σ_{t ≤ x} p(t) if X is discrete
F(x) = P(X ≤ x) = ∫_{-∞}^{x} f(t) dt if X is continuous

In summary: a discrete r.v. is described by its pmf and cdf; a continuous r.v. is described by
its pdf and cdf.

6. Expectation Value
Once we know the probability distribution of a random variable we can use it to predict
the average outcome of functions of that variable. This is done using expectation values. The
expectation value of a random variable X is defined to be

E[X] = Σ_x x p(x) if X is discrete

E[X] = ∫ x f(x) dx if X is continuous

The X defined in the previous section (the sum of two dice) has the mean value
E[X] = 2P(X = 2) + 3P(X = 3) + 4P(X = 4) + … + 12P(X = 12)
= 7
You can think of expectation values as taking a weighted average of the values of X
where more likely values get a higher weight than less likely values.

Note: If X is continuous we do the same process where we replace ∑ by ∫ .

7. Variance
Once we know the probability distribution of a random variable we can use it to predict
the variance of that variable. This is done using expectation values, as

V(X) = E[X²] - {E[X]}²
where
E[X] = Σ_x x p(x) if X is discrete
= ∫ x f(x) dx if X is continuous

E[X²] = Σ_x x² p(x) if X is discrete
= ∫ x² f(x) dx if X is continuous

The X defined in the previous section has the following variance:

E[X] = 2P(X = 2) + 3P(X = 3) + 4P(X = 4) + … + 12P(X = 12)
= 7
E[X²] = 2²P(X = 2) + 3²P(X = 3) + 4²P(X = 4) + … + 12²P(X = 12)
= 4[1/36] + 9[2/36] + 16[3/36] + 25[4/36] + 36[5/36] + 49[6/36] +
64[5/36] + 81[4/36] + 100[3/36] + 121[2/36] + 144[1/36]
= [4+18+48+100+180+294+320+324+300+242+144]/36 = 1974/36
V(X) = E[X²] - {E[X]}²
= 1974/36 - 7² = [1974 - 1764]/36 = 210/36

Note: If X is continuous we do the same process where we replace ∑ by ∫ .
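The dice-sum mean and variance above can be checked by enumerating all 36 equally likely outcomes; a sketch:

```python
from fractions import Fraction
from itertools import product

# X = sum of two dice; enumerate the 36 equally likely outcomes
outcomes = [a + b for a, b in product(range(1, 7), repeat=2)]

e_x = Fraction(sum(outcomes), len(outcomes))
e_x2 = Fraction(sum(x * x for x in outcomes), len(outcomes))
var_x = e_x2 - e_x ** 2

print(e_x)    # 7
print(e_x2)   # 329/6 (i.e., 1974/36 reduced)
print(var_x)  # 35/6 (i.e., 210/36 reduced)
```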

DISCRETE DISTRIBUTIONS:


Trial: Any single performance of a random experiment.
Bernoulli Trial : Any trial which has exactly 2 possible outcomes.

1. BERNOULLI:
Bernoulli trials are trials with two outcomes, success and failure, with probabilities p and
q = 1 - p respectively. Examples:
 A coin is tossed
 A die is tossed (success = a particular face)
 We write an examination (pass or fail)

The probability mass function is given by
p(x) = p if x = 1
= q if x = 0
E(X) = p
VAR(X) = p(1 - p) = pq
2. BINOMIAL:
The random variable X denoting the number of successes in a fixed number of
independent Bernoulli trials is called a binomial random variable, and its distribution is the
binomial distribution defined below:
p(x) = nCx p^x q^(n-x)
E(X) = np
VAR(X) = np(1 - p) = npq

Example: A bag contains 50 balls of which 35 are red and 15 are black. Five times a ball is
randomly selected, its colour is noted and it is replaced. Find the probability that a black ball
is selected exactly 2 times.
Solution:
Every time we draw a ball it is a Bernoulli trial, as there are only 2 possibilities.
So n = 5; p = 15/50; q = 1 - p = 35/50; x = 2.
P(X = 2) = nCx p^x q^(n-x)
= 5C2 (15/50)² (35/50)³
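The binomial pmf is easy to evaluate directly; a sketch of this example:

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p^x * (1 - p)^(n - x)"""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# 5 draws with replacement, P(black) = 15/50 = 0.3; P(exactly 2 black)
print(round(binomial_pmf(2, 5, 0.3), 4))  # 0.3087
```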
3. GEOMETRIC:
The random variable X denoting the number of Bernoulli trials required to achieve the first
success is called a geometric random variable, and its distribution is the geometric distribution.

P(X = x) = p q^(x-1) for x = 1, 2, 3, …
= 0 otherwise
Example: A bag contains 50 balls of which 35 are red and 15 are black. A ball is randomly
selected; if it is red it is replaced and we select again, continuing until we get a black ball for
the first time. Find the probability that the black ball is obtained on the 7th selection.
Solution:
Every time we draw a ball it is a Bernoulli trial, as there are only 2 possibilities.
So x = 7; p = 15/50; q = 1 - p = 35/50.
P[X = 7] = (15/50) (35/50)^6

4. POISSON:
The random variable X whose pmf is

P(X = x) = e^(-λ) λ^x / x! for x = 0, 1, 2, 3, …
= 0 otherwise

E(X) = VAR(X) = λ.
Example: A bag contains 50 balls of which 35 are red and 15 are black. 20 times a ball is
randomly selected, its colour is noted and it is replaced. Find the probability that a black ball
is selected exactly 2 times.
Solution:
Every time we draw a ball it is a Bernoulli trial, as there are only 2 possibilities. With n
large we can approximate the binomial by a Poisson distribution.
So n = 20; p = 15/50; λ = np = 6; x = 2.
P(X = 2) = e^(-λ) λ^x / x! = e^(-6) 6² / 2!
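The example uses the Poisson with λ = np as a stand-in for the binomial. Comparing the two numerically shows the approximation is rough here, since p = 0.3 is not small (the approximation works best for large n and small p); a sketch:

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """Exact binomial probability P(X = x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """Poisson probability P(X = x) with mean lam."""
    return exp(-lam) * lam ** x / factorial(x)

n, p = 20, 0.3           # 20 draws, P(black) = 15/50
print(round(binomial_pmf(2, n, p), 4))   # exact binomial value
print(round(poisson_pmf(2, n * p), 4))   # Poisson approximation with lambda = 6
```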
 A bag contains 50 balls, 20 blue and 30 red. Four balls are taken one after another with
replacement. Probability of getting 2 from each category?
Binomial
 A bag contains 500 balls, 200 blue and 300 red. Four balls are taken one after another
with replacement. Probability of getting 2 from each category?
Poisson
 A bag contains 50 balls, 20 blue and 30 red. Balls are taken one after another with
replacement until we get a blue one. Probability of getting blue in the fourth selection?
Geometric
Ex:
 Cards chosen from a pack.
 Number of defective bulbs.
 Number of rainy days.
 Number of breads from a bakery with two types.
 Number of students who understood Binomial.

CONTINUOUS DISTRIBUTIONS:


(a) UNIFORM:
A random variable X is uniformly distributed on the interval (a, b) if its pdf is given by

f(x) = 1/(b - a) for a ≤ x ≤ b
= 0 else

Its cdf is

F(x) = 0 for x < a
= (x - a)/(b - a) for a ≤ x ≤ b
= 1 for x > b

E(X) = (a + b)/2
V(X) = (b - a)²/12

Example:
If a wheel is spun and then allowed to come to rest, the point on the circumference of the
wheel that is located opposite a certain fixed marker could be considered the value of a random
variable X that is uniformly distributed over the circumference of the wheel. One could then compute
the probability that X will fall in any given arc.
If we assume that it is uniform on the interval [3, 6], we can obtain:

Average point of outcome, E[X] = (a + b)/2 = (3 + 6)/2 = 4.5.

Variance, Var[X] = (b - a)²/12 = (6 - 3)²/12 = 9/12 = 0.75.

2. EXPONENTIAL:
A random variable X is said to be exponentially distributed if its pdf is given by
f(x) = λ e^(-λx) for x ≥ 0
= 0 otherwise
where λ is the rate parameter. Its cdf is
F(x) = 0 for x < 0
= 1 - e^(-λx) for x ≥ 0
E(X) = 1/λ.
V(X) = 1/λ².
The exponential distribution is useful for representing lifetimes of items, for modelling
interarrival times when arrivals are completely random, and for service times which are
highly variable.
The exponential distribution has the memoryless property:
P(X > s + t | X > s) = P(X > t)
This is why we are able to use the exponential distribution to model lifetimes.

Example:
Assume that a company manufactures burettes whose lifetime is exponential with average
life 950 days. What is the probability that a burette is in working condition for up to 1000 days?

Solution:
X = lifetime of the burette is exponential with average life 950 days, i.e.
E(X) = 1/λ = 950, so λ = 1/950.
P[lifetime is up to 1000 days] = P[0 < X < 1000] = ∫_0^1000 λ e^(-λx) dx
= 1 - e^(-1000/950) ≈ 0.65
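The exponential cdf gives this probability in one line; a sketch of the burette example (note the rate is the reciprocal of the mean):

```python
from math import exp

mean_life = 950.0      # average lifetime in days
lam = 1 / mean_life    # rate parameter: lambda = 1 / mean

# CDF of the exponential: P(0 < X <= t) = 1 - e^(-lambda * t)
p_up_to_1000 = 1 - exp(-lam * 1000)
print(round(p_up_to_1000, 4))  # ≈ 0.651
```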

3. NORMAL:
A normal variable X with mean µ (-∞ < µ < ∞) and variance σ² > 0 has a normal
distribution if its pdf is
f(x) = (1/(σ√(2π))) exp[-(1/2)((x - µ)/σ)²], -∞ < x < ∞
A normal distribution is used when we have a sum of many random variables. A
normal random variable with µ = 0 and σ = 1 is called a standard normal r.v. Its curve is
symmetric about the mean µ = 0.
We standardize a normal distribution by
 Z = (X - µ)/σ
which gives us the standard normal pdf.


Example:
Let us assume that heights of students in II M.Pharm is normally distributed with an average of
165 cm and a standard deviation of 10 cms. What is the probability that a student’s height is
less than 175 cms.
Solution:

Let, X= Height of students in II M.Pharm.


It is normal with, mean µ= 165; standard deviation σ=10.
P[ a student’s height is less than 175 cms]=P[-∞<X<175]

First, we convert X into Z by

Z = (x - µ)/σ.
We have x = 175, µ = 165, σ = 10, so
Z = (175 - 165)/10 = 1.
So when X = 175, Z = 1, and
P[-∞ < X < 175] = P[-∞ < Z < 1] = P[-∞ < Z < 0] + P[0 < Z < 1]
= 0.5 + 0.34 = 0.84.

(The normal density with mean m and standard deviation s is
f(x) = (1/(s√(2π))) e^(-(x - m)²/(2s²));
the value P[0 < Z < 1] ≈ 0.34 is read from the standard normal table.)
Note:
1. The same question may have the following variations:

P[a student's height is more than 175 cms] = P[175 < X < ∞]
= P[Z > 1] = 0.5 - P[0 < Z < 1] = 0.5 - table value

P[a student's height is between 165 and 175 cms] = P[165 < X < 175]
= P[0 < Z < 1] = table value for 175 - table value for 165
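Instead of a printed table, the standard normal CDF can be computed from the error function; a sketch covering the height example and both variations:

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma = 165, 10
p_below = normal_cdf(175, mu, sigma)
print(round(p_below, 4))                                  # P(X < 175) ≈ 0.8413
print(round(1 - p_below, 4))                              # P(X > 175) ≈ 0.1587
print(round(p_below - normal_cdf(165, mu, sigma), 4))     # P(165 < X < 175) ≈ 0.3413
```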
The Gamma distribution
Let the continuous random variable X have density function:

f(x) = (λ^α / Γ(α)) x^(α-1) e^(-λx) for x > 0
= 0 for x ≤ 0

Then X is said to have a Gamma distribution with parameters α and λ.

If several independent exponentially distributed random variables with the same parameter Ѳ
are added, we obtain the sum as a gamma r.v. with parameters β and Ѳ, where β is the number
of variables added.

4. ERLANG:

The Erlang distribution is a gamma distribution with β = k, an integer. Suppose there are
k stations a customer must pass through to complete service, with only one customer allowed
at a time in the entire station, and the time in every station is exponentially distributed with
parameter kѲ. Then the service time as a whole is Erlang distributed with β = k.

E(X) = 1/Ѳ.

V(X) = 1/(kѲ²).

6. WEIBULL:

The r.v. X has a Weibull distribution if its pdf has the form

f(x) = (β/α) ((x - v)/α)^(β-1) exp[-((x - v)/α)^β] for x ≥ v
= 0 else

with three parameters: α → scale parameter,
β → shape parameter,
v → location parameter.

E(X) = v + α Γ[(1/β) + 1]

V(X) = α² [ Γ[(2/β) + 1] - [Γ((1/β) + 1)]² ]


The kth moment of X:

m_k = E(X^k)
= Σ_x x^k p(x) if X is discrete
= ∫_{-∞}^{∞} x^k f(x) dx if X is continuous

The kth central moment of X:

m_k⁰ = E[(X - m)^k]
= Σ_x (x - m)^k p(x) if X is discrete
= ∫_{-∞}^{∞} (x - m)^k f(x) dx if X is continuous

where m = m_1 = E(X) = the first moment of X.

Rules:
1. E[c] = c, where c is a constant
2. E[aX + b] = aE[X] + b, where a, b are constants
3. var(X) = m_2⁰ = E[(X - m)²] = E(X²) - [E(X)]² = m_2 - m_1²
4. var(aX + b) = a² var(X)

Examples:
The data that follows are 55 smiling times, in seconds, of an eight-week-old baby.

 What is the probability that a randomly chosen eight-week-old baby smiles between 2
and 18 seconds?
Solution:
 From the data we can calculate the mean smiling time to be 11.49 seconds and the
standard deviation to be 6.23 seconds.
 Since this is an entirely spontaneous activity which could be termed completely random,
we can use the uniform distribution to approximate it.
 So, we need to form the uniform distribution.
 From the table, the smallest value is 0.7 and the largest is 22.8. So, if we assume that
smiling times, in seconds, are uniformly distributed on (0, 23), then by the definition
of the uniform distribution
f(x) = 1/(b - a) = 1/23 for a ≤ x ≤ b, 0 else.
So P(2 < X < 18) = ∫_2^18 f(x) dx = ∫_2^18 (1/23) dx = (18 - 2)/23 = 16/23

EX2:
 If jobs arrive every 15 seconds on average, i.e. λ = 4 per minute, what is the probability
of waiting less than or equal to 30 seconds, i.e. 0.5 min?
Solution:
 If X = waiting time between arrivals, then X is exponential with λ = 4, and
P(X ≤ 0.5) = ∫_0^0.5 4e^(-4x) dx = 1 - e^(-2) = 0.86
0
Ex3:
Accidents occur with a Poisson distribution at an average of 4 per week.
 1. Calculate the probability of more than 5 accidents in any one week.
 2. What is the probability that at least two weeks will elapse between accidents?
Solution:
(i) X = number of accidents, Poisson with mean 4.
 P(X > 5) = 1 - P(X ≤ 5)
 = 1 - {P(X = 0) + P(X = 1) + … + P(X = 5)}, where P(X = 0) = e^(-4) 4⁰/0! = e^(-4)
(ii) T = time between occurrences, exponential with rate 4 per week.
 P(T ≥ 2) = ∫_2^∞ 4e^(-4t) dt = e^(-8) = 0.00034
Ex4:
 Find the area under the normal curve that lies [i] to the right of z = 1.84 and [ii]
between z = -1.97 and z = 0.86.
 X is normal with mean 50 and sd = 10.
 Find the probability that X lies between 45 and 62.

Ex6:
 Suppose the diameter of a certain car component follows the normal distribution with
X ~ N(10, 3). Find the proportion of these components that have diameter larger than 13.4
mm. Equivalently, if we randomly select one of these components, find the probability that
its diameter will be larger than 13.4 mm.
Solution:
 P(X > 13.4) = P((X - 10)/3 > (13.4 - 10)/3)
 = P(Z > 1.13) = 1 - 0.8708 = 0.1292

Ex7:
 A bag of cookies is underweight if it weighs less than 500 grams. The filling process
dispenses cookies with weight that follows the normal distribution with mean 510
grams and standard deviation 4 grams.
 a. What is the probability that a randomly selected bag is underweight?
 b. If you randomly select 5 bags, what is the probability that exactly 2 of them will be
underweight?
 Solution:
 (a)=0.0062;
 (b) 0.0004
 [use binomial with probability obtained in (a).]

Ex8:
 The top 5% of applicants (as measured by GRE scores) will receive scholarships. If GRE ~
N(500, 1002), how high does your GRE score have to be to qualify for a scholarship?
 Solution:
 Z = (X - 500)/100
 P(Z ≥ z) = .05
 Note that P(Z ≥ z) = 1 - F(z) .
 If 1 - F(z) = .05, then F(z) = .95.
 Looking at Normal Table , F(z) = .95 for z = 1.65 (approximately).
 Hence, z = 1.65.
 To find the equivalent x, compute
 x = (z * 100) + 500 = (1.65 * 100) + 500 = 665.
 Thus, your GRE score needs to be 665 or higher to qualify for a scholarship.

EX9:
 If a random variable X has the moment-generating function

M_X(t) = (3/4 + (1/4)e^t)^20,

find the pmf of the distribution.

Solution:
 Comparing the given moment generating function with that of a binomial random
variable, we see that X must be a binomial random variable with n = 20 and p = 1/4.
Therefore, the pmf of X is
p(x) = 20Cx (1/4)^x (3/4)^(20-x)

Ex10: Let X be a random variable with pdf given by

f_X(x) = c x² for |x| ≤ 1
= 0 otherwise

a. Find the constant c.
b. Find E(X) and Var(X).
c. Find P(X ≥ 1/2).

Soln:

(a) For f to be a pdf, ∫_{-∞}^{∞} f(x) dx = 1, which for the given function is
∫_{-1}^{1} c x² dx = (2/3) c = 1, giving c = 3/2.

(b) E(X) = ∫_{-1}^{1} x f(x) dx = ∫_{-1}^{1} (3/2) x³ dx = (3/2) [x⁴/4] from -1 to 1 = 0,
and E(X²) = ∫_{-1}^{1} x² f(x) dx = ∫_{-1}^{1} (3/2) x⁴ dx = (3/2) [x⁵/5] from -1 to 1 = 3/5,
so Var(X) = E(X²) - [E(X)]² = 3/5.

(c) P(X ≥ 1/2) = ∫_{1/2}^{1} f(x) dx = ∫_{1/2}^{1} (3/2) x² dx = 7/16.

Ex11: If X is a discrete r.v. with pmf

X : -2 -1 0 1 2
P[x] : k 2k 3k k 3k

find the value of k.

Soln: If P[x] is the pmf of X, then it satisfies Σ_x P(x) = 1.
So Σ_x P(x) = k + 2k + 3k + k + 3k = 1,
i.e. 10k = 1, giving k = 0.1.
Ex12: Let X ~ Geometric(p). Find E[1/2^X].

Soln:
We know that the pmf of a geometric r.v. is
P(X = x) = p q^(x-1) for x = 1, 2, 3, …
= 0 otherwise
So,
E[1/2^X] = Σ_{k=1}^{∞} (1/2^k) P_X(k) = Σ_{k=1}^{∞} (1/2^k) p q^(k-1)
= (p/2) Σ_{k=1}^{∞} (q/2)^(k-1) = (p/2) · 1/(1 - q/2) = p/(2 - q) = p/(1 + p)
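The closed form p/(1 + p) can be sanity-checked by summing the series numerically with exact fractions; a sketch (the helper name is my own):

```python
from fractions import Fraction

def e_half_power_x(p, terms=200):
    """Partial sum of E[1/2^X] = sum over k >= 1 of (1/2)^k * p * q^(k-1)."""
    q = 1 - p
    return sum(Fraction(1, 2 ** k) * p * q ** (k - 1) for k in range(1, terms + 1))

p = Fraction(1, 3)
exact = p / (1 + p)            # closed form p/(1+p) = 1/4 for p = 1/3
approx = e_half_power_x(p)
print(float(exact))            # 0.25
```

The partial sum converges geometrically (ratio q/2), so 200 terms already agree with the closed form to far beyond machine precision.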

Joint Distributions(Bi-variate)

Bivariate Random Variables:


 Throw a die once, let X be the outcome. Throw a fair coin X times and let Y be the
number of heads. Find P(x,y)?
Solution:
 The possible X values range from 1 to 6.
 The possible Y values range from 0 to 6.
 When X = 2 (e.g.) Y = 0,1 or 2.
 P(2,0) = P(X=2,Y=0) = P(X=2)P(Y=0|X=2)
=(1/6)(1/2)2=1/24.

 Joint Probability Distributions

 Discrete: P(X = xi, Y = yj) = pij ≥ 0, satisfying Σ_i Σ_j pij = 1

 Continuous: f(x, y) ≥ 0, satisfying ∫∫_{state space} f(x, y) dx dy = 1

Example-Joint Distribution:
 Air Conditioner Maintenance
 A company that services air conditioner units in residences and office blocks is
interested in how to schedule its technicians in the most efficient manner
 The random variable X, taking the values 1,2,3 and 4, is the service time in hours
 The random variable Y, taking the values 1,2 and 3, is the number of air
conditioner units

 Definition: Two random variables are said to have joint probability density function f(x, y)
if f(x, y) ≥ 0 and the total probability over the state space is 1, as above.

Definition: Let X and Y denote two random variables with joint probability density function
f(x, y). Then

the marginal density of X is f_X(x) = ∫_{-∞}^{∞} f(x, y) dy

the marginal density of Y is f_Y(y) = ∫_{-∞}^{∞} f(x, y) dx

 Let X and Y denote two random variables with joint probability density function f(x, y)
and marginal densities f_X(x), f_Y(y). Then
 the conditional density of Y given X = x is
f_{Y|X}(y|x) = f(x, y) / f_X(x)
 the conditional density of X given Y = y is
f_{X|Y}(x|y) = f(x, y) / f_Y(y)

Bivariate-Example:
 If X and Y are jointly distributed as given by

 Find c and their marginal pdf’s

Covariance:
 The covariance of two random variables X and Y is

σ_XY = E[(X - µ_X)(Y - µ_Y)],
which for application purposes can be simplified to
σ_XY = E(XY) - µ_X µ_Y

 Two ball pens are selected at random from a bag containing 3 blue, 2 red and 3 green
pens. If X is the number of blue pens selected and Y is the number of red pens selected,
find
 (i)the joint distribution of X and Y
 (ii)P[(X,Y)εA)] where A is the region {(x,y), x+y<=1}
 (iii) Covariance of X and Y

Sampling

Population is the total set of objects under a statistical study. For example, if we study
M.Pharm A, our population is M.Pharm A. If we study Pharmacy, the entire lot of Pharmacy
students is our population. So, our range of interest is what is technically called a
population. A population can be finite or infinite, based on whether we can enumerate every
object or not.

We can study a population in 2 different ways,

Enumeration- where we collect data from every object in the population.

Sampling- Here we collect data and analyze only a part of it, the sample.

So, sampling is a process where, to understand the characters of a population, we study


only a part of it called the sample and use the answers obtained to infer about the population.

Sampling process is justified by the following laws:

. Law of statistical regularity: when a sample of reasonably large size is selected from a
population, it is likely to possess the characteristics of the population.

. Law of large numbers: the larger the size of the sample, the more accurately the results
represent the population.

The procedures employed in selecting a sample from the population are called sampling
techniques. They are broadly classified into two types, Random and Non-random.

Universe:

The word universe as used in statistics denotes the aggregate from which the sample is to
be taken.

Ex:- If in the year 1999, there are 2,00,000 students in Delhi University and a sample of 5,000
students is taken to study their attitude towards semester system, then 2,00,000 constitutes the
universe and 5,000 the sample size.

The universe may be either finite (or) infinite.

Finite Universe:-

A finite universe is one in which the number of item is determinable such as number of
students in Delhi University (or) in India.

Infinite Universe:-

An infinite universe is that in which the number of items cannot be determined, such as
the number of stars in the sky.

Theoretical basis of sampling:-


On the basis of sample study we can predict and generalize the behaviour of mass
phenomena. This is possible because there is no statistical population whose elements would
vary from each other without limit.

For example, wheat varies to a limited extent in colour, protein content, length, weight,
etc., but it can always be identified as wheat; similarly, apples of the same tree may vary in size,
colour, taste, weight, etc., but they can always be identified as apples.

Methods of sampling:-

The various methods of sampling can be grouped under two broad heads:

1. Probability sampling (or) random sampling


2. Non-probability sampling (or) non random sampling

Probability sampling:-

Probability sampling methods are those in which every item in the universe has a known
chance (or) probability of being chosen for the sample. This implies that the selection of sample
item is independent of the person making the study.

a. simple or unrestricted random sampling


b. restricted random sampling:

i. stratified sampling
ii. systematic sampling and
iii. cluster sampling

Non – Probability Sampling:-

Non – Probability Sampling methods are those which do not provide every item in the
universe with a known chance of being included in the sample.

1. Judgment sampling
2. convenience sampling and
3. quota sampling

Sampling Methods

Non-Probability Sampling methods:
1. Judgement Sampling
2. Convenience Sampling
3. Quota Sampling

Probability Sampling methods:
1. Simple (or) unrestricted random sampling
2. Restricted random sampling
   (i) Stratified sampling
   (ii) Systematic sampling
   (iii) Cluster sampling

Non-probability sampling methods

1. Judgment sampling:

In this method of sampling, the choice of sample items depends exclusively on the
judgment of the investigator.

For example, if a sample of ten students is to be selected from a class of 60 for analyzing
the spending habits of students, the investigator would select 10 students who, in his opinion, are
representative of the class.

2. Quota sampling:

In a quota sample, quotas are set up according to some specified characteristics, such as so
many in each of several income groups, so many in each age group, so many with certain political or
religious affiliations, and so on. Each interviewer is then told to interview a certain number of
persons, which constitutes his quota. Within the quota, the selection of sample items depends on
personal judgment.

For example, in a radio listening survey, the interviewers may be told to interview 500
people living in a certain area and that out of every 100 persons interviewed 60 are to be
housewives, 25 farmers and 15 children under the age of 15 within these quotas the interviewer
is free to select the people to be interviewed.

Convenience sampling:-

A convenience sample is obtained by selecting convenient population units. The method


of convenience sampling is also called the chunk. A chunk refers to that fraction of the
population being investigated which is selected neither by probability nor by judgment but by
convenience.

Convenience sampling is often used for making pilot studies, questions may be tested and
preliminary information may be obtained by the chunk before the final sampling design is
decided upon.

PROBABILITY SAMPLING METHODS

a. Simple (or) unrestricted random sampling:


Simple random sampling refers to that sampling technique in which each and every unit
of the population has an equal opportunity of being selected in the sample. In simple random
sampling which items get selected in the sample is just a matter of chance-personal bias of the
investigator does not influence the selection.

b. Restricted random sampling:

1. Stratified Sampling:-

This process first stratifies (groups) the population into different strata (groups) based on
a known character. Then it employs one of the following methods to get the sample.

. Proportional: Here the number of objects selected from a stratum (group) is in the same
proportion as the stratum is to the population.

. Non-proportional: Here we select the same number of elements from all the
strata (groups).

For example let us say we have to select 100 students from a college having 5000 students. We
first stratify the college into 4 classes, say, based on their year. Let us assume that

I year- 500 students

II year- 1000 students

III year – 2000 students

IV year – 1500 students.

Proportional:

Strata I(I year): This contains 10 % of the population. So select 10% of 100= 10 students
from this strata.

StrataII(II year): This contains 20 % of the population. So select 20% of 100= 20 students
from this strata.

Strata III(III year): This contains 40 % of the population. So select 40% of 100= 40
students from this strata.

Strata IV(IV year): This contains 30 % of the population. So select 30% of 100= 30
students from this strata.

Non-Proportional:

We need 100 students and we have 4 stratas. So select 100/4=25 students from each
class.
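The proportional and non-proportional allocations worked out above can be sketched in a few lines of Python (the dictionary names are illustrative):

```python
# Strata sizes: the four year-groups of the 5000-student college above.
strata = {"I": 500, "II": 1000, "III": 2000, "IV": 1500}
population = sum(strata.values())   # 5000
sample_size = 100

# Proportional: each stratum contributes in proportion to its share of the population.
proportional = {k: sample_size * n // population for k, n in strata.items()}

# Non-proportional: the same number of elements from every stratum.
equal = {k: sample_size // len(strata) for k in strata}
```

Running this reproduces the allocation in the text: 10, 20, 40, 30 students for the proportional scheme and 25 per stratum for the non-proportional one.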
2. Systematic sampling:-

This method is popularly used in those cases where a complete list of the population from
which sample is to be drawn is available. The list may be prepared in alphabetical, geographical,
numerical or some other order. The items are serially numbered. The first item is selected at
random generally by following the lottery method. Subsequent items are selected by taking
every Kth item from the list, where 'K' refers to the sampling interval or sampling ratio, (i.e.) the
ratio of the population size to the size of the sample.

Symbolically, K = N/n, where K = sampling interval

N = universe size
n = sample size

Ex:- In a class there are 96 students with roll nos. from 1 to 96. It is desired to take a sample of 10
students. Use the systematic sampling method to select the sample.

K = N/n = 96/10 = 9.6 ≈ 10

From roll nos. 1 to 96, the first student between 1 and K, (i.e.) 1 and 10, will be selected at
random, and then we go on taking every Kth student. Suppose the first student comes out to
be the 4th. The sample would then consist of the following roll nos.: 4, 14, 24, 34, 44, 54, 64, 74, 84, 94.
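A minimal sketch of this systematic selection in Python, fixing the random start at 4 as in the text; the modular wrap-around (circular systematic sampling) is a common convention in case `start + K*i` would exceed N:

```python
# Systematic sampling: 96 students, sample of 10, interval K = round(N/n).
N, n = 96, 10
K = round(N / n)   # 9.6 rounded to 10
start = 4          # chosen at random between 1 and K; 4 as in the example
# Take every Kth roll number, wrapping around past N if necessary.
sample = [(start + K * i - 1) % N + 1 for i in range(n)]
```

With start = 4 no wrap-around occurs, and the list matches the roll numbers in the text.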

3. Multi-Stage sampling (or) Cluster sampling:-

Under this method, the random selection is made of primary, intermediate and final units
from a given population or stratum. There are several stages in which the sampling process is
carried out. At first, the first stage units are sampled by some suitable method, such as simple
random sampling. Then a sample of second stage units is selected from each of the selected first
stage units, again by some suitable method which may be same as (or) different from the method
employed for the first stage units. Further stages may be added as required.

Ex:- suppose we want to take a sample of 5,000 households from the state of U.P.

At the first stage, the state may be divided into a number of districts and a few districts selected
at random.

At the second stage, each district may be subdivided into a no. of villages and a sample of
villages may be taken at random.

At the third stage, a number of households may be selected from each of the villages
selected at the second stage.

2. A sample of 10,000 students from Delhi University.

We may take colleges (primary units) as the first stage, then draw departments as the
second stage, and choose students as the third and last stage.
STATISTICAL INFERENCE

Statistical inference refers to the process of selecting and using a sample statistic to draw
inferences about a population parameter. It is concerned with using probability concepts to deal
with uncertainty in decision making.

Statistical Inference treats two different classes of problems namely hypothesis testing
and estimation.
Hypothesis Testing:-
Hypothesis testing is to test some hypothesis about the parent population from which the
sample is drawn. It must be noted that a test of hypothesis also includes a test of significance.
Estimation:-
The estimation theory deals with defining estimators for unknown population parameters
on the basis of sample study.
Parameter and Statistics:-
The statistical constants of the population, namely the mean m and variance σ2, are
usually referred to as parameters.
Statistical measures computed from sample observations alone, e.g. mean (𝑥̅ ), variance
(S2), etc., are usually referred to as statistics.
Sampling Distribution:-
If we select a definite number of independent random samples from a given population and
calculate some statistic like the mean or S.D. from each sample, we get a series of values of the
statistic; these values obtained from different samples, put in the form of a frequency distribution,
are called a "Sampling Distribution".
Ex: if we draw 100 samples from a given population and calculate their means and S.D.s, we
get a series of 100 means and S.D.s forming a sampling distribution.
Standard Error:-
The standard deviation of sampling distribution of a statistic is known as its standard
error and it is denoted by S.E.
Null and Alternative Hypothesis:-
For applying the test of significance we first set up a hypothesis – a definite statement
about the population parameter. Such a hypothesis is usually a hypothesis of no difference and
is denoted by H0, the null hypothesis.
Any hypothesis which is complementary to the null hypothesis is called an alternative
hypothesis, usually denoted by H1.
If we want to test the null hypothesis that the population has a specified mean mo (say)
(i.e) H0: m=mo, then the alternative hypothesis would be

i. H1 : m≠mo (two tailed)

ii. H1 : m>mo (one tailed or right tailed)
iii. H1 : m<mo (one tailed or left tailed)
Ex: if we want to test whether veg. and non-veg. people are equally populated in a village:
Null hypothesis H0: Veg and non-veg people are equally populated in the village.
Alternative hypothesis H1: Veg and non-veg people are not equally populated in the village.

Errors in sampling:-
The main objective in sampling theory is to draw valid inferences about the population
parameters on the basis of the sample results. In practice we decide to accept (or) reject the
lot after examining a sample from it. As such we have two types of errors.
(i) type I error and (ii) type II error
Type I Error:-
A type I error is committed by rejecting the null hypothesis when it is true. The
probability of committing a type I error is denoted by α, where
α = prob. (type I error)
= prob. (rejecting H0 | H0 is true)
Type II Error:-
A type II error is committed by accepting the null hypothesis when it is false. The
probability of committing a type II error is denoted by β,
where β = prob. (type II error)
= prob. (accepting H0 | H0 is false)

Accept H0 Reject H0
H0 is true Correct Decision Type I error
H0 is false Type II error Correct Decision

Critical Region:-
A region corresponding to a statistic in the sample space S which leads to the rejection of
H0 is called the Critical Region (or) Rejection Region.
Those regions which lead to the acceptance of H0 give us a region called the acceptance region.
Level of Significance:-
The probability α that a random value of the statistic ‘t' belongs to the critical region is
known as the level of significance. In other words, level of significance is the size of the type I
error. The level of significance usually employed in testing of hypothesis are 5% and 1%.
One tailed and two tailed tests:-

One tailed test:-


A test of any statistical hypothesis where the alternative hypothesis is one tailed (right
tailed (or) left tailed) is called a one tailed test.
Thus in a one tailed test, the rejection region will be located in only one tail, which tail
depends upon the alternative hypothesis formulated.
We test the null hypothesis
H0 : m=m0 against the alternative hypothesis
H1 : m>m0 (right tailed) or
H1 : m<m0 (left tailed); this is called a one tailed test.
Two tailed test:-
In a two tailed test the rejection region is located in both the tails.
In a test of statistical hypothesis where the alternative hypothesis is two tailed, we assume
the null hypothesis
H0 : m=m0 against
H1 : m≠mo [m>m0 (or) m<m0]
Ex:- Consider two brands of bulbs, one manufactured by a routine process (mean m1)
and the other by a new technique (mean m2).
If we want to test whether the bulbs differ significantly, then the hypothesis is
H0 : m1=m2 and the alternative hypothesis will be
H1 : m1≠m2; this gives us a two tailed test.
Suppose we want to test if the bulbs produced by the new process (m2) have a higher average
life than those produced by the standard process (m1); then we have
H0 : m1=m2 and
H1 : m1<m2; this gives us a left tailed test.
If we want to test whether the products of the routine process (m1) have a higher average life
than those produced by the new process (m2), then we have
H0 : m1=m2 and
H1 : m1>m2; this gives us a right tailed test.

Procedure for testing of hypothesis :-


i. Set up the null hypothesis.
ii. Choose the appropriate level of significance (either 5% or 1% level)
iii. Compute the test statistic
iv. We compare the calculated value and tabulated value.
If C.V. < T.V., H0 is accepted at the 5% or 1% level;
if C.V. > T.V., H0 is rejected at the 5% or 1% level.

Test of Significance
Let us now discuss the various situations where we have to apply different tests of
significance. For the sake of convenience and clarity these situations may be summed up under
the following 3 heads:
1. test of significance for attributes
2. test of significance for variables (large samples)
3. test of significance for variables (small samples)

Test of significance for attributes:-


In case of attributes we can only find out the presence (or) absence of a particular
characteristic. For example, in the study of attribute ‘Literacy’ a sample may be taken and people
classified as literates and illiterates.
The various tests of significance for attributes are discussed under the following heads:
i. Tests for No. of successes
ii. Test for proportion of successes
iii. Test for difference between proportions

Test for number of successes:
i. Set up the hypotheses H0, H1
ii. S.E. of number of successes = √(npq)
where n = sample size, p = prob. of success, q = 1 − p
iii. |Z| = difference / S.E.
iv. table value
v. result

Test for proportion of successes:
i. Set up the hypotheses H0, H1
ii. S.E. = √(PQ/n)
iii. |Z| = difference / S.E. = (p − P) / √(PQ/n)
where p = sample proportion, P = population proportion, n = sample size, Q = 1 − P
iv. table value
v. result

Test for difference between proportions:
i. Set up the hypotheses H0, H1
ii. S.E. = √( PQ (1/n1 + 1/n2) )
iii. |Z| = difference / S.E. = (P1 − P2) / √( PQ (1/n1 + 1/n2) )
where P = (n1 P1 + n2 P2)/(n1 + n2), Q = 1 − P,
P1 = sample proportion of the 1st sample, P2 = sample proportion of the 2nd sample
iv. table value
v. result

5% table value is 1.96
1% table value is 2.58

LARGE SAMPLES
If the size of the sample n > 30, then the sample is called a large sample.
There are 3 important tests of significance for large samples.
i. test of significance for single mean
ii. test of significance for difference of mean
iii. test of significance for difference of S.D
Test of significance for single mean:
i. Hypothesis setting
ii. Test statistic z = (𝑥̅ − m)/(σ/√n) (or) z = (𝑥̅ − m)/(s/√n)
where 𝑥̅ = sample mean, m = population mean, σ = population S.D.,
s = sample S.D., n = sample size
iii. table value
iv. result

Test of significance for difference between two means:
i. Hypothesis setting
ii. z = (𝑥̅ 1 − 𝑥̅ 2) / √( s1²/n1 + s2²/n2 ) (or) z = (𝑥̅ 1 − 𝑥̅ 2) / √( σ1²/n1 + σ2²/n2 )
iii. table value
iv. result

Test of significance for difference between two S.D.s:
i. Hypothesis setting
ii. z = (S1 − S2) / √( S1²/2n1 + S2²/2n2 ) (or) z = (σ1 − σ2) / √( σ1²/2n1 + σ2²/2n2 )
iii. table value
iv. result

5% table value is 1.96
1% table value is 2.58

SMALL SAMPLES

Defn:
When the size of the sample (n) is less than 30, then the sample is called a small sample.
The following are some important tests for small samples.
i. Student's t-test
ii. F-test
iii. χ²-test
Degrees of Freedom:-
Degrees of freedom is the no. of independent observations in a set.
By degrees of freedom we mean the no. of classes in which the values can be assigned
arbitrarily (or) at will without violating the restrictions (or) limitations placed.
Degrees of freedom = no. of groups – no. of constraints
Student’s t-Distribution:-
Defn:- The t-distribution is commonly called student’s t-distribution (or) simply student’s
distribution.
Single mean:
i. Hypothesis setting
ii. t = (𝑥̅ − m)/(S/√n)
iii. table value (d.o.f. = n − 1)
iv. result

Difference between two means (independent samples):
i. Hypothesis setting
ii. t = (𝑥̅ 1 − 𝑥̅ 2) / ( S √(1/n1 + 1/n2) )
iii. table value (d.o.f. = n1 + n2 − 2)
iv. result

Difference between two means (dependent samples, or paired t-test):
i. Hypothesis setting
ii. t = đ √n / S, where đ = Ʃd/n and S = √( Ʃ(d − đ)² / (n − 1) )
iii. table value (d.o.f. = n − 1)
iv. result

Testing an observed correlation coefficient:
i. Hypothesis setting
ii. t = r √(n − 2) / √(1 − r²)
iii. table value (d.o.f. = n − 2)
iv. result

The t-distribution is used when the sample size is 30 (or) less and the population standard
deviation is unknown.
The t-statistic is defined as:
t = (𝑥̅ − m)/(S/√n), where S = √( Ʃ(x − 𝑥̅ )² / (n − 1) )

CHI – SQUARE TEST


Defn:-
The χ²-test is one of the simplest and most widely used non-parametric tests in statistical
work. The quantity χ² describes the magnitude of the discrepancy between theory and
observation. It is defined as:

𝝌2 = ∑ (O − E)² / E

Where O – observed frequency

E – expected frequency
𝝌2-test as a test of goodness of fit:-
The 𝝌2-test is very popularly known as the test of goodness of fit for the reason that it enables us
to ascertain how appropriately theoretical distributions such as the binomial, Poisson, normal etc.
fit empirical distributions.
Karl Pearson developed a test for testing the significance of the discrepancy between
experimental values and the theoretical values obtained under some theory (or) hypothesis. This
test is known as the χ²-test of goodness of fit and uses the statistic 𝝌2 = ∑ (O − E)²/E,
where O = observed frequency and E = expected frequency.
Note:
If the data is given in a series of 'n' numbers then d.o.f. = n − 1.
In the case of the binomial distribution, d.o.f. = n − 1.
In the case of the Poisson distribution, d.o.f. = n − 2.
In the case of the normal distribution, d.o.f. = n − 3.
Chi-square test for independence of attributes:-
Defn: Literally, an attribute mean a quality (or) characteristic.
Ex: Drinking, smoking, honesty etc.
An attribute may be marked by its presence (or) absence in a member of a given population.
2x2 contingency table:-
Let us consider two attributes A and B, each divided into two classes. The various cell
frequencies can be expressed in the following table, known as a 2x2 contingency table:

          B       not-B    Total
A         a         b       a+b
not-A     c         d       c+d
Total    a+c       b+d       N

Expected frequency = (row total × column total) / N

The expected frequencies are given by:

(a+c)(a+b)/N     (b+d)(a+b)/N
(a+c)(c+d)/N     (b+d)(c+d)/N

Null Hypothesis H0: The attributes are independent.

Alternative Hypothesis H1: The attributes are not independent.
D.o.f. = (c − 1)(r − 1)
where c = no. of columns
r = no. of rows

Sampling distribution
 The distribution of a statistic (measure) obtained from repeated random sampling from a
population is a sampling distribution.
 If we draw 10 random samples from a population and calculate the mean for each
sample, then the sequence of the 10 means is a sampling distribution of means.
Likewise for other measures.
 Consider tossing a die. The mean of a single throw is (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5.
When we increase the sample size to 2, 3, …, the sampling distribution of the mean moves
towards normality.
 Student's t-distribution (or simply the t-distribution) is any member of a family of
continuous probability distributions that arises when estimating the mean of a normally
distributed population in situations where the sample size is small and the
population standard deviation is unknown: t = (X̄ − m)/(s/√n).
 Definition. If Z ~ N(0, 1) and U ~ χ2(r) are independent, then the random variable
T = Z/√(U/r) follows a t-distribution with r degrees of freedom. We write T ~ t(r).

 The t distribution has the following properties:


 The mean of the distribution is equal to 0 .
 The variance is equal to v / ( v - 2 ), where v is the degrees of freedom (see last section)
and v >2.
 The variance is always greater than 1, although it is close to 1 when there are many
degrees of freedom. With infinite degrees of freedom, the t distribution is the same as
the standard normal distribution.
 The following are the important Applications of the t-distribution:
 Test of the Hypothesis of the population mean.
 Test of Hypothesis of the difference between the two means.
 Test of Hypothesis of the difference between two means with dependent samples.
 Test of Hypothesis about the coefficient of correlation.

t Test:
 The t test is based on the assumption that we are comparing means.
 The test statistic is defined by

t = |X̄ − m| / (S/√n), where S = √( Ʃ(X − X̄)² / (n − 1) )

 Example:
 A manufacturer of a kind of bulb claims that his bulbs have a mean life of 25 months
with a standard deviation of 5 months. A random sample of 6 bulbs gave the following
lifetimes. Is the claim valid?
 24, 26, 30, 20, 20, 18

 Step 1: Ho: There is no significant difference between the sample mean and the
population mean.
 Step 2: Dof = n-1 = 6-1 = 5; LOS = 5% =0.05.

 Calculated value t = 1.084


 Tabulated value t5,0.05 = 2.015
 Since CV < TV, accept Ho.
 Therefore, There is no significant difference between the sample mean and the
population mean
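The arithmetic of this example can be verified with a short Python sketch (variable names are illustrative):

```python
from math import sqrt

# One-sample t test for the bulb example: claimed mean life m = 25 months,
# population S.D. unknown, sample of 6 lifetimes.
data = [24, 26, 30, 20, 20, 18]
n = len(data)
m = 25
xbar = sum(data) / n                                     # 23.0
S = sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))   # ~4.52
t = abs(xbar - m) / (S / sqrt(n))                        # ~1.08, below t(5, 0.05) = 2.015
```

Since the calculated t is below the tabulated 2.015, the claim stands, matching the conclusion above.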
The Student’s t-test compares the averages and standard deviations of two samples to see if
there is a significant difference between them.
Difference between means

Paired t test:
 The t statistic for the paired t test is

t = đ / (Sd / √n)

where d = X1 − X2, đ is the average of the deviations, and Sd is the standard deviation of
the deviations.

Two sample t test-example:


 6 subjects were given a drug (treatment group) and an additional 6 subjects a placebo
(control group). Their reaction time to a stimulus was measured (in ms). We want to
perform a two-sample t-test for comparing the means of the treatment and control
groups.
 Control = (91, 87, 99, 77, 88, 91)
 Treat = (101, 110, 103, 93, 99, 104)
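One common way to carry out this comparison is the pooled-variance two-sample t test; the following Python sketch (helper names are illustrative) applies it to the data above:

```python
from math import sqrt

control = [91, 87, 99, 77, 88, 91]
treat = [101, 110, 103, 93, 99, 104]

def mean(xs):
    return sum(xs) / len(xs)

def ss(xs):
    # sum of squared deviations about the sample mean
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs)

n1, n2 = len(control), len(treat)
s2 = (ss(control) + ss(treat)) / (n1 + n2 - 2)        # pooled variance
t = (mean(treat) - mean(control)) / sqrt(s2 * (1 / n1 + 1 / n2))
# |t| ~ 3.45 exceeds t(10, 0.05) = 2.228, so the reaction times differ significantly.
```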

Chi square test Goodness of fit:


 The number of mistakes in a page for a sample of 6 random pages are 5, 8, 6, 7, 9, 7. Are
they uniformly distributed?
 Solution:
 Step 1: Ho: The mistakes are uniformly distributed.
 Ho: There is no significant difference between the observed frequency and the expected
frequency.
 Step 2:
 Dof = n-1 = 6-1 = 5; LOS = 5% = 0.05.
 Step 3: Expected frequency under Ho: E = 42/6 = 7 mistakes per page.
 Step 4: χ2 = Σ(O − E)2/E = (4 + 1 + 1 + 0 + 4 + 0)/7 = 10/7 ≈ 1.43.
 Step 5: Since the calculated value of χ2 = 1.43 is less than the tabulated value χ2 = 11.07,
accept H0.
Therefore, the mistakes are uniformly distributed.
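The χ² arithmetic for this example can be re-checked in a couple of lines (note the exact value of Σ(O−E)²/E here is 10/7 ≈ 1.43):

```python
# Goodness-of-fit check: are the mistakes uniform across the 6 pages?
observed = [5, 8, 6, 7, 9, 7]
E = sum(observed) / len(observed)                 # 42/6 = 7 expected per page
chi2 = sum((o - E) ** 2 / E for o in observed)    # 10/7 ~ 1.43
# 1.43 < 11.07 (chi-square table value, 5 d.o.f., 5% level), so H0 is accepted.
```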

Chi square Independence of Attributes:


 Chi-square test for independence of attributes:-
 Defn: Literally, an attribute means a quality (or) characteristic.
 Ex: Sincere, honesty etc.
 An attribute may be marked by its presence (or) absence in a member of a given
population.
 When two characters A and B are considered, we will have a 2 x 2 contingency table of
observed frequencies.
 Calculated value chi-sqr = 38.39
 Tabulated value chi-sqr = 3.84
 Since CV > TV, reject Ho.
 Therefore, quinine is effective.

Tests for proportions:

Z test:
 Is the proportion of babies born male different from .50? In a sample of 200 babies, 96
were male.
 H0: p = .50
Ha: p ≠ .50
 p̂ = 96/200 = .48, p0 = .50, n = 200
 z = (p̂ − p0) / √( p0(1 − p0)/n ) = (.48 − .50) / √(.25/200) ≈ −0.566

 P(z < −0.566) = .2843
 Because this is a two-tailed test we must take into account both the left and right tails.
To do so, we multiply the value above by two (p = .2843 + .2843 = .5686).
 Our p-value is .5686.
 Since our p-value [.5686] is > 0.05, accept
the null hypothesis.
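A minimal Python sketch of this one-proportion z test, using `math.erf` for the normal CDF (variable names are illustrative; the table lookup in the text rounds z to −0.566 first, so its p-value differs slightly from the exact one):

```python
from math import sqrt, erf

# One-proportion z test: 96 males in 200 births vs. p0 = 0.50.
p_hat, p0, n = 96 / 200, 0.50, 200
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # ~ -0.566

def phi(x):
    # standard normal CDF via the error function
    return 0.5 * (1 + erf(x / sqrt(2)))

p_value = 2 * phi(-abs(z))   # two-tailed p-value, ~0.57 > 0.05: do not reject H0
```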

Tests for two proportions:


 For a right- or left-tailed test, a minimum of 10 successes and 10 failures in each group
are necessary (i.e., np≥10 and n(1−p)≥10). Two-tailed tests are more robust and require
only a minimum of 5 successes and 5 failures in each group.

 Pooled sample proportion. Since the null hypothesis states that P1=P2, we use a pooled
sample proportion (p) to compute the standard error of the sampling distribution.
p = (p1 * n1 + p2 * n2) / (n1 + n2)
 where p1 is the sample proportion from population 1, p2 is the sample proportion from
population 2, n1 is the size of sample 1, and n2 is the size of sample 2.
 Standard error. Compute the standard error (SE) of the sampling distribution of the
difference between two proportions.
 SE = sqrt{ p * ( 1 - p ) * [ (1/n1) + (1/n2) ] }
 where p is the pooled sample proportion, n1 is the size of sample 1, and n2 is the size of
sample 2.
 Test statistic. The test statistic is a z-score (z) defined by the following equation.
 z = (p1 - p2) / SE
 where p1 is the proportion from sample 1, p2 is the proportion from sample 2, and SE is
the standard error of the sampling distribution.
 P-value. The P-value is the probability of observing a sample statistic as extreme as the
test statistic. Since the test statistic is a z-score, use the Normal Distribution table to
assess the probability associated with the z-score.
Tests for two proportions :Example

 Suppose the Acme Drug Company develops a new drug, designed to prevent colds. The
company states that the drug is equally effective for men and women. To test this claim,
they choose a simple random sample of 100 women and 200 men from a population of
100,000 volunteers.
 At the end of the study, 38% of the women caught a cold; and 51% of the men caught a
cold. Based on these findings, can we reject the company's claim that the drug is equally
effective for men and women? Use a 0.05 level of significance.

 State the hypotheses. The first step is to state the null hypothesis and an alternative
hypothesis.
 Null hypothesis: P1 = P2
Alternative hypothesis: P1 ≠ P2
 Note that these hypotheses constitute a two-tailed test. The null hypothesis will be
rejected if the proportion from population 1 is too big or if it is too small.

 Using sample data, we calculate the pooled sample proportion (p) and the standard
error (SE). Using those measures, we compute the z-score test statistic (z).
 p = (p1 * n1 + p2 * n2) / (n1 + n2) = [(0.38 * 100) + (0.51 * 200)] / (100 + 200) = 140/300 =
0.467

SE = sqrt{ p * ( 1 - p ) * [ (1/n1) + (1/n2) ] }


SE = sqrt [ 0.467 * 0.533 * ( 1/100 + 1/200 ) ] = sqrt [0.003733] = 0.061

z = (p1 - p2) / SE = (0.38 - 0.51)/0.061 = -2.13


 where p1 is the sample proportion in sample 1, where p2 is the sample proportion in
sample 2, n1 is the size of sample 1, and n2 is the size of sample 2.

 Since we have a two-tailed test, the P-value is the probability that the z-score is less
than -2.13 or greater than 2.13.
 We use the Normal Distribution table to find P(z < -2.13) = 0.017, and P(z > 2.13)
= 0.017. Thus, the P-value = 0.017 + 0.017 = 0.034.
 Since the P-value (0.034) is less than the significance level (0.05), reject the null
hypothesis.
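The Acme Drug computation above can be re-checked with a short Python sketch of the pooled two-proportion z test (`phi` is an illustrative helper for the normal CDF):

```python
from math import sqrt, erf

# Two-proportion z test: 38% of 100 women vs. 51% of 200 men caught a cold.
p1, n1 = 0.38, 100
p2, n2 = 0.51, 200
p = (p1 * n1 + p2 * n2) / (n1 + n2)            # pooled proportion = 140/300
se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))     # standard error
z = (p1 - p2) / se                             # ~ -2.13

def phi(x):
    # standard normal CDF via the error function
    return 0.5 * (1 + erf(x / sqrt(2)))

p_value = 2 * phi(-abs(z))   # ~0.033 < 0.05, so the null hypothesis is rejected
```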
Estimation:
 In studying a population, estimation is about estimating the parameter
values of the population using the statistic values of a sample.
 There are two types of Estimations
 Point Estimation and Interval Estimation.
 Point Estimates try to get a single value for the parameter whereas the Interval
estimation gets a Confidence Interval.
 Methods: Least squares and Maximum likelihood.

 Properties of Estimators:
 Unbiasedness: Expected value of estimator equals parameter.
 Consistency: Estimator approaches parameter as n gets larger.
 Efficiency: Between two estimators, the one with smaller variance is more efficient.
 Sufficiency: Conveys as much information as is possible about parameter from the
sample

Test for attributes:


 Based on attributes(qualities) we may have three types of tests
 Test for number of successes
 Test for proportion of successes
 Test for difference between proportions
Test for attributes Number of successes:

 Procedure:
 Step1: Form the Hypothesis
 Step2: Find the standard error
 Step3: Find Difference/S.E
 Step4:
 If step3 value <1.96, Accept H0, else reject H0 at 5% LOS.
 If step3 value <2.58, Accept H0, else reject H0 at 1% LOS.

Test for attributes


Number of successes-Example:
 A coin was tossed 400 times and head appeared 216 times. Test the hypothesis that the
coin is unbiased.
 Solution:
 Step1: Coin is unbiased.
 Step2: SE = sqrt(npq) [as we are in the Binomial distribution domain]
 = sqrt(400 × 1/2 × 1/2) = 10
 Step3: Diff/S.E = (216 − 200)/10 = 1.6
 Step 4: As Diff/S.E is < 1.96, accept H0 at 5% LOS.
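The coin example can be verified directly (variable names are illustrative):

```python
from math import sqrt

# Number-of-successes test: 216 heads in 400 tosses of a supposedly fair coin.
n, p, q = 400, 0.5, 0.5
observed, expected = 216, n * p      # 216 heads vs. 200 expected
se = sqrt(n * p * q)                 # 10
z = (observed - expected) / se       # 1.6, below 1.96: accept H0 at 5% LOS
```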

Test for attributes


Proportion of successes:
 Two things are added as compared to the number-of-successes test.
 S.E = √(pq/n)
 p is the proportion of successes, n is the sample size
 Confidence Limit / Confidence Interval for the mean:
 CI: X̄ ± 1.96 S.E [95% confidence interval]
     X̄ ± 2.58 S.E [99% confidence interval]

Test for attributes


Proportion of successes - Example:
 A manufacturer claims only 4% of his apples are defective. From a random sample of
600, 36 are defective. Is his claim true?
 Solution:
 Step1: 4% of apples are defective (so the proportion of good apples is 0.96).
 Step2: S.E = √(pq/n) = √(0.96 × 0.04 / 600) = 0.008
 Step3: 95% CI is X̄ ± 1.96 S.E = 0.96 ± 1.96 × 0.008
 = 0.9443 to 0.9757
 Step 4: With n = 600, the boundary [for good ones] is
 [0.9443 × 600, 0.9757 × 600] = [567, 585]
 Since the number of defectives must then lie in [15, 33] and 36 is outside, reject H0.
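The confidence-interval arithmetic above can be re-checked with a brief Python sketch (variable names are illustrative):

```python
from math import sqrt

# Claimed: 4% defective apples, i.e. proportion of good apples P = 0.96.
P, Q, n = 0.96, 0.04, 600
se = sqrt(P * Q / n)                           # 0.008
lower, upper = P - 1.96 * se, P + 1.96 * se    # 95% CI for the good-apple proportion
good = 600 - 36                                # 564 good apples observed
p_hat = good / n                               # 0.94
inside = lower <= p_hat <= upper               # False: 0.94 < 0.9443, so reject the claim
```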

Test for attributes


Difference of Proportion of successes:

 The pooled proportion is p = (n1 p1 + n2 p2)/(n1 + n2) or p = (x1 + x2)/(n1 + n2)
 where x1, x2 are the numbers of occurrences in two samples of sizes n1 and n2
 The standard error for the difference of proportions is
 S.E(p1 − p2) = √( pq (1/n1 + 1/n2) )
 If (p1 − p2)/S.E < 1.96, accept at 5% LOS
 If (p1 − p2)/S.E < 2.58, accept at 1% LOS

Test for attributes


Difference of Proportion of successes - Example:

 In a random sample of 1000 persons from a town A, 400 are consumers of wheat,
whereas in another town B a random sample of size 800 had 400 wheat consumers. Is
there a significant difference in wheat consumption between the towns?
 Solution:
 p = (n1 p1 + n2 p2)/(n1 + n2)
 = (1000 × 400/1000 + 800 × 400/800)/(1000 + 800) = 800/1800 = 4/9
 S.E(p1 − p2) = √( pq (1/n1 + 1/n2) ) = √( (4/9)(5/9)(1/1000 + 1/800) ) = 0.024
 |p1 − p2|/S.E = |0.4 − 0.5|/0.024 = 4.17 is > 2.58,
 hence reject H0 at 1% LOS.
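The wheat example can be re-checked with a Python sketch; note the text rounds S.E. to 0.024 (giving 4.17), while the unrounded z is about 4.24 — the conclusion is the same:

```python
from math import sqrt

# Two-proportion z test: 400/1000 wheat consumers in town A vs. 400/800 in town B.
n1, x1 = 1000, 400
n2, x2 = 800, 400
p1, p2 = x1 / n1, x2 / n2                     # 0.4 and 0.5
p = (x1 + x2) / (n1 + n2)                     # pooled proportion = 800/1800 = 4/9
se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))    # ~0.0236
z = abs(p1 - p2) / se                         # ~4.24 > 2.58: reject H0 at 1% LOS
```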
Correlation: Regression

Correlation:
 Measures the degree of linear association between two interval-scaled variables; the
analysis of the relationship between two quantitative outcomes, e.g., height and weight.

Correlation
Karl Pearson’s correlation coefficient:

 Karl Pearson’s r = (N ΣXY − ΣX ΣY) / ( √(N ΣX² − (ΣX)²) √(N ΣY² − (ΣY)²) )

 r lies between -1 and 1.
 Values near 0 mean no (linear)
correlation and values near ± 1 mean very strong correlation.
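A minimal sketch of Karl Pearson's formula on a small made-up data set (the X, Y values are purely illustrative):

```python
from math import sqrt

# Illustrative paired observations.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
N = len(X)
sx, sy = sum(X), sum(Y)
sxy = sum(x * y for x, y in zip(X, Y))
sx2, sy2 = sum(x * x for x in X), sum(y * y for y in Y)
# r = (N ΣXY − ΣX ΣY) / (sqrt(N ΣX² − (ΣX)²) sqrt(N ΣY² − (ΣY)²))
r = (N * sxy - sx * sy) / (sqrt(N * sx2 - sx ** 2) * sqrt(N * sy2 - sy ** 2))
```

For this data the moderately strong positive value (r ≈ 0.77) matches the visible upward trend in the pairs.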

Correlation
Spearman’s correlation coefficient(ranks)
 Spearman's rank correlation coefficient or Spearman's rho, is a measure of statistical
dependence between two variables based on ranks or relative values
Regression:
 Regression follows correlation in identifying the causal relationship between the two
correlated variables.
 The dependence of dependent variable Y on the independent variable X.
 Relationship is summarized by a regression equation.
 y = a + bx
 a=intercept at y axis
 b=regression coefficient
 The line of regression is the line which gives the best estimate of the value of one
variable for any specific value of the other variable. Thus the line of regression is the line
of “best fit” and is obtained by the principle of least squares.
 This principle consists in minimizing the sum of the squares of the deviations of the
actual values of y from their estimated values given by the line of best fit.

 Procedure:
 Step 1: Write the normal equations for the regression line y = mx + c as

   Σy = m Σx + n c
   Σxy = m Σx² + c Σx

 Step 2: Form the regression table to get the values.
 Step 3: Substitute the values in the normal equations and solve them to find 'm' and 'c' to fit the line.
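The steps above can be sketched in Python by solving the two normal equations directly; the data here are hypothetical, chosen so the true line y = 2x + 1 is known and should be recovered exactly:

```python
# Fit y = mx + c by least squares using the normal equations:
#   Σy  = m Σx  + n c
#   Σxy = m Σx² + c Σx
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]   # exactly y = 2x + 1

n = len(x)
sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))
sx2 = sum(a * a for a in x)

# Solve the 2x2 linear system for (m, c) by Cramer's rule
det = sx2 * n - sx * sx
m = (sxy * n - sy * sx) / det
c = (sx2 * sy - sx * sxy) / det
print(m, c)  # 2.0 1.0
```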
Test for correlation:
 The test for correlation is a t-test with

   t = r √(n − 2) / √(1 − r²)

 where r is the correlation and DOF = n − 2.
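As a quick sketch (hypothetical values r = 0.775 from n = 5 pairs):

```python
import math

r, n = 0.775, 5  # assumed correlation and sample size
t = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)
print(round(t, 2))  # 2.12 -> compare against the t-table with n-2 = 3 DOF
```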
Correlation - Other Methods
Covariance-based / Concurrent deviation

Concurrent Deviation Method:

   rc = ±√( ±(2C − m)/m )

 (the same sign is taken inside and outside the radical)
 C is the number of positive signs in the dx·dy column;
 m is the number of pairs of observations.
Concurrent Deviation Method - Example:
 With C = 2 and m = 7, 2C − m = 4 − 7 = −3 < 0, so

   rc = −√( (7 − 4)/7 ) = −√(3/7) = −0.65
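A minimal Python sketch of the same computation, taking the same sign inside and outside the radical:

```python
import math

C, m = 2, 7   # C = number of positive signs in the dx*dy column, m = number of pairs
inner = (2 * C - m) / m

# Same sign inside and outside the square root
rc = math.copysign(math.sqrt(abs(inner)), inner)
print(round(rc, 2))  # -0.65
```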


Others:

Chebyshev's Theorem, Poisson Processes, Simulation, Central Limit Theorem

Poisson Process:

A Poisson process is a counting process with independent and stationary increments.

A stochastic process {N(t), t ≥ 0} is a counting process if N(t) represents the total number of events that have occurred in [0, t].

Then {N(t), t ≥ 0} must satisfy:

 N(t) ≥ 0; N(t) is an integer for all t

 If s < t, then N(s) ≤ N(t)

 For s < t, N(t) − N(s) is the number of events that occur in the interval (s, t].

 A counting process has independent increments if, for any 0 ≤ s < t ≤ u < v, N(t) − N(s) is independent of N(v) − N(u).
That is, the numbers of events that occur in non-overlapping intervals are independent random variables.

▪ A counting process has stationary increments if, for any s < t, the distribution of N(t) − N(s) depends only on the length of the time interval, t − s.

A counting process {N(t), t ≥ 0} is a Poisson process with rate λ, λ > 0, if:

 N(0) = 0

 The process has independent increments

 The number of events in any interval of length t follows a Poisson distribution with mean λt (therefore, it has stationary increments), i.e.,

   P{ N(t + s) − N(s) = n } = e^(−λt) (λt)^n / n!,   n = 0, 1, ...
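The increment probability above can be evaluated directly; a small sketch assuming, for illustration, a rate of λ = 2 events per hour over a 3-hour interval:

```python
import math

lam, t = 2.0, 3.0      # assumed rate and interval length
mean = lam * t         # Poisson mean = λt

def poisson_pmf(n, mean):
    # P{N(t+s) - N(s) = n} = e^(-λt) (λt)^n / n!
    return math.exp(-mean) * mean**n / math.factorial(n)

probs = [poisson_pmf(n, mean) for n in range(30)]
print(round(poisson_pmf(6, mean), 4))   # 0.1606, probability of exactly 6 events
print(round(sum(probs), 4))             # 1.0 -> the pmf sums to one
```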

Chebyshev’s theorem:

 Chebyshev’s theorem will show you how to use the mean and the standard deviation to
find the percentage of the total observations that fall within a given interval about the
mean.

 The fraction of any set of numbers lying within k standard deviations of the mean of those numbers is at least 1 − [1/k²], where

   k = (the "within number") / (the standard deviation)

 and k must be greater than 1.


Example:

 Use Chebyshev's theorem to find what percent of the values will fall between 123 and
179 for a data set with mean of 151 and standard deviation of 14.

 Solution:

 We subtract 151-123 and get 28, which tells us that 123 is 28 units below the mean.

 We subtract 179-151 and also get 28, which tells us that 179 is 28 units above the mean.

 Those two together tell us that the values between 123 and 179 are all within 28 units
of the mean. Therefore the "within number" is 28.


 So we find the number of standard deviations, k, which the "within number", 28,
amounts to by dividing it by the standard deviation:

 k=the within number/the standard deviation =28/14=2




So now we know that the values between 123 and 179 are all within 28 units of the
mean, which is the same as within k=2 standard deviations of the mean. Now, since k >
1 we can use Chebyshev's formula to find the fraction of the data that are within k=2
standard deviations of the mean. Substituting k=2 we have:

 1−[1/k2]=1−[1/22]=1−1/4=3/4

 So 3/4 of the data (75%) lie between 123 and 179.

 Suppose µ = 39 and σ = 5. Find the percentage of values that lie between 29 and 49.

29 ——————– 39 ——————– 49

 We just need to find k and there are 2 ways to do so.

 First, we notice the distance between the mean (39) and 49 is 10, and that distance is kσ. Therefore kσ = 10 with σ = 5 implies k = 2.

For k = 2, Chebyshev's theorem guarantees that at least 1 − 1/2² = 75% of the values lie in this interval.
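Both examples follow the same recipe, which can be wrapped in a small helper; this sketch assumes the interval is symmetric about the mean, as in the examples above:

```python
def chebyshev_bound(lower, upper, mean, sd):
    """Fraction of values guaranteed within [lower, upper] by Chebyshev's theorem.

    Assumes the interval is symmetric about the mean and that k > 1.
    """
    within = mean - lower                 # the "within number"
    assert upper - mean == within, "interval must be symmetric about the mean"
    k = within / sd
    assert k > 1, "Chebyshev's theorem requires k > 1"
    return 1 - 1 / k**2

print(chebyshev_bound(123, 179, 151, 14))  # 0.75
print(chebyshev_bound(29, 49, 39, 5))      # 0.75
```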

Discrete Simulation:

Simulation is basically about mimicking a system in order to understand it.

We observe the system through a set of variables called state variables; when the state variables change their values only at a countable number of points in time, it is called discrete simulation.

Most business processes can be described as a sequence of separate, discrete, events. For
example, a truck arrives at a warehouse, goes to an unloading gate, unloads, and then departs.
To simulate this, discrete event modeling is often chosen.

Steps in Simulation:
 Key Aspects:

 Entities − These are the representation of real elements like the parts of machines.

 Relationships − It means to link entities together.

 Simulation Executive − It is responsible for controlling the advance time and executing
discrete events.

 Random Number Generator − It helps to simulate different data coming into the
simulation model.

 Results & Statistics − It validates the model and provides its performance measures.

Discrete Simulation-Queuing System:

 A queue is the combination of all entities in the system being served and those waiting
for their turn.
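As a toy illustration (not from the slides), a single-server FIFO queue can be simulated event by event: each customer starts service at the later of its arrival time and the previous customer's service-completion time.

```python
def simulate_fifo_queue(arrivals, service_times):
    """Single-server FIFO queue: return each customer's waiting time."""
    waits = []
    server_free_at = 0.0
    for arrive, service in zip(arrivals, service_times):
        start = max(arrive, server_free_at)   # wait if the server is still busy
        waits.append(start - arrive)
        server_free_at = start + service      # next discrete event: service completion
    return waits

# Hypothetical arrival and service times (minutes)
arrivals = [0.0, 2.0, 4.0, 6.0, 8.0]
services = [3.0, 3.0, 3.0, 3.0, 3.0]
waits = simulate_fifo_queue(arrivals, services)
print(waits)                    # [0.0, 1.0, 2.0, 3.0, 4.0]
print(sum(waits) / len(waits))  # average wait = 2.0
```

Since arrivals come every 2 minutes but service takes 3, the queue builds up and each successive customer waits one minute longer.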
Central Limit Theorem:

 Given a sequence of random variables with a mean and a variance, the CLT says that the sample average has a distribution which is approximately Normal, and gives the new mean and variance:

   X1, X2, ..., Xn i.i.d. with mean = μ, SD = σ  implies  X̄n ≈ N(μ, σ²/n)

 Notice that nothing at all need be assumed about the P, CDF, or PDF associated with X,
which could have any distribution from which a mean and variance can be derived.

Example:

 The lifetime of a certain type of bulb for each plant of a company is a random variable
with mean 1200 hrs and standard deviation 250 hrs. Using central limit theorem, find
the probability that the average lifetime of 60 bulbs of the company exceeds 1250 hrs.

 Let Xi = lifetime of a bulb from plant i.

 Then X̄ is the mean of the lifetimes of the 60 bulbs. By CLT, X̄ ≈ N(1200, 250²/60), so

   P[ X̄ > 1250 ] = P[ (X̄ − 1200)/(250/√60) > (1250 − 1200)/(250/√60) ]
                 = P[ Z > √60/5 ] = P[ Z > 1.55 ] = 0.06
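The same answer can be reproduced with the standard normal tail via the complementary error function; a sketch:

```python
import math

mu, sd, n = 1200, 250, 60
threshold = 1250

# Standardize under the CLT approximation: X̄ ~ N(mu, sd²/n)
z = (threshold - mu) / (sd / math.sqrt(n))

# Upper-tail probability of a standard normal: P[Z > z] = erfc(z/√2)/2
p = math.erfc(z / math.sqrt(2)) / 2
print(round(z, 2), round(p, 3))  # 1.55 0.061
```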

Estimation:

In studying a population, estimation is a standard route: it is about estimating the parameter values of the
population using the statistic values of a sample.

Point Estimation-Methods:

Method of Moments

Method of Maximum Likelihood Estimation

Method of Minimum Variance

Method of Least Squares

A point estimator is any function T (Y1,Y2, ..,YN ) of a sample. Any statistic/Measure is a point estimator.

Assume that Y1, Y2, ..., YN are i.i.d. N(μ, σ²) random variables. The sample mean (or average)

   ȲN = (1/N) Σ(i=1 to N) Yi

is a point estimator (or an estimator) of μ.

Point Estimation Example:

Let T be the time that is needed for a specific task in a factory to be completed. In order to estimate the
mean and variance of T, we observe a random sample T1,T2,⋯⋯,T6. Thus, Ti's are i.i.d. and have the
same distribution as T. We obtain the following values (in minutes):

18,21,17,16,24,20.

Find the values of the sample mean, the sample variance, and the sample standard deviation for the
observed sample.
   T̂ = (18 + 21 + 17 + 16 + 24 + 20)/6 = 19.33

   Ŝ² = (1/(6 − 1)) Σ(k=1 to 6) (Tk − 19.33)² = 8.67,  so  Ŝ = √8.67 ≈ 2.94
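The same numbers fall out of Python's statistics module, which uses the n − 1 divisor for the sample variance:

```python
import statistics

times = [18, 21, 17, 16, 24, 20]   # observed task times (minutes)

mean = statistics.mean(times)      # sample mean
var = statistics.variance(times)   # sample variance (divides by n-1)
sd = statistics.stdev(times)       # sample standard deviation
print(round(mean, 2), round(var, 2), round(sd, 2))  # 19.33 8.67 2.94
```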

Maximum Likelihood:

The likelihood of a set of data is the probability of obtaining that particular set of data, given the
probability distribution model.

This expression contains the unknown model parameters. The values of these parameters that maximize
the sample likelihood are known as the Maximum Likelihood Estimates or MLEs.

Likelihood(θ )= probability of observing the given data as a function of ‘θ ’.

The maximum likelihood estimate (mle) of θ is that value of θ that maximises likelihood(θ).

It is defined as

   L(θ) = Π(i=1 to n) f(xi | θ)

   log L(θ) = Σ(i=1 to n) log f(xi | θ)

Example:

Suppose we wish to find the maximum likelihood estimate (mle) of θ for a Binomial distribution,

   pk(k, θ) = nCk θ^k (1 − θ)^(n−k)

   log pk(k, θ) = log(nCk) + k log(θ) + (n − k) log(1 − θ)

   Setting d log pk(k, θ)/dθ = 0:  k/θ − (n − k)/(1 − θ) = 0

   k − kθ = nθ − kθ,  so  θ̂ = k/n

MLE Example:

Suppose we have observed x = 32. From what distribution did it come?

If we assume the population is Normal with mean = 28 and SD = 2, the likelihood of the observation is the Normal density at x = 32:

   L(μ = 28, σ = 2 | x = 32) = (1/(2√(2π))) exp{ −(32 − 28)²/(2 × 2²) } = 0.027

Consider a sample 0,1,0,0,1,0 from a binomial distribution, with the form P[X=0]=(1-p), P[X=1]=p. Find
the maximum likelihood estimate of p.

Soln :

L(p)=P[X=0] P[X=1] P[X=0] P[X=0] P[X=1] P[X=0]

=(1-p) p (1-p) (1-p) p (1-p)

=(1-p)3p2.

Log L(p)=log[(1-p)3p2.]=3log(1-p)+2logp

   d log L(p)/dp = 0 means −3/(1 − p) + 2/p = 0, i.e. (−3p + 2 − 2p)/(p(1 − p)) = 0, so p = 2/5.

That is, p = 2/5 is the value of p that makes this sample most likely, if we believe the population to be Bernoulli distributed.
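A quick numerical check: evaluating L(p) = (1 − p)³p² on a grid over (0, 1) shows the maximum is at p = 0.4, matching the calculus result.

```python
# Likelihood of the sample 0,1,0,0,1,0 under a Bernoulli(p) model
def likelihood(p):
    return (1 - p) ** 3 * p ** 2

# Grid search for the maximizer over p in (0, 1)
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=likelihood)
print(p_hat)  # 0.4
```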

Interval Estimation:

In brief, we can say:

Interval estimate = Point estimate ± Margin of Error

Margin of Error = Critical Value * Standard Error of the statistic

Let xi, i = 1, 2, … n be a random sample of size n from f(x,θ). If T1(x) and T2(x) be any two statistics such
that T1(x) ≤ T2(x) then,

P(T1(x) < θ < T2(x)) = 1 – α

where α is level of significance, then the random interval (T1(x), T2(x)) is called 100(1-α)% confidence
interval for θ.
Here, T1 is called lower confidence limit and T2 is called upper confidence limit. (1-α) is called the
confidence coefficient.

Confidence Interval Example:

Consider a sample 3.7, 5.3, 4.7, 3.3, 5.3, 5.1, 5.1, 4.9, 4.2, 5.7 .

The average is 4.73 and SD=0.766.

If we go for a 90% confidence interval, then α = 0.10. The confidence interval for the mean is given by

   X̄ − t(α/2, dof) · S/√n  ≤  μ  ≤  X̄ + t(α/2, dof) · S/√n

So,

   4.73 − 1.83 × 0.766/√10  ≤  μ  ≤  4.73 + 1.83 × 0.766/√10
   4.73 − 0.4433  ≤  μ  ≤  4.73 + 0.4433
   4.2867  ≤  μ  ≤  5.1733
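The interval can be reproduced as follows; since the standard library has no t quantile function, the critical value t(0.05, 9) ≈ 1.833 is taken from a t-table:

```python
import math
import statistics

data = [3.7, 5.3, 4.7, 3.3, 5.3, 5.1, 5.1, 4.9, 4.2, 5.7]
n = len(data)
xbar = statistics.mean(data)   # sample mean
s = statistics.stdev(data)     # sample SD (n-1 divisor)

t_crit = 1.833                 # t(alpha/2 = 0.05, dof = 9) from a t-table
margin = t_crit * s / math.sqrt(n)

print(round(xbar, 2), round(s, 3))                       # 4.73 0.766
print(round(xbar - margin, 2), round(xbar + margin, 2))  # 4.29 5.17
```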
