04 Statistical Methods
Probability deals with predicting the likelihood of future events.
Statistics involves the analysis of the frequency of past events.
Example: Consider a drawer containing 100 socks: 30 red, 20 blue and
50 black.
We can use probability to answer questions about the selection of a
random sample of these socks.
PQ1. What is the probability that we draw two blue socks or two red socks from
the drawer?
PQ2. What is the probability that we pull out three socks or have a matching pair?
PQ3. What is the probability that we draw five socks and they are all black?
SQ1: A random sample of 10 socks from the drawer produced one blue, four red and five
black socks. What is the total population of black, blue and red socks in the drawer?
SQ2: We randomly sample 10 socks, write down the number of black socks, and
then return the socks to the drawer. The process is repeated five times. The mean
number of black socks over these trials is 7. What is the true number of black socks in
the drawer?
etc.
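The probability questions above can be answered by counting. The following is a minimal sketch, assuming socks are drawn without replacement, that PQ1 means drawing exactly two socks, and using a hypothetical helper `prob_all_same_colour` that is not part of the original material.

```python
from math import comb

# Contents of the drawer from the example.
RED, BLUE, BLACK = 30, 20, 50
TOTAL = RED + BLUE + BLACK


def prob_all_same_colour(count_in_drawer: int, draws: int, total: int = TOTAL) -> float:
    """Probability that `draws` socks drawn without replacement
    all come from a colour group of size `count_in_drawer`."""
    return comb(count_in_drawer, draws) / comb(total, draws)


# PQ1 (assumed reading): draw two socks; both are blue or both are red.
# The two events are mutually exclusive, so their probabilities add.
p_pq1 = prob_all_same_colour(BLUE, 2) + prob_all_same_colour(RED, 2)

# PQ3: draw five socks and all of them are black.
p_pq3 = prob_all_same_colour(BLACK, 5)

print(f"PQ1: {p_pq1:.4f}")
print(f"PQ3: {p_pq3:.4f}")
```

The statistical questions (SQ1, SQ2) run in the opposite direction: they start from observed samples and ask about the unknown contents of the drawer.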
The probability that the random variable takes a given value can be computed
using the rules governing probability.
For example, X = 1 means that either the mother or the father, but not both,
has had measles; the probability of this event is 0.32. Symbolically, it is denoted as P(X = 1) = 0.32
@DSamanta, IIT Kharagpur Data Analytics (CS61061) 7
Probability Distribution
Definition 4.2: Probability distribution
A probability distribution is a specification of the probabilities of the values of
a random variable.
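Example 4.3: Suppose 0.2 is the probability that a person (aged between
20 and 40) has had childhood measles. Let X be the number of parents in a couple
who have had childhood measles. Then the probability distribution of X is given
by

X    Probability
0    0.64
1    0.32
2    0.04

In general, a probability distribution can be tabulated as

x                  x1      x2      ...    xn
f(x) = P(X = x)    f(x1)   f(x2)   ...    f(xn)

For the measles example:

x       0      1      2
f(x)    0.64   0.32   0.04

[Figure: plot of f(x) against x for the measles example.]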
Simulation studies can often eliminate the need for costly experiments
and are also often used to study problems where actual experimentation is
impossible.
Examples 4.4:
1) In a study testing the effectiveness of a new drug, the number of cured
patients among all the patients who use the drug approximately follows a
binomial distribution.
$$f(x) = \binom{n}{x} p^x (1-p)^{n-x} = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}, \quad x = 0, 1, 2, \ldots, n$$
Here, $f(x) = P(X = x)$, where $X$ denotes the number of successes in $n$ trials, each with
success probability $p$, and $X = x$ denotes that the number of successes is $x$.
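Thus, for the measles example with $n = 2$ and $p = 0.2$,
$$P(X = 0) = \frac{2!}{0!\,(2-0)!}\,(0.2)^0 (0.8)^{2-0} = 0.64$$
$$P(X = 1) = \frac{2!}{1!\,(2-1)!}\,(0.2)^1 (0.8)^{2-1} = 0.32$$
$$P(X = 2) = \frac{2!}{2!\,(2-2)!}\,(0.2)^2 (0.8)^{2-2} = 0.04$$

X    Probability
0    0.64
1    0.32
2    0.04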
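The three values above can be reproduced programmatically. The following is a small sketch of the binomial probability mass function for the measles example (n = 2, p = 0.2); the function name `binomial_pmf` is chosen here for illustration only.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Measles example: two parents, each with probability 0.2 of having had measles.
n, p = 2, 0.2
distribution = {x: binomial_pmf(x, n, p) for x in range(n + 1)}

for x, prob in distribution.items():
    print(f"P(X = {x}) = {prob:.2f}")
```

Summing the values of `distribution` gives 1 (up to floating-point rounding), as required of any probability distribution.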
15 38 68 39 49 54 19 79 38 14
If the value of a digit is 0 or 1, the outcome is “had childhood measles”; otherwise
(digits 2 to 9), the outcome is “did not”.
Each pair of digits represents a couple. For example, the first pair (i.e., 15) gives x = 1 for that couple. The
frequency distribution for this sample is

x                  0      1      2
f(x) = P(X = x)    0.7    0.3    0.0

where $\binom{n}{x_1, x_2, \ldots, x_k} = \frac{n!}{x_1!\, x_2! \cdots x_k!}$
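The tally for the digit-pair simulation above (pairs 15, 38, 68, ...) can be checked with a few lines of code. This is a minimal sketch; the helper name `parents_with_measles` is an assumption introduced here.

```python
from collections import Counter

# The ten random digit pairs from the simulation; each pair is one couple.
pairs = "15 38 68 39 49 54 19 79 38 14".split()


def parents_with_measles(pair: str) -> int:
    """Count the digits in the pair that are 0 or 1 (outcome: had childhood measles)."""
    return sum(1 for digit in pair if digit in "01")


counts = Counter(parents_with_measles(pair) for pair in pairs)
relative_frequency = {x: counts.get(x, 0) / len(pairs) for x in (0, 1, 2)}

print(relative_frequency)  # {0: 0.7, 1: 0.3, 2: 0.0}
```

With only ten couples the simulated frequencies (0.7, 0.3, 0.0) differ noticeably from the theoretical values (0.64, 0.32, 0.04); a longer run of random digits would be expected to come closer.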
Example 4.8:
Probability of observing three red cards in 5 draws from an ordinary deck of 52
playing cards.
− You draw one card, note the result and then return it to the deck.
− The deck is reshuffled well before the next draw is made.
• The hypergeometric distribution does not require independence and is based on
sampling done without replacement.
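To make the contrast in Example 4.8 concrete, the sketch below computes the probability of three red cards in five draws in both settings. It assumes the standard binomial formula for draws with replacement and the standard hypergeometric formula for draws without replacement; the constant names are illustrative.

```python
from math import comb

RED_CARDS, DECK_SIZE = 26, 52
DRAWS, TARGET_RED = 5, 3

# With replacement (card returned, deck reshuffled): independent draws, binomial.
p_red = RED_CARDS / DECK_SIZE
p_with_replacement = (
    comb(DRAWS, TARGET_RED) * p_red**TARGET_RED * (1 - p_red) ** (DRAWS - TARGET_RED)
)

# Without replacement: draws are dependent, hypergeometric.
p_without_replacement = (
    comb(RED_CARDS, TARGET_RED)
    * comb(DECK_SIZE - RED_CARDS, DRAWS - TARGET_RED)
    / comb(DECK_SIZE, DRAWS)
)

print(f"with replacement:    {p_with_replacement:.4f}")
print(f"without replacement: {p_without_replacement:.4f}")
```

The two answers are close but not equal, because removing a card changes the composition of the deck for the next draw.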
Example 4.9:
Number of customers visiting a ticket-selling
counter in a railway station.
$$f(x, \lambda t) = P(X = x) = \frac{e^{-\lambda t}\,(\lambda t)^x}{x!}, \quad x = 0, 1, \ldots$$
where $\lambda$ is the average number of outcomes per unit time, $t$ is the length of the time interval, and $e = 2.71828\ldots$
What is P(X = x) if t = 0?
Example:
The number of customers arriving at a grocery store can be modelled by a
Poisson process with intensity λ=10 customers per hour.
1. Find the probability that there are 2 customers between 10:00 and 10:20.
2. Find the probability that there are 3 customers between 10:00 and 10:20 and
7 customers between 10:20 and 11:00.
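A possible way to work the grocery store example is sketched below. It uses the Poisson formula with mean λt and, for the second question, relies on the property of a Poisson process that counts in disjoint time intervals are independent, so the two probabilities multiply. The function name `poisson_pmf` is illustrative.

```python
from math import exp, factorial


def poisson_pmf(x: int, rate: float, duration: float) -> float:
    """P(X = x) when events arrive at `rate` per unit time over an interval of length `duration`."""
    mean_count = rate * duration
    return exp(-mean_count) * mean_count**x / factorial(x)


RATE_PER_HOUR = 10  # lambda = 10 customers per hour

# 1. Two customers between 10:00 and 10:20 (20 minutes = 1/3 hour).
p_question_1 = poisson_pmf(2, RATE_PER_HOUR, 20 / 60)

# 2. Three customers in 10:00-10:20 and seven in 10:20-11:00 (40 minutes).
#    The intervals do not overlap, so the counts are independent.
p_question_2 = poisson_pmf(3, RATE_PER_HOUR, 20 / 60) * poisson_pmf(7, RATE_PER_HOUR, 40 / 60)

print(f"Question 1: {p_question_1:.4f}")
print(f"Question 2: {p_question_2:.4f}")
```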
Descriptive measures
Given a random variable $X$ in an experiment, we have denoted $f(x) = P(X = x)$, the
probability that $X = x$. For discrete events, $f(x) = 0$ for all values of $x$ except
$x = 0, 1, 2, \ldots$
1. Binomial distribution
$$\mu = np$$
$$\sigma^2 = np(1-p)$$
2. Hypergeometric distribution
The hypergeometric distribution is characterized by the size of the sample
($n$), the total number of items ($N$) and the number $k$ of items labelled success. Then
$$\mu = \frac{nk}{N}$$
$$\sigma^2 = \frac{N-n}{N-1} \cdot n \cdot \frac{k}{N}\left(1 - \frac{k}{N}\right)$$
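The closed-form mean and variance can be checked against a direct computation from the probability mass function. The sketch below does this for the card example (N = 52, n = 5, k = 26), assuming the standard hypergeometric mass function, which is not written out in the text above; exact fractions are used so the comparison is free of rounding error.

```python
from fractions import Fraction
from math import comb


def hypergeometric_pmf(x: int, N: int, n: int, k: int) -> Fraction:
    """P(X = x): x successes in a sample of n drawn without replacement
    from N items of which k are labelled success (standard formula, assumed here)."""
    return Fraction(comb(k, x) * comb(N - k, n - x), comb(N, n))


N, n, k = 52, 5, 26
support = range(max(0, n - (N - k)), min(n, k) + 1)

# Mean and variance computed directly from the definition.
mean = sum(x * hypergeometric_pmf(x, N, n, k) for x in support)
variance = sum((x - mean) ** 2 * hypergeometric_pmf(x, N, n, k) for x in support)

# Closed-form expressions from the text.
mean_formula = Fraction(n * k, N)
variance_formula = Fraction(N - n, N - 1) * n * Fraction(k, N) * (1 - Fraction(k, N))

print(mean, mean_formula)
print(variance, variance_formula)
```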
3. Poisson Distribution
The Poisson distribution is characterized by $\lambda t$, where $\lambda$ is
the mean number of outcomes per unit time and $t$ is the time interval.
$$\mu = \lambda t$$
$$\sigma^2 = \lambda t$$
$$P(X = x) = \frac{e^{-\mu}\,\mu^x}{x!}$$
$$f(x) = \frac{1}{n}$$
where $f(x)$ represents the probability mass function.
[Figure: discrete probability distribution (left, with mass at points x1, x2, x3, x4) and continuous probability distribution (right); horizontal axis X = x, vertical axis f(x).]
Examples:
1. Tax to be paid for a purchase in a shopping mall. Here, the random
variable varies from 0 to +∞.
2. Amount of rainfall in mm in a region.
3. Earthquake intensity on the Richter scale.
4. Height of the earth's surface. Here, the random variable varies from −a
to +b, where a, b ∈ R and R is the set of real numbers.
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty$$
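1. $f(x) \ge 0$, for all $x \in R$
2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$
3. $P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$
4. $\mu = \int_{-\infty}^{\infty} x f(x)\,dx$
5. $\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$
[Figure: density curve f(x) against X = x, with the area between a and b shaded.]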
Note: Probability is represented by area under the curve. The probability of a specific
value of a continuous random variable will be zero because the area under a point is
zero.
Continuous Probability Distributions
Example:
Suppose bacteria of a certain species typically live 4 to 6 hours. The
probability that a bacterium lives exactly 5 hours is equal to zero. A
lot of bacteria live for approximately 5 hours, but there is no chance
that any given bacterium dies at exactly 5.0000000000... hours.
However, the probability that the bacterium dies between 5 hours and
5.01 hours is quantifiable.
Suppose, the answer is 0.02 (i.e., 2%). Then, the probability that the
bacterium dies between 5 hours and 5.001 hours should be about
0.002, since this time interval is one-tenth as long as the previous.
The probability that the bacterium dies between 5 hours and 5.0001
hours should be about 0.0002, and so on.
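The scaling in this example can be written as a short approximation. This is a sketch under the assumption that the density $f$ is roughly constant near $x = 5$; the symbols $f$ and $\Delta$ (the interval width in hours) are introduced here for illustration.

```latex
P(5 \le X \le 5 + \Delta) = \int_{5}^{5+\Delta} f(x)\,dx \approx f(5)\,\Delta,
\qquad
f(5) \approx \frac{0.02}{0.01} = 2 \text{ per hour}
\;\Longrightarrow\;
P(5 \le X \le 5.001) \approx 2 \times 0.001 = 0.002 .
```

As $\Delta \to 0$ the probability shrinks to zero, which is why the probability of dying at exactly 5 hours is zero even though the density there is positive.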
[Figure: uniform density f(x) of constant height c on the interval (A, B); horizontal axis X = x.]
Note:
a) $\int_{A}^{B} f(x)\,dx = \frac{1}{B-A} \times (B - A) = 1$
b) $P(c < x < d) = \frac{d-c}{B-A}$, where both $c$ and $d$ are in the interval $(A, B)$
c) $\mu = \frac{A+B}{2}$
d) $\sigma^2 = \frac{(B-A)^2}{12}$
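The four notes above translate directly into code. The following is a small sketch for a uniform distribution on (A, B); the function names and the numeric values A = 0, B = 10, c = 2, d = 5 are hypothetical and chosen only for illustration.

```python
def uniform_probability(c: float, d: float, A: float, B: float) -> float:
    """P(c < X < d) for X uniform on (A, B), with A <= c <= d <= B."""
    if not (A <= c <= d <= B):
        raise ValueError("c and d must lie inside the interval (A, B) with c <= d")
    return (d - c) / (B - A)


def uniform_mean(A: float, B: float) -> float:
    """Mean of the uniform distribution on (A, B)."""
    return (A + B) / 2


def uniform_variance(A: float, B: float) -> float:
    """Variance of the uniform distribution on (A, B)."""
    return (B - A) ** 2 / 12


A, B = 0, 10
p_interval = uniform_probability(2, 5, A, B)
print(p_interval, uniform_mean(A, B), uniform_variance(A, B))
```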
[Figure: normal curves in three panels, f(x) against x: normal curves with µ1 < µ2 and σ1 = σ2; normal curves with µ1 = µ2 and σ1 < σ2; normal curves with µ1 < µ2 and σ1 < σ2.]
Properties of Normal Distribution
The curve is symmetric about a vertical axis through the mean 𝜇.
The random variable $x$ can take any value from $-\infty$ to $\infty$.
The most frequently used descriptive parameters define the curve itself.
The mode, which is the point on the horizontal axis where the curve is a
maximum occurs at 𝑥 = 𝜇.
The total area under the curve and above the horizontal axis is equal to 1.
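$$\int_{-\infty}^{\infty} f(x)\,dx = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = 1$$
$$\mu = \int_{-\infty}^{\infty} x\, f(x)\,dx = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} x\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx$$
$$\sigma^2 = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} (x-\mu)^2\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx$$
$$P(x_1 < x < x_2) = \frac{1}{\sigma\sqrt{2\pi}} \int_{x_1}^{x_2} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx$$
denotes the probability of $x$ in the interval $(x_1, x_2)$.
[Figure: area under the normal curve between x1 and x2.]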
$$z = \frac{x - \mu}{\sigma} \quad \text{[Z-transformation]}$$
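The Z-transformation is what makes interval probabilities computable in practice: any normal interval is converted to the standard normal scale and evaluated there. The sketch below does this with the error function from the Python standard library; the function names and the example values µ = 10, σ = 4 are assumptions made for illustration.

```python
from math import erf, sqrt


def standard_normal_cdf(z: float) -> float:
    """P(Z <= z) for the standard normal distribution, via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def normal_interval_probability(x1: float, x2: float, mu: float, sigma: float) -> float:
    """P(x1 < X < x2) for X normal with mean mu and standard deviation sigma."""
    z1 = (x1 - mu) / sigma  # Z-transformation of the lower limit
    z2 = (x2 - mu) / sigma  # Z-transformation of the upper limit
    return standard_normal_cdf(z2) - standard_normal_cdf(z1)


# Probability of falling within one standard deviation of the mean.
mu, sigma = 10.0, 4.0
p_one_sigma = normal_interval_probability(mu - sigma, mu + sigma, mu, sigma)
print(f"{p_one_sigma:.4f}")
```

Because of the transformation, the result does not depend on the particular µ and σ chosen: the interval (µ − σ, µ + σ) always maps to (−1, 1) on the z scale.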
[Figure: normal density f(x; µ, σ) centred at x = µ (left) and standard normal density f(z; 0, 1) with µ = 0 and σ = 1 (right).]
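$$\Gamma(\alpha) = (\alpha - 1)\int_{0}^{\infty} x^{\alpha-2} e^{-x}\,dx = (\alpha - 1)\,\Gamma(\alpha - 1)$$
For a positive integer $\alpha = n$, repeated application gives
$$\Gamma(n) = (n-1)(n-2)\cdots 3 \cdot 2 \cdot 1 = (n-1)!$$
Further, $\Gamma(1) = \int_{0}^{\infty} e^{-x}\,dx = 1$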
Note:
$\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$ [An important property]
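These properties of the gamma function can be confirmed numerically with `math.gamma` from the Python standard library. The sketch below checks the recurrence, the factorial identity and the value at 1/2 for a few arbitrarily chosen arguments.

```python
from math import factorial, gamma, isclose, pi, sqrt

# Recurrence: Gamma(a) = (a - 1) * Gamma(a - 1), checked for a few values a > 1.
recurrence_ok = all(isclose(gamma(a), (a - 1) * gamma(a - 1)) for a in (2.5, 3.0, 4.7))

# Factorial identity: Gamma(n) = (n - 1)! for positive integers n.
factorial_ok = all(isclose(gamma(n), factorial(n - 1)) for n in range(1, 10))

# Special value: Gamma(1/2) = sqrt(pi).
half_ok = isclose(gamma(0.5), sqrt(pi))

print(recurrence_ok, factorial_ok, half_ok)
```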
[Figure: gamma density f(x) against x, with curves labelled σ=1, σ=2 and σ=4, each with β=1.]
Note:
1) The mean and variance of the gamma distribution are
$\mu = \alpha\beta$
$\sigma^2 = \alpha\beta^2$
2) The mean and variance of the exponential distribution are
$\mu = \beta$
$\sigma^2 = \beta^2$
$\mu = v$ and $\sigma^2 = 2v$
$$f(x; \mu, \sigma) = \begin{cases} \dfrac{1}{\sigma x \sqrt{2\pi}}\, e^{-\frac{[\ln(x) - \mu]^2}{2\sigma^2}}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
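$$\sigma^2 = \alpha^{-2/\beta}\left\{\Gamma\left(1 + \frac{2}{\beta}\right) - \left[\Gamma\left(1 + \frac{1}{\beta}\right)\right]^2\right\}$$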
Note:
A PDF must be integrated over an interval to yield a probability.
[Figure: discrete probability distribution (left, with mass at points x1, x2, x3, x4) and continuous probability distribution (right); horizontal axis X = x, vertical axis f(x).]
Data collection
Collect a sample from the population.
Statistics
Compute a statistic from the sample.
Statistical inference
From the statistic, we make various statements concerning the values of population
parameters.
For example, we infer the population mean from the sample mean, etc.
Note:
A sample statistic is a random variable and, like any other random variable, a sample
statistic has a probability distribution.
Note: The probability distributions discussed so far for a random variable do not apply directly to sample statistics.
Sampling Distribution
More precisely, sampling distributions are probability distributions used to describe
the variability of sample statistics.
The probability distribution of the sample mean (hereafter denoted as $\bar{X}$) is called
the sampling distribution of the mean (also referred to as the distribution of the sample
mean).
Like $\bar{X}$, the probability distribution of the sample variance (denoted as $S^2$) is called the sampling distribution of the variance.
Using the values of $\bar{X}$ and $S^2$ for different random samples of a population, we are to
make inferences on the parameters $\mu$ and $\sigma^2$ (of the population).
Theorem on Sampling Distribution
Famous theorem in Statistics
$$S^2 \approx \frac{\sigma^2}{n}$$
This can be further established with the famous “Central Limit Theorem”, which is
stated below.
If random samples each of size $n$ are taken from any distribution with mean $\mu$
and variance $\sigma^2$, the sample mean $\bar{X}$ will have a distribution approximately
normal with mean $\mu$ and variance $\frac{\sigma^2}{n}$.
[Figure: sampling distribution of the sample mean for n = 1, n = small to moderate, and n = large.]
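The Central Limit Theorem can be illustrated with a small simulation. The sketch below is only one possible demonstration: it assumes a Uniform(0, 1) population (mean 0.5, variance 1/12), samples of size 30, and 5000 repetitions, none of which come from the text above. It compares the observed mean and variance of the sample means with µ and σ²/n.

```python
import random
from statistics import fmean, pvariance


def sample_means(sample_size: int, trials: int, rng: random.Random) -> list[float]:
    """Draw `trials` samples of `sample_size` Uniform(0, 1) values and return their means."""
    return [
        fmean([rng.random() for _ in range(sample_size)])
        for _ in range(trials)
    ]


SAMPLE_SIZE, TRIALS = 30, 5000
POPULATION_MEAN, POPULATION_VARIANCE = 0.5, 1 / 12  # Uniform(0, 1), assumed population

rng = random.Random(0)  # fixed seed so the run is repeatable
means = sample_means(SAMPLE_SIZE, TRIALS, rng)

observed_mean = fmean(means)
observed_variance = pvariance(means)
expected_variance = POPULATION_VARIANCE / SAMPLE_SIZE  # sigma^2 / n

print(f"mean of sample means:     {observed_mean:.4f} (theory {POPULATION_MEAN})")
print(f"variance of sample means: {observed_variance:.5f} (theory {expected_variance:.5f})")
```

Plotting a histogram of `means` would be expected to show the bell shape predicted by the theorem, even though the underlying population is flat rather than normal.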
where $Z_i = \frac{X_i - \mu}{\sigma}$.
This $\chi^2$ is also a random variable of a distribution, called the $\chi^2$-distribution
(pronounced Chi-square distribution).
The expression $\chi^2$ describes the distribution (of $n$ samples) and thus has
degrees of freedom $\upsilon = n - 1$; it is often written as $\chi^2(\upsilon)$, where $\upsilon$ is the only
parameter in it.
4. In such a situation, the only measure of the standard deviation available may be the
sample standard deviation $S$.
5. It is natural then to substitute $S$ for $\sigma$. The problem is that the resulting statistic is
not normally distributed!
$$t(v) = \frac{Z}{\sqrt{\dfrac{\chi^2(v)}{v}}}$$
Using this definition, we can develop the sampling distribution of the sample mean when
the population variance, $\sigma^2$, is unknown.
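That is,
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ has the standard normal distribution.
$\chi^2 = \frac{(n-1)S^2}{\sigma^2}$ has the $\chi^2$ distribution with $(n-1)$ degrees of freedom.
Thus,
$$T = \frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{(n-1)S^2/\sigma^2}{n-1}}} \quad \text{or} \quad T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
Corollary: Recall that $\chi^2 = \frac{(n-1)S^2}{\sigma^2}$ has the Chi-squared distribution with $(n-1)$ degrees
of freedom.
Therefore, if we assume that we have a sample of size $n_1$ from a population with variance
$\sigma_1^2$ and an independent sample of size $n_2$ from another population with variance $\sigma_2^2$,
then the statistic
$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$
has an F-distribution with $(n_1 - 1)$ and $(n_2 - 1)$ degrees of freedom.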
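The T and F statistics defined above are straightforward to compute from data. The sketch below uses the sample variance with the (n − 1) divisor from the Python standard library; the function names, the two small samples, and the hypothesised values µ = 2, σ₁² = 1 and σ₂² = 4 are made up purely for illustration.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(sample: list[float], mu: float) -> float:
    """T = (sample mean - mu) / (S / sqrt(n)), with S the sample standard deviation."""
    n = len(sample)
    return (mean(sample) - mu) / sqrt(variance(sample) / n)


def f_statistic(sample_1: list[float], sample_2: list[float],
                sigma1_sq: float, sigma2_sq: float) -> float:
    """F = (S1^2 / sigma1^2) / (S2^2 / sigma2^2) for two independent samples."""
    return (variance(sample_1) / sigma1_sq) / (variance(sample_2) / sigma2_sq)


# Hypothetical data, for illustration only.
sample_a = [1, 2, 3, 4, 5]
sample_b = [2, 4, 6, 8, 10]

t_value = t_statistic(sample_a, mu=2)
f_value = f_statistic(sample_a, sample_b, sigma1_sq=1.0, sigma2_sq=4.0)

print(f"T = {t_value:.4f} with {len(sample_a) - 1} degrees of freedom")
print(f"F = {f_value:.4f} with ({len(sample_a) - 1}, {len(sample_b) - 1}) degrees of freedom")
```

Turning these statistics into probabilities requires the t and F distribution functions, which are not in the standard library and are outside the scope of this sketch.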
𝒕 − 𝐝𝐢𝐬𝐭𝐫𝐢𝐛𝐮𝐭𝐢𝐨𝐧:
It is used when the population variance is not known. In this case, we use the variance of the
sample as an estimate of the population variance.
𝝌𝟐 − 𝐝𝐢𝐬𝐭𝐫𝐢𝐛𝐮𝐭𝐢𝐨𝐧:
It is used for comparing a sample variance to a theoretical population
variance.
𝑭 − 𝐝𝐢𝐬𝐭𝐫𝐢𝐛𝐮𝐭𝐢𝐨𝐧:
It is used for comparing the variances of two or more populations.