
Postgraduate Diploma in Business Analytics (PGDBA)

2024-26 Batch

LECTURE 8
MODULE-II

Stat3: INFERENCE, PGDBA Programme, ISI, 2021
December 30, 2021
 Estimation in the context of business analytics
 Estimation as a data summarization and inferential tool
 Concepts of population, sample and estimators
 Criteria for good estimators
 Concepts of
 unbiasedness
 consistency
 Illustration of sample mean and sample proportion through Monte
Carlo simulations
 Introduction to different methods of estimation
 Concepts of sampling distributions of a statistic
 Confidence intervals and their usage; examples in real-life
analytic scenarios (5 weeks).

A population is a collection of all possible individuals, objects, or
measurements of interest.

A sample is a portion, or part, of the population of interest.

 Statistical inference makes propositions about a population using data drawn from the
population through some sampling procedure.
 STATISTICAL INFERENCE consists of
 assuming a realistic statistical model of the process that generates the data
 deducing (statistical) propositions from the model.
 The conclusion of a statistical inference is a statistical proposition.
 Some common types of statistical proposition
 A POINT ESTIMATE
 a particular value computed from the data that best approximates some parameter of
interest
 AN INTERVAL ESTIMATE
 an interval constructed from the data such that, under repeated sampling of such
datasets, such intervals would contain the true parameter value with probability equal to
the stated confidence level
 REJECTION OF A STATISTICAL HYPOTHESIS
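The repeated-sampling reading of an interval estimate can be checked by simulation. The sketch below is illustrative only: the normal population, the values of `mu`, `sigma` and `n`, and the large-sample interval "sample mean ± 1.96 × s/√n" are assumptions chosen for the example, not taken from the slides.

```python
import math
import random
import statistics


def coverage_of_mean_interval(mu=10.0, sigma=2.0, n=30, reps=2000, z=1.96, seed=0):
    """Fraction of repeated-sample intervals x_bar +/- z*s/sqrt(n) that contain mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Draw one dataset of size n from the assumed N(mu, sigma^2) population.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.fmean(sample)   # point estimate of mu
        s = statistics.stdev(sample)       # sample standard deviation (n - 1 divisor)
        half_width = z * s / math.sqrt(n)
        # Count the interval as a "hit" if it contains the true parameter value.
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / reps


coverage = coverage_of_mean_interval()
print(coverage)
```

The printed fraction should be close to the nominal 95% level. It tends to fall slightly below it here because the normal multiplier 1.96 is used with an estimated standard deviation at a moderate sample size.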



 Statistical inference is nothing but INDUCTIVE LOGIC or REASONING.
 Inductive Logic
 A collection of observations is synthesized to come up with a
general principle.
 As opposed to deductive logic or reasoning, namely,
 If the premises are correct, the conclusion of a deductive argument
is certain.
 The truth of the conclusion of an inductive argument is
probable, based upon the evidence given.



Example of Inductive Logic
 Most Indians I have come across love spicy food.
 Therefore Indians probably love spicy food.
Example of Deductive Logic
 All viruses undergo mutations.
 SARS-CoV-2 is a virus.
 Therefore SARS-CoV-2 must undergo mutations.
[Image source: https://www.stratechi.com/]



 A statistical model is a set of assumptions concerning the generation of the observed data.
 The description of a statistical model usually emphasizes the role of
population quantities of interest, about which we wish to draw inference.
 Three levels of modelling assumptions in statistics
 PARAMETRIC: The data-generation process is assumed to be fully described by a
family of probability distributions involving a finite number of unknown
parameters
 For example, one may assume that it can be described by a $N(\mu, \sigma^2)$ distribution.
 NON-PARAMETRIC: The assumptions made about the data-generation process are
much more general than in parametric statistics and may be minimal.
 For example, the data-generation process can be described by a continuous probability
distribution.
 SEMI-PARAMETRIC: Intermediate between the fully parametric and non-parametric approaches.
 For example, one may assume that
 the data-generation process is a continuous probability distribution with a finite mean.
 the mean in the population is a linear function of some covariate



 A model for data collection which generates observations that are
said to constitute a random sample from the population
 RANDOM SAMPLE
 Definition: A collection of random variables $X_1, X_2, \cdots, X_n$ is said to be a
random sample of size $n$ from a population characterized by a pmf/pdf
$f(x)$ if
 $X_1, X_2, \cdots, X_n$ are mutually independent random variables
 each $X_i$ has the same pmf/pdf $f(x)$
 $X_1, X_2, \cdots, X_n$ are then said to be independent and identically distributed
random variables (or i.i.d. random variables, in short).
 A set of random observations $x_1, x_2, \cdots, x_n$ of size $n$ from $f(x)$ is a
realization of the random variables $X_1, X_2, \cdots, X_n$.
Convention: Uppercase letters represent random variables; their lowercase
counterparts represent observations on them.



 A random sample consists of independent members of a population drawn without bias.
 The statistical theory to be discussed henceforth is based on
the premise that a random sample is available from the
population.
 The larger the sample, the better the inference from the
data.
 Samples can be drawn in many ways.
 Analysis will depend on the sampling method used.



[Figure: Relationship between a population and a sample. The population is described by a mathematical model (parametric or non-parametric).]



 A statistic is any function of $X_1, X_2, \cdots, X_n$, such as
 sample mean: $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$
 sample variance: $S^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2$ or $\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$
 sample standard deviation: $S = +\sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2}$ or $+\sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2}$
 order statistics: $X_{(1)}, X_{(2)}, \cdots, X_{(n)}$, where $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$
 sample maximum: $X_{(n)}$
 sample minimum: $X_{(1)}$
 sample range: $X_{(n)} - X_{(1)}$
 A statistic, being a function of random variables, is also a random
variable.
 The sampling distribution of a statistic is its probability distribution.
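As a small illustration of these definitions, the sketch below computes several statistics for one observed sample and then approximates the sampling distribution of the sample mean by repeated sampling. The function names, the toy data and the choice of a $U(0,1)$ population are assumptions made for the example.

```python
import random
import statistics


def sample_statistics(xs):
    """Compute a few statistics (functions of the sample) for observed data xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    ss = sum((x - x_bar) ** 2 for x in xs)   # sum of squared deviations from the mean
    ordered = sorted(xs)                     # observed order statistics
    return {
        "mean": x_bar,
        "var_n": ss / n,                     # divisor n
        "var_n_minus_1": ss / (n - 1),       # divisor n - 1
        "min": ordered[0],
        "max": ordered[-1],
        "range": ordered[-1] - ordered[0],
    }


def simulate_sample_means(n=25, reps=5000, seed=1):
    """Draw `reps` samples of size n from U(0,1) and return their sample means."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n)) / n for _ in range(reps)]


stats = sample_statistics([2.0, 4.0, 4.0, 5.0, 10.0])
means = simulate_sample_means()
print(stats)
print(statistics.fmean(means), statistics.pvariance(means))
```

The simulated means cluster around 0.5, the population mean of $U(0,1)$, with variance close to $(1/12)/25$. Their histogram is an empirical picture of the sampling distribution of $\bar{X}$.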



 In the singular sense, "statistics" refers to the discipline of Statistics,
that is, a body of scientific methods dealing with the collection
and analysis of numerical data.
 In the plural sense, it refers to
 more than one statistic, as defined in the previous slide, OR
 more than one fact or piece of information obtained from a study
of a large quantity of numerical data.

 Monte Carlo Methods are a broad class of computational
algorithms that rely on repeated random sampling to obtain
numerical results.
 Use randomness to solve problems that might be
deterministic in principle.
 Often used in physical and mathematical problems.
 Are most useful when it is difficult or impossible to use other
approaches.
 Monte Carlo methods are mainly used for
 optimization
 numerical integration
 generating random samples from a probability distribution.



 Numerical Integration: Computing a definite integral
$$I(a, b) = \int_a^b g(x)\,dx$$
when there is no closed-form expression for the corresponding indefinite integral
$\int g(x)\,dx$.
 Solution: Select $n$ points $x_1, x_2, \cdots, x_n$ uniformly at random (that is, generate $n$ random
numbers) from the interval $[a, b]$ and estimate $I(a, b)$ by $\hat{I}(n) = (b - a)\,\frac{1}{n} \sum_{i=1}^{n} g(x_i)$.
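A minimal sketch of this estimator follows. The integrand $g(x) = e^{-x^2}$ on $[0, 1]$ is an assumption chosen for illustration because it has no elementary antiderivative. The reference value uses the error function from the standard library.

```python
import math
import random


def mc_integrate(g, a, b, n=100_000, seed=0):
    """Estimate the integral of g over [a, b] as (b - a) * average of g at uniform points."""
    rng = random.Random(seed)
    total = sum(g(rng.uniform(a, b)) for _ in range(n))
    return (b - a) * total / n


# Illustrative integrand: exp(-x^2) on [0, 1].
estimate = mc_integrate(lambda x: math.exp(-x * x), 0.0, 1.0)
# Reference value: (sqrt(pi) / 2) * erf(1).
reference = math.sqrt(math.pi) / 2.0 * math.erf(1.0)
print(estimate, reference)
```

The error of the estimate shrinks roughly like $1/\sqrt{n}$, so each additional decimal digit of accuracy costs about a hundred times more points.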



 Random numbers: any set of numbers exhibiting statistical randomness, that
is, not exhibiting any discernible patterns or regularities
 Generation of random numbers is at the heart of Monte Carlo
methods.
 Generation of random numbers
 In earlier days, before the advent of computers, random number
tables were used
 Almost all software for statistical computation has built-in random
number generators
 Actually, they are pseudo-random number generators since they use
deterministic algorithms, like $x_{n+1} = (a x_n + b) \bmod M$ with $x_0$ as the seed,
$M$ being a large positive integer (linear congruential generators).
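The recurrence $x_{n+1} = (a x_n + b) \bmod M$ can be sketched in a few lines. The constants below ($a = 1103515245$, $b = 12345$, $M = 2^{31}$) are an assumption for illustration: they are values commonly quoted for C library `rand` implementations and do not come from the slides. Dividing by $M$ rescales each state to $[0, 1)$.

```python
def lcg(seed, a=1103515245, b=12345, m=2**31):
    """Linear congruential generator yielding pseudo-random numbers in [0, 1)."""
    x = seed
    while True:
        x = (a * x + b) % m   # deterministic update of the internal state
        yield x / m           # rescale the integer state to the unit interval


gen = lcg(seed=42)
first_five = [next(gen) for _ in range(5)]
print(first_five)
```

Because the update is deterministic, restarting with the same seed reproduces exactly the same sequence, which is why such generators are called pseudo-random.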



Let $x_1, x_2, \cdots, x_n, \cdots$ be a sequence of random numbers in $[0,1]$.
 Ideally, plots of $(x_i, x_{i+1})$ should exhibit complete absence of
any kind of pattern, that is, the points should be truly randomly
distributed.
 Pseudo-random numbers can exhibit patterns.
[Figure: scatter plots of truly random numbers and of pseudo-random numbers. Source: https://www.ics.uci.edu/~goodrich/]



 Simulation
 The imitation of the behaviour of a real-world process or system over time.
 Requires the use of models which represent the key characteristics of the system or process.
 Generally computers are used to execute the simulation.
 Stochastic Simulation
 is a simulation of a system that involves random variables, that is, which can change
stochastically (randomly) with individual probabilities.
 Realizations of these random variables are generated and inserted into a model of the system.
 Outputs of the model are recorded.
 The process is repeated with a new set of random values until a sufficient amount of data is
generated.
 The distribution of the outputs provides insights into the system, like the most probable
estimates of parameters, and so on



 The random variables used for simulating a stochastic model are generated on a computer with a
random number generator (RNG).
 The sequence of numbers generated by an RNG generally takes values in $[0,1]$.
 It can be looked upon as a realization of the uniform random variable over
$[0,1]$, that is, the $U(0,1)$ distribution.
 These can be transformed into random variables with the desired probability
distributions.
[Figure source: https://genedan.com/]



Let $f(x)$ be the pdf/pmf to be simulated from, with
corresponding cumulative distribution function (CDF) $F(x)$.

 Some commonly-used methods for simulating from $f(x)$


 Inversion method
 Acceptance-Rejection method
 Box-Muller method for normal variables
 Special methods for specific distributions, based on their
properties.



 Inversion method: for a continuous random variable $X$ with CDF $F$, it can easily be
shown that the random variable $F(X)$ has the $U(0,1)$ distribution.
 If the inverse function of $F$, that is, $F^{-1}$, exists, then $F^{-1}(U)$
has CDF $F$, where $U \sim U(0,1)$.
 Thus for any realization $u$ from the $U(0,1)$ distribution,
$F^{-1}(u)$ is a realization from $f(x)$.
 Example
 $f(x) = \frac{1}{\mu} e^{-x/\mu}$, $x > 0$, that is, the exponential distribution with
mean $\mu$
Here $F(x) = 1 - e^{-x/\mu}$ and $F^{-1}(u) = -\mu \log_e(1 - u)$.
Hence $-\mu \log_e(1 - u)$ or, equivalently, $-\mu \log_e u$ (since $1 - U$ is also $U(0,1)$) is a realization from $f(x)$.
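A minimal sketch of the inversion method for the exponential distribution with mean $\mu$: solving $u = 1 - e^{-x/\mu}$ for $x$ gives $x = -\mu \log_e(1 - u)$, which is what the code applies to each uniform draw. The function name and the value $\mu = 2$ are assumptions for the example.

```python
import math
import random


def exponential_by_inversion(mu, n, seed=0):
    """Draw n exponential variates with mean mu by inverting F(x) = 1 - exp(-x / mu)."""
    rng = random.Random(seed)
    # rng.random() lies in [0, 1), so 1 - u lies in (0, 1] and the logarithm is defined.
    return [-mu * math.log(1.0 - rng.random()) for _ in range(n)]


draws = exponential_by_inversion(mu=2.0, n=50_000)
sample_mean = sum(draws) / len(draws)
print(sample_mean)
```

The sample mean of the draws should be close to $\mu$, which is a quick sanity check on the inverse.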



 The inversion method is easily extended to the case of discrete random
variables, though $F^{-1}(u)$ is not uniquely defined.
 Here, for ordered support points $x_1 < x_2 < \cdots$, $\Pr(X = x_{i+1}) = F(x_{i+1}) - F(x_i)$.
 So, for $u \sim U(0,1)$, if $F(x_i) < u \le F(x_{i+1})$,
take $x_{i+1}$ to be a realization from $f$.
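The discrete rule can be sketched with a cumulative sum and a binary search: for each uniform draw $u$, return the first support point whose CDF value is at least $u$. The support values and probabilities below are hypothetical, chosen only to exercise the code.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def discrete_by_inversion(values, probs, n, seed=0):
    """Draw n variates from a pmf on sorted `values` by inverting the step CDF."""
    cdf = list(accumulate(probs))   # F evaluated at each support point
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        u = rng.random()
        # First index whose CDF value is >= u, i.e. F(previous point) < u <= F(this point).
        idx = bisect_left(cdf, u)
        # Guard against floating-point rounding in the last cumulative value.
        idx = min(idx, len(values) - 1)
        draws.append(values[idx])
    return draws


# Hypothetical pmf: P(X=1) = 0.2, P(X=2) = 0.5, P(X=3) = 0.3.
sample = discrete_by_inversion([1, 2, 3], [0.2, 0.5, 0.3], n=50_000)
freq = {v: sample.count(v) / len(sample) for v in (1, 2, 3)}
print(freq)
```

The observed relative frequencies should be close to the specified probabilities.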



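 Acceptance-Rejection method: consider another pmf/pdf $g(\cdot)$
which has the same support as $f(\cdot)$.
 Let $c = \max_x \frac{f(x)}{g(x)} < \infty$, that is,
$f(x) \le c\,g(x)\ \forall x$, with $c > 1$.
 Generate $u$ from $U(0,1)$ and $v$ from $g$, independently.
 If $u < \frac{1}{c} \frac{f(v)}{g(v)}$, return $v$ as a
realization from $f$;
else go to the previous step.
[Figure: the density $f(x)$ lying under the envelope $c\,g(x)$; points under $f(x)$ are accepted, points between $f(x)$ and $c\,g(x)$ are rejected.]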
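Why the accepted values follow $f$ can be seen from a short calculation, written here for the continuous case with $U \sim U(0,1)$ and $V \sim g$ independent. The first line gives the probability that a proposal is accepted; the second shows that, conditional on acceptance, $V$ has CDF $F$.

```latex
\Pr(\text{accept})
  = \Pr\!\left(U < \frac{f(V)}{c\,g(V)}\right)
  = \int \frac{f(v)}{c\,g(v)}\, g(v)\,dv
  = \frac{1}{c}\int f(v)\,dv
  = \frac{1}{c},
\\[6pt]
\Pr(V \le x \mid \text{accept})
  = \frac{\displaystyle\int_{-\infty}^{x} \frac{f(v)}{c\,g(v)}\, g(v)\,dv}{1/c}
  = \int_{-\infty}^{x} f(v)\,dv
  = F(x).
```

A smaller $c$ therefore means fewer wasted proposals, so the proposal $g$ should resemble $f$ as closely as possible.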



 Simulation from the $\mathrm{Beta}(2,2)$ density
$f(x) = 6x(1 - x)$, $0 < x < 1$.
 Take the $U(0,1)$ density as the proposal density and $c = 1.5$.
 Choice of $c$ is important since the proportion of samples from
$g$ that are accepted is $\frac{1}{c}$ (so a proportion $1 - \frac{1}{c}$ is rejected).
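The Beta(2,2) example can be sketched as follows, with the $U(0,1)$ proposal (so $g(v) = 1$) and $c = 1.5$, the maximum of $6x(1-x)$ attained at $x = 0.5$. The function name and the bookkeeping of the acceptance rate are additions for illustration.

```python
import random


def beta22_by_rejection(n, c=1.5, seed=0):
    """Draw n Beta(2,2) variates by acceptance-rejection with a U(0,1) proposal."""
    rng = random.Random(seed)
    accepted = []
    proposals = 0
    while len(accepted) < n:
        v = rng.random()   # proposal from g = U(0,1), so g(v) = 1
        u = rng.random()   # independent uniform used for the accept/reject decision
        proposals += 1
        # Accept v when u < f(v) / (c * g(v)) with f(v) = 6 v (1 - v).
        if u < 6.0 * v * (1.0 - v) / c:
            accepted.append(v)
    return accepted, n / proposals


draws, acceptance_rate = beta22_by_rejection(20_000)
print(sum(draws) / len(draws), acceptance_rate)
```

The observed acceptance rate should be close to $1/c = 2/3$, and the sample mean close to 0.5, the mean of the Beta(2,2) distribution.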



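 The Box-Muller method generates a pair of independent, standard normal variates
($N(0,1)$) $Z_0$ and $Z_1$, given a pair of independent, uniformly distributed
random numbers $U_1$ and $U_2$, as
$$Z_0 = \sqrt{-2 \log_e U_1}\,\cos(2\pi U_2), \qquad Z_1 = \sqrt{-2 \log_e U_1}\,\sin(2\pi U_2)$$
or, equivalently, as
$$Z_0 = R \cos\Theta, \qquad Z_1 = R \sin\Theta, \qquad \text{where } R^2 = -2 \log_e U_1,\ \Theta = 2\pi U_2.$$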
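A minimal sketch of the standard Box-Muller transform, $Z_0 = \sqrt{-2 \log_e U_1}\cos(2\pi U_2)$ and $Z_1 = \sqrt{-2 \log_e U_1}\sin(2\pi U_2)$, using only the standard library. The function name is an assumption, and $U_1$ is taken as one minus a uniform draw so that the logarithm never receives zero.

```python
import math
import random


def box_muller(n_pairs, seed=0):
    """Generate n_pairs of independent N(0,1) variates from pairs of U(0,1) numbers."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        u1 = 1.0 - rng.random()              # lies in (0, 1], avoids log(0)
        u2 = rng.random()
        r = math.sqrt(-2.0 * math.log(u1))   # radius R
        theta = 2.0 * math.pi * u2           # angle Theta
        pairs.append((r * math.cos(theta), r * math.sin(theta)))
    return pairs


pairs = box_muller(20_000)
z0 = [p[0] for p in pairs]
mean_z0 = sum(z0) / len(z0)
var_z0 = sum((z - mean_z0) ** 2 for z in z0) / len(z0)
print(mean_z0, var_z0)
```

The sample mean and variance of either coordinate should be close to 0 and 1 respectively.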
