Statistical Sampling & Parameter Estimation
Prof M.Shashi
Population and Sampling
• The population includes all entities of interest in a decision-making
scenario.
• In most cases the population is vast and cannot be reached
within reasonable constraints on time and effort.
• A sample is a subset of the population. Whether the subset contains
appropriate data to represent the population depends on the
purpose of taking the sample.
• Business decisions are generally made based on samples. Hence,
unless mentioned otherwise, the data considered for business analytics is
a sample.
Statistical Sampling
• Sampling is used for:
  • estimating the central tendency and spread of the population
    using the mean, mode, median, variance, five-number summary, etc.
  • providing inputs for building decision models to understand
    trends and make predictions.
• A sampling plan is a description of the approach used to obtain samples
from the population.
Components of a Sampling Plan
• Objective of the sampling activity
• Target population
• Population frame
• Method of sampling
• Operational procedures to collect data
• Statistical tools to be used for data analysis
Sampling Methods
• Sampling methods can be either subjective or probabilistic.
• Subjective sampling methods include:
  • Judgemental sampling, wherein an expert decides whom to sample (e.g., the best
    customers), and
  • Convenience sampling, wherein samples are taken based on ease and
    feasibility (e.g., recent customers).
• Probabilistic sampling involves selecting items at random from the
whole population.
• Probabilistic sampling is necessary for drawing valid statistical
conclusions.
Probabilistic Sampling Methods in Excel

• Simple Random Sampling
• Systematic or Periodic Sampling
• Stratified Sampling
• Cluster Sampling
• Sampling from a Continuous Process
In Excel, click Data Analysis in the Analysis group of the Data tab and select
Sampling to open the Sampling dialog box shown in the slide.
The tool requires the input range to contain numeric values.
With simple random sampling, every subset of a given size (n) has an equal chance of
being selected. Note that Excel's Random option draws each value from a random position
in the input range, so the same item can be selected more than once.
In systematic (periodic) sampling of n items from a population of size N, the first item is
selected at random from the first block of N/n items, and each of the remaining (n−1) items
is selected N/n items after the previous one. Excel's Periodic option instead takes a period k
and copies the kth value of the input range and every kth value thereafter.
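A minimal Python sketch of the two methods just described, assuming the population is an in-memory list with n ≤ N. The function names are illustrative, not from the slides. The systematic sampler uses the block size k = N // n, so when N is not a multiple of n the last few items can never be selected.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct items; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=None):
    """Random start in the first block of k = N // n items, then every k-th item."""
    k = len(population) // n          # sampling interval (block size)
    rng = random.Random(seed)
    start = rng.randrange(k)          # random position inside the first block
    return [population[start + i * k] for i in range(n)]

population = list(range(1, 101))      # hypothetical population of 100 item IDs
print(simple_random_sample(population, 10, seed=1))
print(systematic_sample(population, 10, seed=1))
```

Unlike Excel's Random option, `random.sample` draws without replacement, which matches the "every subset equally likely" definition.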
Probabilistic Sampling Methods contd…
• Stratified sampling applies to populations containing natural partitions
(strata) and selects a proportionate number of items from each stratum.
Disadvantage: small strata, such as minority groups, may receive negligible representation.
• Cluster sampling involves sampling a set of clusters rather than individual
items, so that all the items within a selected cluster are included in the
sample. For large datasets it is easier and costs less than selecting individual items.
• Sampling from a continuous process, such as a manufacturing process, is done by
selecting a random time and including in the sample the next n items
arriving after that time, OR by selecting n time stamps at random and then
including the next item after each of these time stamps.
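A Python sketch of proportional stratified sampling and cluster sampling, assuming each record carries its stratum label and that clusters are given as a dict of lists. The function names are hypothetical. Rounding each stratum's share to a whole number also reproduces the disadvantage noted above: a small enough stratum can receive zero items.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, n, seed=None):
    """Proportional allocation: each stratum gets round(n * stratum_size / N) items."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)
    N = len(records)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / N)      # rounding: total may differ slightly from n
        sample.extend(rng.sample(members, k))
    return sample

def cluster_sample(clusters, m, seed=None):
    """Pick m whole clusters at random and keep every item inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)
    return [item for name in chosen for item in clusters[name]]

# Hypothetical customers: 97 urban, 3 rural.
customers = [("urban", i) for i in range(97)] + [("rural", i) for i in range(3)]
s = stratified_sample(customers, lambda c: c[0], 10, seed=0)
print(sum(c[0] == "rural" for c in s))   # 0: round(10 * 3 / 100) = 0 rural customers
```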
Estimating Population Parameters
Point Estimates Using Excel Functions
1. Mean is obtained using =AVERAGE(B2:B95).
The deviations of all observations from the mean sum to 0.
2. Median is the middle value of an ordered list of observations. It is
obtained using =MEDIAN(B2:B95), or by sorting the range and taking the
middle observation (or the average of the two middle values if the list
has an even number of observations), as shown in the figure.
3. Mode is the value with the highest frequency of occurrence in the
given range. It is obtained using =MODE.SNGL(range) or
=MODE.MULT(range) for a single mode and multiple modes,
respectively.
For frequency distributions, the mode is the group / interval having the
highest frequency.
4. Mid-range is the average of the minimum and maximum, e.g., =(MIN(B2:B95)+MAX(B2:B95))/2.
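For readers working outside Excel, here is a sketch of the same point estimates with Python's `statistics` module. `multimode` requires Python 3.8 or later. The data values are hypothetical.

```python
import statistics

def point_estimates(data):
    """Python counterparts of the Excel point estimates listed above."""
    mean = statistics.mean(data)                           # =AVERAGE(range)
    return {
        "mean": mean,
        "median": statistics.median(data),                 # =MEDIAN(range)
        "modes": statistics.multimode(data),               # all most-frequent values, like MODE.MULT
        "mid_range": (min(data) + max(data)) / 2,          # average of MIN and MAX
        "sum_of_deviations": sum(x - mean for x in data),  # 0, up to floating-point rounding
    }

data = [2, 4, 4, 5, 7, 9, 9, 10]    # hypothetical observations
print(point_estimates(data))
# mean 6.25, median 6.0 (average of 5 and 7), modes [4, 9], mid_range 6.0, sum_of_deviations 0.0
```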
Errors in Point Estimation
• A drawback of point estimates is that they give no indication of
the magnitude of the potential error.
• Sampling error (the variation of estimates from one sample to another) is
inherent in any sampling process. It can be reduced but cannot be
avoided altogether.
• Non-sampling error occurs when the sample does not represent the target
population adequately. It is due to:
  • poor sample design, such as convenience sampling where random
    sampling is appropriate, OR
  • selection of the wrong population frame, OR
  • less reliable data.
• The data analyst should eliminate non-sampling error.
Effect of size of the sample on sampling error
• Sampling error depends mainly on the sample size n and, for a finite population of size N, also on the sampling fraction n/N.
• Larger samples provide more accurate estimates of population parameters.
• The figures illustrate how the sampling error varies when the sample mean is used to estimate
the population mean with different sample sizes.
• The population is uniformly distributed between 0 and 10, hence the population mean is 5.
• The population mean is estimated by the sample mean for varying sample sizes, and the comparative results are shown in the next slide:
Experiment to observe Variation of Sampling Error
with Relative Size of the Sample (on 25 samples of each size)
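The experiment can be approximated with a short simulation: for each sample size, draw 25 samples and summarise their means. This sketch assumes an effectively infinite population (continuous Uniform(0, 10) draws). It therefore shows the σ/√n shrinkage of the standard error but not any finite-population (n/N) effect, and its numbers will not match the slide's table.

```python
import random
import statistics

def sampling_experiment(sample_sizes, reps=25, low=0.0, high=10.0, seed=0):
    """For each n, draw `reps` samples from Uniform(low, high) and summarise their means."""
    rng = random.Random(seed)
    results = {}
    for n in sample_sizes:
        means = [statistics.mean([rng.uniform(low, high) for _ in range(n)])
                 for _ in range(reps)]
        results[n] = (statistics.mean(means), statistics.stdev(means))
    return results

for n, (m, s) in sampling_experiment([25, 100, 500]).items():
    theory = (10 / 12 ** 0.5) / n ** 0.5    # sigma / sqrt(n), with sigma = (high - low) / sqrt(12)
    print(f"n={n:4d}  mean of means={m:.3f}  sd of means={s:.3f}  sigma/sqrt(n)={theory:.3f}")
```

The spread of the sample means (the standard error) shrinks roughly in proportion to 1/√n, while their average stays close to 5.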
Application of Standard Error
Central Limit Theorem
• This theorem is one of the most important foundations for making systematic
inferences in real-world scenarios.
• The Central Limit Theorem (CLT) states that if the sample size is large enough, the
sampling distribution of the mean is approximately normal regardless of the
distribution of the population (provided its variance is finite). The mean of the
sampling distribution equals the population mean, and its standard deviation (the
standard error) is σ/√n, where σ is the population standard deviation.
• In the experiment discussed in the previous slides:
  • The population is uniformly distributed, yet the sampling distribution of the mean
    converges to a normal distribution as the sample size increases.
• If the population itself is normally distributed, the sampling
distribution of the mean is normal for any sample size.
• Hence the CLT allows us to apply the concepts and formulae derived for calculating
probabilities for normal distributions to draw conclusions about sample means.
Estimating Sampling Error using Empirical Rules
• According to the empirical rule, about 99.7% of the values of a normally
distributed variable lie within three standard deviations of its mean.
• As per the central limit theorem, for large samples the sampling distribution of
the mean is approximately normal and its mean equals the population mean (μ).
• Hence the distribution of sample means spans (almost entirely) the range from
m−3s to m+3s, where m and s are the mean and standard deviation of the sample means,
and this range can be taken as an empirical interval for μ.
• From the given table, for sample sizes 25 and 500, we can empirically
estimate the intervals as [3.65, 6.35] and [4.76, 5.24] respectively.
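The m ± 3s computation as a small Python helper. The interval endpoints quoted above imply m ≈ 5.00 in both cases, with s ≈ 0.45 for n = 25 and s ≈ 0.08 for n = 500 (half-width divided by 3). Those values are back-calculated here, not read from the missing table. The function names are illustrative.

```python
import statistics

def empirical_interval(m, s, k=3):
    """m ± k*s; k = 3 covers about 99.7% of a normal distribution."""
    return (m - k * s, m + k * s)

def empirical_interval_from_means(sample_means, k=3):
    """Same interval, computing m and s from a list of observed sample means."""
    return empirical_interval(statistics.mean(sample_means),
                              statistics.stdev(sample_means), k)

print(empirical_interval(5.0, 0.45))   # ≈ (3.65, 6.35) for n = 25
print(empirical_interval(5.0, 0.08))   # ≈ (4.76, 5.24) for n = 500
```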
Interval estimates
• Interval estimates provide a range of plausible values of a population
parameter / characteristic based on a sample.
• Probability intervals are usually centered on the mean or median.
• A 100(1−α)% probability interval is any interval [A, B] such that the
probability of the variable falling between A and B is (1−α).
• E.g., in a normal distribution with mean (μ) and standard deviation (σ), μ ± σ is
approximately a 68% probability interval around the mean. Here the margin of error is σ.
• The 5th and 95th percentiles bound a 90% probability interval; the 2.5th and 97.5th percentiles bound the 95% probability interval.
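The coverage figures above can be checked with `statistics.NormalDist` from Python's standard library (3.8+). The helper name `probability_interval` is illustrative.

```python
from statistics import NormalDist

def probability_interval(dist, coverage):
    """Central interval [A, B] with P(A <= X <= B) = coverage."""
    tail = (1 - coverage) / 2
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)

z = NormalDist(mu=0, sigma=1)
print(z.cdf(1) - z.cdf(-1))             # ≈ 0.6827: mu ± sigma
print(probability_interval(z, 0.90))    # ≈ (-1.645, 1.645): 5th and 95th percentiles
print(probability_interval(z, 0.95))    # ≈ (-1.960, 1.960): 2.5th and 97.5th percentiles
```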
Confidence Intervals

Estimation of Confidence Interval for Mean

• Finding the zα/2 value in the statistical tables:
E.g., for a 95% confidence level, α = 0.05, α/2 = 0.025 and 1−α/2 = 0.975. Search for 0.975 in the body of the
cumulative standard normal (Z) table and note the row and column values: zα/2 for 95% = row + col = 1.9 + 0.06 = 1.96.
• As the level of confidence, 1−α, decreases, zα/2 decreases and the confidence interval becomes narrower. A
99% confidence interval is wider than a 95% confidence interval for a given sample.
• We must trade off a higher level of accuracy (a smaller margin of error) against a higher risk that the
confidence interval does not contain the true mean, which is reflected by a lower level of confidence.
• For a fixed level of confidence, as the sample size increases, the standard error decreases,
making the confidence interval narrower, i.e., more precise.
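The slide's formula for a confidence interval with known σ did not survive extraction. This sketch assumes the standard form m ± zα/2·σ/√n and replaces the table lookup with `NormalDist().inv_cdf`. In Excel, `=NORM.S.INV(1 - alpha/2)` gives zα/2 and `=CONFIDENCE.NORM(alpha, sigma, n)` gives the half-width. The values of m, σ and n below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(confidence):
    """z_{alpha/2}: the value with cumulative probability 1 - alpha/2."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def mean_ci_known_sigma(m, sigma, n, confidence=0.95):
    """m ± z_{alpha/2} * sigma / sqrt(n)  (population sigma assumed known)."""
    half = z_critical(confidence) * sigma / sqrt(n)
    return (m - half, m + half)

print(round(z_critical(0.95), 2))               # 1.96, matching the table lookup
print(mean_ci_known_sigma(50, 10, 25, 0.95))    # hypothetical m, sigma, n
print(mean_ci_known_sigma(50, 10, 25, 0.99))    # wider: higher confidence
print(mean_ci_known_sigma(50, 10, 100, 0.95))   # narrower: larger sample
```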
T-Distribution
• It is a probability distribution with a
shape similar to the normal distribution
but with a larger variance (heavier tails), which results in
wider confidence intervals.
• It is used to model uncertainty about
the true standard deviation when it is unknown.
• The t-distribution has one parameter, the degrees of freedom (df); as the degrees of freedom increase,
the t-distribution converges to the standard normal distribution.
• As the sample size increases, df increases, and we can use z-values as in the previous formulae to
estimate the confidence interval even if the population standard deviation is not known; the
standard deviation estimated from the large sample is accepted as the value of σ.
• When in doubt, or when only small samples are available, it is better to use the t-distribution.
• Degrees of freedom is defined as the number of sample values that are free to vary. In general, df
is the number of sample values minus the number of estimated parameters.
• E.g., since the sample variance (s²) is computed using one estimated parameter (the sample mean), it
has df = n − 1, and the t-distribution used with s has (n − 1) df.
Confidence interval for Mean with unknown
population standard deviation
• The 100(1−α)% confidence interval for the population mean (μ)
when the population variance is unknown is m ± tα/2,n−1 (s/√n), where
tα/2,n−1 is found from t-distribution tables with (n−1) df for an
upper-tail probability of α/2.
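Python's standard library has no inverse t-distribution, so this self-contained sketch finds tα/2,n−1 numerically (Simpson's rule on the t density, then bisection). The helper names are hypothetical. In practice, `scipy.stats.t.ppf(1 - alpha/2, df)` or Excel's `=T.INV.2T(alpha, df)` returns the same critical value, and `=CONFIDENCE.T(alpha, s, n)` returns the half-width. The numeric routine is adequate for common confidence levels but is a sketch, not a production implementation.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_prob_0_to_x(x, df, steps=2000):
    """P(0 <= T <= x) by Simpson's rule (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return total * h / 3

def t_critical(alpha, df):
    """t_{alpha/2, df}: upper-tail probability alpha/2, found by bisection."""
    target = 0.5 - alpha / 2                  # by symmetry, P(0 <= T <= t) = 0.5 - alpha/2
    lo, hi = 0.0, 1.0
    while t_prob_0_to_x(hi, df) < target:     # widen the bracket until it contains t
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_prob_0_to_x(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mean_ci_unknown_sigma(data, confidence=0.95):
    """m ± t_{alpha/2, n-1} * s / sqrt(n), with s the sample standard deviation."""
    n = len(data)
    m, s = statistics.mean(data), statistics.stdev(data)
    half = t_critical(1 - confidence, n - 1) * s / math.sqrt(n)
    return (m - half, m + half)

print(round(t_critical(0.05, 24), 3))                       # ≈ 2.064, the table value for 24 df
sample = [12.1, 11.8, 12.4, 12.0, 11.6, 12.3, 12.2, 11.9]   # hypothetical data
print(mean_ci_unknown_sigma(sample))
```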
Confidence interval for Mean with unknown
population standard deviation contd…
Confidence Interval for Proportion
of a Categorical Variable
• For categorical variables like gender, the proportion of records having a specific
value in the sample is of interest.
• An unbiased estimator of a population proportion π is the statistic
called the sample proportion, p^ = x/n, where x records have the specific value in
the sample of size n.
• A 100(1−α)% confidence interval for the proportion is p^ ± zα/2 √(p^(1 − p^)/n).
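A sketch of the normal-approximation interval p^ ± zα/2·√(p^(1 − p^)/n), checked against the exit-poll figures used in the decision-making example. A common rule of thumb is that n·p^ and n·(1 − p^) should both be reasonably large (e.g., at least 5) for this approximation. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(x, n, confidence=0.95):
    """Normal-approximation CI for a population proportion from x successes in n."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - half, p_hat + half)

print(proportion_ci(692, 1300))   # ≈ (0.505, 0.559)
print(proportion_ci(670, 1300))   # ≈ (0.488, 0.543)
```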
Using Confidence intervals for Decision Making
• Drawing conclusions based on a confidence interval for the population mean:
In Example 6.8, a volume of 800 ml is required to be filled in each bottle. We found a sample average of
796 ml, and accordingly the 95% confidence interval is computed as [790.12, 801.88].
Although the sample mean is less than 800 ml, we rely on the confidence interval: it contains the
required value, so it is just as plausible, with 95% confidence, that the population mean is 800 ml.
However, if the sample mean were found to be 792 ml, the 95% confidence interval would be
[786.12, 797.88], which excludes 800 ml. Then the manufacturer should check and adjust the
equipment to meet the standard.
• Predicting election results based on a confidence interval for a proportion:
Suppose an exit poll of 1300 voters found that 692 voted for candidate A in a contest between A and B.
Just because 692/1300 = 0.53 of the sampled voters are in favour of A, we cannot predict that A would win.
Instead, the 95% confidence interval for the proportion is estimated as [0.505, 0.559], and since even
the lower bound is greater than 0.5, it is safe to predict that A would win.
However, if A had only 670 votes, the sample proportion (p^) would be 0.515 and the 95% confidence
interval becomes [0.488, 0.543]. Then it is not wise to predict that A would win: because of possible
sampling error, the population proportion could be less than 0.5 even though the sample proportion is
greater than 0.5.
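Both decision rules reduce to simple checks on the interval endpoints. In this sketch, the 5.88 ml half-width is taken from the bottle-filling intervals above (it corresponds to 1.96 × a standard error of 3 ml). The proportion intervals are the ones quoted for the exit poll. The helper names are hypothetical.

```python
def mean_ci_supports_target(ci, target):
    """True if the target value is a plausible population mean (inside the CI)."""
    lo, hi = ci
    return lo <= target <= hi

def safe_to_call_winner(ci, threshold=0.5):
    """Predict a win only if even the lower bound of the proportion CI exceeds 50%."""
    return ci[0] > threshold

half_width = 5.88                          # 95% half-width from the bottle-filling example
for m in (796, 792):
    ci = (m - half_width, m + half_width)
    ok = mean_ci_supports_target(ci, 800)
    print(m, ci, "no adjustment indicated" if ok else "check and adjust equipment")

print(safe_to_call_winner((0.505, 0.559)))   # True: 692 of 1300 votes
print(safe_to_call_winner((0.488, 0.543)))   # False: 670 of 1300 votes
```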
Prediction Intervals
• A prediction interval provides a range for predicting the value of a new
observation from the same population.
• A prediction interval is associated with the distribution of a random variable, while a
confidence interval is associated with the sampling distribution of a summary statistic.
• Prediction intervals are wider than confidence intervals.
• When the population SD is unknown, a 100(1−α)% prediction interval for a new observation
is m ± tα/2,n−1 · s√(1 + 1/n).
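A sketch comparing the confidence interval for the mean with the prediction interval for one new observation, assuming the standard forms m ± t·s/√n and m ± t·s·√(1 + 1/n). The caller supplies the t critical value from a t-table (here t0.025,9 ≈ 2.262 for n = 10). The data and the function name are hypothetical.

```python
import math
import statistics

def ci_and_pi(data, t_crit):
    """Return (confidence interval for the mean, prediction interval for a new value)."""
    n = len(data)
    m, s = statistics.mean(data), statistics.stdev(data)
    ci_half = t_crit * s / math.sqrt(n)            # m ± t * s / sqrt(n)
    pi_half = t_crit * s * math.sqrt(1 + 1 / n)    # m ± t * s * sqrt(1 + 1/n)
    return (m - ci_half, m + ci_half), (m - pi_half, m + pi_half)

data = [798, 802, 795, 801, 799, 804, 797, 800, 796, 803]   # hypothetical fill volumes (ml)
ci, pi = ci_and_pi(data, t_crit=2.262)                      # t_{0.025, 9} from a t-table
print("95% CI for the mean:", ci)
print("95% PI for a new bottle:", pi)
```

Dividing the two half-widths shows that the prediction interval is √(n + 1) times as wide as the confidence interval. This is why it stays wide even for large samples.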
