Sampling Theory
Reducing Bias:
•Reason: In some cases, working with a sample can help reduce bias in the data or
mitigate the impact of outliers.
•Example: If a dataset contains outliers or is heavily skewed, a sample might provide a
more balanced representation for model training.
Model Testing and Evaluation:
•Reason: Sampling is crucial during the testing and evaluation phase of model
development to assess performance on unseen data.
•Example: After training a model, it is essential to evaluate its performance on a separate
sample (validation or test set) to ensure its generalization to new, unseen data.
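The hold-out idea above can be sketched with Python's standard library; the toy dataset of 100 rows and the 80/20 split ratio are illustrative assumptions, not part of the original example:

```python
import random

data = list(range(100))                 # toy dataset of 100 observations
random.seed(42)

# Hold out 20% of the observations as an unseen test set.
test_idx = set(random.sample(range(len(data)), k=20))
train = [x for i, x in enumerate(data) if i not in test_idx]
test = [x for i, x in enumerate(data) if i in test_idx]

print(len(train), len(test))  # 80 20
```

Because the held-out rows never enter training, performance on them approximates performance on new, unseen data.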
Computational Efficiency:
•Reason: Training machine learning models on large datasets can be computationally
expensive and time-consuming.
•Example: If you're building a model to predict customer preferences, it might be more
practical to work with a sample of the customer data rather than the entire dataset.
Sampling with and without replacement
Sampling without replacement: a subset of the observations is selected randomly, and once an observation is selected it cannot be selected again.
•In sampling without replacement, once an item is selected from the population, it is not
returned before the next selection.
•This means that each subsequent selection has a reduced pool of items to choose from.
•Once an item is selected, it cannot be selected again in the same sample.
•Sampling without replacement is often used when dealing with finite populations or when
it's important to ensure that each item is selected only once.
•E.g. Continuing with the dataset of 1000 images of cats and dogs, in cross-validation, you
divide the dataset into 5 equal parts. After each selection of a validation set, you do not put
the images back into the dataset before the next selection. This ensures that each image is
used exactly once for validation across all iterations of cross-validation, without
replacement.
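A minimal sketch of this 5-fold partition, using placeholder image names rather than real data:

```python
import random

# Stand-in names for the 1000 cat/dog images in the example.
images = [f"img_{i}" for i in range(1000)]
random.seed(0)
random.shuffle(images)

# Split into 5 disjoint folds: each image lands in exactly one
# validation fold, i.e. sampling without replacement.
k = 5
fold_size = len(images) // k
folds = [images[i * fold_size:(i + 1) * fold_size] for i in range(k)]

all_val = [img for fold in folds for img in fold]
print(len(set(all_val)))  # 1000 — every image used exactly once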
Sampling with replacement: a subset of observations is selected randomly, and an observation may be selected more than once.
•In sampling with replacement, each selected item is returned to the population before
the next item is selected.
•This means that each item in the population has the same probability of being selected
each time.
•Consequently, it's possible to select the same item more than once in the sample.
•Sampling with replacement is commonly used in situations where the population is
either very large or infinite.
•E.g. Imagine you have a dataset of 1000 images of cats and dogs for a classification
task. You randomly select 100 images from the dataset, and after each selection, you put
the image back into the dataset before the next selection. This allows some images to be
selected multiple times for training the model.
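Both schemes can be illustrated with Python's `random` module: `random.choices` draws with replacement, `random.sample` without. The image dataset is represented here by simple integer IDs:

```python
import random

random.seed(1)
images = list(range(1000))  # IDs standing in for the 1000 images

# With replacement: the same image can be drawn more than once.
with_repl = random.choices(images, k=100)

# Without replacement: each image can be drawn at most once.
without_repl = random.sample(images, k=100)

print(len(set(without_repl)))  # always 100: duplicates are impossible
print(len(set(with_repl)))     # may be below 100 if any image repeated
```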
Sampling methods
1.Probability sampling:
Probability sampling is a sampling technique in which the researcher fixes the selection criteria and then chooses members of the population at random. Under this selection process, all members have an equal opportunity to be included in the sample.
For example, in a population of 1000 members, every member has a 1/1000 chance of being selected as part of a sample. Probability sampling reduces sampling bias because every member of the population has a chance of being included in the sample.
2.Non-probability sampling:
In non-probability sampling, the researcher chooses members for research without a fixed or predefined random selection process, for example on the basis of convenience or judgement. As a result, not all population elements have an equal opportunity to be included in the sample.
For example:
Suppose a teacher has to choose 4 participants from a class of 30 students for a debate competition. Here, the teacher may select the top 4 debaters on the basis of her own judgement about who the best debaters in the class are. This is an example of purposive sampling: the purpose of the sample guides the choice of certain members or units of the population. Here, not every member of the population has the same chance of being selected.
Simple random sampling
In the simple random sampling technique, every item in the population has an equal chance of being selected for the sample. Since the selection depends entirely on chance, this method is known as the "method of chance selection". When the sample size is large and items are chosen randomly, the resulting sample is known as a "representative sample".
Example:
Suppose we want to select a simple random sample of 200 students from a school of 500 students. Here, we can assign a number from 1 to 500 to every student in the school database and use a random number generator to select 200 of those numbers.
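A sketch of this procedure with Python's `random.sample`, assuming the school has 500 students numbered 1 to 500:

```python
import random

random.seed(7)
student_ids = range(1, 501)  # every student numbered 1..500

# Simple random sample: each student has the same chance of selection,
# and no student can be chosen twice.
sample = random.sample(student_ids, k=200)

print(len(sample), len(set(sample)))  # 200 200
```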
When conducting research or analysis, random sampling helps ensure that the sample is
representative of the population, meaning that the characteristics and attributes of the
sample closely resemble those of the entire population.
This allows researchers to make valid inferences or generalizations about the population
based on the data collected from the sample.
Population Parameter
A parameter is a characteristic of a population. Population parameters are
numerical values that describe various characteristics of a population. These
parameters provide a summary of the entire population and are typically
unknown because it's often impractical or impossible to measure every
individual in the population. Instead, researchers use statistical techniques to
estimate these parameters based on data collected from a sample of the
population.
These parameters are used in statistical analysis to make inferences about the
population, test hypotheses, and draw conclusions. Estimating population
parameters from sample data involves using statistical methods such as point
estimation and interval estimation. Point estimation provides a single value
estimate for a population parameter, while interval estimation provides a range
of values within which the parameter is likely to lie, along with a level of
confidence.
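Point and interval estimation can be illustrated on a toy sample. The measurements below are made up for illustration, and the 1.96 critical value assumes a normal approximation (a t critical value would be more appropriate for a sample this small):

```python
import math
import statistics

sample = [4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4, 4.7, 5.2, 5.5]  # toy data

n = len(sample)
mean = statistics.mean(sample)   # point estimate of the population mean
sd = statistics.stdev(sample)    # sample standard deviation (n - 1 divisor)

# 95% interval estimate: point estimate plus/minus a margin of error.
half_width = 1.96 * sd / math.sqrt(n)
print(round(mean, 2))
print((round(mean - half_width, 2), round(mean + half_width, 2)))
```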
Sample Statistics
Sample statistics are numerical values calculated from data collected
from a sample. These statistics provide information about the
characteristics of the sample and are used to estimate population
parameters or make inferences about the population as a whole.
Common sample statistics include the sample mean, sample standard
deviation, sample median, sample variance, etc. Sample statistics are
often used in statistical analysis to summarize data, test hypotheses, and
draw conclusions.
These sample statistics are used to estimate their corresponding
population parameters. However, sample statistics are subject to
sampling variability, meaning that they may differ from one sample to
another. The accuracy of sample statistics as estimators of population
parameters depends on factors such as sample size, sampling method,
and the representativeness of the sample.
PARAMETER VS STATISTIC:
(1) A parameter is a fixed measure describing the whole population (a population is a
group of people or things with common characteristics). A statistic, on the other
hand, is a characteristic of a sample, a portion of the target population.
(2) A parameter is a fixed, unknown numerical value, while a statistic is a known
number and a variable that depends on the portion of the population sampled.
(3) Sample statistics and population parameters use different statistical notations.
(4) Parameters do not change, while statistics vary from sample to sample.
(5) A parameter is a characteristic of a population; a statistic is a characteristic of
a sample.
(6) A statistic computed from a sample is used to make a guess (estimate) about a
population parameter.
                     Population parameter   Sample statistic
Sample size          N                      n
Mean                 μ                      x̄
Variance             σ²                     s²
Standard deviation   σ                      s
Sample Mean
•The sample mean, denoted by x̄, is a measure of central tendency that represents the average
value of a set of data points in a sample.
•It provides a single numerical value that summarizes the center of the distribution of data
in the sample.
•The sample mean is calculated by summing up all the individual values in the sample and
dividing by the total number of values (sample size).
•Mathematically, it can be represented as:
   x̄ = (1/n) Σ xᵢ   (summing i from 1 to n)
•where:
 • x̄ is the sample mean,
 • xᵢ represents each individual data point,
 • n is the total number of data points in the sample.
•The sample mean provides an estimate of the population mean when the sample is drawn
from a larger population.
•It represents the "typical" value or average value observed in the sample.
•The sample mean is influenced by the values of all data points in the sample.
•The sample mean is a point estimator, meaning it provides a single estimate of the
population mean.
•It is sensitive to outliers in the data, as extreme values can disproportionately influence
the calculation of the mean.
•The sample mean is an unbiased estimator of the population mean, meaning that, on
average, it provides an accurate estimate of the population mean when multiple samples
are drawn.
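The unbiasedness claim can be checked empirically: averaged over many samples, the sample means cluster around the population mean. The synthetic population and tolerance below are illustrative assumptions:

```python
import random
import statistics

random.seed(0)
population = [random.gauss(50, 10) for _ in range(10_000)]
mu = statistics.mean(population)  # true population mean

# Draw many samples of size 30; the average of the sample means
# should sit close to mu, showing there is no systematic bias.
sample_means = [
    statistics.mean(random.sample(population, k=30)) for _ in range(2_000)
]
print(abs(statistics.mean(sample_means) - mu) < 0.5)  # True
```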
•It is used extensively in data analysis, inference, and decision-making
processes across various fields, including science, business, and social
sciences.
Sample Variance
•Sample variance can be defined as the average of the squared differences of the data points
from the mean of the data set.
•It is an absolute measure of dispersion and is used to check the deviation of data points with
respect to the data's average.
•The formula to calculate sample variance:
   s² = Σ (xᵢ − x̄)² / (n − 1)   (summing i from 1 to n)
where:
•s² is the sample variance,
•xᵢ represents each individual data point,
•x̄ is the sample mean,
•n is the total number of data points in the sample.
•Sample variance measures the average squared deviation of data points from the sample
mean.
•Larger variance values indicate greater variability among data points, while smaller values
suggest more consistency.
•Sample variance is commonly used in inferential statistics to estimate population variance.
•It serves as a crucial parameter in hypothesis testing and constructing confidence intervals.
•Sample variance can help identify outliers or extreme values that significantly affect the
overall variability of the sample.
•Sample variance helps assess the consistency or variability of data within a sample.
•Its importance extends across various fields, including scientific research,
business analytics, quality control, finance, and risk management.
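A quick sketch of the formula on a small toy dataset, checked against `statistics.variance` (which also divides by n − 1):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # toy dataset

n = len(data)
mean = sum(data) / n
# Sample variance: squared deviations from the mean, divided by n - 1.
s2 = sum((x - mean) ** 2 for x in data) / (n - 1)

print(s2)  # 4.571428571428571 (i.e. 32/7)
print(abs(s2 - statistics.variance(data)) < 1e-12)  # True
```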
Unbiased estimate
Definition:
• An unbiased estimate is a statistical estimator whose expected value is equal to the true
population parameter being estimated. In simpler terms, an unbiased estimator, when used
repeatedly, produces estimates that are on average equal to the true value of the parameter
being estimated.
Mathematical Definition:
• Mathematically, an estimator θ̂ is unbiased if its expected value E(θ̂) equals the true population
parameter θ: E(θ̂) = θ, where
• θ̂ is the estimator,
• E(θ̂) represents the expected value of the estimator,
• θ is the true population parameter being estimated.
• This means that, on average, the estimator provides an accurate estimate of the population
parameter across multiple samples.
•For example, when estimating the population mean (μ) using the sample mean x̄, the
sample mean is an unbiased estimator because: E(x̄) = μ
This means that, on average, the sample mean accurately estimates the population
mean.
•Unbiased estimators are desirable because they provide estimates that are not
systematically too high or too low.
•They provide a fair and consistent estimate of the population parameter across different
samples.
•Unbiasedness is a desirable property when evaluating the performance of estimators, as it
ensures that the estimator does not systematically overestimate or underestimate the
population parameter.
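The role of the n − 1 divisor in the sample variance can be seen by simulation: dividing by n systematically underestimates the population variance, while dividing by n − 1 does not. The synthetic population and tolerance are illustrative assumptions:

```python
import random
import statistics

random.seed(0)
population = [random.gauss(0, 2) for _ in range(10_000)]
sigma2 = statistics.pvariance(population)  # true population variance

biased, unbiased = [], []
for _ in range(3_000):
    s = random.sample(population, k=5)
    m = sum(s) / len(s)
    ss = sum((x - m) ** 2 for x in s)
    biased.append(ss / len(s))           # divide by n: biased low
    unbiased.append(ss / (len(s) - 1))   # divide by n - 1: unbiased

print(statistics.mean(biased) < sigma2)  # True: systematic underestimate
print(abs(statistics.mean(unbiased) - sigma2) < sigma2 * 0.1)
```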
Efficient estimate
Definition:
An efficient estimator is a statistical estimator that achieves the smallest possible variance
among all unbiased estimators for a given sample size. In other words, an efficient estimator
minimizes the variability of estimates and provides the most precise estimate of the
population parameter.
Mathematically, let θ̂ be an estimator for a population parameter θ. The efficiency of θ̂ is
determined by comparing its variance with the variances of all other unbiased estimators of
θ.
If θ̂ is the estimator with the smallest variance among all unbiased estimators, it is said to be
efficient. In other words, for every other unbiased estimator θ̃ of θ:
Var(θ̂) ≤ Var(θ̃)
• Efficiency is a desirable property because it ensures that estimates are not only unbiased but
also precise.
• Efficient estimators produce estimates with the least amount of sampling variability, making
them more reliable and informative.
• Efficient estimates provide more reliable estimates of population parameters. They minimize
the variability in estimates, making them more consistent and accurate.
• By reducing variability in estimates, efficient estimators provide a clearer picture of the
underlying data. This leads to a better understanding of the phenomenon being studied.
• In predictive modeling and forecasting, precise estimates are essential for accurate
predictions. Efficient estimates contribute to better forecasting models and more reliable
predictions.
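Efficiency can be illustrated by comparing two estimators of the centre of a normal population: the sample mean and the sample median are both unbiased here, but the mean has the smaller sampling variance and is therefore the more efficient estimator. The population parameters below are illustrative:

```python
import random
import statistics

random.seed(0)

# Repeatedly draw samples from a normal population and record both
# estimators of its centre.
means, medians = [], []
for _ in range(3_000):
    s = [random.gauss(100, 15) for _ in range(25)]
    means.append(statistics.mean(s))
    medians.append(statistics.median(s))

# The estimator with the smaller variance across samples is more efficient.
print(statistics.variance(means) < statistics.variance(medians))  # True
```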