Statistics Notes For CLT & Sampling Techniques
The central limit theorem (CLT) is a statistical theory which states that when sufficiently large samples
are drawn from a population with finite variance, the sample means will be approximately normally
distributed, and the mean of the sample means will be approximately equal to the mean of the whole
population.
In other words, the central limit theorem states that for any population with mean μ and standard
deviation σ, the distribution of the sample mean for sample size n has mean μ and standard deviation
σ / √n, and it approaches a normal distribution as n grows.
As the sample size gets bigger and bigger, the mean of the sample will get closer to the actual population
mean. If the sample size is small, the distribution of the sample mean may or may not be normal, but as the
sample size gets bigger, it can be approximated by a normal distribution. This statistical theory is useful in
simplifying analysis, for example when dealing with stock indexes.
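The behaviour described above can be checked with a small simulation. The sketch below is illustrative
only: the exponential population, the seed, the sample size of 50 and the 5,000 repetitions are all
assumptions chosen for the demonstration, not values from these notes.

```python
import random
import statistics

def simulate_sample_means(n, repetitions, rate=1.0, seed=0):
    """Draw `repetitions` samples of size n from an exponential population
    (skewed, with mean 1/rate and standard deviation 1/rate) and return
    the mean of each sample."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(rate) for _ in range(n))
        for _ in range(repetitions)
    ]

n = 50
means = simulate_sample_means(n=n, repetitions=5000)

mean_of_means = statistics.fmean(means)      # should be close to mu = 1
spread_of_means = statistics.stdev(means)    # should be close to sigma / sqrt(n)
expected_spread = 1.0 / n ** 0.5             # sigma / sqrt(n) with sigma = 1

print(round(mean_of_means, 3), round(spread_of_means, 3), round(expected_spread, 3))
```

Even though the population here is strongly skewed, the mean of the sample means should land near μ and
their spread near σ / √n.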
The CLT can be applied to almost all types of probability distributions, but there are some exceptions. For
example, it does not hold if the population does not have a finite variance. Also, this theorem applies to
independent, identically distributed variables. It can also be used to answer the question of how big a
sample you need. Remember that as the sample size grows, the standard deviation of the sample average
falls because it is the population standard deviation divided by the square root of the sample size. This
theorem is an important topic in statistics. In many real-world applications, a certain random variable of
interest is a sum of a large number of independent random variables. In these situations, we can use the
CLT to justify using the normal distribution.
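The sample-size question mentioned above follows directly from σ / √n. As a minimal sketch, assuming the
goal is simply to bring the standard deviation of the sample average below a chosen target, the smallest
suitable n can be computed as below. The function name and the numbers are hypothetical.

```python
import math

def required_sample_size(sigma, target_standard_error):
    """Smallest n for which sigma / sqrt(n) <= target_standard_error."""
    if sigma <= 0 or target_standard_error <= 0:
        raise ValueError("sigma and target_standard_error must be positive")
    return math.ceil((sigma / target_standard_error) ** 2)

# Hypothetical values: population standard deviation 10, target of 2.
n = required_sample_size(sigma=10.0, target_standard_error=2.0)
print(n, 10.0 / math.sqrt(n))
```

Because n grows with the square of σ / target, halving the target standard deviation requires roughly four
times as many observations.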
The steps used to solve a central limit theorem problem involving "greater than" (>), "less than" (<) or
"between" are as follows:
1) Identify from the problem the mean, population size, standard deviation, sample size, and the number
associated with "greater than" or "less than", or the two numbers that bound the range for "between".
2) Compute the z-score of the sample mean, z = (x̄ − μ) / (σ / √n). For a "between" problem, compute one
z-score for each of the two bounds.
3) Refer to the z-table to find the probability that corresponds to each z-score obtained in the previous
step, and read off the area required by the case (above, below, or between the z-scores).
4) Convert the decimal obtained into a percentage. This last step is common to all three cases.
Example 1:
20 students are selected at random from a clinical psychology class. Find the probability that their mean
GPA is more than 5, given that the average GPA scored by the entire batch is 4.91 and the standard
deviation is 0.72.
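One way to work Example 1 numerically is sketched below. It uses the normal approximation from the CLT
with the numbers given in the problem. Note that n = 20 is below the usual n ≥ 30 rule of thumb, so the
result assumes that GPAs in the batch are themselves roughly normally distributed.

```python
from math import sqrt
from statistics import NormalDist

mu = 4.91      # population mean GPA (from the problem)
sigma = 0.72   # population standard deviation (from the problem)
n = 20         # sample size (from the problem)
x_bar = 5.0    # threshold for the sample mean

standard_error = sigma / sqrt(n)           # sigma / sqrt(n)
z = (x_bar - mu) / standard_error          # z-score of the threshold
p_greater = 1.0 - NormalDist().cdf(z)      # area to the right of z

print(f"z = {z:.3f}, P(mean GPA > 5) = {p_greater:.4f} ({p_greater:.1%})")
```

The z-score comes out at about 0.56, which gives a probability of roughly 0.29, or about 29%.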
Properties of the Central Limit Theorem:
We can summarize the properties of the Central Limit Theorem for sample means with the following
statements:
1. Samples may be drawn from a population with any distribution, with mean μ and standard deviation σ.
2. Provided that n is large (n ≥ 30, as a rule of thumb), the sampling distribution of the sample mean will
be approximately normally distributed with a mean μ and a standard deviation equal to σ/√n.
3. If the population distribution is normal, the sampling distribution of the sample means will be an exact
normal distribution for any sample size.
Sampling Techniques/Methods:
Sampling is an essential part of any research project. The right sampling method can make or
break the validity of your research, and it’s essential to choose the right method for your specific
question. In this article, we’ll take a closer look at some of the most popular sampling methods
and provide real-world examples of how they can be used to gather accurate and reliable data.
From simple random sampling to stratified sampling, we'll explore each method's pros, cons, and best
practices.
What is sampling?
Sampling is a technique of selecting individual members or a subset of the population to make
statistical inferences from them and estimate the characteristics of the whole population.
Different sampling methods are widely used by researchers in market research so that they do
not need to research the entire population to collect actionable insights.
It also saves time and cost, and hence forms the basis of any research design. Sampling
techniques can also be applied within research survey software.
For example, suppose a drug manufacturer would like to research the adverse side effects of a
drug on the country’s population. In that case, it is almost impossible to conduct a research
study that involves everyone. In this case, the researcher decides on a sample of people from
each demographic and then researches them, giving him/her indicative feedback on the drug’s
behavior.
Types of sampling: sampling methods
Sampling in market action research is of two types – probability sampling and non-probability
sampling. Let’s take a closer look at these two methods of sampling.
This blog discusses the various probability and non-probability sampling methods you can
implement in any market research study.
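Probability sampling is a technique in which members of the population are chosen at random
through a fixed selection process. For example, in a population of 1000 members, every member
will have a 1/1000 chance of being selected on any single draw. Probability sampling reduces
sampling bias and gives every member of the population a chance of being included in the sample.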
Simple random sampling: One of the best probability sampling techniques that helps in
saving time and resources is the Simple Random Sampling method. It is a reliable
method of obtaining information where the members of the sample are chosen
randomly, merely by chance. Each individual has the same probability of being chosen
to be a part of a sample.
For example, in an organization of 500 employees, if the HR team decides on conducting
team-building activities, they would likely choose the participants by picking chits out of a
bowl. In this case, each of the 500 employees has an equal opportunity of being selected.
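The chits-in-a-bowl idea can be sketched in a few lines. The employee labels, the sample size of 25 and
the seed below are hypothetical; only the population of 500 comes from the example.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Pick sample_size distinct members, each equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, sample_size)

# 500 employees, as in the example; the labels are made up.
employees = [f"employee_{i}" for i in range(1, 501)]
chosen = simple_random_sample(employees, sample_size=25, seed=42)
print(chosen[:5])
```

Sampling is done without replacement here, so no employee can be picked twice.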
Cluster sampling: Cluster sampling is a method where the researchers divide the entire
population into sections or clusters, each representing the population. Clusters are
identified based on demographic parameters like age, sex, location, etc., and a random
selection of whole clusters is then included in the sample. This makes it very simple for a
survey creator to derive effective inferences from the feedback.
For example, suppose the United States government wishes to evaluate the number of
immigrants living in the US. In that case, they can divide it into clusters based
on states such as California, Texas, Florida, Massachusetts, Colorado, Hawaii, etc. This
way of conducting a survey will be more effective as the results will be organized into
states and provide insightful immigration data.
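A minimal sketch of one-stage cluster sampling follows, assuming that whole clusters are picked at random
and every member of a picked cluster is surveyed. The resident records, the number of clusters to pick and
the seed are hypothetical; the state names are taken from the example.

```python
import random
from collections import defaultdict

def cluster_sample(records, cluster_key, clusters_to_pick, seed=None):
    """Group records by cluster, pick whole clusters at random, and return
    the picked cluster names together with all of their members."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for record in records:
        clusters[record[cluster_key]].append(record)
    picked = rng.sample(sorted(clusters), clusters_to_pick)
    members = [record for name in picked for record in clusters[name]]
    return picked, members

states = ["California", "Texas", "Florida", "Massachusetts", "Colorado", "Hawaii"]
# Made-up population: 600 residents spread evenly over the six states.
residents = [{"id": i, "state": states[i % len(states)]} for i in range(600)]

picked_states, sample = cluster_sample(residents, "state", clusters_to_pick=2, seed=1)
print(picked_states, len(sample))
```

A two-stage variant would draw a further random sample inside each picked cluster instead of surveying
everyone in it.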
Systematic sampling: Researchers use the systematic sampling method to choose the
sample members of a population at regular intervals. It requires selecting a starting
point for the sample and a sampling interval that is repeated until the sample is complete.
Because the interval is fixed in advance, this sampling technique is among the least
time-consuming.
For example, a researcher intends to collect a systematic sample of 500 people in a
population of 5000. He/she numbers each element of the population from 1-5000 and
will choose every 10th individual to be a part of the sample (Total population/ Sample
Size = 5000/500 = 10).
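The example above can be sketched as follows. The population of 5000 and the sample of 500 come from the
example; choosing the starting point at random within the first interval is an assumption, as is the seed.

```python
import random

def systematic_sample(population, sample_size, seed=None):
    """Take every k-th member, where k = population size // sample size,
    starting from a random position within the first interval."""
    interval = len(population) // sample_size       # 5000 // 500 = 10
    if interval < 1:
        raise ValueError("sample_size cannot exceed the population size")
    start = random.Random(seed).randrange(interval)  # index 0..interval-1
    return population[start::interval][:sample_size]

population = list(range(1, 5001))   # elements numbered 1 to 5000
sample = systematic_sample(population, sample_size=500, seed=7)
print(sample[:5], len(sample))
```

If the list has a periodic pattern that lines up with the interval, a systematic sample can be biased, so
the ordering of the population is worth checking first.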
Stratified random sampling: Stratified random sampling is a method in which the
researcher divides the population into smaller groups that don't overlap but together
represent the entire population. Once these groups are organized, the researcher draws a
random sample from each group separately.
For example, a researcher looking to analyze the characteristics of people belonging to
different annual income divisions will create strata (groups) according to the annual
family income, e.g. less than $20,000, $20,000 to $29,999, $30,000 to $39,999, $40,000
to $49,999, etc. By doing this, the researcher can draw conclusions about the characteristics
of people belonging to different income groups. Marketers can analyze which income groups to
target and which ones to eliminate to create a roadmap that would bear fruitful results.
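The income example can be sketched as below, assuming proportional allocation (the same fraction is drawn
from every stratum). The household data, the 10% fraction, the exact bracket boundaries and the seeds are
hypothetical.

```python
import random
from collections import defaultdict

def income_bracket(household):
    """Hypothetical non-overlapping brackets: under $20,000, then
    $10,000-wide bands, then $50,000 and above."""
    income = household["income"]
    if income < 20_000:
        return "under $20,000"
    if income >= 50_000:
        return "$50,000 and above"
    lower = income // 10_000 * 10_000
    return f"${lower:,} to ${lower + 9_999:,}"

def stratified_sample(records, stratum_of, fraction, seed=None):
    """Split records into strata and draw the same fraction from each one."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)
    sample = []
    for name in sorted(strata):
        members = strata[name]
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample

data_rng = random.Random(0)
households = [{"id": i, "income": data_rng.randrange(5_000, 80_000)} for i in range(1000)]
sample = stratified_sample(households, income_bracket, fraction=0.1, seed=3)
print(len(sample))
```

Drawing from every stratum guarantees that each income group appears in the sample, which a simple random
sample of the same size does not.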
Probability sampling is used for the following reasons:
Reduce Sample Bias: Using the probability sampling method, the bias in the sample
derived from a population is small, because members are selected by chance rather than
by the researcher's own judgment. Probability sampling leads to higher-quality data
collection as the sample appropriately represents the population.
Diverse Population: When the population is vast and diverse, it is essential to have
adequate representation so that the data is not skewed toward one demographic. For
example, suppose Square would like to understand the people that could use their
point-of-sale devices. In that case, a survey conducted from a sample of people across
the US from different industries and socio-economic backgrounds helps.
Create an Accurate Sample: Probability sampling helps the researchers plan and create
an accurate sample. This helps to obtain well-defined data.
The non-probability method is a sampling method that involves a collection of feedback based
on a researcher or statistician's sample selection capabilities and not on a fixed selection
process. In most situations, the output of a survey conducted with a non-probability sample
leads to skewed results, which may not represent the desired target population. But there are
situations, such as the preliminary stages of research or cost constraints for conducting
research, where non-probability sampling will be much more useful than the other type.
The following four types of non-probability sampling illustrate the purpose of this sampling
method:
Convenience sampling: This method depends on the ease of access to subjects such as
surveying customers at a mall or passers-by on a busy street. It is termed convenience
sampling because of the researcher's ease of carrying it out and getting in touch with the
subjects. Researchers have nearly no authority to select the sample elements, and selection
is based purely on proximity and not representativeness. This non-probability sampling
method is used when there are time and cost limitations in collecting feedback, such as in
the initial stages of research.
For example, startups and NGOs usually conduct convenience sampling at a mall to
distribute leaflets of upcoming events or promotion of a cause. They do that by
standing at the mall entrance and giving out pamphlets to whoever happens to pass by.
Judgmental or purposive sampling: Judgmental or purposive samples are formed at the
researcher's discretion. Researchers purely consider the purpose of the study, along
with the understanding of the target audience. For instance, suppose researchers want to
understand the thought process of people interested in studying for their master's
degree. The selection criterion will be: “Are you interested in doing your masters in …?”
and those who respond with a “No” are excluded from the sample.
Snowball sampling: Snowball sampling is a sampling method that researchers apply
when the subjects are difficult to trace. For example, surveying shelterless people or
illegal immigrants will be extremely challenging. In such cases, using the snowball
theory, researchers can track down a few members of the group to interview and ask them
to refer others, and derive results from the growing sample.
Researchers also implement this sampling method when the topic is highly sensitive and
not openly discussed, for example, surveys to gather information about HIV/AIDS. Not
many of the people affected will readily respond to the questions. Still, researchers can
contact people they might know or volunteers associated with the cause to get in touch
with them and collect information.
Quota sampling: In quota sampling, the selection of members happens based on a pre-set
standard. In this case, as a sample is formed based on specific attributes, the created
sample will have the same qualities, with respect to those attributes, as the total
population. It is a rapid method of collecting samples.
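As a rough sketch of quota sampling, the code below accepts respondents in the order they arrive until the
pre-set quota for each group is full. The age groups, the quotas and the arrival data are hypothetical, and
the first-come acceptance rule is an assumption made for illustration.

```python
def quota_sample(respondents, quota_key, quotas):
    """Accept respondents in arrival order until every group's quota is full.
    Selection is not random, which is what makes this non-probability sampling."""
    remaining = dict(quotas)
    sample = []
    for person in respondents:
        group = person[quota_key]
        if remaining.get(group, 0) > 0:
            sample.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas are filled
    return sample

groups = ("18-34", "35-54", "55+")
# Made-up stream of 60 respondents cycling through the three age groups.
arrivals = [{"id": i, "age_group": groups[i % 3]} for i in range(60)]

sample = quota_sample(arrivals, "age_group", {"18-34": 5, "35-54": 3, "55+": 2})
print(len(sample))
```

The quotas fix how many members of each group end up in the sample, but who fills each quota depends on
who happens to be reached first.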
Difference between probability sampling and non-probability sampling methods
Alternatively known as
Probability sampling: Random sampling method.
Non-probability sampling: Non-random sampling method.

Population selection
Probability sampling: The population is selected randomly.
Non-probability sampling: The population is selected arbitrarily.

Sample
Probability sampling: Since there is a method for deciding the sample, the population demographics are
conclusively represented.
Non-probability sampling: Since the sampling method is arbitrary, the population demographics
representation is almost always skewed.

Time taken
Probability sampling: Takes longer to conduct since the research design defines the selection parameters
before the market research study begins.
Non-probability sampling: This type of sampling method is quick since neither the sample nor the selection
criteria of the sample are defined in advance.