Asm Assignment
Introduction
Sampling is the process of selecting and studying a small group of people drawn from a larger group in order to reach conclusions that are likely to apply to the larger group as a whole. Researchers use it to collect data from a subset rather than studying the full population, which is often impracticable or impossible.
Example:
i. For a research study you need to collect data. Let us suppose that as a researcher, you want
to study the association between role model of parents and undesirable behaviour of
children in a home for street children. For this, you have to select a few representative cases
from the home. The process of selection requires thorough knowledge of various sampling
techniques.
ii. Consider a company that wants to understand consumer preferences for a new product. Instead of surveying all potential customers, it selects a random sample of 1000 people from its target market. This sample is analyzed to draw inferences that help the company make an informed decision.
Advantages of Sampling
2. Role of Sampling Theory in Statistical Inference
Sampling theory plays a crucial role in statistical research by providing the framework for drawing conclusions about a population from a subset of observations (a sample). In practice, it is usually impossible or prohibitively costly to collect data from an entire population, which makes sampling essential for analysis and decision making.
Population: The entire group of individuals or events that a statistician wants to study and draw conclusions about.
Types of Sampling:
● Random Sampling: Every entity has an equal chance of being selected in the sample.
● Stratified Sampling: The population is divided into strata, and random samples are taken from each stratum.
● Cluster Sampling: The population is divided into clusters, and whole clusters are selected at random.
The Central Limit Theorem (CLT) states that the distribution of the sample mean will be approximately normal when the sample size is large enough, regardless of the shape of the population distribution. The CLT helps statisticians make inferences using the properties of the normal distribution without knowing the underlying distribution of the population.
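A minimal R sketch of this idea, using a simulated (and deliberately skewed) exponential population as an assumed example: although the individual observations are far from normal, the means of repeated samples of size 50 are approximately normally distributed.

# Draw 5000 samples of size 50 from a skewed exponential population
# and look at the distribution of their sample means.
set.seed(42)
sample_means <- replicate(5000, mean(rexp(50, rate = 1)))

hist(sample_means, breaks = 40,
     main = "Sampling distribution of the mean (n = 50)",
     xlab = "Sample mean")
qqnorm(sample_means)   # points close to the reference line indicate
qqline(sample_means)   # approximate normality, as the CLT predicts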
1. Estimation of Population Parameters: Sample mean and sample variance are used to estimate
population mean and population variance.
2. Hypothesis testing: Test statistics computed from the sample, based on unbiased estimators, are used to decide between the null and alternative hypotheses.
3. Confidence Intervals: Sample statistics are used to construct intervals that, with a stated level of confidence, are expected to contain the true population parameter.
Suppose we want to estimate the average height of students at BITS. Measuring everyone’s height is not feasible, so we select a random sample of 200 students and calculate their average height, which comes out to be 170 cm. Using the CLT to justify approximate normality of the sample mean, the corresponding 95% confidence interval is (168 cm, 172 cm).
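A rough R sketch of this calculation (the 200 heights are simulated here, since the real measurements are not available), showing a CLT-based 95% interval:

# Simulated stand-in for the 200 measured heights (in cm)
set.seed(1)
heights <- rnorm(200, mean = 170, sd = 14)

x_bar <- mean(heights)
se    <- sd(heights) / sqrt(length(heights))     # standard error of the mean
ci    <- x_bar + c(-1, 1) * qnorm(0.975) * se    # CLT-based 95% interval
round(ci, 1)

t.test(heights)$conf.int   # equivalent built-in (t-based) interval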
1) Simple Random Sampling
● A type of probability sampling technique in which every member of the population has an equal chance of being selected.
● Example: If we want to survey 50 students out of 5000 students living in the hostels at BITS, we assign a unique number to each student and use a random number generator to pick 50 students.
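A minimal R sketch of this selection, assuming the 5000 students are identified by the numbers 1 to 5000:

# Simple random sample of 50 student IDs out of 5000, each equally likely
set.seed(7)
student_ids <- 1:5000
chosen <- sample(student_ids, size = 50)   # without replacement by default
head(chosen)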
2) Systematic Sampling
●A type of probability sampling technique in which sample members from a larger population
are selected according to a random starting point but with a fixed, periodic interval.
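A short R sketch of systematic selection, again assuming 5000 numbered students and a desired sample of 50:

# Every k-th student after a random starting point
set.seed(7)
N <- 5000; n <- 50
k <- N %/% n                      # sampling interval (here 100)
start <- sample(1:k, 1)           # random start between 1 and k
chosen <- seq(from = start, by = k, length.out = n)
head(chosen)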
3) Stratified Sampling
● A type of probability sampling technique in which the total population is divided into homogeneous groups (strata), and samples are then drawn from each stratum.
● It is of two types: proportionate stratified sampling and disproportionate stratified sampling.
● Example: BITS has both undergraduate (UG) and postgraduate (PG) students across multiple disciplines. If we want all programs to be proportionally represented in our survey, we can divide the student population into strata by department and then use random sampling within each stratum.
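A minimal R sketch of this idea, using made-up department labels; here a fixed number of students is drawn from each stratum (proportional allocation would instead scale each stratum's sample size by its share of the population):

set.seed(7)
students <- data.frame(
  id   = 1:5000,
  dept = sample(c("CS", "EEE", "Mech", "Bio"), 5000, replace = TRUE)
)

# Draw 10 students at random from every department (stratum)
strat_sample <- do.call(rbind, lapply(split(students, students$dept),
                                      function(s) s[sample(nrow(s), 10), ]))
table(strat_sample$dept)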
4) Cluster Sampling
● A type of probability sampling technique in which the population is divided into clusters of people with broadly homogeneous characteristics, and every cluster has an equal chance of being selected into the sample.
● It is of three types: single-stage cluster sampling, two-stage cluster sampling, and multistage cluster sampling.
●Example: BITS has multiple hostels across its campus. Instead of surveying individual
students from all hostels, we could randomly select three hostels and survey all students from
those hostels.
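A minimal R sketch with hypothetical hostel names: three hostels (clusters) are chosen at random and every student in those hostels is included.

set.seed(7)
students <- data.frame(
  id     = 1:5000,
  hostel = sample(paste0("Hostel_", LETTERS[1:10]), 5000, replace = TRUE)
)

picked_hostels <- sample(unique(students$hostel), 3)   # randomly chosen clusters
cluster_sample <- students[students$hostel %in% picked_hostels, ]
table(cluster_sample$hostel)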
Non-Probability Sampling Techniques
Non-probability sampling techniques have several advantages: they are often more cost-effective and quicker to implement than probability-based methods, and they are particularly useful in exploratory research or when the target population is difficult to access.
There are a few limitations to non-probability sampling techniques. First, there is a higher risk of
sampling bias since not all members of the population have an equal chance of being selected.
Additionally, the results may not be generalizable to the entire population due to the lack of randomness
in the selection process.
1. Convenience Sampling
This is the most basic form of non-probability sampling, in which samples are drawn directly from whichever part of the population is conveniently available to the researcher. Because the samples are chosen for ease of access, the researcher makes no attempt to ensure that they represent the entire population.
Even though this is quick and convenient, there is a high chance that the results will be biased.
● Example: A student researcher studying the eating habits of college students may survey
classmates or students at the campus library simply because they are easy to access and
willing to participate.
2. Purposive/Judgmental Sampling
Purposive sampling is based on the presumption that, with good judgment, the researcher can select sample units that satisfy the requirements of the study.
A common strategy in this technique is to select cases judged to be typical of the population of interest, assuming that errors of judgment in the selection will tend to counterbalance one another.
3. Quota Sampling
In the quota sampling method, the researcher forms a sample of individuals chosen to represent the population with respect to specific traits or qualities, selecting subsets that together provide data intended to generalize to the entire population.
This gives samples that represent diverse groups; however, the selection within each group is non-random and based on convenience.
● Example: A student researcher studying the effect of social media on students’ academic
performance might set a quota to include 50 undergraduate students and 50 higher degree
students.
4. Snowball Sampling
Snowball sampling, also known as referral or respondent-driven sampling, is invaluable for accessing
hard-to-reach or elusive populations such as homeless people, teenagers, drug users, or other hidden
populations. Initial participants are recruited based on some criteria, who then refer others within their
networks, creating a snowball effect. The researchers only have control over the initial respondents who
in turn bring in more such respondents.
1. Ethical Guidelines
Ethical considerations ensure that participants are respected, their rights are protected, and the research
remains credible.
2. Technical Guidelines
Technical guidelines ensure that the sampling process is scientifically sound and yields reliable, valid
data for statistical inference.
When collecting data from a population, one of the key assumptions we make in statistical analysis is
that the data is independent and identically distributed (i.i.d). This helps us make sure that the results
we get from the data are accurate for the whole group. In simple terms, independent means that one
sample doesn’t affect another, and identically distributed means that all the samples follow the same
pattern or distribution.
• Independent: This means that the selection of one sample does not influence the selection of
another. For example, flipping a coin multiple times is independent because each flip doesn’t affect
the next.
• Identically Distributed: All samples must come from the same probability distribution. For
instance, if we measure the height of students in a class, all heights follow the same distribution (e.g.,
normal distribution).
Example (R programming):
Random Sampling: We draw two samples from the same normal distribution. Both samples are
independent of each other, and they follow the same probability distribution (mean = 50, sd = 10).
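A sketch of the set-up described above (the exact code is not shown, so this is a reconstruction with an assumed sample size of 1000 per sample):

set.seed(123)
sample1 <- rnorm(1000, mean = 50, sd = 10)   # independent draws from N(50, 10)
sample2 <- rnorm(1000, mean = 50, sd = 10)   # same distribution, drawn separately

hist(sample1, col = rgb(0, 0, 1, 0.4), breaks = 30,
     main = "Two samples from the same distribution", xlab = "Value")
hist(sample2, col = rgb(1, 0, 0, 0.4), breaks = 30, add = TRUE)
plot(sample1, sample2, main = "No visible pattern: samples are independent")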
Results: The histograms overlap, showing that both samples follow the same distribution, and the
scatter plot does not show any clear pattern, indicating independence.
Non-i.i.d. Sampling: In this case, we draw samples from two different distributions, where the second sample has a different mean and standard deviation (mean = 60, sd = 15), so the samples are not identically distributed. In addition, a third sample is generated alongside samples 1 and 2 that depends on sample 1, so samples 1 and 3 are not independent.
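A corresponding sketch (again a reconstruction; sample 3 is assumed here to be built by adding small noise to sample 1, which makes it dependent on sample 1):

set.seed(123)
sample1 <- rnorm(1000, mean = 50, sd = 10)
sample2 <- rnorm(1000, mean = 60, sd = 15)           # different mean and sd
sample3 <- sample1 + rnorm(1000, mean = 0, sd = 2)   # constructed from sample1

hist(sample1, col = rgb(0, 0, 1, 0.4), breaks = 30,
     main = "Samples from different distributions", xlab = "Value")
hist(sample2, col = rgb(1, 0, 0, 0.4), breaks = 30, add = TRUE)
plot(sample1, sample3, main = "Clear pattern: samples are not independent")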
Results: The scatter plot shows a clear relationship between samples 1 and 3, indicating that they are not independent, and the histograms do not overlap, showing that the samples are not identically distributed.
To ensure independence in sampling, each sample must be chosen without affecting the others.
Random sampling helps by giving everyone an equal chance of being selected. In small groups, if you
don’t return the samples after selection (sampling without replacement), future picks are affected, so
sampling with replacement is better to keep the choices independent. Bias in methods like quota
sampling and others can also ruin independence by making some samples more likely to be chosen. If
the samples aren’t independent, the results might not be reliable, leading to incorrect conclusions.
For identically distributed samples, all samples should follow the same pattern or distribution.
Stratified sampling, where the population is divided into groups and samples are taken from each,
helps achieve this. Removing outliers also ensures that the data is consistent. If the samples aren’t
identically distributed, the analysis might be biased, and the conclusions may not represent the true
population. Before sampling, make sure the population has consistent characteristics. After sampling,
use graphs like histograms or control charts to verify that the samples are distributed the same way.
1. Handling Missing Data
• Listwise Deletion: This method removes any observation that contains missing values. Although intuitive, it can reduce the sample size dramatically if many variables have missing values. Listwise deletion is typically advisable only if data is missing completely at random (MCAR), meaning the missingness is unrelated to anything in the data.
• Imputation: A more elaborate alternative is to fill in missing values with plausible estimates. The most intuitive method is mean imputation, which simply replaces missing values with the average of the observed values for that variable. A more sophisticated approach is multiple imputation (MI), in which missing values are predicted using regression models based on their relationship with other variables.
Both methods have their strengths and weaknesses, but multiple imputation tends to be more effective
in preserving the underlying structure of the data, thereby minimizing bias.
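A small R sketch on made-up data, showing listwise deletion and mean imputation (multiple imputation is normally done with a dedicated package such as mice and is only noted here):

df <- data.frame(age    = c(23, 31, NA, 45, 29),
                 income = c(40, NA, 52, 61, 48))

# Listwise deletion: drop every row that has any missing value
complete_rows <- na.omit(df)

# Mean imputation: replace NAs with the mean of the observed values
imputed <- df
imputed$age[is.na(imputed$age)]       <- mean(df$age, na.rm = TRUE)
imputed$income[is.na(imputed$income)] <- mean(df$income, na.rm = TRUE)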
2. Outlier Detection
Outliers are values that lie very far from the other values. They may reflect genuine differences in the data, or they may simply be errors that occurred during data collection. Outliers need to be treated carefully because they can mislead statistical inferences, particularly those based on summary measures like the mean or standard deviation. Here are several methods for detecting outliers:
• Z-Scores: A Z-score measures the distance of a data point from the mean, expressed in standard deviations. A data point whose Z-score exceeds 3 in absolute value is often considered an outlier in normally distributed data.
• Interquartile Range (IQR): This method identifies outliers as values that lie more than 1.5 times the IQR below the first quartile (Q1) or above the third quartile (Q3), where the IQR is the range between Q1 and Q3.
o Example: If analysing housing prices, homes priced far above the upper quartile might be
identified as outliers.
Outliers can be handled by Winsorization, where extreme values are replaced with the nearest valid
values, or by log transformation, which reduces the impact of outliers in right-skewed data, such as
income.
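A short R sketch on simulated prices, showing both detection rules and a simple Winsorization at the IQR fences:

set.seed(1)
prices <- c(rnorm(100, mean = 300, sd = 50), 900, 1200)   # two extreme values added

# Z-score rule: flag points more than 3 standard deviations from the mean
z <- (prices - mean(prices)) / sd(prices)
z_outliers <- prices[abs(z) > 3]

# IQR rule: flag points beyond 1.5 * IQR from the quartiles
q   <- quantile(prices, c(0.25, 0.75))
iqr <- q[2] - q[1]
iqr_outliers <- prices[prices < q[1] - 1.5 * iqr | prices > q[2] + 1.5 * iqr]

# Winsorization: cap extreme values at the IQR fences
winsorized <- pmin(pmax(prices, q[1] - 1.5 * iqr), q[2] + 1.5 * iqr)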
3. Data Pre-Processing
Before running any statistical models, it is important to ensure that the data is in a consistent format and
prepared for analysis. Pre-processing involves several steps:
• Standardization: Many statistical techniques, including linear regression, work best when the variables of interest are measured on comparable scales. Standardization rescales the data so that its mean is 0 and its standard deviation is 1. This is useful for handling variables measured in different units or magnitudes, such as height in centimeters and weight in kilograms.
• Normalization: This technique scales the data to a fixed range, often between 0 and 1.
Normalization is crucial when using algorithms sensitive to scale, such as k-nearest neighbours.
• Encoding Categorical Variables: Categorical data must be converted into numerical form before modelling, for example through one-hot encoding.
o Example: In a dataset of survey responses, one-hot encoding would create separate columns for each response category, assigning a 1 or 0 depending on the respondent’s answer.
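A minimal R sketch on a made-up data frame, illustrating standardization, min-max normalization, and one-hot encoding:

df <- data.frame(height_cm = c(160, 172, 181, 168),
                 weight_kg = c(55, 70, 85, 62),
                 response  = factor(c("yes", "no", "maybe", "yes")))

standardized <- scale(df[, c("height_cm", "weight_kg")])    # mean 0, sd 1 per column

min_max    <- function(x) (x - min(x)) / (max(x) - min(x))  # rescale to [0, 1]
normalized <- sapply(df[, c("height_cm", "weight_kg")], min_max)

one_hot <- model.matrix(~ response - 1, data = df)          # one 0/1 column per category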
A further step is to verify the statistical assumptions behind the intended analysis.
• Example: In time series data, the presence of autocorrelation might violate the i.i.d. assumption. Transformations like differencing can restore this property.
Another important check is normality: many statistical models assume that the data are normally distributed.
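Two quick normality checks in R, shown here on simulated data standing in for the variable being analysed: a Q-Q plot and the Shapiro-Wilk test.

set.seed(1)
x <- rnorm(200, mean = 170, sd = 10)   # stands in for the variable being checked

qqnorm(x)          # points near the line suggest approximate normality
qqline(x)
shapiro.test(x)    # large p-value: no evidence against normality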
Several software tools and programming languages can facilitate the data cleaning process:
•Python: Libraries like pandas, NumPy, and scikit-learn are effective for handling missing data,
detecting outliers, and transforming variables.
•R: Packages such as dplyr and tidyverse excel in data manipulation and cleaning.
•MATLAB: Offers strong built-in functions for data cleaning and pre-processing, particularly in
engineering and scientific fields.
By leveraging these tools, researchers can streamline the data cleaning process, making it more efficient
and accurate.