
DAT100_Int_Data_Ana_Lec4_Obtaining_Data

The document provides an overview of data and statistics, emphasizing the importance of understanding populations and samples in data collection. It outlines methods for obtaining data, including observational and experimental approaches, as well as various sampling techniques such as random, stratified, and convenience sampling. Additionally, it distinguishes between descriptive and inferential statistics, highlighting their roles in analyzing and interpreting data.

Uploaded by

Bahaa Mohd
Copyright
© © All Rights Reserved

DAT 100 - Introduction to Data Analytics
Lecture 4 – Obtaining Data

Dr. Ghazi Al-Naymat January 24, 2025

Data and Statistics

Data consists of information coming from observations, counts, measurements, or responses.
Statistics is the science of collecting, organizing, analyzing, and interpreting data in order to make decisions.
A population is the collection of all outcomes, responses, measurements, or counts that are of interest.
A sample is a subset of a population.

Population

The population can be seen as those you care about, or those related to your question.

The population involves those we are trying to talk about.

Example:
• If you are trying to test whether smoking leads to heart disease, your population would be the smokers of the world.

Data Sample

Consider the case of smokers: can we collect data for all smokers in the world?

We need to take a sample of the population.

A sample can be defined as a subset of the population.

Populations & Samples

Example:
In a recent survey, 250 college students at Union College were asked if they smoked cigarettes regularly. 35 of the students said yes. Identify the population and the sample.

Population: the responses of all students at Union College.
Sample: the responses of the 250 students in the survey.

Designing a Statistical Study

Guidelines
1. Identify the variable(s) of interest (the focus) and the population of the study.
2. Develop a detailed plan for collecting data. If you use a sample, make sure the sample is representative of the population.
3. Collect the data.
4. Describe the data.
5. Interpret the data and make decisions about the population using inferential statistics.
6. Identify any possible errors.

Obtaining data

There are two main ways of collecting data for our analysis:

1. Observational
2. Experimental

Other methods:
3. Simulation
4. Survey

Scientific Method

Observational

An observational study consists of measuring specific characteristics without attempting to modify the subjects being studied.

For example, suppose you have tracking software on your website that observes users' behavior, such as:
• the length of time spent on certain pages
• the rate of clicking on ads

If all of this is done without affecting the user's experience, then it is an observational study.
All you have to do is observe and collect data.

Experimental

An experiment consists of a treatment and the observation of its effect on the
subjects.

Subjects in an experiment are called experimental units.

This is usually how most scientific labs collect data.

They will put people into two or more groups (usually just two) and call them
the control and the experimental group.

Experimental

The control group is exposed to a certain environment and then observed.

The experimental group is then exposed to a different environment and then observed.

Data is aggregated from both groups.

A decision is made about which environment was more favorable (favorable is a quality that the experimenter gets to decide).

Other Methods of Data Collection

A simulation is the use of a mathematical or physical model to reproduce the conditions of a situation or process.
A survey is an investigation of one or more characteristics of a population.

A census is a measurement of an entire population.

A sampling is a measurement of part of a population.

Marketing example

Consider that we expose half of our users to a certain landing page with certain images and a certain style (website A).
We measure whether or not they sign up for the service.
Then, we expose the other half to a different landing page with different images and different styles (website B).
We measure whether or not they sign up.
We can then decide which of the two sites performed better and should be used going forward.
This, specifically, is called an A/B test.
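
The A/B test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (assign_groups, conversion_rate), the 1,000 user IDs, and the sign-up counts are all hypothetical and do not come from the lecture.

```python
import random

def assign_groups(user_ids, seed=0):
    """Randomly split users into two halves: group A and group B."""
    rng = random.Random(seed)
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def conversion_rate(signups, visitors):
    """Fraction of visitors who signed up."""
    return signups / visitors if visitors else 0.0

# Hypothetical user base of 1,000 users.
group_a, group_b = assign_groups(range(1000))

# Hypothetical sign-up counts, invented purely for illustration.
rate_a = conversion_rate(signups=48, visitors=len(group_a))
rate_b = conversion_rate(signups=63, visitors=len(group_b))

better = "B" if rate_b > rate_a else "A"
print(f"A: {rate_a:.1%}, B: {rate_b:.1%}, better so far: {better}")
```

Comparing the two raw rates is only the first step. In practice one would also check, with inferential statistics, whether the difference is larger than what chance alone could produce.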

Sampling data

Who gets the honor of being in the sample that we measure?

Probability sampling vs. random sampling

Probability sampling

Probability sampling is a way of sampling from a population.

Every person has a known probability of being chosen, but that probability might differ from one person to another.

This is an uncommon practice in sampling.

Random sampling

A random sample is chosen such that every single member of a population has an equal chance of being chosen as any other member.

Random sampling is an effective way of avoiding sampling bias.

Sampling bias occurs when the way the sample is obtained systematically favors some outcome over the target outcome.
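
A random sample as defined above can be drawn with Python's standard library. This is a minimal sketch: the population of 10,000 numbered members and the function name simple_random_sample are assumptions made for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; each member is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)

# Hypothetical population: members numbered 1 to 10,000.
population = list(range(1, 10001))
chosen = simple_random_sample(population, 100, seed=42)
print(len(chosen), "members selected")
```

The seed is fixed here only so the example is repeatable. random.sample draws without replacement, so no member can appear twice in the sample.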

Example

Suppose that we want to compare two groups, A and B:

• Group A will get a text message for a specific service.
• Group B will get a phone call for the same service.

The question is which of the following is considered random sampling:

• People who live in Dubai are placed in group A, while people who live in Ajman are placed in group B. (Probability sampling)
• People who visit the marketplace between 7 p.m. and 4 a.m. are placed in group A, while the rest are placed in group B. (Random sampling)
• The first half of our list will be in group A, while the rest will be placed in group B. (Random sampling at first)

Unequal probability sampling

Suppose we are interested in measuring the happiness level of doctors in the world.
We already know that we can't ask every single doctor.
So, we need to take a sample.
Obviously, we will do a random sampling.
But then someone asks a harmless question: does anyone know the percentage of men and women among the doctors?
What if we have 70% men vs. 30% women?
Is the data balanced or imbalanced?

If we do a random sampling, the results will favor men over women.

To combat this, we can favor including more women than men in our survey in order to make the split of our sample less skewed toward men.

Therefore, it is important to know the population before choosing your sampling method.
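
One possible way to realize this idea is to draw the same number of men and women, which gives each woman a higher chance of selection than each man. The sketch below assumes a hypothetical population of 1,000 doctors split 70% / 30%; the function name balanced_sample and the sample size of 50 per group are illustrative choices, not part of the lecture.

```python
import random

def balanced_sample(men, women, n_per_group, seed=None):
    """Draw the same number of men and women, regardless of group size."""
    rng = random.Random(seed)
    return rng.sample(men, n_per_group), rng.sample(women, n_per_group)

# Hypothetical population of 1,000 doctors: 70% men, 30% women.
men = [f"M{i}" for i in range(700)]
women = [f"W{i}" for i in range(300)]

sampled_men, sampled_women = balanced_sample(men, women, 50, seed=1)

# Each person's probability of being chosen is known but not equal.
p_man = 50 / len(men)
p_woman = 50 / len(women)
print(f"P(man chosen) = {p_man:.3f}, P(woman chosen) = {p_woman:.3f}")
```

Because the selection probabilities are known, results from such a sample can later be reweighted to reflect the true 70/30 split of the population.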

Other Sampling methods

Stratified sample
Cluster sample
Systematic sample
Convenience sample

Stratified Samples

A stratified sample has members from each segment of a population. This ensures that each segment from the population is represented.

Example segments: freshmen, sophomores, juniors, seniors.
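
A stratified sample over the four class years could be sketched as follows. The class sizes, the 10% sampling fraction, and the function name stratified_sample are assumptions chosen for illustration; the sketch uses proportional allocation, which is one common choice among several.

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Randomly sample the same fraction from every stratum (at least one each)."""
    rng = random.Random(seed)
    result = {}
    for name, members in strata.items():
        k = max(1, round(len(members) * fraction))
        result[name] = rng.sample(members, k)
    return result

# Hypothetical class sizes, invented for illustration.
strata = {
    "Freshmen": [f"F{i}" for i in range(400)],
    "Sophomores": [f"So{i}" for i in range(300)],
    "Juniors": [f"J{i}" for i in range(200)],
    "Seniors": [f"Se{i}" for i in range(100)],
}

class_sample = stratified_sample(strata, fraction=0.1, seed=7)
for name, members in class_sample.items():
    print(name, len(members))
```

Every class year appears in the result, which is the property that distinguishes a stratified sample from a simple random sample of the whole student body.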


Cluster Samples

A cluster sample has all members from randomly selected segments of a population. This is used when the population falls into naturally occurring subgroups.

All members in each selected group are used.

Example: the city of Clarksville divided into city blocks.
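
The city-block example can be sketched in Python as below. The 20 blocks with 5 households each, and the function name cluster_sample, are hypothetical values used only to make the sketch runnable.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical city: 20 blocks with 5 households each.
blocks = {
    f"block_{b}": [f"block_{b}_house_{h}" for h in range(5)]
    for b in range(20)
}

chosen_blocks, households = cluster_sample(blocks, n_clusters=3, seed=3)
print(chosen_blocks, len(households))
```

Note the contrast with a stratified sample: here randomness decides which groups are included, and then nothing inside a chosen group is left out.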


Systematic Samples

A systematic sample is a sample in which each member of the population is assigned a number. A starting number is randomly selected and sample members are selected at regular intervals.

Example: every fourth member is chosen.
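
Choosing every fourth member from a random starting point could look like the sketch below. The population of 40 numbered members and the function name systematic_sample are assumptions for illustration.

```python
import random

def systematic_sample(population, interval, seed=None):
    """Pick a random start within the first interval, then every interval-th member."""
    rng = random.Random(seed)
    start = rng.randrange(interval)
    return population[start::interval]

# Hypothetical population: members numbered 1 to 40.
members = list(range(1, 41))
every_fourth = systematic_sample(members, interval=4, seed=5)
print(every_fourth)
```

Only the starting point is random. After that, the selection is fully determined by the interval.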


Convenience Samples

A convenience sample consists only of available members of the population.

Definition 2: A non-probability sampling method where units are selected for inclusion in the sample because they are the easiest for the researcher to access. This can be due to geographical proximity, availability at a given time, or willingness to participate in the research.

Example: Identifying the Sampling Technique

You are doing a study to determine the number of years of education each teacher at your college has. Identify the sampling technique used if you select the samples listed.

1.) You randomly select two different departments and survey each teacher in those departments.
This is a cluster sample because each department is a naturally occurring subdivision.
2.) You select only the teachers you currently have this semester.
This is a convenience sample because you are using the teachers that are readily available to you.
3.) You divide the teachers up according to their department and then choose and survey some teachers in each department.
This is a stratified sample because the teachers are divided by department and some from each department are randomly selected.

Branches of Statistics

The study of statistics has two major branches: descriptive statistics and inferential statistics.

Descriptive statistics: involves the organization, summarization, and display of data.
Inferential statistics: involves using a sample to draw conclusions about a population.
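
The difference between the two branches can be illustrated with the Union College survey figures from earlier in the lecture (35 of 250 students said yes). The sketch below computes the sample proportion (descriptive) and a normal-approximation 95% interval for the population proportion (inferential). It assumes the 250 students were a random sample, which the lecture does not state, and the function name proportion_summary is hypothetical.

```python
import math

def proportion_summary(successes, n, z=1.96):
    """Return the sample proportion and a normal-approximation interval."""
    p_hat = successes / n                       # descriptive: summarizes the sample
    se = math.sqrt(p_hat * (1 - p_hat) / n)     # standard error of the proportion
    return p_hat, (p_hat - z * se, p_hat + z * se)  # inferential: range for the population

p_hat, (low, high) = proportion_summary(successes=35, n=250)
print(f"Sample proportion: {p_hat:.1%}")
print(f"Approximate 95% interval for the population: {low:.1%} to {high:.1%}")
```

The sample proportion describes only the 250 students who answered. The interval is a statement about all students at the college, and it is only as trustworthy as the assumption that the sample is representative.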
Descriptive and Inferential Statistics

Example:
In a recent study, volunteers who had less than 6 hours of
sleep were four times more likely to answer incorrectly on a
Data Analytics test than participants who had at least 8 hours
of sleep. Decide which part is the descriptive statistic and
what conclusion might be drawn using inferential statistics.
• The statement “four times more likely to answer
incorrectly” is a descriptive statistic.
• An inference drawn from the sample is that all
individuals sleeping less than 6 hours are more likely to
answer Data Analytics questions incorrectly than
individuals who sleep at least 8 hours.

The Data Analytics Process – Get the data
