DAT100_Int_Data_Ana_Lec4_Obtaining_Data
Analytics
Lecture 4 – Obtaining Data
Data and Statistics
Population
The population is everyone (or everything) you care about, that is, all members relevant to your question.
Example:
If you are testing whether smoking leads to heart disease, your population would be the smokers of the world.
Data Sample
Consider the case of smokers: can we collect data for all smokers in the world? In practice we cannot, so we study a sample, a subset of the population.
Populations & Samples
Example:
In a recent survey, 250 college students at Union College were asked if they smoked cigarettes regularly. 35 of the students said yes. Identify the population and the sample.
Population: responses of all students at Union College.
Sample: responses of the 250 students in the survey.
Designing a Statistical Study
GUIDELINES
1. Identify the variable(s) of interest (the focus) and the
population of the study.
2. Develop a detailed plan for collecting data. If you use
a sample, make sure the sample is representative of
the population.
3. Collect the data.
4. Describe the data.
5. Interpret the data and make decisions about the
population using inferential statistics.
6. Identify any possible errors.
Obtaining data
1. Observational
2. Experimental
Other methods:
3. Simulation
4. Survey
Scientific Method
Observational
For example, suppose tracking software on your website records users' behavior, such as:
• the length of time spent on certain pages
• the rate of clicking on ads
As long as this is done without affecting the users' experience, it is an observational study.
All you have to do is observe and collect data.
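A minimal sketch of summarizing such observational data. The log format, page names, and values below are hypothetical and only illustrate the two measures mentioned above (time on page and ad click rate).

```python
from collections import defaultdict

# Hypothetical tracking log: (user_id, page, seconds_on_page, clicked_ad).
# All values are made up for illustration.
events = [
    ("u1", "/home", 30, False),
    ("u1", "/pricing", 90, True),
    ("u2", "/home", 10, False),
    ("u3", "/pricing", 60, False),
    ("u3", "/home", 20, True),
]

def summarize(events):
    """Return (average seconds per page, overall ad click rate)."""
    seconds = defaultdict(list)
    clicks = 0
    for _user, page, secs, clicked in events:
        seconds[page].append(secs)
        clicks += clicked  # True counts as 1, False as 0
    avg_time = {page: sum(v) / len(v) for page, v in seconds.items()}
    click_rate = clicks / len(events)
    return avg_time, click_rate

avg_time, click_rate = summarize(events)
print(avg_time)    # average seconds spent on each page
print(click_rate)  # share of page views with an ad click
```

Nothing here changes what the user sees; the code only reads what was recorded.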
Experimental
An experiment consists of a treatment and the observation of its effect on the
subjects.
Researchers put subjects into two or more groups (usually just two), called the control group and the experimental group. The experimental group receives the treatment; the control group does not.
Other Methods of Data Collection
Marketing example
Suppose we expose half of our users to a landing page with certain images and a certain style (website A) and measure whether or not they sign up for the service.
We then expose the other half to a different landing page with different images and a different style (website B) and again measure whether or not they sign up.
We can then decide which of the two sites performed better and should be used going forward.
This is called an A/B test.
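The A/B test can be sketched as a small simulation. The function name `run_ab_test` and the sign-up rates are assumptions for illustration; real data would come from the two landing pages, and a real analysis would also check whether the difference is statistically significant.

```python
import random

def run_ab_test(n_users, rate_a, rate_b, seed=0):
    """Simulate sign-ups for two landing pages and return the observed rates."""
    rng = random.Random(seed)
    users = list(range(n_users))
    rng.shuffle(users)  # random order, so the half/half split is random
    half = n_users // 2
    group_a, group_b = users[:half], users[half:]
    # Each simulated user signs up with the assumed true rate of their page.
    signups_a = sum(rng.random() < rate_a for _ in group_a)
    signups_b = sum(rng.random() < rate_b for _ in group_b)
    return signups_a / len(group_a), signups_b / len(group_b)

observed_a, observed_b = run_ab_test(10_000, rate_a=0.10, rate_b=0.12)
better = "A" if observed_a > observed_b else "B"
print(f"A: {observed_a:.3f}  B: {observed_b:.3f}  -> keep website {better}")
```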
Sampling data
Probability sampling vs. random sampling
Probability sampling
Every member of the population has a known probability of being chosen, but that probability may differ from one member to another.
Random sampling
A random sample is chosen so that every member of the population has the same chance of being chosen as any other member.
Sampling bias occurs when the way the sample is obtained systematically favors some outcome over the target outcome.
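A minimal sketch of drawing a random sample with Python's standard library. The population of 1,000 user IDs is hypothetical; `random.sample` picks without replacement, so each member has the same chance of being chosen.

```python
import random

# Hypothetical population of 1,000 user IDs.
population = [f"user_{i}" for i in range(1, 1001)]

rng = random.Random(42)                 # fixed seed so the draw is repeatable
sample = rng.sample(population, k=50)   # 50 distinct members, equal chance each

print(len(sample), sample[:5])
```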
Example
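Are these ways of forming the groups random sampling?
1. People who visit the marketplace between 7 p.m. and 4 a.m. are placed in Group A, while the rest are placed in Group B.
2. The first half of our list is placed in Group A, while the rest is placed in Group B.
Neither gives every person an equal chance of landing in either group. The first depends on the time of the visit. The second may look random at first, but it depends on how the list is ordered.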
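For comparison, a minimal sketch of a fully random assignment, where every person has a 50/50 chance of landing in either group regardless of visit time or list position. The function name `assign_groups` and the user IDs are hypothetical.

```python
import random

def assign_groups(user_ids, seed=None):
    """Assign each user to group 'A' or 'B' with equal probability."""
    rng = random.Random(seed)
    return {uid: rng.choice("AB") for uid in user_ids}

groups = assign_groups(range(10_000), seed=1)
share_a = sum(1 for g in groups.values() if g == "A") / len(groups)
print(f"Share of users in Group A: {share_a:.3f}")
```

With many users the two groups end up close to equal in size, although not exactly equal.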
Unequal probability sampling
If the population contains more men than women, a random sample will also contain more men, so the results will favor men over women.
To counter this, we can give women a higher probability of being included than men, so that the sample is less skewed toward men.
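A minimal sketch of unequal probability sampling, assuming a hypothetical population of 800 men and 200 women. The inclusion probabilities (0.10 for men, 0.40 for women) are made-up values, chosen so that the expected number of men and women in the sample is about the same.

```python
import random

# Hypothetical population: 800 men and 200 women.
population = [("man", i) for i in range(800)] + [("woman", i) for i in range(200)]

# Known, but unequal, probability of being included (assumed values).
inclusion_prob = {"man": 0.10, "woman": 0.40}

def unequal_probability_sample(population, inclusion_prob, seed=0):
    """Include each person independently with the probability of their group."""
    rng = random.Random(seed)
    return [p for p in population if rng.random() < inclusion_prob[p[0]]]

sample = unequal_probability_sample(population, inclusion_prob)
women_share = sum(1 for group, _ in sample if group == "woman") / len(sample)
print(f"Women in population: 20%, women in sample: {women_share:.0%}")
```

Every person still has a known probability of being chosen, which is what makes this probability sampling even though the probabilities differ.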
Other Sampling methods
Stratified sample
Cluster sample
Systematic sample
Convenience sample
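Stratified Samples
The population is divided into groups (strata), and members are randomly selected from each group.
Cluster Samples
The population is divided into naturally occurring groups (clusters), some groups are selected, and all members in each selected group are used.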
Example: Identifying the Sampling Technique
You are doing a study to determine the number of years of education each teacher at your college
has. Identify the sampling technique used if you select the samples listed.
1.) You randomly select two different departments and survey each teacher
in those departments.
This is a cluster sample because each department is a naturally occurring
subdivision.
2.) You select only the teachers you currently have this semester.
This is a convenience sample because you are using the teachers that are
readily available to you.
3.) You divide the teachers up according to their department and then choose
and survey some teachers in each department.
This is a stratified sample because the teachers are divided by department and
some from each department are randomly selected.
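The techniques in this example can be sketched in code. The departments and teacher names are hypothetical. The systematic sample, which the example does not cover, is shown under the usual definition: every k-th member after a random start.

```python
import random

# Hypothetical teachers grouped by department.
departments = {
    "Math": ["m1", "m2", "m3", "m4"],
    "Biology": ["b1", "b2", "b3"],
    "History": ["h1", "h2", "h3", "h4", "h5"],
    "Art": ["a1", "a2"],
}

def cluster_sample(groups, n_groups, rng):
    """Pick whole departments at random and use every teacher in them."""
    chosen = rng.sample(sorted(groups), n_groups)
    return [t for dept in chosen for t in groups[dept]]

def stratified_sample(groups, per_group, rng):
    """Pick some teachers at random from every department."""
    return [
        t
        for members in groups.values()
        for t in rng.sample(members, min(per_group, len(members)))
    ]

def systematic_sample(members, step, rng):
    """Pick every step-th teacher, starting from a random position."""
    start = rng.randrange(step)
    return members[start::step]

rng = random.Random(7)
all_teachers = [t for members in departments.values() for t in members]

cluster = cluster_sample(departments, 2, rng)
stratified = stratified_sample(departments, 2, rng)
systematic = systematic_sample(all_teachers, 3, rng)
print(cluster, stratified, systematic, sep="\n")
```

A convenience sample needs no code: it is whichever teachers happen to be readily available.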
Branches of Statistics
Descriptive statistics: involves the organization, summarization, and display of data.
Inferential statistics: involves using a sample to draw conclusions about a population.
Descriptive and Inferential Statistics
Example:
In a recent study, volunteers who had less than 6 hours of
sleep were four times more likely to answer incorrectly on a
Data Analytics test than participants who had at least 8 hours
of sleep. Decide which part is the descriptive statistic and
what conclusion might be drawn using inferential statistics.
• The statement “four times more likely to answer
incorrectly” is a descriptive statistic.
• An inference drawn from the sample is that all
individuals sleeping less than 6 hours are more likely to
answer Data Analytics questions incorrectly than
individuals who sleep at least 8 hours.
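A minimal sketch of the same distinction using the Union College survey from the earlier example (35 of 250 students said yes). The sample proportion is the descriptive statistic. The interval is one common way to make an inference about the population; it assumes the sample is random and uses the normal approximation, neither of which the survey description confirms.

```python
import math

n = 250   # students surveyed (the sample)
yes = 35  # students who said they smoke regularly

# Descriptive statistic: summarizes the sample itself.
p_hat = yes / n

# Inferential step (assumed method): approximate 95% interval for the
# proportion of smokers among all students, via the normal approximation.
z = 1.96
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - margin, p_hat + margin

print(f"Sample proportion: {p_hat:.1%}")
print(f"Approximate 95% interval for the population: {low:.1%} to {high:.1%}")
```

Under these assumptions the interval runs from about 9.7% to 18.3%, so the 14% observed in the sample is only an estimate of the population value.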
The Data Analytics Process – Get the data
10/3/2023