DAT100_Int_Data_Ana_Lec2_Intro II
Analytics
Lecture 2 – Introduction II
INTRODUCTION TO DATA ANALYTICS
What is Data?
What is data?
Wikipedia:
Data (singular: datum) are individual units of information.
A datum describes a single quality or quantity of some object.
Another definition:
Data is a collection of facts, such as numbers, words, measurements,
observations or even just descriptions of things.
Data All Around
Data is the New Oil
Digging for data: Datafication
According to [1], datafication is:
A technological trend turning many aspects of our life into data.
Or
A process of taking all aspects of life and turning them into data.
[1] K. Cukier and V. Mayer-Schoenberger (2013). "The Rise of Big Data".
Datafication Examples
Social platforms (e.g. Facebook): Data on our actions and friendships is
collected and monitored to market products and services to us.
Insurance: Data used to update risk profiles and business models.
Banking: Data used to establish trustworthiness and likelihood of
a person paying back a loan.
Hiring and recruitment: Data used to replace personality tests.
What is a Data Scientist?
Facts about Data science
Data science won't replace the human brain, but complement it,
work alongside it.
Why data science?
The data science Venn diagram
Data science areas
While having only two of these three qualities (computer programming,
mathematics, domain knowledge) can make you intelligent, it will also
leave a gap.
In order to gain knowledge from data, we must be able to:
o utilize computer programming (to access and manipulate data, develop models,
visualize the results, etc.)
o understand the mathematics behind the models we derive
o above all, understand our analyses' place in the domain we are in (domain
expertise allows you to apply concepts and results in a meaningful and effective
way)
The math
Advice from Hadley Wickham, Chief Scientist at RStudio
Computer programming
Python
Domain knowledge
Data Science Process
If the data contain duplicates, missing values or outliers, we may
go back to collect more data, or spend more time cleaning the dataset.
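The cleaning checks mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own method: the records, the helper names (drop_duplicates, drop_missing, split_outliers) and the choice of the 1.5 x IQR rule for outliers are all assumptions made for the example.

```python
from statistics import quantiles

# Hypothetical toy records of (id, age), invented for illustration only.
# None marks a missing value.
records = [
    (1, 34), (2, 29), (2, 29), (3, None),
    (4, 41), (5, 38), (6, 350), (7, 31),
]

def drop_duplicates(rows):
    """Keep the first occurrence of each row, preserving order."""
    seen = set()
    unique_rows = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique_rows.append(row)
    return unique_rows

def drop_missing(rows):
    """Remove rows in which any field is missing (None)."""
    return [row for row in rows if None not in row]

def split_outliers(rows, k=1.5):
    """Split rows into (kept, outliers) with the IQR rule on the second field."""
    q1, _, q3 = quantiles([row[1] for row in rows], n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    kept_rows = [row for row in rows if low <= row[1] <= high]
    outlier_rows = [row for row in rows if not (low <= row[1] <= high)]
    return kept_rows, outlier_rows

unique = drop_duplicates(records)
complete = drop_missing(unique)
kept, outliers = split_outliers(complete)

print("rows at each stage:", len(records), len(unique), len(complete), len(kept))
print("flagged as outliers:", outliers)
```

Whether a flagged row is dropped, corrected or kept is a judgment call that depends on the domain; the sketch only separates such rows so they can be inspected.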
1. Asking an interesting question
Thus, data scientists need to learn the domain knowledge and combine it
with technical knowledge and data to come up with a solution that drives
business value.
2. Obtaining the data
3. Exploring the data /
Exploratory data analysis (EDA)
The basic tools of EDA are plots, graphs and summary statistics.
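As a rough sketch of these basic tools, the following Python snippet computes a few summary statistics and prints a crude text histogram using only the standard library. The values and the helper names (summarize, text_histogram) are hypothetical and chosen only for illustration; in practice a plotting library would normally be used for the graphs.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical measurements, invented for illustration only.
values = [2.1, 2.4, 2.4, 3.0, 3.1, 3.3, 3.8, 4.0, 4.2, 5.9]

def summarize(xs):
    """Return common summary statistics for a list of numbers."""
    return {
        "count": len(xs),
        "min": min(xs),
        "max": max(xs),
        "mean": mean(xs),
        "median": median(xs),
        "stdev": stdev(xs),  # sample standard deviation
    }

def text_histogram(xs, bin_width=1.0):
    """Return the lines of a crude text histogram with fixed-width bins."""
    counts = Counter(int(x // bin_width) for x in xs)
    lines = []
    for b in range(min(counts), max(counts) + 1):
        lo = b * bin_width
        lines.append(f"{lo:4.1f}-{lo + bin_width:4.1f} | " + "#" * counts[b])
    return lines

summary = summarize(values)
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")

histogram = text_histogram(values)
print("\n".join(histogram))
```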
EDA
4. Modeling the data
In this step, we are not only fitting and choosing models, we are also
implementing mathematical validation metrics in order to quantify
the models and their effectiveness.
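To make "fitting a model" and "validation metric" concrete, here is a minimal Python sketch under stated assumptions: the model is a straight line fitted by ordinary least squares, the metric is root mean squared error (RMSE) on held-out points, and the data and function names (fit_line, rmse) are invented for the example. Other models and metrics would follow the same fit-then-evaluate pattern.

```python
from math import sqrt

# Hypothetical (x, y) pairs, invented for illustration only.
train = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)]
holdout = [(6, 12.2), (7, 13.8)]  # points not used for fitting

def fit_line(points):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmse(points, slope, intercept):
    """Root mean squared error of the line's predictions on the given points."""
    squared_errors = [(y - (slope * x + intercept)) ** 2 for x, y in points]
    return sqrt(sum(squared_errors) / len(squared_errors))

slope, intercept = fit_line(train)
train_rmse = rmse(train, slope, intercept)
holdout_rmse = rmse(holdout, slope, intercept)

print(f"model: y = {slope:.2f} * x + {intercept:.2f}")
print(f"RMSE on training data: {train_rmse:.3f}")
print(f"RMSE on held-out data: {holdout_rmse:.3f}")
```

Evaluating on points the model has not seen gives a more honest picture of its effectiveness than the training error alone.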
5. Communicating and visualizing the results
Data Science Workflow
EXAMPLE:
PREDICTING NEONATAL INFECTION
1/28/2024