Approaches in data science [Slides]

Uploaded by

ali.ibrahim.a.abdelrahman


An introduction to data science

Approaches in data science

Please do not copy without permission. © ExploreAI 2023.

Overview

It is important that we are able to make informed decisions and derive appropriate insights from data. We therefore need a structured framework for working with data and extracting valuable insights from it.

How our data science process is applied and interpreted depends on several factors, including whether we are doing a quantitative or qualitative analysis, and whether we need hindsight, insight, foresight, or context.

[Figure: the data science process as a cycle: problem statement, data collection, data cleaning, exploratory data analysis, gather insights, model building, and model deployment]


Quantitative and qualitative data analyses

| Quantitative and qualitative data analyses are important because they enable us to gain a more comprehensive understanding of complex phenomena and make data-driven decisions.

Quantitative data analysis involves numerical measurement and statistical analysis. It allows us to measure and analyse numerical data using statistical methods, enabling us to identify patterns, trends, and relationships between variables. It is useful for making predictions, testing hypotheses, and investigating cause-and-effect relationships.

Qualitative data analysis involves exploring patterns and themes in non-numerical data. It allows us to explore and interpret non-numerical data, such as text, images, or videos. It is useful for understanding the context of a problem and people's attitudes, behaviours, etc.

Both types of analysis are important because they provide different ways of understanding and interpreting data.
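
To make the contrast concrete, here is a minimal Python sketch using a hypothetical survey: the numerical scores are summarised with standard statistics (quantitative), while the free-text comments are tagged with themes by keyword. The keyword tagging is a deliberately crude stand-in for qualitative coding, which in practice usually relies on human coders or NLP tools. The data, theme names, and keywords are all invented for illustration.

```python
import statistics

# Hypothetical survey data: a numerical satisfaction score (quantitative)
# and a free-text comment (qualitative) for each respondent.
scores = [7, 9, 6, 8, 10, 5]
comments = [
    "Delivery was slow but support was helpful",
    "Great price and fast delivery",
    "Support never answered my email",
    "Good value for the price",
    "Fast delivery, friendly support",
    "Too expensive for what you get",
]

# Hypothetical theme keywords used for simple keyword-based coding.
THEMES = {
    "delivery": ["delivery", "shipping"],
    "support": ["support", "answered", "helpful"],
    "price": ["price", "expensive", "value"],
}


def quantitative_summary(values):
    """Summarise numerical data with basic statistics."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def code_themes(texts, themes):
    """Count how many texts mention each theme (a crude qualitative coding)."""
    counts = {theme: 0 for theme in themes}
    for text in texts:
        lowered = text.lower()
        for theme, keywords in themes.items():
            if any(word in lowered for word in keywords):
                counts[theme] += 1
    return counts


print(quantitative_summary(scores))
print(code_themes(comments, THEMES))
```

The numbers answer "how satisfied are respondents?", while the theme counts hint at "why?", which is the kind of context qualitative analysis is meant to supply.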

The data science process

| The data science process is a systematic approach to transforming a data problem into a
data-driven solution.

- Problem statement: state the problem or hypothesis.
- Data collection: find the right data sources.
- Data cleaning: remove, fix, and filter data.
- Exploratory data analysis: understand the data.
- Model building: select features, build, train, and validate.
- Model deployment: deploy, test, and communicate.

This approach to data science helps us discover meaningful patterns, relationships, and trends, and develop accurate and robust models. Various forms of this process are used across different data disciplines, including data analytics, data science, and data engineering, under names such as OSEMN and CRISP-DM.


Problem statement

| The problem statement helps us define the scope and objectives of our analysis and ensures that our
insights are relevant.

A problem statement identifies the gap between the current (problem) state and the desired (outcome) state. It should be specific, concise, clear, unbiased, and measurable.

A problem statement may also take the form of a hypothesis: a proposed cause-and-effect explanation for a particular phenomenon or problem that has not yet been tested.

Examples:

- Statement: We need to report on estimated water and electricity income from different customer groups.
- Hypothesis: The estimated water and electricity income from domestic customers is 30% lower than that from other customers.
- Question: How much water and electricity income can we expect from commercial customers per month?
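
Because the hypothesis above is measurable, it can be checked directly against data. The sketch below uses hypothetical billing records and a hypothetical tolerance of 5 percentage points to compare the observed shortfall with the hypothesised 30%. A real analysis would normally apply a statistical test (for example, a two-sample t-test) to a much larger sample rather than a fixed tolerance.

```python
import statistics

# Hypothetical monthly billing records: (customer_group, income).
records = [
    ("domestic", 70.0), ("domestic", 65.0), ("domestic", 75.0),
    ("commercial", 100.0), ("commercial", 95.0), ("industrial", 105.0),
]


def relative_difference(records, group):
    """Return how much lower (as a fraction) the mean income of `group`
    is compared with the mean income of all other customers."""
    in_group = [income for g, income in records if g == group]
    others = [income for g, income in records if g != group]
    group_mean = statistics.mean(in_group)
    other_mean = statistics.mean(others)
    return (other_mean - group_mean) / other_mean


def supports_hypothesis(observed, expected=0.30, tolerance=0.05):
    """Check whether the observed shortfall is close to the hypothesised one.
    The tolerance is an arbitrary, hypothetical choice."""
    return abs(observed - expected) <= tolerance


diff = relative_difference(records, "domestic")
print(f"Domestic income is {diff:.0%} lower than other customers")
print("Consistent with hypothesis:", supports_hypothesis(diff))
```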


Data collection

| Data collection includes identifying and acquiring applicable data sources, both internal and external, which can help answer the problem statement.

We can use company data or open-source data, or collect our own data, depending on the nature of our problem and the analysis we would like to do.

Examples:

- Data acquired from surveys, such as market research and customer satisfaction surveys.
- Data queried from databases or APIs (Application Programming Interfaces), such as sales data and employee information.
- Data downloaded from open sources and cloud repositories, such as general census data.
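
As a rough sketch of combining internal and external sources, the code below parses a CSV export (standing in for data queried from a database) and a JSON payload (standing in for an API response), then joins them on a shared key. The embedded strings, field names, and the `load_csv`/`load_json` helpers are hypothetical. In practice the text would come from `open(...)` or an HTTP request (for example via `urllib.request.urlopen`), and pandas' `read_csv` is a common alternative.

```python
import csv
import io
import json

# Hypothetical CSV export from an internal sales database.
SALES_CSV = """order_id,region,amount
1,North,120.50
2,South,80.00
3,North,99.99
"""

# Hypothetical JSON payload, shaped like a response from an external API.
CENSUS_JSON = '[{"region": "North", "population": 50000}, {"region": "South", "population": 32000}]'


def load_csv(text):
    """Parse CSV text into a list of dictionaries (one per row, values as strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def load_json(text):
    """Parse a JSON array of records into a list of dictionaries."""
    return json.loads(text)


sales = load_csv(SALES_CSV)
census = load_json(CENSUS_JSON)

# Join the two sources on region so they can answer a shared question.
population = {row["region"]: row["population"] for row in census}
for row in sales:
    row["population"] = population.get(row["region"])

print(sales)
```

Note that CSV values arrive as strings; converting them to numbers is part of the cleaning step that follows.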


Data cleaning

| Data cleaning, also known as data wrangling, involves transforming raw data into usable formats.

We can use several cleaning techniques to ensure that our data are accurate and of the required quality. If our data are inaccurate, our insights will be too.

Examples:

- Using spreadsheets or a programming language to remove irrelevant observations, handle missing values, fix structural issues, etc.
- Using regular expressions for pattern matching and replacing data.
- Using data visualisation tools such as Power BI or spreadsheets to identify outliers and anomalies.
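
Here is a minimal sketch of the programming-language route, using only the Python standard library. It fixes structural issues with a regular expression, drops irrelevant rows, removes duplicates, and imputes missing values. The records, the rule that accounts starting with `test_` are irrelevant, and the choice of median imputation are all assumptions for illustration; pandas provides `drop_duplicates`, `dropna`, and `fillna` for the same tasks on larger tables.

```python
import re
import statistics

# Hypothetical raw records with typical quality problems.
raw = [
    {"customer": " Alice ", "phone": "(012) 345-6789", "spend": "120"},
    {"customer": "Bob", "phone": "012 345 6780", "spend": ""},
    {"customer": "test_user", "phone": "000", "spend": "5"},
    {"customer": "Carol", "phone": "012-345-6781", "spend": "95"},
    {"customer": "Carol", "phone": "012-345-6781", "spend": "95"},
]


def clean(records):
    # 1. Fix structural issues: strip whitespace and keep only digits in phone numbers.
    rows = [
        {
            "customer": r["customer"].strip(),
            "phone": re.sub(r"\D", "", r["phone"]),
            "spend": r["spend"].strip(),
        }
        for r in records
    ]
    # 2. Remove irrelevant observations (here: hypothetical test accounts).
    rows = [r for r in rows if not r["customer"].startswith("test_")]
    # 3. Remove exact duplicates while preserving order.
    seen, unique = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # 4. Handle missing values: impute missing spend with the median of known values.
    known = [float(r["spend"]) for r in unique if r["spend"]]
    fill = statistics.median(known)
    for r in unique:
        r["spend"] = float(r["spend"]) if r["spend"] else fill
    return unique


for row in clean(raw):
    print(row)
```

Median imputation is used here because the median is less sensitive to outliers than the mean; whether imputation is appropriate at all depends on why the values are missing.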


Exploratory data analysis

| Exploratory data analysis (EDA) is an approach used to summarise the main characteristics of a
dataset using aggregations, fundamental statistics, and visualisation techniques.

Before we can gather insights or build a model, we first need to understand our data. We can use non-graphical
methods, such as descriptive statistics and correlation, or graphical (visualisation) methods to investigate our
data.

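Examples:

- Descriptive statistics: aggregations (e.g. count), measures of central tendency (e.g. mean), and measures of distribution (e.g. standard deviation and kurtosis).
- Correlation, e.g. Pearson's correlation coefficient.
- Visualisations, e.g. bar, scatter, density, and violin plots.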
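
The non-graphical measures listed above can be computed in a few lines. This sketch assumes a small hypothetical dataset and computes population excess kurtosis by hand (the fourth standardised moment minus 3); other tools may use a bias-corrected estimator and return slightly different values. In pandas, `DataFrame.describe()` gives a similar summary in one call.

```python
import statistics
from collections import Counter

# Hypothetical dataset: monthly spend per customer and their customer group.
spend = [120.0, 95.0, 107.5, 300.0, 88.0, 101.0, 97.0]
groups = ["domestic", "domestic", "commercial", "commercial", "domestic", "domestic", "commercial"]


def excess_kurtosis(values):
    """Population excess kurtosis: the fourth standardised moment minus 3."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2**2 - 3


summary = {
    "count": len(spend),                    # aggregation
    "count_by_group": Counter(groups),      # aggregation per category
    "mean": statistics.fmean(spend),        # central tendency
    "median": statistics.median(spend),     # central tendency (robust to outliers)
    "stdev": statistics.stdev(spend),       # spread (sample standard deviation)
    "kurtosis": excess_kurtosis(spend),     # shape of the distribution (tails)
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The single large value (300.0) pulls the mean well above the median and produces positive excess kurtosis, which is exactly the kind of signal EDA is meant to surface before modelling.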


Univariate and multivariate analyses

| In EDA, we perform either a univariate or a multivariate analysis, depending on what we want to investigate.

_Univariate analysis_ is the exploration of individual variables in a dataset, i.e. we only consider one variable at a time.

- Non-graphical: We can use descriptive statistics such as the standard deviation, measures of central tendency, and measures of distribution.
- Graphical: We can use visualisations such as histograms, density plots, and box plots to understand the characteristics of a variable.

In a _multivariate analysis_, we're more interested in the relationships between the different variables of our dataset.

- Non-graphical: We use correlation to understand the strength and direction of the relationship between variables.
- Graphical: We can use visualisations such as heatmaps, scatter plots, and pair plots to investigate the relationships.
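
A minimal sketch of both kinds of analysis on hypothetical customer data: the univariate part looks at each variable on its own, while the multivariate part computes the Pearson correlation coefficient for every pair of variables. Pearson's r only captures linear relationships. The variable names and values are invented; on Python 3.10+ `statistics.correlation` computes the same coefficient.

```python
import math
from itertools import combinations

# Hypothetical dataset with three numerical variables per customer.
data = {
    "tenure_months": [3, 12, 24, 36, 48, 60],
    "monthly_spend": [40, 55, 70, 80, 95, 110],
    "support_calls": [9, 7, 6, 4, 3, 1],
}


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Univariate: look at one variable at a time (here, its range).
for name, values in data.items():
    print(f"{name}: min={min(values)}, max={max(values)}")

# Multivariate: the sign of r gives the direction and its magnitude the
# strength of the linear relationship between each pair of variables.
for a, b in combinations(data, 2):
    print(f"{a} vs {b}: r = {pearson(data[a], data[b]):+.2f}")
```

The pairwise coefficients printed here are what a correlation heatmap would display graphically.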


Gather insights

| Gathering insights, also known as data dissemination, involves gathering and reporting the insights
derived from the analysis.

Insights may be collected in, and reported to stakeholders through, dashboards and reports that include text and data visualisations.

Examples:

- Using spreadsheets or a programming language to summarise data and construct insights to form a report.
- Using data visualisation tools such as Power BI or spreadsheets to visualise and report the insights.
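
As an illustration of turning analysis results into a report, the sketch below aggregates hypothetical income records by customer group and formats them as short report lines, ordered by total income. The record layout and the helper names `summarise_by_group` and `build_report` are assumptions; a dashboard tool would typically consume the same aggregated figures.

```python
from collections import defaultdict

# Hypothetical cleaned records: (customer_group, month, income).
records = [
    ("domestic", "Jan", 70.0), ("domestic", "Feb", 72.0),
    ("commercial", "Jan", 100.0), ("commercial", "Feb", 104.0),
]


def summarise_by_group(rows):
    """Aggregate total and average income per customer group."""
    incomes = defaultdict(list)
    for group, _month, income in rows:
        incomes[group].append(income)
    return {
        group: {"total": sum(values), "average": sum(values) / len(values)}
        for group, values in incomes.items()
    }


def build_report(summary):
    """Turn the aggregated figures into short, readable report lines."""
    lines = ["Income by customer group"]
    # Sort so the group with the highest total income is listed first.
    for group, stats in sorted(summary.items(), key=lambda kv: kv[1]["total"], reverse=True):
        lines.append(f"- {group}: total {stats['total']:.2f}, monthly average {stats['average']:.2f}")
    return "\n".join(lines)


print(build_report(summarise_by_group(records)))
```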


Model building

| Model building involves selecting an appropriate algorithm and training the model on the data.

Model building often involves iteration, since a model will rarely give us the results we seek on the first try. This means that we train and test models until we find a suitable one before deploying it into a larger system.

Some common tools and skills required for model building include:

- Machine learning libraries such as scikit-learn and TensorFlow for building models in Python.
- Deep learning libraries such as Keras and PyTorch for building neural networks in Python.

[Figure: the model building cycle: (A) select features, (B) build model (regression, classification, or other ML model), (C) train model, (D) validate the results, (E) deploy the model]


Model deployment

| Model deployment involves integrating the model into a larger system or application.

Deployment bridges the gap between data science and real-world applications. Effective testing and communication ensure that the model is useful, reliable, and understood.

Although we have reached the end of the process, it is crucial to maintain and optimise the model.

- Maintenance: Monitor and maintain the model, archiving insights to facilitate future endeavours.
- Optimisation: Regularly retrain the model with new data sources and make adjustments to improve performance.

[Figure: the post-deployment cycle: (A) deploy the model, (B) maintain the model, (C) optimise the model]
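
A minimal sketch of deployment and monitoring, assuming the trained model is a simple linear model whose parameters can be stored as JSON. The parameters are saved, loaded by the "deployed" system, used for predictions, and checked against recent data to decide whether retraining is needed. The parameter values, the file name, and the 1.5x error threshold are hypothetical; production systems usually rely on dedicated model-serving and monitoring tooling.

```python
import json
import os
import statistics
import tempfile

# Parameters of a previously trained linear model (hypothetical values),
# plus the validation error measured before deployment.
trained = {"slope": 3.0, "intercept": 5.0, "baseline_mae": 1.6}


def save_model(params, path):
    """Persist model parameters so another system can load them."""
    with open(path, "w") as f:
        json.dump(params, f)


def load_model(path):
    with open(path) as f:
        return json.load(f)


def predict(params, x):
    return params["slope"] * x + params["intercept"]


def needs_retraining(params, recent_x, recent_y, tolerance=1.5):
    """Monitoring: flag the model for retraining if its error on recent data
    is much worse than the error measured at validation time.
    The 1.5x tolerance is an arbitrary, hypothetical threshold."""
    recent_mae = statistics.fmean(
        abs(predict(params, x) - y) for x, y in zip(recent_x, recent_y)
    )
    return recent_mae > tolerance * params["baseline_mae"]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.json")
    save_model(trained, path)
    deployed = load_model(path)

print(predict(deployed, 10))
print(needs_retraining(deployed, [1, 2, 3], [8.0, 11.0, 14.0]))   # recent data fits well
print(needs_retraining(deployed, [1, 2, 3], [20.0, 25.0, 30.0]))  # behaviour has drifted
```

When the check returns `True`, the optimisation step applies: retrain on newer data and redeploy.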


Types of analytics

| The type of analytics we apply depends on our goal and shapes our approach to the data analytics or data science process.

Descriptive (hindsight): Used to describe what has happened in the past. It is a summary of historical data that provides insights into patterns, trends, and relationships within the data. Examples: dashboards and reports.

Diagnostic (insight): Used to determine why something happened in the past. It helps organisations understand the factors that contributed to a particular outcome. Examples: data mining and drill-down analysis.

Predictive (foresight): Used to forecast what will happen in the future. It uses statistical models and machine learning algorithms to identify patterns and trends in historical data and predict future outcomes. Examples: forecasting and risk modelling.

Prescriptive (context): Used to recommend the best course of action for a given situation. It uses advanced algorithms and optimisation techniques to suggest the optimal solution based on a variety of factors and constraints. Examples: optimisation.