The Data Science Process Course Slides Red

This document discusses Python for data science. It describes the data science process as involving five steps: acquiring data, preparing data, analyzing data, reporting results, and applying results. It discusses acquiring data from various sources like databases, text files, and web services. It also discusses exploring data as the first part of preparing data, and explains the importance of exploring data before analyzing or modeling it.

Uploaded by

Awais Rasheed

Python for Data Science

How does data science happen?


Dr. Ilkay Altintas and Dr. Leo Porter
Twitter: #UCSDpython4DS
After this video you will be able to:
• List some of the dimensions of modern data science
• Identify why analyzing these dimensions is important for us as data scientists

Data Science Process


[Diagram labels: Data Engineering; Computational Data Science; ACQUIRE, PREPARE, ANALYZE, REPORT, ACT; Scale]

[Diagram labels: Build, Explore, Scale, Report, Act]

[Diagram labels: Data Engineering; Computational Data Science; ACQUIRE, PREPARE, ANALYZE, REPORT, ACT; Scale; Programmability]

Asking the Right Question


After this video you will be able to:
• Describe the ingredients needed to form a data science problem
• List some questions others have asked to get value from their big data
• Formulate the right questions to guide your data science process

“A problem well defined is a problem half solved.”
Charles F. Kettering

Define the Problem


• Evaluate a new product: sales figures, call center logs
• Detect equipment failure: sensor data
• Better targeted marketing: customer data, marketing data

Assess the Situation
• Risks
• Benefits
• Contingencies
• Regulations
• Resources
• Requirements

Define Goals
• Objectives
• Criteria

Formulate the Question
• Define the Problem
• Assess the Situation
• Define Goals

Steps in the Data Science Process


After this video you will be able to:
• Identify the steps in the data science process
• Understand what each step involves

ACQUIRE PREPARE ANALYZE REPORT ACT

Step 1: Acquire Data


Identify data sets
Retrieve data
Query data

Step 2: Prepare Data


Step 2-A: Explore
Step 2-B: Pre-process

Step 2-A: Explore Data


Understand nature of data
Preliminary analysis

Step 2-B: Pre-process Data

Clean Integrate Package



Step 3: Analyze Data


Select analytical techniques
Build models

Step 4: Communicate Results



Step 5: Apply Results


Iterative process

Step 1: Acquiring Data


After this video you will be able to:
• List techniques and technologies to access and retrieve the data you need
• Describe an example scenario that accesses data from a variety of sources using different technologies

Step 1: Acquire Data


• Identify datasets
• Retrieve datasets
• Query data
Where’s the data?
• Identify suitable data
• Acquire all available data

Data comes from many places…
…with many ways to access it


Traditional databases
• SQL and query browsers

Text files
• Scripting languages

Remote data
• Web Services: SOAP, REST, WebSocket

NoSQL storage
• API
• Web Services


Acquiring data related to wildfires…
• Historical weather: SQL
• Current weather: WebSocket
• Real-time tweets near fires: REST
• Traditional databases: SQL and query browsers
• Text files: Scripting languages
• Remote data: Web Services
• NoSQL storage: Web Services, Programming Interfaces
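
As a minimal sketch of two of the access methods listed above (SQL against a traditional database, and a scripting language reading a text file), the following uses Python's standard `sqlite3` and `csv` modules. The `weather` table, its columns, and all values are hypothetical and exist only for illustration; they are not taken from the course.

```python
import csv
import io
import sqlite3

# Hypothetical historical-weather table, built in memory for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (station TEXT, day TEXT, temp_c REAL)")
conn.executemany(
    "INSERT INTO weather VALUES (?, ?, ?)",
    [
        ("A", "2016-06-01", 31.5),
        ("A", "2016-06-02", 35.0),
        ("B", "2016-06-01", 28.0),
    ],
)

# Query data: retrieve only the rows relevant to the question being asked.
hot_days = conn.execute(
    "SELECT station, day, temp_c FROM weather WHERE temp_c > ? ORDER BY day",
    (30.0,),
).fetchall()
conn.close()

# Text file: parse CSV content (a string stands in for a file on disk).
csv_text = "station,humidity\nA,12\nB,40\n"
humidity = {
    row["station"]: int(row["humidity"])
    for row in csv.DictReader(io.StringIO(csv_text))
}
```

Remote sources such as REST or WebSocket services would need a network client and a real endpoint, so they are left out of this sketch.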

Step 2-A: Exploring Data


After this video you will be able to:
• Explain the importance of exploring data
• Identify methods to perform preliminary analysis of your data

Step 2-A: Explore Data


Understand nature of data
Preliminary analysis

Why explore?
Goal: Understand your data
• Correlations
• General trends
• Outliers

Describe Your Data


Visualize Your Data
• Heat maps
• Histograms
• Line graphs
• Scatter plots
• Boxplots

Data Exploration leads to Data Understanding, which leads to Informed Analysis.
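
A preliminary analysis along these lines might look like the sketch below, which describes a small sample with summary statistics, flags outliers, and measures a correlation. The temperature and sales figures are made up, the 1.5 × IQR rule is one common rule of thumb rather than something the slides prescribe, and `pearson` is a helper written here for illustration.

```python
import statistics

# Hypothetical sample: daily temperature (°C) and units sold on the same days.
temps = [18.0, 21.0, 24.0, 27.0, 30.0, 33.0, 90.0]  # 90.0 looks suspicious
sales = [20, 26, 31, 37, 44, 50, 52]

# Describe your data: basic summary statistics.
summary = {
    "min": min(temps),
    "max": max(temps),
    "mean": statistics.mean(temps),
    "median": statistics.median(temps),
}

# Outliers: flag values more than 1.5 * IQR beyond the quartiles.
q1, _, q3 = statistics.quantiles(temps, n=4)
iqr = q3 - q1
outliers = [t for t in temps if t < q1 - 1.5 * iqr or t > q3 + 1.5 * iqr]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Correlations: measured on the pairs whose temperature was not flagged.
kept = [(t, s) for t, s in zip(temps, sales) if t not in outliers]
r = pearson([t for t, _ in kept], [s for _, s in kept])
```

Here the single extreme reading is flagged as an outlier, and the remaining pairs show a strong positive correlation, which is the kind of insight exploration is meant to surface before modeling.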

Step 2-B: Pre-processing Data


After this video you will be able to:
• Identify some problems with real-world data
• Describe what is needed to transform raw data into data that can be used for analysis

Step 2-B: Pre-process Data

Clean Transform
Real-world data is messy!
• Inconsistent values
• Duplicate records
• Missing values
• Invalid data
• Outliers

Addressing Data Quality Issues
• Remove data with missing values
• Merge duplicate records
• Generate best estimate for invalid values
• Remove outliers

Domain Knowledge
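
The first three remedies above can be sketched in plain Python as follows. The records, the "keep the first record per id" merge policy, the median as the best estimate, and the 0 to 120 valid-age range are all assumptions made for this example; in practice such choices come from domain knowledge.

```python
from statistics import median

# Hypothetical raw records; None marks a missing value.
raw = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},  # missing value
    {"id": 3, "age": -5},    # invalid value
    {"id": 1, "age": 34},    # duplicate of id 1
    {"id": 4, "age": 41},
]


def is_valid_age(age):
    """Domain knowledge (assumed here): a plausible age is between 0 and 120."""
    return 0 <= age <= 120


# 1. Remove records with missing values.
complete = [r for r in raw if r["age"] is not None]

# 2. Merge duplicate records (here: keep the first record seen for each id).
by_id = {}
for r in complete:
    by_id.setdefault(r["id"], dict(r))
deduped = list(by_id.values())

# 3. Replace invalid values with a best estimate (here: median of valid ages).
estimate = median(r["age"] for r in deduped if is_valid_age(r["age"]))
cleaned = [
    r if is_valid_age(r["age"]) else {**r, "age": estimate}
    for r in deduped
]
```

Dropping rows with missing values is the simplest option but discards information; whether that is acceptable depends on how much data is lost.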

Getting Data in Shape


Data Munging
• Scaling
• Transformation
• Feature Selection
• Dimensionality Reduction
• Data Manipulation
Scaling
[Figure: Scaled Values of Weight and Height]
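
One common way to put weight and height on a comparable range is min-max scaling, sketched below. The slides do not say which scaling method they have in mind, so this choice is an assumption, and the sample values are invented.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: nothing to spread out, so map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical measurements on very different numeric ranges.
weights_kg = [50.0, 65.0, 80.0, 95.0]
heights_m = [1.50, 1.65, 1.80, 1.95]

scaled_weights = min_max_scale(weights_kg)
scaled_heights = min_max_scale(heights_m)
```

After scaling, both features lie in the range 0 to 1, so neither dominates a distance-based analysis simply because of its units.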
Transformation
[Figure: Original Data and Transformed Data]
Feature Selection
• Remove feature
• Combine features
• Add feature
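
The three operations above could look like this on a single record. The field names, the choice to drop the identifier, the body-mass-index combination, and the derived `is_adult` flag are hypothetical examples chosen for illustration, not features named in the slides.

```python
def select_features(record):
    """Return a new record after removing, combining, and adding features."""
    out = dict(record)

    # Remove feature: an identifier carries no information about the person.
    out.pop("id")

    # Combine features: weight and height become one body-mass-index value.
    weight = out.pop("weight_kg")
    height = out.pop("height_m")
    out["bmi"] = round(weight / height ** 2, 1)

    # Add feature: a new flag derived from an existing value.
    out["is_adult"] = out["age"] >= 18
    return out


person = {"id": 101, "weight_kg": 64.0, "height_m": 1.60, "age": 30}
features = select_features(person)
```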
Dimensionality Reduction
[Figure: 3D and 2D]

Data Manipulation
Always Remember!
Garbage in = Garbage out
Data preparation is very important for meaningful analysis!

Step 3: Analyze Data


After this video you will be able to:
• Describe what is involved in applying an analysis technique to your data
• List three basic analysis techniques

Step 3: Analyze Data


Select analytical techniques
Build models
Build Model
[Diagram labels: Input Data, Analysis Technique, Model, Model Output]
Categories of Analysis Techniques
• Classification
• Regression
• Clustering
• Association Analysis
• Graph Analytics
Classification
Goal: Predict category
Example categories: Sunny, Windy, Rainy, Cloudy
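
To make "predict category" concrete, here is a minimal sketch using a 1-nearest-neighbour rule, one simple classification technique among many; the slides do not name a specific algorithm. The humidity and wind readings, and their pairing with the weather categories, are invented for this example.

```python
# Hypothetical labelled observations: (humidity %, wind speed km/h) -> category.
training = [
    ((20, 5), "Sunny"),
    ((30, 40), "Windy"),
    ((90, 10), "Rainy"),
    ((70, 8), "Cloudy"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(point):
    """Predict the category of the closest training example."""
    _, label = min(training, key=lambda pair: squared_distance(pair[0], point))
    return label


prediction = classify((85, 12))
```

The output is a label from a fixed set of categories, which is what distinguishes classification from regression.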
Regression
Goal: Predict numeric value
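
A minimal regression sketch: fitting a straight line by ordinary least squares and using it to predict a number. Simple linear regression is assumed here as the illustrative technique, and the data points are invented so that they lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations that follow y = 2x + 1 exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
predicted = slope * 5.0 + intercept  # numeric prediction for an unseen x
```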
Clustering
Goal: Organize similar items into groups
Example groups: Seniors, Adults, Teenagers
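
Grouping people by age could be sketched with k-means on one-dimensional data, as below. K-means is assumed here as a representative clustering technique; the ages and the starting centers are made up, and a real application would also have to choose the number of groups.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on one-dimensional data, starting from the given centers."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group.
        centers = [
            sum(g) / len(g) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers, groups


# Hypothetical customer ages.
ages = [14, 15, 17, 34, 38, 41, 67, 70, 72]
centers, groups = kmeans_1d(ages, centers=[10, 40, 80])
```

Unlike classification, no labels are supplied: the three groups emerge from the data, and naming them (teenagers, adults, seniors) is an interpretation made afterwards.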
Association Analysis
Goal: Find rules to capture associations between items
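
Two standard measures for judging a candidate rule such as "bread implies butter" are support and confidence. The sketch below computes both over a handful of invented shopping baskets; the items and the rule are hypothetical.

```python
# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
]


def support(itemset):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent):
    """Of the baskets containing antecedent, the fraction also containing consequent."""
    return support(antecedent | consequent) / support(antecedent)


rule_support = support({"bread", "butter"})
rule_confidence = confidence({"bread"}, {"butter"})
```

Here half of all baskets contain both items, and two out of every three baskets with bread also contain butter.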


Graph Analytics
Goal: Use graph structures to find connections between entities
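
One basic way to find a connection between two entities is a breadth-first search over an adjacency list, sketched here. The people and their relationships are invented, and breadth-first search is just one of many graph techniques that could serve as an example.

```python
from collections import deque

# Hypothetical "knows" relationships, stored as an undirected adjacency list.
graph = {
    "Ann": ["Bob", "Cara"],
    "Bob": ["Ann", "Dev"],
    "Cara": ["Ann"],
    "Dev": ["Bob", "Eve"],
    "Eve": ["Dev"],
}


def shortest_path(start, goal):
    """Breadth-first search; returns the shortest chain of connections, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in graph[path[-1]]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


path = shortest_path("Ann", "Eve")
```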
Modeling
• Select technique
• Build model
• Validate model

Evaluation of Results
Classification and Regression
Predicted Value vs. Correct Value
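
Comparing predicted values with correct values is often summarized by a single number. As a sketch, accuracy is a typical choice for classification and root-mean-square error for regression; the slides do not name specific metrics, so both are assumptions, and the sample predictions are invented.

```python
def accuracy(predicted, correct):
    """Fraction of predictions that match the correct category."""
    return sum(p == c for p, c in zip(predicted, correct)) / len(correct)


def rmse(predicted, correct):
    """Root-mean-square error between predicted and correct numeric values."""
    squared_errors = [(p - c) ** 2 for p, c in zip(predicted, correct)]
    return (sum(squared_errors) / len(correct)) ** 0.5


# Hypothetical classification results: three of four predictions are right.
acc = accuracy(
    ["Sunny", "Rainy", "Cloudy", "Rainy"],
    ["Sunny", "Rainy", "Rainy", "Rainy"],
)

# Hypothetical regression results: errors of 0, 1, and 2.
err = rmse([2.0, 4.0, 6.0], [2.0, 5.0, 8.0])
```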
Clustering

Association Analysis and Graph Analytics
• Investigate
• Validate
Determine Next Steps
• Repeat analysis?
• Take deeper dive?
• Act on results?
Select technique, Build model, Evaluate
• Classification
• Regression
• Clustering
• Association Analysis
• Graph Analytics

Step 4: Reporting Insights


After this video you will be able to:
• Determine what to present in reporting your findings
• Identify techniques to communicate your results

Step 4: Communicate Results


What to Present

How to Present

Visualization Tools

[Figure labels: Present, with, using]

Step 5: Turning Insights into Action


After this video you will be able to:
• Explain what turning insights into action means
• Connect your results with your business question

Step 5: Apply Results


[Figure labels: Database, NoSQL, GIS, Files, Social, Sensor, Action]

Implementation
[Figure labels: Action, Process, Automation, Stakeholders]

Assess Impact
[Figure labels: Action, Monitor, Measure, Evaluation]

Determine Next Steps
[Figure labels: Action, Evaluation, Favorable Results?, Further Opportunities?, Revisit?, Real-time Action?]
