The Data Science Process Course Slides Red
Explore
Report Act
Scale
Data Engineering Computational Data Science
Python for Data Science
Programmability
solved.”
Charles F. Kettering
Sales figures
Sensor data
Customer data
Better targeted marketing
Marketing data
Risks
Benefits
Contingencies
Regulations
Resources
Requirements
Define Goals
Objectives
Criteria
Formulate the Question
Define the Problem
Define Goals
Preliminary analysis
PREPARE
Iterative process
Scripting languages
Remote data
SOAP
REST
WebSocket
Web Services
NoSQL storage
Current weather
Real-time tweets near fires
REST
WebSocket
Traditional databases Remote data
NoSQL storage
Text files Web Services
Scripting languages Programming Interfaces
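A minimal sketch of acquiring data from two of the sources named above, a text file and a traditional database, using only the Python standard library. The CSV content, the table name `readings`, and the column names are hypothetical and exist only for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content standing in for a text file on disk.
CSV_TEXT = "station,temp_c\nA,21.5\nB,19.0\nC,23.2\n"


def load_csv_rows(text):
    """Parse CSV text into a list of dicts, converting temp_c to float."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"station": r["station"], "temp_c": float(r["temp_c"])} for r in reader]


def load_into_database(rows):
    """Store rows in an in-memory SQLite database and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (station TEXT, temp_c REAL)")
    conn.executemany(
        "INSERT INTO readings (station, temp_c) VALUES (?, ?)",
        [(r["station"], r["temp_c"]) for r in rows],
    )
    conn.commit()
    return conn


rows = load_csv_rows(CSV_TEXT)
conn = load_into_database(rows)

# Query the database with SQL, as one would for a traditional database.
warm = conn.execute(
    "SELECT station FROM readings WHERE temp_c > 20 ORDER BY station"
).fetchall()
print(warm)
```

Remote sources such as REST or WebSocket services would follow the same pattern: fetch, parse, then store in a form that is convenient to query.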
exploring data
Preliminary analysis
Why explore?
Correlations
Outliers
Why explore?
General trends
Correlations
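Two of the things exploration looks for, correlations and outliers, can be checked numerically before any plotting. The sketch below uses the Pearson coefficient and the 1.5 × IQR rule that boxplots commonly use; both are choices made for this example, and the sample data is invented.

```python
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def iqr_outliers(values):
    """Return values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]


# Invented sample data for illustration only.
hours = [1, 2, 3, 4, 5, 6]
sales = [12, 15, 19, 22, 26, 30]
readings = [10, 11, 12, 11, 10, 12, 11, 95]

r = pearson(hours, sales)
outliers = iqr_outliers(readings)
print(round(r, 3), outliers)
```

A coefficient near 1 or -1 suggests a strong linear relationship; values flagged as outliers deserve a closer look before deciding whether to keep them.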
Scatter plots
Boxplots
Informed
Analysis
Data
Understanding
Data
Exploration
Clean Transform
Real-world data is messy!
• Inconsistent values
• Duplicate records
• Missing values
• Invalid data
• Outliers
Addressing Data Quality Issues
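A minimal sketch of addressing several of these issues in plain Python. The records, the field names, and the choices made here (drop negative ages, fill missing ages with the median) are assumptions for illustration; the right treatment depends on the data and the problem.

```python
import statistics

# Hypothetical raw records showing several data quality issues.
RAW = [
    {"name": "Ann ", "age": 34},
    {"name": "ann", "age": 34},    # duplicate once the name is normalized
    {"name": "Bob", "age": None},  # missing value
    {"name": "Cy", "age": -5},     # invalid value: age cannot be negative
    {"name": "Dee", "age": 40},
]


def clean_records(records):
    # 1. Normalize inconsistent values (whitespace and letter case in names).
    normalized = [
        {"name": r["name"].strip().title(), "age": r["age"]} for r in records
    ]

    # 2. Remove duplicate records, keeping the first occurrence.
    seen, unique = set(), []
    for r in normalized:
        key = (r["name"], r["age"])
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # 3. Drop records with invalid data (negative ages).
    valid = [r for r in unique if r["age"] is None or r["age"] >= 0]

    # 4. Fill missing values with the median of the known ages.
    known = [r["age"] for r in valid if r["age"] is not None]
    median_age = statistics.median(known)
    return [
        {"name": r["name"], "age": median_age if r["age"] is None else r["age"]}
        for r in valid
    ]


cleaned = clean_records(RAW)
print(cleaned)
```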
Domain
Knowledge
Data Manipulation
• Scaling
• Transformation
• Feature Selection
• Dimensionality Reduction
Scaling
Scaled Values
Weight
Height
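One common way to scale features such as weight and height to a comparable range is min-max scaling to [0, 1]. The slide does not say which scaling method is meant, so this choice and the sample values are assumptions.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; map everything to 0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Invented sample values: weight in kilograms, height in metres.
weights_kg = [50, 60, 80, 100]
heights_m = [1.5, 1.6, 1.8, 2.0]

scaled_weights = min_max_scale(weights_kg)
scaled_heights = min_max_scale(heights_m)
print(scaled_weights)
print(scaled_heights)
```

After scaling, both features span the same range, so neither dominates a distance-based analysis simply because of its units.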
Transformation
Original Data
Transformed Data
Feature Selection
• Remove feature
• Combine features
• Add feature
Dimensionality Reduction
3D 2D
Data Manipulation
Always Remember!
Data preparation is very important for meaningful analysis!
Input Data
Classification
Regression
Clustering
Association Analysis
Graph Analytics
Classification
Sunny
Goal: Predict category
Windy
Rainy
Cloudy
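As a small illustration of predicting a category, the sketch below labels a weather observation with the category of its nearest training example (a 1-nearest-neighbor classifier). The technique, the two features, and the training values are assumptions chosen for the example, not taken from the slides.

```python
import math

# Hypothetical training data: (humidity %, wind speed km/h) -> weather category.
TRAINING = [
    ((30, 5), "Sunny"),
    ((35, 40), "Windy"),
    ((90, 10), "Rainy"),
    ((70, 8), "Cloudy"),
]


def predict_category(sample, training=TRAINING):
    """Return the label of the training point closest to the sample."""
    _, label = min(training, key=lambda item: math.dist(item[0], sample))
    return label


print(predict_category((88, 12)))
print(predict_category((32, 38)))
```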
Regression
Goal: Predict numeric value
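A minimal sketch of predicting a numeric value: fit a straight line to invented data by ordinary least squares, then use it to predict a new value. Simple linear regression is only one regression technique, chosen here for brevity.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return slope, intercept


def predict(x, slope, intercept):
    """Predict a numeric value for x from the fitted line."""
    return slope * x + intercept


# Invented sample data that lies exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
print(slope, intercept, predict(5, slope, intercept))
```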
Clustering
Adults
Teenagers
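Clustering groups similar items without being told the groups in advance. The sketch below runs a simple one-dimensional k-means on invented ages; the algorithm, the starting centroids, and the data are assumptions for illustration. Names such as "teenagers" and "adults" are interpretations applied after the clusters are found.

```python
def k_means_1d(values, centroids, iterations=10):
    """Cluster numbers around the given starting centroids with basic k-means."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Invented ages; starting centroids are arbitrary guesses.
ages = [13, 15, 17, 14, 35, 42, 51, 38]
centroids, clusters = k_means_1d(ages, centroids=[10, 50])
print(centroids)
print(clusters)
```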
Association Analysis
Goal: Find rules to capture
Select technique
Build model
Validate model
Evaluation of Results
Classification and Regression
Predicted Value
Correct Value
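Evaluating classification and regression both come down to comparing predicted values with correct values. The sketch below uses accuracy for categories and root-mean-square error (RMSE) for numbers; these two metrics are common choices assumed for this example, and the sample values are invented.

```python
import math


def accuracy(predicted, correct):
    """Fraction of predictions that match the correct category."""
    matches = sum(p == c for p, c in zip(predicted, correct))
    return matches / len(correct)


def rmse(predicted, correct):
    """Root-mean-square error between predicted and correct numeric values."""
    squared = [(p - c) ** 2 for p, c in zip(predicted, correct)]
    return math.sqrt(sum(squared) / len(squared))


# Invented classification results: three of four predictions are right.
acc = accuracy(
    ["Sunny", "Rainy", "Windy", "Cloudy"],
    ["Sunny", "Rainy", "Cloudy", "Cloudy"],
)

# Invented regression results.
err = rmse([2.0, 4.0, 6.0], [1.0, 4.0, 8.0])

print(acc, round(err, 3))
```

Higher accuracy is better for classification; lower RMSE is better for regression.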
Clustering
Association Analysis and Graph Analytics
Investigate Validate
Determine Next Steps
Repeat analysis?
Act on results?
Select technique Build model Evaluate
• Classification
• Regression
• Clustering
• Association Analysis
• Graph Analytics
• Identify techniques to communicate your results
PREPARE
What to Present
How to Present
Visualization Tools
Present
with
using
Social Sensor
Action
Implementation
Process
Action
Automation Stakeholders
Assess Impact
Monitor
Action
Measure
Evaluation
Action
Determine
Next Steps
Evaluation
Favorable Results?
Revisit?
Further Opportunities?
Action
Evaluation
Real-time Action?
Favorable Results?
Revisit?
Further Opportunities?