SIMS DP
PREPARATION & MANAGEMENT
Course Outline
• Practicals: We learn using case studies and datasets
• Assessments: HBR case study and dataset submissions
• Tools: Excel, Python
• Quiz: 1 in-class quiz (15 min)
• Groups: You need to create groups to work on the assessments
Course Outcomes
• CO 1: Understand the role and challenges of data preparation and management in enterprise analytics
• CO 2: Understand the range of techniques useful for data preparation and management
• CO 3: Apply data extraction and connectivity
• CO 4: Apply data cleaning and transformation scripts for a wide range of errors and inconsistencies
Recommended Books
- Data Preparation for Data Mining by Dorian Pyle, The Morgan Kaufmann Series in Data Management Systems, 1999.
- HBR Case: "Applying Data Science and Analytics at P&G" by Srikant M. Datar, Sarah Mehta, Paul Hamilton.
Session 1,2
Topics :
• Enterprise Data Overview: Need, Environment, Benefits
• Enterprise Data: Strategy, Planning, Challenges, Business linkage
WHY DATA?
• Raw Data
• Data Transformation
• Data Modelling
• Data Analytics
• Understanding different data storage technologies: Data Warehouses, Data Lakes
A Typical Architecture
• Source Data: Databases, Data Warehouses, Data Lakes
• Semantic Layer (Excel, Power BI): Data Model, Calculations, Relationships
• User Layer (Excel, Power BI): Reporting and Analytics
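The three layers can be sketched in a few lines of Python. This is a minimal illustration, not how Excel or Power BI work internally: an in-memory SQLite database stands in for the source data, a SQL view plays the role of the semantic layer (it defines the relationship between two tables and one calculation once), and a small report function is the user layer. The table names, columns and values are hypothetical.

```python
import sqlite3

# Source layer: an in-memory SQLite database standing in for an
# operational source system (hypothetical tables and values).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER, amount REAL);
INSERT INTO products VALUES (1, 'Home'), (2, 'Office');
INSERT INTO sales VALUES (1, 1, 120.0), (2, 2, 80.0), (3, 1, 30.0);
""")

# Semantic layer: the relationship (sales.product_id -> products) and the
# calculation (total sales per category) are defined once, as a view.
conn.execute("""
CREATE VIEW sales_by_category AS
SELECT p.category, SUM(s.amount) AS total_sales
FROM sales AS s
JOIN products AS p ON s.product_id = p.product_id
GROUP BY p.category
""")

# User layer: the report reads only the semantic view, never the raw tables.
def report(connection):
    return connection.execute(
        "SELECT category, total_sales FROM sales_by_category ORDER BY category"
    ).fetchall()

for category, total in report(conn):
    print(f"{category}: {total:.2f}")
```

The design point is that reports never repeat the join or the sum; if the business definition of "total sales" changes, only the semantic layer is edited.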
How Data meets enterprise operations
• Data: Raw
• Information: Context
• Knowledge: Collection
• Communication: Story
Case Study Group Work
• We learn more about all dimensions of the business analytics journey in an enterprise from the HBR case study
• Read the case study "Applying Data Science & Analytics at P&G"
• Prepare a presentation answering the 14 questions
• Groups will be picked at random in the next class to discuss each question
• Submit your presentation to the CR for assessment
How BA (business analytics) meets enterprise operations
• Strategy
• Change management
• Execution
Observations
Measurement
Information
Facts
Quantities
Types of Data
• Qualitative Data
• Quantitative Data
  - Discrete Data: 5 kids, 97 trees, 5 bottles
  - Continuous Data: 3.75 kg, 1.25 km, 6.25 inches
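A rough rule of thumb separates these types: counts (whole numbers) are discrete, measurements (which can take fractional values) are continuous, and labels are qualitative. The sketch below encodes that rule using Python types as a stand-in. It is only a heuristic: a measurement that happens to be stored as an integer (say, 3 kg) would be misclassified, so real data still needs domain judgement. The qualitative example "red" is hypothetical; the numeric examples come from the slide.

```python
def classify(value):
    """Heuristic only: text -> qualitative, int counts -> discrete,
    float measurements -> continuous."""
    if isinstance(value, bool):  # bool is a subclass of int; exclude it explicitly
        raise TypeError("booleans are not covered by this simple rule")
    if isinstance(value, str):
        return "qualitative"
    if isinstance(value, int):
        return "quantitative: discrete"
    if isinstance(value, float):
        return "quantitative: continuous"
    raise TypeError(f"unsupported type: {type(value).__name__}")

examples = {"5 kids": 5, "97 trees": 97, "3.75 kg": 3.75, "6.25 inches": 6.25, "colour": "red"}
for label, value in examples.items():
    print(label, "->", classify(value))
```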
Data Sources
Data Source Types: Structured, Unstructured
• Datasets: MS Access, Oracle, DB2, Informix, SQL, MySQL, Amazon SimpleDB
• Flat Files: Excel files, CSV files
• Web Services: NoSQL, MongoDB
• Other Sources: Images, Video, Text, Voice
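Extraction from the two most common structured sources, a flat file and a relational database, looks like this in Python. This is a minimal sketch under stated assumptions: the CSV is inlined as a string so the example is self-contained (normally you would open a file), and SQLite stands in for server databases such as MySQL or Oracle, which need their own driver packages. All table names and values are hypothetical.

```python
import csv
import io
import sqlite3

# Flat file: a CSV export, inlined here instead of read from disk.
csv_text = "order_id,region,amount\n1,North,250\n2,South,100\n"
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# Database: SQLite as a stand-in for a server database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
db_rows = conn.execute(
    "SELECT customer_id, name FROM customers ORDER BY customer_id"
).fetchall()

print(csv_rows)  # every CSV value arrives as a string
print(db_rows)   # the database keeps its column types
```

Note that the CSV reader returns every value as text, so converting types (for example `int(row["amount"])`) is already part of data preparation, whereas the database returns typed values.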
What is an OLTP (Online Transaction Processing) system?
• Examples: online banking, online air ticket booking, order entry
What is OLAP (Online Analytical Processing)?
• For example, sales analysis might have several dimensions related to category (Home, Office, Furniture), time (year, month, week, day), product (clothing, men/women/children, brand, type), and more.
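An OLTP system such as order entry is built around small units of work that must either fully succeed or fully fail. The sketch below shows that idea with SQLite: placing an order inserts an order row and decrements stock inside one transaction, and a stock shortage rolls both changes back. The schema, the `place_order` helper and the data are hypothetical, chosen only to illustrate the transactional behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER NOT NULL CHECK (qty >= 0));
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, item TEXT, qty INTEGER);
INSERT INTO stock VALUES ('chair', 5);
""")

def place_order(connection, item, qty):
    """Hypothetical OLTP unit of work: both writes commit together or neither does."""
    try:
        with connection:  # commits on success, rolls back if an exception is raised
            connection.execute("INSERT INTO orders (item, qty) VALUES (?, ?)", (item, qty))
            connection.execute("UPDATE stock SET qty = qty - ? WHERE item = ?", (qty, item))
        return True
    except sqlite3.IntegrityError:  # CHECK (qty >= 0) failed: not enough stock
        return False

first_ok = place_order(conn, "chair", 3)    # succeeds, stock drops to 2
second_ok = place_order(conn, "chair", 10)  # fails, the order row is rolled back too
print(first_ok, second_ok)
```

OLAP work, by contrast, reads many such rows at once and aggregates them along dimensions rather than changing individual records.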
What is Dimensional Modelling?
• Represents data as a cube
• Consists of FACTS and DIMENSIONS
• Facts are numerical transaction data
• Dimensions give context to facts
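A small star schema makes facts and dimensions concrete. In this sketch (plain Python, hypothetical data) the fact rows hold the numerical `amount` plus keys into a product dimension and a time dimension. Summing the fact for every combination of dimension attributes gives the cells of the cube; dropping a dimension from the grouping is a roll-up.

```python
from collections import defaultdict

# Dimension tables: they give context to the facts (hypothetical values).
product_dim = {1: {"category": "Home"}, 2: {"category": "Office"}, 3: {"category": "Furniture"}}
time_dim = {20230115: {"year": 2023, "month": 1}, 20240210: {"year": 2024, "month": 2}}

# Fact table: numerical transaction data plus foreign keys into the dimensions.
sales_facts = [
    {"product_id": 1, "date_id": 20230115, "amount": 100.0},
    {"product_id": 2, "date_id": 20230115, "amount": 40.0},
    {"product_id": 1, "date_id": 20240210, "amount": 60.0},
    {"product_id": 3, "date_id": 20240210, "amount": 250.0},
]

def roll_up(facts, dims):
    """Sum 'amount' per combination of dimension attributes.
    dims maps attribute name -> (foreign-key column in the fact, dimension table)."""
    totals = defaultdict(float)
    for fact in facts:
        key = tuple(table[fact[fk]][attr] for attr, (fk, table) in dims.items())
        totals[key] += fact["amount"]
    return dict(totals)

# Cube cells by category and year, then rolled up to category only.
by_category_year = roll_up(
    sales_facts, {"category": ("product_id", product_dim), "year": ("date_id", time_dim)}
)
by_category = roll_up(sales_facts, {"category": ("product_id", product_dim)})
print(by_category_year)
print(by_category)
```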
Data Implementation Process
• Business Goals
• Identify Data
• Prepare Data
• Model and Evaluate Data
• Present Data
Steps Involved in Data Cleanup
• Removal of Unwanted Observations
• Fixing Structural Errors
• Handling Missing Data
• Managing Unwanted Outliers
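The first two cleanup steps can be sketched on a handful of records. In this hypothetical example, structural errors (a column with two different names, inconsistent capitalization and stray spaces, a typo in a class label) are fixed first, so that records which differ only in formatting are then recognized as duplicates and removed. The alias and label mappings are assumptions an analyst would decide after inspecting the data; removing irrelevant observations depends on the business goal and is not shown.

```python
# Hypothetical raw records showing common structural problems.
raw = [
    {"Cust Name": "Asha", "segment": "Retail"},
    {"customer_name": "Asha", "segment": "RETAIL"},     # same record, different formatting
    {"customer_name": "Ravi", "segment": "retail "},    # stray whitespace
    {"customer_name": "Meera", "segment": "Corprate"},  # typo in a class label
]

# Hypothetical mappings chosen by the analyst.
COLUMN_ALIASES = {"Cust Name": "customer_name"}
LABEL_FIXES = {"corprate": "corporate"}

def fix_structure(record):
    """Unify column names, then normalize and correct the class label."""
    fixed = {COLUMN_ALIASES.get(col, col): val for col, val in record.items()}
    label = fixed["segment"].strip().lower()
    fixed["segment"] = LABEL_FIXES.get(label, label)
    return fixed

def drop_duplicates(records):
    """Keep the first occurrence of each identical record, preserving order."""
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

clean = drop_duplicates([fix_structure(r) for r in raw])
for rec in clean:
    print(rec)
```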
Steps Involved in Data Cleanup
Handle Missing Data
• Categorical data: label missing values 'Missing'
• Numeric data: flag the missing values, then fill them
Filter Unwanted Outliers
• Do not remove outliers until you have a legitimate reason
Fix Structural Errors
• Typos in feature names
• Same attribute with different names
• Mislabeled classes, i.e. separate classes that should really be the same, or inconsistent capitalization
Removal of Unwanted Observations
• Duplicate observations
• Irrelevant observations
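The missing-data and outlier rules above can be sketched as follows. Missing categories are labelled 'Missing'; missing numbers are first flagged in a new column and then filled. The fill value (the median) and the outlier rule (values outside 1.5 × IQR of the quartiles) are assumptions chosen for illustration, not the only options. Outliers are only flagged, never dropped, in line with the advice to keep them until there is a legitimate reason to remove them. The records are hypothetical.

```python
import statistics

# Hypothetical records: one missing category, one missing number, one extreme value.
rows = [
    {"city": "Pune", "income": 42.0},
    {"city": None, "income": 55.0},
    {"city": "Mumbai", "income": None},
    {"city": "Delhi", "income": 48.0},
    {"city": "Pune", "income": 400.0},
]

def handle_missing(records, cat_col, num_col):
    """Label missing categories 'Missing'; flag missing numbers, then fill with the median."""
    observed = [r[num_col] for r in records if r[num_col] is not None]
    fill_value = statistics.median(observed)  # median fill is an assumption
    cleaned = []
    for r in records:
        rec = dict(r)  # copy so the raw data stays untouched
        if rec[cat_col] is None:
            rec[cat_col] = "Missing"
        rec[num_col + "_was_missing"] = rec[num_col] is None  # flag before filling
        if rec[num_col] is None:
            rec[num_col] = fill_value
        cleaned.append(rec)
    return cleaned

def flag_outliers(records, num_col, k=1.5):
    """Mark (never drop) values outside the k x IQR fences around the quartiles."""
    values = [r[num_col] for r in records]
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [dict(r, **{num_col + "_outlier": not (low <= r[num_col] <= high)}) for r in records]

clean = flag_outliers(handle_missing(rows, "city", "income"), "income")
for rec in clean:
    print(rec)
```

Keeping the `_was_missing` and `_outlier` flags lets later analysis test whether the filled or extreme rows behave differently, instead of silently losing that information.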
Session 9,10
Topics :
• Machine learning – Supervised, Unsupervised
• Classification
• Association rules, market basket analysis
• Regression, Decision trees
• Clustering
• Neural networks
• Time series forecasting
Session Objective:
Develop an understanding of specific data mining techniques and their steps
Analytics Methodology
• Business Goals
• Hypothesis Questions
• Data Sourcing
• Data Clean-up & Transformation
• Exploratory Data Analysis
• Advanced Analytics Modelling
• Present Data Story
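Exploratory data analysis usually starts with simple summary statistics per column, before any advanced modelling. The sketch below uses Python's standard `statistics` module on a hypothetical column of monthly sales. A mean far above the median, as here, hints at skew or an outlier worth investigating against the business goals and hypothesis questions before modelling.

```python
import statistics

def summarize(values):
    """Basic exploratory summary of one numeric column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

monthly_sales = [120, 135, 128, 150, 142, 400]  # hypothetical data
summary = summarize(monthly_sales)
print(summary)
```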
END OF SESSION