
SIMS DP

The document outlines a course on Data Preparation and Management, focusing on practical assessments, tools like Excel and Python, and key concepts in enterprise data analytics. It covers topics such as data types, analytics methodologies, data cleaning, and the implementation process, along with a case study from HBR. The course aims to equip participants with the skills to understand and apply data management techniques in a business context.

Uploaded by Ganesh
Copyright: © All Rights Reserved

DATA PREPARATION & MANAGEMENT
Course Outline
• Practicals: We learn using case studies and datasets
• Assessments: HBR case study and dataset submissions
• Tools: Excel, Python
• Quiz: 1 in-class quiz – 15 min
• Groups: You need to create groups to work on the assessments
Course Outcomes
• CO 1: Understand the role and challenges of data preparation and management in enterprise analytics
• CO 2: Understand the range of techniques useful for data preparation and management
• CO 3: Apply data extraction and connectivity
• CO 4: Apply data cleaning and transformation scripts for a wide range of errors and inconsistencies
Recommended Books
• Data Preparation for Data Mining by Dorian Pyle, The Morgan Kaufmann Series in Data Management Systems, 1999.
• HBR Case: "Applying Data Science and Analytics at P&G" by Srikant M. Datar, Sarah Mehta, Paul Hamilton
Session 1,2
Topics :
• Enterprise Data Overview – Need, Environment, Benefits
• Enterprise Data – Strategy, Planning, Challenges, Business linkage
WHY DATA?
"EVERY COMPANY HAS BIG DATA IN ITS FUTURE AND EVERY COMPANY WILL EVENTUALLY BE IN THE DATA BUSINESS"
Thomas H. Davenport, academician and author
What is Enterprise Data
• Working with factual data in organizations
• Using appropriate tools to identify nuggets of wisdom (insights)
• Influencing good decision making
What can you do with data?
• Descriptive analytics (What happened?)
• Summarizes data into meaningful charts and reports, for example, about budgets, sales, revenues, or cost.
• Predictive analytics (What will happen?)
• Predicts the future by examining historical data, detecting patterns or relationships in these data, and then extrapolating these relationships forward in time.
• Prescriptive analytics (How can we make it happen?)
• Uses optimization to identify the best alternatives to minimize or maximize some objective.
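The three kinds of analytics can be illustrated with a minimal Python sketch. All figures here (the monthly sales, the price options and the demand per price) are hypothetical values invented for illustration, and the straight-line trend is only one simple way of extrapolating history, not a method prescribed by the course.

```python
from statistics import mean

# Hypothetical monthly sales (illustrative data, not from the course material).
months = [1, 2, 3, 4]
sales = [100.0, 110.0, 120.0, 130.0]

# Descriptive analytics: what happened? Summarize the history.
summary = {
    "total": sum(sales),
    "average": mean(sales),
    "best_month": months[sales.index(max(sales))],
}

# Predictive analytics: what will happen?
# Fit a least-squares straight line to the history and extrapolate it.
def fit_line(xs, ys):
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

slope, intercept = fit_line(months, sales)
forecast_month_5 = slope * 5 + intercept

# Prescriptive analytics: how can we make it happen?
# Pick the alternative that maximizes an objective (here, revenue).
# Assumed price -> expected units sold; purely hypothetical numbers.
options = {9.0: 150, 10.0: 130, 11.0: 100}
best_price = max(options, key=lambda price: price * options[price])

print(summary)
print(forecast_month_5)
print(best_price)
```

In practice the predictive and prescriptive steps would use richer models and constraints; the point of the sketch is only the difference in the question each step answers.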
Terminologies
• Story Telling, Dashboard, Reports, Data Visualization
• Analytics, Raw Data, Data Modelling, Data Transformation
• ETL, Big Data, Data Mining, Charts
• Structured Data, Unstructured Data, Machine Learning, Artificial Intelligence
Understanding different data storage technologies
• Database
• Data Warehouses
• Data Lakes
A Typical Architecture
• Source Data: Databases, Data Warehouses, Data Lakes
• Semantic Layer: Data Model, Calculations, Relationships (Excel, Power BI)
• User Layer: Reporting and Analytics (Excel, Power BI)
How Data meets enterprise operations
• Data: Raw
• Information: Context
• Knowledge: Collection
• Communication: Story
Case Study Group Work
• We learn more about all dimensions of the business analytics journey in an enterprise from the HBR case study
• Read the case study – “Applying Data Science & Analytics at P&G”
• Prepare a presentation to answer the 14 questions
• Groups will be picked at random in the next class to discuss each question
• Submit your presentation to CR for assessment
How BA meets enterprise operations
• Strategy
• Change management
• Execution
• Data management & governance
Session 1,2
Topics :
• BA – Need, Environment and Benefits
• BA – Strategy, Planning, Challenges, Business Linkage
• HBR case study – “Applying Data Science and Analytics at P&G”
Session Objective:
Understand the role of business analytics in enterprises
Session 5,6
Topics :
• Data, Databases, Data warehouses, Data Lakes
• Dimensional modelling – Dimensions, Facts, OLTP, OLAP
• Data implementation process
Session Objective:
Understand the need, concepts & components of data processing and analytical capabilities
Understanding Data
What is Data?
• Numbers
• Observations
• Measurement
• Graphs
• Information
• Facts
• Quantities
Types of Data
• Qualitative Data
• Quantitative Data
• Discrete Data: 5 Kids, 97 Trees, 5 Bottles
• Continuous Data: 3.75 kg, 1.25 km, 6.25 inches
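As a rough illustration of these categories, the sketch below classifies observations by their Python type. This mapping is an assumption made for the example (counts stored as `int`, measurements as `float`, labels as `str`); real datasets often need domain knowledge to decide, for instance when a numeric code is really a category. The `colour` entry is a hypothetical qualitative example that is not on the slide.

```python
def data_type(value):
    """Classify one observation, assuming its Python type mirrors its data type."""
    # Labels and yes/no flags describe qualities rather than amounts.
    if isinstance(value, (bool, str)):
        return "qualitative"
    # Whole-number counts are discrete quantities.
    if isinstance(value, int):
        return "quantitative/discrete"
    # Measurements on a continuous scale.
    if isinstance(value, float):
        return "quantitative/continuous"
    raise TypeError(f"unsupported value: {value!r}")

observations = {
    "kids": 5,
    "trees": 97,
    "weight_kg": 3.75,
    "distance_km": 1.25,
    "colour": "red",  # hypothetical qualitative value, not from the slide
}

types = {name: data_type(value) for name, value in observations.items()}
for name, kind in types.items():
    print(f"{name}: {kind}")
```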
Data Sources
Data Source Types: STRUCTURED, UNSTRUCTURED
• Datasets: MS Access, Oracle, DB2, Informix, SQL, MySQL, Amazon SimpleDB
• Flat Files: Excel files, CSV files
• Web Services: NoSQL, Mongo
• Other Sources: Images, Video, Text, Voice
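A small sketch of extracting data from two of these source types, using only the Python standard library: a flat file (CSV) is parsed and then loaded into a relational database. The CSV content, the `orders` table and its columns are hypothetical, and an in-memory SQLite database stands in for a server such as MySQL or Oracle.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file content; in practice this would be open("orders.csv").
csv_text = "order_id,product,amount\n1,Pen,2.50\n2,Notebook,4.00\n"

# Flat-file source: each row becomes a dict keyed by the header names.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Database source: an in-memory SQLite database (stand-in for a real server).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, product TEXT, amount REAL)")

# Named placeholders pull values out of each row dict; SQLite converts the
# text values to the declared column types on insert.
conn.executemany(
    "INSERT INTO orders VALUES (:order_id, :product, :amount)", rows
)
conn.commit()

total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
conn.close()

print(rows)
print(total)
```

Unstructured sources such as images, video or voice have no such row-and-column shape and usually need a separate extraction step before they can be stored this way.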
What is an OLTP system?
• Focused on transaction-oriented tasks
• Large number of users
• Fast response time
• Supports complex data models and tables
• Uses a fully normalized schema for database consistency
Examples - Online banking, online air ticket booking, order entry

What is OLAP?
• Multidimensional database
• Consists of numeric facts called measures which are categorized by dimensions
• Enables fast, flexible multidimensional data analysis for business intelligence (BI)
For example, sales analysis might have several dimensions related to category (Home, Office, Furniture), time (year, month, week, day), product (clothing, men/women/children, brand, type), and more.
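The contrast between the two workloads can be sketched with Python's built-in `sqlite3` module. SQLite is not an OLAP engine, so this only imitates the two access patterns: many small transactional writes versus a read-only aggregation of a measure by dimensions. The `sales` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (category TEXT, year INTEGER, amount REAL)")

# Hypothetical order entries (illustrative data).
orders = [
    ("Office", 2023, 120.0),
    ("Home", 2023, 80.0),
    ("Office", 2024, 200.0),
    ("Furniture", 2024, 50.0),
]

# OLTP-style work: each order is its own short transaction.
# "with conn" commits on success and rolls back if an exception is raised.
for order in orders:
    with conn:
        conn.execute("INSERT INTO sales VALUES (?, ?, ?)", order)

# OLAP-style work: aggregate the measure (amount) by two dimensions
# (category and year) for analysis.
cube = conn.execute(
    "SELECT category, year, SUM(amount) FROM sales "
    "GROUP BY category, year ORDER BY category, year"
).fetchall()
conn.close()

for category, year, amount in cube:
    print(category, year, amount)
```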
What is Dimensional Modelling?
• Represents data with a cube operation
• Consists of FACTS and DIMENSIONS
• Facts are numerical transaction data
• Dimensions give context to facts
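A minimal sketch of facts and dimensions in plain Python follows. The fact rows hold the numeric measure plus keys into the dimension tables, and the dimensions supply the context used to group the facts. The table names (`fact_sales`, `dim_product`, `dim_date`), the `roll_up` helper and all values are hypothetical.

```python
from collections import defaultdict

# Dimension tables: descriptive context, looked up by key.
dim_product = {
    1: {"name": "Desk", "category": "Furniture"},
    2: {"name": "Pen", "category": "Office"},
}
dim_date = {
    10: {"year": 2024, "month": 1},
    11: {"year": 2024, "month": 2},
}

# Fact table: numeric transaction data plus foreign keys to the dimensions.
fact_sales = [
    {"product_id": 1, "date_id": 10, "amount": 300.0},
    {"product_id": 2, "date_id": 10, "amount": 5.0},
    {"product_id": 2, "date_id": 11, "amount": 7.5},
]

def roll_up(facts, by):
    """Total the amount measure grouped by one dimension attribute."""
    totals = defaultdict(float)
    for fact in facts:
        # Join the fact to its dimensions to recover the context.
        context = {**dim_product[fact["product_id"]], **dim_date[fact["date_id"]]}
        totals[context[by]] += fact["amount"]
    return dict(totals)

by_category = roll_up(fact_sales, "category")
by_month = roll_up(fact_sales, "month")
print(by_category)
print(by_month)
```

The same facts answer different questions depending on which dimension attribute is chosen for grouping, which is the idea behind slicing a cube.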
Data Implementation process
Business Goals → Identify Data → Prepare Data → Data Model and Evaluate Data → Present Data
Steps Involved in Data Cleanup
• Removal of Unwanted Observations
• Fixing Structural Errors
• Managing Unwanted Outliers
• Handling Missing Data
Steps Involved in Data Cleanup
Removal of Unwanted Observations
• Duplicate Observations
• Irrelevant Observations
Fix Structural Errors
• Typos in the name of features
• Same attribute with different name
• Mislabeled classes, i.e. separate classes that should really be the same, or inconsistent capitalization
Filter Unwanted Outliers
• Do not remove outliers until you have a legitimate reason
Handle Missing Data
• Categorical data – Label them ‘Missing’
• Missing numeric data – flag and fill the values
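The cleanup steps above can be sketched in plain Python as follows. The records, the field names, the median fill and the "more than 10 times the median" outlier rule are all assumptions chosen for illustration; the slides do not prescribe a fill strategy or an outlier threshold. Removing irrelevant observations depends on the business question and is left out.

```python
from statistics import median

# Hypothetical raw records (illustrative data only).
raw = [
    {"id": 1, "City": "Pune ", "segment": "retail", "spend": 120.0},
    {"id": 1, "City": "Pune ", "segment": "retail", "spend": 120.0},  # duplicate
    {"id": 2, "City": "mumbai", "segment": "Retail", "spend": None},  # missing number
    {"id": 3, "City": "Delhi", "segment": None, "spend": 95.0},       # missing category
    {"id": 4, "City": "Pune", "segment": "RETAIL", "spend": 5000.0},  # possible outlier
]

def clean(records):
    # 1. Remove unwanted observations: drop exact duplicates.
    seen, unique = set(), []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)

    # 2. Fix structural errors: consistent feature names, whitespace and
    #    capitalization. Missing categories are labelled 'Missing'.
    fixed = []
    for record in unique:
        segment = record["segment"]
        fixed.append({
            "id": record["id"],
            "city": record["City"].strip().title(),
            "segment": segment.strip().lower() if segment else "Missing",
            "spend": record["spend"],
        })

    # 3. Handle missing numeric data: flag it, then fill with the median.
    fill = median(r["spend"] for r in fixed if r["spend"] is not None)
    for record in fixed:
        record["spend_missing"] = record["spend"] is None
        if record["spend_missing"]:
            record["spend"] = fill
        # 4. Outliers: flag rather than delete (assumed threshold).
        record["spend_outlier"] = record["spend"] > 10 * fill
    return fixed

cleaned = clean(raw)
for record in cleaned:
    print(record)
```

Flagging outliers and filled values instead of deleting them keeps the decision reversible, in line with the advice not to remove outliers without a legitimate reason.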

Session 9,10
Topics :
• Machine learning – Supervised, Unsupervised
• Classification
• Association rules, market basket analysis
• Regression, Decision trees
• Clustering
• Neural networks
• Time series forecasting
Session Objective:
Develop an understanding of the specific data mining techniques and their different steps
Analytics Methodology
Business Goals → Hypothesis questions → Data sourcing → Data clean up & transformation → Exploratory data analysis → Advanced Analytics modelling → Present Data story
END OF SESSION
