Chapter 6
(c) To be able to perform scrubbing/cleaning of data
(d) To be able to apply de-duplication
(e) To be able to enhance the quality of data
Session Plan
Lecture time: 90 minutes
Q/A: 15 minutes
Fundamentals of Business Analytics RN Prasad and Seema Acharya Copyright 2011 Wiley India Pvt. Ltd. All rights reserved.
Agenda
BI: the process
What is Data Integration?
Challenges in Data Integration
Technologies in Data Integration
ETL: Extract, Transform, Load
Various stages in ETL
Need for Data Integration
Advantages of using Data Integration
Common approaches to Data Integration
Metadata and its types
Data Quality and Data Profiling concepts
BI The Process
Data Integration
Data Analysis
Reporting
Process of coherent merging of data from various data sources and presenting a cohesive/consolidated view to the user
Involves combining data residing at different sources and providing users with a unified view of the data.
According to your understanding, what are the problems faced in Data Integration?
Data Staging
Archive
Clean up
ETL stages (figure labels): Validate Reference, Stage, Extract, Transform, Archive, Audit Reports, Publish
Extracting data from different sources
Transforming it to fit operational needs (which can include quality levels)
Loading it into the end target (database or data warehouse)
ETL allows the creation of efficient and consistent databases.
Although ETL is usually discussed in the context of a data warehouse, the term in fact refers to a process that can load any database.
ETL implementations usually store an audit trail of successful and failed process runs.
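The three ETL steps and the audit trail can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the CSV content, the table names `sales` and `etl_audit`, and the cleaning rules are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real run would read files or source systems.
SOURCE_CSV = "id,name,amount\n1, alice ,10.5\n2,BOB,7\n3,carol,not-a-number\n"


def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean names and convert types; set aside rows that fail."""
    good, rejected = [], []
    for row in rows:
        try:
            good.append(
                (int(row["id"]), row["name"].strip().title(), float(row["amount"]))
            )
        except ValueError:
            rejected.append(row)
    return good, rejected


def load(conn, rows):
    """Load: write the transformed rows into the target table."""
    conn.executemany("INSERT INTO sales (id, name, amount) VALUES (?, ?, ?)", rows)


def run_etl(conn, text):
    """Run one ETL cycle and record its outcome in an audit table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (id INTEGER, name TEXT, amount REAL)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS etl_audit (status TEXT, loaded INTEGER, rejected INTEGER)"
    )
    try:
        good, rejected = transform(extract(text))
        load(conn, good)
        conn.execute(
            "INSERT INTO etl_audit VALUES ('success', ?, ?)", (len(good), len(rejected))
        )
        conn.commit()
    except Exception:
        # A failed run leaves no partial data, but is still recorded.
        conn.rollback()
        conn.execute("INSERT INTO etl_audit VALUES ('failure', 0, 0)")
        conn.commit()
        raise


conn = sqlite3.connect(":memory:")
run_etl(conn, SOURCE_CSV)
loaded_rows = conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()
audit_rows = conn.execute("SELECT status, loaded, rejected FROM etl_audit").fetchall()
print(loaded_rows)
print(audit_rows)
```

Two rows reach the target table, the row with the non-numeric amount is rejected, and the audit table records one successful run.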
Data Mapping
The process of creating data element mappings between two distinct data models.
It is used as the first step towards a wide variety of data integration tasks.
The various methods of data mapping are:
Hand-coded, graphical manual
  Graphical tools that allow a user to draw lines from fields in one set of data to fields in another
Data-driven mapping
  Evaluating actual data values in two data sources using heuristics and statistics to automatically discover complex mappings
Semantic mapping
  A metadata registry can be consulted to look up data element synonyms
  If the destination column does not match the source column, the mapping is still made when the two data elements are listed as synonyms in the metadata registry
  Only able to discover exact matches between columns of data; it will not discover any transformation logic or exceptions between columns
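Semantic mapping can be sketched as a synonym lookup. The registry contents and the column names below are hypothetical and exist only to illustrate the idea.

```python
# Hypothetical metadata registry: canonical element name -> known synonyms.
METADATA_REGISTRY = {
    "customer_id": {"cust_id", "customer_no"},
    "last_name": {"surname", "family_name"},
}


def are_synonyms(a, b):
    """Return True if two column names are equal or registered as synonyms."""
    a, b = a.lower(), b.lower()
    if a == b:
        return True
    for canonical, synonyms in METADATA_REGISTRY.items():
        group = synonyms | {canonical}
        if a in group and b in group:
            return True
    return False


def map_columns(source_columns, target_columns):
    """Map each source column to the first target column it is a synonym of."""
    mapping, unmapped = {}, []
    for src in source_columns:
        match = next((t for t in target_columns if are_synonyms(src, t)), None)
        if match is None:
            unmapped.append(src)
        else:
            mapping[src] = match
    return mapping, unmapped


mapping, unmapped = map_columns(
    ["cust_id", "surname", "full_address"],
    ["customer_id", "last_name", "street", "city"],
)
print(mapping)
print(unmapped)
```

`full_address` stays unmapped: splitting it into `street` and `city` would require transformation logic, which a synonym lookup cannot discover.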
Data Staging
A data staging area is an intermediate storage area between the sources of information and the Data Warehouse (DW) or Data Mart (DM).
A staging area can be used for any of the following purposes:
Data cleansing
Pre-calculating aggregates
Data Extraction
Extraction is the operation of extracting data from the source system for further use in a data warehouse environment.
This is the first step in the ETL process.
Designing this process means making decisions about the following main aspects:
Which extraction method should I choose?
How do I provide the extracted data for further processing?
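As one possible answer to the two design questions, the sketch below contrasts a full extraction with an incremental one and hands the result over as CSV text. The slides do not name specific extraction methods, so the full/incremental split, the `orders` table and its `updated_at` column are assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical source system with a last-modified column.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, amount REAL, updated_at TEXT)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2011-01-01"), (2, 20.0, "2011-01-05"), (3, 30.0, "2011-01-09")],
)


def full_extract(conn):
    """Extraction method 1: read every row from the source."""
    return conn.execute(
        "SELECT id, amount, updated_at FROM orders ORDER BY id"
    ).fetchall()


def incremental_extract(conn, since):
    """Extraction method 2: read only rows changed after the last run."""
    return conn.execute(
        "SELECT id, amount, updated_at FROM orders WHERE updated_at > ? ORDER BY id",
        (since,),
    ).fetchall()


def to_csv(rows):
    """Provide the extracted rows as CSV text for the next processing step."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "amount", "updated_at"])
    writer.writerows(rows)
    return buffer.getvalue()


all_rows = full_extract(source)
changed_rows = incremental_extract(source, "2011-01-04")
extract_file = to_csv(changed_rows)
print(extract_file)
```

The date comparison works on plain text here only because the dates are stored in ISO format (year-month-day).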
Data Transformation
Transformation is the most complex and, in terms of production, the most costly part of the ETL process.
Transformations can range from simple data conversions to extreme data scrubbing techniques.
From an architectural perspective, transformations can be performed in two ways:
Multistage data transformation
Pipelined data transformation
Data Transformation
MULTISTAGE TRANSFORMATION (figure labels): LOAD INTO STAGING TABLE, STAGE_01 TABLE, STAGE_02 TABLE, STAGE_03 TABLE, TARGET TABLE
Data Transformation
PIPELINED TRANSFORMATION (figure labels): EXTERNAL TABLE, TARGET TABLE
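The difference between the two architectures can be sketched in Python, with lists standing in for stage tables and generators standing in for the pipeline. The input rows and the three transformation steps are hypothetical.

```python
# Hypothetical raw input: "product_id,quantity" lines of uneven quality.
raw_rows = [" 101 , 5 ", "102,abc", " 103 , 12 ", "104,0"]


def parse(line):
    """Split a raw line into stripped fields."""
    return [field.strip() for field in line.split(",")]


def is_valid(fields):
    """Keep only rows with exactly two numeric fields."""
    return len(fields) == 2 and all(field.isdigit() for field in fields)


def convert(fields):
    """Convert validated fields to integers."""
    return (int(fields[0]), int(fields[1]))


def multistage(lines):
    """Each step materializes a full intermediate result, like a stage table."""
    stage_01 = [parse(line) for line in lines]
    stage_02 = [fields for fields in stage_01 if is_valid(fields)]
    stage_03 = [convert(fields) for fields in stage_02]
    return stage_03


def pipelined(lines):
    """Each row flows through all steps without intermediate storage."""
    parsed = (parse(line) for line in lines)
    valid = (fields for fields in parsed if is_valid(fields))
    return (convert(fields) for fields in valid)


target_multistage = multistage(raw_rows)
target_pipelined = list(pipelined(raw_rows))
print(target_multistage)
print(target_pipelined)
```

Both approaches load the same rows into the target. The multistage version keeps every intermediate result available for inspection or restart, while the pipelined version avoids storing them.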
Data Loading
The load phase loads the data into the end target, usually the data warehouse (DW).
Depending on the requirements of the organization, this process varies widely.
Whether to replace or append data in the DW, and when, are strategic design choices that depend on the time available and the business needs.
More complex systems can maintain a history and audit trail of all changes to the data loaded in the DW.
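The replace-or-append choice and a simple load audit trail might look as follows. The table names `dw_sales` and `load_audit` and the sample rows are assumptions for this sketch.

```python
import sqlite3


def load(conn, rows, mode):
    """Load rows into the warehouse table, replacing or appending, and audit it."""
    if mode not in ("replace", "append"):
        raise ValueError("mode must be 'replace' or 'append'")
    conn.execute("CREATE TABLE IF NOT EXISTS dw_sales (id INTEGER, amount REAL)")
    conn.execute("CREATE TABLE IF NOT EXISTS load_audit (mode TEXT, rows_loaded INTEGER)")
    if mode == "replace":
        # Replace: discard the current contents before loading.
        conn.execute("DELETE FROM dw_sales")
    conn.executemany("INSERT INTO dw_sales VALUES (?, ?)", rows)
    # Audit trail: record what each load did.
    conn.execute("INSERT INTO load_audit VALUES (?, ?)", (mode, len(rows)))
    conn.commit()


def row_count(conn):
    return conn.execute("SELECT COUNT(*) FROM dw_sales").fetchone()[0]


dw = sqlite3.connect(":memory:")
load(dw, [(1, 10.0), (2, 20.0)], "replace")
load(dw, [(3, 30.0)], "append")
count_after_append = row_count(dw)
load(dw, [(4, 40.0)], "replace")
count_after_replace = row_count(dw)
audit = dw.execute("SELECT mode, rows_loaded FROM load_audit ORDER BY rowid").fetchall()
print(count_after_append, count_after_replace)
print(audit)
```

Appending grows the table while replacing resets it, and the audit table keeps a record of every load regardless of mode.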
According to your understanding, what is the need for Data Integration in the corporate world?
What does it mean?
Unified view of data (figure sources: DB2, Oracle, SQL)
According to your understanding, what are the advantages and limitations of a Data Warehouse?
Data definitions, metrics definitions, subject models, data models, business rules, data rules, data owners/stewards, etc.
Source/target maps, transformation rules, data cleansing rules, extract audit trail, transform audit trail, load audit trail, data quality audit, etc.
Data locations, data formats, technical names, data sizes, data types, indexing, data structures, etc.
Data access history: Who is accessing? Frequency of access? When accessed? How accessed? etc.
Integration
Enrichment
Monitoring
Data Profiling
Begin data improvement efforts by knowing where to begin.
Data profiling (sometimes called data discovery or data quality analysis) helps to gain a clear perspective on the current integrity of data.
It helps to:
Discover the quality, characteristics and potential problems of data
Reduce the time and resources spent in finding problematic data
Gain more control over the maintenance and management of data
Catalog and analyze metadata
The various steps in profiling include:
Metadata analysis
Outlier detection
Data validation
Pattern analysis
Relationship discovery
Statistical analysis
Business rule validation
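A few of the profiling steps (null and distinct counts, pattern analysis, outlier detection) can be sketched with the Python standard library. The sample records are invented, and the median-based outlier rule with a factor of 5 is an assumption chosen for the example, not a prescribed method.

```python
import re
import statistics
from collections import Counter

# Hypothetical customer records with typical quality problems.
records = [
    {"customer_id": "C001", "age": 34, "phone": "555-0101"},
    {"customer_id": "C002", "age": 29, "phone": "555-0102"},
    {"customer_id": "C003", "age": None, "phone": "5550103"},
    {"customer_id": "C004", "age": 31, "phone": "555-0104"},
    {"customer_id": "C005", "age": 250, "phone": "555-0105"},
    {"customer_id": "C005", "age": 36, "phone": None},
]


def pattern_of(value):
    """Pattern analysis: reduce a string to its shape (digit -> 9, letter -> A)."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))


def profile(rows, column):
    """Profile one column: nulls, distinct values, and patterns or outliers."""
    values = [row[column] for row in rows]
    present = [value for value in values if value is not None]
    report = {"nulls": len(values) - len(present), "distinct": len(set(present))}
    if all(isinstance(value, str) for value in present):
        report["patterns"] = dict(Counter(pattern_of(value) for value in present))
    else:
        # Outlier detection: flag values far from the median, measured in
        # units of the median absolute deviation (assumed factor: 5).
        median = statistics.median(present)
        mad = statistics.median([abs(value - median) for value in present])
        report["outliers"] = [
            value for value in present if mad and abs(value - median) > 5 * mad
        ]
    return report


age_report = profile(records, "age")
phone_report = profile(records, "phone")
id_report = profile(records, "customer_id")
print(age_report)
print(phone_report)
print(id_report)
```

The report surfaces three problems in the sample: an implausible age of 250, one phone number in a different format, and six rows sharing only five distinct customer IDs.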
Data Quality
Correcting, standardizing and validating the information.
Creating business rules to correct, standardize and validate your data.
High-quality data is essential to successful business operations.
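Business rules for correcting, standardizing and validating data, followed by de-duplication, can be sketched as below. The rules, the e-mail pattern and the sample records are assumptions for illustration; real rules come from the business.

```python
import re

# Assumed validation rule: a deliberately simple e-mail shape check.
EMAIL_RULE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def standardize(record):
    """Correct and standardize: collapse whitespace, fix case."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "email": record["email"].strip().lower(),
    }


def is_valid(record):
    """Validate: a name must be present and the e-mail must match the rule."""
    return bool(record["name"]) and EMAIL_RULE.match(record["email"]) is not None


def deduplicate(rows):
    """De-duplicate: keep the first record seen for each e-mail address."""
    seen, unique = set(), []
    for row in rows:
        if row["email"] not in seen:
            seen.add(row["email"])
            unique.append(row)
    return unique


# Hypothetical raw records.
raw = [
    {"name": "  asha   rao ", "email": "Asha@Example.com "},
    {"name": "Asha Rao", "email": "asha@example.com"},
    {"name": "John Smith", "email": "john-at-example.com"},
]

cleaned = [standardize(record) for record in raw]
valid = [record for record in cleaned if is_valid(record)]
rejected = [record for record in cleaned if not is_valid(record)]
unique = deduplicate(valid)
print(unique)
print(rejected)
```

Standardizing before de-duplicating matters: the first two records only become recognizable as duplicates once case and whitespace have been normalized.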
What do you think are the major causes of bad data quality?
Automation Process
Data Purging
ETL Tools
An ETL process can be created using a programming language.
Open source ETL framework tools:
Clover.ETL
Enhydra Octopus
Pentaho Data Integration (also known as Kettle)
Talend Open Studio
Popular ETL tools:
Ab Initio
Business Objects Data Integrator
Informatica
SQL Server 2005/2008 Integration Services
Summary please