Introduction to the IBM DataOps methodology and practice

Julie Lockner
Director, Portfolio Optimization
and Offering Management
IBM Data and AI

Steven Eliuk
VP, Deep Learning &
Governance Automation
IBM Global CDO

There is no AI without IA (information architecture).
81% do not understand the data required for AI.
AI pioneers are 8X more likely to have a robust data architecture.

“No amount of AI algorithmic sophistication will overcome a lack of data (architecture)...”
Data collection & preparation is the most time-consuming and difficult part of AI.
IBM Watson / © 2020 IBM Corporation
The AI Ladder
A prescriptive approach to the journey to AI

AI
INFUSE - Operationalize AI throughout the business
ANALYZE - Build and scale AI with trust & explainability
ORGANIZE - Create a business-ready analytics foundation
COLLECT - Make data simple and accessible

MODERNIZE - Unlock the value of data for an AI and multicloud world

One Platform, Any Cloud
Talent & Skills


ORGANIZE
DataOps delivers business-ready data fast
Know your data

Trust your data

Use your data

ORGANIZE: Critical information architecture capabilities
Know, Trust, Use: the capabilities that connect COLLECT to ANALYZE
• Data Integration
• Data Replication
• Data Virtualization
• Data Quality and Curation
• Data Governance
• Master Data Management
• Self-service interaction for data preparation and testing
• Catalog & Metadata Management

Problem Statement: Business users need access to high-quality data fast. Data pipelines are the primary source of bottlenecks.

Data pipelines: Prepare, Build, Run, Manage
Prepare, the “most dreaded part of AI”: discover, understand, ingest, integrate, assess quality, clean data
Typical data operations cycle: Months - Quarters


Poor Data Quality and Governance Cause Negative Business Impact

“Our study shows that 95% of organizations see negative impacts from poor data quality, resulting in wasted resources and additional costs.”
https://www.experian.co.uk/assets/data-quality/experian-global-data-management-report-jan-2019.pdf


Introducing DataOps

“DataOps is a collaborative data management practice focused on improving the communication, integration and automation of data flows between data managers and data consumers across an organization.”

Gartner


DataOps Consistently Delivers High Quality Data Fast

DataOps expedites delivery of high-quality data by:
- Streamlining data pipeline processes.
- Automating core operations on data.
- Incorporating agile processes and workflows.
- Tapping into data sources and consumers for end-to-end DataOps.
- Automating test data generation and management.
- Enabling collaborative communication across key stakeholders and SMEs.

Prepare, Build, Run, Manage: Hours - Days with DataOps, versus Months - Quarters without it
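Automated test data generation and management often means producing realistic but non-sensitive copies of production records. As one hedged illustration (not IBM's implementation), the sketch below pseudonymizes sensitive fields with a keyed HMAC, so the same input always maps to the same token and joins between masked tables still line up. The key, the field names and the `pseudonymize` and `mask_record` helpers are hypothetical.

```python
import hashlib
import hmac

# Hypothetical masking key; in practice it would come from a secrets manager.
MASKING_KEY = b"replace-with-a-secret-key"


def pseudonymize(value, key=MASKING_KEY, length=12):
    """Deterministically replace a value with a truncated keyed-hash token."""
    digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def mask_record(record, sensitive_fields):
    """Copy a record, pseudonymizing only the listed sensitive fields."""
    return {
        field: pseudonymize(value) if field in sensitive_fields else value
        for field, value in record.items()
    }


# Hypothetical production rows used to build a test data set.
prod_rows = [
    {"customer_id": "C-1001", "email": "ann@example.com", "plan": "gold"},
    {"customer_id": "C-1002", "email": "bob@example.org", "plan": "silver"},
]
test_rows = [mask_record(r, {"customer_id", "email"}) for r in prod_rows]
print(test_rows)
```

Determinism is the design choice here: random replacement would also hide the data, but it would break referential integrity across tables that share `customer_id`.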


DataOps Impact – Know Your Data in Minutes
Data Inventory Case Study

- 85% reduction in business glossary creation time
- 90% reduction in time to discover metadata and assign terms
- 200,000 technical assets across multiple clouds discovered in less than 5 mins
- 2-hour ROI: uncovered Protected Health Information (PHI) / PII exposure

Examples: Financial Services, Telecommunications, Retail, Healthcare Payer
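The PHI/PII exposure result in this case study comes from automated classification of data assets. As a rough illustration only, a much simpler rule-based approach can flag columns whose sampled values mostly match common PII patterns. The patterns, the 0.8 match threshold and the `flag_pii_columns` function are hypothetical assumptions for this sketch; they are not how IBM's tooling works.

```python
import re

# Hypothetical PII patterns for illustration; production classifiers use far richer rules and ML models.
PII_PATTERNS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "phone": re.compile(r"^\+?\d[\d\s().-]{7,}\d$"),
}


def flag_pii_columns(table, threshold=0.8):
    """Return {column: pii_type} for columns whose non-empty sample values
    match a PII pattern at least `threshold` of the time (assumed cutoff)."""
    flagged = {}
    for column, values in table.items():
        sample = [str(v).strip() for v in values if v not in (None, "")]
        if not sample:
            continue
        for pii_type, pattern in PII_PATTERNS.items():
            hits = sum(1 for v in sample if pattern.match(v))
            if hits / len(sample) >= threshold:
                flagged[column] = pii_type
                break  # first matching type wins
    return flagged


# Hypothetical sampled column values from one table.
customers = {
    "contact": ["ann@example.com", "bob@example.org", "carl@example.net"],
    "member_id": ["123-45-6789", "987-65-4321", ""],
    "city": ["Boston", "Austin", "Denver"],
}
print(flag_pii_columns(customers))  # → {'contact': 'email', 'member_id': 'us_ssn'}
```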

DataOps Impact - Trust Your Data
Data Quality Case Study: International Bank

- Data records update speed: 13 per hour (manual) vs. 50 per min (automated) with DataOps
- Data quality score: 6% vs. 93% with DataOps
- Net promoter score
- 2 years
- 230x data quality improvement
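The update-speed figures use different time units, so converting both to records per hour makes the comparison explicit. The arithmetic below uses only the numbers on the slide. It shows that the 230x callout matches the ratio of automated to manual record update throughput. That is an inference from the figures; the slide itself does not say so.

```python
MANUAL_PER_HOUR = 13        # records updated per hour, manual process (from the slide)
AUTOMATED_PER_MINUTE = 50   # records updated per minute with DataOps (from the slide)

automated_per_hour = AUTOMATED_PER_MINUTE * 60   # 3000 records per hour
speedup = automated_per_hour / MANUAL_PER_HOUR   # about 230.8

QUALITY_BEFORE_PCT, QUALITY_AFTER_PCT = 6, 93    # data quality score (from the slide)
quality_gain_points = QUALITY_AFTER_PCT - QUALITY_BEFORE_PCT  # percentage points

print(f"{automated_per_hour} records/hour, {speedup:.1f}x faster")
print(f"data quality score up {quality_gain_points} percentage points")
```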


DataOps Impact – Use Your Data
Data Integration Use Case: Leading European Retailer

- Data change delay on reporting systems: >3 weeks before, < 2 minutes with DataOps
- Customer affinity analysis: 20 days before, < 1 day with DataOps
- Inventory stock positions: ~24 hours before, < 4 hours with DataOps

Comparing the two scenarios. Which one is yours?

Without DataOps
- Data prep: 80% of the effort
- Single iteration: Months - Quarters
- One outcome, costly if wrong

With DataOps
- Multiple iterations: Days - Weeks
- Multiple outcomes, more chances for success
DataOps requires Automation and Multicloud Architecture

DataOps delivers business-ready data fast through automated Organize services:
- Automated data curation and quality services
- Automated metadata management and catalog
- Self-service interaction
- Automated data integration
- Automated test data management services
- Automated master data management
- Governed data access services

Deployment: on-prem and across multiple clouds.


DataOps Maturity Model
Each level up brings increased business value and speed in delivering business-ready data.

Advanced DataOps
• Know: Enforced and Enriched Catalog
• Trust: Compliance, Business Ontology and Automated Classification
• Use: DataOps for All Data Pipelines

Developed DataOps
• Know: Enterprise Catalog
• Trust: Data Governance Program with Data Stewardship and Business Glossary
• Use: Self Service Data Prep and Test Data Management

Foundational DataOps
• Know: Departmental / LOB Catalog
• Trust: Data Quality Program
• Use: Data Virtualization, Data Integration and Data Replication

No DataOps
• Know: Spreadsheets
• Trust: Emails
• Use: Hand coding


DataOps Methodology
Automates data management best practices across three stages: inventory and categorize data; publish and use data; deliver quality and governance.

- Prioritizes and aligns data pipelines with business objectives and success criteria.
- Is associated with the Data Engineering discipline.
- Automatically measures the accuracy and speed of data capture, quality and use.
- Automates data and metadata ingestion and classification.
- Automatically assesses data quality issues and alerts when anomalies are detected.
- Automatically initiates remediation via workflow.
- Automates test data management.
- Automatically ensures authorized use of published data assets by enforcing data privacy and governance policies.
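The assess, alert and remediate loop in these bullets can be sketched as a small rule-based check. This is a minimal illustration under assumed rules: the 95% completeness threshold, the z-score test on row counts and the `open_remediation_ticket` hook are hypothetical stand-ins for whatever quality rules and workflow tooling an organization actually uses.

```python
from statistics import mean, stdev


def completeness(rows, column):
    """Fraction of rows where `column` is present and non-empty."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(column) not in (None, ""))
    return filled / len(rows)


def row_count_anomaly(history, today, z_limit=3.0):
    """Flag today's row count if it lies more than z_limit standard deviations
    from the historical mean (a simple, assumed anomaly rule)."""
    if len(history) < 2:
        return False
    sigma = stdev(history)
    if sigma == 0:
        return today != history[0]
    return abs(today - mean(history)) / sigma > z_limit


def open_remediation_ticket(issue):
    """Hypothetical workflow hook; a real pipeline would call its ticketing system."""
    print(f"ALERT -> remediation workflow: {issue}")


def assess_batch(rows, required_columns, history, min_completeness=0.95):
    """Run the checks on one batch, alert on each issue, and return the issues."""
    issues = []
    for col in required_columns:
        score = completeness(rows, col)
        if score < min_completeness:
            issues.append(f"{col} completeness {score:.0%} below {min_completeness:.0%}")
    if row_count_anomaly(history, len(rows)):
        issues.append(f"row count {len(rows)} is anomalous vs. history {history}")
    for issue in issues:
        open_remediation_ticket(issue)
    return issues


# Hypothetical batch: one empty email and far fewer rows than usual.
batch = [{"id": 1, "email": "a@x.com"}, {"id": 2, "email": ""}, {"id": 3, "email": "c@x.com"}]
assess_batch(batch, ["id", "email"], history=[100, 98, 103, 101])
```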
DataOps Interoperates with Peer Organizations
DataOps Interoperates Cross-Functionally

- Application development teams publish source data and incorporate feedback from DataOps to improve data definitions and data quality.
- IT security and compliance teams publish security, privacy and governance policies for DataOps teams to enforce, and respond to audits when necessary.
- Data science teams consume data assets published by data engineering and leverage DataOps for model lineage, data definitions, and security and privacy policies.
- Lines of business leverage the output of DataOps to access high-quality data quickly and efficiently, while providing feedback on data definitions and data quality and submitting new assets to be catalogued, assessed and published.


DataOps combines people, process and technology

Organization design
Executive Sponsor
Executive Steering Committee: CDO, CIO, LOB Execs, Chief Risk Officer

DataOps
• Data Pipeline Deployment & Test
• DataOps Monitoring & Management
• Self-Service Operations
• Data Engineers

Data Architecture Working Group
• Enterprise Data Architect
• Data Modelers
• Database Administrators
• Meta Administrator
• Data Custodians

Enterprise Data Governance Council
• Data Governance Manager
• Business Process Owners
• Compliance and Legal

Data Governance Office
• Lead Data Steward
• Domain Data Stewards
• Data Governance Analyst


DataOps in Action at IBM’s Global Chief Data Office


IBM Global Chief Data Office
Organizational Structure

IBM CEO
SVP Finance & Operations, Chief Financial Officer
Reporting organizations: Enterprise Ops & Services; VP Finance, Controller; Global Chief Data Office; CIO; CAO

Global Chief Data Office teams:
• Enterprise Data & AI Platform: Advanced Technology; Hybrid Cloud Development Environment; Production Platform & Solutions Engineering; Business Controls, Support & Operations; Production Platform Release Mgmt & Project Mgmt
• Enterprise Data Governance: Enterprise Data Standards; E2E Data Flows; Enterprise Governance Workflow Delivery automation; Data Acquisition (M&A, 3rd Party, Public); Data Stewardship
• Adoption & Value Creation: Discovery; Budget & Financial Controls; Platform Adoption; Modernization & Transformation leveraging Enterprise Data & AI Platform; AI Accelerator; BUDO Network
• Client & Product Master Data: Client Reference Data; Product Data
• Deep Learning

Importance of Metadata
Metadata makes data visible and understandable.

Every enterprise struggles with the problem of labeling: it can take DAYS for SMEs to review and approve a business term.

Metadata unlocks data: users can easily find, understand and trust the data they need to drive business insights WITH SPEED.

Large risk item, consider:
• Untapped potential in dark data
• Data governance, compliance and audits, and potential leakage of sensitive data

Examples of Metadata Benefits

Regulatory Compliance (e.g., GDPR, Government Owned Entity)
Metadata management conducted on a unified platform that provides stewardship, data lineage, and impact analysis services is the best assurance that an organization can validate and demonstrate that the data reported is true.

Productivity & Discovery
Data is abundant. Much of it comes from existing systems and data stores for which no documentation exists, or for which the documentation does not reflect changes and updates to those systems and data stores.
• Data scientists can spend 80% of their time finding and cleaning data prior to using it!

Risk Avoidance
Metadata management provides the measure of trust that businesses need. Through data lineage and impact analysis, businesses can know the accuracy, completeness and currency of the data used in their planning or decision-making models.
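Data lineage and impact analysis, as described here, amount to walking a graph of data dependencies. The sketch below assumes a hypothetical lineage graph stored as a dictionary from each asset to the assets derived directly from it. A breadth-first traversal then lists everything downstream that a change to one source could affect, and inverting the graph traces an asset back to its sources. All asset names are invented for illustration.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets built directly from it.
LINEAGE = {
    "crm.customers": ["staging.customers_clean"],
    "erp.orders": ["staging.orders_clean"],
    "staging.customers_clean": ["mart.customer_360"],
    "staging.orders_clean": ["mart.customer_360", "mart.revenue_daily"],
    "mart.customer_360": ["report.churn_model_features"],
    "mart.revenue_daily": ["report.cfo_dashboard"],
}


def downstream_impact(asset, lineage):
    """Return every asset reachable downstream of `asset`, in breadth-first order."""
    seen, order = {asset}, []
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:  # guards against shared descendants and cycles
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def upstream_sources(asset, lineage):
    """Invert the graph, then reuse the same traversal to trace an asset's origins."""
    parents = {}
    for src, children in lineage.items():
        for child in children:
            parents.setdefault(child, []).append(src)
    return downstream_impact(asset, parents)


print(downstream_impact("erp.orders", LINEAGE))
print(upstream_sources("report.cfo_dashboard", LINEAGE))
```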

IBM GCDO automated metadata generation (AMG)

Implementation
Automated Metadata Generation (AMG) uses automation and data science to link data.
• A complex series of organic deep learning models was developed for CEDP metadata classifications.
• Backed by micro-services: can be installed anywhere (cloud, container).
• ~60TB of labeled training data in addition to public sources and synthetically generated data.

Challenges addressed: Distributed Federated Learning
• Lack of data for model training impacts the performance
• Local restrictions related to processing of the business information within the limits of certain jurisdictions
• Compliance with local regulation
• A larger volume of training data allows the models to achieve better performance
• No isolated business units that lack training data
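The slide names distributed federated learning as the answer to jurisdictions that cannot share raw data. One common formulation is federated averaging (FedAvg): each site trains on its own data and shares only model parameters, which a coordinator averages, weighted by local sample counts. The sketch below shows only that aggregation step, using plain Python lists. The slide does not say whether AMG uses FedAvg specifically, and the site names and numbers are hypothetical.

```python
def federated_average(site_updates):
    """Sample-weighted average of model parameter vectors (the FedAvg aggregation step).

    site_updates: list of (num_local_samples, parameter_list) pairs.
    Only parameters and sample counts leave each site; raw records stay local.
    """
    total = sum(n for n, _ in site_updates)
    if total == 0:
        raise ValueError("no training samples reported")
    dim = len(site_updates[0][1])
    averaged = [0.0] * dim
    for n, params in site_updates:
        if len(params) != dim:
            raise ValueError("all sites must share the same model shape")
        weight = n / total
        for i, p in enumerate(params):
            averaged[i] += weight * p
    return averaged


# Hypothetical sites in different jurisdictions, each with a locally trained model.
updates = [
    (1000, [0.2, 0.4]),  # site A
    (3000, [0.6, 0.0]),  # site B
]
print(federated_average(updates))
```

In full FedAvg this step repeats each round: the averaged parameters go back to every site for further local training before the next aggregation.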
IBM GCDO Automated Metadata Generation (AMG)
An AI-powered process for curating, verifying, and classifying data that enhances speed and usability

- 95% reduction in cycle time, with regulatory & governance checks; targeted at full automation in 18 months
- Up to ~$27 million in productivity savings
- Dramatically enhanced data quality

Unified. Classifying terabytes of data to make it easily discoverable while providing the data stewardship, lineage, and impact analysis to assure it is trustworthy.

Small Tag Set as a Product

Project stages:
1. Provide top-5 recommendations: 5x less workload
2. Provide a single recommendation: 20x less workload
3. Provide the correct metadata: almost NO workload. Goal: full automation, i.e. zero SME involvement

How we define it:
- 30% of data: 2,500 terms
- 70% of data: 600 terms (the small tag set)
- Better prediction quality is available for the small tag set
- No need to provide top-5 recommendations; the choice is easy
- ~95% workload decrease
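The three project stages trade SME effort for model confidence. Here is a minimal sketch, assuming the classifier returns a confidence score per business term. It shows the top-5 terms for SME review by default, narrows to a single suggestion when the best term clearly leads, and auto-assigns when confidence is high enough. The 0.6 and 0.9 thresholds, the `recommend_terms` function and the term names are all hypothetical.

```python
def recommend_terms(scores, k=5, suggest_at=0.6, auto_assign_at=0.9):
    """Decide how much SME work a column classification needs.

    scores: dict mapping business term -> model confidence (assumed to be in [0, 1]).
    Returns (stage, terms): stage 3 = auto-assigned, 2 = single suggestion, 1 = top-k review.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_term, best_score = ranked[0]
    if best_score >= auto_assign_at:
        return 3, [best_term]                    # no SME involvement
    if best_score >= suggest_at:
        return 2, [best_term]                    # SME confirms one suggestion
    return 1, [term for term, _ in ranked[:k]]   # SME picks from the top k


print(recommend_terms({"Customer Email": 0.95, "Contact Info": 0.05}))
print(recommend_terms({"Customer Name": 0.7, "Account Holder": 0.2, "Vendor Name": 0.1}))
print(recommend_terms({"Amount": 0.3, "Price": 0.25, "Fee": 0.2, "Tax": 0.15,
                       "Discount": 0.06, "Rate": 0.04}))
```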

Watson Knowledge Catalog
Automated cataloging to discover, classify, prepare & share data

• ML-driven intelligent discoverability of data sources, models, notebooks and AI artifacts
• Operationalize data governance programs
• Data lineage in the language of the business

Watson Knowledge Catalog now with automated metadata generation
• Up to 96% accuracy on holdout data
• Up to 70% accuracy on data that was once inaccessible

Business terms can differ across the different groups in an organization. To address this, AMG's classifications in the current release use an "umbrella" set of 25 terms defined to cover the varying cases we see at the GCDO.

AMG capability roadmap

Milestones: Concept development of MVP 1; MVP 2; MVP 3; Subsequent release 1; Subsequent release 2
Timeline: Q1 2018, Q2 2018, Q4 2018, Q4 2019, 2020
Proven internally in GCDO and on external enterprise use cases. Released in Watson Knowledge Catalog services for Cloud Pak for Data.
Getting Started
Know your data. Trust your data. Use your data.

- Try Watson Knowledge Catalog today at ibm.com/Watson-Knowledge-Catalog
- Schedule a DataOps Garage Workshop with one of our DataOps Center of Excellence experts by contacting [email protected]
- Learn more about IBM DataOps at ibm.com/DataOps
