Introduction to the IBM DataOps methodology and practice


Julie Lockner
Director, Portfolio Optimization
and Offering Management
IBM Data and AI

Steven Eliuk
VP, Deep Learning &
Governance Automation
IBM Global CDO
There is no AI without IA (information architecture)

81% do not understand the data required for AI
8X: AI pioneers are 8X more likely to have a robust data architecture

“No amount of AI algorithmic sophistication will overcome a lack of data (architecture)...”
Data collection & preparation is the most
time consuming and difficult part of AI.
IBM Watson / © 2020 IBM Corporation
The AI Ladder
A prescriptive approach to the journey to AI

INFUSE - Operationalize AI throughout the business
ANALYZE - Build and scale AI with trust & explainability
ORGANIZE - Create a business-ready analytics foundation
COLLECT - Make data simple and accessible

MODERNIZE
Unlock the value of data for an AI and multicloud world

AI
One Platform, Any Cloud
Talent & Skills



ORGANIZE
DataOps delivers business-ready data fast
Know your data

Trust your data

Use your data


ORGANIZE:
Critical information architecture capabilities

Know Trust Use

Capabilities between COLLECT and ANALYZE:
- Data Integration, Data Replication, Data Virtualization
- Data Quality and Curation
- Data Governance
- Master Data Management
- Self-service interaction for data preparation and testing
- Catalog & Metadata Management


Problem Statement: Business users need access to high quality data
fast. Data pipelines are the primary source of bottlenecks.

Data Pipelines: Prepare, Build, Run, Manage

Prepare (“Most dreaded part of AI”): discover, understand, ingest, integrate, assess quality, clean data
Data Operations

Months - Quarters



Poor Data Quality and Governance Cause Negative Business Impact

“Our study shows that 95% of organizations see negative impacts from
poor data quality, resulting in wasted resources and additional costs.”
https://www.experian.co.uk/assets/data-quality/experian-global-data-management-report-jan-2019.pdf



Introducing DataOps

“DataOps is a collaborative data management practice focused on improving the communication, integration and automation of data flows between data managers and data consumers across an organization.”

Gartner



DataOps Consistently Delivers High Quality Data Fast

DataOps expedites delivery of high-quality data by:
- Streamlining data pipeline processes.
- Automating core operations on data.
- Incorporating agile processes and workflows.
- Tapping into data sources and consumers for end-to-end DataOps.
- Automating test data generation and management.
- Enabling collaborative communication across key stakeholders and SMEs.

Prepare, Build, Run, Manage: Hours - Days (versus Months - Quarters)



DataOps Impact – Know Your Data in Minutes
Data Inventory Case Study

- 85%: reduction in business glossary creation time
- 90%: reduction in time to discover metadata and assign terms
- 200,000: number of technical assets across multiple clouds discovered in less than 5 mins
- 2 Hour ROI: uncovered Protected Health Information (PHI) / PII exposure

Examples: Financial Services, Telecommunications, Retail, Healthcare Payer


DataOps Impact - Trust Your Data
Data Quality Case Study
International Bank

Data records update speed: 13 per hour (manual); 50 per min (automated) with DataOps

Data quality score: 6% (manual); 93% (automated) with DataOps

Net promoter score


2 years 230x
Data quality improvement
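
As a quick consistency check on the update-speed figures above, the manual and automated rates can be put on a common per-hour basis. This is only a sketch: the slide does not state that the 230x figure refers to the record update speed, so that link is an assumption, and the function name is illustrative.

```python
# Rates taken from the slide; linking them to the "230x" figure is an assumption.
MANUAL_RECORDS_PER_HOUR = 13
AUTOMATED_RECORDS_PER_MIN = 50


def speedup(manual_per_hour: float, automated_per_min: float) -> float:
    """Return how many times faster the automated rate is, compared per hour."""
    automated_per_hour = automated_per_min * 60
    return automated_per_hour / manual_per_hour


ratio = speedup(MANUAL_RECORDS_PER_HOUR, AUTOMATED_RECORDS_PER_MIN)
print(f"{ratio:.1f}x")  # 3000 / 13, roughly 230.8x
```

Under that assumption, 50 records per minute is 3,000 per hour, which is about 230 times the manual rate of 13 per hour.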



DataOps Impact – Use Your Data
Data Integration Use Case
Leading European Retailer

- Data change delay on reporting systems: >3 weeks before; < 2 minutes with DataOps
- Customer affinity analysis: 20 days before; < 1 day with DataOps
- Inventory stock positions: ~24 hours before; < 4 hours with DataOps


Comparing the two scenarios.
Which one is yours?
Without DataOps
- 80% data prep
- Single iteration (1)
- Months-Quarters
- One outcome, costly if wrong

With DataOps
- Multiple iterations (3)
- Days-Weeks
- Multiple outcomes, more chances for success
DataOps requires Automation
and Multicloud Architecture

Organize: DataOps Delivers Business Ready Data Fast
- Automated data curation and quality services
- Automated metadata management and catalog services
- Self-services interaction
- Automated data integration
- Automated test data management services
- Automated master data management
- Governed data access services

Output: Business-ready data
Deployment: On-Prem



DataOps Maturity Model
Increased business value and speed in delivering business-ready data.

Advanced DataOps
• Know: Enforced and Enriched Catalog
• Trust: Compliance, Business Ontology and Automated Classification
• Use: DataOps for All Data Pipelines

Developed DataOps
• Know: Enterprise Catalog
• Trust: Data Governance Program with Data Stewardship and Business Glossary
• Use: Self Service Data Prep and Test Data Management

Foundational DataOps
• Know: Departmental / LOB Catalog
• Trust: Data Quality Program
• Use: Data Virtualization, Data Integration and Data Replication

No DataOps
• Know: Spreadsheets
• Trust: Emails
• Use: Hand coding



DataOps Methodology Automates Data Management Best Practices

- Prioritizes and aligns data pipelines with business objectives and success criteria.
- Associated with the Data Engineering discipline.
- Automatically measures accuracy and speed of data capture, quality and use.
- Automates data and metadata ingestion and classification.
- Automatically assesses data quality issues and alerts when anomalies are detected.
- Automatically initiates remediation via workflow.
- Automates test data management.
- Automatically ensures authorized use of published data assets by enforcing data privacy and governance policies.

Inventory and categorize data
Publish data and use
Deliver quality and governance
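
The automated quality assessment, anomaly alerting and remediation steps listed on this slide can be sketched roughly as follows. This is an illustration only, not IBM's implementation: the `QualityMonitor` class, the 10% missing-value threshold, and the column names are all hypothetical, and a real pipeline would use richer quality rules and an actual workflow system.

```python
from dataclasses import dataclass, field


@dataclass
class QualityReport:
    column: str
    null_rate: float
    is_anomaly: bool


@dataclass
class QualityMonitor:
    # Hypothetical threshold: flag a column when over 10% of its values are missing.
    max_null_rate: float = 0.10
    alerts: list = field(default_factory=list)
    remediation_queue: list = field(default_factory=list)

    def assess(self, column: str, values: list) -> QualityReport:
        """Measure the missing-value rate, alert on anomalies, queue remediation."""
        nulls = sum(1 for v in values if v is None)
        null_rate = nulls / len(values) if values else 0.0
        report = QualityReport(column, null_rate, null_rate > self.max_null_rate)
        if report.is_anomaly:
            self.alerts.append(f"{column}: null rate {null_rate:.0%}")
            # Stand-in for opening a remediation task in a workflow tool.
            self.remediation_queue.append(column)
        return report


monitor = QualityMonitor()
phone_report = monitor.assess("customer_phone", ["555-0100", None, None, "555-0101"])
id_report = monitor.assess("customer_id", [1, 2, 3, 4])
print(monitor.alerts)
```

Here the half-empty `customer_phone` column raises an alert and is queued for remediation, while the complete `customer_id` column passes without any action.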
DataOps Interoperates with Peer Organizations

DataOps Interoperates Cross-Functionally

- Application development teams publish source data and incorporate feedback from DataOps to improve data definitions and data quality.
- IT security and compliance teams publish security, privacy and governance policies for DataOps teams to enforce, and respond to audits when necessary.
- Data science teams consume data assets published by data engineering and leverage DataOps for model lineage, data definitions, and security and privacy policies.
- Lines of business leverage the output of DataOps to access high-quality data quickly and efficiently, while providing feedback on data definitions and data quality and submitting new assets to be catalogued, assessed and published.



DataOps combines people,
process and technology
Organization design

Executive Sponsor
Executive Steering Committee: CDO, CIO, LOB Execs, Chief Risk Officer

DataOps: Data Pipeline Deployment & Test; DataOps Monitoring & Management; Self-Service Operations
Data Architecture Working Group: Enterprise Data Architect; Data Modelers; Database Administrators
Enterprise Data Governance Council: Data Governance Manager; Business Process Owners; Compliance and Legal

Lead Data Steward
Data Governance Office
Meta Administrator; Data Engineers; Data Custodians; Domain Data Stewards; Data Governance Analyst



DataOps in Action
at IBM’s Global Chief Data Office



IBM Global Chief Data Office
Organizational Structure
IBM CEO
SVP Finance & Operations, Chief Financial Officer
Enterprise Ops & Services; VP Finance, Controller; Global Chief Data Office; CIO; CAO

Enterprise Data & AI Platform; Enterprise Data Governance; Adoption & Value Creation; Client & Product Master Data; Deep Learning

Functions shown under these areas:
- Advanced Technology
- Hybrid Cloud Development Environment
- Production Platform & Solutions Engineering Delivery
- Business Controls, Support & Operations
- Production Platform Release Mgmt & Project Mgmt
- Enterprise Data Standards
- E2E Data Flows
- Enterprise Governance Workflow automation
- Data Acquisition (M&A, 3rd Party, Public)
- Data Stewardship
- Discovery
- Budget & Financial Controls
- Platform Adoption
- Modernization & Transformation leveraging Enterprise Data & AI Platform
- AI Accelerator
- BUDO Network
- Client Reference Data
- Product Data



Importance of Metadata
METADATA makes data visible and understandable

Every enterprise struggles with the problem of labeling
It can take DAYS for SMEs to review/approve business term

Metadata unlocks data
Users can easily find, understand and trust the data they need to drive business insights WITH SPEED

Large risk item, consider:
• Untapped potential in dark data
• Data Governance, Compliance, Audits, potential Leakage of sensitive data



Examples of Metadata Benefits

Regulatory Compliance
Metadata management conducted on a unified platform that provides stewardship, data lineage, and impact analysis services is the best assurance that an organization can validate and demonstrate that the data reported is true.
• e.g., GDPR, Government Owned Entity

Productivity & Discovery
Data is abundant. Much of it comes from existing systems and data stores for which no documentation exists or the documentation does not reflect the changes and updates of those systems and data stores.
• Data scientists can spend 80% of their time finding and cleaning data prior to using it!

Risk Avoidance
Metadata management provides the measure of trust that businesses need. Through data lineage and impact analysis, businesses can know the accuracy, completeness and currency of the data used in their planning or decision-making models.



IBM GCDO automated metadata generation (AMG)

Implementation

Automated Metadata Generation (AMG) uses automation and data science to link data
• A complex series of organic Deep Learning models were developed for CEDP metadata classifications
• Backed by micro-services: Can be installed anywhere (cloud, container)
• ~60TB of labeled training data in addition to public sources and synthetically generated data

Challenges addressed

Distributed Federated Learning
• Lack of data for model training impacts performance
• Local restrictions limit processing of business information to within certain jurisdictions
• Compliance with local regulation
• A larger volume of training data yields better performance
• No isolated business units that lack training data
IBM GCDO Automated Metadata Generation (AMG)
An AI-powered process for curating, verifying, and classifying data that enhances speed and usability

- 95% reduction in cycle time: targeted at full automation in 18 months with regulatory & governance checks
- Up to ~$27 million in productivity savings
- Dramatically enhanced Data Quality

Unified.
Classifying terabytes of data to make it easily discoverable while providing the data stewardship, lineage, and impact analysis to assure it is trustworthy

Small Tag Set as a Product

Project Stages
1. To provide top-5 recommendation: 5x less workload
2. To provide single recommendation: 20x less workload (~95% workload decrease)
3. To provide the correct Metadata: NO workload, almost. Goal: full automation, i.e. zero SME involved

How we define it:
- 30% of data: 2500 terms
- 70% of data: 600 terms
• Better prediction quality is available for the small tag set
• No need to provide top-5 recommendations, the choice is easy
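
The "Nx less workload" figures for the project stages can be converted into percentage decreases with a one-line formula, 1 - 1/N. The sketch below assumes the ~95% figure on the slide corresponds to the 20x stage; the function name is illustrative.

```python
def workload_decrease(reduction_factor: float) -> float:
    """Convert an 'N times less workload' factor into a fractional decrease."""
    if reduction_factor <= 0:
        raise ValueError("reduction_factor must be positive")
    return 1 - 1 / reduction_factor


stages = [("top-5 recommendation", 5), ("single recommendation", 20)]
for stage, factor in stages:
    print(f"{stage}: {factor}x less workload = {workload_decrease(factor):.0%} decrease")
```

By this arithmetic, 5x less workload is an 80% decrease and 20x less workload is a 95% decrease, which matches the "~95% workload decrease" shown on the slide.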



Watson Knowledge Catalog
Automated cataloging to discover, classify, prepare & share data

• ML-driven intelligent discoverability of data sources, models, notebooks, AI artifacts
• Operationalize Data governance program
• Data lineage in the language of the Business



Watson Knowledge Catalog now with automated
metadata generation

- Up to 96% accuracy on holdout data
- Up to 70% accuracy on data that was once inaccessible

Business terms can differ across the different groups in an organization.
To address this: AMG's classifications in the current release use an "umbrella" set of 25 terms defined to cover the varying cases we see at the GCDO
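
The idea of an "umbrella" term set, where differing group-specific business terms resolve to one shared term, can be illustrated with a simple lookup. This is a toy sketch only: every term below is a hypothetical example, not one of the 25 terms used by AMG, and AMG itself classifies with deep learning models rather than a dictionary.

```python
from typing import Optional

# Hypothetical group-specific terms mapped to hypothetical umbrella terms.
UMBRELLA_TERMS = {
    "client name": "Customer Name",
    "customer name": "Customer Name",
    "account holder": "Customer Name",
    "cust id": "Customer Identifier",
    "client number": "Customer Identifier",
}


def to_umbrella(group_term: str) -> Optional[str]:
    """Map a group-specific business term to its umbrella term, if one is known."""
    return UMBRELLA_TERMS.get(group_term.strip().lower())


print(to_umbrella("Client Name"))
print(to_umbrella("Cust ID"))
print(to_umbrella("Shoe Size"))
```

Terms from different groups that mean the same thing resolve to a single umbrella term, and unknown terms return `None` so they can be routed to a subject matter expert.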



AMG capability roadmap

Milestones:
- Concept development of MVP 1
- MVP 2
- MVP 3
- Proven internally in GCDO and on external enterprise use cases
- Released in Watson Knowledge Catalog services for Cloud Pak for Data
- Subsequent release 1
- Subsequent release 2

Timeline markers: Q1 2018, Q2 2018, Q4 2018, Q4 2019, 2020
Getting Started
Use your data Know your data

- Try Watson Knowledge Catalog today at ibm.com/Watson-Knowledge-Catalog
- Schedule a DataOps Garage Workshop with one of our DataOps Center of Excellence Experts by contacting [email protected]
- Learn more about IBM DataOps at ibm.com/DataOps

Trust your data
