FREE TO SHARE
Sheetal Pratik - Saxo Bank
August 2020
Data Workbench
About Saxo
We are leading fintech and regtech
specialists, connecting traders, investors and
partners to more than 35,000 instruments –
across all asset classes – from a single
account.
What we do
We build digital platforms to facilitate multi-asset market access and provide clients of all sizes with professional-grade tools, industry-leading prices and best-in-class service.
Data For Scale: Transforming Data Access, Data Governance and Data Quality
DATA GOVERNANCE FOR A DIGITAL NATIVE
• A data-driven organization needs multi-level Data Governance. Most tools are designed to fix problems after the fact, e.g. before a data warehouse load. What is needed is to ensure data integrity at the origin, to prevent the “butterfly effect” in downstream systems.
• The article “How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh” emphasizes how a data platform with a centralized architecture can fail by becoming a bottleneck and affecting stability. In addition, when data is owned at the domain level, managing the data dictionary centrally either fails or duplicates the effort of creating and maintaining such data assets.
• Given this, the solution has to be forward-looking, and a straight implementation of a COTS Data Cataloguing product might not be the right answer for Saxo’s Data Governance implementation.
• The preferred tooling strategy is to fix forward, rather than attempting to fix the past by using some kind of crawler and ML to extract metadata from the various data sources.
VISION
For Domain Teams
Who need visibility on the availability, meaning, usage, ownership and quality of data
The Data Workbench (Owner's pride, Neighbor's envy)
Is a one-stop data shop
That provides transparency of Saxo’s data ecosystem
Unlike our current state, which is becoming increasingly complex as we grow
Our product will help Saxo to improve time to market and unlock new insights.
The Data Workbench is designed to be part of the new data architecture. It consists of two main components: a Data Catalogue and a Data Quality Solution.
1. The Data Catalogue captures and exposes metadata. This provides transparency into the meaning and ownership of our data. The Data Catalogue is built on DataHub, a data catalogue open-sourced by LinkedIn. LinkedIn is very supportive and is working closely with us, helping with the adoption of the tool.
2. The Data Quality Solution is built on the open-source solution Great Expectations, supported by Superconductive. Great Expectations is a declarative, flexible and extensible data quality solution. It allows teams to define data quality rules and actively monitor the quality of their data.
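To make the idea of "declarative" data quality rules more concrete, here is a minimal sketch in plain Python. It is not the Great Expectations API: the `Rule` class, the `CHECKS` table, the column names and the sample rows are all hypothetical and only illustrate how rules can be declared as data and evaluated separately.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical rule kinds for illustration; this is NOT the Great Expectations API.
CHECKS: dict[str, Callable[[Any, dict], bool]] = {
    "not_null": lambda value, params: value is not None,
    "between": lambda value, params: value is not None
    and params["min"] <= value <= params["max"],
    "in_set": lambda value, params: value in params["values"],
}


@dataclass(frozen=True)
class Rule:
    column: str
    kind: str
    params: dict


def validate(rows: list[dict], rules: list[Rule]) -> list[dict]:
    """Return one result per rule, with the number of rows that fail it."""
    results = []
    for rule in rules:
        check = CHECKS[rule.kind]
        failures = sum(
            1 for row in rows if not check(row.get(rule.column), rule.params)
        )
        results.append(
            {
                "column": rule.column,
                "kind": rule.kind,
                "failures": failures,
                "success": failures == 0,
            }
        )
    return results


# Rules are declared as data, so a domain team can keep them next to its dataset.
RULES = [
    Rule("instrument_id", "not_null", {}),
    Rule("price", "between", {"min": 0, "max": 1_000_000}),
    Rule("asset_class", "in_set", {"values": {"FX", "Stock", "Bond"}}),
]

# Hypothetical sample rows; the second row breaks the first two rules.
ROWS = [
    {"instrument_id": 1, "price": 101.5, "asset_class": "FX"},
    {"instrument_id": None, "price": -3.0, "asset_class": "Stock"},
]

if __name__ == "__main__":
    for result in validate(ROWS, RULES):
        print(result)
```

Keeping the rules separate from the evaluation logic is what lets each domain team own its rules while the platform owns the engine.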
MOTIVATION FOR THE SOLUTION
• Federated Data Governance is an industry trend in which the enterprise governance team facilitates the monitoring and management of the quality of enterprise-critical data, with assistance from the business units.
• LinkedIn’s shift from its initial Data Governance solution, WhereHows, to DataHub is a typical example of the paradigm shift from “a central metadata repository” to a more decentralized architecture that puts domains first and makes a self-service data platform possible.
• We realized that a practical way to implement this would be to stay lean and agile and work iteratively with data domains while establishing the Data Governance framework, and thus create a platform that is self-service, scalable and more relevant to stakeholders.
• We had a discussion with LinkedIn to understand their journey and the lessons learnt that motivated them to evolve from WhereHows to DataHub. We recognized that Saxo Bank is on a similar journey and that we can fast-forward the implementation by adopting the open-source DataHub, which relates well to the ecosystem of Saxo Bank.
• The LinkedIn DataHub team has been extremely responsive.
• Other digital natives have also recognized that the incumbent solutions are not fit for the modern age and have built their own specific solutions.
PERSONAS: GOALS AND PAIN POINTS
Personas
• Data Asset Owner
• Data Steward
• Data Governance Committee Member
• Data Scientist
• Data Consumer (Reporting)
Goals
• To find data, the owner of the data and anything else that helps compliance
• To solve business problems based on data
• To get an overview of all data in the org
• To define and document data standards
• To be responsible and accountable for data
Pain Points
• Lack of clarity on ownership of data
• New role in Saxo Bank, so yet to own full responsibility
• Lack of clarity on what to mandate at what level (federated or data domain level)
• Data missing/incomplete
• See who can explain data elements
• Not sure if the data can be trusted for making the right decisions
DATA MESH APPROACH - PRODUCT THINKING
[Figure: a domain data product with polyglot input data ports, polyglot output data ports and control ports, exposing logs, metrics and a self-serve description. Data product qualities: Discoverable, Addressable, Self-describing, Trustworthy, Interoperable. Other labels in the figure: Direct agent/fleet engagement; Speed of delivery.]
TARGET METADATA MODEL
DATA PLATFORM - HIGH LEVEL OVERVIEW
[Figure: high-level overview of the data platform. Labels recoverable from the diagram:]
• Domain: Data Source, Business Capabilities (Trading, Instruments, Prices, Customers/Parties, Product, RX)
• Business events and stream processing (Enrich/Transform/Aggregate) on the Confluent Kafka Platform; other processing frameworks
• Data storage & management: DW, DL Storage
• Data products: Native Data Products, Domain Data Products, Aggregated Data Products, Fit for Purpose Data Products
• Consumption: Business Capabilities, Reports
• Data Workbench: Data Catalog (DataHub), Data Quality (Great Expectations)
TOOLS EVALUATION PROCESS
Evaluation Criteria Definition
● SAXO features list
● SAXO initial evaluation params
● TW extended feature list
Shortlisted tools
● Includes initial & shortlisted list
● Data Catalog
● Data Quality
Evaluation Process
● Product documentation
● Software: Local installations
● Vendor questionnaire
DATA CATALOG TOOL EVALUATION
Prioritized Feature List
Metadata Search
● Full-text search on dataset name, attributes and tags
● Extensible search model
Metadata UI
● Web-based UI to show metadata, governance attributes, tags and lineage
● Ability to edit and enrich attributes
Metadata Ingestion
● Push-based REST API
● Pull-based adapters for Snowflake and CRM Dynamics
● Extensibility
Metadata Modelling
● Metadata entity for datasets, its users and attributes
● Business glossary & documentation
● Extensibility
Data Lineage
● Dataset lineage with upstream and downstream provenance
● Integration with data processing/orchestration tools
Data Stewardship
● Support for metadata enrichment and tagging
● Ability to flag a dataset
Data Quality Integration
● Shows related Quality Attributes in UI
● Extensible to integrate with any DQ tool
Architecture
● Cloud-native (Scalable & High Availability)
● Configurable
● Extensible
Alignment with Data Mesh
● Data as a Product
● Distributed Domain Driven Architecture
● Self-service platform
Security
● Authentication / LDAP
● Authorization / RBAC
Metadata Export
● Export API
Total Cost of Ownership
● Licensing Cost
● Customization / Development cost
Support
● Release cycle
● Community support
● Commercial Support
● Documentation
Deprioritized
● Metadata Versioning
● Data Virtualization
● ML/AI capabilities
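As an illustration of the "dataset lineage with upstream and downstream provenance" feature listed above, here is a minimal sketch of how lineage can be modelled as a directed graph and traversed in both directions. The `LineageGraph` class and the dataset names are hypothetical and are not taken from any of the evaluated tools.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Directed graph of datasets: an edge A -> B means B is derived from A."""

    def __init__(self) -> None:
        self._downstream: dict[str, set[str]] = defaultdict(set)
        self._upstream: dict[str, set[str]] = defaultdict(set)

    def add_edge(self, source: str, target: str) -> None:
        self._downstream[source].add(target)
        self._upstream[target].add(source)

    def _walk(self, start: str, edges: dict[str, set[str]]) -> set[str]:
        # Breadth-first traversal collecting every dataset reachable from start.
        seen: set[str] = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in edges.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        return seen

    def upstream(self, dataset: str) -> set[str]:
        """All datasets that the given dataset directly or indirectly depends on."""
        return self._walk(dataset, self._upstream)

    def downstream(self, dataset: str) -> set[str]:
        """All datasets that are directly or indirectly derived from the given one."""
        return self._walk(dataset, self._downstream)


# Hypothetical dataset names, for illustration only.
graph = LineageGraph()
graph.add_edge("prices.raw", "prices.enriched")
graph.add_edge("instruments.raw", "prices.enriched")
graph.add_edge("prices.enriched", "reports.daily_pnl")
```

With this structure, `graph.downstream("prices.raw")` answers "what is affected if this dataset changes?", and `graph.upstream("reports.daily_pnl")` answers "where does this report's data come from?".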
TOOLS LANDSCAPE
Data Catalog
Collibra
Informatica EDC
Alation
Data.World
Azure Data Catalog - Prev2
Zeenea
Apache Atlas
LinkedIn DataHub
Amundsen
Marquez
Legend: Commercial, Open Source, In House
DATA CATALOG TOOL EVALUATION
Deep-dive analysis of the capabilities of the shortlisted tools, purely as per the team's understanding in Saxo’s context.
[Table: each tool is rated per criterion as completely suitable, partially suitable, or no/minimal suitability; the ratings are colour-coded in the slide.]
Tools: DataHub, Marquez, Amundsen (open source); Collibra, Zeenea (commercial)
Criteria: Metadata Search *, Metadata UI, Metadata Ingestion *, Metadata Modelling *, Data Lineage *, Metadata Export *, Data Stewardship, Data Quality Integration, Architecture *, Security *, Alignment with Data Mesh *, Support, Total Cost of Ownership *
Cell notes: Metadata UI: Editable; Architecture: Only supports AWS
• Push-based approach that supports event-driven architecture. The solution is built on the principle of self-service: producers know their data best, and they can provide the rich metadata that helps others discover the data assets and encourages consumption.
• A detailed evaluation was carried out from Saxo’s perspective.
• Possibility of evolution.
• Extensibility with open source: the tool can evolve as per our needs. Right fit from a feature perspective in terms of the Data Governance maturity of the organization.
• Leverage the needs of the larger community, and also influence internal processes when needed.
• Reputation of LinkedIn, their success with Kafka, and DataHub scaling internally to LinkedIn’s volume.
• Promising roadmap.
**Disclaimer**: Based on Saxo’s evaluation criteria and interpretation of product capabilities.
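The push-based approach mentioned above can be sketched as follows: the producer emits a metadata event whenever its dataset changes, and the catalogue simply ingests what it receives instead of crawling sources. This is a minimal in-memory illustration only; `MetadataEvent`, `InMemoryCatalog` and the dataset names are hypothetical and do not represent DataHub's actual model or API.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class MetadataEvent:
    """Hypothetical event a producer emits when its dataset metadata changes."""

    dataset: str
    owner: str
    description: str
    tags: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


class InMemoryCatalog:
    """Stand-in for a catalogue ingestion endpoint; keeps the latest event per dataset."""

    def __init__(self) -> None:
        self._entries: dict[str, dict] = {}

    def ingest(self, payload: str) -> None:
        entry = json.loads(payload)
        self._entries[entry["dataset"]] = entry  # last write wins

    def owner_of(self, dataset: str) -> str:
        return self._entries[dataset]["owner"]

    def search(self, term: str) -> list[str]:
        """Case-insensitive substring search over name, description and tags."""
        term = term.lower()
        hits = []
        for name, entry in self._entries.items():
            haystack = " ".join([name, entry["description"], *entry["tags"]]).lower()
            if term in haystack:
                hits.append(name)
        return sorted(hits)


# Producers push their own metadata; names below are illustrative assumptions.
catalog = InMemoryCatalog()
catalog.ingest(
    MetadataEvent("trading.orders", "trading-team", "Executed client orders", ["pii"]).to_json()
)
catalog.ingest(
    MetadataEvent("prices.fx", "pricing-team", "Streaming FX prices", ["market-data"]).to_json()
)
```

Because the producer is the one sending the event, ownership and descriptions stay with the team that knows the data, which is the point the slide makes about self-service.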
DATASET ONBOARDING - RESPONSIBILITIES
PRODUCERS
Metadata
● Dataset/Data Product Metadata
○ Ownership Information
○ Reader Information
○ Topic Configuration Details
○ Dataset Structure (AVRO Schema)
○ Business Term mapping
○ Source Dataset Definition (Optional)
● Quality Rules
Data Engineering
● Domain Transformations (Kafka Streams)
CONSUMERS
Metadata
● Consumer Details
● Usage Details
● Target Dataset details
Kafka PLATFORM - Lean Team
Engineering Capabilities
● Supporting New Domains
● Metadata Integration
● DQ Integration
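The producer-side metadata listed above could be captured in a single onboarding request and checked for completeness before a dataset is accepted. The sketch below is an assumption about what such a check might look like: the field names, the `validate_onboarding` function and the example values are hypothetical; only the Avro record layout (`type`, `name`, `fields`) follows the Avro schema format.

```python
# Fields a producer must supply; the source dataset definition is optional.
REQUIRED_FIELDS = ("owners", "readers", "topic_config", "avro_schema", "business_terms")


def validate_onboarding(request: dict) -> list[str]:
    """Return a list of problems; an empty list means the request looks complete."""
    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if not request.get(name)
    ]
    schema = request.get("avro_schema")
    if schema:
        if schema.get("type") != "record":
            problems.append("avro_schema must be a record")
        field_names = {f["name"] for f in schema.get("fields", [])}
        # Every mapped business term must point at a field that exists in the schema.
        for field_name in request.get("business_terms", {}):
            if field_name not in field_names:
                problems.append(f"business term mapped to unknown field: {field_name}")
    return problems


# Hypothetical onboarding request for illustration.
REQUEST = {
    "dataset": "prices.fx",
    "owners": ["pricing-team"],
    "readers": ["reporting-team"],
    "topic_config": {"partitions": 6, "retention_ms": 604800000},
    "avro_schema": {
        "type": "record",
        "name": "FxPrice",
        "fields": [
            {"name": "instrument_id", "type": "long"},
            {"name": "bid", "type": "double"},
            {"name": "ask", "type": "double"},
        ],
    },
    "business_terms": {"bid": "Bid Price", "ask": "Ask Price"},
    "source_dataset": None,  # optional
    "quality_rules": [],
}
```

A check like this fits the division of responsibilities on the slide: producers supply the metadata, and the lean platform team only provides the integration that validates and forwards it.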
Thank You
Q&A?
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
gmuir1066
 
GenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.aiGenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.ai
Inspirient
 
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnTemplate_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
cegiver630
 
Ad

LinkedIn Saxo Bank Data Workbench

  • 1. FREE TO SHARE Sheetal Pratik - Saxo Bank August 2020 Data Workbench
  • 2. About Saxo: We are leading fintech and regtech specialists, connecting traders, investors and partners to more than 35,000 instruments – across all asset classes – from a single account. What we do: We build digital platforms to facilitate multi-asset market access and provide clients of all sizes with professional-grade tools, industry-leading prices and best-in-class service. Data For Scale: Transforming Data Access, Data Governance and Data Quality
  • 3. DATA GOVERNANCE FOR A DIGITAL NATIVE
      • A data-driven organization needs multi-level Data Governance. Most tools are designed to fix data after the fact, e.g. just before a data warehouse load. What is needed is to ensure data integrity at the origin, to prevent a “butterfly effect” in the downstream systems.
      • The article “How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh” makes clear how a data platform with a centralized architecture can fail by becoming a bottleneck and harming stability. Also, with data owned at the domain level, managing the data dictionary centrally is bound to fail, or duplicates the effort of creating and maintaining such data assets.
      • Considering this, the solution has to be forward-looking, and a straight implementation of any of the COTS products for Data Cataloguing might not be the right answer for Saxo’s Data Governance implementation.
      • The preferred strategy for tooling is to fix forward, rather than attempting to fix the past with some kind of crawler and ML that extracts metadata from various data sources.
  • 4. VISION
      For Domain Teams who need visibility on the availability, meaning, usage, ownership and quality of data, the Data Workbench (Owner's pride, Neighbor's envy) is a one-stop data shop that provides transparency of Saxo’s data ecosystem, unlike our current state, which is becoming increasingly complex as we grow. Our product will help Saxo to improve time to market and unlock new insights.
      The Data Workbench is designed to be part of the new data architecture. It consists of two main components: a Data Catalogue and a Data Quality Solution.
      1. The Data Catalogue captures and exposes metadata. This provides transparency into the meaning and ownership of our data. The Data Catalogue is built on DataHub, a data catalogue open-sourced by LinkedIn. LinkedIn is very supportive and is working closely with us, helping with the adoption of the tool.
      2. The Data Quality Solution is built on the open source solution Great Expectations, supported by SuperConductive. Great Expectations is a declarative, flexible, and extensible data quality solution. It allows teams to define data quality rules and actively monitor the quality of their data.
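To illustrate what "declarative" data quality rules can look like, here is a minimal Python sketch. It deliberately does not use the Great Expectations API: the rule format, the `check_rules` helper and the sample rows are all hypothetical, chosen only to show rules defined as plain data and evaluated by a separate, generic checker.

```python
from typing import Any, Callable

# Hypothetical rule format: each rule is plain data (column, check name, parameters).
RULES = [
    {"column": "instrument_id", "check": "not_null"},
    {"column": "price", "check": "between", "min": 0.0, "max": 1_000_000.0},
]

# Check implementations, looked up by name so the rules themselves stay declarative.
CHECKS: dict[str, Callable[[Any, dict], bool]] = {
    "not_null": lambda value, rule: value is not None,
    "between": lambda value, rule: value is not None
    and rule["min"] <= value <= rule["max"],
}


def check_rules(rows: list[dict], rules: list[dict]) -> list[dict]:
    """Evaluate every rule against every row and report the failing row count."""
    results = []
    for rule in rules:
        check = CHECKS[rule["check"]]
        failed = sum(1 for row in rows if not check(row.get(rule["column"]), rule))
        results.append(
            {
                "column": rule["column"],
                "check": rule["check"],
                "failed_rows": failed,
                "success": failed == 0,
            }
        )
    return results


# Illustrative sample data, invented for this sketch.
rows = [
    {"instrument_id": "A1", "price": 101.5},
    {"instrument_id": None, "price": 99.0},
    {"instrument_id": "C3", "price": -5.0},
]

for result in check_rules(rows, RULES):
    print(result)
```

Keeping the rules as data means a domain team could version and review them alongside the dataset's metadata, while the checking logic stays shared.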
  • 5. MOTIVATION FOR THE SOLUTION
      • The Federated Data Governance model is an industry trend in which the enterprise governance team facilitates the monitoring and management of the quality of enterprise-critical data, with assistance from the business units.
      • LinkedIn’s shift from WhereHows, the initial version of its Data Governance solution, to DataHub is a typical example of the paradigm shift from “a central metadata repository” to a more decentralized architecture that puts domains first in order to support a self-service data platform.
      • We realized that a practical way of implementing this would be to stay lean and agile and work iteratively with data domains while establishing the Data Governance framework, and thus create a platform that is self-service, scalable and more relevant to stakeholders.
      • We had a discussion with LinkedIn to understand their journey and the lessons learnt that motivated them to evolve from WhereHows to DataHub. We recognized that Saxo Bank is on a similar journey and that we can fast-forward the implementation by adopting the open-source DataHub, which relates best to Saxo Bank's ecosystem.
      • The LinkedIn DataHub team has been extremely responsive.
      • Other digital natives have also recognized that the incumbent solutions are not fit for the modern age and have built their own specific solutions.
  • 6. PERSONAS: GOALS AND PAIN POINTS
      Personas: Data Asset Owner; Data Steward; Data Governance Committee Member; Data Scientist; Data Consumer (Reporting)
      Goals: To find data, the owner of the data and anything else that helps compliance; To solve business problems based on data; To get an overview of the whole data in the org; To define and document data standards; To be responsible and accountable for data
      Pain points: Lack of clarity on ownership of data; New role in Saxo Bank, so yet to own full responsibility; Lack of clarity on what to mandate at what level (federated or data domain level); Data missing/incomplete; See who can explain data elements; Not sure if the data can be trusted for making the right decisions
  • 7. DATA MESH APPROACH - PRODUCT THINKING [Diagram. Labels: Direct Agent/Fleet Engagement; Speed of Delivery; Domain; Polyglot Output Data Ports; Polyglot Data Input Ports; Control Ports; Logs, metrics; Self-serve description; Discoverable; Addressable; Self-describing; Trustworthy; Interoperable]
  • 9. STREAM PROCESSING DATA PLATFORM - HIGH LEVEL OVERVIEW [Diagram. Labels: Consumption; Domain; Data Source; Business Capabilities; Reports; Trading; Instruments; Prices; Customers/Parties; Product RX; Data Workbench; BUSINESS EVENTS; STREAM PROCESSING (Enrich/Transform/Aggregate); DATA STORAGE & MANAGEMENT: DW, DL Storage; DATA PRODUCTS: Native Data Products, Domain Data Products, Aggregated Data Products, Fit for Purpose Data Products; Confluent Kafka Platform; Data Catalog: DATAHUB; DATA QUALITY: Great Expectations; OTHER PROCESSING FRAMEWORKS]
  • 10. TOOLS EVALUATION PROCESS
      Evaluation Criteria Definition
        ● SAXO features list
        ● SAXO initial evaluation params
        ● TW extended feature list
      Evaluation Process
        ● Product documentation
        ● Software: Local installations
        ● Vendor questionnaire
      Shortlisted tools
        ● Includes initial & shortlisted list
        ● Data Catalog
        ● Data Quality
  • 11. DATA CATALOG TOOL EVALUATION: Prioritized Feature List
      Metadata Search
        ● Full text search on dataset name, attributes and tags
        ● Extensible search model
      Metadata UI
        ● Web-based UI to show metadata, governance attributes, tags and lineage
        ● Ability to edit and enrich attributes
      Metadata Ingestion
        ● Push-based REST API
        ● Pull-based adapters for Snowflake and CRM Dynamics
        ● Extensibility
      Metadata Modelling
        ● Metadata entity for datasets, their users and attributes
        ● Business glossary & documentation
        ● Extensibility
      Data Lineage
        ● Dataset lineage with upstream and downstream provenance
        ● Integration with data processing/orchestration tools
      Data Stewardship
        ● Support for metadata enrichment and tagging
        ● Ability to flag a dataset
      Data Quality Integration
        ● Shows related Quality Attributes in UI
        ● Extensible to integrate with any DQ tool
      Architecture
        ● Cloud-native (Scalable & High Availability)
        ● Configurable
        ● Extensible
      Alignment with Data Mesh
        ● Data as a Product
        ● Distributed Domain Driven Architecture
        ● Self-service platform
      Security
        ● Authentication / LDAP
        ● Authorization / RBAC
      Deprioritized
        ● Metadata Versioning
        ● Data Virtualization
        ● ML/AI capabilities
      Metadata Export
        ● Export API
      Total Cost of Ownership
        ● Licensing cost
        ● Customization / Development cost
      Support
        ● Release cycle
        ● Community support
        ● Commercial support
        ● Documentation
  • 12. TOOLS LANDSCAPE: Data Catalog
      Commercial: Collibra, Informatica EDC, Alation, Data.World, Azure Data Catalog - Prev2, Zeenea
      Open Source: Apache Atlas, LinkedIn DataHub, Amundsen, Marquez
      In House
  • 14. DATA CATALOG TOOL EVALUATION
      Deep-dive analysis of the capabilities of the shortlisted tools, purely as per the team's understanding in Saxo’s context.
      [Comparison matrix. Tools: DataHub, Marquez, Amundsen (Open Source); Collibra, Zeenea (Commercial). Criteria: Metadata Search *; Metadata UI; Metadata Ingestion *; Metadata Modelling *; Data Lineage *; Metadata Export *; Data Stewardship; Data Quality Integration; Architecture *; Security *; Alignment with Data Mesh *; Support; Total Cost of Ownership *. Cell notes: Editable (Metadata UI); Only supports AWS (Architecture). Legend: Completely suitable; Partially suitable; No/Minimal suitability]
      • Push-based approach that supports event-driven architecture. The solution is built on the principle of self-service: producers know their data better and can provide rich metadata, which helps in discovering data assets and encourages consumption.
      • Detailed evaluation was carried out from Saxo's perspective.
      • Possibility of evolution.
      • Extensibility: being open source, the tool can evolve as per needs. Right fit from a feature perspective in terms of the Data Governance maturity of the organization.
      • Leverage the needs of the larger community and also influence internal processes when needed.
      • Reputation of LinkedIn, their success with Kafka, and DataHub scaling internally to LinkedIn volume.
      • Promising roadmap.
      **Disclaimer**: Based on Saxo evaluation criteria and interpretation of product capabilities
  • 15. DATASET ONBOARDING - RESPONSIBILITIES
      PRODUCERS
        Metadata
          ● Dataset/Data Product Metadata
            ○ Ownership Information
            ○ Reader Information
            ○ Topic Configuration Details
            ○ Dataset Structure (AVRO Schema)
            ○ Business Term mapping
            ○ Source Dataset Definition (Optional)
          ● Quality Rules
        Data Engineering
          ● Domain Transformations (Kafka Stream)
      CONSUMERS
        Metadata
          ● Consumer Details
          ● Usage Details
          ● Target Dataset details
      PLATFORM - Lean Team
        Engineering Capabilities
          ● Supporting New Domains
          ● Metadata Integration
          ● DQ Integration
          ● Kafka
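As a rough illustration of the producer-side metadata listed on slide 15, the following Python sketch models an onboarding record and reports which required items are still empty. It is only a sketch under stated assumptions: the `DatasetOnboarding` class, its field names, the `missing_items` helper and the sample values are hypothetical and do not reflect the DataHub metadata model or Saxo's actual onboarding format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DatasetOnboarding:
    """Producer-supplied metadata for one dataset (all field names are hypothetical)."""

    name: str
    owners: list[str]               # ownership information
    readers: list[str]              # reader information
    topic_config: dict              # Kafka topic configuration details
    avro_schema: dict               # dataset structure as an Avro schema
    business_terms: dict[str, str]  # field name -> business glossary term
    quality_rules: list[dict] = field(default_factory=list)
    source_dataset: Optional[str] = None  # marked optional on the slide


def missing_items(ds: DatasetOnboarding) -> list[str]:
    """Return the labels of required onboarding items that are still empty."""
    required = {
        "ownership information": ds.owners,
        "reader information": ds.readers,
        "topic configuration": ds.topic_config,
        "dataset structure": ds.avro_schema,
        "business term mapping": ds.business_terms,
        "quality rules": ds.quality_rules,
    }
    return [label for label, value in required.items() if not value]


# Illustrative values, invented for this sketch.
trades = DatasetOnboarding(
    name="trades",
    owners=["trading-domain-team"],
    readers=[],
    topic_config={"partitions": 6},
    avro_schema={
        "type": "record",
        "name": "Trade",
        "fields": [
            {"name": "instrument_id", "type": "string"},
            {"name": "price", "type": "double"},
        ],
    },
    business_terms={"instrument_id": "Instrument"},
)

print(missing_items(trades))  # ['reader information', 'quality rules']
```

A check of this kind could run when a producer submits metadata, so that incomplete datasets are flagged at the origin rather than discovered downstream.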