Architect’s Open-Source
Guide for a Data Mesh
Architecture
Lena Hall
Microsoft
Lena Hall
Director at Microsoft
Azure Engineering
✓ Architecture
✓ Cloud
✓ Data
✓ ML/AI
lenadroid
Entry Point
How to Move Beyond a Monolithic Data Lake to a Distributed
Data Mesh
https://martinfowler.com/articles/data-monolith-to-mesh.html
Data Mesh Principles and Logical Architecture
https://martinfowler.com/articles/data-mesh-principles.html
Slack for Data-Mesh-Learning
https://launchpass.com/data-mesh-learning
Talk Snapshot
• What is Data Mesh
• When is Data Mesh a Good Idea
• Core Principles and Concepts
• Example: Drone Delivery Service
• Challenges
• OSS and Open Standards
When and Why
Data Mesh
Data Mesh is Not For Everyone
Challenges Indicating Data Mesh
May Be Considered
Drone Delivery Service
WHYs
• Ambiguity in Ownership and Responsibility
• Slow Change due to Coupling to Monolithic System
• Data Engineering Resources Bottleneck
Ideas Composing Data Mesh Concept
Core Ideas
✓ Decentralized teams and data ownership
✓ Data Products powered by Domain Driven Design
✓ Self-serve Shared Data Infrastructure
✓ Global Federated Governance
High-Level View of a Data Product
Drone Delivery Service Data Products
Core Principles for Data Products
DISCOVERABLE
SELF-DESCRIBING
ADDRESSABLE
SECURE
TRUSTWORTHY
INTEROPERABLE
Input Ports Questions
• Data Source - Where is the data coming from? External dataset or
another data product?
• Data Format - What is the format of the source input?
• Rate of Updates - How frequently does the input need to be updated?
Output Ports Questions
• End-consumers - Who are the end-users of the data product?
• Data purpose - What are they planning to do with the data outputs?
• Data access - Who needs to have access?
• Data address - How do they prefer to access the data output?
• Data Format - What format of the data do they expect?
Identity and Permission Policies Questions
• Which resources can this data product be allowed to access?
• Which data products or users can read which output ports of this data
product?
• Are all sensitive resources this data product offers protected according to
their required privacy standards (e.g. HIPAA, GDPR, PII, CCPA, etc.)?
• Is the permissions policy stored and managed in the same package
as the data product?
Data Product Action Questions
• What is the action that needs to happen to produce the outcomes for the
end-users?
• What are the required adjustments, transformations, filters, updates, or
quality improvements to the input data?
Operational Questions
• How can this data product be discovered and how should it be described to
other data products that might want to consume it?
• Which metadata and information should it make available to the end-
users?
• Where and how should data product versioning be managed during
updates to ensure consistency with how the end-users consume it?
• Which SLAs or SLOs does the data product provide?
• Which product success metrics can this data product expose and keep
track of? (adoption, usage, quality)
• Is the automation/resource orchestration logic stored in the same
package?
Other Questions
• Is this data product free of tight coupling to any other data source, data product,
or other resource that would make it non-interoperable?
• Does this data product follow the global governance standards and
practices defined by the organization?
• Does this data product have any implementation details that could
interfere with its portability?
Cheat Sheet for Planning Data Products
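One way to make the cheat sheet actionable is to capture the answers in a small, machine-readable descriptor that lives in the same package as the data product. The following is a minimal Python sketch under stated assumptions, not a standard or an existing library: the class names, field names, the `unanswered` helper, the `mesh://` address scheme, and the `drone-telemetry` example are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class InputPort:
    source: str        # where the data comes from (external dataset or another data product)
    data_format: str   # format of the source input
    update_rate: str   # how frequently the input is refreshed


@dataclass
class OutputPort:
    address: str       # where consumers reach this output
    data_format: str   # format consumers expect
    allowed_readers: set[str] = field(default_factory=set)  # who may read this port


@dataclass
class DataProduct:
    name: str
    owner_team: str
    description: str
    version: str
    input_ports: list[InputPort] = field(default_factory=list)
    output_ports: list[OutputPort] = field(default_factory=list)
    slo: dict[str, str] = field(default_factory=dict)

    def unanswered(self) -> list[str]:
        """Return the cheat-sheet areas that this descriptor leaves blank."""
        gaps = []
        if not self.input_ports:
            gaps.append("input ports")
        if not self.output_ports:
            gaps.append("output ports")
        if any(not port.allowed_readers for port in self.output_ports):
            gaps.append("identity and permission policies")
        if not self.slo:
            gaps.append("SLAs/SLOs")
        return gaps


# Hypothetical data product for the drone delivery example.
drone_telemetry = DataProduct(
    name="drone-telemetry",
    owner_team="drone-operations",
    description="Cleaned drone position and battery readings",
    version="1.0.0",
    input_ports=[InputPort("drone-device-events", "json", "streaming")],
    output_ports=[OutputPort("mesh://drone-operations/telemetry/v1", "parquet")],
)

print(drone_telemetry.unanswered())
```

A check like `unanswered()` could run in continuous delivery so that a data product with open planning questions is flagged before it is published.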
Self-Serve Shared Infrastructure
Types of Workloads Within a Data Product
[Diagram labels: REAL-TIME DATA, INGESTION, PROCESSING, OBJECT STORAGE, COLUMNAR STORAGE, INCOMING REQUEST, WEB SERVICE]
It can look like this / Or, it can look like this
[Diagrams of example implementations; surviving labels: WEB SERVICE, PROCESSING, Azure Data Lake, Google Storage]
Self-Serve Shared Infrastructure
SHARED PLATFORM FOR STREAMING INGESTION
SHARED PLATFORM FOR RAW DATA STORAGE
SHARED PLATFORM FOR COLUMNAR DATA STORAGE
SHARED PLATFORM FOR CONTAINER WORKLOADS
SHARED PLATFORM FOR CONTINUOUS DELIVERY
SHARED PLATFORM FOR OBSERVABILITY
DATA CATALOGUE
AND MORE… DEPENDING ON THE ORGANIZATION
[Diagram: Data Mesh, combining the six data product principles (discoverable, self-describing, addressable, secure, trustworthy, interoperable) with the shared platforms for streaming ingestion, raw data storage, columnar data storage, container workloads, continuous delivery, and observability, plus the data catalogue]
Wait, What About the OSS Tools for Data Mesh??
Challenges with Data Mesh
Challenges
• Cost questions
• Lack of end-to-end examples
• Efforts to shift from centralized architecture to decentralization-friendly
techniques
• Automation required to enable the creation of data products
• Underestimating the importance of organizational aspects
Considerations for Technology Choices
• Workload sharing and multi-tenancy
• No-copy data and compute mobility support
• Granularity of access-control
• Richness of automation and extensibility capabilities
• Flexibility and elasticity
• Provider-agnostic/multi-cloud operations support
• Variety of limitations (quotas, data volume, resource count, etc.)
• Open Standards, Open Protocols, Open-Source Integrations
Examples of Data Mesh-friendly Technologies
[Technology landscape slide with logos; categories shown:]
• Data Catalogue, Data Lineage, Data Governance
• OSS Data Analytics, Data Processing, Data Querying
• Cloud Storage (Amazon S3, Azure Data Lake, Google Storage)
• Open Formats
• Data Ingestion, Streaming
• Data Orchestration, Workflows
• OSS Storage
• Products for Data Analytics and Processing
• Data Visualization and BI Tools
• Data Experimentation
• Cross-Platform Concepts and Tools
• Multi and Hybrid Cloud Tools (Anthos, Azure Arc)
• Infrastructure Automation
Data Governance Systems
• Metadata
• Data lineage
• Data schemas
• Data relationships
• Data classification
• Data security
• Data catalog
Open Formats
• Open standard
• Atomic updates, serializable isolation, transactions
• Concurrent operations
• Versioning, rollbacks, time-travel
• Schema Evolution
• Scale, Efficiency, Data Volumes
• Compatibility with existing data stores and languages
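To make "atomic updates" and "versioning, rollbacks, time-travel" concrete, here is a toy in-memory illustration in Python. It is only a sketch of the idea and does not reflect the API or storage layout of any real open table format; the `VersionedTable` class and the delivery rows are hypothetical.

```python
from __future__ import annotations

import copy


class VersionedTable:
    """Toy in-memory table that keeps every committed snapshot."""

    def __init__(self) -> None:
        self._snapshots: list[list[dict]] = [[]]  # version 0 is the empty table

    @property
    def version(self) -> int:
        return len(self._snapshots) - 1

    def commit(self, rows_to_append: list[dict]) -> int:
        """Publish a new snapshot in one step: all rows appear, or none do."""
        new_snapshot = copy.deepcopy(self._snapshots[-1]) + copy.deepcopy(rows_to_append)
        self._snapshots.append(new_snapshot)  # the new version becomes visible here
        return self.version

    def read(self, version: int | None = None) -> list[dict]:
        """Read the latest snapshot, or time-travel to an earlier version."""
        index = self.version if version is None else version
        return copy.deepcopy(self._snapshots[index])

    def rollback(self, version: int) -> int:
        """Roll back by committing an old snapshot as the newest version."""
        self._snapshots.append(copy.deepcopy(self._snapshots[version]))
        return self.version


deliveries = VersionedTable()
v1 = deliveries.commit([{"drone": "d-1", "status": "delivered"}])
v2 = deliveries.commit([{"drone": "d-2", "status": "failed"}])
v3 = deliveries.rollback(v1)

# Version 2 still holds both rows; the latest version matches version 1 again.
print(len(deliveries.read(v2)), len(deliveries.read()))
```

Because old snapshots are never modified, a consumer pinned to an earlier version keeps getting consistent results while producers continue to commit.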
Data Platforms (Cloud or OSS)
• Separation of storage and compute
• Support for no-copy data sharing
• Bringing compute to data
• Fine-tuned granularity of permissions for access
• Support for automation and resource management
• Open standards and interoperability with other platforms and
tools for governance, visualization, analytics, etc.
Multi-Cloud Infrastructure Management
• Terraform
Open-source infrastructure as code software tool that enables you to safely and
predictably create, change, and improve infrastructure.
• Pulumi
Open-source infrastructure as code SDK that enables you to create, deploy, and
manage infrastructure on any cloud, using your favorite languages.
• Crossplane
Assemble infrastructure from multiple vendors, and expose higher level self-service
APIs for application teams to consume, without having to write any code.
Multi-Cloud Workload Portability
• Azure Arc
Build cloud-native apps anywhere, at scale. Run Azure services in any Kubernetes
environment, whether it’s on-premises, multi-cloud, or at the edge
• Google Anthos
A modern application management platform that provides a consistent development
and operations experience for cloud and on-premises environments
Kubernetes Open-Standard Technologies
• Open Application Model
An open standard for defining cloud native apps.
KubeVela - https://kubevela.io/docs/concepts
• Open Policy Agent
Declarative Policy-as-Code, enables portability, combination with Infra-as-Code.
https://www.openpolicyagent.org/docs/latest
• Service Catalog
Provision managed services and make them available within a Kubernetes cluster.
https://kubernetes.io/docs/concepts/extend-kubernetes/service-catalog/
NOT AN EXHAUSTIVE LIST
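The policy-as-code idea can be illustrated with a small deny-by-default check on which reader may access which output port of a data product. This Python sketch only shows the concept; it is not Open Policy Agent's Rego language or API, and the `Rule` type, the `is_allowed` function, and all names in the policy are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    reader: str       # data product or user requesting access
    product: str      # data product being read
    output_port: str  # output port of that data product


# The policy is plain data, so it can be versioned and reviewed like code.
POLICY = frozenset({
    Rule("delivery-analytics", "drone-telemetry", "hourly-summary"),
    Rule("maintenance-team", "drone-telemetry", "raw-events"),
})


def is_allowed(reader: str, product: str, output_port: str, policy=POLICY) -> bool:
    """Deny by default; allow only what a rule grants explicitly."""
    return Rule(reader, product, output_port) in policy


print(is_allowed("delivery-analytics", "drone-telemetry", "hourly-summary"))
print(is_allowed("delivery-analytics", "drone-telemetry", "raw-events"))
```

Keeping the rules declarative and separate from the enforcement function is what allows the same policy to be stored with the data product and evaluated on any platform.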
Benefits Brought by Data Mesh
• Data Quality
• Tailored resource and focus allocation
• Organizational cohesion while allowing flexibility
• Reducing complexity
• Democratizing value creation
• Better understanding of value and innovation opportunities
• Enabling more consistent and faster change
Important Focus Areas for Technology Providers
• Open Standards, Open Protocols, Open-Source Integrations
• Workload sharing and multi-tenancy
• No-copy data and compute mobility support
• Granularity of access-control
• Richness of automation and extensibility capabilities
• Flexibility and elasticity
• Provider-agnostic/multi-cloud operations support
• Variety of limitations (quotas, data volume, resource count, etc.)
Data Mesh will drive better Interoperability, Open
Standards, and Data Quality in the Industry
Thank you!
Follow lenadroid for more insights
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
Databricks
 

Architect’s Open-Source Guide for a Data Mesh Architecture

  • 1. Architect’s Open-Source Guide for a Data Mesh Architecture Lena Hall Microsoft
  • 2. Lena Hall Director at Microsoft Azure Engineering ✓ Architecture ✓ Cloud ✓ Data ✓ ML/AI lenadroid
  • 3. Entry Point How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh https://martinfowler.com/articles/data-monolith-to-mesh.html Data Mesh Principles and Logical Architecture https://martinfowler.com/articles/data-mesh-principles.html Slack for Data-Mesh-Learning https://launchpass.com/data-mesh-learning lenadroid
  • 4. Talk Snapshot • What is Data Mesh • When is Data Mesh a Good Idea • Core Principles and Concepts • Example: Drone Delivery Service • Challenges • OSS and Open Standards lenadroid
  • 5. When and Why Data Mesh @lenadroid
  • 6. Data Mesh is Not For Everyone
  • 7. Challenges Indicating Data Mesh May Be Considered @lenadroid
  • 9. WHYs • Ambiguity in Ownership and Responsibility • Slow Change due to Coupling to Monolithic System • Data Engineering Resources Bottleneck lenadroid
  • 10. Ideas Composing Data Mesh Concept @lenadroid
  • 11. Core Ideas ✓ Decentralized teams and data ownership lenadroid
  • 12. Core Ideas ✓ Decentralized teams and data ownership ✓ Data Products powered by Domain Driven Design lenadroid
  • 13. High-Level View of a Data Product lenadroid
  • 14. Core Ideas ✓ Decentralized teams and data ownership ✓ Data Products powered by Domain Driven Design ✓ Self-serve Shared Data Infrastructure lenadroid
  • 15. Core Ideas ✓ Decentralized teams and data ownership ✓ Data Products powered by Domain Driven Design ✓ Self-serve Shared Data Infrastructure ✓ Global Federated Governance lenadroid
  • 16. Drone Delivery Service Data Products @lenadroid
  • 18. Core Principles for Data Products @lenadroid
  • 25. Cheat Sheet for Planning Data Products
    Input Ports Questions
      • Data Source - Where is the data coming from? External dataset or another data product?
      • Data Format - What is the format of the source input?
      • Rate of Updates - How frequently does the input need to be updated?
    Output Ports Questions
      • End-consumers - Who are the end-users of the data product?
      • Data purpose - What are they planning to do with the data outputs?
      • Data access - Who needs to have access? How do they prefer to access the data output?
      • Data address - How do they prefer to access the data output?
      • Data Format - What format of the data do they expect?
    Identity and Permission Policies Questions
      • Which resources can this data product be allowed to access?
      • Which data products or users can read which output ports of this data product?
      • Are all sensitive resources this data product offers protected according to their required privacy standards (e.g. HIPAA, GDPR, PII, CCPA, etc.)?
      • Is the permissions policy stored and managed in the same package as the data product?
    Data Product Action Questions
      • What is the action that needs to happen to produce the outcomes for the end-users?
      • What are the required adjustments, transformations, filters, updates, or quality improvements to the input data?
    Operational Questions
      • How can this data product be discovered and how should it be described to other data products that might want to consume it?
      • Which metadata and information should it make available to the end-users?
      • Where and how should data product versioning be managed during updates to ensure consistency with how the end-users consume it?
      • Which SLAs or SLOs does the data product provide?
      • Which product success metrics can this data product expose and keep track of? (adoption, usage, quality)
      • Is the automation/resource orchestration logic stored in the same package?
    Other Questions
      • Is this product free of tight coupling to any other data source, data product, or other resource that would make it non-interoperable?
      • Does this data product follow the global governance standards and practices defined by the organization?
      • Does this data product have any implementation details that could interfere with its portability?
  • 27. REAL-TIME DATA INGESTION PROCESSING OBJECT STORAGE COLUMNAR STORAGE PROCESSING COLUMNAR STORAGE INCOMING REQUEST WEB SERVICE PROCESSING lenadroid Types of Workloads Within a Data Product
  • 28. WEB SERVICE PROCESSING lenadroid It can look like this Azure Data Lake
  • 29. WEB SERVICE lenadroid Or, it can look like this Google Storage
  • 30. lenadroid Self-Serve Shared Infrastructure SHARED PLATFORM FOR STREAMING INGESTION SHARED PLATFORM FOR RAW DATA STORAGE SHARED PLATFORM FOR COLUMNAR DATA STORAGE SHARED PLATFORM FOR CONTAINER WORKLOADS SHARED PLATFORM FOR CONTINUOUS DELIVERY SHARED PLATFORM FOR OBSERVABILITY AND MORE… DEPENDING ON THE ORGANIZATION DATA CATALOGUE
  • 31. lenadroid SHARED PLATFORM FOR CONTAINER WORKLOADS SHARED PLATFORM FOR CONTINUOUS DELIVERY SHARED PLATFORM FOR OBSERVABILITY DISCOVERABLE SELF-DESCRIBING ADDRESSABLE SECURE TRUSTWORTHY INTEROPERABLE Data Mesh SHARED PLATFORM FOR STREAMING INGESTION SHARED PLATFORM FOR RAW DATA STORAGE SHARED PLATFORM FOR COLUMNAR DATA STORAGE DATA CATALOGUE
  • 32. Wait, What About the OSS Tools for Data Mesh?? @lenadroid
  • 33. Challenges with Data Mesh @lenadroid
  • 34. Challenges • Cost questions • Lack of end-to-end examples • Efforts to shift from centralized architecture to decentralization-friendly techniques • Automation required to enable creating data products • Underestimating the importance of organizational aspects lenadroid
  • 35. Considerations for Technology Choices @lenadroid
  • 36. Considerations for Technology Choices • Workload sharing and multi-tenancy • No-copy data and compute mobility support • Granularity of access-control • Richness of automation and extensibility capabilities • Flexibility and elasticity • Provider-agnostic/multi-cloud operations support • Variety of limitations (quotas, data volume, resource count, etc.) • Open Standards, Open Protocols, Open-Source Integrations lenadroid
  • 37. Examples of Data Mesh-friendly Technologies @lenadroid
  • 38. [Diagram: technology categories] Data Catalogue, Data Lineage, Data Governance • OSS Data Analytics, Data Processing, Data Querying • Cloud Storage (Amazon S3, Azure Data Lake, Google Storage) • Open Formats • Data Ingestion, Streaming • Data Orchestration, Workflows • OSS Storage • Products for Data Analytics and Processing • Data Visualization and BI Tools • Data Experimentation • Cross-Platform Concepts and Tools • Multi and Hybrid Cloud Tools (Anthos, Azure Arc) • Infrastructure Automation lenadroid
  • 39. Data Governance Systems • Metadata • Data lineage • Data schemas • Data relationships • Data classification • Data security • Data catalog lenadroid
  • 40. Open Formats • Open standard • Atomic updates, serializable isolation, transactions • Concurrent operations • Versioning, rollbacks, time-travel • Schema Evolution • Scale, Efficiency, Data Volumes • Compatibility with existing data stores and languages lenadroid
  • 41. Data Platforms (Cloud or OSS) • Separation of storage and compute • Support for no-copy data sharing • Bringing compute to data • Fine-tuned granularity of permissions for access • Support for automation and resource management • Open standards and interoperability with other platforms and tools for governance, visualization, analytics, etc. lenadroid
  • 42. Multi-Cloud Infrastructure Management • Terraform Open-source infrastructure as code software tool that enables you to safely and predictably create, change, and improve infrastructure. • Pulumi Open-source infrastructure as code SDK that enables you to create, deploy, and manage infrastructure on any cloud, using your favorite languages. • Crossplane Assemble infrastructure from multiple vendors, and expose higher level self-service APIs for application teams to consume, without having to write any code. lenadroid
  • 43. Multi-Cloud Workload Portability • Azure Arc Build cloud-native apps anywhere, at scale. Run Azure services in any Kubernetes environment, whether it’s on-premises, multi-cloud, or at the edge • Google Anthos A modern application management platform that provides a consistent development and operations experience for cloud and on-premises environments lenadroid
  • 44. Kubernetes Open-Standard Technologies • Open Application Model An open standard for defining cloud native apps. KubeVela - https://kubevela.io/docs/concepts • Open Policy Agent Declarative Policy-as-Code, enables portability, combination with Infra-as-Code. https://www.openpolicyagent.org/docs/latest • Service Catalog Provision managed services and make them available within a Kubernetes cluster. https://kubernetes.io/docs/concepts/extend-kubernetes/service-catalog/ NOT AN EXHAUSTIVE LIST lenadroid
  • 45. Benefits Brought by Data Mesh • Data Quality • Tailored resource and focus allocation • Organizational cohesion while allowing flexibility • Reducing complexity • Democratizing creating value • Better understanding of value and innovation opportunities • Empowering a more consistent and fast change @lenadroid
  • 46. Important Focus Areas for Technology Providers • Open Standards, Open Protocols, Open-Source Integrations • Workload sharing and multi-tenancy • No-copy data and compute mobility support • Granularity of access-control • Richness of automation and extensibility capabilities • Flexibility and elasticity • Provider-agnostic/multi-cloud operations support • Variety of limitations (quotas, data volume, resource count, etc.) @lenadroid
  • 47. Data Mesh will drive better Interoperability, Open Standards, and Data Quality in the Industry @lenadroid
  • 48. Thank you! Follow lenadroid for more insights
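
The planning questions on slide 25 (input ports, output ports, permission policies, operational metadata) can be captured as a small declarative descriptor that travels with the data product. The sketch below is one possible shape in Python, not something taken from the talk: every class, field, and product name here (`DataProduct`, `allowed_readers`, `delivery-status`, and so on) is a hypothetical chosen for illustration, and the permission check is a plain in-memory lookup rather than a real policy engine such as Open Policy Agent.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InputPort:
    source: str        # external dataset or another data product
    data_format: str   # format of the source input, e.g. "json"
    update_rate: str   # how often the input is refreshed, e.g. "streaming"


@dataclass(frozen=True)
class OutputPort:
    address: str                  # where consumers reach this output
    data_format: str              # format consumers expect
    allowed_readers: frozenset    # data products or users allowed to read


@dataclass
class DataProduct:
    name: str
    owner_domain: str
    description: str = ""         # supports discoverability
    version: str = "0.1.0"
    input_ports: list = field(default_factory=list)
    output_ports: dict = field(default_factory=dict)  # port name -> OutputPort
    slos: dict = field(default_factory=dict)          # e.g. {"freshness_minutes": 5}

    def can_read(self, reader: str, port_name: str) -> bool:
        """Answer: which data products or users can read which output port?"""
        port = self.output_ports.get(port_name)
        return port is not None and reader in port.allowed_readers

    def planning_gaps(self) -> list:
        """List cheat-sheet questions that this descriptor leaves unanswered."""
        gaps = []
        if not self.input_ports:
            gaps.append("no input ports declared")
        if not self.output_ports:
            gaps.append("no output ports declared")
        if not self.description:
            gaps.append("no description for discovery")
        if not self.slos:
            gaps.append("no SLOs declared")
        return gaps


# Hypothetical data product from the drone delivery example.
delivery_status = DataProduct(
    name="delivery-status",
    owner_domain="delivery",
    description="Current status of each drone delivery",
    input_ports=[InputPort("drone-telemetry", "json", "streaming")],
    output_ports={
        "delivery_events": OutputPort(
            address="delivery/delivery-status/delivery_events",
            data_format="parquet",
            allowed_readers=frozenset({"route-optimization"}),
        )
    },
    slos={"freshness_minutes": 5},
)

print(delivery_status.can_read("route-optimization", "delivery_events"))
print(delivery_status.planning_gaps())
```

Keeping the permission data inside the descriptor mirrors the cheat-sheet question about storing the policy in the same package as the data product; in practice the same information could be exported to whatever policy or catalogue tooling the organization standardizes on.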