Ozone: Evolution of HDFS Scalability to trillions of file system objects
Dinesh Chitlangia
2 © Cloudera, Inc. All rights reserved.
Credits
● Apache Hadoop community
● Cloudera
● ApacheCon Chicago
Agenda
Ozone - Why, When, What
Notions
Architecture
Deployment
Ozone - Write & Read path
Using Ozone
Ozone for Enterprise
Q & A
Why?
Challenges with HDFS
● Regular users ~ 200M files
● Heavy users ~ 400M+ files
● Recurring "make your HDFS healthy" days
● Limited scalability of Namespace/Blockspace/Client/RPC
● Future
○ Cloud
○ Streaming
○ Small files are inevitable
When?
When you need scale or cloud-like storage and HDFS holds you back.
● Scale - Files/Throughput
● Archival Store / Large Data Store / Dedicated Storage Clusters
● S3 is the new NFS
● Cloud-like presence on-prem
● Cannot control small files
● Adopting K8s and need a big-data-capable file system
What?
Spiritual successor to HDFS
● Object Store for Big Data
● Set of Microservices - Divide, Conquer, Scale
● Scale both Objects & IOPS
● The NameNode bottleneck is history
● Seamless transition for YARN, MapReduce, Hive and Spark apps
● Supports K8s and CSI, and can run natively on K8s
Notions
A few fundamentals
● Volumes ~ user accounts
● Buckets ~ directories (no sub-buckets)
● Keys ~ files
● A volume can have many buckets
● A bucket can have many keys
● A key is composed of blocks; blocks are further divided into chunks
● HDDS Notions
○ Containers [Collection of Blocks]
○ Pipeline
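The notions above can be modelled as a minimal sketch. The class names and the tiny block and chunk sizes are assumptions chosen for illustration, not Ozone's actual types or configured sizes; the point is the nesting (volume, then bucket, then key, with no sub-buckets) and how a key's bytes split into blocks and then into chunks.

```python
from dataclasses import dataclass, field

# Hypothetical sizes for illustration only; real Ozone sizes are configurable.
BLOCK_SIZE = 256  # bytes per block in this toy model
CHUNK_SIZE = 64   # bytes per chunk in this toy model


def split(data: bytes, size: int) -> list[bytes]:
    """Cut data into consecutive pieces of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


@dataclass
class Block:
    chunks: list[bytes]


@dataclass
class Key:
    name: str
    blocks: list[Block]

    @classmethod
    def from_bytes(cls, name: str, data: bytes) -> "Key":
        # A key is composed of blocks; each block is further divided into chunks.
        return cls(name, [Block(split(b, CHUNK_SIZE)) for b in split(data, BLOCK_SIZE)])


@dataclass
class Bucket:
    name: str
    keys: dict[str, Key] = field(default_factory=dict)  # flat: no sub-buckets


@dataclass
class Volume:
    name: str
    buckets: dict[str, Bucket] = field(default_factory=dict)


vol = Volume("vol1")
vol.buckets["bucket1"] = Bucket("bucket1")
key = Key.from_bytes("data.csv", b"x" * 600)
vol.buckets["bucket1"].keys[key.name] = key
print(len(key.blocks), [len(b.chunks) for b in key.blocks])  # 3 [4, 4, 2]
```

With these toy sizes a 600-byte key becomes three blocks (256 + 256 + 88 bytes), and the last block holds only two chunks.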
Architecture
Ozone’s Microservices - Divide, Conquer, Scale
● Ozone Manager - namespace [~NameNode]
● Storage Container Managers - blockspace [~BlockServer]
● Recon Server - Control Plane
● S3 Gateway
● Datanodes
Architecture
The Big Picture
Deployment
Variants
Ozone - Write Path
Similar to HDFS writes, blocks are written directly to Datanodes
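A simplified, in-memory model of the write path, assuming the usual Ozone flow: the client asks the Ozone Manager to open a key, the OM obtains a block allocation (container plus pipeline) from SCM, the client streams data straight to the Datanodes, and finally the client commits the key back to the OM. Every class and method name here is hypothetical; real clients talk to these services over RPC, each key here gets exactly one block, and Ratis replication among the Datanodes is collapsed into a plain loop.

```python
import itertools


class Datanode:
    def __init__(self, name: str):
        self.name = name
        self.blocks: dict[tuple[int, int], bytes] = {}  # (container_id, local_id) -> data


class SCM:
    """Toy Storage Container Manager: hands out block IDs inside a container."""

    def __init__(self, pipeline: list[Datanode]):
        self.pipeline = pipeline  # e.g. 3 Datanodes forming one replication pipeline
        self._local_ids = itertools.count(1)

    def allocate_block(self) -> tuple[tuple[int, int], list[Datanode]]:
        container_id = 1  # a single container in this toy model
        return (container_id, next(self._local_ids)), self.pipeline


class OM:
    """Toy Ozone Manager: owns the namespace (key -> block IDs)."""

    def __init__(self, scm: SCM):
        self.scm = scm
        self.keys: dict[str, list[tuple[int, int]]] = {}

    def open_key(self, name: str):
        # The OM asks SCM for a block; the key is not visible until committed.
        return self.scm.allocate_block()

    def commit_key(self, name: str, block_ids: list[tuple[int, int]]) -> None:
        self.keys[name] = block_ids


def write_key(om: OM, name: str, data: bytes) -> None:
    block_id, pipeline = om.open_key(name)
    for dn in pipeline:  # data goes directly to Datanodes, never through the OM
        dn.blocks[block_id] = data
    om.commit_key(name, [block_id])


dns = [Datanode(f"dn{i}") for i in range(3)]
om = OM(SCM(dns))
write_key(om, "vol1/bucket1/report.txt", b"hello ozone")
print(om.keys)
```

The design point the sketch isolates is that the OM and SCM only handle metadata; the bulk bytes travel between client and Datanodes, as with HDFS.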
Ozone - Read Path
Similar to HDFS reads, blocks are read directly from Datanodes
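A matching sketch of the read path: one metadata lookup tells the client where a key's blocks live, then each block is fetched directly from a Datanode, moving on to the next replica if one is down. The plain dictionary standing in for the OM, the block IDs and the failover loop are hypothetical simplifications, not Ozone's client code.

```python
class DatanodeDown(Exception):
    pass


class Datanode:
    def __init__(self, blocks: dict[str, bytes], up: bool = True):
        self.blocks = blocks
        self.up = up

    def read_block(self, block_id: str) -> bytes:
        if not self.up:
            raise DatanodeDown(block_id)
        return self.blocks[block_id]


def read_key(key_table: dict[str, list[tuple[str, list[Datanode]]]], name: str) -> bytes:
    """key_table plays the OM: key name -> [(block_id, replica Datanodes), ...]."""
    out = []
    for block_id, replicas in key_table[name]:  # metadata comes from the OM
        for dn in replicas:  # bytes come straight from a Datanode
            try:
                out.append(dn.read_block(block_id))
                break
            except DatanodeDown:
                continue  # try the next replica
        else:
            raise IOError(f"no live replica for block {block_id}")
    return b"".join(out)


dn_a = Datanode({"b1": b"hello ", "b2": b"ozone"}, up=False)
dn_b = Datanode({"b1": b"hello ", "b2": b"ozone"})
table = {"vol1/bucket1/greeting": [("b1", [dn_a, dn_b]), ("b2", [dn_a, dn_b])]}
print(read_key(table, "vol1/bucket1/greeting"))  # b'hello ozone'
```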
Using Ozone: Is it as painful as HDFS?
We hear you: we set up Ozone every time we test, too.
Setup
● Docker
○ docker-compose up -d
○ runs it on your local machine
● K8s
○ helm install ozone
● Traditional tarball
○ Untar
○ Run ozone genconf to generate a configuration template
○ Update the configurations
Usage
● If you are familiar with HDFS commands
○ hdfs dfs -ls hdfs://user
● with Ozone, it becomes
○ hdfs dfs -ls o3fs://bucket.volume/user
● If you are familiar with S3 commands like
○ aws s3 ls --endpoint-url=us-west1. s3://bucketName
● with the Ozone S3 Gateway, it becomes
○ aws s3 ls --endpoint-url=s3g.local. s3://bucketName
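In the o3fs scheme the URI authority names the bucket and then the volume, optionally followed by the Ozone Manager host and port (o3fs://bucket.volume[.om-host[:port]]/path). The helper below is a hypothetical convenience for building such URIs; the function name and example host are assumptions, and the format should be checked against the documentation for your Ozone release.

```python
from typing import Optional


def o3fs_uri(volume: str, bucket: str, path: str = "/",
             om_host: Optional[str] = None, om_port: Optional[int] = None) -> str:
    """Build an o3fs URI: o3fs://<bucket>.<volume>[.<om-host>[:<port>]]/<path>."""
    if om_port is not None and om_host is None:
        raise ValueError("an OM port needs an OM host")
    authority = f"{bucket}.{volume}"  # note: bucket comes before volume
    if om_host:
        authority += f".{om_host}"
        if om_port is not None:
            authority += f":{om_port}"
    return f"o3fs://{authority}/{path.lstrip('/')}"


print(o3fs_uri("vol1", "bucket1"))  # o3fs://bucket1.vol1/
print(o3fs_uri("vol1", "bucket1", "/logs/2019", om_host="om.example.com", om_port=9862))
```

The resulting URI can be handed to the Hadoop filesystem shell (for example hdfs dfs -ls o3fs://bucket1.vol1/), assuming the Ozone filesystem client is on the Hadoop classpath.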
Ozone for Enterprise
Designed for Scale
● 10 billion keys will be supported in the first official release
● Only part of the namespace is kept in memory
● Off-heap memory usage
● Scale OM and SCM independently, without any disruption
● Create large aggregations of metadata ~ Storage Containers
● Evenly distribute metadata across the cluster, including Datanodes
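A back-of-the-envelope sketch of why aggregating blocks into storage containers helps: if SCM tracks containers rather than individual blocks, the number of entries it manages shrinks by roughly the number of blocks per container. The 256 MiB block and 5 GiB container sizes are assumptions for this arithmetic only; the model also ignores replication and partially filled containers.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

# Assumed sizes for this calculation only; real values are configurable.
BLOCK_SIZE = 256 * MIB
CONTAINER_SIZE = 5 * GIB


def tracked_entries(data_bytes: int) -> tuple[int, int]:
    """Return (blocks, containers) needed to hold data_bytes."""
    blocks = -(-data_bytes // BLOCK_SIZE)          # ceiling division
    containers = -(-data_bytes // CONTAINER_SIZE)  # ceiling division
    return blocks, containers


five_pib = 5 * 1024 ** 5
blocks, containers = tracked_entries(five_pib)
print(blocks, containers, CONTAINER_SIZE // BLOCK_SIZE)  # 20971520 1048576 20
```

Under these assumptions, 5 PiB means about 21 million blocks but only about one million containers for SCM to track.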
Ozone for Enterprise
Ensuring Correctness & Consistency
● Raft consensus protocol via Apache Ratis
● RocksDB for metadata storage
● Tested with industry-recognised, off-the-shelf components
○ Blockade tests - inject errors/failures into the clusters
○ Tested Apache Spark, YARN, Hive workloads
○ Real-world workloads in Apache Spark
○ K8s-based clusters, long-running clusters, ephemeral clusters
○ S3AFileSystem & similar open source test suites to test the S3 Gateway
○ Freon - custom load generator
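Apache Ratis implements the Raft protocol, whose core rule is that a leader treats a log entry as committed once a majority of the group has stored it. The functions below are a textbook sketch of that rule for a 3-member group (such as three Datanodes in a pipeline or three OM instances); they are not Ratis code, and they omit Raft's additional requirement that the entry belong to the leader's current term.

```python
def majority(group_size: int) -> int:
    """Smallest number of members that forms a Raft majority."""
    return group_size // 2 + 1


def commit_index(match_index: list[int]) -> int:
    """Highest log index stored on a majority of members.

    match_index[i] is the last log index known to be replicated on member i
    (the leader included). After sorting in descending order, the value at
    position majority-1 is present on at least `majority` members.
    """
    ranked = sorted(match_index, reverse=True)
    return ranked[majority(len(match_index)) - 1]


# Leader has entry 7, one follower has 5, the other lags at 2.
print(commit_index([7, 5, 2]))       # 5: entries up to 5 are committed
print(3 - majority(3))               # 1: failures a 3-member group tolerates
```

The same arithmetic explains why a 3-instance OM/SCM deployment keeps working with one member down but not two.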
Ozone for Enterprise
Simplified Security
● Similar to HDFS, relies on Kerberos / Delegation Token / Block Token
● SCM comes with its own Certificate Authority, and users DO NOT need to know about it
● Kerberos is only needed for OM/SCM, not for Datanodes
● Security is on by default, not an afterthought
● Transparent Data Encryption
● Selectively audit READ or WRITE events, and switch audit configs without a restart
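Datanodes can skip Kerberos because they check a signed block token instead of authenticating the user themselves: the OM authenticates the client and issues a token scoped to a block and an access mode. The sketch below uses a shared-secret HMAC from Python's standard library purely to illustrate that idea; Ozone's actual token format and signing scheme are not reproduced here, and the secret, field names and block ID are hypothetical.

```python
import hashlib
import hmac
import json
import time

SECRET = b"hypothetical-shared-secret"  # stand-in for Ozone's real signing keys


def issue_block_token(user: str, block_id: str, mode: str, ttl_s: int = 600) -> dict:
    """OM side (after Kerberos auth): sign what the user may do with one block."""
    claims = {"user": user, "block": block_id, "mode": mode,
              "expires": int(time.time()) + ttl_s}
    payload = json.dumps(claims, sort_keys=True).encode()
    return {"claims": claims, "sig": hmac.new(SECRET, payload, hashlib.sha256).hexdigest()}


def datanode_allows(token: dict, block_id: str, mode: str) -> bool:
    """Datanode side: no Kerberos, just check signature, scope and expiry."""
    payload = json.dumps(token["claims"], sort_keys=True).encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    c = token["claims"]
    return (hmac.compare_digest(expected, token["sig"])
            and c["block"] == block_id and c["mode"] == mode
            and c["expires"] > time.time())


tok = issue_block_token("alice", "1:42", "READ")
print(datanode_allows(tok, "1:42", "READ"))   # True: valid token
print(datanode_allows(tok, "1:42", "WRITE"))  # False: wrong access mode
```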
Ozone for Enterprise
High Availability
● Built-in HA
● Single HA Configuration mode
● Regular HA Configuration mode [3 instances of OM/SCM]
Ozone for Enterprise
Road ahead
● Stability & Scale testing
○ TPC-DS, Chaos Monkey, scale testing with partners
● Network Topology
● HA Support
● Disk Scanner
● In-place upgrades for HDFS Clusters
● Erasure Coding
● GDPR Compliance
● Consistent Reads from Standby OM/SCM
● Apache Ranger - Ozone Plugin
Ozone for Enterprise
References
https://hadoop.apache.org/ozone/
https://cwiki.apache.org/confluence/display/HADOOP/Ozone+Road+Map
Q & A
THANK YOU