Scaling HDFS at Xiaomi
Chen Zhang
Outline
• Introduction of Xiaomi
• Scenarios and challenges
• Improvements on HDFS federation
• Experience on scaling up single NameNode
• Efficient management of hundreds of clusters
About Xiaomi
• World’s 4th largest smartphone maker
• Sold 118 million phones in 2018
About Xiaomi
• World’s largest consumer IoT platform
• Over 150 million smart devices connected
Software and Internet Services
MIUI MiPay/Finance
App Market Ads
MiCloud Game
MiPush Smart Home
News Feeds …
Scenarios
[Diagram: infrastructure components: HDFS, HBase, EMQ, Yarn, Talos, FDS (S3), Spark, Hive, Impala]
Scenarios
[Diagram labels: Micloud, MiPush, Feeds, User Profile, Talos, Ads]
Online Services
• 100+ independent clusters
• Low latency
• High availability
Offline Services
Hadoop
• Several huge clusters
• High throughput
• High scalability, high availability
Data Growth
[Chart: Data Growth of the Largest Cluster, 2015 to 2018. Series: file counts (×10 million) and data size (PB). Data labels as extracted: 2, 23, 41, 71 and 3, 30, 60, 150.]
Challenges
• Challenges in late 2016
– Data growth is too fast
– Dependencies are too complex
– Code changes are almost impossible
What We Need
We need a single huge HDFS cluster
Improvements on HDFS Federation
• Problems of HDFS Federation in late 2016
– NameNodes are independent; metadata is not shared
– The MountTable is configured on the client side and is hard to maintain
– The MountTable doesn’t support nested mount points
– ViewFileSystem is not compatible with DistributedFileSystem
– RBF was not stable and not fully functional in late 2016
Improvements on HDFS Federation
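[Diagram: Original HDFS Federation. Clients use viewfs over namespaces NS 1 … NS k … NS n (including a foreign NS), served by NN-1 … NN-k … NN-n. Block storage: block pools Pool 1 … Pool k … Pool n on common storage across Datanode 1 … Datanode m. Example directory tree: / with user, yarn and hive; service1 and service2 below; plus several small directories and services (small dir1, small dir2, small service1, small service2, …).]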
Improvements on HDFS Federation
[Diagram: improved federation. A default namespace NS 0 (NN-0) is added alongside NS 1 … NS k … NS n (NN-1 … NN-k … NN-n); block pools Pool 1 … Pool k … Pool n sit on common storage across Datanode 1 … Datanode m. Directory tree: / with user, yarn and hive; service1 and service2 below.]
• hdfs:// -> FederatedDFSFileSystem, which extends DistributedFileSystem
• Compatible with hdfs://, so no code changes are needed
• Support nested mount points
• Add a default namespace
• Support rename across namespaces
• Update the MountTable config from ZK
Nested Mount table and Default NameSpace
1. Xiaomi is not only a hardware company but also an Internet company, and it develops very fast
2. There are more than 100 Internet services, and new businesses and services emerge quickly, built on our smart devices and more than 300 million users
3. It is hard for us to use a fixed, pre-divided mount table
[Diagram: directory tree (/ with user, yarn and hive; service1 and service2) mapped to NN-1 … NN-k … NN-n, plus NN-0]
Nested Mount table and Default NameSpace
/some_new_nosql_service
/user/live_show_services
/user/short_video_services
1. At first, we divide the initial mount points by data volume and QPS. We only need to configure about a dozen mount points for the largest services; all others fall into the default namespace
2. When new infrastructure services and Internet services emerge, the mount table does not need any updates
3. HADOOP-13055 supports linkFallback, but our solution is more flexible
[Diagram labels: NS 1, NS k, NS 0, NS n]
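The nested mount table with a default namespace can be illustrated with a longest-prefix lookup. This is only a minimal sketch, not Xiaomi's implementation: the function name `resolve_namespace`, the table contents and the namespace names (`ns0`, `ns1`, ...) are all assumptions made for illustration.

```python
from typing import Dict


def resolve_namespace(path: str, mount_table: Dict[str, str], default_ns: str) -> str:
    """Return the namespace that owns `path` (hypothetical helper).

    Walks from the full path up towards "/", so the deepest (most nested)
    mount point wins. Paths with no matching mount point fall into the
    default namespace.
    """
    # Normalize: drop empty components and trailing slashes.
    candidate = "/" + "/".join(c for c in path.split("/") if c)
    while True:
        if candidate in mount_table:
            return mount_table[candidate]
        if candidate == "/":
            return default_ns
        # Step up to the parent directory.
        candidate = candidate.rsplit("/", 1)[0] or "/"


# Hypothetical mount table: only the largest services get explicit entries.
MOUNT_TABLE = {
    "/user/service1": "ns1",
    "/user/service1/archive": "nsk",  # nested mount point under /user/service1
    "/hive": "nsn",
}

print(resolve_namespace("/user/service1/archive/2018", MOUNT_TABLE, "ns0"))
print(resolve_namespace("/user/live_show_services", MOUNT_TABLE, "ns0"))
```

In this sketch a new service directory such as /some_new_nosql_service resolves to the default namespace without touching the table, which is the property the slide describes.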
Client Transparency
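[Diagram: ViewFileSystem compared with FederatedDFSFileSystem for /user/service1 and /user/service2]
• The client accesses hdfs://clustername/user/service1
• Config: fs.hdfs.impl=FederatedDFSFileSystem
• FederatedDFSFileSystem fetches the mount table from ZooKeeper and watches it for changes
• The Admin Tool updates the mount table in ZooKeeper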
Client Transparency
RPC integration
• listStatus
• getContentSummary
• setQuota/getQuota
Admin Tools
• refreshNodes
• setBalancerBandwidth
• DataNode decommission
[Diagram: directory tree (/ with user, yarn and hive; service1 and service2) across NS 1 … NS k, NS 0, NS n on NN-1 … NN-k … NN-n and NN-0; example trash directory /user/service1/.Trash/]
Trash optimization
• moveToTrash is a rename operation
• moveToTrash across NameNodes is very expensive
Rename Across NameSpaces
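[Diagram: the client drives a rename between namenode1 and namenode2 while the directory is locked; on datanode1, datanode2 and datanode3, blocks are hard-linked between blockpool1 and blockpool2 (the "Link block" step)]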
Rename Across NameSpaces in Detail
Source Phase 1
1. Sanity Check.
• Existence
• Permission
• Can’t be reserved directory
• Can’t be symlink
• Not in encryption zones
2. Serialize the inode tree and block information with ProtoBuf
• Name
• Permissions
• mtime/atime
• Replication factor
• Block locations
• Acl / Xattr / Quota …
Rename Across NameSpaces in Detail
Source Phase 1
3. Lock the directory
• Add a FederationRenameFeature that records the renameId and the source and destination paths
• With the FederationRenameFeature set, all sub-directories and files in this directory, and all inodes on the parent path, are not writable
4. Add a federation-rename record
5. Return the serialized data to client
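The locking rule in step 3 can be sketched as a path check: a path is treated as locked if it is the locked directory itself, one of its descendants, or one of its ancestors. This is an illustrative sketch only; the name `is_rename_locked` and the plain list of locked paths are assumptions, and the real NameNode attaches the feature to inodes rather than comparing strings.

```python
from typing import Iterable, List


def _components(path: str) -> List[str]:
    """Split a path into its non-empty components."""
    return [c for c in path.split("/") if c]


def is_rename_locked(path: str, locked_dirs: Iterable[str]) -> bool:
    """Hypothetical check: is `path` frozen by an in-flight federation rename?

    A path is locked when it and a locked directory share a common prefix
    as long as the shorter of the two, i.e. one is an ancestor of (or equal
    to) the other.
    """
    p = _components(path)
    for locked in locked_dirs:
        l = _components(locked)
        n = min(len(p), len(l))
        if p[:n] == l[:n]:
            return True
    return False


locked = ["/user/service1/tmp"]
print(is_rename_locked("/user/service1/tmp/part-0", locked))  # descendant
print(is_rename_locked("/user/service1", locked))             # ancestor
print(is_rename_locked("/user/service2", locked))             # unrelated
```

Comparing whole components rather than raw string prefixes avoids treating /user/service1/tmpx as part of /user/service1/tmp.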
Rename Across NameSpaces in Detail
Dest Phase 1
1. Sanity Check
• permission, quota, not in encryption zones
2. Deserialize the inode-tree, graft it to the destination path
• Allocate inode id for each inode
• Allocate block id and new GS for each block
• Update acl and other features
Rename Across NameSpaces in Detail
Dest Phase 1
3. Lock the directory
• Also use FederationRenameFeature
4. Update quota count
5. Add a federation-rename record
6. Return a list of block information, including:
• srcBlockId, destBlockId, blockSize, srcGenStamp, destGenStamp for each block
Rename Across NameSpaces in Detail
Link Block
1. For each DN, send request in batch
• Create new block file by hardlink, one by one
• With a total operation timeout
2. Using a ThreadPoolExecutor
3. For each block, count as complete if at least 2/3 replicas succeed
• Slow DN will not affect the total progress
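The link-block step above can be sketched as follows: one batched request per DataNode, issued from a thread pool with an overall timeout, and a block counts as complete once at least 2/3 of its replicas were linked. This is a simplified illustration; `link_blocks`, `send_batch` and the data shapes are assumptions, and the real request is an RPC to the DataNode rather than a Python callable.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, List, Set


def link_blocks(
    replicas: Dict[str, List[str]],
    send_batch: Callable[[str, List[str]], Set[str]],
    timeout_s: float = 5.0,
    max_workers: int = 8,
) -> Set[str]:
    """Return the blocks for which at least 2/3 of the replicas were linked.

    replicas:   block id -> DataNodes holding a replica (hypothetical shape).
    send_batch: sends one batched request to a DataNode and returns the
                block ids that were hard-linked there; may raise on failure.
    """
    # Group blocks by DataNode so that each DN gets a single batched request.
    per_dn: Dict[str, List[str]] = defaultdict(list)
    for block_id, dns in replicas.items():
        for dn in dns:
            per_dn[dn].append(block_id)

    pool = ThreadPoolExecutor(max_workers=max_workers)
    futures = [pool.submit(send_batch, dn, blocks) for dn, blocks in per_dn.items()]
    # One timeout for the whole operation; slow DNs are simply not counted.
    done, _pending = wait(futures, timeout=timeout_s)
    pool.shutdown(wait=False)

    linked: Dict[str, int] = defaultdict(int)
    for fut in done:
        if fut.exception() is None:
            for block_id in fut.result():
                linked[block_id] += 1

    # Integer form of linked / total >= 2/3.
    return {b for b, dns in replicas.items() if 3 * linked[b] >= 2 * len(dns)}
```

With three replicas per block, one failed or slow DataNode does not hold up the rename, which matches the "slow DN will not affect the total progress" point.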
Rename Across NameSpaces in Detail
Source Phase 2
1. Delete the source directory/file
2. Delete all the inodes and blocks asynchronously
3. Remove the federation-rename record
Dest Phase 2
1. Remove the FederationRenameFeature to make the target directory visible
2. Remove the federation-rename record
Error Handling
Failed at       | How to Handle                                                | Result
Source Phase 1  | Fail                                                         | Fail
Dest Phase 1    | Cancel Source Phase 1                                        | Fail
Link Block      | Request fails; NameNode Fixer will redo the remaining steps  | Will eventually succeed
Source Phase 2  | Request fails; NameNode Fixer will redo the remaining steps  | Will eventually succeed
Dest Phase 2    | Request fails; NameNode Fixer will redo the remaining steps  | Will eventually succeed
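The error-handling rules can be read as a small driver: failures before the link-block step abort the rename, while failures from link block onwards are handed to the fixer, which redoes the remaining steps. The sketch below only illustrates that policy; `run_rename`, the phase names and the callback signatures are assumptions, not the actual client or NameNode Fixer code.

```python
from typing import Callable, Dict, List

PHASES = ["source_phase1", "dest_phase1", "link_block", "source_phase2", "dest_phase2"]


def run_rename(
    steps: Dict[str, Callable[[], None]],
    cancel_source_phase1: Callable[[], None],
    fixer_redo: Callable[[List[str]], None],
) -> str:
    """Run the phases in order and apply the failure policy from the table.

    steps: phase name -> callable that raises on failure (hypothetical).
    """
    for i, phase in enumerate(PHASES):
        try:
            steps[phase]()
        except Exception:
            if phase == "source_phase1":
                return "failed"              # nothing to undo yet
            if phase == "dest_phase1":
                cancel_source_phase1()       # unlock the source and roll back
                return "failed"
            # From link block onwards the rename must go forward:
            # the fixer redoes this phase and everything after it.
            fixer_redo(PHASES[i:])
            return "succeeds_eventually"
    return "succeeded"
```

The split between roll-back (phase 1) and roll-forward (link block and phase 2) is what lets the table promise that later failures still end in success.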
Error Handling
NameNode Failover and Restart
1. All operations are recorded in the edit log
2. The FederationRenameFeature is serialized to the FsImage
3. Federation-rename records are not serialized to the FsImage; they are rebuilt during edit log replay or FsImage loading (if an inode has a FederationRenameFeature, a federation-rename record is added)
Scaling up NameNodes
Our Largest NameNode
1. 150GB heap
2. Use CMS GC
3. More than 500 million objects (240 million files and 260 million
blocks)
4. More than 20000 QPS
Scaling up NameNodes
Experience
• Throttle
– BlockReport / Incremental-BlockReport throttle
– Concurrent GetContentSummary throttle
• Lock optimization
• Config optimization
• Add more tracing information
Block Report Throttle
• Problem: Full GC during NameNode startup
– Thousands of DNs send block reports at almost the same time
– The NameNode can process only one block report at a time
• Solution: throttle the maximum number of concurrent block reports; extra reports are rejected, and the DN retries later
[Diagram: many DNs sending block reports to one NameNode, labeled 60%]
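A throttle of this kind can be sketched with a semaphore that is acquired without blocking: if no slot is free, the report is rejected immediately and the sender is expected to retry. This is a minimal illustration under assumed names (`BlockReportThrottle`, `try_process`); the actual change lives in the NameNode's block report RPC path.

```python
import threading
from typing import Callable


class BlockReportThrottle:
    """Hypothetical limiter for concurrently queued block reports."""

    def __init__(self, max_concurrent: int) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def try_process(self, process_report: Callable[[], None]) -> bool:
        """Process the report if a slot is free; otherwise reject it.

        Returns False on rejection so the DataNode can retry later.
        """
        if not self._slots.acquire(blocking=False):
            return False
        try:
            process_report()
            return True
        finally:
            self._slots.release()
```

Rejecting instead of queueing keeps pending reports out of the NameNode heap, which is the point of the throttle when thousands of DataNodes report at startup.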
Other optimization
• Lock optimization for expensive operations
– When processing a block report, release and reacquire the lock for every storage
– When processing getContentSummary, release the lock every N files
• Config optimization
– More handlers
– Longer heartbeat interval
– Longer full block report interval
– Disable retry-cache and access-time
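The "release the lock every N files" idea from the lock optimization bullets can be sketched like this. It is an illustration only: `content_summary`, `yield_every` and the flat list of file sizes are assumptions, and a real NameNode would also have to re-validate its position in the directory tree after reacquiring the lock.

```python
from typing import Iterable, Tuple


def content_summary(file_sizes: Iterable[int], lock, yield_every: int = 1000) -> Tuple[int, int]:
    """Sum file count and bytes, briefly giving up the lock every N files.

    `lock` is any object with acquire() and release(), for example
    threading.Lock(). Other operations get a chance to run in each gap.
    """
    total_files = 0
    total_bytes = 0
    lock.acquire()
    try:
        for i, size in enumerate(file_sizes, start=1):
            total_files += 1
            total_bytes += size
            if i % yield_every == 0:
                lock.release()  # let waiting operations take the lock
                lock.acquire()
    finally:
        lock.release()
    return total_files, total_bytes
```

The result is no longer an atomic snapshot of the tree, which is the trade-off accepted in exchange for not blocking other requests during a long scan.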
More tracing information
• Record Operations that hold the FSNamesystem lock too long
• Record QPS on both the server side and the client side, and push this data to our internal monitoring system
• Record failure reason and statistics of block allocation failure
• Add log for slow block report processing
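Recording operations that hold the lock too long can be sketched with a context manager that times the critical section and reports only when a threshold is exceeded. The names `traced_lock` and `report`, and the injectable `clock`, are assumptions for illustration; they are not the FSNamesystem API.

```python
import time
from contextlib import contextmanager
from typing import Callable


@contextmanager
def traced_lock(lock, op_name: str, threshold_s: float,
                report: Callable[[str, float], None],
                clock: Callable[[], float] = time.monotonic):
    """Hold `lock` for the body and report the operation if it held too long."""
    lock.acquire()
    start = clock()
    try:
        yield
    finally:
        held = clock() - start
        lock.release()
        # Report after releasing, so that logging never extends the hold time.
        if held > threshold_s:
            report(op_name, held)
```

A call site would wrap the critical section, for example `with traced_lock(lock, "getContentSummary", 1.0, log_slow_op): ...`, where `log_slow_op` is whatever sink the monitoring system provides.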
How We Efficiently Manage 100+ Clusters
• We use HBase heavily at Xiaomi
• Each datacenter has 20~30 HBase clusters for sensitive services and businesses
• With the rapid growth of our global business, there are now more than 5 datacenters distributed around the world
• The total number of clusters is also growing very quickly, making them hard to maintain
How We Efficiently Manage 100+ Clusters
• Initially, each cluster (cluster-1, cluster-2, cluster-3, … cluster-n) ran its own Canary
Efficiently manage 100+ clusters
[Diagram: cluster-1, cluster-2, cluster-3, … cluster-n together with Canary Tasks, Balancer Tasks and the ClusterOne Monitor System; other labels: ZooKeeper, NameService, metrics, generated configuration]
Q&A
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
DataWorks Summit
 
Applying Noisy Knowledge Graphs to Real Problems
Applying Noisy Knowledge Graphs to Real ProblemsApplying Noisy Knowledge Graphs to Real Problems
Applying Noisy Knowledge Graphs to Real Problems
DataWorks Summit
 
Open Source, Open Data: Driving Innovation in Smart Cities
Open Source, Open Data: Driving Innovation in Smart CitiesOpen Source, Open Data: Driving Innovation in Smart Cities
Open Source, Open Data: Driving Innovation in Smart Cities
DataWorks Summit
 
Ad

Recently uploaded (20)

Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 

Scaling HDFS at Xiaomi

  • 1. Scaling HDFS at Xiaomi Chen Zhang
  • 2. Outline • Introduction of Xiaomi • Scenarios and challenges • Improvements on HDFS federation • Experience on scaling up single NameNode • Efficient management of hundreds of clusters
  • 3. About Xiaomi World’s 4th largest smartphone maker Sold 118 Million phones in 2018
  • 4. About Xiaomi World’s Largest consumer IoT platform Over 150 Million smart devices connected
  • 5. Software and Internet Services MIUI MiPay/Finance App Market Ads MiCloud Game MiPush Smart Home News Feeds …
  • 7. Scenarios Micloud MiPush Feeds User Profile Talos Ads Online Services • 100+ Independent Clusters • Low Latency • High availability Offline Services Hadoop • Several Huge Clusters • High throughput • High Scalability, High availability
  • 8. Data Growth of The Largest Cluster, 2015 to 2018 (chart) • File counts (10 million): 2, 23, 41, 71 • Data Size (PB): 3, 30, 60, 150
  • 9. Challenges • Challenges in late 2016: data growth is too fast; dependency is too complex; code change is almost impossible
  • 10. What We Need We need A Huge Single HDFS Cluster
  • 11. Improvements on HDFS Federation • Problems of HDFS Federation in late 2016 – NameNodes are independent, metadata is not shared – Client-side MountTable config, hard to maintain – MountTable doesn't support nested mount points – ViewFileSystem is not compatible with DistributedFileSystem – RBF was not stable and not fully functional in late 2016
  • 12. Improvements on HDFS Federation: Original HDFS Federation (diagram: viewfs maps the directory tree /, user, yarn, hive, service1, service2, small dir1, small dir2, small service1, small service2, … onto namespaces NS 1 … NS k … Foreign NS n, served by NN-1 … NN-k … NN-n; block storage is the block pools Pool 1 … Pool k … Pool n on common storage across Datanode 1, Datanode 2, … Datanode m)
  • 13. Improvements on HDFS Federation: Support Nested MountPoints (diagram: the federation layout from the previous slide, with an added NS 0 served by NN-0) • hdfs:// -> FederatedDFSFileSystem extends DistributedFileSystem • Add Default NameSpace • Support rename across NameSpaces • Compatible with hdfs://, no need to change any code • Update MountTable Config from ZK
  • 14. Nested Mount table and Default NameSpace 1. Xiaomi is not only a hardware company but also an Internet company, and it develops very fast 2. There are more than 100 internet services; new businesses and services emerge quickly, based on our smart devices and more than 300 million users 3. It's hard for us to use a fixed, pre-divided mount table
  • 15. Nested Mount table and Default NameSpace (diagram: NN-0, NN-1 … NN-k … NN-n serving NS 0, NS 1 … NS k … NS n; directory tree /, user, yarn, hive, service1, service2; new paths /some_new_nosql_service, /user/live_show_services, /user/short_video_services) 1. At first, we divide the initial mount points by data amount and QPS. We only need to configure a dozen mount points for the largest services; the others fall into the default NameSpaces 2. When new infrastructure services and internet services emerge, the mount table doesn't need any updates 3. HADOOP-13055 supports linkFallback, but our solution is more flexible
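The lookup described on the slide above can be pictured as longest-prefix matching with a fallback. The following is a minimal Python sketch, not Xiaomi's implementation: the function name `resolve_namespace`, the namespace names, and the use of a single default namespace are all assumptions made for illustration.

```python
from typing import Dict


def resolve_namespace(path: str, mount_table: Dict[str, str], default_ns: str) -> str:
    """Return the namespace for `path` using longest-prefix matching."""
    parts = [p for p in path.split("/") if p]
    # Try the deepest candidate prefix first so nested mount points win.
    for depth in range(len(parts), 0, -1):
        prefix = "/" + "/".join(parts[:depth])
        if prefix in mount_table:
            return mount_table[prefix]
    # Paths not covered by any mount point fall into the default namespace.
    return default_ns


# Hypothetical mount table: /user is mounted, and /user/service1 is nested inside it.
MOUNT_TABLE = {
    "/user": "ns1",
    "/user/service1": "ns2",
    "/hive": "ns3",
}

print(resolve_namespace("/user/service1/logs", MOUNT_TABLE, "ns0"))    # nested mount point
print(resolve_namespace("/some_new_nosql_service", MOUNT_TABLE, "ns0"))  # default namespace
```

Matching on whole path components, rather than raw string prefixes, keeps a path such as /userx from being captured by the /user mount point.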
  • 17. Client Transparency • RPC integration: listStatus, getContentSummary, setQuota/getQuota • Admin Tools: refreshNodes, setBalancerBandwidth, DataNode decommission • Trash optimization (example path: /user/service1/.Trash/): moveToTrash is a rename operation, and moveToTrash across NameNodes is very expensive
  • 18. Rename Across NameSpaces (diagram labels: Client; namenode1, namenode2; locked; datanode1, datanode2, datanode3; blockpool1, blockpool2; hardlink; Link block)
  • 19. Rename Across NameSpaces in Detail Source Phase 1 1. Sanity Check • Existence • Permission • Can't be a reserved directory • Can't be a symlink • Not in encryption zones 2. Serialize the inode tree and block information with ProtoBuf • Name • Permissions • mtime/atime • Replication factor • Block locations • ACL / XAttr / Quota …
  • 20. Rename Across NameSpaces in Detail Source Phase 1 3. Lock the directory • Add a FederationRenameFeature that records the renameId and the source and destination paths • With FederationRenameFeature, all sub-directories and files in this directory, and all inodes in the parent path, are not writable 4. Add a federation-rename record 5. Return the serialized data to the client
  • 21. Rename Across NameSpaces in Detail Dest Phase 1 1. Sanity Check • permission, quota, not in encryption zones 2. Deserialize the inode-tree, graft it to the destination path • Allocate inode id for each inode • Allocate block id and new GS for each block • Update acl and other features
  • 22. Rename Across NameSpaces in Detail Dest Phase 1 3. Lock the directory • Also uses FederationRenameFeature 4. Update quota count 5. Add a federation-rename record 6. Return a list of block information, including: • srcBlockId, destBlockId, blockSize, srcGenStamp, destGenStamp for each block
  • 23. Rename Across NameSpaces in Detail Link Block 1. For each DN, send requests in batch • Create new block files by hardlink, one by one • With a total operation timeout 2. Use a ThreadPoolExecutor 3. Each block counts as complete if at least 2/3 of its replicas succeed • A slow DN will not affect the total progress
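The Link Block step above combines per-DataNode batching, a thread pool, an overall timeout, and a 2/3 replica quorum. Here is a rough Python sketch of how those pieces could fit together; it is not the actual HDFS code. `link_blocks`, the `link_on_datanode` callback (a stand-in for the real DataNode request), and the default timeout and pool size are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, List, Set


def link_blocks(
    replicas: Dict[str, List[str]],  # block id -> DataNodes holding a replica
    link_on_datanode: Callable[[str, List[str]], Set[str]],  # hypothetical request stand-in
    timeout_s: float = 5.0,
) -> Dict[str, bool]:
    """Return, per block, whether at least 2/3 of its replicas were linked."""
    # Group blocks by DataNode so that each node receives one batched request.
    batches: Dict[str, List[str]] = defaultdict(list)
    for block, nodes in replicas.items():
        for dn in nodes:
            batches[dn].append(block)

    successes: Dict[str, int] = defaultdict(int)
    pool = ThreadPoolExecutor(max_workers=8)
    futures = [pool.submit(link_on_datanode, dn, blocks) for dn, blocks in batches.items()]
    # One timeout for the whole operation: slow DataNodes are simply not counted.
    done, _not_done = wait(futures, timeout=timeout_s)
    for fut in done:
        if fut.exception() is None:
            for block in fut.result():  # blocks this DataNode linked successfully
                successes[block] += 1
    pool.shutdown(wait=False)

    # A block is complete when successes / replicas >= 2/3 (integer arithmetic).
    return {b: 3 * successes[b] >= 2 * len(nodes) for b, nodes in replicas.items()}
```

With three replicas, two successful hardlinks are enough, so one slow or failed DataNode does not hold up the rename.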
  • 24. Rename Across NameSpaces in Detail Source Phase 2 1. Delete the source directory/file 2. Delete all the inodes and blocks asynchronously 3. Remove the federation-rename record Dest Phase 2 1. Remove FederationRenameFeature, making the target directory visible 2. Remove the federation-rename record
  • 25. Error Handling (Failed at: How to Handle; Result) • Source Phase 1: fail; result: fail • Dest Phase 1: cancel source phase 1; result: fail • Link Block: request fails, NameNode Fixer will redo the remaining steps; result: will succeed finally • Source Phase 2: request fails, NameNode Fixer will redo the remaining steps; result: will succeed finally • Dest Phase 2: request fails, NameNode Fixer will redo the remaining steps; result: will succeed finally
  • 26. Error Handling NameNode Failover and Restart 1. All operations are written to the editlog 2. FederationRenameFeature is serialized to the FsImage 3. Federation-rename records are not serialized to the FsImage; they are rebuilt from log replay or FsImage loading (if an inode has a FederationRenameFeature, a federation-rename record is added)
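The error-handling rules above split the five steps into two groups: a failure in either phase 1 aborts the rename, while a failure in any later step leaves a record that a fixer can use to redo the remaining steps. The Python sketch below illustrates that control flow only. The `record` dictionary, the `actions` callbacks, and the `cancel_source_phase1` key are hypothetical and do not reflect the real NameNode data structures.

```python
from typing import Callable, Dict

STEPS = ["source_phase1", "dest_phase1", "link_block", "source_phase2", "dest_phase2"]


def run_rename(actions: Dict[str, Callable[[], None]], record: dict) -> str:
    """Run the steps not yet completed; record["done"] survives between attempts."""
    done = record.setdefault("done", [])
    for step in STEPS:
        if step in done:
            continue  # already completed in an earlier attempt
        try:
            actions[step]()
        except Exception:
            if step == "source_phase1":
                return "failed"  # nothing to undo yet
            if step == "dest_phase1":
                actions["cancel_source_phase1"]()  # unlock the source directory
                done.clear()
                return "failed"
            return "pending"  # a fixer calls run_rename again later
        done.append(step)
    return "succeeded"
```

Calling `run_rename` again with the same `record` plays the role of the fixer: completed steps are skipped and the rename eventually succeeds once the failing step goes through.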
  • 27. Scaling up NameNodes Our Largest NameNode 1. 150GB heap 2. Use CMS GC 3. More than 500 million objects (240 million files and 260 million blocks) 4. More than 20000 QPS
  • 28. Scaling up NameNodes Experience • Throttle – BlockReport / Incremental-BlockReport throttle – Concurrent GetContentSummary throttle • Lock optimization • Config optimization • Add more tracing information
  • 29. Block Report Throttle • Problem: Full GC when the NameNode starts up (diagram: a NameNode with heap at 60%, surrounded by DNs) • Thousands of DNs send block reports at almost the same time • The NameNode can only process one block report at a time • Throttle the max concurrent block reports; extra reports are rejected, and the DN will retry later
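The throttle above rejects extra block reports instead of queueing them in memory. A minimal Python sketch of that idea uses a semaphore acquired without blocking; the class name `BlockReportThrottle` and the `process_report` callback are hypothetical, and the real NameNode logic is not shown in the slides.

```python
import threading
from typing import Callable


class BlockReportThrottle:
    """Admit at most `max_concurrent` block reports; reject the rest."""

    def __init__(self, max_concurrent: int):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def try_process(self, process_report: Callable[[], None]) -> bool:
        # Non-blocking acquire: a rejected report is never buffered on the heap.
        if not self._slots.acquire(blocking=False):
            return False  # the DataNode is expected to retry later
        try:
            process_report()
            return True
        finally:
            self._slots.release()
```

The key design choice is rejecting rather than waiting: a waiting report would still occupy heap, which is the problem the throttle is meant to avoid.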
  • 30. Other optimizations • Lock optimization on expensive operations – When processing a block report, release and re-acquire the lock for every storage – When processing getContentSummary, release the lock every N files • Config optimization – More handlers – Longer heartbeat interval – Longer full block report interval – Disable retry-cache and access-time
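The "release the lock every N files" idea above can be sketched in a few lines of Python. This is an illustration only: `content_summary`, the plain `threading.Lock` standing in for the namesystem lock, and the default of 1000 files are assumptions.

```python
import threading
from typing import Iterable, Tuple


def content_summary(
    file_sizes: Iterable[int], lock: threading.Lock, yield_every: int = 1000
) -> Tuple[int, int]:
    """Count files and total bytes, giving up the lock every `yield_every` files."""
    count = total = 0
    lock.acquire()
    try:
        for size in file_sizes:
            count += 1
            total += size
            if count % yield_every == 0:
                # Briefly release the lock so that waiting operations can run.
                lock.release()
                lock.acquire()
    finally:
        lock.release()
    return count, total
```

The trade-off in this sketch is that the result is no longer a consistent snapshot, because other operations may change the tree while the lock is released.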
  • 31. More tracing information • Record operations that hold the FSNamesystem lock too long • Monitor QPS on both server side and client side, and push this data to our internal monitoring system • Record failure reasons and statistics for block allocation failures • Add logs for slow block report processing
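Recording operations that hold a lock too long, the first item above, can be sketched as a timing wrapper around the lock. In this Python sketch the names `traced_lock` and `long_holds` and the threshold value are hypothetical; a real system would write to a log or metrics pipeline instead of a list.

```python
import time
from contextlib import contextmanager

LONG_HOLD_THRESHOLD_S = 0.05  # hypothetical threshold
long_holds = []  # (operation name, seconds held)


@contextmanager
def traced_lock(lock, op_name, threshold_s=LONG_HOLD_THRESHOLD_S, clock=time.monotonic):
    """Hold `lock` for the body and record the operation if it held it too long."""
    lock.acquire()
    start = clock()
    try:
        yield
    finally:
        held = clock() - start
        lock.release()
        # Record after releasing, so tracing itself does not extend the hold time.
        if held > threshold_s:
            long_holds.append((op_name, held))
```

Usage would look like `with traced_lock(fs_lock, "getContentSummary"): ...`, where `fs_lock` is any object with `acquire` and `release`.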
  • 32. How We Efficiently Manage 100+ Clusters • We use HBase heavily at Xiaomi • 20~30 HBase clusters for sensitive services and businesses in each datacenter • With the rapid growth of the global business, there are now more than 5 datacenters distributed around the world • The total number of clusters also grows very quickly, which makes them hard to maintain
  • 33. How We Efficiently Manage 100+ Clusters • Initially… (diagram: cluster-1, cluster-2, cluster-3, … cluster-n, each with its own Canary)
  • 34. Efficiently manage 100+ clusters (diagram labels: cluster-1, cluster-2, cluster-3, cluster-n; ClustrerOne; Monitor System; Canary Task; Balancer Task; ZooKeeper; NameService; metrics; generated configuration)
  • 35. Q&A

Editor's Notes

  • #2: Introduce myself. Today I'll share some of the work we did on scaling HDFS. (Spoken English.)
  • #4: Overview of Xiaomi phone sales: the main markets are India and China, with good market share in Southeast Asia and Europe as well, but not in America.
  • #5: IoT: we sell a variety of smart devices, and they sell very well in China.
  • #6: Based on these phones and devices, we build lots of internet services and businesses. These are the most important of them.
  • #7: Most services on this page are well known, so I'll introduce only some of the services developed by us. Talos is a data integration and distribution system. FDS is an object storage system, quite similar to AWS S3. EMQ is a cloud message queue, similar to AWS SQS.
  • #8: Our clusters can be divided into 2 parts, online and offline. These 2 scenarios are quite different and bring us different challenges. For online services, most HDFS clusters are deployed for HBase, which we use heavily; there are more than 100 online HDFS clusters and more than 3000 nodes. The biggest challenge for online clusters is latency, especially the impact of slow nodes and slow disks. That part does not belong to this session, so I won't introduce it in detail. On the other hand, for offline analysis we build several huge clusters. For these clusters the biggest challenge is scalability, that is, how to serve more data and files.
  • #9: Let's take a look at the data growth. This is the chart for our largest cluster, from 4 years ago to the end of last year. Everybody knows what this means for an HDFS cluster: a single NameNode can hardly serve this much data.
  • #10: With the rapid growth, we hit the scalability limit in 2016. After a bunch of work, we successfully made the NameNode stable, but that would not last long, so we had to enable federation. But the dependencies are too complex, and it's almost impossible to divide the data into different namespaces. It's also very hard for us to ask users to change their code to use viewfs.
  • #11: So the only way that makes sense for us is to build a huge single cluster. More accurately, we need to modify federation to make it work like a single HDFS cluster. How did we do that? Let's first take a look at the defects of federation.
  • #13: In this solution, for every directory to which you need to assign a namespace, you have to add a mount point.
  • #14: If a path is not in the mount table, it is mapped to one of the default namespaces. In addition, to make federation work like a single cluster, we support rename across NameNodes. To avoid code changes, we created a new filesystem that wraps viewfs. Finally, we moved the mount table to ZooKeeper and can update it automatically, so users don't need to worry about the mount table. This is our whole solution for making federation work as a single cluster. Next, I'll introduce each part in detail.
  • #17: First, we created a wrapper FileSystem that extends DistributedFileSystem. Our users don't need to change any code, just update some configs. When the client initializes, it fetches the mount table from ZK. In addition, we add a watcher, so clients get the latest config whenever it is updated. Finally, we built an admin tool to operate on the mount table config in ZooKeeper.
  • #18: To make federation transparent to users, there is still a lot of work to do; here is some of it. Another improvement worth mentioning is the trash optimization. By default, every user has only one trash folder. Since moveToTrash is a rename operation and we support rename across NameNodes, a user's delete operation on another namespace may cause a rename across NameNodes. This operation is expensive, and we don't want it triggered too frequently by moving deleted data to trash, so we did some optimization.
  • #19: I'll first give an overview and then go into some details. It's very complex, so I'll try to explain it as clearly as I can. There are 5 steps to complete a federation rename.
  • #28: OK, next is some experience of tuning a single NameNode.
  • #30: Let me show the reason first. Let's assume that in the normal case heap usage is 60%. When the NN restarts, it starts receiving a lot of block reports. The block reports waiting to be processed are stored in memory, and reports arrive much faster than they are processed, so the reports in memory keep accumulating until the heap is full.