HBase and HDFS: Past, Present, Future
                 Todd Lipcon
              todd@cloudera.com
Twitter: @tlipcon      #hbase IRC: tlipcon




              May 22, 2012
Intro / who am I?
        Been working on data stuff for a few years
        HBase, HDFS, MR committer
        Cloudera engineer since March ’09




         [Charts: (a) My posts to hbase-dev; (b) My posts to (core|hdfs|mapreduce)-dev]


  You know I’m an engineer since my slides are ugly and written in LaTeX
Framework for discussion
     Time periods
         Past (Hadoop pre-1.0)
         Present (Hadoop 1.x, 2.0)
         Future (Hadoop 2.x and later)

     Categories
         Reliability/Availability
         Performance
         Feature set
HDFS and HBase History - 2006
  Author: Douglass Cutting <cutting@apache.org>
  Date:   Fri Jan 27 22:19:42 2006 +0000

      Create hadoop sub-project.
HDFS and HBase History - 2007
  Author: Douglass Cutting <cutting@apache.org>
  Date:   Tue Apr 3 20:34:28 2007 +0000

      HADOOP-1045. Add contrib/hbase, a
      BigTable-like online database.
HDFS and HBase History - 2008
  Author: Jim Kellerman <jimk@apache.org>
  Date:   Tue Feb 5 02:36:26 2008 +0000

      2008/02/04 HBase is now a subproject of Hadoop.
      The first HBase release as a subproject will be
      release 0.1.0 which will be equivalent to the
      version of HBase included in Hadoop 0.16.0...
HDFS and HBase History - Early 2010
  HBase has been around for 3 years, but HDFS still
  acts like MapReduce is the only important client!




          People have accused HDFS of being like a molasses train:
                      high throughput but not so fast
HDFS and HBase History - 2010
     HBase becomes a top-level project
     Facebook chooses HBase for Messages product
     Jump from HBase 0.20 to HBase 0.89 and 0.90
     First CDH3 betas include HBase
     HDFS community starts to work on features
     for HBase.
         Infamous hadoop-0.20-append branch
What did we get done?
And where are we going?
Reliability in the past: Hadoop 1.0
     Pre-1.0, if the DN crashed, HBase would lose
     its WALs (and your beloved data).
         1.0 integrated hadoop-0.20-append branch into
         a main-line release
         True durability support for HBase
         We have a fighting chance at metadata reliability!

     Numerous bug fixes for write pipeline recovery
     and other error paths
         HBase is not nearly so forgiving as MapReduce!
         “Single-writer” fault tolerance vs “job-level” fault
         tolerance
Reliability in the past: Hadoop 1.0
     Pre-1.0: if any disk failed, entire DN would go
     offline
         Problematic for HBase: local RS would lose all
         locality!
         1.0: per-disk failure detection in DN
         (HDFS-457)
         Allows HBase to lose a disk without losing all
         locality
  Tip: Configure
  dfs.datanode.failed.volumes.tolerated = 1
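
  As a sketch, the same tip in its usual hdfs-site.xml form on each
  DataNode (property name as above; a value of 1 tolerates a single
  dead disk):

      <!-- hdfs-site.xml: keep the DataNode up after one failed data volume -->
      <property>
        <name>dfs.datanode.failed.volumes.tolerated</name>
        <value>1</value>
      </property>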
Reliability today: Hadoop 2.0
     Integrates Highly Available HDFS
     Active-standby hot failover removes SPOF
     Transparent to clients: no HBase changes
     necessary
     Tested extensively under HBase read/write
     workloads
     Coupled with HBase master failover, no more
     HBase SPOF!
HDFS HA [architecture diagram]
Reliability in the future: HA in 2.x
      Remove dependency on NFS (HDFS-3077)
          Quorum-commit protocol for NameNode edit logs
          Similar to ZAB/Multi-Paxos

      Automatic failover for HA NameNodes
      (HDFS-3042)
          ZooKeeper-based master election, just like HBase
          Merge to trunk should be this week.
Other reliability work for HDFS 2.x
     2.0: current hflush() API only guarantees
     data is replicated to three machines – not fully
     on disk.
     A cluster-wide power outage can lose data.
         Upcoming in 2.x: Support for hsync()
         (HDFS-744, HBASE-5954)
         Calls fsync() for all replicas of the WAL
         Full durability of edits, even with full cluster
         power outages
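
  A minimal Java sketch of the two durability levels, assuming a 2.x
  client where FSDataOutputStream exposes hsync() (HDFS-744); the path
  and payload below are made up for illustration:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.*;

      // Durability sketch: hflush() vs hsync() on a WAL-like file.
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      FSDataOutputStream wal = fs.create(new Path("/tmp/wal.0001")); // hypothetical path
      byte[] edit = "row1/cf:col=val".getBytes();  // stand-in for a serialized WAL entry
      wal.write(edit);
      wal.hflush();  // replicated to all DNs and visible to readers,
                     // but replicas may still sit in OS buffers
      wal.hsync();   // additionally fsync()s the block file on each
                     // replica: survives a full-cluster power outage
      wal.close();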
hflush() and hsync() [diagram]
HDFS wire compatibility in Hadoop 2.0
     In 1.0: HDFS client version must match server
     version closely.
     How many of you have manually copied HDFS
     client jars?
     Client-server compatibility in 2.0:
         Protobuf-based RPC
         Easier HBase installs: no more futzing with jars
         Separate HBase upgrades from HDFS
         upgrades
     Intra-cluster server compatibility in the works
         Allow for rolling upgrade without downtime
Performance: Hadoop 1.0
     Pre-1.0: even for reads from local machine,
     client connects to DN via TCP
     1.0: Short-circuit local reads
           Obtains direct access to underlying local block file,
           then uses regular FileInputStream access.
           2x speedup for random reads

     Configure dfs.client.read.shortcircuit = true
     Configure dfs.block.local-path-access.user = hbase
     Configure dfs.datanode.data.dir.perm = 755
     Currently does not support security
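
  A sketch of the three Configure lines above in their usual
  hdfs-site.xml form (names and values exactly as on this slide):

      <!-- hdfs-site.xml: short-circuit local reads for the hbase user -->
      <property>
        <name>dfs.client.read.shortcircuit</name>
        <value>true</value>
      </property>
      <property>
        <name>dfs.block.local-path-access.user</name>
        <value>hbase</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir.perm</name>
        <value>755</value>
      </property>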
Performance: Hadoop 2.0
     Pre-2.0: Up to 50% CPU spent verifying CRC
     2.0: Native checksums using SSE4.2 crc32
     asm (HDFS-2080)
         2.5x speedup reading from buffer cache
         Now only 15% CPU overhead to checksumming
     Pre-2.0: re-establishes TCP connection to DN
     for each seek
     2.0: Rewritten BlockReader, keepalive to DN
     (HDFS-941)
         40% improvement on random read for HBase
         2-2.5x in micro-benchmarks
     Total improvement vs 0.20.2: 3.4x!
Performance: Hadoop 2.x
     Currently: lots of CPU spent copying data in
     memory
     “Direct-read” API: read directly into
     user-provided DirectByteBuffers (HDFS-2834)
          Another ~2x improvement to sequential
          throughput reading from cache
         Opportunity to avoid two more buffer copies
         reading compressed data (HADOOP-8148)
         Codec APIs still in progress, needs integration into
         HBase
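
  A hedged sketch of the HDFS-2834 style call from client code. It
  assumes the opened stream supports the ByteBuffer read variant; the
  file name is made up:

      import java.nio.ByteBuffer;
      import org.apache.hadoop.fs.*;

      // Direct-read sketch: fill a DirectByteBuffer with no extra on-heap copy.
      FSDataInputStream in = fs.open(new Path("/tmp/hfile")); // 'fs' as in earlier sketch
      ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
      int n = in.read(buf);  // ByteBuffer variant; throws if the stream cannot support it
      buf.flip();            // 'n' bytes now readable straight from the buffer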
Performance: Hadoop 2.x
     True “zero-copy read” support (HDFS-3051)
         New API would allow direct access to mmaped
         block files
         No syscall or JNI overhead for reads
          Initial benchmarks indicate at least ~30% gain.
         Some open questions around best safe
         implementation
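
  The API was still being designed at the time, so the following is
  only an illustration of the underlying mmap idea in plain java.nio,
  not the eventual HDFS interface; the block-file path is hypothetical:

      import java.io.RandomAccessFile;
      import java.nio.MappedByteBuffer;
      import java.nio.channels.FileChannel;

      // Illustration: map a local block file and read it with no per-read
      // syscall or JNI hop -- reads become plain memory accesses.
      RandomAccessFile raf =
          new RandomAccessFile("/data/1/dfs/dn/blk_123", "r"); // hypothetical block file
      MappedByteBuffer block =
          raf.getChannel().map(FileChannel.MapMode.READ_ONLY, 0, raf.length());
      byte first = block.get(0); // page fault at worst, not a read() syscall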
Current read path [diagram]
Proposed read path [diagram]
Performance: why emphasize CPU?
     Machines with lots of RAM now inexpensive
     (48-96GB common)
     Want to use that to improve cache hit ratios.
     Unfortunately, 50GB+ Java heaps still
     impractical (GC pauses too long)
     Allocate the extra RAM to the buffer cache
         OS caches compressed data: another win!
     CPU overhead reading from buffer cache
     becomes limiting factor for read workloads
What’s up next in 2.x?
     HDFS Hard-links (HDFS-3370)
         Will allow for HBase to clone/snapshot tables
         efficiently!
         Improves HBase table-scoped backup story

     HDFS Snapshots (HDFS-2802)
         HBase-wide snapshot support for point-in-time
         recovery
         Enables consistent backups copied off-site for DR
What’s up next in 2.x?
     Improved block placement policies
     (HDFS-1094)
          Fundamental tradeoff between the probability of data
          unavailability and the amount of data that becomes
          unavailable (see the toy calculation after this list)
          Current scheme: if any 3 nodes that are not all on the
          same rack die, some very small amount of data is
          unavailable
          Proposed scheme: lessen the chances of unavailability,
          but if a certain three nodes die, a larger amount is
          unavailable
          For many HBase applications: any single lost block
          halts the whole operation. Prefer to minimize the
          probability of unavailability.
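
  A toy Java calculation under made-up assumptions (100 DataNodes,
  replication 3, and a grouped scheme that confines each block to one
  fixed group of 3 nodes), showing the shape of the tradeoff:

      // Toy model, hypothetical numbers: 100 DNs, replication factor 3.
      long n = 100;
      long triples = n * (n - 1) * (n - 2) / 6; // C(100,3) = 161,700 possible 3-node failures
      long groups = n / 3;                      // 33 fixed placement groups
      // Random placement: with enough blocks, nearly every one of the
      // 161,700 triples covers some block's full replica set, so almost
      // any 3 simultaneous failures lose a small amount of data.
      // Grouped placement: only the 33 whole-group triples are fatal
      // (~0.02% of them), but a fatal one loses every block in the group.
      System.out.printf("fatal triples: grouped %d / %d = %.4f%n",
          groups, triples, (double) groups / triples);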
What’s up next in 2.x?
     HBase-specific block placement hints
     (HBASE-4755)
         Assign each region a set of three RS (primary and
         two backups)
         Place underlying data blocks on these three DNs
         Could then fail-over and load-balance without
         losing any locality!
Summary

                 Hadoop 1.0            Hadoop 2.0       Hadoop 2.x
 Availability    - DN volume           - NameNode HA    - HA without NAS
                   failure isolation   - Wire Compat    - Rolling upgrade
 Performance     - Short-circuit       - Native CRC     - Direct-read API
                   reads               - DN Keepalive   - Zero-copy API
                                                        - Direct codec API
 Features        - durable hflush()                     - hsync()
                                                        - Snapshots
                                                        - Hard links
                                                        - HBase-aware block
                                                          placement
Summary
     HBase is no longer a second-class citizen.
     We’ve come a long way since Hadoop 0.20.2 in
     performance, reliability, and availability.
     New features coming in the 2.x line specifically
     to benefit HBase use cases
 Hadoop 2.0 features available today via CDH4 beta.
 Several Cloudera customers already using CDH4b2
 with HBase with great success.
 Official Hadoop 2.0 release and CDH4 GA coming
 soon.
Questions?




 todd@cloudera.com
  Twitter: @tlipcon
#hbase IRC: tlipcon

   P.S. we’re hiring!
