We start by looking at distributed database features that impact latency. Then we take a deeper look at the HBase read and write paths with a focus on request latency. We examine the sources of latency and how to minimize them.
This document discusses techniques for improving latency in HBase. It analyzes the write and read paths, identifying sources of latency such as networking, HDFS flushes, garbage collection, and machine failures. For writes, it finds that single puts can achieve millisecond latency while streaming puts can hide latency spikes. For reads, it notes cache hits are sub-millisecond while cache misses and seeks add latency. GC pauses of 25-100ms are common, and failures hurt locality and require cache rebuilding. The document outlines ongoing work to reduce GC, use off-heap memory, improve compactions and caching to further optimize for low latency.
The document discusses different types of block caches in HBase including LruBlockCache, SlabCache, and BucketCache. It explains that block caching improves performance by storing frequently accessed blocks in faster memory rather than slower disk storage. Each block cache has its own configuration options and memory usage characteristics. Benchmark results show that the off-heap BucketCache provides strong performance due to its use of off-heap memory for the L2 cache.
This document summarizes a presentation about optimizing for low latency in HBase. It discusses how to measure latency, the write and read paths in HBase, sources of latency like garbage collection and compactions, and techniques for reducing latency like streaming puts, block caching, and timeline consistency. The key points are that single puts can achieve millisecond latency while garbage collection and machine failures can cause pauses of 10s of milliseconds to seconds, and optimizing for the "magical 1%" of requests after the 99th percentile is important to improve average latency.
This document summarizes a presentation about optimizing HBase performance through caching. It discusses how baseline tests showed low cache hit rates and CPU/memory utilization. Reducing the table block size improved cache hits but increased overhead. Adding an off-heap bucket cache to store table data minimized JVM garbage collection latency spikes and improved memory utilization by caching frequently accessed data outside the Java heap. Configuration parameters for the bucket cache are also outlined.
Anoop Sam John and Ramkrishna Vasudevan (Intel)
HBase provides an LRU based on heap cache but its size (and so the total data size that can be cached) is limited by Java’s max heap space. This talk highlights our work under HBASE-11425 to allow the HBase read path to work directly from the off-heap area.
HBase 2.0 is the next stable major release for Apache HBase, scheduled for early 2017. It is the biggest and most exciting milestone release from the Apache community after 1.0. HBase 2.0 contains a large number of features that have been a long time in development, including rewritten region assignment, performance improvements (RPC, rewritten write pipeline, etc.), async clients, a C++ client, off-heaping of the memstore and other buffers, Spark integration, and shading of dependencies, as well as a lot of other fixes and stability improvements. We will go into technical detail on some of the most important improvements in the release, as well as what the implications are for users in terms of API and upgrade paths. Existing users of HBase/Phoenix as well as operators managing HBase clusters will benefit the most, as they can learn about the new release and its long list of features. We will also briefly cover the earlier 1.x release lines, compatibility, and upgrade paths for existing users, and conclude with an outlook on the next set of initiatives for the project.
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation BuffersCloudera, Inc.
Todd Lipcon presents a solution to avoid full garbage collections (GCs) in HBase by using MemStore-Local Allocation Buffers (MSLABs). The document outlines that write operations in HBase can cause fragmentation in the old generation heap, leading to long GC pauses. MSLABs address this by allocating each MemStore's data into contiguous 2MB chunks, eliminating fragmentation. When MemStores flush, the freed chunks are large and contiguous. With MSLABs enabled, the author saw basically zero full GCs during load testing. MSLABs improve performance and stability by preventing GC pauses caused by fragmentation.
Rigorous and Multi-tenant HBase PerformanceCloudera, Inc.
The document discusses techniques for rigorously measuring Apache HBase performance in both standalone and multi-tenant environments. It introduces the Yahoo! Cloud Serving Benchmark (YCSB) and best practices for cluster setup, workload generation, data loading, and measurement. These include pre-splitting tables, warming caches, setting target throughput, and using appropriate workload distributions. The document also covers challenges in achieving good multi-tenant performance across HBase, MapReduce and Apache Solr.
HBaseCon 2012 | Learning HBase Internals - Lars Hofhansl, SalesforceCloudera, Inc.
The strength of an open source project resides entirely in its developer community; a strong democratic culture of participation and hacking makes for a better piece of software. The key requirement is having developers who are not only willing to contribute, but also knowledgeable about the project’s internal structure and architecture. This session will introduce developers to the core internal architectural concepts of HBase, not just “what” it does from the outside, but “how” it works internally, and “why” it does things a certain way. We’ll walk through key sections of code and discuss key concepts like the MVCC implementation and memstore organization. The goal is to convert serious “HBase Users” into “HBase Developer Users”, and give voice to some of the deep knowledge locked in the committers’ heads.
HBase Accelerated introduces an in-memory flush and compaction pipeline for HBase to improve performance of real-time workloads. By keeping data in memory longer and avoiding frequent disk flushes and compactions, it reduces I/O and improves read and scan latencies. Evaluation on workloads with high update rates and small working sets showed the new approach significantly outperformed the default HBase implementation by serving most data from memory. Work is ongoing to further optimize the in-memory representation and memory usage.
Apache HBase, Accelerated: In-Memory Flush and Compaction HBaseCon
Eshcar Hillel and Anastasia Braginsky (Yahoo!)
Real-time HBase application performance depends critically on the amount of I/O in the datapath. Here we’ll describe an optimization of HBase for high-churn applications that frequently insert/update/delete the same keys, such as for high-speed queuing and e-commerce.
The document evaluates the performance of HBase version 0.20.0 on a small cluster. It describes the testbed setup including hardware specifications and Hadoop/HBase configuration parameters. A series of experiments are run to test random reads, random writes, sequential reads, sequential writes, and scans. The results show significant performance improvements over previous versions, getting closer to the performance levels of Google BigTable as reported in their paper.
HBase and HDFS: Understanding FileSystem Usage in HBaseenissoz
This document discusses file system usage in HBase. It provides an overview of the three main file types in HBase: write-ahead logs (WALs), data files, and reference files. It describes durability semantics, IO fencing techniques for region server recovery, and how HBase leverages data locality through short circuit reads, checksums, and block placement hints. The document is intended to help understand HBase's interactions with HDFS for tuning IO performance.
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...Cloudera, Inc.
HBase Coprocessors allow user code to be run on region servers within each region of an HBase table. Coprocessors are loaded dynamically and scale automatically as regions are split or merged. They provide hooks into various HBase operations via observer classes and define an interface for custom endpoint calls between clients and servers. Examples of use cases include secondary indexes, filters, and replacing MapReduce jobs with server-side processing.
The document summarizes the HBase 1.0 release which introduces major new features and interfaces including a new client API, region replicas for high availability, online configuration changes, and semantic versioning. It describes goals of laying a stable foundation, stabilizing clusters and clients, and making versioning explicit. Compatibility with earlier versions is discussed and the new interfaces like ConnectionFactory, Connection, Table and BufferedMutator are introduced along with examples of using them.
Yfrog uses HBase as its scalable database backend to store and serve 250 million photos from over 60 million monthly users across 4 HBase clusters ranging from 50TB to 1PB in size. The authors provide best practices for configuring and monitoring HBase, including using smaller commodity servers, tuning JVM garbage collection, monitoring metrics like thread usage and disk I/O, and implementing caching and replication for high performance and reliability. Following these practices has allowed Yfrog's HBase deployment to run smoothly and efficiently.
Breaking the Sound Barrier with Persistent Memory HBaseCon
Liqi Yi and Shylaja Kokoori (Intel)
A fully optimized HBase cluster could easily hit the limit of the underlying storage device’s capability, which is beyond the reach of software optimization alone. To get around this constraint, we need a new design that brings data processing and data storage closer together. In this presentation, we will look at how persistent memory will change the way large datasets are stored. We will review the hardware characteristics of 3D XPoint™, a new persistent memory technology with low latency and high capacity. We will also discuss opportunities for further improvement within the HBase framework using persistent memory.
HBaseCon 2013: Apache HBase at Pinterest - Scaling Our Feed StorageCloudera, Inc.
Pinterest uses Apache HBase to store data for users' personalized "following feeds" at scale. This involves storing billions of pins and updates per day. Some key challenges addressed are handling high throughput writes from fanouts, providing low latency reads, and resolving potential data inconsistencies from race conditions. Optimizations to HBase include increased memstore size, block cache tuning, and prefix compression. Maintaining high availability involves writing to dual clusters, tight Zookeeper timeouts, and automated repairs.
Speakers: Liang Xie and Honghua Feng (Xiaomi)
This talk covers the HBase environment at Xiaomi, including thoughts and practices around latency, hardware/OS/VM configuration, GC tuning, the use of a new write thread model and reverse scan, and block index optimization. It will also include some discussion of planned JIRAs based on these approaches.
Since 2013, Yahoo! has been successfully running multi-tenant HBase clusters. Our tenants run applications ranging from real-time processing (e.g. content personalization, Ad targeting) to operational warehouses (e.g. advertising, content). Tenants are guaranteed an adequate level of resource isolation and security. This is achieved through the use of open source and in-house developed HBase features such as region server groups, group-based replication, and group-based favored nodes.
Today, with the increase in adoption and new use cases, we are working towards scaling our HBase clusters to support petabytes of data without compromising on performance and operability. A common tradeoff when scaling a cluster to this size is to increase the size of a region, thus avoiding the problem of having too many regions on a cluster. However, large regions negatively affect the performance and operability of a cluster mainly because region size determines the following: 1. granularity for load distribution, and 2. amount of write amplification due to compaction. Thus we are working towards enabling an HBase cluster to host at least a million regions.
In this presentation, we will walk through the key features we have implemented as well as share our experiences working on multi-tenancy and scaling the cluster.
HBaseCon 2012 | Base Metrics: What They Mean to You - ClouderaCloudera, Inc.
If you’re running an HBase cluster in production, you’ve probably noticed that HBase shares a number of useful metrics about everything from your block cache performance to your HDFS latencies over JMX (or Ganglia, or just a file). The problem is that it’s sometimes hard to know what these metrics mean to you and your users. Should you be worried if your memstore SizeMB is 1.5GB? What if your regionservers have a hundred stores each? This talk will explain how to understand and interpret the metrics HBase exports. Along the way we’ll cover some high-level background on HBase’s internals, and share some battle tested rules-of-thumb about how to interpret and react to metrics you might see.
HBase-2.0.0 has been a couple of years in the making. It is chock-a-block full of a long list of new features and fixes. In this session, the 2.0.0 release manager will perform the impossible, describing the release content inside the session time bounds.
hbaseconasia2017 hbasecon hbase https://www.eventbrite.com/e/hbasecon-asia-2017-tickets-34935546159#
Now that you've seen HBase 1.0, what's ahead in HBase 2.0, and beyond—and why? Find out from this panel of people who have designed and/or are working on 2.0 features.
The document describes Accordion, a novel in-memory compaction algorithm for HBase that improves write performance. Accordion applies the log-structured merge tree design used in LSM databases to HBase's memory data structure. This transforms random writes to sequential writes, keeping more data in memory for longer. As a result, Accordion achieves higher write throughput, lower disk I/O, and reduced read latency compared to HBase's previous approach. Accordion has been integrated into HBase 2.0 as the default memory compaction implementation.
In this session, you will learn the work Xiaomi has done to improve the availability and stability of our HBase clusters, including cross-site data and service backup and a coordinated compaction framework. You'll also learn about the Themis framework, which supports cross-row transactions on HBase based on Google's percolator algorithm, and its usage in Xiaomi's applications.
HBaseCon 2015: HBase at Scale in an Online and High-Demand EnvironmentHBaseCon
Pinterest runs 38 different HBase clusters in production, doing a lot of different types of work—with some doing up to 5 million operations per second. In this talk, you'll get details about how we do capacity planning, maintenance tasks such as online automated rolling compaction, configuration management, and monitoring.
HBaseCon2017 gohbase: Pure Go HBase ClientHBaseCon
gohbase is an implementation of an HBase client in pure Go: https://github.com/tsuna/gohbase. In this presentation we'll talk about its architecture and compare its performance against the native Java HBase client as well as AsyncHBase (http://opentsdb.github.io/asynchbase/) and some nice characteristics of golang that resulted in a simpler implementation.
If you've used a modern, interactive map such as Google or Bing Maps, you've consumed "map tiles". Map tiles are small images rendering a piece of the mosaic that is the whole map. Using conventional means, rendering tiles for the whole globe at multiple resolutions is a huge data processing effort. Even highly optimized, it spans a couple TBs and a few days of computation. Enter Hadoop. In this talk, I'll show you how to generate your own custom tiles using Hadoop. There will be pretty pictures.
The document provides an overview of the state of the Apache HBase database project. It discusses the project goals of availability, stability, and scalability. It also summarizes the mature codebase, active development areas like region replicas and ProcedureV2, and the growing ecosystem of SQL interfaces and other Hadoop components integrated with HBase. Recent releases include 1.1.2 which improved scanners and introduced quotas and throttling, and the 1.0 release which adopted semantic versioning and added region replicas.
Introduction to Hadoop, HBase, and NoSQLNick Dimiduk
The document is a presentation on NoSQL databases given by Nick Dimiduk. It begins with an introduction of the speaker and their background. The presentation then covers what NoSQL is not, the motivations for NoSQL databases, an overview of Hadoop and its components, and a description of HBase as a structured, distributed database built on Hadoop.
The document discusses the HBase client API for connecting to HBase clusters from applications like webapps. It describes the Java, Ruby, Python, and Thrift client interfaces as well as examples of using scans and puts with these interfaces. It also briefly mentions the REST client interface and some other alternative client libraries like asynchbase and Orderly.
This document provides an overview of Apache Phoenix, including:
- What Phoenix is and how it provides a SQL interface for Apache HBase
- The current state of Phoenix including SQL support, secondary indexes, and optimizations
- New features in Phoenix 4.4 like functional indexes, user defined functions, and integration with Spark
The presentation covers the evolution and capabilities of Phoenix as a relational layer for HBase that transforms SQL queries into native HBase API calls.
HBase is a distributed, scalable, big data store modeled after Google's Bigtable. The document outlines the key aspects of HBase, including that it uses HDFS for storage, Zookeeper for coordination, and can optionally use MapReduce for batch processing. It describes HBase's architecture with a master server distributing regions across multiple region servers, which store and serve data from memory and disks.
HBase can be an intimidating beast for someone considering its adoption. For what kinds of workloads is it well suited? How does it integrate into the rest of my application infrastructure? What are the data semantics upon which applications can be built? What are the deployment and operational concerns? In this talk, I'll address each of these questions in turn. As supporting evidence, both high-level application architecture and internal details will be discussed. This is an interactive talk: bring your questions and your use-cases!
This document introduces Pig, an open source platform for analyzing large datasets that sits on top of Hadoop. It provides an example of using Pig Latin to find the top 5 most visited websites by users aged 18-25 from user and website data. Key points covered include who uses Pig, how it works, performance advantages over MapReduce, and upcoming new features. The document encourages learning more about Pig through online documentation and tutorials.
At Salesforce, we have deployed many thousands of HBase/HDFS servers, and learned a lot about tuning during this process. This talk will walk you through the many relevant HBase, HDFS, Apache ZooKeeper, Java/GC, and Operating System configuration options and provides guidelines about which options to use in what situation, and how they relate to each other.
This document discusses tuning HBase and HDFS for performance and correctness. Some key recommendations include:
- Enable HDFS sync on close and sync behind writes for correctness on power failures.
- Tune HBase compaction settings like blockingStoreFiles and compactionThreshold based on whether the workload is read-heavy or write-heavy.
- Size RegionServer machines based on disk size, heap size, and number of cores to optimize for the workload.
- Set client and server RPC chunk sizes like hbase.client.write.buffer to 2MB to maximize network throughput.
- Configure various garbage collection settings in HBase like -Xmn512m and -XX:+UseCMSInit
Speaker: Varun Sharma (Pinterest)
Over the past year, HBase has become an integral component of Pinterest's storage stack. HBase has enabled us to quickly launch and iterate on new products and create amazing pinner experiences. This talk briefly describes some of these applications, the underlying schema, and how our HBase setup stays highly available and performant despite billions of requests every week. It will also include some performance tips for running on SSDs. Finally, we will talk about a homegrown serving technology we built from a mashup of HBase components that has gained wide adoption across Pinterest.
HBase is a distributed, column-oriented database that stores data in tables divided into rows and columns. It is optimized for random, real-time read/write access to big data. The document discusses HBase's key concepts like tables, regions, and column families. It also covers performance tuning aspects like cluster configuration, compaction strategies, and intelligent key design to spread load evenly. Different use cases are suitable for HBase depending on access patterns, such as time series data, messages, or serving random lookups and short scans from large datasets. Proper data modeling and tuning are necessary to maximize HBase's performance.
From: DataWorks Summit 2017 - Munich - 20170406
HBase has established itself as the backend for many operational and interactive use cases, powering well-known services that support millions of users and thousands of concurrent requests. In terms of features, HBase has come a long way, offering advanced options such as multi-level caching on- and off-heap, pluggable request handling, fast recovery options such as region replicas, table snapshots for data governance, tunable write-ahead logging, and so on. This talk is based on the research for an upcoming second edition of the speaker's HBase book, correlated with practical experience in medium to large HBase projects around the world. You will learn how to plan for HBase, starting with the selection of matching use cases, to determining the number of servers needed, leading into performance tuning options. There is no reason to be afraid of using HBase, but knowing its basic premises and technical choices will make using it much more successful. You will also learn about many of the new features of HBase up to version 1.3, and where they are applicable.
This talk delves into the many ways that a user has to use HBase in a project. Lars will look at many practical examples based on real applications in production, for example, on Facebook and eBay and the right approach for those wanting to find their own implementation. He will also discuss advanced concepts, such as counters, coprocessors and schema design.
Hbase schema design and sizing apache-con europe - nov 2012Chris Huang
The document provides an overview of HBase schema design and cluster sizing notes. It discusses HBase architecture including tables, regions, distribution, and compactions. It emphasizes the importance of schema design, including using intelligent keys, denormalization, and duplication to overcome limitations. The document also covers techniques like salting keys, hashing vs sequential keys, and examples of schema design for applications like mail inbox and Facebook insights. It stresses designing for the use case and avoiding hotspotting when sizing clusters.
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsEsther Kundin
An overview of the history of Big Data, followed by a deep dive into the Hadoop ecosystem. Detailed explanation of how HDFS, MapReduce, and HBase work, followed by a discussion of how to tune HBase performance. Finally, a look at industry trends, including challenges faced and being solved by Bloomberg for using Hadoop for financial data.
HBase is an Apache implementation of Google BigTable that provides random reads, writes, and scans of large datasets at scale. It offers strong consistency, durability, and latency in the milliseconds. Data is stored in a distributed, replicated fashion to ensure availability even during failures through redundant copies. Performance is optimized through caching and tuning garbage collection.
Introduction to memcached, a caching service designed for optimizing performance and scaling in the web stack, seen from perspective of MySQL/PHP users. Given for 2nd year students of professional bachelor in ICT at Kaho St. Lieven, Gent.
This document provides an overview of HBase architecture and advanced usage topics. It discusses course credit requirements, HBase architecture components like storage, write path, read path, files, region splits and more. It also covers advanced topics like secondary indexes, search integration, transactions and bloom filters. The document emphasizes that HBase uses log-structured merge trees for efficient data handling and operates at the disk transfer level rather than disk seek level for performance. It also provides details on various classes involved in write-ahead logging.
Elastic HBase on Mesos aims to improve resource utilization of HBase clusters by running HBase in Docker containers managed by Mesos and Marathon. This allows HBase clusters to dynamically scale based on varying workload demands, increases utilization by running mixed workloads on shared resources, and simplifies operations through standard containerization. Key benefits include easier management, higher efficiency through elastic scaling and resource sharing, and improved cluster tunability.
Java one2015 - Work With Hundreds of Hot Terabytes in JVMsSpeedment, Inc.
Presentation Summary: By leveraging on memory mapped files, the Chronicle Engine supports large maps that easily can exceed the size of your server’s RAM, thus allowing application developers to create huge JVM:s where data can be obtained quickly and with predictable latency. The Chronicle Engine can be synchronized with an underlying database using Speedment so that your in-memory maps will be “alive” and change whenever data changes in the underlying database. Speedment can also automatically derive domain models directly from the database so that you can start using the solution very quickly. Because the Java Maps are mapped onto files, the maps can also be shared instantly between several JVM:s and when you restart a JVM, it may start very quickly without having to reload data from the underlying database. The mapped files can be hundreds of terabytes which has been done in real world deployment cases.
Adobe has packaged HBase in Docker containers and uses Marathon and Mesos to schedule them—allowing us to decouple the RegionServer from the host, express resource requirements declaratively, and open the door for unassisted real-time deployments, elastic (up and down) real-time scalability, and more. In this talk, you'll hear what we've learned and explain why this approach could fundamentally change HBase operations.
Eventual Consistency @WalmartLabs with Kafka, Avro, SolrCloud and HadoopAyon Sinha
This document discusses Walmart Labs' use of eventual consistency with Kafka, SolrCloud, and Hadoop to power their large-scale ecommerce operations. It describes some of the challenges they faced, including slow query times, garbage collection pauses, and Zookeeper configuration issues. The key aspects of their solution involved using Kafka to handle asynchronous data ingestion into SolrCloud and Hadoop, batching updates for improved performance, dedicating hardware resources, and monitoring metrics to identify issues. This architecture has helped Walmart Labs scale to support their customers' high volumes of online shopping.
Cassandra is used for real-time bidding in online advertising. It processes billions of bid requests per day with low latency requirements. Segment data, which assigns product or service affinity to user groups, is stored in Cassandra to reduce calculations and allow users to be bid on sooner. Tuning the cache size and understanding the active dataset helps optimize performance.
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveScyllaDB
Want to learn practical tips for designing systems that can scale efficiently without compromising speed?
Join us for a workshop where we’ll address these challenges head-on and explore how to architect low-latency systems using Rust. During this free interactive workshop oriented for developers, engineers, and architects, we’ll cover how Rust’s unique language features and the Tokio async runtime enable high-performance application development.
As you explore key principles of designing low-latency systems with Rust, you will learn how to:
- Create and compile a real-world app with Rust
- Connect the application to ScyllaDB (NoSQL data store)
- Negotiate tradeoffs related to data modeling and querying
- Manage and monitor the database for consistently low latencies
HBase Low Latency, StrataNYC 2014
1. HBase Low Latency
Nick Dimiduk, Hortonworks (@xefyr)
Nicolas Liochon, Scaled Risk (@nkeywal)
Strata New York, October 17, 2014
2. Agenda
• Latency, what is it, how to measure it
• Write path
• Read path
• Next steps
3. What’s low latency
• Meaning anything from microseconds (high-frequency
trading) to seconds (interactive queries)
• In this talk: milliseconds
Latency is about percentiles
• Average != 50th percentile
• There are often orders of magnitude between the « average » and the « 95th percentile »
• Post-99% = the « magical 1% ». Work in progress here.
4. Measure latency
YCSB - Yahoo! Cloud Serving Benchmark
• Useful for comparison between databases
• Set of workloads already defined
bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation
• More options related to HBase: autoflush, replicas, …
• Latency measured in microseconds
• Easier for internal analysis
5. Why is it important
Durability
Availability
Consistency
6. Durability
(Diagram: durability levels 0 to 3, spanning the client buffer, server buffer, OS buffer on GFS, and disk, contrasting a traditional DB engine with HBase/BigTable.)
8. Consistency
Two processes: P1, P2
A counter is updated by P1: v1, then v2, then v3
Eventual consistency allows P1
and P2 to see these events in any
order.
Strong consistency allows only one
order
Google F1 paper, VLDB (2013)
We store financial data and have hard requirements
on data integrity and consistency. We also have a lot
of experience with eventual consistency systems at
Google. In all such systems, we find developers spend
a significant fraction of their time building extremely
complex and error-prone mechanisms to cope with
eventual consistency and handle data that may be
out of date. We think this is an unacceptable burden
to place on developers and that consistency problems
should be solved at the database level.
9. Consistency
The BigTable design allows consistency by partitioning the data: each
machine serves a subset of the data.
10. Availability
• The contract is: « a client outside the cluster will see HBase as available even if
there are partitions or failures within the HBase cluster »
• There is a lot more to say, but it’s outside the scope of this talk
(unfortunately)
12. Trade off
• Maximizing the benefits while minimizing
the cost
• Implementation details count
• Configuration counts
13. Write path
• Two parts
• Single put (WAL)
• The client just sends the put
• Multiple puts from the client (new behavior since 0.96)
• The client is much smarter
• Four stages to look at for latency
• Start (establish tcp connections, etc.)
• Steady: when expected conditions are met
• Machine failure: expected as well
• Overloaded system
14. Single put: communication & scheduling
• Client: TCP connection to the server
• Shared: multiple threads on the same client use the same TCP connection
• Pooling is possible and does improve performance in some circumstances
• hbase.client.ipc.pool.size
• Server: multiple calls from multiple threads on multiple machines
• Can become thousands of simultaneous queries
• Scheduling is required
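For illustration only, a minimal Java sketch (not from the talk) of raising the client IPC pool size before opening a table; the table name and pool size here are hypothetical:

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;

  public class PooledClientSketch {
    public static void main(String[] args) throws IOException {
      Configuration conf = HBaseConfiguration.create();
      // Use a small pool of connections per server instead of a single shared one
      // (hypothetical size; tune against your own client thread count).
      conf.setInt("hbase.client.ipc.pool.size", 5);
      HTable table = new HTable(conf, "usertable"); // hypothetical table name
      // ... issue puts/gets from multiple client threads ...
      table.close();
    }
  }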
15. Single put: real work
• The server must
• Write into the WAL queue
• Sync the WAL queue (HDFS flush)
• Write into the memstore
• The WAL queue is shared between all the regions/handlers
• Sync is avoided if another handler already did the work
• You may flush more than expected
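To make the single-put path concrete, here is a minimal sketch against the 0.96/0.98-era client API; the table, family, and qualifier names are made up:

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class SinglePutSketch {
    public static void main(String[] args) throws IOException {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "usertable"); // hypothetical table
      Put put = new Put(Bytes.toBytes("row-0001"));
      put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("value"));
      // Blocks until the RegionServer has appended to the WAL queue, synced it
      // to HDFS, and written the cell into the memstore.
      table.put(put);
      table.close();
    }
  }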
16. Simple put: A small run
Percentile Time in ms
Mean 1.21
50% 0.95
95% 1.50
99% 2.12
17. Latency sources
• Candidate one: network
• 0.5ms within a datacenter
• Much less between nodes in the same rack
Percentile Time in ms
Mean 0.13
50% 0.12
95% 0.15
99% 0.47
18. Latency sources
• Candidate two: HDFS Flush
Percentile Time in ms
Mean 0.33
50% 0.26
95% 0.59
99% 1.24
• We can still do better: HADOOP-7714 & sons.
19. Latency sources
• Millisecond world: everything can go wrong
• JVM
• Network
• OS Scheduler
• File System
• All this goes into the post 99% percentile
• Requires monitoring
• Using the latest version usually helps.
20. Latency sources
• Split (and presplits)
• Autosharding is great!
• Puts have to wait
• Impacts: seconds
• Balance
• Regions move
• Triggers a retry for the client
• hbase.client.pause = 100ms since HBase 0.96
• Garbage Collection
• Impacts: 10’s of ms, even with a good config
• Covered with the read path of this talk
21. From steady to loaded and overloaded
• Number of concurrent tasks is a function of
• Number of cores
• Number of disks
• Number of remote machines used
• Difficult to estimate
• Queues are doomed to happen
• hbase.regionserver.handler.count
• So for low latency
• Replaceable scheduler since HBase 0.98 (HBASE-8884). Requires specific code.
• RPC Priorities: since 0.98 (HBASE-11048)
22. From loaded to overloaded
• MemStore takes too much room: flush, then blocks quite quickly
• hbase.regionserver.global.memstore.size.lower.limit
• hbase.regionserver.global.memstore.size
• hbase.hregion.memstore.block.multiplier
• Too many HFiles: block until compactions keep up
• hbase.hstore.blockingStoreFiles
• Too many WAL files: flush and block
• hbase.regionserver.maxlogs
23. Machine failure
• Failure
• Detect
• Reallocate
• Replay WAL
• Replaying the WAL is NOT required for puts
• hbase.master.distributed.log.replay
• (default true in 1.0)
• Failure = Detect + Reallocate + Retry
• That’s in the range of ~1s for simple failures
• Silent failures put you in the 10s range if the hardware does not help
• zookeeper.session.timeout
24. Single puts
• Millisecond range
• Spikes do happen in steady mode
• 100ms
• Causes: GC, load, splits
25. Streaming puts
HTable#setAutoFlushTo(false)
HTable#put
HTable#flushCommits
• As simple puts, but
• Puts are grouped and sent in the background
• Load is taken into account
• Does not block
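A minimal sketch of the streaming pattern above, using the API named on the slide (0.96-era client); the table name and payload are hypothetical:

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class StreamingPutsSketch {
    public static void main(String[] args) throws IOException {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "usertable"); // hypothetical table
      table.setAutoFlushTo(false);                  // buffer puts on the client
      for (int i = 0; i < 10000; i++) {
        Put put = new Put(Bytes.toBytes("row-" + i));
        put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v" + i));
        table.put(put);                             // returns quickly; sent in the background
      }
      table.flushCommits();                         // drain whatever is still buffered
      table.close();
    }
  }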
26. Multiple puts
hbase.client.max.total.tasks (default 100)
hbase.client.max.perserver.tasks (default 5)
hbase.client.max.perregion.tasks (default 1)
• Decouple the client from a latency spike of a region server
• Increase the throughput by 50% compared to old multiput
• Makes split and GC more transparent
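For illustration, a hedged sketch of setting those client-side limits programmatically (they can just as well live in hbase-site.xml); the values are simply the defaults quoted above plus the retry pause from slide 20:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;

  public class ClientLimitsSketch {
    public static Configuration clientConf() {
      Configuration conf = HBaseConfiguration.create();
      conf.setInt("hbase.client.max.total.tasks", 100);   // outstanding tasks for the whole client
      conf.setInt("hbase.client.max.perserver.tasks", 5); // per RegionServer
      conf.setInt("hbase.client.max.perregion.tasks", 1); // per region
      conf.setInt("hbase.client.pause", 100);             // ms pause before a retry
      return conf;
    }
  }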
27. Conclusion on write path
• Single puts can be very fast
• It’s not a « hard real time » system: there are spikes
• Most latency spikes can be hidden when streaming puts
• Failures are NOT that difficult for the write path
• No WAL to replay
29. Read path
• Get/short scan are assumed for low-latency operations
• Again, two APIs
• Single get: HTable#get(Get)
• Multi-get: HTable#get(List<Get>)
• Four stages, same as write path
• Start (tcp connection, …)
• Steady: when expected conditions are met
• Machine failure: expected as well
• Overloaded system: you may need to add machines or tune your workload
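A minimal sketch of both read APIs (hypothetical table and row keys); the multi-get blocks until the slowest RegionServer group has answered:

  import java.io.IOException;
  import java.util.ArrayList;
  import java.util.List;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.util.Bytes;

  public class ReadPathSketch {
    public static void main(String[] args) throws IOException {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "usertable");      // hypothetical table
      // Single get
      Result one = table.get(new Get(Bytes.toBytes("row-0001")));
      byte[] value = one.getValue(Bytes.toBytes("f"), Bytes.toBytes("q"));
      // Multi-get: the client groups these by RegionServer under the hood
      List<Get> gets = new ArrayList<Get>();
      for (int i = 0; i < 100; i++) {
        gets.add(new Get(Bytes.toBytes("row-" + i)));
      }
      Result[] results = table.get(gets);
      table.close();
    }
  }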
30. Multi get / Client
• Group Gets by RegionServer
• Execute them one by one
33. Access latency magnified / Storage hierarchy: a different view
Dean/2009
Memory is 100000x
faster than disk!
Disk seek = 10ms
34. Known unknowns
• For each candidate HFile
• Exclude by file metadata
• Timestamp
• Rowkey range
• Exclude by bloom filter
StoreFileScanner#shouldUseScanner()
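Bloom filters are declared per column family at table-creation time; here is a hedged sketch with the 0.96-era admin API, assuming a ROW-level bloom and hypothetical table/family names:

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.HBaseAdmin;
  import org.apache.hadoop.hbase.regionserver.BloomType;

  public class BloomFilterSketch {
    public static void main(String[] args) throws IOException {
      Configuration conf = HBaseConfiguration.create();
      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("usertable"));
      HColumnDescriptor family = new HColumnDescriptor("f");
      family.setBloomFilterType(BloomType.ROW); // lets gets skip HFiles that cannot contain the row
      desc.addFamily(family);
      HBaseAdmin admin = new HBaseAdmin(conf);
      admin.createTable(desc);
      admin.close();
    }
  }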
35. Unknown knowns
• Merge sort results polled from Stores
• Seek each scanner to a reference KeyValue
• Retrieve candidate data from disk
• Multiple HFiles => multiple seeks
• hbase.storescanner.parallel.seek.enable=true
• Short Circuit Reads
• dfs.client.read.shortcircuit=true
• Block locality
• Happy clusters compact!
HFileBlock#readBlockData()
37. BlockCache Showdown
• LruBlockCache
• Default, onheap
• Quite good most of the time
• Evictions impact GC
• BucketCache
• Offheap alternative
• Serialization overhead
• Large memory configurations
http://www.n10k.com/blog/blockcache-showdown/
L2 off-heap BucketCache
makes a strong showing
38. Latency enemies: Garbage Collection
• Use heap. Not too much. With CMS.
• Max heap
• 30GB (compressed pointers)
• 8-16GB if you care about 9’s
• Healthy cluster load
• regular, reliable collections
• 25-100ms pause on regular interval
• Overloaded RegionServer suffers GC overmuch
39. Off-heap to the rescue?
• BucketCache (0.96, HBASE-7404)
• Network interfaces (HBASE-9535)
• MemStore et al (HBASE-10191)
40. Latency enemies: Compactions
• Fewer HFiles => fewer seeks
• Evict data blocks!
• Evict Index blocks!!
• hfile.block.index.cacheonwrite
• Evict bloom blocks!!!
• hfile.block.bloom.cacheonwrite
• OS buffer cache to the rescue
• Compacted data is still fresh
• Better than going all the way back to disk
42. Hedging our bets
• HDFS Hedged reads (2.4, HDFS-5776)
• Reads on secondary DataNodes
• Strongly consistent
• Works at the HDFS level
• Timeline consistency (HBASE-10070)
• Reads on « Replica Region »
• Not strongly consistent
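A hedged sketch of a timeline-consistent read as introduced by HBASE-10070 (HBase 1.0+); the read may be answered by a replica region, and Result#isStale() reports whether it was. Table and row names are hypothetical:

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Consistency;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.util.Bytes;

  public class TimelineGetSketch {
    public static void main(String[] args) throws IOException {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "usertable");  // hypothetical table
      Get get = new Get(Bytes.toBytes("row-0001"));
      get.setConsistency(Consistency.TIMELINE);      // allow replica regions to answer
      Result result = table.get(get);
      if (result.isStale()) {
        // Served from a replica; may lag the primary region.
      }
      table.close();
    }
  }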
43. Read latency in summary
• Steady mode
• Cache hit: < 1 ms
• Cache miss: + 10 ms per seek
• Writing while reading => cache churn
• GC: 25-100ms pause on regular interval
Network request + (1 - P(cache hit)) * (10 ms * seeks)
• Same long tail issues as write
• Overloaded: same scheduling issues as write
• Partial failures hurt a lot
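As a rough worked example under the formula above (assuming, hypothetically, a 0.5 ms network round trip, a 95% cache hit rate, and 3 HFiles to seek on a miss): 0.5 ms + (1 - 0.95) × (10 ms × 3) = 2.0 ms expected per read, with the occasional ~30 ms outlier when the cache misses outright.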
44. HBase ranges for 99% latency
        | Put                  | Streamed Multiput | Get                  | Timeline Get
Steady  | milliseconds         | milliseconds      | milliseconds         | milliseconds
Failure | seconds              | seconds           | seconds              | milliseconds
GC      | 10's of milliseconds | milliseconds      | 10's of milliseconds | milliseconds
45. What’s next
• Less GC
• Use less objects
• Offheap
✓Compressed BlockCache (HBASE-11331)
• Preferred location (HBASE-4755)
• The « magical 1% »
• Most tools stop at the 99% latency
• What happens after is much more complex
46. Performance with Compressed BlockCache
Total RAM: 24G, LruBlockCache Size: 12G
Data Size: 45G, Compressed Size: 11G
Compression: SNAPPY
(Chart: times improvement, enabled vs. disabled, for throughput (ops/sec), latency (ms, p95), latency (ms, p99), and CPU load.)
47. Thanks!
Nick Dimiduk, Hortonworks (@xefyr)
Nicolas Liochon, Scaled Risk (@nkeywal)
Strata New York, October 17, 2014
Editor's Notes
#30: This talk: assume get/short scan implies low-latency requirements
No fancy streaming client like Puts; waiting for slowest RS.
#31: Gets grouped by RS, groups sent in parallel, block until all groups return.
Network call: 10’s of micros, in parallel
#32: Read path in full detail is quite complex
See Lars’ HBaseCon 2012 talk for the nitty-gritty
Complexity optimizing around one invariant: 10ms seek
#33: Read path in full detail is quite complex
See Lars’ HBaseCon 2012 talk for the nitty-gritty
Complexity optimizing around one invariant: 10ms seek
#34: Complexity optimizing around one invariant: 10ms seek
Aiming for 100 microSec world; how to work around this?
Goal: avoid disk at all cost!
#35: Goal: don’t go to disk unless absolutely necessary.
Tactic: Candidate HFile elimination.
Regular compactions => 3-5 candidates
#36: Mergesort over multiple files, multiple seeks
More spindles = parallel scanning
SCR avoids proxy process (DataNode)
But remember! Goal: don’t go to disk unless absolutely necessary.
#37: “block” is a segment of an HFile
Data blocks, index blocks, and bloom blocks
Read blocks retained in BlockCache
Seek to same and adjacent data become cache hits
#38: HBase ships with a variety of cache implementations
Happy with 95% stats? Stick with LruBlockCache and modest heapsize
Pushing 99%? Lru still okay, but watch that heapsize.
Spent money on RAM? BucketCache
#39: GC is a part of healthy operation
BlockCache garbage is of awkward size and age, which means pauses
Pause time is a function of heap size
More like ~16GiB if you’re really worried about 99%
Overloaded: more cache evictions, time in GC
#40: Why generate garbage at all?
GCs are smart, but maybe we know our pain spots better?
Don’t know until we try
#41: Necessary for fewer scan candidates, fewer seeks
Buffer cache to the rescue
“That’s steady state and overloaded; let’s talk about failure”
#42: Replaying WALs takes time
Unlucky: no data locality, talking to a remote DataNode
Empty cache
“Failure isn’t binary. What about the sick and the dying?”
#44: Reads dominated by disk seek, so keep that data in memory
After cache miss, GC is the next candidate cause of latency
“Ideal formula”
P(cache hit): fn (cache size :: db size, request locality)
Sometimes the jitter dominates
#45: Standard deployment, well designed schema
Millisecond responses, seconds for failure recovery, and GC at a regular interval
Everything we’ve focused on here is impactful of the 99%
Beyond that there’s a lot of interesting problems to solve
#46: There’s always more work to be done
Generate less garbage
Compressed BlockCache
Improve recovery time and locality