SlideShare a Scribd company logo
Large Scale Web Apps @Pinterest
(Powered by Apache HBase)
May 5, 2014
Pinterest is a visual discovery tool for
collecting the things you love, and discovering
related content along the way.
What is Pinterest ?
Large-scale Web Apps @ Pinterest
Scale
Challenges @scale
• 100s of millions of pins/repins per month
• Billions of requests per week
• Millions of daily active users
• Billions of pins
• One of the largest discovery tools on the internet
Storage stack @Pinterest
!
• MySQL
• Redis (persistence and for cache)
• MemCache (Consistent Hashing)
App Tier
Manual
Sharding
Sharding
Logic
Why HBase ?
!
• High Write throughput
- Unlike MySQL/B-Tree, writes don’t ever seek on Disk
• Seamless integration with Hadoop
• Distributed operation
- Fault tolerance
- Load Balancing
- Easily add/remove nodes
!
Non-Technical Reasons
• Large active community
• Large scale online use cases
Outline
!
• Features powered by HBase
• SaaS (Storage as a Service)
- MetaStore
- HFile Service (Terrapin)
• Our HBase setup - optimizing for High availability & Low latency
Applications/Features
!
• Offline
- Analytics
- Search Indexing
- ETL/Hadoop worklows
• Online
- Personalized Feeds
- Rich Pins
- Recommendations
!
Why HBase ?
Personalized Feeds
WHY HBASE ?
Write Heavy load due
to Pin fanout.
Recommended
Pins
Users I follow
Rich Pins
WHY HBASE ?
Negative Hits with Bloom
Filters
Recommendations
HADOOP
1.0
HBASE +
HADOOP 2.0
HADOOP
2.0
WHY HBASE ?
Seamless Data Transfer from
Hadoop
Generate
Recommendations
DistCP Jobs
Serving Cluster
SaaS
• Large number of feature requests
• 1 Cluster per feature
• Scaling with organizational growth
• Need for “defensive” multi tenant storage
• Previous solutions reaching their limits
MetaStore I
• Key Value store on top of HBase
• 1 HBase Table per Feature with salted keys
• Pre split tables
• Table level rate limiting (online/offline reads/writes)
• No Scan support
• Simple client API!
!
string getValue(string feature, string key, boolean online);
void setValue(string feature, string key, string value,
boolean online);
MetaStore II
MetaStore
Thrift Server
Primary HBase Secondary HBase
Clients
Master/Master
Replication
Thrift
Salting +
Rate Limiting
ZooKeeper
Issue
Gets/Sets
Notifications
Metastore Config
- Rate Limits
- Primary Cluster
HFile Service (Terrapin)
• Solve the Bulk Upload problem
• HBase backed solution
- Bulk upload + major compact
- Major compact to delete old data
• Design solution from scratch using mashup of:
- HFile
- HBase BlockCache
- Avoid compactions
- Low latency key value lookups
!
!
!
High Level Architecture I
!
Client Library
/Service
ETL/Batch Jobs
Load/Reload
HFile
Servers
!
HFiles on
Amazon S3
Key/Value
Lookups
Multiple
HFiles/Server
High Level Architecture II
• Each HFile server runs 2 processes
- Copier: pulls HFiles from S3 to local disk
- Supershard: serves multiple HFile shards to client
• ZooKeeper
- Detecting alive servers
- Coordinating loading/swapping of new data
- Enabling clients to detect availability of new data
• Loader Module (replaces distcp)
- Trigger new data copy
- Trigger swap through zookeeper
- Update ZooKeeper and notify client
• Client library understands sharding
• Old data deleted by background process
!
Salient Features
• Multi tenancy through namespacing
• Pluggable sharding functions - modulus, range & more
• HBase Block Cache
• Multiple clusters for redundancy
• Speculative execution across clusters for low latency
!
!
!
Setting up for Success
• Many online usecases/applications
• Optimize for:
- Low MTTR - high availability
- Low latency (performance)
!
!
MTTR - I
DEADLIVE STALE
20sec 9min 40sec
!
• Stale nodes avoided
- As candidates for Reads
- As candidate replicas for writes
- During Lease Recovery
• Copying of underreplicated blocks starts when a Node is
marked as “Dead”
DataNode States
MTTR - II
Failure Detection
Lease Recovery
Log Split
Recover Regions
30 sec ZooKeeper
session timeout
HDFS 4721
HDFS 3703 +
HDFS 3912
< 2 min
!
• Avoid stale nodes at each point of the recovery process
• Multi minute timeouts ==> Multi second timeouts
Simulate, Simulate, Simulate
Simulate “Pull the plug failures” and “tail -f the logs”
• kill -9 both datanode and region server - causes connection
refused errors
• kill -STOP both datanode and region server - causes socket
timeouts
• Blackhole hosts using iptables - connect timeouts + “No
Route to host” - Most representative of AWS failures
Performance
Configuration tweaks
• Small Block Size, 4K-16K
• Prefix compression to cache more - when data is in the key,
close to 4X reduction for some data sets
• Separation of RPC handler threads for reads vs writes
• Short circuit local reads
• HBase level checksums (HBASE 5074)
Hardware
• SATA (m1.xl/c1.xl) and SSD (hi1.4xl)
• Choose based on limiting factor
- Disk space - pick SATA for max GB/$$
- IOPs - pick SSD for max IOPs/$$, clusters with heavy reads or
heavy compaction activity
Performance (SSDs)
HFile Read Performance
• Turn off block cache for Data Blocks, reduce GC + heap
fragmentation
• Keep block cache on for Index Blocks
• Increase “dfs.client.read.shortcircuit.streams.cache.size” from
100 to 10,000 (with short circuit reads)
• Approx. 3X improvement in read throughput
!
Write Performance
• WAL contention when client sets AutoFlush=true
• HBase 8755
In the Pipeline...
!
• Building a graph database on HBase
• Disaster recovery - snapshot + incremental backup + restore
• Off Heap cache - reduce GC overhead and better use of
hardware
• Read path optimizations
And we are Hiring !!
Ad

More Related Content

What's hot (20)

HBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial IndustryHBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial Industry
HBaseCon
 
A Survey of HBase Application Archetypes
A Survey of HBase Application ArchetypesA Survey of HBase Application Archetypes
A Survey of HBase Application Archetypes
HBaseCon
 
HBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region ReplicasHBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region Replicas
HBaseCon
 
HBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ FlipboardHBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ Flipboard
Matthew Blair
 
Rigorous and Multi-tenant HBase Performance Measurement
Rigorous and Multi-tenant HBase Performance MeasurementRigorous and Multi-tenant HBase Performance Measurement
Rigorous and Multi-tenant HBase Performance Measurement
DataWorks Summit
 
HBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDK
HBaseCon
 
HBase Backups
HBase BackupsHBase Backups
HBase Backups
HBaseCon
 
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Taming the Elephant: Efficient and Effective Apache Hadoop Management
Taming the Elephant: Efficient and Effective Apache Hadoop ManagementTaming the Elephant: Efficient and Effective Apache Hadoop Management
Taming the Elephant: Efficient and Effective Apache Hadoop Management
DataWorks Summit/Hadoop Summit
 
HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...
HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...
HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...
Cloudera, Inc.
 
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on HadoopBig Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
Gruter
 
Apache HBase in the Enterprise Data Hub at Cerner
Apache HBase in the Enterprise Data Hub at CernerApache HBase in the Enterprise Data Hub at Cerner
Apache HBase in the Enterprise Data Hub at Cerner
HBaseCon
 
HBaseCon 2013: Integration of Apache Hive and HBase
HBaseCon 2013: Integration of Apache Hive and HBaseHBaseCon 2013: Integration of Apache Hive and HBase
HBaseCon 2013: Integration of Apache Hive and HBase
Cloudera, Inc.
 
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
Cloudera, Inc.
 
HBaseCon 2015: State of HBase Docs and How to Contribute
HBaseCon 2015: State of HBase Docs and How to ContributeHBaseCon 2015: State of HBase Docs and How to Contribute
HBaseCon 2015: State of HBase Docs and How to Contribute
HBaseCon
 
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
Michael Stack
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
Tathastu.ai
 
HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014
larsgeorge
 
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
Cloudera, Inc.
 
HBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial IndustryHBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial Industry
HBaseCon
 
A Survey of HBase Application Archetypes
A Survey of HBase Application ArchetypesA Survey of HBase Application Archetypes
A Survey of HBase Application Archetypes
HBaseCon
 
HBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region ReplicasHBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region Replicas
HBaseCon
 
HBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ FlipboardHBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ Flipboard
Matthew Blair
 
Rigorous and Multi-tenant HBase Performance Measurement
Rigorous and Multi-tenant HBase Performance MeasurementRigorous and Multi-tenant HBase Performance Measurement
Rigorous and Multi-tenant HBase Performance Measurement
DataWorks Summit
 
HBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDK
HBaseCon
 
HBase Backups
HBase BackupsHBase Backups
HBase Backups
HBaseCon
 
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Taming the Elephant: Efficient and Effective Apache Hadoop Management
Taming the Elephant: Efficient and Effective Apache Hadoop ManagementTaming the Elephant: Efficient and Effective Apache Hadoop Management
Taming the Elephant: Efficient and Effective Apache Hadoop Management
DataWorks Summit/Hadoop Summit
 
HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...
HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...
HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...
Cloudera, Inc.
 
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on HadoopBig Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
Gruter
 
Apache HBase in the Enterprise Data Hub at Cerner
Apache HBase in the Enterprise Data Hub at CernerApache HBase in the Enterprise Data Hub at Cerner
Apache HBase in the Enterprise Data Hub at Cerner
HBaseCon
 
HBaseCon 2013: Integration of Apache Hive and HBase
HBaseCon 2013: Integration of Apache Hive and HBaseHBaseCon 2013: Integration of Apache Hive and HBase
HBaseCon 2013: Integration of Apache Hive and HBase
Cloudera, Inc.
 
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
Cloudera, Inc.
 
HBaseCon 2015: State of HBase Docs and How to Contribute
HBaseCon 2015: State of HBase Docs and How to ContributeHBaseCon 2015: State of HBase Docs and How to Contribute
HBaseCon 2015: State of HBase Docs and How to Contribute
HBaseCon
 
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
Michael Stack
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
Tathastu.ai
 
HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014
larsgeorge
 
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
Cloudera, Inc.
 

Viewers also liked (20)

Apache HBase - Just the Basics
Apache HBase - Just the BasicsApache HBase - Just the Basics
Apache HBase - Just the Basics
HBaseCon
 
Apache HBase Low Latency
Apache HBase Low LatencyApache HBase Low Latency
Apache HBase Low Latency
Nick Dimiduk
 
HBase Low Latency
HBase Low LatencyHBase Low Latency
HBase Low Latency
DataWorks Summit
 
Parallel Distributed Image Stacking and Mosaicing with Hadoop__HadoopSummit2010
Parallel Distributed Image Stacking and Mosaicing with Hadoop__HadoopSummit2010Parallel Distributed Image Stacking and Mosaicing with Hadoop__HadoopSummit2010
Parallel Distributed Image Stacking and Mosaicing with Hadoop__HadoopSummit2010
Yahoo Developer Network
 
A Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
A Non-Standard use Case of Hadoop: High Scale Image Processing and AnalyticsA Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
A Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
DataWorks Summit
 
Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBase
Anil Gupta
 
HBaseCon 2012 | Unique Sets on HBase and Hadoop - Elliot Clark, StumbleUpon
HBaseCon 2012 | Unique Sets on HBase and Hadoop - Elliot Clark, StumbleUponHBaseCon 2012 | Unique Sets on HBase and Hadoop - Elliot Clark, StumbleUpon
HBaseCon 2012 | Unique Sets on HBase and Hadoop - Elliot Clark, StumbleUpon
Cloudera, Inc.
 
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
Cloudera, Inc.
 
HBaseCon 2013: 1500 JIRAs in 20 Minutes
HBaseCon 2013: 1500 JIRAs in 20 MinutesHBaseCon 2013: 1500 JIRAs in 20 Minutes
HBaseCon 2013: 1500 JIRAs in 20 Minutes
Cloudera, Inc.
 
HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second...
HBaseCon 2013:  Evolving a First-Generation Apache HBase Deployment to Second...HBaseCon 2013:  Evolving a First-Generation Apache HBase Deployment to Second...
HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second...
Cloudera, Inc.
 
HBaseCon 2013: Apache HBase on Flash
HBaseCon 2013: Apache HBase on FlashHBaseCon 2013: Apache HBase on Flash
HBaseCon 2013: Apache HBase on Flash
Cloudera, Inc.
 
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
Cloudera, Inc.
 
HBaseCon 2012 | Building Mobile Infrastructure with HBase
HBaseCon 2012 | Building Mobile Infrastructure with HBaseHBaseCon 2012 | Building Mobile Infrastructure with HBase
HBaseCon 2012 | Building Mobile Infrastructure with HBase
Cloudera, Inc.
 
HBaseCon 2015: DeathStar - Easy, Dynamic, Multi-tenant HBase via YARN
HBaseCon 2015: DeathStar - Easy, Dynamic,  Multi-tenant HBase via YARNHBaseCon 2015: DeathStar - Easy, Dynamic,  Multi-tenant HBase via YARN
HBaseCon 2015: DeathStar - Easy, Dynamic, Multi-tenant HBase via YARN
HBaseCon
 
HBaseCon 2012 | Scaling GIS In Three Acts
HBaseCon 2012 | Scaling GIS In Three ActsHBaseCon 2012 | Scaling GIS In Three Acts
HBaseCon 2012 | Scaling GIS In Three Acts
Cloudera, Inc.
 
Cross-Site BigTable using HBase
Cross-Site BigTable using HBaseCross-Site BigTable using HBase
Cross-Site BigTable using HBase
HBaseCon
 
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
Cloudera, Inc.
 
HBaseCon 2012 | Relaxed Transactions for HBase - Francis Liu, Yahoo!
HBaseCon 2012 | Relaxed Transactions for HBase - Francis Liu, Yahoo!HBaseCon 2012 | Relaxed Transactions for HBase - Francis Liu, Yahoo!
HBaseCon 2012 | Relaxed Transactions for HBase - Francis Liu, Yahoo!
Cloudera, Inc.
 
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
Cloudera, Inc.
 
HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...
HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...
HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...
Cloudera, Inc.
 
Apache HBase - Just the Basics
Apache HBase - Just the BasicsApache HBase - Just the Basics
Apache HBase - Just the Basics
HBaseCon
 
Apache HBase Low Latency
Apache HBase Low LatencyApache HBase Low Latency
Apache HBase Low Latency
Nick Dimiduk
 
Parallel Distributed Image Stacking and Mosaicing with Hadoop__HadoopSummit2010
Parallel Distributed Image Stacking and Mosaicing with Hadoop__HadoopSummit2010Parallel Distributed Image Stacking and Mosaicing with Hadoop__HadoopSummit2010
Parallel Distributed Image Stacking and Mosaicing with Hadoop__HadoopSummit2010
Yahoo Developer Network
 
A Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
A Non-Standard use Case of Hadoop: High Scale Image Processing and AnalyticsA Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
A Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
DataWorks Summit
 
Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBase
Anil Gupta
 
HBaseCon 2012 | Unique Sets on HBase and Hadoop - Elliot Clark, StumbleUpon
HBaseCon 2012 | Unique Sets on HBase and Hadoop - Elliot Clark, StumbleUponHBaseCon 2012 | Unique Sets on HBase and Hadoop - Elliot Clark, StumbleUpon
HBaseCon 2012 | Unique Sets on HBase and Hadoop - Elliot Clark, StumbleUpon
Cloudera, Inc.
 
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
Cloudera, Inc.
 
HBaseCon 2013: 1500 JIRAs in 20 Minutes
HBaseCon 2013: 1500 JIRAs in 20 MinutesHBaseCon 2013: 1500 JIRAs in 20 Minutes
HBaseCon 2013: 1500 JIRAs in 20 Minutes
Cloudera, Inc.
 
HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second...
HBaseCon 2013:  Evolving a First-Generation Apache HBase Deployment to Second...HBaseCon 2013:  Evolving a First-Generation Apache HBase Deployment to Second...
HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second...
Cloudera, Inc.
 
HBaseCon 2013: Apache HBase on Flash
HBaseCon 2013: Apache HBase on FlashHBaseCon 2013: Apache HBase on Flash
HBaseCon 2013: Apache HBase on Flash
Cloudera, Inc.
 
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
Cloudera, Inc.
 
HBaseCon 2012 | Building Mobile Infrastructure with HBase
HBaseCon 2012 | Building Mobile Infrastructure with HBaseHBaseCon 2012 | Building Mobile Infrastructure with HBase
HBaseCon 2012 | Building Mobile Infrastructure with HBase
Cloudera, Inc.
 
HBaseCon 2015: DeathStar - Easy, Dynamic, Multi-tenant HBase via YARN
HBaseCon 2015: DeathStar - Easy, Dynamic,  Multi-tenant HBase via YARNHBaseCon 2015: DeathStar - Easy, Dynamic,  Multi-tenant HBase via YARN
HBaseCon 2015: DeathStar - Easy, Dynamic, Multi-tenant HBase via YARN
HBaseCon
 
HBaseCon 2012 | Scaling GIS In Three Acts
HBaseCon 2012 | Scaling GIS In Three ActsHBaseCon 2012 | Scaling GIS In Three Acts
HBaseCon 2012 | Scaling GIS In Three Acts
Cloudera, Inc.
 
Cross-Site BigTable using HBase
Cross-Site BigTable using HBaseCross-Site BigTable using HBase
Cross-Site BigTable using HBase
HBaseCon
 
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
Cloudera, Inc.
 
HBaseCon 2012 | Relaxed Transactions for HBase - Francis Liu, Yahoo!
HBaseCon 2012 | Relaxed Transactions for HBase - Francis Liu, Yahoo!HBaseCon 2012 | Relaxed Transactions for HBase - Francis Liu, Yahoo!
HBaseCon 2012 | Relaxed Transactions for HBase - Francis Liu, Yahoo!
Cloudera, Inc.
 
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
Cloudera, Inc.
 
HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...
HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...
HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...
Cloudera, Inc.
 
Ad

Similar to Large-scale Web Apps @ Pinterest (20)

Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Esther Kundin
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Esther Kundin
 
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the CloudSpeed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
gluent.
 
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...
Dataconomy Media
 
HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014
Nick Dimiduk
 
支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统
yongboy
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
强 王
 
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qcon
Yiwei Ma
 
HBase: Where Online Meets Low Latency
HBase: Where Online Meets Low LatencyHBase: Where Online Meets Low Latency
HBase: Where Online Meets Low Latency
HBaseCon
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
ssuserd3a367
 
Real time fraud detection at 1+M scale on hadoop stack
Real time fraud detection at 1+M scale on hadoop stackReal time fraud detection at 1+M scale on hadoop stack
Real time fraud detection at 1+M scale on hadoop stack
DataWorks Summit/Hadoop Summit
 
Hive spark-s3acommitter-hbase-nfs
Hive spark-s3acommitter-hbase-nfsHive spark-s3acommitter-hbase-nfs
Hive spark-s3acommitter-hbase-nfs
Yifeng Jiang
 
Scale your Alfresco Solutions
Scale your Alfresco Solutions Scale your Alfresco Solutions
Scale your Alfresco Solutions
Alfresco Software
 
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkEtu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
James Chen
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
cdmaxime
 
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and CloudHBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
Michael Stack
 
HBaseConAsia2018 Track3-2: HBase at China Telecom
HBaseConAsia2018 Track3-2:  HBase at China TelecomHBaseConAsia2018 Track3-2:  HBase at China Telecom
HBaseConAsia2018 Track3-2: HBase at China Telecom
Michael Stack
 
Hadoop ppt1
Hadoop ppt1Hadoop ppt1
Hadoop ppt1
chariorienit
 
Webinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case StudyWebinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case Study
Ceph Community
 
Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache Bigtop
Evans Ye
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Esther Kundin
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Esther Kundin
 
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the CloudSpeed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
gluent.
 
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...
Dataconomy Media
 
HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014
Nick Dimiduk
 
支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统
yongboy
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
强 王
 
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qcon
Yiwei Ma
 
HBase: Where Online Meets Low Latency
HBase: Where Online Meets Low LatencyHBase: Where Online Meets Low Latency
HBase: Where Online Meets Low Latency
HBaseCon
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
ssuserd3a367
 
Real time fraud detection at 1+M scale on hadoop stack
Real time fraud detection at 1+M scale on hadoop stackReal time fraud detection at 1+M scale on hadoop stack
Real time fraud detection at 1+M scale on hadoop stack
DataWorks Summit/Hadoop Summit
 
Hive spark-s3acommitter-hbase-nfs
Hive spark-s3acommitter-hbase-nfsHive spark-s3acommitter-hbase-nfs
Hive spark-s3acommitter-hbase-nfs
Yifeng Jiang
 
Scale your Alfresco Solutions
Scale your Alfresco Solutions Scale your Alfresco Solutions
Scale your Alfresco Solutions
Alfresco Software
 
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkEtu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
James Chen
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
cdmaxime
 
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and CloudHBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
Michael Stack
 
HBaseConAsia2018 Track3-2: HBase at China Telecom
HBaseConAsia2018 Track3-2:  HBase at China TelecomHBaseConAsia2018 Track3-2:  HBase at China Telecom
HBaseConAsia2018 Track3-2: HBase at China Telecom
Michael Stack
 
Webinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case StudyWebinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case Study
Ceph Community
 
Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache Bigtop
Evans Ye
 
Ad

More from HBaseCon (20)

hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kuberneteshbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
HBaseCon
 
hbaseconasia2017: HBase on Beam
hbaseconasia2017: HBase on Beamhbaseconasia2017: HBase on Beam
hbaseconasia2017: HBase on Beam
HBaseCon
 
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei
hbaseconasia2017: HBase Disaster Recovery Solution at Huaweihbaseconasia2017: HBase Disaster Recovery Solution at Huawei
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei
HBaseCon
 
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinteresthbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon
 
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
HBaseCon
 
hbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Neteasehbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Netease
HBaseCon
 
hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践
HBaseCon
 
hbaseconasia2017: 基于HBase的企业级大数据平台
hbaseconasia2017: 基于HBase的企业级大数据平台hbaseconasia2017: 基于HBase的企业级大数据平台
hbaseconasia2017: 基于HBase的企业级大数据平台
HBaseCon
 
hbaseconasia2017: HBase at JD.com
hbaseconasia2017: HBase at JD.comhbaseconasia2017: HBase at JD.com
hbaseconasia2017: HBase at JD.com
HBaseCon
 
hbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecturehbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecture
HBaseCon
 
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huaweihbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
HBaseCon
 
hbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMihbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMi
HBaseCon
 
hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0
HBaseCon
 
HBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBaseHBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBase
HBaseCon
 
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in PinterestHBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon
 
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBaseHBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon
 
HBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBaseHBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBase
HBaseCon
 
HBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBaseHBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBase
HBaseCon
 
HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at Didi
HBaseCon
 
HBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase ClientHBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon
 
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kuberneteshbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
HBaseCon
 
hbaseconasia2017: HBase on Beam
hbaseconasia2017: HBase on Beamhbaseconasia2017: HBase on Beam
hbaseconasia2017: HBase on Beam
HBaseCon
 
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei
hbaseconasia2017: HBase Disaster Recovery Solution at Huaweihbaseconasia2017: HBase Disaster Recovery Solution at Huawei
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei
HBaseCon
 
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinteresthbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon
 
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
HBaseCon
 
hbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Neteasehbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Netease
HBaseCon
 
hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践
HBaseCon
 
hbaseconasia2017: 基于HBase的企业级大数据平台
hbaseconasia2017: 基于HBase的企业级大数据平台hbaseconasia2017: 基于HBase的企业级大数据平台
hbaseconasia2017: 基于HBase的企业级大数据平台
HBaseCon
 
hbaseconasia2017: HBase at JD.com
hbaseconasia2017: HBase at JD.comhbaseconasia2017: HBase at JD.com
hbaseconasia2017: HBase at JD.com
HBaseCon
 
hbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecturehbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecture
HBaseCon
 
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huaweihbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
HBaseCon
 
hbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMihbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMi
HBaseCon
 
hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0
HBaseCon
 
HBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBaseHBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBase
HBaseCon
 
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in PinterestHBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon
 
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBaseHBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon
 
HBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBaseHBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBase
HBaseCon
 
HBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBaseHBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBase
HBaseCon
 
HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at Didi
HBaseCon
 
HBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase ClientHBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon
 

Recently uploaded (20)

How to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud PerformanceHow to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud Performance
ThousandEyes
 
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Eric D. Schabell
 
Adobe After Effects Crack FREE FRESH version 2025
Adobe After Effects Crack FREE FRESH version 2025Adobe After Effects Crack FREE FRESH version 2025
Adobe After Effects Crack FREE FRESH version 2025
kashifyounis067
 
Download Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With LatestDownload Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With Latest
tahirabibi60507
 
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
Egor Kaleynik
 
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
steaveroggers
 
Download YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full ActivatedDownload YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full Activated
saniamalik72555
 
Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025
kashifyounis067
 
Meet the Agents: How AI Is Learning to Think, Plan, and Collaborate
Meet the Agents: How AI Is Learning to Think, Plan, and CollaborateMeet the Agents: How AI Is Learning to Think, Plan, and Collaborate
Meet the Agents: How AI Is Learning to Think, Plan, and Collaborate
Maxim Salnikov
 
Exploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the FutureExploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the Future
ICS
 
Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)
Allon Mureinik
 
Kubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptxKubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptx
CloudScouts
 
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Ranjan Baisak
 
WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)
sh607827
 
The Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdfThe Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdf
drewplanas10
 
Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]
saniaaftab72555
 
Maxon CINEMA 4D 2025 Crack FREE Download LINK
Maxon CINEMA 4D 2025 Crack FREE Download LINKMaxon CINEMA 4D 2025 Crack FREE Download LINK
Maxon CINEMA 4D 2025 Crack FREE Download LINK
younisnoman75
 
Revolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptxRevolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptx
nidhisingh691197
 
Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025
mu394968
 
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Dele Amefo
 
How to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud PerformanceHow to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud Performance
ThousandEyes
 
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Eric D. Schabell
 
Adobe After Effects Crack FREE FRESH version 2025
Adobe After Effects Crack FREE FRESH version 2025Adobe After Effects Crack FREE FRESH version 2025
Adobe After Effects Crack FREE FRESH version 2025
kashifyounis067
 
Download Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With LatestDownload Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With Latest
tahirabibi60507
 
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
Egor Kaleynik
 
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
steaveroggers
 
Download YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full ActivatedDownload YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full Activated
saniamalik72555
 
Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025
kashifyounis067
 
Meet the Agents: How AI Is Learning to Think, Plan, and Collaborate
Meet the Agents: How AI Is Learning to Think, Plan, and CollaborateMeet the Agents: How AI Is Learning to Think, Plan, and Collaborate
Meet the Agents: How AI Is Learning to Think, Plan, and Collaborate
Maxim Salnikov
 
Exploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the FutureExploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the Future
ICS
 
Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)
Allon Mureinik
 
Kubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptxKubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptx
CloudScouts
 
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Ranjan Baisak
 
WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)
sh607827
 
The Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdfThe Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdf
drewplanas10
 
Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]
saniaaftab72555
 
Maxon CINEMA 4D 2025 Crack FREE Download LINK
Maxon CINEMA 4D 2025 Crack FREE Download LINKMaxon CINEMA 4D 2025 Crack FREE Download LINK
Maxon CINEMA 4D 2025 Crack FREE Download LINK
younisnoman75
 
Revolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptxRevolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptx
nidhisingh691197
 
Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025
mu394968
 
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Dele Amefo
 

Large-scale Web Apps @ Pinterest

  • 1. Large Scale Web Apps @Pinterest (Powered by Apache HBase) May 5, 2014
  • 2. Pinterest is a visual discovery tool for collecting the things you love, and discovering related content along the way. What is Pinterest ?
  • 4. Scale Challenges @scale • 100s of millions of pins/repins per month • Billions of requests per week • Millions of daily active users • Billions of pins • One of the largest discovery tools on the internet
  • 5. Storage stack @Pinterest ! • MySQL • Redis (persistence and for cache) • MemCache (Consistent Hashing) App Tier Manual Sharding Sharding Logic
  • 6. Why HBase ? ! • High Write throughput - Unlike MySQL/B-Tree, writes don’t ever seek on Disk • Seamless integration with Hadoop • Distributed operation - Fault tolerance - Load Balancing - Easily add/remove nodes ! Non-Technical Reasons • Large active community • Large scale online use cases
  • 7. Outline ! • Features powered by HBase • SaaS (Storage as a Service) - MetaStore - HFile Service (Terrapin) • Our HBase setup - optimizing for High availability & Low latency
  • 8. Applications/Features ! • Offline - Analytics - Search Indexing - ETL/Hadoop worklows • Online - Personalized Feeds - Rich Pins - Recommendations ! Why HBase ?
  • 9. Personalized Feeds WHY HBASE ? Write Heavy load due to Pin fanout. Recommended Pins Users I follow
  • 10. Rich Pins WHY HBASE ? Negative Hits with Bloom Filters
  • 11. Recommendations HADOOP 1.0 HBASE + HADOOP 2.0 HADOOP 2.0 WHY HBASE ? Seamless Data Transfer from Hadoop Generate Recommendations DistCP Jobs Serving Cluster
  • 12. SaaS • Large number of feature requests • 1 Cluster per feature • Scaling with organizational growth • Need for “defensive” multi tenant storage • Previous solutions reaching their limits
  • 13. MetaStore I • Key Value store on top of HBase • 1 HBase Table per Feature with salted keys • Pre split tables • Table level rate limiting (online/offline reads/writes) • No Scan support • Simple client API! ! string getValue(string feature, string key, boolean online); void setValue(string feature, string key, string value, boolean online);
  • 14. MetaStore II MetaStore Thrift Server Primary HBase Secondary HBase Clients Master/Master Replication Thrift Salting + Rate Limiting ZooKeeper Issue Gets/Sets Notifications Metastore Config - Rate Limits - Primary Cluster
  • 15. HFile Service (Terrapin) • Solve the Bulk Upload problem • HBase backed solution - Bulk upload + major compact - Major compact to delete old data • Design solution from scratch using mashup of: - HFile - HBase BlockCache - Avoid compactions - Low latency key value lookups ! ! !
  • 16. High Level Architecture I ! Client Library /Service ETL/Batch Jobs Load/Reload HFile Servers ! HFiles on Amazon S3 Key/Value Lookups Multiple HFiles/Server
  • 17. High Level Architecture II • Each HFile server runs 2 processes - Copier: pulls HFiles from S3 to local disk - Supershard: serves multiple HFile shards to client • ZooKeeper - Detecting alive servers - Coordinating loading/swapping of new data - Enabling clients to detect availability of new data • Loader Module (replaces distcp) - Trigger new data copy - Trigger swap through zookeeper - Update ZooKeeper and notify client • Client library understands sharding • Old data deleted by background process !
  • 18. Salient Features • Multi tenancy through namespacing • Pluggable sharding functions - modulus, range & more • HBase Block Cache • Multiple clusters for redundancy • Speculative execution across clusters for low latency ! ! !
  • 19. Setting up for Success • Many online usecases/applications • Optimize for: - Low MTTR - high availability - Low latency (performance) ! !
  • 20. MTTR - I DEADLIVE STALE 20sec 9min 40sec ! • Stale nodes avoided - As candidates for Reads - As candidate replicas for writes - During Lease Recovery • Copying of underreplicated blocks starts when a Node is marked as “Dead” DataNode States
  • 21. MTTR - II Failure Detection Lease Recovery Log Split Recover Regions 30 sec ZooKeeper session timeout HDFS 4721 HDFS 3703 + HDFS 3912 < 2 min ! • Avoid stale nodes at each point of the recovery process • Multi minute timeouts ==> Multi second timeouts
  • 22. Simulate, Simulate, Simulate Simulate “Pull the plug failures” and “tail -f the logs” • kill -9 both datanode and region server - causes connection refused errors • kill -STOP both datanode and region server - causes socket timeouts • Blackhole hosts using iptables - connect timeouts + “No Route to host” - Most representative of AWS failures
  • 23. Performance Configuration tweaks • Small Block Size, 4K-16K • Prefix compression to cache more - when data is in the key, close to 4X reduction for some data sets • Separation of RPC handler threads for reads vs writes • Short circuit local reads • HBase level checksums (HBASE 5074) Hardware • SATA (m1.xl/c1.xl) and SSD (hi1.4xl) • Choose based on limiting factor - Disk space - pick SATA for max GB/$$ - IOPs - pick SSD for max IOPs/$$, clusters with heavy reads or heavy compaction activity
  • 24. Performance (SSDs) HFile Read Performance • Turn off block cache for Data Blocks, reduce GC + heap fragmentation • Keep block cache on for Index Blocks • Increase “dfs.client.read.shortcircuit.streams.cache.size” from 100 to 10,000 (with short circuit reads) • Approx. 3X improvement in read throughput ! Write Performance • WAL contention when client sets AutoFlush=true • HBase 8755
  • 25. In the Pipeline... ! • Building a graph database on HBase • Disaster recovery - snapshot + incremental backup + restore • Off Heap cache - reduce GC overhead and better use of hardware • Read path optimizations
  • 26. And we are Hiring !!