Evaluating NVMe drives for
accelerating HBase
Nicolas Poggi and David Grier
FOSDEM Jan 2017
BSC Data Centric Computing – Rackspace collaboration
Outline
1. Intro on BSC and ALOJA
2. Cluster specs and disk
benchmarks
3. HBase use case with NVMe
1. Read-only workload
• Different strategies
2. Mixed workload
4. Summary
2
Barcelona Supercomputing Center (BSC)
• Spanish national supercomputing center with 22 years of history in:
• Computer Architecture, networking and distributed systems
research
• Based at BarcelonaTech University (UPC)
• Large ongoing life science computational projects
• Prominent body of research activity around Hadoop
• 2008-2013: SLA Adaptive Scheduler, Accelerators, Locality
Awareness, Performance Management. 7+ publications
• 2013-Present: Cost-efficient upcoming Big Data architectures
(ALOJA) 6+ publications
ALOJA: towards cost-effective Big Data
• Research project for automating characterization and
optimization of Big Data deployments
• Open source Benchmarking-to-Insights platform and tools
• Largest Big Data public repository (70,000+ jobs)
• Community collaboration with industry and academia
http://aloja.bsc.es
Big Data Benchmarking · Online Repository · Web / ML Analytics
Motivation and objectives
• Explore use cases where NVMe devices
can speedup Big Data apps
• Poor initial results…
• HBase (this study) based on Intel report
• Measure the possibilities of NVMe devices
• System level benchmarks (FIO, IO Meter)
• WiP status towards tiered-storage for Big Data
• Extend ALOJA into low-level I/O
• Challenge
• benchmark and stress high-end Big Data
clusters
• In a reasonable amount of time (and cost)
First tests:
5
[Chart] Running time of terasort (1TB) under different disks, in seconds (terasort + teragen, lower is better): NVMe 8512, JBOD10+NVMe 8667, JBOD10 8523, JBOD05 9668
Marginal improvement!!!
Cluster and drive specs
All nodes (x5)
Operating System CentOS 7.2
Memory 128GB
CPU Single Octo-core (16 threads)
Disk Config OS 2x600GB SAS RAID1
Network 10Gb/10Gb redundant
Master node (x1, extra nodes for HA not used in these tests)
Disk Config Master storage 4x600GB SAS RAID10
XFS partition
Data Nodes (x4)
NVMe (cache) 1.6TB Intel DC P3608 NVMe SSD (PCIe)
Disk config HDFS data
storage
10x 0.6TB NL-SAS/SATA JBOD
PCIe3 x8, 12Gb/s SAS RAID
SEAGATE ST3600057SS 15K (XFS partition)
1. Intel DC P3608 (current 2015)
• PCI-Express 3.0 x8 lanes
• Capacity: 1.6TB (two drives)
• Seq R/W BW:
• 5000/2000 MB/s, (128k req)
• Random 4k R/W IOPS:
• 850k/150k
• Random 8k R/W IOPS:
• 500k/60k, 8 workers
• Price $10,780
• (online search 02/2017)
2. LSI Nytro WarpDrive 4-400 (old gen 2012)
• PCI-Express 2.0 x8 lanes
• Capacity: 1.6TB (two drives)
• Seq. R/W BW:
• 2000/1000 MB/s (256k req)
• R/W IOPS:
• 185k/120k (8k req)
• Price $4,096
• (online search 02/2017)
• $12,195 MSRP 2012
6
FIO Benchmarks
Objectives:
• Assert vendor specs (BW, IOPS, Latency)
• Seq R/W, Random R/W
• Verify driver/firmware and OS
• Set performance expectations
Commands for reference on the last slides 7
[Chart] Max of bw (MB/s) and max of lat (us) vs. request size (req sizes 512KB to 4MB, Y axis in bytes/s)
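The max-bandwidth figures on the next slide were collected by sweeping these settings; a minimal bash sketch of such a sweep, reusing the fio flags from the reference slides (the target directory, block sizes, io depths and numjobs here are illustrative assumptions):

#!/usr/bin/env bash
dir=/grid/0/fio                      # mount point of the device under test (hypothetical path)
for bs in 64k 256k 1m 4m; do         # request sizes to sweep
  for iodepth in 1 4 16 32; do       # io depths to sweep
    for rw in read write randread randwrite; do
      fio --name=${rw} --directory=${dir} --ioengine=libaio --direct=1 \
          --bs=${bs} --rw=${rw} --iodepth=${iodepth} --numjobs=4 --buffered=0 \
          --size=2gb --runtime=30 --time_based --randrepeat=0 --norandommap \
          --refill_buffers --output-format=json > fio_${rw}_${bs}_${iodepth}.json
    done
  done
done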
FIO results: Max Bandwidth
Higher is better.
Max BW recorded for each device under different settings: req size, io depth, threads. Using libaio. 8
Results:
• Random R/W similar in both
PCIe SSDs
• But not for the SAS
JBOD
• SAS JBOD achieves high BW for both seq R and W
• 10 disks ~2GB/s
• Achieved both PCIe vendors' spec numbers
• Also on IOPS
• Combining the WPD disks only improves write performance
Max bandwidth (MB/s) per disk type:
             New cluster (NVMe)                                        Old cluster (PCIe SSD, old gen)
             NVMe (2x P3608)   SAS JBOD (10d)   SAS disk (1x 15K)      PCIe (WPD 1 disk)   PCIe (WPD 2 disks)
randread     4674.99           409.95           118.07                 1935.24             4165.65
randwrite    2015.27           843.00           249.06                 1140.96             1256.12
read         4964.44           1861.40          198.54                 2033.52             3957.42
write        2006.00           1869.49          204.17                 1201.52             2066.48
FIO results: Latency (smoke test)
Lower is better.
Average latency for req size 64KB and 1 io depth (varying workers). Using libaio. 9
Average latency (µs) by device (64KB req size, 1 io depth):
             New cluster (NVMe)                  Old cluster (PCIe SSD, old gen)
             NVMe (2x P3608)   SAS JBOD (10d)    PCIe (WPD 1 disk)   PCIe (WPD 2 disks)
randread     381.43            5823.50           519.61              250.06
randwrite    389.90            1996.90           1340.35             252.96
read         369.53            294.33            405.65              204.39
write        369.42            280.20            852.03              410.14
Results:
• JBOD has highest latency for
random R/W (as expected)
• But very low for seq
• Combined WPD disks lower
the latency
• Lower than P3608
disks.
Notes:
• Need a more thorough comparison at different settings.
HBase in a nutshell
• Highly scalable Big data key-value store
• On top of Hadoop (HDFS)
• Based on Google’s Bigtable
• Real-time and random access
• Indexed
• Low-latency
• Block cache and Bloom Filters
• Linear, modular scalability
• Automatic sharding of tables and failover
• Strictly consistent reads and writes.
• failover support
• Production ready and battle tested
• Building block of other projects
HBase R/W architecture
10
[Diagram: read/write path through the region server JVM heap. Source: HDP doc]
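As a toy illustration of the indexed, random-access key-value model described above, a minimal HBase shell session (table, column family and row names are made up for the example):

hbase shell <<'EOF'
create 'usertable', 'family'                       # table with one column family
put 'usertable', 'user1', 'family:field0', 'v0'    # write, indexed by row key
get 'usertable', 'user1'                           # low-latency random read (get)
scan 'usertable', {LIMIT => 5}                     # short range scan
EOF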
L2 Bucket Cache (BC) in HBase
Region server (worker) memory with BC
11
• Adds a second “block” storage for HFiles
• Use case: L2 cache and replaces OS buffer
cache
• Does copy-on‐read
• Fixed sized, reserved on startup
• 3 different modes:
• Heap
• Marginal improvement
• Divides mem with the block cache
• Offheap (in RAM)
• Uses Java NIO’s Direct ByteBuffer
• File
• Any local file / device
• Bypasses HDFS
• Saves RAM
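A minimal sketch of how these modes are selected in hbase-site.xml, assuming the HBase 1.x property names hbase.bucketcache.ioengine and hbase.bucketcache.size; the NVMe mount point is a hypothetical path, and the sizes follow the configurations on the next slide:

# File mode on the NVMe device (bypasses HDFS); size is in MB when > 1
cat > bucketcache-nvme.xml <<'EOF'
<property><name>hbase.bucketcache.ioengine</name><value>file:/nvme0/hbase/bucketcache.data</value></property>
<property><name>hbase.bucketcache.size</name><value>256000</value></property>
EOF
# Offheap mode instead: ioengine=offheap, size=32768 (MB),
# plus a matching HBASE_OFFHEAPSIZE in hbase-env.sh.
# Merge the chosen properties inside <configuration> on every region server.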
L2-BucketCache experiments summary
Tested configurations for HBase v1.2.4
1. HBase default (baseline)
2. HBase w/ Bucket cache Offheap
1. Size: 32GB / worker node
3. HBase w/ Bucket cache in RAM disk
1. Size: 32GB / worker node
4. HBase w/ Bucket cache in NVMe disk
1. Size: 250GB / worker node
• All using same Hadoop and HDFS
configuration
• On JBOD (10 SAS disks, /grid/{0..9})
• 1 Replica, short-circuit reads
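For configurations 3 and 4 above, a hedged sketch of the backing storage setup per worker (mount point and exact values are assumptions; the 32GB-RAM runs later use /dev/shm instead):

# RAM disk for the BucketCache (config 3): 32GB tmpfs per worker
mount -t tmpfs -o size=32g tmpfs /mnt/hbase-ramdisk
# hbase-site.xml then points the file ioengine at it (size in MB):
#   hbase.bucketcache.ioengine = file:/mnt/hbase-ramdisk/bucketcache.data
#   hbase.bucketcache.size     = 32768
# Config 4 is the same idea with a path on the NVMe device and size = 256000 (~250GB).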
Experiments
1. Read-only (workload C)
1. RAM at 128GB / node
2. RAM at 32GB / node
3. Clearing buffer cache
2. Full YCSB (workloads A-F)
4. RAM at 128GB / node
5. RAM at 32GB / node
• Payload:
• YCSB 250M records.
• ~2TB HDFS raw
12
Experiment 1: Read-only (gets)
YCSB Workload C: 25M records, 500 threads, ~2TB HDFS
13
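A hedged sketch of the load and Workload C run, assuming YCSB's hbase10 binding and a pre-created 'usertable' with column family 'family'; the mapping of the 250M-record payload and the counts above onto recordcount/operationcount is an assumption:

bin/ycsb load hbase10 -P workloads/workloadc -p table=usertable -p columnfamily=family \
    -p recordcount=250000000 -threads 500 -s > load.log
bin/ycsb run hbase10 -P workloads/workloadc -p table=usertable -p columnfamily=family \
    -p recordcount=250000000 -p operationcount=25000000 -threads 500 -s > run_workloadc.log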
E 1.1: Throughput of the 4 configurations
(128GB RAM)
Higher is better for Ops (Y1), lower for latency (Y2)
(Offheap runs 2 and 3 are the same run)
14
Results:
• Ops/Sec improve with BC
• Offheap 1.6x
• RAMd 1.5x
• NVMe 2.3x
• AVG latency as well
• (req time)
• First run slower in all 4 cases (populates OS cache and BC)
• Baseline and RAMd
only 6% faster after
• NVMe 16%
• 3rd run not faster, cache
already loaded
• Tested onheap config:
• > 8GB failed
• 8GB slower than
baseline
                    Baseline   BucketCache Offheap   BucketCache RAM disk   BucketCache NVMe
WorkloadC_run1      105610     133981                161712                 218342
WorkloadC_run2      111530     175483                171236                 257017
WorkloadC_run3      111422     175483                170889                 253625
Cache Time %        5.5        31                    5.6                    16.2
Speedup (run3)      1          1.57                  1.53                   2.28
Latency µs (run3)   4476       2841                  2917                   1964
[Chart] Throughput (Ops/Sec, Y1) and latency (µs, Y2) of 3 consecutive iterations of Workload C (128GB)
E 1.1 Cluster resource consumption: Baseline
15
CPU % (AVG)
Disk R/W MB/s (SUM)
Mem Usage KB (AVG)
NET R/W Mb/s (SUM)
Notes:
• Java heap and OS buffer cache holds 100% of WL
• Data is read from disks (HDFS) only in first part of
the run,
• then throughput stabilizes (see NET and CPU)
• Resources left free
• Bottleneck in the application and OS path (not shown)
E 1.1 Cluster resource consumption: Bucket Cache strategies
16
Offheap (32GB) Disk R/W · RAM disk (tmpfs 32GB) Disk R/W · NVMe (250GB) Disk R/W
Notes:
• 3 BC strategies faster than baseline
• BC LRU more effective than OS buffer
• Offheap slightly more efficient than RAMd (same size)
• But seems to take longer to fill (differs per node)
• And needs more capacity for the same payload (plus Java heap)
• NVMe can hold the complete WL in the BC
• Reads and writes to MEM not captured by the charts
BC fills on 1st run; WL doesn't fit completely
E 1.2-3
Challenge:
Limit OS buffer cache effect on experiments
1st approach: larger payload. Con: high execution time
2nd: limit available RAM (using the stress tool)
3rd: clear the buffer cache periodically (drop_caches)
17
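Hedged sketches of the 2nd and 3rd approaches (the 10-second interval is from the slides; the amount of pinned memory and the use of one vm worker are assumptions):

# 2nd: pin ~96GB per node with the stress tool so only ~32GB of RAM stays usable
stress --vm 1 --vm-bytes 96g --vm-keep &
# 3rd: drop the OS page/buffer cache every 10 seconds for the whole run (as root)
while true; do sync; echo 3 > /proc/sys/vm/drop_caches; sleep 10; done &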
E1.2: Throughput of the 4 configurations
(32GB RAM)
Higher is better for Ops (Y1), lower for latency (Y2) 18
Results:
• Ops/Sec improve only
with NVMe up to 8X
• RAMd performs
close to baseline
• First run same on baseline
and RAMd
• tmpfs “blocks” as
RAM is needed
• At lower capacity,
external BC shows more
improvement
                    Baseline   BucketCache Offheap   BucketCache RAM disk   BucketCache NVMe
WorkloadC_run1      20578      14715.493             21520                  109488
WorkloadC_run2      20598      16995                 21534                  166588
Speedup (run2)      1          0.83                  1.05                   8.09
Cache Time %        99.9       99.9                  99.9                   48
Latency µs (run2)   24226      29360                 23176                  2993
[Chart] Throughput (Ops/Sec, Y1) and latency (µs, Y2) of 2 consecutive iterations of Workload C (32GB)
E 1.2 Cluster CPU% AVG: Bucket Cache (32GB RAM)
19
Baseline
RAM disk (/dev/shm 8GB)
Offheap (4GB)
NVMe (250GB)
Read disk throughput: 2.4GB/s Read disk throughput: 2.8GB/s
Read disk throughput: 2.5GB/s Read disk throughput: 38GB/s
BC failure
Slowest
E1.3: Throughput of the 4 configurations
(Drop OS buffer cache)
Higher is better for Ops (Y1), lower for latency (Y2).
Dropping cache every 10 secs.
20
Results:
• Ops/Sec improve only
with NVMe up to 9X
• RAMd performs 1.43X
better this time
• First run same on baseline
and RAMd
• But RAMd worked
fine
• Having a larger sized BC
improves performance
over RAMd
                    Baseline   BucketCache Offheap   BucketCache RAM disk   BucketCache NVMe
WorkloadC_run1      22780      30593                 32447                  126306
WorkloadC_run2      22770      30469                 32617                  210976
Speedup (run2)      1          1.34                  1.43                   9.27
Cache Time %        -0.1       -0.1                  0.5                    67
Latency µs (run2)   21924      16375                 15293                  2361
[Chart] Throughput (Ops/Sec, Y1) and latency (µs, Y2) of 2 consecutive iterations of Workload C (drop OS cache)
Experiment 2: All workloads (all but E)
YCSB workloads A-D, F: 25M records, 1000 threads
21
Benchmark suite: The Yahoo! Cloud Serving Benchmark (YCSB)
• Open source specification and kit for comparing NoSQL DBs (since 2010)
• Core workloads:
• A: Update heavy workload
• 50/50 R/W.
• B: Read mostly workload
• 95/5 R/W mix.
• C: Read only
• 100% read.
• D: Read latest workload
• Inserts new records and reads them
• Workload E: Short ranges (Not used, takes too long to run SCAN type)
• Short ranges of records are queried, instead of individual records
• F: Read-modify-write
• read a record, modify it, and write back.
https://github.com/brianfrankcooper/YCSB/wiki/Core-Workloads 22
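A hedged sketch of driving the core workloads back-to-back against the already-loaded table (the hbase10 binding and parameter values are assumptions consistent with the slides):

for wl in a b c d f; do   # workload E (scans) skipped, as above
  bin/ycsb run hbase10 -P workloads/workload${wl} -p table=usertable -p columnfamily=family \
      -p recordcount=250000000 -p operationcount=25000000 -threads 1000 -s > run_${wl}.log
done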
E2.1: Throughput and Speedup ALL
(128GB RAM)
Higher is better 23
Results:
• Datagen same in all
• (write-only)
• Overall: Ops/Sec improve
with BC
• RAMd 14%
• NVMe 37%
• WL D gets higher speedup
with NVMe
• WL F 6% faster on RAMd than on NVMe
• Need to run more
iterations to see max
improvement
Throughput (Ops/Sec) of workloads A-D,F (128GB, 1 iteration):
              Baseline   BucketCache RAM disk   BucketCache NVMe
Datagen       68502      67998                  65933
WL A          77049      83379                  96752
WL B          80966      87788                  115713
WL C          89372      99403                  132738
WL D          136426     171123                 244759
WL F          48699      65223                  62496

Speedup of workloads A-D,F (128GB, 1 iteration):
              Baseline   BucketCache RAM disk   BucketCache NVMe
Speedup Data  1          0.99                   0.96
Speedup A     1          1.08                   1.26
Speedup B     1          1.08                   1.43
Speedup C     1          1.11                   1.49
Speedup D     1          1.25                   1.79
Speedup F     1          1.34                   1.28
Total         1          1.14                   1.37
E2.1: Throughput and Speedup ALL
(32GB RAM)
Higher is better 24
Results:
• Datagen slower with the
RAMd (less OS RAM)
Overall: Ops/Sec improve
with BC
• RAMd 17%
• NVMe 87%
• WL C gets higher speedup
with NVMe
• WL F now faster with
NVMe
• Need to run more
iterations to see max
improvement
Speedup of workloads A-D,F (32GB, 1 iteration):
              Baseline   BucketCache RAM disk   BucketCache NVMe
Speedup Data  1          0.88                   0.98
Speedup A     1          1.02                   1.37
Speedup B     1          1.31                   2.38
Speedup C     1          1.42                   2.65
Speedup D     1          1.1                    1.5
Speedup F     1          1.27                   1.98
Total         1          1.17                   1.81

Throughput (Ops/Sec) of workloads A-D,F (32GB, 1 iteration):
              Baseline   BucketCache RAM disk   BucketCache NVMe
Datagen       68821      60819                  67737
WL A          55682      56702                  76315
WL B          33551      43978                  79895
WL C          30631      43420                  81245
WL D          85725      94631                  128881
WL F          25540      32464                  50568
E2.1: CPU and Disk for ALL (32GB RAM)
25
Baseline CPU %
NVMe CPU%
Baseline Disk R/W
NVMe Disk R/W
High I/O wait
Read disk throughput: 1.8GB/s
Moderate I/O wait (2x less time)
Read disk throughput: up to 25GB/s
E2.1: NET and MEM ALL (32GB RAM)
26
Baseline MEM
NVMe MEM
Baseline NET R/W
NVMe NET R/W
Higher OS cache util. Throughput: 1Gb/s
Lower OS cache util. Throughput: up to 2.5 Gb/s
Summary
Lessons learned, findings, conclusions, references
27
Bucket Cache results recap (medium sized WL)
• Full cluster (128GB RAM / node)
• WL-C up to 2.7x speedup (warm cache)
• Full benchmark (CRUD) from 0.3 to 0.9x speedup (cold cache)
• Limiting resources
• 32GB RAM / node
• WL-C NVMe gets up to 8X improvement (warm cache)
• Other techniques failed/poor results
• Full benchmark between 0.4 and 2.7x speedup (cold cache)
• Drop OS cache WL-C
• Up to 9x with NVMe, only < 0.5x with other techniques (warm cache)
• Latency reduces significantly with cached results
• Onheap BC not recommended
• just give more RAM to BlockCache
28
Open challenges / Lessons learned
• Generating app-level workloads that stress newer HW
• At acceptable time / cost
• Still need to
• Run micro-benchmarks
• Per DEV, node, cluster
• Large working sets
• > RAM (128GB / Node)
• > NMVe (1.6TB / Node)
• OS buffer cache highly effective
• at least with HDFS and HBase
• Still, a RAM app “L2 cache” is able to speed up
• App level LRU more effective
• YCSB Zipfian distribution (popular records)
• The larger the WL, the higher the gains
• Can be simulated by limiting resources or dropping caches effectively
29
[Chart] Running time of terasort (1TB) under different disks, in seconds (terasort + teragen, lower is better): NVMe 8512, JBOD10+NVMe 8667, JBOD10 8523, JBOD05 9668
To conclude…
• NVMe offers significant BW and Latency improvement over SAS/SATA,
but
• JBODs still perform well for seq R/W
• Also cheaper €/TB
• Big Data apps still designed for rotational media (avoid random I/O)
• Full tiered-storage support is missing by Big Data frameworks
• Byte addressable vs. block access
• Research shows improvements
• Need to rely on external tools/file systems
• Alluxio (Tachyon), Triple-H, New file systems (SSDFS), …
• Fast devices speed things up, but caching is still the simplest use case…
30
References
• ALOJA
• http://aloja.bsc.es
• https://github.com/aloja/aloja
• Bucket cache and HBase
• BlockCache (and Bucket Cache 101) http://www.n10k.com/blog/blockcache-101/
• Intel brief on bucket cache:
http://www.intel.com/content/dam/www/public/us/en/documents/solution-briefs/apache-
hbase-block-cache-testing-brief.pdf
• http://www.slideshare.net/larsgeorge/hbase-status-report-hadoop-summit-europe-2014
• HBase performance: http://www.slideshare.net/bijugs/h-base-performance
• Benchmarks
• FIO https://github.com/axboe/fio
• Brian F. Cooper et al. 2010. Benchmarking cloud serving systems with YCSB.
http://dx.doi.org/10.1145/1807128.1807152
31
Thanks, questions?
Follow up / feedback : Nicolas.Poggi@bsc.es
Twitter: @ni_po
Evaluating NVMe drives for accelerating HBase
FIO commands:
• Sequential read
• fio --name=read --directory=${dir} --ioengine=libaio --direct=1 --bs=${bs} --rw=read --iodepth=${iodepth} --numjobs=1 --buffered=0 --size=2gb --runtime=30 --time_based --randrepeat=0 --norandommap --refill_buffers --output-format=json
• Random read
• fio --name=randread --directory=${dir} --ioengine=libaio --direct=1 --bs=${bs} --rw=randread --iodepth=${iodepth} --numjobs=${numjobs} --buffered=0 --size=2gb --runtime=30 --time_based --randrepeat=0 --norandommap --refill_buffers --output-format=json
• Sequential read on raw devices:
• fio --name=read --filename=${dev} --ioengine=libaio --direct=1 --bs=${bs} --rw=read --iodepth=${iodepth} --numjobs=1 --buffered=0 --size=2gb --runtime=30 --time_based --randrepeat=0 --norandommap --refill_buffers --output-format=json
• Sequential write
• fio --name=write --directory=${dir} --ioengine=libaio --direct=1 --bs=${bs} --rw=write --iodepth=${iodepth} --numjobs=1 --buffered=0 --size=2gb --runtime=30 --time_based --randrepeat=0 --norandommap --refill_buffers --output-format=json
• Random write
• fio --name=randwrite --directory=${dir} --ioengine=libaio --direct=1 --bs=${bs} --rw=randwrite --iodepth=${iodepth} --numjobs=${numjobs} --buffered=0 --size=2gb --runtime=30 --time_based --randrepeat=0 --norandommap --refill_buffers --output-format=json
• Random read on raw devices:
• fio --name=randread --filename=${dev} --ioengine=libaio --direct=1 --bs=${bs} --rw=randread --iodepth=${iodepth} --numjobs=${numjobs} --buffered=0 --size=2gb --runtime=30 --time_based --randrepeat=0 --norandommap --refill_buffers --offset_increment=2gb --output-format=json
https://github.com/axboe/fio 33
Ad

More Related Content

What's hot (20)

HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon
 
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon
 
Apache HBase, Accelerated: In-Memory Flush and Compaction
Apache HBase, Accelerated: In-Memory Flush and Compaction Apache HBase, Accelerated: In-Memory Flush and Compaction
Apache HBase, Accelerated: In-Memory Flush and Compaction
HBaseCon
 
Redis on NVMe SSD - Zvika Guz, Samsung
 Redis on NVMe SSD - Zvika Guz, Samsung Redis on NVMe SSD - Zvika Guz, Samsung
Redis on NVMe SSD - Zvika Guz, Samsung
Redis Labs
 
Streaming Replication (Keynote @ PostgreSQL Conference 2009 Japan)
Streaming Replication (Keynote @ PostgreSQL Conference 2009 Japan)Streaming Replication (Keynote @ PostgreSQL Conference 2009 Japan)
Streaming Replication (Keynote @ PostgreSQL Conference 2009 Japan)
Masao Fujii
 
Application Caching: The Hidden Microservice (SAConf)
Application Caching: The Hidden Microservice (SAConf)Application Caching: The Hidden Microservice (SAConf)
Application Caching: The Hidden Microservice (SAConf)
Scott Mansfield
 
Scaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/DayScaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/Day
ScyllaDB
 
Off-heaping the Apache HBase Read Path
Off-heaping the Apache HBase Read Path Off-heaping the Apache HBase Read Path
Off-heaping the Apache HBase Read Path
HBaseCon
 
Performance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networksPerformance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networks
Marian Marinov
 
Automatic Operation Bot for Ceph - You Ji
Automatic Operation Bot for Ceph - You JiAutomatic Operation Bot for Ceph - You Ji
Automatic Operation Bot for Ceph - You Ji
Ceph Community
 
Troubleshooting redis
Troubleshooting redisTroubleshooting redis
Troubleshooting redis
DaeMyung Kang
 
Keynote: Apache HBase at Yahoo! Scale
Keynote: Apache HBase at Yahoo! ScaleKeynote: Apache HBase at Yahoo! Scale
Keynote: Apache HBase at Yahoo! Scale
HBaseCon
 
EVCache & Moneta (GoSF)
EVCache & Moneta (GoSF)EVCache & Moneta (GoSF)
EVCache & Moneta (GoSF)
Scott Mansfield
 
G1: To Infinity and Beyond
G1: To Infinity and BeyondG1: To Infinity and Beyond
G1: To Infinity and Beyond
ScyllaDB
 
Hadoop at Bloomberg:Medium data for the financial industry
Hadoop at Bloomberg:Medium data for the financial industryHadoop at Bloomberg:Medium data for the financial industry
Hadoop at Bloomberg:Medium data for the financial industry
Matthew Hunt
 
Interactive Hadoop via Flash and Memory
Interactive Hadoop via Flash and MemoryInteractive Hadoop via Flash and Memory
Interactive Hadoop via Flash and Memory
Chris Nauroth
 
hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践
HBaseCon
 
State of Gluster Performance
State of Gluster PerformanceState of Gluster Performance
State of Gluster Performance
Gluster.org
 
Basic and Advanced Analysis of Ceph Volume Backend Driver in Cinder - John Haan
Basic and Advanced Analysis of Ceph Volume Backend Driver in Cinder - John HaanBasic and Advanced Analysis of Ceph Volume Backend Driver in Cinder - John Haan
Basic and Advanced Analysis of Ceph Volume Backend Driver in Cinder - John Haan
Ceph Community
 
Ceph Day Beijing - Ceph RDMA Update
Ceph Day Beijing - Ceph RDMA UpdateCeph Day Beijing - Ceph RDMA Update
Ceph Day Beijing - Ceph RDMA Update
Danielle Womboldt
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon
 
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon
 
Apache HBase, Accelerated: In-Memory Flush and Compaction
Apache HBase, Accelerated: In-Memory Flush and Compaction Apache HBase, Accelerated: In-Memory Flush and Compaction
Apache HBase, Accelerated: In-Memory Flush and Compaction
HBaseCon
 
Redis on NVMe SSD - Zvika Guz, Samsung
 Redis on NVMe SSD - Zvika Guz, Samsung Redis on NVMe SSD - Zvika Guz, Samsung
Redis on NVMe SSD - Zvika Guz, Samsung
Redis Labs
 
Streaming Replication (Keynote @ PostgreSQL Conference 2009 Japan)
Streaming Replication (Keynote @ PostgreSQL Conference 2009 Japan)Streaming Replication (Keynote @ PostgreSQL Conference 2009 Japan)
Streaming Replication (Keynote @ PostgreSQL Conference 2009 Japan)
Masao Fujii
 
Application Caching: The Hidden Microservice (SAConf)
Application Caching: The Hidden Microservice (SAConf)Application Caching: The Hidden Microservice (SAConf)
Application Caching: The Hidden Microservice (SAConf)
Scott Mansfield
 
Scaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/DayScaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/Day
ScyllaDB
 
Off-heaping the Apache HBase Read Path
Off-heaping the Apache HBase Read Path Off-heaping the Apache HBase Read Path
Off-heaping the Apache HBase Read Path
HBaseCon
 
Performance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networksPerformance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networks
Marian Marinov
 
Automatic Operation Bot for Ceph - You Ji
Automatic Operation Bot for Ceph - You JiAutomatic Operation Bot for Ceph - You Ji
Automatic Operation Bot for Ceph - You Ji
Ceph Community
 
Troubleshooting redis
Troubleshooting redisTroubleshooting redis
Troubleshooting redis
DaeMyung Kang
 
Keynote: Apache HBase at Yahoo! Scale
Keynote: Apache HBase at Yahoo! ScaleKeynote: Apache HBase at Yahoo! Scale
Keynote: Apache HBase at Yahoo! Scale
HBaseCon
 
G1: To Infinity and Beyond
G1: To Infinity and BeyondG1: To Infinity and Beyond
G1: To Infinity and Beyond
ScyllaDB
 
Hadoop at Bloomberg:Medium data for the financial industry
Hadoop at Bloomberg:Medium data for the financial industryHadoop at Bloomberg:Medium data for the financial industry
Hadoop at Bloomberg:Medium data for the financial industry
Matthew Hunt
 
Interactive Hadoop via Flash and Memory
Interactive Hadoop via Flash and MemoryInteractive Hadoop via Flash and Memory
Interactive Hadoop via Flash and Memory
Chris Nauroth
 
hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践
HBaseCon
 
State of Gluster Performance
State of Gluster PerformanceState of Gluster Performance
State of Gluster Performance
Gluster.org
 
Basic and Advanced Analysis of Ceph Volume Backend Driver in Cinder - John Haan
Basic and Advanced Analysis of Ceph Volume Backend Driver in Cinder - John HaanBasic and Advanced Analysis of Ceph Volume Backend Driver in Cinder - John Haan
Basic and Advanced Analysis of Ceph Volume Backend Driver in Cinder - John Haan
Ceph Community
 
Ceph Day Beijing - Ceph RDMA Update
Ceph Day Beijing - Ceph RDMA UpdateCeph Day Beijing - Ceph RDMA Update
Ceph Day Beijing - Ceph RDMA Update
Danielle Womboldt
 

Viewers also liked (20)

The state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the CloudThe state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the Cloud
Nicolas Poggi
 
Using BigBench to compare Hive and Spark (short version)
Using BigBench to compare Hive and Spark (short version)Using BigBench to compare Hive and Spark (short version)
Using BigBench to compare Hive and Spark (short version)
Nicolas Poggi
 
Engineering Mechanics: Statics Design problem # 5.4 concrete chutw
 Engineering Mechanics: Statics Design  problem  # 5.4  concrete chutw Engineering Mechanics: Statics Design  problem  # 5.4  concrete chutw
Engineering Mechanics: Statics Design problem # 5.4 concrete chutw
kehali Haileselassie
 
스마트큐 발표자료 김재우
스마트큐 발표자료 김재우스마트큐 발표자료 김재우
스마트큐 발표자료 김재우
JaeWoo Kim
 
State-of-the-Art RFT— Meeting the Ferromagnetic Tube Challenge
State-of-the-Art RFT— Meeting the Ferromagnetic Tube ChallengeState-of-the-Art RFT— Meeting the Ferromagnetic Tube Challenge
State-of-the-Art RFT— Meeting the Ferromagnetic Tube Challenge
Eddyfi
 
Bahasa indonesia teks laporan hasil observasi
Bahasa indonesia teks laporan hasil observasiBahasa indonesia teks laporan hasil observasi
Bahasa indonesia teks laporan hasil observasi
Sri Utanti
 
Keeping Pressure Vessels Safe with the Sharck™  Probe
Keeping Pressure Vessels Safe with the Sharck™  ProbeKeeping Pressure Vessels Safe with the Sharck™  Probe
Keeping Pressure Vessels Safe with the Sharck™  Probe
Eddyfi
 
Engineering Mechanics Statics design problem # 5.4 concrete chut by Kehali...
Engineering Mechanics Statics  design problem  # 5.4  concrete chut by Kehali...Engineering Mechanics Statics  design problem  # 5.4  concrete chut by Kehali...
Engineering Mechanics Statics design problem # 5.4 concrete chut by Kehali...
kehali Haileselassie
 
JLL JF 500 Spin Bike
 JLL JF 500 Spin Bike JLL JF 500 Spin Bike
JLL JF 500 Spin Bike
JLL Fitness
 
OBA.BY
OBA.BYOBA.BY
OBA.BY
Obaby2013
 
Inspecting Lead-Clad Pipes with Pulsed Eddy Current (PEC)
Inspecting Lead-Clad Pipes with Pulsed Eddy Current (PEC)Inspecting Lead-Clad Pipes with Pulsed Eddy Current (PEC)
Inspecting Lead-Clad Pipes with Pulsed Eddy Current (PEC)
Eddyfi
 
La salute degli occhi 5 semplici cose da
La salute degli occhi  5 semplici cose daLa salute degli occhi  5 semplici cose da
La salute degli occhi 5 semplici cose da
mindspk101
 
Texture Powerpoint Final
Texture Powerpoint FinalTexture Powerpoint Final
Texture Powerpoint Final
kphan22
 
Alejandra Ortiz bibliography
Alejandra Ortiz bibliographyAlejandra Ortiz bibliography
Alejandra Ortiz bibliography
Alejandrita Ortiz
 
Twisted Tube ® Heat Exchanger Inspection with Eddy Currents
Twisted Tube ® Heat Exchanger Inspection with Eddy CurrentsTwisted Tube ® Heat Exchanger Inspection with Eddy Currents
Twisted Tube ® Heat Exchanger Inspection with Eddy Currents
Eddyfi
 
Bipolar junction transistor characterstics biassing and amplification, lab 9
Bipolar junction transistor characterstics biassing and amplification, lab 9Bipolar junction transistor characterstics biassing and amplification, lab 9
Bipolar junction transistor characterstics biassing and amplification, lab 9
kehali Haileselassie
 
Lab 7 diode with operational amplifiers by kehali b. haileselassie and kou
Lab 7  diode with operational amplifiers by kehali b. haileselassie and kouLab 7  diode with operational amplifiers by kehali b. haileselassie and kou
Lab 7 diode with operational amplifiers by kehali b. haileselassie and kou
kehali Haileselassie
 
The case for Hadoop performance
The case for Hadoop performanceThe case for Hadoop performance
The case for Hadoop performance
Nicolas Poggi
 
Detecting Flaws in Condenser Tubing Welds With the DefHi® Probe
Detecting Flaws in Condenser Tubing Welds With the DefHi® ProbeDetecting Flaws in Condenser Tubing Welds With the DefHi® Probe
Detecting Flaws in Condenser Tubing Welds With the DefHi® Probe
Eddyfi
 
Assigment 6
Assigment 6Assigment 6
Assigment 6
fuzuli41
 
The state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the CloudThe state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the Cloud
Nicolas Poggi
 
Using BigBench to compare Hive and Spark (short version)
Using BigBench to compare Hive and Spark (short version)Using BigBench to compare Hive and Spark (short version)
Using BigBench to compare Hive and Spark (short version)
Nicolas Poggi
 
Engineering Mechanics: Statics Design problem # 5.4 concrete chutw
 Engineering Mechanics: Statics Design  problem  # 5.4  concrete chutw Engineering Mechanics: Statics Design  problem  # 5.4  concrete chutw
Engineering Mechanics: Statics Design problem # 5.4 concrete chutw
kehali Haileselassie
 
스마트큐 발표자료 김재우
스마트큐 발표자료 김재우스마트큐 발표자료 김재우
스마트큐 발표자료 김재우
JaeWoo Kim
 
State-of-the-Art RFT— Meeting the Ferromagnetic Tube Challenge
State-of-the-Art RFT— Meeting the Ferromagnetic Tube ChallengeState-of-the-Art RFT— Meeting the Ferromagnetic Tube Challenge
State-of-the-Art RFT— Meeting the Ferromagnetic Tube Challenge
Eddyfi
 
Bahasa indonesia teks laporan hasil observasi
Bahasa indonesia teks laporan hasil observasiBahasa indonesia teks laporan hasil observasi
Bahasa indonesia teks laporan hasil observasi
Sri Utanti
 
Keeping Pressure Vessels Safe with the Sharck™  Probe
Keeping Pressure Vessels Safe with the Sharck™  ProbeKeeping Pressure Vessels Safe with the Sharck™  Probe
Keeping Pressure Vessels Safe with the Sharck™  Probe
Eddyfi
 
Engineering Mechanics Statics design problem # 5.4 concrete chut by Kehali...
Engineering Mechanics Statics  design problem  # 5.4  concrete chut by Kehali...Engineering Mechanics Statics  design problem  # 5.4  concrete chut by Kehali...
Engineering Mechanics Statics design problem # 5.4 concrete chut by Kehali...
kehali Haileselassie
 
JLL JF 500 Spin Bike
 JLL JF 500 Spin Bike JLL JF 500 Spin Bike
JLL JF 500 Spin Bike
JLL Fitness
 
Inspecting Lead-Clad Pipes with Pulsed Eddy Current (PEC)
Inspecting Lead-Clad Pipes with Pulsed Eddy Current (PEC)Inspecting Lead-Clad Pipes with Pulsed Eddy Current (PEC)
Inspecting Lead-Clad Pipes with Pulsed Eddy Current (PEC)
Eddyfi
 
La salute degli occhi 5 semplici cose da
La salute degli occhi  5 semplici cose daLa salute degli occhi  5 semplici cose da
La salute degli occhi 5 semplici cose da
mindspk101
 
Texture Powerpoint Final
Texture Powerpoint FinalTexture Powerpoint Final
Texture Powerpoint Final
kphan22
 
Alejandra Ortiz bibliography
Alejandra Ortiz bibliographyAlejandra Ortiz bibliography
Alejandra Ortiz bibliography
Alejandrita Ortiz
 
Twisted Tube ® Heat Exchanger Inspection with Eddy Currents
Twisted Tube ® Heat Exchanger Inspection with Eddy CurrentsTwisted Tube ® Heat Exchanger Inspection with Eddy Currents
Twisted Tube ® Heat Exchanger Inspection with Eddy Currents
Eddyfi
 
Bipolar junction transistor characterstics biassing and amplification, lab 9
Bipolar junction transistor characterstics biassing and amplification, lab 9Bipolar junction transistor characterstics biassing and amplification, lab 9
Bipolar junction transistor characterstics biassing and amplification, lab 9
kehali Haileselassie
 
Lab 7 diode with operational amplifiers by kehali b. haileselassie and kou
Lab 7  diode with operational amplifiers by kehali b. haileselassie and kouLab 7  diode with operational amplifiers by kehali b. haileselassie and kou
Lab 7 diode with operational amplifiers by kehali b. haileselassie and kou
kehali Haileselassie
 
The case for Hadoop performance
The case for Hadoop performanceThe case for Hadoop performance
The case for Hadoop performance
Nicolas Poggi
 
Detecting Flaws in Condenser Tubing Welds With the DefHi® Probe
Detecting Flaws in Condenser Tubing Welds With the DefHi® ProbeDetecting Flaws in Condenser Tubing Welds With the DefHi® Probe
Detecting Flaws in Condenser Tubing Welds With the DefHi® Probe
Eddyfi
 
Assigment 6
Assigment 6Assigment 6
Assigment 6
fuzuli41
 
Ad

Similar to Accelerating HBase with NVMe and Bucket Cache (20)

Accelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheAccelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cache
David Grier
 
The state of Hive and Spark in the Cloud (July 2017)
The state of Hive and Spark in the Cloud (July 2017)The state of Hive and Spark in the Cloud (July 2017)
The state of Hive and Spark in the Cloud (July 2017)
Nicolas Poggi
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
Chester Chen
 
LUG 2014
LUG 2014LUG 2014
LUG 2014
Hitoshi Sato
 
CLFS 2010
CLFS 2010CLFS 2010
CLFS 2010
bergwolf
 
Your 1st Ceph cluster
Your 1st Ceph clusterYour 1st Ceph cluster
Your 1st Ceph cluster
Mirantis
 
QCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitectureQCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference Architecture
Patrick McGarry
 
QCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitectureQCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference Architecture
Ceph Community
 
Memory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and VirtualizationMemory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and Virtualization
Bigstep
 
Proving out flash storage array performance using swingbench and slob
Proving out flash storage array performance using swingbench and slobProving out flash storage array performance using swingbench and slob
Proving out flash storage array performance using swingbench and slob
Kapil Goyal
 
FlashSQL 소개 & TechTalk
FlashSQL 소개 & TechTalkFlashSQL 소개 & TechTalk
FlashSQL 소개 & TechTalk
I Goo Lee
 
In-memory Data Management Trends & Techniques
In-memory Data Management Trends & TechniquesIn-memory Data Management Trends & Techniques
In-memory Data Management Trends & Techniques
Hazelcast
 
Logs @ OVHcloud
Logs @ OVHcloudLogs @ OVHcloud
Logs @ OVHcloud
OVHcloud
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great Taste
DataWorks Summit
 
Storage Spaces Direct - the new Microsoft SDS star - Carsten Rachfahl
Storage Spaces Direct - the new Microsoft SDS star - Carsten RachfahlStorage Spaces Direct - the new Microsoft SDS star - Carsten Rachfahl
Storage Spaces Direct - the new Microsoft SDS star - Carsten Rachfahl
ITCamp
 
Red Hat Storage Server Administration Deep Dive
Red Hat Storage Server Administration Deep DiveRed Hat Storage Server Administration Deep Dive
Red Hat Storage Server Administration Deep Dive
Red_Hat_Storage
 
Ceph
CephCeph
Ceph
Hien Nguyen Van
 
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Odinot Stanislas
 
Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B...
Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B...Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B...
Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B...
DataStax Academy
 
August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation
Yahoo Developer Network
 
Accelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheAccelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cache
David Grier
 
The state of Hive and Spark in the Cloud (July 2017)
The state of Hive and Spark in the Cloud (July 2017)The state of Hive and Spark in the Cloud (July 2017)
The state of Hive and Spark in the Cloud (July 2017)
Nicolas Poggi
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
Chester Chen
 
Your 1st Ceph cluster
Your 1st Ceph clusterYour 1st Ceph cluster
Your 1st Ceph cluster
Mirantis
 
QCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitectureQCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference Architecture
Patrick McGarry
 
QCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitectureQCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference Architecture
Ceph Community
 
Memory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and VirtualizationMemory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and Virtualization
Bigstep
 
Proving out flash storage array performance using swingbench and slob
Proving out flash storage array performance using swingbench and slobProving out flash storage array performance using swingbench and slob
Proving out flash storage array performance using swingbench and slob
Kapil Goyal
 
FlashSQL 소개 & TechTalk
FlashSQL 소개 & TechTalkFlashSQL 소개 & TechTalk
FlashSQL 소개 & TechTalk
I Goo Lee
 
In-memory Data Management Trends & Techniques
In-memory Data Management Trends & TechniquesIn-memory Data Management Trends & Techniques
In-memory Data Management Trends & Techniques
Hazelcast
 
Logs @ OVHcloud
Logs @ OVHcloudLogs @ OVHcloud
Logs @ OVHcloud
OVHcloud
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great Taste
DataWorks Summit
 
Storage Spaces Direct - the new Microsoft SDS star - Carsten Rachfahl
Storage Spaces Direct - the new Microsoft SDS star - Carsten RachfahlStorage Spaces Direct - the new Microsoft SDS star - Carsten Rachfahl
Storage Spaces Direct - the new Microsoft SDS star - Carsten Rachfahl
ITCamp
 
Red Hat Storage Server Administration Deep Dive
Red Hat Storage Server Administration Deep DiveRed Hat Storage Server Administration Deep Dive
Red Hat Storage Server Administration Deep Dive
Red_Hat_Storage
 
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Odinot Stanislas
 
Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B...
Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B...Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B...
Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B...
DataStax Academy
 
August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation
Yahoo Developer Network
 
Ad

More from Nicolas Poggi (8)

Benchmarking Elastic Cloud Big Data Services under SLA Constraints
Benchmarking Elastic Cloud Big Data Services under SLA ConstraintsBenchmarking Elastic Cloud Big Data Services under SLA Constraints
Benchmarking Elastic Cloud Big Data Services under SLA Constraints
Nicolas Poggi
 
Correctness and Performance of Apache Spark SQL
Correctness and Performance of Apache Spark SQLCorrectness and Performance of Apache Spark SQL
Correctness and Performance of Apache Spark SQL
Nicolas Poggi
 
State of Spark in the cloud (Spark Summit EU 2017)
State of Spark in the cloud (Spark Summit EU 2017)State of Spark in the cloud (Spark Summit EU 2017)
State of Spark in the cloud (Spark Summit EU 2017)
Nicolas Poggi
 
The state of Spark in the cloud
The state of Spark in the cloudThe state of Spark in the cloud
The state of Spark in the cloud
Nicolas Poggi
 
Using BigBench to compare Hive and Spark (Long version)
Using BigBench to compare Hive and Spark (Long version)Using BigBench to compare Hive and Spark (Long version)
Using BigBench to compare Hive and Spark (Long version)
Nicolas Poggi
 
sudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJAsudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJA
Nicolas Poggi
 
Benchmarking Hadoop and Big Data
Benchmarking Hadoop and Big DataBenchmarking Hadoop and Big Data
Benchmarking Hadoop and Big Data
Nicolas Poggi
 
Vagrant + Docker provider [+Puppet]
Vagrant + Docker provider [+Puppet]Vagrant + Docker provider [+Puppet]
Vagrant + Docker provider [+Puppet]
Nicolas Poggi
 
Benchmarking Elastic Cloud Big Data Services under SLA Constraints
Benchmarking Elastic Cloud Big Data Services under SLA ConstraintsBenchmarking Elastic Cloud Big Data Services under SLA Constraints
Benchmarking Elastic Cloud Big Data Services under SLA Constraints
Nicolas Poggi
 
Correctness and Performance of Apache Spark SQL
Correctness and Performance of Apache Spark SQLCorrectness and Performance of Apache Spark SQL
Correctness and Performance of Apache Spark SQL
Nicolas Poggi
 
State of Spark in the cloud (Spark Summit EU 2017)
State of Spark in the cloud (Spark Summit EU 2017)State of Spark in the cloud (Spark Summit EU 2017)
State of Spark in the cloud (Spark Summit EU 2017)
Nicolas Poggi
 
The state of Spark in the cloud
The state of Spark in the cloudThe state of Spark in the cloud
The state of Spark in the cloud
Nicolas Poggi
 
Using BigBench to compare Hive and Spark (Long version)
Using BigBench to compare Hive and Spark (Long version)Using BigBench to compare Hive and Spark (Long version)
Using BigBench to compare Hive and Spark (Long version)
Nicolas Poggi
 
sudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJAsudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJA
Nicolas Poggi
 
Benchmarking Hadoop and Big Data
Benchmarking Hadoop and Big DataBenchmarking Hadoop and Big Data
Benchmarking Hadoop and Big Data
Nicolas Poggi
 
Vagrant + Docker provider [+Puppet]
Vagrant + Docker provider [+Puppet]Vagrant + Docker provider [+Puppet]
Vagrant + Docker provider [+Puppet]
Nicolas Poggi
 

Recently uploaded (20)

Minions Want to eat presentacion muy linda
Minions Want to eat presentacion muy lindaMinions Want to eat presentacion muy linda
Minions Want to eat presentacion muy linda
CarlaAndradesSoler1
 
Simple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptxSimple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptx
ssuser2aa19f
 
chapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptxchapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptx
justinebandajbn
 
Principles of information security Chapter 5.ppt
Principles of information security Chapter 5.pptPrinciples of information security Chapter 5.ppt
Principles of information security Chapter 5.ppt
EstherBaguma
 
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
Molecular methods diagnostic and monitoring of infection  -  Repaired.pptxMolecular methods diagnostic and monitoring of infection  -  Repaired.pptx
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
7tzn7x5kky
 
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.pptJust-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
ssuser5f8f49
 
chapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.pptchapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.ppt
justinebandajbn
 
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your CompetitorsAI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
Contify
 
Conic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptxConic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptx
taiwanesechetan
 
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
Simran112433
 
Developing Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response ApplicationsDeveloping Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response Applications
VICTOR MAESTRE RAMIREZ
 
computer organization and assembly language.docx
computer organization and assembly language.docxcomputer organization and assembly language.docx
computer organization and assembly language.docx
alisoftwareengineer1
 
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnTemplate_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
cegiver630
 
Flip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptxFlip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptx
mubashirkhan45461
 
C++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptxC++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptx
aquibnoor22079
 
Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..
yuvarajreddy2002
 
Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...
Pixellion
 
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Abodahab
 
VKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptxVKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptx
Vinod Srivastava
 
How to join illuminati Agent in uganda call+256776963507/0741506136
How to join illuminati Agent in uganda call+256776963507/0741506136How to join illuminati Agent in uganda call+256776963507/0741506136
How to join illuminati Agent in uganda call+256776963507/0741506136
illuminati Agent uganda call+256776963507/0741506136
 
Minions Want to eat presentacion muy linda
Minions Want to eat presentacion muy lindaMinions Want to eat presentacion muy linda
Minions Want to eat presentacion muy linda
CarlaAndradesSoler1
 
Simple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptxSimple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptx
ssuser2aa19f
 
chapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptxchapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptx
justinebandajbn
 
Principles of information security Chapter 5.ppt
Principles of information security Chapter 5.pptPrinciples of information security Chapter 5.ppt
Principles of information security Chapter 5.ppt
EstherBaguma
 
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
Molecular methods diagnostic and monitoring of infection  -  Repaired.pptxMolecular methods diagnostic and monitoring of infection  -  Repaired.pptx
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
7tzn7x5kky
 
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.pptJust-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
ssuser5f8f49
 
chapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.pptchapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.ppt
justinebandajbn
 
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your CompetitorsAI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
Contify
 
Conic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptxConic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptx
taiwanesechetan
 
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
Simran112433
 
Developing Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response ApplicationsDeveloping Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response Applications
VICTOR MAESTRE RAMIREZ
 
computer organization and assembly language.docx
computer organization and assembly language.docxcomputer organization and assembly language.docx
computer organization and assembly language.docx
alisoftwareengineer1
 
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnTemplate_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
cegiver630
 
Flip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptxFlip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptx
mubashirkhan45461
 
C++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptxC++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptx
aquibnoor22079
 
Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..
yuvarajreddy2002
 
Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...
Pixellion
 
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Abodahab
 
VKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptxVKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptx
Vinod Srivastava
 

Accelerating HBase with NVMe and Bucket Cache

  • 8. FIO results: Max Bandwidth Higher is better. MaxBW recorded for each deviceunder differentsettings: req size, io depth, threads. Using libaio. 8 Results: • Random R/W similar in both PCIe SSDs • But not for the SAS JBOD • SAS JBOD achieves high both seq R/W • 10 disks 2GB/s • Achieved both PCIe vendor numbers • Also on IOPS • Combined WPD disks only improve in W performance Intel NVMe SAS (15KRPM) 10 and 1 disk(s) PCIe SSD (old gen) 1 and 2 disks NVMe (2 disks P3608) SAS JBOD (10d) SAS disk (1 disk 15K) PCIe (WPD 1 disk) PCIe (WPD 2 disks) New cluster (NVMe) Old Cluster randread 4674.99 409.95 118.07 1935.24 4165.65 randwrite 2015.27 843 249.06 1140.96 1256.12 read 4964.44 1861.4 198.54 2033.52 3957.42 write 2006 1869.49 204.17 1201.52 2066.48 0 1000 2000 3000 4000 5000 6000 MB/s Max bandwidth (MB/s) per disk type
  • 9. FIO results: Latency (smoke test)
    Lower is better. Average latency for a 64KB request size at io depth 1 (varying workers), using libaio.
    Average latency by device (µs):
      Device                 randread   randwrite   read     write
      NVMe (2 disks P3608)   381.43     389.9       369.53   369.42
      SAS JBOD (10d)         5823.5     1996.9      294.33   280.2
      PCIe (WPD 1 disk)      519.61     1340.35     405.65   852.03
      PCIe (WPD 2 disks)     250.06     252.96      204.39   410.14
    Results:
    • The SAS JBOD has the highest latency for random R/W (as expected), but very low latency for sequential R/W
    • Combining the two WPD disks lowers latency, even below that of the P3608 disks
    Notes:
    • A more thorough comparison at different settings is still needed
  • 10. HBase in a nutshell
    • Highly scalable Big Data key-value store on top of Hadoop (HDFS), based on Google's Bigtable
    • Real-time, random access: indexed, low latency, with a block cache and Bloom filters
    • Linear, modular scalability; automatic sharding of tables
    • Strictly consistent reads and writes; failover support
    • Production ready and battle tested; a building block of other projects
    [Diagram: HBase R/W architecture and JVM heap — source: HDP documentation]
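    As a quick illustration of the key-value, random-access model described above, a minimal HBase shell session could look like the sketch below (the table name and column family are hypothetical examples, not the schema used in this study):

      # Minimal sketch: create a table, write one cell, and read it back.
      # 'usertable' and column family 'f' are illustrative names only.
      printf '%s\n' \
        "create 'usertable', 'f'" \
        "put 'usertable', 'row1', 'f:field0', 'value0'" \
        "get 'usertable', 'row1'" \
        "scan 'usertable', {LIMIT => 5}" \
        | hbase shell    # get = single-row random read; scan = short range read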
  • 11. L2 Bucket Cache (BC) in HBase
    • Adds a second "block" storage tier for HFiles
    • Use case: acts as an L2 cache and replaces the OS buffer cache
    • Does copy-on-read
    • Fixed size, reserved at startup
    • 3 different modes (see the configuration sketch below):
      • Heap: marginal improvement; divides memory with the block cache
      • Offheap (in RAM): uses Java NIO's Direct ByteBuffer
      • File: any local file / device; bypasses HDFS and saves RAM
    [Diagram: region server (worker) memory layout with the bucket cache]
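    For reference, the three modes map to a handful of HBase 1.x properties. A minimal sketch follows; the sizes, file path and direct-memory value are placeholders to be adapted and merged into hbase-site.xml / hbase-env.sh, not the exact settings used in this deck:

      # Sketch: enable the L2 bucket cache (choose ONE hbase.bucketcache.ioengine mode).
      # Sizes and paths below are illustrative assumptions.
      printf '%s\n' \
        '<property><name>hbase.bucketcache.ioengine</name><value>offheap</value></property>' \
        '<!-- file-mode alternative, e.g. on an NVMe or tmpfs mount: -->' \
        '<!-- <property><name>hbase.bucketcache.ioengine</name><value>file:/nvme/bucketcache.data</value></property> -->' \
        '<!-- in HBase 1.x, values <= 1.0 are a heap fraction, larger values are MB -->' \
        '<property><name>hbase.bucketcache.size</name><value>32768</value></property>' \
        >> hbase-site.xml.bucketcache-fragment
      # offheap mode also needs direct-memory headroom for the region server JVM:
      echo 'export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:MaxDirectMemorySize=40g"' >> hbase-env.sh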
  • 12. L2-BucketCache experiments summary
    Tested configurations for HBase v1.2.4:
    1. HBase default (baseline)
    2. HBase w/ bucket cache offheap — size: 32GB per worker node
    3. HBase w/ bucket cache in a RAM disk — size: 32GB per worker node
    4. HBase w/ bucket cache on an NVMe disk — size: 250GB per worker node
    • All use the same Hadoop and HDFS configuration
    • On the JBOD (10 SAS disks, /grid/{0,9}), 1 replica, short-circuit reads
    Experiments:
    1. Read-only (workload C)
       1. RAM at 128GB / node
       2. RAM at 32GB / node
       3. Clearing the buffer cache
    2. Full YCSB (workloads A-F)
       4. RAM at 128GB / node
       5. RAM at 32GB / node
    • Payload: YCSB 250M records, ~2TB HDFS raw
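    The RAM-disk and NVMe variants both rely on the file ioengine; a per-node setup along the following lines would match the sizes listed above (mount points and file names are assumptions, not the paths actually used):

      # Sketch of the storage behind the file-based bucket cache variants (illustrative paths).
      # RAM-disk variant: a 32GB tmpfs, then hbase.bucketcache.ioengine=file:/mnt/ramdisk/bucketcache.data
      sudo mkdir -p /mnt/ramdisk
      sudo mount -t tmpfs -o size=32g tmpfs /mnt/ramdisk
      # NVMe variant: put the cache file on the NVMe file system and size the cache at ~250GB
      # (hbase.bucketcache.size=256000 MB), i.e. ioengine=file:/nvme/hbase-bucketcache/bucketcache.data
      sudo mkdir -p /nvme/hbase-bucketcache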
  • 13. Experiment 1: Read-only (gets)
    YCSB workload C: 25M records, 500 threads, ~2TB HDFS
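    A YCSB invocation along these lines would drive this read-only experiment. The hbase10 binding name, table/column-family properties, and the record and operation counts follow standard YCSB usage and are assumptions about the exact commands used here:

      # Sketch: load the data set once, then run read-only workload C three times.
      cd /opt/ycsb    # illustrative install path
      # one-time load phase (inserts the records)
      bin/ycsb load hbase10 -P workloads/workloadc \
        -p table=usertable -p columnfamily=family \
        -p recordcount=250000000 -threads 500 -s > load.log
      # workload C: 100% reads, repeated 3 times as in experiment 1.1
      for run in 1 2 3; do
        bin/ycsb run hbase10 -P workloads/workloadc \
          -p table=usertable -p columnfamily=family \
          -p recordcount=250000000 -p operationcount=25000000 \
          -threads 500 -s > "workloadC_run${run}.log"
      done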
  • 14. E 1.1: Throughput of the 4 configurations (128GB RAM)
    Higher is better for Ops/sec (Y1), lower is better for latency (Y2). (Offheap run 2 and run 3 were the same run.)
    Throughput of 3 consecutive iterations of workload C (128GB):
                           Baseline   BC Offheap   BC RAM disk   BC NVMe
      WorkloadC_run1       105610     133981       161712        218342
      WorkloadC_run2       111530     175483       171236        257017
      WorkloadC_run3       111422     175483       170889        253625
      Cache time %         5.5        31           5.6           16.2
      Speedup (run3)       1          1.57         1.53          2.28
      Latency µs (run3)    4476       2841         2917          1964
    Results:
    • Ops/sec improve with the bucket cache: offheap 1.6x, RAMd 1.5x, NVMe 2.3x
    • Average latency (request time) improves as well
    • The first run is slower in all 4 cases (it warms the OS cache and the BC)
    • Baseline and RAMd are only 6% faster afterwards; NVMe is 16% faster
    • The 3rd run is not faster: the cache is already loaded
    • Tested onheap configs: > 8GB failed; 8GB was slower than the baseline
  • 15. E 1.1 Cluster resource consumption: Baseline
    [Charts: CPU % (AVG), Disk R/W MB/s (SUM), Mem Usage KB (AVG), NET R/W Mb/s (SUM)]
    Notes:
    • The Java heap and OS buffer cache hold 100% of the working set
    • Data is read from the disks (HDFS) only in the first part of the run; then throughput stabilizes (see NET and CPU)
    • Resources are left free: the bottleneck is in the application and OS path (not shown)
  • 16. E 1.1 Cluster resource consumption: Bucket Cache strategies
    [Charts: Disk R/W for Offheap (32GB), RAM disk (tmpfs 32GB) and NVMe (250GB); the BC fills during the 1st run; the WL doesn't fit completely in the 32GB caches]
    Notes:
    • All 3 BC strategies are faster than the baseline: the BC's LRU is more effective than the OS buffer cache
    • Offheap is slightly more efficient than the RAM disk (same size), but it seems to take longer to fill (differs per node) and needs more capacity for the same payload (plus the Java heap)
    • NVMe can hold the complete working set in the BC
    • Reads and writes to memory are not captured by the charts
  • 17. E 1.2-3 Challenge: limit the OS buffer cache effect on the experiments
    1. First approach: a larger payload. Con: high execution time
    2. Second: limit the available RAM (using the stress tool)
    3. Third: clear the buffer cache periodically (drop caches)
    (See the sketch below for the second and third approaches.)
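    Both of the approaches that were used can be reproduced with standard Linux tooling; a rough sketch, where the worker count, allocation size and interval are assumptions (the deck only states that caches were dropped every 10 seconds):

      # 2nd approach: pin most of the RAM so only ~32GB remains for the page cache
      # (6 workers x 16g on a 128GB node; adjust --vm/--vm-bytes to the node's RAM)
      stress --vm 6 --vm-bytes 16g --vm-keep &

      # 3rd approach: periodically drop clean page/dentry/inode caches during the benchmark
      while true; do
        sync
        sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
        sleep 10    # the deck drops caches every 10 seconds
      done &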
  • 18. E1.2: Throughput of the 4 configurations (32GB RAM)
    Higher is better for Ops/sec (Y1), lower is better for latency (Y2).
    Throughput of 2 consecutive iterations of workload C (32GB):
                           Baseline   BC Offheap   BC RAM disk   BC NVMe
      WorkloadC_run1       20578      14715.493    21520         109488
      WorkloadC_run2       20598      16995        21534         166588
      Speedup (run2)       1          0.83         1.05          8.09
      Cache time %         99.9       99.9         99.9          48
      Latency µs           24226      29360        23176         2993
    Results:
    • Ops/sec improve only with NVMe, up to 8x
    • The RAM disk performs close to the baseline
    • The first run is the same on baseline and RAMd: tmpfs "blocks" as RAM is needed
    • At lower RAM capacity, the external BC shows more improvement
  • 19. E 1.2 Cluster CPU % (AVG): Bucket Cache (32GB RAM)
    [Charts: CPU % for Baseline, RAM disk (/dev/shm 8GB), Offheap (4GB) and NVMe (250GB);
     read disk throughput per panel: 2.4GB/s, 2.8GB/s, 2.5GB/s and 38GB/s;
     annotations: "BC failure", "Slowest"]
  • 20. E1.3: Throughput of the 4 configurations (drop OS buffer cache)
    Higher is better for Ops/sec (Y1), lower is better for latency (Y2). Caches dropped every 10 seconds.
    Throughput of 2 consecutive iterations of workload C (drop OS cache):
                           Baseline   BC Offheap   BC RAM disk   BC NVMe
      WorkloadC_run1       22780      30593        32447         126306
      WorkloadC_run2       22770      30469        32617         210976
      Speedup              1          1.34         1.43          9.27
      Cache time %         -0.1       -0.1         0.5           67
      Latency µs (run2)    21924      16375        15293         2361
    Results:
    • Ops/sec improve only with NVMe, up to 9x
    • The RAM disk performs 1.43x better this time; the first run is the same on baseline and RAMd, but RAMd worked fine here
    • Having a larger bucket cache improves performance over the RAM disk
  • 21. Experiment 2: All workloads (except E)
    YCSB workloads A-D, F: 25M records, 1000 threads
  • 22. Benchmark suite: The Yahoo! Cloud Serving Benchmark (YCSB)
    • Open-source specification and kit for comparing NoSQL databases (since 2010)
    • Core workloads:
      • A: Update-heavy workload — 50/50 R/W mix
      • B: Read-mostly workload — 95/5 R/W mix
      • C: Read-only — 100% reads
      • D: Read-latest workload — inserts new records and reads them
      • E: Short ranges — short ranges of records are queried instead of individual records (not used: the SCAN workload takes too long to run)
      • F: Read-modify-write — read a record, modify it, and write it back
    https://github.com/brianfrankcooper/YCSB/wiki/Core-Workloads
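    To run the core workloads back to back as in experiment 2, a loop over the workload files is enough; the sketch below skips workload E as the deck does (the binding name and table properties are assumptions, as before):

      # Sketch: run YCSB core workloads A-D and F in sequence against HBase,
      # 1000 client threads each as in experiment 2; workload E (scans) is skipped.
      for wl in a b c d f; do
        bin/ycsb run hbase10 -P "workloads/workload${wl}" \
          -p table=usertable -p columnfamily=family \
          -threads 1000 -s > "workload_${wl}.log"
      done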
  • 23. E2.1: Throughput and Speedup, all workloads (128GB RAM)
    Higher is better.
    Throughput of workloads A-D, F (128GB, 1 iteration), Ops/sec:
                 Baseline   BC RAM disk   BC NVMe
      Datagen    68502      67998         65933
      WL A       77049      83379         96752
      WL B       80966      87788         115713
      WL C       89372      99403         132738
      WL D       136426     171123        244759
      WL F       48699      65223         62496
    Speedup of workloads A-D, F (128GB, 1 iteration):
                 Baseline   BC RAM disk   BC NVMe
      Data       1          0.99          0.96
      A          1          1.08          1.26
      B          1          1.08          1.43
      C          1          1.11          1.49
      D          1          1.25          1.79
      F          1          1.34          1.28
      Total      1          1.14          1.37
    Results:
    • Datagen is the same in all configurations (write-only)
    • Overall, Ops/sec improve with the BC: RAMd 14%, NVMe 37%
    • WL D gets the highest speedup with NVMe
    • WL F is 6% faster on the RAM disk than on NVMe
    • More iterations are needed to see the maximum improvement
  • 24. E2.1: Throughput and Speedup, all workloads (32GB RAM)
    Higher is better.
    Throughput of workloads A-D, F (32GB, 1 iteration), Ops/sec:
                 Baseline   BC RAM disk   BC NVMe
      Datagen    68821      60819         67737
      WL A       55682      56702         76315
      WL B       33551      43978         79895
      WL C       30631      43420         81245
      WL D       85725      94631         128881
      WL F       25540      32464         50568
    Speedup of workloads A-D, F (32GB, 1 iteration):
                 Baseline   BC RAM disk   BC NVMe
      Data       1          0.88          0.98
      A          1          1.02          1.37
      B          1          1.31          2.38
      C          1          1.42          2.65
      D          1          1.1           1.5
      F          1          1.27          1.98
      Total      1          1.17          1.81
    Results:
    • Datagen is slower with the RAM disk (less RAM left for the OS)
    • Overall, Ops/sec improve with the BC: RAMd 17%, NVMe 87%
    • WL C gets the highest speedup with NVMe
    • WL F is now faster with NVMe
    • More iterations are needed to see the maximum improvement
  • 25. E2.1: CPU and Disk, all workloads (32GB RAM)
    [Charts: Baseline CPU %, NVMe CPU %, Baseline Disk R/W, NVMe Disk R/W]
    • Baseline: high I/O wait; read disk throughput 1.8GB/s
    • NVMe: moderate I/O wait (and 2x less time); read disk throughput up to 25GB/s
  • 26. E2.1: Network and Memory, all workloads (32GB RAM)
    [Charts: Baseline MEM, NVMe MEM, Baseline NET R/W, NVMe NET R/W]
    • Baseline: higher OS cache utilization; network throughput ~1Gb/s
    • NVMe: lower OS cache utilization; network throughput up to 2.5Gb/s
  • 27. Summary Lessons learned, findings, conclusions, references 27
  • 28. Bucket Cache results recap (medium-sized WL)
    • Full cluster (128GB RAM / node):
      • WL-C: up to 2.7x speedup (warm cache)
      • Full benchmark (CRUD): from 0.3 to 0.9x speedup (cold cache)
    • Limiting resources to 32GB RAM / node:
      • WL-C: NVMe gets up to 8x improvement (warm cache); the other techniques failed or gave poor results
      • Full benchmark: between 0.4 and 2.7x speedup (cold cache)
    • Dropping the OS cache, WL-C:
      • Up to 9x with NVMe, only < 0.5x with the other techniques (warm cache)
    • Latency is reduced significantly with cached results
    • Onheap BC is not recommended: just give more RAM to the BlockCache
  • 29. Open challenges / lessons learned
    • Generating application-level workloads that stress newer hardware, at acceptable time / cost
    • Still need to run micro-benchmarks per device, node and cluster
    • Large working sets are needed: > RAM (128GB / node), > NVMe (1.6TB / node)
    • The OS buffer cache is highly effective, at least with HDFS and HBase
    • Still, a RAM-based application "L2 cache" is able to speed things up: the application-level LRU is more effective (YCSB's Zipfian distribution favors popular records)
    • The larger the WL, the higher the gains; this can be simulated effectively by limiting resources or dropping caches
    Running time of terasort (1TB) under different disks (lower is better, seconds): NVMe 8512, JBOD10+NVMe 8667, JBOD10 8523, JBOD05 9668
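    For reference, the terasort numbers above come from the standard Hadoop example jobs; a 1TB run looks roughly like the sketch below (the jar path and output directories are placeholders):

      # Sketch: generate and sort 1TB with the stock Hadoop examples.
      # teragen writes 100-byte rows, so 10^10 rows ~= 1TB.
      EXAMPLES_JAR=$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples.jar
      hadoop jar "$EXAMPLES_JAR" teragen 10000000000 /benchmarks/teragen-1tb
      hadoop jar "$EXAMPLES_JAR" terasort /benchmarks/teragen-1tb /benchmarks/terasort-1tb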
  • 30. To conclude…
    • NVMe offers significant bandwidth and latency improvements over SAS/SATA, but:
      • JBODs still perform well for sequential R/W, and are cheaper in €/TB
      • Big Data apps are still designed for rotational media (they avoid random I/O)
    • Full tiered-storage support is missing from Big Data frameworks
      • Byte-addressable vs. block access
      • Research shows improvements, but you need to rely on external tools / file systems: Alluxio (Tachyon), Triple-H, new file systems (SSDFS), …
    • Fast devices speed things up, but caching is still the simple use case…
  • 31. References
    • ALOJA: http://aloja.bsc.es and https://github.com/aloja/aloja
    • Bucket cache and HBase:
      • BlockCache (and Bucket Cache) 101: http://www.n10k.com/blog/blockcache-101/
      • Intel brief on the bucket cache: http://www.intel.com/content/dam/www/public/us/en/documents/solution-briefs/apache-hbase-block-cache-testing-brief.pdf
      • HBase status report, Hadoop Summit Europe 2014: http://www.slideshare.net/larsgeorge/hbase-status-report-hadoop-summit-europe-2014
      • HBase performance: http://www.slideshare.net/bijugs/h-base-performance
    • Benchmarks:
      • FIO: https://github.com/axboe/fio
      • Brian F. Cooper et al. 2010. Benchmarking cloud serving systems with YCSB. http://dx.doi.org/10.1145/1807128.1807152
  • 32. Thanks, questions? Follow up / feedback : [email protected] Twitter: @ni_po Evaluating NVMe drives for accelerating HBase
  • 33. FIO commands (https://github.com/axboe/fio)
    • Sequential read:
      fio --name=read --directory=${dir} --ioengine=libaio --direct=1 --bs=${bs} --rw=read --iodepth=${iodepth} --numjobs=1 --buffered=0 --size=2gb --runtime=30 --time_based --randrepeat=0 --norandommap --refill_buffers --output-format=json
    • Random read:
      fio --name=randread --directory=${dir} --ioengine=libaio --direct=1 --bs=${bs} --rw=randread --iodepth=${iodepth} --numjobs=${numjobs} --buffered=0 --size=2gb --runtime=30 --time_based --randrepeat=0 --norandommap --refill_buffers --output-format=json
    • Sequential read on raw devices:
      fio --name=read --filename=${dev} --ioengine=libaio --direct=1 --bs=${bs} --rw=read --iodepth=${iodepth} --numjobs=1 --buffered=0 --size=2gb --runtime=30 --time_based --randrepeat=0 --norandommap --refill_buffers --output-format=json
    • Sequential write:
      fio --name=write --directory=${dir} --ioengine=libaio --direct=1 --bs=${bs} --rw=write --iodepth=${iodepth} --numjobs=1 --buffered=0 --size=2gb --runtime=30 --time_based --randrepeat=0 --norandommap --refill_buffers --output-format=json
    • Random write:
      fio --name=randwrite --directory=${dir} --ioengine=libaio --direct=1 --bs=${bs} --rw=randwrite --iodepth=${iodepth} --numjobs=${numjobs} --buffered=0 --size=2gb --runtime=30 --time_based --randrepeat=0 --norandommap --refill_buffers --output-format=json
    • Random read on raw devices:
      fio --name=randread --filename=${dev} --ioengine=libaio --direct=1 --bs=${bs} --rw=randread --iodepth=${iodepth} --numjobs=${numjobs} --buffered=0 --size=2gb --runtime=30 --time_based --randrepeat=0 --norandommap --refill_buffers --offset_increment=2gb --output-format=json
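    The commands above are parameterized with ${dir}/${dev}, ${bs}, ${iodepth} and ${numjobs}; one plausible way to sweep them is the loop below (the target directory and value ranges are assumptions, chosen to cover the request sizes shown in the bandwidth chart):

      # Sketch: sweep block size, queue depth and worker count over the fio templates above.
      dir=/grid/0/fio    # illustrative target directory on one of the JBOD disks
      for bs in 4k 64k 128k 256k 1m 4m; do
        for iodepth in 1 4 16 32; do
          for numjobs in 1 4 8; do
            fio --name=randread --directory=${dir} --ioengine=libaio --direct=1 \
                --bs=${bs} --rw=randread --iodepth=${iodepth} --numjobs=${numjobs} \
                --buffered=0 --size=2gb --runtime=30 --time_based --randrepeat=0 \
                --norandommap --refill_buffers --output-format=json \
                > "fio_randread_${bs}_${iodepth}_${numjobs}.json"
          done
        done
      done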