Managing Hadoop, HBase and Storm Clusters at Yahoo Scale

PRESENTED BY Dheeraj Kapur, Savitha Ravikrishnan⎪June 30, 2016

Agenda
Topic                                                  Speaker(s)
Introduction, HDFS RU, HBase RU & Storm RU             Dheeraj Kapur
YARN RU, Component RU, Distributed Cache & Sharelib    Savitha Ravikrishnan
Q&A                                                    All Presenters
HadoopSummit 2016

Grid Infrastructure at Yahoo
▪ A multi-tenant, secure, distributed compute and storage environment, based on the Hadoop stack, for large-scale data processing.
▪ 3 data centers, over 45k physical nodes.
▪ 18 YARN (Hadoop) clusters, ranging from 350 to 5200 nodes.
▪ 9 HBase clusters, ranging from 80 to 1080 nodes.
▪ 13 Storm clusters, ranging from 40 to 250 nodes.

Grid Stack
▪ Backend support: ZooKeeper
▪ Hadoop storage: HDFS; HBase as NoSQL store; HCatalog for metadata registry
▪ Hadoop compute: YARN (MapReduce) and Tez for batch processing; Storm for stream processing; Spark for iterative programming
▪ Hadoop services: Pig for ETL; Hive for SQL; Oozie for workflows; proxy services; GDM for data management; Caffe on Spark for ML
▪ Support shop: monitoring; Starling for logging

Deployment Model
[Diagram: deployment model]
▪ Hadoop clusters: NameNode, ResourceManager, DataNode, NodeManager
▪ HBase clusters: NameNode, HBase Master, DataNodes, RegionServers
▪ Storm clusters: Nimbus, Supervisor
▪ Administration, management and monitoring: ZooKeeper pools; HTTP/HDFS/GDM load proxies; Oozie server; HS2/HCat
▪ Applications and data: data feeds, data stores

Hadoop Rolling Upgrade
▪ Complete CI/CD for HDFS and YARN upgrades.
▪ Build software and config "tgz" packages and push them to repo servers.
▪ Install software and configs in the pre-deploy phase; activate them during the upgrade.
▪ Slow upgrade: 1 node per cycle.
▪ Each component is upgraded independently, i.e. HDFS, YARN & Client.
Release Configs/Bundles:
---
doc: This file is auto generated
packages:
  - label: hadoop
    version: 2.7.2.13.1606200235-20160620-000
  - label: conf
    version: 2.7.2.13.1606200235-20160620-000
  - label: gridjdk
    version: 1.7.0_17.1303042057-20160620-000
  - label: yjava_jdk
    version: 1.8.0_60.51-20160620-000
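The "install in pre-deploy, activate during upgrade" pattern can be sketched as below. This is only an illustration under stated assumptions: each pre-deployed package is assumed to sit in a directory named `<label>-<version>`, and activation is assumed to mean repointing a `<label>` symlink. The layout and the `activate` helper are hypothetical and do not describe Yahoo's actual tooling.

```python
import os


def activate(install_root, label, version):
    """Point <install_root>/<label> at the pre-deployed <label>-<version> directory.

    Hypothetical layout: the pre-deploy phase has already unpacked the package.
    """
    target = os.path.join(install_root, f"{label}-{version}")
    if not os.path.isdir(target):
        raise FileNotFoundError(f"{target} was not pre-deployed")
    link = os.path.join(install_root, label)
    tmp_link = link + ".new"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target, tmp_link)
    # Renaming over the old symlink switches versions in one step on POSIX.
    os.replace(tmp_link, link)
    return os.path.realpath(link)
```

Because the expensive download and unpack already happened in pre-deploy, the per-node work during the upgrade window shrinks to stop, switch link, start.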

Package Download (pre-deploy)
[Diagram: Git (release info) and Jenkins start the RU process, which runs ygrid-deploy-software to download packages from the repo farm onto the servers/cluster: NameNode, DataNodes, ResourceManager, HBase Master, RegionServers, Gateways.]

HDFS Upgrade
[Diagram: CI/CD process. Git (release info) and Jenkins start the RU process, which runs the steps below.]
1. Create dir structure
2. Put NN in RU mode
3a. SNN upgrade
3b. NN failover (involves service and IP failover from NN to SNN and vice versa)
3c. SNN upgrade
Foreach DN (Safeupgrade-dn):
4a. Select DN
4b. Check installed version
4c. Stop DN
4d. Activate new software
4e. Start DN
4f. Wait for DN to join
▪ Stop/terminate RU on X failures: terminate the upgrade in case of more than X failures.
▪ After 100 hosts are successfully upgraded, check HDFS used percentage and live-node consistency on the NNs.
▪ Finalize RU
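The DataNode loop in the HDFS upgrade flow (one node per cycle, a consistency check after every 100 successful hosts, termination after more than X failures) can be sketched roughly as follows. The function and its callables `upgrade_host` and `cluster_healthy` are hypothetical stand-ins for the real per-node steps and NameNode checks, which the slides do not show.

```python
def rolling_upgrade(hosts, upgrade_host, cluster_healthy, max_failures, check_every=100):
    """Upgrade hosts one at a time; stop early on too many failures or a bad health check.

    upgrade_host(host) -> bool and cluster_healthy() -> bool are assumed callables.
    """
    upgraded, failed = [], []
    for host in hosts:
        try:
            ok = upgrade_host(host)
        except Exception:
            ok = False
        if ok:
            upgraded.append(host)
            # Periodic cluster-wide check, e.g. used percentage and live nodes.
            if len(upgraded) % check_every == 0 and not cluster_healthy():
                return {"status": "terminated", "reason": "health check failed",
                        "upgraded": upgraded, "failed": failed}
        else:
            failed.append(host)
            if len(failed) > max_failures:
                return {"status": "terminated", "reason": "too many failures",
                        "upgraded": upgraded, "failed": failed}
    return {"status": "completed", "reason": None,
            "upgraded": upgraded, "failed": failed}
```

Returning the lists of upgraded and failed hosts makes it possible to resume a terminated run from where it stopped.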

Hadoop 2.7.x improvements over 2.6.x
Performance
▪ Reduced NN failover time by parallelizing the quota initialization.
▪ Fixed the datanode layout inefficiency that caused high I/O load.
▪ Use an offline upgrade script to speed up the layout upgrade.
▪ Added a fake metrics sink to work around the JMX cache fix, which was causing delays in datanode upgrade/health checks.
▪ Improved datanode shutdown speed.
Failure handling
▪ Reduced read/write failures by blocking clients until the DN is fully initialized.

YARN Rolling Upgrade
▪ Minimize downtime, maximize service availability.
▪ Work-preserving restart on RM and NM.
▪ Retains state for 10 minutes.
▪ Ensures that applications keep running during an RM restart.
▪ Save state, update software, restart and restore state.
▪ Uses LevelDB as the state store.
▪ After the RM restarts, it loads all the application metadata and other credentials from the state store and populates them into memory.
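For orientation, the YARN properties that usually govern work-preserving restart with a LevelDB state store are listed in the sketch below, rendered as a yarn-site.xml fragment. The property names come from the Apache Hadoop documentation; the values, and in particular the two placeholder paths, are assumptions for illustration, since the deck does not show the settings actually used.

```python
import xml.etree.ElementTree as ET

# Property names are from the Apache Hadoop YARN docs; values are illustrative.
RECOVERY_PROPERTIES = {
    "yarn.resourcemanager.recovery.enabled": "true",
    "yarn.resourcemanager.work-preserving-recovery.enabled": "true",
    "yarn.resourcemanager.store.class":
        "org.apache.hadoop.yarn.server.resourcemanager.recovery.LeveldbRMStateStore",
    "yarn.resourcemanager.leveldb-state-store.path": "/path/to/rm-state",  # placeholder
    "yarn.nodemanager.recovery.enabled": "true",
    "yarn.nodemanager.recovery.dir": "/path/to/nm-recovery",  # placeholder
}


def render_site_xml(properties):
    """Render a dict of name/value pairs in Hadoop's <configuration> format."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_site_xml(RECOVERY_PROPERTIES))
```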

YARN Upgrade
[Diagram: CI/CD process. Git (release info) and Jenkins start the RU process. Step labels on the slide run from 1 to 5, with sub-steps 2a to 2e.]
▪ Create dir structure
▪ ResourceManager upgrade
▪ HistoryServer upgrade
▪ Foreach NM: select NM; check installed version; safestop NM (kill -9); activate new software; start NM; wait for NM to join
▪ Stop/terminate RU on X failures: terminate the upgrade in case of more than X failures
▪ Timeline Server upgrade

Distributed Cache
▪ Distributed cache distributes application-specific, large, read-only files efficiently.
▪ Applications specify the files to be cached as URLs (hdfs://) in the job.
▪ DistributedCache tracks the modification timestamps of the cached files.
▪ DistributedCache can be used to distribute simple, read-only data or text files and more complex
types such as archives and JAR files.
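Why tracking modification timestamps matters can be shown with a toy cache: a local copy is reused only while the timestamp of the source file is unchanged. This is a conceptual sketch, not Hadoop's DistributedCache implementation; the `TimestampedCache` class and its `fetch` callable are hypothetical.

```python
class TimestampedCache:
    """Reuse a localized file only while its source modification time is unchanged."""

    def __init__(self, fetch):
        self._fetch = fetch    # assumed callable: url -> local copy
        self._entries = {}     # url -> (mtime, local copy)

    def get(self, url, mtime):
        cached = self._entries.get(url)
        if cached is None or cached[0] != mtime:
            # First use, or the source changed since it was localized.
            self._entries[url] = (mtime, self._fetch(url))
        return self._entries[url][1]
```

With this shape, many tasks on one node that ask for the same URL and timestamp trigger a single download.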

Sharelib
▪ "Sharelib" is a management system for a directory in HDFS named /sharelib, which exists on every
cluster.
▪ Shared libraries can simplify the deployment and management of applications.
▪ The target directory is /sharelib, under which you will find:
• /sharelib/v1 - where all the packages are
• /sharelib/v1/conf - where the unique metafile for the cluster is (and all previous versions)
• /sharelib/v1/{tez, pig, ... } - where the package versions are kept
▪ The links/tags (metafile) are unique per cluster.
▪ Grid Ops maintains the shared libraries on the HDFS of each cluster.
▪ Packages in the shared libraries include mapreduce, pig, hbase, hcatalog, hive and oozie.

Sharelib Update
[Diagram: Git bundles and Jenkins start the Sharelib Uploader, which runs: verify dist cache; download to-do packages from the dist repo; re-package and upload package; re-generate meta info (HDFS); upload to Oozie; generate clients to update.]

Component Upgrade
▪ New releases: the CI environment continuously releases certified builds and their versions.
▪ Generate state: package rulesets contain the list of core packages and their dependencies for each cluster.
▪ Deploy cookbooks: contain Chef code and configuration that is pushed to the Chef server.
▪ Deploy pipelines: YAML files that specify the flow and order of the deploy for every environment/cluster.
▪ Validation jobs: run after a deploy completes on all the nodes, to ensure end-to-end functionality works as expected.

Components Upgrade
[Diagram: CI process, three numbered stages producing outputs A and B]
▪ 1. New Release: component versions, Git bundles, certified releases
▪ 2. Package Rulesets: rule set files (cluster: component specific) and Git bundles with certified package version info; build farms run Ruby (Rake) for rspec, rubocop, state generate, compare & upload; statefiles go to Artifactory
▪ 3. Deploy cookbooks: cookbook, roles, env and attribute files from Git (release info); build farms validate and increment the version; result goes to Chef

Components Upgrade cont.
[Diagram: CD process, continuing from outputs A and B (statefiles, Chef), with Git (release info) and build farms]
▪ 4. Deploy Pipeline, Ruby (Rake): min size, zero-downtime check, target size, validate
▪ Component node: chef-client, cookbook converge, graceful shutdown and health check

HBase Rolling Upgrade
Release Configs:
default:
  group: 'all'
  command: 'start'
  system: 'ALL'
  verbose: 'true'
  retry: 3
  upgradeREST: 'false'
  upgradeGateway: 'true'
  dryrun: 'false'
  force: 'false'
  upgrade_type: 'rolling'
  skip_nn_upgrade: 'false'
  skip_master_upgrade: 'false'
Workflow definitions:
default:
  continue_on_failure:
    - broken
    - badnodes
relux.red:
  - master
  - default
  - user
  - ca_soln-stage
  - perf,perf2,projects
  - restALL
▪ Workflow-based system.
▪ Complete CI/CD for HDFS and HBase upgrades.
▪ Build tgz and push to repo servers.
▪ Install software beforehand; activate the new release during the upgrade.
▪ Each component and region group is upgraded independently, i.e. HDFS, then each group of regionservers.

HBase Upgrade
[Diagram: CI/CD process. Git (release info) and Jenkins start the upgrade; the repo server provides the package + conf version.]
1. Put NN in RU mode & upgrade NN, SNN (HDFS rolling upgrade process)
2. Master upgrade
3. Regionserver upgrade process: iterate over each group, then over each server in a group. Foreach DN/RS (sub-steps 3a to 3f): stop regionserver; DN safeupgrade, stop DN; upgrade and start DN; upgrade and start RS
4. Stargate upgrade
5. Gateway upgrade
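The "iterate over each group, then over each server in a group" flow can be sketched as below. Several points are assumptions made for illustration: that a workflow entry such as `perf,perf2,projects` names several groups, that groups listed under `continue_on_failure` tolerate failed servers, and that any other failure stops the run. `upgrade_by_group` and `upgrade_server` are hypothetical names.

```python
def upgrade_by_group(workflow, members, upgrade_server, continue_on_failure=()):
    """Walk region groups in workflow order, upgrading one server at a time.

    workflow: ordered list of group entries (comma-separated names allowed).
    members: dict mapping group name -> list of servers.
    upgrade_server(server) -> bool is an assumed callable.
    """
    upgraded, failed = [], []
    for entry in workflow:
        for group in entry.split(","):
            for server in members.get(group, []):
                if upgrade_server(server):
                    upgraded.append(server)
                    continue
                failed.append(server)
                if group not in continue_on_failure:
                    # A failure in a normal group stops the whole workflow.
                    return False, upgraded, failed
    return True, upgraded, failed
```

Upgrading group by group limits the blast radius: a bad release is seen on one group's regionservers before it reaches the next.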

Storm Rolling Upgrade
Release Configs:
default:
  parallel: 10
  verbose: 'true'
  retry: 3
  dryrun: 'false'
  upgrade_type: 'rolling'
  quarantine: 'true'
  terminate_on_failure: 'true'
  sup_failure_threshold: 10
  sendmail_to: 'dheerajk@yahoo-inc.com'
  sendmail_cc: 'storm-devel@yahoo-inc.com, grid-ops@yahoo-inc.com'
cluster_workflow:
  cluster1.colo1: pacemaker_drpc
  cluster2.colo2: default
Workflow Definition:
default:
  rolling_task:
    - upgradeNimbus
    - bounceNimbus
    - upgradeSupervisor
    - bounceSupervisor
    - upgradeDRPC
    - bounceDRPC
    - upgradeGateways
    - doGatewayTask
    - verifySupervisor
    - runDRPCTestTopology
    - verifySoftwareVersion
  full_upgrade_task:
    - killAllTopologies
    - specifyOperation_stop
    - sleep10
    - bounceNimbus
    - bounceSupervisor
    - bounceDRPC
    - clearDiskCache
    - cleanZKP
    - upgradeNimbus
    - upgradeSupervisor
    - upgradeDRPC
    - specifyOperation_start
    - bounceNimbus
    - bounceSupervisor
    - bounceDRPC
    - upgradeGateways
    - doGatewayTask
    - verifySupervisor
    - runDRPCTestTopology
    - verifySoftwareVersion
▪ Complete CI/CD system. Statefiles are built per component and pushed to Artifactory before the upgrade.
▪ Install software beforehand; activate the new release during the upgrade.
▪ Each component is upgraded independently, i.e. Pacemaker, Nimbus, DRPC & Supervisor.

Storm Upgrade
[Diagram: CI/CD process. Git (release info) and Jenkins start the upgrade; Artifactory holds state files & release info; RE Jenkins and SD process.]
▪ RE Jenkins leads to statefile generation for each component and updates Git with release info.
▪ Statefiles are published in Artifactory and downloaded during the upgrade.
▪ Upgrade sequence: Pacemaker upgrade; Nimbus upgrade; Supervisor upgrade; bounce workers; DRPC upgrade; verify supervisors; run test/validation topology; audit all components.
▪ The upgrade fails if more than X supervisors fail to upgrade.
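The supervisor stage, with `parallel: 10` and `sup_failure_threshold: 10` from the release config, can be approximated as batched parallel upgrades that abort once too many supervisors have failed. This is a rough sketch; `upgrade_supervisors` and the `upgrade_one` callable are hypothetical, and the real tool may count or react to failures differently.

```python
from concurrent.futures import ThreadPoolExecutor


def upgrade_supervisors(hosts, upgrade_one, parallel=10, failure_threshold=10):
    """Upgrade supervisors in batches of `parallel`; stop once failures exceed the threshold.

    upgrade_one(host) -> bool is an assumed callable that upgrades a single supervisor.
    """
    failed = []
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        for start in range(0, len(hosts), parallel):
            batch = hosts[start:start + parallel]
            # pool.map preserves input order, so results line up with the batch.
            results = list(pool.map(upgrade_one, batch))
            failed.extend(h for h, ok in zip(batch, results) if not ok)
            if len(failed) > failure_threshold:
                return False, failed
    return True, failed
```

Checking the threshold between batches means at most one extra batch is touched after the limit is crossed.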

Rolling Upgrade timeline
Component            Parallelism   Hadoop 2.6.x   Hadoop 2.7.x   HBase 0.98.x   Storm 0.10.1.x
HDFS (4k nodes)      1             4 days         1 day          X              X
YARN (4k nodes)      1             1 day          1 day          X              X
HBase (1k nodes)     1-4           4-5 days       X              4-5 days       X
Storm (350 nodes)    10            X              X              X              4-6 hrs
Components           1             1-2 hrs        1-2 hrs        1-2 hrs        X
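To put the HDFS row in per-node terms, the wall-clock time can be divided across the nodes. The helper below is a back-of-the-envelope sketch that assumes the time is spread evenly over exactly 4000 nodes and ignores the NameNode steps; `seconds_per_node` is a hypothetical name.

```python
def seconds_per_node(days, nodes, parallelism=1):
    """Average wall-clock seconds spent per node, per parallel slot."""
    return days * 86400 * parallelism / nodes


# HDFS on a 4k-node cluster, parallelism 1, using the table's figures.
hdfs_26 = seconds_per_node(4, 4000)   # about 86.4 seconds per DataNode
hdfs_27 = seconds_per_node(1, 4000)   # about 21.6 seconds per DataNode
```

Under those assumptions, the move from 2.6.x to 2.7.x cuts the average DataNode cycle from roughly 86 seconds to roughly 22 seconds.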

Rolling Upgrade Impact
[Chart: YTD availability by cluster. Y-axis: availability (%), 99.600 to 100.000. Clusters on the x-axis: AB DB FB HB IB JB LB PB UB BT LT PT TT UT BR DR IR LR MR PR. Data labels visible on the slide: 99.928, 99.898, 99.940, 99.687, 99.705, 99.990.]
