Managing Hadoop, HBase and Storm Clusters at
Yahoo Scale
PRESENTED BY Dheeraj Kapur, Savitha Ravikrishnan⎪June 30, 2016
Agenda
Topic                                                   Speaker(s)
Introduction, HDFS RU, HBase RU & Storm RU              Dheeraj Kapur
YARN RU, Component RU, Distributed Cache & Sharelib     Savitha Ravikrishnan
Q&A                                                     All Presenters
HadoopSummit 2016
Hadoop at Yahoo
Grid Infrastructure at Yahoo
▪ A multi-tenant, secure, distributed compute and storage environment, based on the Hadoop stack, for large-scale data processing
▪ 3 data centers, over 45k physical nodes
▪ 18 YARN (Hadoop) clusters, ranging from 350 to 5200 nodes
▪ 9 HBase clusters, ranging from 80 to 1080 nodes
▪ 13 Storm clusters, ranging from 40 to 250 nodes
Grid Stack
▪ Backend Support: Zookeeper
▪ Hadoop Storage: HDFS; HBase as NoSQL store; HCatalog for metadata registry
▪ Hadoop Compute: YARN (Mapred) and Tez for batch processing; Storm for stream processing; Spark for iterative programming
▪ Hadoop Services: Pig for ETL; Hive for SQL; Oozie for workflows; Proxy services; GDM for data management; Caffe on Spark for ML
▪ Support Shop: Monitoring; Starling for logging
Deployment Model
▪ Hadoop clusters: NameNode, RM; DataNode and NodeManager
▪ HBase clusters: NameNode, HBase Master; DataNodes and RegionServers
▪ Storm clusters: Nimbus; Supervisor
▪ Administration, Management and Monitoring
▪ ZooKeeper Pools
▪ HTTP/HDFS/GDM Load Proxies
▪ Applications and Data: Data Feeds, Data Stores
▪ Oozie Server, HS2/HCat
HDFS
Hadoop Rolling Upgrade
▪ Complete CI/CD for HDFS and YARN upgrades
▪ Build software and config "tgz" packages and push them to repo servers
▪ Install software and configs in the pre-deploy phase; activate them during the upgrade
▪ Slow upgrade: 1 node per cycle
▪ Each component (HDFS, YARN and Client) is upgraded independently
Release Configs/Bundles:
---
doc: This file is auto generated
packages:
  - label: hadoop
    version: 2.7.2.13.1606200235-20160620-000
  - label: conf
    version: 2.7.2.13.1606200235-20160620-000
  - label: gridjdk
    version: 1.7.0_17.1303042057-20160620-000
  - label: yjava_jdk
    version: 1.8.0_60.51-20160620-000
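A bundle like the one above could be sanity-checked before the pre-deploy phase. The following is a minimal sketch, assuming the package list is already loaded as plain Python data. The helper name `check_bundle` and the rule that `hadoop` and `conf` must carry the same version are assumptions made for illustration, not part of the tooling described in the talk.

```python
# Package list copied from the release bundle above, as plain Python data.
BUNDLE = {
    "doc": "This file is auto generated",
    "packages": [
        {"label": "hadoop", "version": "2.7.2.13.1606200235-20160620-000"},
        {"label": "conf", "version": "2.7.2.13.1606200235-20160620-000"},
        {"label": "gridjdk", "version": "1.7.0_17.1303042057-20160620-000"},
        {"label": "yjava_jdk", "version": "1.8.0_60.51-20160620-000"},
    ],
}


def check_bundle(bundle):
    """Return a list of problems found in a bundle (hypothetical helper)."""
    problems = []
    versions = {}
    for pkg in bundle["packages"]:
        label = pkg["label"]
        if label in versions:
            problems.append(f"duplicate label: {label}")
        versions[label] = pkg["version"]
    # Assumed rule: software and config are built and released together.
    if versions.get("hadoop") != versions.get("conf"):
        problems.append("hadoop and conf versions differ")
    return problems


if __name__ == "__main__":
    print(check_bundle(BUNDLE) or "bundle looks consistent")
```

Returning a list of problems rather than raising on the first one lets a CI job report every inconsistency in a single run.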
Package Download (pre-deploy)
[Diagram: Git (release info); Jenkins Start; RU process (ygrid-deploy-software); Repo Farm; Servers/Cluster: Namenode, Datanodes, Resourcemanager, HBaseMaster, Regionserver, Gateways]
HDFS Upgrade
[Diagram: CI/CD process: Git (release info), Jenkins Start; RU process with steps labeled 1, 2, 3a-3c, 4a-4f]
▪ Create Dir Structure
▪ Put NN in RU mode
▪ SNN Upgrade
▪ NN Failover
▪ SNN Upgrade
▪ foreach DN:
• Select DN
• Check installed version
• Stop DN
• Activate new software
• Start DN
• Wait for DN to join
• Stop/terminate RU on X failures
▪ Finalize RU
Notes from the diagram:
▪ After 100 hosts are successfully upgraded, check HDFS used percentage and live-node consistency on the NNs
▪ Terminate the upgrade in case of more than X failures
▪ NN failover involves service and IP failover from NN to SNN and vice versa
▪ Safeupgrade-dn
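The per-DataNode loop, with its failure threshold and the periodic cluster check after every 100 upgraded hosts, can be sketched as follows. This is a minimal illustration and not the actual upgrade tooling: the function name `rolling_upgrade` and the callbacks `upgrade_node` (stop, activate new software, start, wait to join) and `cluster_healthy` (used percentage and live-node consistency on the NNs) are hypothetical.

```python
from typing import Callable, Iterable, List


def rolling_upgrade(
    nodes: Iterable[str],
    upgrade_node: Callable[[str], bool],
    cluster_healthy: Callable[[], bool],
    max_failures: int,
    check_every: int = 100,
) -> List[str]:
    """Upgrade nodes one at a time and return the nodes that failed.

    Raises RuntimeError when more than max_failures nodes fail, or when
    the periodic cluster check fails.
    """
    failed: List[str] = []
    upgraded = 0
    for node in nodes:
        if upgrade_node(node):
            upgraded += 1
            # After every `check_every` successful upgrades, verify the cluster.
            if upgraded % check_every == 0 and not cluster_healthy():
                raise RuntimeError(f"cluster check failed after {upgraded} nodes")
        else:
            failed.append(node)
            if len(failed) > max_failures:
                raise RuntimeError(f"aborting: {len(failed)} failures")
    return failed
```

Passing the per-node work and the health check in as callbacks keeps the loop itself testable without a cluster.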
Hadoop 2.7.x improvements over 2.6.x
Performance
▪ Reduced NN failover time by parallelizing the quota initialization
▪ Fixed the DataNode layout inefficiency that caused high I/O load
▪ Use an offline upgrade script to speed up the layout upgrade
▪ Added a fake metrics sink to work around the JMX cache fix, which caused delays in DataNode upgrade/health checks
▪ Improved DataNode shutdown speed
Failure handling
▪ Reduced read/write failures by blocking clients until the DN is fully initialized
YARN
YARN Rolling Upgrade
▪ Minimize downtime, maximize service availability
▪ Work-preserving restart on RM and NM
▪ Retains state for 10 mins
▪ Ensures that applications run continuously during an RM restart
▪ Save state, update software, restart and restore state
▪ Uses leveldb as the state store
▪ After the RM restarts, it loads all the application metadata and other credentials from the state store and populates them into memory
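In stock Apache Hadoop, work-preserving restart with a leveldb state store is switched on through `yarn-site.xml`. The sketch below renders such a fragment. The property names are, to the best of my knowledge, those in the Apache Hadoop ResourceManager and NodeManager restart documentation and should be verified against the Hadoop version in use. The two directory paths are placeholders, and nothing here is taken from Yahoo's actual configuration.

```python
import xml.etree.ElementTree as ET

# Property names as documented for Apache Hadoop RM/NM restart;
# the two directory paths are placeholders, not values from the talk.
RECOVERY_PROPS = {
    "yarn.resourcemanager.recovery.enabled": "true",
    "yarn.resourcemanager.work-preserving-recovery.enabled": "true",
    "yarn.resourcemanager.store.class":
        "org.apache.hadoop.yarn.server.resourcemanager.recovery.LeveldbRMStateStore",
    "yarn.resourcemanager.leveldb-state-store.path": "/path/to/rm-state",
    "yarn.nodemanager.recovery.enabled": "true",
    "yarn.nodemanager.recovery.dir": "/path/to/nm-state",
}


def to_yarn_site(props):
    """Render a dict of properties as a yarn-site.xml style document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_yarn_site(RECOVERY_PROPS))
```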
YARN Upgrade
[Diagram: CI/CD process: Git (release info), Jenkins Start; RU process with steps labeled 1, 2, 2a-2e, 3, 4, 5]
▪ Create Dir Structure
▪ Resource Manager Upgrade
▪ HistoryServer Upgrade
▪ Foreach NM:
• Select NM
• Check installed version
• Safestop NM (kill -9)
• Activate new software
• Start NM
• Wait for NM to join
• Stop/terminate RU on X failures
▪ Timeline Server Upgrade
Notes from the diagram:
▪ Terminate the upgrade in case of more than X failures
Distributed cache & Sharelib
Distributed Cache
▪ Distributed cache distributes application-specific, large, read-only files efficiently.
▪ Applications specify the files to be cached as URLs (hdfs://) in the job.
▪ DistributedCache tracks the modification timestamps of the cached files.
▪ DistributedCache can be used to distribute simple, read-only data or text files and more complex types such as archives and JAR files.
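One use of the tracked modification timestamps is to tell whether a locally cached copy still matches the source file. The toy model below illustrates that idea only. It is not Hadoop's implementation, and the class `LocalCache`, its methods, and any URL passed to it are hypothetical.

```python
class LocalCache:
    """Toy model of a node-local cache keyed by source URL (illustrative only)."""

    def __init__(self):
        # Maps url -> modification timestamp seen when the file was cached.
        self._entries = {}

    def needs_localization(self, url, mtime):
        """True if the file was never cached or changed since it was cached."""
        return self._entries.get(url) != mtime

    def localize(self, url, mtime):
        """Record that the file at `url` with timestamp `mtime` is now cached."""
        self._entries[url] = mtime
```

A file that is already cached with an unchanged timestamp can be reused by later tasks instead of being copied again.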
Sharelib
▪ "Sharelib" is a management system for a directory in HDFS named /sharelib, which exists on every cluster.
▪ Shared libraries can simplify the deployment and management of applications.
▪ The target directory is /sharelib, under which you will find:
• /sharelib/v1 - where all the packages are
• /sharelib/v1/conf - where the unique metafile for the cluster is (and all previous versions)
• /sharelib/v1/{tez, pig, ... } - where the package versions are kept
▪ The links/tags (metafile) are unique per cluster.
▪ Grid Ops maintains the shared libraries on the HDFS of each cluster.
▪ Packages in the shared libraries include mapreduce, pig, hbase, hcatalog, hive and oozie.
Sharelib Update
[Diagram: Jenkins Start; Sharelib Uploader; Git Bundles; Dist repo]
▪ Verify Dist Cache
▪ Download toDo packages
▪ Re-package and upload package
▪ Re-generate Meta info (HDFS)
▪ Upload to Oozie
▪ Generate clients to update Subsystems
Component Upgrade
▪ New releases: the CI environment continuously releases certified builds and their versions.
▪ Generate state: package rulesets contain the list of core packages and their dependencies for each cluster.
▪ Deploy cookbooks: contain Chef code and configuration that is pushed to the Chef server.
▪ Deploy pipelines: YAML files that specify the flow and order of the deploy for every environment/cluster.
▪ Validation jobs: run after a deploy completes on all the nodes, to ensure that end-to-end functionality works as expected.
Components Upgrade
[Diagram: CI process and CD process in three stages (1 New Release, 2 Package Rulesets, 3 Deploy cookbooks). Elements shown: Component versions; Git Bundles; Certified Releases; Rule set files (cluster: component specific); Certified package version info; Statefiles; Build Farms; Cookbook, Roles, Env, Attribute files; Git (release info); Artifactory; Ruby (Rake); Rspec, rubocop, state generate, compare & upload; Validate increment version; Chef]
Components Upgrade cont.
[Diagram: stage 4, Deploy Pipeline. Elements shown: Git (release info); Build Farms; Statefiles; Component Node; Ruby (Rake); Min size, zerodowntime check, targetsize, validate; Chef; Chef-client, cookbook-converge, graceful shutdown and healthcheck]
HBase
HBase Rolling Upgrade
Release Configs:
default:
  group: 'all'
  command: 'start'
  system: 'ALL'
  verbose: 'true'
  retry: 3
  upgradeREST: 'false'
  upgradeGateway: 'true'
  dryrun: 'false'
  force: 'false'
  upgrade_type: 'rolling'
  skip_nn_upgrade: 'false'
  skip_master_upgrade: 'false'
Workflow definitions:
default:
  continue_on_failure:
    - broken
    - badnodes
relux.red:
  - master
  - default
  - user
  - ca_soln-stage
  - perf,perf2,projects
  - restALL
▪ Workflow-based system
▪ Complete CI/CD for HDFS and HBase upgrades
▪ Build tgz and push to repo servers
▪ Install software beforehand; activate the new release during the upgrade
▪ Each component and region group is upgraded independently, i.e. HDFS and each group of regionservers
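A workflow that walks region server groups in a configured order might be driven as sketched below. This assumes that `continue_on_failure` names groups whose failures are tolerated, which is one reading of the config above and is not confirmed by the slides. The function `run_workflow` and its callback `upgrade_server` are hypothetical.

```python
def run_workflow(group_order, servers_by_group, upgrade_server,
                 continue_on_failure=("broken", "badnodes")):
    """Upgrade region server groups in order, one server at a time.

    A failure stops the run unless the group is listed in
    continue_on_failure. Returns the (group, server) pairs that failed
    but were tolerated.
    """
    tolerated = []
    for group in group_order:
        for server in servers_by_group.get(group, []):
            if upgrade_server(server):
                continue
            if group in continue_on_failure:
                tolerated.append((group, server))
            else:
                raise RuntimeError(f"upgrade failed on {server} in group {group}")
    return tolerated
```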
HBase Upgrade
[Diagram: CI/CD process: Git (release info), Jenkins Start; Repo Server (package + conf version); steps labeled 1-5 and 3a-3f]
▪ Put NN in RU mode & Upgrade NN, SNN (HDFS Rolling Upgrade process)
▪ Master Upgrade
▪ Regionserver Upgrade process: iterate over each group, then over each server in a group. Foreach DN/RS:
• Stop Regionserver
• DN Safeupgrade, Stop DN
• Upgrade and Start DN
• Upgrade and Start RS
▪ Stargate Upgrade
▪ Gateway Upgrade
STORM
Storm Rolling Upgrade
Release Configs:
default:
  parallel: 10
  verbose: 'true'
  retry: 3
  dryrun: 'false'
  upgrade_type: 'rolling'
  quarantine: 'true'
  terminate_on_failure: 'true'
  sup_failure_threshold: 10
  sendmail_to: 'dheerajk@yahoo-inc.com'
  sendmail_cc: 'storm-devel@yahoo-inc.com, grid-ops@yahoo-inc.com'
cluster_workflow:
  cluster1.colo1: pacemaker_drpc
  cluster2.colo2: default
Workflow Definition:
default:
  rolling_task:
    - upgradeNimbus
    - bounceNimbus
    - upgradeSupervisor
    - bounceSupervisor
    - upgradeDRPC
    - bounceDRPC
    - upgradeGateways
    - doGatewayTask
    - verifySupervisor
    - runDRPCTestTopology
    - verifySoftwareVersion
  full_upgrade_task:
    - killAllTopologies
    - specifyOperation_stop
    - sleep10
    - bounceNimbus
    - bounceSupervisor
    - bounceDRPC
    - clearDiskCache
    - cleanZKP
    - upgradeNimbus
    - upgradeSupervisor
    - upgradeDRPC
    - specifyOperation_start
    - bounceNimbus
    - bounceSupervisor
    - bounceDRPC
    - upgradeGateways
    - doGatewayTask
    - verifySupervisor
    - runDRPCTestTopology
    - verifySoftwareVersion
▪ Complete CI/CD system. Statefiles are built per component and pushed to Artifactory before the upgrade
▪ Install software beforehand; activate the new release during the upgrade
▪ Each component is upgraded independently, i.e. Pacemaker, Nimbus, DRPC & Supervisor
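The `parallel`, `sup_failure_threshold` and `terminate_on_failure` settings suggest a batched supervisor upgrade with an abort threshold. The sketch below shows one way those settings could interact. The exact semantics are assumed rather than taken from the slides, and `upgrade_supervisors` with its callback `upgrade_one` are hypothetical names.

```python
def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def upgrade_supervisors(supervisors, upgrade_one, parallel=10,
                        sup_failure_threshold=10, terminate_on_failure=True):
    """Upgrade supervisors batch by batch and return those that failed."""
    failed = []
    for batch in batches(supervisors, parallel):
        # A real runner would handle each batch concurrently;
        # it is kept sequential here for clarity.
        failed.extend(s for s in batch if not upgrade_one(s))
        if terminate_on_failure and len(failed) > sup_failure_threshold:
            raise RuntimeError(f"{len(failed)} supervisors failed; terminating")
    return failed
```

Checking the threshold once per batch rather than per host matches the idea that a batch is the smallest unit of work in flight.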
Storm Upgrade
[Diagram: CI/CD process: Git (release info), Jenkins Start; Artifactory (state files & release info); RE Jenkins and SD process]
▪ Pacemaker Upgrade
▪ Nimbus Upgrade
▪ Supervisor Upgrade
▪ Bounce Workers
▪ DRPC Upgrade
▪ Verify Supervisors
▪ Run Test/Validation topology
▪ Audit All Components
Notes from the diagram:
▪ RE Jenkins leads to statefile generation for each component and updates git with release info
▪ Statefiles are published in Artifactory and downloaded during the upgrade
▪ The upgrade fails if more than X supervisors fail to upgrade
Rolling Upgrade timeline
Component          Parallelism  Hadoop 2.6.x  Hadoop 2.7.x  HBase 0.98.x  Storm 0.10.1.x
HDFS (4k nodes)    1            4 days        1 day         X             X
YARN (4k nodes)    1            1 day         1 day         X             X
HBase (1k nodes)   1-4          4-5 days      X             4-5 days      X
Storm (350 nodes)  10           X             X             X             4-6 hrs
Components         1            1-2 hrs       1-2 hrs       1-2 hrs       X
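The HDFS row implies a per-node time budget, which the sketch below works out as plain arithmetic. It assumes "4k nodes" means exactly 4000 and that the upgrade runs around the clock; both are simplifications, and the helper `seconds_per_node` is hypothetical. Under those assumptions, 4 days allows about 86.4 seconds per DataNode, and 1 day about 21.6 seconds.

```python
def seconds_per_node(nodes, days, parallelism=1):
    """Average wall-clock seconds available per node for a rolling upgrade."""
    return days * 24 * 3600 * parallelism / nodes


if __name__ == "__main__":
    # HDFS at parallelism 1, assuming exactly 4000 nodes.
    print("Hadoop 2.6.x:", seconds_per_node(4000, 4))
    print("Hadoop 2.7.x:", seconds_per_node(4000, 1))
```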
Rolling Upgrade Impact
[Chart: YTD Availability by Cluster. Clusters: AB DB FB HB IB JB LB PB UB BT LT PT TT UT BR DR IR LR MR PR. Availability axis from 99.600 to 100.000. Value labels shown: 99.928, 99.898, 99.940, 99.687, 99.705, 99.990]
Thank You
HadoopSummit 2016
Ad

More Related Content

What's hot (20)

Time-oriented event search. A new level of scale
Time-oriented event search. A new level of scale Time-oriented event search. A new level of scale
Time-oriented event search. A new level of scale
DataWorks Summit/Hadoop Summit
 
Taming the Elephant: Efficient and Effective Apache Hadoop Management
Taming the Elephant: Efficient and Effective Apache Hadoop ManagementTaming the Elephant: Efficient and Effective Apache Hadoop Management
Taming the Elephant: Efficient and Effective Apache Hadoop Management
DataWorks Summit/Hadoop Summit
 
Deep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profitDeep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profit
DataWorks Summit/Hadoop Summit
 
Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4
Chris Nauroth
 
Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?
DataWorks Summit/Hadoop Summit
 
Yahoo - Moving beyond running 100% of Apache Pig jobs on Apache Tez
Yahoo - Moving beyond running 100% of Apache Pig jobs on Apache TezYahoo - Moving beyond running 100% of Apache Pig jobs on Apache Tez
Yahoo - Moving beyond running 100% of Apache Pig jobs on Apache Tez
DataWorks Summit
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community Update
DataWorks Summit
 
HDFS tiered storage
HDFS tiered storageHDFS tiered storage
HDFS tiered storage
DataWorks Summit
 
Leveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningLeveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioning
Evans Ye
 
Splice Machine Overview
Splice Machine OverviewSplice Machine Overview
Splice Machine Overview
Kunal Gupta
 
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
DataWorks Summit
 
Operationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the CloudOperationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the Cloud
DataWorks Summit/Hadoop Summit
 
Cloudy with a Chance of Hadoop - Real World Considerations
Cloudy with a Chance of Hadoop - Real World ConsiderationsCloudy with a Chance of Hadoop - Real World Considerations
Cloudy with a Chance of Hadoop - Real World Considerations
DataWorks Summit/Hadoop Summit
 
Hadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and FutureHadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and Future
DataWorks Summit
 
Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop
DataWorks Summit/Hadoop Summit
 
IoT:what about data storage?
IoT:what about data storage?IoT:what about data storage?
IoT:what about data storage?
DataWorks Summit/Hadoop Summit
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
DataWorks Summit
 
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSHDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFS
DataWorks Summit
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
DataWorks Summit
 
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
DataWorks Summit/Hadoop Summit
 
Taming the Elephant: Efficient and Effective Apache Hadoop Management
Taming the Elephant: Efficient and Effective Apache Hadoop ManagementTaming the Elephant: Efficient and Effective Apache Hadoop Management
Taming the Elephant: Efficient and Effective Apache Hadoop Management
DataWorks Summit/Hadoop Summit
 
Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4
Chris Nauroth
 
Yahoo - Moving beyond running 100% of Apache Pig jobs on Apache Tez
Yahoo - Moving beyond running 100% of Apache Pig jobs on Apache TezYahoo - Moving beyond running 100% of Apache Pig jobs on Apache Tez
Yahoo - Moving beyond running 100% of Apache Pig jobs on Apache Tez
DataWorks Summit
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community Update
DataWorks Summit
 
Leveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningLeveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioning
Evans Ye
 
Splice Machine Overview
Splice Machine OverviewSplice Machine Overview
Splice Machine Overview
Kunal Gupta
 
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
DataWorks Summit
 
Operationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the CloudOperationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the Cloud
DataWorks Summit/Hadoop Summit
 
Cloudy with a Chance of Hadoop - Real World Considerations
Cloudy with a Chance of Hadoop - Real World ConsiderationsCloudy with a Chance of Hadoop - Real World Considerations
Cloudy with a Chance of Hadoop - Real World Considerations
DataWorks Summit/Hadoop Summit
 
Hadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and FutureHadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and Future
DataWorks Summit
 
Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop
DataWorks Summit/Hadoop Summit
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
DataWorks Summit
 
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSHDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFS
DataWorks Summit
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
DataWorks Summit
 
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
DataWorks Summit/Hadoop Summit
 

Viewers also liked (20)

Show me the Money! Cost & Resource Tracking for Hadoop and Storm
Show me the Money! Cost & Resource  Tracking for Hadoop and Storm Show me the Money! Cost & Resource  Tracking for Hadoop and Storm
Show me the Money! Cost & Resource Tracking for Hadoop and Storm
DataWorks Summit/Hadoop Summit
 
Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014
P. Taylor Goetz
 
Big Data Ready Enterprise
Big Data Ready Enterprise Big Data Ready Enterprise
Big Data Ready Enterprise
DataWorks Summit/Hadoop Summit
 
Effective Spark on Multi-Tenant Clusters
Effective Spark on Multi-Tenant ClustersEffective Spark on Multi-Tenant Clusters
Effective Spark on Multi-Tenant Clusters
DataWorks Summit/Hadoop Summit
 
Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)
Robert Evans
 
Big Data Security and Governance
Big Data Security and GovernanceBig Data Security and Governance
Big Data Security and Governance
DataWorks Summit/Hadoop Summit
 
Hadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm Architecture
P. Taylor Goetz
 
Keynote Hadoop Summit Dublin 2016: Hadoop Platform Innovations - Pushing The ...
Keynote Hadoop Summit Dublin 2016: Hadoop Platform Innovations - Pushing The ...Keynote Hadoop Summit Dublin 2016: Hadoop Platform Innovations - Pushing The ...
Keynote Hadoop Summit Dublin 2016: Hadoop Platform Innovations - Pushing The ...
Sumeet Singh
 
Enterprise Grade Streaming under 2ms on Hadoop
Enterprise Grade Streaming under 2ms on HadoopEnterprise Grade Streaming under 2ms on Hadoop
Enterprise Grade Streaming under 2ms on Hadoop
DataWorks Summit/Hadoop Summit
 
Zero Downtime App Deployment using Hadoop
Zero Downtime App Deployment using HadoopZero Downtime App Deployment using Hadoop
Zero Downtime App Deployment using Hadoop
DataWorks Summit/Hadoop Summit
 
Large Scale Health Telemetry and Analytics with MQTT, Hadoop and Machine Lear...
Large Scale Health Telemetry and Analytics with MQTT, Hadoop and Machine Lear...Large Scale Health Telemetry and Analytics with MQTT, Hadoop and Machine Lear...
Large Scale Health Telemetry and Analytics with MQTT, Hadoop and Machine Lear...
DataWorks Summit/Hadoop Summit
 
Hadoop Platform at Yahoo
Hadoop Platform at YahooHadoop Platform at Yahoo
Hadoop Platform at Yahoo
DataWorks Summit/Hadoop Summit
 
YARN Federation
YARN Federation YARN Federation
YARN Federation
DataWorks Summit/Hadoop Summit
 
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Precisely
 
Solving Performance Problems on Hadoop
Solving Performance Problems on HadoopSolving Performance Problems on Hadoop
Solving Performance Problems on Hadoop
Tyler Mitchell
 
Workload Automation + Hadoop?
Workload Automation + Hadoop?Workload Automation + Hadoop?
Workload Automation + Hadoop?
DataWorks Summit/Hadoop Summit
 
Lambda-less Stream Processing @Scale in LinkedIn
Lambda-less Stream Processing @Scale in LinkedIn Lambda-less Stream Processing @Scale in LinkedIn
Lambda-less Stream Processing @Scale in LinkedIn
DataWorks Summit/Hadoop Summit
 
Apache Zeppelin + LIvy: Bringing Multi Tenancy to Interactive Data Analysis
Apache Zeppelin + LIvy: Bringing Multi Tenancy to Interactive Data AnalysisApache Zeppelin + LIvy: Bringing Multi Tenancy to Interactive Data Analysis
Apache Zeppelin + LIvy: Bringing Multi Tenancy to Interactive Data Analysis
DataWorks Summit/Hadoop Summit
 
Active Learning for Fraud Prevention
Active Learning for Fraud PreventionActive Learning for Fraud Prevention
Active Learning for Fraud Prevention
DataWorks Summit/Hadoop Summit
 
Simplified Cluster Operation & Troubleshooting
Simplified Cluster Operation & TroubleshootingSimplified Cluster Operation & Troubleshooting
Simplified Cluster Operation & Troubleshooting
DataWorks Summit/Hadoop Summit
 
Show me the Money! Cost & Resource Tracking for Hadoop and Storm
Show me the Money! Cost & Resource  Tracking for Hadoop and Storm Show me the Money! Cost & Resource  Tracking for Hadoop and Storm
Show me the Money! Cost & Resource Tracking for Hadoop and Storm
DataWorks Summit/Hadoop Summit
 
Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014
P. Taylor Goetz
 
Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)
Robert Evans
 
Hadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm Architecture
P. Taylor Goetz
 
Keynote Hadoop Summit Dublin 2016: Hadoop Platform Innovations - Pushing The ...
Keynote Hadoop Summit Dublin 2016: Hadoop Platform Innovations - Pushing The ...Keynote Hadoop Summit Dublin 2016: Hadoop Platform Innovations - Pushing The ...
Keynote Hadoop Summit Dublin 2016: Hadoop Platform Innovations - Pushing The ...
Sumeet Singh
 
Large Scale Health Telemetry and Analytics with MQTT, Hadoop and Machine Lear...
Large Scale Health Telemetry and Analytics with MQTT, Hadoop and Machine Lear...Large Scale Health Telemetry and Analytics with MQTT, Hadoop and Machine Lear...
Large Scale Health Telemetry and Analytics with MQTT, Hadoop and Machine Lear...
DataWorks Summit/Hadoop Summit
 
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Precisely
 
Solving Performance Problems on Hadoop
Solving Performance Problems on HadoopSolving Performance Problems on Hadoop
Solving Performance Problems on Hadoop
Tyler Mitchell
 
Apache Zeppelin + LIvy: Bringing Multi Tenancy to Interactive Data Analysis
Apache Zeppelin + LIvy: Bringing Multi Tenancy to Interactive Data AnalysisApache Zeppelin + LIvy: Bringing Multi Tenancy to Interactive Data Analysis
Apache Zeppelin + LIvy: Bringing Multi Tenancy to Interactive Data Analysis
DataWorks Summit/Hadoop Summit
 
Ad

Similar to Managing Hadoop, HBase and Storm Clusters at Yahoo Scale (20)

Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NYApache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY
Wangda Tan
 
Handling Kernel Upgrades at Scale - The Dirty Cow Story
Handling Kernel Upgrades at Scale - The Dirty Cow StoryHandling Kernel Upgrades at Scale - The Dirty Cow Story
Handling Kernel Upgrades at Scale - The Dirty Cow Story
DataWorks Summit
 
Upgrading hadoop
Upgrading hadoopUpgrading hadoop
Upgrading hadoop
Shashwat Shriparv
 
Hadoop Meetup Jan 2019 - Router-Based Federation and Storage Tiering
Hadoop Meetup Jan 2019 - Router-Based Federation and Storage TieringHadoop Meetup Jan 2019 - Router-Based Federation and Storage Tiering
Hadoop Meetup Jan 2019 - Router-Based Federation and Storage Tiering
Erik Krogen
 
Savanna - Elastic Hadoop on OpenStack
Savanna - Elastic Hadoop on OpenStackSavanna - Elastic Hadoop on OpenStack
Savanna - Elastic Hadoop on OpenStack
Sergey Lukjanov
 
Ceph Day Beijing: Big Data Analytics on Ceph Object Store
Ceph Day Beijing: Big Data Analytics on Ceph Object Store Ceph Day Beijing: Big Data Analytics on Ceph Object Store
Ceph Day Beijing: Big Data Analytics on Ceph Object Store
Ceph Community
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld
 
Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04
Mandakini Kumari
 
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
Newton Alex
 
Unit 5
Unit  5Unit  5
Unit 5
Ravi Kumar
 
Deployment and Management of Hadoop Clusters
Deployment and Management of Hadoop ClustersDeployment and Management of Hadoop Clusters
Deployment and Management of Hadoop Clusters
Amal G Jose
 
Linux Experience for Herman
Linux Experience for HermanLinux Experience for Herman
Linux Experience for Herman
Hin Yeung Herman LEUNG
 
Cloudera hadoop installation
Cloudera hadoop installationCloudera hadoop installation
Cloudera hadoop installation
Sumitra Pundlik
 
Introduction to hadoop administration jk
Introduction to hadoop administration   jkIntroduction to hadoop administration   jk
Introduction to hadoop administration jk
Edureka!
 
Dipesh Singh 01112016
Dipesh Singh 01112016Dipesh Singh 01112016
Dipesh Singh 01112016
Dipesh Singh
 
Hw09 Production Deep Dive With High Availability
Hw09   Production Deep Dive With High AvailabilityHw09   Production Deep Dive With High Availability
Hw09 Production Deep Dive With High Availability
Cloudera, Inc.
 
Real-Time Data Loading from MySQL to Hadoop
Real-Time Data Loading from MySQL to HadoopReal-Time Data Loading from MySQL to Hadoop
Real-Time Data Loading from MySQL to Hadoop
Continuent
 
Azure Virtual Machines Deployment Scenarios
Azure Virtual Machines Deployment ScenariosAzure Virtual Machines Deployment Scenarios
Azure Virtual Machines Deployment Scenarios
Brian Benz
 
Chicago Data Summit: Geo-based Content Processing Using HBase
Chicago Data Summit: Geo-based Content Processing Using HBaseChicago Data Summit: Geo-based Content Processing Using HBase
Chicago Data Summit: Geo-based Content Processing Using HBase
Cloudera, Inc.
 
Integrating CloudStack & Ceph
Integrating CloudStack & CephIntegrating CloudStack & Ceph
Integrating CloudStack & Ceph
ShapeBlue
 
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NYApache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY
Wangda Tan
 
Handling Kernel Upgrades at Scale - The Dirty Cow Story



Managing Hadoop, HBase and Storm Clusters at Yahoo Scale

  • 1. Managing Hadoop, HBase and Storm Clusters at Yahoo Scale PRESENTED BY Dheeraj Kapur, Savitha Ravikrishnan⎪June 30, 2016
  • 2. Agenda Topic Speaker(s) Introduction, HDFS RU, HBase RU & Storm RU Dheeraj Kapur YARN RU, Component RU, Distributed Cache & Sharelib Savitha Ravikrishnan Q&A All Presenters HadoopSummit 2016
  • 4. Grid Infrastructure at Yahoo HadoopSummit 2016 ▪ A multi-tenant, secure, distributed compute and storage environment, based on Hadoop stack for large scale data processing ▪ 3 data centers, over 45k physical nodes. ▪ 18 YARN (Hadoop) clusters, having 350 to 5200 nodes. ▪ 9 HBase clusters, having 80 to 1080 nodes. ▪ 13 Storm clusters, having 40 to 250 nodes
  • 5. Grid Stack Zookeeper Backend Support Hadoop Storage Hadoop Compute Hadoop Services Support Shop Monitoring Starling for logging HDFS Hbase as NoSql store Hcatalog for metadata registry YARN (Mapred) and Tez for Batch processing Storm for stream processing Spark for iterative programming PIG for ETL Hive for SQL Oozie for workflows Proxy services GDM for data Mang Café on Spark for ML
  • 6. Deployment Model DataNode NodeManager NameNode RM DataNodes RegionServers NameNode HBase Master Nimbus Supervisor Administration, Management and Monitoring ZooKeeper Pools HTTP/HDFS/GDM Load Proxies Applications and Data Data Feeds Data Stores Oozie Server HS2/HCat
  • 8. Hadoop Rolling Upgrade
      ▪ Complete CI/CD for HDFS and YARN upgrades
      ▪ Build software and config “tgz” and push to repo servers
      ▪ Install software and configs in the pre-deploy phase, activate during upgrade
      ▪ Slow upgrade: 1 node per cycle
      ▪ Each component (HDFS, YARN and Client) is upgraded independently
      Release Configs/Bundles:
        ---
        doc: This file is auto generated
        packages:
          - label: hadoop
            version: 2.7.2.13.1606200235-20160620-000
          - label: conf
            version: 2.7.2.13.1606200235-20160620-000
          - label: gridjdk
            version: 1.7.0_17.1303042057-20160620-000
          - label: yjava_jdk
            version: 1.8.0_60.51-20160620-000
  • 9. Package Download (pre-deploy) RU process Git (release info) Namenode, Datanodes, Resourcemanager HBaseMaster, Regionserver, Gateways Repo Farm Jenkins Start Servers /Cluster ygrid-deploy-software
  • 10. CI/CD process: Git (release info) → Jenkins Start → HDFS Upgrade (RU process) → Finalize RU
      1. Create Dir Structure
      2. Put NN in RU mode
      3a. SNN Upgrade
      3b. NN Failover (involves service and IP failover from NN to SNN and vice versa)
      3c. SNN Upgrade
      Foreach DN (Safeupgrade-dn):
      4a. Select DN
      4b. Check installed version
      4c. Stop DN
      4d. Activate new software
      4e. Start DN
      4f. Wait for DN to join
      Stop/terminate RU on X failures: terminate the upgrade in case of more than X failures.
      After 100 hosts are successfully upgraded: check HDFS used percentage and live-node consistency on the NNs.
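The per-DataNode loop on this slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tooling used at Yahoo: `rolling_upgrade`, `upgrade_host` and `cluster_healthy` are hypothetical names, `upgrade_host` stands in for the stop/activate/start/wait-to-join sequence, and `cluster_healthy` stands in for the used-percentage and live-node checks on the NameNodes.

```python
from typing import Callable, Iterable, List


def rolling_upgrade(
    hosts: Iterable[str],
    upgrade_host: Callable[[str], bool],
    cluster_healthy: Callable[[], bool],
    max_failures: int,
    check_every: int = 100,
) -> List[str]:
    """Upgrade hosts one at a time and return the hosts that failed.

    Raises RuntimeError when more than max_failures hosts fail, or when
    the periodic cluster check fails. All callables are hypothetical
    stand-ins supplied by the caller.
    """
    failed: List[str] = []
    succeeded = 0
    for host in hosts:
        if upgrade_host(host):
            succeeded += 1
            # Periodic consistency check after every `check_every` successes.
            if succeeded % check_every == 0 and not cluster_healthy():
                raise RuntimeError(f"cluster check failed after {succeeded} hosts")
        else:
            failed.append(host)
            # Terminate the rolling upgrade once the failure budget is exceeded.
            if len(failed) > max_failures:
                raise RuntimeError(f"aborting: {len(failed)} hosts failed")
    return failed


if __name__ == "__main__":
    hosts = [f"dn{i}" for i in range(250)]
    print(rolling_upgrade(hosts, lambda h: h != "dn7", lambda: True, max_failures=3))
```

Processing one host per iteration mirrors the "1 node per cycle" policy; the failure budget keeps a bad release from spreading across the cluster.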
  • 11. Hadoop 2.7.x improvements over 2.6.x Performance ▪ Reduce NN failover by parallelizing the quota init ▪ Datanode layout inefficiency causing high I/O load. ▪ Use an offline upgrade script to speed up the layout upgrade. ▪ Adding fake metrics sink to subvert JMX cache fix, causing delays in datanode upgrade/health check. ▪ Improved datanode shutdown speed Failure handling ▪ Reduce the read/write failures by blocking clients until DN is fully initialized.
  • 12. YARN
  • 13. YARN Rolling Upgrade ▪ Minimize downtime, maximize service availability ▪ Work-preserving restart on RM and NM ▪ Retains state for 10 minutes. ▪ Ensures that applications run continuously during an RM restart ▪ Save state, update software, restart and restore state. ▪ Uses leveldb as state store ▪ After the RM restarts, it loads all the application metadata and other credentials from the state store and populates them into memory.
  • 14. CI/CD process Git (release info) Jenkins Start YARN Upgrade RU process Create Dir Structure Resource Manager Upgrade HistoryServer Upgrade Foreach NM Select NM Check installed version Safestop NM (kill -9) Activate new software Start NM Wait for NM to join Stop/terminate RU on X failures Timeline Server Upgrade 1 2 2a 2b 2c 2d 2e 3 4 5 Terminate upgrade in case of more than X failures
  • 16. Distributed Cache ▪ Distributed cache distributes application-specific, large, read-only files efficiently. ▪ Applications specify the files to be cached in URLs (hdfs://) in the Job ▪ DistributedCache tracks the modification timestamps of the cached files. ▪ DistributedCache can be used to distribute simple, read-only data or text files and more complex types such as archives and JAR files. HadoopSummit 2016
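Timestamp tracking can be illustrated with a toy model in Python. This is a simplified sketch, not Hadoop's implementation: `LocalCache` and `ensure` are hypothetical names, and the model only shows the idea that a cached copy is reused while the modification timestamp recorded for its URL is unchanged.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LocalCache:
    """Hypothetical node-local cache keyed by file URL (illustration only)."""

    localized: Dict[str, int] = field(default_factory=dict)  # url -> mtime seen
    downloads: int = 0

    def ensure(self, url: str, remote_mtime: int) -> bool:
        """Return True if the file had to be fetched, False if reused."""
        if self.localized.get(url) == remote_mtime:
            return False  # cached copy matches the recorded timestamp
        self.localized[url] = remote_mtime  # record the timestamp we fetched
        self.downloads += 1
        return True


if __name__ == "__main__":
    cache = LocalCache()
    url = "hdfs://namenode/apps/lookup.jar"  # hypothetical path
    print(cache.ensure(url, 100), cache.ensure(url, 100), cache.ensure(url, 200))
```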
  • 17. Sharelib ▪ "Sharelib" is a management system for a directory in HDFS named /sharelib, which exists on every cluster. ▪ Shared libraries can simplify the deployment and management of applications. ▪ The target directory is /sharelib, under which you will find various things: /sharelib/v1 - where all the packages are • /sharelib/v1/conf - where the unique metafile for the cluster is (and all previous versions) • /sharelib/v1/{tez, pig, ... } - where the package versions are kept ▪ The links/tags (metafile) are unique per cluster. ▪ Grid Ops maintains shared libraries on HDFS of each cluster ▪ Packages in shared libraries include mapreduce, pig, hbase, hcatalog, hive and oozie. HadoopSummit 2016
  • 18. Jenkins Start Sharelib Uploader Git Bundles Verify Dist Cache Download toDo packages Dist repo Re-package and upload package Re-generate Meta info (HDFS) Upload to Oozie Sharelib Update Generate clients to update
  • 20. Component Upgrade HadoopSummit 2016 ▪ New Releases : CI environment continuously releases certified builds & their versions. ▪ Generate state : Package rulesets contain the list of core packages and their dependencies for each & every cluster ▪ Deploy cookbooks : contain chef code and configuration that is pushed to Chef server ▪ Deploy pipelines : are YAML files that specify the flow & order of the deploy for every environment/cluster. ▪ Validation jobs : are run after a deploy completes on all the nodes which ensures end-to-end functionality is working as expected.
  • 21. Components Upgrade CI process Component versions Git Bundles Certified Releases Rule set files (cluster: component specific) Git bundles Certified package version info Statefiles Build Farms Cookbook, Roles, Env, Attribute files Git (release info) Build Farms Artifactory Ruby (Rake) New Release Package Rulesets Deploy cookbooks A B Build Farms Rspec rubocop, state generate, compare & upload Validate increment version 1 2 3 Chef
  • 22. CD process Components Upgrade cont.. Git (release info) Build Farms Statefiles Deploy Pipeline Component Node Ruby (Rake) Min size, zerodowntime check, targetsize, validate Chef-client, cookbook-converge, graceful shutdown and healthcheck 4 Chef A B
  • 23. HBase
  • 24. HBase Rolling Upgrade
      Release Configs:
        default:
          group: 'all'
          command: 'start'
          system: 'ALL'
          verbose: 'true'
          retry: 3
          upgradeREST: 'false'
          upgradeGateway: 'true'
          dryrun: 'false'
          force: 'false'
          upgrade_type: 'rolling'
          skip_nn_upgrade: 'false'
          skip_master_upgrade: 'false'
      Workflow definitions:
        default:
          continue_on_failure:
            - broken
            - badnodes
        relux.red:
          - master
          - default
          - user
          - ca_soln-stage
          - perf,perf2,projects
          - restALL
      ▪ Workflow-based system.
      ▪ Complete CI/CD for HDFS and HBase upgrades
      ▪ Build tgz and push to repo servers
      ▪ Install software beforehand, activate the new release during upgrade
      ▪ Each component and region group is upgraded independently, i.e., HDFS and each group of regionservers.
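One way to read the workflow definition is as an ordered list of regionserver groups, with some groups allowed to fail without stopping the run. The Python sketch below encodes that reading. It rests on assumptions: that `continue_on_failure` names groups whose failures are tolerated, and that a comma-separated entry such as `perf,perf2,projects` is expanded into its groups in order. `run_group_workflow` and its parameters are hypothetical names.

```python
from typing import Callable, Dict, List, Set


def run_group_workflow(
    group_order: List[str],
    members: Dict[str, List[str]],
    upgrade_server: Callable[[str], bool],
    continue_on_failure: Set[str],
) -> List[str]:
    """Upgrade servers group by group and return those that succeeded.

    A failure in a group listed in continue_on_failure is skipped; any
    other failure stops the workflow (assumed semantics).
    """
    upgraded: List[str] = []
    for entry in group_order:
        # Assumption: "a,b,c" entries are processed as groups a, b, c in order.
        for group in entry.split(","):
            for server in members.get(group, []):
                if upgrade_server(server):
                    upgraded.append(server)
                elif group not in continue_on_failure:
                    raise RuntimeError(f"{server} in group {group} failed")
    return upgraded


if __name__ == "__main__":
    members = {"master": ["m1"], "default": ["rs1", "rs2"], "broken": ["rs9"]}
    order = ["master", "default", "broken"]
    print(run_group_workflow(order, members, lambda s: s != "rs9", {"broken", "badnodes"}))
```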
  • 25. CI/CD process Git (release info) Jenkins Start Put NN in RU mode & Upgrade NN SNN Master Upgrade Region- server Upgrade process Stargate Upgrade Gateway Upgrade HBase Upgrade Foreach DN/RS Upgrade regionserver Repo Server Package + conf version Stop Regionserver DN Safeupgrade, Stop DN Upgrade and Start DN Upgrade and Start RS 1 2 3 4 3a 3c 3b 3d 3e 3f 3f 5 HDFS Rolling Upgrade process Iterate over each group Iterate over each server in a group
  • 26. STORM
  • 27. Storm Rolling Upgrade
      Release Configs:
        default:
          parallel: 10
          verbose: 'true'
          retry: 3
          dryrun: 'false'
          upgrade_type: 'rolling'
          quarantine: 'true'
          terminate_on_failure: 'true'
          sup_failure_threshold: 10
          sendmail_to: '[email protected]'
          sendmail_cc: '[email protected], [email protected]'
        cluster_workflow:
          cluster1.colo1: pacemaker_drpc
          cluster2.colo2: default
      Workflow Definition:
        default:
          rolling_task:
            - upgradeNimbus
            - bounceNimbus
            - upgradeSupervisor
            - bounceSupervisor
            - upgradeDRPC
            - bounceDRPC
            - upgradeGateways
            - doGatewayTask
            - verifySupervisor
            - runDRPCTestTopology
            - verifySoftwareVersion
          full_upgrade_task:
            - killAllTopologies
            - specifyOperation_stop
            - sleep10
            - bounceNimbus
            - bounceSupervisor
            - bounceDRPC
            - clearDiskCache
            - cleanZKP
            - upgradeNimbus
            - upgradeSupervisor
            - upgradeDRPC
            - specifyOperation_start
            - bounceNimbus
            - bounceSupervisor
            - bounceDRPC
            - upgradeGateways
            - doGatewayTask
            - verifySupervisor
            - runDRPCTestTopology
            - verifySoftwareVersion
      ▪ Complete CI/CD system. Statefiles are built per component and pushed to Artifactory before the upgrade
      ▪ Install software beforehand, activate the new release during upgrade
      ▪ Each component is upgraded independently, i.e., Pacemaker, Nimbus, DRPC and Supervisor
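The `parallel` and `sup_failure_threshold` settings suggest batched supervisor upgrades with a failure budget. A minimal Python sketch of that idea follows; it assumes `parallel` is the batch size and that the run terminates once failures exceed the threshold. `upgrade_supervisors` and `upgrade_one` are hypothetical names, not part of Storm or of the tooling described here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def upgrade_supervisors(
    supervisors: List[str],
    upgrade_one: Callable[[str], bool],
    parallel: int = 10,
    failure_threshold: int = 10,
) -> List[str]:
    """Upgrade supervisors in batches of `parallel`; return failed hosts.

    Raises RuntimeError once more than failure_threshold hosts have failed.
    """
    failed: List[str] = []
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        for start in range(0, len(supervisors), parallel):
            batch = supervisors[start:start + parallel]
            # pool.map returns results in the same order as the batch.
            results = list(pool.map(upgrade_one, batch))
            failed.extend(host for host, ok in zip(batch, results) if not ok)
            if len(failed) > failure_threshold:
                raise RuntimeError(f"aborting: {len(failed)} supervisors failed")
    return failed


if __name__ == "__main__":
    hosts = [f"sup{i}" for i in range(25)]
    print(upgrade_supervisors(hosts, lambda h: h not in {"sup3", "sup14"}))
```

Checking the budget between batches means at most one extra batch is attempted after the threshold is crossed.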
  • 28. Storm Upgrade CI/CD process Git (release info) Jenkins Start Artifactory (State files & Release info) RE Jenkins and SD process Pacemaker Upgrade Nimbus Upgrade Supervisor Upgrade Bounce Workers DRPC Upgrade DRPC Upgrade Verify Supervisors Run Test/Validation topology Audit All Components RE Jenkins leads to statefile generation for each component and updates git with release info. Statefiles are published in Artifactory and downloaded during upgrade. Upgrade fails if more than X supervisors fail to upgrade
  • 29. Rolling Upgrade timeline
      Component         | Parallelism | Hadoop 2.6.x | Hadoop 2.7.x | HBase 0.98.x | Storm 0.10.1.x
      HDFS (4k nodes)   | 1           | 4 days       | 1 day        | X            | X
      YARN (4k nodes)   | 1           | 1 day        | 1 day        | X            | X
      HBase (1k nodes)  | 1-4         | 4-5 days     | X            | 4-5 days     | X
      Storm (350 nodes) | 10          | X            | X            | X            | 4-6 hrs
      Components        | 1           | 1-2 hrs      | 1-2 hrs      | 1-2 hrs      | X
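The HDFS figures in the timeline imply a per-node time budget, which a short calculation makes concrete. The sketch below assumes "4k nodes" means exactly 4000 and that the quoted durations are wall-clock time for the whole cluster; `seconds_per_node` is a hypothetical helper.

```python
def seconds_per_node(nodes: int, days: float, parallelism: int = 1) -> float:
    """Average wall-clock seconds available per node in a rolling upgrade."""
    return days * 24 * 3600 * parallelism / nodes


if __name__ == "__main__":
    # HDFS on Hadoop 2.6.x: 4 days for ~4000 nodes, one node at a time.
    print(seconds_per_node(4000, 4))  # 86.4
    # HDFS on Hadoop 2.7.x: 1 day for the same cluster.
    print(seconds_per_node(4000, 1))  # 21.6
```

Under these assumptions the 2.7.x improvements cut the average per-DataNode cycle from about 86 seconds to about 22 seconds.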
  • 30. Rolling Upgrade Impact: YTD Availability by Cluster (chart; y-axis from 99.600 to 100.000; clusters AB DB FB HB IB JB LB PB UB BT LT PT TT UT BR DR IR LR MR PR; labeled values 99.928, 99.898, 99.940, 99.687, 99.705, 99.990)