Page1 © Hortonworks Inc. 2015
Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster
Sanjay Radia, Vinod Kumar Vavilapalli
Hortonworks Inc
June 9, 2015
Agenda
•Introduction
•What is Rolling Upgrade?
•Problem – Several key issues to be addressed
–Wire compatibility and side-by-side installs are not sufficient!!
–Must Address: Data safety, Service degradation and disruption
•Enhancements to various components
–Packaging – side-by-side install
–HDFS, YARN, Hive, Oozie, …
Sanjay Radia
•Chief Architect, Founder, Hortonworks
•Part of the Hadoop team at Yahoo! since 2007
–Chief Architect of Hadoop Core at Yahoo!
–Apache Hadoop PMC and Committer
• Prior
–Data center automation, schedulers, virtualization, Java, HA, OSs, File Systems
– (Startup, Sun Microsystems, Inria …)
–Ph.D., University of Waterloo
Vinod Kumar Vavilapalli
– Long time Hadooper since 2007
– Apache Hadoop Committer / PMC
– Apache Member
– Yahoo! -> Hortonworks
– MapReduce -> YARN from day one
HDP Upgrade: Two Upgrade Modes
Stop the Cluster Upgrade
Shut down the services and the cluster, then upgrade.
Traditionally this was the only way
Rolling Upgrade
Upgrade cluster and its services while cluster is
actively running applications
Note: Upgrade time is proportional to # nodes, not data size
Enterprises run critical services and data on a Hadoop cluster.
Need live cluster upgrade that maintains SLAs without degradation
But you can also “Revert to Prior State”
Rollback
Revert the bits and state of the cluster and its services back to a checkpointed state.
Why? This is an emergency procedure.
Downgrade
Downgrade the service and component to prior version, but
keep any new data and metadata that has been generated
Why? You are not happy with performance, or app compatibility, ….
But aren’t wire compatibility and side-by-side installs sufficient for rolling upgrades?
Unfortunately No!! Not if you want
• Data safety
• Keep running jobs/apps during upgrades; they must continue to run correctly
• Maintain SLAs
• Allow downgrade/rollbacks in case of problems
Issues that need to be addressed (1)
• Data safety
• HDFS’s upgrade checkpoint does not work for rolling upgrade
• Service degradation – note every daemon is restarted in rolling fashion
• HDFS write pipeline
• Application Masters on YARN restart
• NodeManagers restart
• Hive server is processing client queries – it cannot restart to new version without loss
• Client must not see failures – many components do not have retry
BUT Hadoop deals with failures: it will fix pipelines and restart tasks, so what is the big deal?
Service degradation will be high because every daemon is restarted
Issues that need to be addressed (2)
• Maintaining the application submitter’s context (correctness)
• MR tasks get their context from the local node
– In the past, the submitter’s and the node’s contexts were identical
– But with RU, a node’s binaries are being upgraded and hence may be inconsistent with the submitter’s
- Half of the job could execute with the old binaries and the other half with the new ones!
• Persistent state
• Backward compatibility for upgrade (or convert)
• Forward compatibility for downgrade (or convert)
• Wire compatibility
• With clients (forward and backward)
• Internally (Between Masters and Slaves or Peers)
– Note: the upgrade is in a rolling fashion
Component Enhancements
• Packaging – Side-by-side installs
• HDFS Enhancements
• YARN Enhancements
• Retaining Job/App Context
• Hive Enhancements
Packaging: Side-by-side Installs (1)
• Need side-by-side installs of multiple versions on same node
• Some components are version N, while others are N+1
• For same component, some daemons version N, others N+1 on the same node (e.g. NN and DN)
• HDP’s solution: Use OS-distro standard packaging solution
• Rejected proprietary packaging as a solution (no lock-in)
• Want to support RU both via Ambari and manually
• Standard packaging solutions like RPMs have useful tools and mechanisms
– Tools to install, uninstall, query, etc
– Manage dependencies automatically
– Admins do not need to learn new tools and formats
• Side benefits for “stop-the-world” upgrade:
• Can install the new binaries before the shutdown
Packaging: Side-by-side installs (2)
• Layout: side-by-side
• /usr/hdp/2.2.0.0/hadoop
• /usr/hdp/2.2.0.0/hive
• /usr/hdp/2.3.0.0/hadoop
• /usr/hdp/2.3.0.0/hive
• Define what is current for each component’s daemons and clients
• /usr/hdp/current/hdfs-nn->/usr/hdp/2.3.0.0/hadoop
• /usr/hdp/current/hadoop-client->/usr/hdp/2.2.0.0/hadoop
• /usr/hdp/current/hdfs-dn->/usr/hdp/2.2.0.0/hadoop
• Distro-select helps you manage the version switch
• Our solution: the package name contains the version number:
• E.g., hadoop_2_2_0_0 is the RPM package name itself
– hadoop_2_3_0_0 is a different peer package
• Bin commands point to current:
/usr/bin/hadoop->/usr/hdp/current/hadoop-client/bin/hadoop
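The layout above can be tried out in a scratch directory. The following is a minimal sketch, assuming a POSIX filesystem; it is not the distro-select tool itself. The helper names `install_version` and `select` are hypothetical, and a temporary directory stands in for /usr/hdp.

```python
import os
import tempfile


def install_version(root, version, components):
    """Create an empty versioned tree such as <root>/2.2.0.0/hadoop."""
    for component in components:
        os.makedirs(os.path.join(root, version, component), exist_ok=True)


def select(root, name, version, component):
    """Point <root>/current/<name> at <root>/<version>/<component>."""
    current_dir = os.path.join(root, "current")
    os.makedirs(current_dir, exist_ok=True)
    link = os.path.join(current_dir, name)
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(os.path.join(root, version, component), tmp)
    os.replace(tmp, link)  # rename over the old link, atomic on POSIX
    return link


# A temporary directory stands in for /usr/hdp (assumption for the demo).
root = tempfile.mkdtemp(prefix="hdp-demo-")
for version in ("2.2.0.0", "2.3.0.0"):
    install_version(root, version, ["hadoop", "hive"])

# Mixed state during a rolling upgrade: NameNode is new, DataNode still old.
nn_link = select(root, "hdfs-nn", "2.3.0.0", "hadoop")
dn_link = select(root, "hdfs-dn", "2.2.0.0", "hadoop")
print(os.readlink(nn_link))
print(os.readlink(dn_link))

# Later the DataNode on this host is switched as well.
select(root, "hdfs-dn", "2.3.0.0", "hadoop")
print(os.readlink(dn_link))
```

Both versions stay installed throughout; only the per-daemon pointer moves, which is what makes a later switch back cheap.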
Packaging: Side-by-side installs (3)
• distro-select tool to select current binary
• Per-component, Per-daemon
• Maintain stack consistency – that is what QE tested
• Each component refers to its siblings of same stack version
• Each component knows the “hadoop home” of the same stack
– Wrapper bin-scripts set this up
• Config updates can be optionally synchronized with binary upgrade
• Configs can sit in their old location
• But what if the new binary version requires slightly different config?
• Each binary version has its own config pointer
– /usr/hdp/2.2.0.0/hadoop/conf -> /etc/hadoop/conf
HDFS Enhancements (1)
Data safety
• Since 2007, HDFS has supported an upgrade-checkpoint
• Backups of HDFS not practical – too large
• Protects against HDFS bugs in new version deleting files
– Standard practice to use it for ALL upgrades, even patch releases
• But this only works for “stop-the-world” full upgrade and does not support downgrade
• Irresponsible to do rolling upgrade without such a mechanism
HDP 2.2 has enhanced upgrade-checkpoint (HDFS-5535)
• Markers for rollback
• “Hardlinks” to protect against deletes due to bugs in the new version of HDFS code
– Old scheme had hardlinks but we now delay the deletes
• Added downgrade capability
• Protobuf based fsImage for compatible extensibility
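The hardlink idea can be illustrated on an ordinary local filesystem. This is a toy sketch only, not the HDFS implementation: the directory names `current` and `previous`, the block file names, and the helpers `checkpoint` and `rollback` are assumptions made for the example, and a real rollback restores the whole checkpointed state rather than only the missing files.

```python
import os
import tempfile


def checkpoint(current_dir, previous_dir):
    """Hard-link every file in current_dir into previous_dir (no data copy)."""
    os.makedirs(previous_dir)
    for name in os.listdir(current_dir):
        os.link(os.path.join(current_dir, name),
                os.path.join(previous_dir, name))


def rollback(current_dir, previous_dir):
    """Re-link any file that has gone missing from current_dir."""
    restored = []
    for name in sorted(os.listdir(previous_dir)):
        target = os.path.join(current_dir, name)
        if not os.path.exists(target):
            os.link(os.path.join(previous_dir, name), target)
            restored.append(name)
    return restored


base = tempfile.mkdtemp(prefix="blocks-demo-")
current = os.path.join(base, "current")
previous = os.path.join(base, "previous")
os.makedirs(current)
for name in ("blk_1", "blk_2"):
    with open(os.path.join(current, name), "w") as f:
        f.write("data for " + name)

checkpoint(current, previous)

# Simulate a bug in the new version deleting a block file.
os.remove(os.path.join(current, "blk_2"))

# The data is still reachable through the second link, so it can be restored.
restored = rollback(current, previous)
print(restored)
```

Because a hard link is just another name for the same data, the checkpoint costs no extra copy of the data, which matters when a full backup is impractical.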
HDFS Enhancements (2)
Minimize service degradation and retain data safety
• Fast datanode restart (HDFS-5498)
• Write pipeline – every DN will be upgraded, and hence many write pipelines will break and be repaired
• Umbrella Jira HDFS-5535
– Repair it to the same DN during RU (avoid replica data copy)
– Retain same number of replicas in pipeline
• Upgrade HA standby and failover (NN HA available for a long time)
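For readers scripting this by hand, the sketch below assembles the order of `hdfs dfsadmin` calls described in the Apache HDFS rolling upgrade documentation. It only builds the command lists and runs nothing. The host names and the IPC port value are placeholders, the function name is hypothetical, and the NameNode upgrade/failover and the DataNode restart with new binaries are left as comments because they depend on how the cluster is managed.

```python
def rolling_upgrade_commands(datanodes, ipc_port):
    """Return the dfsadmin invocations for one rolling upgrade, in order."""
    cmds = [
        # Create the rollback image before touching any daemon.
        ["hdfs", "dfsadmin", "-rollingUpgrade", "prepare"],
        # Poll until the rollback image is reported as ready.
        ["hdfs", "dfsadmin", "-rollingUpgrade", "query"],
    ]
    # (Upgrade the standby NN, fail over, upgrade the other NN: not shown.)
    for host in datanodes:
        target = "%s:%d" % (host, ipc_port)
        # Ask the DN to shut down for upgrade so clients wait for its restart.
        cmds.append(["hdfs", "dfsadmin", "-shutdownDatanode", target, "upgrade"])
        # Check on the DN; repeat until it no longer answers.
        cmds.append(["hdfs", "dfsadmin", "-getDatanodeInfo", target])
        # (Restart the DN with the new binaries: not shown.)
    # Only after verification: discard the rollback image.
    cmds.append(["hdfs", "dfsadmin", "-rollingUpgrade", "finalize"])
    return cmds


# Placeholder hosts and port, chosen for the example only.
plan = rolling_upgrade_commands(["dn1.example.com", "dn2.example.com"], 8010)
for cmd in plan:
    print(" ".join(cmd))
```

Finalizing is deliberately the last step: until then, downgrade and rollback remain possible.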
YARN Enhancements: Minimize Service Degradation
• YARN RM retains application queue (2013)
• YARN RM fail-over (2014)
– Note: this retains the queues, but ALL jobs are restarted
• YARN RM can restart while retaining applications (2015)
• A restarted YARN NodeManager retains existing containers (2015)
• Recall: restarting containers will cause serious SLA degradation
YARN Enhancements: Compatibility
• Versioning of state-stores of RM and NMs
• Compatible evolution of tokens over time
• Wire compatibility between mixed versions of RM
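One way to picture state-store versioning is a major.minor check performed when a restarted daemon loads persisted state. The sketch below assumes a policy in which a matching major version means the stored state is readable and a differing major version means it is not; the `Version` tuple, the function name and the returned labels are hypothetical and are not YARN's actual classes.

```python
from collections import namedtuple

# Hypothetical version record persisted next to the state.
Version = namedtuple("Version", ["major", "minor"])


def check_state_store(current, stored):
    """Decide what a restarted daemon does with previously stored state."""
    if stored == current:
        return "load"
    if stored.major == current.major:
        # Compatible change: load the state, then record the new version.
        return "load-and-update-version"
    # Incompatible change: refuse to start rather than misread the state.
    return "refuse"


print(check_state_store(Version(1, 2), Version(1, 2)))
print(check_state_store(Version(1, 3), Version(1, 2)))
print(check_state_store(Version(2, 0), Version(1, 2)))
```

Under this assumed policy, the second case covers both upgrade and downgrade within the same major version.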
Retaining Job/App context
• Previously, jobs/apps used libraries from the local node
• Worked because client-node & compute-nodes had same version
• But during RU, the NodeManager has multiple versions
• Must use the same version as used by the client when submitting a job
• Solution:
• Framework libraries are now installed in HDFS
• Client-context sent as “distro-version” variable in job config
• Has side benefits
– Frameworks are now installed on a single node and then uploaded to HDFS
• Note Oozie also enhanced to maintain consistent context
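A rough sketch of how a version carried in the job config can pin the framework libraries, regardless of what the compute node currently runs. The config keys, the path template and the function name below are hypothetical, chosen only to illustrate the substitution.

```python
def resolve_framework_path(job_conf):
    """Expand the framework path using the version the submitter recorded."""
    version = job_conf["distro-version"]
    template = job_conf["framework.path.template"]
    return template.replace("${distro-version}", version)


# Job submitted by a client that is still on the old stack version.
job_conf = {
    "distro-version": "2.2.0.0",
    # Hypothetical HDFS location of the uploaded framework tarball.
    "framework.path.template": "/apps/${distro-version}/mapreduce/framework.tar.gz",
}

# The node's own version is deliberately not an input to the lookup.
node_local_version = "2.3.0.0"

path = resolve_framework_path(job_conf)
print(path)
```

Every task of the job resolves the same path, so no part of the job picks up the node's newer binaries mid-upgrade.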
Page23 © Hortonworks Inc. 2015
YARN Rolling Upgrades: A Cluster Snapshot
Hive Enhancements
• Fast restarts + client-side reconnection
• Hive metastore and Hive client
• Hive-server2: stateful server that submits the client’s query
• Need to keep it running till the old queries complete
• Solution:
• Allow multiple Hive-servers to run, each registered in Zookeeper
• New client requests go to new servers
• Old server completes old queries but does not receive any new ones
– Old-server is removed from Zookeeper
• Side benefits
• HA + Load balancing solution for Hiveserver2
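The drain-and-replace flow above can be sketched with an in-memory registry standing in for Zookeeper. Everything here is an assumption made for illustration: the `Registry` class, the server names and the query bookkeeping are hypothetical and do not reflect the HiveServer2 API.

```python
class Registry:
    """In-memory stand-in for the Zookeeper namespace listing live servers."""

    def __init__(self):
        self._servers = {}

    def register(self, name, version):
        self._servers[name] = version

    def deregister(self, name):
        self._servers.pop(name, None)

    def pick(self):
        """Return (name, version) of a registered server for a new client."""
        if not self._servers:
            raise RuntimeError("no server registered")
        name = sorted(self._servers)[0]
        return name, self._servers[name]


registry = Registry()
running_queries = {"hs2-old": ["q1", "q2"], "hs2-new": []}

registry.register("hs2-old", "2.2.0.0")
registry.register("hs2-new", "2.3.0.0")

# Upgrade step: hide the old server from new clients but keep it running.
registry.deregister("hs2-old")

# A new client request can only land on the new server.
server, version = registry.pick()
running_queries[server].append("q3")

# The old server finishes its in-flight queries and can then be stopped.
running_queries["hs2-old"].clear()
old_can_stop = not running_queries["hs2-old"]
print(server, version, old_can_stop)
```

Deregistering before stopping is the key ordering: the old server stops receiving work first and is shut down only once it is idle.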
Automated Rolling Upgrade
Via Ambari
Via your own cluster management scripts
HDP Rolling Upgrades Runbook
Pre-requisites
• HA
• Configs
Prepare
• Install bits
• DB backups
• HDFS checkpoint
Rolling Upgrade
Finalize
Rolling Downgrade
Rollback
• NOT rolling: shut down all services
Note: Upgrade time is proportional to # nodes, not data size
Both Manual and Automated Rolling Upgrade
• Ambari supports fully automated upgrades
• Verifies prerequisites
• Performs HDFS upgrade-checkpoint, prompts for DB backups
• Performs rolling upgrade
• All the components, in the right order
• Smoke tests at each critical stage
• Opportunities for Admin verification at critical stages
• Downgrade if you change your mind
• We have published the runbook for those who do not use Ambari
• You can do it manually or automate your own process
Runbook: Rolling Upgrade
Ambari has an automated process for rolling upgrades
Services are switched over to the new version in rolling fashion
Any components not installed on the cluster are skipped
Zookeeper
Ranger
Core Masters
Core Slaves
Hive
Oozie
Falcon
Clients
Kafka
Knox
Storm
Slider
Flume
Hue
Finalize
HDFS, YARN, MR, Tez, HBase, Pig, Hive, Phoenix, Mahout
HDFS, YARN, HBase
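For a home-grown script, the ordering on this slide can be written down as data, with uninstalled components skipped. This is a minimal sketch, assuming the sequence listed on the slide; the function name and the example set of installed components are hypothetical, and it does not show how Ambari implements the process.

```python
# Order taken from the slide; "Finalize" is always the last step.
UPGRADE_ORDER = [
    "Zookeeper", "Ranger", "Core Masters", "Core Slaves", "Hive", "Oozie",
    "Falcon", "Clients", "Kafka", "Knox", "Storm", "Slider", "Flume", "Hue",
]


def upgrade_plan(installed):
    """Return the steps to run, skipping components that are not installed."""
    steps = [name for name in UPGRADE_ORDER if name in installed]
    steps.append("Finalize")
    return steps


# Hypothetical cluster with only a few of the components installed.
installed = {"Zookeeper", "Core Masters", "Core Slaves", "Hive", "Clients"}
steps = upgrade_plan(installed)
for number, step in enumerate(steps, start=1):
    print(number, step)
```

Keeping the order in one list makes it easy to add a smoke test or a manual verification pause between steps.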
Runbook: Rolling Downgrade
Zookeeper
Ranger
Core Masters
Core Slaves
Hive
Oozie
Falcon
Clients
Kafka
Knox
Storm
Slider
Flume
Hue
Downgrade
Finalize
Summary
• Enterprises run critical services and data on a Hadoop cluster.
• Need a live cluster upgrade that maintains SLAs without degradation
• We enhanced Hadoop components for enterprise-grade rolling upgrade
• Non-proprietary packaging solution using OS-standard packaging (RPMs, Debs, …)
• Data safety
– HDFS checkpoints and write-pipelines
• Maintain SLAs – solve a number of service degradation problems
– HDFS write pipelines, Yarn RM, NM state recovery, Hive, …
• Jobs/apps continue to run correctly with the right context
• Allow downgrade/rollbacks in case of problems
• All enhancements truly open source and pushed back to Apache?
• Yes of course – that is how Hortonworks does business …
Backup slides
Why didn’t you use alternatives?
• Alternatives generally keep one version active, not two
• We need to move some services as a pack (clients)
• We need to support managing confs and binaries together and separately
• Maybe we could have done it, but it was getting complex …
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 
Ad

Recently uploaded (20)

tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 

Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster

  • 1. Page1 © Hortonworks Inc. 2015 Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster Sanjay Radia, Vinod Kumar Vavilapalli Hortonworks Inc June 9, 2015
  • 2. Agenda
      • Introduction
      • What is Rolling Upgrade?
      • Problem: several key issues to be addressed
          – Wire compatibility and side-by-side installs are not sufficient!
          – Must address: data safety, service degradation and disruption
      • Enhancements to various components
          – Packaging: side-by-side install
          – HDFS, YARN, Hive, Oozie, …
  • 3. Sanjay Radia
      • Chief Architect, Founder, Hortonworks
      • Part of the Hadoop team at Yahoo! since 2007
          – Chief Architect of Hadoop Core at Yahoo!
          – Apache Hadoop PMC and Committer
      • Prior
          – Data center automation, schedulers, virtualization, Java, HA, OSs, file systems
          – (Startup, Sun Microsystems, Inria …)
          – Ph.D., University of Waterloo
  • 4. Vinod Kumar Vavilapalli
      – Long-time Hadooper, since 2007
      – Apache Hadoop Committer / PMC
      – Apache Member
      – Yahoo! -> Hortonworks
      – MapReduce -> YARN from day one
  • 5. HDP Upgrade: Two Upgrade Modes
      • Stop-the-cluster upgrade: shut down the services and the cluster, then upgrade. Traditionally this was the only way.
      • Rolling upgrade: upgrade the cluster and its services while the cluster is actively running applications.
      • Note: upgrade time is proportional to the number of nodes, not to data size.
      • Enterprises run critical services and data on a Hadoop cluster. They need a live cluster upgrade that maintains SLAs without degradation.
  • 6. But you can also “Revert to Prior State”
      • Rollback: revert the bits and state of the cluster and its services back to a checkpointed state.
          – Why? This is an emergency procedure.
      • Downgrade: downgrade the service and component to the prior version, but keep any new data and metadata that has been generated.
          – Why? You are not happy with performance, app compatibility, …
  • 7. But aren’t wire compatibility and side-by-side installs sufficient for rolling upgrades?
      • Unfortunately no! Not if you want to:
          – Keep data safe
          – Keep running jobs/apps during upgrades, and have them continue to run correctly
          – Maintain SLAs
          – Allow downgrade/rollback in case of problems
  • 8. Issues that need to be addressed (1)
      • Data safety
          – HDFS’s upgrade checkpoint does not work for rolling upgrade
      • Service degradation (note that every daemon is restarted in rolling fashion)
          – HDFS write pipeline
          – Application Masters on YARN restart
          – NodeManagers restart
          – Hive server is processing client queries: it cannot restart to the new version without loss
      • Clients must not see failures, yet many components do not have retry
      • “But Hadoop deals with failures: it will fix pipelines and restart tasks. What is the big deal?”
          – Service degradation will be high because every daemon is restarted
  • 9. Issues that need to be addressed (2)
      • Maintaining the application submitter’s context (correctness)
          – MR tasks get their context from the local node
          – In the past the submitter’s and the node’s contexts were identical
          – But with RU, a node’s binaries are being upgraded and hence may be inconsistent with the submitter’s: half of the job could execute with the old binaries and the other half with the new ones!
      • Persistent state
          – Backward compatibility for upgrade (or convert)
          – Forward compatibility for downgrade (or convert)
      • Wire compatibility
          – With clients (forward and backward)
          – Internally (between masters and slaves, or between peers); note that the upgrade proceeds in rolling fashion
  • 10. Component Enhancements
      • Packaging: side-by-side installs
      • HDFS Enhancements
      • YARN Enhancements
      • Retaining Job/App Context
      • Hive Enhancements
  • 11. Packaging: Side-by-side Installs (1)
      • Need side-by-side installs of multiple versions on the same node
          – Some components are version N, while others are N+1
          – For the same component, some daemons are version N and others N+1 on the same node (e.g. NN and DN)
      • HDP’s solution: use the OS distro’s standard packaging solution
          – Rejected proprietary packaging as a solution (no lock-in)
          – Want to support RU both via Ambari and manually
          – Standard packaging solutions like RPMs have useful tools and mechanisms: tools to install, uninstall, query, etc.; automatic dependency management; admins do not need to learn new tools and formats
      • Side benefit for “stop-the-world” upgrade:
          – Can install the new binaries before the shutdown
  • 12. Packaging: Side-by-side Installs (2)
      • Layout: side-by-side
          – /usr/hdp/2.2.0.0/hadoop
          – /usr/hdp/2.2.0.0/hive
          – /usr/hdp/2.3.0.0/hadoop
          – /usr/hdp/2.3.0.0/hive
      • Define what is current for each component’s daemons and clients
          – /usr/hdp/current/hdfs-nn -> /usr/hdp/2.3.0.0/hadoop
          – /usr/hdp/current/hadoop-client -> /usr/hdp/2.2.0.0/hadoop
          – /usr/hdp/current/hdfs-dn -> /usr/hdp/2.2.0.0/hadoop
      • Distro-select helps you manage the version switch
      • Our solution: the package name contains the version number
          – E.g. hadoop_2_2_0_0 is the RPM package name itself
          – hadoop_2_3_0_0 is a different peer package
      • Bin commands point to current: /usr/bin/hadoop -> /usr/hdp/current/hadoop-client/bin/hadoop
  • 13. Packaging: Side-by-side Installs (3)
      • distro-select tool to select the current binary
          – Per component, per daemon
      • Maintain stack consistency, since that is what QE tested
          – Each component refers to its siblings of the same stack version
          – Each component knows the “hadoop home” of the same stack; wrapper bin-scripts set this up
      • Config updates can optionally be synchronized with the binary upgrade
          – Configs can sit in their old location
          – But what if the new binary version requires a slightly different config?
          – Each binary version has its own config pointer: /usr/hdp/2.2.0.0/hadoop/conf -> /etc/hadoop/conf
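The per-daemon "current" pointers described in the packaging slides can be illustrated with a small Python sketch. This is not the distro-select tool itself: the helper names `build_demo_layout` and `select_version` are hypothetical, and the layout is created under a temporary directory rather than /usr/hdp. It only shows the idea that each daemon's pointer is a symlink that can be switched independently and atomically.

```python
import os
import tempfile


def build_demo_layout():
    """Create a throwaway side-by-side layout: <root>/<version>/<component>."""
    root = tempfile.mkdtemp(prefix="hdp-demo-")
    for version in ("2.2.0.0", "2.3.0.0"):
        for component in ("hadoop", "hive"):
            os.makedirs(os.path.join(root, version, component))
    return root


def select_version(root, pointer, version, component):
    """Repoint <root>/current/<pointer> at <root>/<version>/<component>.

    Hypothetical helper: the new symlink is created under a temporary name
    and renamed over the old one, so readers never see a missing pointer.
    """
    target = os.path.join(root, version, component)
    if not os.path.isdir(target):
        raise FileNotFoundError(target)
    current_dir = os.path.join(root, "current")
    os.makedirs(current_dir, exist_ok=True)
    link = os.path.join(current_dir, pointer)
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)
    os.replace(tmp, link)  # rename replaces the old symlink itself
    return link


root = build_demo_layout()
# Everything starts on the old stack version.
for pointer in ("hdfs-nn", "hdfs-dn", "hadoop-client"):
    select_version(root, pointer, "2.2.0.0", "hadoop")
# Rolling step: only the NameNode pointer moves to the new version.
select_version(root, "hdfs-nn", "2.3.0.0", "hadoop")
```

After the last call, the NameNode pointer resolves to the 2.3.0.0 tree while the DataNode and client pointers still resolve to 2.2.0.0, which mirrors the mixed state shown on the slide.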
  • 14. Component Enhancements
      • Packaging: side-by-side installs
      • HDFS Enhancements
      • YARN Enhancements
      • Retaining Job/App Context
      • Hive Enhancements
  • 15. HDFS Enhancements (1): Data Safety
      • Since 2007, HDFS has supported an upgrade checkpoint
          – Backups of HDFS are not practical: too large
          – Protects against HDFS bugs in the new version deleting files
          – Standard practice to use it for ALL upgrades, even patch releases
      • But this only works for a “stop-the-world” full upgrade and does not support downgrade
      • Irresponsible to do a rolling upgrade without such a mechanism
      • HDP 2.2 has an enhanced upgrade checkpoint (HDFS-5535)
          – Markers for rollback
          – “Hardlinks” to protect against deletes due to bugs in the new version of the HDFS code; the old scheme had hardlinks, but we now delay the deletes
          – Added downgrade capability
          – Protobuf-based fsImage for compatible extensibility
  • 16. HDFS Enhancements (2): Minimize Service Degradation and Retain Data Safety
      • Fast datanode restart (HDFS-5498)
      • Write pipeline: every DN will be upgraded, and hence many write pipelines will break and be repaired
          – Umbrella Jira HDFS-5535
          – Repair the pipeline to the same DN during RU (avoids a replica data copy)
          – Retain the same number of replicas in the pipeline
      • Upgrade the HA standby and fail over (NN HA has been available for a long time)
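Why hardlinks give cheap protection against deletes can be shown with a toy Python sketch. It assumes nothing about the real HDFS on-disk format: the directory names `current` and `previous`, the file name `blk_1001`, and the helpers `checkpoint` and `rollback` are illustrative only. The point is that a hardlinked checkpoint copies no data, yet the bytes survive a delete issued by buggy new code.

```python
import os
import tempfile


def checkpoint(current_dir, previous_dir):
    """Hard-link every file in current_dir into previous_dir (no data copy)."""
    os.makedirs(previous_dir)
    for name in os.listdir(current_dir):
        src = os.path.join(current_dir, name)
        if os.path.isfile(src):
            os.link(src, os.path.join(previous_dir, name))


def rollback(current_dir, previous_dir):
    """Re-link any checkpointed file that has gone missing from current_dir."""
    for name in os.listdir(previous_dir):
        dst = os.path.join(current_dir, name)
        if not os.path.exists(dst):
            os.link(os.path.join(previous_dir, name), dst)


storage = tempfile.mkdtemp(prefix="dn-demo-")
current = os.path.join(storage, "current")
previous = os.path.join(storage, "previous")
os.makedirs(current)
with open(os.path.join(current, "blk_1001"), "wb") as f:
    f.write(b"block data")

checkpoint(current, previous)
os.remove(os.path.join(current, "blk_1001"))  # simulated bug in the new version
rollback(current, previous)

with open(os.path.join(current, "blk_1001"), "rb") as f:
    restored = f.read()
```

The delete only removes one directory entry; the data stays reachable through the checkpoint's link until the upgrade is finalized and the checkpoint is dropped.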
  • 17. Component Enhancements
      • Packaging: side-by-side installs
      • HDFS Enhancements
      • YARN Enhancements
      • Retaining Job/App Context
      • Hive Enhancements
  • 18. YARN Enhancements: Minimize Service Degradation
      • YARN RM retains the application queue (2013)
      • YARN RM fail-over (2014)
          – Note: this retains the queues, but ALL jobs are restarted
      • YARN RM can restart while retaining applications (2015)
  • 19. YARN Enhancements: Minimize Service Degradation
      • A restarted YARN NodeManager retains existing containers (2015)
      • Recall: restarting containers will cause serious SLA degradation
  • 20. YARN Enhancements: Compatibility
      • Versioning of the state-stores of the RM and NMs
      • Compatible evolution of tokens over time
      • Wire compatibility between mixed versions of RM
  • 21. Component Enhancements
      • Packaging: side-by-side installs
      • HDFS Enhancements
      • YARN Enhancements
      • Retaining Job/App Context
      • Hive Enhancements
  • 22. Retaining Job/App Context
      • Previously a job/app used libraries from the local node
          – Worked because the client node and the compute nodes had the same version
      • But during RU, the NodeManager has multiple versions
          – Must use the same version as was used by the client when submitting the job
      • Solution:
          – Framework libraries are now installed in HDFS
          – Client context is sent as a “distro-version” variable in the job config
          – Side benefit: frameworks are now installed on a single node and then uploaded to HDFS
      • Note: Oozie was also enhanced to maintain a consistent context
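A minimal Python sketch of the context-retention idea follows. Only the “distro-version” key comes from the slide; the path template `/frameworks/{version}/mapreduce.tar.gz`, the node names, and the helper functions are hypothetical and chosen for illustration. It shows that the framework location is derived from the submitter's version in the job config, not from whatever version the executing node happens to run.

```python
def submit_job(client_version, job_name):
    """Build a job config that carries the submitter's stack version."""
    return {"job.name": job_name, "distro-version": client_version}


def resolve_framework_path(job_conf,
                           template="/frameworks/{version}/mapreduce.tar.gz"):
    """Pick the framework archive from the job's context.

    The template is an assumed layout, not a documented HDFS path.
    """
    return template.format(version=job_conf["distro-version"])


# Mid-upgrade cluster: nodes run different local versions.
node_local_versions = {"node-a": "2.2.0.0", "node-b": "2.3.0.0"}

job = submit_job("2.2.0.0", "nightly-etl")
# Every node resolves the same archive, regardless of its local version.
paths = {node: resolve_framework_path(job) for node in node_local_versions}
```

Because the lookup ignores `node_local_versions`, all tasks of the job run against one framework version even while half the nodes have already been upgraded.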
  • 23. YARN Rolling Upgrades: A Cluster Snapshot
  • 24. Component Enhancements
      • Packaging: side-by-side installs
      • HDFS Enhancements
      • YARN Enhancements
      • Retaining Job/App Context
      • Hive Enhancements
  • 25. Hive Enhancements
      • Fast restarts + client-side reconnection
          – Hive metastore and Hive client
      • Hive-server2: a stateful server that submits the client’s query
          – Need to keep it running until the old queries complete
      • Solution:
          – Allow multiple Hive-servers to run, each registered in Zookeeper
          – New client requests go to the new servers
          – The old server completes old queries but does not receive any new ones; the old server is removed from Zookeeper
      • Side benefit:
          – HA + load-balancing solution for Hiveserver2
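The drain behaviour described for Hive-server2 can be modelled with a toy Python sketch. The `Registry` class below is an in-memory stand-in, not a ZooKeeper client API, and the server names and the `can_shut_down` helper are hypothetical. It only demonstrates the sequencing: deregister the old server first, let it finish in-flight queries, and stop it once it is idle.

```python
class Registry:
    """Toy in-memory stand-in for the registration namespace."""

    def __init__(self):
        self._servers = []

    def register(self, server):
        self._servers.append(server)

    def deregister(self, server):
        self._servers.remove(server)

    def is_registered(self, server):
        return server in self._servers

    def pick(self):
        # New client requests only ever see registered servers.
        if not self._servers:
            raise RuntimeError("no server registered")
        return self._servers[-1]


class Server:
    def __init__(self, name, version):
        self.name = name
        self.version = version
        self.running = set()  # ids of in-flight queries

    def submit(self, query_id):
        self.running.add(query_id)

    def complete(self, query_id):
        self.running.discard(query_id)


def can_shut_down(server, registry):
    """A server may stop once it is deregistered and has no running queries."""
    return not registry.is_registered(server) and not server.running


registry = Registry()
old = Server("hs2-old", "2.2.0.0")
registry.register(old)
registry.pick().submit("q1")          # lands on the old server

new = Server("hs2-new", "2.3.0.0")
registry.register(new)
registry.deregister(old)              # old server stops receiving new work
registry.pick().submit("q2")          # lands on the new server

drained_early = can_shut_down(old, registry)  # q1 is still running
old.complete("q1")
drained_late = can_shut_down(old, registry)
```

The same registry lookup is what gives the HA and load-balancing side benefit: clients never address a server directly, so servers can come and go.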
  • 26. Automated Rolling Upgrade
      • Via Ambari
      • Via your own cluster management scripts
  • 27. HDP Rolling Upgrades Runbook
      • Pre-requisites: HA, configs
      • Prepare: install bits, DB backups, HDFS checkpoint
      • Rolling Upgrade
      • Finalize
      • Rolling Downgrade
      • Rollback: NOT rolling. Shut down all services.
      • Note: upgrade time is proportional to the number of nodes, not to data size
  • 28. Both Manual and Automated Rolling Upgrade
      • Ambari supports fully automated upgrades
          – Verifies prerequisites
          – Performs the HDFS upgrade checkpoint, prompts for DB backups
          – Performs the rolling upgrade: all the components, in the right order
          – Smoke tests at each critical stage
          – Opportunities for admin verification at critical stages
          – Downgrade if you change your mind
      • We have published the runbook for those who do not use Ambari
          – You can do it manually or automate your own process
  • 29. Runbook: Rolling Upgrade
      • Ambari has an automated process for rolling upgrades
      • Services are switched over to the new version in rolling fashion
      • Any components not installed on the cluster are skipped
      • [Diagram] Stages: Zookeeper, Ranger, Core Masters, Core Slaves, Hive, Oozie, Falcon, Clients, Kafka, Knox, Storm, Slider, Flume, Hue, Finalize
      • [Diagram] Other labels: HDFS, YARN, MR, Tez, HBase, Pig, Hive, Phoenix, Mahout; HDFS, YARN, HBase
  • 30. Runbook: Rolling Downgrade
      • [Diagram] Stages: Zookeeper, Ranger, Core Masters, Core Slaves, Hive, Oozie, Falcon, Clients, Kafka, Knox, Storm, Slider, Flume, Hue
      • [Diagram] Other labels: Downgrade, Finalize
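The runbook flow (upgrade stage by stage, skip what is not installed, smoke-test at each stage, stop if a test fails) might be sketched as below. This is a hedged illustration, not Ambari's implementation: the stage order is taken from the listing order of the flattened diagram and is therefore an assumption, and `rolling_upgrade` and its callbacks are hypothetical names.

```python
# Stage order as listed in the runbook diagram (assumed to be the sequence).
UPGRADE_ORDER = [
    "Zookeeper", "Ranger", "Core Masters", "Core Slaves", "Hive", "Oozie",
    "Falcon", "Clients", "Kafka", "Knox", "Storm", "Slider", "Flume", "Hue",
]


def rolling_upgrade(installed, upgrade_group, smoke_test):
    """Upgrade installed stages in order; stop at the first failed smoke test.

    Returns (stages_upgraded, ok). When ok is False the caller can decide
    whether to downgrade the stages in stages_upgraded.
    """
    done = []
    for group in UPGRADE_ORDER:
        if group not in installed:
            continue  # components not installed on the cluster are skipped
        upgrade_group(group)
        done.append(group)
        if not smoke_test(group):
            return done, False
    return done, True


installed = {"Zookeeper", "Core Masters", "Core Slaves", "Hive", "Clients"}
log = []
# Simulate a smoke-test failure at the Hive stage.
done, ok = rolling_upgrade(installed, log.append, lambda g: g != "Hive")
```

Returning the list of already-upgraded stages keeps the downgrade decision with the operator, matching the "downgrade if you change your mind" option on the earlier slide.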
  • 31. Summary
      • Enterprises run critical services and data on a Hadoop cluster
      • They need a live cluster upgrade that maintains SLAs without degradation
      • We enhanced Hadoop components for enterprise-grade rolling upgrade
          – Non-proprietary packaging using the OS-standard solution (RPMs, Debs, …)
          – Data safety: HDFS checkpoints and write pipelines
          – Maintain SLAs by solving a number of service degradation problems: HDFS write pipelines, YARN RM and NM state recovery, Hive, …
          – Jobs/apps continue to run correctly with the right context
          – Allow downgrade/rollback in case of problems
      • All enhancements truly open source and pushed back to Apache?
          – Yes, of course: that is how Hortonworks does business
  • 32. Backup slides
  • 33. Why didn’t you use alternatives?
      • Alternatives generally keep one version active, not two
      • We need to move some services as a pack (clients)
      • We need to support managing confs and binaries both together and separately
      • Maybe we could have done it, but it was getting complex

Editor's Notes

  • #9:
      – HDFS write pipeline: slows down writes, risks data
      – YARN App Masters restart: app failure if the App Master does not have persistent state
      – NodeManager restart: tasks fail and restart, SLA degrades
      – Hive server is processing client queries: it cannot restart for the new version
      – Clients must not see failures, yet many components do not have retry
  • #28: Yahoo! upgrades approx. 1K nodes (out of 40K) a day. A 4K cluster takes 2 days.