1 © Hortonworks Inc. 2011–2018. All rights reserved
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Suma Shivaprasad - Staff Engineer
Rohith Sharma K S - Senior Software Engineer
Speaker Info
Suma Shivaprasad
• Apache Hadoop Contributor
• Apache Atlas PMC
• Staff Engineer @ Hortonworks
Rohith Sharma K S
• Apache Hadoop PMC
• Sr. Software Engineer @ Hortonworks
Agenda
• Why upgrade to Apache Hadoop 3.x?
• Things to consider before upgrade
• Upgrade process
• Workload migration
• Other aspects
• Summary
Why upgrade to Apache Hadoop 3.x?
Motivation
Major release with a lot of features and improvements!
HDFS
• Federation GA
• Erasure Coding
o Significant cost savings in storage
o Reduction of storage overhead from 200% to 50%
• Intra-DataNode Disk Balancer
YARN
• Scheduler Improvements
• New Resource types - GPUs, FPGAs
• Fast and Global scheduling
• Containerization - Docker
• Long running Services rehash
• New UI2
• Timeline Server v2
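The storage overhead figures quoted for erasure coding can be checked with a little arithmetic. The sketch below assumes that the 200% figure refers to 3x replication and the 50% figure to a Reed-Solomon layout with 6 data blocks and 3 parity blocks; the slide does not name the policy. `overhead_pct` is a hypothetical helper, not a Hadoop command.

```shell
# Storage overhead = redundant blocks / data blocks, as a percentage.
# overhead_pct is an illustrative helper, not part of Hadoop.
overhead_pct() {
  redundant="$1"   # extra replicas or parity blocks
  data="$2"        # blocks that hold the actual data
  echo $(( redundant * 100 / data ))
}

# 3x replication: 1 data block plus 2 extra replicas (assumed baseline).
echo "3x replication: $(overhead_pct 2 1)%"
# RS(6,3): 6 data blocks plus 3 parity blocks (assumed EC policy).
echo "RS(6,3) erasure coding: $(overhead_pct 3 6)%"
```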
Hadoop-3
[Diagram labels. Container Runtimes (Docker / Linux / Default). Platform Services: Storage, Resource Management, Service Discovery. Workloads: Holiday Web App, HBase, HTTP, MR, Tez, Hive / Pig, Hive on LLAP, Spark, Deep Learning App. Deployment: On-Premises, Cloud.]
Things to consider before upgrade
Upgrades involve many things
• Upgrade mechanism
• Recommendation for 3.x - Express or Rolling?
• Compatibility
• Source & Target versions
• Tooling
• Cluster Environment
• Configuration changes
• Script changes
• Classpath changes
Upgrade mechanism: Express/Rolling Upgrades
Express Upgrades
• “Stop the world” Upgrades
• Cluster downtime
• Less stringent prerequisites
• Process
o Upgrade masters and workers in one shot
Rolling Upgrades
• Preserve cluster operation
• Minimizes Service impact and downtime
• Can take longer to complete
• Process
o Upgrades masters and workers in batches
Recommendation for 3.x - Express or Rolling?
● Major version upgrade
● Challenges and issues in supporting Rolling Upgrades
● Why can't rolling upgrades be done?
○ HDFS-13596: Change in edit log format
○ HADOOP-15502: Incompatible change in the MetricsPlugin API
○ HDFS-6440: Incompatible changes in image transfer protocol
● Recommended
○ ‘Express Upgrade’ from Hadoop 2 to 3
Compatibility
• Wire compatibility
o Preserves compatibility with Hadoop 2 clients
o Distcp/WebHDFS compatibility preserved
• API compatibility: not fully!
o Dependency version bumps
o Removal of deprecated APIs and tools
o Shell script rewrite, rework of Hadoop tools scripts
o Incompatible bug fixes!
Source & Target versions
● Upgrades tested with
Hadoop 2 base version | Hadoop 3 base version
Apache Hadoop 2.8.4 | Apache Hadoop 3.1.x
● Why the 2.8.4 release?
○ Most production deployments are close to 2.8.x
● What should users of 2.6.x and 2.7.x do?
○ Recommend upgrading at least to Hadoop 2.8.4 before migrating to Hadoop 3!
Tooling
● Fresh Install
● Fully automated via Apache Ambari
● Manual installation of RPMs/tarballs
● Upgrade
● Fully automated via Apache Ambari 2.7
● Manual upgrade
Cluster Environment
Java
• >= Java 8
• Java 7 EOL in April 2015
• Many libraries support only Java 8
Shell
• >= Bash V3
• POSIX shell NOT supported
Docker
• If you want to use containerized apps in 3.x
• >= 1.12.5
• Also corresponding stable OS
Configuration changes: Hadoop Env files
hadoop-env.sh
• Common placeholder
• Precedence rule
o yarn/hdfs-env.sh > hadoop-env.sh > hard-coded defaults
hdfs-env.sh
• HDFS_* replaces HADOOP_*
• Precedence rule
o hdfs-env.sh > hadoop-env.sh > hard-coded defaults
yarn-env.sh
• YARN_* replaces HADOOP_*
• Precedence rule
o yarn-env.sh > hadoop-env.sh > hard-coded defaults
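As an illustration of the HDFS_* naming, the sketch below maps a few Hadoop 2 per-daemon option variables to their Hadoop 3 names. The helper name `renamed_var` is hypothetical, and the three pairs shown are examples rather than a complete list.

```shell
# Map an old HADOOP_* daemon option variable to its HDFS_* replacement.
# renamed_var is an illustrative helper; the list is not exhaustive.
renamed_var() {
  case "$1" in
    HADOOP_NAMENODE_OPTS)          echo "HDFS_NAMENODE_OPTS" ;;
    HADOOP_DATANODE_OPTS)          echo "HDFS_DATANODE_OPTS" ;;
    HADOOP_SECONDARYNAMENODE_OPTS) echo "HDFS_SECONDARYNAMENODE_OPTS" ;;
    *) return 1 ;;   # unknown here: leave the variable untouched
  esac
}

# Example: report what an old hadoop-env.sh entry should become.
old="HADOOP_NAMENODE_OPTS"
if new="$(renamed_var "$old")"; then
  echo "$old -> $new"
fi
```

A helper like this could be used when reviewing existing env files before the upgrade, so that old names are spotted by hand rather than rewritten blindly.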
Configuration changes: Hadoop Env files (contd.)
Daemon Heap Size (HADOOP-10950)
• Deprecated
o HADOOP_HEAPSIZE
• Replaced with
o HADOOP_HEAPSIZE_MAX and HADOOP_HEAPSIZE_MIN
• Units support in heap size
o Default unit is MB
o Ex: HADOOP_HEAPSIZE_MAX=4096
o Ex: HADOOP_HEAPSIZE_MAX=4g
• Auto-tuning
o Based on memory size of the host
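A minimal sketch of how these settings might look in hadoop-env.sh, assuming placeholder sizes. The `heap_flag` helper is hypothetical and only illustrates the "default unit is MB" rule; it is not the code Hadoop itself runs.

```shell
# hadoop-env.sh fragment (Hadoop 3.x); the sizes are placeholders.
export HADOOP_HEAPSIZE_MAX=4g     # unit given explicitly
export HADOOP_HEAPSIZE_MIN=1024   # no unit: treated as MB

# Illustrative only: turn a heap setting into a JVM flag,
# appending "m" when the value is a bare number.
heap_flag() {
  flag="$1"; size="$2"
  case "$size" in
    '')       return 1 ;;               # nothing configured
    *[!0-9]*) echo "-$flag$size" ;;     # already carries a unit
    *)        echo "-$flag${size}m" ;;  # bare number means MB
  esac
}

heap_flag Xmx "$HADOOP_HEAPSIZE_MAX"
heap_flag Xms "$HADOOP_HEAPSIZE_MIN"
```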
Configuration changes: YARN
Modified Defaults
• RM Max Completed Applications in State Store/Memory
Configuration | Previous | Current
yarn.resourcemanager.max-completed-applications | 10000 | 1000
yarn.resourcemanager.state-store.max-completed-applications | 10000 | 1000
Configuration changes: HDFS
Change in Default Daemon Ports (HDFS-9427)
Service | Previous port | Current port
NameNode | 50470 | 9871
NameNode | 50070 | 9870
DataNode | 50020 | 9867
DataNode | 50010 | 9866
DataNode | 50475 | 9865
DataNode | 50075 | 9864
Secondary NameNode | 50091 | 9869
Secondary NameNode | 50090 | 9868
KMS | 16000 | 9600
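The default-port changes above can be captured in a small lookup, which may help when auditing firewall rules or monitoring checks that still point at the old defaults. This is a sketch: `new_default_port` is a hypothetical helper, and the old/new pairs are matched in the order they are listed on the slide.

```shell
# Look up the Hadoop 3 default port for a Hadoop 2 default port.
# new_default_port is an illustrative helper, not a Hadoop command.
new_default_port() {
  case "$1" in
    50470) echo 9871 ;;   # NameNode
    50070) echo 9870 ;;   # NameNode
    50020) echo 9867 ;;   # DataNode
    50010) echo 9866 ;;   # DataNode
    50475) echo 9865 ;;   # DataNode
    50075) echo 9864 ;;   # DataNode
    50091) echo 9869 ;;   # Secondary NameNode
    50090) echo 9868 ;;   # Secondary NameNode
    16000) echo 9600 ;;   # KMS
    *) return 1 ;;        # not one of the changed defaults
  esac
}

for port in 50070 50010 16000; do
  echo "$port -> $(new_default_port "$port")"
done
```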
Script changes: Starting/Stopping Hadoop Daemons
Daemon scripts
• *-daemon.sh deprecated
• Use bin/hdfs or bin/yarn commands with --daemon option
• Ex: bin/hdfs --daemon start/stop/status namenode
• Ex: bin/yarn --daemon start/stop/status resourcemanager
Debuggability
• Scripts support --debug
• Construction of env
• Java options and classpath
Logs/Pid
• Created as hadoop-yarn* instead of yarn-yarn*
• Log4j settings in *-daemon.sh have been removed; set them via *_OPTS in *-env.sh instead
• E.g. YARN_RESOURCEMANAGER_OPTS in yarn-env.sh
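To make the script change concrete, the sketch below rewrites an old-style daemon invocation into the `--daemon` form. `new_daemon_cmd` is a hypothetical helper, and it assumes that hadoop-daemon.sh was being used to manage an HDFS daemon.

```shell
# Translate "<script> <action> <daemon>" into the Hadoop 3 form.
# new_daemon_cmd is illustrative; it only prints the new command.
new_daemon_cmd() {
  script="$1"; action="$2"; daemon="$3"
  case "$script" in
    hadoop-daemon.sh) echo "hdfs --daemon $action $daemon" ;;  # assumes an HDFS daemon
    yarn-daemon.sh)   echo "yarn --daemon $action $daemon" ;;
    *) return 1 ;;
  esac
}

new_daemon_cmd hadoop-daemon.sh start namenode
new_daemon_cmd yarn-daemon.sh stop resourcemanager
```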
Classpath Changes
Classpath isolation now!
Users should rebuild their applications with shaded hadoop-client jars
● Hadoop dependencies leaked to the application’s classpath - Guava, protobuf, jackson, jetty...
● Shaded jars available (HADOOP-11804) - they isolate downstream clients from any third-party dependencies
○ hadoop-client-api: for compile-time dependencies
○ hadoop-client-runtime: for runtime third-party dependencies
○ hadoop-client-minicluster: for test-scope dependencies
● HDFS-6200: the hadoop-hdfs jar contained both the HDFS server and the HDFS client.
○ Clients should depend on hadoop-hdfs-client instead, to isolate themselves from server-side dependencies
● No YARN/MR shaded jars
Upgrade process
Hadoop Pre-Upgrade Steps
YARN
• Stop all YARN queues
• Stop/Wait for running applications to complete
HDFS
• Run fsck and fix any errors
o hdfs fsck / -files -blocks -locations > dfs-old-fsck.1.log
• Checkpoint Metadata
o hdfs dfsadmin -safemode enter
o hdfs dfsadmin -saveNamespace
• Backup checkpoint files
o ${dfs.namenode.name.dir}/current
• Get Cluster DataNode reports
o hdfs dfsadmin -report > dfs-old-report-1.log
• Capture Namespace
o hdfs dfs -ls -R / > dfs-old-lsr-1.log
• Finalize previous upgrade
o hdfs dfsadmin -finalizeUpgrade
STACK
• Backup Configuration files
• Stop users/services using YARN/HDFS
• Other metadata backup: Hive MetaStore, Oozie etc.
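The HDFS steps above can be collected into one script. The sketch below is a dry run by default (it prints the commands instead of executing them); `DRY_RUN`, `OUT_DIR`, `run`, `run_to` and `hdfs_pre_upgrade` are hypothetical names, the command order simply follows the slide, and the site-specific backup of ${dfs.namenode.name.dir}/current is left out.

```shell
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 to actually run them on a cluster node.
DRY_RUN="${DRY_RUN:-1}"
OUT_DIR="${OUT_DIR:-.}"

# run CMD...: run (or print) a command.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# run_to FILE CMD...: run (or print) a command, saving stdout to FILE.
run_to() {
  out="$1"; shift
  if [ "$DRY_RUN" = "1" ]; then echo "$* > $out"; else "$@" > "$out"; fi
}

hdfs_pre_upgrade() {
  run_to "$OUT_DIR/dfs-old-fsck.1.log" hdfs fsck / -files -blocks -locations
  run hdfs dfsadmin -safemode enter
  run hdfs dfsadmin -saveNamespace
  run_to "$OUT_DIR/dfs-old-report-1.log" hdfs dfsadmin -report
  run_to "$OUT_DIR/dfs-old-lsr-1.log" hdfs dfs -ls -R /
  run hdfs dfsadmin -finalizeUpgrade
}

hdfs_pre_upgrade
```

Keeping the saved reports in one directory makes the post-upgrade comparison simpler.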
Upgrade Steps
[Flow diagram; boxes as extracted: Configuration Updates, Install new packages, Link to new versions, Start Services, Stop Services]
Additional HDFS Upgrade Steps
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.3/bk_command-line-upgrade/content/start-hadoop-core-25.html
Upgrade Validation
HDFS
• Run HDFS Service checks
• Verify NameNode gets out of Safe Mode
o hdfs dfsadmin -safemode wait
• FileSystem Health
• Compare with Previous State
o Node list
o Full NameSpace
• Let cluster run production workloads for a while
• When ready to discard backup, finalize HDFS upgrade
o hdfs dfsadmin -upgrade finalize/query
YARN
• Run YARN Service checks
• Submit test applications: MR, TEZ, …
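One way to compare the namespace with its previous state is to diff the recursive listings captured before and after the upgrade. The sketch below assumes both files come from `hdfs dfs -ls -R /` and that the path is the eighth whitespace-separated field, so paths containing spaces are truncated; `missing_paths` is a hypothetical helper.

```shell
# Print paths present in the old listing but absent from the new one.
# Usage: missing_paths dfs-old-lsr-1.log dfs-new-lsr-1.log
missing_paths() {
  old_sorted="$(mktemp)"
  new_sorted="$(mktemp)"
  # Field 8 of "hdfs dfs -ls -R" output is assumed to be the path.
  awk 'NF >= 8 {print $8}' "$1" | LC_ALL=C sort > "$old_sorted"
  awk 'NF >= 8 {print $8}' "$2" | LC_ALL=C sort > "$new_sorted"
  # comm -23 keeps lines that appear only in the first file.
  LC_ALL=C comm -23 "$old_sorted" "$new_sorted"
  rm -f "$old_sorted" "$new_sorted"
}
```

An empty result means every path from the old listing is still present; it does not check sizes, permissions or block health.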
Enable New features
• Erasure Coding
o https://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html
• YARN UI2
o https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YarnUI2.html
• ATSv2
o New Daemon: Timeline Reader
o https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html
• YARN DNS
o Service Discovery of YARN Services
o http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/RegistryDNS.html
• HDFS Federation
o https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/Federation.html
Migrating workloads
MapReduce (1/2)
Compatibility
● Full binary compatibility of MapReduce APIs
● hadoop-streaming related deprecated IO formats removed (HADOOP-10485)
○ XMLRecordInput/Output
○ CSVRecordInput
Configuration
● yarn.app.mapreduce.client.job.max-retries
○ Default changed from 0 to 3
○ Protects clients from transient failures
MapReduce (2/2) - Task Heap Management (MAPREDUCE-5785)
mapreduce.map.memory.mb | mapreduce.map.java.opts | Xmx behaviour
Configured 2048 MB | Configured 1638 MB | No change: 1638 MB
Configured 2048 MB | Not configured | Derived from mapreduce.map.memory.mb: 1638 MB
Not configured | Configured 1638 MB | Automatically inferred from Xmx in mapreduce.map.java.opts: 1638 MB
Not configured | Not configured | Default: 1024 MB
Heap size no longer needs to be specified in both the task configuration and the Java options.
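The numbers in the table are consistent with a heap-to-container ratio of 0.8 (1638 / 2048). The sketch below reproduces them under that assumption; the helper names are hypothetical, and the rounding is chosen to match the slide, so the values Hadoop computes may differ by 1 MB.

```shell
# Assumed heap-to-container ratio of 0.8, written as 8/10 so that
# plain integer arithmetic can be used.

# Heap (MB) derived from a configured container size; rounds down.
heap_from_container_mb() {
  echo $(( $1 * 8 / 10 ))
}

# Container size (MB) inferred from a configured -Xmx; rounds up.
container_from_heap_mb() {
  echo $(( ($1 * 10 + 7) / 8 ))
}

echo "memory.mb=2048 gives -Xmx$(heap_from_container_mb 2048)m"
echo "-Xmx1638m gives memory.mb=$(container_from_heap_mb 1638)"
```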
Hive on Tez
● Hive 3.0.0 is the Hive version that supports Hadoop 3 (HIVE-16531)
● Does NOT support rolling upgrades
● Acid table format changes are not compatible with 2.x
● Tez version support for Hadoop 3
o Planned for release 0.10.0
● TEZ-3923 Move master to Hadoop 3+ and create separate 0.9.x line
● TEZ-3252 - [Umbrella] Enable support for Hadoop-3.x
Spark
Ongoing efforts in the community to build/validate Spark with Hadoop 3
○ SPARK-23534: Umbrella JIRA to build/test with Hadoop 3
Apache HBase
● HBase 2.0 supports Hadoop 3
● Does NOT support Rolling Upgrades in major version upgrades (1.x to 2.x)
● Refer to the upgrade documentation for further details
https://github.com/apache/hbase/blob/master/src/main/asciidoc/_chapters/upgrading.adoc#upgrade2.0
Apache Slider Applications
• Apache Slider is retiring from the Apache Incubator
• Superseded by YARN Services.
• Port your Slider apps to YARN Services
• Benefits of YARN Services
o Easier to manage and deploy
o Single “yarnfile” to configure a YARN Service
o Supports container placement scheduling such as affinity and anti-affinity (YARN-6592)
o Rolling upgrades for containers and services (YARN-7512 and YARN-4726)
o Services UI in YARN UI2, improving debuggability and log access
[Diagram: Easier to Manage (single “yarnfile” for services); Placement (Affinity, Anti-Affinity); UI (YarnUI2 Integration); Upgrades (Rolling Upgrades for containers/Services)]
Hive on LLAP
● Now runs as a Yarn Service Application instead of a Slider App
● Version that supports LLAP as a YARN service is not released yet.
● Planned for release Hive-4.0.0/3.1.0
● Refer to https://hortonworks.com/blog/apache-hive-llap-as-a-yarn-service
PIG/Oozie
Support for Hadoop 3 In-Progress in the community
● PIG
○ Planned for release – 0.18.0
○ PIG-5253 Pig Hadoop 3 support
● OOZIE
○ Planned for release – 5.1.0
○ OOZIE-2973 Make sure Oozie works with Hadoop 3
Other Aspects
Other Aspects
Validations In-progress
● Performance testing
● Scale testing for HDFS/YARN
● OS compatibility
Summary
• Hadoop 3
• Eagerly awaited release with lots of new features and optimizations!
• 3.1.1 will be released soon with some bug fixes identified since 3.1.0
• Express Upgrades are recommended
• Admins
• A bit of work
• Users
• Should work mostly as-is
• Community effort
• HADOOP-15501 Upgrade efforts to Hadoop 3.x
• Wiki - https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+2.x+to+3.x+Upgrade+Efforts
• Volunteers needed for validating workload upgrades on Hadoop 3!
Questions?
Thank you
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 
Ad

Recently uploaded (20)

#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
Lynda Kane
 
"Rebranding for Growth", Anna Velykoivanenko
"Rebranding for Growth", Anna Velykoivanenko"Rebranding for Growth", Anna Velykoivanenko
"Rebranding for Growth", Anna Velykoivanenko
Fwdays
 
Datastucture-Unit 4-Linked List Presentation.pptx
Datastucture-Unit 4-Linked List Presentation.pptxDatastucture-Unit 4-Linked List Presentation.pptx
Datastucture-Unit 4-Linked List Presentation.pptx
kaleeswaric3
 
Network Security. Different aspects of Network Security.
Network Security. Different aspects of Network Security.Network Security. Different aspects of Network Security.
Network Security. Different aspects of Network Security.
gregtap1
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
Hands On: Create a Lightning Aura Component with force:RecordData
Hands On: Create a Lightning Aura Component with force:RecordDataHands On: Create a Lightning Aura Component with force:RecordData
Hands On: Create a Lightning Aura Component with force:RecordData
Lynda Kane
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
Asthma presentación en inglés abril 2025 pdf
Asthma presentación en inglés abril 2025 pdfAsthma presentación en inglés abril 2025 pdf
Asthma presentación en inglés abril 2025 pdf
VanessaRaudez
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
Learn the Basics of Agile Development: Your Step-by-Step Guide
Learn the Basics of Agile Development: Your Step-by-Step GuideLearn the Basics of Agile Development: Your Step-by-Step Guide
Learn the Basics of Agile Development: Your Step-by-Step Guide
Marcel David
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Rock, Paper, Scissors: An Apex Map Learning Journey
Rock, Paper, Scissors: An Apex Map Learning JourneyRock, Paper, Scissors: An Apex Map Learning Journey
Rock, Paper, Scissors: An Apex Map Learning Journey
Lynda Kane
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
Automation Dreamin': Capture User Feedback From Anywhere
Automation Dreamin': Capture User Feedback From AnywhereAutomation Dreamin': Capture User Feedback From Anywhere
Automation Dreamin': Capture User Feedback From Anywhere
Lynda Kane
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
Lynda Kane
 
"Rebranding for Growth", Anna Velykoivanenko
"Rebranding for Growth", Anna Velykoivanenko"Rebranding for Growth", Anna Velykoivanenko
"Rebranding for Growth", Anna Velykoivanenko
Fwdays
 
Datastucture-Unit 4-Linked List Presentation.pptx
Datastucture-Unit 4-Linked List Presentation.pptxDatastucture-Unit 4-Linked List Presentation.pptx
Datastucture-Unit 4-Linked List Presentation.pptx
kaleeswaric3
 
Network Security. Different aspects of Network Security.
Network Security. Different aspects of Network Security.Network Security. Different aspects of Network Security.
Network Security. Different aspects of Network Security.
gregtap1
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
Hands On: Create a Lightning Aura Component with force:RecordData
Hands On: Create a Lightning Aura Component with force:RecordDataHands On: Create a Lightning Aura Component with force:RecordData
Hands On: Create a Lightning Aura Component with force:RecordData
Lynda Kane
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
Asthma presentación en inglés abril 2025 pdf
Asthma presentación en inglés abril 2025 pdfAsthma presentación en inglés abril 2025 pdf
Asthma presentación en inglés abril 2025 pdf
VanessaRaudez
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
Learn the Basics of Agile Development: Your Step-by-Step Guide
Learn the Basics of Agile Development: Your Step-by-Step GuideLearn the Basics of Agile Development: Your Step-by-Step Guide
Learn the Basics of Agile Development: Your Step-by-Step Guide
Marcel David
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Rock, Paper, Scissors: An Apex Map Learning Journey
Rock, Paper, Scissors: An Apex Map Learning JourneyRock, Paper, Scissors: An Apex Map Learning Journey
Rock, Paper, Scissors: An Apex Map Learning Journey
Lynda Kane
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
Automation Dreamin': Capture User Feedback From Anywhere
Automation Dreamin': Capture User Feedback From AnywhereAutomation Dreamin': Capture User Feedback From Anywhere
Automation Dreamin': Capture User Feedback From Anywhere
Lynda Kane
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 

Migrating your clusters and workloads from Hadoop 2 to Hadoop 3

  • 1. 1 © Hortonworks Inc. 2011–2018. All rights reserved Migrating your clusters and workloads from Hadoop 2 to Hadoop 3 Suma Shivaprasad - Staff Engineer Rohith Sharma K S - Senior Software Engineer
  • 2. Speaker Info
      • Suma Shivaprasad: Apache Hadoop Contributor, Apache Atlas PMC, Staff Engineer @ Hortonworks
      • Rohith Sharma K S: Apache Hadoop PMC, Sr. Software Engineer @ Hortonworks
  • 3. Agenda
      • Why upgrade to Apache Hadoop 3.x?
      • Things to consider before upgrade
      • Upgrade process
      • Workload migration
      • Other aspects
      • Summary
  • 4. Why upgrade to Apache Hadoop 3.x?
  • 5. Motivation: a major release with many features and improvements!
      • HDFS
          • Federation GA
          • Erasure Coding: significant cost savings in storage; overhead reduced from 200% to 50%
          • Intra-DataNode Disk Balancer
      • YARN
          • Scheduler improvements
          • New resource types: GPUs, FPGAs
          • Fast and global scheduling
          • Containerization: Docker
          • Long running Services rehash
          • New UI2
          • Timeline Server v2
  • 6. [Architecture diagram: Hadoop 3. Labels: Container Runtimes (Docker / Linux / Default), Platform Services, Storage, Service Discovery, Resource Management, Holiday Web App, HBase, HTTP, MR, Tez, Hive / Pig, Hive on LLAP, Spark, Deep Learning App, On-Premises, Cloud]
  • 7. Things to consider before upgrade
  • 8. Upgrades involve many things
      • Upgrade mechanism
      • Recommendation for 3.x: Express or Rolling?
      • Compatibility
      • Source & target versions
      • Tooling
      • Cluster environment
      • Configuration changes
      • Script changes
      • Classpath changes
  • 9. Upgrade mechanism: Express/Rolling Upgrades
      • Express Upgrades
          • “Stop the world” upgrades
          • Cluster downtime
          • Less stringent prerequisites
          • Process: upgrade masters and workers in one shot
      • Rolling Upgrades
          • Preserve cluster operation
          • Minimize service impact and downtime
          • Can take longer to complete
          • Process: upgrade masters and workers in batches
  • 10. Recommendation for 3.x: Express or Rolling?
      • This is a major version upgrade
      • Challenges and issues in supporting Rolling Upgrades
      • Why can't rolling upgrades be done?
          • HDFS-13596: change in edit log format
          • HADOOP-15502: incompatible change in the MetricsPlugin API
          • HDFS-6440: incompatible changes in the image transfer protocol
      • Recommended: ‘Express Upgrade’ from Hadoop 2 to 3
  • 11. Compatibility
      • Wire compatibility
          • Preserves compatibility with Hadoop 2 clients
          • Distcp/WebHDFS compatibility preserved
      • API compatibility: not fully!
          • Dependency version bumps
          • Removal of deprecated APIs and tools
          • Shell script rewrite, rework of Hadoop tools scripts
          • Incompatible bug fixes!
  • 12. Source & target versions
      • Upgrades tested with: Hadoop 2 base version Apache Hadoop 2.8.4; Hadoop 3 base version Apache Hadoop 3.1.x
      • Why the 2.8.4 release? Most production deployments are close to 2.8.x
      • What should users of 2.6.x and 2.7.x do? We recommend upgrading at least to Hadoop 2.8.4 before migrating to Hadoop 3!
  • 13. Tooling
      • Fresh install
          • Fully automated via Apache Ambari
          • Manual installation of RPMs/tarballs
      • Upgrade
          • Fully automated via Apache Ambari 2.7
          • Manual upgrade
  • 14. Cluster environment
      • Java
          • >= Java 8
          • Java 7 EOL in April 2015
          • Many libraries support only Java 8
      • Shell
          • >= Bash v3
          • POSIX shell NOT supported
      • Docker
          • Needed if you want to use containerized apps in 3.x
          • >= 1.12.5
          • Also a corresponding stable OS
  • 15. Configuration changes: Hadoop env files
      • hadoop-env.sh
          • Common placeholder
          • Precedence rule: yarn/hdfs-env.sh > hadoop-env.sh > hard-coded defaults
      • hdfs-env.sh
          • HDFS_* replaces HADOOP_*
          • Precedence rule: hdfs-env.sh > hadoop-env.sh > hard-coded defaults
      • yarn-env.sh
          • YARN_* replaces HADOOP_*
          • Precedence rule: yarn-env.sh > hadoop-env.sh > hard-coded defaults
  • 16. Configuration changes: Hadoop env files (contd.): daemon heap size (HADOOP-10950)
      • Deprecated: HADOOP_HEAPSIZE
      • Replaced with: HADOOP_HEAPSIZE_MAX and HADOOP_HEAPSIZE_MIN
      • Units are supported in heap sizes; the default unit is MB
          • Ex: HADOOP_HEAPSIZE_MAX=4096
          • Ex: HADOOP_HEAPSIZE_MAX=4g
      • Auto-tuning based on the memory size of the host
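As a minimal sketch of the heap-size change, a hadoop-env.sh fragment might look like the following. The specific values are illustrative assumptions, not recommendations from the talk; only the variable names and the unit behaviour come from the slide.

```shell
# Sketch of a Hadoop 3 hadoop-env.sh fragment (values are illustrative).
# HADOOP_HEAPSIZE is deprecated; the MAX/MIN pair replaces it.
export HADOOP_HEAPSIZE_MAX=4g     # explicit unit suffix
export HADOOP_HEAPSIZE_MIN=1024   # no unit, so interpreted as MB
```

Leaving both variables unset lets the scripts fall back to auto-tuning based on host memory, as the slide notes.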
  • 17. Configuration changes: YARN modified defaults (RM max completed applications in state store/memory)
      • yarn.resourcemanager.max-completed-applications: previous 10000, current 1000
      • yarn.resourcemanager.state-store.max-completed-applications: previous 10000, current 1000
  • 18. Configuration changes: HDFS, change in default daemon ports (HDFS-9427)
      • NameNode: 50470 -> 9871, 50070 -> 9870
      • DataNode: 50020 -> 9867, 50010 -> 9866, 50475 -> 9865, 50075 -> 9864
      • Secondary NameNode: 50091 -> 9869, 50090 -> 9868
      • KMS: 16000 -> 9600
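The port table above can be captured in a small lookup helper, which may be handy when auditing firewall rules or client configs for old defaults. This is a sketch only: the function name `new_hdfs_port` is hypothetical, and it covers just the ports listed on the slide.

```shell
# Map a Hadoop 2 default port to its Hadoop 3 default (HDFS-9427).
# Ports not listed on the slide are echoed back unchanged.
new_hdfs_port() {
  case "$1" in
    50470) echo 9871 ;;   # NameNode
    50070) echo 9870 ;;   # NameNode
    50020) echo 9867 ;;   # DataNode
    50010) echo 9866 ;;   # DataNode
    50475) echo 9865 ;;   # DataNode
    50075) echo 9864 ;;   # DataNode
    50091) echo 9869 ;;   # Secondary NameNode
    50090) echo 9868 ;;   # Secondary NameNode
    16000) echo 9600 ;;   # KMS
    *)     echo "$1" ;;
  esac
}
```

For example, `new_hdfs_port 50070` prints the new NameNode port from the table, while an unrelated port is returned as-is.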
  • 19. Script changes: starting/stopping Hadoop daemons
      • Daemon scripts
          • *-daemon.sh deprecated
          • Use bin/hdfs or bin/yarn commands with the --daemon option
          • Ex: bin/hdfs --daemon start/stop/status namenode
          • Ex: bin/yarn --daemon start/stop/status resourcemanager
      • Debuggability
          • Scripts support --debug
              • Construction of env
              • Java options and classpath
      • Logs/Pid
          • Created as hadoop-yarn* instead of yarn-yarn*
          • Log4j settings in *-daemon.sh have been removed; set them via *_OPTS in *-env.sh instead
          • Eg: YARN_RESOURCEMANAGER_OPTS in yarn-env.sh
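When porting operational runbooks, the rename from `*-daemon.sh` to `--daemon` can be sketched as a simple translation. The helper below is hypothetical (not part of Hadoop) and assumes `hadoop-daemon.sh` was being used for HDFS daemons and `yarn-daemon.sh` for YARN daemons; it only prints the replacement command and does not run it.

```shell
# Translate a Hadoop 2 daemon-script invocation into the Hadoop 3 form.
# $1 = legacy script name, $2 = action (start/stop/status), $3 = daemon name
hadoop3_daemon_cmd() {
  case "$1" in
    hadoop-daemon.sh) echo "hdfs --daemon $2 $3" ;;
    yarn-daemon.sh)   echo "yarn --daemon $2 $3" ;;
    *) echo "unknown legacy script: $1" >&2; return 1 ;;
  esac
}
```

For instance, `hadoop3_daemon_cmd yarn-daemon.sh start resourcemanager` prints `yarn --daemon start resourcemanager`.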
  • 20. Classpath changes: classpath isolation now! Users should rebuild their applications with the shaded hadoop-client jars
      • Previously, Hadoop dependencies leaked to the application's classpath: Guava, protobuf, jackson, jetty...
      • Shaded jars are available that isolate downstream clients from any third-party dependencies (HADOOP-11804)
          • hadoop-client-api: for compile-time dependencies
          • hadoop-client-runtime: for runtime third-party dependencies
          • hadoop-minicluster: for test-scope dependencies
      • HDFS-6200: the hadoop-hdfs jar contained both the HDFS server and the HDFS client
          • Clients should instead depend on hadoop-hdfs-client to isolate themselves from server-side dependencies
      • No YARN/MR shaded jars
  • 21. Upgrade process
  • 22. Hadoop pre-upgrade steps
      • Stack
          • Back up configuration files
          • Stop users/services using YARN/HDFS
          • Back up other metadata: Hive MetaStore, Oozie etc.
      • YARN
          • Stop all YARN queues
          • Stop running applications, or wait for them to complete
      • HDFS
          • Run fsck and fix any errors
              • hdfs fsck / -files -blocks -locations > dfs-old-fsck.1.log
          • Checkpoint metadata
              • hdfs dfsadmin -safemode enter
              • hdfs dfsadmin -saveNamespace
          • Back up checkpoint files
              • ${dfs.namenode.name.dir}/current
          • Get cluster DataNode reports
              • hdfs dfsadmin -report > dfs-old-report-1.log
          • Capture namespace
              • hdfs dfs -ls -R / > dfs-old-lsr-1.log
          • Finalize previous upgrade
              • hdfs dfsadmin -finalizeUpgrade
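The HDFS pre-upgrade commands listed above could be collected into one script. The sketch below is an assumption about how one might wrap them, not the presenters' tooling: the `run` helper, `DRY_RUN`, and `OUT_DIR` are hypothetical names, and by default the script only prints each command so it can be reviewed before anything touches the cluster. Backing up `${dfs.namenode.name.dir}/current` is left out because the path is site-specific.

```shell
# DRY_RUN=1 (the default) prints commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"
OUT_DIR="${OUT_DIR:-.}"

# run <output-file or "-"> <command...>
run() {
  run_out="$1"; shift
  if [ "$DRY_RUN" = "1" ]; then
    if [ "$run_out" = "-" ]; then echo "$*"; else echo "$* > $run_out"; fi
  elif [ "$run_out" = "-" ]; then
    "$@"
  else
    "$@" > "$run_out"
  fi
}

# HDFS pre-upgrade steps, in the order given on the slide.
pre_upgrade_hdfs() {
  run "$OUT_DIR/dfs-old-fsck.1.log"   hdfs fsck / -files -blocks -locations
  run -                               hdfs dfsadmin -safemode enter
  run -                               hdfs dfsadmin -saveNamespace
  run "$OUT_DIR/dfs-old-report-1.log" hdfs dfsadmin -report
  run "$OUT_DIR/dfs-old-lsr-1.log"    hdfs dfs -ls -R /
  run -                               hdfs dfsadmin -finalizeUpgrade
}
```

Calling `pre_upgrade_hdfs` with the defaults lists the six commands; setting `DRY_RUN=0` would execute them against the cluster, so that switch should be flipped deliberately.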
  • 23. Upgrade steps
      • [Flow diagram. Steps shown: Stop Services, Install new packages, Link to new versions, Configuration Updates, Start Services]
      • Additional HDFS upgrade steps: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.3/bk_command-line-upgrade/content/start-hadoop-core-25.html
  • 24. Upgrade validation
      • HDFS
          • Run HDFS service checks
          • Verify that the NameNode gets out of Safe Mode
              • hdfs dfsadmin -safemode wait
          • FileSystem health
          • Compare with the previous state
              • Node list
              • Full namespace
          • Let the cluster run production workloads for a while
          • When ready to discard the backup, finalize the HDFS upgrade
              • hdfs dfsadmin -upgrade finalize/query
      • YARN
          • Run YARN service checks
          • Submit test applications: MR, Tez, …
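The "compare with the previous state" step can be sketched as a diff of the namespace listing captured before the upgrade against a fresh one. The helper name `namespace_diff` and the post-upgrade file name `dfs-new-lsr-1.log` are hypothetical; the listings are sorted first so that ordering differences alone are not reported.

```shell
# Compare two namespace listings; exit status 0 means no differences.
# $1 = pre-upgrade listing, $2 = post-upgrade listing
namespace_diff() {
  old_sorted="$(mktemp)"
  new_sorted="$(mktemp)"
  sort "$1" > "$old_sorted"
  sort "$2" > "$new_sorted"
  diff "$old_sorted" "$new_sorted"
  status=$?
  rm -f "$old_sorted" "$new_sorted"
  return "$status"
}

# Possible usage after the upgrade (not executed here):
#   hdfs dfsadmin -safemode wait
#   hdfs dfs -ls -R / > dfs-new-lsr-1.log
#   namespace_diff dfs-old-lsr-1.log dfs-new-lsr-1.log && echo "namespace matches"
```

The same helper could be pointed at the DataNode reports, although those contain usage counters that are expected to change between runs.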
  • 25. Enable new features
      • Erasure Coding
          • https://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html
      • YARN UI2
          • https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YarnUI2.html
      • ATSv2
          • New daemon: Timeline Reader
          • https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html
      • YARN DNS
          • Service discovery of YARN Services
          • http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/RegistryDNS.html
      • HDFS Federation
          • https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/Federation.html
  • 26. Migrating workloads
  • 27. MapReduce (1/2)
      • Compatibility
          • Full binary compatibility of mapreduce APIs
          • Deprecated hadoop-streaming IO formats removed (HADOOP-10485): XMLRecordInput/Output, CSVRecordInput
      • Configuration
          • yarn.app.mapreduce.client.job.max-retries: default changed from 0 to 3
          • Protects clients from transient failures
  • 28. MapReduce (2/2): task heap management (MAPREDUCE-5785). Heap size no longer needs to be specified in both the task configuration and the Java options.
      • mapreduce.map.memory.mb configured (2048 MB), mapreduce.map.java.opts configured (1638 MB): no change; Xmx 1638 MB
      • mapreduce.map.memory.mb configured (2048 MB), mapreduce.map.java.opts not configured: derived from mapreduce.map.memory.mb; Xmx 1638 MB
      • mapreduce.map.memory.mb not configured, mapreduce.map.java.opts configured (1638 MB): automatically inferred from Xmx in mapreduce.map.java.opts; Xmx 1638 MB
      • Neither configured: default of 1024 MB
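The 2048 MB to 1638 MB figure in the table is consistent with a heap-to-container ratio of 0.8. The sketch below reproduces that arithmetic; the 0.8 ratio is an assumption taken from the default of `mapreduce.job.heap.memory-mb.ratio` (not stated on the slide), the function name is hypothetical, and integer division truncates here, so Hadoop's own rounding may differ by 1 MB.

```shell
# Assumed default heap-to-container ratio (0.8), expressed as a percentage.
HEAP_RATIO_PERCENT=80

# Derive a task heap size (MB) from the container size (MB).
derived_heap_mb() {
  echo $(( $1 * HEAP_RATIO_PERCENT / 100 ))
}
```

With this, `derived_heap_mb 2048` yields the 1638 MB shown in the table when only `mapreduce.map.memory.mb` is configured.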
  • 29. Hive on Tez
      • Hive 3.0.0 is the Hive version supporting Hadoop 3 (HIVE-16531)
          • Does NOT support rolling upgrades
          • ACID table format changes are not compatible with 2.x
      • Tez version support for Hadoop 3
          • Planned for release 0.10.0
          • TEZ-3923: Move master to Hadoop 3+ and create separate 0.9.x line
          • TEZ-3252: [Umbrella] Enable support for Hadoop-3.x
  • 30. Spark
      • Ongoing efforts in the community to build/validate Spark with Hadoop 3
          • SPARK-23534: umbrella JIRA to build/test with Hadoop 3
  • 31. Apache HBase
      • HBase 2.0 supports Hadoop 3
      • Does NOT support rolling upgrades in major version upgrades (1.x to 2.x)
      • Refer to the upgrade documentation for further details: https://github.com/apache/hbase/blob/master/src/main/asciidoc/_chapters/upgrading.adoc#upgrade2.0
  • 32. Apache Slider applications
      • Apache Slider is retiring from the Apache Incubator
      • Superseded by YARN Services
      • Port your Slider apps to YARN Services
      • Benefits of YARN Services
          • Easier to manage and deploy: a single “yarnfile” configures a YARN Service
          • Supports container placement scheduling such as affinity and anti-affinity (YARN-6592)
          • Rolling upgrades for containers and services (YARN-7512 and YARN-4726)
          • Services UI in YARN UI2, improving debuggability and log access
      • [Graphic: Easier to Manage (single “yarnfile” for services), Placement (affinity, anti-affinity), UI (YARN UI2 integration), Upgrades (rolling upgrades for containers/services)]
  • 33. Hive on LLAP
      • Now runs as a YARN Service application instead of a Slider app
      • The version that supports LLAP as a YARN service is not released yet
      • Planned for release Hive 4.0.0/3.1.0
      • Refer to https://hortonworks.com/blog/apache-hive-llap-as-a-yarn-service
  • 34. Pig/Oozie: support for Hadoop 3 is in progress in the community
      • Pig
          • Planned for release 0.18.0
          • PIG-5253: Pig Hadoop 3 support
      • Oozie
          • Planned for release 5.1.0
          • OOZIE-2973: Make sure Oozie works with Hadoop 3
  • 35. Other aspects
  • 36. Other aspects: validations in progress
      • Performance testing
      • Scale testing for HDFS/YARN
      • OS compatibility
  • 37. Summary
      • Hadoop 3
          • Eagerly awaited release with lots of new features and optimizations!
          • 3.1.1 will be released soon with some bug fixes identified since 3.1.0
      • Express Upgrades are recommended
      • Admins: a bit of work
      • Users: should work mostly as-is
      • Community effort
          • HADOOP-15501: Upgrade efforts to Hadoop 3.x
          • Wiki: https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+2.x+to+3.x+Upgrade+Efforts
          • Volunteers needed for validating workload upgrades on Hadoop 3!
  • 38. Questions?
  • 39. Thank you

Editor's Notes

  • #6: YARN: scheduler improvements that significantly improve cluster throughput; distributed scheduling; fine-grained scheduling according to resource types (GPUs, FPGAs); support for long-running services and Docker; revamped UI; ATS v2, which is more scalable and based on HBase. HDFS: Federation; Intra-DataNode Disk Balancer; Erasure Coding, with significant savings in storage cost (overhead reduced from 200% to 50%).
  • #11: This is a major version upgrade, so there are some challenges and issues in supporting rolling upgrades. Why can rolling upgrades not be done?
      • HDFS rolling upgrade has issues with NameNode restarts due to a change in the edit log format (HDFS-13596).
      • Hadoop daemons configured with a custom MetricsPlugin sink fail to start after the upgrade (HADOOP-15502, API incompatibility).
      • HDFS-6440 DataNode layout change prevents downgrade.
    Admins are advised to use ‘Express Upgrade’ for clusters going from Hadoop 2 to 3. The community effort on upgrade issues is tracked in HADOOP-15501.
  • #12: Wire compatibility: preserves compatibility with Hadoop 2 clients; DistCp/WebHDFS compatibility is preserved. API compatibility: not fully! Dependency version bumps; removal of deprecated APIs and tools; shell script rewrite and rework of the Hadoop tools scripts; incompatible bug fixes.
  • #13: The upgrade has been tested and validated from Apache Hadoop 2.8.4 to Hadoop 3.1.0 in our test environments. There is an ongoing effort in the community to release Hadoop 3.1.1 with a lot of fixes. We recommend upgrading lower Hadoop 2 versions to at least Hadoop 2.8.4 before migrating to Hadoop 3.
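The recommendation in note #13 can be turned into a pre-flight check. This is a minimal sketch: `version_ge`, `current_hadoop_version` and `check_upgrade_source` are hypothetical helper names, and the parsing assumes the first line of `hadoop version` has the form `Hadoop X.Y.Z`.

```shell
#!/bin/sh
# Minimum source version, taken from the note above.
MIN_SOURCE_VERSION="2.8.4"

# version_ge A B: succeeds if dotted version A >= B.
# Fields are compared numerically; non-numeric suffixes count as 0.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = x[i] + 0; yi = y[i] + 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

# Reads the version from the first line of `hadoop version`
# (assumed format: "Hadoop X.Y.Z"). Needs hadoop on the PATH.
current_hadoop_version() {
  hadoop version | awk 'NR == 1 { print $2 }'
}

# check_upgrade_source VERSION: reports whether VERSION is a
# suitable starting point for the move to Hadoop 3.
check_upgrade_source() {
  if version_ge "$1" "$MIN_SOURCE_VERSION"; then
    echo "OK: $1 is at or above $MIN_SOURCE_VERSION"
  else
    echo "UPGRADE FIRST: move $1 to at least $MIN_SOURCE_VERSION"
    return 1
  fi
}
```

On a cluster node this would be called as `check_upgrade_source "$(current_hadoop_version)"`.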
  • #14-#16: Fresh install: fully automated via Apache Ambari, or manual installation of RPMs/tarballs. Upgrade: fully automated via Apache Ambari, or manual upgrade.
  • #17-#19: [10:46 PM] Rohith Sharma KS:
      -bash-4.2$ export HADOOP_LIBEXEC_DIR=/usr/apache/hadoop/libexec;/usr/apache/hadoop/sbin/yarn-daemon.sh --config /usr/apache/conf start nodemanager
      WARNING: YARN_CONF_DIR has been replaced by HADOOP_CONF_DIR. Using value of YARN_CONF_DIR.
      WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR.
      WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE.
      WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR.
      WARNING: YARN_ROOT_LOGGER has been replaced by HADOOP_ROOT_LOGGER. Using value of YARN_ROOT_LOGGER.
      WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
      WARNING: Use of this script to start YARN daemons is deprecated.
      WARNING: Attempting to execute replacement "yarn --daemon start" instead.
      WARNING: YARN_CONF_DIR has been replaced by HADOOP_CONF_DIR. Using value of YARN_CONF_DIR.
      WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR.
      WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR.
      WARNING: YARN_ROOT_LOGGER has been replaced by HADOOP_ROOT_LOGGER. Using value of YARN_ROOT_LOGGER.
    [10:47 PM] Rohith Sharma KS:
      -bash-4.2$ export HADOOP_LIBEXEC_DIR=/usr/apache/hadoop/libexec;/usr/apache/hadoop/sbin/hadoop-daemon.sh --config /usr/apache/conf start datanode
      WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
      WARNING: Use of this script to start HDFS daemons is deprecated.
      WARNING: Attempting to execute replacement "hdfs --daemon start" instead.
      WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
      WARNING: HADOOP_SECURE_DN_PID_DIR has been replaced by HADOOP_SECURE_PID_DIR. Using value of HADOOP_SECURE_DN_PID_DIR.
      WARNING: HADOOP_SECURE_DN_LOG_DIR has been replaced by HADOOP_SECURE_LOG_DIR. Using value of HADOOP_SECURE_DN_LOG_DIR.
      WARNING: HADOOP_DATANODE_OPTS has been replaced by HDFS_DATANODE_OPTS. Using value of HADOOP_DATANODE_OPTS.
      ERROR: You must be a privileged user in order to run a secure service.
    [10:48 PM] Rohith Sharma KS:
      WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
      WARNING: Use of this script to stop HDFS daemons is deprecated.
      WARNING: Attempting to execute replacement "hdfs --daemon stop" instead.
      WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
      WARNING: HADOOP_NAMENODE_OPTS has been replaced by HDFS_NAMENODE_OPTS. Using value of HADOOP_NAMENODE_OPTS.
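The WARNING lines in the notes above amount to a rename table for environment variables. The sketch below encodes only the pairs that appear in that log, so it is not a complete list; `hadoop3_env_name` and `report_deprecated_env` are hypothetical helpers, and the scan is a rough pattern match rather than a real shell parser.

```shell
#!/bin/sh
# hadoop3_env_name NAME: prints the Hadoop 3 replacement for a deprecated
# variable name, or NAME unchanged if it is not in the table.
# The table holds only the renames seen in the warnings above.
hadoop3_env_name() {
  case "$1" in
    YARN_CONF_DIR)            echo HADOOP_CONF_DIR ;;
    YARN_LOG_DIR)             echo HADOOP_LOG_DIR ;;
    YARN_LOGFILE)             echo HADOOP_LOGFILE ;;
    YARN_PID_DIR)             echo HADOOP_PID_DIR ;;
    YARN_ROOT_LOGGER)         echo HADOOP_ROOT_LOGGER ;;
    YARN_OPTS)                echo HADOOP_OPTS ;;
    HADOOP_SECURE_DN_USER)    echo HDFS_DATANODE_SECURE_USER ;;
    HADOOP_SECURE_DN_PID_DIR) echo HADOOP_SECURE_PID_DIR ;;
    HADOOP_SECURE_DN_LOG_DIR) echo HADOOP_SECURE_LOG_DIR ;;
    HADOOP_DATANODE_OPTS)     echo HDFS_DATANODE_OPTS ;;
    HADOOP_NAMENODE_OPTS)     echo HDFS_NAMENODE_OPTS ;;
    *)                        echo "$1" ;;
  esac
}

# report_deprecated_env: reads an env script on stdin and prints
# "OLD -> NEW" for every deprecated name it finds.
report_deprecated_env() {
  grep -E -o '(YARN|HADOOP)_[A-Z_]+' | sort -u | while read -r name; do
    new="$(hadoop3_env_name "$name")"
    if [ "$new" != "$name" ]; then
      echo "$name -> $new"
    fi
  done
}
```

A possible use is `report_deprecated_env < /path/to/hadoop-env.sh` before the upgrade, where the path is whatever your installation uses. The same log also shows the daemon script change: `yarn-daemon.sh` and `hadoop-daemon.sh` hand over to `yarn --daemon` and `hdfs --daemon`.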
  • #23: HDFS
      • Back up the configuration files.
      • Create a list of all the DataNodes in the cluster: hdfs dfsadmin -report > dfs-old-report-1.log
      • Save the namespace: hdfs dfsadmin -safemode enter, then hdfs dfsadmin -saveNamespace
      • Back up the checkpoint files located in ${dfs.namenode.name.dir}/current
      • Finalize any prior HDFS upgrade: hdfs dfsadmin -finalizeUpgrade
      • Create an fsimage for rollback: hdfs dfsadmin -rollingUpgrade prepare
    YARN
      • Stop all YARN queues.
      • Stop running applications, or wait for them to finish.
      • NOTE: YARN supports rolling upgrade!
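The HDFS steps in note #23 could be scripted roughly as follows. This is a sketch, not a tested procedure: the `hdfs dfsadmin` commands come from the note, while the `mkdir`, `cp` and `tar` backup commands, the default paths, and the `run` and `hdfs_pre_upgrade` helpers are assumptions added for illustration. It runs in dry-run mode by default and only prints what it would do.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 only after reviewing every step for your cluster.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical locations; replace with your own values.
CONF_DIR="${CONF_DIR:-/etc/hadoop/conf}"
NAME_DIR="${NAME_DIR:-/hadoop/hdfs/namenode}"   # value of dfs.namenode.name.dir
BACKUP_DIR="${BACKUP_DIR:-/tmp/hadoop2-backup}"

# run CMD...: print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Runs the steps in the order given in the note; stops at the first failure.
hdfs_pre_upgrade() {
  run mkdir -p "$BACKUP_DIR" &&
  # Back up the configuration files (assumed to live in CONF_DIR).
  run cp -r "$CONF_DIR" "$BACKUP_DIR/conf" &&
  # Record the DataNodes currently in the cluster.
  run sh -c 'hdfs dfsadmin -report > dfs-old-report-1.log' &&
  # Save the namespace.
  run hdfs dfsadmin -safemode enter &&
  run hdfs dfsadmin -saveNamespace &&
  # Back up the checkpoint files under ${dfs.namenode.name.dir}/current.
  run tar -czf "$BACKUP_DIR/nn-current.tar.gz" -C "$NAME_DIR" current &&
  # Finalize any prior HDFS upgrade.
  run hdfs dfsadmin -finalizeUpgrade &&
  # Create an fsimage for rollback.
  run hdfs dfsadmin -rollingUpgrade prepare
}
```

Calling `hdfs_pre_upgrade` with the default settings lists the eight commands, each prefixed with `+`, so the sequence can be reviewed before anything touches the NameNode.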
  • #29: Heap size no longer needs to be specified in task configuration and Java options.
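Assuming note #29 refers to MapReduce tasks, the heap can be left out because it is derived from the container size. The sketch below assumes the derivation is container memory (`mapreduce.map.memory.mb` or `mapreduce.reduce.memory.mb`) multiplied by `mapreduce.job.heap.memory-mb.ratio`, with a default ratio of 0.8; check `mapred-default.xml` for your release, since the slides do not spell this out. `derived_heap_mb` is a hypothetical helper, and its integer rounding may differ from Hadoop's own.

```shell
#!/bin/sh
# derived_heap_mb CONTAINER_MB [RATIO_PERCENT]
# Prints the heap size in MB for a task container.
# RATIO_PERCENT defaults to 80, i.e. an assumed ratio of 0.8.
# Hypothetical helper; uses truncating integer arithmetic.
derived_heap_mb() {
  echo $(( $1 * ${2:-80} / 100 ))
}

# Example: a 2560 MB map container with the assumed default ratio.
echo "-Xmx$(derived_heap_mb 2560)m"
```

With a 2560 MB container and the assumed ratio, the helper yields a 2048 MB heap, so an explicit `-Xmx` in the task Java options becomes redundant.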