1 © Hortonworks Inc. 2011–2018. All rights reserved
Apache Hadoop 3 Insights &
Migrating your clusters from Hadoop 2 to Hadoop 3
Sunil Govindan
Rohith Sharma K S
2 © Hortonworks Inc. 2011–2018. All rights reserved
Speakers
Sunil Govindan
• Apache Hadoop PMC
• Contributing to YARN Scheduler improvements, integrating TensorFlow with YARN, etc.
• Staff Engineer @ Hortonworks YARN Engineering Team
Rohith Sharma K S
• Apache Hadoop PMC
• Contributing to Application Timeline Service v2 and Native Services
• Sr. Engineer @ Hortonworks YARN Engineering Team
3 © Hortonworks Inc. 2011–2018. All rights reserved
[Bar chart: number of patches contributed per person between releases 2.8.3 and 3.1.0]
(a) 456 Contributors
(b) 46 with > 25 patches
(c) Long Tail
Our Community : Contributors 2.8.3 -> 3.1.0
4 © Hortonworks Inc. 2011–2018. All rights reserved
• Introduction
• HDFS Improvements
• YARN State of Union
• Migration Story from Hadoop 2 clusters to Hadoop 3
Agenda
5 © Hortonworks Inc. 2011–2018. All rights reserved
A brief timeline from the past year: GA Releases
2.8.0 2.9.0 3.0.0 3.1.0
• GPU/FPGA
• YARN Native
Service
• Placement
Constraints
• YARN Federation
• Opportunistic
Container
(Backported from 3.0)
• New YARN UI
• Timeline V2
• Global Scheduling
• Multiple Resource
types
• New YARN UI
• Timeline service V2
• Erasure Coding
Ever evolving requirements (computation intensive, larger scale, services)
• Application Priority
• Reservations
• Node labels
improvements
2.8.0: 22 March ’17 | 2.9.0: 17 Nov ’17 | 3.0.0: 13 Dec ‘17 | 3.1.0: 06 April ‘18 (latest maintenance releases: 2.8.4, 2.9.1, 3.0.3, 3.1.0)
6 © Hortonworks Inc. 2011–2018. All rights reserved
Apache Hadoop 3.0/3.1
7 © Hortonworks Inc. 2011–2018. All rights reserved
• Motivation: improve the storage efficiency of HDFS
• Better storage efficiency compared to 3x replication
• Reduction of overhead from 200% to 40%
• Uses Reed-Solomon(k,m) erasure codes instead of replication
• Support for multiple erasure coding policies
• RS(3,2), RS(6,3), RS(10,4)
• Can improve data durability
• RS(6,3) can tolerate 3 failures
• RS(10,4) can tolerate 4 failures
• Missing blocks reconstructed from remaining blocks
HDFS Features : Erasure Coding
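A minimal sketch of turning this on from the hdfs ec subcommand; the path /data/cold is a hypothetical directory, and the policy names are the built-in Reed-Solomon policies:

hdfs ec -listPolicies                                      # list the built-in policies
hdfs ec -enablePolicy -policy RS-6-3-1024k                 # enable RS(6,3)
hdfs ec -setPolicy -path /data/cold -policy RS-6-3-1024k   # apply it to a directory
hdfs ec -getPolicy -path /data/cold                        # verify the policy in effect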
8 © Hortonworks Inc. 2011–2018. All rights reserved
• Shell script rewrite
• Support for multiple Standby NameNodes
• Intra-DataNode balancer
• Support for Microsoft Azure Data Lake and Aliyun OSS
• Move default ports out of the ephemeral range
• S3 consistency and performance improvements (ongoing)
• Tightening the Hadoop compatibility policy (ongoing)
HDFS Features : Miscellaneous
9 © Hortonworks Inc. 2011–2018. All rights reserved
YARN : Key Themes
Scale
Platform Themes Workload Themes
Scheduling Usability Containers Resources Services
10 © Hortonworks Inc. 2011–2018. All rights reserved
Key Themes
Scale
Platform Themes Workload Themes
Scheduling Usability Containers Resources Services
11 © Hortonworks Inc. 2011–2018. All rights reserved
• Many sites run clusters made up of a large number of nodes
• Oath (Yahoo!), Twitter, LinkedIn, Microsoft, Alibaba etc.
• Now: 40K nodes (federated), 20K nodes (single cluster).
• Roadmap: To 100K and beyond
Looking at the Scale!
12 © Hortonworks Inc. 2011–2018. All rights reserved
Key Themes
Scale
Platform Themes Workload Themes
Scheduling Usability Containers Resources Services
13 © Hortonworks Inc. 2011–2018. All rights reserved
Moving towards Global & Fast Scheduling
[Diagram: global scheduler pipeline with scheduler state, placement, and committer stages]
• Problem
• The one-node-at-a-time allocation cycle can lead to suboptimal decisions.
• Improvements with global scheduling
• Look at several nodes at a time
• YARN scheduler can allocate 3k+ containers per second ≈ 10 million allocations / hour!
• Much better placement decisions
14 © Hortonworks Inc. 2011–2018. All rights reserved
Better placement strategies (YARN-6592)
• Past
• Supported constraints in form of Node Locality
• Now YARN can support a lot more use cases
• Co-locate the allocations of a job on the same rack (affinity)
• Spread allocations across machines (anti-affinity) to minimize resource interference
• Allow up to a specific number of allocations in a node group (cardinality)
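As a hedged sketch, these constraints can be exercised through the distributed shell application that ships with YARN 3.1; the placement spec grammar is SourceTag=NumContainers,ConstraintType,Scope,TargetTag, and the zk/hbase tags plus the jar path are illustrative:

# 3 "zk" containers with node anti-affinity to each other,
# plus 5 "hbase" containers with rack affinity to "zk"
yarn org.apache.hadoop.yarn.applications.distributedshell.Client \
  -jar <path-to-distributedshell-jar> \
  -shell_command sleep -shell_args 3600 \
  -placement_spec zk=3,NOTIN,NODE,zk:hbase=5,IN,RACK,zk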
15 © Hortonworks Inc. 2011–2018. All rights reserved
Additional Scheduling Improvements
• Absolute Resources Configuration in CS – YARN-5881
• Auto Creation of Leaf Queues - YARN-7117
• Application Timeout – YARN-3813
• Reservations in YARN
16 © Hortonworks Inc. 2011–2018. All rights reserved
Key Themes
Scale
Platform Themes Workload Themes
Scheduling Usability Containers Resources Services
17 © Hortonworks Inc. 2011–2018. All rights reserved
Usability: Queue & Logs
• API based queue management, decentralized (YARN-5734)
• Improved logs management (YARN-4904), including live application logs
18 © Hortonworks Inc. 2011–2018. All rights reserved
Usability: UI
19 © Hortonworks Inc. 2011–2018. All rights reserved
Timeline Service 2.0
• Understanding and monitoring a Hadoop cluster is itself a BigData problem
• Uses HBase as the backend for better read/write scalability
• More robust storage fault tolerance
• Migration and compatibility with v1.5
20 © Hortonworks Inc. 2011–2018. All rights reserved
Key Themes
Scale
Platform Themes Workload Themes
Scheduling Usability Containers Resources Services
21 © Hortonworks Inc. 2011–2018. All rights reserved
Key Themes
Scale
Platform Themes Workload Themes
Scheduling Usability Containers Resources Services
22 © Hortonworks Inc. 2011–2018. All rights reserved
• Run workloads both with and without Docker on the same cluster
• Choose at run-time!
Containers
23 © Hortonworks Inc. 2011–2018. All rights reserved
• YARN – Big Data apps, moving to generic apps with containerization
• K8S – industry standard orchestration layer for generic apps
• We have done YARN on YARN! Next slide
• YARN on K8S? K8S on YARN? Run them side-by-side?
• What does containerized BigData mean?
• Lift and Shift?
• Break up every service?
What about running all of Big Data containerized?
24 © Hortonworks Inc. 2011–2018. All rights reserved
Ycloud: YARN Based Container Cloud
• Testing Hadoop on Hadoop!
25 © Hortonworks Inc. 2011–2018. All rights reserved
Key Themes
Scale
Platform Themes Workload Themes
Scheduling Usability Containers Resources Services
26 © Hortonworks Inc. 2011–2018. All rights reserved
• In the past, YARN supported only Memory and CPU
• Now
• A generalized vector for all resources
• Admins can add arbitrary resource types! (see the config sketch below)
Resource profiles and custom resource types
• Ease of resource requesting model using profiles for apps
Profile | Memory | CPU | GPU
Small | 2 GB | 4 Cores | 0 Cores
Medium | 4 GB | 8 Cores | 0 Cores
Large | 16 GB | 16 Cores | 4 Cores
[Diagram: Node Manager resource vector covering Memory, CPU, GPU, and FPGA]
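A hedged sketch of how an admin declares such a type; the resource name resource1 and the value 4 are illustrative, while the property names follow the Hadoop 3 resource model documentation:

<!-- resource-types.xml on the ResourceManager: register the custom type -->
<property>
  <name>yarn.resource-types</name>
  <value>resource1</value>
</property>

<!-- node-resources.xml on each NodeManager: how much of it this node offers -->
<property>
  <name>yarn.nodemanager.resource-type.resource1</name>
  <value>4</value>
</property>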
27 © Hortonworks Inc. 2011–2018. All rights reserved
• Why?
• No need to setup separate clusters
• Leverage shared compute!
• Why is isolation needed?
• Multiple processes sharing a single GPU will be:
• Serialized.
• Prone to OOM.
• GPU isolation on YARN:
• Granularity is per GPU device.
• Uses cgroups / Docker to enforce isolation.
GPU support on YARN
[Diagram: container stacks on the host OS, e.g. an Nginx app on Ubuntu 14.04 and TensorFlow 1.2 with CUDA Library 5.0, with the GPU base library volume-mounted from the host]
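A minimal sketch of wiring this up, following the Hadoop 3.1 GPU documentation (GPU auto-discovery on the nodes is assumed):

<!-- resource-types.xml: register the GPU resource type -->
<property>
  <name>yarn.resource-types</name>
  <value>yarn.io/gpu</value>
</property>

<!-- yarn-site.xml on each NodeManager: enable the GPU plugin -->
<property>
  <name>yarn.nodemanager.resource-plugins</name>
  <value>yarn.io/gpu</value>
</property>

Containers then request GPUs via the yarn.io/gpu resource alongside memory and vcores.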
28 © Hortonworks Inc. 2011–2018. All rights reserved
• FPGA isolation on YARN:
• Granularity is per FPGA device.
• Uses cgroups to enforce the isolation.
• Currently, only the Intel OpenCL SDK for FPGA is supported, but the implementation is extensible to other FPGA SDKs.
FPGA on YARN
29 © Hortonworks Inc. 2011–2018. All rights reserved
Key Themes
Scale
Platform Themes Workload Themes
Scheduling Usability Containers Resources Services
30 © Hortonworks Inc. 2011–2018. All rights reserved
• A native YARN services framework
• YARN-4692
• [Umbrella] Native YARN framework layer for services and
beyond
• Apache Slider retired from Incubator – lessons and key code carried over to YARN
• Simplified discovery of services via DNS mechanisms: YARN-4757
• regionserver-0.hbase-app-3.hadoop.yarn.site
• Application & Services upgrades
• “Do an upgrade of my HBase app with minimal impact to end-users”
• YARN-4726
Services support in YARN
31 © Hortonworks Inc. 2011–2018. All rights reserved
How to run a new service in YARN?
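The original slide walks through this visually; as a hedged sketch based on the YARN native services quickstart, a service is described by a small Yarnfile (JSON) and launched with a single command (the service and component names are illustrative):

{
  "name": "sleeper-service",
  "version": "1.0",
  "components": [
    {
      "name": "sleeper",
      "number_of_containers": 1,
      "launch_command": "sleep 900000",
      "resource": { "cpus": 1, "memory": "256" }
    }
  ]
}

yarn app -launch sleeper-service /path/to/sleeper.json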
32 © Hortonworks Inc. 2011–2018. All rights reserved
Apache Hadoop 3.2 and beyond
33 © Hortonworks Inc. 2011–2018. All rights reserved
• “Take me to a node with JDK 10”
• Node Partition vs. Node Attribute
• Partition:
• One partition for one node
• ACL
• Shares between queues
• Preemption enforced.
• Attribute:
• For container placement
• No ACL/Shares on attributes
• First-come-first-serve
Node Attributes (YARN-3409)
34 © Hortonworks Inc. 2011–2018. All rights reserved
• Every user says “Give me 16GB for my task”, even though it’s only needed at peak
• Each node has some allocated but unutilized capacity. Use such capacity to run opportunistic tasks
• Preempt such tasks when needed
Container overcommit (YARN-1011)
35 © Hortonworks Inc. 2011–2018. All rights reserved
• “Start this service when YARN starts”
• “initd for YARN”
• System services are services required by YARN itself, and need to be started during bootstrap.
• For example, YARN ATSv2 needs HBase, so HBase is a system service of YARN.
• Only admins can configure them
• Started along with the ResourceManager
• Place spec files under the yarn.service.system-service.dir FS path (see the sketch below)
Auto-spawning of system services (YARN-8048)
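A hedged sketch of staging such a spec; the sync/<user> directory layout follows the YARN-8048 documentation, while the /services path and hbase.json file name are illustrative:

# yarn-site.xml: yarn.service.system-service.dir = /services
# Layout assumed: <system-service-dir>/<launch-mode>/<user>/<yarnfile>
hdfs dfs -mkdir -p /services/sync/yarn
hdfs dfs -put hbase.json /services/sync/yarn/hbase.json
# The ResourceManager launches the service on its next startup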
36 © Hortonworks Inc. 2011–2018. All rights reserved
TensorFlow on YARN (YARN-8220)
• Run deep learning workloads on the same cluster as analytics, stream processing etc!
• Integrated with latest TensorFlow 1.8 and has GPU support
• Use a simple command to run a TensorFlow app via a Native Service spec file (Yarnfile)
yarn app -launch distributed-tf <path-to-saved-yarnfile>
• A simple Python command line utility can also be used to auto-create the Yarnfile
python submit_tf_job.py
--remote_conf_path hdfs:///tf-job-conf
--input_spec example_tf_job_spec.json
--docker_image gpu.cuda_9.0.tf_1.8.0
--job_name distributed-tf-gpu
--user tf-user
--domain tensorflow.site
--distributed --kerberos
37 © Hortonworks Inc. 2011–2018. All rights reserved
TensorFlow on YARN (YARN-8220)
Sample Yarnfile for TensorFlow job
38 © Hortonworks Inc. 2011–2018. All rights reserved
Why upgrade to Apache Hadoop 3.x?
39 © Hortonworks Inc. 2011–2018. All rights reserved
Major release with a lot of features and improvements!
Motivation
HDFS
• Federation GA
• Erasure Coding
• Significant cost savings in storage
• Reduction of overhead from 200% to 50%
• Intra-DataNode Disk Balancer
YARN
• Scheduler Improvements
• New Resource types: GPUs, FPGAs
• Fast and Global scheduling
• Containerization: Docker
• Long running Services rehash
• New UI2
• Timeline Server v2
40 © Hortonworks Inc. 2011–2018. All rights reserved
[Diagram: the Hadoop 3 stack, on-premises and in the cloud: container runtimes (Docker / Linux / default), platform services (storage, service discovery, resource management), and workloads such as HBase, MR, Tez, Hive / Pig, Hive on LLAP, Spark, an HTTP holiday web app, and a deep learning app]
41 © Hortonworks Inc. 2011–2018. All rights reserved
Things to consider
before upgrade
42 © Hortonworks Inc. 2011–2018. All rights reserved
Upgrades involve many things
• Upgrade mechanism
• Recommendation for 3.x: Express or Rolling?
• Compatibility
• Source & Target versions
• Tooling
• Cluster Environment
• Configuration changes
• Script changes
• Classpath changes
43 © Hortonworks Inc. 2011–2018. All rights reserved
Upgrade mechanism: Express/Rolling Upgrades
Express Upgrades
• “Stop the world” upgrades
• Cluster downtime
• Less stringent prerequisites
• Process: upgrade masters and workers in one shot
Rolling Upgrades
• Preserves cluster operation
• Minimizes service impact and downtime
• Can take longer to complete
• Process: upgrades masters and workers in batches
44 © Hortonworks Inc. 2011–2018. All rights reserved
Compatibility
• Wire compatibility
o Preserves compatibility with Hadoop 2 clients
o Distcp/WebHDFS compatibility preserved
• API compatibility
Not fully!
o Dependency version bumps
o Removal of deprecated APIs and tools
o Shell script rewrite, rework of Hadoop tools scripts
o Incompatible bug fixes!
45 © Hortonworks Inc. 2011–2018. All rights reserved
Source & Target versions
● Upgrades Tested with
• Why 2.8.4 release?
● Most production deployments are close to 2.8.x
● What should users of 2.6.x and 2.7.x do?
● Recommend upgrading at least to Hadoop 2.8.4 before migrating to Hadoop 3!
Hadoop 2 Base version | Hadoop 3 Base version
Apache Hadoop 2.8.4 | Apache Hadoop 3.1.x
46 © Hortonworks Inc. 2011–2018. All rights reserved
Tooling
● Fresh Install
● Fully automated via Apache Ambari
● Manual installation of RPMs/Tar balls
● Upgrade
● Fully automated via Apache Ambari 2.7
● Manual upgrade
47 © Hortonworks Inc. 2011–2018. All rights reserved
Cluster Environment
Java
• >= Java 8
• Java 7 was EOL’d in April 2015
• Many libraries support only Java 8
Shell
• >= Bash v3
• POSIX shell NOT supported
Docker (if you want to use containerized apps in 3.x)
• >= 1.12.5
• Also a corresponding stable OS
48 © Hortonworks Inc. 2011–2018. All rights reserved
Configuration changes: Hadoop Env files
hadoop-env.sh
• Common placeholder
• Precedence rule: yarn/hdfs-env.sh > hadoop-env.sh > hard-coded defaults
hdfs-env.sh
• HDFS_* replaces HADOOP_*
• Precedence rule: hdfs-env.sh > hadoop-env.sh > hard-coded defaults
yarn-env.sh
• YARN_* replaces HADOOP_*
• Precedence rule: yarn-env.sh > hadoop-env.sh > hard-coded defaults
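A small sketch of the new split (the JVM flags are illustrative):

# hadoop-env.sh: shared default applied to every daemon
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"

# hdfs-env.sh: HDFS_* replaces the old HADOOP_* per-daemon variables and wins
export HDFS_NAMENODE_OPTS="-XX:+UseG1GC"

# yarn-env.sh: likewise, YARN_* settings take precedence over hadoop-env.sh
export YARN_RESOURCEMANAGER_OPTS="-XX:+UseG1GC"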
49 © Hortonworks Inc. 2011–2018. All rights reserved
Configuration changes: Hadoop Env files Contd..
Daemon Heap Size HADOOP-10950
• Deprecated
• HADOOP_HEAPSIZE
• Replaced with
• HADOOP_HEAPSIZE_MAX and HADOOP_HEAPSIZE_MIN
• Units support in heap size
• Default unit is MB
• Ex: HADOOP_HEAPSIZE_MAX=4096
• Ex: HADOOP_HEAPSIZE_MAX=4g
• Auto-tuning
• Based on memory size of the host
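For example (values are illustrative):

# hadoop-env.sh: bounded heap with unit suffixes (HADOOP-10950)
export HADOOP_HEAPSIZE_MIN=1g
export HADOOP_HEAPSIZE_MAX=4g    # same as 4096, since the default unit is MB
# Leave both unset to let the JVM auto-tune from the host's memory size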
50 © Hortonworks Inc. 2011–2018. All rights reserved
Configuration changes: YARN
Modified Defaults
• RM Max Completed Applications in State Store/Memory
Configuration | Previous | Current
yarn.resourcemanager.max-completed-applications | 10000 | 1000
yarn.resourcemanager.state-store.max-completed-applications | 10000 | 1000
51 © Hortonworks Inc. 2011–2018. All rights reserved
Configurations Changes: HDFS
Service | Previous Ports | Current Ports
NameNode | 50470, 50070 | 9871, 9870
DataNode | 50020, 50010, 50475, 50075 | 9867, 9866, 9865, 9864
Secondary NameNode | 50091, 50090 | 9869, 9868
KMS | 16000 | 9600
Change in Default Daemon Ports (HDFS-9427)
52 © Hortonworks Inc. 2011–2018. All rights reserved
Script changes: Starting/Stopping Hadoop Daemons
Daemon scripts
• *-daemon.sh deprecated
• Use bin/hdfs or bin/yarn commands with --daemon option
• Ex: bin/hdfs --daemon start/stop/status namenode
• Ex: bin/yarn --daemon start/stop/status resourcemanager
Debuggability
• Scripts support --debug
• Shows construction of env, Java options, and classpath
Logs/Pid
• Created as hadoop-yarn* instead of yarn-yarn*
• Log4j settings in the *-daemon.sh scripts have been removed. Instead, set them via *_OPTS in *-env.sh
• Eg: YARN_RESOURCEMANAGER_OPTS in yarn-env.sh
53 © Hortonworks Inc. 2011–2018. All rights reserved
Classpath Changes
Classpath isolation now!
Users should rebuild their applications with shaded hadoop-client jars
● Hadoop dependencies leaked to the application’s classpath: Guava, protobuf, jackson, jetty...
● Shaded jars available; they isolate downstream clients from third-party dependencies (HADOOP-11804)
○ hadoop-client-api for compile time dependencies
○ hadoop-client-runtime for runtime third-party dependencies
○ hadoop-minicluster for test scope dependencies
● HDFS-6200: the hadoop-hdfs jar contained both the HDFS server and the HDFS client.
○ Clients should instead depend on hadoop-hdfs-client to isolate themselves from server-side dependencies
● No YARN/MR shaded jars
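A hedged sketch of what a downstream pom.xml might declare after the switch (the version shown is illustrative):

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-api</artifactId>   <!-- compile time -->
  <version>3.1.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-runtime</artifactId>   <!-- runtime, shaded third-party deps -->
  <version>3.1.0</version>
  <scope>runtime</scope>
</dependency>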
54 © Hortonworks Inc. 2011–2018. All rights reserved
Upgrade process
55 © Hortonworks Inc. 2011–2018. All rights reserved
YARN
• Stop all YARN queues
• Stop/Wait for running applications to complete
• NOTE: YARN supports rolling upgrade in
itself but if you upgrade HDFS + YARN
together, it gets problematic
Hadoop Pre-Upgrade Steps
HDFS
• Run fsck and fix any errors
• hdfs fsck / -files -blocks -locations > dfs-old-fsck.1.log
• Checkpoint Metadata
• hdfs dfsadmin -safemode enter
• hdfs dfsadmin -saveNamespace
• Backup checkpoint files
• ${dfs.namenode.name.dir}/current
• Get Cluster DataNode reports
• hdfs dfsadmin -report > dfs-old-report-1.log
• Capture Namespace
• hdfs dfs -ls -R / > dfs-old-lsr-1.log
• Finalize previous upgrade
• hdfs dfsadmin -finalizeUpgrade
STACK
• Backup Configuration files
• Stop users/services using YARN/HDFS
• Other metadata backup: Hive MetaStore, Oozie, etc.
56 © Hortonworks Inc. 2011–2018. All rights reserved
Upgrade Steps
Stop Services → Install new packages → Link to new versions → Configuration Updates → Start Services
Additional HDFS Upgrade Steps:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.3/bk_command-line-upgrade/content/start-hadoop-core-25.html
57 © Hortonworks Inc. 2011–2018. All rights reserved
Upgrade Validation
• Run HDFS Service checks
• Verify NameNode gets out of Safe Mode
hdfs dfsadmin -safemode wait
• FileSystem Health
• Compare with Previous State
• Node list
• Full NameSpace
• Let Cluster run production workloads for
a while
• When ready to discard the backup, finalize the HDFS upgrade
hdfs dfsadmin -upgrade finalize/query
HDFS
• Run YARN Service checks
• Submit test applications – MR, TEZ, …
YARN
58 © Hortonworks Inc. 2011–2018. All rights reserved
Enable New features
• Erasure Coding
• https://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html
• YARN UI2
• https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YarnUI2.html
• ATSv2
• New Daemon: Timeline Reader
• https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html
• YARN DNS
• Service Discovery of YARN Services
• https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/RegistryDNS.html
• HDFS Federation
• https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/Federation.html
59 © Hortonworks Inc. 2011–2018. All rights reserved
Other Aspects
60 © Hortonworks Inc. 2011–2018. All rights reserved
Other Aspects
Validations In-progress
● Performance testing
● Scale testing for HDFS/YARN
● OS compatibility
● Workload Migration
● MapReduce
● Hive
● PIG
● Spark
● Slider
61 © Hortonworks Inc. 2011–2018. All rights reserved
Summary
• Hadoop 3
• Eagerly awaited release with lots of new features and optimizations!
• 3.1.1 will be released soon with some bug fixes identified since 3.1.0
• Express Upgrades are recommended
• Admins
• A bit of work
• Users
• Should work mostly as-is
• Community effort
• HADOOP-15501: Upgrade efforts to Hadoop 3.x
• Wiki: https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+2.x+to+3.x+Upgrade+Efforts
• Volunteers needed for validating workload upgrades on Hadoop 3!
62 © Hortonworks Inc. 2011–2018. All rights reserved
Questions?
63 © Hortonworks Inc. 2011–2018. All rights reserved
Thank you
Ad

More Related Content

What's hot (20)

Hadoop 1.x vs 2
Hadoop 1.x vs 2Hadoop 1.x vs 2
Hadoop 1.x vs 2
Rommel Garcia
 
Cloudera training: secure your Cloudera cluster
Cloudera training: secure your Cloudera clusterCloudera training: secure your Cloudera cluster
Cloudera training: secure your Cloudera cluster
Cloudera, Inc.
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta Lake
Databricks
 
하이브 최적화 방안
하이브 최적화 방안하이브 최적화 방안
하이브 최적화 방안
Teddy Choi
 
How Safe is Asynchronous Master-Master Setup?
 How Safe is Asynchronous Master-Master Setup? How Safe is Asynchronous Master-Master Setup?
How Safe is Asynchronous Master-Master Setup?
Sveta Smirnova
 
Query Optimization with MySQL 8.0 and MariaDB 10.3: The Basics
Query Optimization with MySQL 8.0 and MariaDB 10.3: The BasicsQuery Optimization with MySQL 8.0 and MariaDB 10.3: The Basics
Query Optimization with MySQL 8.0 and MariaDB 10.3: The Basics
Jaime Crespo
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez
Hortonworks
 
EuroBSDCon 2021 - (auto)Installing BSD Systems
EuroBSDCon 2021 - (auto)Installing BSD SystemsEuroBSDCon 2021 - (auto)Installing BSD Systems
EuroBSDCon 2021 - (auto)Installing BSD Systems
Vinícius Zavam
 
LLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveLLAP: long-lived execution in Hive
LLAP: long-lived execution in Hive
DataWorks Summit
 
x86
x86x86
x86
Wei-Bo Chen
 
Apache Hadoop and Spark: Introduction and Use Cases for Data Analysis
Apache Hadoop and Spark: Introduction and Use Cases for Data AnalysisApache Hadoop and Spark: Introduction and Use Cases for Data Analysis
Apache Hadoop and Spark: Introduction and Use Cases for Data Analysis
Trieu Nguyen
 
TFA Collector - what can one do with it
TFA Collector - what can one do with it TFA Collector - what can one do with it
TFA Collector - what can one do with it
Sandesh Rao
 
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
DataWorks Summit
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
DataWorks Summit
 
Analyze corefile and backtraces with GDB for Mysql/MariaDB on Linux - Nilanda...
Analyze corefile and backtraces with GDB for Mysql/MariaDB on Linux - Nilanda...Analyze corefile and backtraces with GDB for Mysql/MariaDB on Linux - Nilanda...
Analyze corefile and backtraces with GDB for Mysql/MariaDB on Linux - Nilanda...
Mydbops
 
POUG 2019 - Oracle Partitioning for DBAs and Devs
POUG 2019 - Oracle Partitioning for DBAs and DevsPOUG 2019 - Oracle Partitioning for DBAs and Devs
POUG 2019 - Oracle Partitioning for DBAs and Devs
Franky Weber Faust
 
Hadoop & Greenplum: Why Do Such a Thing?
Hadoop & Greenplum: Why Do Such a Thing?Hadoop & Greenplum: Why Do Such a Thing?
Hadoop & Greenplum: Why Do Such a Thing?
Ed Kohlwey
 
Oracle Clusterware Node Management and Voting Disks
Oracle Clusterware Node Management and Voting DisksOracle Clusterware Node Management and Voting Disks
Oracle Clusterware Node Management and Voting Disks
Markus Michalewicz
 
Polymorphic Table Functions in SQL
Polymorphic Table Functions in SQLPolymorphic Table Functions in SQL
Polymorphic Table Functions in SQL
Chris Saxon
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
DataWorks Summit
 
Cloudera training: secure your Cloudera cluster
Cloudera training: secure your Cloudera clusterCloudera training: secure your Cloudera cluster
Cloudera training: secure your Cloudera cluster
Cloudera, Inc.
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta Lake
Databricks
 
하이브 최적화 방안
하이브 최적화 방안하이브 최적화 방안
하이브 최적화 방안
Teddy Choi
 
How Safe is Asynchronous Master-Master Setup?
 How Safe is Asynchronous Master-Master Setup? How Safe is Asynchronous Master-Master Setup?
How Safe is Asynchronous Master-Master Setup?
Sveta Smirnova
 
Query Optimization with MySQL 8.0 and MariaDB 10.3: The Basics
Query Optimization with MySQL 8.0 and MariaDB 10.3: The BasicsQuery Optimization with MySQL 8.0 and MariaDB 10.3: The Basics
Query Optimization with MySQL 8.0 and MariaDB 10.3: The Basics
Jaime Crespo
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez
Hortonworks
 
EuroBSDCon 2021 - (auto)Installing BSD Systems
EuroBSDCon 2021 - (auto)Installing BSD SystemsEuroBSDCon 2021 - (auto)Installing BSD Systems
EuroBSDCon 2021 - (auto)Installing BSD Systems
Vinícius Zavam
 
LLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveLLAP: long-lived execution in Hive
LLAP: long-lived execution in Hive
DataWorks Summit
 
Apache Hadoop and Spark: Introduction and Use Cases for Data Analysis
Apache Hadoop and Spark: Introduction and Use Cases for Data AnalysisApache Hadoop and Spark: Introduction and Use Cases for Data Analysis
Apache Hadoop and Spark: Introduction and Use Cases for Data Analysis
Trieu Nguyen
 
TFA Collector - what can one do with it
TFA Collector - what can one do with it TFA Collector - what can one do with it
TFA Collector - what can one do with it
Sandesh Rao
 
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
DataWorks Summit
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
DataWorks Summit
 
Analyze corefile and backtraces with GDB for Mysql/MariaDB on Linux - Nilanda...
Analyze corefile and backtraces with GDB for Mysql/MariaDB on Linux - Nilanda...Analyze corefile and backtraces with GDB for Mysql/MariaDB on Linux - Nilanda...
Analyze corefile and backtraces with GDB for Mysql/MariaDB on Linux - Nilanda...
Mydbops
 
POUG 2019 - Oracle Partitioning for DBAs and Devs
POUG 2019 - Oracle Partitioning for DBAs and DevsPOUG 2019 - Oracle Partitioning for DBAs and Devs
POUG 2019 - Oracle Partitioning for DBAs and Devs
Franky Weber Faust
 
Hadoop & Greenplum: Why Do Such a Thing?
Hadoop & Greenplum: Why Do Such a Thing?Hadoop & Greenplum: Why Do Such a Thing?
Hadoop & Greenplum: Why Do Such a Thing?
Ed Kohlwey
 
Oracle Clusterware Node Management and Voting Disks
Oracle Clusterware Node Management and Voting DisksOracle Clusterware Node Management and Voting Disks
Oracle Clusterware Node Management and Voting Disks
Markus Michalewicz
 
Polymorphic Table Functions in SQL
Polymorphic Table Functions in SQLPolymorphic Table Functions in SQL
Polymorphic Table Functions in SQL
Chris Saxon
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
DataWorks Summit
 

Similar to Apache Hadoop 3 updates with migration story (20)

Apache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionApache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the Union
DataWorks Summit
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
DataWorks Summit
 
Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo
DataWorks Summit
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
DataWorks Summit
 
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionDataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Wangda Tan
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
DataWorks Summit
 
Containers and Big Data
Containers and Big DataContainers and Big Data
Containers and Big Data
DataWorks Summit
 
What's new in apache hive
What's new in apache hive What's new in apache hive
What's new in apache hive
DataWorks Summit
 
YARN - Past, Present, & Future
YARN - Past, Present, & FutureYARN - Past, Present, & Future
YARN - Past, Present, & Future
DataWorks Summit
 
Apache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduceApache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduce
DataWorks Summit/Hadoop Summit
 
Containers and Big Data
Containers and Big Data Containers and Big Data
Containers and Big Data
DataWorks Summit
 
Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash Course
DataWorks Summit
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
DataWorks Summit
 
Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash Course
DataWorks Summit
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
DataWorks Summit
 
High throughput data replication over RAFT
High throughput data replication over RAFTHigh throughput data replication over RAFT
High throughput data replication over RAFT
DataWorks Summit
 
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and FutureHadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
Vinod Kumar Vavilapalli
 
Containers and Big Data
Containers and Big DataContainers and Big Data
Containers and Big Data
DataWorks Summit
 
Moving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloudMoving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloud
DataWorks Summit/Hadoop Summit
 
Lessons learned running a container cloud on YARN
Lessons learned running a container cloud on YARNLessons learned running a container cloud on YARN
Lessons learned running a container cloud on YARN
DataWorks Summit
 
Apache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionApache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the Union
DataWorks Summit
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
DataWorks Summit
 
Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo
DataWorks Summit
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
DataWorks Summit
 
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionDataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Wangda Tan
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
DataWorks Summit
 
What's new in apache hive
What's new in apache hive What's new in apache hive
What's new in apache hive
DataWorks Summit
 
YARN - Past, Present, & Future
YARN - Past, Present, & FutureYARN - Past, Present, & Future
YARN - Past, Present, & Future
DataWorks Summit
 
Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash Course
DataWorks Summit
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
DataWorks Summit
 
Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash Course
DataWorks Summit
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
DataWorks Summit
 
High throughput data replication over RAFT
High throughput data replication over RAFTHigh throughput data replication over RAFT
High throughput data replication over RAFT
DataWorks Summit
 
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and FutureHadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
Vinod Kumar Vavilapalli
 
Moving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloudMoving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloud
DataWorks Summit/Hadoop Summit
 
Lessons learned running a container cloud on YARN
Lessons learned running a container cloud on YARNLessons learned running a container cloud on YARN
Lessons learned running a container cloud on YARN
DataWorks Summit
 
Ad

Recently uploaded (20)

railway wheels, descaling after reheating and before forging
railway wheels, descaling after reheating and before forgingrailway wheels, descaling after reheating and before forging
railway wheels, descaling after reheating and before forging
Javad Kadkhodapour
 
ADVXAI IN MALWARE ANALYSIS FRAMEWORK: BALANCING EXPLAINABILITY WITH SECURITY
ADVXAI IN MALWARE ANALYSIS FRAMEWORK: BALANCING EXPLAINABILITY WITH SECURITYADVXAI IN MALWARE ANALYSIS FRAMEWORK: BALANCING EXPLAINABILITY WITH SECURITY
ADVXAI IN MALWARE ANALYSIS FRAMEWORK: BALANCING EXPLAINABILITY WITH SECURITY
ijscai
 
Engineering Chemistry First Year Fullerenes
Engineering Chemistry First Year FullerenesEngineering Chemistry First Year Fullerenes
Engineering Chemistry First Year Fullerenes
5g2jpd9sp4
 
five-year-soluhhhhhhhhhhhhhhhhhtions.pdf
five-year-soluhhhhhhhhhhhhhhhhhtions.pdffive-year-soluhhhhhhhhhhhhhhhhhtions.pdf
five-year-soluhhhhhhhhhhhhhhhhhtions.pdf
AdityaSharma944496
 
"Heaters in Power Plants: Types, Functions, and Performance Analysis"
"Heaters in Power Plants: Types, Functions, and Performance Analysis""Heaters in Power Plants: Types, Functions, and Performance Analysis"
"Heaters in Power Plants: Types, Functions, and Performance Analysis"
Infopitaara
 
Fort night presentation new0903 pdf.pdf.
Fort night presentation new0903 pdf.pdf.Fort night presentation new0903 pdf.pdf.
Fort night presentation new0903 pdf.pdf.
anuragmk56
 
comparison of motors.pptx 1. Motor Terminology.ppt
comparison of motors.pptx 1. Motor Terminology.pptcomparison of motors.pptx 1. Motor Terminology.ppt
comparison of motors.pptx 1. Motor Terminology.ppt
yadavmrr7
 
Mathematical foundation machine learning.pdf
Mathematical foundation machine learning.pdfMathematical foundation machine learning.pdf
Mathematical foundation machine learning.pdf
TalhaShahid49
 
introduction to machine learining for beginers
introduction to machine learining for beginersintroduction to machine learining for beginers
introduction to machine learining for beginers
JoydebSheet
 
Avnet Silica's PCIM 2025 Highlights Flyer
Avnet Silica's PCIM 2025 Highlights FlyerAvnet Silica's PCIM 2025 Highlights Flyer
Avnet Silica's PCIM 2025 Highlights Flyer
WillDavies22
 
aset and manufacturing optimization and connecting edge
aset and manufacturing optimization and connecting edgeaset and manufacturing optimization and connecting edge
aset and manufacturing optimization and connecting edge
alilamisse
 
π0.5: a Vision-Language-Action Model with Open-World Generalization
π0.5: a Vision-Language-Action Model with Open-World Generalizationπ0.5: a Vision-Language-Action Model with Open-World Generalization
π0.5: a Vision-Language-Action Model with Open-World Generalization
NABLAS株式会社
 
Level 1-Safety.pptx Presentation of Electrical Safety
Level 1-Safety.pptx Presentation of Electrical SafetyLevel 1-Safety.pptx Presentation of Electrical Safety
Level 1-Safety.pptx Presentation of Electrical Safety
JoseAlbertoCariasDel
 
Elevate Your Workflow
Elevate Your WorkflowElevate Your Workflow
Elevate Your Workflow
NickHuld
 
Lecture 13 (Air and Noise Pollution and their Control) (1).pptx
Lecture 13 (Air and Noise Pollution and their Control) (1).pptxLecture 13 (Air and Noise Pollution and their Control) (1).pptx
Lecture 13 (Air and Noise Pollution and their Control) (1).pptx
huzaifabilalshams
 
Development of MLR, ANN and ANFIS Models for Estimation of PCUs at Different ...
Development of MLR, ANN and ANFIS Models for Estimation of PCUs at Different ...Development of MLR, ANN and ANFIS Models for Estimation of PCUs at Different ...
Development of MLR, ANN and ANFIS Models for Estimation of PCUs at Different ...
Journal of Soft Computing in Civil Engineering
 
Taking AI Welfare Seriously, In this report, we argue that there is a realist...
Taking AI Welfare Seriously, In this report, we argue that there is a realist...Taking AI Welfare Seriously, In this report, we argue that there is a realist...
Taking AI Welfare Seriously, In this report, we argue that there is a realist...
MiguelMarques372250
 
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
inmishra17121973
 
Building Security Systems in Architecture.pdf
Building Security Systems in Architecture.pdfBuilding Security Systems in Architecture.pdf
Building Security Systems in Architecture.pdf
rabiaatif2
 
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdfMAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
ssuser562df4
 
railway wheels, descaling after reheating and before forging
railway wheels, descaling after reheating and before forgingrailway wheels, descaling after reheating and before forging
railway wheels, descaling after reheating and before forging
Javad Kadkhodapour
 
ADVXAI IN MALWARE ANALYSIS FRAMEWORK: BALANCING EXPLAINABILITY WITH SECURITY
ADVXAI IN MALWARE ANALYSIS FRAMEWORK: BALANCING EXPLAINABILITY WITH SECURITYADVXAI IN MALWARE ANALYSIS FRAMEWORK: BALANCING EXPLAINABILITY WITH SECURITY
ADVXAI IN MALWARE ANALYSIS FRAMEWORK: BALANCING EXPLAINABILITY WITH SECURITY
ijscai
 
Engineering Chemistry First Year Fullerenes
Engineering Chemistry First Year FullerenesEngineering Chemistry First Year Fullerenes
Engineering Chemistry First Year Fullerenes
5g2jpd9sp4
 
five-year-soluhhhhhhhhhhhhhhhhhtions.pdf
five-year-soluhhhhhhhhhhhhhhhhhtions.pdffive-year-soluhhhhhhhhhhhhhhhhhtions.pdf
five-year-soluhhhhhhhhhhhhhhhhhtions.pdf
AdityaSharma944496
 
"Heaters in Power Plants: Types, Functions, and Performance Analysis"
"Heaters in Power Plants: Types, Functions, and Performance Analysis""Heaters in Power Plants: Types, Functions, and Performance Analysis"
"Heaters in Power Plants: Types, Functions, and Performance Analysis"
Infopitaara
 
Fort night presentation new0903 pdf.pdf.
Fort night presentation new0903 pdf.pdf.Fort night presentation new0903 pdf.pdf.
Fort night presentation new0903 pdf.pdf.
anuragmk56
 
comparison of motors.pptx 1. Motor Terminology.ppt
comparison of motors.pptx 1. Motor Terminology.pptcomparison of motors.pptx 1. Motor Terminology.ppt
comparison of motors.pptx 1. Motor Terminology.ppt
yadavmrr7
 
Mathematical foundation machine learning.pdf
Mathematical foundation machine learning.pdfMathematical foundation machine learning.pdf
Mathematical foundation machine learning.pdf
TalhaShahid49
 
introduction to machine learining for beginers
introduction to machine learining for beginersintroduction to machine learining for beginers
introduction to machine learining for beginers
JoydebSheet
 
Avnet Silica's PCIM 2025 Highlights Flyer
Avnet Silica's PCIM 2025 Highlights FlyerAvnet Silica's PCIM 2025 Highlights Flyer
Avnet Silica's PCIM 2025 Highlights Flyer
WillDavies22
 
aset and manufacturing optimization and connecting edge
aset and manufacturing optimization and connecting edgeaset and manufacturing optimization and connecting edge
aset and manufacturing optimization and connecting edge
alilamisse
 
π0.5: a Vision-Language-Action Model with Open-World Generalization
π0.5: a Vision-Language-Action Model with Open-World Generalizationπ0.5: a Vision-Language-Action Model with Open-World Generalization
π0.5: a Vision-Language-Action Model with Open-World Generalization
NABLAS株式会社
 
Level 1-Safety.pptx Presentation of Electrical Safety
Level 1-Safety.pptx Presentation of Electrical SafetyLevel 1-Safety.pptx Presentation of Electrical Safety
Level 1-Safety.pptx Presentation of Electrical Safety
JoseAlbertoCariasDel
 
Elevate Your Workflow
Elevate Your WorkflowElevate Your Workflow
Elevate Your Workflow
NickHuld
 
Lecture 13 (Air and Noise Pollution and their Control) (1).pptx
Lecture 13 (Air and Noise Pollution and their Control) (1).pptxLecture 13 (Air and Noise Pollution and their Control) (1).pptx
Lecture 13 (Air and Noise Pollution and their Control) (1).pptx
huzaifabilalshams
 
Taking AI Welfare Seriously, In this report, we argue that there is a realist...
Taking AI Welfare Seriously, In this report, we argue that there is a realist...Taking AI Welfare Seriously, In this report, we argue that there is a realist...
Taking AI Welfare Seriously, In this report, we argue that there is a realist...
MiguelMarques372250
 
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
inmishra17121973
 
Building Security Systems in Architecture.pdf
Building Security Systems in Architecture.pdfBuilding Security Systems in Architecture.pdf
Building Security Systems in Architecture.pdf
rabiaatif2
 
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdfMAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
ssuser562df4
 
Ad

Apache Hadoop 3 updates with migration story

  • 1. 1 © Hortonworks Inc. 2011–2018. All rights reserved Apache Hadoop 3 Insights & Migrating your clusters from Hadoop 2 to Hadoop 3 Sunil Govindan Rohith Sharma K S
  • 2. 2 © Hortonworks Inc. 2011–2018. All rights reserved Speakers Sunil Govindan • Apache Hadoop PMC • Contributing to YARN Scheduler improvements, Integrating TensorFlow to YARN etc • Staff Engineer @ Hortonworks YARN Engineering Team Rohith Sharma K S • Apache Hadoop PMC • Contributing Application Timeline Service v2 and Native Services • Sr. Engineer @ Hortonworks YARN Engineering Team
  • 3. 3 © Hortonworks Inc. 2011–2018. All rights reserved 0 20 40 60 80 100 120 aw ajisakaa jzhuge rohithsharma cheersyang Naganarasimha jojochuang jingzhao chris.douglas Cyl HuafengWang sidharta-s yzhangal raviprak nandakumar131 [email protected] [email protected] busbey vishwajeet.dusane Jim_Brennan ayousufi hrsharma belugabehr l201514 v123582 gphillips iveselovsky esmanii HongfeiChen ekundin daemon zhaoyunjiong huanbang1993 ASikaria jmaron zhz LarryLo [email protected] granthenke liuhongtong mpercy viji_r nemon masatana olegd Jungyoo ameetz krash poliva clamb [email protected] aoe rcatherinot call-fold hsutherland trtrmitya imenache kellyzly Deepti.Sawhney wuweiwei shihaoliang Frankola jalberti zhengxg3 bleuleon ssonker Total Total (a) 456 Contributors (b) 46 with > 25 patches (c) Long Tail Our Community : Contributors 2.8.3 -> 3.1.0
  • 4. 4 © Hortonworks Inc. 2011–2018. All rights reserved • Introduction • HDFS Improvements • YARN State of Union • Migration Story from Hadoop 2 clusters to Hadoop 3 Agenda
  • 5. 5 © Hortonworks Inc. 2011–2018. All rights reserved A brief timeline from past year: GA Releases 2.8.0 2.9.0 3.0.0 3.1.0 • GPU/FPGA • YARN Native Service • Placement Constraints • YARN Federation • Opportunistic Container (Backported from 3.0) • New YARN UI • Timeline V2 • Global Scheduling • Multiple Resource types • New YARN UI • Timeline service V2 • Erasure Coding Ever involving requirements (computation intensive, larger, services) • Application Priority • Reservations • Node labels improvements 22 March ’17 17 Nov ’17 13 Dec ‘17 06 April ‘182.8.4 2.9.1 3.0.3 3.1.0
  • 6. 6 © Hortonworks Inc. 2011–2018. All rights reserved Apache Hadoop 3.0/3.1
  • 7. 7 © Hortonworks Inc. 2011–2018. All rights reserved • Motivation: improve storage efficiency of HDFS • the storage efficiency compared to 3x replication • Reduction of overhead from 200% to 40% • Uses Reed-Solomon(k,m) erasure codes instead of replication • Support for multiple erasure coding policies • RS(3,2), RS(6,3), RS(10,4) • Can improves data durability • RS(6,3) can tolerate 3 failures • RS(10,4) can tolerate 4 failures • Missing blocks reconstructed from remaining blocks HDFS Features : Erasure Coding
  • 8. 8 © Hortonworks Inc. 2011–2018. All rights reserved • Shell script rewrite • Support for multiple Standby NameNodes • Intra-DataNode balancer • Support for Microsoft Azure Data Lake and Aliyun OSS • Move default ports out of the ephemeral range • S3 consistency and performance improvements (ongoing) • Tightening the Hadoop compatibility policy (ongoing) HDFS Features : Miscellaneous
  • 9. 9 © Hortonworks Inc. 2011–2018. All rights reserved YARN : Key Themes Scale Platform Themes Workload Themes Scheduling Usability Containers Resources Services
  • 10. 10 © Hortonworks Inc. 2011–2018. All rights reserved Key Themes Scale Platform Themes Workload Themes Scheduling Usability Containers Resources Services
  • 11. 11 © Hortonworks Inc. 2011–2018. All rights reserved • Tons of sites with clusters made up of large amount of nodes • Oath (Yahoo!), Twitter, LinkedIn, Microsoft, Alibaba etc. • Now: 40K nodes (federated), 20K nodes (single cluster). • Roadmap: To 100K and beyond Looking at the Scale!
  • 12. 12 © Hortonworks Inc. 2011–2018. All rights reserved Key Themes Scale Platform Themes Workload Themes Scheduling Usability Containers Resources Services
  • 13. 13 © Hortonworks Inc. 2011–2018. All rights reserved Moving towards Global & Fast Scheduling Scheduler state Placement Committer • Problems • Current design of one-node-at-a-time allocation cycle can lead to suboptimal decisions. • With this, we improved to • Look at several nodes at a time • YARN scheduler can allocate 3k+ containers per second ≈ 10 mil allocations / hour! • Much better placement decisions
  • 14. 14 © Hortonworks Inc. 2011–2018. All rights reserved Better placement strategies (YARN-6592) • Past • Supported constraints in form of Node Locality • Now YARN can support a lot more use cases • Co-locate the allocations of a job on the same rack (affinity) • Spread allocations across machines (anti-affinity) to minimize resource interference • Allow up to a specific number of allocations in a node group (cardinality)
  • 15. 15 © Hortonworks Inc. 2011–2018. All rights reserved Addition Scheduling Improvements • Absolute Resources Configuration in CS – YARN-5881 • Auto Creation of Leaf Queues - YARN-7117 • Application Timeout – YARN-3813 • Reservations in YARN
  • 16. 16 © Hortonworks Inc. 2011–2018. All rights reserved Key Themes Scale Platform Themes Workload Themes Scheduling Usability Containers Resources Services
  • 17. 17 © Hortonworks Inc. 2011–2018. All rights reserved Usability: Queue & Logs API based queue management Decentralized (YARN-5734) Improved logs management (YARN-4904) Live application logs
  • 18. 18 © Hortonworks Inc. 2011–2018. All rights reserved Usability: UI
  • 19. 19 © Hortonworks Inc. 2011–2018. All rights reserved Timeline Service 2.0 • Understanding and Monitoring a Hadoop cluster itself is a BigData problem • Using HBase as backend for better scalability for read/write • More robust storage fault tolerance • Migration and compatibility with v.1.5
  • 20. 20 © Hortonworks Inc. 2011–2018. All rights reserved Key Themes Scale Platform Themes Workload Themes Scheduling Usability Containers Resources Services
  • 21. 21 © Hortonworks Inc. 2011–2018. All rights reserved Key Themes Scale Platform Themes Workload Themes Scheduling Usability Containers Resources Services
  • 22. 22 © Hortonworks Inc. 2011–2018. All rights reserved • Run both with and without docker on the same cluster • Choose at run-time! Containers
  • 23. 23 © Hortonworks Inc. 2011–2018. All rights reserved • YARN – Big Data apps, moving to generic apps with containerization • K8S – industry standard orchestration layer for generic apps • We have done YARN on YARN! Next slide • YARN on K8S? K8S on YARN? Run them side-by-side? • What does containerized BigData mean? • Lift and Shift? • Break up every service? What about running all of Big Data containerized?
  • 24. 24 © Hortonworks Inc. 2011–2018. All rights reserved Ycloud: YARN Based Container Cloud  Testing Hadoop on Hadoop!
  • 25. 25 © Hortonworks Inc. 2011–2018. All rights reserved Key Themes Scale Platform Themes Workload Themes Scheduling Usability Containers Resources Services
  • 26. 26 © Hortonworks Inc. 2011–2018. All rights reserved • YARN supported only Memory and CPU • Now • A generalized vector for all resources • Admin could add arbitrary resource types! Resource profiles and custom resource types • Ease of resource requesting model using profiles for apps Profile Memory CPU GPU Small 2 GB 4 Cores 0 Cores Medium 4 GB 8 Cores 0 Cores Large 16 GB 16 Cores 4 CoresMemory CPU GPU FPGA Node Manager
  • 27. 27 © Hortonworks Inc. 2011–2018. All rights reserved • Why? • No need to setup separate clusters • Leverage shared compute! • Why need isolation? • Multiple processes use the single GPU will be: • Serialized. • Cause OOM easily. • GPU isolation on YARN: • Granularity is for per-GPU device. • Use cgroups / docker to enforce isolation. GPU support on YARN Tensorflow 1.2 Nginx AppUbuntu 14:04 Nginx AppHost OS GPU Base Lib v1 Volume Mount CUDA Library 5.0
  • 28. 28 © Hortonworks Inc. 2011–2018. All rights reserved • FPGA isolation on YARN: . • Granularity is for per-FPGA device. • Use Cgroups to enforce the isolation. • Currently, only Intel OpenCL SDK for FPGA is supported. But implementation is extensible to other FPGA SDK. FPGA on YARN
  • 29. 29 © Hortonworks Inc. 2011–2018. All rights reserved Key Themes Scale Platform Themes Workload Themes Scheduling Usability Containers Resources Services
  • 30. 30 © Hortonworks Inc. 2011–2018. All rights reserved • A native YARN services framework • YARN-4692 • [Umbrella] Native YARN framework layer for services and beyond • Apache Slider retired from Incubator – lessons and key code carried over to YARN • Simplified discovery of services via DNS mechanisms: YARN-4757 • regionserver-0.hbase-app-3.hadoop.yarn.site • Application & Services upgrades • “Do an upgrade of my HBase app with minimal impact to end-users” • YARN-4726 Services support in YARN
  • 31. 31 © Hortonworks Inc. 2011–2018. All rights reserved How to run a new service in YARN ?
  • 32. 32 © Hortonworks Inc. 2011–2018. All rights reserved Apache Hadoop 3.2 and beyond
  • 33. 33 © Hortonworks Inc. 2011–2018. All rights reserved • “Take me to a node with JDK 10” • Node Partition vs. Node Attribute • Partition: • One partition for one node • ACL • Shares between queues • Preemption enforced. • Attribute: • For container placement • No ACL/Shares on attributes • First-come-first-serve Node Attributes (YARN-3409)
  • 34. 34 © Hortonworks Inc. 2011–2018. All rights reserved • Every user says “Give me 16GB for my task”, even though it’s only needed at peak • Each node has some allocated but unutilized capacity. Use such capacity to run opportunistic tasks • Preempt such tasks when needed Container overcommit (YARN-1011)
  • 35. 35 © Hortonworks Inc. 2011–2018. All rights reserved • “Start this service when YARN starts” • “initd for YARN” • System services is services required by YARN, need to be started during bootstrap. • For example YARN ATSv2 needs Hbase, so Hbase is system service of YARN. • Only Admin can configure • Started along with ResourceManager • Place spec files under yarn.service.system- service.dir FS path Auto-spawning of system services (YARN-8048)
  • 36. 36 © Hortonworks Inc. 2011–2018. All rights reserved TensorFlow on YARN (YARN-8220) • Run deep learning workloads on the same cluster as analytics, stream processing etc! • Integrated with latest TensorFlow 1.8 and has GPU support • Use simple command to run TensorFlow app by using Native Service spec file (Yarnfile) yarn app -launch distributed-tf <path-to-saved-yarnfile> • A simple python command line utility also could be used to auto-create Yarnfile python submit_tf_job.py --remote_conf_path hdfs:///tf-job-conf --input_spec example_tf_job_spec.json --docker_image gpu.cuda_9.0.tf_1.8.0 --job_name distributed-tf-gpu --user tf-user --domain tensorflow.site --distributed --kerberos
  • 37. 37 © Hortonworks Inc. 2011–2018. All rights reserved TensorFlow on YARN (YARN-8220) Sample Yarnfile for TensorFlow job
  • 38. 38 © Hortonworks Inc. 2011–2018. All rights reserved Why upgrade to Apache Hadoop 3.x?
  • 39. 39 © Hortonworks Inc. 2011–2018. All rights reserved Major release with lot of features and improvements! Motivation • Federation GA • Erasure Coding • Significant cost savings in storage • Reduction of overhead from 200% to 50% • Intra-DataNode Disk Balancer HDFS • Scheduler Improvements • New Resource types - GPUs, FPGAs • Fast and Global scheduling • Containerization - Docker • Long running Services rehash • New UI2 • Timeline Server v2 YARN
  • 40. 40 © Hortonworks Inc. 2011–2018. All rights reserved Hadoop-3 Container Runtimes (Docker / Linux / Default) Platform Services Storage Service Discovery Holiday Web App HBase HTTP MR Tez Hive / Pig Hive on LLAPSpark Resource Management Deep Learning App On-Premises Cloud
  • 41. 41 © Hortonworks Inc. 2011–2018. All rights reserved Things to consider before upgrade
  • 42. 42 © Hortonworks Inc. 2011–2018. All rights reserved Upgrades involve many things • Upgrade mechanism • Recommendation for 3.x - Express or Rolling ? • Compatibility • Source & Target versions • Tooling • Cluster Environment • Configuration changes • Script changes • Classpath changes
  • 43. 43 © Hortonworks Inc. 2011–2018. All rights reserved Upgrade mechanism: Express/Rolling Upgrades • “Stop the world” Upgrades • Cluster downtime • Less stringent prerequisites • Process • Upgrade masters and workers in one shot Express Upgrades • Preserve cluster operation • Minimizes Service impact and downtime • Can take longer to complete • Process • Upgrades masters and workers in batches Rolling Upgrades
  • 44. 44 © Hortonworks Inc. 2011–2018. All rights reserved Compatibility • Wire compatibility o Preserves compatibility with Hadoop 2 clients o Distcp/WebHDFS compatibility preserved • API compatibility Not fully! o Dependency version bumps o Removal of deprecated APIs and tools o Shell script rewrite, rework of Hadoop tools scripts o Incompatible bug fixes!
  • 45. 45 © Hortonworks Inc. 2011–2018. All rights reserved Source & Target versions ● Upgrades Tested with • Why 2.8.4 release? ● Most of production deployments are close to 2.8.x ● What should users of 2.6.x and 2.7.x do? ● Recommend upgrading at least to Hadoop 2.8.4 before migrating to Hadoop 3! Hadoop 2 Base version Hadoop 3 Base version Apache Hadoop 2.8.4 Apache Hadoop 3.1.x
  • 46. 46 © Hortonworks Inc. 2011–2018. All rights reserved Tooling ● Fresh Install ● Fully automated via Apache Ambari ● Manual installation of RPMs/Tar balls ● Upgrade ● Fully automated via Apache Ambari 2.7 ● Manual upgrade
  • 47. 47 © Hortonworks Inc. 2011–2018. All rights reserved Cluster Environment • >= Java 8 • Java 7 EOL in April 2015 • Lot of libraries support only Java 8 Java • >= Bash V3 • POSIX shell NOT supported Shell • If you want to use containerized apps in 3.x • >= 1.12.5 • Also corresponding stable OS Docker
  • 48. 48 © Hortonworks Inc. 2011–2018. All rights reserved Configuration changes: Hadoop Env files • Common placeholder • Precedence rule • yarn/hdfs-env.sh > hadoop-env.sh > hard-coded defaults hadoop-env.sh • HDFS_* replaces HADOOP_* • Precedence rule • hdfs-env.sh > hadoop-env.sh > hard-coded defaults hdfs-env.sh • YARN_* replaces HADOOP_* • Precedence rule • yarn-env.sh > hadoop-env.sh > hard-coded defaults yarn-env.sh
  • 49. 49 © Hortonworks Inc. 2011–2018. All rights reserved Configuration changes: Hadoop env files (contd.) Daemon heap size (HADOOP-10950) • Deprecated: HADOOP_HEAPSIZE • Replaced with HADOOP_HEAPSIZE_MAX and HADOOP_HEAPSIZE_MIN • Units supported in heap size; default unit is MB • Ex: HADOOP_HEAPSIZE_MAX=4096 • Ex: HADOOP_HEAPSIZE_MAX=4g • Auto-tuning based on the memory size of the host
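  In hadoop-env.sh this could look like the following (values are illustrative):

      export HADOOP_HEAPSIZE_MAX=4g     # upper bound; plain numbers are treated as MB
      export HADOOP_HEAPSIZE_MIN=1g     # lower bound
      # left unset, the heap is auto-tuned from the host's memory size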
  • 50. 50 © Hortonworks Inc. 2011–2018. All rights reserved Configuration changes: YARN modified defaults • RM max completed applications in state store/memory • yarn.resourcemanager.max-completed-applications: previously 10000, now 1000 • yarn.resourcemanager.state-store.max-completed-applications: previously 10000, now 1000
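  If you depend on the longer completed-application history from Hadoop 2, the old defaults can be restored in yarn-site.xml, using the property names above (a sketch):

      <property>
        <name>yarn.resourcemanager.max-completed-applications</name>
        <value>10000</value>
      </property>
      <property>
        <name>yarn.resourcemanager.state-store.max-completed-applications</name>
        <value>10000</value>
      </property>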
  • 51. 51 © Hortonworks Inc. 2011–2018. All rights reserved Configuration changes: HDFS Change in default daemon ports (HDFS-9427) • NameNode: 50470 → 9871 (https), 50070 → 9870 (http) • DataNode: 50020 → 9867 (ipc), 50010 → 9866 (data transfer), 50475 → 9865 (https), 50075 → 9864 (http) • Secondary NameNode: 50091 → 9869 (https), 50090 → 9868 (http) • KMS: 16000 → 9600
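  If downstream tooling still hard-codes the pre-3.x ports, the addresses can be pinned back explicitly in hdfs-site.xml; a sketch for the web UIs (standard HDFS property names; adjust hosts to your topology):

      <property>
        <name>dfs.namenode.http-address</name>
        <value>0.0.0.0:50070</value>   <!-- Hadoop 2 default; 3.x defaults to 9870 -->
      </property>
      <property>
        <name>dfs.datanode.http.address</name>
        <value>0.0.0.0:50075</value>   <!-- Hadoop 2 default; 3.x defaults to 9864 -->
      </property>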
  • 52. 52 © Hortonworks Inc. 2011–2018. All rights reserved Script changes: starting/stopping Hadoop daemons Daemon scripts • *-daemon.sh deprecated • Use the bin/hdfs or bin/yarn commands with the --daemon option • Ex: bin/hdfs --daemon start/stop/status namenode • Ex: bin/yarn --daemon start/stop/status resourcemanager Debuggability • Scripts support --debug, which shows the construction of the environment, Java options, and classpath Logs/Pids • Created as hadoop-yarn* instead of yarn-yarn* • Log4j settings in the *-daemon.sh scripts have been removed; instead, set them via *_OPTS in *-env.sh • Ex: YARN_RESOURCEMANAGER_OPTS in yarn-env.sh (see the sketch below)
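  A side-by-side sketch of the old and new invocations (paths and the log4j setting are illustrative):

      sbin/hadoop-daemon.sh --config /etc/hadoop/conf start namenode   # deprecated; prints WARNINGs
      bin/hdfs --daemon start namenode                                 # 3.x replacement
      bin/yarn --daemon status resourcemanager                         # start/stop/status all work

      # yarn-env.sh: daemon log4j/JVM settings now live here
      export YARN_RESOURCEMANAGER_OPTS="-Dyarn.server.resourcemanager.appsummary.logger=INFO,RMSUMMARY"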
  • 53. 53 © Hortonworks Inc. 2011–2018. All rights reserved Classpath changes Classpath isolation now! Users should rebuild their applications with the shaded hadoop-client jars ● Hadoop dependencies leaked into applications' classpaths: Guava, protobuf, jackson, jetty... ● Shaded jars available; they isolate downstream clients from any third-party dependencies (HADOOP-11804) ○ hadoop-client-api for compile-time dependencies ○ hadoop-client-runtime for runtime third-party dependencies ○ hadoop-client-minicluster for test-scope dependencies ● HDFS-6200: the hadoop-hdfs jar contained both the HDFS server and the HDFS client ○ Clients should depend on hadoop-hdfs-client instead, to isolate themselves from server-side dependencies ● No YARN/MR shaded jars
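  Illustrative Maven coordinates for the shaded clients (pick the 3.x version you are actually deploying):

      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client-api</artifactId>
        <version>3.1.0</version>
      </dependency>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client-runtime</artifactId>
        <version>3.1.0</version>
        <scope>runtime</scope>
      </dependency>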
  • 54. 54 © Hortonworks Inc. 2011–2018. All rights reserved Upgrade process
  • 55. 55 © Hortonworks Inc. 2011–2018. All rights reserved Hadoop pre-upgrade steps YARN • Stop all YARN queues • Stop running applications, or wait for them to complete • NOTE: YARN supports rolling upgrade by itself, but upgrading HDFS + YARN together gets problematic HDFS • Run fsck and fix any errors • hdfs fsck / -files -blocks -locations > dfs-old-fsck-1.log • Checkpoint metadata • hdfs dfsadmin -safemode enter • hdfs dfsadmin -saveNamespace • Back up checkpoint files • ${dfs.namenode.name.dir}/current • Get cluster DataNode reports • hdfs dfsadmin -report > dfs-old-report-1.log • Capture the namespace • hdfs dfs -ls -R / > dfs-old-lsr-1.log • Finalize any previous upgrade • hdfs dfsadmin -finalizeUpgrade STACK • Back up configuration files • Stop users/services using YARN/HDFS • Back up other metadata: Hive Metastore, Oozie, etc. (the HDFS commands are consolidated in the sketch below)
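  Consolidated as a runnable sequence (illustrative; run as the HDFS superuser, and keep the output files for post-upgrade comparison):

      hdfs fsck / -files -blocks -locations > dfs-old-fsck-1.log   # fix any reported errors first
      hdfs dfsadmin -report > dfs-old-report-1.log                 # DataNode inventory
      hdfs dfs -ls -R / > dfs-old-lsr-1.log                        # capture the full namespace
      hdfs dfsadmin -safemode enter
      hdfs dfsadmin -saveNamespace                                 # checkpoint metadata
      # back up ${dfs.namenode.name.dir}/current before continuing
      hdfs dfsadmin -finalizeUpgrade                               # finalize any prior upgrade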
  • 56. 56 © Hortonworks Inc. 2011–2018. All rights reserved Upgrade steps Stop services → Install new packages → Link to new versions → Apply configuration updates → Start services, with additional HDFS upgrade steps (see https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.3/bk_command-line-upgrade/content/start-hadoop-core-25.html)
  • 57. 57 © Hortonworks Inc. 2011–2018. All rights reserved Upgrade validation HDFS • Run HDFS service checks • Verify the NameNode gets out of safe mode: hdfs dfsadmin -safemode wait • FileSystem health • Compare with the previous state: node list, full namespace • Let the cluster run production workloads for a while • When ready to discard the backup, finalize the HDFS upgrade: hdfs dfsadmin -upgrade finalize/query YARN • Run YARN service checks • Submit test applications: MR, Tez, ... (a sketch follows below)
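  A sketch of the comparison and smoke tests (illustrative; file names mirror the pre-upgrade captures, and the examples jar path varies by install):

      hdfs dfsadmin -safemode wait                        # block until the NN leaves safe mode
      hdfs dfsadmin -report > dfs-new-report-1.log        # diff against dfs-old-report-1.log
      hdfs dfs -ls -R / > dfs-new-lsr-1.log               # diff against dfs-old-lsr-1.log
      yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 4 100   # MR smoke test
      hdfs dfsadmin -upgrade query                        # check finalization status
      hdfs dfsadmin -upgrade finalize                     # only once you're ready to drop the backup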
  • 58. 58 © Hortonworks Inc. 2011–2018. All rights reserved Enable new features • Erasure Coding • https://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html • YARN UI2 • https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YarnUI2.html • ATSv2 • New daemon: Timeline Reader • https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html • YARN DNS • Service discovery for YARN services • https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/RegistryDNS.html • HDFS Federation • https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/Federation.html
  • 59. 59 © Hortonworks Inc. 2011–2018. All rights reserved Other Aspects
  • 60. 60 © Hortonworks Inc. 2011–2018. All rights reserved Other aspects Validations in progress ● Performance testing ● Scale testing for HDFS/YARN ● OS compatibility ● Workload migration ● MapReduce ● Hive ● Pig ● Spark ● Slider
  • 61. 61 © Hortonworks Inc. 2011–2018. All rights reserved Summary • Hadoop 3 • Eagerly awaited release with lots of new features and optimizations! • 3.1.1 will be released soon with bug fixes identified since 3.1.0 • Express Upgrades are recommended • Admins: a bit of work • Users: should work mostly as-is • Community effort • HADOOP-15501: Upgrade efforts to Hadoop 3.x • Wiki: https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+2.x+to+3.x+Upgrade+Efforts • Volunteers needed for validating workload upgrades on Hadoop 3!
  • 62. 62 © Hortonworks Inc. 2011–2018. All rights reserved Questions?
  • 63. 63 © Hortonworks Inc. 2011–2018. All rights reserved Thank you

Editor's Notes

  • #40: YARN: scheduler improvements significantly improve cluster throughput; distributed scheduling; fine-grained scheduling according to resource types (GPUs, FPGAs); support for long-running services and Docker; revamped UI; ATSv2 (more scalable, based on HBase). HDFS: HDFS Federation; intra-DataNode disk balancer; erasure coding, with significant cost savings in storage and reduction of overhead from 200% to 50%.
  • #45: Wire compatibility Preserves compatibility with Hadoop 2 clients Distcp/WebHDFS compatibility preserved API compatibility Not fully! Dependency version bumps Removal of deprecated APIs and tools Shell script rewrite, rework of Hadoop tools scripts Incompatible bug fixes
  • #46: Upgrade has been tested/validated from Apache Hadoop 2.8.4 to Hadoop 3.1.0 in our test environments Ongoing effort in community to release Hadoop 3.1.1 with a lot of fixes. Recommend upgrading lower Hadoop 2 versions to at least Hadoop 2.8.4 before migrating to Hadoop 3
  • #47: Fresh Install Fully automated via Apache Ambari Manual installation of RPMs/Tarballs Upgrade Fully automated via Apache Ambari Manual upgrade
  • #50: Example of the deprecation warnings printed when the old daemon scripts are used on 3.x (chat markers stripped; repeated variable-rename warnings elided):

      $ export HADOOP_LIBEXEC_DIR=/usr/apache/hadoop/libexec; /usr/apache/hadoop/sbin/yarn-daemon.sh --config /usr/apache/conf start nodemanager
      WARNING: YARN_CONF_DIR has been replaced by HADOOP_CONF_DIR. Using value of YARN_CONF_DIR.
      WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR.
      WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE.
      WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR.
      WARNING: YARN_ROOT_LOGGER has been replaced by HADOOP_ROOT_LOGGER. Using value of YARN_ROOT_LOGGER.
      WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
      WARNING: Use of this script to start YARN daemons is deprecated.
      WARNING: Attempting to execute replacement "yarn --daemon start" instead.
      [...]

      $ export HADOOP_LIBEXEC_DIR=/usr/apache/hadoop/libexec; /usr/apache/hadoop/sbin/hadoop-daemon.sh --config /usr/apache/conf start datanode
      WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
      WARNING: Use of this script to start HDFS daemons is deprecated.
      WARNING: Attempting to execute replacement "hdfs --daemon start" instead.
      WARNING: HADOOP_SECURE_DN_PID_DIR has been replaced by HADOOP_SECURE_PID_DIR. Using value of HADOOP_SECURE_DN_PID_DIR.
      WARNING: HADOOP_SECURE_DN_LOG_DIR has been replaced by HADOOP_SECURE_LOG_DIR. Using value of HADOOP_SECURE_DN_LOG_DIR.
      WARNING: HADOOP_DATANODE_OPTS has been replaced by HDFS_DATANODE_OPTS. Using value of HADOOP_DATANODE_OPTS.
      ERROR: You must be a privileged user in order to run a secure service.
      [...]
      WARNING: HADOOP_NAMENODE_OPTS has been replaced by HDFS_NAMENODE_OPTS. Using value of HADOOP_NAMENODE_OPTS.
  • #56: HDFS Backup Configuration files Create a list of all the DataNodes in the cluster. hdfs dfsadmin -report > dfs-old-report-1.log Save Namespace hdfs dfsadmin -safemode enter hdfs dfsadmin -saveNamespace Backup the checkpoint files located in ${dfs.namenode.name.dir}/current Finalize any prior HDFS upgrade hdfs dfsadmin -finalizeUpgrade Create a fsimage for rollback hdfs dfsadmin -rollingUpgrade prepare YARN Stop all YARN queues Stop/Wait for Running applications to finish NOTE: YARN supports rolling upgrade!