1 © Hortonworks Inc. 2011–2018. All rights reserved
Apache Hadoop 3 Insights &
Migrating your clusters from Hadoop 2 to Hadoop 3
Sunil Govindan
Rohith Sharma K S
2 © Hortonworks Inc. 2011–2018. All rights reserved
Speakers
Sunil Govindan
• Apache Hadoop PMC
• Contributing to YARN Scheduler improvements, integrating TensorFlow with YARN, etc.
• Staff Engineer @ Hortonworks YARN Engineering Team
Rohith Sharma K S
• Apache Hadoop PMC
• Contributing to Application Timeline Service v2 and Native Services
• Sr. Engineer @ Hortonworks YARN Engineering Team
3 © Hortonworks Inc. 2011–2018. All rights reserved
[Bar chart: number of patches contributed per person between releases 2.8.3 and 3.1.0]
(a) 456 Contributors
(b) 46 with > 25 patches
(c) Long Tail
Our Community : Contributors 2.8.3 -> 3.1.0
4 © Hortonworks Inc. 2011–2018. All rights reserved
• Introduction
• HDFS Improvements
• YARN State of Union
• Migration Story from Hadoop 2 clusters to Hadoop 3
Agenda
5 © Hortonworks Inc. 2011–2018. All rights reserved
A brief timeline from the past year: GA Releases
2.8.0 2.9.0 3.0.0 3.1.0
• GPU/FPGA
• YARN Native
Service
• Placement
Constraints
• YARN Federation
• Opportunistic
Container
(Backported from 3.0)
• New YARN UI
• Timeline V2
• Global Scheduling
• Multiple Resource
types
• New YARN UI
• Timeline service V2
• Erasure Coding
Ever evolving requirements (computation intensive, larger scale, services)
• Application Priority
• Reservations
• Node labels
improvements
2.8.0: 22 March ’17 | 2.9.0: 17 Nov ’17 | 3.0.0: 13 Dec ‘17 | 3.1.0: 06 April ‘18 (latest maintenance releases: 2.8.4, 2.9.1, 3.0.3, 3.1.0)
6 © Hortonworks Inc. 2011–2018. All rights reserved
Apache Hadoop 3.0/3.1
7 © Hortonworks Inc. 2011–2018. All rights reserved
• Motivation: improve the storage efficiency of HDFS
• Better storage efficiency compared to 3x replication
• Reduction of overhead from 200% to 40%
• Uses Reed-Solomon(k,m) erasure codes instead of replication
• Support for multiple erasure coding policies
• RS(3,2), RS(6,3), RS(10,4)
• Can improve data durability
• RS(6,3) can tolerate 3 failures
• RS(10,4) can tolerate 4 failures
• Missing blocks reconstructed from remaining blocks
HDFS Features : Erasure Coding
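A minimal sketch of turning this on from the hdfs ec subcommand; the path /data/cold is a hypothetical directory, and the policy names are the built-in Reed-Solomon policies:

hdfs ec -listPolicies                                      # list the built-in policies
hdfs ec -enablePolicy -policy RS-6-3-1024k                 # enable RS(6,3)
hdfs ec -setPolicy -path /data/cold -policy RS-6-3-1024k   # apply it to a directory
hdfs ec -getPolicy -path /data/cold                        # verify the policy in effect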
8 © Hortonworks Inc. 2011–2018. All rights reserved
• Shell script rewrite
• Support for multiple Standby NameNodes
• Intra-DataNode balancer
• Support for Microsoft Azure Data Lake and Aliyun OSS
• Move default ports out of the ephemeral range
• S3 consistency and performance improvements (ongoing)
• Tightening the Hadoop compatibility policy (ongoing)
HDFS Features : Miscellaneous
9 © Hortonworks Inc. 2011–2018. All rights reserved
YARN : Key Themes
Scale
Platform Themes Workload Themes
Scheduling Usability Containers Resources Services
10 © Hortonworks Inc. 2011–2018. All rights reserved
Key Themes
Scale
Platform Themes Workload Themes
Scheduling Usability Containers Resources Services
11 © Hortonworks Inc. 2011–2018. All rights reserved
• Many sites run clusters made up of a large number of nodes
• Oath (Yahoo!), Twitter, LinkedIn, Microsoft, Alibaba etc.
• Now: 40K nodes (federated), 20K nodes (single cluster).
• Roadmap: To 100K and beyond
Looking at the Scale!
12 © Hortonworks Inc. 2011–2018. All rights reserved
Key Themes
Scale
Platform Themes Workload Themes
Scheduling Usability Containers Resources Services
13 © Hortonworks Inc. 2011–2018. All rights reserved
Moving towards Global & Fast Scheduling
[Diagram: global scheduler pipeline with scheduler state, placement, and committer stages]
• Problem
• The one-node-at-a-time allocation cycle can lead to suboptimal decisions.
• Improvements with global scheduling
• Look at several nodes at a time
• YARN scheduler can allocate 3k+ containers per second ≈ 10 million allocations / hour!
• Much better placement decisions
14 © Hortonworks Inc. 2011–2018. All rights reserved
Better placement strategies (YARN-6592)
• Past
• Supported constraints in form of Node Locality
• Now YARN can support a lot more use cases
• Co-locate the allocations of a job on the same rack (affinity)
• Spread allocations across machines (anti-affinity) to minimize resource interference
• Allow up to a specific number of allocations in a node group (cardinality)
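As a hedged sketch, these constraints can be exercised through the distributed shell application that ships with YARN 3.1; the placement spec grammar is SourceTag=NumContainers,ConstraintType,Scope,TargetTag, and the zk/hbase tags plus the jar path are illustrative:

# 3 "zk" containers with node anti-affinity to each other,
# plus 5 "hbase" containers with rack affinity to "zk"
yarn org.apache.hadoop.yarn.applications.distributedshell.Client \
  -jar <path-to-distributedshell-jar> \
  -shell_command sleep -shell_args 3600 \
  -placement_spec zk=3,NOTIN,NODE,zk:hbase=5,IN,RACK,zk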
15 © Hortonworks Inc. 2011–2018. All rights reserved
Additional Scheduling Improvements
• Absolute Resources Configuration in CS – YARN-5881
• Auto Creation of Leaf Queues - YARN-7117
• Application Timeout – YARN-3813
• Reservations in YARN
16 © Hortonworks Inc. 2011–2018. All rights reserved
Key Themes
Scale
Platform Themes Workload Themes
Scheduling Usability Containers Resources Services
17 © Hortonworks Inc. 2011–2018. All rights reserved
Usability: Queue & Logs
• API based queue management, decentralized (YARN-5734)
• Improved logs management (YARN-4904), including live application logs
18 © Hortonworks Inc. 2011–2018. All rights reserved
Usability: UI
19 © Hortonworks Inc. 2011–2018. All rights reserved
Timeline Service 2.0
• Understanding and monitoring a Hadoop cluster is itself a BigData problem
• Uses HBase as the backend for better read/write scalability
• More robust storage fault tolerance
• Migration and compatibility with v1.5
20 © Hortonworks Inc. 2011–2018. All rights reserved
Key Themes
Scale
Platform Themes Workload Themes
Scheduling Usability Containers Resources Services
21 © Hortonworks Inc. 2011–2018. All rights reserved
Key Themes
Scale
Platform Themes Workload Themes
Scheduling Usability Containers Resources Services
22 © Hortonworks Inc. 2011–2018. All rights reserved
• Run workloads both with and without Docker on the same cluster
• Choose at run-time!
Containers
23 © Hortonworks Inc. 2011–2018. All rights reserved
• YARN – Big Data apps, moving to generic apps with containerization
• K8S – industry standard orchestration layer for generic apps
• We have done YARN on YARN! Next slide
• YARN on K8S? K8S on YARN? Run them side-by-side?
• What does containerized BigData mean?
• Lift and Shift?
• Break up every service?
What about running all of Big Data containerized?
24 © Hortonworks Inc. 2011–2018. All rights reserved
Ycloud: YARN Based Container Cloud
• Testing Hadoop on Hadoop!
25 © Hortonworks Inc. 2011–2018. All rights reserved
Key Themes
Scale
Platform Themes Workload Themes
Scheduling Usability Containers Resources Services
26 © Hortonworks Inc. 2011–2018. All rights reserved
• In the past, YARN supported only Memory and CPU
• Now
• A generalized vector for all resources
• Admins can add arbitrary resource types! (see the config sketch below)
Resource profiles and custom resource types
• Ease of resource requesting model using profiles for apps
Profile | Memory | CPU | GPU
Small | 2 GB | 4 Cores | 0 Cores
Medium | 4 GB | 8 Cores | 0 Cores
Large | 16 GB | 16 Cores | 4 Cores
[Diagram: Node Manager resource vector covering Memory, CPU, GPU, and FPGA]
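A hedged sketch of how an admin declares such a type; the resource name resource1 and the value 4 are illustrative, while the property names follow the Hadoop 3 resource model documentation:

<!-- resource-types.xml on the ResourceManager: register the custom type -->
<property>
  <name>yarn.resource-types</name>
  <value>resource1</value>
</property>

<!-- node-resources.xml on each NodeManager: how much of it this node offers -->
<property>
  <name>yarn.nodemanager.resource-type.resource1</name>
  <value>4</value>
</property>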
27 © Hortonworks Inc. 2011–2018. All rights reserved
• Why?
• No need to setup separate clusters
• Leverage shared compute!
• Why is isolation needed?
• Multiple processes sharing a single GPU will be:
• Serialized.
• Prone to OOM.
• GPU isolation on YARN:
• Granularity is per GPU device.
• Uses cgroups / Docker to enforce isolation.
GPU support on YARN
[Diagram: container stacks on the host OS, e.g. an Nginx app on Ubuntu 14.04 and TensorFlow 1.2 with CUDA Library 5.0, with the GPU base library volume-mounted from the host]
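A minimal sketch of wiring this up, following the Hadoop 3.1 GPU documentation (GPU auto-discovery on the nodes is assumed):

<!-- resource-types.xml: register the GPU resource type -->
<property>
  <name>yarn.resource-types</name>
  <value>yarn.io/gpu</value>
</property>

<!-- yarn-site.xml on each NodeManager: enable the GPU plugin -->
<property>
  <name>yarn.nodemanager.resource-plugins</name>
  <value>yarn.io/gpu</value>
</property>

Containers then request GPUs via the yarn.io/gpu resource alongside memory and vcores.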
28 © Hortonworks Inc. 2011–2018. All rights reserved
• FPGA isolation on YARN:
• Granularity is per FPGA device.
• Uses cgroups to enforce the isolation.
• Currently, only the Intel OpenCL SDK for FPGA is supported, but the implementation is extensible to other FPGA SDKs.
FPGA on YARN
29 © Hortonworks Inc. 2011–2018. All rights reserved
Key Themes
Scale
Platform Themes Workload Themes
Scheduling Usability Containers Resources Services
30 © Hortonworks Inc. 2011–2018. All rights reserved
• A native YARN services framework
• YARN-4692
• [Umbrella] Native YARN framework layer for services and
beyond
• Apache Slider retired from Incubator – lessons and key code carried over to YARN
• Simplified discovery of services via DNS mechanisms: YARN-4757
• regionserver-0.hbase-app-3.hadoop.yarn.site
• Application & Services upgrades
• “Do an upgrade of my HBase app with minimal impact to end-users”
• YARN-4726
Services support in YARN
31 © Hortonworks Inc. 2011–2018. All rights reserved
How to run a new service in YARN?
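The original slide walks through this visually; as a hedged sketch based on the YARN native services quickstart, a service is described by a small Yarnfile (JSON) and launched with a single command (the service and component names are illustrative):

{
  "name": "sleeper-service",
  "version": "1.0",
  "components": [
    {
      "name": "sleeper",
      "number_of_containers": 1,
      "launch_command": "sleep 900000",
      "resource": { "cpus": 1, "memory": "256" }
    }
  ]
}

yarn app -launch sleeper-service /path/to/sleeper.json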
32 © Hortonworks Inc. 2011–2018. All rights reserved
Apache Hadoop 3.2 and beyond
33 © Hortonworks Inc. 2011–2018. All rights reserved
• “Take me to a node with JDK 10”
• Node Partition vs. Node Attribute
• Partition:
• One partition for one node
• ACL
• Shares between queues
• Preemption enforced.
• Attribute:
• For container placement
• No ACL/Shares on attributes
• First-come-first-serve
Node Attributes (YARN-3409)
34 © Hortonworks Inc. 2011–2018. All rights reserved
• Every user says “Give me 16GB for my task”, even though it’s only needed at peak
• Each node has some allocated but unutilized capacity. Use such capacity to run opportunistic tasks
• Preempt such tasks when needed
Container overcommit (YARN-1011)
35 © Hortonworks Inc. 2011–2018. All rights reserved
• “Start this service when YARN starts”
• “initd for YARN”
• System services are services required by YARN itself, and need to be started during bootstrap.
• For example, YARN ATSv2 needs HBase, so HBase is a system service of YARN.
• Only admins can configure them
• Started along with the ResourceManager
• Place spec files under the yarn.service.system-service.dir FS path (see the sketch below)
Auto-spawning of system services (YARN-8048)
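A hedged sketch of staging such a spec; the sync/<user> directory layout follows the YARN-8048 documentation, while the /services path and hbase.json file name are illustrative:

# yarn-site.xml: yarn.service.system-service.dir = /services
# Layout assumed: <system-service-dir>/<launch-mode>/<user>/<yarnfile>
hdfs dfs -mkdir -p /services/sync/yarn
hdfs dfs -put hbase.json /services/sync/yarn/hbase.json
# The ResourceManager launches the service on its next startup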
36 © Hortonworks Inc. 2011–2018. All rights reserved
TensorFlow on YARN (YARN-8220)
• Run deep learning workloads on the same cluster as analytics, stream processing etc!
• Integrated with latest TensorFlow 1.8 and has GPU support
• Use a simple command to run a TensorFlow app via a Native Service spec file (Yarnfile)
yarn app -launch distributed-tf <path-to-saved-yarnfile>
• A simple Python command line utility can also be used to auto-create the Yarnfile
python submit_tf_job.py
--remote_conf_path hdfs:///tf-job-conf
--input_spec example_tf_job_spec.json
--docker_image gpu.cuda_9.0.tf_1.8.0
--job_name distributed-tf-gpu
--user tf-user
--domain tensorflow.site
--distributed --kerberos
37 © Hortonworks Inc. 2011–2018. All rights reserved
TensorFlow on YARN (YARN-8220)
Sample Yarnfile for TensorFlow job
38 © Hortonworks Inc. 2011–2018. All rights reserved
Why upgrade to Apache Hadoop 3.x?
39 © Hortonworks Inc. 2011–2018. All rights reserved
Major release with a lot of features and improvements!
Motivation
HDFS
• Federation GA
• Erasure Coding
• Significant cost savings in storage
• Reduction of overhead from 200% to 50%
• Intra-DataNode Disk Balancer
YARN
• Scheduler Improvements
• New Resource types: GPUs, FPGAs
• Fast and Global scheduling
• Containerization: Docker
• Long running Services rehash
• New UI2
• Timeline Server v2
40 © Hortonworks Inc. 2011–2018. All rights reserved
[Diagram: the Hadoop 3 stack, on-premises and in the cloud: container runtimes (Docker / Linux / default), platform services (storage, service discovery, resource management), and workloads such as HBase, MR, Tez, Hive / Pig, Hive on LLAP, Spark, an HTTP holiday web app, and a deep learning app]
41 © Hortonworks Inc. 2011–2018. All rights reserved
Things to consider
before upgrade
42 © Hortonworks Inc. 2011–2018. All rights reserved
Upgrades involve many things
• Upgrade mechanism
• Recommendation for 3.x: Express or Rolling?
• Compatibility
• Source & Target versions
• Tooling
• Cluster Environment
• Configuration changes
• Script changes
• Classpath changes
43 © Hortonworks Inc. 2011–2018. All rights reserved
Upgrade mechanism: Express/Rolling Upgrades
Express Upgrades
• “Stop the world” upgrades
• Cluster downtime
• Less stringent prerequisites
• Process: upgrade masters and workers in one shot
Rolling Upgrades
• Preserves cluster operation
• Minimizes service impact and downtime
• Can take longer to complete
• Process: upgrades masters and workers in batches
44 © Hortonworks Inc. 2011–2018. All rights reserved
Compatibility
• Wire compatibility
o Preserves compatibility with Hadoop 2 clients
o Distcp/WebHDFS compatibility preserved
• API compatibility
Not fully!
o Dependency version bumps
o Removal of deprecated APIs and tools
o Shell script rewrite, rework of Hadoop tools scripts
o Incompatible bug fixes!
45 © Hortonworks Inc. 2011–2018. All rights reserved
Source & Target versions
● Upgrades Tested with
• Why 2.8.4 release?
● Most production deployments are close to 2.8.x
● What should users of 2.6.x and 2.7.x do?
● Recommend upgrading at least to Hadoop 2.8.4 before migrating to Hadoop 3!
Hadoop 2 Base version | Hadoop 3 Base version
Apache Hadoop 2.8.4 | Apache Hadoop 3.1.x
46 © Hortonworks Inc. 2011–2018. All rights reserved
Tooling
● Fresh Install
● Fully automated via Apache Ambari
● Manual installation of RPMs/Tar balls
● Upgrade
● Fully automated via Apache Ambari 2.7
● Manual upgrade
47 © Hortonworks Inc. 2011–2018. All rights reserved
Cluster Environment
Java
• >= Java 8
• Java 7 was EOL’d in April 2015
• Many libraries support only Java 8
Shell
• >= Bash v3
• POSIX shell NOT supported
Docker (if you want to use containerized apps in 3.x)
• >= 1.12.5
• Also a corresponding stable OS
48 © Hortonworks Inc. 2011–2018. All rights reserved
Configuration changes: Hadoop Env files
hadoop-env.sh
• Common placeholder
• Precedence rule: yarn/hdfs-env.sh > hadoop-env.sh > hard-coded defaults
hdfs-env.sh
• HDFS_* replaces HADOOP_*
• Precedence rule: hdfs-env.sh > hadoop-env.sh > hard-coded defaults
yarn-env.sh
• YARN_* replaces HADOOP_*
• Precedence rule: yarn-env.sh > hadoop-env.sh > hard-coded defaults
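A small sketch of the new split (the JVM flags are illustrative):

# hadoop-env.sh: shared default applied to every daemon
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"

# hdfs-env.sh: HDFS_* replaces the old HADOOP_* per-daemon variables and wins
export HDFS_NAMENODE_OPTS="-XX:+UseG1GC"

# yarn-env.sh: likewise, YARN_* settings take precedence over hadoop-env.sh
export YARN_RESOURCEMANAGER_OPTS="-XX:+UseG1GC"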
49 © Hortonworks Inc. 2011–2018. All rights reserved
Configuration changes: Hadoop Env files Contd..
Daemon Heap Size HADOOP-10950
• Deprecated
• HADOOP_HEAPSIZE
• Replaced with
• HADOOP_HEAPSIZE_MAX and HADOOP_HEAPSIZE_MIN
• Units support in heap size
• Default unit is MB
• Ex: HADOOP_HEAPSIZE_MAX=4096
• Ex: HADOOP_HEAPSIZE_MAX=4g
• Auto-tuning
• Based on memory size of the host
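For example (values are illustrative):

# hadoop-env.sh: bounded heap with unit suffixes (HADOOP-10950)
export HADOOP_HEAPSIZE_MIN=1g
export HADOOP_HEAPSIZE_MAX=4g    # same as 4096, since the default unit is MB
# Leave both unset to let the JVM auto-tune from the host's memory size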
50 © Hortonworks Inc. 2011–2018. All rights reserved
Configuration changes: YARN
Modified Defaults
• RM Max Completed Applications in State Store/Memory
Configuration | Previous | Current
yarn.resourcemanager.max-completed-applications | 10000 | 1000
yarn.resourcemanager.state-store.max-completed-applications | 10000 | 1000
51 © Hortonworks Inc. 2011–2018. All rights reserved
Configurations Changes: HDFS
Service | Previous Ports | Current Ports
NameNode | 50470, 50070 | 9871, 9870
DataNode | 50020, 50010, 50475, 50075 | 9867, 9866, 9865, 9864
Secondary NameNode | 50091, 50090 | 9869, 9868
KMS | 16000 | 9600
Change in Default Daemon Ports (HDFS-9427)
52 © Hortonworks Inc. 2011–2018. All rights reserved
Script changes: Starting/Stopping Hadoop Daemons
Daemon scripts
• *-daemon.sh deprecated
• Use bin/hdfs or bin/yarn commands with --daemon option
• Ex: bin/hdfs --daemon start/stop/status namenode
• Ex: bin/yarn --daemon start/stop/status resourcemanager
Debuggability
• Scripts support --debug
• Shows construction of env, Java options, and classpath
Logs/Pid
• Created as hadoop-yarn* instead of yarn-yarn*
• Log4j settings in the *-daemon.sh scripts have been removed. Instead, set them via *_OPTS in *-env.sh
• Eg: YARN_RESOURCEMANAGER_OPTS in yarn-env.sh
53 © Hortonworks Inc. 2011–2018. All rights reserved
Classpath Changes
Classpath isolation now!
Users should rebuild their applications with shaded hadoop-client jars
● Hadoop dependencies leaked to the application’s classpath: Guava, protobuf, jackson, jetty...
● Shaded jars available; they isolate downstream clients from third-party dependencies (HADOOP-11804)
○ hadoop-client-api for compile time dependencies
○ hadoop-client-runtime for runtime third-party dependencies
○ hadoop-minicluster for test scope dependencies
● HDFS-6200: the hadoop-hdfs jar contained both the HDFS server and the HDFS client.
○ Clients should instead depend on hadoop-hdfs-client to isolate themselves from server-side dependencies
● No YARN/MR shaded jars
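A hedged sketch of what a downstream pom.xml might declare after the switch (the version shown is illustrative):

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-api</artifactId>   <!-- compile time -->
  <version>3.1.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-runtime</artifactId>   <!-- runtime, shaded third-party deps -->
  <version>3.1.0</version>
  <scope>runtime</scope>
</dependency>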
54 © Hortonworks Inc. 2011–2018. All rights reserved
Upgrade process
55 © Hortonworks Inc. 2011–2018. All rights reserved
YARN
• Stop all YARN queues
• Stop/Wait for running applications to complete
• NOTE: YARN supports rolling upgrade in
itself but if you upgrade HDFS + YARN
together, it gets problematic
Hadoop Pre-Upgrade Steps
HDFS
• Run fsck and fix any errors
• hdfs fsck / -files -blocks -locations > dfs-old-fsck.1.log
• Checkpoint Metadata
• hdfs dfsadmin -safemode enter
• hdfs dfsadmin -saveNamespace
• Backup checkpoint files
• ${dfs.namenode.name.dir}/current
• Get Cluster DataNode reports
• hdfs dfsadmin -report > dfs-old-report-1.log
• Capture Namespace
• hdfs dfs -ls -R / > dfs-old-lsr-1.log
• Finalize previous upgrade
• hdfs dfsadmin -finalizeUpgrade
STACK
• Backup Configuration files
• Stop users/services using YARN/HDFS
• Other metadata backup: Hive MetaStore, Oozie, etc.
56 © Hortonworks Inc. 2011–2018. All rights reserved
Upgrade Steps
Stop Services → Install new packages → Link to new versions → Configuration Updates → Start Services
Additional HDFS Upgrade Steps:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.3/bk_command-line-upgrade/content/start-hadoop-core-25.html
57 © Hortonworks Inc. 2011–2018. All rights reserved
Upgrade Validation
• Run HDFS Service checks
• Verify NameNode gets out of Safe Mode
hdfs dfsadmin -safemode wait
• FileSystem Health
• Compare with Previous State
• Node list
• Full NameSpace
• Let Cluster run production workloads for
a while
• When ready to discard the backup, finalize the HDFS upgrade
hdfs dfsadmin -upgrade finalize/query
HDFS
• Run YARN Service checks
• Submit test applications – MR, TEZ, …
YARN
58 © Hortonworks Inc. 2011–2018. All rights reserved
Enable New features
• Erasure Coding
• https://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html
• YARN UI2
• https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YarnUI2.html
• ATSv2
• New Daemon: Timeline Reader
• https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html
• YARN DNS
• Service Discovery of YARN Services
• https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/RegistryDNS.html
• HDFS Federation
• https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/Federation.html
59 © Hortonworks Inc. 2011–2018. All rights reserved
Other Aspects
60 © Hortonworks Inc. 2011–2018. All rights reserved
Other Aspects
Validations In-progress
● Performance testing
● Scale testing for HDFS/YARN
● OS compatibility
● Workload Migration
● MapReduce
● Hive
● PIG
● Spark
● Slider
61 © Hortonworks Inc. 2011–2018. All rights reserved
Summary
• Hadoop 3
• Eagerly awaited release with lots of new features and optimizations!
• 3.1.1 will be released soon with some bug fixes identified since 3.1.0
• Express Upgrades are recommended
• Admins
• A bit of work
• Users
• Should work mostly as-is
• Community effort
• HADOOP-15501: Upgrade efforts to Hadoop 3.x
• Wiki: https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+2.x+to+3.x+Upgrade+Efforts
• Volunteers needed for validating workload upgrades on Hadoop 3!
62 © Hortonworks Inc. 2011–2018. All rights reserved
Questions?
63 © Hortonworks Inc. 2011–2018. All rights reserved
Thank you
Ad

More Related Content

What's hot (20)

Hadoop 1.x vs 2
Hadoop 1.x vs 2Hadoop 1.x vs 2
Hadoop 1.x vs 2
Rommel Garcia
 
Cloudera training: secure your Cloudera cluster
Cloudera training: secure your Cloudera clusterCloudera training: secure your Cloudera cluster
Cloudera training: secure your Cloudera cluster
Cloudera, Inc.
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta Lake
Databricks
 
하이브 최적화 방안
하이브 최적화 방안하이브 최적화 방안
하이브 최적화 방안
Teddy Choi
 
How Safe is Asynchronous Master-Master Setup?
 How Safe is Asynchronous Master-Master Setup? How Safe is Asynchronous Master-Master Setup?
How Safe is Asynchronous Master-Master Setup?
Sveta Smirnova
 
Query Optimization with MySQL 8.0 and MariaDB 10.3: The Basics
Query Optimization with MySQL 8.0 and MariaDB 10.3: The BasicsQuery Optimization with MySQL 8.0 and MariaDB 10.3: The Basics
Query Optimization with MySQL 8.0 and MariaDB 10.3: The Basics
Jaime Crespo
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez
Hortonworks
 
EuroBSDCon 2021 - (auto)Installing BSD Systems
EuroBSDCon 2021 - (auto)Installing BSD SystemsEuroBSDCon 2021 - (auto)Installing BSD Systems
EuroBSDCon 2021 - (auto)Installing BSD Systems
Vinícius Zavam
 
LLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveLLAP: long-lived execution in Hive
LLAP: long-lived execution in Hive
DataWorks Summit
 
x86
x86x86
x86
Wei-Bo Chen
 
Apache Hadoop and Spark: Introduction and Use Cases for Data Analysis
Apache Hadoop and Spark: Introduction and Use Cases for Data AnalysisApache Hadoop and Spark: Introduction and Use Cases for Data Analysis
Apache Hadoop and Spark: Introduction and Use Cases for Data Analysis
Trieu Nguyen
 
TFA Collector - what can one do with it
TFA Collector - what can one do with it TFA Collector - what can one do with it
TFA Collector - what can one do with it
Sandesh Rao
 
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
DataWorks Summit
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
DataWorks Summit
 
Analyze corefile and backtraces with GDB for Mysql/MariaDB on Linux - Nilanda...
Analyze corefile and backtraces with GDB for Mysql/MariaDB on Linux - Nilanda...Analyze corefile and backtraces with GDB for Mysql/MariaDB on Linux - Nilanda...
Analyze corefile and backtraces with GDB for Mysql/MariaDB on Linux - Nilanda...
Mydbops
 
POUG 2019 - Oracle Partitioning for DBAs and Devs
POUG 2019 - Oracle Partitioning for DBAs and DevsPOUG 2019 - Oracle Partitioning for DBAs and Devs
POUG 2019 - Oracle Partitioning for DBAs and Devs
Franky Weber Faust
 
Hadoop & Greenplum: Why Do Such a Thing?
Hadoop & Greenplum: Why Do Such a Thing?Hadoop & Greenplum: Why Do Such a Thing?
Hadoop & Greenplum: Why Do Such a Thing?
Ed Kohlwey
 
Oracle Clusterware Node Management and Voting Disks
Oracle Clusterware Node Management and Voting DisksOracle Clusterware Node Management and Voting Disks
Oracle Clusterware Node Management and Voting Disks
Markus Michalewicz
 
Polymorphic Table Functions in SQL
Polymorphic Table Functions in SQLPolymorphic Table Functions in SQL
Polymorphic Table Functions in SQL
Chris Saxon
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
DataWorks Summit
 
Cloudera training: secure your Cloudera cluster
Cloudera training: secure your Cloudera clusterCloudera training: secure your Cloudera cluster
Cloudera training: secure your Cloudera cluster
Cloudera, Inc.
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta Lake
Databricks
 
하이브 최적화 방안
하이브 최적화 방안하이브 최적화 방안
하이브 최적화 방안
Teddy Choi
 
How Safe is Asynchronous Master-Master Setup?
 How Safe is Asynchronous Master-Master Setup? How Safe is Asynchronous Master-Master Setup?
How Safe is Asynchronous Master-Master Setup?
Sveta Smirnova
 
Query Optimization with MySQL 8.0 and MariaDB 10.3: The Basics
Query Optimization with MySQL 8.0 and MariaDB 10.3: The BasicsQuery Optimization with MySQL 8.0 and MariaDB 10.3: The Basics
Query Optimization with MySQL 8.0 and MariaDB 10.3: The Basics
Jaime Crespo
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez
Hortonworks
 
EuroBSDCon 2021 - (auto)Installing BSD Systems
EuroBSDCon 2021 - (auto)Installing BSD SystemsEuroBSDCon 2021 - (auto)Installing BSD Systems
EuroBSDCon 2021 - (auto)Installing BSD Systems
Vinícius Zavam
 
LLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveLLAP: long-lived execution in Hive
LLAP: long-lived execution in Hive
DataWorks Summit
 
Apache Hadoop and Spark: Introduction and Use Cases for Data Analysis
Apache Hadoop and Spark: Introduction and Use Cases for Data AnalysisApache Hadoop and Spark: Introduction and Use Cases for Data Analysis
Apache Hadoop and Spark: Introduction and Use Cases for Data Analysis
Trieu Nguyen
 
TFA Collector - what can one do with it
TFA Collector - what can one do with it TFA Collector - what can one do with it
TFA Collector - what can one do with it
Sandesh Rao
 
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
DataWorks Summit
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
DataWorks Summit
 
Analyze corefile and backtraces with GDB for Mysql/MariaDB on Linux - Nilanda...
Analyze corefile and backtraces with GDB for Mysql/MariaDB on Linux - Nilanda...Analyze corefile and backtraces with GDB for Mysql/MariaDB on Linux - Nilanda...
Analyze corefile and backtraces with GDB for Mysql/MariaDB on Linux - Nilanda...
Mydbops
 
POUG 2019 - Oracle Partitioning for DBAs and Devs
POUG 2019 - Oracle Partitioning for DBAs and DevsPOUG 2019 - Oracle Partitioning for DBAs and Devs
POUG 2019 - Oracle Partitioning for DBAs and Devs
Franky Weber Faust
 
Hadoop & Greenplum: Why Do Such a Thing?
Hadoop & Greenplum: Why Do Such a Thing?Hadoop & Greenplum: Why Do Such a Thing?
Hadoop & Greenplum: Why Do Such a Thing?
Ed Kohlwey
 
Oracle Clusterware Node Management and Voting Disks
Oracle Clusterware Node Management and Voting DisksOracle Clusterware Node Management and Voting Disks
Oracle Clusterware Node Management and Voting Disks
Markus Michalewicz
 
Polymorphic Table Functions in SQL
Polymorphic Table Functions in SQLPolymorphic Table Functions in SQL
Polymorphic Table Functions in SQL
Chris Saxon
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
DataWorks Summit
 

Similar to Apache Hadoop 3 updates with migration story (20)

Apache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionApache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the Union
DataWorks Summit
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
DataWorks Summit
 
Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo
DataWorks Summit
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
DataWorks Summit
 
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionDataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Wangda Tan
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
DataWorks Summit
 
Containers and Big Data
Containers and Big DataContainers and Big Data
Containers and Big Data
DataWorks Summit
 
What's new in apache hive
What's new in apache hive What's new in apache hive
What's new in apache hive
DataWorks Summit
 
YARN - Past, Present, & Future
YARN - Past, Present, & FutureYARN - Past, Present, & Future
YARN - Past, Present, & Future
DataWorks Summit
 
Apache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduceApache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduce
DataWorks Summit/Hadoop Summit
 
Containers and Big Data
Containers and Big Data Containers and Big Data
Containers and Big Data
DataWorks Summit
 
Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash Course
DataWorks Summit
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
DataWorks Summit
 
Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash Course
DataWorks Summit
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
DataWorks Summit
 
High throughput data replication over RAFT
High throughput data replication over RAFTHigh throughput data replication over RAFT
High throughput data replication over RAFT
DataWorks Summit
 
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and FutureHadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
Vinod Kumar Vavilapalli
 
Containers and Big Data
Containers and Big DataContainers and Big Data
Containers and Big Data
DataWorks Summit
 
Moving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloudMoving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloud
DataWorks Summit/Hadoop Summit
 
Lessons learned running a container cloud on YARN
Lessons learned running a container cloud on YARNLessons learned running a container cloud on YARN
Lessons learned running a container cloud on YARN
DataWorks Summit
 
Apache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionApache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the Union
DataWorks Summit
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
DataWorks Summit
 
Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo
DataWorks Summit
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
DataWorks Summit
 
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionDataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Wangda Tan
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
DataWorks Summit
 
What's new in apache hive
What's new in apache hive What's new in apache hive
What's new in apache hive
DataWorks Summit
 
YARN - Past, Present, & Future
YARN - Past, Present, & FutureYARN - Past, Present, & Future
YARN - Past, Present, & Future
DataWorks Summit
 
Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash Course
DataWorks Summit
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
DataWorks Summit
 
Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash Course
DataWorks Summit
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
DataWorks Summit
 
High throughput data replication over RAFT
High throughput data replication over RAFTHigh throughput data replication over RAFT
High throughput data replication over RAFT
DataWorks Summit
 
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and FutureHadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
Vinod Kumar Vavilapalli
 
Moving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloudMoving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloud
DataWorks Summit/Hadoop Summit
 
Lessons learned running a container cloud on YARN
Lessons learned running a container cloud on YARNLessons learned running a container cloud on YARN
Lessons learned running a container cloud on YARN
DataWorks Summit
 
Ad

Recently uploaded (20)

railway wheels, descaling after reheating and before forging
railway wheels, descaling after reheating and before forgingrailway wheels, descaling after reheating and before forging
railway wheels, descaling after reheating and before forging
Javad Kadkhodapour
 
ADVXAI IN MALWARE ANALYSIS FRAMEWORK: BALANCING EXPLAINABILITY WITH SECURITY
ADVXAI IN MALWARE ANALYSIS FRAMEWORK: BALANCING EXPLAINABILITY WITH SECURITYADVXAI IN MALWARE ANALYSIS FRAMEWORK: BALANCING EXPLAINABILITY WITH SECURITY
ADVXAI IN MALWARE ANALYSIS FRAMEWORK: BALANCING EXPLAINABILITY WITH SECURITY
ijscai
 
Engineering Chemistry First Year Fullerenes
Engineering Chemistry First Year FullerenesEngineering Chemistry First Year Fullerenes
Engineering Chemistry First Year Fullerenes
5g2jpd9sp4
 
five-year-soluhhhhhhhhhhhhhhhhhtions.pdf
five-year-soluhhhhhhhhhhhhhhhhhtions.pdffive-year-soluhhhhhhhhhhhhhhhhhtions.pdf
five-year-soluhhhhhhhhhhhhhhhhhtions.pdf
AdityaSharma944496
 
"Heaters in Power Plants: Types, Functions, and Performance Analysis"
"Heaters in Power Plants: Types, Functions, and Performance Analysis""Heaters in Power Plants: Types, Functions, and Performance Analysis"
"Heaters in Power Plants: Types, Functions, and Performance Analysis"
Infopitaara
 
Fort night presentation new0903 pdf.pdf.
Fort night presentation new0903 pdf.pdf.Fort night presentation new0903 pdf.pdf.
Fort night presentation new0903 pdf.pdf.
anuragmk56
 
comparison of motors.pptx 1. Motor Terminology.ppt
comparison of motors.pptx 1. Motor Terminology.pptcomparison of motors.pptx 1. Motor Terminology.ppt
comparison of motors.pptx 1. Motor Terminology.ppt
yadavmrr7
 
Mathematical foundation machine learning.pdf
Mathematical foundation machine learning.pdfMathematical foundation machine learning.pdf
Mathematical foundation machine learning.pdf
TalhaShahid49
 
introduction to machine learining for beginers
introduction to machine learining for beginersintroduction to machine learining for beginers
introduction to machine learining for beginers
JoydebSheet
 
Avnet Silica's PCIM 2025 Highlights Flyer
Avnet Silica's PCIM 2025 Highlights FlyerAvnet Silica's PCIM 2025 Highlights Flyer
Avnet Silica's PCIM 2025 Highlights Flyer
WillDavies22
 
aset and manufacturing optimization and connecting edge
aset and manufacturing optimization and connecting edgeaset and manufacturing optimization and connecting edge
aset and manufacturing optimization and connecting edge
alilamisse
 
π0.5: a Vision-Language-Action Model with Open-World Generalization
π0.5: a Vision-Language-Action Model with Open-World Generalizationπ0.5: a Vision-Language-Action Model with Open-World Generalization
π0.5: a Vision-Language-Action Model with Open-World Generalization
NABLAS株式会社
 
Level 1-Safety.pptx Presentation of Electrical Safety
Level 1-Safety.pptx Presentation of Electrical SafetyLevel 1-Safety.pptx Presentation of Electrical Safety
Level 1-Safety.pptx Presentation of Electrical Safety
JoseAlbertoCariasDel
 
Elevate Your Workflow
Elevate Your WorkflowElevate Your Workflow
Elevate Your Workflow
NickHuld
 
Lecture 13 (Air and Noise Pollution and their Control) (1).pptx
Lecture 13 (Air and Noise Pollution and their Control) (1).pptxLecture 13 (Air and Noise Pollution and their Control) (1).pptx
Lecture 13 (Air and Noise Pollution and their Control) (1).pptx
huzaifabilalshams
 
Development of MLR, ANN and ANFIS Models for Estimation of PCUs at Different ...
Development of MLR, ANN and ANFIS Models for Estimation of PCUs at Different ...Development of MLR, ANN and ANFIS Models for Estimation of PCUs at Different ...
Development of MLR, ANN and ANFIS Models for Estimation of PCUs at Different ...
Journal of Soft Computing in Civil Engineering
 
Taking AI Welfare Seriously, In this report, we argue that there is a realist...
Taking AI Welfare Seriously, In this report, we argue that there is a realist...Taking AI Welfare Seriously, In this report, we argue that there is a realist...
Taking AI Welfare Seriously, In this report, we argue that there is a realist...
MiguelMarques372250
 
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
inmishra17121973
 
Building Security Systems in Architecture.pdf
Building Security Systems in Architecture.pdfBuilding Security Systems in Architecture.pdf
Building Security Systems in Architecture.pdf
rabiaatif2
 
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdfMAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
ssuser562df4
 
railway wheels, descaling after reheating and before forging
railway wheels, descaling after reheating and before forgingrailway wheels, descaling after reheating and before forging
railway wheels, descaling after reheating and before forging
Javad Kadkhodapour
 
ADVXAI IN MALWARE ANALYSIS FRAMEWORK: BALANCING EXPLAINABILITY WITH SECURITY
ADVXAI IN MALWARE ANALYSIS FRAMEWORK: BALANCING EXPLAINABILITY WITH SECURITYADVXAI IN MALWARE ANALYSIS FRAMEWORK: BALANCING EXPLAINABILITY WITH SECURITY
ADVXAI IN MALWARE ANALYSIS FRAMEWORK: BALANCING EXPLAINABILITY WITH SECURITY
ijscai
 
Engineering Chemistry First Year Fullerenes
Engineering Chemistry First Year FullerenesEngineering Chemistry First Year Fullerenes
Engineering Chemistry First Year Fullerenes
5g2jpd9sp4
 
five-year-soluhhhhhhhhhhhhhhhhhtions.pdf
five-year-soluhhhhhhhhhhhhhhhhhtions.pdffive-year-soluhhhhhhhhhhhhhhhhhtions.pdf
five-year-soluhhhhhhhhhhhhhhhhhtions.pdf
AdityaSharma944496
 
"Heaters in Power Plants: Types, Functions, and Performance Analysis"
"Heaters in Power Plants: Types, Functions, and Performance Analysis""Heaters in Power Plants: Types, Functions, and Performance Analysis"
"Heaters in Power Plants: Types, Functions, and Performance Analysis"
Infopitaara
 
Fort night presentation new0903 pdf.pdf.
Fort night presentation new0903 pdf.pdf.Fort night presentation new0903 pdf.pdf.
Fort night presentation new0903 pdf.pdf.
anuragmk56
 
comparison of motors.pptx 1. Motor Terminology.ppt
comparison of motors.pptx 1. Motor Terminology.pptcomparison of motors.pptx 1. Motor Terminology.ppt
comparison of motors.pptx 1. Motor Terminology.ppt
yadavmrr7
 
Mathematical foundation machine learning.pdf
Mathematical foundation machine learning.pdfMathematical foundation machine learning.pdf
Mathematical foundation machine learning.pdf
TalhaShahid49
 
introduction to machine learining for beginers
introduction to machine learining for beginersintroduction to machine learining for beginers
introduction to machine learining for beginers
JoydebSheet
 
Avnet Silica's PCIM 2025 Highlights Flyer
Avnet Silica's PCIM 2025 Highlights FlyerAvnet Silica's PCIM 2025 Highlights Flyer
Avnet Silica's PCIM 2025 Highlights Flyer
WillDavies22
 
aset and manufacturing optimization and connecting edge
aset and manufacturing optimization and connecting edgeaset and manufacturing optimization and connecting edge
aset and manufacturing optimization and connecting edge
alilamisse
 
π0.5: a Vision-Language-Action Model with Open-World Generalization
π0.5: a Vision-Language-Action Model with Open-World Generalizationπ0.5: a Vision-Language-Action Model with Open-World Generalization
π0.5: a Vision-Language-Action Model with Open-World Generalization
NABLAS株式会社
 
Level 1-Safety.pptx Presentation of Electrical Safety
Level 1-Safety.pptx Presentation of Electrical SafetyLevel 1-Safety.pptx Presentation of Electrical Safety
Level 1-Safety.pptx Presentation of Electrical Safety
JoseAlbertoCariasDel
 
Elevate Your Workflow
Elevate Your WorkflowElevate Your Workflow
Elevate Your Workflow
NickHuld
 
Lecture 13 (Air and Noise Pollution and their Control) (1).pptx
Lecture 13 (Air and Noise Pollution and their Control) (1).pptxLecture 13 (Air and Noise Pollution and their Control) (1).pptx
Lecture 13 (Air and Noise Pollution and their Control) (1).pptx
huzaifabilalshams
 
Taking AI Welfare Seriously, In this report, we argue that there is a realist...
Taking AI Welfare Seriously, In this report, we argue that there is a realist...Taking AI Welfare Seriously, In this report, we argue that there is a realist...
Taking AI Welfare Seriously, In this report, we argue that there is a realist...
MiguelMarques372250
 
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
inmishra17121973
 
Building Security Systems in Architecture.pdf
Building Security Systems in Architecture.pdfBuilding Security Systems in Architecture.pdf
Building Security Systems in Architecture.pdf
rabiaatif2
 
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdfMAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
ssuser562df4
 
Ad

Apache Hadoop 3 updates with migration story

  • 1. 1 © Hortonworks Inc. 2011–2018. All rights reserved Apache Hadoop 3 Insights & Migrating your clusters from Hadoop 2 to Hadoop 3 Sunil Govindan Rohith Sharma K S
  • 2. 2 © Hortonworks Inc. 2011–2018. All rights reserved Speakers Sunil Govindan • Apache Hadoop PMC • Contributing to YARN Scheduler improvements, Integrating TensorFlow to YARN etc • Staff Engineer @ Hortonworks YARN Engineering Team Rohith Sharma K S • Apache Hadoop PMC • Contributing Application Timeline Service v2 and Native Services • Sr. Engineer @ Hortonworks YARN Engineering Team
  • 3. 3 © Hortonworks Inc. 2011–2018. All rights reserved 0 20 40 60 80 100 120 aw ajisakaa jzhuge rohithsharma cheersyang Naganarasimha jojochuang jingzhao chris.douglas Cyl HuafengWang sidharta-s yzhangal raviprak nandakumar131 [email protected] [email protected] busbey vishwajeet.dusane Jim_Brennan ayousufi hrsharma belugabehr l201514 v123582 gphillips iveselovsky esmanii HongfeiChen ekundin daemon zhaoyunjiong huanbang1993 ASikaria jmaron zhz LarryLo [email protected] granthenke liuhongtong mpercy viji_r nemon masatana olegd Jungyoo ameetz krash poliva clamb [email protected] aoe rcatherinot call-fold hsutherland trtrmitya imenache kellyzly Deepti.Sawhney wuweiwei shihaoliang Frankola jalberti zhengxg3 bleuleon ssonker Total Total (a) 456 Contributors (b) 46 with > 25 patches (c) Long Tail Our Community : Contributors 2.8.3 -> 3.1.0
  • 4. 4 © Hortonworks Inc. 2011–2018. All rights reserved • Introduction • HDFS Improvements • YARN State of Union • Migration Story from Hadoop 2 clusters to Hadoop 3 Agenda
  • 5. 5 © Hortonworks Inc. 2011–2018. All rights reserved A brief timeline from past year: GA Releases 2.8.0 2.9.0 3.0.0 3.1.0 • GPU/FPGA • YARN Native Service • Placement Constraints • YARN Federation • Opportunistic Container (Backported from 3.0) • New YARN UI • Timeline V2 • Global Scheduling • Multiple Resource types • New YARN UI • Timeline service V2 • Erasure Coding Ever involving requirements (computation intensive, larger, services) • Application Priority • Reservations • Node labels improvements 22 March ’17 17 Nov ’17 13 Dec ‘17 06 April ‘182.8.4 2.9.1 3.0.3 3.1.0
  • 6. 6 © Hortonworks Inc. 2011–2018. All rights reserved Apache Hadoop 3.0/3.1
  • 7. 7 © Hortonworks Inc. 2011–2018. All rights reserved • Motivation: improve storage efficiency of HDFS • the storage efficiency compared to 3x replication • Reduction of overhead from 200% to 40% • Uses Reed-Solomon(k,m) erasure codes instead of replication • Support for multiple erasure coding policies • RS(3,2), RS(6,3), RS(10,4) • Can improves data durability • RS(6,3) can tolerate 3 failures • RS(10,4) can tolerate 4 failures • Missing blocks reconstructed from remaining blocks HDFS Features : Erasure Coding
  • 8. 8 © Hortonworks Inc. 2011–2018. All rights reserved • Shell script rewrite • Support for multiple Standby NameNodes • Intra-DataNode balancer • Support for Microsoft Azure Data Lake and Aliyun OSS • Move default ports out of the ephemeral range • S3 consistency and performance improvements (ongoing) • Tightening the Hadoop compatibility policy (ongoing) HDFS Features : Miscellaneous
  • 9. 9 © Hortonworks Inc. 2011–2018. All rights reserved YARN : Key Themes Scale Platform Themes Workload Themes Scheduling Usability Containers Resources Services
  • 10. 10 © Hortonworks Inc. 2011–2018. All rights reserved Key Themes Scale Platform Themes Workload Themes Scheduling Usability Containers Resources Services
  • 11. 11 © Hortonworks Inc. 2011–2018. All rights reserved • Tons of sites with clusters made up of large amount of nodes • Oath (Yahoo!), Twitter, LinkedIn, Microsoft, Alibaba etc. • Now: 40K nodes (federated), 20K nodes (single cluster). • Roadmap: To 100K and beyond Looking at the Scale!
  • 12. 12 © Hortonworks Inc. 2011–2018. All rights reserved Key Themes Scale Platform Themes Workload Themes Scheduling Usability Containers Resources Services
  • 13. 13 © Hortonworks Inc. 2011–2018. All rights reserved Moving towards Global & Fast Scheduling Scheduler state Placement Committer • Problems • Current design of one-node-at-a-time allocation cycle can lead to suboptimal decisions. • With this, we improved to • Look at several nodes at a time • YARN scheduler can allocate 3k+ containers per second ≈ 10 mil allocations / hour! • Much better placement decisions
  • 14. 14 © Hortonworks Inc. 2011–2018. All rights reserved Better placement strategies (YARN-6592) • Past • Supported constraints in form of Node Locality • Now YARN can support a lot more use cases • Co-locate the allocations of a job on the same rack (affinity) • Spread allocations across machines (anti-affinity) to minimize resource interference • Allow up to a specific number of allocations in a node group (cardinality)
  • 15. 15 © Hortonworks Inc. 2011–2018. All rights reserved Addition Scheduling Improvements • Absolute Resources Configuration in CS – YARN-5881 • Auto Creation of Leaf Queues - YARN-7117 • Application Timeout – YARN-3813 • Reservations in YARN
  • 16. 16 © Hortonworks Inc. 2011–2018. All rights reserved Key Themes Scale Platform Themes Workload Themes Scheduling Usability Containers Resources Services
  • 17. 17 © Hortonworks Inc. 2011–2018. All rights reserved Usability: Queue & Logs API based queue management Decentralized (YARN-5734) Improved logs management (YARN-4904) Live application logs
  • 18. 18 © Hortonworks Inc. 2011–2018. All rights reserved Usability: UI
  • 19. 19 © Hortonworks Inc. 2011–2018. All rights reserved Timeline Service 2.0 • Understanding and Monitoring a Hadoop cluster itself is a BigData problem • Using HBase as backend for better scalability for read/write • More robust storage fault tolerance • Migration and compatibility with v.1.5
  • 20. 20 © Hortonworks Inc. 2011–2018. All rights reserved Key Themes Scale Platform Themes Workload Themes Scheduling Usability Containers Resources Services
  • 21. 21 © Hortonworks Inc. 2011–2018. All rights reserved Key Themes Scale Platform Themes Workload Themes Scheduling Usability Containers Resources Services
  • 22. 22 © Hortonworks Inc. 2011–2018. All rights reserved • Run both with and without docker on the same cluster • Choose at run-time! Containers
  • 23. 23 © Hortonworks Inc. 2011–2018. All rights reserved • YARN – Big Data apps, moving to generic apps with containerization • K8S – industry standard orchestration layer for generic apps • We have done YARN on YARN! Next slide • YARN on K8S? K8S on YARN? Run them side-by-side? • What does containerized BigData mean? • Lift and Shift? • Break up every service? What about running all of Big Data containerized?
  • 24. 24 © Hortonworks Inc. 2011–2018. All rights reserved Ycloud: YARN Based Container Cloud  Testing Hadoop on Hadoop!
  • 25. 25 © Hortonworks Inc. 2011–2018. All rights reserved Key Themes Scale Platform Themes Workload Themes Scheduling Usability Containers Resources Services
  • 26. 26 © Hortonworks Inc. 2011–2018. All rights reserved • YARN supported only Memory and CPU • Now • A generalized vector for all resources • Admin could add arbitrary resource types! Resource profiles and custom resource types • Ease of resource requesting model using profiles for apps Profile Memory CPU GPU Small 2 GB 4 Cores 0 Cores Medium 4 GB 8 Cores 0 Cores Large 16 GB 16 Cores 4 CoresMemory CPU GPU FPGA Node Manager
  • 27. 27 © Hortonworks Inc. 2011–2018. All rights reserved • Why? • No need to setup separate clusters • Leverage shared compute! • Why need isolation? • Multiple processes use the single GPU will be: • Serialized. • Cause OOM easily. • GPU isolation on YARN: • Granularity is for per-GPU device. • Use cgroups / docker to enforce isolation. GPU support on YARN Tensorflow 1.2 Nginx AppUbuntu 14:04 Nginx AppHost OS GPU Base Lib v1 Volume Mount CUDA Library 5.0
  • 28. 28 © Hortonworks Inc. 2011–2018. All rights reserved • FPGA isolation on YARN: . • Granularity is for per-FPGA device. • Use Cgroups to enforce the isolation. • Currently, only Intel OpenCL SDK for FPGA is supported. But implementation is extensible to other FPGA SDK. FPGA on YARN
  • 29. 29 © Hortonworks Inc. 2011–2018. All rights reserved Key Themes Scale Platform Themes Workload Themes Scheduling Usability Containers Resources Services
  • 30. 30 © Hortonworks Inc. 2011–2018. All rights reserved • A native YARN services framework • YARN-4692 • [Umbrella] Native YARN framework layer for services and beyond • Apache Slider retired from Incubator – lessons and key code carried over to YARN • Simplified discovery of services via DNS mechanisms: YARN-4757 • regionserver-0.hbase-app-3.hadoop.yarn.site • Application & Services upgrades • “Do an upgrade of my HBase app with minimal impact to end-users” • YARN-4726 Services support in YARN
  • 31. 31 © Hortonworks Inc. 2011–2018. All rights reserved How to run a new service in YARN ?
  • 32. 32 © Hortonworks Inc. 2011–2018. All rights reserved Apache Hadoop 3.2 and beyond
  • 33. 33 © Hortonworks Inc. 2011–2018. All rights reserved • “Take me to a node with JDK 10” • Node Partition vs. Node Attribute • Partition: • One partition for one node • ACL • Shares between queues • Preemption enforced. • Attribute: • For container placement • No ACL/Shares on attributes • First-come-first-serve Node Attributes (YARN-3409)
  • 34. 34 © Hortonworks Inc. 2011–2018. All rights reserved • Every user says “Give me 16GB for my task”, even though it’s only needed at peak • Each node has some allocated but unutilized capacity. Use such capacity to run opportunistic tasks • Preempt such tasks when needed Container overcommit (YARN-1011)
  • 35. 35 © Hortonworks Inc. 2011–2018. All rights reserved • “Start this service when YARN starts” • “initd for YARN” • System services is services required by YARN, need to be started during bootstrap. • For example YARN ATSv2 needs Hbase, so Hbase is system service of YARN. • Only Admin can configure • Started along with ResourceManager • Place spec files under yarn.service.system- service.dir FS path Auto-spawning of system services (YARN-8048)
  • 36. 36 © Hortonworks Inc. 2011–2018. All rights reserved TensorFlow on YARN (YARN-8220) • Run deep learning workloads on the same cluster as analytics, stream processing etc! • Integrated with latest TensorFlow 1.8 and has GPU support • Use simple command to run TensorFlow app by using Native Service spec file (Yarnfile) yarn app -launch distributed-tf <path-to-saved-yarnfile> • A simple python command line utility also could be used to auto-create Yarnfile python submit_tf_job.py --remote_conf_path hdfs:///tf-job-conf --input_spec example_tf_job_spec.json --docker_image gpu.cuda_9.0.tf_1.8.0 --job_name distributed-tf-gpu --user tf-user --domain tensorflow.site --distributed --kerberos
  • 37. 37 © Hortonworks Inc. 2011–2018. All rights reserved TensorFlow on YARN (YARN-8220) Sample Yarnfile for TensorFlow job
  • 38. 38 © Hortonworks Inc. 2011–2018. All rights reserved Why upgrade to Apache Hadoop 3.x?
  • 39. 39 © Hortonworks Inc. 2011–2018. All rights reserved Major release with lot of features and improvements! Motivation • Federation GA • Erasure Coding • Significant cost savings in storage • Reduction of overhead from 200% to 50% • Intra-DataNode Disk Balancer HDFS • Scheduler Improvements • New Resource types - GPUs, FPGAs • Fast and Global scheduling • Containerization - Docker • Long running Services rehash • New UI2 • Timeline Server v2 YARN
  • 40. 40 © Hortonworks Inc. 2011–2018. All rights reserved Hadoop-3 Container Runtimes (Docker / Linux / Default) Platform Services Storage Service Discovery Holiday Web App HBase HTTP MR Tez Hive / Pig Hive on LLAPSpark Resource Management Deep Learning App On-Premises Cloud
  • 41. 41 © Hortonworks Inc. 2011–2018. All rights reserved Things to consider before upgrade
  • 42. 42 © Hortonworks Inc. 2011–2018. All rights reserved Upgrades involve many things • Upgrade mechanism • Recommendation for 3.x - Express or Rolling ? • Compatibility • Source & Target versions • Tooling • Cluster Environment • Configuration changes • Script changes • Classpath changes
  • 43. 43 © Hortonworks Inc. 2011–2018. All rights reserved Upgrade mechanism: Express/Rolling Upgrades • “Stop the world” Upgrades • Cluster downtime • Less stringent prerequisites • Process • Upgrade masters and workers in one shot Express Upgrades • Preserve cluster operation • Minimizes Service impact and downtime • Can take longer to complete • Process • Upgrades masters and workers in batches Rolling Upgrades
  • 44. 44 © Hortonworks Inc. 2011–2018. All rights reserved Compatibility • Wire compatibility o Preserves compatibility with Hadoop 2 clients o Distcp/WebHDFS compatibility preserved • API compatibility Not fully! o Dependency version bumps o Removal of deprecated APIs and tools o Shell script rewrite, rework of Hadoop tools scripts o Incompatible bug fixes!
  • 45. 45 © Hortonworks Inc. 2011–2018. All rights reserved Source & Target versions ● Upgrades Tested with • Why 2.8.4 release? ● Most of production deployments are close to 2.8.x ● What should users of 2.6.x and 2.7.x do? ● Recommend upgrading at least to Hadoop 2.8.4 before migrating to Hadoop 3! Hadoop 2 Base version Hadoop 3 Base version Apache Hadoop 2.8.4 Apache Hadoop 3.1.x
  • 46. 46 © Hortonworks Inc. 2011–2018. All rights reserved Tooling ● Fresh Install ● Fully automated via Apache Ambari ● Manual installation of RPMs/Tar balls ● Upgrade ● Fully automated via Apache Ambari 2.7 ● Manual upgrade
  • 47. 47 © Hortonworks Inc. 2011–2018. All rights reserved Cluster Environment • >= Java 8 • Java 7 EOL in April 2015 • Lot of libraries support only Java 8 Java • >= Bash V3 • POSIX shell NOT supported Shell • If you want to use containerized apps in 3.x • >= 1.12.5 • Also corresponding stable OS Docker
  • 48. 48 © Hortonworks Inc. 2011–2018. All rights reserved Configuration changes: Hadoop Env files • Common placeholder • Precedence rule • yarn/hdfs-env.sh > hadoop-env.sh > hard-coded defaults hadoop-env.sh • HDFS_* replaces HADOOP_* • Precedence rule • hdfs-env.sh > hadoop-env.sh > hard-coded defaults hdfs-env.sh • YARN_* replaces HADOOP_* • Precedence rule • yarn-env.sh > hadoop-env.sh > hard-coded defaults yarn-env.sh
  • 49. 49 © Hortonworks Inc. 2011–2018. All rights reserved Configuration changes: Hadoop env files (contd.) Daemon heap size (HADOOP-10950) • Deprecated: HADOOP_HEAPSIZE • Replaced with HADOOP_HEAPSIZE_MAX and HADOOP_HEAPSIZE_MIN • Units supported in heap size; default unit is MB • Ex: HADOOP_HEAPSIZE_MAX=4096 • Ex: HADOOP_HEAPSIZE_MAX=4g • Auto-tuning based on the memory size of the host
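  In hadoop-env.sh this could look like the following (values are illustrative):

      export HADOOP_HEAPSIZE_MAX=4g     # upper bound; plain numbers are treated as MB
      export HADOOP_HEAPSIZE_MIN=1g     # lower bound
      # left unset, the heap is auto-tuned from the host's memory size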
  • 50. 50 © Hortonworks Inc. 2011–2018. All rights reserved Configuration changes: YARN modified defaults • RM max completed applications in state store/memory • yarn.resourcemanager.max-completed-applications: previously 10000, now 1000 • yarn.resourcemanager.state-store.max-completed-applications: previously 10000, now 1000
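  If you depend on the longer completed-application history from Hadoop 2, the old defaults can be restored in yarn-site.xml, using the property names above (a sketch):

      <property>
        <name>yarn.resourcemanager.max-completed-applications</name>
        <value>10000</value>
      </property>
      <property>
        <name>yarn.resourcemanager.state-store.max-completed-applications</name>
        <value>10000</value>
      </property>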
  • 51. 51 © Hortonworks Inc. 2011–2018. All rights reserved Configuration changes: HDFS Change in default daemon ports (HDFS-9427) • NameNode: 50470 → 9871 (https), 50070 → 9870 (http) • DataNode: 50020 → 9867 (ipc), 50010 → 9866 (data transfer), 50475 → 9865 (https), 50075 → 9864 (http) • Secondary NameNode: 50091 → 9869 (https), 50090 → 9868 (http) • KMS: 16000 → 9600
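  If downstream tooling still hard-codes the pre-3.x ports, the addresses can be pinned back explicitly in hdfs-site.xml; a sketch for the web UIs (standard HDFS property names; adjust hosts to your topology):

      <property>
        <name>dfs.namenode.http-address</name>
        <value>0.0.0.0:50070</value>   <!-- Hadoop 2 default; 3.x defaults to 9870 -->
      </property>
      <property>
        <name>dfs.datanode.http.address</name>
        <value>0.0.0.0:50075</value>   <!-- Hadoop 2 default; 3.x defaults to 9864 -->
      </property>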
  • 52. 52 © Hortonworks Inc. 2011–2018. All rights reserved Script changes: starting/stopping Hadoop daemons Daemon scripts • *-daemon.sh deprecated • Use the bin/hdfs or bin/yarn commands with the --daemon option • Ex: bin/hdfs --daemon start/stop/status namenode • Ex: bin/yarn --daemon start/stop/status resourcemanager Debuggability • Scripts support --debug, which shows the construction of the environment, Java options, and classpath Logs/Pids • Created as hadoop-yarn* instead of yarn-yarn* • Log4j settings in the *-daemon.sh scripts have been removed; instead, set them via *_OPTS in *-env.sh • Ex: YARN_RESOURCEMANAGER_OPTS in yarn-env.sh (see the sketch below)
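  A side-by-side sketch of the old and new invocations (paths and the log4j setting are illustrative):

      sbin/hadoop-daemon.sh --config /etc/hadoop/conf start namenode   # deprecated; prints WARNINGs
      bin/hdfs --daemon start namenode                                 # 3.x replacement
      bin/yarn --daemon status resourcemanager                         # start/stop/status all work

      # yarn-env.sh: daemon log4j/JVM settings now live here
      export YARN_RESOURCEMANAGER_OPTS="-Dyarn.server.resourcemanager.appsummary.logger=INFO,RMSUMMARY"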
  • 53. 53 © Hortonworks Inc. 2011–2018. All rights reserved Classpath changes Classpath isolation now! Users should rebuild their applications with the shaded hadoop-client jars ● Hadoop dependencies leaked into applications' classpaths: Guava, protobuf, jackson, jetty... ● Shaded jars available; they isolate downstream clients from any third-party dependencies (HADOOP-11804) ○ hadoop-client-api for compile-time dependencies ○ hadoop-client-runtime for runtime third-party dependencies ○ hadoop-client-minicluster for test-scope dependencies ● HDFS-6200: the hadoop-hdfs jar contained both the HDFS server and the HDFS client ○ Clients should depend on hadoop-hdfs-client instead, to isolate themselves from server-side dependencies ● No YARN/MR shaded jars
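  Illustrative Maven coordinates for the shaded clients (pick the 3.x version you are actually deploying):

      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client-api</artifactId>
        <version>3.1.0</version>
      </dependency>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client-runtime</artifactId>
        <version>3.1.0</version>
        <scope>runtime</scope>
      </dependency>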
  • 54. 54 © Hortonworks Inc. 2011–2018. All rights reserved Upgrade process
  • 55. 55 © Hortonworks Inc. 2011–2018. All rights reserved Hadoop pre-upgrade steps YARN • Stop all YARN queues • Stop running applications, or wait for them to complete • NOTE: YARN supports rolling upgrade by itself, but upgrading HDFS + YARN together gets problematic HDFS • Run fsck and fix any errors • hdfs fsck / -files -blocks -locations > dfs-old-fsck-1.log • Checkpoint metadata • hdfs dfsadmin -safemode enter • hdfs dfsadmin -saveNamespace • Back up checkpoint files • ${dfs.namenode.name.dir}/current • Get cluster DataNode reports • hdfs dfsadmin -report > dfs-old-report-1.log • Capture the namespace • hdfs dfs -ls -R / > dfs-old-lsr-1.log • Finalize any previous upgrade • hdfs dfsadmin -finalizeUpgrade STACK • Back up configuration files • Stop users/services using YARN/HDFS • Back up other metadata: Hive Metastore, Oozie, etc. (the HDFS commands are consolidated in the sketch below)
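  Consolidated as a runnable sequence (illustrative; run as the HDFS superuser, and keep the output files for post-upgrade comparison):

      hdfs fsck / -files -blocks -locations > dfs-old-fsck-1.log   # fix any reported errors first
      hdfs dfsadmin -report > dfs-old-report-1.log                 # DataNode inventory
      hdfs dfs -ls -R / > dfs-old-lsr-1.log                        # capture the full namespace
      hdfs dfsadmin -safemode enter
      hdfs dfsadmin -saveNamespace                                 # checkpoint metadata
      # back up ${dfs.namenode.name.dir}/current before continuing
      hdfs dfsadmin -finalizeUpgrade                               # finalize any prior upgrade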
  • 56. 56 © Hortonworks Inc. 2011–2018. All rights reserved Upgrade steps Stop services → Install new packages → Link to new versions → Apply configuration updates → Start services, with additional HDFS upgrade steps (see https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.3/bk_command-line-upgrade/content/start-hadoop-core-25.html)
  • 57. 57 © Hortonworks Inc. 2011–2018. All rights reserved Upgrade validation HDFS • Run HDFS service checks • Verify the NameNode gets out of safe mode: hdfs dfsadmin -safemode wait • FileSystem health • Compare with the previous state: node list, full namespace • Let the cluster run production workloads for a while • When ready to discard the backup, finalize the HDFS upgrade: hdfs dfsadmin -upgrade finalize/query YARN • Run YARN service checks • Submit test applications: MR, Tez, ... (a sketch follows below)
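  A sketch of the comparison and smoke tests (illustrative; file names mirror the pre-upgrade captures, and the examples jar path varies by install):

      hdfs dfsadmin -safemode wait                        # block until the NN leaves safe mode
      hdfs dfsadmin -report > dfs-new-report-1.log        # diff against dfs-old-report-1.log
      hdfs dfs -ls -R / > dfs-new-lsr-1.log               # diff against dfs-old-lsr-1.log
      yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 4 100   # MR smoke test
      hdfs dfsadmin -upgrade query                        # check finalization status
      hdfs dfsadmin -upgrade finalize                     # only once you're ready to drop the backup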
  • 58. 58 © Hortonworks Inc. 2011–2018. All rights reserved Enable new features • Erasure Coding • https://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html • YARN UI2 • https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YarnUI2.html • ATSv2 • New daemon: Timeline Reader • https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html • YARN DNS • Service discovery for YARN services • https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/RegistryDNS.html • HDFS Federation • https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/Federation.html
  • 59. 59 © Hortonworks Inc. 2011–2018. All rights reserved Other Aspects
  • 60. 60 © Hortonworks Inc. 2011–2018. All rights reserved Other aspects Validations in progress ● Performance testing ● Scale testing for HDFS/YARN ● OS compatibility ● Workload migration ● MapReduce ● Hive ● Pig ● Spark ● Slider
  • 61. 61 © Hortonworks Inc. 2011–2018. All rights reserved Summary • Hadoop 3 • Eagerly awaited release with lots of new features and optimizations! • 3.1.1 will be released soon with bug fixes identified since 3.1.0 • Express Upgrades are recommended • Admins: a bit of work • Users: should work mostly as-is • Community effort • HADOOP-15501: Upgrade efforts to Hadoop 3.x • Wiki: https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+2.x+to+3.x+Upgrade+Efforts • Volunteers needed for validating workload upgrades on Hadoop 3!
  • 62. 62 © Hortonworks Inc. 2011–2018. All rights reserved Questions?
  • 63. 63 © Hortonworks Inc. 2011–2018. All rights reserved Thank you

Editor's Notes

  • #40: YARN: scheduler improvements significantly improve cluster throughput; distributed scheduling; fine-grained scheduling according to resource types (GPUs, FPGAs); support for long-running services and Docker; revamped UI; ATSv2 (more scalable, based on HBase). HDFS: HDFS Federation; intra-DataNode disk balancer; erasure coding, with significant cost savings in storage and reduction of overhead from 200% to 50%.
  • #45: Wire compatibility Preserves compatibility with Hadoop 2 clients Distcp/WebHDFS compatibility preserved API compatibility Not fully! Dependency version bumps Removal of deprecated APIs and tools Shell script rewrite, rework of Hadoop tools scripts Incompatible bug fixes
  • #46: Upgrade has been tested/validated from Apache Hadoop 2.8.4 to Hadoop 3.1.0 in our test environments Ongoing effort in community to release Hadoop 3.1.1 with a lot of fixes. Recommend upgrading lower Hadoop 2 versions to at least Hadoop 2.8.4 before migrating to Hadoop 3
  • #47: Fresh Install Fully automated via Apache Ambari Manual installation of RPMs/Tarballs Upgrade Fully automated via Apache Ambari Manual upgrade
  • #50: Example of the deprecation warnings printed when the old daemon scripts are used on 3.x (chat markers stripped; repeated variable-rename warnings elided):

      $ export HADOOP_LIBEXEC_DIR=/usr/apache/hadoop/libexec; /usr/apache/hadoop/sbin/yarn-daemon.sh --config /usr/apache/conf start nodemanager
      WARNING: YARN_CONF_DIR has been replaced by HADOOP_CONF_DIR. Using value of YARN_CONF_DIR.
      WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR.
      WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE.
      WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR.
      WARNING: YARN_ROOT_LOGGER has been replaced by HADOOP_ROOT_LOGGER. Using value of YARN_ROOT_LOGGER.
      WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
      WARNING: Use of this script to start YARN daemons is deprecated.
      WARNING: Attempting to execute replacement "yarn --daemon start" instead.
      [...]

      $ export HADOOP_LIBEXEC_DIR=/usr/apache/hadoop/libexec; /usr/apache/hadoop/sbin/hadoop-daemon.sh --config /usr/apache/conf start datanode
      WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
      WARNING: Use of this script to start HDFS daemons is deprecated.
      WARNING: Attempting to execute replacement "hdfs --daemon start" instead.
      WARNING: HADOOP_SECURE_DN_PID_DIR has been replaced by HADOOP_SECURE_PID_DIR. Using value of HADOOP_SECURE_DN_PID_DIR.
      WARNING: HADOOP_SECURE_DN_LOG_DIR has been replaced by HADOOP_SECURE_LOG_DIR. Using value of HADOOP_SECURE_DN_LOG_DIR.
      WARNING: HADOOP_DATANODE_OPTS has been replaced by HDFS_DATANODE_OPTS. Using value of HADOOP_DATANODE_OPTS.
      ERROR: You must be a privileged user in order to run a secure service.
      [...]
      WARNING: HADOOP_NAMENODE_OPTS has been replaced by HDFS_NAMENODE_OPTS. Using value of HADOOP_NAMENODE_OPTS.
  • #56: HDFS Backup Configuration files Create a list of all the DataNodes in the cluster. hdfs dfsadmin -report > dfs-old-report-1.log Save Namespace hdfs dfsadmin -safemode enter hdfs dfsadmin -saveNamespace Backup the checkpoint files located in ${dfs.namenode.name.dir}/current Finalize any prior HDFS upgrade hdfs dfsadmin -finalizeUpgrade Create a fsimage for rollback hdfs dfsadmin -rollingUpgrade prepare YARN Stop all YARN queues Stop/Wait for Running applications to finish NOTE: YARN supports rolling upgrade!