1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache Hadoop 3.0: What’s new in YARN & MapReduce
Tokyo, Oct.26 2016
Junping Du
junping_du@apache.org
About the Speaker
⬢ Junping Du
– Apache Hadoop Committer & PMC member
– Lead Software Engineer @ Hortonworks YARN Core Team
– 10+ years developing enterprise software (5+ years as a “Hadooper”)
Agenda
⬢ Evolutions in YARN & MR (Done and In Progress)
⬢ Timeline Estimation for Apache Hadoop 3.0 Release
First, A bit of Vision…
⬢ The evolution of Hadoop starts with YARN
⬢ YARN Evolution will continue to drive Hadoop forward
Hadoop 3
Several important trends in the age of Hadoop 3.0+
⬢ YARN and other platform services
– Storage, Resource Management, Security, Service Discovery, Management, Monitoring, Alerts, Governance
⬢ IOT Assembly: Kafka, Storm, HBase, Solr
⬢ MR, Tez, Spark, …
⬢ Innovating frameworks: Flink, DL (TensorFlow), etc.
⬢ Various environments
– On Premise, Private Cloud, Public Cloud
Evolutions in YARN & MR
⬢ Re-architecture for YARN Timeline Service - ATS v2
⬢ Service Native Support in YARN
⬢ YARN Scheduling Enhancements
⬢ More Cloud Friendly
⬢ Better User Experiences
⬢ Other Enhancements
Timeline Service Revolution – ATS v2
⬢ Why ATS v2?
– Scalability & Performance
To get rid of v1 limitations:
•Single global instance of writer/reader
•Local-disk-based LevelDB storage
– Usability
•Handle flows as first-class concepts and model aggregation
•Add configuration and metrics as first-class members
•Better support for queries
– Reliability
v1 limitations:
•Data is stored on a local disk
•Single point of failure (SPOF) for the timeline server
– Flexibility
•Data model is more descriptive
•Extensible to carry more app-specific info
Core Design for ATS v2
⬢ Distributed write path
– Logical per-app collector + physical per-node writer
– Collector/Writer launched as an auxiliary service in NM.
– Standalone writers will be added later.
⬢ Pluggable backend storage
– Built in with a scalable and reliable implementation (HBase)
⬢ Enhanced data model
– Entity (bi-directional relation) with flow, queue, etc.
– Configuration, Metric, Event, etc.
⬢ Separate reader instances
⬢ Aggregation & Accumulation
– Aggregation: rolling up the metric values to the parent
•Online aggregation for apps and flow runs
•Offline aggregation for users, flows and queues
– Accumulation: rolling up the metric values across time intervals
•Accumulated resource consumption for app, flow, etc.
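The two roll-ups above can be sketched in a few lines. This is only an illustration of the idea, not the ATS v2 implementation: the function names, the dict layout and the sample metrics are assumptions made up for the example.

```python
from collections import defaultdict


def aggregate_to_parent(child_metrics):
    """Aggregation: sum each metric over all children (e.g. apps -> flow run)."""
    totals = defaultdict(int)
    for metrics in child_metrics.values():
        for name, value in metrics.items():
            totals[name] += value
    return dict(totals)


def accumulate_over_time(samples):
    """Accumulation: integrate (timestamp, value) samples across time intervals.

    Assumption: each value holds from its timestamp until the next sample.
    """
    total = 0
    for (t0, v0), (t1, _) in zip(samples, samples[1:]):
        total += v0 * (t1 - t0)
    return total


# Hypothetical per-app metrics belonging to one flow run.
apps = {
    "app_1": {"memory_mb": 2048, "vcores": 2},
    "app_2": {"memory_mb": 1024, "vcores": 1},
}
flow_run_totals = aggregate_to_parent(apps)

# Hypothetical memory samples (seconds, MB) for app_1.
memory_mb_seconds = accumulate_over_time([(0, 2048), (10, 4096), (30, 0)])
```

Aggregation combines values across entities at one point in time, while accumulation combines one entity's values across time, which is why the two are kept as separate functions here.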
ATS v2 Architecture
[Diagram: The Resource Manager (RMApp) keeps the info of collectors {app_1, app_2, …} and syncs it with the NodeManagers NM_1 … NM_n. On each NM, the NM Collector Service runs as an aux service hosting the per-app collectors (app_1 Collector … app_n Collector) and a Timeline Writer. The app_1 AM sends AM timeline info, the Container Monitor sends container metric info (container info to be added), and RM app events go through a Timeline Writer; all writes land in HBase. A Timeline Reader serves user queries.]
Data Model in ATS v2
⬢ Entity
– ID + Type
– Configurations
– Metadata (Info)
– Parent-Child Relationships
– Metrics
– Events
⬢ Metric
– ID
– Metadata
– Single value or time series (with timestamps)
⬢ Event
– ID
– Metadata
– Timestamp
⬢ Entities that are first-class citizens
– Cluster: Type, Cluster Attributes
– Flow: Type, User, Flow Runs, Flow Attributes
– Flow Run: Type, User, Running apps, Flow Run Attributes
– Application: Type, User, Flow + Run, Queue, Attempts
– Attempt: Type, Application, Queue, Containers
– Container: Type, Attempt, Attributes
⬢ Aggregation
– User: Username (ID), Aggregated metrics
– Queue: Queue (ID), Sub queues, Aggregated metrics
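The data model above can be pictured as three small record types. The sketch below is a rough illustration only: the class and field names mirror the slide, but the Python types, the `add_child` helper and the `"APPLICATION"` / `"ATTEMPT"` type strings are assumptions, not the actual ATS v2 classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Metric:
    id: str
    metadata: Dict[str, str] = field(default_factory=dict)
    # Either a single (timestamp, value) pair or a time series of them.
    values: List[Tuple[int, float]] = field(default_factory=list)

    def is_time_series(self) -> bool:
        return len(self.values) > 1


@dataclass
class Event:
    id: str
    timestamp: int
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Entity:
    id: str
    type: str
    configurations: Dict[str, str] = field(default_factory=dict)
    info: Dict[str, str] = field(default_factory=dict)
    metrics: List[Metric] = field(default_factory=list)
    events: List[Event] = field(default_factory=list)
    parent: Optional["Entity"] = None
    children: List["Entity"] = field(default_factory=list)

    def add_child(self, child: "Entity") -> None:
        """Record the parent-child relation in both directions."""
        child.parent = self
        self.children.append(child)


# Hypothetical application with one attempt and a memory time series.
app = Entity(id="app_1", type="APPLICATION")
attempt = Entity(id="attempt_1", type="ATTEMPT")
app.add_child(attempt)
attempt.metrics.append(Metric(id="memory_mb", values=[(0, 2048.0), (10, 4096.0)]))
attempt.events.append(Event(id="STARTED", timestamp=0))
```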
Status for ATS v2
⬢ For other details, such as:
– Aggregations (app/flow/user/queue level, offline or online)
– HBase table schema for EntityTable, ApplicationTable, FlowRunTable, etc.
– Reader APIs (RESTful)
please refer to the earlier talk at Hadoop Summit 2016 San Jose:
https://www.youtube.com/watch?v=adV-DFa-8us&index=6&list=PLKnYDs_-dq16K1NH83Bke2dGGUO3YKZ5b
⬢ Status
–Phase I (YARN-2928): already released as an alpha feature in 3.0.0-alpha1
–Phase II (YARN-5355): In progress
Native Service Support in YARN
⬢ A native YARN framework: YARN-4692
– Abstracts a common framework (similar to Slider) to support long-running services
– A simpler API
⬢ Better support for long-running services
– Recognition of long-running services
• Affects the policy for preemption, container reservation, etc.
– Auto-restart of containers
• Containers in long-running services are more stateful
– Service/application upgrade support
• More services are expected to run long enough to span versions
– Dynamic container configuration
• Reserve resources only while they are needed
API Simplification - REST
⬢ Existing APIs are too low level and not easy to work with.
⬢ Simple REST API layer fronting YARN
– YARN-4793: Simplified API layer for services and beyond
⬢ Create and manage the lifecycle of YARN services.
Example: ZooKeeper App
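The slide's ZooKeeper example did not survive as text. The sketch below only illustrates the general idea of describing a service declaratively and submitting it over REST. Every field name, the `/services` path and the host are hypothetical placeholders and are not taken from YARN-4793.

```python
import json
from urllib.request import Request

# Hypothetical service description; none of these field names come from YARN-4793.
zookeeper_service = {
    "name": "zkapp1",
    "components": [
        {
            "name": "zookeeper",
            "number_of_containers": 3,
            "resource": {"vcores": 1, "memory_mb": 1024},
            "launch_command": "bin/zkServer.sh start-foreground",
        }
    ],
}


def build_create_request(base_url, spec):
    """Build (but do not send) a POST request that would create the service."""
    body = json.dumps(spec).encode("utf-8")
    return Request(
        url=base_url + "/services",  # hypothetical endpoint path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_request("http://rm.example.com:8088", zookeeper_service)
```

The point of such a layer is that the caller states what to run and how many instances, and leaves container negotiation and lifecycle handling to the framework.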
Service Discovery in YARN
⬢ YARN Service Discovery via DNS: YARN-4757
– Expose existing service information in the YARN registry via DNS
• The current YARN service registry’s records will be converted into DNS entries
– Enable container-to-IP mappings, so that container IPs can be discovered via standard DNS lookups
• Application
– zkapp1.user1.yarncluster.com -> 192.168.10.11:8080
• Container
– container-1454001598828-0001-01-00004.yarncluster.com -> 192.168.10.18
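The naming pattern in the two examples above can be sketched as a pair of helper functions. This is an illustration of the pattern only, not the registry DNS code: the function names are made up, and the underscore-separated container ID used as input is an assumption inferred from the hyphenated name on the slide.

```python
def application_dns_name(service, user, domain):
    """<service>.<user>.<domain>, following the application example above."""
    return f"{service}.{user}.{domain}".lower()


def container_dns_name(container_id, domain):
    """Container ID with underscores turned into hyphens, under the domain."""
    return f"{container_id.replace('_', '-')}.{domain}".lower()


app_name = application_dns_name("zkapp1", "user1", "yarncluster.com")
container_name = container_dns_name(
    "container_1454001598828_0001_01_00004",  # assumed underscore form of the ID
    "yarncluster.com",
)
```

Underscores are replaced because they are not valid in DNS host names, which is presumably why the slide shows the container ID with hyphens.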
More Cloud Friendly
⬢ Elastic
–Dynamic Resource Configuration
•YARN-291
•Allows tuning an NM’s resources up or down at runtime
–Graceful decommissioning of NodeManagers
•YARN-914
•Drains a node that’s being decommissioned to allow running containers to finish
⬢ Efficient
–Support for container resizing
•YARN-1197
•Allows applications to change the size of an existing container
–Task level native optimization
•MAPREDUCE-2841
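The draining behaviour behind graceful decommissioning, listed above under Elastic, can be sketched as a small state machine. This is a toy model under simplifying assumptions (no timeout, no scheduler), and the class, method and state names are illustrative rather than YARN's own code.

```python
class Node:
    """Toy NodeManager that drains before leaving the cluster."""

    def __init__(self, name):
        self.name = name
        self.state = "RUNNING"
        self.containers = set()

    def start_container(self, container_id):
        """Accept new work only while the node is fully in service."""
        if self.state != "RUNNING":
            return False
        self.containers.add(container_id)
        return True

    def begin_graceful_decommission(self):
        self.state = "DECOMMISSIONING"
        self._maybe_finish()

    def container_finished(self, container_id):
        self.containers.discard(container_id)
        self._maybe_finish()

    def _maybe_finish(self):
        # The node leaves the cluster only once it has drained.
        if self.state == "DECOMMISSIONING" and not self.containers:
            self.state = "DECOMMISSIONED"


node = Node("nm_1")
node.start_container("c1")
node.begin_graceful_decommission()
accepted = node.start_container("c2")  # refused while the node is draining
state_while_draining = node.state
node.container_finished("c1")
```

A real implementation would also bound the drain with a timeout so that a stuck container cannot hold a node forever.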
More Cloud Friendly (Contd.)
⬢ Isolation
–Embrace container technology to achieve better isolation
–Resource isolation support for disk and network
•YARN-2619 (disk), YARN-2140 (network)
•Containers get a fair share of disk and network resources using Cgroups
–Docker support in LinuxContainerExecutor
•YARN-3611
•Supports launching Docker containers alongside process-based containers
•Packaging and resource isolation
⬢ Operation
–Container upgrades (YARN-4726)
•“Do an upgrade of my Spark / HBase apps with minimal impact to end-users”
–Work-preserving AM restart
•MAPREDUCE-6608
Scheduling Enhancements
⬢ Application priorities: YARN-1963
– Priority support within a queue
⬢ Affinity / anti-affinity: YARN-1042
– More constraints on container placement
⬢ Global Scheduling: YARN-5139
– Move away from the per-node scheduling model
– Improve container scheduling throughput
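To make the anti-affinity constraint above concrete, here is a minimal sketch of a placement check that refuses to put two containers of the same application on one node. It is a toy model, not the YARN scheduler: the function name, the `placed` bookkeeping dict and the node names are assumptions for illustration.

```python
def place_with_anti_affinity(nodes, placed, app):
    """Return the first node that does not already host a container of `app`.

    `placed` maps node name -> set of apps with a container on that node.
    Returns None when no node satisfies the constraint.
    """
    for node in nodes:
        if app not in placed.get(node, set()):
            placed.setdefault(node, set()).add(app)
            return node
    return None


nodes = ["nm_1", "nm_2"]
placed = {}
# Three requests for the same app on a two-node cluster.
choices = [place_with_anti_affinity(nodes, placed, "hbase") for _ in range(3)]
```

With two nodes, the third request cannot be satisfied, so the sketch returns `None` for it instead of co-locating the container.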
Operational and User Experience Enhancements (YARN-3368)
Other YARN work that could be released in Hadoop 3.x
⬢ Resource profiles
–YARN-3926
–Users can specify resource profile name instead of individual resources
–Resource types read via a config file
⬢ YARN federation
–YARN-2915
–Allows YARN to scale out to tens of thousands of nodes
–Cluster of clusters which appear as a single cluster to an end user
⬢ Gang Scheduling
–YARN-624
More details in tomorrow’s noon session, “Apache Hadoop YARN: Past, Present and Future”, by Junping Du and Jian He
Release Timeline for Apache Hadoop 3.0
⬢ 3.0.0-alpha1 was released on Sep 3, 2016
⬢ alpha2 in Q4 2016 (Estimated)
⬢ beta1 in early Q1 2017 (Estimated)
⬢ GA in Q1/Q2 2017 (Estimated)
HDP Evolution with Apache Hadoop and YARN
Thank you!
Ad

More Related Content

What's hot (20)

Evolving HDFS to a Generalized Storage Subsystem
Evolving HDFS to a Generalized Storage SubsystemEvolving HDFS to a Generalized Storage Subsystem
Evolving HDFS to a Generalized Storage Subsystem
DataWorks Summit/Hadoop Summit
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
DataWorks Summit
 
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
DataWorks Summit/Hadoop Summit
 
Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?
DataWorks Summit/Hadoop Summit
 
Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop
DataWorks Summit/Hadoop Summit
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
DataWorks Summit/Hadoop Summit
 
Lessons learned from running Spark on Docker
Lessons learned from running Spark on DockerLessons learned from running Spark on Docker
Lessons learned from running Spark on Docker
DataWorks Summit
 
YARN Federation
YARN Federation YARN Federation
YARN Federation
DataWorks Summit/Hadoop Summit
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the experts
DataWorks Summit/Hadoop Summit
 
IoT:what about data storage?
IoT:what about data storage?IoT:what about data storage?
IoT:what about data storage?
DataWorks Summit/Hadoop Summit
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
DataWorks Summit
 
Practice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China MobilePractice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China Mobile
DataWorks Summit
 
How the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside DownHow the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside Down
DataWorks Summit
 
Cloudy with a Chance of Hadoop - Real World Considerations
Cloudy with a Chance of Hadoop - Real World ConsiderationsCloudy with a Chance of Hadoop - Real World Considerations
Cloudy with a Chance of Hadoop - Real World Considerations
DataWorks Summit/Hadoop Summit
 
Operationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the CloudOperationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the Cloud
DataWorks Summit/Hadoop Summit
 
Spark Uber Development Kit
Spark Uber Development KitSpark Uber Development Kit
Spark Uber Development Kit
DataWorks Summit/Hadoop Summit
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
alanfgates
 
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
DataWorks Summit
 
The state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the CloudThe state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the Cloud
DataWorks Summit/Hadoop Summit
 
Deep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profitDeep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profit
DataWorks Summit/Hadoop Summit
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
DataWorks Summit
 
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
DataWorks Summit/Hadoop Summit
 
Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop
DataWorks Summit/Hadoop Summit
 
Lessons learned from running Spark on Docker
Lessons learned from running Spark on DockerLessons learned from running Spark on Docker
Lessons learned from running Spark on Docker
DataWorks Summit
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the experts
DataWorks Summit/Hadoop Summit
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
DataWorks Summit
 
Practice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China MobilePractice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China Mobile
DataWorks Summit
 
How the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside DownHow the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside Down
DataWorks Summit
 
Cloudy with a Chance of Hadoop - Real World Considerations
Cloudy with a Chance of Hadoop - Real World ConsiderationsCloudy with a Chance of Hadoop - Real World Considerations
Cloudy with a Chance of Hadoop - Real World Considerations
DataWorks Summit/Hadoop Summit
 
Operationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the CloudOperationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the Cloud
DataWorks Summit/Hadoop Summit
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
alanfgates
 
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
DataWorks Summit
 

Similar to Apache Hadoop 3.0 What's new in YARN and MapReduce (20)

Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
DataWorks Summit/Hadoop Summit
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
DataWorks Summit/Hadoop Summit
 
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionDataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Wangda Tan
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
DataWorks Summit
 
Running Services on YARN
Running Services on YARNRunning Services on YARN
Running Services on YARN
DataWorks Summit/Hadoop Summit
 
Apache Hadoop 3 updates with migration story
Apache Hadoop 3 updates with migration storyApache Hadoop 3 updates with migration story
Apache Hadoop 3 updates with migration story
Sunil Govindan
 
Apache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionApache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the Union
DataWorks Summit
 
YARN - Past, Present, & Future
YARN - Past, Present, & FutureYARN - Past, Present, & Future
YARN - Past, Present, & Future
DataWorks Summit
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
DataWorks Summit
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
DataWorks Summit
 
Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo
DataWorks Summit
 
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
Big Data Spain
 
Big data spain keynote nov 2016
Big data spain keynote nov 2016Big data spain keynote nov 2016
Big data spain keynote nov 2016
alanfgates
 
Get Started Building YARN Applications
Get Started Building YARN ApplicationsGet Started Building YARN Applications
Get Started Building YARN Applications
Hortonworks
 
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and FutureHadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
Vinod Kumar Vavilapalli
 
Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)
Chris Nauroth
 
YARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformYARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute Platform
Bikas Saha
 
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnBikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
hdhappy001
 
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopApache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
Hortonworks
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in Production
DataWorks Summit/Hadoop Summit
 
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionDataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Wangda Tan
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
DataWorks Summit
 
Apache Hadoop 3 updates with migration story
Apache Hadoop 3 updates with migration storyApache Hadoop 3 updates with migration story
Apache Hadoop 3 updates with migration story
Sunil Govindan
 
Apache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionApache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the Union
DataWorks Summit
 
YARN - Past, Present, & Future
YARN - Past, Present, & FutureYARN - Past, Present, & Future
YARN - Past, Present, & Future
DataWorks Summit
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
DataWorks Summit
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
DataWorks Summit
 
Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo
DataWorks Summit
 
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
Big Data Spain
 
Big data spain keynote nov 2016
Big data spain keynote nov 2016Big data spain keynote nov 2016
Big data spain keynote nov 2016
alanfgates
 
Get Started Building YARN Applications
Get Started Building YARN ApplicationsGet Started Building YARN Applications
Get Started Building YARN Applications
Hortonworks
 
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and FutureHadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
Vinod Kumar Vavilapalli
 
Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)
Chris Nauroth
 
YARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformYARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute Platform
Bikas Saha
 
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnBikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
hdhappy001
 
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopApache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
Hortonworks
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in Production
DataWorks Summit/Hadoop Summit
 
Ad

More from DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
DataWorks Summit/Hadoop Summit
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
DataWorks Summit/Hadoop Summit
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
DataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
DataWorks Summit/Hadoop Summit
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
DataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
DataWorks Summit/Hadoop Summit
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
DataWorks Summit/Hadoop Summit
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit/Hadoop Summit
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
DataWorks Summit/Hadoop Summit
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
DataWorks Summit/Hadoop Summit
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
DataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
DataWorks Summit/Hadoop Summit
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
DataWorks Summit/Hadoop Summit
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
DataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
DataWorks Summit/Hadoop Summit
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
DataWorks Summit/Hadoop Summit
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
DataWorks Summit/Hadoop Summit
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
DataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
DataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense

Apache Hadoop 3.0 What's new in YARN and MapReduce

  • 1. Apache Hadoop 3.0: What’s new in YARN & MapReduce
    Tokyo, Oct. 26, 2016
    Junping Du, junping_du@apache.org
    © Hortonworks Inc. 2011 – 2016. All Rights Reserved
  • 2. About Speakers
    ⬢ Junping Du
      – Apache Hadoop Committer & PMC member
      – Lead Software Engineer @ Hortonworks YARN Core Team
      – 10+ years developing enterprise software (5+ years as a “Hadooper”)
  • 3. Agenda
    ⬢ Evolutions in YARN & MR (Done and In Progress)
    ⬢ Timeline Estimation for Apache Hadoop 3.0 Release
  • 4. First, a Bit of Vision…
    ⬢ The evolution of Hadoop started with YARN
    ⬢ YARN's evolution will continue to drive Hadoop forward
    [Figure label: Hadoop 3]
  • 5. Several important trends in the age of Hadoop 3.0+
    [Figure: platform stack diagram. Labels shown:]
    ⬢ YARN and Other Platform Services: Storage, Resource Management, Security, Service Discovery, Management, Monitoring, Alerts, Governance
    ⬢ IoT Assembly: Kafka, Storm, HBase, Solr
    ⬢ MR, Tez, Spark, …
    ⬢ Innovating frameworks: Flink, DL (TensorFlow), etc.
    ⬢ Various Environments: On Premise, Private Cloud, Public Cloud
  • 6. Evolutions in YARN & MR
    ⬢ Re-architecture for YARN Timeline Service - ATS v2
    ⬢ Service Native Support in YARN
    ⬢ YARN Scheduling Enhancements
    ⬢ More Cloud Friendly
    ⬢ Better User Experiences
    ⬢ Other Enhancements
  • 7. Timeline Service Revolution – ATS v2
    ⬢ Why ATS v2?
      – Scalability & Performance: removes these v1 limitations
        • Single global instance of writer/reader
        • Local-disk-based LevelDB storage
      – Usability
        • Handles flows as first-class concepts and models aggregation
        • Adds configuration and metrics as first-class members
        • Better support for queries
      – Reliability: addresses these v1 limitations
        • Data is stored on a local disk
        • The timeline server is a single point of failure (SPOF)
      – Flexibility
        • The data model is more descriptive
        • Can be extended with more app-specific information
  • 8. Core Design for ATS v2
    ⬢ Distributed write path
      – Logical per-app collector + physical per-node writer
      – Collector/Writer launched as an auxiliary service in the NM
      – Standalone writers will be added later
    ⬢ Pluggable backend storage
      – Built in with a scalable and reliable implementation (HBase)
    ⬢ Enhanced data model
      – Entity (bi-directional relation) with flow, queue, etc.
      – Configuration, Metric, Event, etc.
    ⬢ Separate reader instances
    ⬢ Aggregation & Accumulation
      – Aggregation: rolling up the metric values to the parent
        • Online aggregation for apps and flow runs
        • Offline aggregation for users, flows and queues
      – Accumulation: rolling up the metric values across a time interval
        • Accumulated resource consumption for app, flow, etc.
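The difference between aggregation and accumulation on the slide above can be illustrated with a small sketch. This is not the ATS v2 implementation: the function names, the metric names (`memory_mb`, `vcores`) and the left-rectangle summation are assumptions chosen only to show the two roll-up directions (across children versus across time).

```python
from collections import defaultdict


def aggregate_to_parent(child_metrics):
    """Aggregation: sum each metric over the children (e.g. containers)
    to get the value for their parent (e.g. an application)."""
    totals = defaultdict(int)
    for metrics in child_metrics:
        for name, value in metrics.items():
            totals[name] += value
    return dict(totals)


def accumulate_over_time(samples):
    """Accumulation: turn a time series of (timestamp_sec, value) samples
    into value-seconds, holding each value until the next sample."""
    total = 0
    for (t0, v0), (t1, _next_value) in zip(samples, samples[1:]):
        total += v0 * (t1 - t0)
    return total


# Hypothetical snapshot of two containers belonging to one app.
containers = [
    {"memory_mb": 1024, "vcores": 1},
    {"memory_mb": 2048, "vcores": 2},
]
app_metrics = aggregate_to_parent(containers)

# Hypothetical memory usage of the app over 30 seconds.
memory_series = [(0, 1024), (10, 3072), (30, 3072)]
mb_seconds = accumulate_over_time(memory_series)
```

Here `app_metrics` is the app-level roll-up at one instant, while `mb_seconds` is a resource-consumption figure over the whole interval.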
  • 9. ATS v2 Architecture
    [Figure: architecture diagram. Components shown: Resource Manager (RMApp, info of collectors {app_1, app_2, …}, Timeline Writer for RM app events); NodeManagers NM_1 … NM_n, each with an NM Collector Service running as an Aux Service (app_1 Collector … app_n Collector, Timeline Writer) and a Container Monitor; app_1 AM sending AM timeline info and syncing with the app_1 Collector; app_1 container sending container metric info; HBase as the backend; Timeline Reader serving user queries; container info (to be added)]
  • 10. Data Model in ATS v2
    [Figure: data model diagram, reconstructed as a list]
    ⬢ Entity: ID + Type, Configurations, Metadata (Info), Parent-Child Relationships, Metrics, Events
      – Metric: ID, Metadata, Single Value or Time Series (with timestamps)
      – Event: ID, Metadata, Timestamp
    ⬢ First-class entities
      – Cluster: Type, Cluster Attributes
      – Flow: Type, User, Flow Runs, Flow Attributes
      – Flow Run: Type, User, Running apps, Flow Run Attributes
      – Application: Type, User, Flow + Run, Queue, Attempts
      – Attempt: Type, Application, Queue, Containers
      – Container: Type, Attempt, Attributes
    ⬢ Aggregation
      – User: Username (ID), Aggregated metrics
      – Queue: Queue (ID), Sub queues, Aggregated metrics
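As a rough illustration of the data model on this slide, the entity, metric and event shapes could be written as plain data classes. This is a sketch only: the class and field names below are assumptions that mirror the slide labels, not the actual ATS v2 Java classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Metric:
    metric_id: str
    # timestamp -> value; a single entry models a single-value metric
    values: Dict[int, float] = field(default_factory=dict)

    def is_time_series(self) -> bool:
        return len(self.values) > 1


@dataclass
class Event:
    event_id: str
    timestamp: int
    info: Dict[str, str] = field(default_factory=dict)


@dataclass
class Entity:
    entity_type: str
    entity_id: str
    configs: Dict[str, str] = field(default_factory=dict)
    info: Dict[str, str] = field(default_factory=dict)
    metrics: List[Metric] = field(default_factory=list)
    events: List[Event] = field(default_factory=list)
    parents: Set[Tuple[str, str]] = field(default_factory=set)
    children: Set[Tuple[str, str]] = field(default_factory=set)

    def key(self) -> Tuple[str, str]:
        return (self.entity_type, self.entity_id)

    def add_child(self, child: "Entity") -> None:
        # Record the relation on both sides (bi-directional).
        self.children.add(child.key())
        child.parents.add(self.key())


# Hypothetical application with one container.
app = Entity("APPLICATION", "app_1", configs={"queue": "default"})
container = Entity("CONTAINER", "container_1")
container.metrics.append(Metric("memory_mb", {1000: 1024.0, 2000: 1536.0}))
app.add_child(container)
```

Keeping the relation on both entities means a reader can walk from an application down to its containers or from a container back up to its application without a separate index.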
  • 11. Status for ATS v2
    ⬢ For other details, like:
      – Aggregations (app/flow/user/queue level, offline or online)
      – HBase table schema for EntityTable, ApplicationTable, FlowRunTable, etc.
      – Reader APIs (RESTful)
      please refer to the previous talk at Hadoop Summit 2016 San Jose: https://www.youtube.com/watch?v=adV-DFa-8us&index=6&list=PLKnYDs_-dq16K1NH83Bke2dGGUO3YKZ5b
    ⬢ Status
      – Phase I (YARN-2928): already released as an alpha feature in 3.0.0-alpha1
      – Phase II (YARN-5355): in progress
  • 12. Native Service Support in YARN
    ⬢ A native YARN framework: YARN-4692
      – Abstracts a common framework (similar to Slider) to support long-running services
      – Simpler API
    ⬢ Better support for long-running services
      – Recognition of long-running services
        • Affects the policy for preemption, container reservation, etc.
      – Auto-restart of containers
        • Containers in long-running services are more stateful
      – Service/application upgrade support
        • More services are expected to run long enough to span versions
      – Dynamic container configuration
        • Reserve resources only when they are needed
  • 13. API Simplification - REST
    ⬢ Existing APIs are too low level and not easy to work with
    ⬢ Simple REST API layer fronting YARN
      – YARN-4793: Simplified API layer for services and beyond
    ⬢ Create and manage the lifecycle of YARN services
    Example: ZooKeeper App
  • 14. Discovery services in YARN
    ⬢ YARN Service Discovery via DNS: YARN-4757
      – Expose existing service information in the YARN registry via DNS
        • Current YARN service registry records will be converted into DNS entries
      – Enable container-to-IP mappings, so that container IPs can be discovered via standard DNS lookups
        • Application: zkapp1.user1.yarncluster.com -> 192.168.10.11:8080
        • Container: container-1454001598828-0001-01-00004.yarncluster.com -> 192.168.10.18
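The naming pattern in the two examples above can be sketched as follows. This only reproduces the shape of the slide's example hostnames; the helper names and the rule of replacing underscores with hyphens are assumptions, and the actual record conversion in YARN-4757 may differ.

```python
def app_dns_name(service: str, user: str, domain: str) -> str:
    """Build a name of the form <service>.<user>.<domain>."""
    return ".".join([service, user, domain]).lower()


def container_dns_name(container_id: str, domain: str) -> str:
    """Build a name of the form <container-id-with-hyphens>.<domain>.

    Assumption: underscores in the container ID are replaced by hyphens,
    since underscores are not valid in DNS hostnames.
    """
    return f"{container_id.replace('_', '-')}.{domain}".lower()


# Hypothetical registry contents, using the values from the slide.
records = {
    app_dns_name("zkapp1", "user1", "yarncluster.com"): "192.168.10.11",
    container_dns_name(
        "container_1454001598828_0001_01_00004", "yarncluster.com"
    ): "192.168.10.18",
}


def lookup(name: str) -> str:
    """Stand-in for a DNS A-record lookup against the sketch registry."""
    return records[name]
```

A client would then resolve `zkapp1.user1.yarncluster.com` with an ordinary DNS lookup instead of calling the YARN registry API directly.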
  • 15. More Cloud Friendly
    ⬢ Elastic
      – Dynamic resource configuration
        • YARN-291
        • Allows an NM's resources to be tuned up or down at runtime
      – Graceful decommissioning of NodeManagers
        • YARN-914
        • Drains a node that is being decommissioned so that running containers can finish
    ⬢ Efficient
      – Support for container resizing
        • YARN-1197
        • Allows applications to change the size of an existing container
      – Task-level native optimization
        • MAPREDUCE-2841
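The draining behaviour described for graceful decommissioning can be sketched as a small state machine. This is a toy model under stated assumptions, not NodeManager code: the `Node` class, its method names and the state strings are hypothetical, and timeouts are left out.

```python
class Node:
    """Toy model of a node that drains before it is decommissioned."""

    def __init__(self, name):
        self.name = name
        self.state = "RUNNING"
        self.containers = set()

    def launch(self, container_id):
        # A draining or decommissioned node accepts no new containers.
        if self.state != "RUNNING":
            return False
        self.containers.add(container_id)
        return True

    def start_decommission(self):
        self.state = "DECOMMISSIONING"
        self._finish_if_drained()

    def complete(self, container_id):
        self.containers.discard(container_id)
        self._finish_if_drained()

    def _finish_if_drained(self):
        # The node leaves the cluster only once nothing is running on it.
        if self.state == "DECOMMISSIONING" and not self.containers:
            self.state = "DECOMMISSIONED"


node = Node("nm-1")
node.launch("c1")
node.launch("c2")
node.start_decommission()           # running containers are left alone
rejected = not node.launch("c3")    # new work is refused while draining
node.complete("c1")
node.complete("c2")                 # last container finishes, node is done
```

The point of the sketch is the ordering: new allocations stop first, and the node is only removed after its existing containers have finished.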
  • 16. More Cloud Friendly (Contd.)
    ⬢ Isolation
      – Embrace container technology to achieve better isolation
      – Resource isolation support for disk and network
        • YARN-2619 (disk), YARN-2140 (network)
        • Containers get a fair share of disk and network resources using cgroups
      – Docker support in LinuxContainerExecutor
        • YARN-3611
        • Support for launching Docker containers alongside processes
        • Packaging and resource isolation
    ⬢ Operation
      – Container upgrades (YARN-4726)
        • “Do an upgrade of my Spark / HBase apps with minimal impact to end-users”
      – Work-preserving AM restart
        • MAPREDUCE-6608
  • 17. Scheduling Enhancements
    ⬢ Application priorities: YARN-1963
      – Priority support within a queue
    ⬢ Affinity / anti-affinity: YARN-1042
      – More constraints on container placement
    ⬢ Global scheduling: YARN-5139
      – Moves away from the per-node scheduling model
      – Improves container scheduling throughput
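Priority ordering of applications inside a single queue can be sketched with a heap. This is an illustration only, not the scheduler's code: the `AppQueue` class is hypothetical, and the sketch assumes that a larger integer means a higher priority and that ties are served in submission order.

```python
import heapq
from itertools import count


class AppQueue:
    """Toy model of one queue that orders its applications by priority."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # submission counter used to break ties

    def submit(self, app_id, priority=0):
        # heapq is a min-heap, so the priority is negated.
        heapq.heappush(self._heap, (-priority, next(self._seq), app_id))

    def next_app(self):
        """Return the app that should receive resources next, or None."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = AppQueue()
queue.submit("etl-nightly", priority=1)
queue.submit("adhoc-report", priority=1)
queue.submit("sla-critical", priority=5)

order = [queue.next_app() for _ in range(3)]
```

With these submissions, the high-priority application is served first and the two equal-priority applications keep their submission order.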
  • 18. Operational and User Experience Enhancements (YARN-3368)
  • 19. Other YARN work that could be released in Hadoop 3.x
    ⬢ Resource profiles
      – YARN-3926
      – Users can specify a resource profile name instead of individual resources
      – Resource types are read from a config file
    ⬢ YARN federation
      – YARN-2915
      – Allows YARN to scale out to tens of thousands of nodes
      – A cluster of clusters that appears as a single cluster to an end user
    ⬢ Gang scheduling
      – YARN-624
    More details in tomorrow's noon session, “Apache Hadoop YARN: Past, Present and Future”, by Junping Du and Jian He
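The idea behind resource profiles, naming a bundle of resources instead of listing each one, can be sketched as a lookup table. The profile names, resource keys and the optional per-request override below are all assumptions made for illustration; they are not taken from YARN-3926.

```python
# Hypothetical profile table; in the slide's design this would come
# from a configuration file rather than being hard-coded.
PROFILES = {
    "small": {"memory_mb": 1024, "vcores": 1},
    "large": {"memory_mb": 8192, "vcores": 4},
}


def resolve_profile(name, overrides=None):
    """Expand a profile name into concrete resources.

    `overrides` is an assumed convenience for adjusting single values
    on top of a profile.
    """
    if name not in PROFILES:
        raise KeyError(f"unknown resource profile: {name}")
    resources = dict(PROFILES[name])  # copy so the table is not mutated
    resources.update(overrides or {})
    return resources


request = resolve_profile("large", {"vcores": 2})
```

A user would then ask for `"large"` rather than spelling out memory and vcores in every request.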
  • 20. Release Timeline for Apache Hadoop 3.0
    ⬢ 3.0.0-alpha1 was released on Sep. 3, 2016
    ⬢ alpha2 in Q4 2016 (estimated)
    ⬢ beta1 in early Q1 2017 (estimated)
    ⬢ GA in Q1/Q2 2017 (estimated)
  • 21. HDP Evolution with Apache Hadoop and YARN
  • 22. Thank you!