Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
CISL systems
Cloud and Information Services Lab (CISL)
Vision: One cluster to rule them all
Ambitious multi-person, multi-year agenda
Realizing the vision…
Application Engines
M/R AM, REEF, Tez, Spark Runtime
Cluster-wide resource management: YARN++
YARN + Federation
YARN + Rayon
YARN + Mercury
Per-job/framework Resource Management
Hive, …, Storm, Giraph, Pig, Spark
Big Picture
Research lab embedded in a Product organization doing Open-Source.
The 3 hats we wear in CISL…
(We are hiring… Come see us after the talk!)
cluster ROI
Consolidate workloads
• Heterogeneity
APIs
centralized distributed
high cluster utilization
Resource Management in Shared Clusters
Centralized Resource Management
[YARN, Mesos, Omega, Borg]
Node Managers
1. Request
2. Allocation
3. Start task
Distributed Resource Management
[Apollo, Sparrow]
Node Managers
Centralized vs. Distributed Scheduling
Centralized and Distributed compared on:
• Workload heterogeneity
• Task placement
• Enforcing scheduling invariants
• Allocation latency
• Slot utilization
• Scalability
“Sweet spot” we are after
[Figure: task duration axis (1ms, 100ms, 1s, 1m, 1h) marking the “Executor” model and the Mercury sweet spot]
• “Trade performance guarantees for allocation latency”
• Choose among scheduling types based on application type (SLA job, ad-hoc job, service), job characteristics (task runtime, type of computation), cluster load, etc.
• Mercury provides a programmatic way to use otherwise idle resources
• Mercury achieves gains of up to 40% in task throughput and 66% in mean job latency over stock YARN
Mercury: Key Insight
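The choice among scheduling types listed on this slide can be sketched as a small policy function. This is only an illustration: the `TaskInfo` fields, the application-type labels, and both thresholds are hypothetical and do not come from the talk or from YARN.

```python
from dataclasses import dataclass
from enum import Enum


class ContainerType(Enum):
    GUARANTEED = "GUARANTEED"
    QUEUEABLE = "QUEUEABLE"


@dataclass
class TaskInfo:
    app_type: str              # "sla", "adhoc", or "service" (hypothetical labels)
    expected_runtime_s: float  # estimated task duration in seconds
    cluster_load: float        # fraction of cluster resources in use, 0.0 to 1.0


def choose_container_type(task, short_task_s=10.0, busy_load=0.9):
    """Pick a container type; both thresholds are illustrative assumptions."""
    # SLA jobs and services keep their performance guarantees.
    if task.app_type in ("sla", "service"):
        return ContainerType.GUARANTEED
    # Short ad-hoc tasks trade guarantees for low allocation latency,
    # unless the cluster is so busy that a queued task could wait long.
    if task.expected_runtime_s <= short_task_s and task.cluster_load < busy_load:
        return ContainerType.QUEUEABLE
    return ContainerType.GUARANTEED
```

For example, `choose_container_type(TaskInfo("adhoc", 2.0, 0.5))` selects a QUEUEABLE container, while any task of an SLA job selects a GUARANTEED one.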
Mercury Architecture (Conceptual)
Mercury Runtime
Mercury Resource Management Framework
Container Types
GUARANTEED containers
QUEUEABLE containers
• opportunistically
central
distributed
[YARN-2882] PATCH AVAILABLE
GUARANTEED vs. QUEUEABLE Containers
GUARANTEED containers
QUEUEABLE containers
Hybrid Scheduling on Tez AM: Examples
AMRMProxy
queuing
• Application
• Framework
Mercury Implementation over YARN
AMRMProxy
[YARN-2884,2885]
GUARANTEED Request and Allocation
start(GUARANTEED, …)
request(GUARANTEED, …)
allocate(…)
rewriting a single parameter
QUEUEABLE Request and Allocation
start(QUEUEABLE, …)
request(QUEUEABLE, …)
allocate(…)
unique token
we respect YARN’s security guarantees
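The two request paths can be sketched as a proxy that dispatches on the container type. This is a rough illustration only: it assumes, following the "central" and "distributed" labels on the container-types slide, that GUARANTEED requests go to the central scheduler and QUEUEABLE requests to a distributed one. The class and callback names are hypothetical and are not the YARN API.

```python
class AMRMProxySketch:
    """Hypothetical stand-in for a proxy sitting between an AM and the schedulers."""

    def __init__(self, central_allocate, distributed_allocate):
        # Both arguments are callables taking a resource description
        # and returning an allocation (any object).
        self.central_allocate = central_allocate
        self.distributed_allocate = distributed_allocate

    def request(self, container_type, resource):
        # Assumption: GUARANTEED requests are forwarded to the central scheduler.
        if container_type == "GUARANTEED":
            return self.central_allocate(resource)
        # Assumption: QUEUEABLE requests are served by a distributed scheduler.
        if container_type == "QUEUEABLE":
            return self.distributed_allocate(resource)
        raise ValueError(f"unknown container type: {container_type}")


proxy = AMRMProxySketch(
    central_allocate=lambda res: ("central", res),
    distributed_allocate=lambda res: ("distributed", res),
)
guaranteed_allocation = proxy.request("GUARANTEED", "1 core, 2 GB")
queueable_allocation = proxy.request("QUEUEABLE", "1 core, 2 GB")
```

From the application's side only the container-type argument differs between the two calls.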
Task Execution: Conflict Resolution
two priorities
types of schedulers shared resources
[YARN-2883]
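One way to picture conflict resolution with two priorities is a per-node runtime in which GUARANTEED containers take precedence over QUEUEABLE ones. The sketch below is an assumption-heavy illustration: the slot model, the choice of victim (the most recently started QUEUEABLE container), and all names are hypothetical and are not taken from the slides or from YARN-2883.

```python
from collections import deque


class NodeRuntime:
    """Hypothetical per-node runtime with a fixed number of execution slots."""

    def __init__(self, slots):
        self.slots = slots
        self.running = []     # (container_id, container_type) pairs
        self.queue = deque()  # ids of waiting QUEUEABLE containers
        self.killed = []      # ids of evicted QUEUEABLE containers

    def start(self, container_id, container_type):
        if len(self.running) < self.slots:
            self.running.append((container_id, container_type))
            return "started"
        if container_type == "QUEUEABLE":
            # Lower priority: wait in the node's queue.
            self.queue.append(container_id)
            return "queued"
        # Higher priority on a full node: evict the most recently
        # started QUEUEABLE container (an assumed victim policy).
        for i in range(len(self.running) - 1, -1, -1):
            victim_id, victim_type = self.running[i]
            if victim_type == "QUEUEABLE":
                del self.running[i]
                self.killed.append(victim_id)
                self.running.append((container_id, container_type))
                return "started"
        raise RuntimeError("node is full of GUARANTEED containers")

    def finish(self, container_id):
        self.running = [c for c in self.running if c[0] != container_id]
        # A freed slot goes to the next queued container, if any.
        if self.queue and len(self.running) < self.slots:
            self.running.append((self.queue.popleft(), "QUEUEABLE"))
```

In this model a GUARANTEED container never waits behind a QUEUEABLE one, at the cost of lost work for the evicted container.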
Issues with QUEUEABLE containers?
Application Policies
• container type to be requested for each task
• Choosing QUEUEABLE at job level enables opportunistic jobs
[YARN-2887]
Framework Policies
rebalance
reordering
job arrival time
QUEUEABLE containers per node
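Two of the keywords above, reordering by job arrival time and limiting QUEUEABLE containers per node, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the queue-length map, and the tie-breaking rule are hypothetical, and the actual policies may combine these signals differently.

```python
def pick_node(queue_lengths, max_queued_per_node):
    """Return the node with the shortest queue, or None if every node is at the cap.

    queue_lengths maps node name -> number of queued QUEUEABLE containers.
    """
    candidates = {n: q for n, q in queue_lengths.items() if q < max_queued_per_node}
    if not candidates:
        return None
    # Break ties by node name so the choice is deterministic.
    return min(candidates, key=lambda n: (candidates[n], n))


def reorder_by_job_arrival(queued):
    """Order (container_id, job_arrival_time) pairs so the earliest job runs first."""
    return sorted(queued, key=lambda item: item[1])
```

Capping the queue length bounds how long a QUEUEABLE container can wait on a single node, and ordering by job arrival time keeps later jobs from overtaking earlier ones in a node's queue.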
Load Shaping Policies
Mercury Runtime
Experimental Setup
Task Throughput for Increasing Task Duration
Cosmos-based Workload: Task Throughput
Conclusion
Future Work
OSS Overview: Apache JIRA YARN-2877
Extend YARN to support distributed scheduling
Resource Policing
