Metrics-Driven Tuning of Apache Spark at Scale
Edwina Lu and Ye Zhou
Hadoop Infra @ LinkedIn
• 10+ clusters
• 10,000+ nodes
• 1000+ users
Spark @ LinkedIn
• Number of daily Spark apps for one cluster: close to 3K, a 2.4x increase in the last 3 quarters
• Spark applications consume 25% of resources; average daily Spark resource consumption: 1.6 PBHr
[Chart: Number of Applications per Day]
[Chart: Average Daily Resource Usage, Spark vs. Non-Spark]
What We Discovered About Spark Usage
Only ~34% of allocated memory was actually used.
Example application:
• 200 executors
• spark.driver.memory: 16GB
• spark.executor.memory: 16GB
• Max executor JVM used memory: 6.6GB
• Max driver JVM used memory: 5.4GB
• Total wasted memory: 1.8TB
• Time: 1h
[Pie chart: Executor Memory. Peak used JVM memory: 34%; unused executor memory: 61%; reserved memory: 5%]
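The wasted-memory figure for the example application can be checked with simple arithmetic. This is a minimal sketch, assuming waste is counted as configured memory minus peak JVM used memory, summed over the 200 executors plus the driver; the function name is illustrative and does not come from the talk.

```python
def wasted_memory_gb(num_executors, executor_mem_gb, executor_peak_gb,
                     driver_mem_gb, driver_peak_gb):
    """Configured memory minus peak JVM used memory, in GB (assumed definition)."""
    executor_waste = num_executors * (executor_mem_gb - executor_peak_gb)
    driver_waste = driver_mem_gb - driver_peak_gb
    return executor_waste + driver_waste


# Numbers from the example application on the slide.
total_gb = wasted_memory_gb(200, 16, 6.6, 16, 5.4)
print(f"{total_gb:.1f} GB, or {total_gb / 1024:.2f} TB")
```

Under this assumption the result is roughly 1.8 TB, in line with the slide.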
Memory Tuning: Motivation
• Memory and CPUs cost money
• These are limited resources, so they must be used efficiently
• With only 34% of allocated memory actually used, more efficient memory usage would let us run 2-3 times as many Spark applications on the same hardware
Memory Tuning: What and How to Tune?
• Spark tuning can be complicated, with many metrics and configuration parameters
• Many users have limited knowledge about how to tune Spark applications
Memory Tuning: Scaling
• Data scientist and engineer time costs even more money
• Analyzing applications and giving tuning advice in person does not
scale for the Spark team or users who must wait for help
• Infrastructure efficiency vs. developer productivity
– Do we have to choose between these two?
Dr. Elephant
• Performance monitoring and tuning service
• Identify badly tuned applications and causes
• Provide actionable advice for fixing issues
• Compare performance changes over time
Dr. Elephant: How does it Work?
[Diagram components: Resource Manager, Application Fetcher; History Server, Metrics Fetcher; Run Rule 1, Run Rule 2, Run Rule 3; Database; Dr. Elephant UI]
Challenges for Dr. Elephant to Support Spark
• Spark tuning heuristics
– What are the necessary metrics to enable effective tuning?
• Fetch Spark history
– Spark components are not equally scalable
Spark Memory Overview
Executor Container
• Overhead (off-heap memory): spark.yarn.executor.memoryOverhead = max(executorMemory * 0.1, 384MB)
• Executor Memory (spark.executor.memory):
– Reserved Memory: 300 MB
– Unified Memory (spark.memory.fraction = 0.6): Execution Memory and Storage Memory, split by spark.memory.storageFraction
– User Memory (1 – spark.memory.fraction = 0.4)
[Diagram labels: JVM used memory, executor memory]
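The sizes of these regions follow from the configuration values on the slide. The sketch below assumes the JVM heap equals spark.executor.memory and that the fractions apply to the heap minus the 300 MB reserved memory. The 0.5 default for spark.memory.storageFraction is Spark's documented default and is not on the slide; the function and key names are illustrative.

```python
RESERVED_MB = 300       # reserved memory, per the slide
MIN_OVERHEAD_MB = 384   # floor for the off-heap overhead, per the slide


def executor_memory_layout(executor_memory_mb, memory_fraction=0.6,
                           storage_fraction=0.5, overhead_factor=0.1):
    """Approximate region sizes (MB) for one executor container."""
    overhead = max(executor_memory_mb * overhead_factor, MIN_OVERHEAD_MB)
    usable = executor_memory_mb - RESERVED_MB        # heap minus reserved
    unified = usable * memory_fraction               # execution + storage
    return {
        "container_mb": executor_memory_mb + overhead,
        "overhead_mb": overhead,
        "reserved_mb": RESERVED_MB,
        "unified_mb": unified,
        "storage_region_mb": unified * storage_fraction,
        "user_mb": usable * (1 - memory_fraction),
    }


layout = executor_memory_layout(16 * 1024)   # spark.executor.memory = 16 GB
for name, size in layout.items():
    print(f"{name}: {size:.1f}")
```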
Executor JVM Used Memory Heuristic
[Diagram: Spark executor memory of 16GB; peak JVM used memory 275.9MB; reserved memory 300MB; the remainder is wasted memory]
Executor JVM Used Memory
Severity: Severe
The configured executor memory is much higher than
the maximum amount of JVM used by executors.
Please set spark.executor.memory to a lower value.
spark.executor.memory: 16 GB
Max executor peak JVM used memory: 6.6 GB
Suggested spark.executor.memory: 7 GB
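A rule of this kind can be sketched in a few lines. The formula below (peak JVM used memory plus the 300 MB reserved memory, rounded up to a whole GB) is an assumption chosen because it reproduces the numbers in the sample output; it is not taken from Dr. Elephant's source, and the function name is hypothetical.

```python
import math

RESERVED_GB = 300 / 1024   # reserved memory from the memory overview


def suggest_executor_memory_gb(peak_jvm_used_gb):
    """Hypothetical rule: peak usage plus reserved memory, rounded up to 1 GB."""
    return math.ceil(peak_jvm_used_gb + RESERVED_GB)


configured_gb = 16
suggested_gb = suggest_executor_memory_gb(6.6)
if configured_gb > suggested_gb:
    print(f"Suggested spark.executor.memory: {suggested_gb} GB")
```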
Executor Unified Memory Heuristic
[Diagram: unified memory of 8.36GB; peak unified memory 474.42KB; the remainder is wasted memory]
Executor Peak Unified Memory
Severity: Critical
The allocated unified memory is much higher than the
maximum amount of unified memory used by executors.
Please lower spark.memory.fraction.
spark.executor.memory: 10 GB
spark.memory.fraction: 0.6
Allocated unified memory: 6 GB
Max peak JVM used memory: 7.2 GB
Max peak unified memory: 1.2 GB
Suggested spark.memory.fraction: 0.2
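The suggested fraction can be approximated from the peak unified memory. The sketch below assumes the needed fraction is the peak unified memory divided by the executor memory minus the 300 MB reserved memory, rounded up to the next tenth. This rounding rule is a guess that happens to match the sample output, not Dr. Elephant's documented formula.

```python
import math

RESERVED_GB = 300 / 1024   # reserved memory from the memory overview


def suggest_memory_fraction(executor_memory_gb, peak_unified_gb):
    """Hypothetical rule: needed fraction, rounded up to the next 0.1."""
    usable_gb = executor_memory_gb - RESERVED_GB
    needed = peak_unified_gb / usable_gb
    return math.ceil(needed * 10) / 10


# spark.executor.memory = 10 GB, max peak unified memory = 1.2 GB
suggested_fraction = suggest_memory_fraction(10, 1.2)
print(f"Suggested spark.memory.fraction: {suggested_fraction}")
```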
Execution Memory Spill Heuristic
[Diagram: executor memory, unified memory, and disk]
Execution Memory Spill
Severity: Severe
Execution memory spill has been detected in stage 3. Shuffle
read bytes and spill are evenly distributed. There are 200 tasks
for this stage. Please increase spark.sql.shuffle.partitions, or
modify the code to use more partitions, or reduce the number of
executor cores.
spark.executor.memory: 10 GB
spark.executor.cores: 3
spark.executor.instances: 300
Stage 3:
Median shuffle read bytes: 954 MB
Max shuffle read bytes: 955 MB
Median shuffle write bytes: 359 MB
Max shuffle write bytes: 388 MB
Median memoryBytesSpilled: 1.2 GB
Max memoryBytesSpilled: 1.2 GB
Num tasks: 200
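The distinction this heuristic draws (spill with evenly distributed shuffle reads versus spill caused by a few large tasks) can be sketched as a comparison of the maximum and the median per-task shuffle read size. The threshold of 2.0, the skew branch, and all names below are assumptions for illustration only.

```python
from statistics import median


def spill_advice(shuffle_read_mb, spilled_mb, skew_ratio=2.0):
    """Classify a stage from per-task shuffle read and spill sizes (MB)."""
    if max(spilled_mb) == 0:
        return "no spill"
    # Evenly distributed: the largest task is close to the median task.
    if max(shuffle_read_mb) <= skew_ratio * median(shuffle_read_mb):
        return "increase partitions or reduce executor cores"
    return "investigate data skew"


# Stage 3 from the sample output: median 954 MB, max 955 MB, 1.2 GB spilled.
print(spill_advice([954, 954, 955], [1200, 1200, 1200]))
```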
Executor GC Heuristic
[Diagram: GC time of 13 seconds within an executor run time of 2 minutes]
Executor GC
Severity: Moderate
Executors are spending too much time in GC. Please
increase spark.executor.memory.
spark.executor.memory: 4 GB
GC time to executor run time ratio: 0.164
Total executor run time: 1 Hour 15 Minutes
Total GC time: 12 Minutes
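The ratio in the sample output is total GC time divided by total executor run time. With the rounded times shown (12 minutes and 1 hour 15 minutes) this gives 0.16; the reported 0.164 was presumably computed from unrounded values. The threshold in this sketch is hypothetical and not taken from Dr. Elephant.

```python
GC_RATIO_THRESHOLD = 0.1   # hypothetical cutoff for "too much time in GC"


def gc_ratio(total_gc_seconds, total_run_seconds):
    """Fraction of executor run time spent in garbage collection."""
    return total_gc_seconds / total_run_seconds


def needs_more_memory(ratio, threshold=GC_RATIO_THRESHOLD):
    return ratio > threshold


ratio = gc_ratio(12 * 60, 75 * 60)
print(f"GC ratio: {ratio:.3f}, increase executor memory: {needs_more_memory(ratio)}")
```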
Automating Spark Tuning with Dr. Elephant @ LinkedIn
[Flowchart: Development, then "Well Tuned?". Yes: "Ship It!" to Production. No: "Tune It!"]
Architecture
[Diagram components: Executors (each with Tasks and a Cache) sending Heartbeats to the Driver; Driver with DAG Scheduler, Task Scheduler, and Listener Bus (EventLogging Listener, AppState Listener); Spark History Logs on HDFS; Spark History Server with REST API and Web UI]
Upstream Ticket
SPARK-23206: Additional Memory Tuning Metrics
• New executor level memory metrics:
– JVM used memory
– Execution memory
– Storage memory
– Unified memory
• Metrics sent from executors to driver via Heartbeat
• Peak values for executor metrics logged at stage end
• Metrics exposed via web UI and REST API
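The flow described above (executors report metrics in heartbeats, and peak values are recorded at stage end) can be sketched as a small driver-side tracker. This is a simplified illustration in Python, not the Spark implementation: the class, method, and metric key names are hypothetical, and it ignores complications such as overlapping stages.

```python
from collections import defaultdict

METRICS = ("jvm_used", "execution", "storage", "unified")


class PeakTracker:
    """Keeps the per-executor maximum of each metric seen in heartbeats."""

    def __init__(self):
        self._peaks = defaultdict(lambda: dict.fromkeys(METRICS, 0))

    def on_heartbeat(self, executor_id, metrics):
        peaks = self._peaks[executor_id]
        for name in METRICS:
            peaks[name] = max(peaks[name], metrics.get(name, 0))

    def on_stage_end(self):
        """Return the peaks seen so far and reset them for the next stage."""
        snapshot = {ex: dict(p) for ex, p in self._peaks.items()}
        self._peaks.clear()
        return snapshot


tracker = PeakTracker()
tracker.on_heartbeat("1", {"jvm_used": 100, "unified": 40})
tracker.on_heartbeat("1", {"jvm_used": 80, "unified": 60})
peaks = tracker.on_stage_end()
print(peaks)
```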
Overview of our Solution
• Spark History Server (SHS): scalable application metrics provider, scalable application history provider
• Enhancements on SHS
• Benefits brought by enhanced SHS:
– Dr. Elephant: performance analysis at scale
– Debug: easy investigation of past applications
Spark History Server (SHS) at LinkedIn
[Diagram: SHS with Log Parsing, Web UI, and REST APIs]
How does SHS work?
[Diagram: SHS internals (SPARK-18085): a Queued Thread Pool and a Jetty Handlers Thread Pool; the Listing DB is updated and the Apps DBs are created]
Not Happy
[Diagram: SHS with Log Parsing, Web UI, and REST APIs]
SHS Issues
• Missing applications
– Users cannot find their applications on the home page
• Extended loading time
– The application details page takes a very long time (up to half an hour) to load
• Handling large history files
– SHS gets completely stalled
• Handling high-volume concurrent requests
– SHS doesn’t return expected JSON response
Missing Applications
1. Submit job
2. Start running
3. Job failed
4. Check it out on SHS
Extended Loading Time
5. Wait for SHS to catch up
6. Finally it shows up
7. Check out the details
8. Keep loading… No response
Extended Listing Delay
[Diagram: History Files are replayed to update the Listing DB]
1. Replay same file multiple times
2. Limited threads for the replay
3. Processing time proportional to file size
How to Decrease the Listing Delay
• Use HDFS Extended Attributes
1. The Spark driver writes the log file content and also writes key/value extended attributes on the log file (held by the NameNode)
2. To update the Listing DB, SHS reads from the extended attributes, and reads from the log content only when it fails to read from the extended attributes
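The read path with its fallback can be sketched as follows. The two callables stand in for the HDFS extended-attribute read and the full log replay; they, the function name, and the summary fields are hypothetical and only illustrate the control flow, not the SHS code.

```python
def load_app_summary(log_path, read_xattrs, replay_log):
    """Return (summary, source) for one history file.

    read_xattrs and replay_log are hypothetical callables standing in for
    the extended-attribute read and the full event-log replay.
    """
    try:
        summary = read_xattrs(log_path)
    except OSError:
        summary = None
    if summary:
        return summary, "xattr"             # cheap path: no log parsing
    return replay_log(log_path), "replay"   # fallback: parse the log content


# In-memory stand-ins for demonstration.
xattrs = {"/logs/app-1": {"appId": "app-1"}}
read = lambda path: xattrs.get(path)
replay = lambda path: {"appId": path.rsplit("/", 1)[-1]}

print(load_app_summary("/logs/app-1", read, replay))
print(load_app_summary("/logs/app-2", read, replay))
```

The cheap path avoids replaying the file at all, so listing time no longer grows with file size for logs that carry the attributes.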
Extended Loading Delay
• Replaying all the events takes a long time for a large log file
[Diagram: on a request, SHS replays the log to build the Apps DB before sending the response]
How to Decrease the Loading Delay
• DB creation time is unavoidable
• Start DB creation for every application log file before the user requests it
[Diagram: SHS replays the log into the Apps DBs ahead of time; a later request gets its response from the existing DB]
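The idea of building each application DB before anyone asks for it can be sketched with a background thread pool. This is an illustration only: the class and method names are hypothetical, `build_db` stands in for the log replay, and the sketch assumes a single caller thread registers finished logs.

```python
from concurrent.futures import ThreadPoolExecutor


class AppDbCache:
    """Starts building an application's DB as soon as its log is complete."""

    def __init__(self, build_db, max_workers=4):
        self._build_db = build_db       # stand-in for replaying the event log
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}

    def on_log_finished(self, app_id):
        # Submit at most one build per application.
        if app_id not in self._futures:
            self._futures[app_id] = self._pool.submit(self._build_db, app_id)

    def get(self, app_id):
        """Serve a request; blocks only if the build is still running."""
        self.on_log_finished(app_id)    # fall back to building on demand
        return self._futures[app_id].result()


built = []

def build(app_id):
    built.append(app_id)
    return {"app": app_id}

cache = AppDbCache(build)
cache.on_log_finished("app-1")   # the build starts before any request arrives
print(cache.get("app-1"))
```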
Results of Improvement
• SHS shows completed and running applications on the home page within 1 minute
• For 90% of applications, SHS starts creating the DB within 5 minutes after they finish
Scalability Issues
• Increasing number of Spark applications
• Increasing Spark users
Severe Garbage Collection (GC)
[Chart: repeated Full GC events]
What Caused GC?
• Unnecessary events used too much memory while replaying
• SHS got completely stalled
• SHS needs to ignore those unnecessary events
[Chart annotation: 23GB]
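Ignoring unneeded events during replay amounts to filtering the event log by event type before building any state. The sketch below assumes the log is one JSON object per line with an "Event" field naming the type. The skipped type name is an assumption based on the block-update events named in SPARK-21961, and the code is a Python illustration rather than the actual SHS change.

```python
import json

# Assumed event type to drop; matched on the trailing class name.
SKIPPED_EVENTS = {"SparkListenerBlockUpdated"}


def replayable_events(lines, skipped=SKIPPED_EVENTS):
    """Yield parsed events, dropping the types the replay does not need."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        event_type = event.get("Event", "").rsplit(".", 1)[-1]
        if event_type not in skipped:
            yield event


sample_log = [
    '{"Event": "SparkListenerApplicationStart", "App Name": "demo"}',
    '{"Event": "SparkListenerBlockUpdated"}',
    '{"Event": "SparkListenerApplicationEnd"}',
]
kept = [e["Event"] for e in replayable_events(sample_log)]
print(kept)
```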
High-Volume Concurrent Requests
• When REST call frequency goes beyond a certain threshold, SHS is likely to return a non-JSON response to users
• Home page shows empty list
Upstream Tickets
• SPARK-23607: Use HDFS extended attributes to store application summary
• SPARK-21961: Filter out BlockStatusUpdates in History Server when analyzing logs
• SPARK-23608: Synchronize when attaching and detaching SparkUI in History Server
Results
• Users can always find their applications on the SHS home page within 1 minute
• For 90% of application DBs, SHS starts creating them within 5 minutes after the applications complete
• Stable and reliable service
• Handles high-volume concurrent requests
Future Work
• More memory metrics:
– Netty memory
– Total memory
• More Tuning:
– Skew in assignment of tasks to executors
– Size/time skew in tasks for a stage
– DAG analysis
• Incremental Replay for History Logs
• Horizontally scalable History Server
Q&A