Native Support of Prometheus Monitoring
in Apache Spark 3
Dongjoon Hyun
DB Tsai
SPARK+AI SUMMIT 2020
Who am I
Dongjoon Hyun
Apache Spark PMC and Committer
Apache ORC PMC and Committer
Apache REEF PMC and Committer
https://github.com/dongjoon-hyun
https://www.linkedin.com/in/dongjoon
@dongjoonhyun
Who am I
DB Tsai
Apache Spark PMC and Committer
Apache SystemML PMC and Committer
Apache Yunikorn Committer
Apache Bahir Committer
https://github.com/dbtsai
https://www.linkedin.com/in/dbtsai
@dbtsai
Three popular methods
Monitoring Apache Spark
Web UI (Live and History Server)
• Jobs, Stages, Tasks, SQL queries
• Executors, Storage
Logs
• Event logs and Spark process logs
• Listeners (SparkListener, StreamingQueryListener, SparkStatusTracker, …)
Metrics
• Various numeric values
Early warning instead of post-mortem process
Metrics are useful to handle gray failures
Monitoring and alerting on Spark jobs' gray failures
• Memory leaks or misconfiguration
• Performance degradation
• Growing intermediate state in streaming jobs
An open-source systems monitoring and alerting toolkit
Prometheus
Provides
• a multi-dimensional data model
• operational simplicity
• scalable data collection
• a powerful query language
A good option for Apache Spark Metrics
Prometheus Server
Prometheus Web UI
Alert Manager
Pushgateway
https://en.wikipedia.org/wiki/Prometheus_(software)
Using JmxSink and JMXExporter combination
Spark 2 with Prometheus (1/3)
Enable Spark’s built-in JmxSink in Spark’s conf/metrics.properties
Deploy Prometheus' JMXExporter library and its config file
Expose JMXExporter port, 9404, to Prometheus
Add `-javaagent` option to the target (master/worker/executor/driver/…)
-javaagent:./jmx_prometheus_javaagent-0.12.0.jar=9404:config.yaml
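For reference, a minimal sketch of the two pieces this approach needs: the sink line follows Spark's conf/metrics.properties.template, while the YAML is an assumed bare-bones, pass-through jmx_exporter config rather than anything from the talk.
# conf/metrics.properties: enable the built-in JmxSink for every instance
*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
# config.yaml for the JMXExporter javaagent: export every MBean as-is
lowercaseOutputName: false
rules:
  - pattern: ".*"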
Using GraphiteSink and GraphiteExporter combination
Spark 2 with Prometheus (2/3)
Set up Graphite server
Enable Spark’s built-in Graphite Sink with several configurations
Enable Prometheus' GraphiteExporter on the Graphite server
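A hedged sketch of the GraphiteSink side, using the property keys from conf/metrics.properties.template (the host is a placeholder for your Graphite server; 2003 is Graphite's plaintext default port):
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=<graphite-host>
*.sink.graphite.port=2003
*.sink.graphite.period=10
*.sink.graphite.unit=seconds
*.sink.graphite.prefix=spark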
Custom sink (or 3rd party Sink) + Pushgateway server
Spark 2 with Prometheus (3/3)
Set up Pushgateway server
Develop a custom sink (or use 3rd party libs) with Prometheus dependency
Deploy the sink libraries and their configuration files to the cluster
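For illustration only: the Pushgateway accepts pushed metrics over plain HTTP in the Prometheus text format, which is what a custom sink would do programmatically. The host, port, metric, and job name below are placeholders.
$ echo "spark_custom_metric 3.14" | curl --data-binary @- http://<pushgateway-host>:9091/metrics/job/spark_app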
Pros and Cons
Pros
• Used already in production
• A general approach
Cons
• Difficult to set up in new environments
• Some custom libraries may have a dependency on Spark versions
Easy usage
Goal in Apache Spark 3
Be independent from the existing Metrics pipeline
• Use new endpoints and disable them by default
• Avoid introducing new dependencies
Reuse the existing resources
• Use officially documented ports of Master/Worker/Driver
• Take advantage of Prometheus Service Discovery in K8s as much as possible
What's new in Spark 3 Metrics
DropWizard Metrics 4.x (Spark 3)
SPARK-29674 / SPARK-29557
DropWizard Metrics 4 for JDK11
Timeline
DropWizard Metrics 3.x (Spark 1/2)
metrics_master_workers_Value 0.0
metrics_master_workers_Value{type="gauges",} 0.0
metrics_master_workers_Number{type="gauges",} 0.0
(Timeline: DropWizard Metrics moved 3.1.2 → 3.1.5 → 4.1.1 as Spark moved from 1.6 through 2.x to 3.0, 2016–2020)
A new metric source
ExecutorMetricsSource
Collect executor memory metrics at the driver and expose them via ExecutorMetricsSource and the
REST API (SPARK-23429, SPARK-27189, SPARK-27324, SPARK-24958)
• JVMHeapMemory / JVMOffHeapMemory
• OnHeapExecutionMemory / OffHeapExecutionMemory
• OnHeapStorageMemory / OffHeapStorageMemory
• OnHeapUnifiedMemory / OffHeapUnifiedMemory
• DirectPoolMemory / MappedPoolMemory
• MinorGCCount / MinorGCTime
• MajorGCCount / MajorGCTime
• ProcessTreeJVMVMemory
• ProcessTreeJVMRSSMemory
• ProcessTreePythonVMemory
• ProcessTreePythonRSSMemory
• ProcessTreeOtherVMemory
• ProcessTreeOtherRSSMemory
JVM Process Tree
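Since these values are also surfaced through the driver's REST API, a quick way to inspect them on a running application is a sketch like the following; the application id is a placeholder, and the peak values appear under the executor summary's peakMemoryMetrics field.
$ curl -s http://localhost:4040/api/v1/applications/<app-id>/executors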
Prometheus-format endpoints
Support Prometheus more natively (1/2)
PrometheusServlet: A friend of MetricsServlet
• A new metric sink supporting Prometheus-format (SPARK-29032)
• Unified way of configurations via conf/metrics.properties
• No additional system requirements (services / libraries / ports)
PrometheusResource: A single endpoint for all executor memory metrics
• A new metric endpoint to export all executor metrics at the driver (SPARK-29064/SPARK-29400)
• The most efficient way to discover and collect them, because the driver already has all the information
• Enabled by `spark.ui.prometheus.enabled` (default: false)
spark_info and service discovery
Support Prometheus more natively (2/2)
Add spark_info metric (SPARK-31743)
• A standard Prometheus way to expose version and revision
• Monitoring Spark jobs per version
Support driver service annotation in K8S (SPARK-31696)
• Used by Prometheus service discovery
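A sketch of the shape of this metric as scraped from the Prometheus endpoints; the version and revision values are placeholders.
spark_info{version="3.0.0", revision="<git-sha>"} 1.0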
Under the hood
SPARK-29032 Add PrometheusServlet to monitor Master/Worker/Driver
PrometheusServlet
Makes Master/Worker/Driver expose metrics in Prometheus format on their existing ports
Follows the output style of the "Spark JmxSink + Prometheus JMXExporter + javaagent" approach
Component Port Prometheus Endpoint (New in 3.0) JSON Endpoint (Since initial release)
Driver 4040 /metrics/prometheus/ /metrics/json/
Worker 8081 /metrics/prometheus/ /metrics/json/
Master 8080 /metrics/master/prometheus/ /metrics/master/json/
Master 8080 /metrics/applications/prometheus/ /metrics/applications/json/
Spark Driver Endpoint Example
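This slide showed example output from the driver endpoint; an equivalent check from the command line, assuming a driver running locally on the default UI port, would be:
$ curl -s http://localhost:4040/metrics/prometheus/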
Use conf/metrics.properties like the other sinks
PrometheusServlet Configuration
Copy conf/metrics.properties.template to conf/metrics.properties
Uncomment the following lines in conf/metrics.properties
*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
*.sink.prometheusServlet.path=/metrics/prometheus
master.sink.prometheusServlet.path=/metrics/master/prometheus
applications.sink.prometheusServlet.path=/metrics/applications/prometheus
SPARK-29064 Add PrometheusResource to export executor metrics
PrometheusResource
A new endpoint with similar information to the existing JSON endpoint
Driver exposes all executor memory metrics in Prometheus format
Component Port Prometheus Endpoint (New in 3.0) JSON Endpoint (Since 1.4)
Driver 4040 /metrics/executors/prometheus/ /api/v1/applications/{id}/executors/
Use spark.ui.prometheus.enabled
PrometheusResource Configuration
Run spark-shell with configuration
Run `curl` with the new endpoint
$ bin/spark-shell \
-c spark.ui.prometheus.enabled=true \
-c spark.executor.processTreeMetrics.enabled=true
$ curl http://localhost:4040/metrics/executors/prometheus/ | grep executor | head -n1
metrics_executor_rddBlocks{application_id="...", application_name="...", executor_id="..."} 0
Monitoring in K8s cluster
Key Monitoring Scenarios on K8s clusters
Monitoring batch job memory behavior => A risk of being killed?
Monitoring dynamic allocation behavior => Unexpected slowness?
Monitoring streaming job behavior => Latency?
Use Prometheus Service Discovery
Monitoring batch job memory behavior (1/2)
Configuration Value
spark.ui.prometheus.enabled true
spark.kubernetes.driver.annotation.prometheus.io/scrape true
spark.kubernetes.driver.annotation.prometheus.io/path /metrics/executors/prometheus/
spark.kubernetes.driver.annotation.prometheus.io/port 4040
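These prometheus.io/* annotations are the common community convention for annotation-based pod scraping; a minimal sketch of the Prometheus side (this relabeling job is an assumption about your Prometheus setup, not something shipped by Spark):
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2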
Monitoring batch job memory behavior (2/2)
spark-submit --master k8s://$K8S_MASTER --deploy-mode cluster \
-c spark.driver.memory=2g \
-c spark.executor.instances=30 \
-c spark.ui.prometheus.enabled=true \
-c spark.kubernetes.driver.annotation.prometheus.io/scrape=true \
-c spark.kubernetes.driver.annotation.prometheus.io/path=/metrics/executors/prometheus/ \
-c spark.kubernetes.driver.annotation.prometheus.io/port=4040 \
-c spark.kubernetes.container.image=spark:3.0.0 \
--class org.apache.spark.examples.SparkPi \
local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar 200000
OOMKilled at Driver
Set spark.dynamicAllocation.*
Monitoring dynamic allocation behavior
spark-submit --master k8s://$K8S_MASTER --deploy-mode cluster \
-c spark.dynamicAllocation.enabled=true \
-c spark.dynamicAllocation.executorIdleTimeout=5 \
-c spark.dynamicAllocation.shuffleTracking.enabled=true \
-c spark.dynamicAllocation.maxExecutors=50 \
-c spark.ui.prometheus.enabled=true \
… (the same) …
https://gist.githubusercontent.com/dongjoon-hyun/.../dynamic-pi.py 10000
`dynamic-pi.py` computes Pi, sleeps for one minute, and computes Pi again.
Select a single Spark app
rate(metrics_executor_totalTasks_total{...}[1m])
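Building on the labels shown by the executors endpoint earlier (application_id, application_name, executor_id), a hedged example of narrowing to one application and breaking the task rate down per executor; the label value is a placeholder.
rate(metrics_executor_totalTasks_total{application_name="<app-name>"}[1m])
sum by (executor_id) (rate(metrics_executor_totalTasks_total{application_name="<app-name>"}[1m]))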
Inform Prometheus of both metrics endpoints
Driver service annotation
spark-submit --master k8s://$K8S_MASTER --deploy-mode cluster \
-c spark.ui.prometheus.enabled=true \
-c spark.kubernetes.driver.annotation.prometheus.io/scrape=true \
-c spark.kubernetes.driver.annotation.prometheus.io/path=/metrics/prometheus/ \
-c spark.kubernetes.driver.annotation.prometheus.io/port=4040 \
-c spark.kubernetes.driver.service.annotation.prometheus.io/scrape=true \
-c spark.kubernetes.driver.service.annotation.prometheus.io/path=/metrics/executors/prometheus/ \
-c spark.kubernetes.driver.service.annotation.prometheus.io/port=4040 \
…
(Charts compared runs with spark.dynamicAllocation.maxExecutors=30 and spark.dynamicAllocation.maxExecutors=300, and the resulting executor allocation ratio.)
Set spark.sql.streaming.metricsEnabled=true (default:false)
Monitoring streaming job behavior (1/2)
Metrics
• latency
• inputRate-total
• processingRate-total
• states-rowsTotal
• states-usedBytes
• eventTime-watermark
Prefix of streaming query metric names
• metrics_[namespace]_spark_streaming_[queryName]
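A minimal sketch in spark-shell of turning these metrics on and giving the query a stable name; the rate source and console sink here are just stand-ins for a real pipeline.
// enable Dropwizard metrics for Structured Streaming queries
spark.conf.set("spark.sql.streaming.metricsEnabled", "true")
// name the query so the metric prefix stays stable across restarts
val query = spark.readStream.format("rate").load()
  .writeStream.queryName("spark").format("console").start()
// metrics are then published under metrics_[namespace]_spark_streaming_spark_*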
All metrics are important for alert
Monitoring streaming job behavior (2/2)
latency > micro-batch interval
• Spark can tolerate this for a while, but the job needs to be redesigned to prevent a future outage
states-rowsTotal grows indefinitely
• Such jobs will eventually die due to OOM
- SPARK-27340 Alias on TimeWindow expression causes watermark metadata to be lost (fixed in 3.0)
- SPARK-30553 Fix structured-streaming java example error
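As one example of wiring these signals into alerts, a hedged Prometheus alerting-rule sketch on state-store growth; the metric name is a placeholder built from the prefix on the previous slide, and the threshold and durations are purely illustrative.
groups:
  - name: spark-streaming
    rules:
      - alert: StreamingStateKeepsGrowing
        # substitute the states-rowsTotal series exposed for your query name and namespace
        expr: deriv(metrics_<namespace>_spark_streaming_<queryName>_states_rowsTotal[30m]) > 0
        for: 2h
        labels:
          severity: warning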
Separation of concerns
Prometheus Federation and Alert
Each user namespace (namespace1, namespace2, …) runs its own Prometheus stack (Prometheus Server, Prometheus Web UI, Alert Manager, Pushgateway) and collects the metrics for its own batch and streaming job monitoring.
A cluster-wide Prometheus (Admin) federates only a subset of metrics (spark_info + ...) from the per-namespace Prometheus servers.
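A hedged sketch of the admin-side federation scrape that would pull only a subset such as spark_info from the per-namespace servers; the job name, match list, and targets are placeholders.
scrape_configs:
  - job_name: 'federate-user-namespaces'
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - 'spark_info'
    static_configs:
      - targets: ['prometheus.namespace1.svc:9090', 'prometheus.namespace2.svc:9090']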
New endpoints are still experimental
Limitations and Tips
New endpoints expose only Spark metrics starting with `metrics_` or `spark_info`
• `javaagent` method can expose more metrics like `jvm_info`
PrometheusServlet does not follow the Prometheus naming convention
• Instead, it's designed to follow the Spark 2 naming convention for consistency within Spark
The number of metrics grows if we don't set the following:
writeStream.queryName("spark")
spark.metrics.namespace=spark
Summary
Spark 3 provides better integration with Prometheus monitoring
• Especially in K8s environments, metric collection becomes much easier than in Spark 2
The new Prometheus-style endpoints are independent, additional options
• Users can migrate to the new endpoints or use them alongside the existing methods
Thank you!