Spark Monitoring With Graphite and Grafana Guide
Spark ships with the Metrics Java library, which makes it much easier to diagnose issues with Spark
jobs. This document describes how to configure Metrics to report to a Graphite backend and how to
view the results with Grafana.
Spark MetricsSystem
A MetricsSystem instance lives on every driver and executor and optionally exposes metrics to a variety
of Sinks while applications are running.
In this way, MetricsSystem offers a way to monitor Spark applications using a variety of third-party
tools.
Graphite
In particular, MetricsSystem includes bindings to ship metrics to Graphite, a popular open-source tool
for collecting and serving time series data. To enable them, pass a metrics.properties file to
spark-submit with the following options:
--files=/path/to/metrics.properties \
--conf spark.metrics.conf=metrics.properties
Grafana
Having thus configured Spark (and installed Graphite), we surveyed the many Graphite-visualization
tools that exist and began building custom Spark-monitoring dashboards using Grafana. Grafana is “an
open source, feature rich metrics dashboard and graph editor for Graphite, InfluxDB & OpenTSDB,” and
includes some powerful features for scripting the creation of dynamic dashboards, allowing us to
experiment with many ways of visualizing the performance of our Spark applications in real-time.
Architecture :
Installation
There are several pieces that need to be installed and made to talk to each other here:
1. Install Graphite.
2. Configure Spark to send metrics to your Graphite.
3. Install Grafana with your Graphite as a data source.
4. Create a data source in Grafana pointing to the Graphite host and port.
5. Build the dashboard with required queries.
Install Graphite
We will use the containerized version of Graphite. The steps below are for a RHEL-based Linux host:
A) Install Docker.
B) Start Docker.
Note that you can remap any container port to a different host port if the corresponding port is
already in use on the host. You also do not have to map every port; map only the ones you need
(see the table below).
Mapped Ports
Host    Container    Service
80      80           nginx
By default, statsd listens on the UDP port 8125. If you want it to listen on the TCP port 8125 instead, you
can set the environment variable STATSD_INTERFACE to tcp when running the container.
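Before wiring up Spark, it can help to confirm that the container accepts metrics at all. The sketch below formats one line of Graphite's plaintext protocol (metric path, value, and Unix timestamp, followed by a newline) and sends it over TCP to Carbon's plaintext receiver. It assumes that receiver is mapped to host port 2003, the port used in the Spark configuration later in this guide; the function names and the metric path in the usage note are made up for illustration.

```python
import socket
import time


def format_plaintext_metric(path, value, timestamp=None):
    """Build one plaintext-protocol line: path, value, timestamp, then a newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(host, port, path, value, timestamp=None, timeout=5.0):
    """Open a TCP connection to Carbon and send a single metric line."""
    line = format_plaintext_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(line.encode("ascii"))
    return line
```

For example, send_metric("localhost", 2003, "test.spark_guide.smoke", 1) should make a test.spark_guide.smoke series appear in the Graphite tree shortly afterwards, provided the port mapping matches this assumption.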
Once the Graphite container is running, you can reach its web interface by pointing your browser at
the host name of the node where you installed Docker.
Example :
Configure Spark to Send Metrics to Graphite
Create a metrics.properties file with the following content, setting the Graphite host and port to
match your environment. This file must be passed to the spark-submit command.
# Send metrics from every instance (the * wildcard) to Graphite
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=10.253.16.69
*.sink.graphite.port=2003
# Report every 5 seconds
*.sink.graphite.period=5
*.sink.graphite.unit=seconds
# Enable JVM metrics for all instances
*.source.jvm.class=org.apache.spark.metrics.source.JvmSource
# Per-instance equivalents of the wildcard line above
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
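If the Graphite host differs between environments, the file shown above can be generated rather than edited by hand. This is a minimal sketch that renders the wildcard settings for a given host and port. The function name and its defaults are illustrative assumptions; only the property keys and class names come from the file above.

```python
def render_metrics_properties(host, port=2003, period=5, unit="seconds"):
    """Return the text of a metrics.properties file for a Graphite sink."""
    lines = [
        # Ship metrics from every instance to Graphite.
        "*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink",
        f"*.sink.graphite.host={host}",
        f"*.sink.graphite.port={port}",
        # Reporting interval.
        f"*.sink.graphite.period={period}",
        f"*.sink.graphite.unit={unit}",
        # Enable the JVM source for every instance.
        "*.source.jvm.class=org.apache.spark.metrics.source.JvmSource",
    ]
    return "\n".join(lines) + "\n"


def write_metrics_properties(path, host, port=2003):
    """Write the rendered configuration to the given path."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(render_metrics_properties(host, port))
```

Calling write_metrics_properties("metrics.properties", "10.253.16.69") would produce a file equivalent to the wildcard lines above.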
Example :
spark2-submit --master yarn \
--files=/home/vishal/metrics.properties \
--conf spark.metrics.conf=./metrics.properties \
--executor-memory 3G --num-executors 1 \
/home/vishal/Spark-Kafla/SparkWikiPedia/DataAnalysis/target/DataAnalysis-2.0.0-jar-with-dependencies.jar
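The Spark monitoring documentation also describes passing metrics settings as ordinary configuration parameters with the prefix spark.metrics.conf. instead of shipping a file. The sketch below turns a dictionary of metrics properties into the matching --conf arguments. It assumes your Spark version supports this prefix, which is worth checking against the documentation for your release; the function name is made up for illustration.

```python
import shlex


def metrics_conf_args(properties):
    """Turn metrics properties into spark-submit --conf arguments."""
    args = []
    for key, value in properties.items():
        # Each property is prefixed with spark.metrics.conf. (assumed to be
        # supported by the Spark version in use).
        args += ["--conf", f"spark.metrics.conf.{key}={value}"]
    return args


graphite_settings = {
    "*.sink.graphite.class": "org.apache.spark.metrics.sink.GraphiteSink",
    "*.sink.graphite.host": "10.253.16.69",
    "*.sink.graphite.port": "2003",
}

# shlex.join quotes the * so a shell will not expand it as a glob.
command_line = shlex.join(["spark2-submit"] + metrics_conf_args(graphite_settings))
```

Building the argument list in code sidesteps shell quoting of the * wildcard; if you type the options by hand, quote each --conf value.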
Install and Configure Grafana
The Grafana docs are pretty good, but a little lacking in the "quick start" department. The
basic steps you need to follow are:
$ mkdir grafana_intallation_file
$ sudo vi /etc/grafana/grafana.ini
Starting the service runs the grafana-server process as the grafana user, which is created
during the package installation. The default HTTP port is 3000, and the default user and
group are admin.
Once in the Grafana UI, log in as the Grafana admin user (admin/admin).
Now that we are logged in to Grafana, we add the new Graphite data source before building a
dashboard for the Spark metrics. Select Data Sources on the left-hand side, then select “Add new”
at the top of the screen. Here I am adding my “Graphite-Spark” source with Type Graphite. I did not add
any authentication to Graphite, so my Http Auth section is left empty. Access is set to direct. After you
click Save, the option “Test Connection” will be available.
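The same data source can also be created through Grafana's HTTP API, which is convenient when the setup needs to be repeatable. The sketch below builds, but does not send, a POST request to /api/datasources using basic authentication with the default admin/admin login. The function names and the URLs in the usage note are placeholders. The access value mirrors the "direct" setting above, though newer Grafana releases may only accept "proxy".

```python
import base64
import json
import urllib.request


def graphite_datasource_payload(name, graphite_url, access="direct"):
    """Build the JSON body describing a Graphite data source."""
    if access not in ("direct", "proxy"):
        raise ValueError("access must be 'direct' or 'proxy'")
    return {"name": name, "type": "graphite", "url": graphite_url, "access": access}


def build_create_request(grafana_url, payload, user="admin", password="admin"):
    """Prepare (without sending) the POST request that creates the data source."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )
```

To actually create the source, pass the prepared request to urllib.request.urlopen, for example with build_create_request("http://grafana-host:3000", graphite_datasource_payload("Graphite-Spark", "http://graphite-host")), where both host names are placeholders.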
Example JVM graph:
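A JVM graph of this kind is driven by a Graphite target expression, and the same expression can be tried directly against Graphite's render API before it goes into a dashboard panel. The sketch below builds such a URL. The target *.*.jvm.heap.used is an assumption about how Spark names JVM metrics (application ID, then driver or executor ID, then the metric), so check the actual paths in your Graphite tree; the function name and host are placeholders.

```python
from urllib.parse import urlencode


def render_url(graphite_base, target, since="-1h", fmt="json"):
    """Build a Graphite render API URL for one target expression."""
    query = urlencode({"target": target, "from": since, "format": fmt})
    return f"{graphite_base.rstrip('/')}/render?{query}"


# Assumed metric layout: <application id>.<driver or executor id>.jvm.heap.used
jvm_heap_url = render_url("http://graphite-host", "*.*.jvm.heap.used")
```

Opening the resulting URL in a browser, or fetching it with any HTTP client, returns the matching series as JSON. In Grafana, the same target string goes into the panel's query editor.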