Kibana, Grafana and Zeppelin on Monitoring Data
August 2016
Author:
Ildar Nurgaliev
Supervisors:
Edward Karavakis
Alberto Aimar
Project Specification
The goal of this project is to investigate different solutions and develop some typical
monitoring displays, such as the general IT Department overviews and service-specific
dashboards of data provided by the IT groups and stored in the IT monitoring repository
(Elasticsearch). As part of the project, the current IT host monitoring dashboard and
the Data Centre Overview were migrated and ported from Kibana 3 to Grafana and
Kibana 4 in order to study the features, performance and improvements of these displays.
The second part of the project is related to the visualization of the ATLAS scrutiny
report, which was implemented in Apache Zeppelin. The last part of the project is related
to the porting of the Apache PIG scripts to the Apache Spark computational framework,
in order to accelerate the generation of the aggregated summaries and to have direct
Hadoop access that would be beneficial to any future development of the report
automation.
CERN openlab Summer Student Report 2016
Abstract
The CERN/IT monitoring team handles millions of monitoring events every day from the
CERN data centers and the WLCG sites. Today this data is provided to users through a
number of different tools and dashboards. This project aims at exploring new
visualization techniques to provide enhanced dashboards. Using the data stored in
Elasticsearch and HDFS, this project relies on tools like Kibana, Grafana, and Zeppelin
to develop new dashboards relevant for CERN and the LHC experiments.
Table of Contents
1 Introduction
2 Relevant technologies
2.1 Elasticsearch
2.2 Kibana 4
2.3 Grafana
2.5 Zeppelin
2.6 Spark
9 Conclusion
References
1 Introduction
Today, CERN runs two large data centers, one in Geneva and a second in Budapest,
providing over 120,000 cores, 180 PB of disk space, 100 PB of tape space, and 150
high-performance tape drives to serve CERN's computing and storage needs. This
capacity is extended by the Worldwide LHC Computing Grid (WLCG), which provides
resources from different organizations around the globe, with CERN contributing around
20% of the total WLCG resources. WLCG is a global collaboration of more than 170
computing centers in 42 countries, linking up national and international grid and cloud
infrastructures. At any given time there are typically more than 300,000 concurrent jobs
running and, on a daily basis, more than two million jobs are submitted and terabytes of
data are transferred between sites. The Monitoring team of the IT Compute and
Monitoring (CM) group is responsible for providing monitoring solutions for these
resources by handling millions of monitoring events from the CERN data centers and the
WLCG sites on a daily basis.
Providing a reliable monitoring service is therefore extremely important for the constant
improvement and efficient use of the CERN data centers and the WLCG sites. The
CERN/IT team has set up an infrastructure specifically designed to collect and display
information about the nodes running in the data centers [1] and about the WLCG data
transfers, computational jobs and the state of sites and services [2]. The idea behind
this is to facilitate the observation of the current and historical status, giving the
opportunity to understand the state of the system at a glance while at the same time
allowing users to dig into detailed views in order to investigate problems.
The first part of the report gives an overview of the technologies that were used to
store the data and to build comprehensive dashboards. The second part of the report
shows the migrated dashboard visualizations of the Data Center Overview, the FTS (File
Transfer Service) and the host metrics, along with the issues encountered.
The last part of the project is related to the report visualization of the ATLAS scrutiny
report plots, implemented in Apache Zeppelin, and to the porting of the Apache PIG
scripts to the Apache Spark computational framework, in order to accelerate the
generation of the aggregated summaries and to have direct Hadoop access that would
be beneficial to any future development of the report automation.
2 Relevant technologies
2.1 Elasticsearch
In order to display information about the state of the batch service, we use Elasticsearch,
which is a distributed, highly available and scalable search server based on Lucene. It
provides real-time search and analytics capabilities [3].
Elasticsearch is therefore an ideal candidate when you need to deal with huge
amounts of time-series data and can adapt your project to work with this kind of
NoSQL document database. In particular, it focuses on real-time data, multitenancy and
full-text search, bundled with the advantages of NoSQL document databases such as
schema-less storage, distributed computing and storage, high availability and fault
tolerance.
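As an illustration, a monitoring dashboard typically asks Elasticsearch for a date-histogram aggregation over an index of time-series events. The sketch below only builds such a Search API request body; the index name, metric field and endpoint in the comments are hypothetical examples, not the actual indices used by the monitoring team.

```python
import json

def date_histogram_query(metric_field, interval, start, end,
                         timestamp_field="@timestamp"):
    """Build an Elasticsearch Search API body that buckets events into
    fixed time intervals and sums a numeric metric inside each bucket."""
    return {
        "size": 0,  # we only want aggregation buckets, not raw documents
        "query": {
            "range": {timestamp_field: {"gte": start, "lt": end}}
        },
        "aggs": {
            "per_interval": {
                "date_histogram": {"field": timestamp_field,
                                   "interval": interval},
                "aggs": {"metric_sum": {"sum": {"field": metric_field}}},
            }
        },
    }

# The body would be POSTed to e.g. http://<es-host>:9200/monit-events/_search
body = date_histogram_query("submitted_jobs", "1h", "now-1d", "now")
print(json.dumps(body, indent=2))
```

Such a body is exactly what Kibana and Grafana generate behind the scenes when drawing a time-series panel over an Elasticsearch data source.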
2.2 Kibana 4
Kibana is an open-source analytics and visualization platform designed to work with
Elasticsearch [5]. Kibana is used to search, view, and interact with the data stored in
Elasticsearch indices. A user can easily perform advanced data analysis and visualize
data in a variety of charts, tables, and maps. Kibana is used to quickly and easily
visualize large volumes of data, and its browser-based interface enables users to quickly
create and share dynamic dashboards that display changes to Elasticsearch queries in
real time. No code and no additional infrastructure are required.
Kibana enables near real-time analysis and visualization of streaming data. It allows
interactive data exploration and supports cross-filtering, offering multiple chart types
such as bar charts, line and scatter plots, histograms, pie charts and maps. It is
open source and has a number of plug-in extensions that can further enhance its
functionality.
Pros:
1. Kibana (ELK) is a popular stack for log storage and visualization with a big user
community.
2. Rich dynamic visualization dashboards.
3. The visualizations, searches and dashboards are stored as objects, meaning that
an object can be shared across multiple dashboards. This saves quite a lot of time,
as the user does not have to repeat common searches, visualizations and
dashboards; the common ones can be reused.
4. A Kibana dashboard displays a set of saved visualizations in groups that the user
can arrange freely. You can save a dashboard to share or reload at a later time.
5. The search field on the Discover page provides a way to query a specific subset
of transactions from the selected time frame. The query syntax is based on the
Lucene query syntax and allows boolean operators, wildcards, and field filtering.
Moreover, after a search it is possible to see the document structure and even to
filter on some fields.
6. Kibana 4 allows users to create scripts and store them within the Elasticsearch
server. Once the scripts are stored, their returned results can be viewed in real
time as if they were data arriving directly from the logs.
7. Good documentation and multiple examples are available online.
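To make the Lucene-style `field:value` queries from the pros above concrete, here is a toy evaluator for a tiny subset of that syntax (space-separated terms implicitly ANDed, with `*` wildcards). It is an illustration only, not how Kibana or Lucene are implemented, and the event fields are made up.

```python
import fnmatch

def matches(doc, query):
    """Toy evaluator for a tiny subset of the Lucene query syntax:
    space-separated field:pattern terms, implicitly combined with AND;
    patterns may use * wildcards.  (Real Lucene also supports OR, NOT,
    ranges, quoting, free-text terms, etc.)"""
    for term in query.split():
        field, _, pattern = term.partition(":")
        value = str(doc.get(field, ""))
        if not fnmatch.fnmatchcase(value, pattern):
            return False
    return True

# Hypothetical monitoring event:
event = {"host": "lxplus042.cern.ch", "status": "ERROR"}
print(matches(event, "host:lxplus* status:ERROR"))   # True
print(matches(event, "host:lxbatch* status:ERROR"))  # False
```

In Kibana, the same query string typed into the Discover search bar is forwarded to Elasticsearch as a `query_string` query.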
Cons:
1. The area plot looks somewhat unrefined in most cases. As a result, it is better
to change it to a bar plot.
2. There is no way to migrate dashboards from Kibana 3 to Kibana 4, so you need
to rebuild them from scratch.
3. No customizable axis labels, i.e. instead of 5000000 a user cannot show 5M or
put units on the Y-axis (s, ms, terabytes) as was possible in Kibana 3.
- Issue tracking: https://github.com/elastic/kibana/issues/2386
4. No option to hide a column in a table view.
- Issue tracking: https://github.com/elastic/kibana/issues/4307
Visualizations support:
Area chart (+): used to visualize the total contribution of several different series.
Combined plots (-): it is not possible to create plots from two different sources in the
same view, and there is no support for combined types in a plot (e.g. a bar plot with
lines included).
Plotting derived fields (+): via the scripted fields option or the additional Timelion
plugin.
Exporting plots and sharing dashboards (+/-): the Share button in Kibana 4 gives an
iframe link that can be embedded, but there are no other options at this time; the data
of a visualization can be downloaded directly in CSV and JSON format.
2.3 Grafana
Grafana's support for the Elasticsearch data source is relatively new, so there are some
rough edges and not all functions that Elasticsearch provides are currently available in
the editor, for example some aggregations. Grafana offers quite a lot of refined-looking
charts [6].
Pros:
2. Rich dynamic visualization dashboards, with additional features such as templated
variables that can be used for data sources, search parameters, or time widths.
3. A built-in menu that can provide navigation across dashboards via tag search.
4. Lots of graph styling options.
5. A direct link to a rendered image for every visualization in a dashboard, so it is
very easy to share a report.
6. Customizable axes that save additional data-transformation work: for example, to
go from bytes to megabytes, we just specify that the Y-axis represents a data size
in bytes and Grafana automatically renders it in the right unit.
7. A dashboard displays a set of saved visualizations in groups that you can
arrange freely. You can save a dashboard to share or reload at a later time.
8. Good documentation and examples online.
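Pro 5 above can be scripted: Grafana versions of that era expose each panel under a `/render/` endpoint that returns a PNG. The sketch below only assembles such a URL; the host, dashboard slug and panel id are hypothetical, and the exact URL layout may differ between Grafana versions.

```python
from urllib.parse import urlencode

def panel_render_url(base, slug, panel_id,
                     time_from="now-6h", time_to="now",
                     width=1000, height=500):
    """Build a Grafana render URL for a single dashboard panel (PNG output),
    following the /render/dashboard-solo/db/<slug> pattern of Grafana 2/3."""
    params = urlencode({
        "panelId": panel_id,
        "from": time_from,
        "to": time_to,
        "width": width,
        "height": height,
    })
    return "%s/render/dashboard-solo/db/%s?%s" % (base, slug, params)

# Hypothetical instance and dashboard:
url = panel_render_url("https://monit-grafana.example.ch", "fts-overview", 2)
print(url)
```

A URL like this can be dropped directly into a report or an email, which is what makes sharing Grafana plots so convenient.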
Cons:
1. There is no appropriate heat map for plotting and no two-dimensional matrix
table, only aggregation by time.
2. If you add annotation events to a dashboard, the dashboard sometimes starts to
slow down.
3. Limited grid templates and plot alignments, unlike Kibana 4.
4. Nominal data cannot be placed on the X-axis of a bar chart, as Grafana is only
for time series. A pie chart plugin extension could be used instead to visualize the
data in a pie chart.
5. Heat maps are not supported as a nominal matrix.
Visualizations support:
Area chart: visualize the total contribution of several different series.
Geo map: used to associate the results of an aggregation with geographic points.
Heat map: via the Epoch Heat Map plugin for Grafana; no way to make a nominal,
general-purpose heat map.
Combined plots (+): it is possible to create plots from two different sources in the
same view, and combined types in a plot are supported (e.g. a bar plot with lines
included).
Built-in menu (+): Grafana offers a built-in menu visualization by using the 'tag
search' function.
Feature comparison (Grafana vs. Kibana 4):

Feature | Grafana | Kibana 4
Export & share | Visualization -> CSV, PNG; render-image URL | Visualization -> CSV, JSON
Combined plots | Yes (very flexible visualization) | No
Templated variables | Yes (for search fields, time window) | No
Templated repeated visualizations | Yes (auto-generate visualizations from a limited set of values for a variable) | No
Templated data source | Yes (possible to change the data source on the fly) | No
Menu | Yes (generated from tag search) | No (a Markdown visualization as workaround)
2.5 Zeppelin
Apache Zeppelin is a web-based notebook that enables interactive data analytics with
powerful dynamic visualizations. It supports several technologies out of the box
(Python, SQL, Spark, Hive, Markdown, Shell, etc.) through the concept of an interpreter,
which allows any language or data-processing backend to be plugged into Zeppelin [7].
A Zeppelin notebook URL can be shared among people; in a shared notebook the
changes are seen by all users in real time, just like in Google Docs. For public
publishing, Zeppelin provides an option to display only the result, without Zeppelin's
menu and buttons. This way, it can easily be embedded anywhere as an iframe.
2.6 Spark
Apache Spark is an open-source framework for the distributed processing of
unstructured and semi-structured data, and it is part of the Hadoop ecosystem of
projects [8]. Unlike classic processing with the Hadoop core, which implements the
two-level MapReduce model with disk storage, Spark uses specialized primitives to
process data recursively in memory, yielding a significant speed-up for some classes of
problems. In particular, the possibility to access data loaded in memory multiple times
is very useful for data-analysis tasks.
The project provides APIs for Java, Scala, Python and R. Spark consists of the Spark
Core and a number of extensions, such as Spark SQL (which allows you to execute SQL
queries on the data), Spark MLlib (a set of machine-learning libraries), etc. It can
work both in a Hadoop cluster environment running YARN and without the core
components of Hadoop. Spark supports multiple distributed storage systems: HDFS,
OpenStack Swift, NoSQL databases such as Cassandra, and Amazon S3.
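The aggregations that the ported PIG scripts compute map naturally onto Spark's map/reduceByKey pattern. The sketch below is a plain-Python stand-in for that pattern (no Spark installation assumed); in a real Spark job the same logic would run as `rdd.map(...).reduceByKey(...)` in memory across the cluster, and the event tuples here are made up for illustration.

```python
import functools
from collections import defaultdict

def reduce_by_key(pairs, fn):
    """Plain-Python equivalent of Spark's rdd.reduceByKey(fn):
    group (key, value) pairs by key and fold each group with fn."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: functools.reduce(fn, values) for key, values in groups.items()}

# Hypothetical access events as (dataset name, bytes read); in Spark this
# would be rdd.map(lambda e: (e["dataset"], e["bytes"])).reduceByKey(add).
events = [("data15_13TeV.A", 100), ("data15_13TeV.A", 50), ("mc15.B", 7)]
totals = reduce_by_key(events, lambda a, b: a + b)
print(totals)  # {'data15_13TeV.A': 150, 'mc15.B': 7}
```

Because Spark keeps the intermediate pair RDDs in memory, a second aggregation over the same events (for example, counting accesses instead of summing bytes) does not pay the cost of re-reading the input from disk, which is the main advantage over the PIG/MapReduce implementation.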
4 Porting FTS Monitoring to Grafana
The File Transfer Service monitoring dashboards were fully reproduced in Grafana. In
the Grafana implementation there is more space, especially for the plots. The main
menu, the context menu, the filter by VO (Virtual Organization) and the filter by host
were moved to the upper part. As shown in the dashboard below, the filtering is
represented by template variables in the upper left corner; moreover, there is an
additional time template that allows us to dynamically change the time window from 10
minutes to 30 minutes and to 1 hour. The menu is auto-generated from the tag search:
the dashboards all share the same tag, so searching by tag produces the current menu.
○ Ranking by Country
○ Ranking by Site
○ Ranking by Tier
● Correlated Plots
● Queue Monitoring Plots
For this purpose, we decided to use SSO in front of Grafana by using Shibboleth, an
open-source project that provides Single Sign-On capabilities, allowing sites to make
informed authorization decisions for individual access to protected online resources in a
privacy-preserving manner.
Grafana supports Access Control Lists (ACLs) and the definition of ACL rules on
dashboards, i.e. one user can only read a given dashboard while another can read and
write it. This task consisted of understanding how ACLs work in Grafana, checking
Grafana's support for Single Sign-On (SSO) and LDAP, and defining some ACL rules for
a couple of accounts.
By incorporating Shibboleth/SSO into Grafana, new users with a valid CERN account are
automatically mapped to a Grafana user, eliminating the need to create a new Grafana
account. The default assignment group is MONIT, with Viewer permissions. As future
work, we could use Puppet to automatically deploy and configure a Grafana instance
and to have an automated way to copy Grafana data sources and dashboards.
For this project, we studied the existing reports based on the popularity of datasets and
unused data in ATLAS. These reports were produced by cron jobs running PIG scripts
that aggregate the ATLAS dataset access events from the Hadoop summaries and store
them on a web server as CSV files. Afterwards, the CSV files were manually
downloaded and imported into an Excel spreadsheet. For this part, we automated the
manual intervention so that the user can select the data and get the required plots for
the report directly within Zeppelin. The figure below shows the number of accesses in a
given time period. We are interested in particular in the first two bins, which contain
data with zero accesses over a given time period.
Figure 8. A user selects a date, and the report files are automatically searched for
and loaded.
We have also automated the data-popularity reports of the top N unused datasets
within every month, where N can also be chosen dynamically. The current Excel report
shows just the top 20 within the last 9 months. Our dynamic implementation allows the
user to choose the date, get the report and, within the report, choose the top N, the
last N months, the representation in petabytes or terabytes, and to select the top N
from the last M months shown over the last K months, where K <= M.
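The selection described above (top N unused datasets from the last M months, shown over the last K months, K <= M) is essentially a filter, a sort and a slice. The sketch below illustrates that logic on made-up records; the real Zeppelin report reads the aggregated summaries from Hadoop, and all names and numbers here are hypothetical.

```python
def top_n_unused(records, n, last_m_months, show_k_months):
    """records: iterable of (dataset, month_index, volume_tb, accesses),
    where month_index 0 is the most recent month.  Select datasets with
    zero accesses over the last M months, rank them by total volume,
    keep the top N, and report their per-month volumes over the most
    recent K months (K <= M)."""
    assert show_k_months <= last_m_months
    window = [r for r in records if r[1] < last_m_months]

    # Total volume and total accesses per dataset within the M-month window.
    per_dataset = {}
    for name, month, volume, accesses in window:
        total_vol, total_acc = per_dataset.get(name, (0.0, 0))
        per_dataset[name] = (total_vol + volume, total_acc + accesses)

    # Keep only fully unaccessed datasets, ranked by volume, top N.
    unused = [(name, vol) for name, (vol, acc) in per_dataset.items() if acc == 0]
    ranked = sorted(unused, key=lambda x: x[1], reverse=True)[:n]

    # Per-month volumes over the last K months for the selected datasets.
    shown = {name: sorted((m, v) for d, m, v, a in window
                          if d == name and m < show_k_months)
             for name, _ in ranked}
    return ranked, shown

# Hypothetical records: ("A", month 0, 1.0 TB, 0 accesses), ...
records = [("A", 0, 1.0, 0), ("A", 1, 2.0, 0),
           ("B", 0, 5.0, 0), ("C", 0, 3.0, 4)]
ranked, shown = top_n_unused(records, n=2, last_m_months=2, show_k_months=1)
print(ranked)  # [('B', 5.0), ('A', 3.0)]
```

Exposing N, M and K as Zeppelin dynamic-form inputs is what turns this fixed Excel table into the interactive report described above.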
Figure 9. Scrutiny report by date: the top-20 slice in petabytes over the last 9
months, with the list of dataset names and the actual occupied volume of unaccessed
data.
9 Conclusion
The monitoring team handles millions of monitoring events every day from the CERN
data centers and the WLCG sites. Today this data is provided to users through a number
of different tools and dashboards. The aim of this contribution was to explore existing
open-source visualization techniques in order to provide enhanced dashboards. Two of
the most comprehensive and complete dashboard solutions are Kibana and Grafana.
We have performed a comparison between them and have also ported some
production dashboards from Kibana 3 to Kibana 4 and to Grafana in order to study the
features, performance and improvements of these displays.
The second contribution of the project is related to the visualization of the ATLAS
scrutiny report, which previously required manual intervention to produce the plots.
After this contribution, implemented in Apache Zeppelin, everything is fully automated,
eliminating the need for manual intervention.
The last contribution of the project is related to the porting of the Apache PIG scripts to
the Apache Spark computational framework, in order to accelerate the generation of the
aggregated summaries and to have direct Hadoop access that would be beneficial to
any future development of the report automation.
While contributing to the above projects within the CERN openlab summer internship, I
have studied many technologies, such as Elasticsearch, Kibana 3 & 4, Grafana, the
Apache Zeppelin notebook, Apache Hadoop, Apache PIG and the Spark framework.
This will have a very positive impact on my future data science career. Thank you,
CERN!
References
[1] P. Andrade et al., "Agile Infrastructure Monitoring", 2014 J. Phys.: Conf. Ser. 513
062001, doi:10.1088/1742-6596/513/6/062001
[2] J. Andreeva et al., "Experiment Dashboard - a generic, scalable solution for
monitoring of the LHC computing activities, distributed sites and services", 2012 J.
Phys.: Conf. Ser. 396 032093, doi:10.1088/1742-6596/396/3/032093
[3] "Elastic · Revealing Insights from Data (Formerly Elasticsearch) | Elastic", 2011,
accessed 15 Aug. 2016, <http://www.elastic.co/>
[4] J. Andreeva et al., "Processing of the WLCG monitoring data using NoSQL", 2014 J.
Phys.: Conf. Ser. 513 032048, doi:10.1088/1742-6596/513/3/032048
[5] "Kibana 4. Literally. | Elastic", 2015, accessed 15 Aug. 2016,
<http://www.elastic.co/blog/kibana-4-literally>
[6] "Grafana - Beautiful Metrics, Analytics, dashboards and monitoring!", 2014,
accessed 15 Aug. 2016, <http://grafana.org/>
[7] "Zeppelin", 2015, accessed 15 Aug. 2016, <https://zeppelin.apache.org/>
[8] "Apache Spark™ - Lightning-Fast Cluster Computing", 2014, accessed 15 Aug.
2016, <http://spark.apache.org/>
[9] C. Allton et al. (UK), "Computing Resources Scrutiny Group", 2016.