Fast and efficient operational time series storage:
The missing link in dynamic software analysis
Symposium on Software Performance
Munich, 05.11.2015
Florian Lautenschlager, Andreas Kumlehn, Josef Adersberger,
Michael Philippsen
Design for Diagnosability
This research was funded in part by the
Bavarian Ministry of Economic Affairs
and Media, Energy and Technology.
What is operational data?
■ Typical operational data are runtime metrics,
e.g. CPU load, memory consumption, logs, exceptions, etc.
■ Operational data is best represented as time series.
■ Continuously harvested along a multitude of dimensions.
■ Values are expected to span a wide range along each dimension.
■ Frequencies and time spans tend to vary a lot.
“…interactive response times often make a qualitative
difference in data exploration, monitoring, online customer
support, rapid prototyping, debugging of data pipelines,
and other tasks.” [ Dremel: Interactive Analysis of Web-Scale Datasets, Sergey Melnik et al. ]
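A typical toolchain for dynamic software analysis: collection
framework, time series storage, time series analysis framework
■ Collection frameworks (WRITE): Metrics, Kieker, collectD, Logstash, EKG Collector
■ Time series storage: Graphite, InfluxDB, OpenTSDB, Chronix
■ Analysis frameworks (READ): EKG Client, Kibana, Twitter - R, ETSY, EGADS
[Figure: toolchain diagram; one path in the diagram is labeled "Direct".]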
Research Question:
Is it possible to exploit the characteristic features of operational
data to create a time series database that requires less space
and provides faster queries?
Chronix
■ Fast queries
■ Efficient storage
■ Extendable with analysis functions
■ Store every kind of operational data as time series
■ Scalable and portable
Yes. Chronix’ architecture enables both efficient storage of time
series and millisecond range queries.
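(1) Semantic Compression
(2) Attributes and Chunks
(3) Basic Compression
(4) Multi-Dimensional Storage
[Figure: pipeline from time series to record storage. A record first holds data:<chunk> plus attributes, then data:compressed<chunk> plus attributes, and is finally written to the record storage. Example: 1 million points = 100 chunks × 10,000 points.]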
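The chunking arithmetic from the figure can be sketched as follows. This is a minimal illustration, not Chronix code: `split_into_chunks` and `chunk_size` are hypothetical names, and a plain `range` stands in for one million real (time stamp, value) points.

```python
def split_into_chunks(points, chunk_size=10_000):
    """Split a sequence of points into consecutive chunks of at most chunk_size."""
    return [points[i:i + chunk_size] for i in range(0, len(points), chunk_size)]


# Stand-in for 1 million points; a real series would hold (timestamp, value) pairs.
points = range(1_000_000)
chunks = split_into_chunks(points)

print(len(chunks))     # number of chunks
print(len(chunks[0]))  # points per chunk
```

Each chunk would then become the payload of one record, so the chunk size controls how many records a series produces.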
The key data type of Chronix is called a record.
It stores a compressed chunk of the time series and its
attributes.
record{
data:compressed{<chunk>}
//technical fields
id: 3dce1de0-...-93fb2e806d19
version: 1501692859622883300
start: 1427457011238
end: 1427471159292
//optional attributes
host: prodI5
process: scheduler
group: jmx
metric: heapMemory.Usage.Used
max: 896.571
}
Data:compressed{<chunk of time series data>}
■ Time Series: time stamp, numeric value
■ Traces: calls, exceptions, …
■ Logs: access, method runtimes
■ Complex data: models, test coverage, anything else…
Optional attributes
■ Arbitrary attributes for the time series
■ Attributes are indexed
■ Make the chunk searchable
■ Can contain pre-calculated values
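To make the record idea concrete, here is a minimal sketch of a record that stores a compressed chunk together with searchable attributes and a pre-calculated `max`. Everything in it is an assumption for illustration: the `Record` class, `build_record`, and `read_points` are hypothetical names, JSON plus zlib is a stand-in serialization (the slides do not say which encoding or compressor Chronix uses), and the technical fields `id` and `version` are omitted.

```python
import json
import zlib
from dataclasses import dataclass, field


@dataclass
class Record:
    data: bytes   # compressed chunk of (timestamp, value) points
    start: int    # first time stamp in the chunk (ms)
    end: int      # last time stamp in the chunk (ms)
    attributes: dict = field(default_factory=dict)  # optional, indexed attributes


def build_record(points, **attributes):
    """Build a record from a non-empty, time-ordered list of (timestamp, value) pairs."""
    payload = json.dumps(points).encode("utf-8")
    attrs = dict(attributes)
    # Pre-calculated value stored as an attribute, so it can be read
    # without decompressing the chunk.
    attrs["max"] = max(value for _, value in points)
    return Record(
        data=zlib.compress(payload),
        start=points[0][0],
        end=points[-1][0],
        attributes=attrs,
    )


def read_points(record):
    """Decompress a record's chunk back into (timestamp, value) tuples."""
    return [tuple(p) for p in json.loads(zlib.decompress(record.data))]


points = [(1427457011238, 512.0), (1427457012238, 896.571), (1427457013238, 640.25)]
record = build_record(points, host="prodI5", group="jmx", metric="heapMemory.Usage.Used")
print(record.attributes["max"], record.start, record.end)
```

A query on `host` or `metric` only has to inspect the attributes; the chunk itself is decompressed only for records that match.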
Chronix also provides aggregations and higher-level time series
analyses in its query language that other TSDBs do not.
Aggregations (ag)
■ Maximum
■ Minimum
■ Average
■ Standard Deviation
■ Percentile
Analyses (detect)
■ A trend analysis based on a linear regression model.
■ An outlier analysis using the IQR.
■ A frequency analysis validating the occurrence within a defined time range.
q=host:* AND -group:(jmx OR .net) & fq={!ANALYZE detect=frequency=10:6}
q=host:prod? AND group:(jmx OR .net) & fq={!ANALYZE ag=dev}
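The trend and outlier analyses named above can be illustrated with a short sketch. It is not the Chronix implementation: `trend_slope` and `iqr_outliers` are hypothetical names, the slope is an ordinary least-squares fit, and the outlier fence uses the common 1.5 × IQR rule, which is an assumption since the slides do not state the threshold.

```python
from statistics import quantiles


def trend_slope(timestamps, values):
    """Least-squares slope of values over timestamps; a positive slope means an upward trend."""
    n = len(timestamps)
    mean_t = sum(timestamps) / n
    mean_v = sum(values) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(timestamps, values))
    var = sum((t - mean_t) ** 2 for t in timestamps)
    return cov / var


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is an assumed default)."""
    if len(values) < 2:
        return []
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


print(trend_slope([0, 1, 2, 3], [1, 3, 5, 7]))
print(iqr_outliers([10, 11, 12, 13, 14, 15, 16, 100]))
```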
Benchmarks represent typical use cases in time series analysis.
The queries are collected from real-world analyses.
■ We have collected, arranged, and counted queries of real analyses.
■ Operational time series data from three real-world projects (14,195 time series, 512 million points):
■ Project 1: Web application for searching car information (8 web servers, 20 search servers)
■ Project 2: Retail application for orders, billing, and customer relations (2 servers, 1 central database)
■ Project 3: Sales application of a car manufacturer (2 servers, 1 central database)
Time Range (Days)   #Queries
1                   30
7                   30
14                  10
91                  2
We repeat the 72 queries 20 times to stabilize results.
Chronix outperforms related TSDBs in write throughput, storage
efficiency, and access times.
Chronix is open-source. Check http://www.chronix.io/ or @ChronixDB
Chronix is currently more a proof of concept than production-ready.
Work is ongoing!
Contact: florian.lautenschlager@qaware.de

