FOSDEM 2020: Querying over millions and billions of metrics with M3DB's index

@chronosphereio
Querying millions to billions of metrics with M3DB’s index
FOSDEM 2020

@roskilli
Previously M3 tech lead at Uber, creator of M3DB.
CTO at Chronosphere.
Member of OpenMetrics.

Monitoring: what is a metric?
Schema for data you would like to collect and aggregate
Name
● http_requests
Dimensions/Labels
● endpoint (e.g. /api/search)
● status_code (e.g. 500)
● deploy_version_git_sha (e.g. 25149a04c)
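A metric as described above is a name plus a set of label key/value pairs. The sketch below shows one way to turn that into a stable series identity; the `Metric` type and `ID` method are hypothetical, and sorting labels by key is an assumption made so that the same label set always produces the same ID.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Metric is a hypothetical in-memory representation of one timeseries:
// a metric name plus its dimension/label key-value pairs.
type Metric struct {
	Name   string
	Labels map[string]string
}

// ID builds a canonical string identity for the series. Label keys are
// sorted so the result does not depend on Go's map iteration order.
func (m Metric) ID() string {
	keys := make([]string, 0, len(m.Labels))
	for k := range m.Labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	parts := []string{"__name__=" + m.Name}
	for _, k := range keys {
		parts = append(parts, k+"="+m.Labels[k])
	}
	return strings.Join(parts, ",")
}

func main() {
	m := Metric{
		Name: "http_requests",
		Labels: map[string]string{
			"endpoint":               "/api/search",
			"status_code":            "500",
			"deploy_version_git_sha": "25149a04c",
		},
	}
	// Prints: __name__=http_requests,deploy_version_git_sha=25149a04c,endpoint=/api/search,status_code=500
	fmt.Println(m.ID())
}
```

Every distinct combination of label values yields a distinct ID, which is why each new label multiplies the number of series.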

Problem
1. Increasing number of regions, containers and k8s pods, plus tracking of the deployed version (cardinality!)
2. Metrics can have an arbitrary number of dimensions
3. Building a compound index is expensive
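The cardinality problem is multiplicative: the worst-case number of series for one metric name is the product of the distinct value counts of its labels. The sketch below computes that bound; the per-label counts are hypothetical numbers chosen for illustration, not figures from the talk.

```go
package main

import "fmt"

// worstCaseSeries returns the upper bound on distinct timeseries for one
// metric name: the product of the number of distinct values per label.
// Real cardinality is usually lower, since not every combination occurs.
func worstCaseSeries(distinctValues map[string]int) int {
	total := 1
	for _, n := range distinctValues {
		total *= n
	}
	return total
}

func main() {
	// Hypothetical distinct-value counts for the http_requests example.
	counts := map[string]int{
		"endpoint":               50,
		"status_code":            10,
		"deploy_version_git_sha": 20,
	}
	fmt.Println(worstCaseSeries(counts)) // 10000

	// Adding a pod label with 1,000 hypothetical values multiplies it again.
	counts["pod"] = 1000
	fmt.Println(worstCaseSeries(counts)) // 10000000
}
```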

Adding more metrics at organizations
1. We have monitoring, it’s awesome and developers are happy with mostly standardized metrics.
2. Developers put custom metrics on everything and I am deploying tons of applications in something like Kubernetes; things are ok!
3. Things are way too much on fire, we can’t manage this many things anymore, can everyone just stop please.

Timeseries
Timeseries from lots of hosts and container pods
ID   Timeseries
1    __name__=cpu_seconds_total, pod=foo-123abc
8    __name__=memory_memfree, pod=foo-123abc
33   __name__=cpu_seconds_total, pod=foo-456def
44   __name__=memory_memfree, pod=foo-456def
45   __name__=cpu_seconds_total, pod=bar-768ghe
58   __name__=memory_memfree, pod=bar-768ghe
… millions, and if you are unfortunate, billions

Aggregate metric cpu_seconds_total
[Figure: ID/Timeseries table of timeseries from lots of hosts and container pods]

cpu_seconds_total and pod=foo-(.+)
[Figure: ID/Timeseries table of timeseries from lots of hosts and container pods]

Need high flexibility and speed
1. Any arbitrary set of dimensions/labels can be specified for filtering
2. Ideally, lookup speed is sub-linear in the total number of timeseries

Timeseries column lookup
1. Secondary lookup using a prefix-ordered table
2. Secondary inverted index
(1) Prefix-ordered lookup table:
Labels                          Timeseries ID (fingerprint)
__name__=cpu, pod=foo-123abc    1
Timeseries data:
ID   Column key                      Col value
1    __name__=cpu, pod=foo-123abc    {t=...,v=...} ➡
2    __name__=cpu, pod=foo-456def    {t=...,v=...} ➡
3    __name__=cpu, pod=bar-123abc    {t=...,v=...} ➡
(2) Inverted index:
Label      Label value    Timeseries IDs
__name__   cpu            1, 2, 3
pod        foo-123abc     1
pod        foo-456def     2
pod        bar-123abc     3

Ways to keep timeseries index/data
1. Index and data live separately
Lookup and returning of timeseries data happen across processes, typically with a network request between the two operations.
2. Index and data live together
Lookup happens next to the timeseries data, and data is sent back directly once it matches the index query.

v1: M3 storage evolution (pre-open release, 2015)
[Figure: query nodes, Cassandra, Elasticsearch, an "already indexed" cache, a heavy read cache and a recently read cache; scale labels: >100 servers, >1,000 servers]
1. Fetch index (ES)
2. Fetch data (C*)


v2
[Figure: query nodes, M3DB (data on disk with LRU caches), Elasticsearch, an "already indexed" cache and a heavy read cache]
With M3DB, 7x fewer servers than with Cassandra, while increasing the replication factor from RF=2 to RF=3
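A rough worked reading of that number, under two assumptions not stated on the slide: the unique data volume stayed the same, and server count scales with the number of stored replicas. If Cassandra needed N servers at RF=2 and M3DB needed N/7 servers at RF=3, then the servers needed per replica of the data dropped by roughly 10.5x:

```latex
\frac{N / 2}{(N / 7) / 3} = \frac{7 \times 3}{2} = 10.5
```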


v4: M3 storage evolution (open release, 2018)
All read/write caches for data/index now in M3DB nodes
[Figure: query nodes reading directly from M3DB (data and index on disk with LRU caches)]

Inverted index w/ Prometheus
__name__
  cpu_seconds → Timeseries IDs 1, 33, 45
  mem_free → Timeseries IDs 8, 44, 58
pod
  foo-123abc → Timeseries IDs 1, 8
  foo-456def → Timeseries IDs 33, 44
  bar-123abc → Timeseries IDs 45, 58
https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/index.md

ID   Timeseries
1    __name__=cpu_seconds, pod=foo-123abc
8    __name__=mem_free, pod=foo-123abc
33   __name__=cpu_seconds, pod=foo-456def
44   __name__=mem_free, pod=foo-456def
45   __name__=cpu_seconds, pod=bar-123abc
58   __name__=mem_free, pod=bar-123abc
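The layout above maps each label name and value to a sorted list of timeseries IDs (postings), and a query with several label filters intersects those lists. The sketch below builds that structure from the table and intersects two postings lists with a linear merge. It is a hypothetical in-memory model (`invertedIndex`, `add`, `intersect` are illustrative names), not the on-disk Prometheus format.

```go
package main

import "fmt"

// invertedIndex maps label name -> label value -> sorted postings
// (timeseries IDs). A hypothetical in-memory model of the layout above.
type invertedIndex map[string]map[string][]int

// add registers a series. IDs must be added in increasing order so that
// every postings list stays sorted.
func (ix invertedIndex) add(id int, labels map[string]string) {
	for name, value := range labels {
		if ix[name] == nil {
			ix[name] = map[string][]int{}
		}
		ix[name][value] = append(ix[name][value], id)
	}
}

// intersect merges two sorted postings lists in O(len(a)+len(b)).
func intersect(a, b []int) []int {
	var out []int
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] == b[j]:
			out = append(out, a[i])
			i++
			j++
		case a[i] < b[j]:
			i++
		default:
			j++
		}
	}
	return out
}

func main() {
	rows := []struct {
		id   int
		name string
		pod  string
	}{
		{1, "cpu_seconds", "foo-123abc"},
		{8, "mem_free", "foo-123abc"},
		{33, "cpu_seconds", "foo-456def"},
		{44, "mem_free", "foo-456def"},
		{45, "cpu_seconds", "bar-123abc"},
		{58, "mem_free", "bar-123abc"},
	}
	ix := invertedIndex{}
	for _, r := range rows {
		ix.add(r.id, map[string]string{"__name__": r.name, "pod": r.pod})
	}
	// __name__="cpu_seconds" AND pod="foo-456def"
	fmt.Println(intersect(ix["__name__"]["cpu_seconds"], ix["pod"]["foo-456def"])) // [33]
}
```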

Labels (name and distinct values entries)

Postings/Timeseries IDs

Matching label values
https://github.com/prometheus/prometheus/blob/38d32e06862f6b72700f67043ce574508b5697f0/tsdb/querier.go#L417-L451
	vals, err := ix.LabelValues(m.Name)
	// ...
	var res []string
	for _, val := range vals {
		if m.Matches(val) {
			res = append(res, val)
		}
	}
	// ...
	return ix.Postings(m.Name, res...) // Merges postings/timeseries IDs together

Inverted index w/ M3
1. The inverted index is more similar to Elasticsearch & Apache Lucene.
2. Instead of storing distinct label values with their associated postings (as Prometheus does), it stores the distinct label values in an FST (Finite State Transducer).
3. Instead of storing postings/timeseries IDs as integer sets (one after another), it stores them as Roaring Bitmaps (compressed bitmaps) for fast intersection (across thousands of sets).
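To show why a bitmap representation makes intersection fast, the sketch below uses a plain, uncompressed bitset as a simplified stand-in for a Roaring Bitmap: one bit per timeseries ID, with intersection as a word-wise AND that handles 64 IDs per instruction. The `bitmap`, `fromIDs` and `and` names are hypothetical. Real Roaring Bitmaps split IDs into chunks keyed by the high 16 bits and choose an array, bitmap or run container per chunk, which this sketch does not do.

```go
package main

import (
	"fmt"
	"math/bits"
)

// bitmap is an uncompressed bitset: bit i is set if timeseries ID i is
// in the postings list. A simplified stand-in for a Roaring Bitmap.
type bitmap []uint64

func (b *bitmap) add(id uint32) {
	w := int(id / 64)
	for len(*b) <= w {
		*b = append(*b, 0)
	}
	(*b)[w] |= uint64(1) << (id % 64)
}

func fromIDs(ids ...uint32) bitmap {
	var b bitmap
	for _, id := range ids {
		b.add(id)
	}
	return b
}

// and intersects two bitmaps, 64 IDs at a time, with one AND per word.
func and(a, b bitmap) bitmap {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	out := make(bitmap, n)
	for i := 0; i < n; i++ {
		out[i] = a[i] & b[i]
	}
	return out
}

// ids lists the set bits in ascending order.
func (b bitmap) ids() []uint32 {
	var out []uint32
	for i, w := range b {
		for w != 0 {
			tz := bits.TrailingZeros64(w)
			out = append(out, uint32(i*64+tz))
			w &= w - 1 // clear the lowest set bit
		}
	}
	return out
}

func main() {
	cpu := fromIDs(1, 33, 45) // __name__=cpu_seconds
	foo := fromIDs(33, 44)    // pod=foo-456def
	fmt.Println(and(cpu, foo).ids()) // [33]
}
```

An uncompressed bitset wastes space when IDs are sparse; the container scheme in Roaring is what keeps memory low while preserving fast word-wise intersection.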

What is an FST?
Like a compressed trie.
A good overview and some examples are at https://blog.burntsushi.net/transducers/
Searching a data set of Wikipedia titles is more than 10x faster than grep.
This matters when you have billions of metrics, e.g. Uber, with 11 billion metrics.
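Since an FST is described here as a compressed trie, a plain trie is the simplest way to see the lookup pattern: walk down to the state for a prefix, then enumerate only the keys below it. The sketch below is that uncompressed trie over label values (`node`, `insert` and `withPrefix` are hypothetical names); it is not an FST, which additionally shares common suffixes between keys and attaches outputs to transitions.

```go
package main

import (
	"fmt"
	"sort"
)

// node is one state of a plain byte-level trie, a simplified stand-in
// for an FST (no suffix sharing, no outputs).
type node struct {
	children map[byte]*node
	terminal bool // true if a stored key ends at this state
}

func newNode() *node { return &node{children: map[byte]*node{}} }

func (n *node) insert(key string) {
	for i := 0; i < len(key); i++ {
		c, ok := n.children[key[i]]
		if !ok {
			c = newNode()
			n.children[key[i]] = c
		}
		n = c
	}
	n.terminal = true
}

// withPrefix walks to the prefix's state, then collects every key below
// it. Work depends on the prefix length and the number of matching keys,
// not on the total number of stored label values.
func (n *node) withPrefix(prefix string) []string {
	for i := 0; i < len(prefix); i++ {
		c, ok := n.children[prefix[i]]
		if !ok {
			return nil
		}
		n = c
	}
	var out []string
	var walk func(cur *node, path string)
	walk = func(cur *node, path string) {
		if cur.terminal {
			out = append(out, path)
		}
		for b, child := range cur.children {
			walk(child, path+string([]byte{b}))
		}
	}
	walk(n, prefix)
	sort.Strings(out) // map iteration order is random; sort for stable output
	return out
}

func main() {
	root := newNode()
	for _, v := range []string{"foo-123abc", "foo-456def", "bar-123abc"} {
		root.insert(v)
	}
	fmt.Println(root.withPrefix("foo-")) // [foo-123abc foo-456def]
}
```

Compared with the per-value matcher loop in the Prometheus excerpt, a query such as `pod=~"foo-.+"` can skip the whole `bar-` subtree without testing those values. FSTs can also be searched with automata such as regular expressions, as the linked post describes; this sketch only covers prefix search.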

Demo
https://github.com/chronosphereiox/high_cardinality_microbenchmark
Disclaimer: This only tests one part of much bigger systems, mainly to support architectural choices, not to measure real-world performance.

Thank you to M3 contributors:
…@chronosphere.io, …@uber.com, …@aiven.io, …@cloudera.com,
…@linkedin.com and many other great individuals!
Learn more (release 0.15.0 coming soon):
● Slack https://bit.ly/m3slack
● Mailing list https://groups.google.com/forum/#!forum/m3db
● GitHub https://github.com/m3db/m3
● Documentation https://m3db.io
● Chronosphere contact@chronosphere.io
Thank you, questions? Come say hi
