Never Fail Twice
How Playtech Mastered Failure Detection Across Distributed Systems
Bio
• Technical Architect with more than 18 years of experience
• Passionate about IT
• Financial and Data Science background
• Recent years spent on research and design projects
Agenda
• What observability and monitoring are
• Why this is hard
• Possible approaches
• How we solved it
• The future of instrumentation and observability
Objectives
• Get acquainted with time-series analysis
• Understand the pros and cons of distributed systems
• Understand observability and instrumentation concepts
Observability
• Monitoring is for operating software/systems
• Instrumentation is for writing software
• Observability is for understanding systems
Charity Majors
Why is it difficult
1. Various problems may lead to non-obvious system behaviour.
2. Various metrics may have different correlations in time and space.
3. Monitoring a complex application is a significant engineering endeavor in and of itself.
4. There is a mix of different measurements and metrics.
System monitoring at Playtech
• 50+ multi-brand sites distributed all over the world
• Multiple products
• Multichannel
• Different mixes of integrations
On the shoulders of giants
Many companies have built their own solutions for monitoring their systems.
Not all of them were success stories.
Etsy
• Etsy is a large online marketplace for handmade goods
• Their engineering team collected more than 250,000 different metrics from their servers
• They tried to find anomalies using complex mathematical approaches
Lessons learnt from KALE 1.0
Anomalies in other metrics should be used for root cause analysis.
Alerts should only be sent out when anomalies are detected in business and user metrics.
A one-size-fits-all type of approach will probably not fit at all.
Anomaly detection is more than just outlier detection.
Google SRE team’s BorgMon
• Google has trended toward simpler and faster monitoring systems, with better tools for post hoc analysis
• [They] avoid “magic” systems that try to learn thresholds or automatically detect causality
• Rules that generate alerts for humans should be simple to understand and represent a clear failure
According to the authors of Site Reliability Engineering
The Playtech case
The previous tool from HP was “one-size-fits-all”
Low efficiency and side effects
False positives and missed incidents
Horrible operability
Time Series
• A time series is a series of data points indexed (or listed or graphed) in time order
• Economic processes have a regular structure
• Examples: shop sales volumes, champagne production, online transactions
• Usually they have seasonal periods and trend lines
• Using this information simplifies analysis
Stationary Time-Series Data
• A stochastic process whose characteristics do not change over time
• Example: white noise
Non-Stationary Time Series
• Trend line
• Change in variance
How to model that?
• Every measurement consists of a signal and an error component (noise), because our processes are affected by many factors
• Point_of_measurement = signal + error
• Subtract the model’s values from our measurements
• The closer our model is to the real signal, the more closely the residual approximates the error component, that is, a stationary series or white noise
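
The subtraction step above can be sketched in a few lines of Python. This is only an illustration on synthetic data: the sampling interval, trend, and noise level are assumptions, and a straight line fitted with NumPy stands in for whatever model is actually used.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic 30-minute window sampled every 10 seconds (assumed values).
t = np.arange(180, dtype=float)
signal = 100.0 + 0.5 * t               # hypothetical linear trend
noise = rng.normal(0.0, 1.0, t.size)   # error component
measurements = signal + noise          # Point_of_measurement = signal + error

# Fit a straight line (degree-1 least squares) as the model of the signal.
slope, intercept = np.polyfit(t, measurements, deg=1)
model = slope * t + intercept

# Residual = measurement - model; it should resemble the noise.
residual = measurements - model

print(f"estimated slope: {slope:.3f}")
print(f"residual mean:   {residual.mean():.3f}")
print(f"residual std:    {residual.std():.3f}")
```

If the model is good, the residual has a mean near zero and a spread close to that of the injected noise.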
Example
Take a 30-minute slice of data
Regression, or finding a trend line
Trend line subtracted
Looks like white noise
Dickey-Fuller test on the initial slice of data
Not stationary: the unit-root hypothesis cannot be rejected
And after subtraction
The result is a stationary time series
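
To make the before/after comparison concrete, here is a simplified Dickey-Fuller statistic written with NumPy only. It is a sketch, not the test used for the slides: it includes a constant but no lagged differences, the data is synthetic, and the 5% critical value of about -2.86 is the usual asymptotic figure for the constant-only case. For real work, `statsmodels.tsa.stattools.adfuller` provides the augmented test with p-values.

```python
import numpy as np

def dickey_fuller_stat(y):
    """t-statistic of rho in the regression diff(y)[t] = c + rho * y[t-1] + e[t]."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(dy.size), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (dy.size - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

CRITICAL_5PCT = -2.86  # approximate asymptotic 5% value (assumption, see above)

# Synthetic slice: linear trend plus noise (assumed values).
rng = np.random.default_rng(0)
t = np.arange(180, dtype=float)
raw = 100.0 + 0.5 * t + rng.normal(0.0, 1.0, t.size)

# Subtract the fitted trend line.
slope, intercept = np.polyfit(t, raw, deg=1)
detrended = raw - (slope * t + intercept)

for name, series in [("raw", raw), ("detrended", detrended)]:
    stat = dickey_fuller_stat(series)
    verdict = "stationary" if stat < CRITICAL_5PCT else "unit root not rejected"
    print(f"{name:10s} stat={stat:8.2f} -> {verdict}")
```

The null hypothesis of the test is a unit root, so a statistic below the critical value is evidence of stationarity.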
Let’s take a moving average of our example
A bit of Salvador Dali
Compared with the next week’s data
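
The slides show only plots at this point. One plausible reading, sketched below under stated assumptions, is that a moving average of one week serves as the model and the following week is compared against it. The window size, the 4-sigma band, the daily shape, and the injected incident are all hypothetical.

```python
import numpy as np

def moving_average(values, window):
    """Centered moving average; `window` must be odd. Edges are padded."""
    pad = window // 2
    padded = np.pad(values, pad, mode="edge")
    return np.convolve(padded, np.ones(window) / window, mode="valid")

rng = np.random.default_rng(7)
points_per_day = 144                       # one sample every 10 minutes (assumed)
steps = np.arange(7 * points_per_day)
daily_shape = 50.0 + 20.0 * np.sin(2 * np.pi * steps / points_per_day)

this_week = daily_shape + rng.normal(0.0, 2.0, steps.size)
next_week = daily_shape + rng.normal(0.0, 2.0, steps.size)
next_week[500:506] -= 25.0                 # injected incident

# The smoothed week is the model; the band comes from its own residual spread.
model = moving_average(this_week, window=7)
band = 4.0 * np.std(this_week - model)

# Samples of the next week that leave the band are reported as violations.
violations = np.flatnonzero(np.abs(next_week - model) > band)
print("violating sample indices:", violations)
```

A band this simple may also flag an occasional noise point, which is one reason single-metric outliers alone should not trigger alerts.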
Why a time-series DB matters
Optimized for handling time-series data
No updates: facts never change
Append-only writes
Recent data is queried most often
InfluxDB is one of the best time-series databases
An Important Note
The first level involves searching for anomalies in metrics and sending out notifications if outliers are found.
This is the information emission level.
The second level involves receiving such information and deciding whether it represents real problems or outages.
This is the information consumption level.
Overall Architecture
• Python stack
• Built as a set of loosely coupled components
• Each component runs on its own Python virtual machine
• Event-driven design
Event Streamer
• A component that holds Workers, fetches data regularly, and tests this data against the statistical models managed by the Workers
• A Worker is the main working unit; it holds a set of models together with meta-information
• Workers are fully independent, and every cycle is executed using a thread pool
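
A minimal sketch of this design, assuming a much simpler Worker than the real one: each Worker has a hypothetical `fetch` callable as its data connector and a fixed expected value with a tolerance standing in for a statistical model. All class and function names here are illustrative, not the project's API. The default of 8 threads follows the speaker notes.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Violation:
    metric: str
    value: float
    expected: float

@dataclass
class Worker:
    """One metric plus its model and meta-information (hypothetical shape)."""
    metric: str
    fetch: Callable[[], float]   # data connector
    expected: float              # stand-in for a statistical model
    tolerance: float

    def run_cycle(self) -> Optional[Violation]:
        # Fetch the latest value and test it against the model.
        value = self.fetch()
        if abs(value - self.expected) > self.tolerance:
            return Violation(self.metric, value, self.expected)
        return None

def run_all(workers: List[Worker], max_threads: int = 8) -> List[Violation]:
    """Run one cycle of every Worker on a thread pool and collect violations."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        results = list(pool.map(Worker.run_cycle, workers))
    return [v for v in results if v is not None]

workers = [
    Worker("logins", fetch=lambda: 98.0, expected=100.0, tolerance=10.0),
    Worker("deposits", fetch=lambda: 12.0, expected=50.0, tolerance=15.0),
]
violations = run_all(workers)
print(violations)
```

Threads suit this workload because most of each cycle is spent waiting on I/O, so the Global Interpreter Lock is not the bottleneck. In the real system a violation would be published to a message queue rather than returned.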
Rule Engine
• Consumes the information provided by the Event Streamer
• Rules are built as an abstract syntax tree
• Around 1,500 matches per second in a single process
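
How rules might be represented as a syntax tree can be sketched as follows. The node types, the event shape (metric name mapped to a direction), and the metric names are assumptions made for illustration; the actual rule language is not shown in the slides.

```python
import re
from dataclasses import dataclass
from typing import Dict, Union

# Events seen for one site within one tick: metric name -> violation direction.
Events = Dict[str, str]

@dataclass(frozen=True)
class Match:
    """Leaf: true if any metric whose name matches `pattern` moved in `direction`."""
    pattern: str
    direction: str

@dataclass(frozen=True)
class And:
    left: "Node"
    right: "Node"

@dataclass(frozen=True)
class Or:
    left: "Node"
    right: "Node"

Node = Union[Match, And, Or]

def evaluate(node: Node, events: Events) -> bool:
    """Walk the tree recursively and decide whether the rule fires."""
    if isinstance(node, Match):
        return any(
            re.match(node.pattern, name) and direction == node.direction
            for name, direction in events.items()
        )
    if isinstance(node, And):
        return evaluate(node.left, events) and evaluate(node.right, events)
    if isinstance(node, Or):
        return evaluate(node.left, events) or evaluate(node.right, events)
    raise TypeError(f"unknown node: {node!r}")

# Hypothetical rule: logins fall while any error metric rises.
rule = And(Match(r"site1\.logins", "down"), Match(r"site1\.errors", "up"))
tick = {"site1.logins": "down", "site1.errors.5xx": "up"}
print(evaluate(rule, tick))
```

Building the tree once at start-up and evaluating it for every tick keeps the rules out of the code, so new rules need no code changes.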
We also measure dynamics
• We can take into account the speed and acceleration of the degradation of the metrics
• They correspond, respectively, to the severity and the predicted change in the severity of the incident
• Speed is the slope (angular coefficient), or discrete derivative, of a particular metric, calculated for every violation
• The same applies to acceleration, the second-order derivative
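
The two quantities can be computed with finite differences. The sketch below uses `numpy.diff`; the sample values and the 60-second step are made up for illustration.

```python
import numpy as np

def dynamics(values, step_seconds):
    """Return discrete first and second derivatives of a metric."""
    values = np.asarray(values, dtype=float)
    speed = np.diff(values) / step_seconds          # first-order difference
    acceleration = np.diff(speed) / step_seconds    # second-order difference
    return speed, acceleration

# Hypothetical metric that degrades faster and faster.
logins_per_min = [100.0, 98.0, 94.0, 86.0, 70.0]
speed, acceleration = dynamics(logins_per_min, step_seconds=60.0)

print("speed:       ", speed)
print("acceleration:", acceleration)
```

A negative speed means the metric is falling; a negative acceleration on top of that means the fall is speeding up, so the severity is expected to grow.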
Some examples of our rules
The model ensemble can be fine-tuned
A report is created for every alert
Alerta: an open-source product for alert aggregation
Non-Functional Aspects
What the future brings
Q&A
• Thank you very much
• aleks.tavgen@gmail.com
• https://medium.com/@ATavgen/time-series-modelling-a9bf4f467687
• https://medium.com/@ATavgen/never-fail-twice-608147cb49b

Monitoring Distributed Systems


Editor's Notes

  • #8: First, because of the high complexity of the products and the huge number of settings, there are situations where incorrect settings degrade financial indicators, or hidden bugs in the logic affect the overall functionality of the whole system. Second, there are specific third-party integrations for different countries, and problems that arise at partners start leaking to us. Problems of this kind are not caught by low-level monitoring; to solve them you need to monitor key performance indicators (KPIs), compare them with system statistics, and look for correlations. The company had previously deployed Hewlett Packard Service Health Analyzer, which was (to put it mildly) far from ideal. According to the marketing brochure, it is a self-learning system that provides early detection of problems (SHA can take data streams from multiple sources and apply advanced, predictive algorithms in order to alert to and diagnose potential problems before they occur). In practice it was a black box that could not be configured; every problem had to be raised with HP, followed by months of waiting until the support engineers delivered something that also did not work as needed. On top of that came a terrible user interface, an old JVM ecosystem (Java 6.0), and, most importantly, a large number of false positives and (even worse) false negatives: some serious problems were either not detected or were caught much later than they should have been, which translated into quite concrete financial losses.
  • #28: The experience of the Kale project points to a very important issue. Alerting is not the same as searching for anomalies and outliers in metrics because, as already mentioned, there will always be anomalies in individual metrics. In fact, we have two logical levels. The first is searching for anomalies in metrics and sending a notification of a violation if an anomaly is found. This is the information emission level. The second level is a component that receives information about violations and decides whether or not this is a critical incident. This is also how we humans act when we investigate a problem. We look at something, we look further when we find deviations from the norm, and then we make a decision based on what we observed. At the start of the project we decided to try Kapacitor, because it supports user-defined functions in Python. But each function sits in a separate process, which would have created overhead for hundreds and thousands of metrics. Because of this and some other problems we decided to drop it. Python was chosen as the main stack for building our own system, because it has an excellent ecosystem for data analysis, fast libraries (pandas, numpy, etc.), and excellent support for web solutions. You name it. For me this was the first big project done entirely in Python. I came to Python from the Java world. I did not want to multiply the zoo of stacks for a single system, and that ultimately paid off.
  • #29: Overall architecture. The system is built as a set of loosely coupled components or services that run in their own processes on their own Python VMs. This is natural for the overall logical split (events emitter / rules engine) and brings other advantages. Each component does a limited number of specific things. In the future this will make it possible to extend the system very quickly and add new user interfaces without touching the core logic and without fear of breaking it. The boundaries between components are fairly clear. Distributed deployment is convenient if you need to place an agent locally, closer to the site it monitors; alternatively, a large number of different systems can be aggregated together. Communication must be message-based, because the whole system has to be asynchronous. I chose ActiveMQ as the message queue, but switching to, for example, RabbitMQ would not be a problem, because all components communicate over the standard STOMP protocol.
  • #30: A Worker is the main working unit; it stores one model of one metric together with meta-information. It consists of a data connector and a handler to which it passes the data. The handler tests the data against the statistical model and, if it detects violations, passes them to an agent, which sends an event to the queue. Workers are fully independent of each other, and every cycle runs through a thread pool. Because most of the time is spent on I/O operations, Python's Global Interpreter Lock does not affect the result much. The number of threads is set in the config; on the current configuration the optimal number turned out to be 8 threads. Information emitter
  • #31: Information consuming. Every message is sent to the Rule Engine, and this is where the most interesting part begins. At the very start of development I hard-coded the rules: when one metric falls and another rises, send an alert. But this solution is not universal and requires digging into the code for any extension. So some language for defining rules was needed. Here I had to recall abstract syntax trees and define a simple language for describing rules. When the component starts, the rules are read and a syntax tree is built. On receiving each message, all events from that site within one tick are checked against the defined rules, and if a rule fires, an alert is generated. Several rules may fire. If we consider the dynamics of incidents evolving over time, we can also take into account the speed of the drop (the severity level) and the change in speed (a forecast of the change in severity).
  • #32: Speed is the slope (angular coefficient), or discrete derivative, computed for every violation. The same applies to acceleration, the discrete second-order derivative. These values can be specified in rules. Cumulative first- and second-order derivatives can be taken into account in the overall assessment of an incident.
  • #33: Rules are described in YAML, but any other format can be used; it is enough to add your own parser. Rules are specified as regular expressions over metric names or simply as metric prefixes. Speed is the rate of metric degradation; more on that below.
  • #37: SHA required an Oracle server with tens of GB of RAM. PT-Pas: 28962 lines of code. 1500 matches per second in one process, up to 125 000 matches in the current configuration (sharding probabilities)
  • #38: Non funfunc 28982 LOC