Mind the App
How to monitor your Kafka Streams applications
Bruno Cadonna, Kafka Summit 2021 Europe
Copyright 2021, Confluent, Inc. All rights reserved. This document may not be reproduced in any manner without the express written permission of Confluent, Inc.
About me
Bruno Cadonna
Contributor to Apache Kafka &
Software Developer at Confluent
Content
• Basics about metrics in Kafka
• Metrics in Kafka Streams
• KIP-444: Improving Kafka Streams’ metrics
• KIP-471 and KIP-607: RocksDB metrics
• KIP-613: End-to-end latency metrics
• Takeaways
Basics about metrics in Kafka
A metric in Kafka
• consists of a name, a value, and a configuration
• a metric name is composed of
  • name
  • group
  • tags
  • description
• a metric value can be any Object, e.g. an integral number, a decimal number, a string, …
• the metric config contains the recording level, which can be INFO, DEBUG, or TRACE
• example:
  • name: process-rate
  • group: stream-thread-metrics
  • tags: thread-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1
  • description: The average number of processed records per second
  • value: 123456.78
  • recording level: INFO
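The recording level decides which metrics are actually recorded. A minimal config sketch, assuming the standard `metrics.recording.level` client config; the class name, the application id, and the bootstrap servers are placeholders and not taken from the talk:

```java
import java.util.Properties;

public class RecordingLevelConfig {

    // Builds a Kafka Streams configuration that records metrics up to DEBUG.
    // "my-app" and "localhost:9092" are placeholder values.
    public static Properties streamsConfig() {
        final Properties props = new Properties();
        props.put("application.id", "my-app");
        props.put("bootstrap.servers", "localhost:9092");
        // Valid values are INFO (the default), DEBUG, and TRACE.
        // A more verbose level also records the metrics of the levels above it.
        props.put("metrics.recording.level", "DEBUG");
        return props;
    }

    public static void main(final String[] args) {
        System.out.println(streamsConfig().getProperty("metrics.recording.level"));
    }
}
```

The resulting `Properties` object would be passed to the Kafka Streams client on construction.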
A sensor in Kafka
• maintains a sequence of recorded values
• maintains a set of metrics
• each metric specifies an aggregation for the recorded values
• each time a value is recorded, all metrics in a sensor are updated
• example:
  • process-rate and process-total are recorded by the same sensor
  • process-rate computes the number of processed records per second
  • process-total computes the total number of processed records
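The relationship between a sensor and its metrics can be sketched with plain Java. This is a simplified model, not Kafka's `Sensor` class: all type names are made up for illustration, and the rate is computed over the whole time since the first recording, whereas Kafka uses sampled time windows.

```java
import java.util.ArrayList;
import java.util.List;

public class SensorSketch {

    // An aggregation over the values recorded by a sensor.
    public interface Stat {
        void record(double value, long nowMs);
        double measure(long nowMs);
    }

    // Running sum of all recorded values (the role of process-total).
    public static final class Total implements Stat {
        private double total;
        public void record(final double value, final long nowMs) { total += value; }
        public double measure(final long nowMs) { return total; }
    }

    // Recorded values per second since the first recording (the role of process-rate).
    public static final class Rate implements Stat {
        private double sum;
        private long startMs = -1;
        public void record(final double value, final long nowMs) {
            if (startMs < 0) startMs = nowMs;
            sum += value;
        }
        public double measure(final long nowMs) {
            final long elapsedMs = nowMs - startMs;
            return (startMs < 0 || elapsedMs <= 0) ? 0.0 : sum / (elapsedMs / 1000.0);
        }
    }

    // A sensor forwards every recorded value to all of its metrics.
    public static final class Sensor {
        private final List<Stat> stats = new ArrayList<>();
        public void add(final Stat stat) { stats.add(stat); }
        public void record(final double value, final long nowMs) {
            for (final Stat stat : stats) stat.record(value, nowMs);
        }
    }

    public static void main(final String[] args) {
        final Sensor processSensor = new Sensor();
        final Total processTotal = new Total();
        final Rate processRate = new Rate();
        processSensor.add(processTotal);
        processSensor.add(processRate);

        // One record processed at t = 0 ms, 1000 ms, and 2000 ms.
        processSensor.record(1, 0);
        processSensor.record(1, 1_000);
        processSensor.record(1, 2_000);

        System.out.println(processTotal.measure(3_000)); // 3.0
        System.out.println(processRate.measure(3_000));  // 1.0
    }
}
```

A single `record()` call updates both aggregations, which is why process-rate and process-total always stay consistent with each other.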
Metrics in Kafka Streams
Anatomy of a Kafka Streams application
• Kafka Streams client
  • stream thread 1, stream thread 2
    • task 1, task 2, task 3, task 4, task 5
      • processor node
      • state store
      • cache
How does Kafka Streams report metrics?
• the Kafka Streams client exposes metrics(), which returns a read-only map of metrics
• metrics reporters implement MetricsReporter
  • JMX reporter: registered by default, no need to set
  • my reporter: set in the Kafka Streams config metric.reporters
interface MetricsReporter {
    // excerpt: init() and close() omitted
    // called when a metric is added or updated
    void metricChange(KafkaMetric metric);
    // called when a metric is removed
    void metricRemoval(KafkaMetric metric);
}
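A custom reporter is registered by class name through the `metric.reporters` config, which takes a comma-separated list of classes. A minimal config sketch; `com.example.MyMetricsReporter` is a hypothetical class name that stands for any `MetricsReporter` implementation on the classpath:

```java
import java.util.Properties;

public class ReporterConfig {

    // Returns a copy of the given config with a custom metrics reporter added.
    // com.example.MyMetricsReporter is a hypothetical class name.
    public static Properties withCustomReporter(final Properties base) {
        final Properties props = new Properties();
        props.putAll(base);
        // Comma-separated list of reporter class names.
        props.put("metric.reporters", "com.example.MyMetricsReporter");
        return props;
    }

    public static void main(final String[] args) {
        final Properties props = withCustomReporter(new Properties());
        System.out.println(props.getProperty("metric.reporters"));
    }
}
```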
jconsole
(screenshot: jconsole MBean view of a Kafka Streams metric, highlighting the metric group, the thread-id tag, the metric name, the metric description, and the metric value)
Datadog
(screenshot: Datadog view of a Kafka Streams metric, highlighting the metric group, the tags, and the metric name)
What metrics does Kafka Streams expose?
• Kafka Streams client level:
  • name: state
  • group: stream-metrics
  • tags: client-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003
• stream thread level:
  • name: process-rate
  • group: stream-thread-metrics
  • tags: thread-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1
• task level:
  • name: process-latency-avg
  • group: stream-task-metrics
  • tags: thread-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1, task-id = 0_1
…some more metrics
• processor node level
  • name: process-rate
  • group: stream-processor-node-metrics
  • tags: thread-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1, task-id = 0_1, processor-node-id = KSTREAM-SINK-0000000004
• state store level
  • name: put-rate
  • group: stream-state-metrics
  • tags: thread-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1, task-id = 0_1, rocksdb-state-id = count-items
• cache level
  • name: hit-ratio-avg
  • group: stream-record-cache-metrics
  • tags: thread-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1, task-id = 0_1, record-cache-id = 0_1-count-items
… and finally
• all metrics of embedded consumers, producers, and admin client
  • name: last-rebalance-seconds-ago
  • group: consumer-coordinator-metrics
  • tags: client-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1-consumer
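With the JMX reporter, the group and tags listed above end up in the MBean object name, and each metric name becomes an attribute of that MBean. A minimal sketch using only the JDK's `javax.management` API, assuming the `kafka.streams` JMX domain and the `type=<group>` key under which Kafka Streams metrics are registered; the class and method names are illustrative:

```java
import java.lang.management.ManagementFactory;
import java.util.Hashtable;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class JmxMetricNames {

    // Builds the JMX object name for a metric group and one tag,
    // e.g. kafka.streams:type=stream-thread-metrics,thread-id=...
    public static ObjectName objectName(final String group, final String tagKey, final String tagValue) {
        final Hashtable<String, String> keys = new Hashtable<>();
        keys.put("type", group);
        keys.put(tagKey, tagValue);
        try {
            return new ObjectName("kafka.streams", keys);
        } catch (final MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Finds all MBeans of one metric group registered in this JVM.
    public static Set<ObjectName> mbeansOfGroup(final String group) {
        final MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        try {
            return server.queryNames(new ObjectName("kafka.streams:type=" + group + ",*"), null);
        } catch (final MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(final String[] args) {
        final ObjectName name = objectName(
            "stream-thread-metrics",
            "thread-id",
            "myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1");
        System.out.println(name.getKeyProperty("type"));
        // Empty unless a Kafka Streams client is running in this JVM.
        System.out.println(mbeansOfGroup("stream-thread-metrics"));
    }
}
```

For an MBean found this way, `MBeanServer#getAttribute(objectName, "process-rate")` would read the current metric value.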
KIP-444:
Improving Kafka Streams’ metrics
New metrics
• introduces client-level metrics
  • version
  • commit-id
  • application-id
  • topology-description
  • state
  • alive-stream-threads
• introduces new task-level metrics
  • active-process-ratio
  • standby-process-ratio (not yet implemented)
  • dropped-records
Refactorings
• renames some metric names and some metric tags
• records client-level and stream-thread-level metrics on INFO and most metrics on lower levels on DEBUG
• removes all parent metrics except one and lets users do the roll-up themselves
• removes overlapping metrics
  • dropped-records (task level, INFO) replaces
    • late-records-drop (processor node, INFO)
    • skipped-records (processor node, INFO)
    • expired-window-record-drop (state store, DEBUG)
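Since parent metrics were removed, a roll-up has to be computed by whoever consumes the metrics. A minimal sketch in plain Java that sums task-level values per stream thread; the `TaskMetric` type and the sample values are made up for illustration, and in practice the values would come from `metrics()` or from a reporter:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MetricRollUp {

    // One task-level measurement, identified by its thread-id and task-id tags.
    public static final class TaskMetric {
        final String threadId;
        final String taskId;
        final double value;

        public TaskMetric(final String threadId, final String taskId, final double value) {
            this.threadId = threadId;
            this.taskId = taskId;
            this.value = value;
        }
    }

    // Sums task-level values per stream thread, keeping first-seen thread order.
    public static Map<String, Double> sumPerThread(final List<TaskMetric> metrics) {
        final Map<String, Double> totals = new LinkedHashMap<>();
        for (final TaskMetric metric : metrics) {
            totals.merge(metric.threadId, metric.value, Double::sum);
        }
        return totals;
    }

    public static void main(final String[] args) {
        // Made-up dropped-records totals of three tasks on two threads.
        final List<TaskMetric> droppedRecords = Arrays.asList(
            new TaskMetric("StreamThread-1", "0_0", 2.0),
            new TaskMetric("StreamThread-1", "0_1", 3.0),
            new TaskMetric("StreamThread-2", "0_2", 1.0));
        System.out.println(sumPerThread(droppedRecords));
        // prints {StreamThread-1=5.0, StreamThread-2=1.0}
    }
}
```

Summing suits totals; rates and averages would need a different aggregation, which is one reason to leave the roll-up to the user.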
Improving custom metrics
• Sensor addLatencyRateTotalSensor(final String scopeName,
                                   final String entityName,
                                   final String operationName,
                                   final Sensor.RecordingLevel recordingLevel,
                                   final String... tags);
• Sensor addRateTotalSensor(final String scopeName,
                            final String entityName,
                            final String operationName,
                            final Sensor.RecordingLevel recordingLevel,
                            final String... tags);
• only available where you have access to the ProcessorContext
• you can add additional metrics to the sensor with Sensor#add()
Example of custom metrics
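import java.util.Locale;
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueStore;

public class WordCountProcessor implements Processor<String, String, String, String> {
    private ProcessorContext<String, String> context;
    private KeyValueStore<String, Integer> kvStore;
    private Sensor countEmptyRecords;

    @Override
    public void init(final ProcessorContext<String, String> context) {
        this.context = context;
        // registers a sensor with a rate and a total metric for empty records
        countEmptyRecords = context.metrics().addRateTotalSensor(
            "word-counter",
            "word-counter" + context.taskId(),
            "count-empty-messages",
            Sensor.RecordingLevel.INFO
        );
        kvStore = context.getStateStore("Counts");
    }

    @Override
    public void process(final Record<String, String> record) {
        // "".split(" ") yields one empty string, so check the value itself
        if (record.value() == null || record.value().trim().isEmpty()) {
            countEmptyRecords.record();
            return;
        }
        final String[] words = record.value().toLowerCase(Locale.getDefault()).split(" ");
        for (final String word : words) {
            final Integer oldValue = kvStore.get(word);
            if (oldValue == null) {
                kvStore.put(word, 1);
            } else {
                kvStore.put(word, oldValue + 1);
            }
        }
    }
}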
KIP-471 and KIP-607:
RocksDB metrics
RocksDB metrics
• RocksDB is the default state store in Kafka Streams
• statistics-based metrics (KIP-471, AK 2.4): cumulative measurements over time collected by RocksDB
  • name: bytes-written-rate
  • group: stream-state-metrics
  • tags: thread-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1, task-id = 0_1, rocksdb-state-id = count-items
• properties-based metrics (KIP-607, AK 2.7): properties exposed by RocksDB providing current measurements
  • name: block-cache-usage
  • group: stream-state-metrics
  • tags: thread-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1, task-id = 0_1, rocksdb-state-id = count-items
Recording RocksDB metrics
• statistics-based metrics
  • collecting statistics-based metrics may have an impact on performance
  • recording metrics during state store operations might be costly
  • instead, each state store has a metric recorder
  • all metric recorders are triggered once per minute by one dedicated thread that is started at Kafka Streams client start-up
• properties-based metrics
  • all properties-based metrics are gauges
  • a gauge executes some given code each time the metric is queried
  • properties-based metrics query RocksDB properties
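The difference between the two recording styles can be sketched in plain Java. This is a conceptual model only, not Kafka Streams' implementation: `FakeStore`, `Recorder`, and the counters are made up, and the recorder is triggered by hand here instead of by a dedicated thread.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

public class GaugeVersusRecorder {

    // Stand-in for a state store; both counters are made up for this sketch.
    public static final class FakeStore {
        public final AtomicLong bytesWritten = new AtomicLong();    // cumulative statistic
        public final AtomicLong blockCacheUsage = new AtomicLong(); // current property
    }

    // Statistics-based style: the recorder reads the cumulative statistic only
    // when it is triggered, not on every store operation.
    public static final class Recorder {
        private final FakeStore store;
        private long lastSeen;
        private long recordedDelta;

        public Recorder(final FakeStore store) { this.store = store; }

        public void record() {
            final long current = store.bytesWritten.get();
            recordedDelta = current - lastSeen;
            lastSeen = current;
        }

        public long lastRecordedDelta() { return recordedDelta; }
    }

    // Properties-based style: a gauge runs its code each time the metric is read.
    public static LongSupplier gauge(final FakeStore store) {
        return store.blockCacheUsage::get;
    }

    public static void main(final String[] args) {
        final FakeStore store = new FakeStore();
        final Recorder recorder = new Recorder(store);
        final LongSupplier blockCacheUsage = gauge(store);

        store.bytesWritten.addAndGet(100);
        recorder.record();
        store.bytesWritten.addAndGet(50);
        System.out.println(recorder.lastRecordedDelta()); // 100, the next 50 are not recorded yet
        recorder.record();
        System.out.println(recorder.lastRecordedDelta()); // 50

        store.blockCacheUsage.set(42);
        System.out.println(blockCacheUsage.getAsLong()); // 42
        store.blockCacheUsage.set(7);
        System.out.println(blockCacheUsage.getAsLong()); // 7, always the current value
    }
}
```

The recorder value can lag behind the store by up to one trigger interval, while the gauge pays its cost on every read.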
When to look at RocksDB metrics?
• high memory usage
  • size-all-mem-tables (properties-based)
  • block-cache-usage (properties-based)
  • block-cache-pinned-usage (properties-based)
  • estimate-table-readers-mem (properties-based)
• high disk usage
  • total-sst-files-size (properties-based)
• high disk I/O and write stalls
  • memtable-bytes-flushed-[rate | total] (statistics-based)
  • bytes-[read | written]-compaction-rate (statistics-based)
  • write-stall-duration-[avg | total] (statistics-based)
  • memtable-hit-ratio (statistics-based)
  • block-cache-[data | index | filter]-hit-ratio (statistics-based)
• too many open files
  • number-open-files (statistics-based)
• for more details, check out the blog post:
  How to Tune RocksDB for Your Kafka Streams Application
  https://www.confluent.io/blog/how-to-tune-rocksdb-kafka-streams-state-stores-performance/
KIP-613:
End-to-end latency metrics
End-to-end-latency metrics
(diagram: source node, filter, aggregation, sink node; each latency spans from the event time of a record to the processing time at the measuring node)
• consumption latency (INFO), measured at the source node
  • name: record-e2e-latency-[min | max | avg]
  • group: stream-processor-node-metrics
  • tags: thread-id = myapp-…, task-id = 0_1, processor-node-id = KSTREAM-SOURCE-0000000004
• begin-to-state latency (TRACE), measured at the state store
  • name: record-e2e-latency-[min | max | avg]
  • group: stream-state-metrics
  • tags: thread-id = myapp-…, task-id = 0_1, rocksdb-state-id = count-items
• full end-to-end latency (INFO), measured at the sink node
  • name: record-e2e-latency-[min | max | avg]
  • group: stream-processor-node-metrics
  • tags: thread-id = myapp-…, task-id = 0_1, processor-node-id = KSTREAM-SINK-0000000004
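Each of these latencies is the span from a record's event time to the processing time at the node that measures it. A minimal sketch of that computation in plain Java; `LatencyStats` and the timestamps are made up for illustration and do not mirror Kafka Streams' internal classes:

```java
public class E2eLatencySketch {

    // Tracks min, max, and average of observed latencies in milliseconds.
    public static final class LatencyStats {
        private long min = Long.MAX_VALUE;
        private long max = Long.MIN_VALUE;
        private long sum;
        private long count;

        // latency = processing time (wall clock at the node) - event time (record timestamp)
        public void record(final long eventTimeMs, final long processingTimeMs) {
            final long latency = processingTimeMs - eventTimeMs;
            min = Math.min(min, latency);
            max = Math.max(max, latency);
            sum += latency;
            count++;
        }

        public long min() { return min; }
        public long max() { return max; }
        public double avg() { return count == 0 ? Double.NaN : (double) sum / count; }
    }

    public static void main(final String[] args) {
        final LatencyStats atSource = new LatencyStats();
        final LatencyStats atSink = new LatencyStats();

        // Two records with event times 1000 ms and 2000 ms.
        atSource.record(1_000, 1_020); // consumed 20 ms after the event
        atSource.record(2_000, 2_060); // consumed 60 ms after the event
        atSink.record(1_000, 1_050);   // fully processed 50 ms after the event
        atSink.record(2_000, 2_150);   // fully processed 150 ms after the event

        System.out.println(atSource.min() + " " + atSource.max() + " " + atSource.avg()); // 20 60 40.0
        System.out.println(atSink.min() + " " + atSink.max() + " " + atSink.avg());       // 50 150 100.0
    }
}
```

Because both ends are timestamps from possibly different machines, the result also includes any clock skew between the producer of the record and the Kafka Streams host.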
End-to-end-latency metrics (advanced)
(diagram: task 1 and task 2, each consisting of source node, filter, aggregation, and sink node; latencies span from event time to processing time in each task, and the highlighted span is the processing delay of task 2)
Takeaways
• Kafka Streams exposes various metrics on different levels
• metrics were consolidated recently-ish
• RocksDB metrics let you gain insight into state stores
• Kafka Streams allows monitoring record end-to-end latencies
Thank you!
bruno@confluent.io
cnfl.io/slack
cnfl.io/blog
cnfl.io/meetups
cnfl.io/forum

More Related Content

What's hot (20)

PPTX
Kafka replication apachecon_2013
Jun Rao
 
PDF
Introducing the Apache Flink Kubernetes Operator
Flink Forward
 
PDF
The automation challenge: Kubernetes Operators vs Helm Charts
Ana-Maria Mihalceanu
 
PDF
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
SANG WON PARK
 
PDF
IoT Architectures for Apache Kafka and Event Streaming - Industry 4.0, Digita...
Kai Wähner
 
PDF
Delta from a Data Engineer's Perspective
Databricks
 
PDF
Getting Started with Apache Spark on Kubernetes
Databricks
 
PPTX
Functional Application Logging : Code Examples Using Spring Boot and Logback
Mohammad Sabir Khan
 
PPTX
Hive + Tez: A Performance Deep Dive
DataWorks Summit
 
PDF
Kafka High Availability in multi data center setup with floating Observers wi...
HostedbyConfluent
 
PPTX
Data Streaming with Apache Kafka & MongoDB
confluent
 
PPTX
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
Databricks
 
PDF
Introduction to Kafka Streams
Guozhang Wang
 
KEY
Big Data in Real-Time at Twitter
nkallen
 
PDF
Kafka Streams: What it is, and how to use it?
confluent
 
PPTX
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Flink Forward
 
PPTX
Kafka Connect - debezium
Kasun Don
 
PDF
ksqlDB - Stream Processing simplified!
Guido Schmutz
 
PDF
From Zero to Hero with Kafka Connect
confluent
 
PPTX
HBase and HDFS: Understanding FileSystem Usage in HBase
enissoz
 
Kafka replication apachecon_2013
Jun Rao
 
Introducing the Apache Flink Kubernetes Operator
Flink Forward
 
The automation challenge: Kubernetes Operators vs Helm Charts
Ana-Maria Mihalceanu
 
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
SANG WON PARK
 
IoT Architectures for Apache Kafka and Event Streaming - Industry 4.0, Digita...
Kai Wähner
 
Delta from a Data Engineer's Perspective
Databricks
 
Getting Started with Apache Spark on Kubernetes
Databricks
 
Functional Application Logging : Code Examples Using Spring Boot and Logback
Mohammad Sabir Khan
 
Hive + Tez: A Performance Deep Dive
DataWorks Summit
 
Kafka High Availability in multi data center setup with floating Observers wi...
HostedbyConfluent
 
Data Streaming with Apache Kafka & MongoDB
confluent
 
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
Databricks
 
Introduction to Kafka Streams
Guozhang Wang
 
Big Data in Real-Time at Twitter
nkallen
 
Kafka Streams: What it is, and how to use it?
confluent
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Flink Forward
 
Kafka Connect - debezium
Kasun Don
 
ksqlDB - Stream Processing simplified!
Guido Schmutz
 
From Zero to Hero with Kafka Connect
confluent
 
HBase and HDFS: Understanding FileSystem Usage in HBase
enissoz
 

Similar to Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna, Confluent (20)

PPTX
Data Pipelines with Kafka Connect
Kaufman Ng
 
PDF
A Practical Deep Dive into Observability of Streaming Applications with Kosta...
HostedbyConfluent
 
PDF
Introducing Kafka's Streams API
confluent
 
PPTX
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...
HostedbyConfluent
 
PPTX
What’s new in Apache Spark 2.3
DataWorks Summit
 
PDF
Confluent kafka meetupseattle jan2017
Nitin Kumar
 
PPT
Kubernetes for Cloud-Native Environments
AdiB912552
 
PDF
Deploying Kafka Streams Applications with Docker and Kubernetes
confluent
 
PDF
PortoTechHub - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
Timothy Spann
 
PDF
dA Platform Overview
Robert Metzger
 
PDF
Flink Forward San Francisco 2018: Robert Metzger & Patrick Lucas - "dA Platfo...
Flink Forward
 
PDF
Web Scale Reasoning and the LarKC Project
Saltlux Inc.
 
PDF
Au delà des brokers, un tour de l’environnement Kafka | Florent Ramière
confluent
 
PPTX
Building streaming data applications using Kafka*[Connect + Core + Streams] b...
Data Con LA
 
PDF
Concepts and Patterns for Streaming Services with Kafka
QAware GmbH
 
PDF
ОЛЕГ МАЦЬКІВ «Crash course on Operator Framework» Lviv DevOps Conference 2019
UA DevOps Conference
 
PDF
Apache spark 2.4 and beyond
Xiao Li
 
PDF
Resume2015
David Youngworth
 
PDF
Presentación11.pdf
PabloCanesta
 
PDF
Load Balancing in the Cloud using Nginx & Kubernetes
Lee Calcote
 
Data Pipelines with Kafka Connect
Kaufman Ng
 
A Practical Deep Dive into Observability of Streaming Applications with Kosta...
HostedbyConfluent
 
Introducing Kafka's Streams API
confluent
 
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...
HostedbyConfluent
 
What’s new in Apache Spark 2.3
DataWorks Summit
 
Confluent kafka meetupseattle jan2017
Nitin Kumar
 
Kubernetes for Cloud-Native Environments
AdiB912552
 
Deploying Kafka Streams Applications with Docker and Kubernetes
confluent
 
PortoTechHub - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
Timothy Spann
 
dA Platform Overview
Robert Metzger
 
Flink Forward San Francisco 2018: Robert Metzger & Patrick Lucas - "dA Platfo...
Flink Forward
 
Web Scale Reasoning and the LarKC Project
Saltlux Inc.
 
Au delà des brokers, un tour de l’environnement Kafka | Florent Ramière
confluent
 
Building streaming data applications using Kafka*[Connect + Core + Streams] b...
Data Con LA
 
Concepts and Patterns for Streaming Services with Kafka
QAware GmbH
 
ОЛЕГ МАЦЬКІВ «Crash course on Operator Framework» Lviv DevOps Conference 2019
UA DevOps Conference
 
Apache spark 2.4 and beyond
Xiao Li
 
Resume2015
David Youngworth
 
Presentación11.pdf
PabloCanesta
 
Load Balancing in the Cloud using Nginx & Kubernetes
Lee Calcote
 
Ad

Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna, Confluent

  • 1. Mind the App How to monitor your Kafka Streams applications Bruno Cadonna, Kafka Summit 2021 Europe
  • 2. About me: Bruno Cadonna, contributor to Apache Kafka & software developer at Confluent. Copyright 2021, Confluent, Inc. All rights reserved. This document may not be reproduced in any manner without the express written permission of Confluent, Inc.
  • 3. Content
      • Basics about metrics in Kafka
      • Metrics in Kafka Streams
      • KIP-444: Improving Kafka Streams’ metrics
      • KIP-471 and KIP-607: RocksDB metrics
      • KIP-613: End-to-end latency metrics
      • Takeaways
  • 9. A metric in Kafka
      • consists of a name, a value, and a configuration
      • a metric name is composed of: name, group, tags, description
      • a metric value inherits from the Object class, e.g. integral number, decimal number, string, …
      • metric config contains the recording level, which can be INFO, DEBUG, or TRACE
      • example:
          • name: process-rate
          • group: stream-thread-metrics
          • tags: thread-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1
          • description: The average number of processed records per second
          • value: 123456.78
          • recording level: INFO
  • 14. A sensor in Kafka
      • maintains a sequence of recorded values
      • maintains a set of metrics
      • each metric specifies an aggregation for the recorded values
      • each time a value is recorded, all metrics in a sensor are updated
      • example:
          • process-rate and process-total are recorded by the same sensor
          • process-rate computes the number of processed records over time
          • process-total computes the total number of processed records
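The sensor idea above can be sketched in plain Java. This is a simplified model written for illustration, not Kafka's `org.apache.kafka.common.metrics.Sensor`: the names `ToySensor`, `Stat`, `Total`, and `Rate` are hypothetical, and the rate is computed since the first recording rather than over sampled windows.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a sensor (assumption: NOT Kafka's Sensor class).
final class ToySensor {
    // An aggregation over the recorded values; one per metric.
    interface Stat {
        void record(double value, long nowMs);
        double value(long nowMs);
    }

    // Running sum of all recorded values (compare: process-total).
    static final class Total implements Stat {
        private double sum;
        public void record(double value, long nowMs) { sum += value; }
        public double value(long nowMs) { return sum; }
    }

    // Recorded amount per second since the first recording (compare: process-rate).
    static final class Rate implements Stat {
        private double sum;
        private long startMs = -1;
        public void record(double value, long nowMs) {
            if (startMs < 0) startMs = nowMs;
            sum += value;
        }
        public double value(long nowMs) {
            double elapsedSec = (nowMs - startMs) / 1000.0;
            return elapsedSec <= 0 ? 0.0 : sum / elapsedSec;
        }
    }

    private final List<Stat> stats = new ArrayList<>();

    void add(Stat stat) { stats.add(stat); }

    // A single recording updates every metric attached to this sensor.
    void record(double value, long nowMs) {
        for (Stat stat : stats) stat.record(value, nowMs);
    }

    public static void main(String[] args) {
        ToySensor sensor = new ToySensor();
        Total total = new Total();
        Rate rate = new Rate();
        sensor.add(total);
        sensor.add(rate);
        sensor.record(10, 0L);
        sensor.record(20, 1000L);
        System.out.println("total=" + total.value(2000L) + " rate=" + rate.value(2000L));
    }
}
```

Both statistics see every recorded value, which is why process-rate and process-total can share one sensor.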
  • 15. Metrics in Kafka Streams
  • 18. Anatomy of a Kafka Streams application [diagram labels: Kafka Streams client; stream thread 1, stream thread 2; task 1 to task 5; processor node; state store; cache]
  • 21. How does Kafka Streams report metrics?
      • Kafka Streams client: metrics() returns a read-only map of metrics
      • JMX reporter: implements MetricsReporter; registered by default, no need to set
      • my reporter: implements MetricsReporter; set via the Kafka Streams config metric.reporters
      • interface (excerpt):
          interface MetricsReporter {
              // called when a metric is added or updated
              void metricChange(KafkaMetric metric);
              // called when a metric is removed
              void metricRemoval(KafkaMetric metric);
          }
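A minimal sketch of the configuration side, using only `java.util.Properties`. The keys `metric.reporters` and `metrics.recording.level` are Kafka config names; the reporter class `com.example.MyReporter`, the application id, and the broker address are placeholders assumed for illustration. Depending on the Kafka version, listing a custom reporter may or may not keep the JMX reporter active, so it is worth checking the documentation for the release in use.

```java
import java.util.Properties;

final class MetricsConfigSketch {
    // Builds the properties one would pass to a Kafka Streams client.
    static Properties metricsProps() {
        Properties props = new Properties();
        props.put("application.id", "myapp");             // placeholder
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        // Comma-separated list of MetricsReporter implementations.
        // com.example.MyReporter is a hypothetical class name.
        props.put("metric.reporters", "com.example.MyReporter");
        // Lowest level that gets recorded: INFO, DEBUG, or TRACE.
        props.put("metrics.recording.level", "DEBUG");
        return props;
    }

    public static void main(String[] args) {
        metricsProps().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```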
  • 24. jconsole [screenshot; callouts: metric group, tag: thread-id, metric name, metric description, metric value]
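The way group and tags show up in a JMX browser can be sketched with the JDK's `javax.management.ObjectName`. The sketch assumes the naming scheme `domain:type=<group>,<tag>=<value>` with the metric name exposed as an attribute of that MBean; the helper `mbeanName` is hypothetical, and unlike a real reporter it does not quote or sanitize special characters in tag values.

```java
import java.util.Map;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

final class JmxNameSketch {
    // Builds an MBean name from a metric group and its tags (hypothetical helper).
    static ObjectName mbeanName(String domain, String group, Map<String, String> tags) {
        StringBuilder sb = new StringBuilder(domain).append(":type=").append(group);
        for (Map.Entry<String, String> tag : tags.entrySet()) {
            sb.append(',').append(tag.getKey()).append('=').append(tag.getValue());
        }
        try {
            return new ObjectName(sb.toString());
        } catch (MalformedObjectNameException e) {
            // Tag values containing ':' or ',' would need quoting first.
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        ObjectName name = mbeanName(
            "kafka.streams",
            "stream-thread-metrics",
            Map.of("thread-id", "myapp-StreamThread-1"));
        System.out.println(name);
    }
}
```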
  • 27. Datadog [screenshot; callouts: metric group, tags, metric name]
  • 30. What metrics does Kafka Streams expose?
      • Kafka Streams client level:
          • name: state
          • group: stream-metrics
          • tags: client-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003
      • stream thread level:
          • name: process-rate
          • group: stream-thread-metrics
          • tags: thread-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1
      • task level:
          • name: process-latency-avg
          • group: stream-task-metrics
          • tags: thread-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1, task-id = 0_1
  • 33. …some more metrics
      • processor node level:
          • name: process-rate
          • group: stream-processor-node-metrics
          • tags: thread-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1, task-id = 0_1, processor-node-id = KSTREAM-SINK-0000000004
      • state store level:
          • name: put-rate
          • group: stream-state-metrics
          • tags: thread-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1, task-id = 0_1, rocksdb-state-id = count-items
      • cache level:
          • name: hit-ratio-avg
          • group: stream-record-cache-metrics
          • tags: thread-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1, task-id = 0_1, record-cache-id = 0_1-count-items
  • 34. … and finally
      • all metrics of the embedded consumers, producers, and admin client, e.g.:
          • name: last-rebalance-seconds-ago
          • group: consumer-coordinator-metrics
          • tags: client-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1-consumer
  • 37. New metrics (KIP-444)
      • introduces client-level metrics: version, commit-id, application-id, topology-description, state, alive-stream-threads
      • introduces new task-level metrics: active-process-ratio, standby-process-ratio (not yet implemented), dropped-records
  • 38. Refactorings (KIP-444)
      • renames some metric names and some metric tags
      • client-level and stream thread-level metrics are recorded on INFO, most metrics on lower levels on DEBUG
      • removes all parent metrics except one and lets users do the roll-up themselves
      • removes overlapping metrics: dropped-records (task level, INFO) replaces
          • late-records-drop (processor node, INFO)
          • skipped-records (processor node, INFO)
          • expired-window-record-drop (state store, DEBUG)
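Doing the roll-up yourself could look like the following plain-Java sketch. The `Reading` class is a hypothetical flat view of a metric (name, tags, value) invented for this example; in a real application the readings would come from the client's metrics map or from the monitoring backend. Summing is appropriate for totals and rates, while averages and ratios would need a different aggregation.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class RollUpSketch {
    // Hypothetical flat view of one metric reading.
    static final class Reading {
        final String name;
        final Map<String, String> tags;
        final double value;

        Reading(String name, Map<String, String> tags, double value) {
            this.name = name;
            this.tags = tags;
            this.value = value;
        }
    }

    // Sums all readings of one metric, grouped by the value of one tag,
    // e.g. task-level dropped-records-total rolled up per thread-id.
    static Map<String, Double> sumByTag(List<Reading> readings, String metricName, String tagKey) {
        Map<String, Double> rolledUp = new TreeMap<>();
        for (Reading r : readings) {
            String tagValue = r.tags.get(tagKey);
            if (r.name.equals(metricName) && tagValue != null) {
                rolledUp.merge(tagValue, r.value, Double::sum);
            }
        }
        return rolledUp;
    }

    public static void main(String[] args) {
        List<Reading> readings = List.of(
            new Reading("dropped-records-total", Map.of("thread-id", "T1", "task-id", "0_0"), 3.0),
            new Reading("dropped-records-total", Map.of("thread-id", "T1", "task-id", "0_1"), 4.0),
            new Reading("dropped-records-total", Map.of("thread-id", "T2", "task-id", "0_2"), 5.0));
        System.out.println(sumByTag(readings, "dropped-records-total", "thread-id"));
    }
}
```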
  • 41. Improving custom metrics
      • Sensor addLatencyRateTotalSensor(final String scopeName, final String entityName, final String operationName, final Sensor.RecordingLevel recordingLevel, final String... tags);
      • Sensor addRateTotalSensor(final String scopeName, final String entityName, final String operationName, final Sensor.RecordingLevel recordingLevel, final String... tags);
      • only available where you have access to the ProcessorContext
      • you can add additional metrics to the sensor with Sensor#add()
  • 42. Example of custom metrics
      import org.apache.kafka.common.metrics.Sensor;
      import org.apache.kafka.streams.processor.api.Processor;
      import org.apache.kafka.streams.processor.api.ProcessorContext;
      import org.apache.kafka.streams.processor.api.Record;
      import org.apache.kafka.streams.state.KeyValueStore;

      import java.util.Locale;

      // Counts words per record and tracks empty records with a custom sensor.
      public class WordCountProcessor implements Processor<String, String, String, String> {
          private ProcessorContext<String, String> context;
          private KeyValueStore<String, Integer> kvStore;
          private Sensor countEmptyRecords;

          @Override
          public void init(final ProcessorContext<String, String> context) {
              this.context = context;
              // Registers a rate and a total metric for empty records.
              countEmptyRecords = context.metrics().addRateTotalSensor(
                  "word-counter",
                  "word-counter" + context.taskId(),
                  "count-empty-messages",
                  Sensor.RecordingLevel.INFO
              );
              kvStore = context.getStateStore("Counts");
          }

          @Override
          public void process(final Record<String, String> record) {
              final String value = record.value();
              // "".split(" ") yields one empty string, so test the value itself.
              if (value == null || value.trim().isEmpty()) {
                  countEmptyRecords.record();
                  return;
              }
              final String[] words = value.toLowerCase(Locale.getDefault()).split(" ");
              for (final String word : words) {
                  final Integer oldValue = kvStore.get(word);
                  if (oldValue == null) {
                      kvStore.put(word, 1);
                  } else {
                      kvStore.put(word, oldValue + 1);
                  }
              }
          }
      }
  • 46. RocksDB metrics
      • RocksDB is the default state store in Kafka Streams
      • statistics-based metrics (KIP-471, AK 2.4): cumulative measurements over time collected by RocksDB
          • name: bytes-written-rate
          • group: stream-state-metrics
          • tags: thread-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1, task-id = 0_1, rocksdb-state-id = count-items
      • properties-based metrics (KIP-607, AK 2.7): properties exposed by RocksDB providing current measurements
          • name: block-cache-usage
          • group: stream-state-metrics
          • tags: thread-id = myapp-2d0b492c-87f1-11eb-8dcd-0242ac130003-StreamThread-1, task-id = 0_1, rocksdb-state-id = count-items
  • 48. Recording RocksDB metrics
      • statistics-based metrics
          • collecting statistics-based metrics may have an impact on performance
          • recording metrics during state store operations might be costly
          • instead, each state store has a metric recorder
          • all metric recorders are triggered once per minute by one dedicated thread that is started at Kafka Streams client start-up
      • properties-based metrics
          • all properties-based metrics are gauges
          • a gauge executes some given code each time the metric is queried
          • properties-based metrics query RocksDB properties
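The gauge behaviour described above can be illustrated with a small plain-Java sketch. The `Gauge` class here is a hypothetical stand-in, not Kafka's gauge type, and the `AtomicLong` in `main` merely simulates a RocksDB property such as the block cache usage.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

final class GaugeSketch {
    // A gauge stores no value of its own; it runs the supplied code on every query.
    static final class Gauge {
        private final LongSupplier source;

        Gauge(LongSupplier source) {
            this.source = source;
        }

        long value() {
            return source.getAsLong();
        }
    }

    public static void main(String[] args) {
        // Simulated property value (assumption: stands in for a RocksDB property).
        AtomicLong blockCacheUsage = new AtomicLong(100);
        Gauge gauge = new Gauge(blockCacheUsage::get);
        System.out.println(gauge.value());
        blockCacheUsage.set(250);
        // The next query sees the new value without any explicit recording step.
        System.out.println(gauge.value());
    }
}
```

Because the value is computed at query time, the cost of a properties-based metric is paid by whoever reads it rather than on every state store operation.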
  • 53. When to look at RocksDB metrics? [legend: the slide marks each metric as statistics-based or properties-based]
      • high memory usage
          • size-all-mem-tables
          • block-cache-usage
          • block-cache-pinned-usage
          • estimate-table-readers-mem
      • high disk usage
          • total-sst-files-size
      • high disk I/O and write stalls
          • memtable-bytes-flushed-[rate | total]
          • bytes-[read | written]-compaction-rate
          • write-stall-duration-[avg | total]
          • memtable-hit-ratio
          • block-cache-[data | index | filter]-hit-ratio
      • too many open files
          • number-open-files
      • for more details, check out the blog post: How to Tune RocksDB for Your Kafka Streams Application, https://www.confluent.io/blog/how-to-tune-rocksdb-kafka-streams-state-stores-performance/
  • 58. End-to-end-latency metrics [diagram: source node, filter, aggregation, sink node; each latency is drawn between event time and processing time]
      • consumption latency (INFO)
          • name: record-e2e-latency-[min | max | avg]
          • group: stream-processor-node-metrics
          • tags: thread-id = myapp-…, task-id = 0_1, processor-node-id = KSTREAM-SOURCE-0000000004
      • begin-to-state latency (TRACE)
          • name: record-e2e-latency-[min | max | avg]
          • group: stream-state-metrics
          • tags: thread-id = myapp-…, task-id = 0_1, rocksdb-state-id = count-items
      • full end-to-end latency (INFO)
          • name: record-e2e-latency-[min | max | avg]
          • group: stream-processor-node-metrics
          • tags: thread-id = myapp-…, task-id = 0_1, processor-node-id = KSTREAM-SINK-0000000004
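As a rough illustration of how min, max, and avg relate to event time and processing time, the following plain-Java sketch treats the latency of a record as the processing time at a node minus the record's event time. The class `LatencyStats` is hypothetical and only models the arithmetic; it is not how Kafka Streams implements these metrics internally.

```java
final class E2eLatencySketch {
    // Tracks min, max, and average of (processing time - event time).
    static final class LatencyStats {
        private long min = Long.MAX_VALUE;
        private long max = Long.MIN_VALUE;
        private long sum;
        private long count;

        // eventTimeMs: timestamp carried by the record;
        // processingTimeMs: wall-clock time when the node handles the record.
        void record(long eventTimeMs, long processingTimeMs) {
            long latency = processingTimeMs - eventTimeMs;
            min = Math.min(min, latency);
            max = Math.max(max, latency);
            sum += latency;
            count++;
        }

        long min() { return min; }
        long max() { return max; }
        double avg() { return count == 0 ? Double.NaN : (double) sum / count; }
    }

    public static void main(String[] args) {
        LatencyStats sinkNode = new LatencyStats();
        sinkNode.record(1_000L, 1_040L);
        sinkNode.record(2_000L, 2_100L);
        System.out.println(sinkNode.min() + " " + sinkNode.max() + " " + sinkNode.avg());
    }
}
```

Keeping one such tracker per source node, state store, and sink node mirrors the three measurement points listed on the slide.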
  • 60. End-to-end-latency metrics (advanced) [diagram: task 1 and task 2, each with source node, filter, aggregation, sink node and its own event time and processing time; callout: processing delay of task 2]
  • 62. Takeaways
      • Kafka Streams exposes various metrics on different levels
      • metrics were consolidated recently-ish
      • RocksDB metrics let you gain insight into state stores
      • Kafka Streams allows monitoring record end-to-end latencies