What to Do if Your
Kafka Streams App Gets OOMKilled?
Andrey Serebryanskiy
Streaming Platform Owner at Raiffeisen Bank
The problem
[Diagram: Kafka, a Kafka Streams app with RocksDB, running on Kubernetes]
What is the problem with this app?
Launch with resource limits
helm/templates/deployment.yaml
...
resources:
  limits:
    memory: 256Mi
  requests:
    memory: 128Mi
command:
  - java
args:
  - -jar
  - app.jar
...
How to check if it is OOMKilled?
kubectl describe pod your-pod-name -n your-namespace
Name: your-pod-name
...
Containers:
  app:
    ...
    Last State: Terminated
      Reason: OOMKilled
      Exit Code: 137
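Exit code 137 follows the usual convention of 128 plus the signal number, so it corresponds to signal 9 (SIGKILL). A minimal Java sketch of that arithmetic; the class and method names are hypothetical and exist only for illustration:

```java
public class ExitCodes {
    /** Returns the signal number encoded in an exit code, or -1 if no signal is encoded. */
    static int signalFromExitCode(int exitCode) {
        // By convention, exit codes above 128 mean "terminated by signal (exitCode - 128)".
        return exitCode > 128 ? exitCode - 128 : -1;
    }

    public static void main(String[] args) {
        int signal = signalFromExitCode(137);
        System.out.println("signal = " + signal); // 9 is SIGKILL
    }
}
```

A SIGKILL can have other causes too, so the Reason: OOMKilled field is the decisive one; the exit code only confirms how the process ended.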
Example app
Simple Kafka Streams topology
Application.java
public static void main(String[] args) {
    var builder = new StreamsBuilder();
    var stream = builder.stream(INPUT_TOPIC);
    // Register a persistent (RocksDB-backed) key-value store.
    // The serdes are an assumption; the slide omits them.
    builder.addStateStore(Stores.keyValueStoreBuilder(
        Stores.persistentKeyValueStore(STATE_STORE_NAME),
        Serdes.String(), Serdes.String()));
    // Write every record into the state store
    var persistedStream = stream.transformValues((readOnlyKey, val) -> {
        ...
        stateStore.put(readOnlyKey, val);
        ...
    }, STATE_STORE_NAME);
    persistedStream.foreach((key, val) -> logMessageCount());
    var topology = builder.build();
    var kafkaStreams = new KafkaStreams(topology);
    runApp(kafkaStreams);
}
Please find the full application code here:
https://github.com/a-serebryanskiy/kafka-streams-oom-killed
So you have your app OOMKilled
Let’s add heap limits
helm/templates/deployment.yaml
...
resources:
  limits:
    memory: 256Mi
  requests:
    memory: 128Mi
command:
  - java
args:
  - -XshowSettings:VM
  - -XX:MinRAMPercentage=50.0
  - -jar
  - app.jar
...
https://www.baeldung.com/java-jvm-parameters-rampercentage
VM settings:
    Max. Heap Size (Estimated): 121.81M
Property settings:
    java.version = 11.0.12
It turns out that 50.0 is already the default value of MinRAMPercentage, so the flag changes nothing.
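To confirm at runtime which heap limit the JVM actually picked, the application can log it itself. A minimal sketch using only the standard library; the class name HeapInfo is hypothetical, and the printed value should be in the same ballpark as the estimate shown by -XshowSettings:VM:

```java
public class HeapInfo {
    /** Maximum heap size, in bytes, that this JVM will attempt to use. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        double mib = maxHeapBytes() / (1024.0 * 1024.0);
        System.out.printf("Max heap: %.2f MiB%n", mib);
    }
}
```

Logging this once at startup makes it easy to compare the heap limit with the container limit shown in the deployment.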
App memory performance
[Grafana chart: container memory limit, container memory usage, and JVM memory; annotation: 100 MB]
Kafka Streams memory usage
Kafka Streams app memory
JVM Heap + RocksDB
Confluent article about Kafka Streams memory tuning:
https://docs.confluent.io/platform/current/streams/developer-guide/memory-mgmt.html
[Diagram: the Kafka Streams app calls put(key, val) on the state store, which issues a native put(key, val) to RocksDB]
JVM heap:
- Kafka Streams app
- CachingKeyValueStore (ThreadCache, context.cache())
Not controlled by the JVM (State Store - RocksDB):
- indexes
- bloom filters
- block cache
- OS page cache
- memtable
! The size of these items depends on the number of unique keys
RocksDB memory details: https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB
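The split above explains why the JVM memory line can stay flat while container usage keeps growing. A minimal sketch of what the JVM itself can report, using the standard java.lang.management API (the class name JvmMemoryView is hypothetical); memory that RocksDB allocates natively through JNI shows up in neither figure:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class JvmMemoryView {
    private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();

    /** Heap currently used by Java objects, in bytes. */
    static long usedHeapBytes() {
        return MEMORY.getHeapMemoryUsage().getUsed();
    }

    /** JVM-managed non-heap memory (metaspace, code cache and similar), in bytes. */
    static long usedNonHeapBytes() {
        return MEMORY.getNonHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        System.out.println("heap used:     " + usedHeapBytes() + " bytes");
        System.out.println("non-heap used: " + usedNonHeapBytes() + " bytes");
        // Native allocations made by RocksDB are not included in either number.
    }
}
```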
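How to fix unbounded RocksDB memory usage?
Application.java
properties.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG,
    "your.package.BoundedMemoryRocksDBConfig");
your.package.BoundedMemoryRocksDBConfig.java
@Override
public void setConfig(..., Options options, ...) {
    // tableFormatConfig() returns the generic TableFormatConfig, so a cast is needed
    BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig();
    // Block cache with a fixed capacity (-1 = default shard count, false = no strict limit)
    Cache cache = new LRUCache(computeTotalRocksDbMem(), -1, false);
    tableConfig.setBlockCache(cache);
    // Count index and filter blocks against the block cache as well
    tableConfig.setCacheIndexAndFilterBlocks(true);
    // Limit memtable (write buffer) memory via a WriteBufferManager defined elsewhere
    options.setWriteBufferManager(writeBufferManager);
    ...
}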
Pay attention to the number of stores and partitions:
https://docs.confluent.io/platform/current/streams/developer-guide/memory-mgmt.html#rocksdb
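The reason stores and partitions matter: Kafka Streams opens at least one RocksDB instance per store per assigned partition, and the config setter runs for each of them. If every call creates its own cache, the limit applies per instance rather than per application. A rough Java sketch of that arithmetic; all names and numbers here are hypothetical:

```java
public class RocksDbBudget {
    /** Worst-case total if every RocksDB instance gets its own cache of perInstanceBytes. */
    static long worstCaseTotalBytes(long perInstanceBytes, int stores, int partitions) {
        return perInstanceBytes * stores * partitions;
    }

    /** Per-instance budget that keeps the sum within totalBytes when caches are not shared. */
    static long perInstanceBudgetBytes(long totalBytes, int stores, int partitions) {
        return totalBytes / ((long) stores * partitions);
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        // Hypothetical setup: 1 store, 4 partitions assigned to this application instance.
        System.out.println(worstCaseTotalBytes(40 * mib, 1, 4) / mib + " MiB worst case");
        System.out.println(perInstanceBudgetBytes(40 * mib, 1, 4) / mib + " MiB per instance");
    }
}
```

The Confluent guide linked above sidesteps this split by keeping the cache and the write buffer manager in static fields shared by all instances.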
How to compute my RocksDB memory?
Dynamic memory allocation
your.package.BoundedMemoryRocksDBConfig.java
private static long computeTotalRocksDbMem() {
    // Container limit in bytes, plus the shares reserved for the OS and the JVM
    long totalContainerMemoryBytes = Long.parseLong(System.getenv("CONTAINER_MEMORY_LIMIT"));
    double osPercentage = Double.parseDouble(System.getenv("OS_MEMORY_PERCENTAGE"));
    long offHeapSizeMb = Long.parseLong(System.getenv("OFF_HEAP_SIZE_MB"));
    long maxHeapSizeMb = Long.parseLong(System.getenv("MAX_HEAP_SIZE_MB"));
    long jvmMemoryBytes = (offHeapSizeMb + maxHeapSizeMb) * 1024 * 1024;
    // Whatever is left after the OS share and the JVM goes to RocksDB
    return (long) (totalContainerMemoryBytes * (1 - osPercentage)) - jvmMemoryBytes;
}
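To make the formula concrete, here is a self-contained sketch that runs it with the values used elsewhere in this deck (256Mi limit, 0.1 for the OS, 128 MB off-heap, 64 MB heap). Passing the environment as a Map and the guard against a non-positive result are assumptions added for illustration, not part of the original code:

```java
import java.util.Map;

public class RocksDbMemoryCalc {
    static long computeTotalRocksDbMem(Map<String, String> env) {
        long totalContainerMemoryBytes = Long.parseLong(env.get("CONTAINER_MEMORY_LIMIT"));
        double osPercentage = Double.parseDouble(env.get("OS_MEMORY_PERCENTAGE"));
        long offHeapSizeMb = Long.parseLong(env.get("OFF_HEAP_SIZE_MB"));
        long maxHeapSizeMb = Long.parseLong(env.get("MAX_HEAP_SIZE_MB"));
        long jvmMemoryBytes = (offHeapSizeMb + maxHeapSizeMb) * 1024 * 1024;
        long result = (long) (totalContainerMemoryBytes * (1 - osPercentage)) - jvmMemoryBytes;
        if (result <= 0) {
            // Assumed guard: fail fast instead of handing RocksDB a meaningless size.
            throw new IllegalStateException("No memory left for RocksDB: " + result + " bytes");
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> env = Map.of(
            "CONTAINER_MEMORY_LIMIT", "268435456", // 256Mi expressed in bytes
            "OS_MEMORY_PERCENTAGE", "0.1",
            "OFF_HEAP_SIZE_MB", "128",
            "MAX_HEAP_SIZE_MB", "64");
        System.out.println(computeTotalRocksDbMem(env)); // 40265318 bytes, about 38.4 MiB
    }
}
```

With these inputs, 90% of the container is about 230.4 MiB, the JVM takes 192 MiB, and roughly 38.4 MiB remains for RocksDB.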
About using container props as env vars:
https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/
helm/templates/deployment.yaml
...
env:
  - name: CONTAINER_MEMORY_LIMIT
    valueFrom:
      resourceFieldRef:
        containerName: app
        resource: limits.memory
...
helm/templates/deployment.yaml
env:
  - name: OS_MEMORY_PERCENTAGE
    value: "0.1"
  # computed based on the jcmd output
  - name: OFF_HEAP_SIZE_MB
    value: "128"
  - name: MAX_HEAP_SIZE_MB
    value: "{{ .Values.heapSizeMb }}"
If you would like to analyze non-heap JVM memory
1. Make sure you use a JDK (not JRE) Docker image.
2. Add -XX:NativeMemoryTracking=summary to the JVM args:
helm/templates/deployment.yaml
command:
  - java
args:
  - -XX:NativeMemoryTracking=summary
  - -jar
  - app.jar
3. Execute the command in the container shell:
bash
kubectl exec pod/your-pod-name -n your-namespace -it -- /bin/bash -c "jcmd 1 VM.native_memory"
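One way to turn the jcmd summary into an off-heap figure is to subtract the committed Java heap from the committed total. The sketch below assumes the summary contains lines shaped like "Total: reserved=...KB, committed=...KB" and "Java Heap (reserved=...KB, committed=...KB)"; the exact layout can differ between JDK versions, the class name is hypothetical, and the sample numbers are invented for illustration rather than measured:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NmtSummary {
    private static final Pattern TOTAL =
        Pattern.compile("Total: reserved=(\\d+)KB, committed=(\\d+)KB");
    private static final Pattern HEAP =
        Pattern.compile("Java Heap \\(reserved=(\\d+)KB, committed=(\\d+)KB\\)");

    /** Extracts the committed size in KB from the first line matching the pattern. */
    static long committedKb(Pattern pattern, String output) {
        Matcher m = pattern.matcher(output);
        if (!m.find()) {
            throw new IllegalArgumentException("Pattern not found: " + pattern);
        }
        return Long.parseLong(m.group(2));
    }

    /** Committed JVM memory outside the Java heap, rounded up to whole MB. */
    static long offHeapCommittedMb(String output) {
        long offHeapKb = committedKb(TOTAL, output) - committedKb(HEAP, output);
        return (offHeapKb + 1023) / 1024;
    }

    public static void main(String[] args) {
        // Invented sample in the assumed format, not real measurements.
        String sample = "Total: reserved=1400000KB, committed=180000KB\n"
            + "-                 Java Heap (reserved=65536KB, committed=65536KB)\n";
        System.out.println(offHeapCommittedMb(sample) + " MB off-heap");
    }
}
```

A figure obtained this way, plus some headroom, is the kind of value that could go into OFF_HEAP_SIZE_MB.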
Dynamic memory allocation
helm/templates/deployment.yaml
env:
  - name: OFF_HEAP_SIZE_MB
    value: "128"
  - name: MAX_HEAP_SIZE_MB
    value: "{{ .Values.heapSizeMb }}"
args:
  - -Xmx{{ .Values.heapSizeMb }}m
  - -jar
  - app.jar
helm/values.yaml
heapSizeMb: 64
If you would like to profile your app
1) helm/templates/deployment.yaml
command:
  - java
args:
  - -Dcom.sun.management.jmxremote
  - -Dcom.sun.management.jmxremote.port=13089
  - -Dcom.sun.management.jmxremote.ssl=false
  - -Dcom.sun.management.jmxremote.local.only=false
  - -Dcom.sun.management.jmxremote.authenticate=false
  - -Dcom.sun.management.jmxremote.rmi.port=13089
  - -Djava.rmi.server.hostname=localhost
  - -jar
  - app.jar
ports:
  - containerPort: 13089
    name: jmx
    protocol: TCP
2) kubectl port-forward pod/your-pod-name -n your-namespace 13089:13089
App memory performance (after fix)
[Grafana chart]
Limitations are not the only way!
Links and materials
• How the JVM analyzes memory inside a Docker container:
https://merikan.com/2019/04/jvm-in-a-container/#java-10
• How to use jcmd to analyze non-heap memory:
https://www.baeldung.com/native-memory-tracking-in-jvm
• Why use container_memory_working_set_bytes instead of container_memory_usage_bytes:
https://blog.freshtracks.io/a-deep-dive-into-kubernetes-metrics-part-3-container-resource-metrics-361c5ee46e6
• How the RocksDB store works with Kafka Streams:
https://www.confluent.io/blog/how-to-tune-rocksdb-kafka-streams-state-stores-performance/
• Linux memory controller for cgroups:
https://lwn.net/Articles/432224/
Thank you!
https://t.me/aserebryanskiy
a.serebrianskiy@gmail.com
