Handle Large Messages In
Apache Kafka
Jiangjie (Becket) Qin @ LinkedIn
Kafka Meetup - Feb 23, 2016
What is a “large message” ?
● Kafka has a limit on the maximum size of a single message
○ Enforced on the compressed wrapper message if compression is used
Producer → Broker; on the broker:

{
…
if (message.size > message.max.bytes)
    reject with RecordTooLargeException
…
}
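For reference, the size limit is enforced (and configurable) at several points. A minimal sketch of the relevant settings, with illustrative values:

import java.util.Properties;

public class SizeLimits {
    public static void main(String[] args) {
        // Broker side (server.properties), enforced on the compressed wrapper:
        //   message.max.bytes=1000012
        //   replica.fetch.max.bytes must be >= message.max.bytes
        // Producer: records larger than this are rejected client-side with
        // RecordTooLargeException before they even reach the broker.
        Properties producer = new Properties();
        producer.put("max.request.size", "1048576");
        // Consumer: the per-partition fetch size must be able to hold the
        // largest message, or the consumer can get stuck on it.
        Properties consumer = new Properties();
        consumer.put("max.partition.fetch.bytes", "1048576");
    }
}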
Why does Kafka limit the message size?
● Large messages increase memory pressure in the broker.
● Large messages are expensive to handle and could slow down the brokers.
● A reasonable message size limit handles the vast majority of the use cases.
● Good workarounds exist (Reference Based Messaging), as sketched below:
○ The producer writes the data to an external data store and gets back a reference (Ref.).
○ The producer sends only the small reference through Kafka.
○ The consumer receives the reference and fetches the data from the data store.
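A minimal sketch of the pattern, assuming a hypothetical BlobStore client (any durable external store works):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Hypothetical external store; returns a small reference for a large payload.
interface BlobStore {
    String put(byte[] data);
    byte[] get(String reference);
}

public class ReferenceBasedProducer {
    private final BlobStore store;
    private final KafkaProducer<String, String> producer;

    ReferenceBasedProducer(BlobStore store, KafkaProducer<String, String> producer) {
        this.store = store;
        this.producer = producer;
    }

    // Write the payload to the external store, then send only the small
    // reference through Kafka; the consumer does the reverse.
    void send(String topic, byte[] largePayload) {
        String ref = store.put(largePayload);
        producer.send(new ProducerRecord<>(topic, ref));
    }
}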
Reference Based Messaging
● One of our use cases: database replication
○ Unknown maximum row size
○ Strict no-data-loss requirement
○ Strict message order guarantee
● Works fine as long as the durability of the data store can be guaranteed.
Reference Based Messaging
● Drawbacks for our database replication use case:
○ Replicates a data store by using another data store....
○ Sporadic large messages
■ Option 1: Send all the messages by reference and take unnecessary overhead.
■ Option 2: Send only the large messages by reference and live with low storage utilization.
○ Low end-to-end latency is hard to achieve:
■ There are more round trips in the system.
■ Need to make sure the data store is fast.
In-line Large Message Support (compared with Reference Based Messaging)
● Operational complexity: two systems to maintain → only maintain Kafka
● System stability: depends on the consistency between Kafka and the external storage, and on the durability of the external storage → only depends on Kafka
● Cost to serve: Kafka + external storage → only Kafka
● End-to-end latency: depends on the external storage → the latency of Kafka
● Client complexity: need to deal with envelopes → much more involved (coming soon)
● Functional limitations: almost none → some limitations
Our solution - chunk and re-assemble
● A large message is split into segments and re-assembled on the consumer; a normal-sized message is sent as a single-segment message.
Client Modules
● Producer: MessageSplitter on top of KafkaProducer<byte[], byte[]>
● Consumer: MessageAssembler, DeliveredMessageOffsetTracker, and LargeMessageBufferPool on top of KafkaConsumer<byte[], byte[]>
● Interface compatible with the open source Kafka producer / consumer
● Kafka brokers are unmodified; a sketch of the segment format and splitter follows.
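A sketch of the producer-side module; the class and field names match the slides, while the method signatures are assumptions:

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Segment fields, as shown later in the deck.
class MessageSegment {
    final UUID messageId;
    final int sequenceNumber;
    final int numberOfSegments;
    final int messageSizeInBytes;
    final ByteBuffer payload;

    MessageSegment(UUID messageId, int sequenceNumber, int numberOfSegments,
                   int messageSizeInBytes, ByteBuffer payload) {
        this.messageId = messageId;
        this.sequenceNumber = sequenceNumber;
        this.numberOfSegments = numberOfSegments;
        this.messageSizeInBytes = messageSizeInBytes;
        this.payload = payload;
    }
}

class MessageSplitter {
    private final int maxSegmentBytes;

    MessageSplitter(int maxSegmentBytes) {
        this.maxSegmentBytes = maxSegmentBytes;
    }

    // Split one serialized message into segments that each fit under the
    // broker's size limit; all segments share a messageId so the assembler
    // never mixes segments of different messages.
    List<MessageSegment> split(byte[] message) {
        UUID messageId = UUID.randomUUID();
        int numSegments = (message.length + maxSegmentBytes - 1) / maxSegmentBytes;
        List<MessageSegment> segments = new ArrayList<>(numSegments);
        for (int seq = 0; seq < numSegments; seq++) {
            int from = seq * maxSegmentBytes;
            int to = Math.min(from + maxSegmentBytes, message.length);
            segments.add(new MessageSegment(messageId, seq, numSegments,
                message.length, ByteBuffer.wrap(message, from, to - from)));
        }
        return segments;
    }
}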
A closer look at large message handling
● The offset of a large message
● Offset tracking
● Producer callback
● Rebalance and duplicates handling
● Memory management
● Performance overhead
● Compatibility with existing messages
The offset of a large message
● The offset of the first segment?
○ First seen, first served
○ Easy to seek: the consumer can simply seek to the message offset (seek to 0 for msg0, seek to 1 for msg1).
○ Expensive for in-order delivery: the consumer needs to buffer all the message segments until the current large message is complete.

Broker log:
0: msg0-seg0
1: msg1-seg0
2: msg1-seg1
3: msg0-seg1

With first-segment offsets, msg0 = offset 0 and msg1 = offset 1. The consumer completes msg1 at offset 2 but cannot deliver it until msg0 is delivered, so it has to buffer msg1, and partially sent messages are difficult to handle. Only after consuming offset 3 can it deliver, in order: 0: msg0, then 1: msg1.
The offset of a large message
● The offset of the last segment?
○ First completed, first served
○ Needs additional work for seek (more details in offset tracking)
○ Least memory needed for in-order delivery

Broker log:
0: msg0-seg0
1: msg1-seg0
2: msg1-seg1
3: msg0-seg1

With last-segment offsets, the consumer delivers 2: msg1 to the user as soon as msg1 completes at offset 2, then delivers 3: msg0 once msg0-seg1 arrives.

● We chose the offset of the last segment:
○ Less memory consumption
○ Better tolerance for partially sent large messages
Offset tracking

Broker log:
0: msg0-seg0
1: msg1-seg0
2: msg1-seg1
3: msg0-seg1
4: msg2-seg0
5: msg3-seg0
...

● Suppose the consumer has read offsets 0-2 and delivered 2: msg1 to the user.
○ It cannot simply commit( Map{(tp->2)} ): msg0-seg0 has not been delivered to the user yet.
○ It should commit offset 0 so there is no message loss.
● After offset 3 is read and 3: msg0 is delivered, seek(tp, 2) must rewind to msg1-seg0, i.e. offset 1, instead of offset 2.
● Safe offset - the offset that can be committed without message loss.
● Starting offset - the starting offset of a large message.

Offset tracker map:
{
(2 -> start=1, safe=0),
(3 -> start=0, safe=4),
…
}
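A sketch of what the tracker could look like; the class name is from the slides, the rest is an assumption. For the example above, tracker.delivered(2, 1, 0) records the entry (2 -> start=1, safe=0):

import java.util.LinkedHashMap;
import java.util.Map;

// Per-partition tracker: for each delivered message offset, remember the
// message's starting offset (for seek) and safe offset (for commit).
// Only the most recent N entries are kept.
class DeliveredMessageOffsetTracker {
    private final int maxTrackedMessages;
    private final LinkedHashMap<Long, long[]> tracked = // offset -> {start, safe}
        new LinkedHashMap<Long, long[]>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, long[]> eldest) {
                return size() > maxTrackedMessages; // evict the oldest entry
            }
        };

    DeliveredMessageOffsetTracker(int maxTrackedMessages) {
        this.maxTrackedMessages = maxTrackedMessages;
    }

    void delivered(long messageOffset, long startingOffset, long safeOffset) {
        tracked.put(messageOffset, new long[]{startingOffset, safeOffset});
    }

    // Offset to use for commit(): committing anything later could lose a
    // partially assembled large message.
    long safeOffset(long deliveredOffset) { return tracked.get(deliveredOffset)[1]; }

    // Offset to use for seek(): rewinding to the first segment guarantees
    // the whole message is re-assembled.
    long startingOffset(long deliveredOffset) { return tracked.get(deliveredOffset)[0]; }
}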
Offset tracking
● Limitations
○ Consumers can only track the messages they have already seen.
■ When users seek forward, the consumer does not check whether the user is seeking to a message boundary.
○ Consumers cannot keep track of all the messages they have ever seen.
■ Consumers only track a configured number of recently delivered messages for each partition, e.g. 5,000.
○ After a rebalance, the new owner of a partition will not have any tracked messages for the newly assigned partitions.
Producer Callback

Producer sends to broker (all the segments of a message are sent to the same partition):
0: msg0-seg0
1: msg0-seg1
2: msg0-seg2
...

The producer keeps per-message callback state:
{
numSegments=3
ackedSegments;
userCallback;
}

● The user callback is not fired until all segments are acknowledged (ackedSegments == numSegments).
● The offset of the last segment is passed to the user callback.
● The first exception received is passed to the user callback.
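A sketch of the aggregating callback, assuming segments of a message complete in order (one shared instance per large message):

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.RecordMetadata;

class LargeMessageCallback implements Callback {
    private final int numSegments;
    private final Callback userCallback;
    private int ackedSegments = 0;
    private Exception firstException = null;
    private RecordMetadata lastMetadata = null;

    LargeMessageCallback(int numSegments, Callback userCallback) {
        this.numSegments = numSegments;
        this.userCallback = userCallback;
    }

    @Override
    public synchronized void onCompletion(RecordMetadata metadata, Exception exception) {
        ackedSegments++;
        if (exception != null && firstException == null) {
            firstException = exception; // remember the first failure
        }
        lastMetadata = metadata; // may be null on failure; last ack carries the last offset
        if (ackedSegments == numSegments) {
            // All segments acked: report the last segment's offset, or the
            // first exception if any segment failed.
            userCallback.onCompletion(lastMetadata, firstException);
        }
    }
}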
Rebalance and duplicates handling

Broker log:
0: msg0-seg0
1: msg1-seg0
2: msg1-seg1
3: msg0-seg1
4: msg2-seg0
5: msg3-seg0
...

Without committed metadata:
● Consumer 0 has read offsets 0-2 and delivered 2: msg1 to the user. Its offset tracker map contains (2 -> start=1, safe=0).
● A consumer rebalance occurs and consumer 0 commits its safe offset, 0.
● The new owner, consumer 1, resumes reading from msg0-seg0 at offset 0, re-assembles msg1 from offsets 1 and 2, and delivers msg1 to the user again: a duplicate.

With committed metadata:
● During the rebalance, consumer 0 commits offset 0 with metadata {delivered=2}.
● Consumer 1 resumes reading from msg0-seg0 and receives the committed metadata {delivered=2}.
● When msg1 completes, msg1.offset <= delivered, so consumer 1 will NOT deliver msg1 again to the user.
● The first message delivered to the user is msg0, whose offset is 3.
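A sketch of how the delivered offset could ride along with the commit, using the consumer's standard commit metadata (the "delivered=" encoding is from the slides, the helper is an assumption):

import java.util.Collections;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class CommitWithDeliveredOffset {
    // Commit the safe offset, but piggyback the highest delivered message
    // offset in the metadata string so the next owner can suppress duplicates.
    static void commitSafeOffset(KafkaConsumer<byte[], byte[]> consumer,
                                 TopicPartition tp, long safeOffset, long delivered) {
        consumer.commitSync(Collections.singletonMap(
            tp, new OffsetAndMetadata(safeOffset, "delivered=" + delivered)));
    }

    // After a rebalance, the new owner reads the metadata back and skips any
    // re-assembled message whose offset is <= delivered.
    static long deliveredFrom(KafkaConsumer<byte[], byte[]> consumer, TopicPartition tp) {
        OffsetAndMetadata committed = consumer.committed(tp);
        if (committed == null || committed.metadata().isEmpty()) {
            return -1L; // nothing committed yet: deliver everything
        }
        return Long.parseLong(committed.metadata().replace("delivered=", ""));
    }
}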
Memory management
● Producer
○ No material change to memory overhead except splitting and copying the message.
● Consumer
○ buffer.capacity
■ The maximum number of bytes used to buffer segments. If the buffer is full, the consumer evicts the oldest incomplete message.
○ expiration.offset.gap
■ Suppose a message has starting offset X and the consumer is now consuming from offset Y. The message is removed from the buffer if Y - X is greater than expiration.offset.gap, i.e. a "timeout".
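A sketch of how these two knobs might be set; the config keys are the names used in the talk, and the values are illustrative:

import java.util.Properties;

public class LargeMessageConsumerConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Max bytes of buffered segments; when full, the oldest incomplete
        // message is evicted.
        props.put("buffer.capacity", String.valueOf(32 * 1024 * 1024));
        // A message whose first segment is at offset X is dropped once the
        // consumer reaches offset Y with Y - X greater than this gap.
        props.put("expiration.offset.gap", "5000");
    }
}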
Performance Overhead
● Potentially additional segment serialization/deserialization cost
○ The default segment serde is cheap. The segment fields are:

{
// segment fields
public final UUID messageId;
public final int sequenceNumber;
public final int numberOfSegments;
public final int messageSizeInBytes;
public final ByteBuffer payload;
}

Pipeline: ProducerRecord → serialization → maybe split (segment serializer) → Kafka → segment deserializer → re-assemble → deserialization
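To see why the default serde is cheap, a sketch of a serializer for the fields above: a fixed 28-byte header followed by the payload, with a single buffer copy (the exact wire layout is an assumption):

import java.nio.ByteBuffer;
import java.util.UUID;

class DefaultSegmentSerializer {
    byte[] serialize(UUID messageId, int sequenceNumber, int numberOfSegments,
                     int messageSizeInBytes, ByteBuffer payload) {
        // 16-byte UUID + three 4-byte ints = 28-byte header, then the payload.
        ByteBuffer buffer = ByteBuffer.allocate(28 + payload.remaining());
        buffer.putLong(messageId.getMostSignificantBits());
        buffer.putLong(messageId.getLeastSignificantBits());
        buffer.putInt(sequenceNumber);
        buffer.putInt(numberOfSegments);
        buffer.putInt(messageSizeInBytes);
        buffer.put(payload);
        return buffer.array();
    }
}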
Performance Overhead
● Additional memory footprint in consumers
○ Buffer for segments of incomplete large messages
○ Additional memory needed to track the message offsets
■ 24 bytes per message: tracking the most recent 5,000 messages from 100 partitions takes 12 MB.
■ We can choose to track only large messages if users are trustworthy.
Compatibility with existing messages

Broker log:
0: msg0 (existing msg)
1: msg1-seg0
2: msg1-seg1
3: msg2-seg0 (single-seg msg)
...

● The consumer first applies the segment deserializer, then falls back to the value deserializer.
● In the segment deserializer, a user implementation should throw NotLargeMessageSegmentException when the bytes are not a segment.
● When the consumer sees NotLargeMessageSegmentException, it assumes the message is an existing message and uses the value deserializer to handle it.
● The default segment deserializer implementation already handles this.
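A sketch of the fallback path; the exception name is from the slides, while the wrapper and its signatures are assumptions:

import org.apache.kafka.common.serialization.Deserializer;

// Marker exception a segment deserializer throws when the bytes are not a
// large-message segment.
class NotLargeMessageSegmentException extends RuntimeException {
}

// Try to parse a segment first; on failure, treat the bytes as a
// pre-existing message and hand them to the plain value deserializer.
class CompatibleDeserializer<T> {
    interface SegmentParser {
        Object parse(String topic, byte[] data); // may throw NotLargeMessageSegmentException
    }

    private final SegmentParser segmentParser;
    private final Deserializer<T> valueDeserializer;

    CompatibleDeserializer(SegmentParser segmentParser, Deserializer<T> valueDeserializer) {
        this.segmentParser = segmentParser;
        this.valueDeserializer = valueDeserializer;
    }

    Object deserialize(String topic, byte[] data) {
        try {
            return segmentParser.parse(topic, data);
        } catch (NotLargeMessageSegmentException e) {
            return valueDeserializer.deserialize(topic, data);
        }
    }
}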
The answer to a question after the meetup
● Does it work for compacted topics?
○ Add the suffix "-segmentSeq" to the key.
■ It works, with a flaw, when large messages with the same key do NOT interleave.

Before compaction:
0: m0(key="k-0")
1: m0(key="k-1")
2: m0(key="k-2")
5: m1(key="k-0")
6: m1(key="k-1")
...

Scenario 1 after compaction:
1: m0(key="k-1")
2: m0(key="k-2")
5: m1(key="k-0")
6: m1(key="k-1")
...

Scenario 2 after compaction:
2: m0(key="k-2") ← zombie segment
5: m1(key="k-0")
6: m1(key="k-1")
...

Note that the consumer won't assemble segments of m0 with segments of m1 because their messageIds are different.
The answer to a question after the meetup
● Does it work for compacted topics?
○ Add the suffix "-segmentSeq" to the key.
■ It does not work when large messages with the same key may interleave.

Before compaction:
0: m0(key="k0-0")
1: m1(key="k0-0")
2: m1(key="k0-1")
3: m0(key="k0-1")
...

Failure scenario after compaction (doesn't work):
1: m1(key="k0-0")
3: m0(key="k0-1")
...

Note that the consumer won't assemble m0-seg1 and m1-seg0 together because their messageIds are different.
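A sketch of the key suffixing; the "-segmentSeq" scheme is from the slides, and the helper itself is an assumption:

import java.nio.charset.StandardCharsets;

public class SegmentKeys {
    // "k" + segment 2 -> "k-2": a newer message with the same key overwrites
    // the segment slots it reuses, but leftover slots become zombie segments.
    static byte[] segmentKey(byte[] originalKey, int segmentSeq) {
        byte[] suffix = ("-" + segmentSeq).getBytes(StandardCharsets.UTF_8);
        byte[] key = new byte[originalKey.length + suffix.length];
        System.arraycopy(originalKey, 0, key, 0, originalKey.length);
        System.arraycopy(suffix, 0, key, originalKey.length, suffix.length);
        return key;
    }
}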
Summary
● Reference based messaging works in most cases.
● Sometimes it is handy to have in-line support for large messages:
○ Sporadic large messages
○ Low latency
○ Small number of interleaved large messages
○ Cost savings
Acknowledgements
Thanks for the great help and support from
Dong Lin
Joel Koshy
Kartik Paramasivam
Onur Karaman
Yi Pan
LinkedIn Espresso and Datastream team
Q&A
Ad

More Related Content

What's hot (20)

Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
mumrah
 
Apache Kafka – (Pattern and) Anti-Pattern
Apache Kafka – (Pattern and) Anti-PatternApache Kafka – (Pattern and) Anti-Pattern
Apache Kafka – (Pattern and) Anti-Pattern
confluent
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
Jeff Holoman
 
kafka
kafkakafka
kafka
Amikam Snir
 
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
SANG WON PARK
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka Introduction
Amita Mirajkar
 
Data Con LA 2022 - Making real-time analytics a reality for digital transform...
Data Con LA 2022 - Making real-time analytics a reality for digital transform...Data Con LA 2022 - Making real-time analytics a reality for digital transform...
Data Con LA 2022 - Making real-time analytics a reality for digital transform...
Data Con LA
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
confluent
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
Alexey Grishchenko
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
Diego Pacheco
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
Dvir Volk
 
Envoy and Kafka
Envoy and KafkaEnvoy and Kafka
Envoy and Kafka
Adam Kotwasinski
 
Apache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and DevelopersApache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and Developers
confluent
 
How Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for PerformanceHow Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for Performance
Brendan Gregg
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
Flink Forward
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controller
confluent
 
No data loss pipeline with apache kafka
No data loss pipeline with apache kafkaNo data loss pipeline with apache kafka
No data loss pipeline with apache kafka
Jiangjie Qin
 
Common issues with Apache Kafka® Producer
Common issues with Apache Kafka® ProducerCommon issues with Apache Kafka® Producer
Common issues with Apache Kafka® Producer
confluent
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
emreakis
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for Experimentation
Gleb Kanterov
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
mumrah
 
Apache Kafka – (Pattern and) Anti-Pattern
Apache Kafka – (Pattern and) Anti-PatternApache Kafka – (Pattern and) Anti-Pattern
Apache Kafka – (Pattern and) Anti-Pattern
confluent
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
Jeff Holoman
 
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
SANG WON PARK
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka Introduction
Amita Mirajkar
 
Data Con LA 2022 - Making real-time analytics a reality for digital transform...
Data Con LA 2022 - Making real-time analytics a reality for digital transform...Data Con LA 2022 - Making real-time analytics a reality for digital transform...
Data Con LA 2022 - Making real-time analytics a reality for digital transform...
Data Con LA
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
confluent
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
Dvir Volk
 
Apache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and DevelopersApache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and Developers
confluent
 
How Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for PerformanceHow Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for Performance
Brendan Gregg
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
Flink Forward
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controller
confluent
 
No data loss pipeline with apache kafka
No data loss pipeline with apache kafkaNo data loss pipeline with apache kafka
No data loss pipeline with apache kafka
Jiangjie Qin
 
Common issues with Apache Kafka® Producer
Common issues with Apache Kafka® ProducerCommon issues with Apache Kafka® Producer
Common issues with Apache Kafka® Producer
confluent
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
emreakis
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for Experimentation
Gleb Kanterov
 

Viewers also liked (9)

HBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environmentHBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon
 
HBaseCon 2015: Analyzing HBase Data with Apache Hive
HBaseCon 2015: Analyzing HBase Data with Apache  HiveHBaseCon 2015: Analyzing HBase Data with Apache  Hive
HBaseCon 2015: Analyzing HBase Data with Apache Hive
HBaseCon
 
Identity Federation Patterns with WSO2 Identity Server​
Identity Federation Patterns with WSO2 Identity Server​Identity Federation Patterns with WSO2 Identity Server​
Identity Federation Patterns with WSO2 Identity Server​
WSO2
 
[WSO2Con EU 2017] Keynote: Mobile Identity in the Digital Economy
[WSO2Con EU 2017] Keynote: Mobile Identity in the Digital Economy[WSO2Con EU 2017] Keynote: Mobile Identity in the Digital Economy
[WSO2Con EU 2017] Keynote: Mobile Identity in the Digital Economy
WSO2
 
Leveraging federation capabilities of Identity Server for API gateway
Leveraging federation capabilities of Identity Server for API gatewayLeveraging federation capabilities of Identity Server for API gateway
Leveraging federation capabilities of Identity Server for API gateway
WSO2
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
Jiangjie Qin
 
Big Data: Getting started with Big SQL self-study guide
Big Data:  Getting started with Big SQL self-study guideBig Data:  Getting started with Big SQL self-study guide
Big Data: Getting started with Big SQL self-study guide
Cynthia Saracco
 
The Impala Cookbook
The Impala CookbookThe Impala Cookbook
The Impala Cookbook
Cloudera, Inc.
 
[WSO2Con EU 2017] The Win-Win-Win of Water Authority HHNK
[WSO2Con EU 2017] The Win-Win-Win of Water Authority HHNK[WSO2Con EU 2017] The Win-Win-Win of Water Authority HHNK
[WSO2Con EU 2017] The Win-Win-Win of Water Authority HHNK
WSO2
 
HBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environmentHBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon
 
HBaseCon 2015: Analyzing HBase Data with Apache Hive
HBaseCon 2015: Analyzing HBase Data with Apache  HiveHBaseCon 2015: Analyzing HBase Data with Apache  Hive
HBaseCon 2015: Analyzing HBase Data with Apache Hive
HBaseCon
 
Identity Federation Patterns with WSO2 Identity Server​
Identity Federation Patterns with WSO2 Identity Server​Identity Federation Patterns with WSO2 Identity Server​
Identity Federation Patterns with WSO2 Identity Server​
WSO2
 
[WSO2Con EU 2017] Keynote: Mobile Identity in the Digital Economy
[WSO2Con EU 2017] Keynote: Mobile Identity in the Digital Economy[WSO2Con EU 2017] Keynote: Mobile Identity in the Digital Economy
[WSO2Con EU 2017] Keynote: Mobile Identity in the Digital Economy
WSO2
 
Leveraging federation capabilities of Identity Server for API gateway
Leveraging federation capabilities of Identity Server for API gatewayLeveraging federation capabilities of Identity Server for API gateway
Leveraging federation capabilities of Identity Server for API gateway
WSO2
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
Jiangjie Qin
 
Big Data: Getting started with Big SQL self-study guide
Big Data:  Getting started with Big SQL self-study guideBig Data:  Getting started with Big SQL self-study guide
Big Data: Getting started with Big SQL self-study guide
Cynthia Saracco
 
[WSO2Con EU 2017] The Win-Win-Win of Water Authority HHNK
[WSO2Con EU 2017] The Win-Win-Win of Water Authority HHNK[WSO2Con EU 2017] The Win-Win-Win of Water Authority HHNK
[WSO2Con EU 2017] The Win-Win-Win of Water Authority HHNK
WSO2
 
Ad

Similar to Handle Large Messages In Apache Kafka (20)

FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
Rob Skillington
 
Non-Kafkaesque Apache Kafka - Yottabyte 2018
Non-Kafkaesque Apache Kafka - Yottabyte 2018Non-Kafkaesque Apache Kafka - Yottabyte 2018
Non-Kafkaesque Apache Kafka - Yottabyte 2018
Otávio Carvalho
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Monal Daxini
 
Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Monal Daxini
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
vishnu rao
 
Messaging queue - Kafka
Messaging queue - KafkaMessaging queue - Kafka
Messaging queue - Kafka
Mayank Bansal
 
Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)
DataWorks Summit
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
Allen (Xiaozhong) Wang
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
Steven Wu
 
Apache Kafka's Common Pitfalls & Intricacies: A Customer Support Perspective
Apache Kafka's Common Pitfalls & Intricacies: A Customer Support PerspectiveApache Kafka's Common Pitfalls & Intricacies: A Customer Support Perspective
Apache Kafka's Common Pitfalls & Intricacies: A Customer Support Perspective
HostedbyConfluent
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
aspyker
 
Building zero data loss pipelines with apache kafka
Building zero data loss pipelines with apache kafkaBuilding zero data loss pipelines with apache kafka
Building zero data loss pipelines with apache kafka
Avinash Ramineni
 
Event driven architectures with Kinesis
Event driven architectures with KinesisEvent driven architectures with Kinesis
Event driven architectures with Kinesis
Mark Harrison
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
Amir Sedighi
 
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy FarkasVirtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
Flink Forward
 
Kafka Deep Dive
Kafka Deep DiveKafka Deep Dive
Kafka Deep Dive
Knoldus Inc.
 
A Beginner’s Guide to Kafka Performance in Cloud Environments with Steffen Ha...
A Beginner’s Guide to Kafka Performance in Cloud Environments with Steffen Ha...A Beginner’s Guide to Kafka Performance in Cloud Environments with Steffen Ha...
A Beginner’s Guide to Kafka Performance in Cloud Environments with Steffen Ha...
HostedbyConfluent
 
Stateful stream processing with kafka and samza
Stateful stream processing with kafka and samzaStateful stream processing with kafka and samza
Stateful stream processing with kafka and samza
George Li
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
Saroj Panyasrivanit
 
KubeCon + CloudNative Con NA 2021 | A New Generation of NATS
KubeCon + CloudNative Con NA 2021 | A New Generation of NATSKubeCon + CloudNative Con NA 2021 | A New Generation of NATS
KubeCon + CloudNative Con NA 2021 | A New Generation of NATS
NATS
 
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
Rob Skillington
 
Non-Kafkaesque Apache Kafka - Yottabyte 2018
Non-Kafkaesque Apache Kafka - Yottabyte 2018Non-Kafkaesque Apache Kafka - Yottabyte 2018
Non-Kafkaesque Apache Kafka - Yottabyte 2018
Otávio Carvalho
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Monal Daxini
 
Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Monal Daxini
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
vishnu rao
 
Messaging queue - Kafka
Messaging queue - KafkaMessaging queue - Kafka
Messaging queue - Kafka
Mayank Bansal
 
Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)
DataWorks Summit
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
Steven Wu
 
Apache Kafka's Common Pitfalls & Intricacies: A Customer Support Perspective
Apache Kafka's Common Pitfalls & Intricacies: A Customer Support PerspectiveApache Kafka's Common Pitfalls & Intricacies: A Customer Support Perspective
Apache Kafka's Common Pitfalls & Intricacies: A Customer Support Perspective
HostedbyConfluent
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
aspyker
 
Building zero data loss pipelines with apache kafka
Building zero data loss pipelines with apache kafkaBuilding zero data loss pipelines with apache kafka
Building zero data loss pipelines with apache kafka
Avinash Ramineni
 
Event driven architectures with Kinesis
Event driven architectures with KinesisEvent driven architectures with Kinesis
Event driven architectures with Kinesis
Mark Harrison
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
Amir Sedighi
 
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy FarkasVirtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
Flink Forward
 
A Beginner’s Guide to Kafka Performance in Cloud Environments with Steffen Ha...
A Beginner’s Guide to Kafka Performance in Cloud Environments with Steffen Ha...A Beginner’s Guide to Kafka Performance in Cloud Environments with Steffen Ha...
A Beginner’s Guide to Kafka Performance in Cloud Environments with Steffen Ha...
HostedbyConfluent
 
Stateful stream processing with kafka and samza
Stateful stream processing with kafka and samzaStateful stream processing with kafka and samza
Stateful stream processing with kafka and samza
George Li
 
KubeCon + CloudNative Con NA 2021 | A New Generation of NATS
KubeCon + CloudNative Con NA 2021 | A New Generation of NATSKubeCon + CloudNative Con NA 2021 | A New Generation of NATS
KubeCon + CloudNative Con NA 2021 | A New Generation of NATS
NATS
 
Ad

Recently uploaded (20)

Top 10 Client Portal Software Solutions for 2025.docx
Top 10 Client Portal Software Solutions for 2025.docxTop 10 Client Portal Software Solutions for 2025.docx
Top 10 Client Portal Software Solutions for 2025.docx
Portli
 
Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025
mu394968
 
Exploring Code Comprehension in Scientific Programming: Preliminary Insight...
Exploring Code Comprehension  in Scientific Programming:  Preliminary Insight...Exploring Code Comprehension  in Scientific Programming:  Preliminary Insight...
Exploring Code Comprehension in Scientific Programming: Preliminary Insight...
University of Hawai‘i at Mānoa
 
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& ConsiderationsDesigning AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Dinusha Kumarasiri
 
FL Studio Producer Edition Crack 2025 Full Version
FL Studio Producer Edition Crack 2025 Full VersionFL Studio Producer Edition Crack 2025 Full Version
FL Studio Producer Edition Crack 2025 Full Version
tahirabibi60507
 
Pixologic ZBrush Crack Plus Activation Key [Latest 2025] New Version
Pixologic ZBrush Crack Plus Activation Key [Latest 2025] New VersionPixologic ZBrush Crack Plus Activation Key [Latest 2025] New Version
Pixologic ZBrush Crack Plus Activation Key [Latest 2025] New Version
saimabibi60507
 
Kubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptxKubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptx
CloudScouts
 
The Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdfThe Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdf
drewplanas10
 
Not So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java WebinarNot So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java Webinar
Tier1 app
 
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
Andre Hora
 
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Ranjan Baisak
 
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Eric D. Schabell
 
Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025
kashifyounis067
 
WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)
sh607827
 
Landscape of Requirements Engineering for/by AI through Literature Review
Landscape of Requirements Engineering for/by AI through Literature ReviewLandscape of Requirements Engineering for/by AI through Literature Review
Landscape of Requirements Engineering for/by AI through Literature Review
Hironori Washizaki
 
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdfMicrosoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
TechSoup
 
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
Andre Hora
 
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
F-Secure Freedome VPN 2025 Crack Plus Activation  New VersionF-Secure Freedome VPN 2025 Crack Plus Activation  New Version
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
saimabibi60507
 
Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]
saniaaftab72555
 
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
steaveroggers
 
Top 10 Client Portal Software Solutions for 2025.docx
Top 10 Client Portal Software Solutions for 2025.docxTop 10 Client Portal Software Solutions for 2025.docx
Top 10 Client Portal Software Solutions for 2025.docx
Portli
 
Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025
mu394968
 
Exploring Code Comprehension in Scientific Programming: Preliminary Insight...
Exploring Code Comprehension  in Scientific Programming:  Preliminary Insight...Exploring Code Comprehension  in Scientific Programming:  Preliminary Insight...
Exploring Code Comprehension in Scientific Programming: Preliminary Insight...
University of Hawai‘i at Mānoa
 
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& ConsiderationsDesigning AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Dinusha Kumarasiri
 
FL Studio Producer Edition Crack 2025 Full Version
FL Studio Producer Edition Crack 2025 Full VersionFL Studio Producer Edition Crack 2025 Full Version
FL Studio Producer Edition Crack 2025 Full Version
tahirabibi60507
 
Pixologic ZBrush Crack Plus Activation Key [Latest 2025] New Version
Pixologic ZBrush Crack Plus Activation Key [Latest 2025] New VersionPixologic ZBrush Crack Plus Activation Key [Latest 2025] New Version
Pixologic ZBrush Crack Plus Activation Key [Latest 2025] New Version
saimabibi60507
 
Kubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptxKubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptx
CloudScouts
 
The Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdfThe Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdf
drewplanas10
 
Not So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java WebinarNot So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java Webinar
Tier1 app
 
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
Andre Hora
 
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Ranjan Baisak
 
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Eric D. Schabell
 
Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025
kashifyounis067
 
WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)
sh607827
 
Landscape of Requirements Engineering for/by AI through Literature Review
Landscape of Requirements Engineering for/by AI through Literature ReviewLandscape of Requirements Engineering for/by AI through Literature Review
Landscape of Requirements Engineering for/by AI through Literature Review
Hironori Washizaki
 
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdfMicrosoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
TechSoup
 
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
Andre Hora
 
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
F-Secure Freedome VPN 2025 Crack Plus Activation  New VersionF-Secure Freedome VPN 2025 Crack Plus Activation  New Version
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
saimabibi60507
 
Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]
saniaaftab72555
 
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
steaveroggers
 

Handle Large Messages In Apache Kafka

  • 1. Handle Large Messages In Apache Kafka Jiangjie (Becket) Qin @ LinkedIn Kafka Meetup - Feb 23, 2016
  • 2. What is a “large message” ? ● Kafka has a limit on the maximum size of a single message ○ Enforced on the compressed wrapper message if compression is used BrokerProducer { … if (message.size > message.max.bytes) reject! … } RecordTooLargeException
  • 3. Why does Kafka limit the message size? ● Increase the memory pressure in the broker ● Large messages are expensive to handle and could slow down the brokers. ● A reasonable message size limit can handle vast majority of the use cases.
  • 4. Why does Kafka limit the message size? ● Increase the memory pressure in the broker ● Large messages are expensive to handle and could slow down the brokers. ● A reasonable message size limit can handle vast majority of the use cases. ● Good workarounds exist (Reference Based Messaging) KafkaProducer Data Store Consumer
  • 5. Why does Kafka limit the message size? ● Increase the memory pressure in the broker ● Large messages are expensive to handle and could slow down the brokers. ● A reasonable message size limit can handle vast majority of the use cases. ● Good workarounds exist (Reference Based Messaging) KafkaProducer Data Store Consumer data Ref.
  • 6. Why does Kafka limit the message size? ● Increase the memory pressure in the broker ● Large messages are expensive to handle and could slow down the brokers. ● A reasonable message size limit can handle vast majority of the use cases. ● Good workarounds exist (Reference Based Messaging) KafkaProducer Data Store Consumer data Ref. Ref.
  • 7. Why does Kafka limit the message size? ● Increase the memory pressure in the broker ● Large messages are expensive to handle and could slow down the brokers. ● A reasonable message size limit can handle vast majority of the use cases. ● Good workarounds exist (Reference Based Messaging) KafkaProducer Data Store Consumer data Ref. Ref. Ref.
  • 8. Why does Kafka limit the message size? ● Increase the memory pressure in the broker ● Large messages are expensive to handle and could slow down the brokers. ● A reasonable message size limit can handle vast majority of the use cases. ● Good workarounds exist (Reference Based Messaging) KafkaProducer Data Store Consumer data Ref. Ref. Ref. data Ref.
  • 9. Reference Based Messaging ● One of our use cases: database replication ○ Unknown maximum row size ○ Strict no data loss ○ Strict message order guarantee KafkaProducer Data Store Consumer data Ref. Ref. Ref. data Ref. Works fine as long as the durability of the data store can be guaranteed.
  • 10. Reference Based Messaging ● One of our use cases: database replication ○ Replicates a data store by using another data store.... ○ Sporadic large messages ■ Option 1: Send all the messages using reference and take unnecessary overhead. ■ Option 2: Only send large messages using references and live with low storage utilization. ○ Low end to end latency ■ There are more round trips in the system. ■ Need to make sure the data store is fast KafkaProducer Data Store Consumer data Ref. Ref. Ref. data Ref.
  • 11. In-line Large Message Support Reference Based Messaging In-line large message support Operational complexity Two systems to maintain Only maintain Kafka System stability Depend on : ● The consistency between Kafka and the external storage ● The durability of external storage Only depend on Kafka Cost to serve Kafka + External Storage Only maintain Kafka End to end latency Depend on the external storage The latency of Kafka Client complexity Need to deal with envelopes Much more involved (coming soon) Functional limitations Almost none Some limitations
  • 12. Our solution - chunk and re-assemble A normal-sized message is sent as a single-segment message.
  • 13. Consumer Client Modules Kafka brokers KafkaConsumer<byte[], byte[]> Producer MessageAssembler DeliveredMessageOffsetTracker LargeMessageBufferPool MessageSplitter KafkaProducer<byte[], byte[]> Compatible interface with open source Kafka producer / consumer
  • 14. A closer look at large message handling ● The offset of a large message ● Producer callback ● Offset tracking ● Rebalance and duplicates handling ● Memory management ● Performance overhead ● Compatibility with existing messages
  • 15. A closer look at large message handling ● The offset of a large message ● Offset tracking ● Producer callback ● Rebalance and duplicates handling ● Memory management ● Performance overhead ● Compatibility with existing messages
  • 16. The offset of a large message ● The offset of the first segment? ○ First seen first serve ○ Easy to seek ○ Expensive for in order delivery 0: msg0-seg0 1: msg1-seg0 2: msg1-seg1 3: msg0-seg1 Broker 0: msg0-seg0 Consumer
  • 17. The offset of a large message ● The offset of the first segment? ○ First seen first serve ○ Easy to seek ○ Expensive for in order delivery 0: msg0-seg0 1: msg1-seg0 2: msg1-seg1 3: msg0-seg1 Broker 0: msg0-seg0 1: msg1-seg0 Consumer
  • 18. The offset of a large message ● The offset of the first segment? ○ First seen first serve ○ Easy to seek ○ Expensive for in order delivery 0: msg0-seg0 1: msg1-seg0 2: msg1-seg1 3: msg0-seg1 Broker 0: msg0-seg0 1: msg1-seg0 2: msg1-seg1 Consumer Cannot deliver msg1 until msg0 is delivered. The consumer has to buffer the msg1. Difficult to handle partially sent messages.
  • 19. The offset of a large message ● The offset of the first segment? ○ First seen first serve ○ Easy to seek ○ Expensive for in order delivery (Need to buffer all the message segments until the current large message is complete) 0: msg0-seg0 1: msg1-seg0 2: msg1-seg1 3: msg0-seg1 Broker 0: msg0-seg0 1: msg1-seg0 2: msg1-seg1 3: msg0-seg1 Consumer User 0: msg0 1: msg1
  • 20. The offset of a large message ● The offset of the first segment? ○ First seen first serve ○ Easy to seek ○ Expensive for in order delivery 0: msg0-seg0 1: msg1-seg0 2: msg1-seg1 3: msg0-seg1 Broker 0: msg0-seg0 1: msg1-seg0 2: msg1-seg1 3: msg0-seg1 Consumer User 0: msg0 1: msg1 seek to 0 seek to 1 The consumer can simply seek to the message offset.
  • 21. The offset of a large message ● The offset of the last segment? ○ First completed first serve ○ Needs additional work for seek (more details on this soon) ○ Least memory needed for in order delivery ● We chose offset of the last segment 0: msg0-seg0 1: msg1-seg0 2: msg1-seg1 3: msg0-seg1 Broker 0: msg0-seg0 Consumer User
  • 22. The offset of a large message ● The offset of the last segment? ○ First completed first serve ○ Needs additional work for seek (more details on this soon) ○ Least memory needed for in order delivery ● We chose offset of the last segment 0: msg0-seg0 1: msg1-seg0 2: msg1-seg1 3: msg0-seg1 Broker 0: msg0-seg0 1: msg1-seg0 Consumer User
  • 23. The offset of a large message ● The offset of the last segment? ○ First completed first serve ○ Needs additional work for seek (more details on this soon) ○ Least memory needed for in order delivery ● We chose offset of the last segment 0: msg0-seg0 1: msg1-seg0 2: msg1-seg1 3: msg0-seg1 Broker 0: msg0-seg0 1: msg1-seg0 2: msg1-seg1 Consumer 2: msg1 Deliver msg1 once it completes. User
  • 24. The offset of a large message ● The offset of the last segment? ○ First completed first serve ○ Needs additional work for seek (more details in offset tracking) ○ Least memory needed for in order delivery 0: msg0-seg0 1: msg1-seg0 2: msg1-seg1 3: msg0-seg1 Broker 0: msg0-seg0 1: msg1-seg0 2: msg1-seg1 Consumer 2: msg1 3: msg0-seg1 3: msg0 User
  • 25. The offset of a large message ● We chose offset of the last segment ○ Less memory consumption ○ Better tolerance for partially sent large messages. 0: msg0-seg0 1: msg1-seg0 2: msg1-seg1 3: msg0-seg1 Broker 0: msg0-seg0 1: msg1-seg0 2: msg1-seg1 Consumer 2: msg1 3: msg0-seg1 3: msg0 User
  • 26. A closer look at large message handling ● The offset of a large message ● Offset tracking ● Producer callback ● Rebalance and duplicates handling ● Memory management ● Performance overhead ● Compatibility with existing messages
  • 27. Offset tracking 0: msg0-seg0 1: msg1-seg0 2: msg1-seg1 3: msg0-seg1 4: msg2-seg0 5: msg3-seg0 ... Broker Consumer 0: msg0-seg0 1: msg1-seg0 2: msg1-seg1 User 2: msg1
  • 28. Offset tracking 0: msg0-seg0 1: msg1-seg0 2: msg1-seg1 3: msg0-seg1 4: msg2-seg0 5: msg3-seg0 ... Broker Consumer 0: msg0-seg0 1: msg1-seg0 2: msg1-seg1 User 2: msg1 commit( Map{(tp->2)} ) ● We cannot commit offset 2 because m0-s0 hasn’t been delivered to the user. ● We should commit offset 0 so there is no message loss.
  • 29. Offset tracking 0: msg0-seg0 1: msg1-seg0 2: msg1-seg1 3: msg0-seg1 4: msg2-seg0 5: msg3-seg0 ... Broker Consumer 0: msg0-seg0 1: msg1-seg0 2: msg1-seg1 User 2: msg1 3: msg0-seg1 3: msg0
  • 30. Offset tracking 0: msg0-seg0 1: msg1-seg0 2: msg1-seg1 3: msg0-seg1 4: msg2-seg0 5: msg3-seg0 ... Broker Consumer 0: msg0-seg0 1: msg1-seg0 2: msg1-seg1 User 2: msg1 seek(tp, 2) ● seek to m1-s0, i.e offset 1 instead of offset 2 3: msg0-seg1 3: msg0
  • 31. Offset tracking Broker Consumer User ● Safe offset - the offset that can be committed without message loss ● Starting offset - the starting offset of a large message. Offset tracker map { (2 -> start=1, safe=0), (3 -> start=0, safe=4), … } 0: msg0-seg0 1: msg1-seg0 2: msg1-seg1 3: msg0-seg1 4: msg2-seg0 5: msg3-seg0 ... 0: msg0-seg0 1: msg1-seg0 2: msg1-seg1 2: msg1 3: msg0-seg1 3: msg0
  • 32. Offset tracking ● Limitations ○ Consumers can only track the message they have already seen. ■ When the users seek forward, consumer does not check if user is seeking to a message boundary. ○ Consumers cannot keep track of all the messages they have ever seen. ■ Consumers only track a configured number of recently delivered message for each partition. e.g. 5,000. ○ After rebalance, the new owner of a partition will not have any tracked message from the newly assigned partitions.
  • 33. A closer look at large message handling ● The offset of a large message ● Offset tracking ● Producer callback ● Rebalance and duplicates handling ● Memory management ● Performance overhead ● Compatibility with existing messages
• 34. Producer Callback Producer: [0: msg0-seg0, 1: msg0-seg1, 2: msg0-seg2, ...] Broker: [0: msg0-seg0] ● All the segments will be sent to the same partition. ● Callback state: { numSegments=3; ackedSegments=1; userCallback } ● Do not fire the user callback yet.
• 35. Producer Callback Producer: [0: msg0-seg0, 1: msg0-seg1, 2: msg0-seg2, ...] Broker: [0: msg0-seg0, 1: msg0-seg1] ● All the segments will be sent to the same partition. ● Callback state: { numSegments=3; ackedSegments=2; userCallback } ● Do not fire the user callback yet.
• 36. Producer Callback Producer: [0: msg0-seg0, 1: msg0-seg1, 2: msg0-seg2, ...] Broker: [0: msg0-seg0, 1: msg0-seg1, 2: msg0-seg2] ● All the segments will be sent to the same partition. ● Callback state: { numSegments=3; ackedSegments=3; userCallback } ● Fire the user callback: ○ The offset of the last segment is passed to the user callback ○ The first exception received is passed to the user callback
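The callback state shown on these three slides could be implemented roughly as below against the public producer API. The class name and aggregation details are assumptions; Callback and RecordMetadata are the standard org.apache.kafka.clients.producer types.

    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.concurrent.atomic.AtomicReference;

    import org.apache.kafka.clients.producer.Callback;
    import org.apache.kafka.clients.producer.RecordMetadata;

    // Sketch only: one instance is shared by all segments of a large message.
    public class LargeMessageCallback implements Callback {
        private final int numSegments;
        private final Callback userCallback;
        private final AtomicInteger ackedSegments = new AtomicInteger(0);
        private final AtomicReference<Exception> firstException = new AtomicReference<>();
        private final AtomicReference<RecordMetadata> lastMetadata = new AtomicReference<>();

        public LargeMessageCallback(int numSegments, Callback userCallback) {
            this.numSegments = numSegments;
            this.userCallback = userCallback;
        }

        @Override
        public void onCompletion(RecordMetadata metadata, Exception exception) {
            if (exception != null) {
                firstException.compareAndSet(null, exception);  // keep only the first error
            } else {
                // Segments of one message go to one partition, so acks arrive in
                // order and the last one carries the last segment's offset.
                lastMetadata.set(metadata);
            }
            if (ackedSegments.incrementAndGet() == numSegments) {
                // All segments completed (acked or failed): fire the user callback
                // exactly once, with the last segment's metadata and the first
                // exception, if any.
                userCallback.onCompletion(lastMetadata.get(), firstException.get());
            }
        }
    }

The same LargeMessageCallback instance would be passed to every send() of the message's segments.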
  • 37. A closer look at large message handling ● The offset of a large message ● Producer callback ● Offset tracking ● Rebalance and duplicates handling ● Memory management ● Performance overhead ● Compatibility with existing messages
• 38. Rebalance and duplicates handling Broker: [0: msg0-seg0, 1: msg1-seg0, 2: msg1-seg1, 3: msg0-seg1, 4: msg2-seg0, 5: msg3-seg0, ...] Consumer 0: [0: msg0-seg0, 1: msg1-seg0, 2: msg1-seg1] User: [2: msg1] ● Consumer rebalance occurred. ● Note: the user has already seen msg1. ● Offset tracker map { (2 -> start=1, safe=0), … }
• 39. Rebalance and duplicates handling Broker: [0: msg0-seg0, 1: msg1-seg0, 2: msg1-seg1, 3: msg0-seg1, 4: msg2-seg0, 5: msg3-seg0, ...] Consumer 0: [0: msg0-seg0, 1: msg1-seg0, 2: msg1-seg1] User: [2: msg1] ● Consumer 0 committed offset 0. ● Note: the user has already seen msg1. ● Offset tracker map { (2 -> start=1, safe=0), … }
• 40. Rebalance and duplicates handling Broker: [0: msg0-seg0, 1: msg1-seg0, 2: msg1-seg1, 3: msg0-seg1, 4: msg2-seg0, 5: msg3-seg0, ...] Consumer 1: [0: msg0-seg0] User: (nothing delivered yet) ● New owner consumer 1 resumes reading from msg0-seg0.
• 41. Rebalance and duplicates handling Broker: [0: msg0-seg0, 1: msg1-seg0, 2: msg1-seg1, 3: msg0-seg1, 4: msg2-seg0, 5: msg3-seg0, ...] Consumer 1: [0: msg0-seg0, 1: msg1-seg0] User: (nothing delivered yet)
• 42. Rebalance and duplicates handling Broker: [0: msg0-seg0, 1: msg1-seg0, 2: msg1-seg1, 3: msg0-seg1, 4: msg2-seg0, 5: msg3-seg0, ...] Consumer 1: [0: msg0-seg0, 1: msg1-seg0, 2: msg1-seg1] User: [2: msg1] ● Consumer 1 will deliver msg1 again to the user: a duplicate.
• 43. Rebalance and duplicates handling Broker: [0: msg0-seg0, 1: msg1-seg0, 2: msg1-seg1, 3: msg0-seg1, 4: msg2-seg0, 5: msg3-seg0, ...] Consumer 0: [0: msg0-seg0, 1: msg1-seg0, 2: msg1-seg1] User: [2: msg1] ● 1. Consumer rebalance occurred. ● 2. Consumer 0 committed offset 0 with metadata {delivered=2}. ● Note: the user has already seen msg1. ● Offset tracker map { (2 -> start=1, safe=0), … }
• 44. Rebalance and duplicates handling Broker: [0: msg0-seg0, 1: msg1-seg0, 2: msg1-seg1, 3: msg0-seg1, 4: msg2-seg0, 5: msg3-seg0, ...] Consumer 1: [0: msg0-seg0] User: (nothing delivered yet) ● New owner consumer 1 resumes reading from msg0-seg0. ● Consumer 1 receives the committed metadata {delivered=2}.
• 45. Rebalance and duplicates handling Broker: [0: msg0-seg0, 1: msg1-seg0, 2: msg1-seg1, 3: msg0-seg1, 4: msg2-seg0, 5: msg3-seg0, ...] Consumer 1: [0: msg0-seg0, 1: msg1-seg0] User: (nothing delivered yet) ● New owner consumer 1 resumes reading from msg0-seg0. ● Consumer 1 receives the committed metadata {delivered=2}.
• 46. Rebalance and duplicates handling Broker: [0: msg0-seg0, 1: msg1-seg0, 2: msg1-seg1, 3: msg0-seg1, 4: msg2-seg0, 5: msg3-seg0, ...] Consumer 1: [0: msg0-seg0, 1: msg1-seg0, 2: msg1-seg1] User: (nothing delivered yet) ● msg1.offset <= delivered ● Consumer 1 will NOT deliver msg1 again to the user.
• 47. Rebalance and duplicates handling Broker: [0: msg0-seg0, 1: msg1-seg0, 2: msg1-seg1, 3: msg0-seg1, 4: msg2-seg0, 5: msg3-seg0, ...] Consumer 1: [0: msg0-seg0, 1: msg1-seg0, 2: msg1-seg1, 3: msg0-seg1] User: [3: msg0] ● The first message delivered to the user is msg0, whose offset is 3.
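A sketch of how the {delivered=2} trick could be wired up with the plain consumer API. The metadata format ("delivered=" + offset) and the helper names are assumptions; OffsetAndMetadata's string metadata field is what carries the delivered offset across the rebalance.

    import java.util.Collections;

    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    // Sketch only: the safe offset goes into the committed offset, and the offset
    // of the last message actually handed to the user rides along as metadata.
    public class RebalanceDedup {

        static void commitWithDeliveredOffset(KafkaConsumer<byte[], byte[]> consumer,
                                              TopicPartition tp,
                                              long safeOffset,
                                              long lastDeliveredOffset) {
            OffsetAndMetadata oam =
                    new OffsetAndMetadata(safeOffset, "delivered=" + lastDeliveredOffset);
            consumer.commitSync(Collections.singletonMap(tp, oam));
        }

        /** Recovers {delivered=...} after a rebalance; -1 means nothing recorded. */
        static long recoverDeliveredOffset(KafkaConsumer<byte[], byte[]> consumer,
                                           TopicPartition tp) {
            OffsetAndMetadata committed = consumer.committed(tp);
            if (committed == null || committed.metadata() == null
                    || committed.metadata().isEmpty()) {
                return -1L;
            }
            return Long.parseLong(committed.metadata().substring("delivered=".length()));
        }

        /** A re-assembled message is suppressed if the user has already seen it. */
        static boolean shouldDeliver(long messageOffset, long deliveredOffset) {
            return messageOffset > deliveredOffset;
        }
    }

In the slides above, consumer 1 recovers delivered=2, suppresses msg1 (offset 2 <= 2), and hands the user msg0 at offset 3 as its first message.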
  • 48. A closer look at large message handling ● The offset of a large message ● Producer callback ● Offset tracking ● Rebalance and duplicates handling ● Memory management ● Performance overhead ● Compatibility with existing messages
• 49. Memory management ● Producer side ○ No material change to memory overhead except splitting and copying the message. ● Consumer side ○ buffer.capacity ■ Users can set the maximum number of bytes used to buffer segments. If the buffer is full, the consumer evicts the oldest incomplete message. ○ expiration.offset.gap ■ Suppose a message has starting offset X and the consumer is now consuming from offset Y. ■ The message is removed from the buffer if Y - X is greater than expiration.offset.gap, i.e. an offset-based “timeout”. A configuration sketch follows below.
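A sketch of how these two knobs might be configured. The property names buffer.capacity and expiration.offset.gap come from the slide; the values and the surrounding wiring are assumptions.

    import java.util.Properties;

    // Sketch only: how the large-message consumer wrapper might be configured.
    public class LargeMessageConsumerConfig {
        public static Properties buildConfig() {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");  // usual Kafka configs
            // Cap the bytes buffered for incomplete large messages. When the buffer
            // is full, the oldest incomplete message is evicted.
            props.put("buffer.capacity", String.valueOf(32L * 1024 * 1024));  // e.g. 32 MB
            // Evict an incomplete message whose first segment is at offset X once
            // consumption reaches offset Y with Y - X > gap: an offset-based timeout.
            props.put("expiration.offset.gap", "10000");
            return props;
        }
    }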
  • 50. A closer look at large message handling ● The offset of a large message ● Producer callback ● Offset tracking ● Rebalance and duplicates handling ● Memory management ● Performance overhead ● Compatibility with existing messages
• 51. Performance Overhead ● Potentially additional segment serialization/deserialization cost ○ Default segment serde is cheap ○ Segment fields: { public final UUID messageId; public final int sequenceNumber; public final int numberOfSegments; public final int messageSizeInBytes; public final ByteBuffer payload; } ● Pipeline: ProducerRecord -> maybe split -> segment serializer -> Kafka -> segment deserializer -> re-assemble
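Given the fields above, a straightforward fixed-header serializer looks roughly like this. The byte layout is an assumption for illustration; the actual wire format may differ.

    import java.nio.ByteBuffer;
    import java.util.UUID;

    // Sketch only: a possible fixed-size header layout for the segment fields.
    public class SegmentSerializer {
        // 16 (UUID) + 4 + 4 + 4 header bytes, then the payload.
        private static final int HEADER_SIZE = 16 + 4 + 4 + 4;

        public static byte[] serialize(UUID messageId,
                                       int sequenceNumber,
                                       int numberOfSegments,
                                       int messageSizeInBytes,  // size of the whole original message
                                       ByteBuffer payload) {
            ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE + payload.remaining());
            buf.putLong(messageId.getMostSignificantBits());
            buf.putLong(messageId.getLeastSignificantBits());
            buf.putInt(sequenceNumber);
            buf.putInt(numberOfSegments);
            buf.putInt(messageSizeInBytes);
            buf.put(payload.duplicate());  // this segment's share of the payload
            return buf.array();
        }
    }

The header is a handful of fixed-width writes, which is why the default serde is cheap relative to the payload copy itself.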
• 52. Performance Overhead ● Additional memory footprint in consumers ○ Buffer for segments of incomplete large messages ○ Additional memory needed to track the message offsets ■ 24 bytes per message: tracking the most recent 5,000 messages from each of 100 partitions takes 12 MB. ■ We can choose to track only large messages if users are trustworthy. ● Pipeline: ProducerRecord -> maybe split -> segment serializer -> Kafka -> segment deserializer -> re-assemble
  • 53. A closer look at large message handling ● The offset of a large message ● Producer callback ● Offset tracking ● Rebalance and duplicates handling ● Memory management ● Performance overhead ● Compatibility with existing messages
• 54. Compatibility with existing messages Broker: [0: msg0 (existing msg), 1: msg1-seg0, 2: msg1-seg1, 3: msg2-seg0 (single-segment msg), ...] Consumer: segment deserializer -> NotLargeMessageSegmentException -> value deserializer ● In the segment deserializer, a user implementation should throw NotLargeMessageSegmentException when the bytes are not a large-message segment. ● When consumers see NotLargeMessageSegmentException, they assume the message is an existing message and use the value deserializer to handle it. ● The default segment deserializer implementation already handles this.
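A sketch of that fallback path. NotLargeMessageSegmentException is the exception named on the slide, stubbed in here so the example compiles stand-alone; the wrapper class and its constructor are assumptions.

    import java.util.Map;

    import org.apache.kafka.common.serialization.Deserializer;

    // Stub of the exception named on the slide, so this sketch is self-contained.
    class NotLargeMessageSegmentException extends RuntimeException {
    }

    // Sketch only: try the segment path first, fall back for existing messages.
    public class CompatibleDeserializer<T> implements Deserializer<T> {
        private final Deserializer<T> segmentDeserializer;  // throws if not a segment
        private final Deserializer<T> valueDeserializer;    // handles existing messages

        public CompatibleDeserializer(Deserializer<T> segmentDeserializer,
                                      Deserializer<T> valueDeserializer) {
            this.segmentDeserializer = segmentDeserializer;
            this.valueDeserializer = valueDeserializer;
        }

        @Override
        public void configure(Map<String, ?> configs, boolean isKey) {
            segmentDeserializer.configure(configs, isKey);
            valueDeserializer.configure(configs, isKey);
        }

        @Override
        public T deserialize(String topic, byte[] data) {
            try {
                // First, try to interpret the bytes as a large-message segment.
                return segmentDeserializer.deserialize(topic, data);
            } catch (NotLargeMessageSegmentException e) {
                // Not a segment: fall back so pre-existing messages keep working.
                return valueDeserializer.deserialize(topic, data);
            }
        }

        @Override
        public void close() {
            segmentDeserializer.close();
            valueDeserializer.close();
        }
    }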
• 55. The answer to a question after the meetup ● Does it work for compacted topics? ○ Add suffix “-segmentSeq” to the key ■ It works, with a flaw, when large messages with the same key do NOT interleave ○ Before compaction: [0: m0(key=”k-0”), 1: m0(key=”k-1”), 2: m0(key=”k-2”), 5: m1(key=”k-0”), 6: m1(key=”k-1”), ...] ○ Scenario 1 after compaction: [1: m0(key=”k-1”), 2: m0(key=”k-2”), 5: m1(key=”k-0”), 6: m1(key=”k-1”), ...] ○ Scenario 2 after compaction: [2: m0(key=”k-2”) <- zombie segment, 5: m1(key=”k-0”), 6: m1(key=”k-1”), ...] ○ Note that the consumer won’t assemble segments of m0 with segments of m1 because their messageIds are different.
• 56. The answer to a question after the meetup ● Does it work for compacted topics? ○ Add suffix “-segmentSeq” to the key ■ It does NOT work when large messages with the same key may interleave ○ Before compaction: [0: m0(key=”k0-0”), 1: m1(key=”k0-0”), 2: m1(key=”k0-1”), 3: m0(key=”k0-1”), ...] ○ Failure scenario after compaction: [1: m1(key=”k0-0”), 3: m0(key=”k0-1”), ...], so neither message can ever be assembled ○ Note that the consumer won’t assemble m0-seg1 and m1-seg0 together because their messageIds are different.
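The key-suffixing scheme from the last two slides, in miniature. The helper below is illustrative only; it shows why a shrinking message leaves a zombie segment behind: compaction keeps only the latest record per suffixed key, regardless of which messageId wrote it.

    // Sketch only: each segment's key is the original key plus "-" + its segment
    // sequence number, so log compaction retires a segment only when a newer large
    // message with the same original key writes a segment with the same sequence.
    public class SegmentKeys {
        static String segmentKey(String originalKey, int segmentSeq) {
            return originalKey + "-" + segmentSeq;
        }

        public static void main(String[] args) {
            // A 3-segment m0 and a later 2-segment m1, both with original key "k":
            // m0 writes k-0, k-1, k-2; m1 overwrites k-0 and k-1, but m0's k-2
            // survives compaction as the "zombie segment" shown above.
            for (int seq = 0; seq < 3; seq++) {
                System.out.println("m0 segment key: " + segmentKey("k", seq));
            }
            for (int seq = 0; seq < 2; seq++) {
                System.out.println("m1 segment key: " + segmentKey("k", seq));
            }
        }
    }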
• 57. Summary ● Reference based messaging works in most cases. ● Sometimes it is handy to have in-line support for large messages: ○ Sporadic large messages ○ Low latency ○ A small number of interleaved large messages ○ Cost savings
• 58. Acknowledgements Thanks for the great help and support from Dong Lin, Joel Koshy, Kartik Paramasivam, Onur Karaman, Yi Pan, and the LinkedIn Espresso and Datastream teams.
  • 59. Q&A