1
Why is My Stream Processing Job Slow?
Xavier Léauté, Software Engineer
Gwen Shapira, Principal Data Architect
2
Kafka 101
Distributed
Scalable
Fault-Tolerant
Partitioned + Replicated Log
Ordering guarantees
Consumers advance independently
Exactly-once delivery
Transactional commits
3
What people think of Stream Monitoring
4
What our typical experience is
Confidential 5
Real Customer Experiences
Client Side Broken Streaming Job / App
End-to-End Slow Replication
6
Your Kafka stream job stopped humming… now what?
Confidential 7
What we check
Consumer Lag
Partition Assignment
Partition Skew
Client Logs
GC Log
Metrics
Request Latencies
Commit Rates
Group Rebalancing
Basic Tuning
Batch Sizes
Commit Rate
Application Profiling
8
The Newbie - During an incident…
GC Logs? Metrics? How do I get those?
I’ll just change some configs and reboot everything.
9
Consumer Lag
Wait for me!
10
Bad Capacity Allocation
kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group fast-data-reader

TOPIC      PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG      CONSUMER-ID
fast-data  1          8661694         8703404         41710    myapp-1
fast-data  3          8577975         8616490         38515    myapp-2
fast-data  0          4902354         8741872         3839518  myapp-3
fast-data  2          4922614         8621757         3699143  myapp-3
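
The output above can also be summarized per consumer. The following is a minimal sketch, not part of the original deck: it assumes the column layout is exactly the one shown on the slide (real `kafka-consumer-groups` output may carry extra columns depending on the version), and the function name `lag_by_consumer` is hypothetical.

```python
from collections import defaultdict

# Sample rows copied from the slide; the header layout is assumed to match.
SAMPLE = """\
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID
fast-data 1 8661694 8703404 41710 myapp-1
fast-data 3 8577975 8616490 38515 myapp-2
fast-data 0 4902354 8741872 3839518 myapp-3
fast-data 2 4922614 8621757 3699143 myapp-3
"""

def lag_by_consumer(describe_output):
    """Return {consumer_id: (partition_count, total_lag)} (hypothetical helper)."""
    lines = [ln.split() for ln in describe_output.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    lag_col = header.index("LAG")
    consumer_col = header.index("CONSUMER-ID")
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        entry = totals[row[consumer_col]]
        entry[0] += 1                    # one more partition owned by this consumer
        entry[1] += int(row[lag_col])    # accumulate its lag
    return {cid: tuple(v) for cid, v in totals.items()}

summary = lag_by_consumer(SAMPLE)
for consumer, (partitions, lag) in sorted(summary.items()):
    print(f"{consumer}: {partitions} partition(s), total lag {lag}")
```

In this sample, `myapp-3` owns two partitions and carries almost all of the lag, which appears to be the imbalance the slide title points at.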
11
Watch for Partition Skew
kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group fast-data-reader

TOPIC      PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG      CONSUMER-ID
fast-data  1          8661694         8703404         41710    myapp-1
fast-data  3          8577975         8616490         38515    myapp-2
fast-data  0          4902354         8741872         3839518  myapp-3
fast-data  2          4922614         8621757         3699143  myapp-3
12
Not all partitions are created equal
Important for
Keyed topics
Custom partitioned topics
Early warning signs
some partitions lagging
uneven CPU / Network usage
Typical cause
skewed key distribution in your data
bad joins (null keys)
imbalance across brokers
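
To make "skewed key distribution" concrete, here is a small illustrative sketch that is not from the original deck. It uses CRC32 as a stand-in hash, which is an assumption made purely for illustration and is not Kafka's partitioner. The keys, the hot placeholder key `"unknown"`, and the helper names are all hypothetical.

```python
import zlib
from collections import Counter

def partition_for(key, num_partitions):
    # Stand-in hash (CRC32), NOT Kafka's partitioner; only illustrates skew.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def partition_counts(keys, num_partitions):
    """Number of records landing on each partition index."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]

def skew_ratio(counts):
    """Busiest partition's load divided by the mean load (1.0 means even)."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean if mean else 0.0

# Hypothetical data: 1000 distinct users plus one hot placeholder key,
# e.g. a default value substituted for missing keys before a join.
even_keys = [f"user-{i}" for i in range(1000)]
skewed_keys = even_keys + ["unknown"] * 3000

even = partition_counts(even_keys, 4)
skewed = partition_counts(skewed_keys, 4)
print("even:  ", even, "skew ratio", round(skew_ratio(even), 2))
print("skewed:", skewed, "skew ratio", round(skew_ratio(skewed), 2))
```

Because every record with the hot key hashes to the same partition, that partition receives at least three times the mean load in this example, which is the kind of pattern that shows up as a few lagging partitions.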
13
Clients have metrics too!
Start with the basics GC / CPU / Network
General Slowness
Consumer or Producer Side?
Global Request Latencies
Some partitions still lagging
Per Broker metrics (bad node / network)
Per Topic metrics (data / tuning)
Buffer Size
Offset Commit
14
Turn up the log level
The logs took too much space, so we deleted them.
15
Time for some profiling
https://github.com/jvm-profiling-tools/async-profiler
https://github.com/brendangregg/FlameGraph
./profiler.sh -d 30 -f flamegraph.svg <pid>
To impress your coworkers
https://github.com/Netflix/flamescope
16
Here’s where your CPU cycles went
[Flame graph; axes: % CPU Time, Stack. Regions highlighted in turn: GC, RocksDB, Kafka poll() loop, Actual Processing Time]
17
Spark Streaming Clickstream Example (using Kafka)
18
Spark Streaming Clickstream Example (using Kafka)
30% deserialization
Shuffle Writes
Scheduler
Event Loop
Read from Kafka & Processing
19
Maybe it’s your code
20
Let’s commit, just to be safe, right?
Common beginner mistake
Commit only as needed
keep recovery short
maximize throughput
Metrics to validate
commit-rate
commit-latency-avg
[Diagram: MESSAGES, COMMIT, MESSAGES]
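
One way to "commit only as needed" is to commit after a fixed number of records or after a time limit, whichever comes first. The sketch below is illustrative and not from the original deck: the `CommitPolicy` class, its method names, and the thresholds are hypothetical, and the actual offset commit call of whichever client library is in use is deliberately left out.

```python
import time

class CommitPolicy:
    """Signal a commit after `max_records` records or `max_interval_s` seconds."""

    def __init__(self, max_records=1000, max_interval_s=5.0, clock=time.monotonic):
        self.max_records = max_records
        self.max_interval_s = max_interval_s
        self._clock = clock
        self._pending = 0
        self._last_commit = clock()

    def record_processed(self):
        """Call once per processed record; returns True when a commit is due."""
        self._pending += 1
        elapsed = self._clock() - self._last_commit
        return self._pending >= self.max_records or elapsed >= self.max_interval_s

    def committed(self):
        """Call after the consumer's offset commit has succeeded."""
        self._pending = 0
        self._last_commit = self._clock()

# Simulated loop: count how many commits 10,000 records would trigger.
# The interval is set very high so only the record count matters here.
policy = CommitPolicy(max_records=1000, max_interval_s=3600.0)
commits = 0
for _ in range(10_000):
    if policy.record_processed():
        commits += 1  # a real application would commit offsets here
        policy.committed()
print("commits issued:", commits)
```

A smaller `max_records` shortens recovery after a restart at the cost of more commit requests; `commit-rate` and `commit-latency-avg` are the metrics to watch when choosing the value.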
21
Right-size your batches
Bigger Batches
increase throughput
improve compression
Small enough (<< 10MB) to keep GC low
batch.size + linger.ms
don’t forget!
Watch
request-rate
request-latency-avg
compression-rate
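
The interaction between batch.size and linger.ms can be estimated with simple arithmetic. The sketch below is an approximation added for illustration, not taken from the deck: it assumes a batch is sent when it is full or when linger.ms expires, whichever comes first, and it ignores compression and backpressure. The workload figures and function names are hypothetical.

```python
def time_to_fill_batch_ms(batch_size_bytes, records_per_sec, avg_record_bytes):
    """Milliseconds for one partition's traffic to fill a batch of this size."""
    bytes_per_ms = records_per_sec * avg_record_bytes / 1000.0
    return batch_size_bytes / bytes_per_ms

def expected_batch_bytes(batch_size_bytes, linger_ms, records_per_sec, avg_record_bytes):
    """Approximate bytes per batch: capped by batch.size, limited by linger.ms."""
    accumulated = records_per_sec * avg_record_bytes * linger_ms / 1000.0
    return min(batch_size_bytes, accumulated)

# Hypothetical workload: 2,000 records/s per partition, 500 bytes each.
fill_ms = time_to_fill_batch_ms(64 * 1024, 2000, 500)
print(f"64 KiB batch fills in about {fill_ms:.1f} ms")
for linger in (5, 20, 100):
    size = expected_batch_bytes(64 * 1024, linger, 2000, 500)
    print(f"linger.ms={linger}: about {size:.0f} bytes per batch")
```

Under these assumptions, raising batch.size alone changes little when linger.ms is short, because batches are sent long before they fill.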
22
My app keeps rebalancing
Symptoms
low throughput
high network chatter
consumer logs galore
no progress
hanging
23
Kafka Consumer Group Rebalancing 101
[Diagram, built up over several steps: Consumers A and B share Partitions 1 to 6; Consumer C arrives ("Hi!")]
Join Group
Join Response
Sync Group
Sync Response
New Assignment
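
The effect of a new member on the assignment can be sketched with a toy assignor. This is an illustration added here, not the deck's material and not Kafka's actual assignment strategy: it is a plain round-robin over sorted members, and the helper names are hypothetical.

```python
def assign_round_robin(consumers, partitions):
    """Hand partitions to consumers in turn (a simplification, not Kafka's assignor)."""
    members = sorted(consumers)
    assignment = {m: [] for m in members}
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment

def owner_of(assignment, partition):
    """Return the consumer that owns the given partition."""
    return next(c for c, owned in assignment.items() if partition in owned)

partitions = [1, 2, 3, 4, 5, 6]
before = assign_round_robin(["A", "B"], partitions)
after = assign_round_robin(["A", "B", "C"], partitions)  # Consumer C joins

# Partitions whose owner changed as a result of the new member.
moved = [p for p in partitions if owner_of(before, p) != owner_of(after, p)]
print("before:", before)
print("after: ", after)
print("partitions that changed owner:", moved)
```

With this naive strategy, four of the six partitions change owner when one consumer joins, which hints at why frequent rebalances are costly.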
24
Restoring a Happy Balance
Timing Issues
long GC pauses (tens of seconds)
infrequent calls to poll()
timeouts too short?
flaky network
1 bad machine affects the entire group
Watch
join-rate
sync-rate
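
A quick sanity check for "infrequent calls to poll()" is to compare the worst-case time spent on one batch with the consumer's max.poll.interval.ms. The sketch below is illustrative and not from the deck: the function names and all numbers are hypothetical, and it assumes each record takes roughly the same time to process.

```python
def worst_case_poll_gap_ms(max_poll_records, per_record_ms, gc_pause_ms=0.0):
    """Upper bound on the time between two poll() calls for one full batch."""
    return max_poll_records * per_record_ms + gc_pause_ms

def exceeds_poll_interval(max_poll_records, per_record_ms,
                          max_poll_interval_ms, gc_pause_ms=0.0):
    """True if the consumer risks being considered stalled between polls."""
    gap = worst_case_poll_gap_ms(max_poll_records, per_record_ms, gc_pause_ms)
    return gap > max_poll_interval_ms

# Hypothetical numbers: 500 records per poll, 50 ms of work per record,
# a 30-second poll interval limit, and an optional 20-second GC pause.
print(exceeds_poll_interval(500, 50, 30_000))
print(exceeds_poll_interval(500, 50, 30_000, gc_pause_ms=20_000))
```

If the check fails, the options are fewer records per poll (max.poll.records), faster processing, or a longer interval; which one fits depends on the application.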
25
Competent Users
• Monitor Consumer Lag
• Look out for Partition Skew
• Commit Offsets Sparingly
• Collect Logs
• Understand how to tune Batch Sizes
26
Kafka Pros
• Watch Group Partition Assignment
• Monitor Client Metrics
• Understand Consumer Rebalancing
• Profile their applications
• Distinguish Client/App/Broker problems
27
Replication Everything is Slow
28
Famous last words…
“You just consume, and produce. How hard can this be?”
29
Famous last words…
“We have a disaster in our main cluster. Can we fail over to secondary? We can’t lose more than 7 seconds of data.”
30
Monitor Replication Lag - In messages
31
Monitor Replication Lag - or in seconds…
Screenshot of replicator streams monitoring
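
Lag in seconds can be checked directly against a data-loss budget such as the 7 seconds quoted earlier. The following is a minimal sketch, not from the deck: it assumes the timestamp of the newest record in the origin and of the newest record replicated to the destination are already available (how they are obtained is outside the sketch), and the function names and sample timestamps are hypothetical.

```python
def replication_time_lag_s(origin_latest_ts_ms, destination_latest_ts_ms):
    """Seconds by which the destination trails the origin, from record timestamps."""
    return max(0.0, (origin_latest_ts_ms - destination_latest_ts_ms) / 1000.0)

def within_budget(lag_s, budget_s=7.0):
    """True if the lag fits within the tolerated data-loss window."""
    return lag_s <= budget_s

# Hypothetical epoch-millisecond timestamps, 9.5 seconds apart.
lag = replication_time_lag_s(1_700_000_012_500, 1_700_000_003_000)
print(f"time lag: {lag:.1f} s, within 7 s budget: {within_budget(lag)}")
```

A message-count lag cannot answer this question on its own, because the same number of messages may represent seconds or hours depending on the produce rate.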
Confidential 32
Simple and elegant design
[Diagram: Origin, Destination, Consumer, Producer, Buffer; block when buffer is full]
Producer metrics
io-ratio, io-wait-ratio, outgoing-byte-rate
batch-size-avg, batch-size-max
record-retry-rate, record-error-rate
waiting-threads, bufferpool-wait-time
Consumer metrics
io-ratio, io-wait-ratio, bytes-consumed-rate
fetch-size-avg, fetch-size-max, fetch-rate
records-lag-max
Confidential 33
Simple and elegant design
[Diagram: Origin, Destination, Consumer, Producer, Buffer; block when buffer is full]
network or destination Kafka performance
increase batch.size
destination Kafka issues
network or origin Kafka performance
fetch.max.bytes, fetch.min.bytes, fetch.max.wait.ms
34
Network Tuning
• WAN has high latency. We deal with it.
• Compute buffer size to match: https://www.switch.ch/network/tools/tcp_throughput/
• send.buffer.bytes and receive.buffer.bytes on producer and consumer; socket.send.buffer.bytes and socket.receive.buffer.bytes on brokers
• OS tuning: https://wwwx.cs.unc.edu/~sparkst/howto/network_tuning.php
net.core.rmem_default, net.core.rmem_max, net.core.wmem_default, net.core.wmem_max
• Enable logging to check if this had any effect:
log4j.logger.org.apache.kafka.common.network.Selector=DEBUG
• Additional tips in our docs
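
A common starting point for "compute buffer size to match" is the bandwidth-delay product: the number of bytes that must be in flight to keep a link full. The sketch below is an illustration added here and assumes that sizing rule; it does not reproduce the linked calculator, and the link figures and function name are hypothetical.

```python
def bandwidth_delay_product_bytes(bandwidth_mbit_per_s, rtt_ms):
    """Bytes in flight needed to keep a link of this bandwidth and RTT full."""
    bits_per_s = bandwidth_mbit_per_s * 1_000_000
    return int(bits_per_s / 8 * rtt_ms / 1000)

# Hypothetical WAN link: 1 Gbit/s with an 80 ms round-trip time.
bdp = bandwidth_delay_product_bytes(1000, 80)
print(f"bandwidth-delay product: {bdp} bytes")
```

Socket buffers much smaller than this value cap throughput on a high-latency link regardless of available bandwidth, which is why both the client settings and the OS limits listed above matter.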
35
Competent users
• Monitor consumer lag
• Add processes when things are slow
• Automate deployment
36
Kafka Pros
• Monitor time lag
• Collect client metrics
• Know which side to blame
• Know which configs to tune
• Tune the network over the WAN
Resources and Next Steps
https://github.com/confluentinc/cp-demo
https://www.confluent.io/download/
https://slackpass.io/confluentcommunity
https://www.confluent.io/blog
Thank you!
@gwenshap, gwen@confluent.io
@xvrl, xavier@confluent.io
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 
Ad

Recently uploaded (20)

DPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdfDPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdf
inmishra17121973
 
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptxmd-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
fatimalazaar2004
 
How iCode cybertech Helped Me Recover My Lost Funds
How iCode cybertech Helped Me Recover My Lost FundsHow iCode cybertech Helped Me Recover My Lost Funds
How iCode cybertech Helped Me Recover My Lost Funds
ireneschmid345
 
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
Simran112433
 
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbbEDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
JessaMaeEvangelista2
 
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
Molecular methods diagnostic and monitoring of infection  -  Repaired.pptxMolecular methods diagnostic and monitoring of infection  -  Repaired.pptx
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
7tzn7x5kky
 
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Abodahab
 
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnTemplate_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
cegiver630
 
Data Science Courses in India iim skills
Data Science Courses in India iim skillsData Science Courses in India iim skills
Data Science Courses in India iim skills
dharnathakur29
 
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
James Francis Paradigm Asset Management
 
Geometry maths presentation for begginers
Geometry maths presentation for begginersGeometry maths presentation for begginers
Geometry maths presentation for begginers
zrjacob283
 
Developing Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response ApplicationsDeveloping Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response Applications
VICTOR MAESTRE RAMIREZ
 
Minions Want to eat presentacion muy linda
Minions Want to eat presentacion muy lindaMinions Want to eat presentacion muy linda
Minions Want to eat presentacion muy linda
CarlaAndradesSoler1
 
Classification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptxClassification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptx
wencyjorda88
 
C++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptxC++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptx
aquibnoor22079
 
chapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptxchapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptx
justinebandajbn
 
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your CompetitorsAI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
Contify
 
Digilocker under workingProcess Flow.pptx
Digilocker  under workingProcess Flow.pptxDigilocker  under workingProcess Flow.pptx
Digilocker under workingProcess Flow.pptx
satnamsadguru491
 
Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..
yuvarajreddy2002
 
LLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bertLLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bert
ChadapornK
 
DPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdfDPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdf
inmishra17121973
 
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptxmd-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
fatimalazaar2004
 
How iCode cybertech Helped Me Recover My Lost Funds
How iCode cybertech Helped Me Recover My Lost FundsHow iCode cybertech Helped Me Recover My Lost Funds
How iCode cybertech Helped Me Recover My Lost Funds
ireneschmid345
 
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
Simran112433
 
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbbEDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
JessaMaeEvangelista2
 
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
Molecular methods diagnostic and monitoring of infection  -  Repaired.pptxMolecular methods diagnostic and monitoring of infection  -  Repaired.pptx
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
7tzn7x5kky
 
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Abodahab
 
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnTemplate_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
cegiver630
 
Data Science Courses in India iim skills
Data Science Courses in India iim skillsData Science Courses in India iim skills
Data Science Courses in India iim skills
dharnathakur29
 
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
James Francis Paradigm Asset Management
 
Geometry maths presentation for begginers
Geometry maths presentation for begginersGeometry maths presentation for begginers
Geometry maths presentation for begginers
zrjacob283
 
Developing Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response ApplicationsDeveloping Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response Applications
VICTOR MAESTRE RAMIREZ
 
Minions Want to eat presentacion muy linda
Minions Want to eat presentacion muy lindaMinions Want to eat presentacion muy linda
Minions Want to eat presentacion muy linda
CarlaAndradesSoler1
 
Classification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptxClassification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptx
wencyjorda88
 
C++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptxC++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptx
aquibnoor22079
 
chapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptxchapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptx
justinebandajbn
 
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your CompetitorsAI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
Contify
 
Digilocker under workingProcess Flow.pptx
Digilocker  under workingProcess Flow.pptxDigilocker  under workingProcess Flow.pptx
Digilocker under workingProcess Flow.pptx
satnamsadguru491
 
Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..
yuvarajreddy2002
 
LLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bertLLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bert
ChadapornK
 

Why is My Stream Processing Job Slow? with Xavier Leaute

  • 1. 1 Why is My Stream Processing Job Slow? Xavier Léauté, Software Engineer; Gwen Shapira, Principal Data Architect
  • 2. 2 Kafka 101: Distributed; Scalable; Fault-Tolerant; Partitioned + Replicated Log; Ordering guarantees; Consumers advance independently; Exactly-once delivery; Transactional commits
  • 3. What people think of Stream Monitoring 3
  • 4. What our typical experience is 4
  • 7. Confidential 5 Real Customer Experiences Client Side Broken Streaming Job / App End-to-End Slow Replication
  • 8. Your Kafka stream job stopped humming… now what? 6
  • 9. 7 What we check: Consumer Lag; Partition Assignment; Partition Skew; Client Logs; GC Log; Metrics; Request Latencies; Commit Rates; Group Rebalancing; Basic Tuning; Batch Sizes; Commit Rate; Application Profiling
  • 10. 8 The Newbie - During an incident… GC Logs? Metrics? How do I get those? I’ll just change some configs and reboot everything.
  • 12. 10 Bad Capacity Allocation
    kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group fast-data-reader
    TOPIC      PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG      CONSUMER-ID
    fast-data  1          8661694         8703404         41710    myapp-1
    fast-data  3          8577975         8616490         38515    myapp-2
    fast-data  0          4902354         8741872         3839518  myapp-3
    fast-data  2          4922614         8621757         3699143  myapp-3
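The lag table on this slide can be summarised per consumer to make the uneven allocation obvious. The following is only a sketch: it parses the six-column text shown on the slide (real `kafka-consumer-groups` output may carry additional columns depending on the version), and the helper name `lag_by_consumer` is hypothetical, not part of any Kafka tooling.

```python
from collections import defaultdict

# Output copied from the slide; in practice this text would come from
# `kafka-consumer-groups --describe`.
DESCRIBE_OUTPUT = """\
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID
fast-data 1 8661694 8703404 41710 myapp-1
fast-data 3 8577975 8616490 38515 myapp-2
fast-data 0 4902354 8741872 3839518 myapp-3
fast-data 2 4922614 8621757 3699143 myapp-3
"""


def lag_by_consumer(text):
    """Return {consumer_id: (partition_count, total_lag)} (hypothetical helper)."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    lag_col = header.index("LAG")
    consumer_col = header.index("CONSUMER-ID")
    totals = defaultdict(lambda: [0, 0])
    for line in lines[1:]:
        fields = line.split()
        entry = totals[fields[consumer_col]]
        entry[0] += 1                      # one more partition for this consumer
        entry[1] += int(fields[lag_col])   # accumulate its lag
    return {consumer: tuple(values) for consumer, values in totals.items()}


summary = lag_by_consumer(DESCRIBE_OUTPUT)
for consumer, (partitions, lag) in sorted(summary.items()):
    print(f"{consumer}: {partitions} partition(s), total lag {lag}")
```

In this data, myapp-3 owns two of the four partitions and carries almost all of the lag, which is the pattern the slide title points at.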
  • 15. 11 Watch for Partition Skew
    kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group fast-data-reader
    TOPIC      PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG      CONSUMER-ID
    fast-data  1          8661694         8703404         41710    myapp-1
    fast-data  3          8577975         8616490         38515    myapp-2
    fast-data  0          4902354         8741872         3839518  myapp-3
    fast-data  2          4922614         8621757         3699143  myapp-3
  • 17. 12 Not all partitions are created equal. Important for: keyed topics, custom partitioned topics. Early warning signs: some partitions lagging, uneven CPU / network usage. Typical causes: skewed key distribution in your data, bad joins (null keys), imbalance across brokers
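To illustrate how a skewed key distribution turns into partition skew, here is a small simulation. It is a sketch under stated assumptions: the partition count, the sample keys, and the `skew_ratio` helper are all made up for illustration, and CRC32 is used as a stand-in hash (Kafka's default partitioner uses a different hash function).

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 4  # assumed partition count, for illustration only


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Stand-in partitioner: CRC32 of the key modulo the partition count.

    This is NOT Kafka's hash; it only shows that equal keys always land
    on the same partition.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def skew_ratio(keys, num_partitions=NUM_PARTITIONS):
    """Record count of the busiest partition divided by the mean count."""
    counts = Counter(partition_for(key, num_partitions) for key in keys)
    mean = len(keys) / num_partitions
    return max(counts.values()) / mean


# Hypothetical data sets: evenly spread keys vs. one dominant "hot" key.
even_keys = [f"user-{i}" for i in range(10_000)]
hot_keys = ["unknown"] * 9_000 + [f"user-{i}" for i in range(1_000)]

print(f"even keys skew ratio: {skew_ratio(even_keys):.2f}")
print(f"hot key skew ratio:   {skew_ratio(hot_keys):.2f}")
```

A ratio close to 1 means the load is spread evenly; with the hot key, one partition receives at least 90% of the records no matter how many consumers are added.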
  • 18. 13 Clients have metrics too! Start with the basics: GC / CPU / Network. General slowness: consumer or producer side? Global request latencies. Some partitions still lagging: per-broker metrics (bad node / network), per-topic metrics (data / tuning). Buffer size. Offset commit.
  • 19. 14 Turn up the log level The logs took too much space, so we deleted them.
  • 20. 15 Time for some profiling: https://github.com/jvm-profiling-tools/async-profiler, https://github.com/brendangregg/FlameGraph. Command: ./profiler.sh -d 30 -f flamegraph.svg <pid>. To impress your coworkers: https://github.com/Netflix/flamescope
  • 21. 16 Here’s where your CPU cycles went (flame graph of % CPU time by stack). Regions highlighted in turn: GC, RocksDB, Kafka poll() loop, Actual Processing Time
  • 26. 17 Spark Streaming Clickstream Example (using Kafka)
  • 31. 18 Spark Streaming Clickstream Example (using Kafka). Flame graph regions: 30% deserialization; Shuffle Writes; Scheduler Event Loop; Read from Kafka & Processing
  • 33. 20 Let’s commit, just to be safe, right? Common beginner mistake. Commit only as needed: keep recovery short, maximize throughput. Metrics to validate: commit-rate, commit-latency-avg. (Diagram: MESSAGES, COMMIT, MESSAGES)
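The idea of committing only as needed can be sketched as a loop that commits once per batch of records instead of once per message. This is a plain-Python illustration, not Kafka client code: `handle` and `commit` are hypothetical callbacks standing in for the application logic and the consumer's commit call, and the interval of 100 records is an arbitrary assumption.

```python
def process_with_periodic_commits(records, commit_every, handle, commit):
    """Process (offset, value) pairs, committing every `commit_every` records.

    A final commit covers any records left over at the end. As with a real
    Kafka consumer, the committed position is the NEXT offset to read,
    i.e. last processed offset + 1.
    """
    uncommitted = 0
    last_offset = None
    for offset, value in records:
        handle(value)
        last_offset = offset
        uncommitted += 1
        if uncommitted >= commit_every:
            commit(last_offset + 1)
            uncommitted = 0
    if uncommitted:
        commit(last_offset + 1)


# Hypothetical stand-ins for a real record stream and commit call.
records = [(offset, f"event-{offset}") for offset in range(1_050)]
committed = []
process_with_periodic_commits(
    records, commit_every=100, handle=lambda value: None, commit=committed.append
)
print(f"{len(records)} records, {len(committed)} commits")
```

The trade-off is the one named on the slide: a larger interval means fewer commit requests and higher throughput, while a smaller one means fewer records to reprocess after a restart.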
  • 39. 21 Right-size your batches. Bigger batches: increase throughput, improve compression. Small enough (<< 10MB) to keep GC low. batch.size + linger.ms (don’t forget!). Watch: request-rate, request-latency-avg, compression-rate
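As a rough illustration of the batch settings named on this slide, the fragment below lists the producer properties as a plain dictionary. The property names `batch.size`, `linger.ms` and `compression.type` are real Kafka producer settings; every value, and the `batch_size_ok` helper with its 10x margin, is an assumption chosen only for the example.

```python
# Illustrative producer settings; the values are assumptions, not recommendations.
producer_config = {
    "batch.size": 256 * 1024,    # bytes per partition batch (Kafka's default is 16384)
    "linger.ms": 20,             # wait up to 20 ms for a batch to fill before sending
    "compression.type": "lz4",   # compression is applied per batch
}

CEILING_BYTES = 10 * 1024 * 1024  # the "<< 10MB" ceiling mentioned on the slide


def batch_size_ok(config, ceiling=CEILING_BYTES):
    """True if batch.size is at least 10x below the ceiling (arbitrary margin)."""
    return config["batch.size"] * 10 <= ceiling


print(batch_size_ok(producer_config))
```

After changing values like these, the metrics listed on the slide (request-rate, request-latency-avg, compression-rate) are the place to confirm whether the change helped.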
  • 40. 22 My app keeps rebalancing. Symptoms: low throughput; high network chatter; consumer logs galore; no progress; hanging
  • 41. 23 Kafka Consumer Group Rebalancing 101 (animated diagram). Consumer A and Consumer B own Partitions 1 to 6; Consumer C arrives (“Hi!”). Steps shown in turn: Join Group, Join Response, Sync Group, Sync Response, New Assignment
  • 51. 24 Restoring a Happy Balance. Timing issues: long GC pauses (tens of seconds); infrequent calls to poll(); timeouts too short?; flaky network. 1 bad machine affects the entire group. Watch: join-rate, sync-rate
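One of the timing issues above, infrequent calls to poll(), can be checked with simple arithmetic: if processing a full batch takes longer than the consumer's `max.poll.interval.ms`, the consumer is treated as failed and the group rebalances. The sketch below assumes a fixed per-record processing time, which is a simplification; the function name and the sample numbers are hypothetical.

```python
def poll_loop_at_risk(max_poll_records, per_record_ms, max_poll_interval_ms):
    """True if processing one full batch could exceed max.poll.interval.ms.

    Worst case: poll() returns max.poll.records records and each one takes
    per_record_ms to process before poll() is called again.
    """
    worst_case_ms = max_poll_records * per_record_ms
    return worst_case_ms >= max_poll_interval_ms


# Assumed settings: 500 records per poll, 5-minute poll interval.
slow = poll_loop_at_risk(max_poll_records=500, per_record_ms=1_000,
                         max_poll_interval_ms=300_000)
fast = poll_loop_at_risk(max_poll_records=500, per_record_ms=50,
                         max_poll_interval_ms=300_000)
print(f"1 s per record at risk: {slow}; 50 ms per record at risk: {fast}")
```

When the check fails, the usual options are fewer records per poll, faster processing, or a longer interval; which one fits depends on the application.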
  • 52. 25 Competent Users • Monitor Consumer Lag • Look out for Partition Skew • Commit Offsets Sparingly • Collect Logs • Understand how to tune Batch Sizes
  • 53. 26 Kafka Pros • Watch Group Partition Assignment • Monitor Client Metrics • Understand Consumer Rebalancing • Profile their applications • Distinguish Client/App/Broker problems
  • 55. 28 Famous last words… “You just consume, and produce. How hard can this be?”
  • 56. 29 Famous last words… “We have a disaster in our main cluster. Can we fail over to secondary? We can’t lose more than 7 seconds of data.”
  • 57. 30 Monitor Replication Lag - In messages
  • 58. 31 Monitor Replication Lag - or in seconds… Screenshot of replicator streams monitoring
  • 66. 32 Simple and elegant design: Origin, Destination, Consumer, producer, Buffer (block when buffer is full). Metrics on the diagram: io-ratio, io-wait-ratio, outgoing-byte-rate; batch-size-avg, batch-size-max; record-retry-rate, record-error-rate; waiting-threads, bufferpool-wait-time; io-ratio, io-wait-ratio, byte-consumed-rate; fetch-size-avg, fetch-size-max, fetch-rate; record-max-lag
  • 72. 33 Simple and elegant design: Origin, Destination, Consumer, producer, Buffer (block when buffer is full). Annotations on the diagram: network or destination kafka performance; increase batch.size; destination kafka issues; network or origin kafka performance; fetch.max.bytes, fetch.min.bytes, fetch.max.wait
  • 73. 34 Network Tuning • WAN has high latency. We deal with it. • Compute buffer size to match: https://www.switch.ch/network/tools/tcp_throughput/ • send.buffer.bytes and receive.buffer.bytes on producer, consumer, brokers • OS tuning: https://wwwx.cs.unc.edu/~sparkst/howto/network_tuning.php (net.core.rmem_default, net.core.rmem_max, net.core.wmem_default, net.core.wmem_max) • Enable logging to check if this had any effect: log4j.logger.org.apache.kafka.common.network.Selector=DEBUG • Additional tips in our docs
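A common rule of thumb for the "compute buffer size to match" step is the bandwidth-delay product: the number of bytes that must be in flight to keep a link busy for one round trip. The sketch below assumes that rule; whether the linked calculator uses exactly this formula is not confirmed by the slides, and the link speeds and round-trip times are made-up examples.

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_ms):
    """Bandwidth-delay product in bytes.

    bandwidth (bits/s) * round-trip time (s) gives bits in flight;
    dividing by 8 converts to bytes. Integer arithmetic avoids rounding.
    """
    return bandwidth_bits_per_s * rtt_ms // (8 * 1000)


# Assumed example links: 1 Gbit/s with 100 ms RTT, 100 Mbit/s with 80 ms RTT.
wan_buffer = bdp_bytes(1_000_000_000, 100)
small_link_buffer = bdp_bytes(100_000_000, 80)
print(f"1 Gbit/s, 100 ms RTT: {wan_buffer} bytes")
print(f"100 Mbit/s, 80 ms RTT: {small_link_buffer} bytes")
```

A value computed this way would be a starting point for send.buffer.bytes and receive.buffer.bytes, subject to the OS limits (net.core.rmem_max, net.core.wmem_max) listed on the slide.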
  • 74. 35 Competent users • Monitor consumer lag • Add processes when things are slow • Automate deployment
  • 75. 36 Kafka Pros • Monitor time lag • Collect client metrics • Know which side to blame • Know which configs to tune • Tune the network over the WAN
  • 76. Resources and Next Steps: https://github.com/confluentinc/cp-demo, https://www.confluent.io/download/, https://slackpass.io/confluentcommunity, https://www.confluent.io/blog