SlideShare a Scribd company logo
Performance Analysis and
Optimizations for Kafka Streams
Guozhang Wang
Kafka Summit London, May. 14, 2019
Outline
• Streams application execution: from 1000 to 400 feet
• Processor topology generation: an optimization problem
• Kafka Streams topology optimization framework
2
3
So you want to write a Streams app?
4
Step I: Read the demo
streams/examples/src/main/java/o.a.k.streams/examples/wordcount/WordCountDemo.java
5
Step II: Modify the demo code to yours
kubectl create -f my-kafka-streams-deployment.yaml
Deployment “my-kafka-streams-app” created
kubectl scale deployment my-kafka-streams-app --replicas=2
Deployment “my-kafka-streams-app” scaled
6
java -cp /my/app -Dlog4j.configuration=file:/my/app/conf/log4j.properties /


-Dcom.sun.management.jmxremote.authenticate=false /

-Dcom.sun.management.jmxremote.ssl=false /

-Dcom.sun.management.jmxremote.port=17072 /

-XX:+UseGCLogFileRotation -XX:GCLogFileSize=10M /

-Xloggc:/my/app/logs/GC.log /

my.streams.MyStreamsApp /my/app/conf/config.properties
The fancy way
The not-so-fancy
way
Step III: Run It!
[Kafka Summit SF 2018, Shapira & Sax]
Wait, what’s this?
7
Wait, what’s this?
8
Wait, what’s this?
9
10
Kafka Streams Execution
Kafka Streams Kafka
StreamThread
JVM
(from 1000 feet)
11
Kafka Streams Execution
Kafka Streams Kafka
ProcessorTopology
(from 900 feet)
Define your Processor Topology in DSL
12
KStream<..> stream1 = builder.stream(”topic1”);
KStream<..> stream2 = builder.stream(”topic2”);
KStream<..> joined = stream1.leftJoin(stream2, ...);
KTable<..> aggregated = joined.groupBy(…).aggregate(…);
aggregated.toStream().to(”topic3”);
Define your Processor Topology in DSL
13
KStream<..> stream1 = builder.stream(”topic1”);
KStream<..> stream2 = builder.table(”topic2”);
.addSource(”Source2”, ”topic2”)
topology.addSource(”Source1”, ”topic1”)
.addProcessor(”Join”, LeftJoin:new, ”Source1”, ”Source2”)
.addProcessor(”Aggregate”, Aggregate:new, ”Join”)
.addStateStore(Stores.persistent(…).build(), ”Aggregate”)
.addSink(”Sink”, ”topic3”, ”Aggregate”)
State
Examine your Processor Topology
14
State
topology.addSource(”Source1”, ”topic1”)
.addProcessor(”Join”, LeftJoin:new, ”Source1”, ”Source2”)
.addProcessor(”Aggregate”, Aggregate:new, ”Join”)
.addStateStore(Stores.persistent(…).build(), ”Aggregate”)
.addSink(”Sink”, ”topic3”, ”Aggregate”)
.addSource(”Source2”, ”topic2”)
topology = builder.build();
System.out.println(topology.describe());
15
Topologies:
Sub-topology: 0
Source: KSTREAM-SOURCE-0000000000 (topics: [topic1])
--> KSTREAM-WINDOWED-0000000002
Source: KSTREAM-SOURCE-0000000001 (topics: [topic2])
--> KSTREAM-WINDOWED-0000000003
……
Processor: KSTREAM-KEY-SELECT-0000000007 (stores: [])
--> KSTREAM-FILTER-0000000011
<-- KSTREAM-MERGE-0000000006
……
Sink: KSTREAM-SINK-0000000010 (topic: MyAggStore-repartition)
<-- KSTREAM-FILTER-0000000011
Sub-topology: 1
Source: KSTREAM-SOURCE-0000000012 (topics: [MyAggStore-repartition])
--> KSTREAM-AGGREGATE-0000000009
Processor: KSTREAM-AGGREGATE-0000000009 (stores: [MyAggStore])
--> KTABLE-TOSTREAM-0000000013
<-- KSTREAM-SOURCE-0000000012
……
Sink: KSTREAM-SINK-0000000014 (topic: topic3)
<-- KTABLE-TOSTREAM-0000000013
That looks familiar..
State
[Kafka Streams Topology Visualizer. https://zz85.github.io/kafka-streams-viz/]
16
Topologies:
Sub-topology: 0
Source: KSTREAM-SOURCE-0000000000 (topics: [topic1])
--> KSTREAM-WINDOWED-0000000002
Source: KSTREAM-SOURCE-0000000001 (topics: [topic2])
--> KSTREAM-WINDOWED-0000000003
……
Processor: KSTREAM-KEY-SELECT-0000000007 (stores: [])
--> KSTREAM-FILTER-0000000011
<-- KSTREAM-MERGE-0000000006
……
Sink: KSTREAM-SINK-0000000010 (topic: MyAggStore-repartition)
<-- KSTREAM-FILTER-0000000011
Sub-topology: 1
Source: KSTREAM-SOURCE-0000000012 (topics: [MyAggStore-repartition])
--> KSTREAM-AGGREGATE-0000000009
Processor: KSTREAM-AGGREGATE-0000000009 (stores: [MyAggStore])
--> KTABLE-TOSTREAM-0000000013
<-- KSTREAM-SOURCE-0000000012
……
Sink: KSTREAM-SINK-0000000014 (topic: topic3)
<-- KTABLE-TOSTREAM-0000000013
Hmm, that looks bizarre?
State
[Kafka Streams Topology Visualizer. https://zz85.github.io/kafka-streams-viz/]
17
Data Parallelism 101 (specially tailored for Kafka)
Topic 1
Partitions
Producers Consumers
18
Data Parallelism 101 (specially tailored for Kafka)
groupBy(…).count(…)
19
Data Parallelism 101 (specially tailored for Kafka)
groupBy(…).count(…)
20
Data Parallelism 101 (specially tailored for Kafka)
groupBy(…).count(…)
21
Data Parallelism 101 (specially tailored for Kafka)
groupBy(…).count(…)
count
3

3
3

3
22
Shuffling in Streams: RepartitionTopics
State
23
Shuffling in Streams: RepartitionTopics
State
24
Shuffling in Streams: RepartitionTopics
State
Repartition Topic (partition key = grouped key)
Topologies:
Sub-topology: 0
……
Sink: KSTREAM-SINK-0000000010 (topic: MyAggStore-repartition)
<-- KSTREAM-FILTER-0000000011
Sub-topology: 1
Source: KSTREAM-SOURCE-0000000012 (topics: [MyAggStore-repartition])
--> KSTREAM-AGGREGATE-0000000009
……
25Kafka Streams Kafka
Kafka Streams Execution (from 500 feet)
State
Kafka Topic B
Task2Task1
Stream Partitions and Tasks
26
Kafka Topic A
Kafka Topic B
Stream Partitions and Tasks
27
Kafka Topic A
Task2Task1
28Kafka Streams Kafka
Kafka Streams Execution (from 400 feet)
State State
instance-1
instance-2
instance-3
29
Repartition Topics for Shuffling
State State
instance-1
instance-2
instance-3
…
…
30
Case #1: Duplicate Repartitioning
KStream source = builder.stream(“topic1");
KStream mapped = source.map(..);
KTable counts = mapped.groupByKey().aggregate(..);
KStream sink = mapped.leftJoin(counts, ..);
31
Case #1: Duplicate Repartitioning
KStream source = builder.stream(“topic1");
KStream mapped = source.map(..);
KTable counts = mapped.groupByKey().aggregate(..);
KStream sink = mapped.leftJoin(counts, ..);
32
Case #1: Duplicate Repartitioning
KStream source = builder.stream(“topic1");
KStream mapped = source.map(..);
KTable counts = mapped.groupByKey().aggregate(..);
KStream sink = mapped.leftJoin(counts, ..);
Map
State
Agg
agg-repartition
33
Case #1: Duplicate Repartitioning
KStream source = builder.stream(“topic1");
KStream mapped = source.map(..);
KTable counts = mapped.groupByKey().aggregate(..);
KStream sink = mapped.leftJoin(counts, ..);
Map
State
Join
Agg
map-join-repartition
agg-repartition
34
Case #1: Duplicate Repartitioning
KStream source = builder.stream(“topic1");
KStream mapped = source.map(..);
KTable counts = mapped.groupByKey().aggregate(..);
KStream sink = mapped.leftJoin(counts, ..);
Map
State
Join
Agg
map-join-repartition
agg-repartition
35
Case #1: Duplicate Repartitioning
KStream source = builder.stream(“topic1");
KStream shuffled = source.map(..)

.through(“topic2”);
KTable counts = shuffled.groupByKey().count(..);
KStream sink = shuffled.leftJoin(counts, ..);
36
Case #1: Duplicate Repartitioning
KStream source = builder.stream(“topic1");
KStream shuffled = source.map(..)

.through(“topic2”);
KTable counts = shuffled.groupByKey().count(..);
KStream sink = shuffled.leftJoin(counts, ..);
Map
topic2
37
Case #1: Duplicate Repartitioning
KStream source = builder.stream(“topic1");
KStream shuffled = source.map(..)

.through(“topic2”);
KTable counts = shuffled.groupByKey().count(..);
KStream sink = shuffled.leftJoin(counts, ..);
Map
State
Agg
topic2
38
Case #1: Duplicate Repartitioning
KStream source = builder.stream(“topic1");
KStream shuffled = source.map(..)

.through(“topic2”);
KTable counts = shuffled.groupByKey().count(..);
KStream sink = shuffled.leftJoin(counts, ..);
Map
State
Join
Agg
topic2
39
Key-Changing Operations
Key-Value Value Only
map mapValues
flatMap flatMapValues
transform transformValues
flatTransform flatTransformValues
40
Case #2: Clumsy Repartitioning
KStream source = builder.stream(“topic1");
KTable aggregated = source.groupBy(..).count(..);
41
Case #2: Clumsy Repartitioning
SelectKey
Agg
agg-repartition
KStream source = builder.stream(“topic1");
KTable aggregated = source.groupBy(..).count(..);
State
42
Case #2: Clumsy Repartitioning
SelectKey
Agg
agg-repartition
KStream source = builder.stream(“topic1");
KTable aggregated = source.groupBy(..).count(..);
State
43
Case #2: Clumsy Repartitioning
KStream source = builder.stream(“topic1");
KStream projected = source.mapValues(..);
KTable aggregated = projected.groupBy(..).count(..);
44
Case #2: Clumsy Repartitioning
KStream source = builder.stream(“topic1");
KStream projected = source.mapValues(..);
KTable aggregated = projected.groupBy(..).count(..);
mapValues
45
Case #2: Clumsy Repartitioning
KStream source = builder.stream(“topic1");
KStream projected = source.mapValues(..);
KTable aggregated = projected.groupBy(..).count(..);
SelectKey
mapValues
Agg
agg-repartition
State
46
Topology Generation(an optimization problem)
Define: write DSL code
Examine: topology.describe
Refine: modify DSL code
Examine: topology.describe
…
47
Topology Generation(an optimization problem)
Define: write DSL code
Examine: topology.describe
Refine: modify DSL code
Examine: topology.describe
…
Root Cause: Step-by-step Translation
48
Do I Really have to learn all this? No!
(beyond Kafka 2.1)
49
Key Idea: Replace step-by-step translation



with logical plan compilation
50
Topology Optimization Framework
User DSL code
Processor Topology
Parsing and Generation
51
Topology Optimization Framework
User DSL code
Logical Plan: Operator Graph
Physical Plan: Processor Topology
Parsing
Compilation
Logical plan optimization
52
Logical Plan Optimization
• Currently rule based
• Repartitioning push-up and consolidation

• Sharing topics for source / changelog

• Logical views for table materialization [2.2+]

• etc..
53
Case #3: Unnecessary Materialization
KTable table1 = builder.table(“topic1");
KTable table2 = builder.table(“topic2");


table1.filter().join(table2, ..);
StateState
54
Case #3: Unnecessary Materialization
KTable table1 = builder.table(“topic1");
KTable table2 = builder.table(“topic2");


table1.filter().join(table2, ..);
filter
StateState
55
Case #3: Unnecessary Materialization
KTable table1 = builder.table(“topic1");
KTable table2 = builder.table(“topic2");


table1.filter().join(table2, ..);
filter
StateState
join
56
Case #3: Unnecessary Materialization
KTable table1 = builder.table(“topic1");
KTable table2 = builder.table(“topic2");


table1.filter().join(table2, ..);
filter
State
State
State
join
57
Case #3: Unnecessary Materialization
KTable table1 = builder.table(“topic1");
KTable table2 = builder.table(“topic2");


table1.filter().join(table2, ..);
filter
State
State
State
join
58
Case #3 with Optimization Enabled
KTable table1 = builder.table(“topic1");
KTable table2 = builder.table(“topic2");


table1.filter().join(table2, ..);
filter
State
View
State
join
59
Case #3 with Optimization Enabled
KTable table1 = builder.table(“topic1");
KTable table2 = builder.table(“topic2");


table1.filter().join(table2, ..);
filter
State
View
State
join
60
Enable Topology Optimization
config: topology.optimization = all 

(default = none)
[KIP-295]
code: StreamBuilder#build(props) 

upgrade: watch out for compatibility
[https://www.confluent.io/blog/data-reprocessing-with-kafka-streams-resetting-a-streams-application/]
61
What’s next
[KIP-372, KIP-307]
• Extensible optimization framework

• More re-write rules!

• Beyond all-or-nothing config

• Compatibility support for optimized topology
[KAFKA-6034]
Take-aways
• Optimize your topology for better performance
and less footprint
62
System.out.println(topology.describe());
Take-aways
• Optimize your topology for better performance
and less footprint
• It’s OK if you forget the first bullet point: 



Kafka Streams will help doing that for you!
63
System.out.println(topology.describe());
Take-aways
64
THANKS!
Guozhang Wang | guozhang@confluent.io | @guozhangwang
• Optimize your topology for better performance
and less footprint
• It’s OK if you forget the first bullet point: 



Kafka Streams will help doing that for you!
System.out.println(topology.describe());
65
We are Hiring!

More Related Content

Similar to Performance Analysis and Optimizations for Kafka Streams Applications (20)

Streams Don't Fail Me Now - Robustness Features in Kafka Streams
Streams Don't Fail Me Now - Robustness Features in Kafka StreamsStreams Don't Fail Me Now - Robustness Features in Kafka Streams
Streams Don't Fail Me Now - Robustness Features in Kafka Streams
HostedbyConfluent
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
Guozhang Wang
 
Riddles of Streaming - Code Puzzlers for Fun & Profit (Nick Dearden, Confluen...
Riddles of Streaming - Code Puzzlers for Fun & Profit (Nick Dearden, Confluen...Riddles of Streaming - Code Puzzlers for Fun & Profit (Nick Dearden, Confluen...
Riddles of Streaming - Code Puzzlers for Fun & Profit (Nick Dearden, Confluen...
confluent
 
Real time data pipline with kafka streams
Real time data pipline with kafka streamsReal time data pipline with kafka streams
Real time data pipline with kafka streams
Yoni Farin
 
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka StreamsKafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
confluent
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
Knoldus Inc.
 
Exactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka StreamsExactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka Streams
Guozhang Wang
 
Follow the (Kafka) Streams
Follow the (Kafka) StreamsFollow the (Kafka) Streams
Follow the (Kafka) Streams
confluent
 
Stateful streaming and the challenge of state
Stateful streaming and the challenge of stateStateful streaming and the challenge of state
Stateful streaming and the challenge of state
Yoni Farin
 
Kafka streams decoupling with stores
Kafka streams decoupling with storesKafka streams decoupling with stores
Kafka streams decoupling with stores
Yoni Farin
 
Robust Operations of Kafka Streams
Robust Operations of Kafka StreamsRobust Operations of Kafka Streams
Robust Operations of Kafka Streams
confluent
 
Apache Kafka Streams Use Case
Apache Kafka Streams Use CaseApache Kafka Streams Use Case
Apache Kafka Streams Use Case
Apache Kafka TLV
 
Apache Kafka Streams
Apache Kafka StreamsApache Kafka Streams
Apache Kafka Streams
Apache Kafka TLV
 
Using Kafka as a Database For Real-Time Transaction Processing | Chad Preisle...
Using Kafka as a Database For Real-Time Transaction Processing | Chad Preisle...Using Kafka as a Database For Real-Time Transaction Processing | Chad Preisle...
Using Kafka as a Database For Real-Time Transaction Processing | Chad Preisle...
HostedbyConfluent
 
Kafka 102: Streams and Tables All the Way Down | Kafka Summit San Francisco 2019
Kafka 102: Streams and Tables All the Way Down | Kafka Summit San Francisco 2019Kafka 102: Streams and Tables All the Way Down | Kafka Summit San Francisco 2019
Kafka 102: Streams and Tables All the Way Down | Kafka Summit San Francisco 2019
Michael Noll
 
Kafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsKafka Streams for Java enthusiasts
Kafka Streams for Java enthusiasts
Slim Baltagi
 
What every software engineer should know about streams and tables in kafka ...
What every software engineer should know about streams and tables in kafka   ...What every software engineer should know about streams and tables in kafka   ...
What every software engineer should know about streams and tables in kafka ...
confluent
 
Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Cur...
Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Cur...Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Cur...
Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Cur...
HostedbyConfluent
 
Kafka Streams - From the Ground Up to the Cloud
Kafka Streams - From the Ground Up to the CloudKafka Streams - From the Ground Up to the Cloud
Kafka Streams - From the Ground Up to the Cloud
VMware Tanzu
 
Data Quality Monitoring in Realtime and at Scale
Data Quality Monitoring in Realtime and at ScaleData Quality Monitoring in Realtime and at Scale
Data Quality Monitoring in Realtime and at Scale
Alexis Seigneurin
 
Streams Don't Fail Me Now - Robustness Features in Kafka Streams
Streams Don't Fail Me Now - Robustness Features in Kafka StreamsStreams Don't Fail Me Now - Robustness Features in Kafka Streams
Streams Don't Fail Me Now - Robustness Features in Kafka Streams
HostedbyConfluent
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
Guozhang Wang
 
Riddles of Streaming - Code Puzzlers for Fun & Profit (Nick Dearden, Confluen...
Riddles of Streaming - Code Puzzlers for Fun & Profit (Nick Dearden, Confluen...Riddles of Streaming - Code Puzzlers for Fun & Profit (Nick Dearden, Confluen...
Riddles of Streaming - Code Puzzlers for Fun & Profit (Nick Dearden, Confluen...
confluent
 
Real time data pipline with kafka streams
Real time data pipline with kafka streamsReal time data pipline with kafka streams
Real time data pipline with kafka streams
Yoni Farin
 
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka StreamsKafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
confluent
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
Knoldus Inc.
 
Exactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka StreamsExactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka Streams
Guozhang Wang
 
Follow the (Kafka) Streams
Follow the (Kafka) StreamsFollow the (Kafka) Streams
Follow the (Kafka) Streams
confluent
 
Stateful streaming and the challenge of state
Stateful streaming and the challenge of stateStateful streaming and the challenge of state
Stateful streaming and the challenge of state
Yoni Farin
 
Kafka streams decoupling with stores
Kafka streams decoupling with storesKafka streams decoupling with stores
Kafka streams decoupling with stores
Yoni Farin
 
Robust Operations of Kafka Streams
Robust Operations of Kafka StreamsRobust Operations of Kafka Streams
Robust Operations of Kafka Streams
confluent
 
Apache Kafka Streams Use Case
Apache Kafka Streams Use CaseApache Kafka Streams Use Case
Apache Kafka Streams Use Case
Apache Kafka TLV
 
Using Kafka as a Database For Real-Time Transaction Processing | Chad Preisle...
Using Kafka as a Database For Real-Time Transaction Processing | Chad Preisle...Using Kafka as a Database For Real-Time Transaction Processing | Chad Preisle...
Using Kafka as a Database For Real-Time Transaction Processing | Chad Preisle...
HostedbyConfluent
 
Kafka 102: Streams and Tables All the Way Down | Kafka Summit San Francisco 2019
Kafka 102: Streams and Tables All the Way Down | Kafka Summit San Francisco 2019Kafka 102: Streams and Tables All the Way Down | Kafka Summit San Francisco 2019
Kafka 102: Streams and Tables All the Way Down | Kafka Summit San Francisco 2019
Michael Noll
 
Kafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsKafka Streams for Java enthusiasts
Kafka Streams for Java enthusiasts
Slim Baltagi
 
What every software engineer should know about streams and tables in kafka ...
What every software engineer should know about streams and tables in kafka   ...What every software engineer should know about streams and tables in kafka   ...
What every software engineer should know about streams and tables in kafka ...
confluent
 
Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Cur...
Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Cur...Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Cur...
Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Cur...
HostedbyConfluent
 
Kafka Streams - From the Ground Up to the Cloud
Kafka Streams - From the Ground Up to the CloudKafka Streams - From the Ground Up to the Cloud
Kafka Streams - From the Ground Up to the Cloud
VMware Tanzu
 
Data Quality Monitoring in Realtime and at Scale
Data Quality Monitoring in Realtime and at ScaleData Quality Monitoring in Realtime and at Scale
Data Quality Monitoring in Realtime and at Scale
Alexis Seigneurin
 

More from Guozhang Wang (11)

Consensus in Apache Kafka: From Theory to Production.pdf
Consensus in Apache Kafka: From Theory to Production.pdfConsensus in Apache Kafka: From Theory to Production.pdf
Consensus in Apache Kafka: From Theory to Production.pdf
Guozhang Wang
 
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Guozhang Wang
 
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
Guozhang Wang
 
Introduction to the Incremental Cooperative Protocol of Kafka
Introduction to the Incremental Cooperative Protocol of KafkaIntroduction to the Incremental Cooperative Protocol of Kafka
Introduction to the Incremental Cooperative Protocol of Kafka
Guozhang Wang
 
Apache Kafka from 0.7 to 1.0, History and Lesson Learned
Apache Kafka from 0.7 to 1.0, History and Lesson LearnedApache Kafka from 0.7 to 1.0, History and Lesson Learned
Apache Kafka from 0.7 to 1.0, History and Lesson Learned
Guozhang Wang
 
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
Guozhang Wang
 
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache KafkaBuilding Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Guozhang Wang
 
Building a Replicated Logging System with Apache Kafka
Building a Replicated Logging System with Apache KafkaBuilding a Replicated Logging System with Apache Kafka
Building a Replicated Logging System with Apache Kafka
Guozhang Wang
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
Guozhang Wang
 
Behavioral Simulations in MapReduce
Behavioral Simulations in MapReduceBehavioral Simulations in MapReduce
Behavioral Simulations in MapReduce
Guozhang Wang
 
Automatic Scaling Iterative Computations
Automatic Scaling Iterative ComputationsAutomatic Scaling Iterative Computations
Automatic Scaling Iterative Computations
Guozhang Wang
 
Consensus in Apache Kafka: From Theory to Production.pdf
Consensus in Apache Kafka: From Theory to Production.pdfConsensus in Apache Kafka: From Theory to Production.pdf
Consensus in Apache Kafka: From Theory to Production.pdf
Guozhang Wang
 
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Guozhang Wang
 
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
Guozhang Wang
 
Introduction to the Incremental Cooperative Protocol of Kafka
Introduction to the Incremental Cooperative Protocol of KafkaIntroduction to the Incremental Cooperative Protocol of Kafka
Introduction to the Incremental Cooperative Protocol of Kafka
Guozhang Wang
 
Apache Kafka from 0.7 to 1.0, History and Lesson Learned
Apache Kafka from 0.7 to 1.0, History and Lesson LearnedApache Kafka from 0.7 to 1.0, History and Lesson Learned
Apache Kafka from 0.7 to 1.0, History and Lesson Learned
Guozhang Wang
 
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
Guozhang Wang
 
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache KafkaBuilding Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Guozhang Wang
 
Building a Replicated Logging System with Apache Kafka
Building a Replicated Logging System with Apache KafkaBuilding a Replicated Logging System with Apache Kafka
Building a Replicated Logging System with Apache Kafka
Guozhang Wang
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
Guozhang Wang
 
Behavioral Simulations in MapReduce
Behavioral Simulations in MapReduceBehavioral Simulations in MapReduce
Behavioral Simulations in MapReduce
Guozhang Wang
 
Automatic Scaling Iterative Computations
Automatic Scaling Iterative ComputationsAutomatic Scaling Iterative Computations
Automatic Scaling Iterative Computations
Guozhang Wang
 

Recently uploaded (20)

Software_Engineering_in_6_Hours_lyst1728638742594.pdf
Software_Engineering_in_6_Hours_lyst1728638742594.pdfSoftware_Engineering_in_6_Hours_lyst1728638742594.pdf
Software_Engineering_in_6_Hours_lyst1728638742594.pdf
VanshMunjal7
 
Influence line diagram in a robust model
Influence line diagram in a robust modelInfluence line diagram in a robust model
Influence line diagram in a robust model
ParthaSengupta26
 
HVAC Air Filter Equipment-Catalouge-Final.pdf
HVAC Air Filter Equipment-Catalouge-Final.pdfHVAC Air Filter Equipment-Catalouge-Final.pdf
HVAC Air Filter Equipment-Catalouge-Final.pdf
FILTRATION ENGINEERING & CUNSULTANT
 
Video Games and Artificial-Realities.pptx
Video Games and Artificial-Realities.pptxVideo Games and Artificial-Realities.pptx
Video Games and Artificial-Realities.pptx
HadiBadri1
 
world subdivision.pdf...................
world subdivision.pdf...................world subdivision.pdf...................
world subdivision.pdf...................
bmmederos12
 
UNIT-1-PPT-Introduction about Power System Operation and Control
UNIT-1-PPT-Introduction about Power System Operation and ControlUNIT-1-PPT-Introduction about Power System Operation and Control
UNIT-1-PPT-Introduction about Power System Operation and Control
Sridhar191373
 
UNIT-4-PPT UNIT COMMITMENT AND ECONOMIC DISPATCH
UNIT-4-PPT UNIT COMMITMENT AND ECONOMIC DISPATCHUNIT-4-PPT UNIT COMMITMENT AND ECONOMIC DISPATCH
UNIT-4-PPT UNIT COMMITMENT AND ECONOMIC DISPATCH
Sridhar191373
 
Design of a Hand Rehabilitation Device for Post-Stroke Patients..pptx
Design of a Hand Rehabilitation Device for Post-Stroke Patients..pptxDesign of a Hand Rehabilitation Device for Post-Stroke Patients..pptx
Design of a Hand Rehabilitation Device for Post-Stroke Patients..pptx
younisalsadah
 
Air Filter Flat Sheet Media-Catalouge-Final.pdf
Air Filter Flat Sheet Media-Catalouge-Final.pdfAir Filter Flat Sheet Media-Catalouge-Final.pdf
Air Filter Flat Sheet Media-Catalouge-Final.pdf
FILTRATION ENGINEERING & CUNSULTANT
 
Tesia Dobrydnia - A Leader In Her Industry
Tesia Dobrydnia - A Leader In Her IndustryTesia Dobrydnia - A Leader In Her Industry
Tesia Dobrydnia - A Leader In Her Industry
Tesia Dobrydnia
 
DIY Gesture Control ESP32 LiteWing Drone using Python
DIY Gesture Control ESP32 LiteWing Drone using  PythonDIY Gesture Control ESP32 LiteWing Drone using  Python
DIY Gesture Control ESP32 LiteWing Drone using Python
CircuitDigest
 
All about the Snail Power Catalog Product 2025
All about the Snail Power Catalog  Product 2025All about the Snail Power Catalog  Product 2025
All about the Snail Power Catalog Product 2025
kstgroupvn
 
BEC602- Module 3-2-Notes.pdf.Vlsi design and testing notes
BEC602- Module 3-2-Notes.pdf.Vlsi design and testing notesBEC602- Module 3-2-Notes.pdf.Vlsi design and testing notes
BEC602- Module 3-2-Notes.pdf.Vlsi design and testing notes
VarshithaP6
 
Department of Environment (DOE) Mix Design with Fly Ash.
Department of Environment (DOE) Mix Design with Fly Ash.Department of Environment (DOE) Mix Design with Fly Ash.
Department of Environment (DOE) Mix Design with Fly Ash.
MdManikurRahman
 
DE-UNIT-V MEMORY DEVICES AND DIGITAL INTEGRATED CIRCUITS
DE-UNIT-V MEMORY DEVICES AND DIGITAL INTEGRATED CIRCUITSDE-UNIT-V MEMORY DEVICES AND DIGITAL INTEGRATED CIRCUITS
DE-UNIT-V MEMORY DEVICES AND DIGITAL INTEGRATED CIRCUITS
Sridhar191373
 
Highway Engineering - Pavement materials
Highway Engineering - Pavement materialsHighway Engineering - Pavement materials
Highway Engineering - Pavement materials
AmrutaBhosale9
 
1. Mix Design M20 CT.pdf for M20 Grade mix design
1. Mix Design M20 CT.pdf for M20 Grade mix design1. Mix Design M20 CT.pdf for M20 Grade mix design
1. Mix Design M20 CT.pdf for M20 Grade mix design
smghumare
 
Fresh concrete Workability Measurement
Fresh concrete  Workability  MeasurementFresh concrete  Workability  Measurement
Fresh concrete Workability Measurement
SasiVarman5
 
Prediction of Unconfined Compressive Strength of Expansive Soil Amended with ...
Prediction of Unconfined Compressive Strength of Expansive Soil Amended with ...Prediction of Unconfined Compressive Strength of Expansive Soil Amended with ...
Prediction of Unconfined Compressive Strength of Expansive Soil Amended with ...
Journal of Soft Computing in Civil Engineering
 
Software_Engineering_in_6_Hours_lyst1728638742594.pdf
Software_Engineering_in_6_Hours_lyst1728638742594.pdfSoftware_Engineering_in_6_Hours_lyst1728638742594.pdf
Software_Engineering_in_6_Hours_lyst1728638742594.pdf
VanshMunjal7
 
Influence line diagram in a robust model
Influence line diagram in a robust modelInfluence line diagram in a robust model
Influence line diagram in a robust model
ParthaSengupta26
 
Video Games and Artificial-Realities.pptx
Video Games and Artificial-Realities.pptxVideo Games and Artificial-Realities.pptx
Video Games and Artificial-Realities.pptx
HadiBadri1
 
world subdivision.pdf...................
world subdivision.pdf...................world subdivision.pdf...................
world subdivision.pdf...................
bmmederos12
 
UNIT-1-PPT-Introduction about Power System Operation and Control
UNIT-1-PPT-Introduction about Power System Operation and ControlUNIT-1-PPT-Introduction about Power System Operation and Control
UNIT-1-PPT-Introduction about Power System Operation and Control
Sridhar191373
 
UNIT-4-PPT UNIT COMMITMENT AND ECONOMIC DISPATCH
UNIT-4-PPT UNIT COMMITMENT AND ECONOMIC DISPATCHUNIT-4-PPT UNIT COMMITMENT AND ECONOMIC DISPATCH
UNIT-4-PPT UNIT COMMITMENT AND ECONOMIC DISPATCH
Sridhar191373
 
Design of a Hand Rehabilitation Device for Post-Stroke Patients..pptx
Design of a Hand Rehabilitation Device for Post-Stroke Patients..pptxDesign of a Hand Rehabilitation Device for Post-Stroke Patients..pptx
Design of a Hand Rehabilitation Device for Post-Stroke Patients..pptx
younisalsadah
 
Tesia Dobrydnia - A Leader In Her Industry
Tesia Dobrydnia - A Leader In Her IndustryTesia Dobrydnia - A Leader In Her Industry
Tesia Dobrydnia - A Leader In Her Industry
Tesia Dobrydnia
 
DIY Gesture Control ESP32 LiteWing Drone using Python
DIY Gesture Control ESP32 LiteWing Drone using  PythonDIY Gesture Control ESP32 LiteWing Drone using  Python
DIY Gesture Control ESP32 LiteWing Drone using Python
CircuitDigest
 
All about the Snail Power Catalog Product 2025
All about the Snail Power Catalog  Product 2025All about the Snail Power Catalog  Product 2025
All about the Snail Power Catalog Product 2025
kstgroupvn
 
BEC602- Module 3-2-Notes.pdf.Vlsi design and testing notes
BEC602- Module 3-2-Notes.pdf.Vlsi design and testing notesBEC602- Module 3-2-Notes.pdf.Vlsi design and testing notes
BEC602- Module 3-2-Notes.pdf.Vlsi design and testing notes
VarshithaP6
 
Department of Environment (DOE) Mix Design with Fly Ash.
Department of Environment (DOE) Mix Design with Fly Ash.Department of Environment (DOE) Mix Design with Fly Ash.
Department of Environment (DOE) Mix Design with Fly Ash.
MdManikurRahman
 
DE-UNIT-V MEMORY DEVICES AND DIGITAL INTEGRATED CIRCUITS
DE-UNIT-V MEMORY DEVICES AND DIGITAL INTEGRATED CIRCUITSDE-UNIT-V MEMORY DEVICES AND DIGITAL INTEGRATED CIRCUITS
DE-UNIT-V MEMORY DEVICES AND DIGITAL INTEGRATED CIRCUITS
Sridhar191373
 
Highway Engineering - Pavement materials
Highway Engineering - Pavement materialsHighway Engineering - Pavement materials
Highway Engineering - Pavement materials
AmrutaBhosale9
 
1. Mix Design M20 CT.pdf for M20 Grade mix design
1. Mix Design M20 CT.pdf for M20 Grade mix design1. Mix Design M20 CT.pdf for M20 Grade mix design
1. Mix Design M20 CT.pdf for M20 Grade mix design
smghumare
 
Fresh concrete Workability Measurement
Fresh concrete  Workability  MeasurementFresh concrete  Workability  Measurement
Fresh concrete Workability Measurement
SasiVarman5
 

Performance Analysis and Optimizations for Kafka Streams Applications

  • 1. Performance Analysis and Optimizations for Kafka Streams Guozhang Wang Kafka Summit London, May. 14, 2019
  • 2. Outline • Streams application execution: from 1000 to 400 feet • Processor topology generation: an optimization problem • Kafka Streams topology optimization framework 2
  • 3. 3 So you want to write a Streams app?
  • 4. 4 Step I: Read the demo streams/examples/src/main/java/o.a.k.streams/examples/wordcount/WordCountDemo.java
  • 5. 5 Step II: Modify the demo code to yours
  • 6. kubectl create -f my-kafka-streams-deployment.yaml Deployment “my-kafka-streams-app” created kubectl scale deployment my-kafka-streams-app --replicas=2 Deployment “my-kafka-streams-app” scaled 6 java -cp /my/app -Dlog4j.configuration=file:/my/app/conf/log4j.properties /

 -Dcom.sun.management.jmxremote.authenticate=false /
 -Dcom.sun.management.jmxremote.ssl=false /
 -Dcom.sun.management.jmxremote.port=17072 /
 -XX:+UseGCLogFileRotation -XX:GCLogFileSize=10M /
 -Xloggc:/my/app/logs/GC.log /
 my.streams.MyStreamsApp /my/app/conf/config.properties The fancy way The not-so-fancy way Step III: Run It! [Kafka Summit SF 2018, Shapira & Sax]
  • 10. 10 Kafka Streams Execution Kafka Streams Kafka StreamThread JVM (from 1000 feet)
  • 11. 11 Kafka Streams Execution Kafka Streams Kafka ProcessorTopology (from 900 feet)
  • 12. Define your Processor Topology in DSL 12 KStream<..> stream1 = builder.stream(”topic1”); KStream<..> stream2 = builder.stream(”topic2”); KStream<..> joined = stream1.leftJoin(stream2, ...); KTable<..> aggregated = joined.groupBy(…).aggregate(…); aggregated.toStream().to(”topic3”);
  • 13. Define your Processor Topology in DSL 13 KStream<..> stream1 = builder.stream(”topic1”); KStream<..> stream2 = builder.table(”topic2”); .addSource(”Source2”, ”topic2”) topology.addSource(”Source1”, ”topic1”) .addProcessor(”Join”, LeftJoin:new, ”Source1”, ”Source2”) .addProcessor(”Aggregate”, Aggregate:new, ”Join”) .addStateStore(Stores.persistent(…).build(), ”Aggregate”) .addSink(”Sink”, ”topic3”, ”Aggregate”) State
  • 14. Examine your Processor Topology 14 State topology.addSource(”Source1”, ”topic1”) .addProcessor(”Join”, LeftJoin:new, ”Source1”, ”Source2”) .addProcessor(”Aggregate”, Aggregate:new, ”Join”) .addStateStore(Stores.persistent(…).build(), ”Aggregate”) .addSink(”Sink”, ”topic3”, ”Aggregate”) .addSource(”Source2”, ”topic2”) topology = builder.build(); System.out.println(topology.describe());
  • 15. 15 Topologies: Sub-topology: 0 Source: KSTREAM-SOURCE-0000000000 (topics: [topic1]) --> KSTREAM-WINDOWED-0000000002 Source: KSTREAM-SOURCE-0000000001 (topics: [topic2]) --> KSTREAM-WINDOWED-0000000003 …… Processor: KSTREAM-KEY-SELECT-0000000007 (stores: []) --> KSTREAM-FILTER-0000000011 <-- KSTREAM-MERGE-0000000006 …… Sink: KSTREAM-SINK-0000000010 (topic: MyAggStore-repartition) <-- KSTREAM-FILTER-0000000011 Sub-topology: 1 Source: KSTREAM-SOURCE-0000000012 (topics: [MyAggStore-repartition]) --> KSTREAM-AGGREGATE-0000000009 Processor: KSTREAM-AGGREGATE-0000000009 (stores: [MyAggStore]) --> KTABLE-TOSTREAM-0000000013 <-- KSTREAM-SOURCE-0000000012 …… Sink: KSTREAM-SINK-0000000014 (topic: topic3) <-- KTABLE-TOSTREAM-0000000013 That looks familiar.. State [Kafka Streams Topology Visualizer. https://zz85.github.io/kafka-streams-viz/]
  • 16. 16 Topologies: Sub-topology: 0 Source: KSTREAM-SOURCE-0000000000 (topics: [topic1]) --> KSTREAM-WINDOWED-0000000002 Source: KSTREAM-SOURCE-0000000001 (topics: [topic2]) --> KSTREAM-WINDOWED-0000000003 …… Processor: KSTREAM-KEY-SELECT-0000000007 (stores: []) --> KSTREAM-FILTER-0000000011 <-- KSTREAM-MERGE-0000000006 …… Sink: KSTREAM-SINK-0000000010 (topic: MyAggStore-repartition) <-- KSTREAM-FILTER-0000000011 Sub-topology: 1 Source: KSTREAM-SOURCE-0000000012 (topics: [MyAggStore-repartition]) --> KSTREAM-AGGREGATE-0000000009 Processor: KSTREAM-AGGREGATE-0000000009 (stores: [MyAggStore]) --> KTABLE-TOSTREAM-0000000013 <-- KSTREAM-SOURCE-0000000012 …… Sink: KSTREAM-SINK-0000000014 (topic: topic3) <-- KTABLE-TOSTREAM-0000000013 Hmm, that looks bizarre? State [Kafka Streams Topology Visualizer. https://zz85.github.io/kafka-streams-viz/]
  • 17. 17 Data Parallelism 101 (specially tailored for Kafka) Topic 1 Partitions Producers Consumers
  • 18. 18 Data Parallelism 101 (specially tailored for Kafka) groupBy(…).count(…)
  • 19. 19 Data Parallelism 101 (specially tailored for Kafka) groupBy(…).count(…)
  • 20. 20 Data Parallelism 101 (specially tailored for Kafka) groupBy(…).count(…)
  • 21. 21 Data Parallelism 101 (specially tailored for Kafka) groupBy(…).count(…) count 3 3 3 3
  • 22. 22 Shuffling in Streams: RepartitionTopics State
  • 23. 23 Shuffling in Streams: RepartitionTopics State
  • 24. 24 Shuffling in Streams: RepartitionTopics State Repartition Topic (partition key = grouped key) Topologies: Sub-topology: 0 …… Sink: KSTREAM-SINK-0000000010 (topic: MyAggStore-repartition) <-- KSTREAM-FILTER-0000000011 Sub-topology: 1 Source: KSTREAM-SOURCE-0000000012 (topics: [MyAggStore-repartition]) --> KSTREAM-AGGREGATE-0000000009 ……
  • 25. 25Kafka Streams Kafka Kafka Streams Execution (from 500 feet) State
  • 26. Kafka Topic B Task2Task1 Stream Partitions and Tasks 26 Kafka Topic A
  • 27. Kafka Topic B Stream Partitions and Tasks 27 Kafka Topic A Task2Task1
  • 28. 28Kafka Streams Kafka Kafka Streams Execution (from 400 feet) State State instance-1 instance-2 instance-3
  • 29. 29 Repartition Topics for Shuffling State State instance-1 instance-2 instance-3 … …
  • 30. 30 Case #1: Duplicate Repartitioning KStream source = builder.stream(“topic1"); KStream mapped = source.map(..); KTable counts = mapped.groupByKey().aggregate(..); KStream sink = mapped.leftJoin(counts, ..);
  • 31. 31 Case #1: Duplicate Repartitioning KStream source = builder.stream(“topic1"); KStream mapped = source.map(..); KTable counts = mapped.groupByKey().aggregate(..); KStream sink = mapped.leftJoin(counts, ..);
  • 32. 32 Case #1: Duplicate Repartitioning KStream source = builder.stream(“topic1"); KStream mapped = source.map(..); KTable counts = mapped.groupByKey().aggregate(..); KStream sink = mapped.leftJoin(counts, ..); Map State Agg agg-repartition
  • 33. 33 Case #1: Duplicate Repartitioning KStream source = builder.stream(“topic1"); KStream mapped = source.map(..); KTable counts = mapped.groupByKey().aggregate(..); KStream sink = mapped.leftJoin(counts, ..); Map State Join Agg map-join-repartition agg-repartition
  • 34. 34 Case #1: Duplicate Repartitioning KStream source = builder.stream(“topic1"); KStream mapped = source.map(..); KTable counts = mapped.groupByKey().aggregate(..); KStream sink = mapped.leftJoin(counts, ..); Map State Join Agg map-join-repartition agg-repartition
  • 35. 35 Case #1: Duplicate Repartitioning KStream source = builder.stream(“topic1"); KStream shuffled = source.map(..)
 .through(“topic2”); KTable counts = shuffled.groupByKey().count(..); KStream sink = shuffled.leftJoin(counts, ..);
  • 36. 36 Case #1: Duplicate Repartitioning KStream source = builder.stream(“topic1"); KStream shuffled = source.map(..)
 .through(“topic2”); KTable counts = shuffled.groupByKey().count(..); KStream sink = shuffled.leftJoin(counts, ..); Map topic2
  • 37. 37 Case #1: Duplicate Repartitioning KStream source = builder.stream(“topic1"); KStream shuffled = source.map(..)
 .through(“topic2”); KTable counts = shuffled.groupByKey().count(..); KStream sink = shuffled.leftJoin(counts, ..); Map State Agg topic2
  • 38. 38 Case #1: Duplicate Repartitioning KStream source = builder.stream(“topic1"); KStream shuffled = source.map(..)
 .through(“topic2”); KTable counts = shuffled.groupByKey().count(..); KStream sink = shuffled.leftJoin(counts, ..); Map State Join Agg topic2
  • 39. 39 Key-Changing Operations Key-Value Value Only map mapValues flatMap flatMapValues transform transformValues flatTransform flatTransformValues
  • 40. 40 Case #2: Clumsy Repartitioning KStream source = builder.stream(“topic1"); KTable aggregated = source.groupBy(..).count(..);
  • 41. 41 Case #2: Clumsy Repartitioning SelectKey Agg agg-repartition KStream source = builder.stream(“topic1"); KTable aggregated = source.groupBy(..).count(..); State
  • 42. 42 Case #2: Clumsy Repartitioning SelectKey Agg agg-repartition KStream source = builder.stream(“topic1"); KTable aggregated = source.groupBy(..).count(..); State
  • 43. 43 Case #2: Clumsy Repartitioning KStream source = builder.stream(“topic1"); KStream projected = source.mapValues(..); KTable aggregated = projected.groupBy(..).count(..);
  • 44. 44 Case #2: Clumsy Repartitioning KStream source = builder.stream(“topic1"); KStream projected = source.mapValues(..); KTable aggregated = projected.groupBy(..).count(..); mapValues
  • 45. 45 Case #2: Clumsy Repartitioning KStream source = builder.stream(“topic1"); KStream projected = source.mapValues(..); KTable aggregated = projected.groupBy(..).count(..); SelectKey mapValues Agg agg-repartition State
  • 46. 46 Topology Generation(an optimization problem) Define: write DSL code Examine: topology.describe Refine: modify DSL code Examine: topology.describe …
  • 47. 47 Topology Generation(an optimization problem) Define: write DSL code Examine: topology.describe Refine: modify DSL code Examine: topology.describe … Root Cause: Step-by-step Translation
  • 48. 48 Do I Really have to learn all this? No! (beyond Kafka 2.1)
  • 49. 49 Key Idea: Replace step-by-step translation
 
 with logical plan compilation
  • 50. 50 Topology Optimization Framework User DSL code Processor Topology Parsing and Generation
  • 51. 51 Topology Optimization Framework User DSL code Logical Plan: Operator Graph Physical Plan: Processor Topology Parsing Compilation Logical plan optimization
  • 52. 52 Logical Plan Optimization • Currently rule based • Repartitioning push-up and consolidation
 • Sharing topics for source / changelog
 • Logical views for table materialization [2.2+]
 • etc..
  • 53. 53 Case #3: Unnecessary Materialization KTable table1 = builder.table(“topic1"); KTable table2 = builder.table(“topic2"); 
 table1.filter().join(table2, ..); StateState
  • 54. 54 Case #3: Unnecessary Materialization KTable table1 = builder.table(“topic1"); KTable table2 = builder.table(“topic2"); 
 table1.filter().join(table2, ..); filter StateState
  • 55. 55 Case #3: Unnecessary Materialization KTable table1 = builder.table(“topic1"); KTable table2 = builder.table(“topic2"); 
 table1.filter().join(table2, ..); filter StateState join
  • 56. 56 Case #3: Unnecessary Materialization KTable table1 = builder.table(“topic1"); KTable table2 = builder.table(“topic2"); 
 table1.filter().join(table2, ..); filter State State State join
  • 57. 57 Case #3: Unnecessary Materialization KTable table1 = builder.table(“topic1"); KTable table2 = builder.table(“topic2"); 
 table1.filter().join(table2, ..); filter State State State join
  • 58. 58 Case #3 with Optimization Enabled KTable table1 = builder.table(“topic1"); KTable table2 = builder.table(“topic2"); 
 table1.filter().join(table2, ..); filter State View State join
  • 59. 59 Case #3 with Optimization Enabled KTable table1 = builder.table(“topic1"); KTable table2 = builder.table(“topic2"); 
 table1.filter().join(table2, ..); filter State View State join
  • 60. 60 Enable Topology Optimization config: topology.optimization = all 
 (default = none) [KIP-295] code: StreamBuilder#build(props) 
 upgrade: watch out for compatibility [https://www.confluent.io/blog/data-reprocessing-with-kafka-streams-resetting-a-streams-application/]
  • 61. 61 What’s next [KIP-372, KIP-307] • Extensible optimization framework
 • More re-write rules!
 • Beyond all-or-nothing config
 • Compatibility support for optimized topology [KAFKA-6034]
  • 62. Take-aways • Optimize your topology for better performance and less footprint 62 System.out.println(topology.describe());
  • 63. Take-aways • Optimize your topology for better performance and less footprint • It’s OK if you forget the first bullet point: 
 
 Kafka Streams will help doing that for you! 63 System.out.println(topology.describe());
  • 64. Take-aways 64 THANKS! Guozhang Wang | [email protected] | @guozhangwang • Optimize your topology for better performance and less footprint • It’s OK if you forget the first bullet point: 
 
 Kafka Streams will help doing that for you! System.out.println(topology.describe());