Kafka Streams Windowing Behind the Curtain, Neil Buesing, Principal Solutions Architect, Rill
https://www.meetup.com/TwinCities-Apache-Kafka/events/279316299/
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur, ...) - confluent
RocksDB is the default state store for Kafka Streams. In this talk, we will discuss how to improve single node performance of the state store by tuning RocksDB and how to efficiently identify issues in the setup. We start with a short description of the RocksDB architecture. We discuss how Kafka Streams restores the state stores from Kafka by leveraging RocksDB features for bulk loading of data. We give examples of hand-tuning the RocksDB state stores based on Kafka Streams metrics and RocksDB’s metrics. At the end, we dive into a few RocksDB command line utilities that allow you to debug your setup and dump data from a state store. We illustrate the usage of the utilities with a few real-life use cases. The key takeaway from the session is the ability to understand the internal details of the default state store in Kafka Streams so that engineers can fine-tune their performance for different varieties of workloads and operate the state stores in a more robust manner.
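For reference, the hand-tuning hook the abstract alludes to is Kafka Streams' RocksDBConfigSetter interface. A minimal sketch, assuming Kafka Streams 2.3+; all sizes are illustrative assumptions, not recommendations from the talk:

```java
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Options;

// Minimal sketch of a custom RocksDB tuner for Kafka Streams state stores.
// All sizes below are illustrative assumptions, not recommendations.
public class TunedRocksDBConfig implements RocksDBConfigSetter {

    @Override
    public void setConfig(final String storeName, final Options options,
                          final Map<String, Object> configs) {
        final BlockBasedTableConfig tableConfig =
                (BlockBasedTableConfig) options.tableFormatConfig();
        tableConfig.setBlockCacheSize(64 * 1024 * 1024L); // 64 MiB block cache
        tableConfig.setBlockSize(16 * 1024L);             // 16 KiB data blocks
        options.setTableFormatConfig(tableConfig);
        options.setWriteBufferSize(32 * 1024 * 1024L);    // 32 MiB memtables
        options.setMaxWriteBufferNumber(3);               // up to 3 memtables
    }

    @Override
    public void close(final String storeName, final Options options) {
        // Close any RocksDB objects allocated in setConfig (none here).
    }
}
```

The class would be registered through the rocksdb.config.setter property (StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG).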
Kafka Streams State Stores Being Persistent - confluent
This document discusses Kafka Streams state stores. It provides examples of using different types of windowing (tumbling, hopping, sliding, session) with state stores. It also covers configuring state store logging, caching, and retention policies. The document demonstrates how to define windowed state stores in Kafka Streams applications and discusses concepts like grace periods.
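As a reference for those window types, here is a minimal Kafka Streams DSL sketch, assuming a 3.x client; "events" stands in for any grouped stream, and all durations are illustrative:

```java
import java.time.Duration;
import org.apache.kafka.streams.kstream.KGroupedStream;
import org.apache.kafka.streams.kstream.SessionWindows;
import org.apache.kafka.streams.kstream.SlidingWindows;
import org.apache.kafka.streams.kstream.TimeWindows;

// Minimal sketch of the four window types with explicit grace periods;
// "events" stands in for any KGroupedStream<String, Long>.
public class WindowTypesSketch {
    static void defineWindows(KGroupedStream<String, Long> events) {
        // Tumbling: fixed, non-overlapping 5-minute windows.
        events.windowedBy(TimeWindows.ofSizeAndGrace(
                Duration.ofMinutes(5), Duration.ofSeconds(30))).count();

        // Hopping: 5-minute windows advancing every minute (overlapping).
        events.windowedBy(TimeWindows.ofSizeAndGrace(
                        Duration.ofMinutes(5), Duration.ofSeconds(30))
                .advanceBy(Duration.ofMinutes(1))).count();

        // Sliding: windows defined by the maximum time difference
        // between any two records in the same window.
        events.windowedBy(SlidingWindows.ofTimeDifferenceAndGrace(
                Duration.ofMinutes(5), Duration.ofSeconds(30))).count();

        // Session: windows closed by a 5-minute inactivity gap.
        events.windowedBy(SessionWindows.ofInactivityGapAndGrace(
                Duration.ofMinutes(5), Duration.ofSeconds(30))).count();
    }
}
```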
This document discusses using ClickHouse for experimentation and metrics at Spotify. It describes how Spotify built an experimentation platform using ClickHouse to provide teams interactive queries on granular metrics data with low latency. Key aspects include ingesting data from Google Cloud Storage to ClickHouse daily, defining metrics through a centralized catalog, and visualizing metrics and running queries using Superset connected to ClickHouse. The platform aims to reduce load on notebooks and BigQuery by serving common queries directly from ClickHouse.
Performance Tuning RocksDB for Kafka Streams’ State Stores - confluent
Performance Tuning RocksDB for Kafka Streams’ State Stores, Bruno Cadonna, Contributor to Apache Kafka & Software Developer at Confluent, and Dhruba Borthakur, CTO & Co-founder of Rockset
Meetup link: https://www.meetup.com/Berlin-Apache-Kafka-Meetup-by-Confluent/events/273823025/
Apache Tez - A New Chapter in Hadoop Data Processing - DataWorks Summit
Apache Tez is a framework for accelerating Hadoop query processing. It is based on expressing a computation as a dataflow graph and executing it in a highly customizable way. Tez is built on top of YARN and provides benefits like better performance, predictability, and utilization of cluster resources compared to traditional MapReduce. It allows applications to focus on business logic rather than Hadoop internals.
The landscape for storing your big data is quite complex, with several competing formats and different implementations of each format. Understanding your use of the data is critical for picking the format. Depending on your use case, the different formats perform very differently. Although you can use a hammer to drive a screw, it isn’t fast or easy to do so.
The use cases that we’ve examined are:
* reading all of the columns
* reading a few of the columns
* filtering using a filter predicate
* writing the data
Furthermore, different kinds of data have distinct properties. We've used three real schemas:
* the NYC taxi data http://tinyurl.com/nyc-taxi-analysis
* the Github access logs http://githubarchive.org
* a typical sales fact table with generated data
Finally, the value of having open source benchmarks that are available to all interested parties is hugely important and all of the code is available from Apache.
Maxim Fateev - Beyond the Watermark: On-Demand Backfilling in Flink - Flink Forward
This document discusses an on-demand backfilling solution for Flink pipelines that process incentive programs. It describes a pipeline that evaluates driver incentives based on their status changes. Some incentives are defined retroactively, so an on-demand source was created that can continuously backfill old data. This source emits state changes paired with the applicable incentives from the beginning of each period. It allows a single pipeline to process both new and retroactive incentives without needing separate pipelines. The document also suggests this approach could be generalized into a pipeline template.
Presentation at Strata Data Conference 2018, New York
The controller is the brain of Apache Kafka. A big part of what the controller does is to maintain the consistency of the replicas and determine which replica can be used to serve the clients, especially during individual broker failure.
Jun Rao outlines the main data flow in the controller—in particular, when a broker fails, how the controller automatically promotes another replica as the leader to serve the clients, and when a broker is started, how the controller resumes the replication pipeline in the restarted broker.
Jun then describes recent improvements to the controller that allow it to handle certain edge cases correctly and increase its performance, which allows for more partitions in a Kafka cluster.
A brief introduction to Apache Kafka that describes its usage as a platform for streaming data. It introduces some of the newer components of Kafka that help make this possible, including Kafka Connect, a framework for capturing continuous data streams, and Kafka Streams, a lightweight stream processing library.
Apache Kafka is a distributed messaging system that allows for publishing and subscribing to streams of records, known as topics, in a fault-tolerant and scalable way. It is used for building real-time data pipelines and streaming apps. Producers write data to topics which are committed to disks across partitions and replicated for fault tolerance. Consumers read data from topics in a decoupled manner based on offsets. Kafka can process streaming data in real-time and at large volumes with low latency and high throughput.
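A minimal sketch of that publish/subscribe flow with the plain Java clients; the broker address, topic name, and group id are placeholder assumptions:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PubSubSketch {
    public static void main(String[] args) {
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Producers append records to partitioned, replicated topics.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("events", "user-42", "page_view"));
        }

        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "events-reader");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // Consumers track their own position in each partition via offsets.
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(List.of("events"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d key=%s value=%s%n",
                        record.offset(), record.key(), record.value());
            }
        }
    }
}
```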
Apache Kafka becoming the message bus to transfer huge volumes of data from various sources into Hadoop.
It's also enabling many real-time system frameworks and use cases.
Managing and building clients around Apache Kafka can be challenging. In this talk, we will go through best practices for deploying Apache Kafka in production: how to secure a Kafka cluster, how to choose topic partition counts, how to upgrade to newer versions, and how to migrate to the new Kafka producer and consumer APIs. We will also cover the best practices involved in running producers and consumers.
In the Kafka 0.9 release, we added SSL wire encryption, SASL/Kerberos for user authentication, and pluggable authorization. Kafka now supports authentication of users and access control over who can read from and write to a Kafka topic. Apache Ranger also uses a pluggable authorization mechanism to centralize security for Kafka and other Hadoop ecosystem projects.
We will showcase an open-sourced Kafka REST API and an Admin UI that help users create topics, reassign partitions, issue Kafka ACLs, and monitor consumer offsets.
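In current Kafka versions, the operations such a REST API and Admin UI wrap are exposed programmatically by the AdminClient (which postdates the 0.9 release discussed above). A minimal sketch; topic name, sizing, and group id are illustrative assumptions:

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

public class AdminSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // Create a topic with 6 partitions and replication factor 3.
            admin.createTopics(List.of(new NewTopic("payments", 6, (short) 3)))
                 .all().get();

            // Monitor committed consumer offsets for a group.
            admin.listConsumerGroupOffsets("payments-app")
                 .partitionsToOffsetAndMetadata().get()
                 .forEach((tp, om) ->
                         System.out.printf("%s -> offset %d%n", tp, om.offset()));
        }
    }
}
```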
Producer Performance Tuning for Apache Kafka - Jiangjie Qin
Kafka is well known for high-throughput ingestion. However, to get the best latency characteristics without compromising on throughput and durability, we need to tune Kafka. In this talk, we share our experiences achieving the optimal combination of latency, throughput, and durability for different scenarios.
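A minimal sketch of the classic producer knobs in that latency/throughput/durability trade-off; the values are illustrative assumptions, not the talk's recommendations:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;

public class TunedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");

        // Throughput: batch more records per request and compress batches.
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 64 * 1024);  // 64 KiB batches
        props.put(ProducerConfig.LINGER_MS_CONFIG, 10);          // wait up to 10 ms
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");

        // Durability: wait for all in-sync replicas and avoid duplicates.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);

        // Latency-sensitive workloads would instead shrink linger.ms/batch.size.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // send records...
        }
    }
}
```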
Stateful stream processing with Apache Flink - Knoldus Inc.
Nowadays, many stream processing applications demand sophisticated business logic, strict correctness guarantees, high performance, low latency, and fault tolerance, and maintain terabytes of state. There are many stream processing frameworks on the market that help businesses write robust stateful stream processing applications.
In this session, we will talk about Apache Flink, a distributed stream processor with intuitive and expressive APIs for implementing stateful stream processing applications, one that can run such applications efficiently at large scale and in a fault-tolerant manner. We will look at what stateful stream processing is in detail, how Flink approaches it, and how Flink's checkpointing mechanism works.
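A minimal sketch of enabling that checkpointing mechanism in a Flink job; the intervals and timeouts are illustrative assumptions:

```java
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Minimal sketch: periodic, exactly-once checkpoints snapshot all operator
// state so a failed job can rewind and replay from the last snapshot.
public class CheckpointedJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Draw a consistent snapshot of all operator state every 10 s.
        env.enableCheckpointing(10_000L, CheckpointingMode.EXACTLY_ONCE);
        env.getCheckpointConfig().setCheckpointTimeout(60_000L);
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(5_000L);

        env.fromElements(1, 2, 3).print();

        env.execute("checkpointing-sketch");
    }
}
```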
Exactly-Once Financial Data Processing at Scale with Flink and Pinot - Flink Forward
Flink Forward San Francisco 2022.
At Stripe we have created a complete end-to-end exactly-once processing pipeline for financial data at scale by combining the exactly-once capabilities of Flink, Kafka, and Pinot. The pipeline provides an exactly-once guarantee, end-to-end latency within a minute, deduplication against hundreds of billions of keys, and sub-second query latency against the whole dataset of trillions of rows. In this session we will discuss the technical challenges of designing, optimizing, and operating the whole pipeline, including Flink, Kafka, and Pinot. We will also share our lessons learned and the benefits gained from exactly-once processing.
by Xiang Zhang, Pratyush Sharma & Xiaoman Dong
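As a rough illustration of the Flink half of such a pipeline, here is a minimal sketch of a transactional Kafka sink whose transactions commit with checkpoints, so read-committed consumers downstream (e.g. a Pinot ingester) never see uncommitted duplicates. This assumes the Flink 1.14+ KafkaSink API; the broker address, topic, and id prefix are made up:

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ExactlyOnceSinkSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(10_000L); // required for EXACTLY_ONCE sinks

        KafkaSink<String> sink = KafkaSink.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("settled-transactions")
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                // Kafka transactions commit only when a checkpoint completes.
                .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
                .setTransactionalIdPrefix("payments-pipeline")
                .build();

        env.fromElements("tx-1", "tx-2").sinkTo(sink);
        env.execute("exactly-once-sketch");
    }
}
```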
Deploying Flink on Kubernetes - David Anderson - Ververica
Kubernetes has rapidly established itself as the de facto standard for orchestrating containerized infrastructures. And with the recent completion of the refactoring of Flink's deployment and process model known as FLIP-6, Kubernetes has become a natural choice for Flink deployments. In this talk we will walk through how to get Flink running on Kubernetes.
Real-time Analytics with Trino and Apache Pinot - Xiang Fu
Trino summit 2021:
Overview of Trino Pinot Connector, which bridges the flexibility of Trino's full SQL support to the power of Apache Pinot's realtime analytics, giving you the best of both worlds.
Restoring Restoration's Reputation in Kafka Streams with Bruno Cadonna & Luca... - HostedbyConfluent
The document discusses challenges with restoration in Kafka Streams applications and how the state updater improves restoration. It introduces the state updater, which runs restoration in parallel to processing to avoid blocking processing. This allows restoration checkpoints to be taken and avoids falling out of the consumer group if restoration is slow. Experiments show the state updater approach reduces restoration time and CPU usage compared to blocking restoration. The broader vision is for the state updater to support exactly-once semantics and multi-core scenarios.
Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |... - HostedbyConfluent
Kafka Streams is the popular stream processing component of Apache Kafka®. One of its best features is stateful operations. Kafka Streams works hard to ensure stateful operations can scale horizontally and survive failures, but doing so takes time. Kafka Streams offers the concept of "standby tasks," allowing for near-zero-downtime failover, but surprisingly this feature still isn't widely used. This could be for various reasons, from lack of awareness to the need for additional resources.
This presentation will cover how standby tasks work and how they're enabled. Additionally, I'll cover the work done in KIP-441 that enables faster scaling out for stateful tasks and provides more balanced stateful assignments. I'll also dive into the consumer rebalance protocol improvements that enable KIP-441 to be effective.
Attendees of this presentation will walk away understanding how and when to use standby tasks, leverage the improvements from KIP-441, and have a deeper understanding of how Kafka Streams works with state.
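A minimal sketch of turning standby tasks on; the application id and broker address are illustrative assumptions:

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

// Minimal sketch of enabling standby tasks: each stateful task gets one
// warm replica of its state store on another instance, so failover does
// not wait for a full changelog restore.
public class StandbyConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "orders-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);

        // KIP-441 warm-up behavior is governed by (defaults shown):
        // max.warmup.replicas = 2, acceptable.recovery.lag = 10_000 records.
    }
}
```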
Kafka is an open-source distributed commit log service that provides high-throughput messaging functionality. It is designed to handle large volumes of data and different use cases, like online and offline processing, more efficiently than alternatives like RabbitMQ. Kafka works by splitting topics into partitions spread across clusters of machines and replicating those partitions for fault tolerance. It can be used as a central data hub or pipeline for collecting, transforming, and streaming data between systems and applications.
This document provides an overview of Apache Flink internals. It begins with an introduction and recap of Flink programming concepts. It then discusses how Flink programs are compiled into execution plans and executed in a pipelined fashion, as opposed to being executed eagerly like regular code. The document outlines Flink's architecture including the optimizer, runtime environment, and data storage integrations. It also covers iterative processing and how Flink handles iterations both by unrolling loops and with native iterative datasets.
Flink-powered stream processing platform at Pinterest - Flink Forward
Flink Forward San Francisco 2022.
Pinterest is a visual discovery engine that serves over 433MM users. Stream processing allows us to unlock value from realtime data for pinners. At Pinterest, we adopted Flink as the unified stream processing engine. In this talk, we will share our journey in building a stream processing platform with Flink and how we onboarded critical use cases onto the platform. Pinterest now supports 90+ near-realtime streaming applications. We will cover the problem statement, how we evaluated potential solutions, and our decision to build the framework.
by Rainie Li & Kanchi Masalia
Flexible and Real-Time Stream Processing with Apache Flink - DataWorks Summit
This document provides an overview of stream processing with Apache Flink. It discusses the rise of stream processing and how it enables low-latency applications and real-time analysis. It then describes Flink's stream processing capabilities, including pipelining of data, fault tolerance through checkpointing and recovery, and integration with batch processing. The document also summarizes Flink's programming model, state management, and roadmap for further development.
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap... - Flink Forward
Flink Forward San Francisco 2022.
Being in the payments space, Stripe requires strict correctness and freshness guarantees. We rely on Flink as the natural solution for delivering on this in support of our Change Data Capture (CDC) infrastructure. We heavily rely on CDC as a tool for capturing data change streams from our databases without critically impacting database reliability, scalability, and maintainability. Data derived from these streams is used broadly across the business and powers many of our critical financial reporting systems totalling over $640 Billion in payment volume annually. We use many components of Flink’s flexible DataStream API to perform aggregations and abstract away the complexities of stream processing from our downstreams. In this talk, we’ll walk through our experience from the very beginning to what we have in production today. We’ll share stories around the technical details and trade-offs we encountered along the way.
by Jeff Chao
Flink Forward San Francisco 2022.
Resource Elasticity is a frequently requested feature in Apache Flink: Users want to be able to easily adjust their clusters to changing workloads for resource efficiency and cost saving reasons. In Flink 1.13, the initial implementation of Reactive Mode was introduced, later releases added more improvements to make the feature production ready. In this talk, we’ll explain scenarios to deploy Reactive Mode to various environments to achieve autoscaling and resource elasticity. We’ll discuss the constraints to consider when planning to use this feature, and also potential improvements from the Flink roadmap. For those interested in the internals of Flink, we’ll also briefly explain how the feature is implemented, and if time permits, conclude with a short demo.
by Robert Metzger
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ... - Flink Forward
Moving from Lambda and Kappa Architectures to Kappa+ at Uber
Kappa+ is a new approach developed at Uber to overcome the limitations of the Lambda and Kappa architectures. Whether your realtime infrastructure processes data at Uber scale (well over a trillion messages daily) or only a fraction of that, chances are you will need to reprocess old data at some point.
There can be many reasons for this. Perhaps a bug fix in the realtime code needs to be retroactively applied (aka backfill), or there is a need to train realtime machine learning models on the last few months of data before bringing the models online. Kafka's data retention is limited in practice and generally insufficient for such needs. So data must be processed from archives. Aside from addressing such situations, enabling efficient stream processing on archived as well as realtime data also broadens the applicability of stream processing.
This talk introduces the Kappa+ architecture, which enables the reuse of streaming realtime logic (stateful and stateless) to efficiently process any amount of historic data without requiring it to be in Kafka. We shall discuss the complexities involved in this kind of processing and the specific techniques employed in Kappa+ to tackle them.
Cloud Dataflow - A Unified Model for Batch and Streaming Data Processing - DoiT International
Dataflow is a unified programming model and a managed service for developing and executing a wide range of data processing patterns including ETL, batch computation, and continuous computation. Cloud Dataflow frees you from operational tasks like resource management and performance optimization.
Dataflow - A Unified Model for Batch and Streaming Data Processing - DoiT International
Batch and Streaming Data Processing and Visualize 300TB in 5 Seconds meetup on April 18th, 2016 (http://www.meetup.com/Big-things-are-happening-here/events/229532500)
Google Cloud Dataflow is a fully managed cloud service for batch and streaming data processing. It provides a unified programming model and DSL that can process both batch and streaming data on platforms like Spark, Flink, and Google's managed Dataflow service. Dataflow uses concepts like pipelines, PCollections, transforms, and windowing to process data according to event time while controlling latency through triggers. The Dataflow service on Google Cloud provides integration with Google Cloud services and monitoring at a cost based on the number and type of workers used.
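A minimal sketch of those windowing and trigger concepts in the Beam Java SDK (which implements the Dataflow model); "events" is a hypothetical PCollection and all durations are illustrative:

```java
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

// Minimal sketch of event-time windowing with a latency-controlling trigger.
public class WindowingSketch {
    static PCollection<String> window(PCollection<String> events) {
        return events.apply(
                Window.<String>into(FixedWindows.of(Duration.standardMinutes(1)))
                        // Speculative early panes every 10 s, an on-time pane
                        // at the watermark, then refinements for late data.
                        .triggering(AfterWatermark.pastEndOfWindow()
                                .withEarlyFirings(AfterProcessingTime
                                        .pastFirstElementInPane()
                                        .plusDelayOf(Duration.standardSeconds(10))))
                        .withAllowedLateness(Duration.standardMinutes(5))
                        .accumulatingFiredPanes());
    }
}
```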
Azure Stream Analytics: Analyse Data in Motion - Ruhani Arora
The document discusses evolving approaches to data warehousing and analytics using Azure Data Factory and Azure Stream Analytics. It provides an example scenario of analyzing game usage logs to create a customer profiling view. Azure Data Factory is presented as a way to build data integration and analytics pipelines that move and transform data between on-premises and cloud data stores. Azure Stream Analytics is introduced for analyzing real-time streaming data using a declarative query language.
Data Stream Processing - Concepts and Frameworks - Matthias Niehoff
An overview of various concepts used in data stream processing. Most of them address problems in the domain of time, focusing on processing time versus event time. The techniques shown include the Dataflow API as introduced by Google and the concept of stream-table duality. I will also cover other problems, such as data lookup and the deployment of streaming applications, and various strategies for solving them.
In the end I will give a brief outline of the implementation status of those strategies in the popular streaming frameworks Apache Spark Streaming, Apache Flink, and Kafka Streams.
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea... - Flink Forward
Apache Beam is Flink’s sibling in the Apache family of stream processing frameworks. The Beam and Flink teams work closely together on advancing what is possible in stream processing, including streaming SQL extensions and code interoperability on both platforms.
Beam was originally developed at Google as the amalgamation of its internal batch and streaming frameworks to power exabyte-scale data processing for Gmail, YouTube, and Ads. It now powers the fully managed, serverless Google Cloud Dataflow service, and is also available to run in other public clouds and on-premises when deployed in portability mode on Apache Flink, Spark, Samza, and other runners. Users regularly run distributed data processing jobs on Beam spanning tens of thousands of CPU cores and processing millions of events per second.
In this session, Sergei Sokolenko, Cloud Dataflow product manager, and Reuven Lax, the founding member of the Dataflow and Beam team, will share Google’s learnings from building and operating a global streaming processing infrastructure shared by thousands of customers, including:
safe deployment to dozens of geographic locations,
resource autoscaling to minimize processing costs,
separating compute and state storage for better scaling behavior,
dynamic work rebalancing of work items away from overutilized worker nodes,
offering a throughput-optimized batch processing capability with the same API as streaming,
grouping and joining of 100s of terabytes in a hybrid in-memory/on-disk file system,
integrating with the Google Cloud security ecosystem, and other lessons.
Customers benefit from these advances through faster execution of jobs, resource savings, and a fully managed data processing environment that runs in the Cloud and removes the need to manage infrastructure.
Google Cloud Dataflow: Two Worlds Become a Much Better One - DataWorks Summit
Google Cloud Dataflow is a fully managed service that allows users to build batch or streaming parallel data processing pipelines. It provides a unified programming model and SDKs in Java and Python to process data across Google Cloud Platform services like Pub/Sub, BigQuery, and Cloud Storage. The Cloud Dataflow service automatically optimizes and runs data pipelines at scale in a reliable, cost-effective manner without requiring operational management by the user.
William Vambenepe – Google Cloud Dataflow and Flink, Stream Processing by De... - Flink Forward
1. Google Cloud Dataflow is a fully managed service that allows users to define data processing pipelines that can run batch or streaming computations.
2. The Dataflow programming model defines pipelines as directed graphs of transformations on collections of data elements. This provides flexibility in how computations are defined across batch and streaming workloads.
3. The Dataflow service handles graph optimization, scaling of workers, and monitoring of jobs to efficiently execute user-defined pipelines on Google Cloud Platform.
Onyx is a data processing framework for Clojure that allows users to define workflows, functions, and windows to process streaming and batch data across distributed clusters. It uses concepts like peers, virtual peers, and Zookeeper for scheduling and Aeron for messaging. Users can write Onyx jobs in Clojure to perform ETL, analytics, and other data processing tasks in a declarative way.
[WSO2Con EU 2017] Deriving Insights for Your Digital Business with Analytics - WSO2
We are at the dawn of digital businesses that are re-imagined to make the best use of digital technologies, such as automation, analytics, cloud, and integration. These businesses are efficient, are continuously optimized, proactive, flexible and are able to understand customers in detail. This slide deck explores how the WSO2 analytics platform plays a role in your digital transformation journey.
Presented at All Things Open 2022
Presented by Danny McCormick
Title: Streaming Data Pipelines With Apache Beam
Abstract: Handling big data presents big problems. Along with traditional concerns like scalability and performance, the increasingly common need for live streaming data processing introduces problems like late or incomplete data from flaky data sources. Apache Beam is an open source, unified model for defining both batch and streaming data-parallel processing pipelines that addresses these challenges. Using one of the open source Beam SDKs, you can build a program that defines a pipeline to be executed by one of Beam’s supported distributed processing back-ends, which include Apache Flink, Apache Spark, and Google Cloud Dataflow.
This talk will explore some problems associated with processing large datasets at scale and how you can write Apache Beam pipelines that address those issues. It will include a demo of a basic Beam streaming pipeline.
Takeaways: an understanding of some challenges associated with large datasets, the Apache Beam model, and how to write a basic Beam streaming pipeline
Audience: anyone dealing with big datasets or interested in data processing at scale.
Being able to analyze data in real-time will be a very hot topic for sure in near future. Not only for IoT-related tasks but as a general approach to user-to-machine or machine-to-machine interaction. From product recommendations to fraud detection alarms, a lot of stuff would be perfect if it could happen in real time. Now, with Azure Event Hubs and Stream Analytics, it’s possible. In this session, Davide will demonstrate how to use Event Hubs to quickly ingest new real-time data and Stream Analytics to query on-the-fly data, in order to do a real-time analysis of what’s happening right now.
This session takes an in-depth look at:
- Trends in stream processing
- How streaming SQL has become a standard
- The advantages of Streaming SQL
- Ease of development with streaming SQL: Graphical and Streaming SQL query editors
- Business value of streaming SQL and its related tools: Domain-specific UIs
- Scalable deployment of streaming SQL: Distributed processing
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w... - Data Con LA
This talk explores deploying a series of small and large batch and streaming pipelines locally, to Spark and Flink clusters and to Google Cloud Dataflow services to give the audience a feel for the portability of Beam, a new portable Big Data processing framework recently submitted by Google to the Apache foundation. This talk will look at how the programming model handles late arriving data in a stream with event time, windows, and triggers.
Google Cloud Dataflow is a next-generation managed big data service based on the Apache Beam programming model. It provides a unified model for batch and streaming data processing, with an optimized execution engine that automatically scales based on workload. Customers report being able to build complex data pipelines more quickly using Cloud Dataflow compared to other technologies like Spark, and with improved performance and reduced operational overhead.
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod... - DataWorks Summit
Google Cloud Dataflow is a fully managed service that allows users to build batch or streaming parallel data processing pipelines. It provides a unified programming model for batch and streaming workflows. Cloud Dataflow handles resource management and optimization to efficiently execute data processing jobs on Google Cloud Platform.
SplunkLive! Washington DC May 2013 - Splunk Security Workshop - Splunk
This document outlines ideas for using Splunk to analyze security data and detect malicious activity. It provides examples of searches to analyze time data, off-hour activity, activity by IP range, field lengths, and perform firewall, web proxy, DNS, and other types of analysis. The purpose is to apply analytics to security data in Splunk and lower the barrier to exploring the data. Various techniques are demonstrated such as correlating different data sources, looking at unusual patterns, and manipulating fields.
Sql azure cluster dashboard public.ppt - Qingsong Yao
This document discusses building a centralized dashboard to monitor SQL Azure clusters in real-time. Key points:
- The goal was to provide a single place to view cluster status and detect issues early through telemetry data analysis.
- Lessons included choosing efficient data techniques, building resilience into the data pipeline, and monitoring pipeline performance.
- The dashboard helped transition monitoring from reactive to proactive by enabling new alert detection based on real-time trend analysis across clusters.
Google's Infrastructure and Specific IoT Services - Intel® Software
This document discusses Google Cloud Platform's Internet of Things (IoT) solutions. It describes IoT Core, which handles device management and communication, including the Device Manager for registering devices and MQTT Broker for bidirectional messaging. It explains how IoT Core collects analog sensor data from devices and transforms it into useful business insights and intelligence through data processing and analytics services like Cloud Dataflow, BigQuery, and Cloud ML.
Empowering the AWS DynamoDB™ application developer with Alternator - ScyllaDB
Getting started with AWS DynamoDB™ is famously easy, but as an application grows and evolves it often starts to struggle with DynamoDB’s limitations. We introduce Scylla’s Alternator, which provides the same API as DynamoDB but aims to empower the application developer. In this presentation we will survey some of Alternator’s developer-centered features: Alternator lets you test and eventually deploy your application anywhere, on any public cloud or private cluster. It efficiently supports multiple tables so it does not require difficult single-table design. Finally, Alternator provides the developer with strong observability tools. The insights provided by these tools can detect bottlenecks, improve performance and even lower its cost.
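As a rough illustration of the "same API" point, a minimal sketch that points a stock AWS SDK v2 DynamoDB client at an Alternator endpoint; the host, port (8000 is Alternator's usual default), table name, and credentials are assumptions:

```java
import java.net.URI;
import java.util.Map;
import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;

// Because Alternator speaks the DynamoDB API, an existing SDK client only
// needs its endpoint redirected at the Scylla cluster.
public class AlternatorClientSketch {
    public static void main(String[] args) {
        DynamoDbClient client = DynamoDbClient.builder()
                .endpointOverride(URI.create("http://scylla-host:8000"))
                .region(Region.US_EAST_1) // required by the SDK, unused here
                .credentialsProvider(StaticCredentialsProvider.create(
                        AwsBasicCredentials.create("alternator", "secret")))
                .build();

        client.putItem(PutItemRequest.builder()
                .tableName("users")
                .item(Map.of("id", AttributeValue.builder().s("42").build()))
                .build());
    }
}
```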
Migration, backup and restore made easy using Kannika - confluent
In this presentation, you’ll discover how easily you can migrate data from any Kafka-compatible event hub to Confluent using Kannika’s intuitive self-service interface. We’ll guide you through the process, showing how the same approach can be applied to define specific event data sets and effortlessly spin up secure environments for demos, testing, or other purposes.
You’ll also learn how to back up event data in just a few steps by transferring compressed data to the cloud storage location of your choice. In addition, we’ll demonstrate how to restore filtered datasets of topics, ensuring quick recovery and maintaining business continuity when needed.
Five Things You Need to Know About Data Streaming in 2025 - confluent
Topics that Peter covers:
Tapping into the Potential of Data Products: Data drives some of today's most important business use cases. Data products enable instant access to reliable and trustworthy data by eliminating the data mess created by point-to-point connections.
The Need to Tap into 'Quick Thinking': The C-level has to reorient itself so it doesn't become the bottleneck to adaptability in a data-driven world. Nine in 10 (90%) business leaders say they must now react in real-time. Learn what you can do to provide executive access to real-time data to enable 'Quick Thinking.'
Rise Above Data Hurdles: Discover how to enforce governance at data production. Reestablishing trustworthiness later is almost always harder, so investing in data tools that solve business problems rather than add to them is essential.
Paradigm to Shift Left: Shift Left is a new paradigm for processing and governing data at any scale, complexity, and latency. Shift Left moves the processing and governance of data closer to the source, enabling organisations to build their data once, build it right and reuse it anywhere within moments of its creation.
The Need for a Strategic View: The positive correlation between data streaming maturity and significant business returns underscores the importance of a long-term, strategic view of data streaming investments. It also highlights the value of advancing beyond initial, siloed use cases to a more integrated approach that leverages data streaming across the enterprise.
From Stream to Screen: Real-Time Data Streaming to Web Frontends with Conflue... - confluent
In this presentation, we’ll demonstrate how Confluent and Lightstreamer come together to tackle the last-mile challenge of extending your Kafka architecture to web and mobile platforms.
Learn how to effortlessly build real-time web applications within minutes, subscribing to Kafka topics directly from your web pages, with unmatched low latency and high scalability.
Explore how Confluent's leading Kafka platform and Lightstreamer's intelligent proxy work seamlessly to bridge Kafka with the internet frontier, delivering data in real-time.
Confluent for the FSI Sector: Accelerating Innovation with Data Streaming... - confluent
Confluent for the FSI sector:
- What data streaming is and why your company needs it
- Who we are and how Confluent can help you:
- Making Kafka broadly accessible
- Stream, Connect, Process, and Governance
- A deep dive into the technology solutions implemented within the Data Streaming Platform
- From theory to practice: real-world applications of FSI architectures
Data in Motion Tour 2024 Riyadh, Saudi Arabia - confluent
Data streaming platforms are becoming increasingly important in today’s fast-paced world. From retail giants who need to monitor inventory levels to ensure stores never run out of items, to new-age, innovative banks who are building out-of-the-box banking solutions for traditional retail banks, data streaming platforms are at the centre, powering these workflows.
Data streaming platforms connect all your applications, systems, and teams with a shared view of the most up-to-date, real-time data. From Gen AI, stream governance to stream processing - it’s these cutting edge developments that will be featured during the day.
Build a Real-Time Decision Support Application for Financial Market Traders w... - confluent
Quix's intuitive visual programming interface and extensive library of pre-built components make it easy to build these applications without complex coding. Experience how this dynamic duo accelerates the development and deployment of your trading strategies, empowering you to make more informed decisions with real-time data!
Compose Gen-AI Apps With Real-Time Data - In Minutes, Not Weeks - confluent
As businesses strive to stay at the forefront of innovation, the ability to quickly develop scalable Generative AI (GenAI) applications is essential. Join us for an exclusive webinar featuring MIA Platform, MongoDB, and Confluent, where you'll learn how to compose GenAI apps with real-time data integration in a fraction of the time.
Discover how these three powerful platforms work together to ensure applications remain responsive, relevant, and adaptive to user preferences and contextual changes. Our experts will guide you through leveraging MIA Platform's microservices architecture and low-code development, MongoDB's flexibility, and Confluent's stream processing capabilities. Experience live demonstrations and practical insights that will transform your approach to AI-driven app development, enabling you to accelerate your development process from weeks to mere minutes. Don't miss this opportunity to keep your business at the cutting edge.
Building Real-Time Gen AI Applications with SingleStore and Confluent - confluent
Discover how SingleStore and Confluent together create a powerful foundation for real-time generative AI applications. Learn how SingleStore's high-performance data platform and Confluent integrate to process and analyze streaming data in real-time. We'll explore real-world, innovative solutions and show you how SingleStore + Confluent can unlock new gen AI opportunities with your clients.
Unlocking value with event-driven architecture by Confluent - confluent
Harness the power of real-time data streaming and event-driven microservices for the future of Sky with Confluent and Kafka®.
In this tech talk we will explore the potential of Confluent and Apache Kafka® to revolutionize enterprise architecture and unlock new business opportunities. We will dig into the key concepts and guide you through building scalable, resilient, real-time data streaming applications.
You will discover how to build event-driven microservices with Confluent, taking advantage of a modern, reactive architecture.
The talk will also present real-world use cases of Confluent and Kafka®, showing how these technologies can optimize business processes and generate concrete value.
Data Streaming for Next-Generation Real-Time AI - confluent
Building reliable, secure, and governed AI applications requires an equally solid real-time data foundation, all the more so when managing huge flows of data in continuous motion.
How do you get there? Rely on a true data streaming platform that lets you scale and rapidly build real-time AI applications on top of trustworthy data.
Find out more! Don't miss our upcoming webinar, during which we will:
• Explore the GenAI paradigm and how this new technology is reshaping the business landscape, answering the need to deliver real-time context and solutions that meet your company's requirements.
• Examine the uncertainties of the evolving AI landscape and the crucial importance of data streaming and data processing.
• See in detail the continuously evolving architecture and the key role of Kafka and Confluent in AI applications.
• Analyze the advantages of a data streaming platform like Confluent in bridging legacy systems and GenAI, easing the development and use of predictive and generative AI.
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ... - confluent
As businesses strive to remain at the cutting edge of innovation, the demand for scalable and up-to-date conversational AI solutions has become paramount. Generative AI (GenAI) chatbots that seamlessly integrate into our daily lives and adapt to the ever-evolving nuances of human interaction are crucial. Real-time data plays a pivotal role in ensuring the responsiveness and relevance of these chatbots, empowering them to stay abreast of the latest trends, user preferences, and contextual information.
Break data silos with real-time connectivity using Confluent Cloud Connectors - confluent
Connectors integrate Apache Kafka® with external data systems, enabling you to move away from a brittle spaghetti architecture to one that is more streamlined, secure, and future-proof. However, if your team still spends multiple dev cycles building and managing connectors using just open source Kafka Connect, it’s time to consider a faster and cost-effective alternative.
Building API data products on top of your real-time data infrastructure - confluent
This talk and live demonstration will examine how Confluent and Gravitee.io integrate to unlock value from streaming data through API products.
You will learn how data owners and API providers can document and secure data products on top of Confluent brokers, including schema validation, topic routing, and message filtering.
You will also see how data and API consumers can discover and subscribe to products in a developer portal, as well as how they can integrate with Confluent topics through protocols like REST, Websockets, Server-sent Events and Webhooks.
Whether you want to monetize your real-time data, enable new integrations with partners, or provide self-service access to topics through various protocols, this webinar is for you!
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente... - confluent
In our exclusive webinar, you'll learn why event-driven architecture is the key to unlocking cost efficiency, operational effectiveness, and profitability. Gain insights on how this approach differs from API-driven methods and why it's essential for your organization's success.
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfSoftware Company
Explore the benefits and features of advanced logistics management software for businesses in Riyadh. This guide delves into the latest technologies, from real-time tracking and route optimization to warehouse management and inventory control, helping businesses streamline their logistics operations and reduce costs. Learn how implementing the right software solution can enhance efficiency, improve customer satisfaction, and provide a competitive edge in the growing logistics sector of Riyadh.
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveScyllaDB
Want to learn practical tips for designing systems that can scale efficiently without compromising speed?
Join us for a workshop where we’ll address these challenges head-on and explore how to architect low-latency systems using Rust. During this free interactive workshop oriented for developers, engineers, and architects, we’ll cover how Rust’s unique language features and the Tokio async runtime enable high-performance application development.
As you explore key principles of designing low-latency systems with Rust, you will learn how to:
- Create and compile a real-world app with Rust
- Connect the application to ScyllaDB (NoSQL data store)
- Negotiate tradeoffs related to data modeling and querying
- Manage and monitor the database for consistently low latencies
Mobile App Development Company in Saudi ArabiaSteve Jonas
EmizenTech is a globally recognized software development company, proudly serving businesses since 2013. With over 11+ years of industry experience and a team of 200+ skilled professionals, we have successfully delivered 1200+ projects across various sectors. As a leading Mobile App Development Company In Saudi Arabia we offer end-to-end solutions for iOS, Android, and cross-platform applications. Our apps are known for their user-friendly interfaces, scalability, high performance, and strong security features. We tailor each mobile application to meet the unique needs of different industries, ensuring a seamless user experience. EmizenTech is committed to turning your vision into a powerful digital product that drives growth, innovation, and long-term success in the competitive mobile landscape of Saudi Arabia.
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul
Artificial intelligence is changing how businesses operate. Companies are using AI agents to automate tasks, reduce time spent on repetitive work, and focus more on high-value activities. Noah Loul, an AI strategist and entrepreneur, has helped dozens of companies streamline their operations using smart automation. He believes AI agents aren't just tools—they're workers that take on repeatable tasks so your human team can focus on what matters. If you want to reduce time waste and increase output, AI agents are the next move.
What is Model Context Protocol(MCP) - The new technology for communication bw...Vishnu Singh Chundawat
The MCP (Model Context Protocol) is a framework designed to manage context and interaction within complex systems. This SlideShare presentation will provide a detailed overview of the MCP Model, its applications, and how it plays a crucial role in improving communication and decision-making in distributed systems. We will explore the key concepts behind the protocol, including the importance of context, data management, and how this model enhances system adaptability and responsiveness. Ideal for software developers, system architects, and IT professionals, this presentation will offer valuable insights into how the MCP Model can streamline workflows, improve efficiency, and create more intuitive systems for a wide range of use cases.
What is Model Context Protocol(MCP) - The new technology for communication bw...Vishnu Singh Chundawat
Kafka streams windowing behind the curtain
1. Kafka Streams Windowing
Behind the Curtain
Neil Buesing
Principal Solutions Architect, Rill
Confluent Meetup
July 15th, 2021
2. • Operational intelligence for data in motion
• Easy In - Easy Up - Easy Out
• Work with customers to build & modernize their pipelines
What Does Rill Do?
3. • Principal Solutions Architect
• Help customers with pipelines leveraging Apache Druid, Apache Kafka, Kafka
Streams, Apache Beam, and other technologies.
• Data Modeling and Governance
• Rill Data / Apache Druid
What Do I Do?
4. • Overview of the Windowing Options within Kafka Streams
• Windowing Use-Cases
• Examples of Aggregate Windowing
• What each windowing option does within RocksDB and the -changelog topics
• The key serialization of the -changelog topics
• Developer Tools & Ideas
Takeaways
5. • Stream / Table Duality
• Compacted Topics need stateful front-end
• Stateful Operations
• Finite datasets — tables
• Boundaries for unbounded data — windows
Why Kafka Streams
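The duality on this slide is directly visible in the DSL. A minimal sketch, assuming the Kafka Streams 2.8 API current at the time of this talk and hypothetical topic names: the same kind of topic can back an event stream, an ever-updating table, or a windowed view of unbounded data.

```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;

StreamsBuilder builder = new StreamsBuilder();

// Stream view: every record is an independent event.
KStream<String, String> orders =
    builder.stream("orders", Consumed.with(Serdes.String(), Serdes.String()));

// Table view: a compacted topic read as latest-value-per-key state
// (the "stateful front-end" from the slide).
KTable<String, String> customers =
    builder.table("customers", Consumed.with(Serdes.String(), Serdes.String()));

// Windowed view: boundaries on unbounded data, here 30-minute tumbling counts.
KTable<Windowed<String>, Long> orderCounts = orders
    .groupByKey()
    .windowedBy(TimeWindows.of(Duration.ofMinutes(30)))
    .count();
```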
6. Windowing Options

Window Type | Time boundary | Examples                                            | # records for key @ point in time | Fixed Size
Tumbling    | Epoch         | [8:00, 8:30) [8:30, 9:00)                           | single                            | Yes
Hopping     | Epoch         | [8:00, 8:30) [8:15, 8:45) [8:30, 9:00) [8:45, 9:15) | constant                          | Yes
Sliding     | Record        | [8:02, 8:32] [8:20, 8:50] [8:21, 8:51]              | variable                          | Yes
Session     | Record        | [8:02, 8:02] [8:02, 8:10] [9:10, 12:56]             | single (by tombstoning)           | No
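The four rows of this table map to four window specifications in the DSL. A sketch using the pre-3.0 factory methods that were current when this talk was given; the durations are illustrative:

```java
import java.time.Duration;
import org.apache.kafka.streams.kstream.SessionWindows;
import org.apache.kafka.streams.kstream.SlidingWindows;
import org.apache.kafka.streams.kstream.TimeWindows;

// Tumbling: fixed-size, epoch-aligned, non-overlapping.
TimeWindows tumbling = TimeWindows.of(Duration.ofMinutes(30));

// Hopping: fixed-size, epoch-aligned, overlapping every advance interval,
// so each record falls into size / advance = 2 windows here.
TimeWindows hopping = TimeWindows.of(Duration.ofMinutes(30))
                                 .advanceBy(Duration.ofMinutes(15));

// Sliding: fixed-size, but aligned to record timestamps, not the epoch.
SlidingWindows sliding =
    SlidingWindows.withTimeDifferenceAndGrace(Duration.ofMinutes(30), Duration.ZERO);

// Session: variable-size, bounded by an inactivity gap between records per key.
SessionWindows session = SessionWindows.with(Duration.ofMinutes(5));
```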
12. • Good
• Web Visitors
• Products Purchased
• Inventory Management
• IoT Sensors
• Ad Impressions*
• Bad
• Fraud Detection
• User Interactions
• Composition*
Tumbling Time Windows
* event timestamp & grace period
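The asterisk is the catch: ad-impression counts are only trustworthy if windows are driven by event timestamps and late events get a grace period. A hedged sketch, with a hypothetical topic and store name:

```java
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.WindowStore;

// Hypothetical topic name; builder and serdes as in the earlier sketch.
KStream<String, String> impressions =
    builder.stream("ad-impressions", Consumed.with(Serdes.String(), Serdes.String()));

// Impressions per campaign key in 1-hour tumbling windows, driven by event
// timestamps; records arriving up to 10 minutes late are still counted.
KTable<Windowed<String>, Long> impressionsPerHour = impressions
    .groupByKey()
    .windowedBy(TimeWindows.of(Duration.ofHours(1)).grace(Duration.ofMinutes(10)))
    .count(Materialized.<String, Long, WindowStore<Bytes, byte[]>>as("impressions-per-hour"));
```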
13. • Good
• Web Visitors
• Products Purchased
• Fraud Detection
• IoT Sensors
• Bad
• User Interactions
• Inventory Management
• Composition
• Ad Impressions
Hopping Time Windows
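Hopping windows fit fraud-style checks because every advance interval re-evaluates the recent past. A hedged sketch with hypothetical names:

```java
// Hypothetical topic name; builder and serdes as in the earlier sketches.
KStream<String, String> logins =
    builder.stream("login-attempts", Consumed.with(Serdes.String(), Serdes.String()));

// Attempts per user over the last 10 minutes, refreshed every minute. Each
// record lands in size / advance = 10 overlapping windows, which is also why
// hopping windows are listed as a bad fit for additive composition.
KTable<Windowed<String>, Long> recentAttempts = logins
    .groupByKey()
    .windowedBy(TimeWindows.of(Duration.ofMinutes(10)).advanceBy(Duration.ofMinutes(1)))
    .count();
```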
14. • Good
• User Interactions
• Fraud Detection
• Usage Changes
• Bad
• Composition
• IoT Sensors
Sliding Time Windows
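Sliding windows suit the user-interaction and usage-change cases because they align to record timestamps rather than epoch boundaries. A hedged sketch with hypothetical names:

```java
// Hypothetical topic name; builder and serdes as in the earlier sketches.
KStream<String, String> interactions =
    builder.stream("user-interactions", Consumed.with(Serdes.String(), Serdes.String()));

// A window per record timestamp: count the 5 minutes of activity preceding
// each event, instead of epoch-aligned buckets.
KTable<Windowed<String>, Long> trailingCounts = interactions
    .groupByKey()
    .windowedBy(SlidingWindows.withTimeDifferenceAndGrace(Duration.ofMinutes(5), Duration.ZERO))
    .count();
```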
15. • Good
• User Interactions / Click Stream
• User Behavior Analysis
• IoT device - session oriented
(running)
• Bad
• Data Analytics (Generalizations)
• IoT sensors - always on
(pacemaker)
Session Windows
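Session windows have no fixed size at all; the data decides where a window ends. A hedged sketch with hypothetical names:

```java
// Hypothetical topic name; builder and serdes as in the earlier sketches.
KStream<String, String> clicks =
    builder.stream("click-events", Consumed.with(Serdes.String(), Serdes.String()));

// One aggregate per visit: a session closes after 30 minutes of inactivity.
// A late record can merge two sessions into one; the superseded sessions are
// deleted from the store, hence "single (by tombstoning)" in the table above.
KTable<Windowed<String>, Long> clicksPerVisit = clicks
    .groupByKey()
    .windowedBy(SessionWindows.with(Duration.ofMinutes(30)))
    .count();
```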
16. • Good
• Composition*
• Finite Datasets
• Bad
• Fraud Detection
• Monitoring
• Unbounded data*
No Windows
* manual tombstoning
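The asterisk here is the cost of skipping windows: unwindowed state lives forever unless keys are retired by hand. A sketch of the tombstone mechanics, with a hypothetical topic name:

```java
// Hypothetical topic name; builder and serdes as in the earlier sketches.
// Reading a topic as a table makes a null value a tombstone: producing
// (key, null) removes the key from the KTable, its RocksDB store, and
// (after compaction) the underlying topic. Without windows, retiring
// finished keys this way is what keeps state from growing without bound.
KTable<String, String> openOrders =
    builder.table("orders-open", Consumed.with(Serdes.String(), Serdes.String()));
```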
17. Demo Applications: Order Processing & Order Analytics
[Topology diagram: the orders-purchase and orders-pickup topics flow through stages that attach the user & store, attach line-item pricing, and assemble the product record for analytics. Internal topics and stores include pickup-order-handler-purchase-order-join-product-repartition, product-repartition, and the product-stats state store.]
23. • Emitting Results
• Suppression
• Commit Time
• Window Boundaries
• Epoch vs. Event
• Long Windows*
• Join Windowing
• RocksDB Tuning
• RocksDB state store instances…
What Next
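Of the follow-on topics above, suppression is the one most teams reach for first: it replaces the per-record stream of window updates with a single final result per window. A hedged sketch with hypothetical names:

```java
import org.apache.kafka.streams.kstream.Suppressed;

// Hypothetical topic name; builder and serdes as in the earlier sketches.
KStream<String, String> events =
    builder.stream("events", Consumed.with(Serdes.String(), Serdes.String()));

// One final count per 30-minute window, emitted only after the grace period
// has passed, instead of an update for every input record.
KTable<Windowed<String>, Long> finalCounts = events
    .groupByKey()
    .windowedBy(TimeWindows.of(Duration.ofMinutes(30)).grace(Duration.ofMinutes(5)))
    .count()
    .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()));
```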
24. • Overview of the Windowing Options within Kafka Streams
• Windowing Use-Cases
• Examples of Aggregate Windowing
• What each windowing option does within RocksDB and the -changelog topics
• The key serialization of the -changelog topics
• Advanced Considerations
Takeaways