Deep Dive Series #3: Schema Validation + Structured Audit Logs | confluent
Another new security-related feature in Confluent Platform 5.4 is Structured Audit Logs. Of course, everything in Kafka is a log, but Kafka does not log what Kafka does with Kafka, only what is written to its topics.
In the third part of the Deep Dive Sessions, alongside Structured Audit Logs we also discuss the evolution of the familiar Schema Registry: Schema Validation operates at the topic level and ensures that every single message produced to a given topic is checked against the Schema Registry, as sketched below. We explain more in our Deep Dive #3.
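As a hedged sketch of how this per-topic validation is switched on, the snippet below creates a topic with schema validation enabled via the Java AdminClient. The topic name, partition/replica counts, and broker address are assumptions, and the confluent.value.schema.validation property takes effect only on Confluent Server brokers configured with a Schema Registry URL.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Map;
import java.util.Properties;

public class CreateValidatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker

        try (AdminClient admin = AdminClient.create(props)) {
            // Topic-level switch telling Confluent Server to check each produced
            // record's schema ID against Schema Registry before accepting it.
            NewTopic topic = new NewTopic("orders", 3, (short) 3) // hypothetical topic
                    .configs(Map.of("confluent.value.schema.validation", "true"));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```

With this in place, the broker can reject produce requests whose records do not reference a valid registered schema, rather than leaving validation entirely to clients.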
What's New in Confluent Platform 5.4 Online Talk | confluent
To stay informed about the latest features in Confluent Platform 5.4, join Martijn Kieboom, Solutions Engineer at Confluent, for the ‘What’s New in Confluent 5.4?’ online talk on February 12 at 11 am GMT / 12 noon CET. Martijn will talk through the new features, including:
Role-Based Access Control and how it enables highly granular control of permissions and platform access
Structured Audit Logs and how they enable the capture of authorization logs
How Multi-Region Clusters deliver asynchronous replication at the topic level, allowing companies to run a single Kafka Cluster across multiple data-centres
Schema Validation's role in enabling businesses that run Kafka at scale to deliver data compatibility across platforms
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi... | confluent
Watch this talk here: https://www.confluent.io/online-talks/using-apache-kafka-to-optimize-real-time-analytics-financial-services-iot-applications
When it comes to the fast-paced nature of capital markets and IoT, the ability to analyze data in real time is critical to gaining an edge. It’s not just about the quantity of data you can analyze at once, it’s about the speed, scale, and quality of the data you have at your fingertips.
Modern streaming data technologies like Apache Kafka and the broader Confluent platform can help detect opportunities and threats in real time. They can improve profitability, yield, and performance. Combining Kafka with Panopticon visual analytics provides a powerful foundation for optimizing your operations.
Use cases in capital markets include transaction cost analysis (TCA), risk monitoring, surveillance of trading and trader activity, compliance, and optimizing profitability of electronic trading operations. Use cases in IoT include monitoring manufacturing processes, logistics, and connected vehicle telemetry and geospatial data.
This online talk will include in depth practical demonstrations of how Confluent and Panopticon together support several key applications. You will learn:
-Why Apache Kafka is widely used to improve performance of complex operational systems
-How Confluent and Panopticon open new opportunities to analyze operational data in real time
-How to quickly identify and react immediately to fast-emerging trends, clusters, and anomalies
-How to scale data ingestion and data processing
-How to build new analytics dashboards in minutes
Viktor Gamov, Confluent, Developer Advocate
Apache Kafka is an open source distributed streaming platform that allows you to build applications and process events as they occur. Viktor Gamov (Developer Advocate at Confluent) walks through how it works and important underlying concepts. As a real-time, scalable, and durable system, Kafka can be used for fault-tolerant storage as well as for other use cases, such as stream processing, centralized data management, metrics, log aggregation, event sourcing, and more.
This talk will explain what a streaming platform such as Apache Kafka is and some of the use cases and design patterns around its use—including several examples of where it is solving real business problems.
https://www.meetup.com/Chennai-Kafka/events/269942117/
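As context for the talk above, here is a minimal sketch of the produce side of Kafka's publish/subscribe model using the plain Java client; the broker address, topic name, and payload are illustrative assumptions.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class EventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");              // assumed broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("acks", "all"); // wait for in-sync replicas: durability over latency

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Each send appends an immutable event to the topic's log, where any
            // number of independent consumer groups can later read it.
            producer.send(new ProducerRecord<>("page-views", "user-42", "{\"page\":\"/home\"}"));
        }
    }
}
```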
Elastically Scaling Kafka Using Confluent | confluent
This document discusses how Confluent Platform provides elastic scaling for Apache Kafka. It offers fully managed cloud services through Confluent Cloud or self-managed software. Confluent Cloud allows users to easily scale Kafka workloads from 0 MBps to GBps without complex provisioning. It also offers pay-for-use pricing where customers only pay for the data streamed, with the ability to scale to zero. For self-managed deployments, Confluent Platform enables dynamic scaling of Kafka clusters on Kubernetes through features like tiered storage and self-balancing clusters that can rebalance partitions in seconds versus hours for other Kafka services.
Building Physical Industrial IoT Models with Kafka | confluent
This document discusses using Kafka to build physical industrial IoT models. It demonstrates a vehicle simulator and a conveyor belt example that simulate sensor data streams. It also shows anomaly detection and responding to incidents using Kafka. The key takeaways are building IoT models with simulators to test software before implementing real hardware, and using event streaming with Kafka to respond to anomalies and shut down systems during incidents.
Time Series Analysis Using an Event Streaming Platform | confluent
(1) The document discusses using an event streaming platform like Apache Kafka for advanced time series analysis (TSA). Typical processing patterns are described for converting raw data into time series and reconstructing graphs and networks from time series data.
(2) A challenge discussed is integrating data streams, experiments, and decision making. The document argues that stream processing using Kafka is better suited than batch processing for real-time applications and iterative research projects.
(3) The document then covers approaches for TSA and network analysis using Kafka, including creating time series from event streams, creating graphs from time series pairs, and architectures using reusable building blocks for complex stream processing.
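A rough illustration of the first pattern, turning a raw event stream into a time series: the Kafka Streams topology below counts events per key in one-minute windows. The topic names, serdes, and window size are assumptions, not details from the talk.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;

import java.time.Duration;
import java.util.Properties;

public class SensorTimeSeries {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Count raw sensor events per key in one-minute windows, turning an
        // irregular event stream into an evenly spaced time series.
        builder.stream("sensor-events", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey()
               .windowedBy(TimeWindows.of(Duration.ofMinutes(1)))
               .count()
               .toStream((windowedKey, count) ->
                       windowedKey.key() + "@" + windowedKey.window().start())
               .to("sensor-timeseries", Produced.with(Serdes.String(), Serdes.Long()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "tsa-sketch");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        new KafkaStreams(builder.build(), props).start();
    }
}
```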
Benefits of Stream Processing and Apache Kafka Use Cases | confluent
Watch this talk here: https://www.confluent.io/online-talks/benefits-of-stream-processing-and-apache-kafka-use-cases-on-demand
This talk explains how companies are using event-driven architecture to transform their business and how Apache Kafka serves as the foundation for streaming data applications.
Learn how major players in the market are using Kafka in a wide range of use cases such as microservices, IoT and edge computing, core banking and fraud detection, cyber data collection and dissemination, ESB replacement, data pipelining, ecommerce, mainframe offloading and more.
Also discussed in this talk are the differences between Apache Kafka and Confluent Platform.
This session is part 1 of 4 in our Fundamentals for Apache Kafka series.
Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ... | confluent
A powerful stream processing platform and an end-user-friendly spreadsheet interface: if this combination rings a bell, you should definitely attend our "Streamsheets and Apache Kafka" webinar. While development is interactive with a web user interface, Streamsheets applications can run as mission-critical applications. They directly consume and produce event streams in Apache Kafka. One popular option is to run everything in the cloud, leveraging the fully managed Confluent Cloud service on AWS, GCP or Azure. Without any coding or scripting, end users leverage their existing spreadsheet skills to build customized streaming apps for analysis, dashboarding, condition monitoring or any kind of real-time pre- and post-processing of Kafka or ksqlDB streams and tables.
Hear Kai Waehner of Confluent and Kristian Raue of Cedalo on these topics:
• Where Apache Kafka and Streamsheets fit in the data ecosystem (Industrial IoT, Smart Energy, Clinical Applications, Finance Applications)
• Customer Story: How the Freiburg University Hospital uses Kafka and Streamsheets for dashboarding the utilization of clinical assets
• 15-Minute Live Demonstration: Building a financial fraud detection dashboard based on Confluent Cloud, ksqlDB and Cedalo Cloud Streamsheets using just spreadsheet formulas.
Speaker:
Kai Waehner, Technology Evangelist, Confluent
Kristian Raue, Founder & Chief Technologist, cedalo
Cloud native Kafka | Sascha Holtbruegge and Margaretha Erber, HiveMQ | HostedbyConfluent
Joins in Kafka Streams and ksqlDB are a killer feature for data processing, and basic join semantics are well understood. However, in a streaming world records are associated with timestamps that impact the semantics of joins: welcome to the fabulous world of temporal join semantics. For joins, timestamps are as important as the actual data, and it is important to understand how they impact the join result.
In this talk we deep-dive on the different types of joins, with a focus on their temporal aspects. Furthermore, we relate the individual join operators to the overall "time engine" of the Kafka Streams query runtime and explain its relationship to operator semantics. To allow developers to apply their knowledge of temporal join semantics, we provide best practices, tips and tricks to "bend" time, and configuration advice to get the desired join results. Finally, we give an overview of recent developments, and an outlook on future ones, that improve joins even further. A minimal join sketch follows below.
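As a minimal sketch of the windowed stream-stream join discussed here, the snippet below joins two streams whose records share a key and whose timestamps fall within ten minutes of each other; the topic names, value types, and window size are illustrative assumptions.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.StreamJoined;

import java.time.Duration;

public class TemporalJoinSketch {
    static void buildTopology(StreamsBuilder builder) {
        KStream<String, String> orders =
                builder.stream("orders", Consumed.with(Serdes.String(), Serdes.String()));
        KStream<String, String> payments =
                builder.stream("payments", Consumed.with(Serdes.String(), Serdes.String()));

        // The join fires only when an order and a payment share a key AND their
        // record timestamps lie within 10 minutes of each other; the window is
        // what makes the join's temporal semantics explicit.
        orders.join(payments,
                    (order, payment) -> order + " settled by " + payment,
                    JoinWindows.of(Duration.ofMinutes(10)),
                    StreamJoined.with(Serdes.String(), Serdes.String(), Serdes.String()))
              .to("settled-orders", Produced.with(Serdes.String(), Serdes.String()));
    }
}
```

Widening or narrowing the window changes which record pairs can ever meet, which is exactly the configuration knob the talk's "desired join results" advice is about.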
KSQL: Open Source Streaming for Apache Kafka | confluent
This document provides an overview of KSQL, an open source streaming SQL engine for Apache Kafka. It describes the core concepts of Kafka and KSQL, including how KSQL can be used for streaming ETL, anomaly detection, real-time monitoring, and data transformation. It also discusses how KSQL fits into a streaming platform and can be run in both client-server and standalone modes.
On Track with Apache Kafka®: Building a Streaming ETL Solution with Rail Data | confluent
Watch this talk here: https://www.confluent.io/online-talks/building-a-streaming-etl-solution-with-apache-kafka-rail-data-on-demand
As data engineers, we frequently need to build scalable systems working with data from a variety of sources and with various ingest rates, sizes, and formats. This talk takes an in-depth look at how Apache Kafka can be used to provide a common platform on which to build data infrastructure driving both real-time analytics as well as event-driven applications.
Using a public feed of railway data, it will show how to ingest data from message queues such as ActiveMQ with Kafka Connect, as well as from static sources such as S3 and REST endpoints. We'll then see how to use stream processing to transform the data into a form suitable for streaming to analytics tools such as Elasticsearch and Neo4j. The same data will be used to drive a real-time notifications service through Telegram.
If you're wondering how to build your next scalable data platform, how to reconcile the impedance mismatch between stream and batch, and how to wrangle streams of data—this talk is for you!
New Approaches for Fraud Detection on Apache Kafka and KSQL | confluent
This document discusses new approaches for fraud detection using Apache Kafka and KSQL. It introduces KSQL, an open-source streaming SQL engine for Apache Kafka. KSQL can be used to perform streaming ETL, anomaly detection, and event monitoring using SQL-like queries on streaming data. The document demonstrates how to run KSQL locally or in a client-server configuration, and how Arcadia Data provides a visualization layer on top of KSQL to enable visual analytics on streaming data.
Kafka in Context, Cloud, & Community (Simon Elliston Ball, Cloudera) Kafka Su... | HostedbyConfluent
The document discusses Kafka in the context of cloud platforms and open source communities. It describes several Apache projects that can be used with Kafka, such as Apache NiFi for data collection, Apache Flink for stream processing, and Apache Ranger for security. It also outlines features of Cloudera's platform for managing Kafka deployments, including unified governance tools, monitoring, and services to simplify operations. Finally, it discusses how Kafka can be deployed across cloud, on-premises, and hybrid environments with auto-scaling and other management capabilities.
Building a fully Kafka-based product as a Data Scientist | Patrick Neff, BAADER | HostedbyConfluent
A data analytics project for a food processing factory revealed that business problems could be solved and processes improved by implementing streaming applications.
BAADER's Transport Manager consists of a few microservices based on Kafka Streams and several ksqlDB queries running on a managed ksqlDB in Confluent Cloud. It tracks poultry trailers to improve animal welfare. Moreover, additional data, such as weather information and ETAs, are associated to optimize for on-time delivery.
This session walks through the development process from the perspective of a Data Scientist with a limited software development skillset. A closer look is taken at the challenges tackled along the way, such as creating and testing topologies, acquiring knowledge about state stores, streams and KTables, and dealing with ksqlDB breaking changes. Takeaways are presented about great developer resources for Kafka Streams and handy ksqlDB functions, to help developers with similar skills work with Apache Kafka.
Now You See Me, Now You Compute: Building Event-Driven Architectures with Apa... | Michael Noll
Talk URL: https://conferences.oreilly.com/strata/strata-ny/public/schedule/detail/77360
Abstract: Would you cross the street with traffic information that’s a minute old? Certainly not. Modern businesses have the same needs nowadays, whether it’s due to competitive pressure or because their customers have much higher expectations of how they want to interact with a product or service. At the heart of this movement are events: in today’s digital age, events are everywhere. Every digital action—across online purchases to ride-sharing requests to bank deposits—creates a set of events around transaction amount, transaction time, user location, account balance, and much more. The technology that allows businesses to read, write, store, and compute and process these events in real-time are event-streaming platforms, and tens of thousands of companies like Netflix, Audi, PayPal, Airbnb, Uber, and Pinterest have picked Apache Kafka as the de facto choice to implement event-driven architectures and reshape their industries.
Michael Noll explores why and how you can use Apache Kafka and its growing ecosystem to build event-driven architectures that are elastic, scalable, robust, and fault tolerant, whether it’s on-premises, in the cloud, on bare metal machines, or in Kubernetes with Docker containers. Specifically, you’ll look at Kafka as the storage and publish and subscribe layer; Kafka’s Connect framework for integrating external data systems such as MySQL, Elastic, or S3 with Kafka; and Kafka’s Streams API and KSQL as the compute layer to implement event-driven applications and microservices in Java and Scala and streaming SQL, respectively, that process the events flowing through Kafka in real time. Michael provides an overview of the most relevant functionality, both current and upcoming, and shares best practices and typical use cases so you can tie it all together for your own needs.
Best Practices for Streaming IoT Data with MQTT and Apache Kafka® | confluent
This document discusses best practices for streaming IoT data with MQTT and Apache Kafka. It begins with an example use case of connecting vehicles at an automotive company. It then outlines an architecture showing how sensor data from vehicles can be ingested via MQTT into Kafka and processed using tools like Kafka Streams, TensorFlow, and Elasticsearch. The document also covers a live demo of streaming data from 100,000 simulated connected vehicles. It concludes with best practices for choosing the right tools, separating concerns, handling different data types, and starting projects at a small scale while planning for future growth; a minimal ingestion sketch follows below.
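For illustration, the sketch below hand-rolls the MQTT-to-Kafka ingestion step with the Eclipse Paho client and the plain Java producer; the talk's architecture would more likely use Confluent's MQTT connectivity, and the broker URLs, topic names, and subscription filter here are assumptions.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import org.eclipse.paho.client.mqttv3.IMqttDeliveryToken;
import org.eclipse.paho.client.mqttv3.MqttCallback;
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttMessage;

import java.util.Properties;

public class MqttToKafkaBridge {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");              // assumed Kafka broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        MqttClient mqtt = new MqttClient("tcp://localhost:1883", "bridge-1"); // assumed MQTT broker
        mqtt.setCallback(new MqttCallback() {
            @Override public void connectionLost(Throwable cause) { }
            @Override public void deliveryComplete(IMqttDeliveryToken token) { }
            @Override public void messageArrived(String topic, MqttMessage msg) {
                // The MQTT topic becomes the Kafka record key, so telemetry from
                // one vehicle stays ordered within a single partition.
                producer.send(new ProducerRecord<>("vehicle-telemetry",
                        topic, new String(msg.getPayload())));
            }
        });
        mqtt.connect();
        mqtt.subscribe("vehicles/+/telemetry"); // hypothetical topic filter
    }
}
```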
Apache Kafka® is the technology behind event streaming which is fast becoming the central nervous system of flexible, scalable, modern data architectures. Customers want to connect their databases, data warehouses, applications, microservices and more, to power the event streaming platform. To connect to Apache Kafka, you need a connector!
This online talk dives into the new Verified Integrations Program and the integration requirements, the Connect API, and the sources and sinks that use Kafka Connect. We cover the verification steps and provide code samples created by popular application and database companies. We will discuss the resources available to support you through the connector development process; a minimal source-task skeleton follows this entry.
This is Part 2 of 2 in Building Kafka Connectors - The Why and How
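As a hedged sketch of the source side of the Connect API mentioned above, the skeleton below shows the SourceTask lifecycle (start, poll, stop); the class name, config key, offset scheme, and fetch logic are hypothetical placeholders, not code from the program.

```java
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

import java.util.Collections;
import java.util.List;
import java.util.Map;

public class HttpPollSourceTask extends SourceTask {
    private String topic;

    @Override public String version() { return "0.1.0"; }

    @Override public void start(Map<String, String> config) {
        topic = config.get("kafka.topic"); // hypothetical connector config key
    }

    // Connect calls poll() in a loop; each SourceRecord carries a source
    // partition/offset pair so the framework can resume after a restart.
    @Override public List<SourceRecord> poll() throws InterruptedException {
        Thread.sleep(1000);
        String value = fetchFromExternalSystem(); // assumption: your own I/O here
        return Collections.singletonList(new SourceRecord(
                Collections.singletonMap("source", "demo"), // source partition
                Collections.singletonMap("position", 0L),   // source offset
                topic, Schema.STRING_SCHEMA, value));
    }

    @Override public void stop() { }

    private String fetchFromExternalSystem() { return "{\"hello\":\"connect\"}"; }
}
```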
How to mutate your immutable log | Andrey Falko, Stripe | HostedbyConfluent
Have you ever had your upstream producers write poisoned data that breaks your downstream consumers? Did Personally Identifiable Information (PII) land in a Kafka topic that wasn’t supposed to have it? Is your data pipeline under development and you simply want to iterate quickly? Immutability is one of the key and most desirable features of Kafka. However, when mistakes happen and you are paged at night, you sometimes wish there were an “easy button” to change the log.
This session first dives into some of the errors we have seen that caused outages of considerable duration. Recovery from the errors required late-night code changes on consumers or simply waiting things out.
The next part of the session proposes a topic versioning scheme that allows us to recover from the examples that we mention. It segues into what it would take to build a control plane to manage and lifecycle these versioned topics. We’ll cover the benefits and pitfalls of our proposed solution.
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services | confluent
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services, Perry Krol, Head of Systems Engineering, CEMEA, Confluent
https://www.meetup.com/Frankfurt-Apache-Kafka-Meetup-by-Confluent/events/269751169/
Speaker: Pere Urbón-Bayes, Technical Account Manager, Confluent
The need to integrate a swarm of systems has always been present in the history of IT; however, with the advent of microservices, big data and IoT, this has simply exploded.
Through the exploration of a few use cases, this presentation will introduce stream processing, a powerful and scalable way to transform and connect applications around your business.
In this talk we will explain how Apache Kafka® and the Confluent Platform can be used to connect the diverse collection of applications a real business faces: components such as KSQL, which lets non-developers process streaming events at scale, and Kafka Streams, for building scalable applications that process event data.
Apache Kafka and ksqlDB in Action: Let's Build a Streaming Data Pipeline! (Ro... | confluent
Have you ever thought that you needed to be a programmer to do stream processing and build streaming data pipelines? Think again! Apache Kafka is a distributed, scalable, and fault-tolerant streaming platform, providing low-latency pub-sub messaging coupled with native storage and stream processing capabilities. Integrating Kafka with RDBMS, NoSQL, and object stores is simple with Kafka Connect, which is part of Apache Kafka. ksqlDB is the source-available SQL streaming engine for Apache Kafka and makes it possible to build stream processing applications at scale, written using a familiar SQL interface.
In this talk, we’ll explain the architectural reasoning for Apache Kafka and the benefits of real-time integration, and we’ll build a streaming data pipeline using nothing but our bare hands, Kafka Connect, and ksqlDB.
Gasp as we filter events in real-time! Be amazed at how we can enrich streams of data with data from RDBMS! Be astonished at the power of streaming aggregates for anomaly detection!
Using Kafka to integrate DWH and Cloud Based big data systems | confluent
Mic Hussey, Senior Systems Engineer, Confluent
Using Kafka to integrate DWH and Cloud Based big data systems
https://www.meetup.com/Stockholm-Apache-Kafka-Meetup-by-Confluent/events/268636234/
Operational Analytics on Event Streams in Kafka | confluent
Speaker: Anirudh Ramanthan, Product Manager, Rockset
Tracking key events and analyzing these event streams are critical to many enterprises. We highlight how organizations are using Apache Kafka® as a fast, reliable event streaming platform alongside Rockset, a serverless search and analytics engine, to create stateful microservices to analyze their event streams.
In this talk, we will discuss a stateful microservices architecture, where events from multiple channels are collected and streamed into Kafka and continuously ingested into Rockset with no explicit schema or metadata specification required. Developers then use serverless compute frameworks, like AWS Lambda, in conjunction with serverless data management from Rockset to build microservices to derive insights on the data from Kafka. Organizations can leverage this pattern to support low-latency queries on event streams, providing immediate insight on their business.
(Krunal Vora, Tinder) Kafka Summit San Francisco 2018
At Tinder, we have been using Kafka for streaming and processing events, data science processes and many other integral jobs. Forming the core of the pipeline at Tinder, Kafka has been accepted as the pragmatic solution to match the ever-increasing scale of users, events and backend jobs. We at Tinder are investing time and effort to optimize the use of Kafka, solving the problems we face in the dating-app context. Kafka forms the backbone of the company's plans to sustain performance at its envisioned scale as it grows into unexplored markets. Come learn about the implementation of Kafka at Tinder and how Kafka has helped solve the use cases for dating apps. Engage in the success story behind the business case of Kafka at Tinder.
Streaming ETL with Apache Kafka and KSQL | Nick Dearden
Companies new and old are all recognizing the importance of a low-latency, scalable, fault-tolerant data backbone, in the form of the Apache Kafka streaming platform. With Kafka, developers can integrate multiple systems and data sources to enable low-latency analytics, event-driven architectures, and the population of downstream systems. What's more, these data pipelines can be built using configuration alone.
In this talk, we'll see how easy it is to capture a stream of data changes in real-time from a database such as MySQL into Kafka using the Kafka Connect framework and then use KSQL to filter, aggregate and join it to other data, and finally stream the results from Kafka out into multiple targets such as Elasticsearch and MySQL. All of this can be accomplished without a single line of Java code!
Introduction to KSQL: Streaming SQL for Apache Kafka® | confluent
Join Tom Green, Solution Engineer at Confluent, for this Lunch and Learn talk covering KSQL. Confluent KSQL is the streaming SQL engine that enables real-time data processing against Apache Kafka®. It provides an easy-to-use, yet powerful interactive SQL interface for stream processing on Kafka, without the need to write code in a programming language such as Java or Python. KSQL is scalable, elastic, fault-tolerant, and it supports a wide range of streaming operations, including data filtering, transformations, aggregations, joins, windowing, and sessionization.
By attending one of these sessions, you will learn:
-How to query streams using SQL, without writing code
-How KSQL provides automated scalability and out-of-the-box high availability for streaming queries
-How KSQL can be used to join streams of data from different sources
-The differences between Streams and Tables in Apache Kafka (see the sketch below)
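On the last bullet, the stream/table duality that KSQL exposes as CREATE STREAM versus CREATE TABLE can be illustrated with the Kafka Streams Java API; the topic names below are assumptions.

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

public class StreamVsTable {
    static void buildTopology(StreamsBuilder builder) {
        // A stream is the full history: every profile update is its own event.
        KStream<String, String> updates = builder.stream("profile-updates");

        // A table is the latest state per key: each new event for a key
        // overwrites the previous value, like an UPSERT.
        KTable<String, String> current = builder.table("profile-updates-compacted");
    }
}
```

The stream retains every update as an independent event, while the table keeps only the most recent value per key.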
KSQL is an open source streaming SQL engine for Apache Kafka. Come hear how KSQL makes it easy to get started with a wide range of stream processing applications such as real-time ETL, sessionization, monitoring and alerting, or fraud detection. We'll cover both how to get started with KSQL and some under-the-hood details of how it all works.
Introduction to Apache Kafka, Confluent and why they matter | Paolo Castagna
This is a short, introductory presentation on Apache Kafka (including the Kafka Connect APIs and Kafka Streams APIs, both part of Apache Kafka) and other open source components that are part of the Confluent platform (such as KSQL).
This was the first Kafka Meetup in South Africa.
An introduction to Kafka Streams and KSQL... and why they matter! | Paolo Castagna
This document provides an introduction to Kafka Streams and KSQL and why they are important tools for building real-time applications. It discusses how Kafka Streams is a Java API that allows developers to build scalable, fault-tolerant, and stateful applications using stream processing. KSQL is introduced as a streaming SQL engine that builds on top of Kafka Streams and provides a simple way to analyze, transform, and aggregate streaming data using SQL-like queries. Examples are given of different use cases for KSQL, such as data exploration, streaming ETL, anomaly detection, and real-time monitoring.
KSQL is an open-source streaming SQL engine for Apache Kafka. It allows users to easily interact with and analyze streaming data in Kafka using SQL-like queries. KSQL builds upon Kafka Streams to provide stream processing capabilities with exactly-once processing semantics. It aims to expand access to stream processing beyond coding by providing an interactive SQL interface for tasks like streaming ETL, anomaly detection, real-time monitoring, and simple topic transformations. KSQL can be run in standalone, client-server, or application deployment modes.
Unlocking the world of stream processing with KSQL, the streaming SQL engine ... | Michael Noll
Slides of my Strata London 2018 talk:
https://conferences.oreilly.com/strata/strata-eu/public/schedule/detail/65325
Abstract:
Modern businesses have data at their core, and this data is changing continuously. Stream processing is what allows you to harness this torrent of information in real time, and thousands of companies use Apache Kafka as the core platform for streaming data to transform and reshape their industries. However, the world of stream processing still has a very high barrier to entry. Today’s most popular stream processing technologies require the user to write code in programming languages such as Java or Scala. This hard requirement on coding skills is preventing many companies from unlocking the benefits of stream processing to their full effect.
However, imagine that instead of having to write a lot of code in a programming language like Java or Scala for your favorite stream processing technology, all you’d need to get started with stream processing is a simple SQL statement, such as: SELECT * FROM payments-kafka-stream WHERE fraudProbability > 0.8.
Michael Noll offers an overview of KSQL, the open source streaming SQL engine for Apache Kafka, which makes it easy to get started with a wide range of real-time use cases, such as monitoring application behavior and infrastructure, detecting anomalies and fraudulent activities in data feeds, and real-time ETL. With KSQL, there’s no need to write any code in a programming language. KSQL brings together the worlds of streams and databases by allowing you to work with your data in a stream and in a table format. Built on top of Kafka’s Streams API, KSQL supports many powerful operations, including filtering, transformations, aggregations, joins, windowing, sessionization, and much more. It is open source (Apache 2.0 licensed), distributed, scalable, fault tolerant, and real time. You’ll learn how KSQL makes it easy to get started with a wide range of stream processing use cases and how to get up and running as you explore how it all works under the hood.
Join us as we build a complete streaming application with KSQL. There will be plenty of hands-on action, plus a description of our thought process and design choices along the way. Look out for advice on best practices and handy tips and tricks as we go. This is part 2 out of 3 in the Empowering Streams through KSQL series.
The code from the talk is available here: https://gist.github.com/rmoff/7efa882dfd808dbab4eb7b8e6f9eda16.
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL | confluent
This document introduces KSQL, a streaming SQL engine for Apache Kafka, and summarizes its capabilities:
KSQL allows users to easily query and transform data in Kafka streams using SQL-like queries. It provides simplicity, flexibility, and scalability compared to using the Kafka Streams APIs directly. KSQL can be run in standalone, client-server, or application modes and is well suited for tasks like streaming ETL, anomaly detection, monitoring, and IoT data processing.
Concepts and Patterns for Streaming Services with Kafka | QAware GmbH
Cloud Native Night March 2020, Mainz: Talk by Perry Krol (@perkrol, Confluent)
Abstract: Proven approaches such as service-oriented and event-driven architectures are joined by newer techniques such as microservices, reactive architectures, DevOps, and stream processing. Many of these patterns are successful by themselves, but they provide a more holistic and compelling approach when applied together. In this session Confluent will provide insights into how service-based architectures and stream processing tools such as Apache Kafka® can help you build business-critical systems. You will learn why streaming beats request-response based architectures in complex, contemporary use cases, and why replayable logs such as Kafka provide a backbone for both service communication and shared datasets.
Based on these principles, we will explore how event collaboration and event sourcing increase safety and recoverability with functional, event-driven approaches, apply patterns including Event Sourcing and CQRS, and build multi-team systems with microservices and SOA using patterns such as "inside-out databases" and "event streams as a source of truth". A minimal replay sketch follows below.
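A minimal sketch of the replayable-log idea referenced above: a consumer reads an event topic from the beginning and folds per-key deltas into current state, which is the recoverability that event sourcing relies on. The topic name, group id, and long-encoded deltas are assumptions.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class AccountStateRebuilder {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker
        props.put("group.id", "rebuilder");                  // hypothetical group
        props.put("auto.offset.reset", "earliest");          // replay from the start of the log
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        Map<String, Long> balances = new HashMap<>();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("account-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> rec : records) {
                    // Each event is a delta; folding deltas over the replayable
                    // log reconstructs the current state after any failure.
                    balances.merge(rec.key(), Long.parseLong(rec.value()), Long::sum);
                }
            }
        }
    }
}
```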
Big Data LDN 2018: STREAMING DATA MICROSERVICES WITH AKKA STREAMS, KAFKA STRE... | Matt Stubbs
The document discusses streaming architectures and microservices using Kafka, Akka Streams, and Kafka Streams. It provides an overview of different streaming engines like Spark, Flink, and Beam, and discusses how Akka Streams and Kafka Streams are suited for building streaming microservices. It then presents an example of building a machine learning scoring pipeline using Kafka Streams and how the same application could be built using Akka Streams integrated with Alpakka Kafka connectivity.
Speaker: Matt Howlett, Software Engineer, Confluent
This presentation provides a technical overview of Apache Kafka® and covers some of its popular use cases.
Webinar: Unlock the Power of Streaming Data with Kinetica and Confluent | Kinetica
The volume, complexity and unpredictability of streaming data are greater than ever before. Innovative organizations require instant insight from streaming data in order to make real-time business decisions. A new technology stack is emerging as traditional databases and data lakes are challenged to analyze streaming data and historical data together in real time.
Confluent Platform, a more complete distribution of Apache Kafka®, works with Kinetica’s GPU-accelerated engine to transform data on the wire, instantly ingesting data and analyzing it at the same time. With the Kinetica Connector, end users can ingest streaming data from sensors, mobile apps, IoT devices and social media via Kafka into Kinetica’s database to combine it with data at rest. Together, the technologies deliver event-driven and real-time data to power speed-of-thought analytics, improve customer experience, deliver targeted marketing offers and increase operational efficiencies.
ksqlDB: Building Consciousness on Real Time Events | confluent
This document discusses ksqlDB, a streaming SQL engine for Apache Kafka. It allows users to write streaming applications using familiar SQL queries against Kafka topic data. Some key points made include:
- ksqlDB allows users to create, select, and join streaming data in Kafka topics using SQL queries without the need for Java or other code
- It provides a simpler way to build streaming applications compared to Kafka Streams by using SQL
- Examples show how ksqlDB can be used for real-time monitoring, anomaly detection, streaming ETL, and data transformations.
KSQL: The Streaming SQL Engine for Apache Kafka | Chris Mueller
Abstract: This introduction to KSQL will show how streaming applications can easily be built without requiring any programming in languages such as Java, Scala, or Python. KSQL opens up the world of real-time event processing applications to users equipped with an understanding of any SQL dialect. In this talk we’ll explore what KSQL is, examine several use cases, take a quick look at some important concepts, and walk through a demo of KSQL in action. This talk, with a demo and time for Q&A, should run approximately 45-60 minutes.
Speaker: Mark Fei - Senior Technical Trainer, Confluent, Inc.
Location: Vancouver Kafka Meetup - May 21st 2019
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis | Helena Edelson
Slides from my talk with Evan Chan at Strata San Jose: NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis. Streaming analytics architecture in big data for fast streaming, ad hoc and batch, with Kafka, Spark Streaming, Akka, Mesos, Cassandra and FiloDB. Simplifying to a unified architecture.
KSQL is a stream processing SQL engine that enables stream processing on top of Apache Kafka. KSQL is based on Kafka Streams and provides capabilities for consuming messages from Kafka, analysing those messages in near real time with a SQL-like language, and producing results back to a Kafka topic. That way, not a single line of Java code has to be written, and you can reuse your SQL know-how. This significantly lowers the bar for getting started with stream processing.
KSQL offers powerful stream processing capabilities, such as joins, aggregations, time windows and support for event time. In this talk I will present how KSQL integrates with the Kafka ecosystem and demonstrate how easy it is to implement a solution using KSQL for the most part. This will be done in a live demo on a fictitious IoT sample.
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo... | Guido Schmutz
Spark Streaming and Kafka Streams are two popular stream processing platforms. Spark Streaming uses micro-batching and allows for code reuse between batch and streaming jobs. Kafka Streams is embedded directly into Apache Kafka and leverages Kafka as its internal messaging layer. Both platforms support stateful stream processing operations like windowing, aggregations, and joins through distributed state stores. A demo application is shown that detects dangerous driving by joining truck position data with driver data using different streaming techniques.
Migration, backup and restore made easy using Kannika | confluent
In this presentation, you’ll discover how easily you can migrate data from any Kafka-compatible event hub to Confluent using Kannika’s intuitive self-service interface. We’ll guide you through the process, showing how the same approach can be applied to define specific event data sets and effortlessly spin up secure environments for demos, testing, or other purposes.
You’ll also learn how to back up event data in just a few steps by transferring compressed data to the cloud storage location of your choice. In addition, we’ll demonstrate how to restore filtered datasets of topics, ensuring quick recovery and maintaining business continuity when needed.
Five Things You Need to Know About Data Streaming in 2025 | confluent
Topics that Peter covers:
Tapping into the Potential of Data Products: Data drives some of today's most important business use cases. Data products enable instant access to reliable and trustworthy data by eliminating the data mess created by point-to-point connections.
The Need to Tap into 'Quick Thinking': The C-level has to reorient itself so it doesn't become the bottleneck to adaptability in a data-driven world. Nine in 10 (90%) business leaders say they must now react in real-time. Learn what you can do to provide executive access to real-time data to enable 'Quick Thinking.'
Rise Above Data Hurdles: Discover how to enforce governance at data production. Reestablishing trustworthiness later is almost always harder, so investing in data tools that solve business problems rather than add to them is essential.
Paradigm to Shift Left: Shift Left is a new paradigm for processing and governing data at any scale, complexity, and latency. Shift Left moves the processing and governance of data closer to the source, enabling organisations to build their data once, build it right and reuse it anywhere within moments of its creation.
The Need for a Strategic View: The positive correlation between data streaming maturity and significant business returns underscores the importance of a long-term, strategic view of data streaming investments. It also highlights the value of advancing beyond initial, siloed use cases to a more integrated approach that leverages data streaming across the enterprise.
From Stream to Screen: Real-Time Data Streaming to Web Frontends with Conflue...confluent
In this presentation, we’ll demonstrate how Confluent and Lightstreamer come together to tackle the last-mile challenge of extending your Kafka architecture to web and mobile platforms.
Learn how to effortlessly build real-time web applications within minutes, subscribing to Kafka topics directly from your web pages, with unmatched low latency and high scalability.
Explore how Confluent's leading Kafka platform and Lightstreamer's intelligent proxy work seamlessly to bridge Kafka with the internet frontier, delivering data in real-time.
Confluent for the FSI sector: Accelerating Innovation with Data Streaming...confluent
Confluent for the FSI sector:
- What data streaming is and why your company needs it
- Who we are and how Confluent can help you:
- Making Kafka broadly accessible
- Stream, Connect, Process and Governance
- A deep dive into the technology solutions implemented within the Data Streaming Platform
- From theory to practice: real-world applications of FSI architectures
Data in Motion Tour 2024 Riyadh, Saudi Arabiaconfluent
Data streaming platforms are becoming increasingly important in today’s fast-paced world. From retail giants who need to monitor inventory levels to ensure stores never run out of items, to new-age, innovative banks who are building out-of-the-box banking solutions for traditional retail banks, data streaming platforms are at the centre, powering these workflows.
Data streaming platforms connect all your applications, systems, and teams with a shared view of the most up-to-date, real-time data. From GenAI and stream governance to stream processing, it's these cutting-edge developments that will be featured during the day.
Build a Real-Time Decision Support Application for Financial Market Traders w...confluent
Quix's intuitive visual programming interface and extensive library of pre-built components make it easy to build these applications without complex coding. Experience how this dynamic duo accelerates the development and deployment of your trading strategies, empowering you to make more informed decisions with real-time data!
Compose Gen-AI Apps With Real-Time Data - In Minutes, Not Weeksconfluent
As businesses strive to stay at the forefront of innovation, the ability to quickly develop scalable Generative AI (GenAI) applications is essential. Join us for an exclusive webinar featuring MIA Platform, MongoDB, and Confluent, where you'll learn how to compose GenAI apps with real-time data integration in a fraction of the time.
Discover how these three powerful platforms work together to ensure applications remain responsive, relevant, and adaptive to user preferences and contextual changes. Our experts will guide you through leveraging MIA Platform's microservices architecture and low-code development, MongoDB's flexibility, and Confluent's stream processing capabilities. Experience live demonstrations and practical insights that will transform your approach to AI-driven app development, enabling you to accelerate your development process from weeks to mere minutes. Don't miss this opportunity to keep your business at the cutting edge.
Building Real-Time Gen AI Applications with SingleStore and Confluentconfluent
Discover how SingleStore and Confluent together create a powerful foundation for real-time generative AI applications. Learn how SingleStore's high-performance data platform and Confluent integrate to process and analyze streaming data in real-time. We'll explore real-world, innovative solutions and show you how SingleStore + Confluent can unlock new gen AI opportunities with your clients.
Unlocking value with event-driven architecture by Confluentconfluent
Harness the power of real-time data streaming and event-driven microservices for the future of Sky with Confluent and Kafka®.
In this tech talk we will explore the potential of Confluent and Apache Kafka® to revolutionise enterprise architecture and unlock new business opportunities. We will dig into the key concepts, guiding you through building scalable, resilient, real-time applications for data streaming.
You will discover how to build event-driven microservices with Confluent, taking advantage of a modern, reactive architecture.
The talk will also present real-world use cases of Confluent and Kafka®, demonstrating how these technologies can optimise business processes and generate concrete value.
Data Streaming for next-generation real-time AIconfluent
To build reliable, secure, and governed AI applications, you need an equally solid real-time data foundation, all the more so when managing large flows of data in constant motion.
How do you get there? Rely on a true data streaming platform that lets you scale and quickly build real-time AI applications on top of trustworthy data.
Find out more! Don't miss our upcoming webinar, during which we will:
• Explore the GenAI paradigm and how this new technology is reshaping the business landscape, responding to the need to deliver real-time context and solutions that meet your company's requirements.
• Examine the uncertainties of the evolving AI landscape and the crucial importance of data streaming and data processing.
• Look in detail at the continuously evolving architecture and the key role of Kafka and Confluent in AI applications.
• Analyse the advantages of a data streaming platform like Confluent in bridging legacy systems and GenAI, facilitating the development and use of predictive and generative AI.
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...confluent
As businesses strive to remain at the cutting edge of innovation, the demand for scalable and up-to-date conversational AI solutions has become paramount. Generative AI (GenAI) chatbots that seamlessly integrate into our daily lives and adapt to the ever-evolving nuances of human interaction are crucial. Real-time data plays a pivotal role in ensuring the responsiveness and relevance of these chatbots, empowering them to stay abreast of the latest trends, user preferences, and contextual information.
Break data silos with real-time connectivity using Confluent Cloud Connectorsconfluent
Connectors integrate Apache Kafka® with external data systems, enabling you to move away from a brittle spaghetti architecture to one that is more streamlined, secure, and future-proof. However, if your team still spends multiple dev cycles building and managing connectors using just open source Kafka Connect, it’s time to consider a faster and cost-effective alternative.
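For a sense of what the managed route looks like, ksqlDB can declare connectors directly in SQL. A hedged sketch, assuming the open source JDBC source connector is installed; the connection details are placeholders:
-- Stream rows from a relational table into Kafka topics prefixed with 'jdbc_'.
CREATE SOURCE CONNECTOR jdbc_source WITH (
  'connector.class'          = 'io.confluent.connect.jdbc.JdbcSourceConnector',
  'connection.url'           = 'jdbc:postgresql://localhost:5432/shop',
  'mode'                     = 'incrementing',
  'incrementing.column.name' = 'id',
  'topic.prefix'             = 'jdbc_'
);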
Building API data products on top of your real-time data infrastructureconfluent
This talk and live demonstration will examine how Confluent and Gravitee.io integrate to unlock value from streaming data through API products.
You will learn how data owners and API providers can document and secure data products on top of Confluent brokers, including schema validation, topic routing, and message filtering.
You will also see how data and API consumers can discover and subscribe to products in a developer portal, as well as how they can integrate with Confluent topics through protocols like REST, Websockets, Server-sent Events and Webhooks.
Whether you want to monetize your real-time data, enable new integrations with partners, or provide self-service access to topics through various protocols, this webinar is for you!
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
In our exclusive webinar, you'll learn why event-driven architecture is the key to unlocking cost efficiency, operational effectiveness, and profitability. Gain insights on how this approach differs from API-driven methods and why it's essential for your organization's success.
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025BookNet Canada
Book industry standards are evolving rapidly. In the first part of this session, we’ll share an overview of key developments from 2024 and the early months of 2025. Then, BookNet’s resident standards expert, Tom Richardson, and CEO, Lauren Stewart, have a forward-looking conversation about what’s next.
Link to recording, transcript, and accompanying resource: https://bnctechforum.ca/sessions/standardsgoals-for-2025-standards-certification-roundup/
Presented by BookNet Canada on May 6, 2025 with support from the Department of Canadian Heritage.
AI and Data Privacy in 2025: Global TrendsInData Labs
In this infographic, we explore how businesses can implement effective governance frameworks to address AI data privacy. Understanding it is crucial for developing effective strategies that ensure compliance, safeguard customer trust, and leverage AI responsibly. Equip yourself with insights that can drive informed decision-making and position your organization for success in the future of data privacy.
This infographic contains:
-AI and data privacy: Key findings
-Statistics on AI data privacy in today's world
-Tips on how to overcome data privacy challenges
-Benefits of AI data security investments.
Keep up-to-date on how AI is reshaping privacy standards and what this entails for both individuals and organizations.
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxAnoop Ashok
In today's fast-paced retail environment, efficiency is key. Every minute counts, and every penny matters. One tool that can significantly boost your store's efficiency is a well-executed planogram. These visual merchandising blueprints not only enhance store layouts but also save time and money in the process.
Semantic Cultivators : The Critical Future Role to Enable AIartmondano
By 2026, AI agents will consume 10x more enterprise data than humans, but with none of the contextual understanding that prevents catastrophic misinterpretations.
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc
Most consumers believe they’re making informed decisions about their personal data—adjusting privacy settings, blocking trackers, and opting out where they can. However, our new research reveals that while awareness is high, taking meaningful action is still lacking. On the corporate side, many organizations report strong policies for managing third-party data and consumer consent yet fall short when it comes to consistency, accountability and transparency.
This session will explore the research findings from TrustArc’s Privacy Pulse Survey, examining consumer attitudes toward personal data collection and practical suggestions for corporate practices around purchasing third-party data.
Attendees will learn:
- Consumer awareness around data brokers and what consumers are doing to limit data collection
- How businesses assess third-party vendors and their consent management operations
- Where business preparedness needs improvement
- What these trends mean for the future of privacy governance and public trust
This discussion is essential for privacy, risk, and compliance professionals who want to ground their strategies in current data and prepare for what’s next in the privacy landscape.
Artificial Intelligence is providing benefits in many areas of work within the heritage sector, from image analysis, to ideas generation, and new research tools. However, it is more critical than ever for people, with analogue intelligence, to ensure the integrity and ethical use of AI. Including real people can improve the use of AI by identifying potential biases, cross-checking results, refining workflows, and providing contextual relevance to AI-driven results.
News about the impact of AI often paints a rosy picture. In practice, there are many potential pitfalls. This presentation discusses these issues and looks at the role of analogue intelligence and analogue interfaces in providing the best results to our audiences. How do we deal with factually incorrect results? How do we get content generated that better reflects the diversity of our communities? What roles are there for physical, in-person experiences in the digital world?
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Impelsys Inc.
Impelsys provided a robust testing solution, leveraging a risk-based and requirement-mapped approach to validate ICU Connect and CritiXpert. A well-defined test suite was developed to assess data communication, clinical data collection, transformation, and visualization across integrated devices.
HCL Nomad Web – Best Practices and Managing Multiuser Environmentspanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-nomad-web-best-practices-und-verwaltung-von-multiuser-umgebungen/
HCL Nomad Web is hailed as the next generation of the HCL Notes client and offers numerous advantages, such as eliminating the need for packaging, distribution, and installation. Nomad Web client updates are installed "automatically" in the background, which significantly reduces administrative effort compared to traditional HCL Notes clients. However, troubleshooting in Nomad Web poses unique challenges compared to the Notes client.
Join Christoph and Marc as they demonstrate how the troubleshooting process in HCL Nomad Web can be simplified to ensure a smooth and efficient user experience.
In this webinar we will explore effective strategies for diagnosing and resolving common problems in HCL Nomad Web, including:
- Accessing the console
- Finding and interpreting log files
- Accessing the data folder in the browser's cache (using OPFS)
- Understanding the differences between single-user and multi-user scenarios
- Using the Client Clocking feature
Technology Trends in 2025: AI and Big Data AnalyticsInData Labs
At InData Labs, we have been keeping an ear to the ground, looking out for AI-enabled digital transformation trends coming our way in 2025. Our report will provide a look into the technology landscape of the future, including:
-Artificial Intelligence Market Overview
-Strategies for AI Adoption in 2025
-Anticipated drivers of AI adoption and transformative technologies
-Benefits of AI and Big data for your business
-Tips on how to prepare your business for innovation
-AI and data privacy: Strategies for securing data privacy in AI models, etc.
Download your free copy now and implement the key findings to improve your business.
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxJustin Reock
Building 10x Organizations with Modern Productivity Metrics
10x developers may be a myth, but 10x organizations are very real, as proven by the influential study performed in the 1980s, ‘The Coding War Games.’
Right now, here in early 2025, we seem to be experiencing YAPP (Yet Another Productivity Philosophy), and that philosophy is converging on developer experience. It seems that with every new method we invent for the delivery of products, whether physical or virtual, we reinvent productivity philosophies to go alongside them.
But which of these approaches actually work? DORA? SPACE? DevEx? What should we invest in and create urgency behind today, so that we don’t find ourselves having the same discussion again in a decade?
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul
Artificial intelligence is changing how businesses operate. Companies are using AI agents to automate tasks, reduce time spent on repetitive work, and focus more on high-value activities. Noah Loul, an AI strategist and entrepreneur, has helped dozens of companies streamline their operations using smart automation. He believes AI agents aren't just tools—they're workers that take on repeatable tasks so your human team can focus on what matters. If you want to reduce time waste and increase output, AI agents are the next move.
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxshyamraj55
We’re bringing the TDX energy to our community with 2 power-packed sessions:
🛠️ Workshop: MuleSoft for Agentforce
Explore the new version of our hands-on workshop featuring the latest Topic Center and API Catalog updates.
📄 Talk: Power Up Document Processing
Dive into smart automation with MuleSoft IDP, NLP, and Einstein AI for intelligent document workflows.
Mobile App Development Company in Saudi ArabiaSteve Jonas
EmizenTech is a globally recognized software development company, proudly serving businesses since 2013. With 11+ years of industry experience and a team of 200+ skilled professionals, we have successfully delivered 1200+ projects across various sectors. As a leading Mobile App Development Company in Saudi Arabia, we offer end-to-end solutions for iOS, Android, and cross-platform applications. Our apps are known for their user-friendly interfaces, scalability, high performance, and strong security features. We tailor each mobile application to meet the unique needs of different industries, ensuring a seamless user experience. EmizenTech is committed to turning your vision into a powerful digital product that drives growth, innovation, and long-term success in the competitive mobile landscape of Saudi Arabia.
Quantum Computing Quick Research Guide by Arthur MorganArthur Morgan
This is a Quick Research Guide (QRG).
QRGs include the following:
- A brief, high-level overview of the QRG topic.
- A milestone timeline for the QRG topic.
- Links to various free online resource materials to provide a deeper dive into the QRG topic.
- Conclusion and a recommendation for at least two books available in the SJPL system on the QRG topic.
QRGs planned for the series:
- Artificial Intelligence QRG
- Quantum Computing QRG
- Big Data Analytics QRG
- Spacecraft Guidance, Navigation & Control QRG (coming 2026)
- UK Home Computing & The Birth of ARM QRG (coming 2027)
Any questions or comments?
- Please contact Arthur Morgan at [email protected].
100% human made.
tecnologias de las primeras civilizaciones.pdffjgm517
ksqlDB Workshop
1. ksqlDB Workshop
June 24th, 2020
Patrick Druley
Senior Solution Engineer @Confluent
Twitter @PatrickLovesAK
3. Today’s Agenda
10:00 - 10:45 AM: Streams Processing/KSQL Overview (Patrick)
10:45 AM - 12:15 PM: Interactive Streams Lab (Patrick with help from JT, Chris, Dan, Brian)
12:15 - 12:30 PM: Q&A and Next Steps (Open Discussion)
Workshop Tips & Help:
1. Disconnect from VPN.
2. Check the ‘Chat’ window during the session for instructions [icon located at the bottom of the Zoom toolbar].
3. For any technical issues, click the ‘Raise Hand’ button or post in the ‘Chat’ window [a Confluent team member will assist you].
4. Apache Kafka is a Distributed Event Streaming Platform
● Process streams of events in real time, as they occur
● Publish and subscribe to streams of events, similar to a message queue or enterprise messaging system
● Store streams of events in a fault-tolerant way
5. Anatomy of a Kafka Topic
[Slide diagram: a topic with three partitions (0, 1, 2), each an append-only sequence of numbered records running from old to new. Producers write to the end of each partition; Consumer A (offset=4) and Consumer B (offset=7) read independently, each tracking its own offset.]
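To poke at this structure from the ksqlDB CLI, the PRINT statement dumps raw records from a topic; the topic name below is just an assumption for illustration:
-- Show the first three records of a topic (topic name hypothetical).
PRINT 'orders' FROM BEGINNING LIMIT 3;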
6. Kafka Connect and Kafka Streams
[Slide diagram: data flows from a source system into Kafka through Kafka Connect, is processed by your app using Kafka Streams, and flows out to a sink system through Kafka Connect.]
7. Stream Processing by Analogy
$ cat < in.txt | grep “ksql” | tr a-z A-Z > out.txt
[Slide diagram: against a Kafka cluster, the Connect API plays the roles of cat and the output redirection, while stream processing plays the roles of grep and tr.]
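The KSQL counterpart of that shell pipeline is a single persistent query. A sketch, assuming a source stream in_stream with a single VARCHAR column named line (not the slide's own code):
-- grep "ksql" plus tr a-z A-Z, expressed as a continuous query.
CREATE STREAM out_stream AS
  SELECT UCASE(line) AS line
  FROM in_stream
  WHERE line LIKE '%ksql%';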
9. Stream processing with Kafka
Example: Using Kafka’s Streams API for writing elastic, scalable, fault-tolerant Java and Scala applications
[Slide diagram: an application embedding its main logic alongside the Kafka Streams processing.]
10. Stream processing with Kafka
CREATE STREAM fraudulent_payments AS
  SELECT * FROM payments
  WHERE fraudProbability > 0.8;
Same example, now with KSQL. Not a single line of Java or Scala code needed.
11. ksqlDB Example Use Cases
● Data exploration
● Data enrichment
● Streaming ETL
● Filter, cleanse, mask
● Real-time monitoring
● Anomaly detection
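Most of these use cases are shown on the following slides; the "filter, cleanse, mask" case is not, so here is a hedged sketch of it (stream and column names hypothetical):
-- Drop incomplete records and mask a sensitive field before sharing the stream.
CREATE STREAM users_cleaned AS
  SELECT user_id, MASK(ssn) AS ssn_masked
  FROM users_raw
  WHERE user_id IS NOT NULL;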
12. ksqlDB for Real-Time Monitoring
● Log data monitoring
● Tracking and alerting
● Syslog data
● Sensor / IoT data
● Application metrics
CREATE STREAM syslog_invalid_users AS
  SELECT host, message
  FROM syslog
  WHERE message LIKE '%Invalid user%';
http://cnfl.io/syslogs-filtering / http://cnfl.io/syslog-alerting
13. ksqlDB for Anomaly Detection
● Identify patterns or anomalies in real-time data, surfaced in milliseconds
CREATE TABLE possible_fraud AS
  SELECT card_number, COUNT(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 SECONDS)
  GROUP BY card_number
  HAVING COUNT(*) > 3;
14. ksqlDB for Streaming ETL
● Joining, filtering, and aggregating streams of event data
CREATE STREAM vip_actions AS
  SELECT user_id, page, action
  FROM clickstream c
  LEFT JOIN users u ON c.user_id = u.user_id
  WHERE u.level = 'Platinum';
15. ksqlDB for Data Transformation
● Easily make derivations of existing topics
CREATE STREAM pageviews_avro
  WITH (PARTITIONS=6, VALUE_FORMAT='AVRO') AS
  SELECT * FROM pageviews_json
  PARTITION BY user_id;
17. Where is KSQL not such a great fit?
BI reports (Tableau etc.):
• No secondary indexes
• No JDBC (most BI tools are not good with continuous results!)
Post-fact ad-hoc queries:
• Limited span of time usually retained in Kafka
• No indexes for random lookups
19. Streams & Tables
● STREAM and TABLE as first-class citizens
● Interpretations of topic content
● STREAM: data in motion
● TABLE: collected state of a stream
  • One record per key (per window)
  • Current values (compacted topic)
  • Changelog
● STREAM – TABLE Joins
20. [Slide diagram: stream-table duality. The changelog stream (“alice”, 1), (“charlie”, 1), (“alice”, 2), (“bob”, 1) is shown between two table views: applying the records one by one yields the successive table states {alice: 1}, then {alice: 1, charlie: 1}, then {alice: 2, charlie: 1}, then {alice: 2, charlie: 1, bob: 1}, and replaying the stream reconstructs the same table.]
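In KSQL, the table side of this picture is simply an aggregation over the stream. A minimal sketch, assuming a hypothetical user_events stream keyed by username:
-- The table view: the current count per key, continuously updated by the stream.
CREATE TABLE user_counts AS
  SELECT username, COUNT(*) AS cnt
  FROM user_events
  GROUP BY username;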
22. Confluent Operator Architecture and Deployment
[Slide diagram: a Kubernetes cluster managed by the Confluent Operator. KSQL, REST Proxy, Schema Registry (SR), Replicator, Control Center (C3), and ZooKeeper (ZK) pods are spread across K8s nodes; configuration comes from ConfigMaps, external access from load balancers, and storage from persistent volumes such as AWS EBS, GlusterFS, and GCE Persistent Disk.]
23. user-id = first 3 letters of first name + first 3 letters of last name
Example: Patrick Druley = patdru
If either name has fewer than 3 letters, just use the letters it has.
Go to http://<user-id>.us-southeast.gcp.confluent-demo.io
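Incidentally, the user-id rule itself can be expressed with ksqlDB's string functions. A playful sketch, assuming a hypothetical attendees stream with first_name and last_name columns:
-- First 3 letters of each name, lower-cased; SUBSTRING simply returns
-- fewer characters when a name is shorter than 3 letters.
SELECT LCASE(CONCAT(SUBSTRING(first_name, 1, 3),
                    SUBSTRING(last_name, 1, 3))) AS user_id
FROM attendees EMIT CHANGES;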