KSQL
The Streaming SQL Engine for Apache Kafka
Kai Waehner
Technology Evangelist
kontakt@kai-waehner.de
LinkedIn
@KaiWaehner
www.confluent.io
www.kai-waehner.de
2Confidential
A Brief History of Apache Kafka and Confluent
Timeline, 2012 to 2018:
• 0.7 Cluster mirroring
• 0.8 Intra-cluster replication
• 0.9 Data integration (Connect API)
• 0.10 Data processing (Streams API)
• 0.11 Exactly-once semantics
• 1.0 Enterprise Ready
• CP 4.1 KSQL GA
KSQL – A Streaming SQL Engine for Apache Kafka
Independent Dev / Test / Prod of independent Apps
No Matter Where it Runs
6KSQL- Streaming SQL for Apache Kafka
Why KSQL?
Chart axes: Population, Coding Sophistication
Regions: Realm of Stream Processing; New, Expanded Realm
Audiences: BI Analysts, Data Engineers, Core Developers, Core Developers who don’t like Java
Tools: Kafka Streams, KSQL
Shoulders of Streaming Giants
subscribe(), poll(), send(),
flush(), beginTransaction(), …
KStream, KTable, filter(), map(), flatMap(), join(),
aggregate(), transform(), …
CREATE STREAM, CREATE TABLE,
SELECT, JOIN, GROUP BY, SUM, …
KSQL UDFs
KSQL for Data Exploration
An easy way to inspect your data in Kafka
SHOW TOPICS;
SELECT page, user_id, status, bytes
FROM clickstream
WHERE user_agent LIKE 'Mozilla/5.0%';
PRINT 'my-topic' FROM BEGINNING;
KSQL for Data Transformation
Quickly make derivations of existing data in Kafka
CREATE STREAM clicks_by_user_id
WITH (PARTITIONS=6,
TIMESTAMP='view_time',
VALUE_FORMAT='JSON') AS
SELECT * FROM clickstream
PARTITION BY user_id;
1. Change number of partitions
2. Convert data to JSON
3. Repartition the data
KSQL for Real-Time, Streaming ETL
Filter, cleanse, process data while it is in motion
CREATE STREAM clicks_from_vip_users AS
SELECT user_id, u.country, page, action
FROM clickstream c
LEFT JOIN users u ON c.user_id = u.user_id
WHERE u.level = 'Platinum';
1. Pick only VIP users
Example: CDC from DB via Kafka to Elastic
KSQL for Real-time Data Enrichment
Join data from a variety of sources to see the full picture
CREATE STREAM enriched_payments AS
SELECT payment_id, c.country, total
FROM payments_stream p
LEFT JOIN customers_table c
ON p.user_id = c.user_id;
1. Stream-Table Join
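To make the stream-table join in the enriched_payments query more concrete, here is a minimal Python sketch of the idea. It is an in-memory simulation, not how KSQL is implemented: the table keeps only the latest row per user_id, and every payment event is enriched with whatever the table holds at that moment. The function name enrich_payments and the sample events are hypothetical.

```python
def enrich_payments(events):
    """Left-join payment events against the latest customer row per user_id.

    `events` is a time-ordered list of ("customer", row) or ("payment", row) tuples.
    """
    customers = {}   # the "table": latest row per key
    enriched = []    # the output "stream"
    for kind, row in events:
        if kind == "customer":
            # A table update replaces the previous row for that key.
            customers[row["user_id"]] = row
        else:
            # LEFT JOIN: keep the payment even if no customer row is known yet.
            customer = customers.get(row["user_id"])
            enriched.append({
                "payment_id": row["payment_id"],
                "country": customer["country"] if customer else None,
                "total": row["total"],
            })
    return enriched


# Hypothetical sample events, in arrival order.
events = [
    ("customer", {"user_id": "u1", "country": "DE"}),
    ("payment", {"payment_id": 1, "user_id": "u1", "total": 20.0}),
    ("payment", {"payment_id": 2, "user_id": "u2", "total": 5.0}),  # no customer row yet
    ("customer", {"user_id": "u1", "country": "FR"}),               # table update
    ("payment", {"payment_id": 3, "user_id": "u1", "total": 7.5}),
]
result = enrich_payments(events)
for row in result:
    print(row)
```

Payment 2 is emitted with an empty country because of the left join, and payment 3 sees the updated customer row. That is the difference between joining against a table and joining against a static lookup.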
Example: Retail
KSQL for Real-Time Monitoring
Derive insights from events (IoT, sensors, etc.) and turn them into actions
CREATE TABLE failing_vehicles AS
SELECT vehicle, COUNT(*)
FROM vehicle_monitoring_stream
WINDOW TUMBLING (SIZE 1 MINUTE)
WHERE event_type = 'ERROR'
GROUP BY vehicle
HAVING COUNT(*) >= 5;
1. Now we know to alert, and whom
Example: IoT, Automotive, Connected Cars
KSQL for Anomaly Detection
Aggregate data to identify patterns and anomalies in real-time
CREATE TABLE possible_fraud AS
SELECT card_number, COUNT(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 30 SECONDS)
GROUP BY card_number
HAVING COUNT(*) > 3;
1. Aggregate data
2. … per 30-sec windows
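The possible_fraud query can be approximated offline with a few lines of Python. This is only a sketch under simplifying assumptions: it processes a finite list instead of an unbounded stream, it ignores late-arriving data, and the function name and sample data are hypothetical. It shows how each event falls into exactly one 30-second tumbling window and how the HAVING clause filters the per-window counts.

```python
from collections import Counter

WINDOW_MS = 30_000  # 30-second tumbling windows, as in the possible_fraud query


def possible_fraud(attempts, threshold=3):
    """Return {(card_number, window_start_ms): count} for counts above threshold.

    `attempts` is an iterable of (card_number, timestamp_ms) pairs.
    """
    counts = Counter()
    for card_number, ts_ms in attempts:
        # A tumbling window is identified by its start, aligned to the window size.
        window_start = ts_ms - (ts_ms % WINDOW_MS)
        counts[(card_number, window_start)] += 1
    # Equivalent of HAVING COUNT(*) > 3
    return {key: n for key, n in counts.items() if n > threshold}


# Hypothetical sample data: card "4111" is tried four times within one window.
attempts = [
    ("4111", 1_000), ("4111", 5_000), ("4111", 9_000), ("4111", 29_000),
    ("4111", 31_000),  # falls into the next window
    ("5500", 2_000), ("5500", 40_000),
]
flagged = possible_fraud(attempts)
print(flagged)
```

Only the card with more than three attempts inside a single window is flagged. The fifth attempt at 31 seconds belongs to the next window and does not raise the count.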
KSQL Concepts
● No need for source code
• Zero, none at all, not even one line.
• No SerDes, no generics, no lambdas, ...
● All the Kafka Streams “magic” out-of-the-box
• Exactly Once Semantics
• Windowing
• Event-time aggregation
• Late-arriving data
• Distributed, fault-tolerant, scalable, ...
KSQL is equally viable for S / M / L / XL / XXL use cases
Ok. Ok. Ok.
… and KSQL is ready for production, including 24/7 support!
Fault-Tolerance, powered by Kafka
STREAM and TABLE as first-class citizens
WINDOWing
● Not ANSI SQL! → Continuous Queries
• TUMBLING
• SELECT appname, ip, COUNT(appname) AS problem_count FROM
logstream WINDOW TUMBLING (size 1 minute) WHERE loglevel='ERROR'
GROUP BY appname, ip;
• HOPPING
• SELECT itemid, SUM(arraycol[0]) FROM orders WINDOW HOPPING
(size 20 second, advance by 5 second) GROUP BY itemid;
• SESSION
• SELECT itemid, SUM(sales_price) FROM orders WINDOW SESSION
(20 second) GROUP BY itemid;
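The three window types differ mainly in how a record's timestamp maps to windows. The Python sketch below illustrates that mapping with plain integers (seconds). It is a simplified model, not KSQL's implementation: the function names are hypothetical, windows are assumed to be aligned to time zero, and the treatment of the exact session boundary (a gap equal to the limit still joins the session) is an assumption of this sketch.

```python
def tumbling_window(ts, size):
    """Start of the single tumbling window [start, start + size) that contains ts."""
    return ts - ts % size


def hopping_windows(ts, size, advance):
    """Starts of every hopping window [start, start + size) that contains ts."""
    start = ts - ts % advance  # latest window start that is not after ts
    starts = []
    while start > ts - size and start >= 0:
        starts.append(start)
        start -= advance
    return sorted(starts)


def session_windows(timestamps, gap):
    """Group timestamps into sessions separated by more than `gap` of inactivity."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)  # still within the inactivity gap
        else:
            sessions.append([ts])    # gap exceeded: open a new session
    return [(s[0], s[-1]) for s in sessions]


print(tumbling_window(75, size=60))
print(hopping_windows(42, size=20, advance=5))
print(session_windows([0, 10, 25, 60, 70], gap=20))
```

A record belongs to exactly one tumbling window, to size / advance hopping windows (four in the 20-second / 5-second example), and to one session whose bounds depend on the data itself.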
KSQL - Components
KSQL has 3 main components:
1. The Engine which actually runs the Kafka Streams topologies
2. The REST server interface enables an Engine to receive instructions from the CLI
or any other client
3. The CLI, designed to be familiar to users of MySQL, Postgres etc.
(Note that you also need a Kafka Cluster… KSQL is deployed independently)
KSQL can be used interactively + programmatically
1. UI
2. CLI (ksql>)
3. REST (POST /query)
4. Headless
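As an illustration of the programmatic route, the following Python sketch builds a POST /query request using only the standard library. Several details are assumptions rather than something stated on the slides: the server address localhost:8088, the JSON body with the `ksql` and `streamsProperties` fields, and the `application/vnd.ksql.v1+json` content type should all be checked against the REST API documentation of the KSQL version in use. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Assumption: a KSQL server listening on localhost:8088; adjust for your setup.
KSQL_SERVER = "http://localhost:8088"


def build_query_request(statement, server=KSQL_SERVER):
    """Build (but do not send) a POST /query request for a KSQL statement."""
    # Assumed payload shape: the statement plus optional per-query properties.
    body = json.dumps({"ksql": statement, "streamsProperties": {}}).encode("utf-8")
    return urllib.request.Request(
        url=server + "/query",
        data=body,
        method="POST",
        headers={"Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"},
    )


request = build_query_request("SELECT page, user_id FROM clickstream;")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Sending the request would be a matter of passing it to urllib.request.urlopen against a running server. Because queries are continuous, a client should expect to read the response incrementally rather than wait for it to finish.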
Architecture (Client – Server Mode)
• KSQL CLI or any REST Client
• 3 × KSQL Server (one JVM each)
• Kafka Cluster
Architecture (Headless Mode)
• 3 × KSQL Server (one JVM each)
• Kafka Cluster
Dedicating resources
Join Engines to the same ‘service pool’ by means of the ksql.service.id property
User Defined Functions (UDF, UDAF)
Write UDF code in Java, mark with annotations @UdfDescription, @Udf.
Make UDF available to KSQL (next slides), then use it like any other KSQL function in your queries:
SELECT address, STRINGLENGTH(address->street) FROM orders;
The UDF name in KSQL queries is whatever you define in the `name` field in the annotation (here: “stringLength”).
KSQL Quick Start – Getting Started in Minutes!
https://docs.confluent.io/current/quickstart/index.html
Local runtime
or
Docker container
Demo - Clickstream Analysis
• https://docs.confluent.io/current/ksql/docs/tutorials/clickstream-docker.html#ksql-clickstream-docker
• Leverages Apache Kafka, Kafka Connect, KSQL, Elasticsearch and Grafana
• 5min screencast: https://www.youtube.com/watch?v=A45uRzJiv7I
• Setup in 5 minutes (with or without Docker)
SELECT STREAM
CEIL(timestamp TO HOUR) AS timeWindow, productId,
COUNT(*) AS hourlyOrders, SUM(units) AS units
FROM Orders GROUP BY CEIL(timestamp TO HOUR),
productId;
timeWindow | productId | hourlyOrders | units
------------+-----------+--------------+-------
08:00:00 | 10 | 2 | 5
08:00:00 | 20 | 1 | 8
09:00:00 | 10 | 4 | 22
09:00:00 | 40 | 1 | 45
... | ... | ... | ...
KSQL Recipes
https://www.confluent.io/stream-processing-cookbook
Resources and Next Steps
Get Involved
• Try the Quickstart on GitHub
• Check out the code
• Play with the examples
KSQL is GA… You can already use it for production deployments!
https://github.com/confluentinc/ksql
http://confluent.io/ksql
https://slackpass.io/confluentcommunity #ksql
KSQL is the Streaming SQL Engine for Apache Kafka
Questions?

Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - Codemotion Berlin 2018
