Kafka Streams
From pub/sub to a complete
stream processing platform
Kafka Meetup Utrecht
Thursday, 8th June 2017
< paolo @ confluent.io >
https://www.confluent.io/blog/stream-data-platform-1/
Industry shift from Big Data
to Fast Data and Stream Processing
Apache Kafka APIs and UNIX analogy
$ cat < in.txt | grep "apache" | tr a-z A-Z > out.txt
• Connect APIs
• Producer/Consumer APIs
• Streams APIs
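To make the analogy concrete, here is a sketch of the middle of the pipeline in plain Java. It uses java.util.stream, not the Kafka Streams API. The `grep` step becomes a filter and the `tr` step becomes a map over each record. The class name `UnixAnalogy` is hypothetical. In a Kafka Streams application, the DSL operators `filter` and `mapValues` would play the same roles on a topic rather than on an in-memory list.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/** Hypothetical illustration of grep "apache" | tr a-z A-Z over an in-memory list of lines. */
class UnixAnalogy {

    static List<String> grepAndUppercase(List<String> lines) {
        return lines.stream()
                // grep "apache": keep only the lines containing the substring
                .filter(line -> line.contains("apache"))
                // tr a-z A-Z: upper-case each remaining line
                // (toUpperCase also converts non-ASCII letters, unlike tr a-z A-Z)
                .map(line -> line.toUpperCase(Locale.ROOT))
                .collect(Collectors.toList());
    }
}
```

For example, `grepAndUppercase(Arrays.asList("apache kafka", "hello"))` returns `[APACHE KAFKA]`.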
Streams APIs, part of Apache Kafka
http://kafka.apache.org/documentation/streams
http://docs.confluent.io/current/streams
Build applications, not clusters
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-streams</artifactId>
    <version>0.10.2.1</version>
</dependency>
Spot the difference(s)
How do I run in production?
Like any other Java application...
Uncool Cool
http://docs.confluent.io/current/streams/introduction.html
Elastic and scalable
http://docs.confluent.io/current/streams/developer-guide.html#elastic-scaling-of-your-application
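The following is a simplified sketch of why scaling out works. Kafka Streams creates roughly one task per input-topic partition and spreads those tasks over the running application instances. Starting or stopping an instance therefore only moves tasks around. The round-robin policy and the `TaskAssignment` class below are assumptions for illustration only; the real assignor also takes stickiness and local state into account. The sketch also shows the upper bound on parallelism: instances beyond the partition count receive no work.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of spreading partition-based tasks over application instances. */
class TaskAssignment {

    /** For each instance index, the partitions (tasks) it processes under simple round-robin. */
    static List<List<Integer>> assign(int numPartitions, int numInstances) {
        if (numPartitions < 0 || numInstances <= 0) {
            throw new IllegalArgumentException("need partitions >= 0 and instances > 0");
        }
        List<List<Integer>> assignment = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) {
            assignment.add(new ArrayList<>());
        }
        // Partition p goes to instance p mod N, so every partition has exactly one owner.
        for (int p = 0; p < numPartitions; p++) {
            assignment.get(p % numInstances).add(p);
        }
        return assignment;
    }
}
```

With 6 partitions, 2 instances get `[[0, 2, 4], [1, 3, 5]]`. Starting a third instance redistributes the work to `[[0, 3], [1, 4], [2, 5]]`.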
Typical high level architecture
• Real-time Data Ingestion
• Stream Processing
• Storage
• Data Publishing / Visualization
How many clusters do you count?
• NoSQL (Cassandra, HBase, Couchbase, MongoDB, …) or Elasticsearch, Solr, …
• Storm, Flink, Spark Streaming, Ignite, Akka Streams, Apex, …
• HDFS, NFS, Ceph, GlusterFS, Lustre, ...
• Apache Kafka
Simplicity is the ultimate sophistication
Apache Kafka and Kafka Streams APIs: Stream Processing Platform
• Publish & Subscribe to streams of data like a messaging system
• Store streams of data safely in a distributed replicated cluster
• Process streams of data efficiently and in real-time
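The publish/subscribe and storage roles can be illustrated with a toy, single-partition, in-memory log. A producer appends a record and gets back its offset. Each subscriber reads from its own position, so reading never removes data. `ToyLog` is a hypothetical class. It deliberately leaves out everything that makes Kafka durable and distributed: partitioning, replication and retention.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical toy model of a single Kafka partition: an append-only, offset-addressed log. */
class ToyLog {
    private final List<String> records = new ArrayList<>();

    /** Publish: append a record and return its offset, i.e. its position in the log. */
    long append(String record) {
        records.add(record);
        return records.size() - 1;
    }

    /** Subscribe: read up to maxRecords records starting at fromOffset, without removing them. */
    List<String> read(long fromOffset, int maxRecords) {
        if (fromOffset < 0 || fromOffset > records.size() || maxRecords < 0) {
            throw new IllegalArgumentException("bad offset or batch size");
        }
        int from = (int) fromOffset;
        int to = from + Math.min(records.size() - from, maxRecords);
        // Copy the slice: every subscriber tracks its own offset and sees the same data.
        return Collections.unmodifiableList(new ArrayList<>(records.subList(from, to)));
    }
}
```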
Node.js
Duality of Streams and Tables
http://docs.confluent.io/current/streams/concepts.html#duality-of-streams-and-tables
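The duality can be shown in a few lines of plain Java. Replaying a changelog stream of (key, value) updates and keeping only the latest value per key yields a table. Emitting every row of a table as an update yields a stream that rebuilds the same table. `Duality` and its methods are hypothetical names. In Kafka Streams the corresponding abstractions are `KStream` and `KTable`, which also handle partitioning and fault tolerance; like the sketch, a `KTable` treats a null value as a delete.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the stream/table duality. */
class Duality {

    /** Stream -> table: replay the changelog, keeping only the latest value per key. */
    static Map<String, Long> toTable(List<Map.Entry<String, Long>> changelog) {
        Map<String, Long> table = new LinkedHashMap<>();
        for (Map.Entry<String, Long> change : changelog) {
            if (change.getValue() == null) {
                table.remove(change.getKey());                 // null value: delete (tombstone)
            } else {
                table.put(change.getKey(), change.getValue()); // otherwise: upsert
            }
        }
        return table;
    }

    /** Table -> stream: emit each row as an update record. */
    static List<Map.Entry<String, Long>> toChangelog(Map<String, Long> table) {
        List<Map.Entry<String, Long>> stream = new ArrayList<>();
        for (Map.Entry<String, Long> row : table.entrySet()) {
            stream.add(new SimpleEntry<>(row.getKey(), row.getValue()));
        }
        return stream;
    }
}
```

Replaying `alice=1, bob=1, alice=2, bob=null` gives the table `{alice=2}`.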
Interactive Queries
http://docs.confluent.io/current/streams/developer-guide.html#streams-developer-guide-interactive-queries
Kafka Streams DSL
http://docs.confluent.io/current/streams/developer-guide.html#kafka-streams-dsl
WordCount (and Java 8)
WordCountLambdaExample.java
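import java.util.Arrays;
import java.util.Properties;
import java.util.regex.Pattern;

import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;
import org.apache.kafka.streams.kstream.KTable;

// The application id also names the consumer group, the local state directory and internal topics
final Properties streamsConfiguration = new Properties();
streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-lambda-example");
streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
...
final Serde<String> stringSerde = Serdes.String();
final Serde<Long> longSerde = Serdes.Long();
final KStreamBuilder builder = new KStreamBuilder();
// Each record value is one line of text
final KStream<String, String> textLines = builder.stream(stringSerde, stringSerde, "TextLinesTopic");
// "\\W+" splits on runs of non-word characters (note the escaped backslash)
final Pattern pattern = Pattern.compile("\\W+", Pattern.UNICODE_CHARACTER_CLASS);
final KTable<String, Long> wordCounts = textLines
    // one line -> many lower-cased words
    .flatMapValues(value -> Arrays.asList(pattern.split(value.toLowerCase())))
    // re-key each record by the word (repartitioning uses the default serdes from the configuration)
    .groupBy((key, word) -> word)
    // count per word, backed by a local state store named "Counts"
    .count("Counts");
// Write the stream of updated counts to the output topic
wordCounts.to(stringSerde, longSerde, "WordsWithCountsTopic");
final KafkaStreams streams = new KafkaStreams(builder, streamsConfiguration);
// Wipe local state left over from previous runs (handy for demos; think twice in production)
streams.cleanUp();
streams.start();
// Close the application cleanly on shutdown (e.g. SIGTERM)
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));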
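When testing a topology like the one above, it helps to have an independent expectation. The following plain-Java (java.util.stream) sketch applies the same logic to an in-memory list: lower-case each line, split on non-word characters, group by word and count. `WordCountReference` is a hypothetical helper and is not part of the Confluent example. Unlike the topology, it also drops the empty token that `Pattern.split` returns for an empty line or a line starting with a non-word character.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical reference implementation of the WordCount logic, for use in tests. */
class WordCountReference {
    // Same pattern as the topology: split on runs of non-word characters (Unicode-aware).
    private static final Pattern PATTERN = Pattern.compile("\\W+", Pattern.UNICODE_CHARACTER_CLASS);

    static Map<String, Long> count(List<String> lines) {
        return lines.stream()
                // like flatMapValues: one line -> many lower-cased words
                .flatMap(line -> Arrays.stream(PATTERN.split(line.toLowerCase())))
                // drop empty tokens (empty line or leading delimiter)
                .filter(word -> !word.isEmpty())
                // like groupBy(word) + count(), sorted by word for readability
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }
}
```

The topology emits a stream of count updates to `WordsWithCountsTopic`. The latest update per word is what should match this map.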
Easy to Develop, Easy to Test
WordCountLambdaIntegrationTest.java
EmbeddedSingleNodeKafkaCluster CLUSTER =
new EmbeddedSingleNodeKafkaCluster();
…
CLUSTER.createTopic(inputTopic);
…
Properties producerConfig = new Properties();
producerConfig.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
CLUSTER.bootstrapServers());
Apache Kafka and Streams APIs benefits
• Build applications, not clusters
• Native integration with Apache Kafka
• Elastic, fast, distributed, fault-tolerant, secure
• Scalable: S, M, L, XL, XXL
• Run everywhere: from containers to cloud
• Streams (with KStream) and tables (with KTable)
• Local state replicated to Kafka for fault-tolerance
• Windowing and event time semantics out of the box
• Supports late-arriving and out-of-order events
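Windowing by event time means that a record is placed in a window according to the timestamp it carries, not the time it arrives. Below is a minimal plain-Java sketch of epoch-aligned tumbling windows. It assumes hypothetical `Event` records holding a key and an event-time timestamp in milliseconds. A late or out-of-order event simply updates the count of the window it belongs to. Kafka Streams provides this through `TimeWindows` and keeps past windows for a configurable time so that late records can still update them. The sketch has no such limit.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of counting per key in tumbling event-time windows. */
class TumblingWindowCount {

    /** An event with a key and the time it happened (epoch millis), not the time it arrived. */
    static final class Event {
        final String key;
        final long timestampMs;

        Event(String key, long timestampMs) {
            this.key = key;
            this.timestampMs = timestampMs;
        }
    }

    /** Start of the epoch-aligned tumbling window of the given size that contains the timestamp. */
    static long windowStart(long timestampMs, long windowSizeMs) {
        return timestampMs - Math.floorMod(timestampMs, windowSizeMs);
    }

    /** Counts events per "key@windowStart", whatever order the events arrive in. */
    static Map<String, Long> count(List<Event> eventsInArrivalOrder, long windowSizeMs) {
        Map<String, Long> counts = new TreeMap<>();
        for (Event e : eventsInArrivalOrder) {
            String windowedKey = e.key + "@" + windowStart(e.timestampMs, windowSizeMs);
            counts.merge(windowedKey, 1L, Long::sum);
        }
        return counts;
    }
}
```

With one-minute windows, an event stamped at 59 s that arrives after an event stamped at 61 s is still counted in the window starting at 0.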
References
• http://kafka.apache.org/
• http://kafka.apache.org/documentation/streams/
• http://docs.confluent.io/
• http://docs.confluent.io/current/streams/
• http://docs.confluent.io/current/streams/javadocs/
• http://blog.confluent.io/
• http://github.com/confluentinc/examples/
• http://github.com/apache/kafka/tree/trunk/streams/
The easiest way to get you started
https://www.confluent.io/download/
SIMPLICITY
WE
YOUR FEEDBACK!
Discount code: kafcom17
Use the Apache Kafka community discount code to get $50 off
www.kafka-summit.org
Kafka Summit San Francisco: August 28