Apache Kafka + Machine Learning
Analytic Models Applied to Real Time Stream Processing
Kai Waehner
Technology Evangelist
kontakt@kai-waehner.de
LinkedIn
@KaiWaehner
www.kai-waehner.de
Agenda
1) Machine Learning in the Real World
2) Building an Analytic Model
3) Applying an Analytic Model in Real Time
4) Online Training of Models
Agenda: 1) Machine Learning in the Real World
Machine Learning
... allows computers to find hidden insights without being
explicitly programmed where to look.
Real World Examples of Machine Learning
• Spam Detection
• Search Results + Product Recommendation
• Picture Detection (Friends, Locations, Products)
• Your Company
• The Next Disruption: Google Beats Go Champion
Leverage Machine Learning to Analyze and Act on Critical Business Moments
[Figure: windows of opportunity on a time axis ranging from seconds to minutes to hours, with the use cases price optimization, predictive maintenance, fraud detection, cross selling, transportation rerouting, customer service and inventory management]
How to realize these use cases?
Big Data Analytics
• Volume (terabytes, petabytes)
• Variety (social networks, blog posts, logs, sensors, etc.)
• Velocity ("real time")
• Value
Big Data Analytics for Actionable Insights
From Insight to Action (continuously closed loop)
[Figure: reference architecture of a Streaming Platform combined with Big Data Analytics, in four parts plus Schema Registry / Governance.
1) Data Producer: database, IoT device, streaming producer, …
2) Analytics Platform: data integration into DWH and data lake, batch model building
3) Streaming Platform: connectors on the producer and consumer side, real-time stream processing that applies the model
4) Data Consumer: REST interface, IoT device, mobile app, streaming consumer, BI tool, messaging, web application]
Agenda: 2) Building an Analytic Model
Hidden Technical Debt in Machine Learning Systems
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
Writing source code is not the time-consuming task! The ML code is only a small part of a real-world ML system; configuration, data collection and verification, feature extraction, serving infrastructure and monitoring make up most of it.
Analytical Pipeline
1. Data Access
2. Data Preparation
3. Exploratory Data Analysis
4. Model Building
5. Model Execution
6. Model Validation
7. Deployment
Data Access
Find insights to create added business value by correlating various data sources!
Data Preparation
http://www.slideshare.net/odsc/feature-engineering
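Feature engineering is where much of the preparation effort goes. The following is a minimal sketch in Python (the language, the column names and the data are assumptions for illustration): min-max scaling and one-hot encoding, with the parameters learned once from historical data so that exactly the same transformation can later be applied to a single streaming event.

```python
def fit_min_max(values):
    """Learn the scaling parameters (min, max) from historical training data."""
    return min(values), max(values)

def apply_min_max(value, lo, hi):
    """Scale one value to [0, 1] using parameters learned at training time."""
    if hi == lo:
        return 0.0  # constant column: nothing to scale
    return (value - lo) / (hi - lo)

def fit_one_hot(values):
    """Learn the list of known categories from training data."""
    return sorted(set(values))

def apply_one_hot(value, categories):
    """Encode one categorical value; unknown categories map to all zeros."""
    return [1 if value == c else 0 for c in categories]

# Hypothetical historical data: monthly charges and contract type per customer
lo, hi = fit_min_max([20.0, 70.0, 120.0])
categories = fit_one_hot(["monthly", "yearly", "monthly"])

# A single new event is transformed exactly like the training data
new_event = {"charges": 70.0, "contract": "yearly"}
features = [apply_min_max(new_event["charges"], lo, hi)] + apply_one_hot(new_event["contract"], categories)
print(features)  # [0.5, 0, 1]
```

Storing lo, hi and categories next to the model matters once the model runs in a stream: recomputing them from a single event would produce features different from the ones the model was trained on.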
Exploratory Data Analysis
© Copyright 2000-2017 TIBCO Software Inc.
• Scripting
• Visual Analytics
• Machine Learning
Model Building
A model is a simplification of the truth
that helps you with decision making.
Model Execution (Coding)
Apply Model to New Data
Model Execution (Tooling)
Apply Model to New Data
Model Validation
https://genome.tugraz.at/proclassify/help/pages/XV.html
Cross-Validation Procedure
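The cross-validation procedure can be sketched in a few lines: split the data into k folds, train on k-1 of them, score on the held-out fold, and average. This is an illustrative sketch, not the tool shown on the slide; the majority-class baseline and the data are hypothetical, and the folds are contiguous to keep the example deterministic (real pipelines usually shuffle first).

```python
from collections import Counter

def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over n samples."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

def cross_validate(xs, ys, k, fit, score):
    """Train on k-1 folds, score on the held-out fold, and average over all k folds."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(score(model, [xs[i] for i in test], [ys[i] for i in test]))
    return sum(scores) / len(scores)

# Hypothetical baseline model: always predict the most frequent training label
def fit_majority(train_x, train_y):
    return Counter(train_y).most_common(1)[0][0]

def accuracy(model, test_x, test_y):
    return sum(1 for y in test_y if y == model) / len(test_y)

xs = list(range(10))
ys = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
print(cross_validate(xs, ys, 5, fit_majority, accuracy))  # 0.8
```

Any model that can be expressed as a fit and a score function can be plugged in; the averaged score estimates how the model will behave on data it has not seen.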
Frameworks and Tooling?
Languages, Frameworks and Tools
Many more …
Portable Format for Analytics (PFA)
Live Demos with Open Source Technologies
Development of Analytic Models with R, TensorFlow, Apache Spark, H2O.ai, RapidMiner
Live Demo
Use Case: Customer Churn Prediction
Machine Learning Algorithm: Generalized Linear Model (GLM) using Logistic Regression
Technology: Open Source R
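To make the churn demo concrete, here is a minimal sketch of logistic regression (a GLM with a logit link) trained by batch gradient descent on the log-loss, then used to score a new customer. It is written in plain Python rather than R; in R the same model would typically be fitted with glm(..., family = binomial). The features, data and hyperparameters are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(xs, ys, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n_features = len(xs[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * gw / len(xs) for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / len(xs)
    return w, b

def churn_probability(model, x):
    """Model execution: apply the trained coefficients to new data."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical scaled features per customer: [support calls, contract length]
X = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.0], [0.1, 0.9], [0.2, 0.8], [0.0, 1.0]]
y = [1, 1, 1, 0, 0, 0]  # 1 = customer churned
model = train_logistic_regression(X, y)
print(churn_probability(model, [0.85, 0.1]), churn_probability(model, [0.1, 0.95]))
```

Note that the trained model is just a handful of numbers (weights and bias), which is what makes it easy to move from the training environment into a streaming application.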
Live Demo
Use Case: Airline Flight Delay Prediction
Machine Learning Algorithm: Gradient Boosted Machines (GBM) using Decision Trees
Technology: H2O.ai
Live Demo
Use Case: Predictive Maintenance (Anomaly Detection in Telco Networks)
Deep Learning Algorithm: Artificial Neural Networks (ANN) using Autoencoders
Technology: TensorFlow + Python API
Live Demo
Use Case: Classification (Prediction of Titanic Survivors)
Deep Learning Algorithm: Recurrent Neural Networks (RNN)
Technology: RapidMiner
Agenda: 3) Applying an Analytic Model in Real Time
Definition of Stream Processing
Data at Rest vs. Data in Motion
Key Concepts
Stream Processing Use Cases
• Real Time Applications
• Stateful Streaming Analytics
• Stateless “Real Time ETL”
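The difference between stateless and stateful processing can be shown with two small generator functions. This sketch uses plain Python iterables as a stand-in for Kafka topics; the sensor events and field names are hypothetical. The ETL step looks at each event in isolation, while the analytics step has to remember a running count and sum per sensor.

```python
from collections import defaultdict

def real_time_etl(events):
    """Stateless: every event is filtered and transformed on its own."""
    for e in events:
        if e.get("temp_f") is None:  # drop malformed events
            continue
        yield {"sensor": e["sensor"], "temp_c": round((e["temp_f"] - 32) * 5 / 9, 1)}

def running_average(events):
    """Stateful: keeps a per-sensor [count, sum] across events."""
    state = defaultdict(lambda: [0, 0.0])
    for e in events:
        count_sum = state[e["sensor"]]
        count_sum[0] += 1
        count_sum[1] += e["temp_c"]
        yield e["sensor"], count_sum[1] / count_sum[0]

# Hypothetical raw sensor events (Fahrenheit), one of them malformed
raw = [
    {"sensor": "a", "temp_f": 212},
    {"sensor": "a", "temp_f": None},
    {"sensor": "a", "temp_f": 32},
    {"sensor": "b", "temp_f": 50},
]
print(list(running_average(real_time_etl(raw))))
# [('a', 100.0), ('a', 50.0), ('b', 10.0)]
```

In Kafka Streams the per-sensor state would live in a fault-tolerant state store backed by a changelog topic, whereas the stateless step needs no store at all, which is why it scales and recovers more simply.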
Event Processing Windows
Various Options for Windowing (Fixed, Sliding, Session, …)
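The windowing options can be sketched as functions that assign integer event timestamps (for example, seconds) to windows. This is an illustration of the concepts, not a framework API. Naming differs between frameworks: the second function shows fixed-size windows that overlap and advance by a fixed step (Kafka Streams calls these hopping windows; elsewhere they are often called sliding), and the session function merges events separated by no more than an inactivity gap.

```python
def tumbling_window(ts, size):
    """Fixed, non-overlapping windows: each timestamp belongs to exactly one."""
    start = ts - ts % size
    return (start, start + size)

def hopping_windows(ts, size, advance):
    """Windows of length `size` starting every `advance`; a timestamp can fall in several."""
    windows = []
    start = ts - ts % advance  # latest window start at or before ts
    while start > ts - size and start >= 0:  # window [start, start + size) contains ts
        windows.append((start, start + size))
        start -= advance
    return sorted(windows)

def session_windows(timestamps, gap):
    """Sessions: events no more than `gap` apart are merged into one window."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][1] <= gap:
            sessions[-1][1] = ts  # extend the current session
        else:
            sessions.append([ts, ts])  # inactivity gap exceeded: start a new session
    return [tuple(s) for s in sessions]

print(tumbling_window(7, 5))                            # (5, 10)
print(hopping_windows(7, 10, 5))                        # [(0, 10), (5, 15)]
print(session_windows([1, 2, 3, 10, 11, 30], 5))        # [(1, 3), (10, 11), (30, 30)]
```

Fixed windows suit periodic reports, overlapping windows give smoother moving aggregates, and session windows follow user or device activity without a fixed length.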
How to apply analytic models to real time processing without redevelopment?
Application of Analytic Models to Real Time without Redevelopment
[Figure: analytic models built with H2O.ai, R, Python, Spark ML, MATLAB, SAS or exported as PMML are applied within stream processing]
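The idea of "no redevelopment" is that the training side exports a model artifact and the streaming side only loads and executes it. H2O.ai, for example, can export models as Java POJO or MOJO artifacts, and PMML and PFA are standardized exchange formats. The sketch below uses a made-up JSON document for a logistic regression instead; the schema, field names and coefficients are assumptions for illustration and are not PMML or PFA.

```python
import json
import math

def export_model(weights, bias, feature_names):
    """Training side: write the fitted parameters to a portable JSON document."""
    return json.dumps({"type": "logistic_regression", "features": feature_names,
                       "weights": weights, "bias": bias})

class PortableScorer:
    """Streaming side: loads the document and scores events; no training code needed."""

    def __init__(self, document):
        spec = json.loads(document)
        if spec["type"] != "logistic_regression":
            raise ValueError("unsupported model type: " + spec["type"])
        self.features = spec["features"]
        self.weights = spec["weights"]
        self.bias = spec["bias"]

    def score(self, event):
        z = self.bias + sum(w * event[f] for w, f in zip(self.weights, self.features))
        return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients produced by a data scientist's training job
doc = export_model([2.0, -1.0], 0.0, ["calls", "tenure"])
scorer = PortableScorer(doc)
print(scorer.score({"calls": 1.0, "tenure": 2.0}))  # 0.5
```

Because the artifact carries the feature names, the stream processor can pick the right fields from each event without knowing how the model was trained.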
Streaming Analytics - Processing Pipeline
[Figure: processing pipeline in four stages: Stream Ingest (APIs, adapters / channels, integration, messaging) → Stream Preprocessing (transformation, aggregation, enrichment, filtering, normalization) → Stream Analytics (contextual rules, windowing, patterns, analytics, machine learning, …) → Stream Outcomes (process management, real time analytics, applications & APIs, analytics / DW reporting, index / search)]
Applying an Analytic Model is just a piece of the puzzle!
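Enrichment is a typical preprocessing step before a model is applied: each event is joined with reference data, such as a customer profile, so that the model sees all the features it needs. The sketch below uses a plain dictionary as the lookup table; in Kafka Streams this corresponds conceptually to a KStream-KTable join, where the table is kept up to date from its own topic. The field names and the choice to drop events without a match are assumptions for illustration.

```python
def enrich(events, customers):
    """Stream-table join: look up each event's key in a reference table."""
    for e in events:
        profile = customers.get(e["customer_id"])
        if profile is None:
            continue  # simple choice: drop events without reference data
        yield {**e, **profile}

# Hypothetical reference table and event stream
customers = {"c1": {"segment": "gold", "country": "DE"}}
events = [{"customer_id": "c1", "amount": 10.0}, {"customer_id": "c9", "amount": 5.0}]
print(list(enrich(events, customers)))
# [{'customer_id': 'c1', 'amount': 10.0, 'segment': 'gold', 'country': 'DE'}]
```

Dropping unmatched events keeps the sketch short; a production pipeline might instead route them to a separate topic for later inspection.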
Frameworks and Tooling?
Frameworks and Products
[Figure: overview of stream processing frameworks and products, grouped by open source vs. closed source and by framework vs. product; one of the products shown is Microsoft Azure Stream Analytics]
When to use Kafka Streams for Stream Processing?
• No need for a Big Data cluster
• Deploy in your existing infrastructure
• Kafka manages scalability / fail-over
• Focus on development of business logic in your department
Kafka Streams
[Figure: an input stream (Kafka topic in a Kafka cluster) is consumed by a stream processing microservice built with Kafka Streams, which can map, filter, aggregate, apply an analytic model or run "any business logic", and writes the results to an output stream (Kafka topic in a Kafka cluster). The microservice can be deployed anywhere: Docker, Kubernetes, Mesos, Java App, …]
A complete streaming microservice, ready for production at large scale
[Figure: Kafka Streams WordCount source code, annotated in three parts: app configuration; define processing (here: WordCount); start processing]
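The Java code on the slide is not recoverable from the page, so here is a pure-Python illustration of the same processing logic: split each line into lowercase words, group by word, and keep a running count. In the Kafka Streams Java DSL these steps are typically expressed with flatMapValues, groupBy and count; the configuration part (for example application.id and bootstrap.servers) and starting the app are omitted because nothing here connects to Kafka.

```python
import re
from collections import Counter

def word_count(lines):
    """Stateful: emit (word, updated count) for every word, similar to a KTable changelog."""
    counts = Counter()
    for line in lines:
        for word in re.findall(r"\w+", line.lower()):
            counts[word] += 1
            yield word, counts[word]

print(list(word_count(["Kafka Streams", "kafka"])))
# [('kafka', 1), ('streams', 1), ('kafka', 2)]
```

Each output record is an updated count rather than a final result, because a stream never ends; downstream consumers always see the latest value per word.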
Confluent Platform: the Free, Open-Source Streaming Platform
[Figure: layered platform diagram with a legend distinguishing open source, external and commercial components.
Sources: database changes, log events, IoT data, web events, …
Core: Apache Kafka (Kafka Core, Kafka Connect, Kafka Streams)
Further components: clients, schema registry, REST proxy, supported connectors, Control Center, auto-data balancing, multi-data center replication, 24/7 support
Targets: CRM, data warehouse, database, Hadoop, data integration, …; monitoring, analytics, custom apps, transformations, real-time applications, …]
[Figure: the reference architecture mapped to concrete technologies (Streaming Platform and Big Data Analytics).
1) Data Producer: Oracle DB, CoAP IoT, Kafka Java Client, …
2) Analytics Platform: data integration with Flume and Hive into HP Vertica and Hadoop; batch model building with H2O.ai, Spark, TensorFlow
3) Streaming Platform: Kafka Connect, Confluent Schema Registry with Avro, real-time processing with Kafka Streams + H2O.ai on Mesos and Kafka Streams + TensorFlow on Kubernetes
4) Data Consumer: Confluent REST Proxy, MQTT IoT, iPhone App, Kafka Go Client, Grafana, Kafka Java EE Web App]
Live Demos with Open Source Technologies
Development of Analytic Models with Apache Kafka Messaging, Kafka Streams, Kafka Connect, Confluent Schema Registry
Live Demo
Use Case: Airline Flight Delay Prediction
Machine Learning Algorithm: Any! (in our example, H2O.ai GBM)
Streaming Platform: Apache Kafka Core, Kafka Connect, Kafka Streams, Confluent Schema Registry
H2O.ai Model + Kafka Streams
[Figure: annotated source code with the labels Filter and Map]
1) Create H2O ML model
2) Configure Kafka Streams Application
3) Apply H2O ML model to Streaming Data
4) Start Kafka Streams App
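Step 3 can be sketched as a filter, map and score chain over the incoming flight events. In the demo these are Kafka Streams DSL operations in Java (KStream filter and map-style steps) calling the exported H2O.ai model; here plain Python generators stand in for the topics, and delay_model is a hypothetical stand-in for the trained GBM, not the real model. The field names and flight numbers are assumptions.

```python
def delay_model(features):
    """Hypothetical stand-in for the trained H2O.ai GBM; returns a delay probability."""
    return 0.8 if features["dep_hour"] >= 17 else 0.2

def flight_delay_stream(events, model, threshold=0.5):
    """Filter -> map -> apply model, one output record per valid input event."""
    for e in events:
        # Filter: skip events that lack the fields the model needs
        if e.get("dep_hour") is None or e.get("distance") is None:
            continue
        # Map: turn the raw event into the model's feature representation
        features = {"dep_hour": e["dep_hour"], "distance": e["distance"]}
        # Apply the model and build the record for the output topic
        p = model(features)
        yield {"flight": e["flight"], "delay_probability": p, "delayed": p >= threshold}

events = [
    {"flight": "LH100", "dep_hour": 18, "distance": 500},
    {"flight": "LH200", "dep_hour": None, "distance": 300},
    {"flight": "LH300", "dep_hour": 9, "distance": 1200},
]
for record in flight_delay_stream(events, delay_model):
    print(record)
```

Because the model is passed in as a parameter, swapping the stand-in for the real exported model does not change the topology itself.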
End-to-End Stream Monitoring and Alerting
Confluent Control Center: data stream monitoring and alerting, multi-cluster monitoring and management, Kafka Connect configuration
• Message delivery?
• Delays?
• Where did it get stuck?
• Lost messages?
• Broker issues?
• Performance?
http://docs.confluent.io/3.2.0/control-center/docs/monitoring.html
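A common way to answer "Delays?" is consumer lag: per partition, the log-end offset minus the offset the consumer group has committed (Kafka commits the offset of the next record to read, so the difference is the number of records not yet processed). The sketch below computes lag from two plain dictionaries; in practice the numbers would come from the Kafka consumer client or a monitoring tool such as Control Center. The offsets, the alert threshold and the treatment of partitions without a commit are assumptions.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Per-partition lag = log-end offset minus the group's committed offset."""
    lag = {}
    for partition, end in end_offsets.items():
        # Assumption: a partition without a commit is read from offset 0
        committed = committed_offsets.get(partition, 0)
        lag[partition] = max(end - committed, 0)
    return lag

def alert(lag, threshold):
    """Return the partitions whose lag exceeds the alerting threshold."""
    return sorted(p for p, value in lag.items() if value > threshold)

# Hypothetical snapshot for a three-partition topic
end_offsets = {0: 1000, 1: 980, 2: 1010}
committed = {0: 995, 1: 700}
lag = consumer_lag(end_offsets, committed)
print(lag)               # {0: 5, 1: 280, 2: 1010}
print(alert(lag, 100))   # [1, 2]
```

A lag that keeps growing over several snapshots points to a consumer that is stuck or too slow, while a stable small lag is normal.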
Agenda: 4) Online Training of Models
Let’s improve the analytic model continuously…
Analytical Pipeline
1. Data Access
2. Data Preparation
3. Exploratory Data Analysis
4. Model Building
5. Model Execution
6. Model Validation
7. Deployment
Online Training: Continuously train and improve the model with every new event
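"Train with every new event" means updating the model incrementally instead of retraining on the full history. A minimal sketch, assuming a logistic regression updated by one stochastic gradient step per labeled event (the class and method names are hypothetical, and only a few algorithms can be updated this way):

```python
import math

class OnlineLogisticRegression:
    """Updates its weights with one stochastic gradient step per labeled event."""

    def __init__(self, n_features, learning_rate=0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x, y):
        """The log-loss gradient for a single event is (p - y) * x."""
        err = self.predict(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err

# Hypothetical stream of labeled events: (features, label)
stream = [([1.0, 0.0], 1), ([0.0, 1.0], 0)] * 50
model = OnlineLogisticRegression(n_features=2)
for x, y in stream:
    model.learn_one(x, y)
print(model.predict([1.0, 0.0]), model.predict([0.0, 1.0]))
```

The model never stores past events, so memory stays constant; the trade-off is that a burst of bad or mislabeled events changes the model immediately.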
Online Model Training of Analytic Models
How to improve models?
1. Manual Update
2. Automated Batch
3. Real Time
[Figure: online training loop between the Streaming Platform (Kafka, Confluent Schema Registry with Avro, Kafka Streams + H2O.ai on Mesos, Kafka Streams + TensorFlow on Kubernetes) and Big Data Analytics (Hadoop, Flume, Hive, H2O.ai, Spark, TensorFlow)]
1) Get new Input Event via Kafka Topic
2) Improve Model in Big Data Cluster
3) Update deployed Model via Kafka Topic
4) Leverage Improved Model for new Events
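Steps 3 and 4 can be sketched as a scoring service that receives model updates as events and swaps its model only when the new version passes a validation check on a labeled holdout set. Validating before swapping addresses one of the usual caveats of online training. Everything here is hypothetical: the class, the version numbering, the holdout and the accuracy threshold. Models are passed as Python callables; in a real system, models larger than Kafka's default message size limit (about 1 MB) are often stored elsewhere and only referenced by the update event.

```python
class ScoringService:
    """Scores events with the current model; swaps it when a valid model update arrives."""

    def __init__(self, model, version, holdout, min_accuracy):
        self.model, self.version = model, version
        self.holdout = holdout            # labeled (x, y) pairs used as a validation gate
        self.min_accuracy = min_accuracy

    def accuracy(self, model):
        hits = sum(1 for x, y in self.holdout if (model(x) >= 0.5) == bool(y))
        return hits / len(self.holdout)

    def on_model_update(self, version, model):
        """Handle an event from a (hypothetical) model-update topic."""
        if version <= self.version:
            return False                  # ignore stale or duplicate updates
        if self.accuracy(model) < self.min_accuracy:
            return False                  # keep serving the current model
        self.model, self.version = model, version
        return True

    def on_event(self, x):
        return {"model_version": self.version, "score": self.model(x)}

def constant_model(x): return 0.5         # initial model: no information
def good_model(x): return x               # hypothetical improved model
def broken_model(x): return 1.0 - x       # hypothetical faulty model

svc = ScoringService(constant_model, 1, holdout=[(1.0, 1), (0.0, 0)], min_accuracy=0.9)
print(svc.on_model_update(2, broken_model))  # False: fails validation
print(svc.on_model_update(3, good_model))    # True: swapped in
print(svc.on_event(0.7))                     # {'model_version': 3, 'score': 0.7}
```

Tagging every prediction with the model version makes it possible to trace later which model produced which decision.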
Caveats for Online Model Training
• Processes and infrastructure not ready
• Validation needed before production
• Slows down the system
• Only a few ML implementations supported
• Many use cases do not need it
Key Take-Aways
• Insights are hidden in Historical Data on Big Data Platforms
• Machine Learning and Big Data Analytics find these Insights by building Analytic Models
• A Streaming Platform uses these Models (without Redevelopment) to take Action in Real Time
Questions? Feedback?
Please contact me!

More Related Content

What's hot (20)

PDF
Kafka Streams: What it is, and how to use it?
confluent
 
PDF
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Kai Wähner
 
PDF
Apache Flink internals
Kostas Tzoumas
 
PPTX
The Top 5 Apache Kafka Use Cases and Architectures in 2022
Kai Wähner
 
PDF
Introduction to Kafka Streams
Guozhang Wang
 
PDF
Apache Iceberg - A Table Format for Hige Analytic Datasets
Alluxio, Inc.
 
PPTX
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkrout
confluent
 
PPTX
NiFi Best Practices for the Enterprise
Gregory Keys
 
PDF
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
GetInData
 
PPTX
Apache Flink and what it is used for
Aljoscha Krettek
 
PDF
Mainframe Integration, Offloading and Replacement with Apache Kafka
Kai Wähner
 
PPTX
Spark
Koushik Mondal
 
PPTX
Apache Flink Deep Dive
DataWorks Summit
 
PDF
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
confluent
 
PDF
Intro to HBase
alexbaranau
 
PPTX
Apache Kafka Best Practices
DataWorks Summit/Hadoop Summit
 
PPSX
Apache Flink, AWS Kinesis, Analytics
Araf Karsh Hamid
 
PDF
Introduction to Apache Flink - Fast and reliable big data processing
Till Rohrmann
 
PDF
Pinot: Near Realtime Analytics @ Uber
Xiang Fu
 
PDF
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn
confluent
 
Kafka Streams: What it is, and how to use it?
confluent
 
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Kai Wähner
 
Apache Flink internals
Kostas Tzoumas
 
The Top 5 Apache Kafka Use Cases and Architectures in 2022
Kai Wähner
 
Introduction to Kafka Streams
Guozhang Wang
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Alluxio, Inc.
 
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkrout
confluent
 
NiFi Best Practices for the Enterprise
Gregory Keys
 
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
GetInData
 
Apache Flink and what it is used for
Aljoscha Krettek
 
Mainframe Integration, Offloading and Replacement with Apache Kafka
Kai Wähner
 
Apache Flink Deep Dive
DataWorks Summit
 
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
confluent
 
Intro to HBase
alexbaranau
 
Apache Kafka Best Practices
DataWorks Summit/Hadoop Summit
 
Apache Flink, AWS Kinesis, Analytics
Araf Karsh Hamid
 
Introduction to Apache Flink - Fast and reliable big data processing
Till Rohrmann
 
Pinot: Near Realtime Analytics @ Uber
Xiang Fu
 
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn
confluent
 

Viewers also liked (6)

PDF
Get your mobile app in production in 3 months: Backend
Ackee
 
PDF
Apache Kafka + Apache Mesos + Kafka Streams - Highly Scalable Streaming Micro...
Kai Wähner
 
PDF
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
Spark Summit
 
PPTX
Streaming platform Kafka in SK planet
Byeongsu Kang
 
PPTX
Spark machine learning & deep learning
hoondong kim
 
PDF
Deep Learning Streaming Platform with Kafka Streams, TensorFlow, DeepLearning...
Kai Wähner
 
Get your mobile app in production in 3 months: Backend
Ackee
 
Apache Kafka + Apache Mesos + Kafka Streams - Highly Scalable Streaming Micro...
Kai Wähner
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
Spark Summit
 
Streaming platform Kafka in SK planet
Byeongsu Kang
 
Spark machine learning & deep learning
hoondong kim
 
Deep Learning Streaming Platform with Kafka Streams, TensorFlow, DeepLearning...
Kai Wähner
 
Ad

Similar to Apache Kafka Streams + Machine Learning / Deep Learning (20)

PDF
Kai Wähner, Technology Evangelist at Confluent: "Development of Scalable Mac...
Dataconomy Media
 
PDF
Apache Kafka Open Source Ecosystem for Machine Learning at Extreme Scale (Apa...
Kai Wähner
 
PDF
How to Leverage the Apache Kafka Ecosystem to Productionize Machine Learning ...
Codemotion
 
PDF
Streaming Machine Learning with Python, Jupyter, TensorFlow, Apache Kafka and...
Kai Wähner
 
PDF
Machine Learning Trends of 2018 combined with the Apache Kafka Ecosystem
Kai Wähner
 
PDF
2019 04 seattle_meetup___kafka_machine_learning___kai_waehner
Nitin Kumar
 
PDF
Kai Waehner - Deep Learning at Extreme Scale in the Cloud with Apache Kafka a...
Codemotion
 
PDF
Unleashing Apache Kafka and TensorFlow in Hybrid Cloud Architectures
Kai Wähner
 
PDF
Introduction to Apache Kafka and why it matters - Madrid
Paolo Castagna
 
PDF
Event-Driven Model Serving: Stream Processing vs. RPC with Kafka and TensorFl...
confluent
 
PDF
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Kai Wähner
 
PDF
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
Kai Wähner
 
PDF
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
confluent
 
PDF
Apache Kafka as Event Streaming Platform for Microservice Architectures
Kai Wähner
 
PDF
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...
confluent
 
PDF
Apache Kafka vs. Traditional Middleware (Kai Waehner, Confluent) Frankfurt 20...
confluent
 
PPTX
Overview of Apache Fink: The 4G of Big Data Analytics Frameworks
Slim Baltagi
 
PPTX
Overview of Apache Fink: the 4 G of Big Data Analytics Frameworks
Slim Baltagi
 
PPTX
Overview of Apache Flink: the 4G of Big Data Analytics Frameworks
DataWorks Summit/Hadoop Summit
 
PDF
Real-Time Analytics with Confluent and MemSQL
SingleStore
 
Kai Wähner, Technology Evangelist at Confluent: "Development of Scalable Mac...
Dataconomy Media
 
Apache Kafka Open Source Ecosystem for Machine Learning at Extreme Scale (Apa...
Kai Wähner
 
How to Leverage the Apache Kafka Ecosystem to Productionize Machine Learning ...
Codemotion
 
Streaming Machine Learning with Python, Jupyter, TensorFlow, Apache Kafka and...
Kai Wähner
 
Machine Learning Trends of 2018 combined with the Apache Kafka Ecosystem
Kai Wähner
 
2019 04 seattle_meetup___kafka_machine_learning___kai_waehner
Nitin Kumar
 
Kai Waehner - Deep Learning at Extreme Scale in the Cloud with Apache Kafka a...
Codemotion
 
Unleashing Apache Kafka and TensorFlow in Hybrid Cloud Architectures
Kai Wähner
 
Introduction to Apache Kafka and why it matters - Madrid
Paolo Castagna
 
Event-Driven Model Serving: Stream Processing vs. RPC with Kafka and TensorFl...
confluent
 
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Kai Wähner
 
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
Kai Wähner
 
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
confluent
 
Apache Kafka as Event Streaming Platform for Microservice Architectures
Kai Wähner
 
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...
confluent
 
Apache Kafka vs. Traditional Middleware (Kai Waehner, Confluent) Frankfurt 20...
confluent
 
Overview of Apache Fink: The 4G of Big Data Analytics Frameworks
Slim Baltagi
 
Overview of Apache Fink: the 4 G of Big Data Analytics Frameworks
Slim Baltagi
 
Overview of Apache Flink: the 4G of Big Data Analytics Frameworks
DataWorks Summit/Hadoop Summit
 
Real-Time Analytics with Confluent and MemSQL
SingleStore
 
Ad

More from Kai Wähner (20)

PDF
Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)
Kai Wähner
 
PDF
When NOT to use Apache Kafka?
Kai Wähner
 
PDF
Kafka for Live Commerce to Transform the Retail and Shopping Metaverse
Kai Wähner
 
PDF
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
Kai Wähner
 
PDF
Apache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
Kai Wähner
 
PDF
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?
Kai Wähner
 
PDF
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Kai Wähner
 
PDF
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...
Kai Wähner
 
PDF
Data Streaming with Apache Kafka in the Defence and Cybersecurity Industry
Kai Wähner
 
PDF
Apache Kafka in the Healthcare Industry
Kai Wähner
 
PDF
Apache Kafka in the Healthcare Industry
Kai Wähner
 
PDF
Apache Kafka for Real-time Supply Chain in the Food and Retail Industry
Kai Wähner
 
PDF
Kafka for Real-Time Replication between Edge and Hybrid Cloud
Kai Wähner
 
PDF
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0
Kai Wähner
 
PDF
Apache Kafka Landscape for Automotive and Manufacturing
Kai Wähner
 
PDF
Kappa vs Lambda Architectures and Technology Comparison
Kai Wähner
 
PDF
Event Streaming CTO Roundtable for Cloud-native Kafka Architectures
Kai Wähner
 
PDF
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...
Kai Wähner
 
PDF
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
Kai Wähner
 
PDF
Apache Kafka in the Transportation and Logistics
Kai Wähner
 
Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)
Kai Wähner
 
When NOT to use Apache Kafka?
Kai Wähner
 
Kafka for Live Commerce to Transform the Retail and Shopping Metaverse
Kai Wähner
 
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
Kai Wähner
 
Apache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
Kai Wähner
 
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?
Kai Wähner
 
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Kai Wähner
 
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...
Kai Wähner
 
Data Streaming with Apache Kafka in the Defence and Cybersecurity Industry
Kai Wähner
 
Apache Kafka in the Healthcare Industry
Kai Wähner
 
Apache Kafka in the Healthcare Industry
Kai Wähner
 
Apache Kafka for Real-time Supply Chain in the Food and Retail Industry
Kai Wähner
 
Kafka for Real-Time Replication between Edge and Hybrid Cloud
Kai Wähner
 
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0
Kai Wähner
 
Apache Kafka Landscape for Automotive and Manufacturing
Kai Wähner
 
Kappa vs Lambda Architectures and Technology Comparison
Kai Wähner
 
Event Streaming CTO Roundtable for Cloud-native Kafka Architectures
Kai Wähner
 
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...
Kai Wähner
 
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
Kai Wähner
 
Apache Kafka in the Transportation and Logistics
Kai Wähner
 

Recently uploaded (20)

PDF
Bitcoin for Millennials podcast with Bram, Power Laws of Bitcoin
Stephen Perrenod
 
PDF
Future-Proof or Fall Behind? 10 Tech Trends You Can’t Afford to Ignore in 2025
DIGITALCONFEX
 
PDF
CIFDAQ Market Wrap for the week of 4th July 2025
CIFDAQ
 
PDF
How do you fast track Agentic automation use cases discovery?
DianaGray10
 
PDF
NASA A Researcher’s Guide to International Space Station : Physical Sciences ...
Dr. PANKAJ DHUSSA
 
PPTX
Agentforce World Tour Toronto '25 - Supercharge MuleSoft Development with Mod...
Alexandra N. Martinez
 
PPTX
AI Penetration Testing Essentials: A Cybersecurity Guide for 2025
defencerabbit Team
 
PDF
Go Concurrency Real-World Patterns, Pitfalls, and Playground Battles.pdf
Emily Achieng
 
PDF
What’s my job again? Slides from Mark Simos talk at 2025 Tampa BSides
Mark Simos
 
PPTX
The Project Compass - GDG on Campus MSIT
dscmsitkol
 
PDF
Transcript: Book industry state of the nation 2025 - Tech Forum 2025
BookNet Canada
 
PDF
POV_ Why Enterprises Need to Find Value in ZERO.pdf
darshakparmar
 
DOCX
Cryptography Quiz: test your knowledge of this important security concept.
Rajni Bhardwaj Grover
 
PDF
SIZING YOUR AIR CONDITIONER---A PRACTICAL GUIDE.pdf
Muhammad Rizwan Akram
 
PPTX
Future Tech Innovations 2025 – A TechLists Insight
TechLists
 
PDF
Peak of Data & AI Encore AI-Enhanced Workflows for the Real World
Safe Software
 
PDF
LOOPS in C Programming Language - Technology
RishabhDwivedi43
 
PDF
“Computer Vision at Sea: Automated Fish Tracking for Sustainable Fishing,” a ...
Edge AI and Vision Alliance
 
PPTX
MuleSoft MCP Support (Model Context Protocol) and Use Case Demo
shyamraj55
 
PPTX
Q2 FY26 Tableau User Group Leader Quarterly Call
lward7
 
Bitcoin for Millennials podcast with Bram, Power Laws of Bitcoin
Stephen Perrenod
 
Future-Proof or Fall Behind? 10 Tech Trends You Can’t Afford to Ignore in 2025
DIGITALCONFEX
 
CIFDAQ Market Wrap for the week of 4th July 2025
CIFDAQ
 
How do you fast track Agentic automation use cases discovery?
DianaGray10
 
NASA A Researcher’s Guide to International Space Station : Physical Sciences ...
Dr. PANKAJ DHUSSA
 
Agentforce World Tour Toronto '25 - Supercharge MuleSoft Development with Mod...
Alexandra N. Martinez
 
AI Penetration Testing Essentials: A Cybersecurity Guide for 2025
defencerabbit Team
 
Go Concurrency Real-World Patterns, Pitfalls, and Playground Battles.pdf
Emily Achieng
 
What’s my job again? Slides from Mark Simos talk at 2025 Tampa BSides
Mark Simos
 
The Project Compass - GDG on Campus MSIT
dscmsitkol
 
Transcript: Book industry state of the nation 2025 - Tech Forum 2025
BookNet Canada
 
POV_ Why Enterprises Need to Find Value in ZERO.pdf
darshakparmar
 
Cryptography Quiz: test your knowledge of this important security concept.
Rajni Bhardwaj Grover
 
SIZING YOUR AIR CONDITIONER---A PRACTICAL GUIDE.pdf
Muhammad Rizwan Akram
 
Future Tech Innovations 2025 – A TechLists Insight
TechLists
 
Peak of Data & AI Encore AI-Enhanced Workflows for the Real World
Safe Software
 
LOOPS in C Programming Language - Technology
RishabhDwivedi43
 
“Computer Vision at Sea: Automated Fish Tracking for Sustainable Fishing,” a ...
Edge AI and Vision Alliance
 
MuleSoft MCP Support (Model Context Protocol) and Use Case Demo
shyamraj55
 
Q2 FY26 Tableau User Group Leader Quarterly Call
lward7
 

Apache Kafka Streams + Machine Learning / Deep Learning

  • 1. 1Confidential Apache Kafka + Machine Learning Analytic Models Applied to Real Time Stream Processing Kai Waehner Technology Evangelist [email protected] LinkedIn @KaiWaehner www.kai-waehner.de
  • 2. 2Apache Kafka and Machine Learning Agenda 1) Machine Learning in the Real World 2) Building an Analytic Model 3) Applying an Analytic Model in Real Time 4) Online Training of Models
  • 3. 3Apache Kafka and Machine Learning Agenda 1) Machine Learning in the Real World 2) Building an Analytic Model 3) Applying an Analytic Model in Real Time 4) Online Training of Models
  • 4. 4Apache Kafka and Machine Learning Machine Learning ... allows computers to find hidden insights without being explicitly programmed where to look.
  • 5. 5Apache Kafka and Machine Learning Real World Examples of Machine Learning Spam Detection Search Results + Product Recommendation Picture Detection (Friends, Locations, Products) Your Company The Next Disruption: Google Beats Go Champion
  • 6. 6Apache Kafka and Machine Learning Leverage Machine Learning to Analyze and Act on Critical Business Moments Seconds Minutes Hours Price Optimization Predictive Maintenance Fraud Detection Cross Selling Transportation Rerouting Customer Service Inventory Management Windows of Opportunity
  • 7. 7Apache Kafka and Machine Learning How to realize these use cases?
  • 8. 8Apache Kafka and Machine Learning Big Data Analytics Volume (terabytes, petabytes) Variety (social networks, blog posts, logs, sensors, etc.) Velocity („real time“) Value
  • 9. 9Apache Kafka and Machine Learning Big Data Analytics for Actionable Insights From Insight to Action (continuously closed loop)
  • 10. 10Apache Kafka and Machine Learning Streaming Platform Big Data Analytics Database IoT Device Streaming Producer ….. DWH Data Integration C O N N E C T C O N N E C T Data Lake Model Building Batch Real Time Stream Processing REST Interface IoT Device Mobile App Streaming Consumer C O N N E C T C O N N E C T BI Tool Messaging Web Application Model Schema Registry / Governance 1) Data Producer 2) Analytics Platform 3) Streaming Platform 4) Data Consumer
  • 11. 11Apache Kafka and Machine Learning Agenda 1) Machine Learning in the Real World 2) Building an Analytic Model 3) Applying an Analytic Model in Real Time 4) Online Training of Models
  • 12. 12Apache Kafka and Machine Learning Streaming Platform Big Data Analytics Database IoT Device Streaming Producer ….. DWH Data Integration C O N N E C T C O N N E C T Data Lake Model Building Batch Real Time Stream Processing REST Interface IoT Device Mobile App Streaming Consumer C O N N E C T C O N N E C T BI Tool Messaging Web Application Model Schema Registry / Governance 1) Data Producer 2) Analytics Platform 3) Streaming Platform 4) Data Consumer
  • 13. 13Apache Kafka and Machine Learning Hidden Technical Debt in Machine Learning Systems https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf Writing source code is not the time-consuming task! !
  • 14. 14Apache Kafka and Machine Learning Analytical Pipeline 1. Data Access 2. Data Preparation 3. Exploratory Data Analysis 4. Model Building 5. Model Execution 6. Model Validation 7. Deployment
  • 15. 15Apache Kafka and Machine Learning Data Access Find insights to create added business value by correlating various data sources!
  • 16. 16Apache Kafka and Machine Learning Data Preparation http://www.slideshare.net/odsc/feature-engineering Data Preparation
  • 17. 17Apache Kafka and Machine Learning Exploratory Data Analysis © Copyright 2000-2017 TIBCO Software Inc. • Scripting • Visual Analytics • Machine Learning
  • 18. 18Apache Kafka and Machine Learning Model Building A model is a simplification of the truth that helps you with decision making.
  • 19. 19Apache Kafka and Machine Learning Model Execution (Coding) Apply Model to New Data
  • 20. 20Apache Kafka and Machine Learning Model Execution (Tooling) Apply Model to New Data
  • 21. 21Apache Kafka and Machine Learning Model Validation https://genome.tugraz.at/proclassify/help/pages/XV.html Cross-Validation Procedure
  • 22. 22Apache Kafka and Machine Learning Frameworks and Tooling?
  • 23. 23Apache Kafka and Machine Learning Languages, Frameworks and Tools Many more …. Portable Format for Analytics (PFA)
  • 24. 24Apache Kafka and Machine Learning Live Demos with Open Source Technologies Development of Analytic Models with R, TensorFlow, Apache Spark, H2O.ai, RapidMiner
  • 25. 25Apache Kafka and Machine Learning Live Demo Use Case: Customer Churn Prediction Machine Learning Algorithm: Generalized Linear Model (GLM) using Logistic Regression Technology: Open Source R
  • 26. 26Apache Kafka and Machine Learning Live Demo Use Case: Airline Flight Delay Prediction Machine Learning Algorithm: Gradient Boosted Machines (GBM) using Decision Trees Technology: H2O.ai
  • 27. Live Demo. Use Case: Predictive Maintenance (Anomaly Detection in Telco Networks); Deep Learning Algorithm: Artificial Neural Networks (ANN) using Autoencoders; Technology: TensorFlow + Python API
  • 28. Live Demo. Use Case: Classification (Prediction of Titanic Survivors); Deep Learning Algorithm: Recurrent Neural Networks (RNN); Technology: RapidMiner
  • 29. Agenda: 1) Machine Learning in the Real World 2) Building an Analytic Model 3) Applying an Analytic Model in Real Time 4) Online Training of Models
  • 30. Analytical Pipeline: 1. Data Access 2. Data Preparation 3. Exploratory Data Analysis 4. Model Building 5. Model Execution 6. Model Validation 7. Deployment
  • 31. Architecture diagram "Streaming Platform / Big Data Analytics". Components: Database, IoT Device, Streaming Producer, …, DWH, Data Integration, Connect, Data Lake, Model Building, Batch, Real Time, Stream Processing, REST Interface, IoT Device, Mobile App, Streaming Consumer, BI Tool, Messaging, Web Application, Model, Schema Registry / Governance. Layers: 1) Data Producer 2) Analytics Platform 3) Streaming Platform 4) Data Consumer
  • 32. Definition of Stream Processing: Data at Rest vs. Data in Motion
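One way to see the difference between data at rest and data in motion is to compute the same aggregate both ways: once over a stored list, and incrementally as values arrive, so that an up-to-date answer exists after every event. This is a plain Python illustration with made-up readings, not a stream-processing API.

```python
def average_at_rest(values):
    """Data at rest: the whole dataset is stored first and then queried once."""
    return sum(values) / len(values)


def average_in_motion(events):
    """Data in motion: the result is updated as each event arrives, with O(1) state."""
    count, mean = 0, 0.0
    for value in events:
        count += 1
        mean += (value - mean) / count  # incremental (running) mean
        yield mean


readings = [10, 20, 30]
print(average_at_rest(readings))          # 20.0
print(list(average_in_motion(readings)))  # [10.0, 15.0, 20.0]
```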
  • 33. Key Concepts
  • 34. Key Concepts
  • 35. Key Concepts
  • 36. Stream Processing Use Cases: Real Time Applications, Stateful Streaming Analytics, Stateless “Real Time ETL”
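The difference between stateless “real time ETL” and stateful streaming analytics can be shown with two small generators. This is a plain Python simulation, not Kafka Streams code, and the event fields `customer` and `amount` are hypothetical. The first function needs nothing but the current event. The second must keep a per-key total across events, which is the state a stream processor has to store.

```python
from collections import defaultdict


def real_time_etl(events):
    """Stateless: each event is filtered and transformed on its own."""
    for event in events:
        if event["amount"] > 0:  # drop invalid records
            yield {**event, "amount_cents": round(event["amount"] * 100)}


def running_totals(events):
    """Stateful: the output depends on earlier events with the same key."""
    totals = defaultdict(float)  # per-key state kept across events
    for event in events:
        totals[event["customer"]] += event["amount"]
        yield event["customer"], totals[event["customer"]]


events = [
    {"customer": "a", "amount": 12.5},
    {"customer": "b", "amount": -3.0},
    {"customer": "a", "amount": 7.25},
]
cleaned = list(real_time_etl(events))
print([e["amount_cents"] for e in cleaned])  # [1250, 725]
print(list(running_totals(cleaned)))         # [('a', 12.5), ('a', 19.75)]
```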
  • 37. Event Processing Windows: Various Options for Windowing (Fixed, Sliding, Session, …)
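The three window types on slide 37 differ in how an event's timestamp maps to windows. The sketch below assigns integer, non-negative timestamps to fixed (tumbling) windows, to overlapping sliding windows (Kafka Streams calls fixed-size windows that advance by an interval "hopping" windows) and to session windows separated by an inactivity gap. Function names and the inclusive gap comparison are assumptions of this sketch, not a framework's exact semantics.

```python
def tumbling_window(ts, size):
    """Fixed windows: each timestamp falls into exactly one window [start, start + size)."""
    start = ts - ts % size
    return [(start, start + size)]


def hopping_windows(ts, size, advance):
    """Windows of length `size` starting every `advance` units; a timestamp can be in several."""
    windows = []
    start = ts - ts % advance  # latest window start at or before ts
    while start > ts - size:   # window [start, start + size) still contains ts
        if start >= 0:
            windows.append((start, start + size))
        start -= advance
    return sorted(windows)


def session_windows(timestamps, gap):
    """Sessions: events no more than `gap` apart are merged into one window."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][1] <= gap:
            sessions[-1][1] = ts  # extend the current session
        else:
            sessions.append([ts, ts])  # inactivity gap exceeded: open a new session
    return [tuple(s) for s in sessions]


print(tumbling_window(7, 5))                                  # [(5, 10)]
print(hopping_windows(7, 10, 5))                              # [(0, 10), (5, 15)]
print(session_windows([1, 2, 3, 10, 11, 30], gap=5))          # [(1, 3), (10, 11), (30, 30)]
```

Tumbling and hopping windows have boundaries fixed in advance, while session boundaries depend on the data itself, which is why sessions need merging logic.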
  • 38. How to apply analytic models to real time processing without redevelopment?
  • 39. Application of Analytic Models to Real Time without Redevelopment: Stream Processing with H2O.ai, R, Python, Spark ML, MATLAB, SAS, PMML
  • 40. Streaming Analytics: Processing Pipeline (diagram). Stream Ingest: APIs, Adapters / Channels, Integration, Messaging. Stream Preprocessing: Transformation, Aggregation, Enrichment, Filtering, Normalization. Stream Analytics: Contextual Rules, Windowing, Patterns, Analytics, Machine Learning, … Stream Outcomes: Process Management, Analytics (Real Time), Applications & APIs, Analytics / DW Reporting, Index / Search. Applying an Analytic Model is just a piece of the puzzle!
  • 41. Frameworks and Tooling?
  • 42. Frameworks and Products: grid by Open Source / Closed Source and Product / Framework, including Microsoft Azure Stream Analytics
  • 43. When to use Kafka Streams for Stream Processing?
  • 44. When to use Kafka Streams for Stream Processing? No need for a Big Data cluster; deploy in your existing infrastructure; Kafka manages scalability / fail-over; focus on development of business logic in your department
  • 45. Kafka Streams: Input Stream (Kafka Topic, Kafka Cluster) → Stream Processing Microservice (Kafka Streams): map, filter, aggregate, apply analytic model, "any business logic" → Output Stream (Kafka Topic, Kafka Cluster). Deployed anywhere: Docker, Kubernetes, Mesos, Java App, …
  • 46. A complete streaming microservice, ready for production at large scale: Word Count App configuration; define processing (here: WordCount); start processing
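The WordCount processing on slide 46 splits lines into words, groups by word and keeps a running count. The Python sketch below mirrors that logic without Kafka: each yielded `(word, count)` pair corresponds to an update the Java application would emit (the Kafka Streams version chains operations such as `flatMapValues`, `groupBy` and `count`). Splitting on non-word characters and lowercasing are assumptions of this sketch.

```python
import re
from collections import Counter


def word_count(lines):
    """Split each line into lowercase words, group by word and keep a running count.

    Yields (word, count) updates, similar to the changelog of a count table."""
    counts = Counter()
    for line in lines:
        for word in re.split(r"\W+", line.lower()):
            if word:  # re.split yields empty strings at leading/trailing separators
                counts[word] += 1
                yield word, counts[word]


updates = list(word_count(["Kafka Streams", "kafka"]))
print(updates)  # [('kafka', 1), ('streams', 1), ('kafka', 2)]
```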
  • 47. Confluent Platform: the Free, Open-Source Streaming Platform (diagram; legend: Open Source, External, Commercial). Components: Control Center, Auto-data Balancing, Multi-Data Center Replication, 24/7 Support, Supported Connectors, Clients, Schema Registry, REST Proxy, Apache Kafka (Kafka Connect, Kafka Streams, Kafka Core). Around it: Monitoring, Analytics, Custom Apps, Transformations, Real-time Applications, …; CRM, Data Warehouse, Database, Hadoop, Data Integration, …; Database Changes, Log Events, IoT Data, Web Events, …
  • 48. Architecture diagram "Streaming Platform / Big Data Analytics". Components: Database, IoT Device, Streaming Producer, …, DWH, Data Integration, Connect, Data Lake, Model Building, Batch, Real Time, Stream Processing, REST Interface, IoT Device, Mobile App, Streaming Consumer, BI Tool, Messaging, Web Application, Model, Schema Registry / Governance. Layers: 1) Data Producer 2) Analytics Platform 3) Streaming Platform 4) Data Consumer
  • 49. Technology mapping diagram "Streaming Platform / Big Data Analytics". Components: Oracle DB, CoAP IoT, Kafka Java Client, …, HP Vertica, Data Integration, Flume, H2O.ai, Spark, TensorFlow, Batch, Real Time, Confluent REST Proxy, MQTT IoT, iPhone App, Kafka Go Client, Kafka Connect, Hive, Grafana, Kafka, Java EE Web App, Hadoop, Kafka Connect, Confluent Schema Registry, Kafka Streams, H2O.ai, Mesos, Kafka Streams, TensorFlow, Kubernetes, Avro. Layers: 1) Data Producer 2) Analytics Platform 3) Streaming Platform 4) Data Consumer
  • 50. Live Demos with Open Source Technologies: Development of Analytic Models with Apache Kafka Messaging, Kafka Streams, Kafka Connect, Confluent Schema Registry
  • 51. Live Demo. Use Case: Airline Flight Delay Prediction; Machine Learning Algorithm: Any! (in our example, H2O.ai GBM); Streaming Platform: Apache Kafka Core, Kafka Connect, Kafka Streams, Confluent Schema Registry
  • 52. H2O.ai Model + Kafka Streams (Filter, Map): 1) Create H2O ML model 2) Configure Kafka Streams Application 3) Apply H2O ML model to Streaming Data 4) Start Kafka Streams App
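The four steps on slide 52 boil down to loading a trained model once, then filtering and mapping every record of the input stream. The Python simulation below shows only that data flow. A hypothetical logistic `DelayModel` stands in for the exported H2O.ai GBM, the feature names (`dep_hour`, `distance_km`) and weights are invented, and a generator stands in for the Kafka Streams topology that would read from and write to topics.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DelayModel:
    """Hypothetical stand-in for an exported model: a logistic scorer over named features."""
    weights: dict
    bias: float

    def predict(self, flight):
        z = self.bias + sum(w * flight[name] for name, w in self.weights.items())
        return 1.0 / (1.0 + math.exp(-z))


def score_stream(model, flights, threshold=0.5):
    """Filter, then map: skip records missing a feature, attach a prediction to the rest."""
    for flight in flights:
        if any(name not in flight for name in model.weights):  # filter
            continue
        p = model.predict(flight)                              # map: apply the model
        yield {**flight, "delay_probability": p, "delayed": p >= threshold}


model = DelayModel(weights={"dep_hour": 0.5, "distance_km": -0.001}, bias=-6.0)
flights = [
    {"id": "F1", "dep_hour": 18, "distance_km": 500},
    {"id": "F2", "dep_hour": 6, "distance_km": 1000},
    {"id": "F3", "dep_hour": 9},  # incomplete record: filtered out
]
for result in score_stream(model, flights):
    print(result["id"], round(result["delay_probability"], 3), result["delayed"])
```

The model is created once, outside the per-record loop, so scoring each event is only a cheap function call.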
  • 53. End-to-End Stream Monitoring and Alerting with Confluent Control Center: Data Stream Monitoring and Alerting; Multi-cluster monitoring and management; Kafka Connect Configuration. Message delivery? Delays? Where did it get stuck? Lost messages? Broker issues? Performance? (http://docs.confluent.io/3.2.0/control-center/docs/monitoring.html)
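The “Delays?” question on slide 53 is commonly answered with consumer lag: for each partition, the gap between the log-end offset and the offset the consumer group has committed. The sketch below computes it from two plain dictionaries. How the offsets are fetched is left out, the alert threshold `max_lag` is hypothetical, and treating a missing commit as offset 0 is a simplifying assumption (retention can move the log start beyond 0).

```python
def consumer_lag(end_offsets, committed):
    """Lag per partition: log-end offset minus the group's committed offset.

    Simplification: a partition without a commit is treated as committed at offset 0."""
    return {p: end - committed.get(p, 0) for p, end in end_offsets.items()}


def lagging_partitions(end_offsets, committed, max_lag):
    """Partitions whose lag exceeds an assumed alert threshold."""
    lag = consumer_lag(end_offsets, committed)
    return sorted(p for p, value in lag.items() if value > max_lag)


end_offsets = {0: 1000, 1: 500, 2: 80}
committed = {0: 990, 1: 100}
print(consumer_lag(end_offsets, committed))            # {0: 10, 1: 400, 2: 80}
print(lagging_partitions(end_offsets, committed, 50))  # [1, 2]
```

A lag that keeps growing between two checks is a sign of a stuck or too-slow consumer rather than a temporary spike.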
  • 54. Agenda: 1) Machine Learning in the Real World 2) Building an Analytic Model 3) Applying an Analytic Model in Real Time 4) Online Training of Models
  • 55. Let’s improve the analytic model continuously…
  • 56. Analytical Pipeline: 1. Data Access 2. Data Preparation 3. Exploratory Data Analysis 4. Model Building 5. Model Execution 6. Model Validation 7. Deployment. Online Training: Continuously train and improve the model with every new event
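“Continuously train and improve the model with every new event” can be illustrated with logistic regression updated by one stochastic gradient step per labeled event. The class below is a hypothetical sketch, not an API of any framework named in the deck. The toy stream is replayed many times only so that this tiny example converges.

```python
import math


class OnlineLogisticModel:
    """Logistic regression that takes one stochastic gradient step per incoming event."""

    def __init__(self, n_features, learning_rate=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        """Learn from a single labeled event; no stored history is needed."""
        err = self.predict(x) - y  # derivative of the log-loss w.r.t. the linear score
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


model = OnlineLogisticModel(n_features=2)
labeled_events = [([2.0, 0.0], 1), ([3.0, 1.0], 1), ([0.0, 2.0], 0), ([1.0, 3.0], 0)]
for _ in range(200):             # replay the toy stream so the example converges
    for x, y in labeled_events:
        model.update(x, y)
print(model.predict([3.0, 0.0]) > 0.5, model.predict([0.0, 3.0]) < 0.5)  # True True
```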
  • 57. Online Model Training of Analytic Models: How to improve models? 1. Manual Update 2. Automated Batch 3. Real Time
  • 58. Online training diagram "Streaming Platform / Big Data Analytics". Components: Flume, H2O.ai, Spark, TensorFlow, Hive, Kafka, Hadoop, Confluent Schema Registry, Kafka Streams, H2O.ai, Mesos, Kafka Streams, TensorFlow, Kubernetes, Avro. Steps: 1) Get new Input Event via Kafka Topic 2) Improve Model in Big Data Cluster 3) Update deployed Model via Kafka Topic 4) Leverage Improved Model for new Events
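Steps 3 and 4 on slide 58 (publish an improved model to a Kafka topic, then use it for new events) can be simulated with a loop over one ordered stream that carries both model updates and events. Everything here is hypothetical: the message tuples, the version numbers and the single-threshold “model”. In a real deployment the model would arrive as serialized bytes on its own topic and the events on another, so their relative order is not guaranteed the way it is in this single list.

```python
def serve(messages):
    """Score events with the latest model seen so far on a single ordered stream.

    Messages are ("model", version, threshold) or ("event", value); events that
    arrive before any model are skipped in this sketch."""
    current = None
    for message in messages:
        if message[0] == "model":
            _, version, threshold = message
            current = (version, threshold)  # hot-swap: replace the deployed model
        elif current is not None:
            _, value = message
            version, threshold = current
            yield value, version, value > threshold


messages = [
    ("event", 5),
    ("model", 1, 10),
    ("event", 12),
    ("event", 8),
    ("model", 2, 6),  # improved model published by a (hypothetical) training job
    ("event", 8),
]
print(list(serve(messages)))  # [(12, 1, True), (8, 1, False), (8, 2, True)]
```

Tagging each output with the model version makes it possible to trace which model produced which prediction after a swap.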
  • 59. Caveats for Online Model Training: processes and infrastructure not ready; validation needed before production; slows down the system; only a few ML implementations supported; many use cases do not need it
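The caveat “validation needed before production” is often addressed with a champion/challenger check: a retrained model replaces the deployed one only if it does at least as well on held-out data. The sketch below uses hypothetical names (`accuracy`, `choose_model`, `min_gain`) and accuracy as the only metric.

```python
def accuracy(predict, holdout):
    """Share of held-out (x, y) pairs the model labels correctly."""
    return sum(predict(x) == y for x, y in holdout) / len(holdout)


def choose_model(current, candidate, holdout, min_gain=0.0):
    """Deploy the candidate only if its accuracy is at least the current accuracy plus min_gain."""
    if accuracy(candidate, holdout) >= accuracy(current, holdout) + min_gain:
        return candidate
    return current


def current_model(x):
    return int(x >= 2)  # 3 of 4 correct on the holdout below


def retrained_model(x):
    return int(x >= 3)  # 4 of 4 correct


def always_one(x):
    return 1  # 2 of 4 correct


holdout = [(1, 0), (2, 0), (3, 1), (4, 1)]
print(choose_model(current_model, retrained_model, holdout) is retrained_model)  # True
print(choose_model(current_model, always_one, holdout) is current_model)         # True
```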
  • 60. Key Take-Aways: Insights are hidden in Historical Data on Big Data Platforms. Machine Learning and Big Data Analytics find these Insights by building Analytic Models. The Streaming Platform uses these Models (without Redevelopment) to take Action in Real Time.
  • 61. Kai Waehner, Technology Evangelist, @KaiWaehner, www.kai-waehner.de, LinkedIn. Questions? Feedback? Please contact me!