Confidential
Apache Kafka + H2O.ai
Machine Learning Applied to Real Time Stream Processing
Kai Waehner
Technology Evangelist
kontakt@kai-waehner.de
LinkedIn
@KaiWaehner
www.kai-waehner.de
Apache Kafka and Machine Learning
Agenda
1) Machine Learning and Real Time Applications
2) Building an Analytic Model with H2O.ai
3) Applying an Analytic Model with Apache Kafka
1) Machine Learning and Real Time Applications
Machine Learning
... allows computers to find hidden insights without being explicitly programmed where to look.
Real World Examples of Machine Learning
• Spam Detection
• Search Results + Product Recommendation
• Picture Detection (Friends, Locations, Products)
• Your Company
• The Next Disruption: Google Beats Go Champion
Leverage Machine Learning to Analyze and Act on Critical Business Moments
Windows of Opportunity, ranging from seconds to minutes to hours:
• Price Optimization
• Predictive Maintenance
• Fraud Detection
• Cross Selling
• Transportation Rerouting
• Customer Service
• Inventory Management
Big Data Analytics for Actionable Insights
From Insight to Action
(continuous loop)
Streaming Platform / Big Data Analytics
[Figure: reference architecture for applying machine learning to streaming data]
1) Data Producer: Database, IoT Device, Streaming Producer, …
2) Analytics Platform: DWH, Data Lake, Data Integration, Model Building (Batch)
3) Streaming Platform: Real Time Stream Processing with the Model, Connect, Schema Registry / Governance
4) Data Consumer: REST Interface, IoT Device, Mobile App, Streaming Consumer, BI Tool, Messaging, Web Application
2) Building an Analytic Model with H2O.ai
STREAMING PLATFORM / BIG DATA ANALYTICS
[Figure: the same architecture with example technologies]
Data producers: Oracle DB, CoAP IoT, Kafka Java Client, …
Data integration: Flume, Kafka Connect
Big data analytics: HP Vertica, Hadoop, Hive; model building with H2O.ai and TensorFlow (Batch)
Real time stream processing: Kafka Streams with H2O.ai, Kafka Streams with TensorFlow; Mesos, Kubernetes
Governance: Confluent Schema Registry, Avro
Data consumers: Confluent REST Proxy, MQTT IoT, iPhone App, Kafka Go Client, Grafana, Kafka Java EE Web App, Kafka Connect
Layers: 1) Data Producer, 2) Analytics Platform, 3) Streaming Platform, 4) Data Consumer
Languages, Frameworks and Tools
[Figure: logos of machine learning languages, frameworks and tools, "and many more", including the Portable Format for Analytics (PFA)]
Live Demo
Use Case: Airline Flight Delay Prediction
Machine Learning Algorithm: Gradient Boosted Machines (GBM) using Decision Trees
Technology: H2O.ai
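To make "Gradient Boosted Machines using Decision Trees" more concrete, here is a minimal sketch of gradient boosting for regression with squared loss, using one-level decision trees (stumps) as weak learners. It is not H2O.ai's implementation; the class and method names and the toy flight data are hypothetical. For squared loss the negative gradient is simply the residual, so each round fits a stump to the current residuals and adds a shrunken copy of it to the ensemble.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal gradient boosting for regression with squared loss and decision stumps.
// Illustrative sketch only; this is not H2O.ai's GBM implementation.
class GbmSketch {

    // A one-level decision tree: one feature, one threshold, two leaf values.
    record Stump(int feature, double threshold, double leftValue, double rightValue) {
        double predict(double[] row) {
            return row[feature] <= threshold ? leftValue : rightValue;
        }
    }

    final double baseValue;      // initial prediction: mean of the targets
    final double learningRate;   // shrinkage applied to every tree's output
    final List<Stump> trees = new ArrayList<>();

    GbmSketch(double[][] x, double[] y, int rounds, double learningRate) {
        this.learningRate = learningRate;
        double sum = 0;
        for (double v : y) sum += v;
        this.baseValue = sum / y.length;

        double[] residual = new double[y.length];
        for (int round = 0; round < rounds; round++) {
            // For squared loss the negative gradient is the residual y - F(x).
            for (int i = 0; i < y.length; i++) residual[i] = y[i] - predict(x[i]);
            trees.add(fitStump(x, residual));
        }
    }

    // Ensemble prediction: base value plus the shrunken output of every tree.
    double predict(double[] row) {
        double f = baseValue;
        for (Stump s : trees) f += learningRate * s.predict(row);
        return f;
    }

    // Try every feature and every observed value as a split threshold and keep
    // the stump whose leaf means fit the residuals with the smallest squared error.
    static Stump fitStump(double[][] x, double[] r) {
        Stump best = null;
        double bestSse = Double.POSITIVE_INFINITY;
        for (int f = 0; f < x[0].length; f++) {
            for (double[] candidate : x) {
                double t = candidate[f];
                double sumLeft = 0, sumRight = 0;
                int nLeft = 0, nRight = 0;
                for (int i = 0; i < x.length; i++) {
                    if (x[i][f] <= t) { sumLeft += r[i]; nLeft++; } else { sumRight += r[i]; nRight++; }
                }
                double meanLeft = nLeft == 0 ? 0 : sumLeft / nLeft;
                double meanRight = nRight == 0 ? 0 : sumRight / nRight;
                double sse = 0;
                for (int i = 0; i < x.length; i++) {
                    double d = r[i] - (x[i][f] <= t ? meanLeft : meanRight);
                    sse += d * d;
                }
                if (sse < bestSse) {
                    bestSse = sse;
                    best = new Stump(f, t, meanLeft, meanRight);
                }
            }
        }
        return best;
    }

    // Mean squared error of a model on a data set.
    static double mse(GbmSketch model, double[][] x, double[] y) {
        double s = 0;
        for (int i = 0; i < y.length; i++) {
            double d = y[i] - model.predict(x[i]);
            s += d * d;
        }
        return s / y.length;
    }

    public static void main(String[] args) {
        // Hypothetical toy data: {departure hour, distance in 100 km} -> delay in minutes.
        double[][] x = {{6, 5}, {8, 3}, {12, 8}, {17, 4}, {18, 9}, {21, 2}};
        double[] y = {2, 5, 10, 25, 40, 30};
        GbmSketch model = new GbmSketch(x, y, 50, 0.1);
        System.out.printf("Training MSE: %.2f%n", mse(model, x, y));
        System.out.printf("Predicted delay for {19, 6}: %.1f min%n", model.predict(new double[]{19, 6}));
    }
}
```

For a binomial question such as "will this flight be delayed?", each tree would instead be fitted to the gradient of the log-loss. Production implementations such as H2O.ai's GBM also grow deeper trees and find splits in a distributed, histogram-based way.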
3) Applying an Analytic Model with Apache Kafka
When to use Kafka Streams for Stream Processing?
• No need for a Big Data cluster
• Deploy in your existing infrastructure
• Kafka manages scalability / fail-over
• Focus on development of business logic in your department
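A Kafka Streams application is an ordinary Java program, which is why no separate processing cluster is needed. The sketch below only builds the configuration such an application would hand to Kafka Streams; it uses plain `java.util.Properties` so it runs without Kafka on the classpath. The keys are real Kafka Streams configuration names, while the application id, broker address and thread count are hypothetical values chosen for illustration.

```java
import java.util.Properties;

// Builds the configuration for a Kafka Streams application.
// Plain java.util.Properties is used so this compiles without Kafka;
// a real app would usually refer to the same keys via StreamsConfig constants.
class StreamsConfigSketch {

    static Properties streamsProperties(String bootstrapServers) {
        Properties props = new Properties();
        // All instances started with the same application.id form one logical app:
        // Kafka spreads the input partitions across them and reassigns the
        // partitions of a failed instance to the surviving ones.
        props.put("application.id", "flight-delay-prediction"); // hypothetical id
        props.put("bootstrap.servers", bootstrapServers);
        // Default serializers/deserializers for record keys and values.
        props.put("default.key.serde", "org.apache.kafka.common.serialization.Serdes$StringSerde");
        props.put("default.value.serde", "org.apache.kafka.common.serialization.Serdes$StringSerde");
        // Processing threads per instance; scale further by starting more instances.
        props.put("num.stream.threads", "2"); // hypothetical value
        return props;
    }

    public static void main(String[] args) {
        Properties props = streamsProperties("localhost:9092"); // hypothetical broker
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

In a real application these properties would be passed to `new KafkaStreams(topology, props)` followed by `start()`; deploying another copy of the same JAR with the same `application.id` is how the app scales out.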
Use Case: Airline Flight Delay Prediction
Machine Learning Algorithm: Any! (in our example, H2O.ai GBM)
Streaming Platform: Apache Kafka Core, Kafka’s Streams API
Live Demo with Open Source Technologies
H2O.ai Model + Kafka Streams
[Figure: Kafka Streams processing steps Filter and Map]
1) Create H2O ML model
2) Configure Kafka Streams Application
3) Apply H2O ML model to Streaming Data
4) Start Kafka Streams App
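The Filter and Map steps can be sketched without a broker by modelling the stream as a list of events and the H2O model as a plain scoring function. In the real application the stream would be a Kafka Streams `KStream` and the function would wrap an exported H2O model (for example via the `h2o-genmodel` scoring library). Here `FlightEvent`, `Prediction` and the toy scoring rule are hypothetical stand-ins.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;
import java.util.stream.Collectors;

// Simulates the Filter -> Map steps of the demo topology on an in-memory list.
// All record types and the toy model below are hypothetical stand-ins.
class ScoringPipelineSketch {

    // Hypothetical input event for one flight.
    record FlightEvent(String flightId, String origin, int departureHour, boolean cancelled) {}

    // Hypothetical output event: flight id plus predicted delay probability.
    record Prediction(String flightId, double delayProbability) {}

    // The model is passed in once and reused for every event, like a model
    // loaded at application start-up rather than once per record.
    static List<Prediction> process(List<FlightEvent> events, ToDoubleFunction<FlightEvent> model) {
        return events.stream()
                // Filter: skip events the model should not score (here: cancelled flights).
                .filter(e -> !e.cancelled())
                // Map: turn each remaining event into a prediction.
                .map(e -> new Prediction(e.flightId(), model.applyAsDouble(e)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Toy rule standing in for the exported H2O GBM model.
        ToDoubleFunction<FlightEvent> toyModel = e -> e.departureHour() >= 17 ? 0.7 : 0.2;
        List<FlightEvent> events = List.of(
                new FlightEvent("FL100", "FRA", 8, false),
                new FlightEvent("FL200", "MUC", 18, false),
                new FlightEvent("FL300", "BER", 19, true));
        process(events, toyModel).forEach(System.out::println);
    }
}
```

In Kafka Streams the equivalent topology would be roughly `builder.stream(inputTopic).filter(...).mapValues(...).to(outputTopic)`, which runs continuously over an unbounded stream instead of a finite list.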
Online Model Training
How to improve models?
1. Manual Update
2. Automated Batch
3. Real Time
→ Apache Kafka for Messaging and Real Time Apps
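"Real Time" model improvement means updating the model with every new labelled event instead of retraining in batch. A GBM like the one in the demo is normally trained in batch, so this sketch deliberately switches to a simpler learner that supports incremental updates: logistic regression trained by stochastic gradient descent, one example at a time. The feature layout, learning rate and event data are hypothetical.

```java
// Online logistic regression: the model is updated with each labelled event,
// e.g. a hypothetical "flight arrived, actual delay known" message from a topic.
class OnlineLogisticRegression {
    private final double[] weights;
    private double bias;
    private final double learningRate;

    OnlineLogisticRegression(int numFeatures, double learningRate) {
        this.weights = new double[numFeatures];
        this.learningRate = learningRate;
    }

    // Probability that the label is 1 (e.g. "flight delayed").
    double predict(double[] x) {
        double z = bias;
        for (int i = 0; i < weights.length; i++) z += weights[i] * x[i];
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // One SGD step on the log-loss for a single example (label is 0 or 1).
    void update(double[] x, int label) {
        double error = predict(x) - label; // derivative of log-loss w.r.t. z
        for (int i = 0; i < weights.length; i++) weights[i] -= learningRate * error * x[i];
        bias -= learningRate * error;
    }

    public static void main(String[] args) {
        OnlineLogisticRegression model = new OnlineLogisticRegression(1, 0.5);
        // Hypothetical event stream, replayed: feature = scaled departure hour.
        for (int i = 0; i < 200; i++) {
            model.update(new double[]{0.3}, 0); // morning flight, on time
            model.update(new double[]{0.9}, 1); // evening flight, delayed
        }
        System.out.printf("P(delay | morning) = %.2f%n", model.predict(new double[]{0.3}));
        System.out.printf("P(delay | evening) = %.2f%n", model.predict(new double[]{0.9}));
    }
}
```

Each application instance would hold its own copy of such a model; keeping copies on different instances consistent is a separate design problem.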
Caveats for Online Model Training
• Processes and infrastructure not ready
• Validation needed before production
• Slows down the system
• Only a few ML implementations supported
• Many use cases do not need it
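The caveat "validation needed before production" can be addressed by gating model updates: a candidate model (retrained in batch or online) replaces the serving model only if it does at least as well on held-out data. This sketch uses an `AtomicReference` so the scoring path always sees one complete model while a swap happens. The `Model` interface, the accuracy metric and the holdout data are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.ToDoubleFunction;

// Serves one model and swaps in a candidate only if it passes validation.
class ValidatedModelHolder {

    // Hypothetical model abstraction: returns P(delayed) for a feature vector.
    interface Model extends ToDoubleFunction<double[]> {}

    private final AtomicReference<Model> current;

    ValidatedModelHolder(Model initial) {
        this.current = new AtomicReference<>(initial);
    }

    // Hot path: called for every event, always reads one consistent model.
    double score(double[] features) {
        return current.get().applyAsDouble(features);
    }

    // Fraction of holdout examples classified correctly at a 0.5 threshold.
    static double accuracy(Model m, double[][] x, int[] y) {
        int correct = 0;
        for (int i = 0; i < y.length; i++) {
            int predicted = m.applyAsDouble(x[i]) >= 0.5 ? 1 : 0;
            if (predicted == y[i]) correct++;
        }
        return (double) correct / y.length;
    }

    // Promote the candidate only if it is at least as accurate as the serving model.
    // compareAndSet fails if another update replaced the model in the meantime.
    boolean tryPromote(Model candidate, double[][] holdoutX, int[] holdoutY) {
        Model serving = current.get();
        if (accuracy(candidate, holdoutX, holdoutY) >= accuracy(serving, holdoutX, holdoutY)) {
            return current.compareAndSet(serving, candidate);
        }
        return false;
    }

    public static void main(String[] args) {
        double[][] x = {{0.3}, {0.9}, {0.2}, {0.8}}; // hypothetical holdout features
        int[] y = {0, 1, 0, 1};                      // hypothetical holdout labels
        ValidatedModelHolder holder = new ValidatedModelHolder(f -> 0.0);
        System.out.println("promoted better model: " + holder.tryPromote(f -> f[0] > 0.6 ? 0.9 : 0.1, x, y));
        System.out.println("promoted worse model: " + holder.tryPromote(f -> f[0] > 0.6 ? 0.1 : 0.9, x, y));
    }
}
```

Accuracy at a fixed threshold is only the simplest possible gate; in practice the metric and the holdout set would follow the use case.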
Kai Waehner
Technology Evangelist
kontakt@kai-waehner.de
@KaiWaehner
www.kai-waehner.de
LinkedIn
Questions? Feedback?
Please contact me!


Kai Wähner, Technology Evangelist at Confluent: "Development of Scalable Machine Learning Microservices with Apache Kafka Streams and H2O.ai"
