Machine Learning on
Streaming Data
with Apache Kafka, Apache Beam, & TensorFlow
About Us
Mikhail Chrestkha
Machine Learning Specialist
Google Cloud
linkedin.com/in/mchrestkha
Stéphane Maarek
CEO & Kafka Instructor
DataCumulus
linkedin.com/in/stephanemaarek
Big Thanks to:
Julianne Cuneo
Big Data Specialist, Google Cloud
Kai Waehner
Technology Evangelist, Confluent
Agenda
1. Motivation
2. Architecture
3. Use Case Walk-Through w/ Demo
4. Summary
1 Motivation
Technology Landscape
Smart
Analytics
Streaming
InfoWorld’s 2019 Technology of the Year Award Winners:
● Apache Beam
● Apache Kafka
● Elastic Stack
● DataStax Enterprise
● Firebase
● Horovod
● H2O Driverless AI
● Keras
● Kubernetes
● LLVM
● .NET Core
● PyTorch
● Redis
● TensorFlow
● Visual Studio Code
● XGBoost
Cloud
?
https://www.globenewswire.com/news-release/2019/01/30/1707685/0/en/InfoWorld-Announces-2019-Technology-of-the-Year-Award-Winners.html
Data Ingestion
Data Analysis & Transformation
Trainer
Model Evaluation & Validation
Serving
Notebook
Orchestration
ML Framework
ML Platform
OSS → Managed Service
● Apache Kafka: event streaming platform → Confluent Cloud: monitoring, replication, data balancing
● Apache Beam: data processing pipelines, unified batch & streaming → Dataflow: automated resource management of workers
● TensorFlow: robust foundation for machine and deep learning → Cloud Machine Learning Engine:
○ Training: distributed training infrastructure that supports CPUs, GPUs, and TPUs
○ Serving: host models for batch & online prediction
2 Architecture
Reference Kafka ML Architecture
● Data pipelines are simplified
● Building analytic modules is decoupled from servicing them
● Usage of real time or batch as needed
● Analytic models can be deployed in a performant, scalable and mission-critical environment
Kai Waehner
Technology Evangelist, Confluent
https://www.confluent.io/blog/build-deploy-scalable-machine-learning-production-apache-kafka/
Leverage managed services to simplify & focus on code, not infrastructure
● Confluent Cloud (managed by Confluent): Kafka cluster with Topic 1 (raw transactions) and Topic 2 (predictions), a producer, and consumers
● Processing: Cloud Dataflow (Dataflow template)
● Data warehouse: BigQuery
● ML notebook (development / experimentation): KSQL, SQL, submit ML training jobs
● ML training: Cloud ML Engine
● ML serving: Cloud ML Engine (deploy ML model)
● Automate w/ Airflow
● Diagram legend: analytics, ML training & deployment path; ML serving path
3 Use Case Walk-Through
Kaggle Case Study
Fraud Detection of Credit Card Transactions
● Collect transaction data
● Analyze historical data
● Train model on historic sample
● Evaluate model based on precision & recall
● Predict fraud on new streaming data
492 fraud cases (0.172%) out of 284,807 transactions
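A minimal sketch of why the slide evaluates on precision & recall rather than accuracy. The transaction counts come from the slide; the `ConfusionCounts` type and the "never flags fraud" baseline are hypothetical, added only for illustration.

```python
from dataclasses import dataclass

TOTAL_TRANSACTIONS = 284_807  # from the slide
FRAUD_TRANSACTIONS = 492      # from the slide


@dataclass
class ConfusionCounts:
    """Hypothetical container for the four confusion-matrix cells."""
    true_positives: int
    false_positives: int
    false_negatives: int
    true_negatives: int


def precision(c: ConfusionCounts) -> float:
    # Share of flagged transactions that really were fraud.
    flagged = c.true_positives + c.false_positives
    return c.true_positives / flagged if flagged else 0.0


def recall(c: ConfusionCounts) -> float:
    # Share of real fraud cases that the model flagged.
    actual = c.true_positives + c.false_negatives
    return c.true_positives / actual if actual else 0.0


def accuracy(c: ConfusionCounts) -> float:
    total = (c.true_positives + c.false_positives
             + c.false_negatives + c.true_negatives)
    return (c.true_positives + c.true_negatives) / total


# Class imbalance: fraud is roughly 0.17% of all transactions.
fraud_rate = FRAUD_TRANSACTIONS / TOTAL_TRANSACTIONS

# Hypothetical baseline: a "model" that never flags anything as fraud.
never_flags = ConfusionCounts(
    true_positives=0,
    false_positives=0,
    false_negatives=FRAUD_TRANSACTIONS,
    true_negatives=TOTAL_TRANSACTIONS - FRAUD_TRANSACTIONS,
)
```

The baseline scores above 99.8% accuracy while catching no fraud at all (recall of 0), which is why accuracy alone is a poor metric on data this unbalanced.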
● Andrea Dal Pozzolo, Olivier Caelen, Reid A. Johnson and Gianluca Bontempi. Calibrating Probability with Undersampling for Unbalanced Classification. In
Symposium on Computational Intelligence and Data Mining (CIDM), IEEE, 2015
● Dal Pozzolo, Andrea; Caelen, Olivier; Le Borgne, Yann-Ael; Waterschoot, Serge; Bontempi, Gianluca. Learned lessons in credit card fraud detection from a
practitioner perspective, Expert systems with applications,41,10,4915-4928,2014, Pergamon
● Dal Pozzolo, Andrea; Boracchi, Giacomo; Caelen, Olivier; Alippi, Cesare; Bontempi, Gianluca. Credit card fraud detection: a realistic modeling and a novel learning
strategy, IEEE transactions on neural networks and learning systems,29,8,3784-3797,2018,IEEE
○ Dal Pozzolo, Andrea Adaptive Machine learning for credit card fraud detection ULB MLG PhD thesis (supervised by G. Bontempi)
● Carcillo, Fabrizio; Dal Pozzolo, Andrea; Le Borgne, Yann-Aël; Caelen, Olivier; Mazzer, Yannis; Bontempi, Gianluca. Scarff: a scalable framework for streaming
credit card fraud detection with Spark, Information fusion,41, 182-194,2018,Elsevier
● Carcillo, Fabrizio; Le Borgne, Yann-Aël; Caelen, Olivier; Bontempi, Gianluca. Streaming active learning strategies for real-life credit card fraud detection:
assessment and visualization, International Journal of Data Science and Analytics, 5,4,285-300,2018,Springer International Publishing
https://opendatacommons.org/licenses/dbcl/1.0/
DEMO 1 - 5 min
Sending our credit card data
Confluent Cloud, Creating a Topic, Python Script, Security
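The demo's Python script is not included in the slides. The following is only a sketch of what such a producer might do: turn CSV rows into JSON messages and hand them to a send function. The topic name, column names, and the `produce` callable are assumptions; a real client (for example the `produce` method of a confluent-kafka `Producer`, configured with the cluster's security settings) would take the place of the stand-in lambda.

```python
import csv
import io
import json
from typing import Callable, Iterable, Iterator

TOPIC = "transactions"  # hypothetical topic name


def csv_to_messages(csv_text: str) -> Iterator[bytes]:
    """Yield one UTF-8 JSON message per CSV row, keyed by the header names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        yield json.dumps(row).encode("utf-8")


def send_all(messages: Iterable[bytes],
             produce: Callable[[str, bytes], None],
             topic: str = TOPIC) -> int:
    """Pass every message to `produce(topic, value)` and return the count sent."""
    sent = 0
    for value in messages:
        produce(topic, value)
        sent += 1
    return sent


# Stand-in for a real Kafka client: collect messages in a list instead.
outbox = []
sample = "Time,V1,Amount,Class\n0,-1.36,149.62,0\n1,1.19,2.69,0\n"  # made-up rows
count = send_all(csv_to_messages(sample),
                 lambda topic, value: outbox.append((topic, value)))
```

Keeping the send function injectable means the serialization logic can be checked locally before any cluster credentials are involved.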
Kafka to BigQuery
Dataflow Template
Java Code
KafkaIO.<String, String>read() → BigQueryIO.writeTableRows()
Create a template for easy re-usability by an analyst
Images from https://beam.apache.org/documentation/pipelines/design-your-pipeline/
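The slide's pipeline is Java on Beam. As a language-neutral illustration of the step between the read and the write, here is a Python sketch of converting a Kafka message value into a typed row. The JSON message layout, the field names, and the list standing in for the BigQuery sink are assumptions, not the template's actual code.

```python
import json
from typing import Dict, Iterable, List, Union

Row = Dict[str, Union[int, float]]


def to_table_row(kafka_value: str) -> Row:
    """Parse one JSON message into a row with numeric types.

    Assumes every field is numeric and that `Class` is the integer label.
    """
    record = json.loads(kafka_value)
    row: Row = {}
    for name, raw in record.items():
        row[name] = int(raw) if name == "Class" else float(raw)
    return row


def run_pipeline(values: Iterable[str], sink: List[Row]) -> None:
    """Read → transform → write, with a plain list standing in for the sink."""
    for value in values:
        sink.append(to_table_row(value))


rows: List[Row] = []
run_pipeline(['{"Time": "0", "Amount": "149.62", "Class": "0"}'], rows)
```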
Explore data & train ML model
● from ksql import KSQLAPI: query directly from topic
● %%bigquery: query petabytes of data
● gcloud ml-engine jobs submit training: submit ML training job
DEMO 2 - 5 min
Dataflow template & job
Jupyter: KSQL, BQML, TensorFlow CMLE job
Send Predictions back to Kafka
Java Code
● KafkaIO.<String, String>read() → request / response with the hosted ML model on Cloud Machine Learning Engine → KafkaIO.<String, String>write()
● Train model → publish models
Image from https://beam.apache.org/documentation/pipelines/design-your-pipeline/
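A sketch of the scoring step in this path, written in Python rather than the demo's Java. It wraps the features in an `{"instances": [...]}` request body, reads the score from the `"predictions"` list in the response, and builds the message to write back to Kafka. The feature names, the `fraud_score` field, and `fake_predict` are hypothetical; the shape of each prediction depends on the deployed model's output signature.

```python
import json
from typing import Callable, Dict, List


def build_request(features: List[float]) -> Dict:
    """Wrap one feature vector in an online-prediction request body."""
    return {"instances": [features]}


def score_message(kafka_value: str,
                  predict: Callable[[Dict], Dict],
                  feature_names: List[str]) -> str:
    """Score one transaction and return the enriched JSON message."""
    record = json.loads(kafka_value)
    features = [float(record[name]) for name in feature_names]
    response = predict(build_request(features))
    # Assumes the model returns one scalar score per instance.
    record["fraud_score"] = response["predictions"][0]
    return json.dumps(record)


def fake_predict(request: Dict) -> Dict:
    """Stand-in for the hosted model: flags any amount above 1000."""
    return {"predictions": [1.0 if instance[-1] > 1000 else 0.0
                            for instance in request["instances"]]}


scored = score_message('{"V1": "-1.36", "Amount": "2500.0"}',
                       fake_predict, ["V1", "Amount"])
```

In the real pipeline `predict` would issue the HTTP request to the deployed model, and the returned string would be the value written to the predictions topic.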
DEMO 3 - 5 min
(1) Deploy model as an endpoint
(2) Prediction sent to Kafka topic to be consumed
(3) Track models & monitor predictions in CMLE UI
Futuristic Architecture: Pure Kafka-based ML
Resilient, highly available, sync & async
● Confluent Cloud (managed by Confluent): Topic 1 (raw transactions), Topic 2 (predictions), a producer, and consumers
● Kafka Streams ML application: training, serving, model state
● Internal topic: ML model (compacted?)
● Synchronous application: Interactive Query API, gRPC or REST API
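The slide leaves open whether the internal model topic should be compacted. As a rough illustration of what compaction would offer, the sketch below mimics its semantics: only the latest value per key survives, so a restarting application could recover the current model by replaying the topic. The key name and byte payloads are made up, and this is a toy model of the behaviour, not Kafka code.

```python
from typing import Dict, Iterable, Optional, Tuple


def compact(log: Iterable[Tuple[str, Optional[bytes]]]) -> Dict[str, bytes]:
    """Return the latest value per key; a None value acts as a tombstone."""
    latest: Dict[str, bytes] = {}
    for key, value in log:
        if value is None:
            latest.pop(key, None)  # tombstone: drop the key entirely
        else:
            latest[key] = value    # newer record replaces the older one
    return latest


# Hypothetical model topic: two versions of the same model were published.
model_log = [("fraud-model", b"weights-v1"), ("fraud-model", b"weights-v2")]
current = compact(model_log)
```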
4 Summary
Summary
● Kafka + Beam + TensorFlow = great foundation for the future
○ Batch today → streaming tomorrow
○ Small data today → big data tomorrow
○ Shallow learning today → deep learning tomorrow
● Make data & ML easier for yourself by using managed services
● Build for many other use cases:
○ Predictive maintenance
○ Logistics routing
○ Image search & recommendations in e-commerce
Smart Analytics · Streaming · Cloud
Talk to Google Cloud: Booth K1
Learn More
● Blog: Enabling connected transformation with Apache Kafka and TensorFlow on Google Cloud Platform: bit.ly/2CHERol
● KafkaIO on Beam: bit.ly/2YwL3Jc
● KafkaToBigQuery Dataflow Template Example: bit.ly/2HQqVN0
Contact us
linkedin.com/in/mchrestkha
linkedin.com/in/stephanemaarek
Questions

Machine Learning on Streaming Data using Kafka, Beam, and TensorFlow (Mikhail Chrestkha, Google Cloud; Stephane Maarek, DataCumulus) Kafka Summit NYC 2019
