Building real-time data processing and model
inferencing platform with Kafka Streams
Navinder Pal Singh Brar
Confidential and Proprietary
ML @ Walmart
• Personalization
• Fraud detection
• Display advertisement
• Email advertisement
• Omnichannel reorder
• Autosuggest for out-of-stock products
• Delivery optimization
• Smart pricing
• Inventory forecasting
• Voice commerce
Data Science Model Life Cycle
1. Business understanding
2. Data collection
3. Data preparation
4. Exploratory data analysis
5. Modelling
6. Model evaluation
7. Model deployment
• 50%+ of the time is spent on data collection and cleaning
• The remaining 30-40% goes into making the model production ready with the help of developers
Courtesy: http://www.oogazone.com, https://www.vectorstock.com
Mission Statement
Build a platform to process events, derive inferences and serve knowledge
• Reliable, highly available and scalable
• High throughput and low latency
• Universal feature store across models
• Pluggable design to onboard new models
• Reduce dev-to-prod time
Customer Backbone - CBB
Distributed streams processing platform built on Kafka Streams
Data scientists can bring their trained models and host them on top of CBB, which takes care of
• Data Ingestion
• Data Transformation
• Feature Extraction
• Model Inferencing/Scoring
• Post Processing
Motto: Depth, Freshness & Reach
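The five stages that CBB takes care of can be pictured as a chain of functions applied to each event. The sketch below is a minimal, library-free illustration; it does not use Kafka Streams, and every name in it (`ingest`, `transform`, `extract_features`, `score`, `post_process`, the event fields and the toy linear model) is a hypothetical stand-in rather than part of CBB.

```python
import json
from typing import Dict, List

Event = Dict[str, object]


def ingest(raw: str) -> Event:
    """Data ingestion: parse a raw JSON record into an event."""
    return json.loads(raw)


def transform(event: Event) -> Event:
    """Data transformation: normalise field values."""
    out = dict(event)
    out["action"] = str(out["action"]).strip().lower()
    return out


def extract_features(event: Event) -> List[float]:
    """Feature extraction: turn the event into a numeric vector."""
    is_purchase = 1.0 if event["action"] == "purchase" else 0.0
    return [is_purchase, float(event["amount"])]


def score(features: List[float]) -> float:
    """Model inferencing: a toy linear model standing in for a tenant model."""
    weights = [0.5, 0.01]  # assumed weights, for illustration only
    return sum(w * x for w, x in zip(weights, features))


def post_process(event: Event, value: float) -> Dict[str, object]:
    """Post processing: attach the rounded score to the customer id."""
    return {"customer": event["customer"], "score": round(value, 2)}


def run_pipeline(raw: str) -> Dict[str, object]:
    """Run all five stages on one raw record."""
    event = transform(ingest(raw))
    return post_process(event, score(extract_features(event)))


result = run_pipeline('{"customer": "c1", "action": " Purchase ", "amount": 20}')
```

Keeping each stage a plain function means a tenant model only has to supply the scoring step, while the platform owns the rest.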
CBB Data Pipeline
[Figure: Kafka (Partition: 0, Partition: 1), Kafka Streams, CBB Platform, CBB Internal Kafka, and tenant models (Recommendation, Personalization, Fraud Detection, ….)]
Why Streams?
• Simple
• Library, not a framework
• Embedded DB
• Interactive Queries
• Highly scalable
• DSL/Low Level APIs
• At-least-once/exactly-once guarantees
Other alternatives
• Apache Samza
• Apache Spark
• Apache Flink
• Dynomite
Multitenancy: the challenges
1. Sequential execution of tenant models
2. Any corrupt model can bring down the JVM
3. Any model upgrade requires a JVM restart
4. Client isolation
CBB Internals
KIP-408: Add Asynchronous Processing To Kafka Streams
[Figure: "Before" and "After" panels showing the CBB Processor, the CBB Store, Model A, Model B and Model C, and their stores (A store, B store, C store)]
Multitenancy: the solution
• Process events and update CBB stores
• Different clients can pull events at their own pace
• Appropriate sharing and isolation
Data Model
[Figure: Tenant Stores, Hop-On Store, Platform Store, LEAF Store, Sequence Store]
LEAF Store
1. Linkages – customer graph
2. Events – customer interactions
3. Address – addressable entities
4. Facets – customer features
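One way to picture the four LEAF components is as a per-customer record. The dataclass below is only an illustrative sketch: the `LeafRecord` name, the field types, the `event_count` facet and the sample values are all assumptions, not the actual CBB schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class LeafRecord:
    """Hypothetical per-customer record mirroring the four LEAF parts."""
    customer_id: str
    # Linkages: ids linked to this customer in the customer graph
    linkages: Set[str] = field(default_factory=set)
    # Events: customer interactions, oldest first
    events: List[Dict[str, str]] = field(default_factory=list)
    # Address: addressable entities, keyed by kind
    addresses: Dict[str, str] = field(default_factory=dict)
    # Facets: customer features derived from the data above
    facets: Dict[str, float] = field(default_factory=dict)

    def record_event(self, event: Dict[str, str]) -> None:
        """Append an interaction and refresh a simple derived feature."""
        self.events.append(event)
        self.facets["event_count"] = float(len(self.events))


rec = LeafRecord("id1")
rec.linkages.add("id2")
rec.addresses["email"] = "user@example.com"
rec.record_event({"type": "view", "item": "sku-1"})
rec.record_event({"type": "purchase", "item": "sku-1"})
```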
Sequence Store
[Figure: a sequence of slots numbered 0, 1, 2, … 11, …, with the labels "CBB Processor writes here", "Model A (offset=3)" and "Model B (offset=8)"]
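The idea of a single writer and several readers, each holding its own offset, can be sketched with an append-only list. This is a simplified in-memory illustration, not the CBB implementation: `SequenceStore`, `append`, `register` and `poll` are hypothetical names, and only the starting offsets 3 and 8 come from the figure.

```python
from typing import Dict, List


class SequenceStore:
    """Append-only log in which every reader tracks its own offset."""

    def __init__(self) -> None:
        self._log: List[str] = []
        self._offsets: Dict[str, int] = {}

    def append(self, event: str) -> int:
        """Writer side: add an event at the tail and return its position."""
        self._log.append(event)
        return len(self._log) - 1

    def register(self, reader: str, offset: int = 0) -> None:
        """Start tracking a reader at the given offset."""
        self._offsets[reader] = offset

    def poll(self, reader: str, max_events: int) -> List[str]:
        """Reader side: return up to max_events and advance only this reader."""
        start = self._offsets[reader]
        batch = self._log[start:start + max_events]
        self._offsets[reader] = start + len(batch)
        return batch

    def offset(self, reader: str) -> int:
        """Current offset of one reader."""
        return self._offsets[reader]


store = SequenceStore()
for i in range(12):  # the processor has written slots 0..11
    store.append(f"event-{i}")

store.register("model_a", offset=3)
store.register("model_b", offset=8)

batch_a = store.poll("model_a", max_events=2)   # model A reads two events
batch_b = store.poll("model_b", max_events=10)  # model B drains to the tail
```

Because a poll moves only the caller's offset, a slow model does not hold back a fast one, which is the "own pace" property listed under the multitenancy solution.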
Model Inferencing
Problem
Data scientists use various machine learning libraries, e.g. Spark ML, Scikit-learn and TensorFlow, and all of them need to be supported in production
Solution
MLeap Runtime
• Provides production-level scoring infrastructure independent of the core libraries
• Executes Spark ML pipelines without the dependency on the Spark context
• Executes Scikit-learn pipelines without the dependency on numpy, pandas
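The general idea of scoring without the training library can be illustrated with a pipeline that is exported as plain data and executed by a small runtime. The sketch below does not use MLeap and does not reproduce its bundle format or API; the JSON layout, the op names and the parameter values are all assumptions made for illustration.

```python
import json
import math
from typing import Dict, List

# A hypothetical serialised pipeline: a scaler followed by logistic regression.
BUNDLE = json.dumps({
    "stages": [
        {"op": "standard_scaler", "mean": [1.0, 10.0], "std": [1.0, 5.0]},
        {"op": "logistic_regression", "coef": [2.0, -1.0], "intercept": 0.0},
    ]
})


def load_pipeline(bundle: str) -> List[Dict]:
    """Parse the serialised stages; no training library is imported."""
    return json.loads(bundle)["stages"]


def run_pipeline(stages: List[Dict], row: List[float]) -> float:
    """Apply each stage in order and return the final score."""
    values = list(row)
    for stage in stages:
        if stage["op"] == "standard_scaler":
            values = [(v - m) / s
                      for v, m, s in zip(values, stage["mean"], stage["std"])]
        elif stage["op"] == "logistic_regression":
            z = stage["intercept"] + sum(
                c * v for c, v in zip(stage["coef"], values))
            values = [1.0 / (1.0 + math.exp(-z))]
        else:
            raise ValueError(f"unknown op: {stage['op']}")
    return values[0]


stages = load_pipeline(BUNDLE)
probability = run_pipeline(stages, [1.0, 10.0])
```

The serving side needs only the exported parameters and a few arithmetic operations, so it carries none of the dependencies used at training time.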
[Figure: Global Datastores. An App Cluster of four VMs (VM 1, VM 2, VM 3, VM 4) and a Global Topic]
Global Datastores
Problem
• Global data, e.g. product catalog
• One copy of the global store per JVM
• Processing global topics doesn't work with huge data
• Global data is required before an active task moves to a VM
Solution
Create global stores in a different Kafka Streams app and bootstrap each JVM on update
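The "build elsewhere, bootstrap on update" approach can be sketched as a publisher of versioned snapshots plus instances that reload only when the version changes. This is a toy in-memory model under stated assumptions: `SnapshotPublisher`, `AppInstance`, the version counter and the sample catalog are hypothetical, and nothing here reflects how CBB actually ships the data between processes.

```python
from typing import Dict, Optional


class SnapshotPublisher:
    """Stands in for the separate app that builds the global store."""

    def __init__(self) -> None:
        self.version = 0
        self.snapshot: Dict[str, str] = {}

    def publish(self, data: Dict[str, str]) -> None:
        """Replace the snapshot and bump its version."""
        self.snapshot = dict(data)
        self.version += 1


class AppInstance:
    """Stands in for one JVM holding a local copy of the global data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.loaded_version = 0
        self.catalog: Dict[str, str] = {}

    def bootstrap_if_stale(self, publisher: SnapshotPublisher) -> bool:
        """Copy the snapshot only when a newer version exists."""
        if publisher.version == self.loaded_version:
            return False
        self.catalog = dict(publisher.snapshot)
        self.loaded_version = publisher.version
        return True

    def lookup(self, product_id: str) -> Optional[str]:
        """Read from the local copy."""
        return self.catalog.get(product_id)


publisher = SnapshotPublisher()
publisher.publish({"sku-1": "Milk"})

vm1, vm2 = AppInstance("vm1"), AppInstance("vm2")
first = [vm.bootstrap_if_stale(publisher) for vm in (vm1, vm2)]  # both load v1
second = vm1.bootstrap_if_stale(publisher)                        # nothing new

publisher.publish({"sku-1": "Milk", "sku-2": "Bread"})
third = vm1.bootstrap_if_stale(publisher)                         # picks up v2
```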
Walmart Scale
• 11,000 stores in 27 countries
• 100 million weekly customers in stores
• 100 million unique monthly visitors @ Walmart.com
• 55 banners including Jet.com, Hayneedle
Source: https://corporate.walmart.com/our-story/our-business
Identity Graph Processing
Problem: Link the data of different IDs together when they are identified as belonging to the same person.
Solution: Real-time identity graph conflation. It aims to provide a coherent view of a customer by building an identity graph uniting all customer identities across channels and across Walmart subsidiaries.
Customer Identity Graph
Graph processing co-locates the data of two or more customer identities linked to each other on the same physical node.
[Figure: identities id1 to id6 shown on Node A and Node B on the left; on the right ("="), all six appear linked in one graph on Node A]
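Conflation and co-location can be sketched with a union-find structure: linked ids share one representative, and routing by that representative places the whole group on one node. Union-find is a technique chosen here for illustration; the deck does not say which algorithm CBB uses. The `IdentityGraph` class, the byte-sum routing and the sample links are likewise assumptions; only the node names and ids come from the figure.

```python
from typing import Dict, List


class IdentityGraph:
    """Union-find over customer ids; each group has one representative."""

    def __init__(self) -> None:
        self._parent: Dict[str, str] = {}

    def find(self, ident: str) -> str:
        """Return the representative id, compressing the path on the way."""
        self._parent.setdefault(ident, ident)
        root = ident
        while self._parent[root] != root:
            root = self._parent[root]
        while self._parent[ident] != root:
            nxt = self._parent[ident]
            self._parent[ident] = root
            ident = nxt
        return root

    def link(self, a: str, b: str) -> None:
        """Record that two ids belong to the same person."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            # Keep the lexicographically smaller id as representative.
            keep, drop = sorted((root_a, root_b))
            self._parent[drop] = keep

    def node_for(self, ident: str, nodes: List[str]) -> str:
        """Route by representative so linked ids land on the same node."""
        root = self.find(ident)
        return nodes[sum(root.encode()) % len(nodes)]


graph = IdentityGraph()
links = [("id1", "id2"), ("id3", "id4"), ("id5", "id6"),
         ("id2", "id5"), ("id4", "id6")]
for a, b in links:
    graph.link(a, b)

nodes = ["Node A", "Node B"]
ids = ["id1", "id2", "id3", "id4", "id5", "id6"]
placement = {i: graph.node_for(i, nodes) for i in ids}
```

A deterministic byte sum is used instead of Python's built-in `hash`, which is randomised per process for strings and would therefore route differently on each run.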
Benchmarks
Kafka Cluster : 400 cores
Kafka Streams : 800 cores
Benefits
• Money: minimal duplication
• Time: low latency
• Effort: reduces maintenance overhead
Courtesy: https://www.vectorstock.com
Thank You!
navinderpalsinghbrar
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
Vaibhav Gupta BAML: AI work flows without Hallucinations
Vaibhav Gupta BAML: AI work flows without HallucinationsVaibhav Gupta BAML: AI work flows without Hallucinations
Vaibhav Gupta BAML: AI work flows without Hallucinations
john409870
 

Real-time data processing and model inferencing platform with Kafka Streams (Navinder Singh - Walmart)

  • 1. Building real-time data processing and model inferencing platform with Kafka Streams. Navinder Pal Singh Brar
  • 2. ML @ Walmart: Personalization; Fraud detection; Display advertisement; Email advertisement; Omnichannel reorder; Autosuggest for out of stock products; Delivery Optimization; Smart pricing; Inventory Forecasting; Voice Commerce
  • 3. Data Science Model Life cycle: Business understanding, Data Collection, Data Preparation, Exploratory Data Analysis, Modelling, Model Evaluation, Model Deployment. 50%+ of the time is spent on data collection and cleaning activity. The remaining 30-40% goes into making it production ready with the help of developers. Courtesy: http://www.oogazone.com, https://www.vectorstock.com
  • 4. Mission Statement: Build a platform to process events, derive inferences and serve knowledge. Reliable, highly available and scalable. High throughput and low latency. Universal feature store across models. Pluggable design to onboard new models. Reduce dev to prod time.
  • 5. Customer Backbone - CBB: Distributed streams processing platform built on Kafka Streams. Data scientists can bring their trained models and host them on top of CBB, which takes care of • Data Ingestion • Data Transformation • Feature Extraction • Model Inferencing/Scoring • Post Processing. Motto: Depth, Freshness & Reach
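The five stages listed on slide 5 can be pictured as a chain of small functions. The following is only a minimal sketch in plain Python, not CBB code: the record format, the function names, the toy feature vector and the fixed linear weights are all assumptions made for illustration, and a real deployment would run these steps inside a Kafka Streams topology.

```python
from typing import Dict, List

Event = Dict[str, object]


def ingest(raw: str) -> Event:
    """Data ingestion: parse a 'customer_id,action,amount' record (hypothetical format)."""
    customer_id, action, amount = raw.split(",")
    return {"customer_id": customer_id, "action": action, "amount": float(amount)}


def transform(event: Event) -> Event:
    """Data transformation: normalise the action name."""
    return {**event, "action": str(event["action"]).strip().lower()}


def extract_features(event: Event) -> List[float]:
    """Feature extraction: a toy two-element feature vector."""
    is_purchase = 1.0 if event["action"] == "purchase" else 0.0
    return [is_purchase, float(event["amount"])]


def score(features: List[float]) -> float:
    """Model scoring: stand-in for a trained model (fixed, made-up linear weights)."""
    weights = [0.5, 0.01]
    return sum(w * x for w, x in zip(weights, features))


def post_process(event: Event, model_score: float) -> Dict[str, object]:
    """Post processing: attach the score and a thresholded flag to the customer id."""
    return {
        "customer_id": event["customer_id"],
        "score": model_score,
        "flag": model_score > 1.0,
    }


def run_pipeline(raw: str) -> Dict[str, object]:
    """Run one raw record through all five stages in order."""
    event = transform(ingest(raw))
    return post_process(event, score(extract_features(event)))
```

Keeping each stage as a separate function mirrors the pluggable design named in the mission statement: a tenant would only replace `extract_features` and `score`, while ingestion and post processing stay shared.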
  • 6. CBB Data Pipeline (diagram labels): CBB Platform, Kafka Streams, Recommendation, Personalization, Fraud Detection, …., CBB Internal Kafka, Partition: 0, Partition: 1
  • 7. Why Streams? Simple; Library, not a framework; Embedded DB; Interactive Queries; Highly scalable; DSL/Low Level APIs; At least/Exactly once guarantees. Other alternatives: Apache Samza, Apache Spark, Apache Flink, Dynomite
  • 8. Multitenancy: the challenges. 1. Sequential execution of tenant models 2. Any corrupt model can bring down the JVM 3. Any model upgrade requires JVM restart 4. Client Isolation
  • 9. CBB Data Pipeline (diagram labels): CBB Platform, Kafka Streams, Recommendation, Personalization, Fraud Detection, …., CBB Internal Kafka
  • 10. CBB Internals (diagram labels): CBB Processor; CBB Store. Before: Model A, Model B, Model C; A store, B store, C store. After: Model A, Model B, Model C; A store, B store, C store. Reference: KIP-408: Add Asynchronous Processing To Kafka Streams
  • 11. Multitenancy: the solution. Process events and update CBB stores. Different clients can pull events at their own pace. Appropriate sharing and isolation.
  • 12. Data Model. Stores: Tenant Stores, Hop-On Store, Platform Store, Sequence Store. LEAF Store: 1. Linkages – customer graph 2. Events – customer interactions 3. Address – addressable entities 4. Facets – customer features
  • 13. Sequence Store (diagram labels): entries 0 1 2 3 4 5 6 7 8 9 10 11 …; CBB Processor writes here; Model A (offset=3); Model B (offset=8)
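The sequence store on slide 13, together with the "pull events at own pace" point on slide 11, reads like an append-only log with one read offset per tenant model. Below is a minimal in-memory sketch of that idea in Python. The class name, the method names, and the convention that an offset marks the next entry to read are assumptions for illustration; the slides do not describe the actual store API.

```python
from typing import Dict, List


class SequenceStore:
    """Append-only event log with an independent read offset per model (hypothetical)."""

    def __init__(self) -> None:
        self._log: List[object] = []
        self._offsets: Dict[str, int] = {}

    def append(self, event: object) -> int:
        """The processor writes at the tail; returns the sequence number assigned."""
        self._log.append(event)
        return len(self._log) - 1

    def register(self, model: str, offset: int = 0) -> None:
        """Start tracking a model, beginning at the given offset."""
        self._offsets[model] = offset

    def poll(self, model: str, max_events: int = 10) -> List[object]:
        """Return up to max_events unread entries and advance only this model's offset."""
        start = self._offsets[model]
        batch = self._log[start:start + max_events]
        self._offsets[model] = start + len(batch)
        return batch

    def lag(self, model: str) -> int:
        """Number of entries the model has not read yet."""
        return len(self._log) - self._offsets[model]


# Mirror the slide: entries 0..11, Model A at offset 3, Model B at offset 8.
store = SequenceStore()
for i in range(12):
    store.append(f"event-{i}")
store.register("model_a", offset=3)
store.register("model_b", offset=8)
```

Because each model only moves its own offset, a slow or failing model falls behind without blocking the processor or the other tenants.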
  • 14. Model Inferencing. Problem: Data scientists use various machine learning libraries, e.g. Spark ML, Scikit-learn, TensorFlow, and all of them need to be supported in production. Solution: MLeap Runtime. Provides production-level scoring infrastructure independent of the core libraries. Executes Spark ML pipelines without a dependency on the Spark context. Executes Scikit-learn pipelines without a dependency on numpy, pandas.
  • 15. App Cluster (diagram labels): VM 1, VM 2, VM 3, VM 4; Global Topic; Global Datastores
  • 16. Global Datastores. Problem: Global data, e.g. product catalog. One copy of global store per JVM. Processing global topics doesn't work with huge data. Global data is required before an active task moves to a VM. Solution: Create global stores in a different Kafka Streams app and bootstrap each JVM on update.
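One way to read the solution on slide 16 is that a separate application owns the global data and every processing instance copies the latest version before it takes on work. The sketch below models that reading in plain Python. The version counter, the snapshot call and the readiness check are assumptions, not details from the talk, and the classes stand in for what would be separate Kafka Streams apps and JVMs.

```python
from typing import Dict, Optional, Tuple


class GlobalStoreApp:
    """Stand-in for the separate app that builds the global store (hypothetical)."""

    def __init__(self) -> None:
        self.version = 0
        self._data: Dict[str, str] = {}

    def update(self, key: str, value: str) -> None:
        """Apply one change to the global data, e.g. a product catalog entry."""
        self._data[key] = value
        self.version += 1

    def snapshot(self) -> Tuple[int, Dict[str, str]]:
        """Return the current version together with a copy of the data."""
        return self.version, dict(self._data)


class AppInstance:
    """Stand-in for one JVM in the app cluster (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.version: Optional[int] = None
        self.catalog: Dict[str, str] = {}

    def bootstrap(self, source: GlobalStoreApp) -> None:
        """Copy the latest global data into this instance."""
        self.version, self.catalog = source.snapshot()

    def ready_for_tasks(self, source: GlobalStoreApp) -> bool:
        """An instance should accept an active task only once its copy is current."""
        return self.version == source.version
```

The readiness check captures the constraint from the slide that global data must be in place before an active task moves to a VM.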
  • 17. Walmart Scale: 11000 stores in 27 countries. 100 million weekly customers in stores. 100 million unique monthly visitors @ Walmart.com. 55 banners including Jet.com, Hayneedle. Source: https://corporate.walmart.com/our-story/our-business
  • 18. Identity Graph Processing. Problem: Link the data of different IDs together when they are identified as the same person. Solution: Real-time Identity Graph Conflation. Aims to provide a coherent view of a customer by building an identity graph uniting all customer identities across channels and across Walmart subsidiaries.
  • 19. Customer Identity Graph: Graph processing co-locates the data of two or more customer identities linked to each other on the same physical node. Diagram labels: id1, id2, id3, id4, id5, id6; Node A, Node B
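Identity conflation as described on slides 18 and 19 can be illustrated with a union-find (disjoint-set) structure: linking two ids merges their groups, and routing by the group's representative id sends every linked identity to the same node. This is a sketch of the general technique only, not of Walmart's implementation. The class, the "smallest id becomes representative" rule and the CRC32-based node choice are assumptions, and moving the already stored data between nodes is not shown.

```python
import zlib
from typing import Dict


class IdentityGraph:
    """Union-find over customer ids; all linked ids share one representative."""

    def __init__(self) -> None:
        self._parent: Dict[str, str] = {}

    def find(self, identity: str) -> str:
        """Return the representative id of the group this identity belongs to."""
        self._parent.setdefault(identity, identity)
        root = identity
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every id on the walked path directly at the root.
        while self._parent[identity] != root:
            next_identity = self._parent[identity]
            self._parent[identity] = root
            identity = next_identity
        return root

    def link(self, a: str, b: str) -> None:
        """Record that two ids belong to the same person."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return
        # Deterministic choice: the lexicographically smaller id becomes the root.
        if root_b < root_a:
            root_a, root_b = root_b, root_a
        self._parent[root_b] = root_a

    def node_for(self, identity: str, num_nodes: int) -> int:
        """Route by representative id so linked identities land on the same node."""
        return zlib.crc32(self.find(identity).encode("utf-8")) % num_nodes


graph = IdentityGraph()
graph.link("id1", "id2")
graph.link("id3", "id4")
graph.link("id2", "id3")  # merges the two groups into one customer
```

After the three links, `id1` through `id4` share a representative and therefore a node, while an unlinked id such as `id5` is routed on its own.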
  • 20. Benchmarks. Kafka Cluster: 400 cores. Kafka Streams: 800 cores.
  • 21. Benefits: Money, Time, Effort. Minimal duplication. Low latency. Reduces maintenance overhead. Courtesy: https://www.vectorstock.com