Personalization @ Nearbuy
Ankit Kohli
About Me
- Data Scientist at Nearbuy
- Creating a world of personalization for Nearbuy’s customers
AGENDA
- Creating Data Pipeline
  - Kafka (Real-Time Click Stream)
  - HBase (where data sits)
  - Data Transformation
- ML Pipeline
  - Spark (Data Feature Extraction)
  - ML Algos (how to use Spark and use cases of each algo)
  - ALS
Data Pipeline @ Nearbuy
[Diagram components: Schemaless DB, Click Stream, Kafka, Spark Streaming, HBase, Spark ML, Recommendations, Real-Time Analytics]
KAFKA EVENTS
Deal view Event
{
  "customerId": "XXXXXXXXX",
  "timeStamp": "140983388484",
  "dealId": "YYYYY",
  "source": "APP",
  "os": "android"
…….
}
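
As an illustration of how a consumer might decode such an event, here is a minimal sketch in plain Python. The sample payload mirrors the fields on the slide; the choice of required fields, the function name `parse_deal_view`, and the assumption that `timeStamp` is a numeric string are all hypothetical and not taken from the deck.

```python
import json

# Hypothetical sample payload using only the fields shown on the slide.
RAW_EVENT = (
    '{"customerId": "XXXXXXXXX", "timeStamp": "140983388484", '
    '"dealId": "YYYYY", "source": "APP", "os": "android"}'
)

# Assumption: these three fields are mandatory for a deal view event.
REQUIRED_FIELDS = ("customerId", "timeStamp", "dealId")


def parse_deal_view(raw):
    """Decode one deal view event and convert its timestamp to an int."""
    event = json.loads(raw)
    missing = [name for name in REQUIRED_FIELDS if name not in event]
    if missing:
        raise ValueError("missing fields: %s" % missing)
    # Assumption: timeStamp arrives as a numeric string.
    event["timeStamp"] = int(event["timeStamp"])
    return event


event = parse_deal_view(RAW_EVENT)
print(event["customerId"], event["dealId"], event["timeStamp"])
```

Rejecting malformed events at the point of ingestion keeps bad records out of the downstream store, though a real job might prefer to route them to a separate topic instead of raising.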
Spark Streaming
- How to implement Spark Streaming and set up jobs that run 24x7 to ingest all
clickstream data
- Implementing sessionization of customer activity on Nearbuy
- Transforming the data and storing it in HBase
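
The sessionization step can be sketched in plain Python as follows. This is a batch illustration of the idea, not the Spark Streaming job itself. The 30-minute inactivity gap is an assumption (a common web-analytics convention, not stated on the slide), and the function name `sessionize` and the sample clicks are hypothetical.

```python
from collections import defaultdict

# Assumption: a session ends after 30 minutes without activity.
SESSION_GAP_MS = 30 * 60 * 1000


def sessionize(events, gap_ms=SESSION_GAP_MS):
    """Group (customer_id, timestamp_ms) pairs into per-customer sessions.

    Returns {customer_id: [[ts, ...], ...]}, one inner list per session.
    """
    by_customer = defaultdict(list)
    for customer_id, ts in events:
        by_customer[customer_id].append(ts)

    sessions = {}
    for customer_id, stamps in by_customer.items():
        stamps.sort()
        current = [stamps[0]]
        result = [current]
        for ts in stamps[1:]:
            if ts - current[-1] > gap_ms:
                # Gap exceeded: start a new session.
                current = [ts]
                result.append(current)
            else:
                current.append(ts)
        sessions[customer_id] = result
    return sessions


# Hypothetical clicks: customer c1 returns after a 45-minute pause.
clicks = [
    ("c1", 0),
    ("c1", 5 * 60 * 1000),
    ("c1", 50 * 60 * 1000),
    ("c2", 1000),
]
sessions = sessionize(clicks)
print(sessions)
```

In a streaming job the same rule has to cope with sessions that span micro-batches, so per-customer state (the last seen timestamp) must be carried between batches.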
Spark ML Pipeline
Data Feature Extraction - Categorical Data
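
One common way to turn categorical fields such as "source" or "os" into numeric features is one-hot encoding. The sketch below shows the idea in plain Python; it is not necessarily the encoding used at Nearbuy, and the helper names and the sample values other than "APP" and "android" are hypothetical. Spark ML provides StringIndexer and OneHotEncoder for the same purpose.

```python
def fit_categories(rows, column):
    """Collect the sorted distinct values of one categorical column."""
    return sorted({row[column] for row in rows})


def one_hot(value, categories):
    """Return a 0/1 vector with a single 1 at the position of `value`.

    A value not seen during fitting maps to an all-zero vector.
    """
    return [1 if value == category else 0 for category in categories]


# Hypothetical rows; only "APP" and "android" appear on the slides.
rows = [
    {"source": "APP", "os": "android"},
    {"source": "APP", "os": "ios"},
    {"source": "WEB", "os": "windows"},
]

os_categories = fit_categories(rows, "os")
encoded = [one_hot(row["os"], os_categories) for row in rows]
print(os_categories)
print(encoded)
```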
Spark ML algorithms covered:
- Collaborative Filtering (Implicit & Explicit Feedback)
- K-Means
- Linear Regression
Each algorithm is explained in depth, along with its use cases and how to
implement it using Spark.
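
For the implicit-feedback case, a widely used formulation treats any observed interaction as a preference of 1 and weights it by a confidence of 1 + alpha * count. The sketch below derives those values from deal views in plain Python. The deck does not say which formulation Nearbuy used, so treat this as an assumption; the alpha value, the function name and the sample views are illustrative. Spark's ALS exposes this mode through its implicitPrefs and alpha parameters.

```python
from collections import Counter

# Assumption: illustrative confidence scaling factor.
ALPHA = 1.0


def implicit_feedback(views, alpha=ALPHA):
    """Map (customer_id, deal_id) view events to (preference, confidence).

    preference is 1 for every observed pair;
    confidence is 1 + alpha * number_of_views.
    """
    counts = Counter(views)
    return {pair: (1, 1.0 + alpha * n) for pair, n in counts.items()}


# Hypothetical deal views: c1 viewed deal d1 twice, c2 viewed it once.
views = [("c1", "d1"), ("c1", "d1"), ("c2", "d1")]
feedback = implicit_feedback(views)
print(feedback)
```

Pairs that never occur are absent from the result; in the factorization they count as preference 0 with the baseline confidence of 1.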
Common Pitfalls
- Too much data
- Stream vs Batch Data
- Customer Sessionization
- Spark Cluster Mode Issues
Takeaways
- Personalization of an app can take many forms; our use case (e-commerce) is
one of them
- Data visualization and ML model selection
- Anyone looking to get started with data analytics will gain useful insight
- Anyone starting out with ML will learn how to use Spark
- Anyone already doing this will see how other companies are implementing
Spark for ML
- Spark ML best practices
