Data Science @ Instacart
Sharath Rao
Data Scientist / Manager
Search and Discovery
Collaborators: Angadh Singh and Shishir Prasad
v
The Instacart Value Proposition
Groceries from stores you love, delivered to your doorstep in as little as an hour
v
Customer Experience
Select a Store -> Shop for Groceries -> Checkout -> Select Delivery Time -> Delivered to Doorstep
v
Shopper Experience
Accept Order -> Find the Groceries -> Scan Barcode -> Out for Delivery -> Delivered to Doorstep
v
Four Sided Marketplace
Customers, Shoppers, Products (Advertisers), Stores (Retailers)
Search, Advertising, Shopping, Delivery, Customer Service, Inventory, Picking, Loyalty
v
Two topics today
A Recommendation System
for Discovery
Using Data Science for out of
stock mitigation
v
Online grocery vs Traditional e-commerce
[Figure: Online Grocery vs. Traditional e-commerce, compared across Week 1, Week 2 and Week 3]
v
Grocery Shopping in “Low Dimensional Space”
Search + Restock + Explore
v
Why personalization at Instacart
Your store vs. Everybody’s store
v
Repeat purchases increase LTV of recommendations
Today: $5.49
A year later: 1 + … + 100 purchases = $549
v
Different recommendation systems address different needs
v
Personalized Top N recommendations
Promote broad-based discovery
in a dynamic catalog
Including from stores customers
may have never shopped
v
Run out of X?
Rank products by
repurchase probability
v
Personalized recommendations of new products
when customers seek out what is new out there
Also addresses product cold start problems
v
Replacement Product Recommendations
Mitigate adverse impact of
last-minute out of stocks
v
“Frequently bought with” Recommendations
Not necessarily
consumed together
Help customers shop for
complementary products
and try alternatives
Probably
consumed together
v
Personalized Top N Recommendations
v
Learning from feedback
Traditionally collaborative filtering used explicit feedback to predict ratings
There may still be bias in whether the user chooses to rate
Explicit Feedback Implicit Feedback
v
Learning from Explicit Feedback
• Explicit feedback may be more reliable but there is much less of it
• Less reliable if users rate based on aspirations instead of true preferences
vs
v
Implicit Feedback - trade-off quality and quantity
Strength of evidence
Number of Events
v
Architecture
Pipeline stages: Event Data; Generate User-Product Matrix; ALS (Spark/EMR); User/Product Factors; Candidate Selection; Score and Select Top N (Spark/EMR); Run-time ranking for diversity
v
A Matrix Factorization Formulation for Implicit Feedback
User-Product Matrix R (M users x N products), example entries: 1, -, -, 9, -, -, -, 3, 20
Preference Matrix R (M x N), binary preferences for the same entries: 1, 0, 0, 1, 0, 0, 0, 1, 1
“Collaborative Filtering for Implicit Feedback” - Hu et al.
v
A Matrix Factorization Formulation for Implicit Feedback
R ~ X x Y^T
Preference Matrix R (M x N), binary entries: 1, 0, 0, 1, 0, 0, 0, 1, 1
User Factors X (M x k)
Product Factors Y^T (k x N)
v
Matrix Factorization from Implicit Feedback - The Intuition
#Purchases   Preference p   Confidence c
0            0              Low
1            1              Low
>>1          1              High
• Confidence increases linearly with purchases r
• c = 1 + alpha * r
• alpha controls the marginal rate of learning from user purchases
• Key questions
• How should the unobserved events be treated
• How should one trade-off observed and the unobserved
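The mapping from purchase counts to preference and confidence can be sketched in a few lines. This is a minimal illustration, assuming NumPy; the toy count matrix and the value alpha = 40 are illustrative choices, not values stated in the talk.

```python
import numpy as np

def preference_and_confidence(purchases, alpha=40.0):
    """Turn raw purchase counts r into binary preference p and confidence c."""
    r = np.asarray(purchases, dtype=float)
    p = (r > 0).astype(float)   # p = 1 if the user ever bought the product
    c = 1.0 + alpha * r         # c = 1 + alpha * r, linear in purchases
    return p, c

# Toy 3 users x 3 products count matrix (hypothetical data).
purchases = np.array([[1, 0, 0],
                      [9, 0, 0],
                      [0, 3, 20]])
p, c = preference_and_confidence(purchases, alpha=40.0)
```

Unobserved pairs keep p = 0 with the minimum confidence of 1, so they still contribute to the loss, only weakly.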
v
Regularized Weighted Squared Loss
Terms in the loss: Confidence, Preference Matrix, User Factors Matrix, Product Factors Matrix, Regularization
Solve using Alternating Least Squares
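The equation itself did not survive in this transcript. For reference, the regularized weighted squared loss in the Hu et al. formulation cited earlier is usually written as below; the symbols x_u (user factor), y_i (product factor), p_ui (preference), c_ui (confidence) and lambda (regularization weight) follow that paper and are assumed to match the labels on the slide.

```latex
\min_{X,\,Y} \; \sum_{u=1}^{M} \sum_{i=1}^{N}
  c_{ui} \left( p_{ui} - x_u^{\top} y_i \right)^2
  \; + \; \lambda \left( \sum_{u=1}^{M} \lVert x_u \rVert^2
  + \sum_{i=1}^{N} \lVert y_i \rVert^2 \right),
\qquad c_{ui} = 1 + \alpha \, r_{ui}
```

With Y held fixed the problem is a weighted ridge regression for each x_u, and likewise for each y_i with X fixed, which is why alternating least squares applies.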
v
Architecture
Pipeline stages: Event Data; Generate User-Product Matrix; ALS (Spark/EMR); User/Product Factors; Candidate Selection; Score and Select Top N (Spark/EMR); Run-time ranking for diversity
v
Spark ALS Hyper-parameter Tuning
• rank k - diminishing returns after 150
• alpha - controls rate of learning from observed events
• iterations - ALS tends to converge within 5, seldom more than 10
• lambda - regularization parameter
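To make the four hyper-parameters concrete, here is a small dense NumPy sketch of implicit-feedback ALS. It is not Spark's implementation and would not scale to a real catalog; the function name, default values and toy data are assumptions for illustration only.

```python
import numpy as np

def implicit_als(R, rank=2, alpha=40.0, lam=0.1, iterations=10, seed=0):
    """Factorize preferences derived from counts R (M x N) as X @ Y.T."""
    R = np.asarray(R, dtype=float)
    M, N = R.shape
    P = (R > 0).astype(float)          # binary preference
    C = 1.0 + alpha * R                # confidence
    rng = np.random.default_rng(seed)
    X = 0.1 * rng.standard_normal((M, rank))
    Y = 0.1 * rng.standard_normal((N, rank))
    reg = lam * np.eye(rank)

    def solve_side(fixed, C_side, P_side):
        # For each row u solve (F^T C_u F + lam I) x_u = F^T C_u p_u.
        out = np.empty((C_side.shape[0], rank))
        for u in range(C_side.shape[0]):
            Fc = fixed * C_side[u][:, None]   # rows of F scaled by confidence
            A = fixed.T @ Fc + reg
            b = Fc.T @ P_side[u]
            out[u] = np.linalg.solve(A, b)
        return out

    for _ in range(iterations):
        X = solve_side(Y, C, P)        # update user factors, products fixed
        Y = solve_side(X, C.T, P.T)    # update product factors, users fixed
    return X, Y

# Toy counts: 3 users x 3 products (hypothetical data).
R = [[1, 0, 0], [9, 0, 0], [0, 3, 20]]
X, Y = implicit_als(R, rank=2, alpha=40.0, lam=0.1, iterations=10)
scores = X @ Y.T
```

Here `rank` is k, `lam` is lambda, and each iteration is one pass of closed-form solves for users and then for products.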
v
Architecture
Pipeline stages: Event Data; Generate User-Product Matrix; ALS Matrix Factorization (Spark/EMR); User/Product Factors; Candidate Selection; Score and Select Top N (Spark/EMR); Run-time ranking for diversity
v
Scoring users and products
With millions of products and users, scoring every (user, product) pair is prohibitive
Two goals in selecting products to score
• Long-tail products that have not been discovered
• Products that have an a priori high purchase rate (popular)
v
Trade-off popularity and discovery in the tail
We start with simple stratified sampling
For each user, score N products
Sample h products from Head
Sample t products from tail
N ~ 10000
h ~ 3000
t ~ 7000
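A minimal sketch of the head/tail stratified sampling described above, using only the standard library. How the head is separated from the tail is not stated in the talk, so the `head_size` cutoff on a popularity-sorted list is an assumption, as are the function and parameter names.

```python
import random

def select_candidates(products_by_popularity, head_size, h, t, seed=None):
    """Sample h products from the popular head and t from the long tail."""
    rng = random.Random(seed)
    head = products_by_popularity[:head_size]   # assumed: most popular first
    tail = products_by_popularity[head_size:]
    picked_head = rng.sample(head, min(h, len(head)))
    picked_tail = rng.sample(tail, min(t, len(tail)))
    return picked_head + picked_tail

# 100,000 hypothetical product ids sorted by popularity, head = top 5,000.
products = list(range(100_000))
candidates = select_candidates(products, head_size=5_000, h=3_000, t=7_000, seed=1)
```

Each user then gets N = h + t products to score instead of the full catalog.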
v
Tuning Spark For ALS
Understanding Spark execution model and its implementation of ALS helps
• Training is communication heavy [1], set partitions <= #CPU cores
• Scoring is memory intensive
• Broad guidelines [2]
• Limit executor memory to 64GB
• 5 cores per executor
• Set executors based on data size
[1] http://apache-spark-user-list.1001560.n3.nabble.com/Error-No-space-left-on-device-tp9887p9896.html
[2] http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-1/
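As a rough illustration, the guidelines above could be turned into Spark properties as follows. The property names are standard Spark configuration keys; mapping "partitions <= #CPU cores" onto `spark.default.parallelism` is an assumption (the ALS block counts could be set instead), and the helper name is hypothetical.

```python
def als_spark_settings(num_executors, cores_per_executor=5, executor_memory_gb=64):
    """Build a dict of Spark properties following the broad guidelines."""
    total_cores = num_executors * cores_per_executor
    return {
        "spark.executor.instances": str(num_executors),   # sized to the data
        "spark.executor.cores": str(cores_per_executor),  # 5 cores per executor
        "spark.executor.memory": f"{executor_memory_gb}g",  # at most 64GB
        # Assumption: cap partitions at the total core count for training.
        "spark.default.parallelism": str(total_cores),
    }

settings = als_spark_settings(num_executors=10)
```

The resulting pairs can be passed as `--conf key=value` flags to `spark-submit`.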
v
A/B Test Setup
Pipeline stages: Event Data; Generate User-Product Matrix; ALS (Spark/EMR); User/Product Factors; Candidate Selection; Score and Select Top N (Spark/EMR); Run-time diversity ranking
Weekly for past N
months data
Weekly for users with
recent activity
v
A/B Test Results
• Statistically significant increases
• Items per order
• GMV per order
• Total product sales spread over more categories
v
Ok, we have a recommendation system
Where do we go from here?
v
What else do you do with user and product factors?
Score (user, product) pair on demand
Get Top N similar users
Get Top N similar products
As features in other models
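Two of these uses can be sketched directly on the factor matrices. The example below uses brute-force cosine similarity in NumPy, which is exact but only practical for small catalogs; the factor values and function names are made up for illustration.

```python
import numpy as np

def score(user_factors, product_factors, user, product):
    """Score one (user, product) pair on demand as a dot product."""
    return float(user_factors[user] @ product_factors[product])

def top_n_similar(factors, index, n):
    """Indices of the n rows most similar to factors[index] by cosine."""
    norms = np.linalg.norm(factors, axis=1, keepdims=True)
    unit = factors / np.clip(norms, 1e-12, None)
    sims = unit @ unit[index]
    sims[index] = -np.inf            # exclude the query row itself
    return np.argsort(-sims)[:n].tolist()

# Hypothetical 2-dimensional factors for 1 user and 4 products.
X = np.array([[2.0, 1.0]])
Y = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
pair_score = score(X, Y, user=0, product=1)
neighbours = top_n_similar(Y, index=0, n=2)
```

The same `top_n_similar` call works on user factors to find similar users.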
v
Products similar to “Haigs Spicy Hummus”
More “Spicy Hummus”
Spicy Salsas
Generated using Approximate Nearest Neighbor
(“annoy” from Spotify)
v
What next
• Make recommendations more contextual
• Explain recommendations (“Because you did X”)
v
Mitigating the effect of out of stocks
v
• what are out of stocks
• why do they happen
• how data science helps mitigate effects
v
Out of stocks - Customer Context
“Deliver Ice Cream from Whole Foods
Market SOMA at 8 pm tomorrow”
v
Online services
Supply: Infinite vs. Limited
Fulfillment: Immediate vs. Future
v
Traditional E-commerce
• Manage inventory in warehouses optimized for quick
fulfillment
• Customers only specify the “What”
• Disallow users from ordering out of stock products
• Set expectations
• “3 day shipping” but will ship in 10 business days
v
On-demand delivery from local retailers
• Shoppers navigate a complex environment where products
• may have run out
• may be misplaced
• may be damaged
• Customers specify “What”, “When” and “Where from”
• Improvise under uncertainty
v
Everybody loses when out of stocks happen
Customers
• don’t get exactly what they want
• must contemplate and/or communicate replacements
Shoppers
• waste time searching for items that aren’t in store
• context switch to searching and communicating replacements
Advertisers (brands)
• lose revenue and trust of customers
Stores (Retailers)
• lose revenue and trust of customers
v
Out of stock rate - an illustration
v
v
A probable solution
Do not show or allow customers to order items
that are currently out of stock
v
A probable (but terrible) solution
• Customers really know these stores
• “Missing” items are seen as a sign of an unreliable catalog/service
• May have been out of stock this morning but could be available when the
order is fulfilled
• Sets up negative spirals
“I was there over the
weekend. Please check behind
the cheeses aisle”
“Are you telling
me they don’t carry
strawberries?”
v
Solution that works reasonably well
• Shoppers can see Instacart recommended
replacements while shopping in the store
• Customers may also specify or choose from
recommended replacements
• Relatively more flexibility with groceries
• Some services offer to cancel the order if
an item isn’t available
v
Instacart Recommended Replacements
Flavor, Packing, Size, Brand, Price
• Several product attributes matter
• Context matters, might benefit from personalization
• Must scale to millions of products
• Not always symmetric
• May be ok to replace X with gluten free X but not the other way around
Diet Info
v
Replacement Recommendations for Shoppers
• Shoppers are trained to pick replacements
• But shoppers can benefit from algorithmic suggestions
• Many unfamiliar products in a vast catalog
• Validation for common products
• Finding replacements fast improves operational efficiency
v
Replacement Recommendations for Customers
• Customers can specify replacements while placing the order
• Can choose to communicate with the shopper in store to verify
v
What could we do if we could predict item availability?
Customer
location
Nearest
store
Farther, but
better availability
Controlling for retailer and quality,
customer is indifferent to physical location
v
The Item Availability Prediction
Probability( Item in store | time, context)
What is the probability that an item will be at the store when the shopper shows up to look for it?
v
Item Availability as a Classification Problem
TIMESTAMP, ITEM IDENTIFIED, IN STORE?
• Millions of examples from historical data
• Feature Engineering
• historical availability at multiple resolutions
• E.g.: time since last “not found” event
• Item attributes
• E.g.: perishables restocked differently than personal care
• Temporal Features
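The feature ideas above can be sketched with the standard library. The event format, window lengths and feature names below are assumptions for illustration; the talk does not describe the actual feature set beyond the bullets.

```python
from datetime import datetime, timedelta

def availability_features(events, now,
                          windows=(timedelta(days=1), timedelta(days=7))):
    """events: list of (timestamp, found) for one item at one store."""
    feats = {}
    # Time since the last "not found" event, in hours.
    not_found = [ts for ts, found in events if not found and ts <= now]
    feats["hours_since_last_not_found"] = (
        (now - max(not_found)).total_seconds() / 3600.0 if not_found else None
    )
    # Historical availability at multiple resolutions.
    for w in windows:
        recent = [found for ts, found in events if now - w <= ts <= now]
        key = f"found_rate_{int(w.total_seconds() // 3600)}h"
        feats[key] = sum(recent) / len(recent) if recent else None
    # Temporal features.
    feats["hour_of_day"] = now.hour
    feats["day_of_week"] = now.weekday()
    return feats

now = datetime(2017, 1, 10, 12, 0)
events = [
    (datetime(2017, 1, 10, 6, 0), False),
    (datetime(2017, 1, 10, 9, 0), True),
    (datetime(2017, 1, 5, 12, 0), True),
    (datetime(2017, 1, 1, 12, 0), False),
]
feats = availability_features(events, now)
```

Each (timestamp, item, in store?) training row would carry features like these, computed only from events that precede the row's timestamp.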
v
Training and Scoring
Training: Event Data -> Feature Extraction -> XGBoost Training -> Model Store (weekly, with over 2 months of training data)
Scoring: Event Data -> Feature Extraction -> Scoring -> Cache availability scores (score tens of millions of items every hour)
v
Serving and Optimization Layer
Inputs: Order (items, eligible store locations); Availability scores
Fulfillment Engine
Output: Fulfillment plan (store location, shopper, etc.)
Active in production with an acceptable trade-off between fulfillment efficiency and refund rate
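One way availability scores could feed a store choice is sketched below. The talk does not describe the fulfillment engine's objective, so the scoring rule (expected number of items found minus a distance penalty), the penalty weight and all names are hypothetical.

```python
import math

def pick_store(order_items, eligible_stores, availability, distance_km,
               distance_penalty=0.01):
    """Pick the store with the best availability/distance trade-off."""
    best_store, best_value = None, -math.inf
    for store in eligible_stores:
        # Expected number of ordered items the shopper will find.
        expected_found = sum(availability[(store, item)] for item in order_items)
        # Hypothetical efficiency cost for sending the shopper farther away.
        value = expected_found - distance_penalty * distance_km[store]
        if value > best_value:
            best_store, best_value = store, value
    return best_store

availability = {
    ("near", "ice cream"): 0.40, ("near", "milk"): 0.90,
    ("far", "ice cream"): 0.95, ("far", "milk"): 0.90,
}
distance_km = {"near": 1.0, "far": 5.0}
chosen = pick_store(["ice cream", "milk"], ["near", "far"],
                    availability, distance_km)
```

In this toy case the farther store wins because its higher availability outweighs the extra distance.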
v
What’s next
• Leverage model predictions for other features/data products
• Avoid negative feedback loops!
• Biased training data
• only have access to what is ordered through Instacart
• Tighter integrations with retailer data
• Scaling: continue to score a growing catalog at tight SLAs
WE’RE HIRING!


@sharathrao
v
Appendix
v
Offline evaluation
• Ideally we want to evaluate user response to recommendations
• But we will only know this from a live A/B test
• Recall based metrics are an offline proxy (albeit not the best)
• Recall: “Fraction of purchased products covered among Top N
recommendations”
• We only use this for hyperparameter tuning
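The recall metric defined above can be computed per user as follows. This is a minimal sketch; the function name and the choice to return None for users with no held-out purchases are assumptions.

```python
def recall_at_n(recommended, purchased, n):
    """Fraction of purchased products covered among the top N recommendations."""
    purchased = set(purchased)
    if not purchased:
        return None                 # undefined when nothing was purchased
    top_n = set(recommended[:n])
    return len(top_n & purchased) / len(purchased)

# Ranked recommendations vs. products actually bought in a held-out period.
value = recall_at_n(["a", "b", "c", "d"], {"b", "d", "z"}, n=3)
```

Averaging this value over users gives a single number to compare hyperparameter settings offline.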
v
Ensembles
Use different types of evidence and/or product metadata to easily create ensembles
User x Products Purchased
User x Products Viewed
User x Brands Purchased
Model or Linear
Combination
…
v
What better promotes broad-based discovery
vs
v
Online ranking for diversity
“Diversity within sessions, Novelty across sessions”
“Establish trust in a fresh and comprehensive catalog”
“Less is more”
Cached list of
~1000 products
per user
Final list of
<100 products
promote diversity
v
Diversity
Top K products - ranked by score
Rank product categories by their median product score
> > >
v
Weighted sampling for diversity
Sample category in
proportion to score
Within category, sample in
proportion to product score
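The two-level sampling can be sketched with the standard library. The sketch assumes positive scores, uses the median product score as the category score, and samples without replacement; the last of these and all names are assumptions, not details given in the talk.

```python
import random
from statistics import median

def diverse_sample(scored_products, k, seed=None):
    """scored_products: {category: {product: positive score}}."""
    rng = random.Random(seed)
    pools = {c: dict(p) for c, p in scored_products.items() if p}
    picked = []
    while pools and len(picked) < k:
        # Sample a category in proportion to its (median) score.
        cats = list(pools)
        cat_weights = [median(pools[c].values()) for c in cats]
        cat = rng.choices(cats, weights=cat_weights, k=1)[0]
        # Within the category, sample a product in proportion to its score.
        prods = list(pools[cat])
        prod_weights = [pools[cat][p] for p in prods]
        prod = rng.choices(prods, weights=prod_weights, k=1)[0]
        picked.append(prod)
        del pools[cat][prod]        # assumption: no repeats in the final list
        if not pools[cat]:
            del pools[cat]
    return picked

scored = {"dairy": {"milk": 0.9, "yogurt": 0.5}, "snacks": {"chips": 0.7}}
final_list = diverse_sample(scored, k=3, seed=0)
```

Because categories are drawn afresh for every slot, a single high-scoring category cannot fill the whole list as easily as it would under a plain top-K cut.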
v
Architecture
Pipeline stages: Event Data; Generate User-Product Matrix; ALS (Spark/EMR); User/Product Factors; Candidate Selection; Score and Select Top N (Spark/EMR); Run-time diversity ranking
v
Out of stocks happen due to uncertainty in several places
Order fulfillment in (distant) future
Cannot hold inventory
Real-time inventory tracking across
thousands of locations isn’t perfect (yet)
Customer might reschedule delivery
Perguntas dos animais - Slides ilustrados de múltipla escolha
socaslev
 
APNIC Update, presented at NZNOG 2025 by Terry Sweetser
APNIC Update, presented at NZNOG 2025 by Terry SweetserAPNIC Update, presented at NZNOG 2025 by Terry Sweetser
APNIC Update, presented at NZNOG 2025 by Terry Sweetser
APNIC
 
highend-srxseries-services-gateways-customer-presentation.pptx
highend-srxseries-services-gateways-customer-presentation.pptxhighend-srxseries-services-gateways-customer-presentation.pptx
highend-srxseries-services-gateways-customer-presentation.pptx
elhadjcheikhdiop
 
DNS Resolvers and Nameservers (in New Zealand)
DNS Resolvers and Nameservers (in New Zealand)DNS Resolvers and Nameservers (in New Zealand)
DNS Resolvers and Nameservers (in New Zealand)
APNIC
 
Best web hosting Vancouver 2025 for you business
Best web hosting Vancouver 2025 for you businessBest web hosting Vancouver 2025 for you business
Best web hosting Vancouver 2025 for you business
steve198109
 
(Hosting PHising Sites) for Cryptography and network security
(Hosting PHising Sites) for Cryptography and network security(Hosting PHising Sites) for Cryptography and network security
(Hosting PHising Sites) for Cryptography and network security
aluacharya169
 
Computers Networks Computers Networks Computers Networks
Computers Networks Computers Networks Computers NetworksComputers Networks Computers Networks Computers Networks
Computers Networks Computers Networks Computers Networks
Tito208863
 
5-Proses-proses Akuisisi Citra Digital.pptx
5-Proses-proses Akuisisi Citra Digital.pptx5-Proses-proses Akuisisi Citra Digital.pptx
5-Proses-proses Akuisisi Citra Digital.pptx
andani26
 
Reliable Vancouver Web Hosting with Local Servers & 24/7 Support
Reliable Vancouver Web Hosting with Local Servers & 24/7 SupportReliable Vancouver Web Hosting with Local Servers & 24/7 Support
Reliable Vancouver Web Hosting with Local Servers & 24/7 Support
steve198109
 
White and Red Clean Car Business Pitch Presentation.pptx
White and Red Clean Car Business Pitch Presentation.pptxWhite and Red Clean Car Business Pitch Presentation.pptx
White and Red Clean Car Business Pitch Presentation.pptx
canumatown
 
Top Vancouver Green Business Ideas for 2025 Powered by 4GoodHosting
Top Vancouver Green Business Ideas for 2025 Powered by 4GoodHostingTop Vancouver Green Business Ideas for 2025 Powered by 4GoodHosting
Top Vancouver Green Business Ideas for 2025 Powered by 4GoodHosting
steve198109
 
OSI TCP IP Protocol Layers description f
OSI TCP IP Protocol Layers description fOSI TCP IP Protocol Layers description f
OSI TCP IP Protocol Layers description f
cbr49917
 
Mobile database for your company telemarketing or sms marketing campaigns. Fr...
Mobile database for your company telemarketing or sms marketing campaigns. Fr...Mobile database for your company telemarketing or sms marketing campaigns. Fr...
Mobile database for your company telemarketing or sms marketing campaigns. Fr...
DataProvider1
 
Determining Glass is mechanical textile
Determining  Glass is mechanical textileDetermining  Glass is mechanical textile
Determining Glass is mechanical textile
Azizul Hakim
 
Understanding the Tor Network and Exploring the Deep Web
Understanding the Tor Network and Exploring the Deep WebUnderstanding the Tor Network and Exploring the Deep Web
Understanding the Tor Network and Exploring the Deep Web
nabilajabin35
 
Smart Mobile App Pitch Deck丨AI Travel App Presentation Template
Smart Mobile App Pitch Deck丨AI Travel App Presentation TemplateSmart Mobile App Pitch Deck丨AI Travel App Presentation Template
Smart Mobile App Pitch Deck丨AI Travel App Presentation Template
yojeari421237
 
project_based_laaaaaaaaaaearning,kelompok 10.pptx
project_based_laaaaaaaaaaearning,kelompok 10.pptxproject_based_laaaaaaaaaaearning,kelompok 10.pptx
project_based_laaaaaaaaaaearning,kelompok 10.pptx
redzuriel13
 
IT Services Workflow From Request to Resolution
IT Services Workflow From Request to ResolutionIT Services Workflow From Request to Resolution
IT Services Workflow From Request to Resolution
mzmziiskd
 

Data Science @ Instacart

  • 1. Data Science @ Instacart. Sharath Rao, Data Scientist / Manager, Search and Discovery. Collaborators: Angadh Singh and Shishir Prasad
  • 2. The Instacart Value Proposition: groceries from stores you love, delivered to your doorstep, in as little as an hour
  • 3. Customer Experience: Select a Store, Shop for Groceries, Checkout, Select Delivery Time, Delivered to Doorstep
  • 4. Shopper Experience: Accept Order, Find the Groceries, Out for Delivery, Delivered to Doorstep, Scan Barcode
  • 5. Four Sided Marketplace: Customers, Shoppers, Products (Advertisers), Stores (Retailers). Search, Advertising, Shopping, Delivery, Customer Service, Inventory, Picking, Loyalty
  • 6. Two topics today: (1) a recommendation system for discovery; (2) using data science for out-of-stock mitigation
  • 7. Online grocery vs traditional e-commerce [figure: Online Grocery and Traditional e-commerce compared over Week 1, Week 2, Week 3]
  • 8. Grocery Shopping in “Low Dimensional Space”: Search + Restock + Explore
  • 9. Why personalization at Instacart: Your store / Everybody’s store
  • 10. Repeat purchases increase LTV of recommendations [figure: $5.49 today; $549 a year later (1 + … + 100)]
  • 11. Different recommendation systems address different needs
  • 12. Personalized Top N recommendations: Promote broad-based discovery in a dynamic catalog, including from stores customers may have never shopped
  • 13. Run out of X? Rank products by repurchase probability
  • 14. Personalized recommendations of new products, for when customers seek out what is new out there. Also addresses product cold start problems
  • 15. Replacement Product Recommendations: Mitigate adverse impact of last-minute out of stocks
  • 16. “Frequently bought with” Recommendations: Help customers shop for complementary products and try alternatives [figure labels: “Not necessarily consumed together”, “Probably consumed together”]
  • 17. Personalized Top N Recommendations
  • 18. Learning from feedback: Explicit Feedback vs Implicit Feedback. Traditionally, collaborative filtering used explicit feedback to predict ratings. There may still be bias in whether the user chooses to rate
  • 19. Learning from Explicit Feedback • Explicit feedback may be more reliable, but there is much less of it • Less reliable if users rate based on aspirations instead of true preferences
  • 20. Implicit Feedback: trade-off between quality and quantity [figure axes: Strength of evidence, Number of Events]
  • 21. Architecture [diagram components: Event Data; Generate User-Product Matrix; ALS (Spark/EMR); User/Product Factors; Candidate Selection; Score and Select Top N (Spark/EMR); Run-time ranking for diversity]
  • 22. A Matrix Factorization Formulation for Implicit Feedback. User-Product Matrix R (M users x N products), rows: [1 - -], [9 - -], [- 3 20]. Binary Preference Matrix R (M x N), rows: [1 0 0], [1 0 0], [0 1 1]. Source: “Collaborative Filtering for Implicit Feedback”, Hu et al.
  • 23. A Matrix Factorization Formulation for Implicit Feedback. Preference Matrix R (M x N), rows [1 0 0], [1 0 0], [0 1 1], is approximated (~) by the product Y X^T of User Factors (M x k) and Product Factors (k x N)
  • 24. Matrix Factorization from Implicit Feedback - The Intuition. #Purchases / Preference p / Confidence c: 0 / 0 / Low; 1 / 1 / Low; >>1 / 1 / High • Confidence increases linearly with purchases r • c = 1 + alpha * r • alpha controls the marginal rate of learning from user purchases • Key questions • How should the unobserved events be treated? • How should one trade off the observed and the unobserved?
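The mapping from purchase counts to preference and confidence on slide 24 can be sketched in a few lines. This is a minimal illustration, not Instacart's code: the function name is hypothetical, the value `alpha=40.0` is only an illustrative choice, and the purchase counts are the ones shown on slide 22 with unobserved cells written as 0.

```python
import numpy as np

def preference_and_confidence(purchases, alpha=40.0):
    """Turn raw purchase counts r into binary preference p and confidence c.

    p = 1 if the user ever bought the product, else 0.
    c = 1 + alpha * r, so confidence grows linearly with purchases.
    alpha=40.0 is an illustrative default, not a value from the slides.
    """
    r = np.asarray(purchases, dtype=float)
    p = (r > 0).astype(float)
    c = 1.0 + alpha * r
    return p, c

# Purchase counts from slide 22, with "-" (unobserved) encoded as 0.
R = np.array([[1, 0, 0],
              [9, 0, 0],
              [0, 3, 20]])
P, C = preference_and_confidence(R, alpha=40.0)
```

Unobserved pairs keep a confidence of 1 rather than 0, so they still count as weak evidence of no preference.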
  • 25. Regularized Weighted Squared Loss [equation; labeled terms: Confidence, User Factors Matrix, Product Factors Matrix, Preference Matrix, Regularization]. Solve using Alternating Least Squares
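The equation itself is not in the transcript. For reference, the Hu et al. paper cited on slide 22 writes the objective in the form below. The symbols are the paper's and may not match the labels on the slide: x_u is the factor vector of user u, y_i the factor vector of product i, p_ui the binary preference, c_ui = 1 + alpha * r_ui the confidence, and lambda the regularization weight.

```latex
\min_{X,\,Y} \; \sum_{u,i} c_{ui}\,\bigl(p_{ui} - x_u^{\top} y_i\bigr)^{2}
  \;+\; \lambda \Bigl( \sum_{u} \lVert x_u \rVert^{2} + \sum_{i} \lVert y_i \rVert^{2} \Bigr)

% With the product factors Y fixed, each user vector has a closed-form solution
% (C^u is the diagonal matrix of c_{ui} for user u, p(u) the vector of p_{ui}):
x_u = \bigl( Y^{\top} C^{u} Y + \lambda I \bigr)^{-1} Y^{\top} C^{u}\, p(u)
```

Alternating Least Squares applies this update to all users, then the symmetric update to all products, and repeats.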
  • 26. Architecture [diagram components: Event Data; Generate User-Product Matrix; ALS (Spark/EMR); User/Product Factors; Candidate Selection; Score and Select Top N (Spark/EMR); Run-time ranking for diversity]
  • 27. Spark ALS Hyper-parameter Tuning • rank k - diminishing returns after 150 • alpha - controls rate of learning from observed events • iterations - ALS tends to converge within 5, seldom more than 10 • lambda - regularization parameter
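To make the four hyper-parameters concrete, here is a small dense NumPy sketch of implicit-feedback ALS in the style of Hu et al. It is not Spark's implementation and will not scale beyond toy data; the function names, default values and toy matrix are assumptions made for illustration. In Spark's `pyspark.ml.recommendation.ALS` the corresponding settings are `rank`, `alpha`, `maxIter` and `regParam`, with `implicitPrefs=True`.

```python
import numpy as np

def weighted_loss(R, X, Y, alpha=40.0, lam=0.1):
    """Regularized, confidence-weighted squared loss for factors X (users), Y (products)."""
    R = np.asarray(R, dtype=float)
    P = (R > 0).astype(float)          # binary preference
    C = 1.0 + alpha * R                # confidence
    err = P - X @ Y.T
    return float((C * err ** 2).sum() + lam * ((X ** 2).sum() + (Y ** 2).sum()))

def implicit_als(R, rank=2, alpha=40.0, iterations=10, lam=0.1, seed=0):
    """Toy dense implicit ALS; all defaults are illustrative assumptions."""
    R = np.asarray(R, dtype=float)
    M, N = R.shape
    P = (R > 0).astype(float)
    C = 1.0 + alpha * R
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=0.1, size=(M, rank))   # user factors
    Y = rng.normal(scale=0.1, size=(N, rank))   # product factors
    reg = lam * np.eye(rank)

    def solve(fixed, conf, pref):
        # Row j solves (F^T C_j F + lam I) x = F^T C_j p_j with F = fixed factors.
        out = np.empty((conf.shape[0], rank))
        for j in range(conf.shape[0]):
            weighted = conf[j][:, None] * fixed   # each row of F scaled by its confidence
            A = fixed.T @ weighted + reg
            b = weighted.T @ pref[j]
            out[j] = np.linalg.solve(A, b)
        return out

    for _ in range(iterations):
        X = solve(Y, C, P)        # update users with products fixed
        Y = solve(X, C.T, P.T)    # update products with users fixed
    return X, Y

R = [[1, 0, 0], [9, 0, 0], [0, 3, 20]]
X, Y = implicit_als(R, rank=2, alpha=40.0, iterations=10, lam=0.1)
scores = X @ Y.T   # predicted preference for every (user, product) pair
```

Because each half-step minimizes the loss exactly for one side, the loss never increases between iterations, which is consistent with the slide's remark that a handful of iterations is usually enough.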
  • 28. Architecture [diagram components: Event Data; Generate User-Product Matrix; ALS Matrix Factorization (Spark/EMR); User/Product Factors; Candidate Selection; Score and Select Top N (Spark/EMR); Run-time ranking for diversity]
  • 29. Scoring users and products. With millions of products and users, scoring every (user, product) pair is prohibitive. Two goals in selecting products to score: • Long-tail products that have not been discovered • Products that have an a priori high purchase rate (popular)
  • 30. Trade-off popularity and discovery in the tail. We start with simple stratified sampling. For each user, score N products: sample h products from the head and t products from the tail. N ~ 10000, h ~ 3000, t ~ 7000
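A minimal sketch of the head/tail stratified sampling described on slide 30. The slides do not say how the head is delimited, so the `head_size` cutoff on a popularity-sorted list is an assumption, as are the function and argument names.

```python
import random

def stratified_candidates(products_by_popularity, head_size, h, t, seed=None):
    """Pick h products from the popular head and t from the long tail.

    products_by_popularity: product ids sorted most popular first.
    head_size: hypothetical cutoff separating head from tail.
    """
    rng = random.Random(seed)
    head = products_by_popularity[:head_size]
    tail = products_by_popularity[head_size:]
    picked_head = rng.sample(head, min(h, len(head)))
    picked_tail = rng.sample(tail, min(t, len(tail)))
    return picked_head + picked_tail

# Toy catalog of 100 products; the slide's scale is N ~ 10000, h ~ 3000, t ~ 7000.
catalog = list(range(100))
candidates = stratified_candidates(catalog, head_size=20, h=3, t=7, seed=1)
```

Only the returned candidates would then be scored for the user, instead of the full catalog.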
  • 31. Tuning Spark for ALS. Understanding the Spark execution model and its implementation of ALS helps • Training is communication heavy [1], set partitions <= #CPU cores • Scoring is memory intensive • Broad guidelines [2] • Limit executor memory to 64GB • 5 cores per executor • Set executors based on data size. [1] http://apache-spark-user-list.1001560.n3.nabble.com/Error-No-space-left-on-device-tp9887p9896.html [2] http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-1/
  • 32. A/B Test Setup [diagram components: Event Data; Generate User-Product Matrix; ALS (Spark/EMR); User/Product Factors; Candidate Selection; Score and Select Top N (Spark/EMR); Run-time diversity ranking. Annotations: “Weekly for past N months data”; “Weekly for users with recent activity”]
  • 33. A/B Test Results • Statistically significant increases • Items per order • GMV per order • Total product sales spread over more categories
  • 34. Ok, we have a recommendation system. Where do we go from here?
  • 35. What else do you do with user and product factors? • Score (user, product) pair on demand • Get Top N similar users • Get Top N similar products • As features in other models
  • 36. Products similar to “Haigs Spicy Hummus”: more “Spicy Hummus”, Spicy Salsas. Generated using Approximate Nearest Neighbor (“annoy” from Spotify)
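Finding similar products from the learned factors comes down to a nearest-neighbor search over the product factor vectors. The sketch below uses exact brute-force cosine similarity in NumPy as a stand-in for the approximate index named on the slide; it shows the idea only, and the function name and toy vectors are hypothetical.

```python
import numpy as np

def top_n_similar(factors, query_index, n=5):
    """Return the n rows most similar to `query_index` by cosine similarity.

    Exact O(N) search per query; an approximate index trades a little
    accuracy for speed on catalogs with millions of products.
    """
    F = np.asarray(factors, dtype=float)
    norms = np.linalg.norm(F, axis=1, keepdims=True)
    unit = F / np.where(norms == 0, 1.0, norms)   # avoid division by zero
    sims = unit @ unit[query_index]
    sims[query_index] = -np.inf                   # exclude the query itself
    order = np.argsort(-sims)[:n]
    return [(int(i), float(sims[i])) for i in order]

# Toy 2-dimensional product factors (illustrative values only).
product_factors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
neighbors = top_n_similar(product_factors, query_index=0, n=2)
```

The same function applied to user factors gives the "Top N similar users" use listed on slide 35.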
  • 37. What next • Make recommendations more contextual • Explain recommendations (“Because you did X”)
  • 38. Mitigating the effect of out of stocks
  • 39. • What are out of stocks • Why do they happen • How data science helps mitigate effects
  • 40. Out of stocks - Customer Context: “Deliver Ice Cream from Whole Foods Market SOMA at 8 pm tomorrow”
  • 42. Traditional E-commerce • Manage inventory in warehouses optimized for quick fulfillment • Customers only specify the “What” • Disallow users from ordering out of stock products • Set expectations • “3 day shipping” but will ship in 10 business days
  • 43. On-demand delivery from local retailers • Shoppers navigate a complex environment where products • may have run out • may be misplaced • may be damaged • Customers specify “What”, “When” and “Where from” • Improvise under uncertainty
  • 44. Everybody loses when out of stocks happen. Customers • don’t get exactly what they want • must contemplate and/or communicate replacements. Shoppers • waste time searching for items that aren’t in store • context switch to searching and communicating replacements. Advertisers (brands) • lose revenue and trust of customers. Stores (Retailers) • lose revenue and trust of customers
  • 45. Out of stock rate - an illustration
  • 47. A probable solution: Do not show or allow customers to order items that are currently out of stock
  • 48. A probable (but terrible) solution • Customers really know these stores • “Missing” items is seen as a sign of an unreliable catalog/service • May have been out of stock this morning but could be available when the order is fulfilled • Sets up negative spirals: “I was there over the weekend. Please check behind the cheeses aisle” “Are you telling me they don’t carry strawberries?”
  • 49. Solution that works reasonably well • Shoppers can see Instacart recommended replacements while shopping in the store • Customers may also specify or choose from recommended replacements • Relatively more flexibility with groceries • Some services offer to cancel the order if an item isn’t available
  • 50. Instacart Recommended Replacements • Several product attributes matter: Flavor, Packing, Size, Brand, Price, Diet Info • Context matters, might benefit from personalization • Must scale to millions of products • Not always symmetric • May be ok to replace X with gluten free X but not the other way around
  • 51. Replacement Recommendations for Shoppers • Shoppers are trained to pick replacements • But shoppers can benefit from algorithmic suggestions • Many unfamiliar products in a vast catalog • Validation for common products • Finding replacements fast improves operational efficiency
  • 52. Replacement Recommendations for Customers • Customers can specify replacements while placing the order • Can choose to communicate with the shopper in store to verify
  • 53. What could we do if we could predict item availability? [figure labels: Customer location; Nearest store; Farther, but better availability] Controlling for retailer and quality, customer is indifferent to physical location
  • 54. The Item Availability Prediction: Probability(Item in store | time, context). What is the probability that an item will be at the store when the shopper shows up to look for it?
  • 55. Item Availability as a Classification Problem. Example format: TIMESTAMP, ITEM IDENTIFIED, IN STORE? • Millions of examples from historical data • Feature Engineering • Historical availability at multiple resolutions • E.g. time since last “not found” event • Item attributes • E.g. perishables restocked differently than personal care • Temporal Features
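The feature families on slide 55 can be illustrated with a small extractor over one item's found/not-found history. This is a sketch under assumptions: the event format `(timestamp, found)`, the window lengths and every feature name are hypothetical, and the real feature set is not given in the slides.

```python
from datetime import datetime, timedelta

def availability_features(events, now, windows_days=(1, 7, 30)):
    """Build features for one (item, store) pair from past shopper events.

    events: list of (timestamp, found) tuples, found being True or False.
    Returns a dict; None marks a feature with no supporting history.
    """
    past = [(ts, found) for ts, found in events if ts <= now]
    feats = {}
    # Historical availability at multiple resolutions.
    for d in windows_days:
        recent = [found for ts, found in past if now - ts <= timedelta(days=d)]
        feats[f"found_rate_{d}d"] = sum(recent) / len(recent) if recent else None
    # Time since the last "not found" event.
    not_found = [ts for ts, found in past if not found]
    feats["hours_since_last_not_found"] = (
        (now - max(not_found)).total_seconds() / 3600 if not_found else None
    )
    # Temporal features of the prediction time.
    feats["hour_of_day"] = now.hour
    feats["day_of_week"] = now.weekday()
    return feats

now = datetime(2024, 1, 10, 12, 0)
history = [
    (datetime(2024, 1, 10, 6, 0), True),
    (datetime(2024, 1, 9, 18, 0), False),
    (datetime(2024, 1, 1, 12, 0), True),
]
features = availability_features(history, now)
```

Each historical row (timestamp, item, in store?) would get such a feature vector with the in-store flag as its label, and item attributes such as the product category would be added alongside.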
  • 56. Training and Scoring. Training (weekly, with over 2 months of training data): Event Data, Feature Extraction, XGBoost Training, Model Store. Scoring (tens of millions of items every hour): Event Data, Feature Extraction, Scoring, Cache availability scores
  • 57. Serving and Optimization Layer. Inputs to the Fulfillment Engine: Order (items, eligible store locations) and availability scores. Output: fulfillment plan (store location, shopper, etc.). Active in production with an acceptable trade-off between fulfillment efficiency and refund rate
  • 58. What’s next • Leverage model predictions for other features/data products • Avoid negative feedback loops! • Biased training data • Only have access to what is ordered through Instacart • Tighter integrations with retailer data • Scaling: continue to score a growing catalog at tight SLAs
  • 61. Offline evaluation • Ideally we want to evaluate user response to recommendations • But we will only know this from a live A/B test • Recall-based metrics are an offline proxy (albeit not the best) • Recall: “Fraction of purchased products covered among Top N recommendations” • We only use this for hyper-parameter tuning
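The recall definition quoted on slide 61 translates directly into code. A minimal sketch for a single user; the function name is hypothetical, and how results are averaged across users is not stated in the slides.

```python
def recall_at_n(recommended, purchased, n):
    """Fraction of a user's purchased products that appear in the top-n list.

    recommended: product ids ordered best first.
    purchased: product ids the user actually bought in the held-out period.
    Returns None when the user bought nothing, since recall is undefined.
    """
    purchased = set(purchased)
    if not purchased:
        return None
    top_n = set(recommended[:n])
    return len(top_n & purchased) / len(purchased)

# One of the three purchased products ("b") is inside the top 3.
example = recall_at_n(["a", "b", "c", "d"], {"b", "d", "z"}, n=3)
```

For hyper-parameter tuning, this value would be computed per user on held-out purchases and aggregated for each candidate setting.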
  • 62. Ensembles: Use different types of evidence and/or product metadata to easily create ensembles. Inputs: User x Products Purchased; User x Products Viewed; User x Brands Purchased; … Combined by a Model or Linear Combination
  • 63. What better promotes broad-based discovery? [figure: two alternatives compared]
  • 64. Online ranking for diversity • “Diversity within sessions, Novelty across sessions” • “Establish trust in a fresh and comprehensive catalog” • “Less is more” • Promote diversity: cached list of ~1000 products per user, final list of <100 products
  • 65. Diversity: Top K products ranked by score. Rank product categories by their median product score
  • 66. Weighted sampling for diversity: Sample category in proportion to score. Within category, sample in proportion to product score
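The two-level sampling on slides 65 and 66 can be sketched as follows. Several details are assumptions rather than facts from the slides: the category score is taken to be the median product score in that category, sampling is done without replacement, scores are assumed positive, and all names are hypothetical.

```python
import random
from collections import defaultdict
from statistics import median

def diverse_sample(scored_products, k, seed=None):
    """Pick up to k products: first a category, then a product within it.

    scored_products: list of (product_id, category, score) with score > 0.
    """
    rng = random.Random(seed)
    by_cat = defaultdict(list)
    for pid, cat, score in scored_products:
        by_cat[cat].append((pid, score))
    picked = []
    while len(picked) < k and by_cat:
        cats = list(by_cat)
        # Category weight: median score of the products still available in it.
        cat_weights = [median(s for _, s in by_cat[c]) for c in cats]
        cat = rng.choices(cats, weights=cat_weights, k=1)[0]
        items = by_cat[cat]
        # Within the category, sample in proportion to product score.
        idx = rng.choices(range(len(items)), weights=[s for _, s in items], k=1)[0]
        picked.append(items.pop(idx)[0])
        if not items:
            del by_cat[cat]
    return picked

products = [
    ("p1", "dairy", 0.9), ("p2", "dairy", 0.8),
    ("p3", "snacks", 0.5), ("p4", "produce", 0.4),
]
final_list = diverse_sample(products, k=3, seed=0)
```

Compared with taking the top k by score, which here would return two dairy products first, sampling through categories gives lower-scored categories a chance to appear in the final list.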
  • 67. Architecture [diagram components: Event Data; Generate User-Product Matrix; ALS (Spark/EMR); User/Product Factors; Candidate Selection; Score and Select Top N (Spark/EMR); Run-time diversity ranking]
  • 68. Out of stocks happen due to uncertainty in several places • Order fulfillment in (distant) future • Cannot hold inventory • Real-time inventory tracking across thousands of locations isn’t perfect (yet) • Customer might reschedule delivery