SlideShare a Scribd company logo
1
From practice to theory
in learning from massive data
Charles Elkan
Amazon Fellow
August 14, 2016
Important
Information here is already public.
Opinions are mine, not Amazon’s.
3
Outline
Only 30 minutes!
1. Detecting anomalies in streaming data
2. Making Spark usable for real-time predictions
3. Amazon’s most important algorithm for recommendations
4. Uplift: We want causation, not merely correlation
Outline
1. Detecting anomalies in streaming data
2. Making Spark usable for real-time predictions
3. Amazon’s most important algorithm for recommendations
4. Uplift: We want causation, not merely correlation
From practice to theory
From theory to practice
Now for everyone!
Outline
1. Detecting anomalies in streaming data
2. Making Spark usable for real-time predictions
3. Amazon’s most important algorithm for recommendations
4. Uplift: We want causation, not merely correlation
From practice to practice
From Practice to Theory in Learning from Massive Data by Charles Elkan at BigMine16
Outline
1. Detecting anomalies in streaming data
2. Making Spark usable for real-time predictions
3. Amazon’s most important algorithm for recommendations
4. Uplift: We want causation, not merely correlation
13
Academic versus applied
In theory, researchers favor simplicity. In practice, they don’t.
In industry, simplicity genuinely wins.
Example: Desiderata for recommender systems:
1. Respect the privacy of users; don’t be creepy.
2. Make recommendations understandable.
3. Make them responsive to the user’s most recent interests.
4. Generate them with millisecond latency.
14
Amazon’s most important recommender system
1. Respect the privacy of users; don’t be creepy.
2. Make recommendations understandable.
3. And responsive to the user’s most recent interests.
4. Generate them with millisecond latency.
Outline
1. Detecting anomalies in streaming data
2. Making Spark usable for real-time predictions
3. Amazon’s most important algorithm for recommendations
4. Uplift: We want causation, not merely correlation
What data scientists do every day
Let x be a user and let R = 0 or 1 be a response. For example, R=1
means the user buys shoes in the next month.
Routinely, we train models to predict the probability p(R=1|x).
We send messages and coupons to users with high p(R=1|x).
16
Is p(R=1|x) actually useful?
In principle, no. "Our goal is not to predict the future; it is to
change the future."
• Merely predicting user behavior is of limited interest.
We want to select treatments that influence users.
• T = t means we choose treatment t.
• For each available t, compute p(R=1|x,T=t).
• Choose the t that gives highest probability.
17
The risk of ignoring uplift
18
Users are ranked by p(R=1|x), shown by the brown line.
The blue dashed line shows p(R=1|x,T=t) .
The treatment t has a negative effect for users in the top 5%:
p(R=1|x,T=t) < p(R=1|x).
Politicians know this …
If you are a Republican, don’t target confirmed Democrat voters!
Instead:
• Send persuasive messages to undecided voters.
• Send “get out the vote” messages to confirmed supporters.
• Send “please donate” messages to these people also.
A common scenario for uplift
Many treatments are almost free to apply, such as sending email.
The uplift question is then which treatment is most effective.
For each user x, we want to know which t has highest value
p(R=1|x,T=t).
Keep in mind: The same treatment may be the best for all x.
20
A public dataset
Published by Kevin Hillstrom, former VP of database marketing
at Nordstrom.
Studied in several published papers on uplift, notably by Nicholas
Radcliffe, professor at the University of Edinburgh.
• 64,000 past customers of an e-commerce site selling clothing.
• Randomized to no email, men’s email, or women’s email.
• Three outcomes: Binary visit? purchase? and numerical spend.
21
Looking at the data
22
Treatments have a larger effect on “visit” than on “purchase
given visit” or on “spend given purchase.”
We'll analyze uplift (i.e., the causal influence of treatments)
for visits.
Table from Hillstrom’s MineThatData email analytics challenge by Radcliffe.
The linear probability model
Assume the linear function p(R=1|x) = b0 + ∑i bi * xi.
• Find coefficients bi to minimize square loss.
Square loss is proper, so predicted probabilities are calibrated.
Avoid overfitting and predictions <0 or >1 by not having too
many predictors.
Commonly used in econometrics, not in ML. In practice, often
quite similar to logistic regression.
23
probability of visit =
7.5% + … +
6.5% IF (men’s past
AND men’s email) +
6.6% IF (women’s
past AND men’s
email) +
6.1% IF (women’s
past AND women’s
email)
24
Including treatment indicators M and W
25
The men’s email is effective for customers who have
previously purchased men’s or women’s clothing.
The women’s email is not effective for customers who have
previously purchased only men’s clothing.
26
Optimal treatment policy:
• If only men’s previous purchases: send men’s email.
• If only women’s purchases: send either email.
• If both: send men’s email.
Hypothesis: Women tend to buy clothing for their families,
but men tend to buy clothing only for themselves.
Validation
How can we confirm that we have found an optimal policy?
Approach:
1. Train models of response for each treatment.
2. For each user x in a test set, plot both predicted probabilities.
3. Three separate test sets: users who previously purchased only
women’s clothing, only men’s, or both.
4. The latter two sets should show p(R=1|x, T=M) > p(R=1|x, T=W)
for most x.
Results using random forests:
Lower two panels: As expected,
p(R=1|x, T=M) > p(R=1|x, T=W).
Top panel: The two treatments
M and W are equally effective.
What comes next?
Conclusion: Indeed, one treatment (the men’s email) can be
optimal for all customers.
The step beyond uplift modeling is reinforcement learning:
Learning a sequence of actions that is best for each user.
• The goal is to maximize total lifetime reward from each
customer.
• Learn simultaneously how customers evolve and how
they respond to actions that we take.
29
Questions?
1. Detecting anomalies in streaming data
2. Making Spark usable for real-time predictions
3. Amazon’s most important algorithm for recommendations
4. Uplift: We want causation, not merely correlation
Ad

More Related Content

What's hot (6)

Introduction to simulating data to improve your research
Introduction to simulating data to improve your researchIntroduction to simulating data to improve your research
Introduction to simulating data to improve your research
Dorothy Bishop
 
Performance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning AlgorithmsPerformance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning Algorithms
Kush Kulshrestha
 
Multiclass classification of imbalanced data
Multiclass classification of imbalanced dataMulticlass classification of imbalanced data
Multiclass classification of imbalanced data
SaurabhWani6
 
Statistical Test
Statistical TestStatistical Test
Statistical Test
guestdbf093
 
Qnt 275 final exam july 2017 version
Qnt 275 final exam july 2017 versionQnt 275 final exam july 2017 version
Qnt 275 final exam july 2017 version
Adams-ASs
 
Machine learning session6(decision trees random forrest)
Machine learning   session6(decision trees random forrest)Machine learning   session6(decision trees random forrest)
Machine learning session6(decision trees random forrest)
Abhimanyu Dwivedi
 
Introduction to simulating data to improve your research
Introduction to simulating data to improve your researchIntroduction to simulating data to improve your research
Introduction to simulating data to improve your research
Dorothy Bishop
 
Performance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning AlgorithmsPerformance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning Algorithms
Kush Kulshrestha
 
Multiclass classification of imbalanced data
Multiclass classification of imbalanced dataMulticlass classification of imbalanced data
Multiclass classification of imbalanced data
SaurabhWani6
 
Statistical Test
Statistical TestStatistical Test
Statistical Test
guestdbf093
 
Qnt 275 final exam july 2017 version
Qnt 275 final exam july 2017 versionQnt 275 final exam july 2017 version
Qnt 275 final exam july 2017 version
Adams-ASs
 
Machine learning session6(decision trees random forrest)
Machine learning   session6(decision trees random forrest)Machine learning   session6(decision trees random forrest)
Machine learning session6(decision trees random forrest)
Abhimanyu Dwivedi
 

Similar to From Practice to Theory in Learning from Massive Data by Charles Elkan at BigMine16 (20)

Week5-TH [Autosaved].pptx Staatistics Stab22
Week5-TH [Autosaved].pptx Staatistics Stab22Week5-TH [Autosaved].pptx Staatistics Stab22
Week5-TH [Autosaved].pptx Staatistics Stab22
dhritidey24
 
Personalized News Recommendation (Stream Data Based)
Personalized News Recommendation (Stream Data Based)Personalized News Recommendation (Stream Data Based)
Personalized News Recommendation (Stream Data Based)
Umesh Singla
 
ML_Lec4 introduction to linear regression.pdf
ML_Lec4 introduction to linear regression.pdfML_Lec4 introduction to linear regression.pdf
ML_Lec4 introduction to linear regression.pdf
BeshoyArnest
 
Causality without headaches
Causality without headachesCausality without headaches
Causality without headaches
Benoît Rostykus
 
Marketing Experiment - Part II: Analysis
Marketing Experiment - Part II: Analysis Marketing Experiment - Part II: Analysis
Marketing Experiment - Part II: Analysis
Minha Hwang
 
Uplift Modeling Workshop
Uplift Modeling WorkshopUplift Modeling Workshop
Uplift Modeling Workshop
odsc
 
Lecture: introduction to Machine Learning.ppt
Lecture: introduction to Machine Learning.pptLecture: introduction to Machine Learning.ppt
Lecture: introduction to Machine Learning.ppt
NiteshJha97
 
Data Science Interview Questions PDF By ScholarHat
Data Science Interview Questions PDF By ScholarHatData Science Interview Questions PDF By ScholarHat
Data Science Interview Questions PDF By ScholarHat
Scholarhat
 
DATA COLLECTION IN RESEARCH
DATA COLLECTION IN RESEARCHDATA COLLECTION IN RESEARCH
DATA COLLECTION IN RESEARCH
SHAYA'A OTHMAN MANAGEMENT & RESEARCH METHODOLOGY
 
Machine learning introduction to unit 1.ppt
Machine learning introduction to unit 1.pptMachine learning introduction to unit 1.ppt
Machine learning introduction to unit 1.ppt
ShivaShiva783981
 
151028_abajpai1
151028_abajpai1151028_abajpai1
151028_abajpai1
Anshumaan Bajpai
 
Uplift Modelling as a Tool for Making Causal Inferences at Shopify - Mojan Hamed
Uplift Modelling as a Tool for Making Causal Inferences at Shopify - Mojan HamedUplift Modelling as a Tool for Making Causal Inferences at Shopify - Mojan Hamed
Uplift Modelling as a Tool for Making Causal Inferences at Shopify - Mojan Hamed
Rising Media Ltd.
 
Essentials of machine learning algorithms
Essentials of machine learning algorithmsEssentials of machine learning algorithms
Essentials of machine learning algorithms
Arunangsu Sahu
 
Supervised Learning.pdf
Supervised Learning.pdfSupervised Learning.pdf
Supervised Learning.pdf
gadissaassefa
 
SPSS statistics - get help using SPSS
SPSS statistics - get help using SPSSSPSS statistics - get help using SPSS
SPSS statistics - get help using SPSS
csula its training
 
d4365515-cf15-43ce-9147-da4991ae0dff.pdf
d4365515-cf15-43ce-9147-da4991ae0dff.pdfd4365515-cf15-43ce-9147-da4991ae0dff.pdf
d4365515-cf15-43ce-9147-da4991ae0dff.pdf
vasia1111
 
Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017
Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017
Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017
MLconf
 
Counterfactual Learning for Recommendation
Counterfactual Learning for RecommendationCounterfactual Learning for Recommendation
Counterfactual Learning for Recommendation
Olivier Jeunen
 
Data Science Isn't a Fad: Let's Keep it That Way
Data Science Isn't a Fad: Let's Keep it That WayData Science Isn't a Fad: Let's Keep it That Way
Data Science Isn't a Fad: Let's Keep it That Way
Melinda Thielbar
 
Module-2_Notes-with-Example for data science
Module-2_Notes-with-Example for data scienceModule-2_Notes-with-Example for data science
Module-2_Notes-with-Example for data science
pujashri1975
 
Week5-TH [Autosaved].pptx Staatistics Stab22
Week5-TH [Autosaved].pptx Staatistics Stab22Week5-TH [Autosaved].pptx Staatistics Stab22
Week5-TH [Autosaved].pptx Staatistics Stab22
dhritidey24
 
Personalized News Recommendation (Stream Data Based)
Personalized News Recommendation (Stream Data Based)Personalized News Recommendation (Stream Data Based)
Personalized News Recommendation (Stream Data Based)
Umesh Singla
 
ML_Lec4 introduction to linear regression.pdf
ML_Lec4 introduction to linear regression.pdfML_Lec4 introduction to linear regression.pdf
ML_Lec4 introduction to linear regression.pdf
BeshoyArnest
 
Causality without headaches
Causality without headachesCausality without headaches
Causality without headaches
Benoît Rostykus
 
Marketing Experiment - Part II: Analysis
Marketing Experiment - Part II: Analysis Marketing Experiment - Part II: Analysis
Marketing Experiment - Part II: Analysis
Minha Hwang
 
Uplift Modeling Workshop
Uplift Modeling WorkshopUplift Modeling Workshop
Uplift Modeling Workshop
odsc
 
Lecture: introduction to Machine Learning.ppt
Lecture: introduction to Machine Learning.pptLecture: introduction to Machine Learning.ppt
Lecture: introduction to Machine Learning.ppt
NiteshJha97
 
Data Science Interview Questions PDF By ScholarHat
Data Science Interview Questions PDF By ScholarHatData Science Interview Questions PDF By ScholarHat
Data Science Interview Questions PDF By ScholarHat
Scholarhat
 
Machine learning introduction to unit 1.ppt
Machine learning introduction to unit 1.pptMachine learning introduction to unit 1.ppt
Machine learning introduction to unit 1.ppt
ShivaShiva783981
 
Uplift Modelling as a Tool for Making Causal Inferences at Shopify - Mojan Hamed
Uplift Modelling as a Tool for Making Causal Inferences at Shopify - Mojan HamedUplift Modelling as a Tool for Making Causal Inferences at Shopify - Mojan Hamed
Uplift Modelling as a Tool for Making Causal Inferences at Shopify - Mojan Hamed
Rising Media Ltd.
 
Essentials of machine learning algorithms
Essentials of machine learning algorithmsEssentials of machine learning algorithms
Essentials of machine learning algorithms
Arunangsu Sahu
 
Supervised Learning.pdf
Supervised Learning.pdfSupervised Learning.pdf
Supervised Learning.pdf
gadissaassefa
 
SPSS statistics - get help using SPSS
SPSS statistics - get help using SPSSSPSS statistics - get help using SPSS
SPSS statistics - get help using SPSS
csula its training
 
d4365515-cf15-43ce-9147-da4991ae0dff.pdf
d4365515-cf15-43ce-9147-da4991ae0dff.pdfd4365515-cf15-43ce-9147-da4991ae0dff.pdf
d4365515-cf15-43ce-9147-da4991ae0dff.pdf
vasia1111
 
Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017
Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017
Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017
MLconf
 
Counterfactual Learning for Recommendation
Counterfactual Learning for RecommendationCounterfactual Learning for Recommendation
Counterfactual Learning for Recommendation
Olivier Jeunen
 
Data Science Isn't a Fad: Let's Keep it That Way
Data Science Isn't a Fad: Let's Keep it That WayData Science Isn't a Fad: Let's Keep it That Way
Data Science Isn't a Fad: Let's Keep it That Way
Melinda Thielbar
 
Module-2_Notes-with-Example for data science
Module-2_Notes-with-Example for data scienceModule-2_Notes-with-Example for data science
Module-2_Notes-with-Example for data science
pujashri1975
 
Ad

More from BigMine (10)

Inside the Atoms: Mining a Network of Networks and Beyond by HangHang Tong at...
Inside the Atoms: Mining a Network of Networks and Beyond by HangHang Tong at...Inside the Atoms: Mining a Network of Networks and Beyond by HangHang Tong at...
Inside the Atoms: Mining a Network of Networks and Beyond by HangHang Tong at...
BigMine
 
Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16
Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16
Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16
BigMine
 
Big Data and Small Devices by Katharina Morik
Big Data and Small Devices by Katharina MorikBig Data and Small Devices by Katharina Morik
Big Data and Small Devices by Katharina Morik
BigMine
 
Exact Data Reduction for Big Data by Jieping Ye
Exact Data Reduction for Big Data by Jieping YeExact Data Reduction for Big Data by Jieping Ye
Exact Data Reduction for Big Data by Jieping Ye
BigMine
 
Processing Reachability Queries with Realistic Constraints on Massive Network...
Processing Reachability Queries with Realistic Constraints on Massive Network...Processing Reachability Queries with Realistic Constraints on Massive Network...
Processing Reachability Queries with Realistic Constraints on Massive Network...
BigMine
 
Challenging Problems for Scalable Mining of Heterogeneous Social and Informat...
Challenging Problems for Scalable Mining of Heterogeneous Social and Informat...Challenging Problems for Scalable Mining of Heterogeneous Social and Informat...
Challenging Problems for Scalable Mining of Heterogeneous Social and Informat...
BigMine
 
Big & Personal: the data and the models behind Netflix recommendations by Xa...
 Big & Personal: the data and the models behind Netflix recommendations by Xa... Big & Personal: the data and the models behind Netflix recommendations by Xa...
Big & Personal: the data and the models behind Netflix recommendations by Xa...
BigMine
 
Large Graph Mining – Patterns, tools and cascade analysis by Christos Faloutsos
Large Graph Mining – Patterns, tools and cascade analysis by Christos FaloutsosLarge Graph Mining – Patterns, tools and cascade analysis by Christos Faloutsos
Large Graph Mining – Patterns, tools and cascade analysis by Christos Faloutsos
BigMine
 
Unexpected Challenges in Large Scale Machine Learning by Charles Parker
 Unexpected Challenges in Large Scale Machine Learning by Charles Parker Unexpected Challenges in Large Scale Machine Learning by Charles Parker
Unexpected Challenges in Large Scale Machine Learning by Charles Parker
BigMine
 
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...
BigMine
 
Inside the Atoms: Mining a Network of Networks and Beyond by HangHang Tong at...
Inside the Atoms: Mining a Network of Networks and Beyond by HangHang Tong at...Inside the Atoms: Mining a Network of Networks and Beyond by HangHang Tong at...
Inside the Atoms: Mining a Network of Networks and Beyond by HangHang Tong at...
BigMine
 
Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16
Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16
Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16
BigMine
 
Big Data and Small Devices by Katharina Morik
Big Data and Small Devices by Katharina MorikBig Data and Small Devices by Katharina Morik
Big Data and Small Devices by Katharina Morik
BigMine
 
Exact Data Reduction for Big Data by Jieping Ye
Exact Data Reduction for Big Data by Jieping YeExact Data Reduction for Big Data by Jieping Ye
Exact Data Reduction for Big Data by Jieping Ye
BigMine
 
Processing Reachability Queries with Realistic Constraints on Massive Network...
Processing Reachability Queries with Realistic Constraints on Massive Network...Processing Reachability Queries with Realistic Constraints on Massive Network...
Processing Reachability Queries with Realistic Constraints on Massive Network...
BigMine
 
Challenging Problems for Scalable Mining of Heterogeneous Social and Informat...
Challenging Problems for Scalable Mining of Heterogeneous Social and Informat...Challenging Problems for Scalable Mining of Heterogeneous Social and Informat...
Challenging Problems for Scalable Mining of Heterogeneous Social and Informat...
BigMine
 
Big & Personal: the data and the models behind Netflix recommendations by Xa...
 Big & Personal: the data and the models behind Netflix recommendations by Xa... Big & Personal: the data and the models behind Netflix recommendations by Xa...
Big & Personal: the data and the models behind Netflix recommendations by Xa...
BigMine
 
Large Graph Mining – Patterns, tools and cascade analysis by Christos Faloutsos
Large Graph Mining – Patterns, tools and cascade analysis by Christos FaloutsosLarge Graph Mining – Patterns, tools and cascade analysis by Christos Faloutsos
Large Graph Mining – Patterns, tools and cascade analysis by Christos Faloutsos
BigMine
 
Unexpected Challenges in Large Scale Machine Learning by Charles Parker
 Unexpected Challenges in Large Scale Machine Learning by Charles Parker Unexpected Challenges in Large Scale Machine Learning by Charles Parker
Unexpected Challenges in Large Scale Machine Learning by Charles Parker
BigMine
 
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...
BigMine
 
Ad

Recently uploaded (20)

GenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.aiGenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.ai
Inspirient
 
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.pptJust-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
ssuser5f8f49
 
Deloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit contextDeloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit context
Process mining Evangelist
 
Flip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptxFlip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptx
mubashirkhan45461
 
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjks
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjksPpt. Nikhil.pptxnshwuudgcudisisshvehsjks
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjks
panchariyasahil
 
Stack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptxStack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptx
binduraniha86
 
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
ThanushsaranS
 
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
gmuir1066
 
Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..
yuvarajreddy2002
 
Medical Dataset including visualizations
Medical Dataset including visualizationsMedical Dataset including visualizations
Medical Dataset including visualizations
vishrut8750588758
 
Geometry maths presentation for begginers
Geometry maths presentation for begginersGeometry maths presentation for begginers
Geometry maths presentation for begginers
zrjacob283
 
VKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptxVKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptx
Vinod Srivastava
 
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptxmd-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
fatimalazaar2004
 
chapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptxchapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptx
justinebandajbn
 
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your CompetitorsAI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
Contify
 
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnTemplate_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
cegiver630
 
chapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.pptchapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.ppt
justinebandajbn
 
C++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptxC++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptx
aquibnoor22079
 
Principles of information security Chapter 5.ppt
Principles of information security Chapter 5.pptPrinciples of information security Chapter 5.ppt
Principles of information security Chapter 5.ppt
EstherBaguma
 
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
Molecular methods diagnostic and monitoring of infection  -  Repaired.pptxMolecular methods diagnostic and monitoring of infection  -  Repaired.pptx
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
7tzn7x5kky
 
GenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.aiGenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.ai
Inspirient
 
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.pptJust-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
ssuser5f8f49
 
Deloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit contextDeloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit context
Process mining Evangelist
 
Flip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptxFlip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptx
mubashirkhan45461
 
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjks
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjksPpt. Nikhil.pptxnshwuudgcudisisshvehsjks
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjks
panchariyasahil
 
Stack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptxStack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptx
binduraniha86
 
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
ThanushsaranS
 
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
gmuir1066
 
Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..
yuvarajreddy2002
 
Medical Dataset including visualizations
Medical Dataset including visualizationsMedical Dataset including visualizations
Medical Dataset including visualizations
vishrut8750588758
 
Geometry maths presentation for begginers
Geometry maths presentation for begginersGeometry maths presentation for begginers
Geometry maths presentation for begginers
zrjacob283
 
VKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptxVKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptx
Vinod Srivastava
 
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptxmd-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
fatimalazaar2004
 
chapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptxchapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptx
justinebandajbn
 
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your CompetitorsAI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
Contify
 
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnTemplate_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
cegiver630
 
chapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.pptchapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.ppt
justinebandajbn
 
C++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptxC++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptx
aquibnoor22079
 
Principles of information security Chapter 5.ppt
Principles of information security Chapter 5.pptPrinciples of information security Chapter 5.ppt
Principles of information security Chapter 5.ppt
EstherBaguma
 
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
Molecular methods diagnostic and monitoring of infection  -  Repaired.pptxMolecular methods diagnostic and monitoring of infection  -  Repaired.pptx
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
7tzn7x5kky
 

From Practice to Theory in Learning from Massive Data by Charles Elkan at BigMine16

  • 1. 1 From practice to theory in learning from massive data Charles Elkan Amazon Fellow August 14, 2016
  • 2. Important Information here is already public. Opinions are mine, not Amazon’s.
  • 3. 3
  • 4. Outline Only 30 minutes! 1. Detecting anomalies in streaming data 2. Making Spark usable for real-time predictions 3. Amazon’s most important algorithm for recommendations 4. Uplift: We want causation, not merely correlation
  • 5. Outline 1. Detecting anomalies in streaming data 2. Making Spark usable for real-time predictions 3. Amazon’s most important algorithm for recommendations 4. Uplift: We want causation, not merely correlation
  • 7. From theory to practice
  • 9. Outline 1. Detecting anomalies in streaming data 2. Making Spark usable for real-time predictions 3. Amazon’s most important algorithm for recommendations 4. Uplift: We want causation, not merely correlation
  • 10. From practice to practice
  • 12. Outline 1. Detecting anomalies in streaming data 2. Making Spark usable for real-time predictions 3. Amazon’s most important algorithm for recommendations 4. Uplift: We want causation, not merely correlation
  • 13. 13 Academic versus applied In theory, researchers favor simplicity. In practice, they don’t. In industry, simplicity genuinely wins. Example: Desiderata for recommender systems: 1. Respect the privacy of users; don’t be creepy. 2. Make recommendations understandable. 3. Make them responsive to the user’s most recent interests. 4. Generate them with millisecond latency.
  • 14. 14 Amazon’s most important recommender system 1. Respect the privacy of users; don’t be creepy. 2. Make recommendations understandable. 3. And responsive to the user’s most recent interests. 4. Generate them with millisecond latency.
  • 15. Outline 1. Detecting anomalies in streaming data 2. Making Spark usable for real-time predictions 3. Amazon’s most important algorithm for recommendations 4. Uplift: We want causation, not merely correlation
  • 16. What data scientists do every day Let x be a user and let R = 0 or 1 be a response. For example, R=1 means the user buys shoes in the next month. Routinely, we train models to predict the probability p(R=1|x). We send messages and coupons to users with high p(R=1|x). 16
  • 17. Is p(R=1|x) actually useful? In principle, no. "Our goal is not to predict the future; it is to change the future." • Merely predicting user behavior is of limited interest. We want to select treatments that influence users. • T = t means we choose treatment t. • For each available t, compute p(R=1|x,T=t). • Choose the t that gives highest probability. 17
  • 18. The risk of ignoring uplift 18 Users are ranked by p(R=1|x), shown by the brown line. The blue dashed line shows p(R=1|x,T=t) . The treatment t has a negative effect for users in the top 5%: p(R=1|x,T=t) < p(R=1|x).
  • 19. Politicians know this … If you are a Republican, don’t target confirmed Democrat voters! Instead: • Send persuasive messages to undecided voters. • Send “get out the vote” messages to confirmed supporters. • Send “please donate” messages to these people also.
  • 20. A common scenario for uplift Many treatments are almost free to apply, such as sending email. The uplift question is then which treatment is most effective. For each user x, we want to know which t has highest value p(R=1|x,T=t). Keep in mind: The same treatment may be the best for all x. 20
  • 21. A public dataset Published by Kevin Hillstrom, former VP of database marketing at Nordstrom. Studied in several published papers on uplift, notably by Nicholas Radcliffe, professor at the University of Edinburgh. • 64,000 past customers of an e-commerce site selling clothing. • Randomized to no email, men’s email, or women’s email. • Three outcomes: Binary visit? purchase? and numerical spend. 21
  • 22. Looking at the data 22 Treatments have a larger effect on “visit” than on “purchase given visit” or on “spend given purchase.” We'll analyze uplift (i.e., the causal influence of treatments) for visits. Table from Hillstrom’s MineThatData email analytics challenge by Radcliffe.
  • 23. The linear probability model Assume the linear function p(R=1|x) = b0 + ∑i bi * xi. • Find coefficients bi to minimize square loss. Square loss is proper, so predicted probabilities are calibrated. Avoid overfitting and predictions <0 or >1 by not having too many predictors. Commonly used in econometrics, not in ML. In practice, often quite similar to logistic regression. 23
  • 24. probability of visit = 7.5% + … + 6.5% IF (men’s past AND men’s email) + 6.6% IF (women’s past AND men’s email) + 6.1% IF (women’s past AND women’s email) 24 Including treatment indicators M and W
  • 25. 25 The men’s email is effective for customers who have previously purchased men’s or women’s clothing. The women’s email is not effective for customers who have previously purchased only men’s clothing.
  • 26. 26 Optimal treatment policy: • If only men’s previous purchases: send men’s email. • If only women’s purchases: send either email. • If both: send men’s email. Hypothesis: Women tend to buy clothing for their families, but men tend to buy clothing only for themselves.
  • 27. Validation How can we confirm that we have found an optimal policy? Approach: 1. Train models of response for each treatment. 2. For each user x in a test set, plot both predicted probabilities. 3. Three separate test sets: users who previously purchased only women’s clothing, only men’s, or both. 4. The latter two sets should show p(R=1|x, T=M) > p(R=1|x, T=W) for most x.
  • 28. Results using random forests: Lower two panels: As expected, p(R=1|x, T=M) > p(R=1|x, T=W). Top panel: The two treatments M and W are equally effective.
  • 29. What comes next? Conclusion: Indeed, one treatment (the men’s email) can be optimal for all customers. The step beyond uplift modeling is reinforcement learning: Learning a sequence of actions that is best for each user. • The goal is to maximize total lifetime reward from each customer. • Learn simultaneously how customers evolve and how they respond to actions that we take. 29
  • 30. Questions? 1. Detecting anomalies in streaming data 2. Making Spark usable for real-time predictions 3. Amazon’s most important algorithm for recommendations 4. Uplift: We want causation, not merely correlation