Using Spark’s Machine Learning Library to Make Product Recommendations
Sorin Pește
Technology Solutions Professional, Data & AI
Microsoft
source: xkcd.com
(demo)
APACHE SPARK
A unified, distributed, open source engine for large-scale data processing
[Diagram: Spark Core Engine with its libraries Spark SQL (interactive queries), Spark Streaming / Structured Streaming (stream processing), Spark MLlib (machine learning), and GraphX (graph computation), running on the YARN, Mesos, or Standalone scheduler]
SPARK: A BRIEF HISTORY
SPARK DATAFRAMES
A distributed collection of data that’s conceptually equivalent to a table
SPARK MACHINE LEARNING (MLLIB)
• Offers a set of parallelized machine learning algorithms
• Supports model selection (hyperparameter tuning) using cross-validation and train-validation split
• Supports Java, Scala, or Python apps using the DataFrame-based API
Enables parallel, distributed ML for large datasets on Spark clusters
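The model-selection idea above (cross-validation over a grid of hyperparameters) can be sketched in plain Python, independent of Spark. This is only an illustration under stated assumptions: `k_folds`, `param_grid`, `select_best`, and the toy scoring function are hypothetical names made up for this sketch, not MLlib APIs. In MLlib the corresponding roles are played by `CrossValidator` and `TrainValidationSplit`.

```python
from itertools import product

def k_folds(rows, k=3):
    """Yield (train, validation) pairs; every row is validated exactly once."""
    folds = [rows[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield train, validation

def param_grid(**choices):
    """Expand name=[values, ...] arguments into a list of parameter dicts."""
    names = sorted(choices)
    combos = product(*(choices[name] for name in names))
    return [dict(zip(names, values)) for values in combos]

def select_best(rows, grid, fit_and_score, k=3):
    """Return the parameter dict with the lowest average validation error."""
    def average_error(params):
        errors = [fit_and_score(train, validation, params)
                  for train, validation in k_folds(rows, k)]
        return sum(errors) / len(errors)
    return min(grid, key=average_error)

# Hypothetical stand-in for "train a model, return its validation error":
# the "model" ignores the training rows and predicts the constant params["guess"].
def constant_model_error(train, validation, params):
    return sum((y - params["guess"]) ** 2 for y in validation) / len(validation)

data = [4.0, 5.0, 3.0, 4.0, 5.0, 3.0]
best = select_best(data, param_grid(guess=[1.0, 4.0, 9.0]), constant_model_error)
```

A train-validation split is the cheaper special case: one fixed split instead of k rotating ones.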
SPARK MLLIB ALGORITHMS
SPARK MLLIB PIPELINES
COLLABORATIVE FILTERING
User Latent Factors
Item Latent Factors
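As a rough illustration of how user and item latent factors are used once they exist: the predicted affinity of a user for an item is the dot product of their two factor vectors, and recommendations are the highest-scoring items the user has not seen. The factor values and the helper names `predict` and `top_n` below are made up for this sketch.

```python
def predict(user_factors, item_factors):
    """Predicted affinity = dot product of the two latent factor vectors."""
    return sum(u * v for u, v in zip(user_factors, item_factors))

def top_n(user_factors, items, n=2, seen=()):
    """Rank unseen items by predicted affinity and keep the best n."""
    scored = [(predict(user_factors, vec), item)
              for item, vec in items.items() if item not in seen]
    return [item for _, item in sorted(scored, reverse=True)[:n]]

# Hypothetical two-dimensional factors (rank = 2).
user = [1.0, 0.0]
items = {"x": [2.0, 1.0], "y": [0.5, 3.0], "z": [1.5, 0.0]}
recommended = top_n(user, items, n=2)
```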
ALTERNATING LEAST SQUARES (ALS)
https://github.com/neaorin/databricks-demos/
ALS: EXPLICIT VS IMPLICIT FEEDBACK
• Explicit feedback: the user rates items
• Implicit feedback: the system records user activity
  • Browses a product page
  • Watches a movie trailer
  • Plays a song
  • Shares on social media
  • etc.
Implicit feedback is generally used in real-world implementations
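One way to picture implicit feedback is as a log of events that gets aggregated into a per-(user, item) interaction strength, which is then turned into a confidence weight. The sketch below assumes a hypothetical event log and hypothetical per-event weights; the `1 + alpha * strength` confidence form is a common formulation for implicit-feedback matrix factorization (Spark's ALS exposes `implicitPrefs` and `alpha` parameters for this mode), but nothing here is taken from the deck's demo.

```python
from collections import defaultdict

# Hypothetical weights: how much each kind of activity counts as interest.
EVENT_WEIGHTS = {"view": 1.0, "trailer": 2.0, "play": 3.0, "share": 4.0}

def implicit_strengths(events):
    """Sum weighted events into one strength per (user, item) pair."""
    totals = defaultdict(float)
    for user, item, kind in events:
        totals[(user, item)] += EVENT_WEIGHTS.get(kind, 0.0)
    return dict(totals)

def confidence(strength, alpha=1.0):
    """Confidence that the user likes the item; grows with observed activity."""
    return 1.0 + alpha * strength

events = [
    ("u1", "p1", "view"),
    ("u1", "p1", "share"),
    ("u2", "p1", "play"),
]
strengths = implicit_strengths(events)
```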
ALS: HYPERPARAMETER TUNING
• Hyperparameters which can be adjusted:
  • rank = the number of latent factors in the model
  • maxIter = the maximum number of iterations
  • regParam = the regularization parameter
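To make the three hyperparameters concrete, here is a minimal single-machine sketch of alternating least squares in plain Python. It is not Spark's implementation, which is distributed and differs in details; the function names, the toy ratings, and the plain L2 penalty are assumptions of this sketch. `rank`, `max_iter`, and `reg_param` play the roles of `rank`, `maxIter`, and `regParam`.

```python
import random

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def update_side(fixed, observed_by_id, rank, reg_param):
    """Re-solve one side's factors by ridge regression, other side held fixed."""
    updated = {}
    for key, observed in observed_by_id.items():
        # Normal equations: (sum f f^T + reg_param * I) x = sum rating * f
        a = [[reg_param if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for other, rating in observed:
            f = fixed[other]
            for i in range(rank):
                b[i] += rating * f[i]
                for j in range(rank):
                    a[i][j] += f[i] * f[j]
        updated[key] = solve(a, b)
    return updated

def objective(ratings, users, items, reg_param):
    """Squared error on observed ratings plus an L2 penalty on all factors."""
    error = sum((r - dot(users[u], items[i])) ** 2 for u, i, r in ratings)
    penalty = sum(dot(v, v) for v in list(users.values()) + list(items.values()))
    return error + reg_param * penalty

def als(ratings, rank=2, max_iter=10, reg_param=0.1, seed=0):
    """Alternate user and item solves; needs max_iter >= 1 and reg_param > 0."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    items = {i: [rng.random() for _ in range(rank)] for i in sorted(by_item)}
    users = {}
    for _ in range(max_iter):
        users = update_side(items, by_user, rank, reg_param)
        items = update_side(users, by_item, rank, reg_param)
    return users, items

# Toy explicit ratings; the pair ("dan", "p4") is deliberately left unobserved.
ratings = [
    ("ann", "p1", 5.0), ("ann", "p2", 5.0), ("ann", "p3", 1.0), ("ann", "p4", 1.0),
    ("bob", "p1", 5.0), ("bob", "p2", 5.0), ("bob", "p3", 1.0), ("bob", "p4", 1.0),
    ("cat", "p1", 1.0), ("cat", "p2", 1.0), ("cat", "p3", 5.0), ("cat", "p4", 5.0),
    ("dan", "p1", 1.0), ("dan", "p2", 1.0), ("dan", "p3", 5.0),
]
users, items = als(ratings, rank=2, max_iter=10, reg_param=0.1)
predicted = dot(users["dan"], items["p4"])  # score for the unobserved pair
```

Each half-step minimizes the regularized objective exactly for one side, so the objective cannot increase from one iteration to the next. A larger `rank` gives the model more capacity, and a larger `reg_param` shrinks the factors to limit overfitting.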
ALS: WHAT ABOUT REAL-TIME?
• Near real-time computation of the ALS algorithm may be infeasible
• Streaming variant of ALS, using Stochastic Gradient Descent
https://github.com/brkyvz/streaming-matrix-factorization
• Oryx Framework (http://oryx.io) also offers streaming ALS
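The appeal of stochastic gradient descent for streaming is that each new rating only touches one user vector and one item vector. The following is a generic SGD update for matrix factorization, offered as a sketch and not as the code of either project linked above; `sgd_step`, the learning rate, and the regularization value are assumptions chosen for illustration.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def sgd_step(user_vec, item_vec, rating, lr=0.05, reg=0.02):
    """Update one user vector and one item vector from a single new rating."""
    err = rating - dot(user_vec, item_vec)
    new_user = [u + lr * (err * v - reg * u) for u, v in zip(user_vec, item_vec)]
    new_item = [v + lr * (err * u - reg * v) for u, v in zip(user_vec, item_vec)]
    return new_user, new_item

# Replaying the same event many times only shows that the update converges;
# in a real stream each event would be applied once as it arrives.
u, v = [0.1, 0.1], [0.1, 0.1]
for _ in range(200):
    u, v = sgd_step(u, v, rating=4.0)
```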
BEYOND ALS
• ALS-learned latent factors can be useful as input for other algorithms
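One simple example of reusing learned factors: treat each item's factor vector as an embedding and find similar items by cosine similarity. The factor values and function names below are hypothetical; the same vectors could equally be fed to a clustering or classification algorithm.

```python
import math

def cosine(a, b):
    """Cosine similarity between two factor vectors (0.0 if either is all zeros)."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

def most_similar(target, item_factors, n=1):
    """Return the n items whose factors point in the direction closest to target's."""
    others = [(cosine(item_factors[target], vec), item)
              for item, vec in item_factors.items() if item != target]
    return [item for _, item in sorted(others, reverse=True)[:n]]

# Hypothetical item factors, e.g. as produced by a trained ALS model.
item_factors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
similar_to_a = most_similar("a", item_factors, n=1)
```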
DEEP LEARNING
• A set of machine learning techniques that use multiple layers of non-linear processing units to learn useful representations of the input data
DEEP LEARNING WITH SPARK
• Integrations with existing DL libraries
• Microsoft CNTK (mmlspark)
• TensorFlow (TensorFlowOnSpark)
• DeepLearning4J
• Caffe (CaffeOnSpark)
• Keras (Elephas)
• mxnet
• Paddle
• and more…
• Implementations of DL on Spark
• BigDL
• DeepDist
• SparkCL
• SparkNet
• Deep Learning Pipelines (Databricks)
• and more…
Distributed Hyperparameter Tuning
DEEP LEARNING FOR RECOMMENDERS
• Neural Collaborative Filtering (He et al., 2017)
https://arxiv.org/abs/1708.05031
https://github.com/hexiangnan/neural_collaborative_filtering
DEEP LEARNING FOR RECOMMENDERS
• Predict the next item the user will want to interact with
Recommendations as sequence prediction
[a] -> b
[a, b] -> c
[a, b, c] -> d
[0, 0, 0, a] -> b
[0, 0, a, b] -> c
[0, a, b, c] -> d
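The two listings above (growing prefixes, then the same prefixes left-padded with zeros to a fixed length) can be generated from a single interaction sequence as follows. This is a small sketch; the function name `next_item_examples` and the default window of 4 are assumptions chosen to match the slide's example.

```python
def next_item_examples(session, window=4, pad=0):
    """Turn one interaction sequence into (padded history, next item) pairs."""
    examples = []
    for t in range(1, len(session)):
        history = session[max(0, t - window):t]            # most recent items only
        padded = [pad] * (window - len(history)) + history  # left-pad to fixed length
        examples.append((padded, session[t]))
    return examples

examples = next_item_examples(["a", "b", "c", "d"])
# [([0, 0, 0, 'a'], 'b'), ([0, 0, 'a', 'b'], 'c'), ([0, 'a', 'b', 'c'], 'd')]
```

Fixed-length inputs let the examples be batched for a sequence model, with the padding value reserved so it never collides with a real item ID.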
DEEP LEARNING FOR RECOMMENDERS
• Session-based Recommendations with Recurrent Neural Networks (Hidasi et al., 2015)
https://arxiv.org/abs/1511.06939
https://github.com/hidasib/GRU4Rec
Recommendations as sequence prediction
DEEP LEARNING FOR RECOMMENDERS
• Featurize product images
https://arxiv.org/pdf/1510.01784.pdf
Spark for Recommender Systems