SlideShare a Scribd company logo
RecSys 2015 Tutorial
Scalable Recommender Systems
Where Machine Learning
Meets Search!
Presenters
Diana Hu
Senior Data Scientist
@sdianahu
diana.hu@verizon.com
Joaquin Delgado, PhD.
Director of Engineering
@joaquind
joaquin.a.delgado@verizon.com
Disclaimer
The content of this presentation are of the
authors’ personal statements and does not
officially represent their employer’s view in
anyway. Included content is especially not
intended to convey the views of OnCue or Verizon.
Index
1. Introduction
1. What to expect?
2. Scaling recommender systems is hard
2. Recommender System Problem as a Search Problem
1. Representing queries as recommendations
3. Introduction to Search and Information Retrieval
1. Scalability in search
2. Introduction to Elasticsearch
4. Overview of Machine Learning Techniques for Recommender Systems
1. Learning to rank
2. Scalability in machine learning
3. ML software frameworks
5. Re-writing the ranking function
1. Writing a new ranking/scoring function in Elasticsearch
2. Training a spark model as a Elasticsearch plugin for custom ranking/scoring function
6. References
1. Introduction
What to expect from this tutorial?
• The focus is on practical examples of how
to implement scalable recommender
systems using search and learning-to-rank
(machine learning) techniques
• What it is not
• Deep dive into any specific areas (Search,
RecSys, Learning to rank, or Machine learning)
• Algorithmic survey
• Comparative Analysis
Finding commonalities
Ranking
RecSys
Discovery
Information
Retrieval
Search
Advertising
What is a recommendation?
Beyond rating prediction
Paradigms of recommender systems
• Reduce information load by estimating
relevance
• Ranking Approaches:
• Collaborative filtering: “Tell me what is popular
amongst my peers”
• Content Based: “Show me more of what I liked”
• Knowledge Based: “Tell me what fits my needs”
• Hybrid
Model Type Pros Cons
Collaborative • No metadata engineering
effort
• Serendipity of results
• Learns market segments
• Requires rating feedback
• Cold start for new users and
new items
Content-based • No community required
• Comparison between
items possible
• Content descriptions
necessary
• Cold start for new users
• No serendipity
Knowledge-
based
• Deterministic
• Assured quality
• No cold-start
• Interactive user sessions
• Knowledge engineering
effort to bootstrap
• Static
• Does not react to short-term
trends
Scaling recommender systems is hard!
• Millions of users
• Millions of items
• Cold start for ever increasing size of
catalog and new users added
• Imbalanced Datasets – power law
distribution is quite common
• Many algorithms have not been fully tested
at “Internet Scale”
2. Recommender System Problem as a
Search Problem
Content-based methods inspired by IR
• Rec Task: Given a user profile find the best matching
items by their attributes
• Similarity calculation: based on keyword overlap
between user/items
• Neighborhood method (i.e. nearest neighbor)
• Query-based retrieval (i.e Rocchio’s method)
• Probabilistic methods (classical text classification)
• Explicit decision models
• Feature representation: based on content analysis
• Vector space model
• TF-IDF
• Topic Modeling
Search queries as content-based
recommendations
• Exact matching (Boolean)
• Relevant or not relevant (no ranking)
• Ranking by similarity to query (Vector
Space Model)
• Text similarity: Bag of words, TF-IDF, Incidence
Matrix
• Ranking by importance (e.g. PageRank)
Content-based similarity measures
• Simple match
• Dice’s Coefficient
• Jaccard’s Coefficient
• Cosine Coefficient
• Overlap Coefficient
3D Term Vector Space
Knowledge-based methods inspired by IR
• Rec Task: Given explicit recommendation rules find the
best matches between user’s requirements and item’s
characteristics (i.e., which item should be recommended in
which context?)
• Similarity calculation: based on constraint satisfaction
problem and distance similarity requirements<->attributes
• Conjunctive queries
• Similarity metrics for item retrieval
• Feature representation: based on query representation
• User defined preferences
• Utility-based preferences
• Conjoint analysis
Search queries as knowledge-based
recommendations
• Constraint satisfaction problem (CSP) is a tuple
(V,D,C)
• V – set of variables
• D – set of finite domains for V
• C – set of constraints of possible V permutations
• Recommendation as CSP:
(V,D,C) => (Vi U Vu, D, Cr U Ci U Cf U REQ)
• Vu – user properties (possible user’s requirements)
• Vi – item properties
• Cr – compatibility constraints (possible Vc permutations)
• Ci – Item constraints (conjunction fully defines an item)
• Cf – filter conditions (define Vu<->Vi relationships)
• REQ – user’s requirements
3. Introduction to Search and Information
Retrieval
Search
Search is about finding specific things that are either
known or assumed to exist, Discovery is about is
about helping the user encounter what he/she didn’t
even know exists
Both Search and Discovery can be achieved through
a query based data/information system.
Predicate Logic and Declarative Languages Rock!
Examples of query based systems
• Focused on Search
• Search engines
• Database systems
• Focus on Discovery
• Recommender systems
• Advertising systems
IR: The science behind search!
Information Retrieval (IR) is a query based on
data retrieval + relevance ranking (scoring)
usually applied to unstructured data (i.e. text
documents and fields); often referred to as full-
text or keyword search.
Have you heard of Bag-of-Words?
Vector Space Representation?
What about TF-IDF?
IR Architecture
Matched Hits
Representation
Function
Similarity
Calculation
Matched Hits
Documents
Representation
Function
Input Query
Matched Hits
Matched Hits
Retrieved Documents
Online
Processing
Offline
Processing
(*)RelevanceFeedback
Query Representation Doc Representation
Index
(*) Optional
Retrieval Models
Model Type Query
Representation
Document
Representation
Retrieval
Boolean • Boolean
expressions
• Connected by
AND, OR, NOT
• Set of keywords
• Bag of words
• Binary term weight
• Exact match
• Binary relevance
• No ranking
Vector
Space
Model
• Vector
• Desired terms
with optional
weights
• Vectors
• Bag of words with
weight based on
TF-IDF scheme
• Similarity score
• Output documents
are ranked
• Relevance
feedback support
Probabilisti
c
• Similarity with
priors
• Document
relevance
• Ranks documents
in decreasing
probability of
relevance
Ranking in the Vector Space Model
Search Engines: the big hammer!
• Search engines are largely used to solve
non-IR search problems, and here is why:
• Widely available
• Fast and scalable distributed systems
• Integrates well with existing data stores (SQL and NoSQL)
But are we using the right tool?
• Search Engines were originally designed
for IR.
• More complex non-IR search/discovery
tasks sometimes require a multi-phase,
multi-system approach
Filter + Scoring: Two Phase Approach
Filter Rank
Elasticsearch
• What is Elasticsearch?
• Elasticsearch is an open-source search engine
• Elasticsearch is written in Java
• Built on top of Apache Lucene™
• A distributed real-time document store where every field is
indexed and searchable out-of-the box
• A distributed search engine with real-time analytics
• Has a plugin architecture that facilitates extending the
core system
• Written with NRT and cloud support in mind
• Easy index, shard and replicas creation on live cluster
• Has Optimistic Concurrency Control
Examples of scaling challenges
• More than 50 millions of documents a day
• Real time search
• Less than 200ms average query latency
• Throughput of at least 1000 QPS
• Multilingual indexing
• Multilingual querying
Who uses ES?
• Wikipedia
• Uses Elasticsearch to provide full-text search with highlighted
search snippets, and search-as-you-type and did-you-mean
suggestions.
• The Guardian
• Uses Elasticsearch to combine visitor logs with social -network
data to provide real-time feedback to its editors about the
public’s response to new articles.
• Stack Overflow
• Combines full-text search with geo-location queries and uses
more-like-this to find related questions and answers.
• GitHub
• Uses Elasticsearch to query 130 billion lines of code.
How ES scales?
• Sharding and Replicas
• Several indices (at least one index for each day of
data)
• Indices divided into multiple shards
• Multiple replicas of a single shard
• Real-time, synchronous replication
• Near-real-time index refresh (1 to 30 seconds)
Indexing the data
Querying ES
Node 1 Node 2 Node 3 Node 4
Node 5 Node 6 Node 7 Node 8
ES Index
Application
Using Search Engines for RS
• Its not just about rating prediction and ranking
• Business filtering logic
• Age restrictions
• Catalog navigation context (e.g. e-commerce)
• Promotional materials
• Low latency and scale
• SLAs on response times including query, responses
and presentation
• Actual time for computing recommendations is just a
small fraction of total allocated time
Stacking things up
Visualization / UI
Retrieval
Ranking
Query Generation and
Contextual Pre-filtering
Model Building
Index Building
Data/Events Collections
Data Analytics
Contextual Post Filtering
OnlineOffline
Experimentation
Ranking in Elasticsearch
4. Overview of Machine Learning
Techniques for Recommender Systems
Machine Learning
Machine Learning in particular supervised learning
refer to techniques used to learn how to classify or
score previously unseen objects based on a training
dataset
Inference and Generalization are the Key!
Recommendations as data mining
Amatriain, Xavier, et al.
"Data mining methods for
recommender systems."
Recommender Systems
Handbook. Springer US,
2011. 39-71.
Learning to rank
• Formulate the problem as standard
supervised learning
• Training data can be cardinal or binary
• Various approaches:
• Pointwise: Typically approximated by regression
• Pairwise: Approximated via binary classifier
• Listwise: Directly optimize whole list (difficult!)
• A trick with ES is to include raw scores
returned by ES into the feature vector
Learning to rank with ES
Elastic Search
ES
Query
ES
Index
Input:
Contextual features
Potential
Matches
Trained
Ranking
Model
ML
Framework
+
Gold
Dataset
Output:
Ranked Results
Web scale ML challenges
• Massive amount of examples
• Billions of features
• Big models don’t fit in a single machine’s memory
• Variety of algorithms that need to be scaled up
A Note of Caution….
“Invariably, simple models and
a lot of data trump more elaborate
models based on less data.”
Alon Halevy, Peter Norvig, and
Fernando Pereira, Google
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/35179.pdf
Scalability in Machine Learning
• Distributed systems – Fault tolerance,
Throughput vs. latency
• Parallelization Strategies – Hashing, trees
• Processing – Map reduce variants, MPI,
graph parallel
• Databases – Key/Value Stores, NoSQL
What is Spark?
Fast, expressive cluster computing system
45
BlinkDB
approx queries
Spark SQL
structured data
MLlib
machine
learning
Spark
Streaming
real-time
GraphX
graph
Analytics
Spark Core
What is Spark?
• Work on distributed collections like local ones
• RDD:
• Immutable
• Parallel transforms
• Resilient and configurable persistence
• Operations
• Transforms: Lazy operations (map, filter, join,…)
• Actions: Return/write results (collect, save, count,…)
ML Software Framework: Spark MLlib
• Subproject with ML primitives
• Building blocks (as a framework vs. library)
• Large scale statistics
• Classification
• Regression
• Clustering
• Matrix factorization
• Optimization
• Frequent pattern mining
• Dimensionality reduction
What is ML-Scoring?
• Creates an Elastic Search (ES) document index of
instances
• Trains a supervised learning ML model from a dataset
of instances + labels
• Generate an Elasticsearch plugin that uses the trained
ML model to score documents at query time
• A
•
An Open Source POC!
Remember the elephant?
Visualization / UI
Retrieval
Ranking
Query Generation and
Contextual Pre-filtering
Model Building
Index Building
Data/Events Collections
Data Analytics
Contextual Post Filtering
OnlineOffline
Experimentation
Simplifying the Stack!
Visualization / UI
Query Generation and
Contextual Pre-filtering
Model Building
Index Building
Data/Events Collections
Data Analytics
Retrieval
Contextual Post Filtering
Ranking
OnlineOffline
Experimentation
Elastic Search
ML-Scoring Architecture
Instances
+ Labels
Trainer +
Indexer
Instances
Index
ML
Scoring
Plugin
Serialized
ML Model
5. Re-writing the ranking function
Using ML-Scoring
• Creating an ES Index
• Boolean queries
• More-Like-This queries
• Built-in scoring functions
• Scoring script
• Scoring plugin
• ML-Score evaluator using Spark
• ML-Score query
Creating an Index in ES
POST /my_movie_catalog/movies/_bulk
{ "index": { "_id": 1 }}
{ ”genre" : “Documentary”, ”productID" : "XHDK-A-1293-#fJ3" , “title” :
“Olympic Sports”, “content” : “Olympic greateness…“, price” : 20}
{ "index": { "_id": 2 }}
{ ”genre" : “Sports”, ”productID" : "KDKE-B-9947-#kL5", “title” : “NY
Yankees: Winning the World Series”, , “content” : “There is no better
team than the NY…“ “price” :20}
{ "index": { "_id": 3 }}
{ ”genre" : “Action”, “productID" : "JODL-X-1937-#pV7",”title” :
“Rambo III”, , “content” : “Sylvester Stallone is evermore…“ “price” :
18}
{ "index": { "_id": 4 }}
{ ”genre" : “Children”, ”productID" : "QQPX-R-3956-#aD8", “title” :
“Fairy Tale”, , “content” : “Once upon a time…“, “price” : 30}
Boolean queries
• SQL representation
SELECT movie
FROM movies
WHERE (price = 20 OR productID = "XHDK-A-1293-#fJ3")
AND (price != 30)
• ES DSL
GET /my_movie_catalog/movies/_search
{
"query" : {
"filtered" : {
"filter" : {
"bool" : {
"should" : [
{ "term" : {"price" : 20}},
{ "term" : {"productID" : "XHDK-A-1293-#fJ3"}}
],
"must_not" : {
"term" : {"price" : 30}
…
Content based similarity queries (MLT)
{
"more_like_this" : {
"fields" : ["title", "description"],
"like_text" : "Once upon a time",
"min_term_freq" : 1,
"max_query_terms" : 12
}
}
• The More Like This Query (MLT Query) finds documents
that are "like" a given set of documents. In order to do so,
MLT selects a set of representative terms of these input
documents, forms a query using these terms, executes the
query and returns the results.
Similar to a given document{
"more_like_this" : {
"fields" : ["title",
"description"],
"docs" : [
{
"_index" : "imdb",
"_type" : "movies",
"_id" : "1"
},
{
"_index" : "imdb",
"_type" : "movies",
"_id" : "2"
}],
"min_term_freq" : 1,
"max_query_terms" : 12
}
}
Built-in functions
• Suppose we want to boost movies by
popularity (base-line of many RS)
Popularity-based boosting
GET /my_movie_catalog/movies/post/_search
{
"query": {
"function_score": {
"query": {
"multi_match": { "query": "popularity",
"fields": [ "title", "content" ]
}
},
"field_value_factor": {
"field": "votes",
"modifier": "log1p"
}
}
}
}
Geo-Location
• Suppose we want to build a location-aware
recommender system
Decay functions
• Supported decay
functions
• Linear
• Gauss
• Exp
• Also supported
• random_score
GET /_search
{ "query": {
"function_score": {
"functions": [
{ "gauss": {
"location": { "origin": { "lat":
"offset": "2km", "scale": "3
}
}
},
{ "gauss": {
"price": {
"origin": "50",
"offset": "50",
"scale": "20"
}
},
"weight": 2
…
ES scoring script
• Trickier pricing and margin based scoring
if (price < threshold) {
profit = price * margin
} else {
profit = price * (1 - discount) * margin
}
return profit / target
ES Scoring Script
GET /_search
{
"function_score": {
"functions": [
{ ...location clause... },
{ ...price clause... },
{
"script_score": {
"params": { "threshold": 80, "discount": 0.1, "target":
10 },
"script": "price = doc['price'].value; margin =
doc['margin'].value; if (price < threshold) { return price *
margin / target };return price * (1 - discount) * margin /
target; "}
…
Limitations of ranking using ES
practical scoring function
• Stateless computation
• Meant primarily for text search
• Hard to represent context and history
• Limited complexity (simple math functions only)
• Nevertheless, original score should not be
discarded as it may become handy!
Scoring plugin in ES
public class PredictorPlugin extends AbstractPlugin {
@Override
public String name() {
return getClass().getName();
}
@Override
public String description() {
return "Simple plugin to predict values.";
}
public void onModule(ScriptModule module) {
module.registerScript(
PredictorScoreScript.SCRIPT_NAME,
PredictorScoreScript.Factory.class);
}
}
ML-Scoring evaluator using Spark
class SparkPredictorEngine[M](val readPath: String, val spHelp:
SparkModelHelpers[M]) extends PredictorEngine {
private var _model: ModelData[M] = ModelData[M]()
override def getPrediction(values: Collection[IndexValue]) = {
if (_model.clf.nonEmpty) {
val v = ReadUtil.cIndVal2Vector( values, _model.mapper)
_model.clf.get.predict(v)
} else {
throw new PredictionException("Empty model");
}
}
def readModel() = _model = spHelp.readSparkModel(readPath)
def getModel: ModelData[M] = _model
ML-Scoring query
{
"query": {
"function_score": {
"query": {
"match_all": {}
},
"functions": [
{
"script_score": {
"script": "search-predictor",
"lang": "native",
"params": {}
}
}
],
"boost_mode": "replace"
}
}
}
https://github.com/sdhu/elasti
csearch-prediction
Potential issues
• Performance
• It may be a problem if the search space is very
large and/or the computation to intensive
• Operations
• Code running on a key infrastructure
• Versioning and binary compatibility
Summary
• Importance of the whole picture – RS seen from the lenses
of the whole elephant
• RS research is a new field in comparison to IR
• Scalability is hard! Why not learn from all of RS’s cousins:
• Search
• Distributed systems
• Databases
• Machine learning
• Content analysis
• …
• Bridging the gap between research and engineering is an
ongoing effort
References
• Baeza-Yates, R., & Ribeiro-Neto, B. 2011. Modern information retrieval. New York:
ACM press.
• Chirita, P. A., Firan, C. S., & Nejdl, W. 2007. Personalized query expansion for the web.
In Proceedings of the 30th annual international ACM SIGIR conference on Research
and development in information retrieval (pp. 7-14). ACM.
• Croft, W. B., Metzler, D., & Strohman, T. 2010. Search engines: Information retrieval in
practice. Reading: Addison-Wesley.
• Dunning, T. 1993. Accurate methods for the statistics of surprise and
coincidence. Computational linguistics, 19(1), 61-74.
• Elastic, Elasticsearch: RESTful, Distributed Search & Analytics. 2015.
https://www.elastic.co/products/elasticsearch.
• Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. 2009.
The WEKA data mining software: an update. ACM SIGKDD explorations
newsletter, 11(1), 10-18.
• Ihaka, R., & Gentleman, R. 1996. R: a language for data analysis and graphics. Journal
of computational and graphical statistics, 5(3), 299-314.
References
• Kantor, P. B., Rokach, L., Ricci, F., & Shapira, B. 2011. Recommender systems handbook.
Springer.
• Manning, C. D., Raghavan, P., & Schütze, H. 2008. Introduction to information retrieval.
Cambridge: Cambridge university press.
• Qiu, F., & Cho, J. 2006. Automatic identification of user interest for personalized search.
In Proceedings of the 15th international conference on World Wide Web (pp. 727-736).
ACM.
• Sun, J. T., Zeng, H. J., Liu, H., Lu, Y., & Chen, Z. 2005. Cubesvd: a novel approach to
personalized web search. In Proceedings of the 14th international conference on World
Wide Web (pp. 382-390). ACM.
• Xing, B., & Lin, Z. 2006. The impact of search engine optimization on online advertising
market. In Proceedings of the 8th international conference on Electronic commerce: The
new e-commerce: innovations for conquering current barriers, obstacles and limitations to
conducting successful business on the internet (pp. 519-529). ACM.
• Zaharia, M., Chowdhury, M., Franklin, M. J., Shenker, S., & Stoica, I. 2010. Spark: cluster
computing with working sets. In Proceedings of the 2nd USENIX conference on Hot topics in
cloud computing (Vol. 10, p. 10)
Additional Credits
• Doug Kang
• Data Scientist, Verizon OnCue
• Federico Ponte
• System Engineer from Mahisoft
• Yessika Labrador
• Data Engineer from Mahisoft
Ad

More Related Content

What's hot (20)

Machine Learning for Recommender Systems MLSS 2015 Sydney
Machine Learning for Recommender Systems MLSS 2015 SydneyMachine Learning for Recommender Systems MLSS 2015 Sydney
Machine Learning for Recommender Systems MLSS 2015 Sydney
Alexandros Karatzoglou
 
Kdd 2014 Tutorial - the recommender problem revisited
Kdd 2014 Tutorial -  the recommender problem revisitedKdd 2014 Tutorial -  the recommender problem revisited
Kdd 2014 Tutorial - the recommender problem revisited
Xavier Amatriain
 
Recsys 2016
Recsys 2016Recsys 2016
Recsys 2016
Mindaugas Zickus
 
Information Retrieval Models for Recommender Systems - PhD slides
Information Retrieval Models for Recommender Systems - PhD slidesInformation Retrieval Models for Recommender Systems - PhD slides
Information Retrieval Models for Recommender Systems - PhD slides
Daniel Valcarce
 
Tag based recommender system
Tag based recommender systemTag based recommender system
Tag based recommender system
Karen Li
 
Recommender system algorithm and architecture
Recommender system algorithm and architectureRecommender system algorithm and architecture
Recommender system algorithm and architecture
Liang Xiang
 
kdd2015
kdd2015kdd2015
kdd2015
Deepak Agarwal
 
Recommender Systems - A Review and Recent Research Trends
Recommender Systems  -  A Review and Recent Research TrendsRecommender Systems  -  A Review and Recent Research Trends
Recommender Systems - A Review and Recent Research Trends
Sujoy Bag
 
Data Mining and Recommendation Systems
Data Mining and Recommendation SystemsData Mining and Recommendation Systems
Data Mining and Recommendation Systems
Salil Navgire
 
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix ScaleQcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Xavier Amatriain
 
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Lucidworks
 
Recommender systems using collaborative filtering
Recommender systems using collaborative filteringRecommender systems using collaborative filtering
Recommender systems using collaborative filtering
D Yogendra Rao
 
Recommender systems
Recommender systemsRecommender systems
Recommender systems
Tamer Rezk
 
Summary of a Recommender Systems Survey paper
Summary of a Recommender Systems Survey paperSummary of a Recommender Systems Survey paper
Summary of a Recommender Systems Survey paper
Changsung Moon
 
Online Learning to Rank
Online Learning to RankOnline Learning to Rank
Online Learning to Rank
ewhuang3
 
Collaborative filtering
Collaborative filteringCollaborative filtering
Collaborative filtering
Kishor Datta Gupta
 
Tutorial on Coreference Resolution
Tutorial on Coreference Resolution Tutorial on Coreference Resolution
Tutorial on Coreference Resolution
Anirudh Jayakumar
 
Survey of Recommendation Systems
Survey of Recommendation SystemsSurvey of Recommendation Systems
Survey of Recommendation Systems
youalab
 
Best Practices in Recommender System Challenges
Best Practices in Recommender System ChallengesBest Practices in Recommender System Challenges
Best Practices in Recommender System Challenges
Alan Said
 
An Example of Predictive Analytics: Building a Recommendation Engine Using Py...
An Example of Predictive Analytics: Building a Recommendation Engine Using Py...An Example of Predictive Analytics: Building a Recommendation Engine Using Py...
An Example of Predictive Analytics: Building a Recommendation Engine Using Py...
PyData
 
Machine Learning for Recommender Systems MLSS 2015 Sydney
Machine Learning for Recommender Systems MLSS 2015 SydneyMachine Learning for Recommender Systems MLSS 2015 Sydney
Machine Learning for Recommender Systems MLSS 2015 Sydney
Alexandros Karatzoglou
 
Kdd 2014 Tutorial - the recommender problem revisited
Kdd 2014 Tutorial -  the recommender problem revisitedKdd 2014 Tutorial -  the recommender problem revisited
Kdd 2014 Tutorial - the recommender problem revisited
Xavier Amatriain
 
Information Retrieval Models for Recommender Systems - PhD slides
Information Retrieval Models for Recommender Systems - PhD slidesInformation Retrieval Models for Recommender Systems - PhD slides
Information Retrieval Models for Recommender Systems - PhD slides
Daniel Valcarce
 
Tag based recommender system
Tag based recommender systemTag based recommender system
Tag based recommender system
Karen Li
 
Recommender system algorithm and architecture
Recommender system algorithm and architectureRecommender system algorithm and architecture
Recommender system algorithm and architecture
Liang Xiang
 
Recommender Systems - A Review and Recent Research Trends
Recommender Systems  -  A Review and Recent Research TrendsRecommender Systems  -  A Review and Recent Research Trends
Recommender Systems - A Review and Recent Research Trends
Sujoy Bag
 
Data Mining and Recommendation Systems
Data Mining and Recommendation SystemsData Mining and Recommendation Systems
Data Mining and Recommendation Systems
Salil Navgire
 
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix ScaleQcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Xavier Amatriain
 
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Lucidworks
 
Recommender systems using collaborative filtering
Recommender systems using collaborative filteringRecommender systems using collaborative filtering
Recommender systems using collaborative filtering
D Yogendra Rao
 
Recommender systems
Recommender systemsRecommender systems
Recommender systems
Tamer Rezk
 
Summary of a Recommender Systems Survey paper
Summary of a Recommender Systems Survey paperSummary of a Recommender Systems Survey paper
Summary of a Recommender Systems Survey paper
Changsung Moon
 
Online Learning to Rank
Online Learning to RankOnline Learning to Rank
Online Learning to Rank
ewhuang3
 
Tutorial on Coreference Resolution
Tutorial on Coreference Resolution Tutorial on Coreference Resolution
Tutorial on Coreference Resolution
Anirudh Jayakumar
 
Survey of Recommendation Systems
Survey of Recommendation SystemsSurvey of Recommendation Systems
Survey of Recommendation Systems
youalab
 
Best Practices in Recommender System Challenges
Best Practices in Recommender System ChallengesBest Practices in Recommender System Challenges
Best Practices in Recommender System Challenges
Alan Said
 
An Example of Predictive Analytics: Building a Recommendation Engine Using Py...
An Example of Predictive Analytics: Building a Recommendation Engine Using Py...An Example of Predictive Analytics: Building a Recommendation Engine Using Py...
An Example of Predictive Analytics: Building a Recommendation Engine Using Py...
PyData
 

Viewers also liked (18)

Recommendation system
Recommendation system Recommendation system
Recommendation system
Vikrant Arya
 
ACM RecSys 2012: Recommender Systems, Today
ACM RecSys 2012: Recommender Systems, TodayACM RecSys 2012: Recommender Systems, Today
ACM RecSys 2012: Recommender Systems, Today
Data Science London
 
ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Rec...
ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Rec...ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Rec...
ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Rec...
Pablo Castells
 
The continuous cold-start problem in e-commerce recommender systems
The continuous cold-start problem in e-commerce recommender systemsThe continuous cold-start problem in e-commerce recommender systems
The continuous cold-start problem in e-commerce recommender systems
Melanie JI Mueller
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Joaquin Delgado PhD.
 
Introduction and new trends in Recommender Systems
Introduction and new trends in Recommender SystemsIntroduction and new trends in Recommender Systems
Introduction and new trends in Recommender Systems
Paolo Tomeo
 
Smart Canvas @ Large Scale Recommender Systems Workshop 2015
Smart Canvas  @ Large Scale Recommender Systems Workshop 2015Smart Canvas  @ Large Scale Recommender Systems Workshop 2015
Smart Canvas @ Large Scale Recommender Systems Workshop 2015
Gilmar Souza
 
MKAD_black_V2
MKAD_black_V2MKAD_black_V2
MKAD_black_V2
Marina Krotofil
 
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Lucidworks
 
Recommender System at Scale Using HBase and Hadoop
Recommender System at Scale Using HBase and HadoopRecommender System at Scale Using HBase and Hadoop
Recommender System at Scale Using HBase and Hadoop
DataWorks Summit
 
Deep Learning for Recommender Systems - Budapest RecSys Meetup
Deep Learning for Recommender Systems  - Budapest RecSys MeetupDeep Learning for Recommender Systems  - Budapest RecSys Meetup
Deep Learning for Recommender Systems - Budapest RecSys Meetup
Alexandros Karatzoglou
 
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Lucidworks
 
Deep learning: the future of recommendations
Deep learning: the future of recommendationsDeep learning: the future of recommendations
Deep learning: the future of recommendations
Balázs Hidasi
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
T212
 
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Xavier Amatriain
 
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorial
Learning to Rank for Recommender Systems -  ACM RecSys 2013 tutorialLearning to Rank for Recommender Systems -  ACM RecSys 2013 tutorial
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorial
Alexandros Karatzoglou
 
Scaling Deep Learning with MXNet
Scaling Deep Learning with MXNetScaling Deep Learning with MXNet
Scaling Deep Learning with MXNet
AI Frontiers
 
Interactive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyInteractive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and Spotify
Chris Johnson
 
Recommendation system
Recommendation system Recommendation system
Recommendation system
Vikrant Arya
 
ACM RecSys 2012: Recommender Systems, Today
ACM RecSys 2012: Recommender Systems, TodayACM RecSys 2012: Recommender Systems, Today
ACM RecSys 2012: Recommender Systems, Today
Data Science London
 
ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Rec...
ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Rec...ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Rec...
ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Rec...
Pablo Castells
 
The continuous cold-start problem in e-commerce recommender systems
The continuous cold-start problem in e-commerce recommender systemsThe continuous cold-start problem in e-commerce recommender systems
The continuous cold-start problem in e-commerce recommender systems
Melanie JI Mueller
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Joaquin Delgado PhD.
 
Introduction and new trends in Recommender Systems
Introduction and new trends in Recommender SystemsIntroduction and new trends in Recommender Systems
Introduction and new trends in Recommender Systems
Paolo Tomeo
 
Smart Canvas @ Large Scale Recommender Systems Workshop 2015
Smart Canvas  @ Large Scale Recommender Systems Workshop 2015Smart Canvas  @ Large Scale Recommender Systems Workshop 2015
Smart Canvas @ Large Scale Recommender Systems Workshop 2015
Gilmar Souza
 
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Lucidworks
 
Recommender System at Scale Using HBase and Hadoop
Recommender System at Scale Using HBase and HadoopRecommender System at Scale Using HBase and Hadoop
Recommender System at Scale Using HBase and Hadoop
DataWorks Summit
 
Deep Learning for Recommender Systems - Budapest RecSys Meetup
Deep Learning for Recommender Systems  - Budapest RecSys MeetupDeep Learning for Recommender Systems  - Budapest RecSys Meetup
Deep Learning for Recommender Systems - Budapest RecSys Meetup
Alexandros Karatzoglou
 
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Lucidworks
 
Deep learning: the future of recommendations
Deep learning: the future of recommendationsDeep learning: the future of recommendations
Deep learning: the future of recommendations
Balázs Hidasi
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
T212
 
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Xavier Amatriain
 
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorial
Learning to Rank for Recommender Systems -  ACM RecSys 2013 tutorialLearning to Rank for Recommender Systems -  ACM RecSys 2013 tutorial
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorial
Alexandros Karatzoglou
 
Scaling Deep Learning with MXNet
Scaling Deep Learning with MXNetScaling Deep Learning with MXNet
Scaling Deep Learning with MXNet
AI Frontiers
 
Interactive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyInteractive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and Spotify
Chris Johnson
 
Ad

Similar to RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning meets Search! (20)

Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
S. Diana Hu
 
Evaluation of web scale discovery services
Evaluation of web scale discovery servicesEvaluation of web scale discovery services
Evaluation of web scale discovery services
Nikesh Narayanan
 
CS6007 information retrieval - 5 units notes
CS6007   information retrieval - 5 units notesCS6007   information retrieval - 5 units notes
CS6007 information retrieval - 5 units notes
Anandh Arumugakan
 
Machine Learned Relevance at A Large Scale Search Engine
Machine Learned Relevance at A Large Scale Search EngineMachine Learned Relevance at A Large Scale Search Engine
Machine Learned Relevance at A Large Scale Search Engine
Salford Systems
 
TLC2016 - A search engine for Blackboard Learn, the impossible made possible.
TLC2016 - A search engine for Blackboard Learn, the impossible made possible.TLC2016 - A search engine for Blackboard Learn, the impossible made possible.
TLC2016 - A search engine for Blackboard Learn, the impossible made possible.
BlackboardEMEA
 
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.comEnhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Simon Hughes
 
Webinar: Fusion 3.1 - What's New
Webinar: Fusion 3.1 - What's NewWebinar: Fusion 3.1 - What's New
Webinar: Fusion 3.1 - What's New
Lucidworks
 
Chapter 1 - Introduction to IR Information retrieval ch1 Information retrieva...
Chapter 1 - Introduction to IR Information retrieval ch1 Information retrieva...Chapter 1 - Introduction to IR Information retrieval ch1 Information retrieva...
Chapter 1 - Introduction to IR Information retrieval ch1 Information retrieva...
shumawakjira26
 
Understanding How Search Works November 7 2024.pptx
Understanding How Search Works November 7 2024.pptxUnderstanding How Search Works November 7 2024.pptx
Understanding How Search Works November 7 2024.pptx
Access Innovations, Inc.
 
Federated to library discovery platfoms
Federated to library discovery platfomsFederated to library discovery platfoms
Federated to library discovery platfoms
Nikesh Narayanan
 
PgVector + : Enable Richer Interaction with vector database.pptx
PgVector + : Enable Richer Interaction with vector database.pptxPgVector + : Enable Richer Interaction with vector database.pptx
PgVector + : Enable Richer Interaction with vector database.pptx
aranjan11
 
Preliminary committee presentation
Preliminary committee presentationPreliminary committee presentation
Preliminary committee presentation
Richard Drake
 
Eureka, I found it! - Special Libraries Association 2021 Presentation
Eureka, I found it! - Special Libraries Association 2021 PresentationEureka, I found it! - Special Libraries Association 2021 Presentation
Eureka, I found it! - Special Libraries Association 2021 Presentation
Access Innovations, Inc.
 
Latest trends in AI and information Retrieval
Latest trends in AI and information Retrieval Latest trends in AI and information Retrieval
Latest trends in AI and information Retrieval
Abhay Ratnaparkhi
 
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Lucidworks
 
Techniques of information retrieval
Techniques of information retrieval Techniques of information retrieval
Techniques of information retrieval
Tariq Hassan
 
Exploratory Search upon Semantically Described Web Data Sources: Service regi...
Exploratory Search upon Semantically Described Web Data Sources: Service regi...Exploratory Search upon Semantically Described Web Data Sources: Service regi...
Exploratory Search upon Semantically Described Web Data Sources: Service regi...
Marco Brambilla
 
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Lucidworks
 
Current and emerging trends in library services
Current and emerging trends in library servicesCurrent and emerging trends in library services
Current and emerging trends in library services
Nikesh Narayanan
 
Learning to Rank Presentation (v2) at LexisNexis Search Guild
Learning to Rank Presentation (v2) at LexisNexis Search GuildLearning to Rank Presentation (v2) at LexisNexis Search Guild
Learning to Rank Presentation (v2) at LexisNexis Search Guild
Sujit Pal
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
S. Diana Hu
 
Evaluation of web scale discovery services
Evaluation of web scale discovery servicesEvaluation of web scale discovery services
Evaluation of web scale discovery services
Nikesh Narayanan
 
CS6007 information retrieval - 5 units notes
CS6007   information retrieval - 5 units notesCS6007   information retrieval - 5 units notes
CS6007 information retrieval - 5 units notes
Anandh Arumugakan
 
Machine Learned Relevance at A Large Scale Search Engine
Machine Learned Relevance at A Large Scale Search EngineMachine Learned Relevance at A Large Scale Search Engine
Machine Learned Relevance at A Large Scale Search Engine
Salford Systems
 
TLC2016 - A search engine for Blackboard Learn, the impossible made possible.
TLC2016 - A search engine for Blackboard Learn, the impossible made possible.TLC2016 - A search engine for Blackboard Learn, the impossible made possible.
TLC2016 - A search engine for Blackboard Learn, the impossible made possible.
BlackboardEMEA
 
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.comEnhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Simon Hughes
 
Webinar: Fusion 3.1 - What's New
Webinar: Fusion 3.1 - What's NewWebinar: Fusion 3.1 - What's New
Webinar: Fusion 3.1 - What's New
Lucidworks
 
Chapter 1 - Introduction to IR Information retrieval ch1 Information retrieva...
Chapter 1 - Introduction to IR Information retrieval ch1 Information retrieva...Chapter 1 - Introduction to IR Information retrieval ch1 Information retrieva...
Chapter 1 - Introduction to IR Information retrieval ch1 Information retrieva...
shumawakjira26
 
Understanding How Search Works November 7 2024.pptx
Understanding How Search Works November 7 2024.pptxUnderstanding How Search Works November 7 2024.pptx
Understanding How Search Works November 7 2024.pptx
Access Innovations, Inc.
 
Federated to library discovery platfoms
Federated to library discovery platfomsFederated to library discovery platfoms
Federated to library discovery platfoms
Nikesh Narayanan
 
PgVector + : Enable Richer Interaction with vector database.pptx
PgVector + : Enable Richer Interaction with vector database.pptxPgVector + : Enable Richer Interaction with vector database.pptx
PgVector + : Enable Richer Interaction with vector database.pptx
aranjan11
 
Preliminary committee presentation
Preliminary committee presentationPreliminary committee presentation
Preliminary committee presentation
Richard Drake
 
Eureka, I found it! - Special Libraries Association 2021 Presentation
Eureka, I found it! - Special Libraries Association 2021 PresentationEureka, I found it! - Special Libraries Association 2021 Presentation
Eureka, I found it! - Special Libraries Association 2021 Presentation
Access Innovations, Inc.
 
Latest trends in AI and information Retrieval
Latest trends in AI and information Retrieval Latest trends in AI and information Retrieval
Latest trends in AI and information Retrieval
Abhay Ratnaparkhi
 
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Lucidworks
 
Techniques of information retrieval
Techniques of information retrieval Techniques of information retrieval
Techniques of information retrieval
Tariq Hassan
 
Exploratory Search upon Semantically Described Web Data Sources: Service regi...
Exploratory Search upon Semantically Described Web Data Sources: Service regi...Exploratory Search upon Semantically Described Web Data Sources: Service regi...
Exploratory Search upon Semantically Described Web Data Sources: Service regi...
Marco Brambilla
 
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Lucidworks
 
Current and emerging trends in library services
Current and emerging trends in library servicesCurrent and emerging trends in library services
Current and emerging trends in library services
Nikesh Narayanan
 
Learning to Rank Presentation (v2) at LexisNexis Search Guild
Learning to Rank Presentation (v2) at LexisNexis Search GuildLearning to Rank Presentation (v2) at LexisNexis Search Guild
Learning to Rank Presentation (v2) at LexisNexis Search Guild
Sujit Pal
 
Ad

Recently uploaded (20)

Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersLinux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Toradex
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersLinux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Toradex
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 

RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning meets Search!

  • 1. RecSys 2015 Tutorial Scalable Recommender Systems Where Machine Learning Meets Search!
  • 2. Presenters Diana Hu Senior Data Scientist @sdianahu [email protected] Joaquin Delgado, PhD. Director of Engineering @joaquind [email protected]
  • 3. Disclaimer The content of this presentation are of the authors’ personal statements and does not officially represent their employer’s view in anyway. Included content is especially not intended to convey the views of OnCue or Verizon.
  • 4. Index 1. Introduction 1. What to expect? 2. Scaling recommender systems is hard 2. Recommender System Problem as a Search Problem 1. Representing queries as recommendations 3. Introduction to Search and Information Retrieval 1. Scalability in search 2. Introduction to Elasticsearch 4. Overview of Machine Learning Techniques for Recommender Systems 1. Learning to rank 2. Scalability in machine learning 3. ML software frameworks 5. Re-writing the ranking function 1. Writing a new ranking/scoring function in Elasticsearch 2. Training a spark model as a Elasticsearch plugin for custom ranking/scoring function 6. References
  • 6. What to expect from this tutorial? • The focus is on practical examples of how to implement scalable recommender systems using search and learning-to-rank (machine learning) techniques • What it is not • Deep dive into any specific areas (Search, RecSys, Learning to rank, or Machine learning) • Algorithmic survey • Comparative Analysis
  • 8. What is a recommendation? Beyond rating prediction
  • 9. Paradigms of recommender systems • Reduce information load by estimating relevance • Ranking Approaches: • Collaborative filtering: “Tell me what is popular amongst my peers” • Content Based: “Show me more of what I liked” • Knowledge Based: “Tell me what fits my needs” • Hybrid
  • 10. Model Type Pros Cons Collaborative • No metadata engineering effort • Serendipity of results • Learns market segments • Requires rating feedback • Cold start for new users and new items Content-based • No community required • Comparison between items possible • Content descriptions necessary • Cold start for new users • No serendipity Knowledge- based • Deterministic • Assured quality • No cold-start • Interactive user sessions • Knowledge engineering effort to bootstrap • Static • Does not react to short-term trends
  • 11. Scaling recommender systems is hard! • Millions of users • Millions of items • Cold start for ever increasing size of catalog and new users added • Imbalanced Datasets – power law distribution is quite common • Many algorithms have not been fully tested at “Internet Scale”
  • 12. 2. Recommender System Problem as a Search Problem
  • 13. Content-based methods inspired by IR • Rec Task: Given a user profile find the best matching items by their attributes • Similarity calculation: based on keyword overlap between user/items • Neighborhood method (i.e. nearest neighbor) • Query-based retrieval (i.e Rocchio’s method) • Probabilistic methods (classical text classification) • Explicit decision models • Feature representation: based on content analysis • Vector space model • TF-IDF • Topic Modeling
  • 14. Search queries as content-based recommendations • Exact matching (Boolean) • Relevant or not relevant (no ranking) • Ranking by similarity to query (Vector Space Model) • Text similarity: Bag of words, TF-IDF, Incidence Matrix • Ranking by importance (e.g. PageRank)
  • 15. Content-based similarity measures • Simple match • Dice’s Coefficient • Jaccard’s Coefficient • Cosine Coefficient • Overlap Coefficient 3D Term Vector Space
  • 16. Knowledge-based methods inspired by IR • Rec Task: Given explicit recommendation rules find the best matches between user’s requirements and item’s characteristics (i.e., which item should be recommended in which context?) • Similarity calculation: based on constraint satisfaction problem and distance similarity requirements<->attributes • Conjunctive queries • Similarity metrics for item retrieval • Feature representation: based on query representation • User defined preferences • Utility-based preferences • Conjoint analysis
  • 17. Search queries as knowledge-based recommendations • Constraint satisfaction problem (CSP) is a tuple (V,D,C) • V – set of variables • D – set of finite domains for V • C – set of constraints of possible V permutations • Recommendation as CSP: (V,D,C) => (Vi U Vu, D, Cr U Ci U Cf U REQ) • Vu – user properties (possible user’s requirements) • Vi – item properties • Cr – compatibility constraints (possible Vc permutations) • Ci – Item constraints (conjunction fully defines an item) • Cf – filter conditions (define Vu<->Vi relationships) • REQ – user’s requirements
  • 18. 3. Introduction to Search and Information Retrieval
  • 19. Search Search is about finding specific things that are either known or assumed to exist, Discovery is about is about helping the user encounter what he/she didn’t even know exists Both Search and Discovery can be achieved through a query based data/information system. Predicate Logic and Declarative Languages Rock!
  • 20. Examples of query based systems • Focused on Search • Search engines • Database systems • Focus on Discovery • Recommender systems • Advertising systems
  • 21. IR: The science behind search! Information Retrieval (IR) is a query based on data retrieval + relevance ranking (scoring) usually applied to unstructured data (i.e. text documents and fields); often referred to as full- text or keyword search. Have you heard of Bag-of-Words? Vector Space Representation? What about TF-IDF?
  • 22. IR Architecture Matched Hits Representation Function Similarity Calculation Matched Hits Documents Representation Function Input Query Matched Hits Matched Hits Retrieved Documents Online Processing Offline Processing (*)RelevanceFeedback Query Representation Doc Representation Index (*) Optional
  • 23. Retrieval Models Model Type Query Representation Document Representation Retrieval Boolean • Boolean expressions • Connected by AND, OR, NOT • Set of keywords • Bag of words • Binary term weight • Exact match • Binary relevance • No ranking Vector Space Model • Vector • Desired terms with optional weights • Vectors • Bag of words with weight based on TF-IDF scheme • Similarity score • Output documents are ranked • Relevance feedback support Probabilisti c • Similarity with priors • Document relevance • Ranks documents in decreasing probability of relevance
  • 24. Ranking in the Vector Space Model
  • 25. Search Engines: the big hammer! • Search engines are largely used to solve non-IR search problems, and here is why: • Widely available • Fast and scalable distributed systems • Integrates well with existing data stores (SQL and NoSQL)
  • 26. But are we using the right tool? • Search Engines were originally designed for IR. • More complex non-IR search/discovery tasks sometimes require a multi-phase, multi-system approach
  • 27. Filter + Scoring: Two Phase Approach Filter Rank
  • 28. Elasticsearch • What is Elasticsearch? • Elasticsearch is an open-source search engine • Elasticsearch is written in Java • Built on top of Apache Lucene™ • A distributed real-time document store where every field is indexed and searchable out-of-the box • A distributed search engine with real-time analytics • Has a plugin architecture that facilitates extending the core system • Written with NRT and cloud support in mind • Easy index, shard and replicas creation on live cluster • Has Optimistic Concurrency Control
  • 29. Examples of scaling challenges • More than 50 millions of documents a day • Real time search • Less than 200ms average query latency • Throughput of at least 1000 QPS • Multilingual indexing • Multilingual querying
  • 30. Who uses ES? • Wikipedia • Uses Elasticsearch to provide full-text search with highlighted search snippets, and search-as-you-type and did-you-mean suggestions. • The Guardian • Uses Elasticsearch to combine visitor logs with social -network data to provide real-time feedback to its editors about the public’s response to new articles. • Stack Overflow • Combines full-text search with geo-location queries and uses more-like-this to find related questions and answers. • GitHub • Uses Elasticsearch to query 130 billion lines of code.
  • 31. How ES scales? • Sharding and Replicas • Several indices (at least one index for each day of data) • Indices divided into multiple shards • Multiple replicas of a single shard • Real-time, synchronous replication • Near-real-time index refresh (1 to 30 seconds)
  • 33. Querying ES Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 Node 7 Node 8 ES Index Application
  • 34. Using Search Engines for RS • Its not just about rating prediction and ranking • Business filtering logic • Age restrictions • Catalog navigation context (e.g. e-commerce) • Promotional materials • Low latency and scale • SLAs on response times including query, responses and presentation • Actual time for computing recommendations is just a small fraction of total allocated time
  • 35. Stacking things up Visualization / UI Retrieval Ranking Query Generation and Contextual Pre-filtering Model Building Index Building Data/Events Collections Data Analytics Contextual Post Filtering OnlineOffline Experimentation
  • 37. 4. Overview of Machine Learning Techniques for Recommender Systems
  • 38. Machine Learning Machine Learning in particular supervised learning refer to techniques used to learn how to classify or score previously unseen objects based on a training dataset Inference and Generalization are the Key!
  • 39. Recommendations as data mining Amatriain, Xavier, et al. "Data mining methods for recommender systems." Recommender Systems Handbook. Springer US, 2011. 39-71.
  • 40. Learning to rank • Formulate the problem as standard supervised learning • Training data can be cardinal or binary • Various approaches: • Pointwise: Typically approximated by regression • Pairwise: Approximated via binary classifier • Listwise: Directly optimize whole list (difficult!) • A trick with ES is to include raw scores returned by ES into the feature vector
  • 41. Learning to rank with ES Elastic Search ES Query ES Index Input: Contextual features Potential Matches Trained Ranking Model ML Framework + Gold Dataset Output: Ranked Results
  • 42. Web scale ML challenges • Massive amount of examples • Billions of features • Big models don’t fit in a single machine’s memory • Variety of algorithms that need to be scaled up A Note of Caution….
  • 43. “Invariably, simple models and a lot of data trump more elaborate models based on less data.” Alon Halevy, Peter Norvig, and Fernando Pereira, Google http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/35179.pdf
  • 44. Scalability in Machine Learning • Distributed systems – Fault tolerance, Throughput vs. latency • Parallelization Strategies – Hashing, trees • Processing – Map reduce variants, MPI, graph parallel • Databases – Key/Value Stores, NoSQL
  • 45. What is Spark? Fast, expressive cluster computing system 45 BlinkDB approx queries Spark SQL structured data MLlib machine learning Spark Streaming real-time GraphX graph Analytics Spark Core
  • 46. What is Spark? • Work on distributed collections like local ones • RDD: • Immutable • Parallel transforms • Resilient and configurable persistence • Operations • Transforms: Lazy operations (map, filter, join,…) • Actions: Return/write results (collect, save, count,…)
  • 47. ML Software Framework: Spark MLlib • Subproject with ML primitives • Building blocks (as a framework vs. library) • Large scale statistics • Classification • Regression • Clustering • Matrix factorization • Optimization • Frequent pattern mining • Dimensionality reduction
  • 48. What is ML-Scoring? • Creates an Elastic Search (ES) document index of instances • Trains a supervised learning ML model from a dataset of instances + labels • Generate an Elasticsearch plugin that uses the trained ML model to score documents at query time • A • An Open Source POC!
  • 49. Remember the elephant? Visualization / UI Retrieval Ranking Query Generation and Contextual Pre-filtering Model Building Index Building Data/Events Collections Data Analytics Contextual Post Filtering OnlineOffline Experimentation
  • 50. Simplifying the Stack! Visualization / UI Query Generation and Contextual Pre-filtering Model Building Index Building Data/Events Collections Data Analytics Retrieval Contextual Post Filtering Ranking OnlineOffline Experimentation
  • 51. Elastic Search ML-Scoring Architecture Instances + Labels Trainer + Indexer Instances Index ML Scoring Plugin Serialized ML Model
  • 52. 5. Re-writing the ranking function
  • 53. Using ML-Scoring • Creating an ES Index • Boolean queries • More-Like-This queries • Built-in scoring functions • Scoring script • Scoring plugin • ML-Score evaluator using Spark • ML-Score query
  • 54. Creating an Index in ES POST /my_movie_catalog/movies/_bulk { "index": { "_id": 1 }} { ”genre" : “Documentary”, ”productID" : "XHDK-A-1293-#fJ3" , “title” : “Olympic Sports”, “content” : “Olympic greateness…“, price” : 20} { "index": { "_id": 2 }} { ”genre" : “Sports”, ”productID" : "KDKE-B-9947-#kL5", “title” : “NY Yankees: Winning the World Series”, , “content” : “There is no better team than the NY…“ “price” :20} { "index": { "_id": 3 }} { ”genre" : “Action”, “productID" : "JODL-X-1937-#pV7",”title” : “Rambo III”, , “content” : “Sylvester Stallone is evermore…“ “price” : 18} { "index": { "_id": 4 }} { ”genre" : “Children”, ”productID" : "QQPX-R-3956-#aD8", “title” : “Fairy Tale”, , “content” : “Once upon a time…“, “price” : 30}
  • 55. Boolean queries • SQL representation SELECT movie FROM movies WHERE (price = 20 OR productID = "XHDK-A-1293-#fJ3") AND (price != 30) • ES DSL GET /my_movie_catalog/movies/_search { "query" : { "filtered" : { "filter" : { "bool" : { "should" : [ { "term" : {"price" : 20}}, { "term" : {"productID" : "XHDK-A-1293-#fJ3"}} ], "must_not" : { "term" : {"price" : 30} …
  • 56. Content based similarity queries (MLT) { "more_like_this" : { "fields" : ["title", "description"], "like_text" : "Once upon a time", "min_term_freq" : 1, "max_query_terms" : 12 } } • The More Like This Query (MLT Query) finds documents that are "like" a given set of documents. In order to do so, MLT selects a set of representative terms of these input documents, forms a query using these terms, executes the query and returns the results.
  • 57. Similar to a given document{ "more_like_this" : { "fields" : ["title", "description"], "docs" : [ { "_index" : "imdb", "_type" : "movies", "_id" : "1" }, { "_index" : "imdb", "_type" : "movies", "_id" : "2" }], "min_term_freq" : 1, "max_query_terms" : 12 } }
  • 58. Built-in functions • Suppose we want to boost movies by popularity (base-line of many RS)
  • 59. Popularity-based boosting GET /my_movie_catalog/movies/post/_search { "query": { "function_score": { "query": { "multi_match": { "query": "popularity", "fields": [ "title", "content" ] } }, "field_value_factor": { "field": "votes", "modifier": "log1p" } } } }
  • 60. Geo-Location • Suppose we want to build a location-aware recommender system
  • 61. Decay functions • Supported decay functions • Linear • Gauss • Exp • Also supported • random_score GET /_search { "query": { "function_score": { "functions": [ { "gauss": { "location": { "origin": { "lat": "offset": "2km", "scale": "3 } } }, { "gauss": { "price": { "origin": "50", "offset": "50", "scale": "20" } }, "weight": 2 …
  • 62. ES scoring script • Trickier pricing and margin based scoring if (price < threshold) { profit = price * margin } else { profit = price * (1 - discount) * margin } return profit / target
  • 63. ES Scoring Script GET /_search { "function_score": { "functions": [ { ...location clause... }, { ...price clause... }, { "script_score": { "params": { "threshold": 80, "discount": 0.1, "target": 10 }, "script": "price = doc['price'].value; margin = doc['margin'].value; if (price < threshold) { return price * margin / target };return price * (1 - discount) * margin / target; "} …
  • 64. Limitations of ranking using ES practical scoring function • Stateless computation • Meant primarily for text search • Hard to represent context and history • Limited complexity (simple math functions only) • Nevertheless, original score should not be discarded as it may become handy!
  • 65. Scoring plugin in ES public class PredictorPlugin extends AbstractPlugin { @Override public String name() { return getClass().getName(); } @Override public String description() { return "Simple plugin to predict values."; } public void onModule(ScriptModule module) { module.registerScript( PredictorScoreScript.SCRIPT_NAME, PredictorScoreScript.Factory.class); } }
  • 66. ML-Scoring evaluator using Spark class SparkPredictorEngine[M](val readPath: String, val spHelp: SparkModelHelpers[M]) extends PredictorEngine { private var _model: ModelData[M] = ModelData[M]() override def getPrediction(values: Collection[IndexValue]) = { if (_model.clf.nonEmpty) { val v = ReadUtil.cIndVal2Vector( values, _model.mapper) _model.clf.get.predict(v) } else { throw new PredictionException("Empty model"); } } def readModel() = _model = spHelp.readSparkModel(readPath) def getModel: ModelData[M] = _model
  • 67. ML-Scoring query { "query": { "function_score": { "query": { "match_all": {} }, "functions": [ { "script_score": { "script": "search-predictor", "lang": "native", "params": {} } } ], "boost_mode": "replace" } } }
  • 69. Potential issues • Performance • It may be a problem if the search space is very large and/or the computation to intensive • Operations • Code running on a key infrastructure • Versioning and binary compatibility
  • 70. Summary • Importance of the whole picture – RS seen from the lenses of the whole elephant • RS research is a new field in comparison to IR • Scalability is hard! Why not learn from all of RS’s cousins: • Search • Distributed systems • Databases • Machine learning • Content analysis • … • Bridging the gap between research and engineering is an ongoing effort
  • 71. References • Baeza-Yates, R., & Ribeiro-Neto, B. 2011. Modern information retrieval. New York: ACM press. • Chirita, P. A., Firan, C. S., & Nejdl, W. 2007. Personalized query expansion for the web. In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 7-14). ACM. • Croft, W. B., Metzler, D., & Strohman, T. 2010. Search engines: Information retrieval in practice. Reading: Addison-Wesley. • Dunning, T. 1993. Accurate methods for the statistics of surprise and coincidence. Computational linguistics, 19(1), 61-74. • Elastic, Elasticsearch: RESTful, Distributed Search & Analytics. 2015. https://www.elastic.co/products/elasticsearch. • Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. 2009. The WEKA data mining software: an update. ACM SIGKDD explorations newsletter, 11(1), 10-18. • Ihaka, R., & Gentleman, R. 1996. R: a language for data analysis and graphics. Journal of computational and graphical statistics, 5(3), 299-314.
  • 72. References • Kantor, P. B., Rokach, L., Ricci, F., & Shapira, B. 2011. Recommender systems handbook. Springer. • Manning, C. D., Raghavan, P., & Schütze, H. 2008. Introduction to information retrieval. Cambridge: Cambridge university press. • Qiu, F., & Cho, J. 2006. Automatic identification of user interest for personalized search. In Proceedings of the 15th international conference on World Wide Web (pp. 727-736). ACM. • Sun, J. T., Zeng, H. J., Liu, H., Lu, Y., & Chen, Z. 2005. Cubesvd: a novel approach to personalized web search. In Proceedings of the 14th international conference on World Wide Web (pp. 382-390). ACM. • Xing, B., & Lin, Z. 2006. The impact of search engine optimization on online advertising market. In Proceedings of the 8th international conference on Electronic commerce: The new e-commerce: innovations for conquering current barriers, obstacles and limitations to conducting successful business on the internet (pp. 519-529). ACM. • Zaharia, M., Chowdhury, M., Franklin, M. J., Shenker, S., & Stoica, I. 2010. Spark: cluster computing with working sets. In Proceedings of the 2nd USENIX conference on Hot topics in cloud computing (Vol. 10, p. 10)
  • 73. Additional Credits • Doug Kang • Data Scientist, Verizon OnCue • Federico Ponte • System Engineer from Mahisoft • Yessika Labrador • Data Engineer from Mahisoft

Editor's Notes

  • #7: Real-life recommender systems: How it works in the industry, outside of academics settings
  • #9: Rating prediction Ranking Similarity Filtering UI/Presentation cannot escape our own perspectives Which is why, if we want to do good work—and particularly if we want to innovate—we need to have ‘other.’ why the only way to see our biases is through other people. -trunk - snake -tusk - sword -stomach - whale -tail - reed -leg - tree the elephant in the room is truth
  • #10: KN: items such as apartments and cars are not purchased very often, therefore rating-based systems often do not perform well/ For example, only financial services must be recommended that support the investment period
  • #13: Diana
  • #14: Similarity calculation aka similarity-based retrieval Rocchio’s relevance feedback in 1960
  • #17: Challenge to deal with unsatiesdiable REQ and empty result sets User def- explicit Utility scoring rules predifined Conjoint- analysis of of past interaction Geo sim
  • #21: Database Systems Data retrieval with no scoring AI Systems Games: e.g. Chess – possible next moves with associated move quality score E-Commerce Systems Sorting can be based of relevance score, proximity (geo-location), price, etc. Recommender Systems Taste scoring based on content and/or user similarity Advertising Systems Two-way search (matching) and optimization problem with scores modeled as bids
  • #24: Pro Boolean: Popular retrieval model because: Easy to understand for simple queries. Clean formalism. Boolean models can be extended to include ranking. Reasonably efficient implementations possible for normal queries Cons Booelan: Very rigid: AND means all; OR means any. Difficult to express complex user requests. Difficult to control the number of documents retrieved. All matched documents will be returned. Difficult to rank output. All matched documents logically satisfy the query. Difficult to perform relevance feedback. If a document is identified by the user as relevant or irrelevant, how should the query be modified? Pro VSP Similarity based on occurrence frequencies of keywords in query and document. Automatic relevance feedback can be supported: Relevant documents “added” to query. Irrelevant documents “subtracted” from query. Cons VSP: How to determine important words in a document? Word sense? Word n-grams (and phrases, idioms,…)  terms (old school) How to determine the degree of importance of a term within a document and within the entire collection? How to determine the degree of similarity between a document and the query? In the case of the web, what is the collection and what are the effects of links, formatting information, etc.?
  • #25: The inner product is unbounded. Favors long documents with a large number of unique terms. Measures how many terms matched but not how many terms are not matched. Explain what an index is
  • #26: Search engines are widely used to solve non-IR search problems and here is why: Available: popular open source search engines (e.g. Apache SOLR and Elastic Search) based on a mature search library (Apache Lucene) Fast and Scalable : distributed (inverted) indexing and retrieval via scattering/gathering techniques Integrates well with existing data stores (SQL and No-SQL)
  • #27: They implement text-based scoring, including fuzzy match and some variation of TF-IDF or Okapi BM25, etc. Filter using querying a search engine Rank results based on a pre-generated ML predictive model
  • #32: Search engines are not ACID databases. By nature they are not transactional become eventually consistent.
  • #36: There is more to Recsys than algorithms and ranking - Retrieval - User Interface & Feedback - Data - AB Testing - Systems & Architectures
  • #40: Machine Learning can allow learning a user model or profile of a particular user based on: Sample interaction Rated examples This model or profile can then be used to: Recommend items Filter information Predict behavior
  • #41: Popularity is the obvious baseline Ratings prediction is a clear secondary data input that allows for personalization 1. Pointwise - Ranking function minimizes loss function defined on individual relevance judgment - Ranking score based on regression or classification - Ordinal regression, Logistic regression, SVM, GBDT, … 2. Pairwise -Loss function is defined on pair-wise preferences -Goal: minimize number of inversions in ranking -Ranking problem is then transformed into the binary classification problem -RankSVM, RankBoost, RankNet, Frank… 3.Listwise - Indirect Loss Function − RankCosine: similarity between ranking list and ground truth as loss function − ListNet: KL-divergence as loss function by defining a probability distribution − Problem: optimization of listwise loss function may not optimize IR metrics - Directly optimizing IR measures (difficult since they are not differentiable)
  • #47: You write a single program  similar to DryadLINQ Distributed data sets with parallel operations on them are pretty standard; the new thing is that they can be reused across ops Variables in the driver program can be used in parallel ops; accumulators useful for sending information back, cached vars are an optimization Mention cached vars useful for some workloads that won’t be shown here Mention it’s all designed to be easy to distribute in a fault-tolerant fashion
  • #48: Basic statistics summary statistics correlations stratified sampling hypothesis testing random data generation Classification and regression linear models (SVMs, logistic regression, linear regression) naive Bayes decision trees ensembles of trees (Random Forests and Gradient-Boosted Trees) isotonic regression Collaborative filtering alternating least squares (ALS) Clustering k-means Gaussian mixture power iteration clustering (PIC) latent Dirichlet allocation (LDA) streaming k-means Dimensionality reduction singular value decomposition (SVD) principal component analysis (PCA) Feature extraction and transformation Frequent pattern mining FP-growth association rules PrefixSpan Evaluation metrics PMML model export Optimization (developer) stochastic gradient descent limited-memory BFGS (L-BFGS)
  • #50: There is more to Recsys than algorithms and ranking - Retrieval - User Interface & Feedback - Data - AB Testing - Systems & Architectures
  • #51: There is more to Recsys than algorithms and ranking - Retrieval - User Interface & Feedback - Data - AB Testing - Systems & Architectures
  • #70: Performance It may be a problem if the search space is very large and/or the computation to intensive Operations Code running on a key infrastructure People are more hesitant to touch an infrastructure/DB component such as Elasticsearch. Similar concerns exist surrounding DB stored procedures. No way to sandbox a native plugin Requires strong automated regression and performance testing How handle versioning and binary compatibility Potential deployment issues Upgrades to Elastic search, the plugin code and/or the models may present challenges