
Notes on Recommender Systems:

A Survey of State-of-the-Art Algorithms, Beyond Rating Prediction Accuracy Approaches, and Business Value Perspectives∗†

Panagiotis Adamopoulos
Ph.D. Candidate

Department of Information, Operations & Management Sciences


Leonard N. Stern School of Business, New York University
[email protected]

May 2013

∗ Preliminary and Incomplete Draft, Comments Welcome, Please Do Not Cite without
Author’s Permission.
† Presented at Stern School of Business, New York University.
Contents

1 Introduction
2 Rating Prediction
  2.1 Collaborative Filtering
    2.1.1 Memory-based Models
    2.1.2 Model-based Algorithms
  2.2 Advances in Collaborative Filtering
    2.2.1 Matrix Factorization Models
3 Beyond Rating Prediction Accuracy
  3.1 Novelty
  3.2 Serendipity
  3.3 Diversity
    3.3.1 Proposed Approaches for Diversity
  3.4 Novelty and Diversity Metrics
  3.5 Unexpectedness
    3.5.1 Differences from Related Concepts
    3.5.2 Proposed Approaches for Unexpectedness
  3.6 Recommendation Opportunities
  3.7 Recommendation Sets
4 Business Value Aspects
  4.1 The Impact of Recommender Systems on Sales Diversity
  4.2 The Impact of Recommender Systems on Customer Store Loyalty
  4.3 The Impact of Recommendation Networks on Product Demand
  4.4 The Impact of Ranking on Consumer Behavior and Search Engine Revenue
5 Conclusions

Key Papers Presented

• State-of-the-Art Algorithms
  – An Empirical Analysis of Design Choices in Neighborhood-Based Collaborative Filtering Algorithms [Herlocker et al., 2002]
  – Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model [Koren, 2008]
  – Matrix Factorization Techniques for Recommender Systems [Koren et al., 2009]
  – Collaborative Filtering with Temporal Dynamics [Koren, 2009]

• Beyond Rating Prediction Accuracy Approaches
  – Avoiding Monotony: Improving the Diversity of Recommendation Lists [Zhang and Hurley, 2008]
  – Improving Aggregate Recommendation Diversity Using Ranking-Based Techniques [Adomavicius and Kwon, 2012]
  – Rank and Relevance in Novelty and Diversity Metrics for Recommender Systems [Vargas and Castells, 2011, Castells et al., 2011]
  – On Unexpectedness in Recommender Systems: Or How to Better Expect the Unexpected [Adamopoulos and Tuzhilin, 2013a]

• Business Value Perspectives
  – Blockbuster Culture’s Next Rise or Fall: The Impact of Recommender Systems on Sales Diversity [Fleder and Hosanagar, 2009]
  – Recommendation Networks and the Long Tail of Electronic Commerce [Oestreicher-Singer and Sundararajan, 2012a]
  – Examining the Impact of Ranking on Consumer Behavior and Search Engine Revenue [Ghose et al., 2013]

1 Introduction
This paper presents an overview of the field of recommender systems. In
particular, it discusses the current generation of recommendation methods fo-
cusing on collaborative filtering algorithms. Then, we move beyond the classical
perspective of rating prediction accuracy in recommender systems and present a
survey of approaches that enhance unexpectedness and the related but different
concepts of novelty, serendipity, and diversity. In addition, we provide interesting
directions for future research. This paper also discusses recent business value
perspectives on recommender systems focusing on the phenomenon of the long
tail, the impact of recommendations on sales, diversity, customer retention, and
generated revenue.

2 Rating Prediction
Two of the main perspectives in recommender systems are the retrieval and
the rating prediction perspectives [Ricci and Shapira, 2011]. According to these
perspectives, under the assumption that the users know in advance what they
want, recommender systems try to reduce search costs by accurately predicting
how much a user would like an item and providing “correct” recommendation
proposals by recommending the items with the highest predicted ratings.
Discussing these classical perspectives in recommender systems, we present
in this section a brief survey of state-of-the-art collaborative filtering approaches
and thoroughly discuss the most characteristic ones.

2.1 Collaborative Filtering


Collaborative filtering (CF) methods produce user-specific recommendations
of items based on patterns of ratings or usage (e.g. purchases) without the
need for exogenous information about either items or users [Ricci and Shapira,
2011]. This approach analyzes relationships between users and interdependen-
cies among products to identify new user-item associations [Koren et al., 2009].
The term collaborative filtering was coined by the developers of Tapestry [Gold-
berg et al., 1992], the first recommender system.
In order to establish recommendations, CF systems need to relate two funda-
mentally different entities: items and users. There are two primary approaches
to facilitate such a comparison, which constitute the two main techniques of
CF: the neighborhood approach and latent factor models. Neighborhood meth-
ods focus on relationships between items or, alternatively, between users. An
item-item approach models the preference of a user to an item based on ratings
of similar items by the same user. Latent factor models, such as matrix fac-
torization (aka, SVD), comprise an alternative approach by transforming both
items and users to the same latent factor space. The latent space tries to ex-
plain ratings by characterizing both products and users on factors automatically
inferred from user feedback [Ricci and Shapira, 2011]. These discovered latent
factors might measure obvious dimensions, less well-developed dimensions, or
completely uninterpretable dimensions.
A major appeal of collaborative filtering is that it is domain free, yet it
can address data aspects that are often elusive and difficult to profile using

content filtering [Koren et al., 2009]. The availability of test databases for
CF in different domains favored the further development of various and more
complex CF techniques. Still, this somehow also narrowed the range of domains
on which CF techniques are actually applied. The most popular datasets are
about movies and books, and many researchers aim to improve the accuracy
of their algorithms only on these datasets. Whether a certain CF technique
performs particularly well in one domain or another is unfortunately beyond
the scope of many research efforts [Jannach et al., 2010].
[Adomavicius and Tuzhilin, 2005] provide a survey with an impressive list
of references to recent techniques for collaborative filtering.

2.1.1 Memory-based Models

The basic methods [Breese et al., 1998], [Delgado and Ishii, 1999], [Nakamura
and Abe, 1998], [Resnick et al., 1994] in this family are well known, and to a
large extent are based on heuristics [Ricci and Shapira, 2011].
The traditional user-based technique that we discuss in the next paragraph is
said to be memory-based because the original rating database is held in memory
and used directly for generating the recommendations.

2.1.1.1 Nearest Neighbors  The user-based neighborhood recommendation
methods predict the rating ru,i of user u for item i using the ratings given
to i by the users most similar to u, called nearest neighbors and denoted by Ni(u).
Taking into account the fact that the neighbors can have different levels of sim-
ilarity, wu,v , and considering the k users v with the highest similarity to u (i.e.
the standard user-based k-NN collaborative filtering approach), the predicted
rating is:

\[
\hat{r}_{u,i} = \frac{\sum_{v \in N_i(u)} w_{u,v}\, r_{v,i}}{\sum_{v \in N_i(u)} |w_{u,v}|} \tag{1}
\]

However, the ratings given to item i by the nearest neighbors of user u can
be combined into a single estimation using various combining (or aggregating)
functions [Adomavicius and Tuzhilin, 2005].1,2
1 Combining functions calculate the score for an instance from a set of the instance’s nearest neighbors. Examples of combining functions include majority voting, distance-moderated voting, weighted average, adjusted weighted average, etc.
2 [Töscher et al., 2008] discuss sophisticated transformations of the similarity weights.

ALGORITHM 1: k-NN Recommendation Algorithm
Input: User-Item rating matrix R; k: number of users in the neighborhood of user u, Ni(u); l: number of items recommended to user u
Output: Recommendation lists of size l
for each user u do
    Find the k users most similar to user u, Ni(u);
    for each item i do
        Combine the ratings given to item i by the neighbors Ni(u);
    end
    Recommend to user u the top-l items having the highest predicted rating r̂u,i;
end

Algorithm 1 summarizes the user-based k-nearest neighbors (k-NN) collaborative
filtering approach with a general combining function. Similarly, item-based
approaches [Deshpande and Karypis, 2004], [Linden et al., 2003] look at
ratings given to similar items.
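As a concrete sketch of the user-based scheme of Equation (1) and Algorithm 1, the following toy implementation predicts a single rating with cosine similarity over co-rated items; the rating matrix, the choice of cosine similarity, and all names here are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Hypothetical toy rating matrix: rows are users, columns are items, 0 = unrated.
R = np.array([
    [5., 3., 0., 1.],
    [4., 0., 0., 1.],
    [1., 1., 0., 5.],
    [1., 0., 0., 4.],
    [0., 1., 5., 4.],
])

def cosine_sim(u, v):
    """Cosine similarity computed over co-rated items only."""
    mask = (u > 0) & (v > 0)
    if not mask.any():
        return 0.0
    a, b = u[mask], v[mask]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def predict(R, u, i, k=2):
    """Predict r_{u,i} as the similarity-weighted average of the ratings
    given to item i by the k users most similar to u (Equation 1)."""
    candidates = [(v, cosine_sim(R[u], R[v]))
                  for v in range(R.shape[0]) if v != u and R[v, i] > 0]
    neighbors = sorted(candidates, key=lambda t: -t[1])[:k]
    num = sum(w * R[v, i] for v, w in neighbors)
    den = sum(abs(w) for _, w in neighbors)
    return num / den if den else 0.0
```

For instance, `predict(R, 0, 3)` combines the ratings given to item 3 by the two users most similar to user 0.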
There are some very important considerations in the implementation of a
neighborhood-based recommender system. Such considerations include the
normalization of ratings, the computation of the similarity weights, and the
neighborhood selection.

2.1.1.1.1 Rating Normalization  Two of the most popular rating normalization
schemes that have been proposed to convert individual ratings to a
more universal scale are mean-centering and Z-score [Desrosiers and Karypis,
2011].
The idea of mean-centering [Breese et al., 1998], [Resnick et al., 1994] is to
determine whether a rating is positive or negative by comparing it to the mean
rating. Using this approach the user-based prediction of a rating rui is obtained
as:

\[
\hat{r}_{u,i} = \bar{r}_u + \frac{\sum_{v \in N_i(u)} w_{u,v}\,(r_{v,i} - \bar{r}_v)}{\sum_{v \in N_i(u)} |w_{u,v}|}
\]

While mean-centering removes the offsets caused by the different perceptions
of an average rating, Z-score normalization [Herlocker et al., 1999] also considers
the spread in the individual rating scales. A user-based prediction of rating rui
using this normalization approach would therefore be obtained as:

\[
\hat{r}_{u,i} = \bar{r}_u + \sigma_u \, \frac{\sum_{v \in N_i(u)} w_{u,v}\,(r_{v,i} - \bar{r}_v)/\sigma_v}{\sum_{v \in N_i(u)} |w_{u,v}|}
\]

Even though Z-score has the additional benefit of considering the variance
in the ratings of individual users or items, it can be more sensitive than mean-
centering and, more often, predicts ratings that are outside the rating scale
[Desrosiers and Karypis, 2011].
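The two normalization schemes can be sketched side by side as follows; the toy ratings, the precomputed neighbor lists, and the similarity weights are illustrative assumptions.

```python
import numpy as np

ratings = {  # hypothetical toy data: user -> {item: rating}
    "u1": {"i1": 5, "i2": 3, "i4": 1},
    "u2": {"i1": 4, "i4": 1},
    "u3": {"i1": 1, "i2": 1, "i4": 5},
}

def user_stats(r):
    """Mean and (population) standard deviation of a user's ratings."""
    vals = np.array(list(r.values()), dtype=float)
    return vals.mean(), vals.std()

def predict_mean_centered(target, item, neighbors, sims):
    """r_hat = r_bar_u + sum(w * (r_vi - r_bar_v)) / sum(|w|)."""
    mu_u, _ = user_stats(ratings[target])
    num = den = 0.0
    for v, w in zip(neighbors, sims):
        if item in ratings[v]:
            mu_v, _ = user_stats(ratings[v])
            num += w * (ratings[v][item] - mu_v)
            den += abs(w)
    return mu_u + (num / den if den else 0.0)

def predict_zscore(target, item, neighbors, sims):
    """r_hat = r_bar_u + sigma_u * sum(w * (r_vi - r_bar_v) / sigma_v) / sum(|w|)."""
    mu_u, sd_u = user_stats(ratings[target])
    num = den = 0.0
    for v, w in zip(neighbors, sims):
        if item in ratings[v]:
            mu_v, sd_v = user_stats(ratings[v])
            if sd_v > 0:
                num += w * (ratings[v][item] - mu_v) / sd_v
                den += abs(w)
    return mu_u + sd_u * (num / den if den else 0.0)
```

Note that the Z-score variant rescales each neighbor's deviation by that neighbor's own rating spread, which is exactly why it can occasionally predict outside the rating scale.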

2.1.1.1.2 Similarity Weight Computation  As discussed in [Desrosiers
and Karypis, 2011], similarity weights allow the selection of trusted neighbors
whose ratings are used in the prediction, and they provide the means to give
more or less importance to these neighbors in the prediction.
In practice, however, the similarity between two users is often computed from
only a few co-rated items. If these few ratings are equal, the users will be
considered as “fully similar” and will likely play an important role in each
other’s recommendations. However, if the users’ preferences are in fact different,
this may lead to poor recommendations. Several strategies have been proposed
to take into account the significance of a similarity weight. For instance, as in
[Herlocker et al., 1999], the similarity can be penalized by a factor proportional
to the number of commonly rated items:
\[
w'_{uv} = \frac{|I_{uv}|}{|I_{uv}| + \beta} \cdot w_{uv}
\]
where |Iuv | is the number of co-rated items and β > 0 is selected by cross-
validation. Such shrinkage can be motivated from a Bayesian perspective [Gel-
man et al., 2004].
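The shrinkage rule above is a one-liner; the default value of β here is an assumption for illustration, since the text notes it should be selected by cross-validation.

```python
def shrunk_similarity(w_uv, n_co_rated, beta=100.0):
    """Significance-weighted similarity: w' = |I_uv| / (|I_uv| + beta) * w_uv.
    beta > 0 should be selected by cross-validation; 100.0 is an assumed value."""
    return n_co_rated / (n_co_rated + beta) * w_uv
```

With β = 100, a similarity of 0.8 supported by only 5 co-rated items shrinks to about 0.038, while the same similarity supported by 400 co-rated items keeps 0.64, so weights based on little evidence are pulled toward zero.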
Similar approaches taking into consideration the variance of the ratings have
also been proposed in the literature [Breese et al., 1998], [Jin et al., 2004].
Finally, [Koren and Bell, 2011] proposes to jointly derive (i.e. learn) the
interpolation weights directly from the ratings, not based on any similarity
measure. Hence, the interpolation weights formula explicitly accounts for re-
lationships among the neighbors, the sum of the weights is not constrained to
equal one, and the method automatically adjusts for variations among items in
their means or variances.

2.1.1.1.3 Neighborhood Selection  The neighborhood selection is normally
done in two steps: a global filtering step where only the most likely candidates
are kept, and a per-prediction step which chooses the best candidates
for this prediction.
As described in [Desrosiers and Karypis, 2011], the pre-filtering of neighbors
is an essential step that makes neighborhood-based approaches practicable by
reducing the amount of similarity weights to store, and limiting the number of
candidate neighbors to consider in the predictions. There are several ways in
which this can be accomplished:
Top-N filtering: For each user or item, only a list of the N nearest-neighbors
and their respective similarity weight is kept.

Threshold filtering: This approach keeps all the neighbors whose similarity
weight is greater than wmin . While this is more flexible than the previous
filtering technique, as only the most significant neighbors are kept, the
right value of wmin may be difficult to determine.
Finally, once a list of candidate neighbors has been computed for each user
or item, the prediction of new ratings is normally made with the k nearest
neighbors, that is, the k neighbors whose similarity weight has the greatest
magnitude and who have rated the corresponding item (or, in the item-based
case, have been rated by the corresponding user). The important question is
which value to use for k; the optimal value should be determined by
cross-validation. [Adamopoulos and Tuzhilin, 2013b] propose a probabilistic
method for neighborhood selection.
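The two pre-filtering strategies can be sketched as follows (the dictionary representation of a similarity row is an illustrative assumption):

```python
def top_n_filter(sims, n):
    """Top-N filtering: for a user or item, keep only the N neighbors with the
    largest similarity magnitude, together with their weights."""
    return dict(sorted(sims.items(), key=lambda kv: -abs(kv[1]))[:n])

def threshold_filter(sims, w_min):
    """Threshold filtering: keep all neighbors whose weight exceeds w_min."""
    return {v: w for v, w in sims.items() if w > w_min}
```

For example, with `sims = {"a": 0.9, "b": 0.2, "c": 0.5}`, both `top_n_filter(sims, 2)` and `threshold_filter(sims, 0.4)` keep the neighbors `a` and `c`, but only the threshold variant guarantees a minimum significance while leaving the neighborhood size data-dependent.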

Although quite simple to describe and implement, this recommendation
approach has several important advantages, including its ability to explain a
recommendation with the list of the neighbors used, its computational and space
efficiency which allows it to scale to large recommender systems, and its marked
stability in an online setting where new users and items are constantly added.
Another of its strengths is its potential to make serendipitous recommenda-
tions that can lead users to the discovery of unexpected, yet very interesting
items [Desrosiers and Karypis, 2011].
In typical commercial recommender systems, where the number of users
far exceeds the number of available items, item-based approaches are typi-
cally preferred since they provide more accurate recommendations, while be-
ing more computationally efficient and requiring less frequent updates. On the
other hand, user-based methods usually provide more original recommendations,
which may lead users to a more satisfying experience [Desrosiers and Karypis,
2011].

Finally, when the performance of a neighborhood-based approach suffers
from the problems of limited coverage and sparsity, one may explore techniques
based on dimensionality reduction or graphs [Desrosiers and Karypis, 2011].

2.1.2 Model-based Algorithms

In contrast to memory-based methods, model-based algorithms [Billsus and
Pazzani, 1998], [Breese et al., 1998], [Goldberg et al., 2001], [Hofmann, 2003],
[Pavlov and Pennock, 2002], [Ungar and Foster, 1998], [Ansari et al., 2000]
use the collection of ratings to learn a predictive model, typically using some
statistical or machine learning methods, which is then used to make rating pre-
dictions [Adomavicius and Tuzhilin, 2005]. Thus, the main difference between
collaborative model-based techniques and heuristic-based approaches is that the
model-based techniques calculate utility (rating) predictions based not on some
ad-hoc heuristic rules, but, rather, based on a model learned from the under-
lying data using statistical and machine learning techniques [Adomavicius and
Tuzhilin, 2005].
Model-based collaborative filtering methods include Bayesian models [Chien
and George, 1999], probabilistic relational models [Getoor and Sahami, 1999],
linear regressions [Sarwar et al., 2001], maximum entropy models [Pavlov and
Pennock, 2002], probabilistic latent semantic analysis [Si and Jin, 2003], and
Latent Dirichlet Allocation [Marlin, 2003]. Some of the most successful real-
izations of latent factor models are based on matrix factorization [Koren et al.,
2009].

2.2 Advances in Collaborative Filtering


2.2.1 Matrix Factorization Models

Latent factor models approach collaborative filtering with the holistic goal to
uncover latent features that explain observed ratings [Ricci and Shapira, 2011].
We will first describe the basic SVD model in order to illustrate the use of
matrix factorization models in recommender systems. In the information retrieval
setting, this latent semantic analysis (LSA) technique is also referred to as latent
semantic indexing (LSI).
Nevertheless, much of the strength of model-based collaborative filtering
methods, and of matrix factorization models in particular, stems from their
natural ability to handle additional features of the data, including implicit
feedback and temporal information [Ricci and Shapira, 2011]. Thus, we will
also discuss how to integrate other sources of user feedback in order to increase
prediction accuracy,
through the “SVD++ model.” Finally, we deal with the fact that customer
preferences for products may drift over time. Product perception and popularity
are constantly changing as new selections emerge. Similarly, customer
inclinations are evolving, leading them to ever redefine their taste. This leads
to a factor model that addresses temporal dynamics for better tracking user
behavior [Ricci and Shapira, 2011].

2.2.1.1 Singular Value Decomposition (SVD) Singular Value Decom-


position (SVD) is a well-known method for matrix factorization that provides
the best lower rank approximations of the original matrix [Sarwar et al., 2000].
In particular, SVD factors an m × n matrix R into three matrices as the
following:
R=U ·S·V0

where U and V are two orthogonal matrices of size m × r and n × r respectively,
and r is the rank of the matrix R. S is a diagonal matrix of size r × r having all
the singular values of matrix R as its diagonal entries.3 All the entries of matrix
S are positive and stored in decreasing order of their magnitude. The matrices
obtained by performing SVD are particularly useful for our application because
of the property that SVD provides the best lower-rank approximation of the
original matrix R in terms of the Frobenius norm. It is possible to reduce the r × r
matrix S to keep only the k largest diagonal values, obtaining a matrix Sk, k < r. If
the matrices U and V are reduced accordingly, then the reconstructed matrix
Rk = Uk · Sk · Vk' is the closest rank-k matrix to R. In other words, Rk minimizes
the Frobenius norm kR − Rk k over all rank-k matrices.4
A typical example of the SVD approach as well as a nice illustration of the
procedure in a realistic setting can be found in [Jannach et al., 2010].
3 The SVD theorem, which states that a given matrix R can be decomposed into a product
of three such matrices, can be found in [Golub and Kahan, 1965].
4 The Frobenius norm, or Hilbert–Schmidt norm, can be defined in various ways:

\[
\|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2} = \sqrt{\operatorname{trace}(A^* A)} = \sqrt{\sum_{i=1}^{\min\{m,n\}} \sigma_i^2}
\]

where A∗ denotes the conjugate transpose of A and σi are the singular values of A. The
Frobenius norm is very similar to the Euclidean norm on K^n and comes from the Frobenius
inner product on the space of all matrices.
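The best-rank-k property can be checked numerically in a few lines; the toy matrix is an illustrative assumption, and note that real rating matrices are sparse, so the conventional SVD shown here covers only the idealized, fully observed case.

```python
import numpy as np

# Hypothetical small dense matrix; a stand-in for a fully observed R.
R = np.array([[5., 3., 1., 1.],
              [4., 3., 1., 1.],
              [1., 1., 5., 4.],
              [1., 2., 4., 5.]])

U, s, Vt = np.linalg.svd(R, full_matrices=False)  # s is sorted decreasingly

k = 2  # number of singular values to keep
Rk = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # best rank-k approximation of R

# The Frobenius error of the rank-k reconstruction equals the energy of the
# discarded singular values: ||R - Rk||_F = sqrt(sum_{i>k} sigma_i^2).
err = np.linalg.norm(R - Rk, "fro")
assert np.isclose(err, np.sqrt((s[k:] ** 2).sum()))
```

The choice of k here corresponds exactly to the "amount of data reduction" discussed below: larger k reproduces R more faithfully, smaller k compresses it into fewer latent dimensions.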

Usually, the procedure is more complicated than the one described in the
aforementioned example in order to achieve higher rating prediction accuracy.
In particular, the sparseness in the user-item ratings matrix often raises difficul-
ties and the conventional SVD is undefined when knowledge about the matrix
is incomplete. Also, the final rating prediction is computed by taking into
consideration some baseline predictors (also known as biases or intercepts). A
baseline prediction (i.e. first-order approximation of the bias) for an unknown
rating rui is denoted by bui and accounts for the user and item effects:

bui = µ + bu + bi

where µ is the overall average rating. The parameters bu and bi indicate the
observed deviations of user u and item i, respectively, from the average. As
discussed in [Ricci and Shapira, 2011], in order to estimate bu and bi one can
solve the least squares problem:
\[
\min_{b_*} \sum_{(u,i) \in K} (r_{ui} - \mu - b_u - b_i)^2 + \lambda \Big( \sum_u b_u^2 + \sum_i b_i^2 \Big)
\]

Here, the first term \(\sum_{(u,i)\in K}(r_{ui} - \mu - b_u - b_i)^2\) strives to find bu ’s and bi ’s that
fit the given ratings. The regularizing term \(\lambda(\sum_u b_u^2 + \sum_i b_i^2)\) avoids overfitting
by penalizing the magnitudes of the parameters. This least squares problem can
be solved fairly efficiently by the method of stochastic gradient descent [Bottou,
2010]. A simpler but less accurate way to estimate the parameters is to decouple
the calculation: first estimate the bi ’s, and then use them to compute the bu ’s.
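The simpler, decoupled estimation can be sketched as follows; the toy rating triples and the shrinkage constants are illustrative assumptions (in practice both constants would be chosen by cross-validation).

```python
import numpy as np

# Hypothetical training triples (user, item, rating).
K = [(0, 0, 5.), (0, 1, 3.), (1, 0, 4.), (1, 2, 1.), (2, 1, 2.), (2, 2, 1.)]

mu = np.mean([r for _, _, r in K])  # overall average rating

lam_i, lam_u = 10.0, 15.0  # assumed shrinkage constants

# Decoupled estimation: first the item deviations b_i, then the user
# deviations b_u, each shrunk toward zero when few ratings support them.
items = {i for _, i, _ in K}
users = {u for u, _, _ in K}
b_i = {i: sum(r - mu for _, j, r in K if j == i)
          / (lam_i + sum(1 for _, j, _ in K if j == i))
       for i in items}
b_u = {u: sum(r - mu - b_i[i] for v, i, r in K if v == u)
          / (lam_u + sum(1 for v, _, _ in K if v == u))
       for u in users}

def baseline(u, i):
    """Baseline predictor b_ui = mu + b_u + b_i."""
    return mu + b_u.get(u, 0.0) + b_i.get(i, 0.0)
```

The denominators play the same role as the regularizing term in the least squares formulation: a user or item with few ratings gets a bias close to zero rather than one fitted to noise.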
Then, using such a baseline predictor, the rating is predicted by the rule:

r̂ui = µ + bi + bu + qiT pu ,

where qiT pu captures the interaction between user u and item i (interpreted
also as the overall interest of the user in characteristics of the item) using the
computed latent dimensions of the SVD. This allows each component to explain
only the part of a signal relevant to it.
In order to learn the model parameters (bu , bi , pu and qi ) we minimize the
regularized squared error:
\[
\min_{b_*, q_*, p_*} \sum_{(u,i) \in K} (r_{ui} - \mu - b_u - b_i - q_i^T p_u)^2 + \lambda \big( b_u^2 + b_i^2 + \|q_i\|^2 + \|p_u\|^2 \big)
\]

The constant λ, which controls the extent of regularization, is usually
determined by cross-validation. Minimization is typically performed by either
stochastic gradient descent or alternating least squares. Alternating least squares
techniques rotate between fixing the pu ’s to solve for the qi ’s and fixing the qi ’s
to solve for the pu ’s. Notice that when one of these is taken as a constant,
the optimization problem is quadratic and can be optimally solved [Bell et al.,
2007], [Bell and Koren, 2007].
A simple stochastic gradient descent (SGD) optimization was first popularized
by Funk [Funk, 2006] and successfully practiced by many others [Koren,
2008], [Paterek, 2007]. The algorithm follows a standard machine learning
approach. In particular, the algorithm loops through all the ratings in the
training data; for each given rating rui , a prediction r̂ui is made and the
associated prediction error eui = rui − r̂ui is computed. For a given training case rui ,
we modify the parameters by moving in the opposite direction of the gradient,
yielding:

bu ← bu + γ · (eui − λ · bu )
bi ← bi + γ · (eui − λ · bi )
qi ← qi + γ · (eui · pu − λ · qi )
pu ← pu + γ · (eui · qi − λ · pu )

This implementation combines simplicity with a relatively fast running
time. In principle, any standard machine learning optimization algorithm can be
used with the above procedure. Hence, usually, rating prediction accuracy can
be further improved using adaptive individual learning rates. Also, a distributed
implementation of the SVD-based approach can be achieved as in [Gemulla
et al., 2011]. A popular alternative approach that was briefly covered above is
alternating least squares (ALS). ALS is favorable in at least two cases. The first
is when the system can use parallelization. In ALS, the system computes each
qi independently of the other item factors and computes each pu independently
of the other user factors. This gives rise to potentially massive parallelization
of the algorithm [Zhou et al., 2008]. The second case is for systems centered
on implicit data. Because the training set cannot be considered sparse, looping
over each single training case, as gradient descent does, would not be practical.
ALS can efficiently handle such cases [Hu et al., 2008].
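The full SGD training loop can be sketched end to end; the toy data, learning rate, regularization constant, and number of factors are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, f = 5, 4, 3  # f latent factors

# Hypothetical training triples (user, item, rating).
train = [(0, 0, 5.), (0, 1, 3.), (1, 0, 4.), (2, 3, 5.),
         (3, 2, 4.), (4, 1, 2.), (4, 3, 4.)]

mu = np.mean([r for _, _, r in train])
b_u, b_i = np.zeros(n_users), np.zeros(n_items)
P = 0.1 * rng.standard_normal((n_users, f))  # user factors p_u
Q = 0.1 * rng.standard_normal((n_items, f))  # item factors q_i
gamma, lam = 0.01, 0.05  # learning rate and regularization (assumed values)

def rmse():
    return np.sqrt(np.mean([(r - (mu + b_u[u] + b_i[i] + Q[i] @ P[u])) ** 2
                            for u, i, r in train]))

before = rmse()
for _ in range(100):  # epochs
    for u, i, r in train:
        e = r - (mu + b_u[u] + b_i[i] + Q[i] @ P[u])  # prediction error e_ui
        b_u[u] += gamma * (e - lam * b_u[u])
        b_i[i] += gamma * (e - lam * b_i[i])
        q_old = Q[i].copy()  # use the pre-update q_i in the p_u step
        Q[i] += gamma * (e * P[u] - lam * Q[i])
        P[u] += gamma * (e * q_old - lam * P[u])
after = rmse()
```

After training, the RMSE on the training triples drops well below its initial value, illustrating that the biases and factors jointly absorb the residuals.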
SVD can be used in recommender systems to perform different tasks. For
instance, apart from capturing the latent relationships between customers and
products and generating recommendations based on that, this low-dimensional
representation of the original customer-product space can be used to compute

neighborhood-based models in the reduced space.
In [Sarwar et al., 2000], the SVD-based approach was consistently worse than
traditional collaborative filtering in a recommendation setting of an extremely
sparse e-commerce dataset. However, the SVD-based approach produced results
that were better than a traditional collaborative filtering algorithm some of the
time in the denser MovieLens data set. To a great extent, the quality of the
recommendations seems to depend on the right choice of the amount of data
reduction (i.e. the number of singular values to keep in an SVD approach)
[Jannach et al., 2010]. This technique leads to very fast online performance,
requiring just a few simple arithmetic operations for each recommendation.

However, collaborative systems have their own limitations [Adomavicius and
Tuzhilin, 2005], [Balabanović and Shoham, 1997], [Lee, 2001]. These problems
relate to new users, new items (i.e. the cold start problem), and sparsity.
Several techniques have been proposed to address these problems. Most of them
use the hybrid recommendation approach, which combines content-based and
collaborative techniques [Adomavicius and Tuzhilin, 2005].

2.2.1.2 SVD++ A way to relieve the sparsity problem and increase rating
prediction accuracy is to incorporate additional information about the users. A
valuable source of information for recommender systems is implicit feedback.
Here we focus on the SVD++ method [Koren, 2008].
Here, a new set of item factors are necessary, where item i is associated with
yi ∈ Rf . Those new item factors are used to characterize users based on the set
of items that they rated. The exact model is as follows:
\[
\hat{r}_{ui} = \mu + b_u + b_i + q_i^T \Big( p_u + |R(u)|^{-\frac{1}{2}} \sum_{j \in R(u)} y_j \Big)
\]

where the set R(u) contains the items rated by user u.


Now, as illustrated in [Ricci and Shapira, 2011], a user u is modeled as
\(p_u + |R(u)|^{-1/2} \sum_{j \in R(u)} y_j\). We use a free user-factors vector, pu , which is learned from
the given explicit ratings. This vector is complemented by the sum \(|R(u)|^{-1/2} \sum_{j \in R(u)} y_j\),
which represents the perspective of implicit feedback. Since the yj ’s are
centered around zero (by the regularization), the sum is normalized by \(|R(u)|^{-1/2}\)
in order to stabilize its variance across the range of observed values of |R(u)|.
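A minimal sketch of the SVD++ prediction rule, with hypothetical function and variable names (the factor values passed in would come from training, not from the text):

```python
import numpy as np

def svdpp_predict(mu, b_u, b_i, q_i, p_u, y, R_u):
    """SVD++ rule: r_hat = mu + b_u + b_i
       + q_i^T (p_u + |R(u)|^{-1/2} * sum_{j in R(u)} y_j)."""
    implicit = np.zeros_like(p_u)
    for j in R_u:                 # accumulate the implicit item factors y_j
        implicit = implicit + y[j]
    if R_u:                       # |R(u)|^{-1/2} variance stabilization
        implicit = implicit / np.sqrt(len(R_u))
    return mu + b_u + b_i + q_i @ (p_u + implicit)
```

For instance, with `q_i = [1, 0]`, `p_u = [0.5, 0]`, and two rated items each contributing `y_j = [0.2, 0]`, the prediction is `mu + b_u + b_i + 0.5 + 0.4/sqrt(2)`: the implicit term shifts the estimate even though it uses no explicit rating values, only which items the user rated.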
As in the simple SVD case, for a given training case rui , we modify the
parameters by moving in the opposite direction of the gradient, yielding:

bu ← bu + γ · (eui − λ · bu )
bi ← bi + γ · (eui − λ · bi )
qi ← qi + γ · (eui · (pu + |R(u)|^{−1/2} Σ_{j∈R(u)} yj ) − λ · qi )
pu ← pu + γ · (eui · qi − λ · pu )
∀j ∈ R(u) : yj ← yj + γ · (eui · |R(u)|^{−1/2} · qi − λ · yj )

Several types of implicit feedback can be simultaneously introduced into the
model by using extra sets of item factors. In the same way, other sources of
information can be considered as well. For instance, we can use demographics
describing the gender of the user, age group, income, and so on. The ma-
trix factorization approach can easily integrate multiple signal sources. For in-
stance, [Manzato, 2013] recently proposed a matrix factorization approach that
supports implicit feedback on recommender systems with meta-data awareness.
An interesting note on the previous model is that depending on the regular-
ization parameters, we can encourage greater deviations from baseline estimates
for users that provided many ratings or plenty of implicit feedback. In general,
this is a good practice for recommender systems. We would like to take more
risk with well modeled users that provided much input. For such users we are
willing to predict quirkier and less common recommendations. On the other
hand, we are less certain about the modeling of users that provided only a lit-
tle input, in which case we would like to stay with safe estimates close to the
baseline values [Koren, 2008].

2.2.1.3 Temporal Dynamics - timeSVD++  So far, all the presented
models have been static. However, in reality, product perception and popularity
constantly change as new selections emerge. Similarly, customers’ inclinations
evolve, leading them to redefine their taste. Thus, the system should account
for the temporal effects reflecting the dynamic, time-drifting nature of user-item
interactions. The matrix factorization approach lends itself well to modeling
temporal effects, which can significantly improve its accuracy. Decomposing
ratings into distinct terms allows the system to treat different temporal aspects
separately. Specifically, we identify the following effects that vary over time:

• item biases, bi (t);

• user biases, bu (t); and

• user preferences, pu (t).

The first temporal effect addresses the fact that an item’s popularity might
change over time. The second temporal effect allows users to change their base-
line ratings over time. This might reflect several factors including a natural drift
in a user’s rating scale, the fact that users assign ratings relative to other recent
ratings, and the fact that the rater’s identity within a household can change
over time. The third temporal effect allows users to change their preferences
over time. This reflects that temporal dynamics affect user preferences and
therefore the interaction between users and items. Such a drift can occur also
because new products and services become available [Kolter and Maloof, 2003].
However, in many applications, including our focus application of recommender
systems, we also face a more complicated form of concept drift where intercon-
nected preferences of many users are drifting in different ways at different time
points. This requires the learning algorithm to keep track of multiple changing
concepts. In addition, the typically low amount of data instances associated
with individual customers calls for more concise and efficient learning methods,
which maximize the utilization of signal in the data [Koren, 2009]. Thus, just
under-weighting past actions loses too much signal along with the lost noise,
which is detrimental given the scarcity of data per user. On the other hand,
we specify static item characteristics, qi , because we do not expect significant
temporal variation for items, which, unlike humans, are static in nature.
Based on the previous discussion, [Koren, 2009] suggests the following guide-
lines for modeling drifting user preferences:

• Seek models that explain user behavior along the full extent of the time
period, not only the present behavior (while subject to performance limi-
tations). This is key to being able to extract signal from each time point,
while neglecting only the noise.

• Multiple changing concepts should be captured. Some are user-dependent


and some are item-dependent. Similarly, some are gradual while others
are sudden.

• While separate drifting “concepts” or preferences per user and/or item are
needed to be modeled, it is essential to combine all those concepts within
a single framework. This allows modeling interactions crossing users and
items thereby identifying higher level patterns.

• In general, do not try to extrapolate future temporal dynamics, e.g.,
estimating future changes in a user’s preferences. This could be very helpful
but is seemingly too difficult, especially given a limited amount of known
data. Rather than that, the goal is to capture past temporal patterns in
order to isolate persistent signal from transient noise. This, indeed, helps
in predicting future behavior.

2.2.1.3.1 Time changing baseline predictors Much of the temporal
variability is included within the baseline predictors, through two major tempo-
ral effects. The first is addressing the fact that an item’s popularity is changing
over time. The second major temporal effect is related to user biases - users
change their baseline ratings over time. As in [Koren, 2009], a template for a
time sensitive baseline predictor for u’s rating of i at day tui is:

bui (tui ) = µ + bu (tui ) + bi (tui ).

Here, bu (·) and bi (·) are real-valued functions that change over time, while the
function bui (tui ) represents the baseline estimate for u's rating of i at day tui . A
baseline predictor on its own cannot yield personalized recommendations, as it
disregards all interactions between users and items. In a sense, it is capturing
the portion of the data that is less relevant for establishing recommendations.
The exact way to build these functions should reflect a reasonable way to pa-
rameterize the underlying temporal changes. For instance, depending on the rec-
ommendation domain, item likability could fluctuate on a daily basis or change
over more extended periods.
Various predictors can be used. For instance, [Koren, 2009] in
a movie recommender system split the timeline into bins so as to balance the
desire to achieve finer resolution (hence, smaller bins) with the need for enough
ratings per bin (hence, larger bins):

bi (t) = bi + bi,Bin(t) .

While binning the parameters works well on the items, it is more of a challenge
on the user side. On the one hand, a finer resolution is desirable for users in order
to detect very short-lived temporal effects. On the other hand, there are typically not
enough ratings per user to produce reliable estimates for isolated bins.
Thus, [Koren, 2009] suggests the following predictor:

bu (t) = bu + αu · sign(t − tu ) · |t − tu |^β ,

where tu denotes the mean date of u's ratings. This simple linear model for approx-
imating a drifting behavior requires learning two parameters per user: bu and
αu .
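As an illustration, this deviation term can be sketched as follows. This is a minimal sketch: in [Koren, 2009] the parameters bu and αu are learned jointly with the rest of the model, the exponent β ≈ 0.4 is set by cross-validation, and the numeric values below are hypothetical.

```python
def user_bias(t, b_u, alpha_u, t_mean, beta=0.4):
    """Drifting user bias: bu(t) = bu + alpha_u * sign(t - tu) * |t - tu|**beta.

    t and t_mean are rating dates in days; t_mean is the user's mean
    rating date. The exponent beta dampens large time deviations.
    """
    dev = t - t_mean
    sign = (dev > 0) - (dev < 0)  # sign(t - tu)
    return b_u + alpha_u * sign * abs(dev) ** beta

# A user whose ratings drift upward over time (hypothetical values):
early, late = user_bias(10, 0.1, 0.02, 100), user_bias(200, 0.1, 0.02, 100)
```

With a positive αu, ratings dated before the user's mean date are shifted down and later ratings are shifted up, capturing a gradual drift with only two parameters per user.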
So far we have discussed smooth functions for modeling the user bias, which
mesh well with gradual concept drift. In a similar way, we can model sudden
drifts emerging as “spikes” associated with a single day or session [Ricci and
Shapira, 2011].
Beyond the temporal effects described so far, one can use the same method-
ology to capture more effects. A primary example is capturing periodic effects.
For example, some products may be more popular in specific seasons or near
certain holidays. Periodic effects can be found also on the user side. As an
example, a user may have different attitudes or buying patterns during the
weekend compared to the working week. This way, the item bias can become:

bi (t) = bi + bi,period(t) .

For example, if we try to capture the change of item bias with the season of the
year, then period(t) ∈ {fall, winter, spring, summer}.
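A minimal sketch of such a periodic item bias, assuming northern-hemisphere meteorological season buckets; the bucketing and the bias values are illustrative choices, not prescribed by the model:

```python
import datetime

SEASON = {12: "winter", 1: "winter", 2: "winter",
          3: "spring", 4: "spring", 5: "spring",
          6: "summer", 7: "summer", 8: "summer",
          9: "fall", 10: "fall", 11: "fall"}

def period(t: datetime.date) -> str:
    """period(t) ∈ {fall, winter, spring, summer}."""
    return SEASON[t.month]

def item_bias(i, t, b_i, b_i_period):
    """Periodic item bias: bi(t) = bi + b_{i, period(t)}."""
    return b_i[i] + b_i_period[i][period(t)]
```

For a winter-oriented item, the learned b_{i,winter} would be positive, boosting that item's baseline for dates falling in winter.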
We should note that timeSVD++ requires a significant increase in the num-
ber of parameters, because of the refined representation of each user factor. Yet,
the improvement delivered by timeSVD++ over SVD++ is consistently more
significant. We are not aware of any single algorithm in the literature that could
deliver such accuracy [Ricci and Shapira, 2011]. At the same time, timeSVD++
offers a memory-efficient compact model, which can be trained relatively easily.

2.2.1.3.2 Time changing factor model As discussed earlier, we should
also model the user preference change and thereby the interaction between users
and items (baseline predictors do not capture this interaction). This type of evo-
lution is modeled by taking the user factors (the vector pu ) as a function of time.
Once again, we need to model those changes at the very fine level of a daily
basis, while facing the built-in scarcity of user ratings. In fact, these temporal
effects are the hardest to capture, because preferences are not as pronounced as
main effects (user-biases), but are split over many factors [Ricci and Shapira,
2011].
Each component of the user preferences pu (t)T = (pu1 (t), . . . , puf (t)) can be
modeled in the same way that user biases were treated. At this point, we can tie
all pieces together and extend the SVD++ factor model by incorporating the
time changing parameters. The resulting model will be denoted as timeSVD++,

where the prediction rule is as follows:
r̂ui = µ + bu (tui ) + bi (tui ) + qiT ( pu (tui ) + |R(u)|^(−1/2) Σ_{j∈R(u)} yj ).
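The prediction rule can be sketched as follows; this is a minimal illustration in which all parameters (biases, factor vectors) are assumed to be already learned by minimizing the regularized squared error on the observed ratings.

```python
import numpy as np

def predict_timesvdpp(mu, b_u_t, b_i_t, q_i, p_u_t, y, R_u):
    """timeSVD++ prediction:
    r_hat = mu + bu(t) + bi(t) + qi^T (pu(t) + |R(u)|^(-1/2) sum_{j in R(u)} yj).

    b_u_t, b_i_t: time-dependent biases already evaluated at t_ui;
    p_u_t: the user factor vector evaluated at t_ui;
    y: dict mapping item j to its implicit-feedback factor vector yj;
    R_u: the set of items rated by user u.
    """
    implicit = sum((y[j] for j in R_u), np.zeros_like(q_i))
    return mu + b_u_t + b_i_t + q_i @ (p_u_t + implicit / np.sqrt(len(R_u)))
```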

2.2.1.4 Confidence Levels As discussed in [Koren, 2008], in several setups,
not all observed ratings deserve the same weight or confidence. Confidence can
stem from available numerical values that describe the frequency of actions, for
example, how much time the user watched a certain show or how frequently a
user bought a certain item. These numerical values indicate the confidence in
each observation.
The matrix factorization model can readily accept varying confidence levels,
which let it give less weight to less meaningful observations. If confidence in
observing rui is denoted as cui , then the model enhances the cost function to
account for confidence as follows:

min_{b∗ ,q∗ ,p∗ } Σ_{(u,i)∈K} cui (rui − µ − bu − bi − qiT pu )² + λ(bu² + bi² + ||qi ||² + ||pu ||²).
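A sketch of evaluating this objective on a set of observed pairs follows. The regularization term is written per observed pair, mirroring the form of the cost function above; implementations may instead regularize each parameter once.

```python
import numpy as np

def confidence_weighted_cost(K, r, c, mu, b_u, b_i, Q, P, lam):
    """Confidence-weighted regularized squared error:
        sum over (u, i) in K of
            c_ui (r_ui - mu - b_u - b_i - q_i^T p_u)^2
            + lam (b_u^2 + b_i^2 + ||q_i||^2 + ||p_u||^2).

    r, c: dicts keyed by (u, i); Q, P: dicts of item/user factor vectors.
    """
    cost = 0.0
    for (u, i) in K:
        err = r[u, i] - mu - b_u[u] - b_i[i] - Q[i] @ P[u]
        cost += c[u, i] * err ** 2
        cost += lam * (b_u[u] ** 2 + b_i[i] ** 2 + Q[i] @ Q[i] + P[u] @ P[u])
    return cost
```

In a learning loop, this cost would be minimized by stochastic gradient descent or alternating least squares, with the gradient of each squared-error term scaled by the confidence weight cui.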

3 Beyond Rating Prediction Accuracy
As discussed in [Jannach et al., 2010], the popularity of the collaborative fil-
tering subfield of recommender systems has several reasons, most importantly
the fact that real-world benchmark problems are available and that the data
to be analyzed for generating recommendations have a very simple structure:
a matrix of item ratings. Thus, the evaluation of whether a newly developed
recommendation technique, or the application of existing methods to the rec-
ommendation problem, outperforms previous approaches is straightforward, in
particular because the evaluation metrics are also more or less standardized.
One can easily imagine that comparing different algorithms is not always as
easy as with collaborative filtering, in particular if more knowledge is available
than just the simple rating matrix.
However, the availability of test databases for CF in different domains fa-
vored the further development of various and more complex CF techniques.
Still, this somehow also narrowed the range of domains on which CF techniques
are actually applied. The most popular datasets are about movies and books,
and many researchers aim to improve the accuracy of their algorithms only on
these datasets. Whether a certain CF technique performs particularly well in
one domain or another is unfortunately beyond the scope of many research ef-
forts. In fact, given the rich number of different proposals, the question of which
recommendation algorithm to use under which circumstances is still open, even
if we limit our considerations to purely collaborative approaches. Moreover, the
accuracy results reported on the well-known test datasets do not convey a clear
picture. Many researchers compare their measurements with the already rather
old results from [Breese et al., 1998] and report that they can achieve better
results in one or another setting and experiment. A newer basis of comparison
is required, given the dozens of different techniques that have been proposed
over the past decade. Based on such a comparison, a new set of “baseline”
algorithms could help to get a clearer picture.
Moreover, despite all of the advancements, the current generation of recom-
mender systems still requires further improvements to make recommendation
methods more effective in a broader range of applications [Adomavicius and
Tuzhilin, 2005].

Even though the rating prediction perspective is the prevailing paradigm
in recommender systems, there are other perspectives that have been gaining
significant attention in this field [Jannach et al., 2010] and try to alleviate the
problems pertaining to the aforementioned narrow rating prediction accuracy-
based focus [Adamopoulos, 2013a]. In particular, some of the most recent rec-
ommender system perspectives maintain that recommender systems should pro-
vide personalized recommendations from a wide range of items and they should
also enable the users to find relevant items that might be hard to discover. In
addition, RSs should increase user satisfaction and engagement and offer a su-
perior user experience. Moreover, RSs should be able to reduce search costs and
improve the quality of decisions that consumers make. Besides, from a business
perspective, RSs should increase the number of sales and conversion rates as
well as promote items from the long tail that usually exhibit significantly lower
marginal cost and, at the same time, higher marginal profit. Furthermore, they
should also make the users familiar with the various product categories and the
whole product catalog.
Moving beyond the classical perspective of the rating prediction accuracy, in
this section we discuss various existing approaches and propose future research
directions.

3.1 Novelty
Novel recommendations are recommendations of those items that the user
did not know about [Konstan et al., 2006]. Hijikata et al. [Hijikata et al., 2009]
use collaborative filtering to derive novel recommendations by explicitly asking
users what items they already know whereas [Zhou et al., 2010] define novelty
as the average self-information of recommended items, which amounts to the
average log inverse ratio of users who like the item (also known as “inverse user
frequency”). Besides, [Weng et al., 2007] suggest a taxonomy-based RS that
utilizes hot topic detection using association rules to improve novelty and quality
of recommendations, whereas [Zhang and Hurley, 2009] propose to enhance
novelty at a small cost to overall accuracy by partitioning the user profile into
clusters of similar items and compose the recommendation list of items that
match well with each cluster, rather than with the entire user profile. Also,
[Celma and Herrera, 2008] analyze the item-based recommendation network
through similarity links to detect whether its intrinsic topology has a pathology
that hinders long-tail novel recommendations and [Nakatsuji et al., 2010] define
and measure novelty as the smallest distance from the class the user accessed
before to the class that includes target items over the taxonomy.

3.2 Serendipity
Moreover, serendipity involves a positive emotional response of the user
about a previously unknown (novel) item and measures how surprising these
recommendations are [Shani and Gunawardana, 2011]; serendipitous recommen-
dations are, by definition, also novel. However, a serendipitous recommendation
involves an item that the user would not be likely to discover otherwise, whereas
the user might autonomously discover novel items. [Iaquinta et al., 2008] propose
to enhance serendipity by recommending novel items whose description is se-
mantically far from users’ profiles and [Kawamae et al., 2009], [Kawamae, 2010]
suggest an algorithm for recommending novel items based on the assumption
that users follow earlier adopters who have demonstrated similar preferences.
In addition, [Sugiyama and Kan, 2011] proposed a method for recommending
scholarly papers utilizing dissimilar users and co-authors to construct the pro-
file of the target researcher. Also, [André et al., 2009] examine the potential for
serendipity in Web search and suggest that information about personal interests
and behavior may be used to support serendipity.

3.3 Diversity
Furthermore, diversification is defined as the process of maximizing the va-
riety of items in a recommendation list. Most of the literature in Recommender
Systems and Information Retrieval that goes beyond the traditional perspective
of rating prediction accuracy studies the principle of diversity to improve user
satisfaction. Typical approaches replace items in the derived recommendation
lists to minimize similarity between all items or remove “obvious” items from
them as in [Billsus and Pazzani, 2000]. [Ziegler et al., 2005] propose a similarity
metric using a taxonomy-based classification and use this to assess the topical
diversity of recommendation lists. They also provide a heuristic algorithm to
increase the diversity of the recommendation list based on a greedy re-ranking
algorithm that iteratively selects items that maximize a trade-off between the
original recommendation value and the average distance to the new list under
construction. Then, [Zhang and Hurley, 2008] focus on intra-list diversity and
address the problem as the joint optimization of two objective functions re-
flecting preference similarity and item diversity, and [Hurley and Zhang, 2011]
formulate the trade-off between diversity and matching quality as a binary op-
timization problem. Besides, [Wang and Zhu, 2009], inspired by the modern
portfolio theory in financial markets, suggest an algorithm that generalizes the

probability ranking principle by considering both the uncertainty of relevance
predictions and correlations between retrieved documents. Also, [Said et al.,
2012] suggest an inverted nearest neighbor model and recommend items dis-
liked by the least similar users. Following a different direction, [McSherry, 2002]
investigates the conditions in which similarity can be increased without loss
of diversity and presents an approach to retrieval which is designed to deliver
such similarity-preserving increases in diversity. In addition, [Zhang et al., 2012]
propose a collection of algorithms to simultaneously increase novelty, diversity,
and serendipity, at a slight cost to accuracy, and [Zhou et al., 2010] suggest
a hybrid algorithm which, without relying on any semantic or context-specific
information, simultaneously gains in both accuracy and diversity of recommen-
dations. In another stream of research, [Panniello et al., 2009] compare sev-
eral contextual pre-filtering, post-filtering, and contextual modeling methods in
terms of accuracy and diversity of their recommendations to determine which
methods outperform others and under which circumstances. Considering how
to measure diversity, [Castells et al., 2011] and [Vargas and Castells, 2011] aim
to cover and generalize the metrics reported in the RS literature [Zhang and
Hurley, 2008], [Zhou et al., 2010], [Ziegler et al., 2005], and derive new ones.
They suggest novelty and diversity metric schemes that take into consideration
item position and relevance through a probabilistic recommendation browsing
model. Besides, other researchers studied the importance of personalization
and users’ perception in diversity. In particular, [Hu and Pu, 2011] investigate
design issues that can enhance users' perception of recommendation diversity
and improve users’ satisfaction, and [Ge et al., 2012] show that the perceived
diversity of a recommendation list depends on the placement of diverse items.
Further, [Vargas et al., 2012] suggest that the combination of personalization
and diversification achieves competitive performance improving the baseline,
plain personalization, and plain diversification approaches in terms of both di-
versity and accuracy measures, and [Shi et al., 2012] argue that the diversifi-
cation level in a recommendation list should be adapted to the target users’
individual situations and needs, and propose a framework to adaptively diver-
sify recommendation results for individual users based on latent factor models.
Lastly, examining similar but yet different concepts of diversity, [Adomavicius
and Kwon, 2009, Adomavicius and Kwon, 2012] propose the concept of aggre-
gate diversity as the ability of a system to recommend across all users as many
different items as possible while keeping accuracy loss to a minimum, by a con-
trolled promotion of less popular items toward the top of the recommendation

lists. Also, [Lathia et al., 2010] consider the concept of temporal diversity; the
diversity in the sequence of recommendation lists produced over time. Taking
into consideration the different notions and concepts discussed so far, avoiding
an overly narrow set of choices is generally a good approach to increase the use-
fulness of the recommendation list since it enhances the chances that a user is
pleased by at least some recommended items. Toward this direction, in market-
ing [Ghose et al., 2012] provide evidence that consumers prefer the diversity in
the ranking results.

3.3.1 Proposed Approaches for Diversity

3.3.1.1 Intralist Diversity [Zhang and Hurley, 2008] propose methods
for maximizing the diversity of a recommendation list, while maintaining an
acceptable level of matching quality, and show how these competing concerns
can be presented as constrained binary optimization problems.
In particular, they adopt the definition of set diversity as average dissimilar-
ity. Specifically, given a distance function, d : I × I → R, such that d(i, j) is the
distance or dissimilarity between elements i, j ∈ I, the diversity fD (R) is given
as the average dissimilarity of all pairs of elements contained in R. That is,
fD (R) = (2 / (p(p − 1))) Σ_{i∈R} Σ_{j∈R, j<i} d(i, j),

where p = |R| and R is the recommendation list. The distance function is
application-dependent and may correspond for example to a distance between
feature vectors under a mapping of I into a feature space. Based on this defini-
tion and metric, the opposition of the competing requirements of diversity and
high matching is immediately clear.
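Computing fD(R) amounts to averaging the pairwise distances; a minimal sketch for a symmetric, application-dependent distance function d:

```python
from itertools import combinations

def intra_list_diversity(R, d):
    """f_D(R): the average dissimilarity of all pairs of items in the
    recommendation list R, i.e. 2 / (p (p - 1)) times the sum of d(i, j)
    over all unordered pairs {i, j} in R."""
    p = len(R)
    if p < 2:
        return 0.0  # a single item carries no pairwise diversity
    return 2.0 * sum(d(i, j) for i, j in combinations(R, 2)) / (p * (p - 1))

# Example with a toy one-dimensional feature distance:
list_diversity = intra_list_diversity([0, 1, 2], lambda i, j: abs(i - j))
```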
Then, the authors formulate the quadratic objective function of the problem.
In detail, given a fixed parameter θ ∈ [0, 1], find the vector y ∗ defined as:

y∗ = arg max_y (1 − θ)α yT Dy + θβ muT y
s.t. 1T y = p,
y(i) ∈ {0, 1} ∀i = 1, . . . , M.

where y is the indicator vector, such that y(i) = 1 if i is included in the recom-
mendation list and 0 otherwise, D is the distance matrix, mu the M -dimensional
vector with m(i) the matching score based on a matching function gm (qu , i), α
and β are normalization parameters to ensure that diversity and matching mea-
sures are normalized to the same scale, and θ explicitly expresses the trade-off

between the two objectives and represents the importance given to the matching
value in comparison to that given to diversity. This formulation corresponds to
binary quadratic programming problems with linear constraints.
A key step in the solution of a binary quadratic programming problem is
the choice of relaxation to a real-valued problem. Here, the authors use the
following relaxation for their application:

x∗ = arg sup_x (1 − θ)α xT Dx + θβ muT x
s.t. ||x||² = p, x ∈ RM ,

for which the solution is presented in [Rojas et al., 2001]. Once the real-valued
solution x∗ to the relaxed problem is found, it is necessary to quantize the
values to a candidate binary solution y. It can be easily shown that maximizing
yT x∗ maximizes a lower bound on the quadratic form. Since y is binary and
noting that −x∗ is also a valid solution, the dot product is maximized by setting
y(i) = 1, wherever x(i) is one of the p highest elements of +x or −x, depending
on which gives the larger value. For efficiency reasons, they do no further search
once a feasible binary solution is generated.
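The quantization step can be sketched as follows; this is a minimal illustration of the heuristic described above:

```python
import numpy as np

def quantize(x_star, p):
    """Quantize the relaxed real-valued solution x* to a binary indicator y
    with exactly p ones, by maximizing y^T x over the two valid solutions
    x = +x* and x = -x* (selecting the p largest entries of whichever
    gives the larger dot product)."""
    best_y, best_val = None, -np.inf
    for x in (x_star, -x_star):
        y = np.zeros_like(x)
        y[np.argsort(x)[-p:]] = 1.0  # indicator of the p largest entries
        if y @ x > best_val:
            best_y, best_val = y, y @ x
    return best_y
```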
There are two steps within a recommendation framework at which diversity
criteria can be applied. The first is the selection of the candidate set C. Solving
the binary programming problem, for a given θ and a given size p = l, allows
the selection of l items, which are diverse as well as similar to the user profile.
Alternatively, or additionally, from an intermediate set I of items in C that are
most similar to the user profile, we can solve the binary programming problem,
to extract the most diverse subset of size p = N from I.
To evaluate the generated recommendations, the authors introduce a mea-
sure to capture this notion as
nR (i) ≜ p(fD (R) − fD (R − {i})) = (1/(p − 1)) Σ_{j∈R} d(i, j),

that is, the amount of additional diversity that i brings to the set R.
In this approach, the “training” part consists of learning the distance matrix
for which the authors use only implicit feedback. The evaluation has shown that
the proposed method can increase the likelihood of the system recommending
novel items, while maintaining good performance on the core items. It is notable
however that, even with diversity, the probability of recommending novel items
is very low and this will be the case whenever similarity is used as a primary
selection criterion.

3.3.1.2 Aggregate Recommendation Diversity [Adomavicius and Kwon,
2012] introduce and explore a number of item ranking techniques that can gener-
ate substantially more diverse recommendations across all users while maintain-
ing comparable levels of recommendation accuracy. Comprehensive empirical
evaluation consistently shows the diversity gains of the proposed techniques us-
ing several real-world rating data sets and different rating prediction algorithms.
In contrast to individual diversity, which has been explored in a number of
papers, some recent studies [Brynjolfsson et al., 2011], [Fleder and Hosanagar,
2009] started examining the impact of recommender systems on sales diversity
by considering aggregate diversity of recommendations across all users. Note
that high individual diversity of recommendations does not necessarily imply
high aggregate diversity. For example, if the system recommends to all users
the same five best selling items that are not similar to each other, the recom-
mendation list for each user is diverse (i.e., high individual diversity), but only
five distinct items are recommended to all users and purchased by them (i.e.,
resulting in low aggregate diversity or high sales concentration) [Adomavicius
and Kwon, 2012].
While the benefits of recommender systems that provide higher aggregate
diversity would be apparent to many users (because such systems focus on pro-
viding a wider range of items in their recommendations and not mostly bestsellers,
which users are often capable of discovering by themselves), such systems could
be beneficial for some business models as well [Brynjolfsson et al., 2011], [Bryn-
jolfsson et al., 2003], [Fleder and Hosanagar, 2009], [Goldstein and Goldstein,
2006]. However, the impact of recommender systems on aggregate diversity in
real world e-commerce applications has not been well understood [Adomavi-
cius and Kwon, 2012] while contradictory results can be found in the related
literature [Brynjolfsson et al., 2011], [Fleder and Hosanagar, 2009].
Higher diversity (both individual and aggregate), however, can come at the
expense of accuracy. As is well known, there is a trade-off between accuracy and
diversity because high accuracy may often be obtained by safely recommending
to users the most popular items, which can clearly lead to the reduction in
diversity. And conversely, higher diversity can be achieved by trying to uncover
and recommend highly idiosyncratic or personalized items for each user, which
often have less data and are inherently more difficult to predict, and, thus, may
lead to a decrease in recommendation accuracy [Adomavicius and Kwon, 2012].
Since the authors intend to measure the recommender system's performance
based on the top-N recommended items lists that the system provides to its

users, they use the total number of distinct items recommended across all users
as an aggregate diversity measure, which they refer to as diversity-in-top-N and
formally define as follows:

diversity-in-top-N = | ∪_{u∈U} LN (u) |.
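The metric is straightforward to compute from the users' top-N lists; a minimal sketch:

```python
def diversity_in_top_n(top_n_lists):
    """diversity-in-top-N: the total number of distinct items recommended
    across all users, |union over u in U of L_N(u)|.

    top_n_lists: dict mapping each user to an iterable of recommended items.
    """
    distinct = set()
    for items in top_n_lists.values():
        distinct.update(items)
    return len(distinct)

# Two users, four recommendation slots, but only three distinct items:
# diversity_in_top_n({"u1": ["a", "b"], "u2": ["b", "c"]}) -> 3
```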

In order to control the accuracy-diversity trade-off, the authors propose a
parameterized ranking approach. The proposed ranking approaches are param-
eterized with a "ranking threshold" TR ∈ [TH , Tmax ] (where Tmax is the largest
possible rating on the rating scale, e.g., Tmax = 5) to give the user the ability
to choose a certain level of recommendation accuracy. In particular, given any
ranking function rankX (i), ranking threshold TR is used for creating the param-
eterized version of this ranking function, rankX (i, TR ), which is formally defined
as:

rankx (i, TR ) = rankx (i),                  if R∗ (u, i) ∈ [TR , Tmax ],
rankx (i, TR ) = αu + rankStandard (i),      if R∗ (u, i) ∈ [TH , TR ),

where Iu∗ (TR ) = {i ∈ I | R∗ (u, i) ≥ TR } and αu = max_{i∈Iu∗ (TR )} rankx (i).
Simply put, items that are predicted above ranking threshold TR are ranked
according to rankX (i), while items that are below TR are ranked according
to the standard ranking approach rankStandard (i). In addition, all items that
are above TR get ranked ahead of all items that are below TR (as ensured by
αu in the above formal definition). Thus, increasing the ranking threshold TR
toward Tmax would enable choosing the most highly predicted items resulting in
more accuracy and less diversity (becoming increasingly similar to the standard
ranking approach).
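A sketch of ranking one user's candidate items with this parameterized scheme follows. Rank values are sorted in ascending order, and the fallback standard ranking is taken here to be the reciprocal of the predicted rating (so that higher-predicted items get smaller rank values); the item names and predicted ratings are hypothetical.

```python
def rank_items(items, rank_x, rank_standard, pred, T_R, T_max=5.0, T_H=3.5):
    """Parameterized ranking rank_x(i, T_R): items predicted in [T_R, T_max]
    are ranked by rank_x; items predicted in [T_H, T_R) fall back to
    rank_standard, offset by alpha_u = max rank_x over the above-threshold
    items so they always appear after them. Smaller value = higher position."""
    above = [i for i in items if T_R <= pred[i] <= T_max]
    below = [i for i in items if T_H <= pred[i] < T_R]
    alpha_u = max((rank_x(i) for i in above), default=0.0)
    scores = {i: rank_x(i) for i in above}
    scores.update({i: alpha_u + rank_standard(i) for i in below})
    return sorted(scores, key=scores.get)

# Reverse-predicted-rating ranking with threshold T_R = 4.0:
pred = {"a": 4.8, "b": 4.2, "c": 3.8}
order = rank_items(pred, lambda i: pred[i], lambda i: 1.0 / pred[i], pred, T_R=4.0)
# -> ["b", "a", "c"]: above-threshold items lowest-predicted first, "c" last
```

Raising T_R toward Tmax shrinks the above-threshold set, so the list converges to the standard ranking, trading diversity back for accuracy.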
Then, the authors introduce several item ranking functions in order to empir-
ically test their approach. In particular, the authors introduce the following six
ranking approaches that can be used as alternatives to rankStandard to improve
recommendation diversity:

Reverse Predicted Rating Value: Ranking the candidate (highly predicted)
items based on their predicted rating value, from lowest to highest (as a
result choosing less popular items). More formally:

rankRevPred (i) = R∗ (u, i).

Item Average Rating: Ranking items according to an average of all known
ratings for each item:

rankAvgRating (i) = R̄(i), where R̄(i) = (1/|U (i)|) Σ_{u∈U (i)} R(u, i).

Item Absolute Likeability: Ranking items according to how many users liked
them (rated the item above TH ):

rankAbsLike (i) = |UH (i)|,

where UH (i) = {u ∈ U (i)|R(u, i) ≥ TH }.

Item Relative Likeability: Ranking items according to the percentage of the
users who liked an item (among all users who rated it):

rankRelLike (i) = |UH (i)|/|U (i)|.

Item Rating Variance: Ranking items according to each item's rating vari-
ance (rating variance of users who rated the item):

rankItemVar (i) = (1/|U (i)|) Σ_{u∈U (i)} (R(u, i) − R̄(i))².

Neighbors’ Rating Variance: Ranking items according to the rating vari-
ance of the neighbors of a particular user for a particular item. The closest
neighbors of user u among the users who rated the particular item i, denoted
by u′, are chosen from the set U (i) ∩ N (u):

rankNeighborVar (i) = (1/|U (i) ∩ N (u)|) Σ_{u′∈U (i)∩N (u)} (R(u′, i) − R̄u (i))²,

where R̄u (i) = (1/|U (i) ∩ N (u)|) Σ_{u′∈U (i)∩N (u)} R(u′, i).
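Two of these ranking functions are simple enough to sketch directly. Here ratings is assumed to map each item to the list of its known ratings, and the "like" threshold TH = 3.5 is one common choice for a 5-star scale:

```python
def rank_avg_rating(item, ratings):
    """rank_AvgRating(i): the mean of all known ratings for the item."""
    vals = ratings[item]
    return sum(vals) / len(vals)

def rank_rel_like(item, ratings, T_H=3.5):
    """rank_RelLike(i) = |U_H(i)| / |U(i)|: the fraction of the item's
    raters who liked it (rated it at or above T_H)."""
    vals = ratings[item]
    return sum(1 for v in vals if v >= T_H) / len(vals)
```

Either function can be plugged in as rank_x in the parameterized scheme above-threshold, with candidate items then sorted by the resulting rank value.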

The proposed recommendation ranking approaches were tested with several
movie rating data sets using three widely popular recommendation techniques
for rating prediction, including two heuristic-based (user-based and item-based
CF) and one model-based (matrix factorization CF) technique. In general, all
proposed ranking approaches were able to provide significant diversity gains,
and the best performing ranking approach may be different depending on the
chosen data set and rating prediction technique. The conducted analysis pro-
vides empirical support that the proposed ranking approaches increase not just

the number of distinct items recommended, but also the proportion of recom-
mended long-tail items, thus, confirming that the proposed techniques truly
contribute toward more diverse and idiosyncratic recommendations across all
users. This means that the proposed ranking techniques do not just manipulate
our simple diversity-in-top-N metric to increase the number of different items
among the recommendations, but also fundamentally change the distribution of
recommended items toward more evenly distributed representation.
Moreover, because of the inherent trade-off between these two metrics, the
authors discuss the use of the common approach of solving multi-criteria opti-
mization problems that optimizes only one of the criteria and converts the others
to constraints. In particular, given some diversity metric d and the target di-
versity level D, we can search the space of all possible top-N recommendation
configurations for all users until an optimal configuration is found. However, as
the authors discuss, while the global optimization-based approach can promise
the highest predictive accuracy for the given level of diversity, this approach
would be prohibitive even for applications of relatively small size.
In addition to providing significant diversity gains, the proposed ranking
techniques have several other advantageous characteristics. In particular, these
techniques are extremely efficient, because they are based on scalable sorting-
based heuristics that make decisions based only on the “local” data (i.e., only
on the candidate items of each individual user) without having to keep track of
the “global” information, such as which items have been recommended across
all users and how many times. The techniques are also parameterizable, since
the user has the control to choose the acceptable level of accuracy for which the
diversity will be maximized. Also, the proposed ranking techniques provide a
flexible solution to improving recommendation diversity because: they are ap-
plied after the unknown item ratings have been estimated and, thus, can achieve
diversity gains in conjunction with a number of different rating prediction tech-
niques, as illustrated in the paper; the vast majority of current recommender
systems already employ some ranking approach, thus, the proposed techniques
would not introduce new types of procedures into recommender systems (they
would replace existing ranking procedures); the proposed ranking approaches
do not require any additional information about users (e.g., demographics) or
items (e.g., content features) aside from the ratings data, which makes them ap-
plicable in a wide variety of recommendation contexts [Adomavicius and Kwon,
2012].

3.4 Novelty and Diversity Metrics
[Vargas and Castells, 2011, Castells et al., 2011] propose a probabilistic
recommendation browsing model, building upon the basic concepts of choice,
discovery, and relevance, and based on this the authors introduce relevance-
and rank-aware novelty and diversity metric schemes.
The proposed metric framework is founded on three fundamental relations
between users and items:

Discovery: an item is seen by (or is familiar to) a user. We consider this fact
independently from the degree of enjoyment / dislike, or whether the user
consumed the item or not.

Choice: an item is used, picked, selected, consumed, bought, etc., by a user.

Relevance: an item is liked, useful, enjoyed, etc., by a user.

In particular, they model these three relations as binary random variables
over the set of users and the set of items: seen, choose, rel : U × I → {0, 1}.
These three variables are naturally related: a chosen item must obviously be
seen, and relevant items are more likely to be chosen than irrelevant ones. As
a simplification, they assume relevant items are always chosen if they are seen,
irrelevant items are never chosen, and items are discovered independently from
their relevance. In terms of probability distribution, these assumptions are
expressed as:
p(choose) = p(seen) p(rel)

where choose is a shorthand for choose = 1, and same for the other two variables.
Discovery, choice, and relevance play different roles in the proposed framework. Discovery
is used as the basis to define item novelty models. Choice is used to build models
of user browsing behavior over recommended lists of items. Together, browsing
models and item novelty models give rise to a fairly wide range of novelty and
diversity metrics.
The starting point of the proposed framework is a general scheme where a
recommendation metric is defined as the expected novelty of the recommended
items the user will choose. Given a ranked list R of items recommended to a
user u, this can be expressed as:
m(R|θ) = c ∑_{i∈R} p(choose|i, u, R) nov(i|θ)

where c is a normalizing constant, and θ stands for a generic contextual variable
which will allow for the consideration of different perspectives in the definition
of novelty and diversity. Here, the novelty or diversity of a recommendation list
is measured as the aggregate novelty of its constituent items. Also, there are
different ways in which the recommendation browsing model and item novelty
can be developed.
The authors identify two main relevant approaches to model item novelty,
based on popularity and distance, respectively. The notion of item discovery
(popularity) introduced in the previous section enables a formulation of this
principle as the probability that an item was not observed before:

nov(i|θ) = 1 − p(seen|i, θ),

where the contextual variable θ represents any element on which item discovery
may depend, or relative to which we may want to particularize novelty. This
might include e.g. a specific user, a group of users, vertical domains, time inter-
vals, sources of item discovery such as searching, browsing, past or alternative
recommendations, friends, advertisements, etc.
In general terms, p(seen|i, θ) reflects a factor of item popularity, whereby
high novelty values correspond to long-tail items few users have interacted with,
and low novelty values correspond to popular head items. If one wishes to emphasize highly novel items, the log of the inverse popularity can be used:

nov(i|θ) = − log2 p(seen|i, θ).

Alternatively, one may also consider the Bayesian inversion of the discov-
ery distribution, p(i|seen, θ), which provides a relative measure of how likely
items are to be seen with respect to each other. This leads to an interesting
formulation of item novelty:

nov(i|θ) = − log2 p(i|seen, θ).

This corresponds to the notion of self-information or surprisal I(i), commonly used in Information Theory to measure novelty as the amount of information the observation of i conveys.
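These two popularity-based variants can be sketched in a few lines; the user-item interaction data below is made up for illustration:

```python
import math

# Hypothetical discovery data: which users have seen each item.
seen_by = {
    "blockbuster": {"u1", "u2", "u3", "u4", "u5"},
    "long_tail":   {"u3"},
}
n_users = 5
total_observations = sum(len(users) for users in seen_by.values())

def novelty_complement(item):
    """nov(i) = 1 - p(seen|i), with p(seen|i) estimated as the fraction of users who saw i."""
    return 1.0 - len(seen_by[item]) / n_users

def novelty_self_information(item):
    """nov(i) = -log2 p(i|seen): the surprisal of observing i among all observations."""
    return -math.log2(len(seen_by[item]) / total_observations)

for item in seen_by:
    print(item,
          round(novelty_complement(item), 3),
          round(novelty_self_information(item), 3))
```

Under both variants the long-tail item scores strictly higher than the popular one, as the discussion above requires.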
As an alternative to the popularity-based view, the authors propose a similarity-
based model where item novelty is defined by a distance function between the
item and a context of experience. Then, the novelty can be formulated as the

expected or minimum distance between the item and the set:
nov(i|θ) = ∑_{j∈θ} p(j|choose, θ, i) d(i, j)   or   nov(i|θ) = min_{j∈θ} d(i, j)

where p(j|choose, θ, i) is the probability that the user chooses item j in the
context θ, when he has already chosen i, and d the distance measure.
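A minimal sketch of the minimum-distance variant, assuming a Jaccard distance over item genre sets as d(i, j) and treating the user's past items as the context θ (both the data and the choice of distance are illustrative assumptions):

```python
def jaccard_distance(a, b):
    """d(i, j) = 1 - |A ∩ B| / |A ∪ B| over item feature (genre) sets."""
    return 1.0 - len(a & b) / len(a | b)

# Hypothetical genre annotations and a user profile (items already experienced).
genres = {
    "die_hard":  {"action", "thriller"},
    "speed":     {"action", "thriller"},
    "toy_story": {"animation", "family"},
}
profile = ["die_hard"]

def min_distance_novelty(item):
    """nov(i|θ) = min over j in the context θ of d(i, j)."""
    return min(jaccard_distance(genres[item], genres[j]) for j in profile)

print(min_distance_novelty("speed"))      # identical genres -> 0.0
print(min_distance_novelty("toy_story"))  # disjoint genres  -> 1.0
```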
The browsing component of the metric scheme is based on a distribution p(choose|i, u, R) which can be modeled in terms of the user's behavior in his/her interaction with a list of recommended items. There are many ways to model this behavior. First, the authors assume that the target user will use all recommended items which he/she effectively gets to see and finds relevant for his/her taste. This model was already formulated above and, in the current context, becomes:

p(choose|i, u, R) ∼ p(seen|i, u, R)p(rel|i, u)

where the relevance of an item is independent from the recommendation in which it is delivered. The p(rel|i, u) component introduces relevance in the definition of the metric: the novelty of a recommended item will be taken into account only as much as the item is likely to be relevant for the target user. The p(seen|i, u, R) component allows for the introduction of a rank discount and reflects the fact that the lower an item is ranked in R, the less likely it is to be seen.
Then, the authors assume a so-called cascade model where the user browses
the items by ranking order without jumps, until she stops. At each position k
in the ranking, the user makes a decision whether or not to continue, which we
model as a binary random variable cont, where p(cont|k, u, R) is the probability
that user u decides to continue browsing the next item at position k + 1. With
this scheme we have, by recursion:

p(seen|k, u, R) = p(seen|k − 1, u, R) p(cont|k − 1, u, R) = ∏_{j=1}^{k−1} p(cont|j, u, R)
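Under the additional simplifying assumption of a constant continuation probability (an assumption made here for illustration, not by the authors), this recursion reduces to a geometric rank discount:

```python
def p_seen(k, p_cont):
    """Probability that the item at rank k (1-based) is seen in the cascade model,
    assuming a constant continuation probability p(cont|j, u, R) = p_cont."""
    prob = 1.0  # the top-ranked item is always seen
    for _ in range(1, k):
        prob *= p_cont
    return prob

# With p_cont = 0.8 the discount decays geometrically: 1.0, 0.8, 0.64, ...
for k in (1, 2, 3, 5):
    print(k, round(p_seen(k, 0.8), 4))
```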

For the estimation of the models, a maximum likelihood estimate can be used for item discovery p(seen|i, θ). Correspondingly, relevance in the context of recommendation is a user-specific notion which can be equated to the interest

of users for items. This probability of items being liked can be modeled by
a heuristic mapping between rating values and probability of relevance. For
instance, drawing from Information Retrieval:

p(rel|i, u) ∼ 2^g(u,i) / 2^g_max,

where g is a utility function to be derived from ratings, e.g. g(u, i) = max(0, r(u, i) − τ), where τ represents the “indifference” rating value.
By plugging in the different novelty and browsing models, one can devise different metrics for novelty and diversity using popularity-based and distance-based measures. In particular, for a novelty metric the distance of an item from the user's profile is considered, whereas for diversity the distance of a candidate item from the rest of the items included in the recommendation list is used. For instance, a popularity-based metric (expected popularity complement) can be derived as:

nov(R|u) = EPC = C ∑_{i_k∈R} disc(k) p(rel|i_k, u) (1 − p(seen|i_k)),

where C is a normalizing constant and disc(k) denotes the rank discount, i.e., the browsing model component p(seen|k).

Respectively, in the distance-based model, a measure of recommendation diversity (expected intra-list distance) can be devised as follows:

div(R|u) = EILD = C'' ∑_{i_k∈R} ∑_{i_l∈R, l≠k} disc(k) disc(l|k) p(rel|i_k, u) p(rel|i_l, u) d(i_k, i_l),

where disc(l|k) = disc(max(1, l − k)) reflects a relative rank discount for an item at position l knowing that position k has been reached.
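To make these two metric instantiations concrete, here is a toy end-to-end sketch; the data, the geometric discount, the choice of normalizing constants C and C'', and the relevance-mapping parameters are all illustrative assumptions rather than values from the paper:

```python
# Hypothetical inputs for one user u and a ranked recommendation list R.
ratings = {"a": 5, "b": 4, "c": 2}              # observed r(u, i) on a 1-5 scale
popularity = {"a": 0.90, "b": 0.30, "c": 0.05}  # estimated p(seen|i)
genres = {"a": {"action"}, "b": {"action", "comedy"}, "c": {"drama"}}
R = ["a", "b", "c"]
TAU, G_MAX = 3, 2   # "indifference" rating tau and maximum utility g_max
P_CONT = 0.85       # constant continuation probability for the cascade model

def p_rel(i):
    """p(rel|i,u) ~ 2^g(u,i) / 2^g_max, with g(u,i) = max(0, r(u,i) - tau)."""
    return 2 ** max(0, ratings[i] - TAU) / 2 ** G_MAX

def disc(k):
    """Geometric rank discount p(seen|k) implied by the cascade model."""
    return P_CONT ** (k - 1)

def dist(i, j):
    """Jaccard distance over genre sets, used as d(i, j)."""
    return 1 - len(genres[i] & genres[j]) / len(genres[i] | genres[j])

# EPC: expected relevant novelty of the list (popularity-based).
epc = sum(disc(k) * p_rel(i) * (1 - popularity[i])
          for k, i in enumerate(R, 1)) / len(R)  # C = 1/|R|, one common choice

# EILD: expected relevant intra-list distance (distance-based diversity).
pairs = [(k, i, l, j) for k, i in enumerate(R, 1)
         for l, j in enumerate(R, 1) if l != k]
eild = sum(disc(k) * disc(max(1, l - k)) * p_rel(i) * p_rel(j) * dist(i, j)
           for k, i, l, j in pairs) / len(pairs)  # C'' = 1/(number of pairs)

print(round(epc, 4), round(eild, 4))
```

Note how the same modular pieces (disc, p_rel, a popularity estimate, a distance) are reused across both metrics, which is exactly the unification the framework is after.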
Based on the above discussion, the proposed framework provides a common
ground for the development of metrics based on different perspectives on novelty
and diversity, generalizing metrics reported in the literature, and deriving new
ones. An advantage of the proposed decomposition into a few essential modular
pieces is a high potential for generalization and unification.

3.5 Unexpectedness
Pertaining to unexpectedness, in the field of knowledge discovery, [Silber-
schatz and Tuzhilin, 1996], [Berger and Tuzhilin, 1998], [Padmanabhan and
Tuzhilin, 1998, Padmanabhan and Tuzhilin, 2000, Padmanabhan and Tuzhilin,
2006] propose a characterization relative to the system of prior domain beliefs
and develop efficient algorithms for the discovery of unexpected patterns, which

combine the independent concepts of unexpectedness and minimality of pat-
terns. Also, [Kontonasios et al., 2012] survey different methods for assessing
the unexpectedness of patterns focusing on frequent itemsets, tiles, association
rules, and classification rules. In the field of recommender systems, [Murakami
et al., 2008] and [Ge et al., 2010] suggest both a definition of unexpectedness as
the deviation from the results obtained from a primitive prediction model and
metrics for evaluating unexpectedness. Also, [Akiyama et al., 2010] propose
unexpectedness as a general metric that does not depend on a user’s record
and involves an unlikely combination of features. However, none of these approaches fully captures the multi-faceted concept of unexpectedness, since they do not truly take into account the actual expectations of the users, which is crucial according to philosophers, such as Heraclitus, and some modern researchers [Silberschatz and Tuzhilin, 1996], [Berger and Tuzhilin, 1998], [Padmanabhan and Tuzhilin, 1998]. Hence, an alternative definition of unexpectedness, taking into account prior expectations of the user, and methods for providing unexpected recommendations are still needed.

3.5.1 Differences from Related Concepts

Based on the previous discussion, it should be clear to the reader by now that novelty, serendipity, unexpectedness, and diversity are closely related but still distinct concepts.
Comparing novelty to unexpectedness, a novel recommendation might be
unexpected but novelty is strictly defined in terms of previously unknown non-
redundant items without allowing for known but unexpected ones. Also, novelty
does not include any positive reactions of the user to recommendations. Illus-
trating some of these differences in the movie context, assume that the user
John Doe is mainly interested in Action & Adventure films. Recommending to
this user the newly released production of one of his favorite Action & Adven-
ture film directors is a novel recommendation but not necessarily unexpected
and possibly of low utility for him since John was either expecting the release of
this film or he could easily find out about it. Similarly, assume that we recom-
mend to this user the latest Children & Family film. Although this is definitely
a novel recommendation, it is probably also of low utility and would be likely
considered “irrelevant” because it departs too much from his expectations.
Moreover, even though both serendipity and unexpectedness involve a posi-
tive surprise of the user, serendipity is restricted to novel items and their acci-

dental discovery, without taking into consideration the expectations of the users
and the relevance of the items, and thus constitutes a different type of recom-
mendations that can be more risky and ambiguous. To further illustrate the
differences of these two concepts, let’s assume that we recommend to John Doe the latest Romance film. There is some chance that John will like this novel item and enjoy the accidental discovery of a serendipitous recommendation. However,
such a recommendation might also be of low utility to the user since it does not
take into consideration his expectations and the relevance of the items. On the
other hand, assume that we recommend to John Doe a movie in which one of
his favorite Action & Adventure film directors is performing as an actor in an
old (non-novel) Action film of another director. The user will most probably
like this unexpected but non-serendipitous recommendation.
Finally, comparing unexpectedness to diversity, diversity is a very different concept that constitutes an ex-post process and can be combined with the concept of unexpectedness.

3.5.2 Proposed Approaches for Unexpectedness

3.5.2.1 Expecting the Unexpected In [Adamopoulos and Tuzhilin, 2011,


Adamopoulos and Tuzhilin, 2013a], we propose a concept of unexpected recom-
mendations as recommending those items that significantly depart from the
expectations of the users and suggest a method for generating such recommen-
dations, based on the utility theory of economics, as well as specific metrics to
measure the unexpectedness of recommendation lists.
In particular, we formally define the concept of unexpectedness in recom-
mender systems taking into account the actual expectations of the users and
discuss how the concept of unexpectedness is differentiated from various related
notions, such as novelty, serendipity, and diversity. Following the Greek philoso-
pher Heraclitus, we approach this difficult problem of finding and recommending
unexpected items by first capturing the items expected by the users. Toward
this direction, we suggest several mechanisms for specifying users’ expectations
that can be applied across various domains. Such mechanisms include the past
transactions performed by the users, knowledge discovery and data mining tech-
niques, and experts’ domain knowledge. Besides, we formulate and fully oper-
ationalize the notion of unexpectedness and present an algorithm for providing
unexpected recommendations of high quality that are hard to discover but fairly
match the users’ interests, based on the utility theory of economics. Moreover,

we propose specific performance metrics to measure the unexpectedness of the
generated recommendation lists taking into account also the usefulness of indi-
vidual items.
Using “real-world” data sets, various examples of sets of expected recom-
mendations, and different utility functions and distance metrics, we were able
to test the proposed method under a large number of experimental settings
including various levels of sparsity, different mechanisms for specifying users’
expectations, and different cardinalities of these sets of expectations. The em-
pirical study showed that all the examined variations of the proposed method
significantly outperformed in terms of unexpectedness the standard baseline al-
gorithms, including item-based and user-based k-Nearest Neighbors [Konstan
et al., 1997], Slope One [Lemire and Maclachlan, 2007], and Matrix Factoriza-
tion [Koren et al., 2009]. This demonstrates that the proposed method indeed
effectively captures the concept of unexpectedness since, in principle, it should
do better than unexpectedness-agnostic methods. Furthermore, the proposed
method for unexpected recommendations performed at least as well as, and
in some cases even better than, the baseline algorithms in terms of the classi-
cal accuracy-based measures, such as root-mean-square error (RMSE) and the
F-measure, as well as other popular performance measures, such as catalog cov-
erage, aggregate diversity, serendipity, and the Gini coefficient. In addition,
we presented a number of actual recommendation examples generated by the
proposed method and the employed baseline approaches and provided insightful
qualitative comments.
One of the main premises of the proposed method is that the users’ expec-
tations should be explicitly considered in order to provide the users with un-
expected recommendations of high quality that are hard to discover but fairly
match their interests. Hence, the greatest improvements both in terms of un-
expectedness and accuracy vis-à-vis all other approaches were observed in the
experiments using the more accurate sets of expectations. Moreover, the use
of a utility function of standard form illustrates that the proposed method can
be easily implemented in existing recommender systems as a new component
that enhances unexpectedness of recommendations, without the need to further
modify the current rating prediction procedures.

3.5.2.2 Probabilistic Neighborhood Selection In [Adamopoulos and


Tuzhilin, 2013b], we propose a specific variation of the classical k-nearest neigh-
bors (k-NN) collaborative filtering method in which the neighborhood selection

is based on an underlying probability distribution instead of just the k neighbors
with the highest similarity level to the target user. For the probabilistic neigh-
borhood selection, we use an efficient method for weighted sampling [Wong and
Easton, 1980] of k neighbors without replacement that also takes into considera-
tion the similarity levels between the target user and the n candidate neighbors.
The key intuition for this probabilistic nearest neighbors collaborative filter-
ing method is two-fold. Using the neighborhood with the most similar users
to estimate unknown ratings and recommend candidate items, the generated
recommendation lists usually consist of known items with which the users are
already familiar. Besides, because of the multi-dimensionality of users’ tastes,
there are many items that the target user may like and are unknown to the k
most similar users to her/him. Thus, we propose the use of probabilistic neigh-
borhood selection in order to alleviate the aforementioned problems and move
beyond the limited focus of rating prediction accuracy.
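The sampling step can be sketched as follows; this uses the exponential-keys method of Efraimidis and Spirakis as a simple stand-in for the more efficient procedure of [Wong and Easton, 1980], and the similarity levels are made up:

```python
import random

def sample_neighbors(similarities, k, rng=None):
    """Weighted sampling of k neighbors without replacement.

    Each candidate v with positive similarity w_v draws a key u**(1/w_v),
    u ~ Uniform(0,1); the k largest keys form the sample. More similar
    candidates are thus more likely, but not certain, to be selected.
    """
    rng = rng or random.Random()
    keyed = [(rng.random() ** (1.0 / w), v)
             for v, w in similarities.items() if w > 0]
    keyed.sort(reverse=True)
    return [v for _, v in keyed[:k]]

# Hypothetical similarity levels between the target user and n = 6 candidates.
sims = {"v1": 0.9, "v2": 0.8, "v3": 0.6, "v4": 0.4, "v5": 0.2, "v6": 0.1}
print(sample_neighbors(sims, k=3, rng=random.Random(7)))
```

Rating estimation then proceeds exactly as in classical k-NN, only over this sampled neighborhood instead of the top-k most similar users.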
To investigate this claim, we are conducting an empirical study and we are
testing the proposed method under a large number of experimental settings. In
detail, we are using a large number of probability distributions from different
families of distributions testing various shape parameters in order to compare
the proposed probabilistic method for neighborhood selection against the stan-
dard collaborative filtering approach in terms of popular evaluation metrics for
item prediction accuracy, coverage, diversity, unexpectedness, and dispersion of
recommendations.

3.6 Recommendation Opportunities


In the related field of data mining, [Provost and Fawcett, 2013] discusses the
use of combining functions in clustering and [Lawrence et al., 2007] utilizes the
concept of percentiles, looking at the 80th percentile of the conditional spending distribution of customers, in order to identify new sales prospects. Even though the
percentile ideas were applied to clustering techniques in data mining, they have
not yet been applied to the recommender systems problems. In [Adamopoulos
and Tuzhilin, 2013c], under a definition of a recommendation opportunity as how
much a user could realistically like an item, we aim at recommending items that
the users will remarkably like. Moving beyond the standard perspective of rat-
ing prediction accuracy and exploring such recommendation opportunities can
increase user satisfaction and engagement and offer a superior user experience
through the discovery of items that the users will really like.

In particular, we illustrate the practical implementation of the proposed
approach presenting a certain variation of the classical user-based k-NN collab-
orative filtering (CF) method in which the estimation of an unknown rating of
the target user for an item is based not on the weighted averages of the k near-
est neighbors but on the weighted percentile of the ratings of these k neighbors.
For the estimation of the weighted percentile of the distribution of the ratings
in the neighborhood of the target user, an efficient method is used that does
not increase the computational complexity of the classical k-NN method. The
key intuition behind using this weighted percentile method, instead of weighted
averages, is that high percentiles (such as in the 70% to 90% range) constitute
more realistic estimates of how much a targeted user could possibly like the
candidate item. As a consequence, the proposed approach not only provides recommendations that are more useful to the users, of items that they will remarkably like, but also has the potential to let us better identify and serve specific niches of the market. In the following paragraphs we describe the proposed method in
greater detail.
In particular, such a high percentile p (e.g. the 70th to the 90th) of the conditional distribution of the user’s estimated rating, given all the information that we have available, characterizes how much the target user u could realistically like the candidate item i. Formally, the percentile, denoted by r̂^p_{u,i}, is defined such that the probability that user u would rate item i with a rating of r̂^p_{u,i} or less is p%. The information that we have available in a typical k-NN model in order to estimate the quantity r̂^p_{u,i} is the set of neighbors N_i(u) of user u, the similarity levels of these neighbors w_{N_i(u)} := (w_{u,v} : v ∈ N_i(u)), and the corresponding ratings r_{N_i(u),i} := (r_{v,i} : v ∈ N_i(u)). As an example, consider the neighborhood N(u) of size 4 with similarity weights w_{N(u)} = (0.2, 0.4, 0.3, 0.1) and items x and y with ratings r_{N(u),x} = (2, 3, 3, 4) and r_{N(u),y} = (2, 2, 4, 4), respectively. Using the standard combining function as in (1), item x would be recommended. However, using for instance the weighted 80th percentile of the variable r_{N(u),i}, item y would be recommended since the specific percentile for item y, denoted by r̂^{p=80}_{u,y}, corresponds to a higher rating than that of item x and, thus, there is high potential that user u could realistically like item y more than x; equivalently, the probability of user u assigning a rating greater than or equal to 4 is higher for item y than for x.
Algorithm 1 summarizes the user-based k-nearest neighbors (k-NN) collaborative filtering approach with a general combining function and Algorithm 2 shows a procedure to estimate a weighted percentile r̂^p_{u,i} (i.e. the proposed combining function), where the values r_{N_i(u),i} are the ratings given to candidate item i by the neighbors N_i(u), the k users most similar to target user u, and the weights w_{N_i(u)} are the corresponding similarity levels of the neighbors to user u.

ALGORITHM 2: Weighted Percentile Estimation
Input: Values v_1, ..., v_n, Weights w_1, ..., w_n, and percentile p to be estimated.
Output: p-th weighted percentile of the ordered values v_1, ..., v_n
Order values v_1, ..., v_n from least to greatest;
Rearrange weights w_1, ..., w_n based on the ordered values;
Calculate the percent rank for p based on weights w_1, ..., w_n;
Use linear interpolation between the two nearest ranks;
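A runnable sketch of Algorithm 2, applied to the worked example above; the midpoint-based percent-rank convention used here is one common choice, and the paper may interpolate slightly differently:

```python
def weighted_percentile(values, weights, p):
    """p-th weighted percentile (p in [0, 100]) of `values`, with linear
    interpolation between the two nearest percent ranks (Algorithm 2)."""
    # Order values from least to greatest; rearrange weights accordingly.
    pairs = sorted(zip(values, weights), key=lambda t: t[0])
    total = sum(w for _, w in pairs)
    # Percent rank of each ordered value: cumulative weight at its midpoint.
    ranks, cum = [], 0.0
    for _, w in pairs:
        ranks.append((cum + w / 2) / total)
        cum += w
    target = p / 100.0
    if target <= ranks[0]:
        return float(pairs[0][0])
    if target >= ranks[-1]:
        return float(pairs[-1][0])
    # Linear interpolation between the two nearest ranks.
    for i in range(1, len(pairs)):
        if target <= ranks[i]:
            lo, hi = pairs[i - 1][0], pairs[i][0]
            frac = (target - ranks[i - 1]) / (ranks[i] - ranks[i - 1])
            return lo + frac * (hi - lo)

weights = [0.2, 0.4, 0.3, 0.1]
x, y = [2, 3, 3, 4], [2, 2, 4, 4]

weighted_avg = lambda v: sum(r * w for r, w in zip(v, weights))
# The standard combining function favors item x (2.9 vs 2.8) ...
print(round(weighted_avg(x), 2), round(weighted_avg(y), 2))
# ... but the weighted 80th percentile favors item y.
print(round(weighted_percentile(x, weights, 80), 2))  # 3.25 under this convention
print(round(weighted_percentile(y, weights, 80), 2))  # 4.0
```

This reproduces the ranking reversal of the worked example: x wins under weighted averaging, y wins under the weighted 80th percentile.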
To support this claim, we conducted an empirical study and showed that
the proposed percentile method outperforms by a wide margin the standard
user-based collaborative filtering approach in terms of item prediction accuracy
measures, such as precision, recall, and the F-measure, across various experi-
mental settings. Finally, we demonstrated that this performance improvement
is not achieved at the expense of some other popular performance measures,
such as catalog coverage, aggregate diversity, and the Gini coefficient. This
illustrates that our proposed weighted percentile method for recommendation opportunities performs at least as well as, or even significantly better than, the classical user-based collaborative filtering method in terms of these important measures in most of the experiments.
Nevertheless, apart from the user-based and item-based k-NN collaborative filtering approaches, other popular RS methods can be easily extended, with the use of quantile regression [Koenker, 2005], to build models that predict high percentiles of estimated ratings and to evaluate them with regard to this goal; these include content-based methods and Matrix Factorization [Karatzoglou and Weimer, 2010].
As a part of the future work, we would like to conduct live experiments
with real users. Also, we will study the impact of the proposed method on
novelty, serendipity, and unexpectedness [Adamopoulos and Tuzhilin, 2011] of
recommendation lists.

3.7 Recommendation Sets
In the classical recommendation system paradigms, the generated recom-
mendation lists are based on the top-N items with the highest estimated rat-
ings regardless of the possible interaction effects among the candidate items.
One exception of this paradigm is the concept of diversity, according to which
the variety of items in a recommendation list is maximized. However, in many
recommendation settings there are important interactions among the candidate
items that should be explicitly considered. For instance, in a clothing on-line
retail setting the utility of recommending to a user a specific pair of pants de-
pends on whether a matching shirt is also included in the same recommendation
list or not. Similarly, in the case of a RS for a supermarket, the utility of recom-
mending olive oil to a user might depend on whether feta cheese and tomatoes
are recommended as well.
Moving beyond the classical recommendation lists of individual items with the highest estimated ratings, recommendation sets, rather than individual items, should take into account various interaction effects among the candidate items [Hansen and Golbeck, 2009], the potential prerequisites and constraints of the items [Parameswaran and Garcia-Molina, 2009], and the limited budget of the users [Xie et al., 2010].
However, future research should focus more on co-occurrence interaction ef-
fects, such as complementarity and substitution effects, aiming at developing
effective methods for both accurately estimating the various effects and effi-
ciently recommending such sets of items.

4 Business Value Aspects
The influence of a variety of information technologies on firm performance
outcomes such as sales, internal operations, procurement, market share, and
return on assets has been investigated by IS researchers (e.g., [Hitt and Bryn-
jolfsson, 1996], [Zhu and Kraemer, 2005], [Melville et al., 2004]) [Zhang et al.,
2011].
One of the widespread phenomena related to recommender systems and in-
formation technologies, in general, is the long tail [Brynjolfsson et al., 2011].
Two basic explanations have been offered for the Internet’s long tail phenomenon
[Brynjolfsson et al., 2006]. The first explanation focuses on the supply side.
The Internet channel can carry a much larger product selection than traditional
retail channels. Also, by increasing the supply of niche products that are un-
available through traditional channels, Internet commerce may boost the share
of sales generated from these niche products, leading to a long tail [Brynjolf-
sson et al., 2010]. The second explanation centers on the demand side. The
Internet channel’s ability to allow consumers to acquire product information
with greater convenience and at lower costs leads to increased demand for niche
products [Brynjolfsson et al., 2011].
[Fleder et al., 2010] and [Hosanagar et al., 2013], studying whether recommender systems have homogenizing effects or will create fragmentation among the consumers (users), find, in an empirical study of a music industry recommendation service, that recommendations are associated with an increase in commonality among consumers, as defined by similarity in the items consumed/purchased by them (both at the aggregate and at the disaggregate/cluster level). This increase in purchase similarity occurs for two reasons, which the authors term
volume and taste effects. The volume effect is that consumers simply purchase
more after recommendations, increasing the chance of having purchases in com-
mon with others. The taste effect is that consumers buy a more similar mix of
products after recommendations, conditional on volume. Thus, recommender
systems can drive a significant increase in purchase volume and may further
alter the mix of products users buy. The authors also discuss both policy and
business implications. Similarly, [Fleder and Hosanagar, 2009], studying the impact of recommender systems on sales diversity, find that, when recommenders have both awareness and salience effects, sales diversity generally decreases. On
the other hand, [Brynjolfsson et al., 2011] find that the Internet channel exhibits a significantly less concentrated sales distribution when compared with traditional channels, even when the Internet and traditional channels share exactly the same product availability and prices, and is associated with an increase in the share of niche products. Similarly, [Oestreicher-Singer and Sundararajan, 2012a] test the
conjecture that peer-based recommendations lead to a redistribution of demand
from popular products to niche products using the revenue distributions of books
in over 200 distinct categories on Amazon.com and detailed daily snapshots of
co-purchase recommendation networks in which the products of these categories
are situated. They find that categories whose products are influenced more
by the recommendation network have significantly flatter demand and revenue
distributions, even after controlling for variation in average category demand,
category size, and price differentials. The authors, analyzing and quantifying the incremental amplification in individual demand that is attributable to the visibility of product networks, also find that newer and more popular products
“use” the attention they garner from their network position more efficiently and
that diversity in the sources of spillover further amplifies the demand effects of
the recommendation network [Oestreicher-Singer and Sundararajan, 2012b].
Furthermore, [Pathak et al., 2010] provide support that recommender sys-
tems help in reinforcing the long-tail phenomenon of electronic commerce. They
also discuss simultaneity among demand, price, and strength of recommenda-
tions and show that providing recommendations allows retailers to charge higher
prices, while at the same time increasing demand by providing more information
regarding the quality and match of products. Besides, they show that obscure
recommended items positively affect retailers’ cross-selling efforts.
Moreover, [Horton, 2012] conducts an experiment in an online labor market and studies whether recommending matches facilitates employee search and
screening. The author shows that recommendations improved fill rates by nearly
17% among technical (e.g., computer programming) vacancies but had no ef-
fect on non-technical vacancies. This heterogeneity was likely caused by higher
screening costs and tighter markets for technical vacancies. The results imply
that, despite their smaller size, search costs do impede matching in computer-
mediated markets, but they can be reduced through informational interventions.
The author also maintains that, despite explicit promotions of certain workers
over others, in some cases recommendations can improve marketplace efficiency
without making anyone worse off.
Furthermore, [Ghose et al., 2012] propose to generate a ranking system that
recommends products that provide, on average, the best value for the consumer’s
money. The key idea is that products that provide a higher surplus should be

ranked higher on the screen in response to consumer queries. The authors propose a random coefficient hybrid structural model, taking into consideration the two sources of consumer heterogeneity that different travel occasions and different hotel characteristics introduce. Then, [Ghose et al., 2013] study the effects
of three different kinds of search engine rankings on consumer behavior and
search engine revenues: direct ranking effect, interaction effect between ranking
and product ratings, and personalized ranking effect. The authors show that
a consumer utility-based ranking mechanism can lead to a significant increase
in overall search engine revenue and that significant interplay occurs between
search engine ranking and product ratings. Also, they find that an “active”
(wherein users can interact with and customize the ranking algorithm) person-
alized ranking system can lead to higher clicks but lower purchase propensities
and lower search engine revenue compared to a “passive” (wherein users cannot
interact with the ranking algorithm) personalized ranking system.
In another stream of research, [Wang and Benbasat, 2005] and [Komiak and Benbasat, 2006] theoretically articulate and empirically examine the effects of perceived personalization and familiarity on cognitive trust and emotional trust in a recommender system, and the impact of cognitive trust and emotional trust on the intention to adopt the recommender system either as a decision aid or as a delegated agent. The results from a laboratory experiment show
that consumers treat online recommendation agents as “social actors” and per-
ceive human characteristics (e.g., benevolence and integrity) in computerized
agents. Furthermore, the results confirm the validity of the Trust-Technology Adoption Model (TAM) to explain online recommendation acceptance and reveal the
relative importance of consumers’ initial trust vis-à-vis other antecedents ad-
dressed by TAM (i.e. perceived usefulness and perceived ease of use). Both the
usefulness of the agents as “tools” and consumers’ trust in the agents as “virtual
assistants” are important in consumers’ intentions to adopt online recommenda-
tion agents [Wang and Benbasat, 2005]. Perceived personalization significantly
increases customers’ intention to adopt by increasing cognitive trust and emo-
tional trust. Emotional trust fully mediates the impact of cognitive trust on
the intention to adopt the recommender system as a delegated agent, while it
only partially mediates the impact of cognitive trust on the intention to adopt
the recommender system as a decision aid. Finally, familiarity increases the
intention to adopt through cognitive trust and emotional trust [Komiak and
Benbasat, 2006]. [Xiao and Benbasat, 2007] go beyond generalized models, such
as TAM, and identify the RS-specific features, such as recommender system

input, process, and output design characteristics, that affect users’ evaluations,
including their assessments of the usefulness and ease-of-use of RS applications.
Based on a review of existing literature on e-commerce RSs, this paper develops
a conceptual model with 28 propositions derived from five theoretical perspec-
tives. The propositions help answer the two research questions: (1) How do
recommender system use, recommender system characteristics, and other fac-
tors influence consumer decision making processes and outcomes? (2) How do
recommender system use, recommender system characteristics, and other factors
influence users’ evaluations of recommender systems? This paper also provides
advice to information systems practitioners concerning the effective design and
development of recommender systems.
Using a different approach, [Gorgoglione et al., 2011] study in real-life set-
tings how contextual recommendations affect the purchasing behavior of cus-
tomers and their trust in the provided recommendations. Conducting live con-
trolled experiments with real customers of a major commercial Italian retailer
they compared the customers’ purchasing behavior across contextual, content-
based and random recommendations and investigated the role of accuracy and
diversity of recommendations.
Similarly, [Zhang et al., 2011] study the impact of recommender systems on
customer store loyalty and find that higher quality recommendations amplify
consumers’ repurchase intention by reducing consumer product screening cost
and improving consumer decision-making quality. Although consumer product
evaluation cost will go up with higher quality recommendations, it does not
affect consumers’ repurchase intention because consumers have complete control
over how many items they evaluate. If they feel that there are too many items
recommended by the online store, they can stop at any time.
Nevertheless, the causal effect [Pearl, 2009], [Stitelman et al., 2011] of popu-
lar systems for personalized recommendations on online purchases and customer
satisfaction should be estimated.
In the following sections, we discuss in detail some of the aforementioned
characteristic approaches.

4.1 The Impact of Recommender Systems on Sales Diversity
As [Fleder and Hosanagar, 2009] discuss in their paper, anecdotes from users
and researchers suggest that recommenders help consumers discover new prod-

ucts and, thus, increase diversity, while others believe several recommender de-
signs might reinforce the position of already-popular products and thus reduce
diversity.
The authors focus on collaborative filtering recommender systems because
these systems use historical sales data to generate recommendations and, as a
consequence, positive feedback cycles could emerge and lower diversity. Besides,
they use the Gini coefficient [Gini, 1909] and they define recommender bias
as a concentration bias, diversity bias, or no bias depending on the following
conditions: 


Concentration bias if Gi > G0 ,
Diversity bias if Gi < G0 ,
No bias if Gi = G0 ,

where G0 is the Gini coefficient of the firm’s sales during a fixed time period in
which a recommender system was not used and Gi is the Gini coefficient of the
firm’s sales in the period in which recommender system ri was employed, with
all else equal.
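As a concrete illustration of this bias check, the Gini coefficient of a sales vector and the resulting classification can be sketched as follows (a minimal sketch; the function and the toy sales figures are our own, not taken from the paper):

```python
import numpy as np

def gini(sales):
    """Gini coefficient of a non-negative sales vector (0 = perfectly equal)."""
    x = np.sort(np.asarray(sales, dtype=float))
    n = x.size
    total = x.sum()
    if total == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    # Standard rank-based formula for the Gini coefficient.
    return 2.0 * np.dot(ranks, x) / (n * total) - (n + 1.0) / n

# Bias classification in the spirit of Fleder and Hosanagar:
g0 = gini([10, 10, 10, 10])  # hypothetical sales without a recommender
gi = gini([37, 1, 1, 1])     # hypothetical sales under recommender r_i
bias = ("concentration" if gi > g0 else
        "diversity" if gi < g0 else "none")
```

Here gi > g0, so this toy recommender would be classified as exhibiting a concentration bias.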
After presenting a theoretical model, under the assumption that each con-
sumer buys one and only one of the two available products, the authors turn to
a simulation that combines a choice model with actual recommender systems
where repeat purchases are permitted.
An overview of the process provided by the authors is as follows. There are I
consumers and J products positioned in an attribute space. Consumers are not
aware of all products. Each consumer knows most of the center products and a
small number of products in his own neighborhood. Every period, a consumer
either purchases one of the products or makes no purchase at all. To model this
choice, a multinomial logit is used for J products plus an outside good. Just
before choosing a product, a recommendation is generated. The recommender
has two effects. First, the consumer becomes aware of the recommended product
if he was not already. This increase in awareness is permanent. Second, the
salience of the recommended product is increased temporarily, raising the chance
that the recommended product is purchased in that purchase instance. The next
consumer makes a purchase in a similar manner, and the process repeats after
all consumers have purchased. After a predetermined number of iterations, the
Gini is computed. The Gini is then compared to a benchmark G0 , the Gini
from an equivalent period in which recommendations were not offered.
In detail, they use 50 consumers and 50 products that are points in a two-
dimensional space (standard bivariate normal distribution, N2 (0, I)). The em-

ployed recommender systems use implicit data (amount of sales) and are based
on the classical user-based collaborative filtering approach. Recommender r1
is the most basic collaborative filter: For a given user, it first finds the set
N ∗ of the n most similar customers by using cosine similarity to compare vec-
tors of purchase counts and, then, recommends the most popular item among
this group (no weighting or other correction was used). Recommender r2 has
one difference. When selecting the most popular product among similar users,
candidate items are first discounted by their overall popularity in the entire
population.
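A minimal sketch of recommenders r1 and r2, assuming purchases are stored in a user-by-item count matrix (all names and the toy data are our own):

```python
import numpy as np

def recommend(purchases, user, n_neighbors=2, discount_popularity=False):
    """r1: recommend the most popular item among the n most similar users
    (cosine similarity over purchase-count vectors, no weighting).
    r2 (discount_popularity=True): first divide the candidate counts by each
    item's overall popularity in the entire population."""
    norms = np.linalg.norm(purchases, axis=1)
    norms[norms == 0] = 1.0
    unit = purchases / norms[:, None]
    sims = unit @ unit[user]
    sims[user] = -np.inf                       # exclude the target user
    neighbors = np.argsort(sims)[-n_neighbors:]
    counts = purchases[neighbors].sum(axis=0).astype(float)
    if discount_popularity:
        popularity = purchases.sum(axis=0).astype(float)
        counts /= np.maximum(popularity, 1.0)
    return int(np.argmax(counts))

# Toy purchase-count matrix: 3 users x 3 items.
purchases = np.array([[3, 0, 1],
                      [3, 0, 0],
                      [0, 2, 0]])
```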
Besides, the authors assume that each consumer is aware of a subset of the
J products, and only items in this awareness set can be purchased. Once an
item is recommended to a consumer, he is always aware of it in future periods.
At the start, consumers are aware of many of the central products on the map
plus a few items in their own neighborhood. These initial awareness states for
each consumer-product pair are sampled according to
P (ci aware of pj ) = λ exp(−distance0j^2 / θ) + (1 − λ) exp(−distanceij^2 / (κθ)),

where distance0j and distanceij are, respectively, the Euclidean distances from
the origin to product pj and from consumer ci to product pj . The higher is λ,
the more users are aware of central, mainstream products (left term), and the
higher is 1 − λ, the more users are aware of products in their neighborhood. θ
and κθ determine how fast awareness decays with distance.
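The initial awareness sampling can be sketched as follows (the parameter values here are illustrative choices, not necessarily those used in the original simulations, and all names are our own):

```python
import numpy as np

rng = np.random.default_rng(0)

def initial_awareness(consumers, products, lam=0.75, theta=0.35, kappa=0.5):
    """Sample the consumer-product awareness matrix from the two-term mixture:
    the first term covers central/mainstream products, the second covers
    products near the consumer in the attribute space."""
    d0 = np.linalg.norm(products, axis=1)            # origin -> product j
    dij = np.linalg.norm(consumers[:, None, :] - products[None, :, :], axis=2)
    prob = (lam * np.exp(-d0 ** 2 / theta)[None, :]
            + (1 - lam) * np.exp(-dij ** 2 / (kappa * theta)))
    return rng.random(prob.shape) < prob

# 50 consumers and 50 products drawn from N2(0, I), as in the paper.
consumers = rng.standard_normal((50, 2))
products = rng.standard_normal((50, 2))
aware = initial_awareness(consumers, products)
```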
Furthermore, the authors model purchases using the multinomial logit. Con-
sumer ci ’s utility for product pj at time t is defined as uijt := vijt + εijt , where
vijt is a deterministic component and εijt is an independent and identically
distributed random variable with extreme value distribution. Under these as-
sumptions,
P (ci buys pj at t | ci aware of pj at t) = exp(vijt) / Σk=1..J exp(vikt) .

The unconditional probability is defined as

P (ci buys pj at t) = P (ci buys pj at t|ci aware of pj at t)P (ci aware of pj at t).

If a consumer is unaware of a product, the rightmost term is zero, and he cannot


buy it. The deterministic component is then defined as

vijt := similarityij = −k log distanceij ,

where the parameter k determines the consumer’s sensitivity to distance on the
map. The higher the k is, the more the consumer prefers the closest products.
Here, k = 10 and, thus, the Gini coefficient equals 0.72. Finally, for each person,
the outside good is closer than roughly 90% of the other goods (a distance of
0.75).
Last but not least, the term δ is the amount by which a recommended prod-
uct’s salience is temporarily increased in the consumer’s choice set. The impact
of the salience boost is that the purchase probability for the recommended item
j is the same as that for an item j′ with vij′ = vij + δ.
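The resulting choice step for one consumer can be sketched as follows; for simplicity this sketch renormalizes the logit over the consumer's current awareness set (the outside good can be modeled as an always-aware item), and all names are our own:

```python
import numpy as np

def choice_probabilities(v, aware, recommended=None, delta=0.0):
    """Multinomial-logit choice probabilities for one consumer.
    v: deterministic utilities v_ij (last entry may be the outside good);
    aware: boolean awareness mask (unaware items cannot be chosen);
    recommended: item index whose salience is temporarily boosted by delta."""
    v = np.asarray(v, dtype=float).copy()
    aware = np.asarray(aware, dtype=bool).copy()
    if recommended is not None:
        v[recommended] += delta      # temporary salience boost
        aware[recommended] = True    # recommendation creates awareness
    # Numerically stable softmax restricted to the aware items.
    expv = np.where(aware, np.exp(v - v[aware].max()), 0.0)
    return expv / expv.sum()
```

Boosting delta raises the recommended item's purchase probability relative to the plain logit, mirroring the salience effect described above.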
For r1 , in which popularity determines what product is recommended, a self-
reinforcing cycle is created: popular items are recommended more, items rec-
ommended more are purchased more, purchased items are recommended more,
and so on. Although r2 dampens the popularity bias, the result also originates
from using only sales data to make recommendations. Products with limited
historical sales have little or no chance of being recommended even if they would
be favorably received by the consumer. The authors also show that based on
the simulations, when recommenders have both awareness and salience effects,
diversity generally decreases. When recommenders affect only awareness, diver-
sity decreases slightly for r1 and increases slightly for r2 .
The results also show that the average number of unique items each consumer
is aware of increases; both recommender systems inform consumers of new prod-
ucts. They also verify that individual diversity can increase, whereas aggregate
diversity decreases. Consumers are discovering new products, but they are dis-
covering the same products others have bought. Thus, the consumers become
more similar since their purchases come from a smaller, more popular set.
Then the authors approach sensitivity analysis in four parts: additional rec-
ommenders; best-seller lists in the base case; variety seeking in the utility spec-
ification; and alternate parameter values. The additional recommender systems
are as follows: r3 is another popularity-discounting variation according to which
discounting takes place in the user similarity calculation, but not the product
selection calculation; System r4 is a combination of r2 and r3 : discounting is
performed in both the user similarity calculation and arg max; System r5 rec-
ommends the lowest-sales product; System r6 recommends the median-selling
product; System r7 recommends the best-selling product; r8 is a best-seller list,
which recommends the top-five selling items.
It is worth mentioning here that without recommenders, consumers might
obtain product suggestions from best-seller lists. In such a setting, the results

discussed so far might not hold. In particular, relative to an “older” world of
best-seller lists, recommenders may reduce concentration, by virtue of cutting
out the even more popularity-biased tool. However, relative to a world without
such lists, recommenders may increase concentration.
In addition, the authors incorporate variety seeking in their model; the ex-
tent to which prior purchases of a product affect its future purchase propensity.
Positive dependence is termed inertia, whereas negative dependence is termed
variety seeking. To incorporate variety and inertia in the specification, they
define
vijt := −k log distanceij + βXijt

Xijt := αXijt−1 + (1 − α)I(ci bought pj on t − 1),

where Xijt is an exponential smooth of purchase indicators I(), and thus it


summarizes how often and recently ci has bought pj . The parameter α ∈ (0, 1)
determines how much weight is placed on recent versus distant purchase occa-
sions. β determines the effect strength, with β < 0 for variety seeking, and β > 0
for inertia. The variety-seeking results have an interesting interpretation. If con-
sumers turn to recommendations only in their most variety-seeking moments,
diversity increases under r2 . However, as recommenders become ubiquitous,
consumers are affected by them all the time (i.e., not only when β << 0), e.g.,
as with sites that users visit regularly, such as personalized news, personalized
radio, and personalized retail. In these instances, diversity decreases even under
r2 .
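A sketch of this variety-seeking specification (the parameter values are illustrative and the names are our own):

```python
import math

def update_memory(x_prev, bought, alpha=0.7):
    """Exponential smooth X_ijt = alpha * X_ij,t-1 + (1 - alpha) * I(bought at t-1).
    Higher alpha puts more weight on distant purchase occasions."""
    return alpha * x_prev + (1.0 - alpha) * (1.0 if bought else 0.0)

def utility(distance_ij, x_ijt, k=10.0, beta=-1.0):
    """v_ijt = -k * log(distance_ij) + beta * X_ijt;
    beta < 0 models variety seeking, beta > 0 models inertia."""
    return -k * math.log(distance_ij) + beta * x_ijt

# A consumer who just bought product j finds it less attractive next period
# under variety seeking (beta < 0):
x = update_memory(0.0, bought=True)   # purchase memory rises after buying
v_after = utility(1.0, x)             # lower than utility(1.0, 0.0)
```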
This paper, apart from demonstrating that some designs may be associ-
ated with greater concentration bias than others, finds that recommenders can
increase sales, and recommenders that discount popularity appropriately may
increase sales more. For consumers, they showed that the awareness effects
of recommenders can inform consumers of better (closer) products. However,
if recommendations are highly salient, popularity-influenced recommendations
may displace what would otherwise be better product matches. Future empirical
work would be valuable for determining the relative strength of the awareness
versus salience effects.

4.2 The Impact of Recommender Systems on Customer
Store Loyalty
[Zhang et al., 2011] draw upon the household production function model in
the consumer economics literature to develop a theoretical framework that ex-
plains the mechanisms through which recommender systems influence customer
store loyalty in electronic markets in order to answer the broader question of
whether and how recommender systems (personalized product recommenda-
tions) generate value for online retailers. To theorize about the effects of online
retailer learning, manifest in the form of higher quality recommendations, on
consumer store loyalty, the authors also test their predictions using an experi-
mental design where they manipulate the quality of a retailer’s learning, thereby
inducing variation in the quality of recommendations offered.
The motivation for the research is that in the early days of e-commerce,
it was believed that building customer store loyalty was more difficult online
than offline. Even though consumer information search and switching costs are
relatively small online compared to offline [Bakos, 1997], early evidence indicated
that, across many product categories, a few online firms dominate the market
and command price premiums [Brynjolfsson and Smith, 2000]. Hence, these
findings raise the fundamental question of why online retailers are able to retain
their customers.

Figure 1: Conceptual Model

Figure 1 represents the conceptualization of the mechanism through which


personalized services affect consumer store loyalty. According to this model,
personalization is a reflection of retailer learning that, in conjunction with con-
sumer learning, enhances the efficiency of the shopping activity.

Figure 2: Research Model

Following from
the consumer economics literature, efficiency maximizing consumers will display
a propensity to engage in activities that are more efficient (i.e., they will exhibit
store loyalty). To examine in depth the role of recommender systems, the au-
thors propose the research model in Figure 2. The mechanism linking customer
and retailer learning to online store loyalty is online product brokering effi-
ciency, reflecting the efficiencies created for the consumer. The model predicts
that higher quality recommender systems, an instantiation of retailer learning,
in conjunction with consumer learning, increase consumer online product bro-
kering efficiency, which in turn improves customer store loyalty, operationalized
as repurchase intention. Past research has conceptualized the consumer’s shop-
ping process as comprising six stages: (1) need identification; (2) product
brokering; (3) merchant brokering; (4) negotiation; (5) purchase and delivery;
and (6) post-sales service. The authors focus on consumer product brokering
efficiency because product brokering is the stage in which consumers engage in
information search and processing to decide which product to purchase to meet
their specific needs, and is therefore the shopping stage where recommendations
are likely to have the greatest impact. In addition, they introduce a variety of
controls in the model to account for factors that may influence the mediating
and dependent variables (but are not of theoretical interest in this study).
In particular, the central construct in the research model, consumer online
product brokering efficiency, is based on the household production model and
defined as the costs and the value of the online shopping activity for the con-
sumer. The authors assess consumer product brokering cost using two compo-
nents: product screening cost and product evaluation cost. Besides, they define
the value obtained by consumers during the product brokering process as the
quality of the purchase decision (i.e., the extent to which the items consumers
have decided to purchase fit their needs or taste). Also, they theorize that both
retailer learning and consumer learning affect components of consumers’ online
product brokering efficiency.
To experimentally test the hypotheses depicted in Figure 2, the authors use
two groups of subjects that completed an online shopping task on Amazon.com.
The experimenter provided different levels of information to Amazon’s recom-
mender system for each group, thereby assuring that each group received dif-
ferent quality recommendations. In detail, they first collected subjects’ product
ratings; next, they created a fictitious account for each subject at Amazon.com
and entered different numbers of product ratings to each account: 5 product

ratings (low input) or 15 product ratings (high input). During the online shop-
ping task, users’ interaction with the website (clickstream) was automatically
captured. After task completion, the authors measured the perceived quality of
recommendations and various other research constructs through a survey.
The experimental findings provide support for the proposed model and ex-
plain significant variance in the dependent and mediating variables. By search-
ing the whole database and screening all of the products on behalf of consumers,
higher quality recommendations are associated with lower consumers’ product
screening cost and higher consumer decision-making quality, which in turn, is
positively associated with consumer repurchase intention. With the extra re-
sources saved from product screening, consumers are able and willing to form a
larger consideration set and give in-depth evaluation of more items and, thus,
higher quality recommendations result in higher consumer product evaluation
cost. However, contrary to the predictions of the proposed research model, the
experimental findings show that higher consumer product evaluation cost does
not significantly affect consumer repurchase intention. There are two plausible
explanations for this finding. First, the reason that higher quality recommenda-
tions increase consumer product evaluation cost is because consumers receiving
higher quality recommendations are willing to inspect more items and form a
larger consideration set. Moreover, at the price of higher product evaluation
cost, consumers are able to reach a higher quality purchase decision or obtain
more value from the product brokering process. It seems that consumers are
able to distinguish the two types of cost incurred during online shopping, cost
under the control of the online store (i.e., product screening cost) and the cost
under their own control (i.e., product evaluation cost). The latter type of cost
does not affect consumers’ attitudes toward the online store and repurchase
intentions.
In general, the authors maintain that higher quality recommendations am-
plify consumers’ repurchase intention by reducing consumer product screen-
ing cost and improving consumer decision-making quality. Although consumer
product evaluation cost will go up with higher quality recommendations, it
does not affect consumers’ repurchase intention because consumers have com-
plete control over how many items they evaluate. If they feel that there are too
many items recommended by the online store, they can stop at any time.

4.3 The Impact of Recommendation Networks on Product
Demand
[Oestreicher-Singer and Sundararajan, 2012b] use data about the co-purchase
networks and demand levels associated with more than 250, 000 interconnected
books offered on Amazon.com over the period of one year and study the con-
jecture that the explicit visibility of such “product networks” can alter demand
spillovers across their constituent items. They present new evidence quantify-
ing the role of network position in electronic markets and highlight the power
of basing (virtual) shelf position on consumer preferences that are explicitly
revealed through shared purchasing patterns.
In particular, every product on an electronic commerce site has a network
position, one that is determined by the products and other pages it links to
and by those that link to it. However, the ensuing direction and extent of the
influence of a co-purchase network is not immediately clear. For example, the
level of attention paid to popular products may increase because such products
are bought more frequently and thus are more likely to show up in a
co-purchase network. In contrast, such networks might redirect demand toward
niche products by making consumers aware of items that were previously not
so frequently visible to them. Network visibility might influence demand more
intensively for newer products that consumers are less likely to have seen in
the past. Alternatively, it might have a greater impact on familiar products,
ones that a consumer is more comfortable purchasing if offered unexpectedly.
Less expensive products might be influenced more, especially if the influence
originates at a more expensive product. This influence might diminish or grow
over time [Oestreicher-Singer and Sundararajan, 2012b].
Conceptually, the co-purchase network is a directed graph in which nodes
correspond to products and edges correspond to directed co-purchase links. In
order to alleviate demand endogeneity issues, the authors also create three dif-
ferent “complementary sets” for each product, which they use as control groups
when estimating the influence of the visible set of network neighbors. First, the
authors construct a complementary set based on observed future hyperlinks on
Amazon.com. That is, for each product, they construct a complementary set
based on “links from the future”: co-purchase hyperlinks that are not necessar-
ily visible today but that will be visible in the near future. Such products that
are not linked today but that will be linked in the days to follow are assumed
to be “as complementary” to the focal product as the items currently present

on its web page (as evidenced by the link that is eventually formed). A second
complementary set is constructed using data about the product networks on
the Barnes & Noble (B&N) website. The B&N website features a co-purchase
network similar to the one presented on Amazon. However, products linked
on B&N do not necessarily appear on Amazon.com and hence are invisible to
Amazon.com consumers. Finally, the authors construct a third complementary
product set based on a weighted sum of the demand levels of all products in the
data set, calculated as follows: For each pair of products in the sample, they
estimate the probability of a link between the two products. Then, they weight
each product’s demand according to this propensity of being linked to the focal
product and sum those weights to be the complementary “set” of that focal
product.
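This third complementary "set" can be sketched as a simple propensity-weighted sum (the function name and the toy link propensities and demand levels below are our own):

```python
def propensity_weighted_demand(link_probs, demands):
    """Demand of the complementary 'set' of a focal product: every other
    product's demand, weighted by its estimated probability of being
    linked to the focal product."""
    return sum(p * d for p, d in zip(link_probs, demands))

# Hypothetical link propensities and demand levels for three candidate products:
comp_demand = propensity_weighted_demand([0.5, 0.0, 1.0], [10.0, 99.0, 4.0])
# 0.5*10 + 0*99 + 1*4 = 9.0
```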
The identification strategy is based on the idea that the set of visible links
is a subset of each of the complementary sets. This enables the identification
of the influence of a visible hyperlink on demand correlation, after accounting
for unobserved complementarity. In particular, the authors measure the visible
hyperlink effects, the complementary product effects, the effects of character-
istics of the network neighbors and of the complementarity products set, and
the effects of the book’s own characteristics on its demand. The authors also
control for correlated effects by using fixed effects.
The presented empirical findings suggest that the visibility of the product
network can result in up to a threefold average increase in the influence that
complementary products have on one another’s demand. The authors also an-
alyze how the magnitude of influence varies across products of different vintage,
products of different categories, and products of varying popularity. A number
of interesting implications emerge from these sensitivity analyses, including,
among other things, that newer and more popular products “use” the attention
they garner from their network position more efficiently and that links from
more diverse sets of sources increase the effect of the network on sales.

4.4 The Impact of Ranking on Consumer Behavior and Search Engine Revenue
[Ghose et al., 2012] propose to generate a ranking system that recommends
products that provide, on average, the best value for the consumer’s money
(higher surplus) while [Ghose et al., 2013] study the effects of three different
kinds of search engine rankings on consumer behavior and search engine rev-

enues: direct ranking effect, interaction effect between ranking and product
ratings, and personalized ranking effect.
As the authors discuss, in product search engines, the ranking of the dis-
played products is often based on criteria such as price, product rating, etc. In
such a setting, consumers have to observe multiple, competing ranking signals
and come up with their own ranking in their minds. Also, the demand can be
influenced by the joint variation in product ratings (either professional rating
or user rating) and online screen position.
In order to study the position effect in product search engines, conditional
on its interaction with product ratings, the authors examine the variation in the
ratings of different hotels (both hotel “class” rating and customer rating) at the
same rank on the travel search engine over time. Controlling for room prices,
such variation allows to model the interaction effect of hotel class and customer
ratings with rank, and to measure its effect on demand. Then, so as to study
the effect of different ranking mechanisms on product search engine revenue,
the authors examine how different ranking mechanisms affect the search engine
revenue by conducting a set of policy experiments. They consider six different
ranking designs: utility-based, conversion rate (CR)-based, click-through rate
(CTR)-based, price-based, customer rating-based and Travelocity default algo-
rithms. Besides, in order to examine whether allowing users to interact with
the ranking algorithm to pro-actively personalize their search results leads to
more or fewer purchases, they conduct a set of experiments on Amazon Mechan-
ical Turk and compare the two settings (i.e., interaction and no interaction).
For the empirical model estimation, the authors propose to build a simul-
taneous equations model of click-through, conversion, and rank. In particular,
they model the click-through and conversion behavior as a function of hotel
brand, price, rank, page, sorting criteria, and hotel characteristics, while the
rank of a hotel is modeled as a function of hotel brand, price, sorting criteria,
hotel characteristics, and performance metrics such as previous conversion rate.
To estimate the model, they apply MCMC methods using a Metropolis-
Hastings algorithm with a random-walk chain [Chib and Greenberg, 1995].
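The random-walk Metropolis-Hastings step referred to here can be sketched generically as follows (a textbook version applied to a toy target, not the authors' actual estimation code):

```python
import math
import random

def metropolis_hastings(log_post, theta0, n_iter=5000, step=0.5, seed=7):
    """Random-walk MH: propose theta' = theta + Normal(0, step) and accept
    with probability min(1, exp(log_post(theta') - log_post(theta)))."""
    random.seed(seed)
    theta, lp = theta0, log_post(theta0)
    draws = []
    for _ in range(n_iter):
        proposal = theta + random.gauss(0.0, step)
        lp_prop = log_post(proposal)
        if random.random() < math.exp(min(0.0, lp_prop - lp)):
            theta, lp = proposal, lp_prop  # accept the proposal
        draws.append(theta)
    return draws

# Toy target: a standard normal log-density (up to an additive constant).
draws = metropolis_hastings(lambda t: -0.5 * t * t, theta0=0.0)
```

The draws then approximate samples from the posterior, from which means and credible intervals of the model parameters can be computed.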
The presented experimental results on ranking are consistent with those from
the Bayesian model-based archival data analysis, suggesting a significant and
causal effect of search engine ranking on consumer click and purchase behavior.
In addition to a significant surplus gain found by a previous study [Ghose et al.,
2012], a consumer-utility-based ranking mechanism yields the highest purchase
propensity and the highest search engine overall revenue compared to existing

benchmark systems, such as ranking based on price or star ratings. Moreover,
an inferior screen position tends to more adversely affect luxury hotels and more
expensive hotels. Hotels with lower reputations are benefiting more from being
placed at the top of the search results.
Moreover, the presented experimental results on personalized ranking show
that the availability of excess personalization capabilities during the decision-making
process may discourage consumers from searching, evaluating, and making final
choices. In particular, the authors find that although active personalized rank-
ing, compared to passive personalized ranking, can attract more online attention
from consumers, it leads to a lower purchase propensity and lower search en-
gine revenue. This finding suggests personalized ranking should not be adopted
blindly and the level of personalization should be carefully designed based on the
search context. Nevertheless, a good ranking mechanism can reduce consumers’
search costs, improve click-through rates and conversion rates of products, and
improve revenue for search engines.

5 Conclusions
This paper presents an overview of the field of recommender systems. In
particular, it discusses the current generation of recommendation methods fo-
cusing on collaborative filtering algorithms. Then, we move beyond the classical
perspective of rating prediction accuracy in recommender systems and present a
survey of approaches that enhance unexpectedness and the related but different
concepts of novelty, serendipity, and diversity. Besides, we provide interesting
directions for future research. This paper also discusses recent business value
perspectives on recommender systems focusing on the phenomenon of the long
tail, the impact of recommendations on sales, diversity, customer retention, and
generated revenue.
In detail, in Section 2, we first discuss the main paradigm in recommender
systems that, based on the retrieval and the rating prediction perspectives, tries
to reduce search costs by accurately predicting how much a user would like an
item and providing “correct” recommendation proposals by recommending the
items with the highest predicted ratings. Then, we focus on the most popular
approaches in recommender systems, collaborative filtering, and present in Sec-
tion 2.1 a brief survey of the state-of-the-art algorithms. This popularity of the
collaborative filtering subfield of recommender systems has different reasons,
most importantly the fact that real-world benchmark problems are available
and that the data to be analyzed for generating recommendations have a very
simple structure: a matrix of item ratings. Besides, we thoroughly discuss in
Section 2.2 the most characteristic approaches found in the literature.
Moreover, despite all of the advancements, the current generation of recom-
mender systems still requires further improvements to make recommendation
methods more effective in a broader range of applications [Adomavicius and
Tuzhilin, 2005]. Even though the rating prediction perspective is the prevail-
ing paradigm in recommender systems, there are other perspectives that have
been gaining significant attention in this field [Jannach et al., 2010] and try
to alleviate the problems pertaining to the narrow rating prediction accuracy-
based focus. Some of the most recent recommender system perspectives main-
tain that recommender systems should provide personalized recommendations
from a wide range of items and they should also enable the users to find rele-
vant items that might be hard to discover. In addition, recommender systems
should increase user satisfaction and engagement and offer a superior user expe-
rience while they reduce search costs and improve the quality of decisions that

consumers make. Besides, from a business perspective, recommender systems
should increase the number of sales and conversion rates as well as promote
items from the long tail that usually exhibit significantly lower marginal cost
and, at the same time, higher marginal profit while they should also make the
users familiar with the various product categories and the whole product cat-
alog. Thus, working toward this direction and moving beyond the classical
perspective of the rating prediction accuracy, in Section 3, we present a survey
of approaches that enhance unexpectedness (Section 3.5) and the related but
different concepts of novelty (Section 3.1), serendipity (Section 3.2), and diver-
sity (Section 3.3). Besides, in Sections 3.5.2, 3.6, 3.7, we provide interesting
directions for future research.
Furthermore, since the generated recommendations should be useful both to
end-users and to businesses, in Section 4 we discuss various business value
perspectives on recommender systems and similar Information Systems
technologies. In particular, after presenting a brief survey of the related
literature, we focus on specific aspects of the business value perspective and
discuss in detail sales diversity in Section 4.1, the phenomenon of the long
tail and the impact of recommendations on sales in Section 4.3, and customer
store retention, loyalty, and generated revenue in Section 4.4.
As part of future work, we would like to design an empirical study to examine
the economic impact of the recommender system approaches and perspectives
presented in Section 3 across various domains and recommendation settings,
including a traditional on-line retail setting and a platform for massive open
on-line courses [Adamopoulos, 2013b]. In particular, we would like to study and
estimate both the direct effect of offering unexpected recommendations from a
wider range of items that are harder to discover and the indirect effect of
recommending items from the long tail rather than focusing mostly on
bestsellers, which usually exhibit higher marginal cost. Such a study could
shed more light on the usefulness of recommender systems for businesses and
further promote the use of non-classical perspectives and approaches that go
beyond the traditional paradigm of rating prediction accuracy. Moreover, the
causal effect [Pearl, 2009], [Stitelman et al., 2011] of recommender systems
on online purchases and customer satisfaction should also be estimated.
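As an illustration of the kind of causal estimate discussed above, the following self-contained Python sketch simulates exposure to recommendations that is confounded by a user trait, and contrasts a naive purchase-rate comparison with an inverse-propensity-weighted estimate of the effect on purchases. All numbers, variable names, and the assumed +0.10 "true" effect are hypothetical and chosen only for the example; this is a minimal sketch, not an implementation of any study design from the text.

```python
import random

random.seed(7)

# Synthetic population: a "heavy buyer" trait affects both exposure to
# recommendations and purchases, so the naive exposed-vs-unexposed
# comparison is confounded.
users = []
for _ in range(20000):
    heavy = random.random() < 0.3              # unobserved-to-naive confounder
    p_exposed = 0.7 if heavy else 0.3          # exposure depends on the trait
    exposed = random.random() < p_exposed
    # assumed true causal effect of exposure on purchase probability: +0.10
    p_buy = 0.4 * heavy + 0.10 * exposed + 0.2
    bought = random.random() < p_buy
    users.append((heavy, exposed, bought))

def mean(xs):
    return sum(xs) / len(xs)

# Naive (confounded) estimate: difference in raw purchase rates.
naive = (mean([b for _, e, b in users if e])
         - mean([b for _, e, b in users if not e]))

# Inverse-propensity-weighted estimate of the average treatment effect,
# here using the known exposure probabilities.
ipw_treated = mean([b / (0.7 if h else 0.3) * e for h, e, b in users])
ipw_control = mean([b / (1 - (0.7 if h else 0.3)) * (1 - e) for h, e, b in users])
ate = ipw_treated - ipw_control

print(round(naive, 3), round(ate, 3))  # naive is biased upward; ate is near 0.10
```

In an actual study the propensities would not be known and would have to be modeled from observed covariates (e.g., with a logistic regression), which is precisely why the causal-inference machinery cited above is needed.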

References
[Adamopoulos, 2013a] Adamopoulos, P. (2013a). Beyond Rating Prediction Ac-
curacy: On New Perspectives in Recommender Systems. In Proceedings of the
seventh ACM conference on Recommender systems, RecSys ’13, New York,
NY, USA. ACM.

[Adamopoulos, 2013b] Adamopoulos, P. (2013b). What Makes a Great MOOC?
An Interdisciplinary Analysis of Student Retention in Online Courses. In
Proceedings of the 34th International Conference on Information Systems,
ICIS ’13.

[Adamopoulos and Tuzhilin, 2011] Adamopoulos, P. and Tuzhilin, A. (2011).
On Unexpectedness in Recommender Systems: Or How to Expect the Un-
expected. In DiveRS 2011 - ACM RecSys 2011 Workshop on Novelty and
Diversity in Recommender Systems, RecSys 2011. ACM.

[Adamopoulos and Tuzhilin, 2013a] Adamopoulos, P. and Tuzhilin, A. (2013a).
On Unexpectedness in Recommender Systems: Or How to Better Ex-
pect the Unexpected. Working Paper: CBA-13-01, New York University.
http://ssrn.com/abstract=2282999.

[Adamopoulos and Tuzhilin, 2013b] Adamopoulos, P. and Tuzhilin, A.
(2013b). Probabilistic Neighborhood Selection in Collaborative Fil-
tering Systems. Working Paper: CBA-13-04, New York University.
http://hdl.handle.net/2451/31988.

[Adamopoulos and Tuzhilin, 2013c] Adamopoulos, P. and Tuzhilin, A. (2013c).
Recommendation Opportunities: Improving Item Prediction Using Weighted
Percentile Methods in Collaborative Filtering Systems. In Proceedings of the
seventh ACM conference on Recommender systems, RecSys ’13, New York,
NY, USA. ACM.

[Adomavicius and Kwon, 2009] Adomavicius, G. and Kwon, Y. (2009). Toward
more diverse recommendations: Item re-ranking methods for recommender
systems. In Proceedings of the 19th Workshop on Information Technology
and Systems (WITS’09).

[Adomavicius and Kwon, 2012] Adomavicius, G. and Kwon, Y. (2012). Improving
aggregate recommendation diversity using ranking-based techniques. Knowledge
and Data Engineering, IEEE Transactions on, 24(5):896–911.

[Adomavicius and Tuzhilin, 2005] Adomavicius, G. and Tuzhilin, A. (2005). To-
ward the next generation of recommender systems: A survey of the state-of-
the-art and possible extensions. IEEE Trans. on Knowl. and Data Eng.,
17(6):734–749.

[Akiyama et al., 2010] Akiyama, T., Obara, K., and Tanizaki, M. (2010). Pro-
posal and evaluation of serendipitous recommendation method using general
unexpectedness. In Proceedings of the ACM RecSys Workshop on Practical
Use of Recommender Systems, Algorithms and Technologies (PRSAT 2010),
RecSys 2010, New York, NY, USA. ACM.

[André et al., 2009] André, P., Teevan, J., and Dumais, S. T. (2009). From
x-rays to silly putty via Uranus: Serendipity and its role in web search. In
Proceedings of the 27th international conference on Human factors in com-
puting systems, CHI ’09, pages 2033–2036, New York, NY, USA. ACM.

[Ansari et al., 2000] Ansari, A., Essegaier, S., and Kohli, R. (2000). Internet
recommendation systems. Journal of Marketing Research, 37(3):363–375.

[Bakos, 1997] Bakos, J. Y. (1997). Reducing buyer search costs: implications
for electronic marketplaces. Manage. Sci., 43(12):1676–1692.

[Balabanović and Shoham, 1997] Balabanović, M. and Shoham, Y. (1997). Fab:
content-based, collaborative recommendation. Communications of the ACM,
40(3):66–72.

[Bell et al., 2007] Bell, R., Koren, Y., and Volinsky, C. (2007). Modeling rela-
tionships at multiple scales to improve accuracy of large recommender sys-
tems. In Proceedings of the 13th ACM SIGKDD international conference on
Knowledge discovery and data mining, pages 95–104. ACM.

[Bell and Koren, 2007] Bell, R. M. and Koren, Y. (2007). Scalable collaborative
filtering with jointly derived neighborhood interpolation weights. In Data
Mining, 2007. ICDM 2007. Seventh IEEE International Conference on, pages
43–52. IEEE.

[Berger and Tuzhilin, 1998] Berger, G. and Tuzhilin, A. (1998). Discovering un-
expected patterns in temporal data using temporal logic. Temporal Databases:
research and practice, pages 281–309.

[Billsus and Pazzani, 1998] Billsus, D. and Pazzani, M. J. (1998). Learning
collaborative information filters. In Proceedings of the fifteenth international
conference on machine learning, volume 54, page 48.

[Billsus and Pazzani, 2000] Billsus, D. and Pazzani, M. J. (2000). User model-
ing for adaptive news access. User Modeling and User-Adapted Interaction,
10(2-3):147–180.

[Bottou, 2010] Bottou, L. (2010). Large-scale machine learning with stochastic
gradient descent. In Proceedings of COMPSTAT'2010, pages 177–186. Springer.

[Breese et al., 1998] Breese, J. S., Heckerman, D., and Kadie, C. (1998). Empir-
ical analysis of predictive algorithms for collaborative filtering. In Proceedings
of the Fourteenth conference on Uncertainty in artificial intelligence, pages
43–52. Morgan Kaufmann Publishers Inc.

[Brynjolfsson et al., 2011] Brynjolfsson, E., Hu, Y. J., and Simester, D. (2011).
Goodbye pareto principle, hello long tail: The effect of search costs on the
concentration of product sales. Manage. Sci., 57(8):1373–1386.

[Brynjolfsson et al., 2010] Brynjolfsson, E., Hu, Y. J., and Smith, M. (2010).
The longer tail: The changing shape of Amazon's sales distribution curve.

[Brynjolfsson et al., 2003] Brynjolfsson, E., Hu, Y. J., and Smith, M. D. (2003).
Consumer surplus in the digital economy: Estimating the value of increased
product variety at online booksellers. Manage. Sci., 49(11):1580–1596.

[Brynjolfsson et al., 2006] Brynjolfsson, E., Hu, Y. J., and Smith, M. D. (2006).
From niches to riches: The anatomy of the long tail. Sloan Management Review, 47(4):67–71.

[Brynjolfsson and Smith, 2000] Brynjolfsson, E. and Smith, M. D. (2000).
Frictionless commerce? A comparison of internet and conventional retailers.
Management Science, 46(4):563–585.

[Castells et al., 2011] Castells, P., Vargas, S., and Wang, J. (2011). Novelty and
diversity metrics for recommender systems: Choice, discovery and relevance.
In International Workshop on Diversity in Document Retrieval (DDR 2011)
at the 33rd European Conference on Information Retrieval (ECIR 2011).

[Celma and Herrera, 2008] Celma, O. and Herrera, P. (2008). A new approach
to evaluating novel recommendations. In Proceedings of the 2008 ACM con-
ference on Recommender systems, RecSys ’08, pages 179–186, New York, NY,
USA. ACM.

[Chib and Greenberg, 1995] Chib, S. and Greenberg, E. (1995). Understanding
the Metropolis-Hastings algorithm. The American Statistician, 49(4):327–335.

[Chien and George, 1999] Chien, Y.-H. and George, E. I. (1999). A bayesian
model for collaborative filtering. In Proceedings of the 7th International
Workshop on Artificial Intelligence and Statistics. San Francisco: Morgan
Kaufmann Publishers. http://uncertainty99.microsoft.com/proceedings.htm.

[Delgado and Ishii, 1999] Delgado, J. and Ishii, N. (1999). Memory-based
weighted majority prediction. In ACM SIGIR’99 workshop on recommender
systems. Citeseer.

[Deshpande and Karypis, 2004] Deshpande, M. and Karypis, G. (2004).
Item-based top-n recommendation algorithms. ACM Transactions on Information
Systems (TOIS), 22(1):143–177.

[Desrosiers and Karypis, 2011] Desrosiers, C. and Karypis, G. (2011). A
comprehensive survey of neighborhood-based recommendation methods. In
Recommender systems handbook, pages 107–144. Springer.

[Fleder and Hosanagar, 2009] Fleder, D. and Hosanagar, K. (2009). Blockbuster
culture's next rise or fall: The impact of recommender systems on sales
diversity. Management Science, 55(5):697–712.

[Fleder et al., 2010] Fleder, D., Hosanagar, K., and Buja, A. (2010). Recom-
mender systems and their effects on consumers: the fragmentation debate.
In Proceedings of the 11th ACM conference on Electronic commerce, EC ’10,
pages 229–230, New York, NY, USA. ACM.

[Funk, 2006] Funk, S. (2006). Netflix update: Try this at home.

[Ge et al., 2010] Ge, M., Delgado-Battenfeld, C., and Jannach, D. (2010). Be-
yond accuracy: evaluating recommender systems by coverage and serendipity.
In Proceedings of the fourth ACM conference on Recommender systems, Rec-
Sys ’10, pages 257–260, New York, NY, USA. ACM.

[Ge et al., 2012] Ge, M., Jannach, D., Gedikli, F., and Hepp, M. (2012). Effects
of the placement of diverse items in recommendation lists. In Proceedings
of 14th International Conference on Enterprise Information Systems (ICEIS
2012).

[Gelman et al., 2004] Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B.
(2004). Bayesian data analysis. Chapman & Hall/CRC.

[Gemulla et al., 2011] Gemulla, R., Nijkamp, E., Haas, P. J., and Sismanis, Y.
(2011). Large-scale matrix factorization with distributed stochastic gradient
descent. In Proceedings of the 17th ACM SIGKDD international conference
on Knowledge discovery and data mining, pages 69–77. ACM.

[Getoor and Sahami, 1999] Getoor, L. and Sahami, M. (1999). Using proba-
bilistic relational models for collaborative filtering. In Proc. Workshop Web
Usage Analysis and User Profiling (WEBKDD99). Citeseer.

[Ghose et al., 2012] Ghose, A., Ipeirotis, P., and Li, B. (2012). Designing rank-
ing systems for hotels on travel search engines by mining user-generated and
crowd-sourced content. Marketing Science.

[Ghose et al., 2013] Ghose, A., Ipeirotis, P., and Li, B. (2013). Examining the
impact of ranking on consumer behavior and search engine revenue. Manage-
ment Science, Forthcoming.

[Gini, 1909] Gini, C. (1909). Concentration and dependency ratios (in Italian).
English translation in Rivista di Politica Economica, 87:769–789.

[Goldberg et al., 1992] Goldberg, D., Nichols, D., Oki, B. M., and Terry, D.
(1992). Using collaborative filtering to weave an information tapestry. Com-
munications of the ACM, 35(12):61–70.

[Goldberg et al., 2001] Goldberg, K., Roeder, T., Gupta, D., and Perkins, C.
(2001). Eigentaste: A constant time collaborative filtering algorithm. Infor-
mation Retrieval, 4(2):133–151.

[Goldstein and Goldstein, 2006] Goldstein, D. and Goldstein, D. (2006).
Profiting from the long tail. Harvard Business Review, 84(6):24–28.

[Golub and Kahan, 1965] Golub, G. and Kahan, W. (1965). Calculating the
singular values and pseudo-inverse of a matrix. Journal of the Society for
Industrial & Applied Mathematics, Series B: Numerical Analysis, 2(2):205–
224.

[Gorgoglione et al., 2011] Gorgoglione, M., Panniello, U., and Tuzhilin, A.
(2011). The effect of context-aware recommendations on customer purchasing
behavior and trust. In Proceedings of the fifth ACM conference on Recom-
mender systems, RecSys ’11, pages 85–92, New York, NY, USA. ACM.

[Hansen and Golbeck, 2009] Hansen, D. L. and Golbeck, J. (2009). Mixing it
up: recommending collections of items. In Proceedings of CHI ’09, pages
1217–1226. ACM.

[Herlocker et al., 2002] Herlocker, J., Konstan, J. A., and Riedl, J. (2002). An
empirical analysis of design choices in neighborhood-based collaborative fil-
tering algorithms. Information retrieval, 5(4):287–310.

[Herlocker et al., 1999] Herlocker, J. L., Konstan, J. A., Borchers, A., and
Riedl, J. (1999). An algorithmic framework for performing collaborative filter-
ing. In Proceedings of the 22nd annual international ACM SIGIR conference
on Research and development in information retrieval, pages 230–237. ACM.

[Hijikata et al., 2009] Hijikata, Y., Shimizu, T., and Nishida, S. (2009).
Discovery-oriented collaborative filtering for improving user satisfaction. In
Proceedings of the 14th international conference on Intelligent user interfaces,
IUI ’09, pages 67–76, New York, NY, USA. ACM.

[Hitt and Brynjolfsson, 1996] Hitt, L. M. and Brynjolfsson, E. (1996).
Productivity, business profitability, and consumer surplus: Three different
measures of information technology value. MIS Quarterly, 20(2):121–143.

[Hofmann, 2003] Hofmann, T. (2003). Collaborative filtering via gaussian
probabilistic latent semantic analysis. In Proceedings of the 26th annual
international ACM SIGIR conference on Research and development in information
retrieval, pages 259–266. ACM.

[Horton, 2012] Horton, J. J. (2012). Computer-mediated Matchmaking:
Facilitating Employer Search and Screening. Working Paper, oDesk Research and
Harvard University.

[Hosanagar et al., 2013] Hosanagar, K., Fleder, D., Lee, D., and Buja, A.
(2013). Recommender systems and their effects on consumers: The frag-
mentation debate. Management Science.

[Hu and Pu, 2011] Hu, R. and Pu, P. (2011). Helping users perceive recom-
mendation diversity. Workshop on Novelty and Diversity in Recommender
Systems (DiveRS 2011), page 43.

[Hu et al., 2008] Hu, Y., Koren, Y., and Volinsky, C. (2008). Collaborative
filtering for implicit feedback datasets. In Data Mining, 2008. ICDM’08.
Eighth IEEE International Conference on, pages 263–272. IEEE.

[Hurley and Zhang, 2011] Hurley, N. and Zhang, M. (2011). Novelty and diver-
sity in top-n recommendation–analysis and evaluation. ACM Transactions
on Internet Technology (TOIT), 10(4):14.

[Iaquinta et al., 2008] Iaquinta, L., Gemmis, M. d., Lops, P., Semeraro, G., Fi-
lannino, M., and Molino, P. (2008). Introducing serendipity in a content-based
recommender system. In Proceedings of the 2008 8th International Confer-
ence on Hybrid Intelligent Systems, HIS ’08, pages 168–173, Washington, DC,
USA. IEEE Computer Society.

[Jannach et al., 2010] Jannach, D., Zanker, M., Felfernig, A., and Friedrich, G.
(2010). Recommender systems: an introduction. Cambridge University Press.

[Jin et al., 2004] Jin, R., Chai, J. Y., and Si, L. (2004). An automatic weighting
scheme for collaborative filtering. In Proceedings of the 27th annual interna-
tional ACM SIGIR conference on Research and development in information
retrieval, pages 337–344. ACM.

[Karatzoglou and Weimer, 2010] Karatzoglou, A. and Weimer, M. (2010).
Quantile matrix factorization for collaborative filtering. In Proceedings of
EC-Web 2010, pages 253–264.

[Kawamae, 2010] Kawamae, N. (2010). Serendipitous recommendations via
innovators. In Proceedings of the 33rd international ACM SIGIR conference on
Research and development in information retrieval, SIGIR '10, pages 218–225,
New York, NY, USA. ACM.

[Kawamae et al., 2009] Kawamae, N., Sakano, H., and Yamada, T. (2009). Per-
sonalized recommendation based on the personal innovator degree. In Pro-
ceedings of the third ACM conference on Recommender systems, RecSys ’09,
pages 329–332, New York, NY, USA. ACM.

[Koenker, 2005] Koenker, R. (2005). Quantile Regression. Cambridge University
Press.

[Kolter and Maloof, 2003] Kolter, J. Z. and Maloof, M. A. (2003). Dynamic
weighted majority: A new ensemble method for tracking concept drift. In
Data Mining, 2003. ICDM 2003. Third IEEE International Conference on,
pages 123–130. IEEE.

[Komiak and Benbasat, 2006] Komiak, S. Y. and Benbasat, I. (2006). The effects
of personalization and familiarity on trust and adoption of recommendation
agents. MIS Quarterly, pages 941–960.

[Konstan et al., 2006] Konstan, J. A., McNee, S. M., Ziegler, C.-N., Torres,
R., Kapoor, N., and Riedl, J. T. (2006). Lessons on applying automated
recommender systems to information-seeking tasks. In proceedings of the 21st
national conference on Artificial intelligence - Volume 2, AAAI’06, pages
1630–1633, Palo Alto, CA, USA. AAAI Press.

[Konstan et al., 1997] Konstan, J. A., Miller, B. N., Maltz, D., Herlocker, J. L.,
Gordon, L. R., and Riedl, J. (1997). GroupLens: applying collaborative
filtering to Usenet news. Communications of the ACM, 40(3):77–87.

[Kontonasios et al., 2012] Kontonasios, K.-N., Spyropoulou, E., and De Bie, T.
(2012). Knowledge discovery interestingness measures based on unexpected-
ness. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discov-
ery, 2(5):386–399.

[Koren, 2008] Koren, Y. (2008). Factorization meets the neighborhood: a
multifaceted collaborative filtering model. In Proceedings of the 14th ACM
SIGKDD international conference on Knowledge discovery and data mining,
pages 426–434. ACM.

[Koren, 2009] Koren, Y. (2009). Collaborative filtering with temporal
dynamics. In Proceedings of the 15th ACM SIGKDD international conference on
Knowledge discovery and data mining, pages 447–456. ACM.

[Koren and Bell, 2011] Koren, Y. and Bell, R. (2011). Advances in collaborative
filtering. In Recommender Systems Handbook, pages 145–186. Springer.

[Koren et al., 2009] Koren, Y., Bell, R., and Volinsky, C. (2009). Matrix fac-
torization techniques for recommender systems. Computer, 42(8):30–37.

[Lathia et al., 2010] Lathia, N., Hailes, S., Capra, L., and Amatriain, X. (2010).
Temporal diversity in recommender systems. In Proceedings of the 33rd
international ACM SIGIR conference on Research and development in information
retrieval, SIGIR ’10, pages 210–217, New York, NY, USA. ACM.

[Lawrence et al., 2007] Lawrence, R., Perlich, C., Rosset, S., Arroyo, J., Calla-
han, M., Collins, J., Ershov, A., Feinzig, S., Khabibrakhmanov, I., Mahatma,
S., et al. (2007). Analytics-driven solutions for customer targeting and sales-
force allocation. IBM Systems Journal, 46(4):797–816.

[Lee, 2001] Lee, W. S. (2001). Collaborative learning for recommender systems.
In Machine Learning International Conference, pages 314–321.

[Lemire and Maclachlan, 2007] Lemire, D. and Maclachlan, A. (2007). Slope
one predictors for online rating-based collaborative filtering. CoRR,
abs/cs/0702144.

[Linden et al., 2003] Linden, G., Smith, B., and York, J. (2003). Amazon.com
recommendations: Item-to-item collaborative filtering. Internet Computing,
IEEE, 7(1):76–80.

[Manzato, 2013] Manzato, M. G. (2013). gSVD++: supporting implicit feedback
on recommender systems with metadata awareness. In Proceedings of the 28th
Annual ACM Symposium on Applied Computing, SAC ’13, pages 908–913,
New York, NY, USA. ACM.

[Marlin, 2003] Marlin, B. (2003). Modeling user rating profiles for collaborative
filtering. Advances in neural information processing systems, 16.

[McSherry, 2002] McSherry, D. (2002). Diversity-conscious retrieval. In
Proceedings of the 6th European Conference on Advances in Case-Based
Reasoning, ECCBR '02, pages 219–233, London, UK. Springer-Verlag.

[Melville et al., 2004] Melville, N., Kraemer, K., and Gurbaxani, V. (2004).
Review: Information technology and organizational performance: An integrative
model of IT business value. MIS Quarterly, 28(2):283–322.

[Murakami et al., 2008] Murakami, T., Mori, K., and Orihara, R. (2008). Met-
rics for evaluating the serendipity of recommendation lists. In Proceedings of
the 2007 conference on New frontiers in artificial intelligence, JSAI’07, pages
40–46, Berlin, Heidelberg. Springer-Verlag.

[Nakamura and Abe, 1998] Nakamura, A. and Abe, N. (1998). Collaborative
filtering using weighted majority prediction algorithms. In Proceedings of the
Fifteenth International Conference on Machine Learning, pages 395–403.

[Nakatsuji et al., 2010] Nakatsuji, M., Fujiwara, Y., Tanaka, A., Uchiyama, T.,
Fujimura, K., and Ishida, T. (2010). Classical music for rock fans?: Novel
recommendations for expanding user interests. In Proceedings of the 19th
ACM international conference on Information and knowledge management,
CIKM ’10, pages 949–958, New York, NY, USA. ACM.

[Oestreicher-Singer and Sundararajan, 2012a] Oestreicher-Singer, G. and Sun-


dararajan, A. (2012a). Recommendation networks and the long tail of elec-
tronic commerce. Mis Quarterly, 36(1):65–83.

[Oestreicher-Singer and Sundararajan, 2012b] Oestreicher-Singer, G. and Sun-


dararajan, A. (2012b). The visible hand? demand effects of recommendation
networks in electronic markets. Management Science, 58(11):1963–1981.

[Padmanabhan and Tuzhilin, 1998] Padmanabhan, B. and Tuzhilin, A. (1998).
A belief-driven method for discovering unexpected patterns. In Proceedings of
the third International Conference on Knowledge Discovery and Data Mining,
KDD ’98, pages 94–100, Palo Alto, CA, USA. AAAI Press.

[Padmanabhan and Tuzhilin, 2000] Padmanabhan, B. and Tuzhilin, A. (2000).
Small is beautiful: discovering the minimal set of unexpected patterns. In
Proceedings of the sixth ACM SIGKDD international conference on Knowl-
edge discovery and data mining, KDD ’00, pages 54–63, New York, NY, USA.
ACM.

[Padmanabhan and Tuzhilin, 2006] Padmanabhan, B. and Tuzhilin, A. (2006).
On characterization and discovery of minimal unexpected patterns in rule
discovery. IEEE Trans. on Knowl. and Data Eng., 18(2):202–216.

[Panniello et al., 2009] Panniello, U., Tuzhilin, A., Gorgoglione, M., Palmisano,
C., and Pedone, A. (2009). Experimental comparison of pre- vs. post-filtering
approaches in context-aware recommender systems. In Proceedings of the
third ACM conference on Recommender systems, RecSys ’09, pages 265–268,
New York, NY, USA. ACM.

[Parameswaran and Garcia-Molina, 2009] Parameswaran, A. G. and
Garcia-Molina, H. (2009). Recommendations with prerequisites. In Proceedings
of the third ACM conference on Recommender systems, RecSys '09, pages
353–356. ACM.

[Paterek, 2007] Paterek, A. (2007). Improving regularized singular value
decomposition for collaborative filtering. In Proceedings of KDD cup and
workshop, volume 2007, pages 5–8.

[Pathak et al., 2010] Pathak, B., Garfinkel, R., Gopal, R. D., Venkatesan, R.,
and Yin, F. (2010). Empirical analysis of the impact of recommender systems
on sales. Journal of Management Information Systems, 27(2):159–188.

[Pavlov and Pennock, 2002] Pavlov, D. and Pennock, D. M. (2002). A maximum
entropy approach to collaborative filtering in dynamic, sparse,
high-dimensional domains. In Neural Information Processing Systems, pages
1441–1448.

[Pearl, 2009] Pearl, J. (2009). Causal inference in statistics: An overview.
Statistics Surveys, 3:96–146.

[Provost and Fawcett, 2013] Provost, F. and Fawcett, T. (2013). Data Science
for Business: Fundamental Principles of Data Mining and Data-Analytic
Thinking. In preparation. O’Reilly Media.

[Resnick et al., 1994] Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., and
Riedl, J. (1994). GroupLens: an open architecture for collaborative filtering of
netnews. In Proceedings of the 1994 ACM conference on Computer supported
cooperative work, pages 175–186. ACM.

[Ricci and Shapira, 2011] Ricci, F. and Shapira, B. (2011). Recommender sys-
tems handbook. Springer.

[Rojas et al., 2001] Rojas, M., Santos, S. A., and Sorensen, D. C. (2001). A
new matrix-free algorithm for the large-scale trust-region subproblem. SIAM
Journal on optimization, 11(3):611–646.

[Said et al., 2012] Said, A., Jain, B. J., Kille, B., and Albayrak, S. (2012).
Increasing diversity through furthest neighbor-based recommendation. In
Proceedings of the WSDM’12 Workshop on Diversity in Document Retrieval
(DDR’12).

[Sarwar et al., 2000] Sarwar, B., Karypis, G., Konstan, J., and Riedl, J. (2000).
Application of dimensionality reduction in recommender system-a case study.
Technical report, DTIC Document.

[Sarwar et al., 2001] Sarwar, B., Karypis, G., Konstan, J., and Riedl, J. (2001).
Item-based collaborative filtering recommendation algorithms. In Proceedings
of the 10th international conference on World Wide Web, pages 285–295.
ACM.

[Shani and Gunawardana, 2011] Shani, G. and Gunawardana, A. (2011).
Evaluating recommendation systems. Recommender Systems Handbook,
12(19):1–41.

[Shi et al., 2012] Shi, Y., Zhao, X., Wang, J., Larson, M., and Hanjalic, A.
(2012). Adaptive diversification of recommendation results via latent factor
portfolio. In SIGIR.

[Si and Jin, 2003] Si, L. and Jin, R. (2003). Flexible mixture model for collab-
orative filtering. In Machine Learning International Conference, volume 20,
page 704.

[Silberschatz and Tuzhilin, 1996] Silberschatz, A. and Tuzhilin, A. (1996).
What makes patterns interesting in knowledge discovery systems. Knowledge and
Data Engineering, IEEE Transactions on, 8(6):970–974.

[Stitelman et al., 2011] Stitelman, O., Dalessandro, B., Perlich, C., and
Provost, F. (2011). Estimating the effect of online display advertising on
browser conversion. Data Mining and Audience Intelligence for Advertising
(ADKDD 2011), 8.

[Sugiyama and Kan, 2011] Sugiyama, K. and Kan, M.-Y. (2011). Serendipitous
recommendation for scholarly papers considering relations among researchers.
In Proceedings of the 11th annual international ACM/IEEE joint conference
on Digital libraries, JCDL ’11, pages 307–310, New York, NY, USA. ACM.

[Töscher et al., 2008] Töscher, A., Jahrer, M., and Legenstein, R. (2008). Im-
proved neighborhood-based algorithms for large-scale recommender systems.
In Proceedings of the 2nd KDD Workshop on Large-Scale Recommender Sys-
tems and the Netflix Prize Competition, page 4. ACM.

[Ungar and Foster, 1998] Ungar, L. H. and Foster, D. P. (1998). Clustering
methods for collaborative filtering. In AAAI Workshop on Recommendation
Systems, number 1.

[Vargas and Castells, 2011] Vargas, S. and Castells, P. (2011). Rank and rele-
vance in novelty and diversity metrics for recommender systems. In Proceed-
ings of the fifth ACM conference on Recommender systems, RecSys ’11, pages
109–116, New York, NY, USA. ACM.

[Vargas et al., 2012] Vargas, S., Castells, P., and Vallet, D. (2012). Explicit rel-
evance models in intent-oriented information retrieval diversification. In 35th
Annual International ACM SIGIR Conference on Research and Development
in Information Retrieval (SIGIR 2012), Portland, OR, USA.

[Wang and Zhu, 2009] Wang, J. and Zhu, J. (2009). Portfolio theory of informa-
tion retrieval. In Proc. of the Annual International ACM SIGIR Conference
on Research and Development on Information Retrieval (SIGIR).

[Wang and Benbasat, 2005] Wang, W. and Benbasat, I. (2005). Trust in and
adoption of online recommendation agents. Journal of the Association for
Information Systems, 6(3):72–101.

[Weng et al., 2007] Weng, L.-T., Xu, Y., Li, Y., and Nayak, R. (2007). Im-
proving recommendation novelty based on topic taxonomy. In Proceedings
of the 2007 IEEE/WIC/ACM International Conferences on Web Intelligence
and Intelligent Agent Technology - Workshops, WI-IATW ’07, pages 115–118,
Washington, DC, USA. IEEE Computer Society.

[Wong and Easton, 1980] Wong, C.-K. and Easton, M. C. (1980). An efficient
method for weighted sampling without replacement. SIAM Journal on Com-
puting, 9(1):111–113.

[Xiao and Benbasat, 2007] Xiao, B. and Benbasat, I. (2007). E-commerce
product recommendation agents: use, characteristics, and impact. MIS
Quarterly, 31(1):137–209.

[Xie et al., 2010] Xie, M., Lakshmanan, L. V., and Wood, P. T. (2010). Break-
ing out of the box of recommendations: from items to packages. In Proceedings
of the fourth ACM conference on Recommender systems, RecSys ’10, pages
151–158. ACM.

[Zhang and Hurley, 2008] Zhang, M. and Hurley, N. (2008). Avoiding
monotony: Improving the diversity of recommendation lists. In Proceedings
of the 2008 ACM conference on Recommender systems, RecSys ’08, pages
123–130, New York, NY, USA. ACM.

[Zhang and Hurley, 2009] Zhang, M. and Hurley, N. (2009). Novel item rec-
ommendation by user profile partitioning. In Proceedings of the 2009
IEEE/WIC/ACM International Joint Conference on Web Intelligence and
Intelligent Agent Technology - Volume 01, WI-IAT ’09, pages 508–515, Wash-
ington, DC, USA. IEEE Computer Society.

[Zhang et al., 2011] Zhang, T., Agarwal, R., and Lucas Jr, H. C. (2011). The
value of it-enabled retailer learning: personalized product recommendations
and customer store loyalty in electronic markets. MIS Quarterly, 35(4):859.

[Zhang et al., 2012] Zhang, Y. C., Ó Séaghdha, D., Quercia, D., and Jambor,
T. (2012). Auralist: introducing serendipity into music recommendation. In
Proceedings of the fifth ACM international conference on Web search and data
mining, WSDM ’12, pages 13–22, New York, NY, USA. ACM.

[Zhou et al., 2010] Zhou, T., Kuscsik, Z., Liu, J., Medo, M., Wakeling, J.,
and Zhang, Y. (2010). Solving the apparent diversity-accuracy dilemma of
recommender systems. Proceedings of the National Academy of Sciences,
107(10):4511–4515.

[Zhou et al., 2008] Zhou, Y., Wilkinson, D., Schreiber, R., and Pan, R. (2008).
Large-scale parallel collaborative filtering for the netflix prize. In Algorithmic
Aspects in Information and Management, pages 337–348. Springer.

[Zhu and Kraemer, 2005] Zhu, K. and Kraemer, K. L. (2005). Post-adoption
variations in usage and value of e-business by organizations: cross-country
evidence from the retail industry. Information Systems Research, 16(1):61–
84.

[Ziegler et al., 2005] Ziegler, C.-N., McNee, S. M., Konstan, J. A., and Lausen,
G. (2005). Improving recommendation lists through topic diversification. In
Proceedings of the 14th international conference on World Wide Web, WWW
’05, pages 22–32, New York, NY, USA. ACM.
