Mining The Network Value of Customers - Domingos and Richardson
ABSTRACT

One of the major applications of data mining is in helping companies determine which potential customers to market to. If the expected profit from a customer is greater than the cost of marketing to her, the marketing action for that customer is executed. So far, work in this area has considered only the intrinsic value of the customer (i.e., the expected profit from sales to her). We propose to model also the customer's network value: the expected profit from sales to other customers she may influence to buy, the customers those may influence, and so on recursively. Instead of viewing a market as a set of independent entities, we view it as a social network and model it as a Markov random field. We show the advantages of this approach using a social network mined from a collaborative filtering database. Marketing that exploits the network value of customers (also known as viral marketing) can be extremely effective, but is still a black art. Our work can be viewed as a step towards providing a more solid foundation for it, taking advantage of the availability of large relevant databases.

Categories and Subject Descriptors

H.2.8 [Database Management]: Database Applications - data mining; I.2.6 [Artificial Intelligence]: Learning - induction; I.5.1 [Pattern Recognition]: Models - statistical; J.4 [Computer Applications]: Social and Behavioral Sciences

General Terms

Markov random fields, dependency networks, direct marketing, viral marketing, social networks, collaborative filtering

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
KDD 01 San Francisco CA USA
Copyright ACM 2001 1-58113-391-x/01/08...$5.00

1. INTRODUCTION

Direct marketing is one of the major applications of KDD. In contrast to mass marketing, where a product is promoted indiscriminately to all potential customers, direct marketing attempts to first select the customers likely to be profitable, and market only to those [19]. Data mining plays a key role in this process, by allowing the construction of models that predict a customer's response given her past buying behavior and any available demographic information [29]. When successful, this approach can significantly increase profits [34]. One basic limitation of it is that it treats each customer as making a buying decision independently of all other customers. In reality, a person's decision to buy a product is often strongly influenced by her friends, acquaintances, business partners, etc. Marketing based on such word-of-mouth networks can be much more cost-effective than the more conventional variety, because it leverages the customers themselves to carry out most of the promotional effort. A classic example of this is the Hotmail free email service, which grew from zero to 12 million users in 18 months on a minuscule advertising budget, thanks to the inclusion of a promotional message with the service's URL in every email sent using it [23]. Competitors using conventional marketing fared far less well. This type of marketing, dubbed viral marketing because of its similarity to the spread of an epidemic, is now used by a growing number of companies, particularly in the Internet sector. More generally, network effects (known in the economics literature as network externalities) are of critical importance in many industries, including notably those associated with information goods (e.g., software, media, telecommunications, etc.) [38]. A technically inferior product can often prevail in the marketplace if it better leverages the network of users (for example, VHS prevailed over Beta in the VCR market).

Ignoring network effects when deciding which customers to market to can lead to severely suboptimal decisions. In addition to the intrinsic value that derives from the purchases she will make, a customer effectively has a network value that derives from her influence on other customers. A customer whose intrinsic value is lower than the cost of marketing may in fact be worth marketing to when her network value is considered. Conversely, marketing to a profitable customer may be redundant if network effects already make her very likely to buy. However, quantifying the network value of a customer is at first sight an extremely difficult undertaking, and to our knowledge has never been attempted. A customer's network value depends not only on herself, but potentially on the configuration and state of the entire network. As a result, marketing in the presence of strong network effects is often a hit-and-miss affair. Many startup companies invest heavily in customer acquisition, on the basis that this is necessary to "seed" the network, only to face bankruptcy when the desired network effects fail to materialize. On the other hand, some companies (like Hotmail and the ICQ instant messenger service) are much more successful than expected. A sounder basis for action in network-driven markets would thus have the potential to greatly reduce the risk of companies operating in them.

We believe that, for many of these markets, the growth of the Internet has led to the availability of a wealth of data from which the necessary network information can be mined. In this paper we propose a general framework for doing this, and for using the results to optimize the choice of which customers to market to, as well as estimating what customer acquisition cost is justified for each. Our solution is based on modeling social networks as Markov random fields, where each customer's probability of buying is a function of both the intrinsic desirability of the product for the customer and the influence of other customers. We then focus on collaborative filtering databases as an instance of a data source for mining networks of influence from. We apply our framework to the domain of marketing motion pictures, using the publicly-available EachMovie database of 2.8 million movie ratings, and demonstrate its advantages relative to traditional direct marketing. The paper concludes with a discussion of related work and a summary of contributions and future research directions.

2. MODELING MARKETS AS SOCIAL NETWORKS

Consider a set of $n$ potential customers, and let $X_i$ be a Boolean variable that takes the value 1 if customer $i$ buys the product being marketed, and 0 otherwise. In what follows we will often slightly abuse language by taking $X_i$ to "be" the $i$th customer. Let the neighbors of $X_i$ be the customers which directly influence $X_i$: $N_i = \{X_{i,1}, \ldots, X_{i,n_i}\} \subseteq \mathbf{X} - \{X_i\}$, where $\mathbf{X} = \{X_1, \ldots, X_n\}$. In other words, $X_i$ is independent of $\mathbf{X} - N_i - \{X_i\}$ given $N_i$. Let $X^k$ ($X^u$) be the customers whose value (i.e., whether they have bought the product) is known (unknown), and let $N_i^u = N_i \cap X^u$. Assume the product is described by a set of attributes $Y = \{Y_1, \ldots, Y_m\}$. Let $M_i$ be a variable representing the marketing action that is taken for customer $i$. For example, $M_i$ could be a Boolean variable, with $M_i = 1$ if the customer is offered a given discount, and $M_i = 0$ otherwise. Alternately, $M_i$ could be a continuous variable indicating the size of the discount offered, or a nominal variable indicating which of several possible actions is taken. Let $M = \{M_1, \ldots, M_n\}$. Then, for all $X_i \notin X^k$,

$$
\begin{aligned}
P(X_i \mid X^k, Y, M) &= \sum_{C(N_i^u)} P(X_i, N_i^u \mid X^k, Y, M) \\
&= \sum_{C(N_i^u)} P(X_i \mid N_i^u, X^k, Y, M)\, P(N_i^u \mid X^k, Y, M) \\
&= \sum_{C(N_i^u)} P(X_i \mid N_i, Y, M)\, P(N_i^u \mid X^k, Y, M)
\end{aligned}
\tag{1}
$$

where $C(N_i^u)$ is the set of all possible configurations of the unknown neighbors of $X_i$ (i.e., the set of all possible $2^{|N_i^u|}$ assignments of 0 and 1 to them). Following Pelkowitz [33], we approximate $P(N_i^u \mid X^k, Y, M)$ by its maximum entropy estimate given the marginals $P(X_j \mid X^k, Y, M)$, for $X_j \in N_i^u$. This yields¹

$$
P(X_i \mid X^k, Y, M) = \sum_{C(N_i^u)} P(X_i \mid N_i, Y, M) \prod_{X_j \in N_i^u} P(X_j \mid X^k, Y, M)
\tag{2}
$$

¹ The same result can be obtained by assuming that the $X_j$ are independent given $X^k$, $Y$ and $M$.

The set of variables $X^u$, with joint probability conditioned on $X^k$, $Y$ and $M$ described by Equation 2, is an instance of a Markov random field [2, 25, 7]. Because Equation 2 expresses the probabilities $P(X_i \mid X^k, Y, M)$ as a function of themselves, it can be applied iteratively to find them, starting from a suitable initial assignment. This procedure is known as relaxation labeling, and is guaranteed to converge to locally consistent values as long as the initial assignment is sufficiently close to them [33]. A natural choice for initialization is to use the network-less probabilities $P(X_i \mid Y, M)$.

Notice that the number of terms in Equation 2 is exponential in the number of unknown neighbors of $X_i$. If this number is small (e.g., 5), this should not be a problem; otherwise, an approximate solution is necessary. A standard method for this purpose is Gibbs sampling [16]. An alternative based on an efficient k-shortest-path algorithm is proposed in Chakrabarti et al. [6].

Given $N_i$ and $Y$, $X_i$ should be independent of the marketing actions for other customers. Assuming a naive Bayes model for $X_i$ as a function of $N_i$, $Y_1, \ldots, Y_m$ and $M_i$ [11],

$$
\begin{aligned}
P(X_i \mid N_i, Y, M) &= P(X_i \mid N_i, Y, M_i) \\
&= \frac{P(X_i)\, P(N_i, Y, M_i \mid X_i)}{P(N_i, Y, M_i)} \\
&= \frac{P(X_i)\, P(N_i \mid X_i)\, P(M_i \mid X_i)}{P(N_i, Y, M_i)} \prod_{k=1}^{m} P(Y_k \mid X_i) \\
&= \frac{P(X_i \mid N_i)\, P(M_i \mid X_i)}{P(Y, M_i \mid N_i)} \prod_{k=1}^{m} P(Y_k \mid X_i)
\end{aligned}
\tag{3}
$$

where $P(Y, M_i \mid N_i) = P(Y, M_i \mid X_i = 1)\, P(X_i = 1 \mid N_i) + P(Y, M_i \mid X_i = 0)\, P(X_i = 0 \mid N_i)$. The corresponding network-less probabilities are $P(X_i \mid Y, M) = P(X_i)\, P(M_i \mid X_i) \prod_{k=1}^{m} P(Y_k \mid X_i) / P(Y, M_i)$. Given Equation 3, in order to compute Equation 2 we need to know only the following probabilities, since all terms reduce to them: $P(X_i \mid N_i)$, $P(X_i)$, $P(M_i \mid X_i)$, and $P(Y_k \mid X_i)$ for all $k$. With the exception of $P(X_i \mid N_i)$, all of these are easily obtained in one pass through the data by counting (assuming the $Y_k$ are discrete or have been pre-discretized; otherwise a univariate model can be fit for each numeric $Y_k$). The form of $P(X_i \mid N_i)$ depends on the mechanism by which customers influence each other, and will vary from application to application. In the next section we focus on the particular case where $\mathbf{X}$ is the set of users of a collaborative filtering system.

For simplicity, assume that $M$ is a Boolean vector (i.e., only one type of marketing action is being considered, such as offering the customer a given discount). Let $c$ be the cost of marketing to a customer (assumed constant), $r_0$ be the revenue from selling the product to the customer if no marketing action is performed, and $r_1$ be the revenue if marketing is performed. $r_0$ and $r_1$ will be the same unless the
marketing action includes offering a discount. Let $f_i^1(M)$ be the result of setting $M_i$ to 1 and leaving the rest of $M$ unchanged, and similarly for $f_i^0(M)$. The expected lift in profit from marketing to customer $i$ in isolation (i.e., ignoring her effect on other customers) is then [8]

$$
ELP_i(X^k, Y, M) = r_1\, P(X_i = 1 \mid X^k, Y, f_i^1(M)) - r_0\, P(X_i = 1 \mid X^k, Y, f_i^0(M)) - c
\tag{4}
$$

Let $M_0$ be the null vector (all zeros). The global lift in profit that results from a particular choice $M$ of customers to market to is then

$$
ELP(X^k, Y, M) = \sum_{i=1}^{n} r_i\, P(X_i = 1 \mid X^k, Y, M) - r_0 \sum_{i=1}^{n} P(X_i = 1 \mid X^k, Y, M_0) - |M|\, c
\tag{5}
$$

where $r_i = r_1$ if $M_i = 1$, $r_i = r_0$ if $M_i = 0$, and $|M|$ is the number of 1's in $M$. Our goal is to find the assignment of values to $M$ that maximizes ELP. In general, finding the optimal $M$ requires trying all possible combinations of assignments to its components. Because this is intractable, we propose using one of the following approximate procedures instead:

Single pass: For each $i$, set $M_i = 1$ if $ELP(X^k, Y, f_i^1(M_0)) > 0$, and set $M_i = 0$ otherwise.

Greedy search: Set $M = M_0$. Loop through the $M_i$'s, setting each $M_i$ to 1 if $ELP(X^k, Y, f_i^1(M)) > ELP(X^k, Y, M)$. Continue looping until there are no changes in a complete scan of the $M_i$'s. The key difference between this method and the previous one is that here later changes to the $M_i$'s are evaluated with earlier changes to the $M_i$'s already in place, while in the previous method all changes are evaluated with respect to $M_0$.

Hill-climbing search: Set $M = M_0$. Set $M_{i_1} = 1$, where $i_1 = \arg\max_i \{ELP(X^k, Y, f_i^1(M))\}$. Now set $M_{i_2} = 1$, where $i_2 = \arg\max_i \{ELP(X^k, Y, f_i^1(f_{i_1}^1(M)))\}$. Repeat until there is no $i$ for which setting $M_i = 1$ increases ELP.

Each method is computationally more expensive than the previous one, but potentially leads to a better solution for $M$ (i.e., produces a higher ELP).

The intrinsic value of a customer is given by Equation 4. The total value of a customer (intrinsic plus network) is the ELP obtained by marketing to her: $ELP(X^k, Y, f_i^1(M)) - ELP(X^k, Y, f_i^0(M))$. The customer's network value is the difference between her total and intrinsic values. Notice that, in general, this value will depend on which other customers are being marketed to, and which others have already bought the product.

Suppose now that $M_i$ is a continuous variable, that we can choose to incur different marketing costs for different customers, and that there is a known relationship between $c_i$ and $P(X_i \mid M_i)$. In other words, suppose that we can increase a customer's probability of buying by increasing the amount spent in marketing to her, and that we can estimate how much needs to be spent to produce a given increase in buying probability. The optimal customer acquisition cost for customer $i$ is then the value of $c_i$ that maximizes her total value $ELP(X^k, Y, f_i^1(M)) - ELP(X^k, Y, f_i^0(M))$, with $|M|\, c$ replaced by $\sum_{i=1}^{n} c_i$ in Equation 5.

3. MINING SOCIAL NETWORKS FROM COLLABORATIVE FILTERING DATABASES

Arguably, a decade ago it would have been difficult to make practical use of a model like Equation 2, because of the lack of data to estimate the influence probabilities $P(X_i \mid N_i)$. Fortunately, the explosion of the Internet has drastically changed this. People influence each other online (and leave a record of it) through postings and responses to newsgroups, review and knowledge-sharing sites like epinions.com, chat rooms and IRC, online game playing and MUDs, peer-to-peer networks, email, interlinking of Web pages, etc. In general, any form of online community is a potentially rich source of data for mining social networks from. (Of course, mining these sources is subject to the usual privacy concerns; but many sources are public information.) In this paper we will concentrate on a particularly simple and potentially very effective data source: the collaborative filtering systems widely used by e-commerce sites (e.g., amazon.com) to recommend products to consumers.

In a collaborative filtering system, users rate a set of items (e.g., movies, books, newsgroup postings, Web pages), and these ratings are then used to recommend other items the user might be interested in. The ratings may be implicit (e.g., the user did or did not buy the book) or explicit (e.g., the user gives a rating of zero to five stars to the book, depending on how much she liked it). Many algorithms have been proposed for choosing which items to recommend given the incomplete matrix of ratings (see, for example, Breese et al. [3]). The most widely used method, and the one that we will assume here, is the one proposed in GroupLens, the project that originally introduced quantitative collaborative filtering [35]. The basic idea in this method is to predict a user's rating of an item as a weighted average of the ratings given by similar users, and then recommend items with high predicted ratings. The similarity of a pair of users $(i, j)$ is measured using the Pearson correlation coefficient:

$$
W_{ij} = \frac{\sum_k (R_{ik} - \bar{R}_i)(R_{jk} - \bar{R}_j)}{\sqrt{\sum_k (R_{ik} - \bar{R}_i)^2 \sum_k (R_{jk} - \bar{R}_j)^2}}
\tag{6}
$$

where $R_{ik}$ is user $i$'s rating of item $k$, $\bar{R}_i$ is the mean of user $i$'s ratings, likewise for $j$, and the summations and means are computed over the items $k$ that both $i$ and $j$ have rated. Given an item $k$ that user $i$ has not rated, her rating of it is then predicted as

$$
\hat{R}_{ik} = \bar{R}_i + \rho \sum_{X_j \in N_i} W_{ji} (R_{jk} - \bar{R}_j)
\tag{7}
$$

where $\rho = 1 / \sum_{X_j \in N_i} |W_{ji}|$ is a normalization factor, and $N_i$ is the set of $n_i$ users most similar to $i$ according to Equation 6 (her neighbors). In the limit, $N_i$ can be the entire database of users, but for reasons of noise robustness and computational efficiency it is usually much smaller (e.g.,
$n_i = 5$). For neighbors that did not rate the item, $R_{jk}$ is set to $\bar{R}_j$.

The key advantage of a collaborative filtering database as a source for mining a social network for viral marketing is that the mechanism by which individuals influence each other is known and well understood: it is the collaborative filtering algorithm itself. User $i$ influences user $j$ when $j$ sees a recommendation that is partly the result of $i$'s rating. Assuming $i$ and $j$ do not know each other in real life (which, given that they can be anywhere in the world, is likely to be true), there is no other way they can substantially influence each other. Obviously, a user is subject to many influences besides that of the collaborative filtering system (including the influence of people not on the system), but the uncertainty caused by those influences is encapsulated to a first degree of approximation in $P(X_i \mid \hat{R}_{ik})$, the probability that a user will purchase an item given the rating the system predicts for her. It is also reasonable to assume that an individual would not continue to use a collaborative filtering system if she did not find its recommendations useful, and therefore that there is a causal connection (rather than simply a correlation) between the recommendations received and the purchases made.

To extract a social network model from a collaborative filtering database, we view an item as a random sample from the space $(\mathbf{X}, Y)$, where $Y$ is a set of properties of the item (assumed available), and $X_i$ represents whether or not user $i$ rated the item. For simplicity, we assume that if a user rates an item then she bought it, and vice-versa; removing this assumption would be straightforward, given the relevant data. The prior $P(X_i)$ can then be estimated simply as the fraction of items rated by user $i$. The conditional probabilities $P(Y_k \mid X_i)$ can be obtained by counting the number of occurrences of each value of $Y_k$ (assumed discrete or pre-discretized) with each value of $X_i$. Estimating $P(M_i \mid X_i)$ requires a data collection phase in which users to market to are selected at random and their responses are recorded (both when being marketed to and not). $P(M_i \mid X_i)$ can be estimated individually for each user, or (requiring far less data) as the same for all users, as done in Chickering and Heckerman [8]. If the necessary data is not available, we propose setting $P(M_i \mid X_i)$ using prior knowledge about the effectiveness of the type of marketing being considered, given any demographic information available about the users. (It is also advisable to test the sensitivity of the outcome to $P(M_i \mid X_i)$ by trying a range of values.)

The set of neighbors $N_i$ for each $i$ is the set of neighbors of the corresponding user in the collaborative filtering system. If the ratings are implicit (i.e., yes/no), a model for $P(X_i \mid N_i)$ (e.g., a naive Bayes model, as we have assumed for $P(Y_k \mid X_i)$) can be fit directly to the observed $\mathbf{X}$ vectors. If explicit ratings are given (e.g., zero to five stars), then we know that $X_i$ depends on $N_i$ solely through $\hat{R}_i$, $X_i$'s predicted rating according to Equation 7 (for readability, we will omit the item indexes $k$). In other words, $X_i$ is conditionally independent of $N_i$ given $\hat{R}_i$. If the neighbors' ratings are known, $\hat{R}_i$ is a deterministic function of $N_i$ given by Equation 7, with $X_j \in N_i$ determining whether the contribution of the $j$th neighbor is $R_j - \bar{R}_j$ or 0 (see discussion following Equation 7). If the ratings of some or all neighbors are unknown (i.e., the ratings that they would give if they were to rate the item), we can estimate them as their expected values given the item's attributes. In other words, the contribution of a neighbor with unknown rating will be $E[R_j \mid Y] - \bar{R}_j$. $P(R_j \mid Y)$ can be estimated using a naive Bayes model (assuming $R_j$ only takes on a small number of different values, which is usually the case). Let $\hat{R}_i(N_i)$ be the value of $\hat{R}_i$ obtained in this way. Then, treating this as a deterministic value,

$$
P(X_i \mid N_i) = \int_{R_{min}}^{R_{max}} P(X_i \mid \hat{R}_i, N_i)\, dP(\hat{R}_i \mid N_i) = P(X_i \mid \hat{R}_i(N_i), N_i) = P(X_i \mid \hat{R}_i(N_i))
\tag{8}
$$

All that remains is to estimate $P(X_i \mid \hat{R}_i)$. This can be viewed as a univariate regression problem, with $\hat{R}_i$ as the input and $P(X_i \mid \hat{R}_i)$ as the output. The most appropriate functional form for this regression will depend on the observed data. In the experiments described below, we used a piecewise-linear model for $P(X_i \mid \hat{R}_i)$, obtained by dividing $\hat{R}_i$'s range into bins, computing the mean $\hat{R}_i$ and $P(X_i \mid \hat{R}_i)$ for each bin, and then estimating $P(X_i \mid \hat{R}_i)$ for an arbitrary $\hat{R}_i$ by interpolating linearly between the two nearest means. Given a small number of bins, this approach can fit a wide variety of observations relatively well, with little danger of overfitting.

Notice that the technical definition of a Markov random field requires that the neighborhood relation be symmetric (i.e., if $i$ is a neighbor of $j$, then $j$ is also a neighbor of $i$), but in a collaborative filtering system this may not be the case. The probabilistic model obtained from it in the way described will then be an instance of a dependency network, a generalization of Markov random fields recently proposed by Heckerman et al. [17]. Heckerman et al. show that Gibbs sampling applied to such a network defines a joint distribution from which all probabilities of interest can be computed. While in our experimental studies Gibbs sampling and relaxation labeling produced very similar results, the formal derivation of the properties of dependency networks under relaxation labeling is a matter for future research.

4. EMPIRICAL STUDY

We have applied the methodology described in the previous sections to the problem of marketing motion pictures, using the EachMovie collaborative filtering database (www.research.compaq.com/src/eachmovie/). EachMovie contains 2.8 million ratings of 1628 movies by 72916 users, gathered between January 29, 1996 and September 15, 1997 by the eponymous recommendation site, which was run by the DEC (now Compaq) Systems Research Center. EachMovie is publicly available, and has become a standard database for evaluating collaborative filtering systems (e.g., Breese et al. [3]). Motion picture marketing is an interesting application for the techniques we propose because the success of a movie is known to be strongly driven by word of mouth [12].

EachMovie is composed of three databases: one containing the ratings, one containing demographic information about the users (which we did not use), and one containing information about the movies. The latter includes the movie's title, studio, theater and video status (old or current), theater and video release dates, and ten Boolean attributes describing the movie's genre (action, animation, art/foreign, classic, comedy, drama, family, horror, romance, and thriller; a movie can have more than one genre). The movie's URL in the Internet Movie Database (www.imdb.com) is also included. This could be used to augment the movie description with attributes extracted from the IMDB; we plan to do so in the future. The ratings database contains an entry for each movie that each user rated, on a scale of zero to five stars, and the time and date on which the rating was generated.
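As a rough illustration of how the counting-based estimates of Section 3 might be derived from rating entries of this kind, the Python sketch below treats "user $i$ rated the movie" as $X_i = 1$ and tabulates $P(X_i = 1)$ as the fraction of movies rated, together with $P(Y_k = 1 \mid X_i = 1)$ for Boolean attributes. The tuple layout `(user, movie, stars)`, the attribute dictionary, and the function name `estimate_by_counting` are assumptions made for this example and are not taken from the EachMovie distribution; no smoothing is applied.

```python
from collections import defaultdict


def estimate_by_counting(ratings, movie_attrs):
    """Estimate P(X_i = 1) and P(Y_k = 1 | X_i = 1) by counting.

    ratings: iterable of (user, movie, stars) entries (hypothetical layout).
    movie_attrs: dict mapping movie -> tuple of 0/1 attribute values.
    """
    n_movies = len(movie_attrs)
    n_attrs = len(next(iter(movie_attrs.values())))

    # Rating a movie is taken to mean X_i = 1 for that movie.
    rated = defaultdict(set)
    for user, movie, _stars in ratings:
        rated[user].add(movie)

    prior = {}
    attr_given_rated = {}
    for user, movies in rated.items():
        # Prior: fraction of all movies that this user rated.
        prior[user] = len(movies) / n_movies
        # Conditional: how often each attribute is 1 among the rated movies.
        counts = [0] * n_attrs
        for movie in movies:
            for k, value in enumerate(movie_attrs[movie]):
                counts[k] += value
        attr_given_rated[user] = [count / len(movies) for count in counts]
    return prior, attr_given_rated
```

The same pass could also accumulate counts for $X_i = 0$ (movies the user did not rate) if both branches of the naive Bayes model are needed.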
The collaborative filtering algorithm used in EachMovie has not been published, but we will assume that the algorithm described in the previous section is a reasonable approximation to it. This assumption is supported by the observation that, despite their variety in form, all the many collaborative filtering algorithms proposed attempt to capture essentially the same information (namely, correlations between users).
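The assumed algorithm (Equations 6 and 7) can be sketched in Python as follows. This is only a minimal illustration, not the EachMovie implementation, which is unpublished. The nested-dictionary layout `ratings[user][item]`, the function names, the choice to return a similarity of 0 when fewer than two items are co-rated, and the use of each user's overall mean in the prediction step are all assumptions of this example.

```python
import math


def pearson(ratings_i, ratings_j):
    """Equation 6: Pearson correlation over the items both users rated."""
    common = set(ratings_i) & set(ratings_j)
    if len(common) < 2:
        return 0.0  # assumption: too little overlap to define a correlation
    mean_i = sum(ratings_i[k] for k in common) / len(common)
    mean_j = sum(ratings_j[k] for k in common) / len(common)
    num = sum((ratings_i[k] - mean_i) * (ratings_j[k] - mean_j) for k in common)
    var_i = sum((ratings_i[k] - mean_i) ** 2 for k in common)
    var_j = sum((ratings_j[k] - mean_j) ** 2 for k in common)
    den = math.sqrt(var_i * var_j)
    return num / den if den > 0 else 0.0


def user_mean(user_ratings):
    """Mean of all ratings given by one user."""
    return sum(user_ratings.values()) / len(user_ratings)


def predict(user, item, ratings, n_neighbors=5):
    """Equation 7: user's mean plus a normalized weighted sum of the
    neighbors' deviations from their own means."""
    weights = {j: pearson(ratings[user], ratings[j])
               for j in ratings if j != user}
    # Neighbors are the n_neighbors users with the highest similarity.
    neighbors = sorted(weights, key=weights.get, reverse=True)[:n_neighbors]
    norm = sum(abs(weights[j]) for j in neighbors)
    if norm == 0:
        return user_mean(ratings[user])
    total = 0.0
    for j in neighbors:
        if item in ratings[j]:
            total += weights[j] * (ratings[j][item] - user_mean(ratings[j]))
        # A neighbor who did not rate the item contributes zero,
        # which is equivalent to setting R_jk to the neighbor's mean.
    return user_mean(ratings[user]) + total / norm
```

For example, with `ratings = {"a": {"x": 5, "y": 1, "z": 3}, "b": {"x": 4, "y": 2, "z": 3, "w": 5}}`, calling `predict("a", "w", ratings)` uses `b` as the only neighbor of `a`.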
Figure 1: Empirical distribution of $\hat{R}_i$ and $X_i$ given $\hat{R}_i$.

The meaning of the variables in the EachMovie domain is as follows: $X_i$ is whether person $i$ saw the movie being considered. $Y$ contains the movie attributes. $R_i$ is the rating (zero to five stars) given to the movie by person $i$. For simplicity, throughout this section we assume the $\hat{R}_i$'s are centered at zero (i.e., $\bar{R}_i$ has been subtracted from $\hat{R}_i$; see Equation 7).

4.1 The Model

We used $Y = \{Y_1, Y_2, \ldots, Y_{10}\}$, the ten Boolean movie genre attributes. Thus $P(Y \mid X_i)$ was in essence a model of a user's genre preferences, and during inference two movies with the same genre attributes were indistinguishable. The network consisted of all people who had rated at least ten movies, and whose ratings had non-zero standard deviation (otherwise they contained no useful information). Neighbor weights $W_{ij}$ were determined using a modified Pearson correlation coefficient, which penalized the correlation by 0.05 for each movie less than ten that both $X_i$ and $X_j$ had rated. This correction is commonly used in collaborative filtering systems to avoid concluding that two users are very highly correlated simply because they rated very few movies in common, and by chance rated them similarly [18]. The neighbors of $X_i$ were the $X_j$'s for which $W_{ji}$ was highest. With $n_i = 5$, a number we believe provides a reasonable tradeoff between model accuracy and speed, the average $W_{ji}$ of neighbors was 0.91. Repeating the experiments described below with $n_i = 10$ and $n_i = 20$ produced no significant change in model accuracy, and small improvements in profit. Interestingly, the network obtained in each case was completely connected (i.e., it contained no isolated subgraphs).

As discussed above, the calculation of $P(X_i \mid X^k, Y, M)$ requires estimating $P(X_i \mid \hat{R}_i)$, $P(X_i)$, $P(M_i \mid X_i)$, $P(Y_k \mid X_i)$, and $P(R_j \mid Y)$. $P(X_i)$ is simply the fraction of movies $X_i$ rated. We used a naive Bayes model for $P(R_j \mid Y)$. $P(Y_k \mid X_i)$, $P(R_j \mid Y)$, and $P(X_i)$ were all smoothed using an m-estimate [5] with $m = 1$ and the population average as the prior. We did not know the true values of $P(M_i \mid X_i)$. We expected marketing to have a larger effect on a customer who was already inclined to see the movie, and thus we set the probabilities $P(M_i \mid X_i)$ so as to obtain

$$
P(X_i = 1 \mid M_i = 1) = \min\{\alpha\, P(X_i = 1 \mid M_i = 0),\ 1\}
\tag{9}
$$

where $\alpha > 1$ is a parameter that we varied in the experiments described below.² As described in the previous section, $P(X_i \mid \hat{R}_i)$ was modeled using a piecewise linear function. We measured $P(X_i \mid \hat{R}_i)$ for each of nine bins, whose boundaries were −5.0, −2.0, −1.0, −0.5, −0.1, 0.1, 0.5, 1.0, 2.0, and 5.0. Note that while $R_i$ must be between 0 and 5, $\hat{R}_i$ is a weighted sum of the neighbors' difference from their average, and thus may range from −5 to 5. We also had a zero-width bin located at $\hat{R}_i = 0$. Movies were seen with low probability (1-5%), and thus there was a high probability that a movie had not been rated by any of $X_i$'s neighbors. In the absence of a rating, a neighbor's contribution to $\hat{R}_i$ was zero. 84% of the samples fell into this zero bin. Bin boundaries were chosen by examination of the distribution of data in the training set, shown in Figure 1. $\hat{R}_i$ was unlikely to deviate far from 0, for the reasons given above. We used narrow bins near $\hat{R}_i = 0$ to obtain higher accuracy in this area, which contained a majority of the data (96.4% of the data fell between −0.5 and 0.5). To combat data sparseness, both $P(X_i \mid \hat{R}_i)$ and the per-bin mean $\hat{R}_i$ were smoothed for each bin using an m-estimate with $m = 1$ and the population average as the prior.

² To fully specify $P(M_i \mid X_i)$ we used the additional constraint that $P(Y, M_i = 1) = P(Y, M_i = 0)$. With the values of $\alpha$ we used it was always possible to satisfy Equation 9 and this constraint simultaneously.

Initially, we expected $P(X_i \mid \hat{R}_i)$ to increase monotonically with $\hat{R}_i$. The actual shape, shown in Figure 1, shows increasing $P(X_i \mid \hat{R}_i)$ as $\hat{R}_i$ moves significantly away from 0 in either direction. This shape is due to a correlation between $|\hat{R}_i|$ and the popularity of a movie: for a popular movie, $\hat{R}_i$ is more likely to deviate further from zero and $X_i$ is more likely to be 1. Note, however, that $P(X_i \mid \hat{R}_i)$ is indeed monotonically increasing in the $[-0.1, 0.1]$ interval, where the highest density of ratings is. Furthermore, $E[P(X_i \mid \hat{R}_i > 0)] = 0.203 > 0.176 = E[P(X_i \mid \hat{R}_i < 0)]$.

4.2 The Data

While the EachMovie database is large, it has problems which had to be overcome. The movies in the database which were in theaters before January 1996 were drawn from a long time period, and so tended to be very well known movies. Over 75% (2.2 million) of the ratings were on these movies. In general, the later a movie was released, the fewer ratings and thus the less information we had for it. We divided the database into a training set consisting of all ratings received through September 1, 1996, and a test set consisting of all movies released between September 1, 1996 and December 31, 1996, with the ratings received
for those movies any time between September 1, 1996 and the end of the database. Because there was such a large difference in average movie popularity between the early movies and the later ones, we further divided the training set into two subsets: $S_{old}$, containing movies released before January 1996 (1.06 million votes), and $S_{recent}$, containing movies released between January and September 1996 (90k votes). The average movie viewership of $S_{old}$ was 5.6%, versus 1.4% for $S_{recent}$. Since 92% of the training data was in $S_{old}$, we could not afford to ignore it. However, in terms of the probability that someone rates a movie, the test period could be expected to be much more similar to $S_{recent}$. Thus, we trained using all training data, then rescaled $P(X_i)$ and $P(X_i \mid \hat{R}_i)$ using $S_{recent}$, and smoothed these values using an m-estimate with $m = 1$ and the distribution on the full training set as the prior.

Many movies in the test set had very low probability (36% were viewed by 10 people or less, and 48% were viewed by 20 people or less, out of over 20748 people³). Since it is not possible to model such low probability events with any reliability, we removed all movies which were viewed by fewer than 1% of the people. This left 737,579 votes over 462 movies for training, and 3912 votes over 12 movies for testing. $P(Y \mid X_i)$, $P(R_j \mid Y)$, $P(X_i)$, and $P(X_i \mid \hat{R}_i)$ were learned using only these movies. However, because the EachMovie collaborative filtering system presumably used all movies, we used all movies when simulating it (i.e., when computing similarities (Equation 6), selecting neighbors, and predicting ratings (Equation 7)).

A majority of the people in the EachMovie database provided ratings once, and never returned. These people affected the predicted ratings $\hat{R}_i$ seen by users of EachMovie, but because they never returned to the system for queries, their movie viewing choices were not affected by their neighbors. We call these people inactive. A person was marked as inactive if there were more than $\tau$ days between her last rating and the end of the training period. In our tests, we used a $\tau$ of 60, which resulted in 11163 inactive people. Inactive people could be marketed to, since they were presumably still watching movies; they were just not reporting ratings to EachMovie. If an inactive person was marketed to, she was assumed to have no effect on the rest of the network.

binations only need to be computed once. Further, since in a single search step only one $M_i$ changes, most of the results of one step can be reused in the next, greatly speeding up the search process. With these optimizations, we were able to measure the effect of over 10,000 single changes in $M$ per second, on a 1 GHz Pentium III machine. In preliminary experiments, we found relaxation labeling carried out this way to be several orders of magnitude faster than Gibbs sampling; we expect that it would also be much faster than the more efficient version of Gibbs sampling proposed in Heckerman et al. [17].⁴ The relaxation labeling process typically converged quite quickly; few nodes ever required more than a few updates.

4.4 Model Accuracy

To test the accuracy of our model, we computed the estimated probability $P(X_i \mid X^k, Y, M)$ for each person $X_i$ with $M = M_0$ and $X^k = \emptyset$. We measured the correlation between this and the actual value of $X_i$ in the test set, over all movies, over all people.⁵ (Note that, since the comparison is with test set values, we did not expect to receive ratings from inactive people, and therefore $P(X_i \mid Y) = 0$ for them.) The resulting correlation was 0.18. Although smaller than desirable, this correlation is remarkably high considering that the only input to the model was the movie's genre. We expect the correlation would increase if a more informative set of movie attributes $Y$ were used.

4.5 Network Values

For the first movie in the test set ("Space Jam"), we measured the network value for all 9585 active people⁶ in the following scenario (see Equations 4 and 9): $r_0 = 1$, $r_1 = 0.5$, $c = 0.1$, $\alpha = 1.5$, and $M = M_0$. Figure 2 shows the 500 highest network values (out of 9585) in decreasing order. The unit of value in this graph is the average revenue that would be obtained by marketing to a customer in isolation, without costs or discounts. Thus, a network value of 20 for a given customer implies that by marketing to her we essentially get free marketing to an additional 20 customers. The scale of the graph depends on the marketing scenario (e.g., network values increase with $\alpha$), but the shape generally remains the same. The figure shows that a few users
4.3 Inference and Search have very high network value. This is the ideal situation for
the type of targeted viral marketing we propose, since we
Inference was performed by relaxation labeling, as de- can effectively market to many people while incurring only
scribed in Section 2. This involved iteratively re-estimating the expense of marketing to those few. A good customer
probabilities until they all converged to within a threshold 7. to market to is one who: (1) is likely to give the product
(We used 7 = 10-5.) We maintained a queue of nodes whose a high rating, (2) has a strong weight in determining the
probabilities needed to be re-estimated, which initially con- rating prediction for many of her neighbors, (3) has many
tained all nodes in the network. Each Xi was removed from neighbors who are easily influenced by the rating prediction
the queue in turn, and its probability was re-estimated using they receive, (4) will have a high probability of purchasing
Equation 2. If P(X~tX ~, Y , M ) had changed by more than the product, and thus will be likely to actually submit a rat-
% all nodes that Xi was a neighbor of that were not already ing t h a t will affect her neighbors, and finally (5) has many
in the queue were added to it. Note t h a t the probabilities neighbors with the same four characteristics outlined above,
of nodes corresponding to inactive people only needed to be
computed once, since they are independent of the rest of the 4In our experiments, one Gibbs cycle of sampling all the
network. nodes in the network took on the order of a fiftieth of a
The computation of Equation 2 can be sped up by noting second. The total runtime would be this value multiplied
that, after factoring, all terms involving the Yk's are con- by the number of sampling iterations desired and by the
stant throughout a run, and so these terms and their com- number of search steps.
5Simply measuring the predictive error rate would not be
3This is the number of people left after we removed anyone very useful, because a very low error rate could be obtained
who rated fewer than ten movies, rated movies only after simply by predicting t h a t no one sees the movie.
September 1996, or gave the same rating to all movies. eInactive people always have a network value of zero.
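
The queue-based relaxation labeling procedure described in Section 4.3 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation used in the experiments: the names `relaxation_labeling` and `toy_estimate`, the `max_updates` guard, the three-node network, and the averaging rule that stands in for Equation 2 are all hypothetical.

```python
from collections import deque


def relaxation_labeling(initial, dependents, estimate, gamma=1e-5,
                        max_updates=1_000_000):
    """Re-estimate node probabilities until no change exceeds gamma.

    initial    -- dict: node -> starting probability
    dependents -- dict: node -> nodes that have this node as a neighbor
    estimate   -- function(node, probs) -> new probability
                  (a stand-in for Equation 2, which is not reproduced here)
    """
    probs = dict(initial)
    updates = {node: 0 for node in probs}
    queue = deque(probs)          # the queue initially holds every node
    queued = set(probs)
    total = 0
    while queue and total < max_updates:
        node = queue.popleft()
        queued.discard(node)
        new_p = estimate(node, probs)
        changed = abs(new_p - probs[node]) > gamma
        probs[node] = new_p
        updates[node] += 1
        total += 1
        if changed:
            # Re-queue only the nodes whose estimate depends on this one.
            for other in dependents.get(node, ()):
                if other not in queued:
                    queue.append(other)
                    queued.add(other)
    return probs, updates


# Toy network (assumed): "a" and "b" influence each other, while "c" has
# no neighbors, mirroring an inactive person.
neighbors = {"a": ["b"], "b": ["a"], "c": []}
base = {"a": 0.2, "b": 0.6, "c": 0.3}


def toy_estimate(node, probs):
    """Hypothetical update: blend a base rate with the neighbors' mean."""
    nbrs = neighbors[node]
    if not nbrs:
        return base[node]
    mean = sum(probs[n] for n in nbrs) / len(nbrs)
    return 0.5 * base[node] + 0.5 * mean


# Invert the neighbor lists: dependents[n] lists nodes that n is a neighbor of.
dependents = {node: [] for node in neighbors}
for node, nbrs in neighbors.items():
    for n in nbrs:
        dependents[n].append(node)

probs, updates = relaxation_labeling(base, dependents, toy_estimate)
print(probs, updates)
```

In this toy run the isolated node is estimated exactly once, which matches the observation that inactive people only need to be computed once. The loop relies on the update rule settling down; the `max_updates` guard is there only to keep the sketch from running indefinitely if it does not.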
and so on recursively. In the movie domain, these correspond to finding a person who (1) will enjoy the movie, (2) has many close friends, who are (3) easily swayed, (4) will very likely see the movie if marketed to, and (5) has friends whose friends also have these properties.

Figure 2: Typical distribution of network values. [Plot: vertical axis from 0 to 50; horizontal axis: Rank.]

4.6 Marketing Experiments

We compared three marketing strategies: mass marketing, traditional direct marketing, and the network-based marketing method we proposed in Section 2. In mass marketing, all customers were marketed to (M_i = 1 for all i). In direct marketing, a customer X_i was marketed to (M_i = 1) if and only if ELP_i(X^k, Y, M_0) > 0 (see Equation 4) ignoring network effects (i.e., using the network-less probabilities P(X_i | Y, M)). For our approach, we compared the three approximation methods proposed in Section 2: single pass, greedy search and hill-climbing. Figure 3 compares these three search types and direct marketing on three different marketing scenarios. For all scenarios, r_0 = 1, which means profit numbers are in units of number of movies seen. In the free movie scenario r_1 = 0, and in the discounted movie scenario r_1 = 0.5. In both of these scenarios we assumed a cost of marketing of 10% of the revenue from a single sale: c = 0.1. In the advertising scenario no discount was offered (r_1 = 1), and a lower cost of marketing was assumed (corresponding, for example, to online marketing instead of physical mailings): c = 0.02. Notice that all the marketing actions considered were effectively in addition to the (presumably mass) marketing that was actually carried out for the movie. The average number of people who saw a movie given only this marketing (i.e., with M = M_0) was 311. The baseline profit would be obtained by subtracting from this the (unknown) original costs. The correct α for each marketing scenario was unknown, so we present the results for a range of values. We believe we have chosen plausible ranges, with a free movie providing more incentive than a discount, which in turn provides more incentive than simply advertising. X^k = ∅ in all experiments.

In all scenarios, mass marketing resulted in negative profits. Not surprisingly, it fared particularly poorly in the free and discounted movie scenarios, producing profits which ranged from -2057 to -2712. In the advertising scenario, mass marketing resulted in profits ranging from -143 to -381 (depending on the choice of α). In the case of a free movie offer, the profit from direct marketing could not be positive, since without network effects we were guaranteed to lose money on anyone who saw a movie for free. Figure 3 shows that our method was able to find profitable marketing opportunities that were missed by direct marketing. For the discounted movie, direct marketing actually resulted in a loss of profit. A customer that looked profitable on her own may actually have had a negative overall value. This situation demonstrates that not only can ignoring network effects cause missed marketing opportunities, but it can also make an unprofitable marketing action look profitable. In the advertising scenario, for small α our method increased profits only slightly, while direct marketing again reduced them. Both methods improved with increasing α, but our method consistently outperformed direct marketing.

As can be seen in Figure 3, greedy search produced results that were quite close to those of hill climbing. The average difference between greedy and hill-climbing profits (as a percentage of the latter) in the three marketing scenarios was 9.6%, 4.0%, and 0.0% respectively. However, as seen in Figure 3, the runtimes differed significantly, with hill-climbing time ranging from 4.6 minutes to 42.1 minutes while greedy-search time ranged from 3.8 to 5.5 minutes. The contrast was even more pronounced in the advertising scenario, where the profits found by the two methods were nearly identical, but hill climbing took 14 hours to complete, compared to greedy search's 6.7 minutes. Single-pass was the fastest method and was comparable in speed to direct marketing, but led to significantly lower profits in the free and discounted movie scenarios.

The lift in profit was considerably higher if all users were assumed to be active. In the free movie scenario, the lift in profit using greedy search was 4.7 times greater than when the network had inactive nodes. In the discount and advertising scenarios the ratio was 4.1 and 1.8, respectively. This was attributable to the fact that the more inactive neighbors a node had, the less responsive it could be to the network. From the point of view of an e-merchant applying our approach, this suggests modifying the collaborative filtering system to only assign active users as neighbors.

5. RELATED WORK

Social networks have been an object of study for some time, but previous work within sociology and statistics has suffered from a lack of data and focused almost exclusively on very small networks, typically in the low tens of individuals [41]. Interestingly, the Google search engine [4] and Kleinberg's (1998) HITS algorithm [26] for finding hubs and authorities on the Web are based on social network ideas. The success of these approaches, and the discovery of widespread network topologies with nontrivial properties [42], has led to a flurry of research on modeling the Web as a semi-random graph (e.g., Kumar et al. [28], Barabási et al. [1]). Some of this work might be applicable in our context.

In retrospect, the earliest sign of the potential of viral marketing was perhaps the classic paper by Milgram [31] estimating that every person in the world is only six edges away from every other, if an edge between i and j means "i knows j." Schwartz and Wood [37] mined social relationships from email logs. The ReferralWeb project mined a social network from a wide variety of publicly-available online information [24], and used it to help individuals find experts who could answer their questions. The COBOT project
gathered social statistics from participant interactions in the LambdaMoo MUD, but did not explicitly construct a social network from them [21]. A Markov random field formulation similar to Equation 2 was used by Chakrabarti et al. [6] for classification of Web pages, with pages corresponding to customers, hyperlinks between pages corresponding to influence between customers, and the bag of words in the page corresponding to properties of the product. Neville and Jensen [32] proposed a simple iterative algorithm for labeling nodes in social networks, based on the naive Bayes classifier. Cook and Holder [9] developed a system for mining graph-based data. Flake et al. [13] used graph algorithms to mine communities from the Web (defined as sets of sites that have more links to each other than to non-members).

Figure 3: Profits and runtimes obtained using different marketing strategies. [Panels: Free Movie, Advertising; horizontal axis: Alpha; series: hill, greedy, single-pass, direct.]

Several researchers have studied the problem of estimating a customer's lifetime value from data [22]. This line of research generally focuses on variables like an individual's expected tenure as a customer [30] and future frequency of purchases [15]. Customer networks have received some attention in the marketing literature [20]. Most of these studies are purely qualitative; where data sets appear, they are very small, and used only for descriptive purposes. Krackhardt [27] proposes a very simple model for optimizing which customers to offer a free sample of a product to. The model only considers the impact on the customer's immediate friends, ignores the effect of product characteristics, assumes the relevant probabilities are the same for all customers, and is only applied to a made-up network with seven nodes.

Collaborative filtering systems proposed in the literature include GroupLens [35], PHOAKS [40], Siteseer [36], and others. A list of collaborative filtering systems, projects and related resources can be found at www.sims.berkeley.edu/resources/collab/.

6. FUTURE WORK

The type of data mining proposed here opens up a rich field of directions for future research. In this section we briefly mention some of the main ones.

Although the network we have mined is large by the standards of previous research, much larger ones can be envisioned. Scaling up may be helped by developing search methods specific to the problem, to replace the generic ones we used here. Segmenting a network into more tractable parts with minimal loss of profit may also be important. Flake et al. [13] provide a potential way of doing this. A related approach would be to mine subnetworks with high profit potential embedded in larger ones. Recent work on mining significant Web subgraphs such as bipartite cores, cliques and webrings (e.g., [28]) provides a starting point. More generally, we would like to develop a characterization of network types with respect to the profit that can be obtained in them using an optimal marketing strategy. This
would, for example, help a company to better gauge the profit potential of a market before entering (or attempting to create) it.

In this paper we mined a network from a single source (a collaborative filtering database). In general, multiple sources of relevant information will be available; the ReferralWeb project [24] exemplified their use. Methods for combining diverse information into a sound representation of the underlying influence patterns are thus an important area for research. In particular, detecting the presence of causal relations between individuals (as opposed to purely correlational ones) is key. While mining causal knowledge from observational databases is difficult, there has been much recent progress [10, 39].

We have also assumed so far that the relevant social network is completely known. In many (or most) applications this will not be the case. For example, a long-distance telephone company may know the pattern of telephone calls among its customers, but not among its non-customers. However, it may be able to make good use of connections between customers and non-customers, or to take advantage of information about former customers. A relevant question is thus: what can be inferred from a (possibly biased) sample of nodes and their neighbors in a network? At the extreme where no detailed information about individual interactions is available, our method could be extended to apply to networks where nodes are groups of similar or related customers, and edges correspond to influence among groups.

Another promising research direction is towards more detailed node models and multiple types of relations between nodes. A theoretical framework for this could be provided by the probabilistic relational models of Friedman et al. [14]. We would also like to extend our approach to consider multiple types of marketing actions and product-design decisions, and to multi-player markets (i.e., markets where the actions of competitors must also be taken into account, leading to a game-like search process).

This paper considered making marketing decisions at a specific point in time. A more sophisticated alternative would be to plan a marketing strategy by explicitly simulating the sequential adoption of a product by customers given different interventions at different times, and adapting the strategy as new data on customer response arrives. A further time-dependent aspect of the problem is that social networks are not static objects; they evolve, and particularly on the Internet can do so quite rapidly. Some of the largest opportunities may lie in modeling and taking advantage of this evolution.

Once markets are viewed as social networks, the inadequacy of random sampling for pilot tests of products subject to strong network effects (e.g., smart cards, video on demand) becomes clear. Developing a better methodology for studies of this type could help avoid some expensive failures.

Many e-commerce sites already routinely use collaborative filtering. Given that the infrastructure for data gathering and for inexpensive execution of marketing actions (e.g., making specific offers to specific customers when they visit the site) is already in place, these would appear to be good candidates for a real-world test of our method. The greatest potential, however, may lie in knowledge-sharing and customer review sites like epinions.com, because the interaction between users is richer and stronger there. For example, it may be profitable for a company to offer its products at a loss to influential contributors to such sites. Our method is also potentially applicable beyond marketing, to promoting any type of social change for which the relevant network of influence can be mined from available data. The spread of online interaction creates unprecedented opportunities for the study of social information processing; our work is a step towards better exploiting this new wealth of information.

7. CONCLUSION

This paper proposed the application of data mining to viral marketing. Viewing customers as nodes in a social network, we modeled their influence on each other as a Markov random field. We developed methods for mining social network models from collaborative filtering databases, and for using these models to optimize marketing decisions. An empirical study using the EachMovie collaborative filtering database confirmed the promise of this approach.

8. REFERENCES

[1] A. L. Barabási, R. Albert, and H. Jeong. Scale-free characteristics of random networks: The topology of the World Wide Web. Physica A, 281:69-77, 2000.
[2] J. Besag. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society, Series B, 36:192-236, 1974.
[3] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, Madison, WI, 1998. Morgan Kaufmann.
[4] S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. In Proceedings of the Seventh International World Wide Web Conference, Brisbane, Australia, 1998. Elsevier.
[5] B. Cestnik. Estimating probabilities: A crucial task in machine learning. In Proceedings of the Ninth European Conference on Artificial Intelligence, pages 147-149, Stockholm, Sweden, 1990. Pitman.
[6] S. Chakrabarti, B. Dom, and P. Indyk. Enhanced hypertext categorization using hyperlinks. In Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, pages 307-318, Seattle, WA, 1998. ACM Press.
[7] R. Chellappa and A. K. Jain, editors. Markov Random Fields: Theory and Application. Academic Press, Boston, MA, 1993.
[8] D. M. Chickering and D. Heckerman. A decision theoretic approach to targeted advertising. In Proceedings of the Sixteenth Annual Conference on Uncertainty in Artificial Intelligence, Stanford, CA, 2000. Morgan Kaufmann.
[9] D. J. Cook and L. B. Holder. Graph-based data mining. IEEE Intelligent Systems, 15:32-41, 2000.
[10] G. F. Cooper. A simple constraint-based algorithm for efficiently mining observational databases for causal relationships. Data Mining and Knowledge Discovery, 1:203-224, 1997.
[11] P. Domingos and M. Pazzani. On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29:103-130, 1997.
[12] R. Dye. The buzz on buzz. Harvard Business Review, 78(6):139-146, 2000.
[13] G. W. Flake, S. Lawrence, and C. L. Giles. Efficient identification of Web communities. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 150-160, Boston, MA, 2000. ACM Press.
[14] N. Friedman, L. Getoor, D. Koller, and A. Pfeffer. Learning probabilistic relational models. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, pages 1300-1307, Stockholm, Sweden, 1999. Morgan Kaufmann.
[15] K. Gelbrich and R. Nakhaeizadeh. Value Miner: A data mining environment for the calculation of the customer lifetime value with application to the automotive industry. In Proceedings of the Eleventh European Conference on Machine Learning, pages 154-161, Barcelona, Spain, 2000. Springer.
[16] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721-741, 1984.
[17] D. Heckerman, D. M. Chickering, C. Meek, R. Rounthwaite, and C. Kadie. Dependency networks for inference, collaborative filtering, and data visualization. Journal of Machine Learning Research, 1:49-75, 2000.
[18] J. Herlocker, J. Konstan, A. Borchers, and J. Riedl. An algorithmic framework for performing collaborative filtering. In Proceedings of the 1999 Conference on Research and Development in Information Retrieval, Berkeley, CA, 1999.
[19] A. M. Hughes. The Complete Database Marketer: Second-Generation Strategies and Techniques for Tapping the Power of your Customer Database. Irwin, Chicago, IL, 1996.
[20] D. Iacobucci, editor. Networks in Marketing. Sage, Thousand Oaks, CA, 1996.
[21] C. L. Isbell, Jr., M. Kearns, D. Korman, S. Singh, and P. Stone. Cobot in LambdaMOO: A social statistics agent. In Proceedings of the Seventeenth National Conference on Artificial Intelligence, pages 36-41, Austin, TX, 2000. AAAI Press.
[22] D. R. Jackson. Strategic application of customer lifetime value in direct marketing. Journal of Targeting, Measurement and Analysis for Marketing, 1:9-17, 1994.
[23] S. Jurvetson. What exactly is viral marketing? Red Herring, 78:110-112, 2000.
[24] H. Kautz, B. Selman, and M. Shah. ReferralWeb: Combining social networks and collaborative filtering. Communications of the ACM, 40(3):63-66, 1997.
[25] R. Kindermann and J. L. Snell. Markov Random Fields and Their Applications. American Mathematical Society, Providence, RI, 1980.
[26] J. M. Kleinberg. Authoritative sources in a hyperlinked environment. In Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 668-677, Baltimore, MD, 1998. ACM Press.
[27] D. Krackhardt. Structural leverage in marketing. In D. Iacobucci, editor, Networks in Marketing, pages 50-59. Sage, Thousand Oaks, CA, 1996.
[28] R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Extracting large-scale knowledge bases from the Web. In Proceedings of the Twenty-Fifth International Conference on Very Large Databases, pages 639-650, Edinburgh, Scotland, 1999. Morgan Kaufmann.
[29] C. X. Ling and C. Li. Data mining for direct marketing: Problems and solutions. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, pages 73-79, New York, NY, 1998. AAAI Press.
[30] D. R. Mani, J. Drew, A. Betz, and P. Datta. Statistics and data mining techniques for lifetime value modeling. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 94-103, New York, NY, 1999. ACM Press.
[31] S. Milgram. The small world problem. Psychology Today, 2:60-67, 1967.
[32] J. Neville and D. Jensen. Iterative classification in relational data. In Proceedings of the AAAI-2000 Workshop on Learning Statistical Models from Relational Data, pages 42-49, Austin, TX, 2000. AAAI Press.
[33] L. Pelkowitz. A continuous relaxation labeling algorithm for Markov random fields. IEEE Transactions on Systems, Man and Cybernetics, 20:709-715, 1990.
[34] G. Piatetsky-Shapiro and B. Masand. Estimating campaign benefits and modeling lift. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 185-193, San Diego, CA, 1999. ACM Press.
[35] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. GroupLens: An open architecture for collaborative filtering of netnews. In Proceedings of the ACM 1994 Conference on Computer Supported Cooperative Work, pages 175-186, New York, NY, 1994. ACM Press.
[36] J. Rucker and M. J. Polanco. Siteseer: Personalized navigation for the web. Communications of the ACM, 40(3):73-76, 1997.
[37] M. F. Schwartz and D. C. M. Wood. Discovering shared interests using graph analysis. Communications of the ACM, 36(8):78-89, 1993.
[38] C. Shapiro and H. R. Varian. Information Rules: A Strategic Guide to the Network Economy. Harvard Business School Press, Boston, MA, 1999.
[39] C. Silverstein, S. Brin, R. Motwani, and J. Ullman. Scalable techniques for mining causal structures. Data Mining and Knowledge Discovery, 4:163-192, 2000.
[40] L. Terveen, W. Hill, B. Amento, D. McDonald, and J. Creter. PHOAKS: A system for sharing recommendations. Communications of the ACM, 40(3):59-62, 1997.
[41] S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, Cambridge, UK, 1994.
[42] D. J. Watts and S. H. Strogatz. Collective dynamics of "small-world" networks. Nature, 393:440-442, 1998.