Privacy-Preserving Social Media Data Publishing For Personalized Ranking-Based Recommendation
Abstract—Personalized recommendation is crucial to help users find pertinent information. It often relies on a large collection of user
data, in particular users’ online activity (e.g., tagging/rating/checking-in) on social media, to mine user preference. However, releasing
such user activity data makes users vulnerable to inference attacks, as private data (e.g., gender) can often be inferred from the users’
activity data. In this paper, we propose PrivRank, a customizable and continuous privacy-preserving social media data publishing
framework protecting users against inference attacks while enabling personalized ranking-based recommendations. Its key idea is to
continuously obfuscate user activity data such that the privacy leakage of user-specified private data is minimized under a given data
distortion budget, which bounds the ranking loss incurred from the data obfuscation process in order to preserve the utility of the data
for enabling recommendations. An empirical evaluation on both synthetic and real-world datasets shows that our framework can
efficiently provide effective and continuous protection of user-specified private data, while still preserving the utility of the obfuscated
data for personalized ranking-based recommendation. Compared to state-of-the-art approaches, PrivRank achieves both better
privacy protection and higher utility in all the ranking-based recommendation use cases we tested.
Index Terms—Privacy-preserving data publishing, Customized privacy protection, Personalization, Ranking-based recommendation,
Social media, Location-based social networks
1 INTRODUCTION
personalized recommendations. In addition to such public data, these services may require optional access to users' profiles. While some privacy-conscious users want to keep certain data from their profiles (e.g., gender) private, other non-privacy-conscious users may not care about the same type of private data and choose to release it. Consequently, an adversary could illegitimately infer the private data of the privacy-conscious users by learning the correlation between the public and the private data of the non-privacy-conscious users. Therefore, it is indispensable to provide privacy protection when releasing user public data from social media.

In this paper, we study the problem of privacy-preserving publishing of user social media data by considering both the specific requirements of user privacy on social media and the data utility for enabling high-quality personalized recommendation. Towards this goal, we face the following three challenges. First, since users often have different privacy concerns [5], a specific type of data (e.g., gender) may be considered private by some users, while other users may prefer to treat it as public in order to get better personalized services. Therefore, the first challenge is to provide users with customizable privacy protection, i.e., to protect user-specified private data only. Second, when subscribing to third-party services, users often allow the service providers to access not only their historical public data, but also their future public data as a data stream. Although obfuscating the historical public data can efficiently reduce privacy leakage, the continuous release of the user activity feed incrementally increases such leakage (see Figure 7 for details). Therefore, the second challenge is to provide continuous privacy protection over user activity data streams. Third, we consider the case of ranking-based (or top-N) recommendation, which is more practical and has been widely adopted by many e-commerce platforms [6]. As ranking-based recommendation algorithms mainly leverage the ranking of items for preference prediction, they are sensitive to the ranking loss incurred by the data obfuscation process. However, computing ranking losses often implies a high cost that is super-linear in the number of items used for recommendation [7]. Therefore, the third challenge is to efficiently bound the ranking loss in data obfuscation.

To overcome the above challenges, we propose PrivRank, a customizable and continuous privacy-preserving data publishing framework that protects users against inference attacks while enabling personalized ranking-based recommendation. It provides continuous protection of user-specified private data against inference attacks by obfuscating both the historical and streaming user activity data before releasing them, while still preserving the utility of the published data for personalized ranking-based recommendation by efficiently limiting the pairwise ranking loss incurred by data obfuscation. Our main contributions are summarized as follows:
• First, considering the use case of recommendation based on social media data, we identify a privacy-preserving data publishing problem by analyzing the specific privacy requirements and user benefits of social media.
• Second, we propose a customizable and continuous data obfuscation framework for user activity data on social media. The key idea is to measure the privacy leakage of user-specified private data from public data based on mutual information, and then to obfuscate the public data such that the privacy leakage is minimized under a given data distortion budget, which ensures the utility of the released data. To handle the real-world use case of third-party services built on top of social media, our framework considers both historical and online user activity data:
– Historical data publishing: When a user subscribes to a third-party service for the first time, the service provider has access to the user's entire historical public data. To obfuscate the user's historical data, we minimize the privacy leakage from her historical data by obfuscating her data using data from another user whose historical data is similar but exhibits less privacy leakage.
– Online data publishing: After the user has subscribed to third-party services, the service provider also has real-time access to her future public data stream. For efficiency reasons, online data publishing should be performed based on incoming data instances only (e.g., a rating/tagging/check-in activity on an item), without accessing the user's historical data. Therefore, we minimize the privacy leakage from each individual activity data instance by obfuscating the data stream on-the-fly.
• Third, to guarantee the utility of the obfuscated data for personalized ranking-based recommendation, we measure and bound the data distortion using a pairwise ranking loss metric, i.e., the Kendall-τ rank distance [8]. To efficiently incorporate this ranking loss, we propose a bootstrap sampling process to quickly approximate the Kendall-τ distance.
• Finally, we conduct an extensive empirical evaluation of PrivRank. The results show that PrivRank can continuously provide customized protection of user-specified private data, while the obfuscated data can still be exploited to enable high-quality personalized ranking-based recommendation.

The rest of the paper is organized as follows. We present the related work in Section 2 and the preliminaries of our work in Section 3. Afterward, we first define our threat model in Section 4, and then present our historical and online data publishing methods in Sections 5 and 6, respectively. The experimental evaluation is reported in Section 7. We conclude our work in Section 8.

2 RELATED WORK
To protect user privacy when publishing user data, the current practice mainly relies on policies or user agreements, e.g., on the use and storage of the published data [4]. However, this approach cannot guarantee that the users' sensitive information is actually protected from a malicious attacker. Therefore, to provide effective privacy protection when releasing user data, privacy-preserving data publishing has been widely studied. Its key idea is to obfuscate user data such that the published data remains useful for some application scenarios while the individual's privacy is preserved. According to the attacks considered, existing work can be classified into two categories.

The first category is based on heuristic techniques to protect ad-hoc defined user privacy [4]. Specific solutions mainly tackle the privacy threat arising when attackers are able to link the data owner's identity to a record, or to an attribute in the published data. For example, to protect user privacy from identity disclosure, k-anonymity [9] obfuscates the released data so that each record cannot be distinguished from at least k-1 other records. However, since these techniques usually rely on ad-hoc privacy definitions, they have been proven to be non-universal and are only successful against limited adversaries [10].

The second category is theory-based and focuses on the uninformative principle [11], i.e., on the fact that the published data should provide attackers with as little additional information as possible beyond their background knowledge. Differential privacy [12] is a well-known technique that guarantees user privacy against attackers with arbitrary background knowledge. Information-theoretic privacy protection approaches have also been proposed in that context. They try to quantitatively measure privacy leakage based on various entropy-based metrics such as conditional entropy [10] and mutual information [13], and to design privacy-protection mechanisms based on those measures. Although the concept of differential privacy is stricter (i.e., it holds against attackers with arbitrary background knowledge) than that of information-theoretic approaches, the latter is intuitively more accessible and fits the practical requirements of many application domains [10]. In particular, information theory provides intuitive guidelines to quantitatively measure the amount of a user's private information that an adversary can learn by observing and analyzing the user's public data (i.e., the privacy leakage of private data from public data).

In this study, we advocate the information-theoretic approach. Specifically, we measure the privacy leakage of user private data from public data based on mutual information, and then obfuscate public data such that the privacy leakage is minimized under a given data distortion budget. In the current literature, existing data obfuscation methods mainly ensure data utility by bounding the data distortion using metrics such as the Euclidean distance [14], the squared L2 distance [15], the Hamming distance [1] or the Jensen-Shannon distance [2]. They are analogous to limiting the loss of predicting user ratings on items, where the goal is to minimize the overall difference (e.g., mean absolute error) between the predicted ratings and the real ratings from the users. Although minimizing such a rating prediction error is widely adopted by the research community, ranking-based (or top-N) recommendation is more practical and is actually adopted by many e-commerce platforms [6]. Specifically, different from rating prediction, which tries to infer how users rate items, ranking-based recommendation tries to determine a ranked list of items for the user, where the top items are most likely to be appealing to her. We argue that bounding data distortion using traditional metrics is therefore not optimal for ranking-based recommendation, whose goal is to minimize the ranking difference (e.g., pairwise ranking loss or mean average precision) between the predicted ranking list and the actual list from the users. Hence, different from existing methods that bound data distortion using non-ranking-based measures, our approach bounds the ranking loss incurred by the data obfuscation process using the Kendall-τ rank distance [8] to preserve the utility of the published data for personalized ranking-based recommendation. In addition, as the computation of ranking losses often implies a high cost that is super-linear in the number of items used for recommendation [7], we develop a bootstrap sampling process to quickly approximate the Kendall-τ distance.

Compared to our previous work [2], this paper makes the following improvements: 1) we extend the scope of our privacy-preserving data publishing problem from location-based social networks to general social media; 2) we improve the data utility guarantee by explicitly considering the use case of personalized ranking-based recommendation, and re-design the privacy-preserving data publishing framework by bounding ranking loss; 3) we discuss and compare different types of ranking losses, select the Kendall-τ distance, and propose a bootstrap sampling process for its fast approximation; 4) we re-design and conduct new experiments with two ranking-based recommendation use cases to show the effectiveness of our framework and its superiority over our previous work [2] for enabling ranking-based recommendations; 5) we conduct a thorough scalability study with synthetic datasets, and show that PrivRank can scale up to large datasets.

3 PRELIMINARIES

3.1 System Workflow

Figure 1 illustrates the end-to-end workflow of our system. PrivRank is implemented as a supplementary module on top of existing social media platforms, in order to let users enjoy high-quality personalized recommendations from third-party services under a customized privacy guarantee.
1) When users interact with each other via a social media service, they voluntarily share their activity data, particularly tagging/rating/check-in activities, which strongly imply their preferences.
2) When a user wants to subscribe to third-party services, she typically needs to give them access to such activity data. Specifically, right after the user's subscription, third-party services can immediately access the user's historical activity data. Before releasing this data, and according to the user's own criteria, the historical data publishing module obfuscates her historical activity data to protect user-specified private data against inference attacks. Afterward, when the user continuously reports her activity on the social media platform, the online data publishing module obfuscates each activity (e.g., adding a tag to a photo, rating a movie, or checking in at a POI) from her activity stream before sending it to third-party services. All data obfuscation is performed under a utility guarantee for personalized ranking-based recommendation, by limiting the ranking loss incurred by data obfuscation.
3) Despite receiving obfuscated public data, the third-party services can still provide high-quality personalized ranking-based recommendation to the users.
Fig. 2. A toy example of ranking loss. The real user rating a on three items can be obfuscated to â1 or â2. While the Euclidean distances of both obfuscations from a are exactly the same, the ranking losses of the two obfuscations differ. Compared to the ranking list i1 < i2 < i3 in the original rating a, the obfuscated rating â1 does not incur any ranking loss, as we still observe i1 < i2 < i3. However, the obfuscated rating â2 incurs a certain ranking loss, as we find i1 < i3 < i2 there.
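To make the figure's point concrete, here is a minimal numeric sketch; the rating values below are our own illustrative choices (the figure's actual numbers are not recoverable from the text):

```python
import numpy as np

# Hypothetical ratings on items (i1, i2, i3); the figure's real values are unknown.
a      = np.array([3.0, 4.0, 5.0])  # original rating, ranking i1 < i2 < i3
a_hat1 = np.array([3.0, 4.0, 7.0])  # obfuscation preserving the ranking
a_hat2 = np.array([3.0, 6.0, 5.0])  # obfuscation swapping i2 and i3

def kendall_tau(v, w):
    """Normalized Kendall-tau distance: fraction of discordant item pairs."""
    n = len(v)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    discordant = sum((v[i] - v[j]) * (w[i] - w[j]) < 0 for i, j in pairs)
    return discordant / len(pairs)

print(np.linalg.norm(a_hat1 - a), np.linalg.norm(a_hat2 - a))  # 2.0 2.0 (equal distortion)
print(kendall_tau(a, a_hat1), kendall_tau(a, a_hat2))          # 0.0 vs. 0.333...
```

Both obfuscations move the rating vector by the same Euclidean distance, yet only the second one flips a pairwise preference, which is exactly what the Kendall-τ distance captures.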
As noted above, we use the probabilistic obfuscation function $p_{\hat{X}|X}$ to generate the released public data $\hat{X}$. Therefore, the joint probability of $\hat{X}$ and $Y$ can be computed as:

$$p(\hat{x}, y) = \sum_{x \in X} p_{\hat{X}|X}(\hat{x}|x)\, p_{X,Y}(x, y) \qquad (6)$$

The marginal probabilities $p_{\hat{X}}(\hat{x})$, $p_X(x)$ and $p_Y(y)$ can be calculated as follows:

$$p_{\hat{X}}(\hat{x}) = \sum_{x \in X,\, y \in Y} p_{\hat{X}|X}(\hat{x}|x)\, p_{X,Y}(x, y) \qquad (7)$$

$$p_X(x) = \sum_{y \in Y} p_{X,Y}(x, y), \qquad p_Y(y) = \sum_{x \in X} p_{X,Y}(x, y) \qquad (8)$$

Combining the above equations, the mutual information between the released public data $\hat{X}$ and the private data $Y$ can be derived as:

$$I(\hat{X}, Y) = \sum_{\hat{x} \in \hat{X},\, y \in Y} p(\hat{x}, y) \log \frac{p(\hat{x}, y)}{p(\hat{x})} - \sum_{y \in Y} p(y) \log p(y) \qquad (9)$$

where the second term is the entropy of $Y$, i.e., $-\sum_{y \in Y} p(y) \log p(y)$, which is a constant for the specified private data (e.g., gender) in a given dataset. Hence, we ignore this term in the following derivations and obtain:

$$I(\hat{X}, Y) = \sum_{\hat{x} \in \hat{X},\, y \in Y} p(\hat{x}, y) \log \frac{p(\hat{x}, y)}{p(\hat{x})} \qquad (10)$$

Combined with Equations 6 and 7, the mutual information can then be derived as a function of only two factors, namely the joint probability $p_{X,Y}$, which can be empirically obtained from a given dataset, and the obfuscation function $p_{\hat{X}|X}$:

$$I(\hat{X}, Y) = \sum_{\hat{x} \in \hat{X},\, x \in X,\, y \in Y} p_{\hat{X}|X}(\hat{x}|x)\, p_{X,Y}(x, y) \cdot \log \frac{\sum_{x' \in X} p_{\hat{X}|X}(\hat{x}|x')\, p_{X,Y}(x', y)}{\sum_{x'' \in X,\, y' \in Y} p_{\hat{X}|X}(\hat{x}|x'')\, p_{X,Y}(x'', y')} \qquad (11)$$

The optimal obfuscation function $p_{\hat{X}|X}$ is learned such that $I(\hat{X}, Y)$ is minimized under a given distortion budget $\Delta X$.

5.2.2 Bounding Ranking Loss for Utility

To provide optimal utility guarantees for personalized ranking-based recommendation, we consider bounding the data distortion $dist(\hat{X}, X)$ based on ranking loss. There are typically three types of ranking loss functions [16], namely pointwise, pairwise, and listwise, which are defined on the basis of single items, pairs of items, and all ranked items, respectively. As a pointwise loss function measures the loss of (ranking) score for individual items, it is analogous to non-ranking-based distance metrics. A theoretical study of these three types of ranking loss functions [16] shows that pairwise and listwise losses are in fact upper bounds of the quantities (1-MAP) and (1-NDCG), respectively, where Mean Average Precision (MAP) and Normalized Discounted Cumulative Gain (NDCG) are two popular metrics for evaluating ranking-based information retrieval algorithms [18], [19]. In other words, for ranking-based recommendation algorithms, minimizing the pairwise/listwise loss is equivalent to maximizing the predicted ranking quality measured by MAP or NDCG. Following this idea, we also bound the data distortion incurred by data obfuscation by limiting the pairwise ranking loss when obfuscating $X$ into $\hat{X}$.

We choose to measure the pairwise ranking loss using a widely known metric, i.e., the Kendall-τ rank distance [8]. It measures the number of pairwise disagreements between two ranking lists. For two users $a$ and $b$, we denote their public data vectors as $V^a$ and $V^b$, respectively. The Kendall-τ rank distance $K(V^a, V^b)$ is then computed as:

$$K(V^a, V^b) = \sum_{V^a_i > V^a_j} \mathbb{1}_{V^b_i < V^b_j} \qquad (12)$$

where $V^a_i$ is the ranking score of item $i$ in list $V^a$, and so on. $\mathbb{1}_{cond}$ is an indicator function which is equal to 1 when $cond$ is true and 0 otherwise. As Eq. 12 counts the absolute number of pairwise disagreements, we normalize it by dividing by $|I|(|I|-1)/2$, so that the normalized Kendall-τ distance lies in the interval [0, 1]:

$$K(V^a, V^b) = \frac{1}{|I|(|I|-1)/2} \sum_{V^a_i > V^a_j} \mathbb{1}_{V^b_i < V^b_j} \qquad (13)$$

A value of 1 indicates maximum disagreement, while 0 indicates that the two lists express the same ranking. For the sake of simplicity, all mentions of the Kendall-τ distance refer to the normalized Kendall-τ distance (Eq. 13) in the following.

In practice, a large number of items yields a high cost when computing the Kendall-τ distance. Since the computation requires a total of $|I|(|I|-1)/2$ pairwise comparisons, the resulting complexity is $O(n^2)$, where $n$ is the number of items $|I|$. To efficiently compute the Kendall-τ distance for large item sets, we propose to use a bootstrap sampling process [20] to approximate it. Specifically, instead of computing all $|I|(|I|-1)/2$ comparisons, we randomly sample $S$ pairs of items for comparison. After counting the absolute number of disagreements among the $S$ sampled pairs, we normalize it by dividing by $|S|$:

$$K(V^a, V^b) \approx \frac{1}{|S|} \sum_{V^a_i > V^a_j,\, (i,j) \in S} \mathbb{1}_{V^b_i < V^b_j} \qquad (14)$$
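The following sketch contrasts the exact computation of Eq. 13 with the bootstrap approximation of Eq. 14; it is a minimal illustration (function names and test data are ours, not the paper's implementation):

```python
import numpy as np

def kendall_tau_exact(v_a, v_b):
    """Normalized Kendall-tau distance (Eq. 13): O(n^2) pairwise comparisons."""
    n = len(v_a)
    disagreements = 0
    for i in range(n):
        for j in range(i + 1, n):
            if (v_a[i] - v_a[j]) * (v_b[i] - v_b[j]) < 0:
                disagreements += 1
    return disagreements / (n * (n - 1) / 2)

def kendall_tau_bootstrap(v_a, v_b, num_pairs, seed=0):
    """Bootstrap approximation (Eq. 14): sample item pairs instead of enumerating all.
    Degenerate draws with i == j yield a zero product and count as agreements,
    which is negligible for large item sets."""
    rng = np.random.default_rng(seed)
    n = len(v_a)
    i = rng.integers(0, n, size=num_pairs)
    j = rng.integers(0, n, size=num_pairs)
    disagree = (v_a[i] - v_a[j]) * (v_b[i] - v_b[j]) < 0
    return disagree.mean()

rng = np.random.default_rng(42)
v_a, v_b = rng.random(2000), rng.random(2000)
print(kendall_tau_exact(v_a, v_b))                      # ~0.5 for two independent lists
print(kendall_tau_bootstrap(v_a, v_b, num_pairs=5000))  # close, at a fraction of the cost
```

The exact version touches all ~2M pairs for 2,000 items, while the sampled version uses a fixed budget of comparisons, which is what makes the distortion constraint tractable at scale.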
5.2.3 Optimal Obfuscation Function Learning

Considering the above ranking loss as a constraint to ensure high data utility, we now present our algorithm that learns the optimal cluster-wise obfuscation function $p_{\hat{G}|G}$:

$$\min_{p_{\hat{G}|G}} I(\hat{G}, Y)$$

For a given dataset, we can empirically determine $p_{G,Y}$ according to the private data $Y$ (e.g., gender). Thus, the obfuscation function $p_{\hat{G}|G}$ can be learned by Algorithm 1, which solves a convex optimization problem with three constraints (solvable by many solvers, such as CVX [21]). The first constraint is the distortion budget, which bounds the expected Kendall-τ distance w.r.t. the probabilistic obfuscation function $p_{\hat{G}|G}$; note that it is easy to compute the Kendall-τ distance between $\hat{G}$ and $G$. The last two constraints are probability constraints on $p_{\hat{G}|G}$. To stress the protected private data $Y$, we denote the corresponding optimal obfuscation function as $p_{\hat{G}|G,Y}$.
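A compact way to express this learning step is as a convex program; the sketch below uses cvxpy, and the matrix names as well as the use of `rel_entr` to encode Eq. 10 are our own choices under the stated assumptions, not the paper's reference implementation:

```python
import cvxpy as cp
import numpy as np

def learn_obfuscation(p_gy, k_dist, budget):
    """Learn p(g_hat | g) minimizing I(G_hat, Y) under an expected Kendall-tau budget.

    p_gy:   (n_g, n_y) empirical joint distribution of clusters G and private data Y
    k_dist: (n_g, n_g) Kendall-tau distances K(g, g_hat) between cluster representatives
    """
    n_g, n_y = p_gy.shape
    p_g = p_gy.sum(axis=1)                       # marginal p(g)
    P = cp.Variable((n_g, n_g), nonneg=True)     # P[g_hat, g] = p(g_hat | g)

    joint = P @ p_gy                             # p(g_hat, y), affine in P (cf. Eq. 6)
    marg = cp.sum(joint, axis=1, keepdims=True)  # p(g_hat) (cf. Eq. 7)
    # I(G_hat, Y) up to the constant entropy of Y (Eq. 10), via relative entropy.
    mi = cp.sum(cp.rel_entr(joint, marg @ np.ones((1, n_y))))

    w = (p_g[:, None] * k_dist).T                # w[g_hat, g] = p(g) * K(g, g_hat)
    constraints = [
        cp.sum(cp.multiply(P, w)) <= budget,     # expected Kendall-tau distortion
        cp.sum(P, axis=0) == 1,                  # each column is a distribution
    ]
    prob = cp.Problem(cp.Minimize(mi), constraints)
    prob.solve()                                 # needs an exponential-cone solver (e.g., SCS)
    return P.value
```

Nonnegativity of `P` together with the column-sum constraint reproduces the two probability constraints; the relative-entropy objective keeps the problem within the convex class that standard conic solvers handle.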
Algorithm 3 Personalized activity-wise obfuscation function learning
Require: Joint probability $p_{i,Y}$, distortion budget $\Delta X$, user public data vector $V^u$
1: Solve the optimization problem for $p_{\hat{i}|i}$:

$$\min_{p_{\hat{i}|i}} I(\hat{i}, Y)$$
$$\text{s.t. } E_{\hat{i},i}\big(K(V^u + i,\, V^u + \hat{i})\big) \le \Delta X$$
$$p_{\hat{i}|i}(\hat{i}|i) \in [0, 1],\ \forall i, \hat{i} \in I$$
$$\sum_{\hat{i}} p_{\hat{i}|i}(\hat{i}|i) = 1,\ \forall i \in I$$

2: return $p^u_{\hat{i}|i,Y}$
we first investigate the trade-off between privacy protection and personalization performance for ranking-based recommendation. Second, we study the continuous privacy protection performance by evaluating the privacy leakage over time. Third, we evaluate the customization performance of privacy protection by comparing the privacy leakage of user-specified private data with that of other data. Fourth, we further explore the utility guarantee for ranking-based recommendation under different loss metrics. Fifth, based on synthetic datasets, we study the impact of private data settings. Finally, we evaluate the runtime performance of our framework. We start by introducing our experimental setup below, before reporting on the evaluation results.
privacy protection and data utility. Specifically, we implement two inference attack methods to directly assess the performance of our privacy protection, and use two real-world ranking-based recommendation use cases to evaluate the resulting utility of the obfuscated data.

TABLE 1
Characteristics of the experimental datasets

Dataset          New York City   Tokyo
User number      3,669           6,870
POI number       1,861           2,811
Check-in number  893,722         1,290,445

Privacy. Inference attacks [3] on private data try to infer a user's private information Y (e.g., gender) from her released public data X̂, which can be regarded as a classification problem for discrete data. Therefore, we adopt here two common classification algorithms as inference attack methods, namely Support Vector Machine (SVM) and Naive Bayes (NB). We assume that adversaries have trained their classifiers based on the original public data X and private data Y from some non-privacy-conscious users [13], who do not care about their privacy and publish all their data. We randomly sample 50% of all users as such non-privacy-conscious users for training the classifiers, and then perform inference attacks on the private data Y of the remaining users based on their obfuscated activity data X̂. We use the Area Under the Curve (AUC), a widely used metric for classification problems [30], to evaluate the performance of the inference attacks, and we report the value (1-AUC) as a privacy protection metric in the experiments. A higher value of (1-AUC) implies better privacy protection. The ideal privacy protection is achieved when AUC = 0.5 (i.e., 1-AUC = 0.5), which implies that any inference attack method performs no better than a random guess.

Utility. In this work, utility refers to the ranking-based recommendation performance. We select two typical use cases, i.e., POI recommendation [23] and context-aware activity recommendation [24], as our target scenarios.
• POI Rec. POI recommendation [23] tries to recommend to a user a list of POIs that she would be interested in. To implement this use case, we first consider the cumulative check-in number of a user at a POI as the rating (i.e., the cumulative number of interactions as preference score) to build a user-POI matrix, and then leverage a Bayesian personalized ranking algorithm [31] to predict the ranked list. Note that POI Rec is a common use case for user-item recommendation.
• Activity Rec. Context-aware activity recommendation [24] tries to come up with a list of activities (represented by POI categories, e.g., restaurant or bar) that a given user may be interested in based on her current context (i.e., location and time). We first discretize the context (i.e., time slots and location grid cells) of the check-in data to build a user-context-activity tensor using a 0/1-based scheme (i.e., a binary format of preference score), and then leverage a ranking tensor factorization algorithm [32] for ranking prediction.

For both use cases, we first randomly split the original public data X into a training set Xtrain (80%) and a test set Xtest (20%), and then use our framework to obfuscate Xtrain into X̂train. Subsequently, we apply the recommendation algorithms to the obfuscated data X̂train and make predictions for the test set Xtest, which represents the users' true preferences. Our goal is to verify that the obfuscated data X̂train can still be used to accurately predict the users' true preferences in Xtest. To evaluate the quality of the resulting recommendations, we use Mean Average Precision (MAP) [18], a widely used metric in information retrieval to assess the quality of rankings. A higher MAP value implies better performance. Each reported result is the mean of ten repeated trials.
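The privacy metric can be reproduced along the following lines; this is a hedged sketch using scikit-learn, where the split, feature handling, and classifier settings are our simplifications of the protocol described above:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import roc_auc_score

def privacy_score(X_public, X_obfuscated, y_private, seed=0):
    """Train an inference attacker on non-privacy-conscious users' original data,
    attack the remaining users' obfuscated data, and report 1 - AUC."""
    idx = np.arange(len(y_private))
    train_idx, test_idx = train_test_split(idx, test_size=0.5, random_state=seed)
    attacker = MultinomialNB()            # NB attacker; an SVM works analogously
    attacker.fit(X_public[train_idx], y_private[train_idx])
    scores = attacker.predict_proba(X_obfuscated[test_idx])[:, 1]
    auc = roc_auc_score(y_private[test_idx], scores)
    return 1.0 - auc                      # 0.5 means the attack is a random guess
```

Here the rows of `X_public` and `X_obfuscated` are per-user activity count vectors and `y_private` is a binary private attribute (e.g., gender), mirroring the 50/50 split between non-privacy-conscious and attacked users.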
7.1.3 Baseline Approaches

In order to demonstrate the effectiveness of our framework, we compare it with the following baselines:
• Random obfuscation (Rand). For historical data obfuscation, it randomly obfuscates each user public data vector V^u to another vector V^u' with a given probability p_rand. For online activity obfuscation, it randomly obfuscates each user activity on item i to another item i' with probability p_rand. Here, p_rand controls the distortion budget in both cases.
• Frapp [33]. Frapp is a generalized matrix-theoretic framework of data perturbation for privacy-preserving mining. Its key idea is to obfuscate one's activity data to itself with higher probability than to others. For historical data obfuscation, it obfuscates a user u's public data vector V^u to V^u' with probability p_frapp = γe if u = u', and p_frapp = e otherwise. Here e is used for probability normalization, i.e., e = 1/(γ + |U| − 1). For online data obfuscation, it obfuscates each activity on item i to another item i' with probability p_frapp = γe if i = i', and p_frapp = e otherwise (here e = 1/(γ + |I| − 1)). The distortion budget is controlled by γ in both cases.
• Differential privacy (Diff) [12]. Differential privacy is a state-of-the-art method to protect privacy regardless of the adversary's prior knowledge. It can be implemented for different types of data, such as numeric data [34], categorical data [35], set-valued data [36] or location data [37]. Here we adopt the exponential mechanism [35], [38], [39] in our experiments, as it fits our use case of categorical data. More importantly, it is straightforward to implement for online data obfuscation, which we explicitly consider in PrivRank. We exclude other sophisticated differential privacy methods for categorical data (such as [40]), as they do not handle online data obfuscation. We implement the exponential mechanism as follows. For historical data obfuscation, it obfuscates V^u to V^u' with a probability that decreases exponentially with the distance d(V^u, V^u'), i.e., p_diff(V^u'|V^u) ∝ exp(−β d(V^u, V^u')), where β ≥ 0; β actually controls the distortion budget. The exponential mechanism satisfies 2βd_max-differential privacy, where d_max = max_{u,u'∈U} d(V^u, V^u'). For online activity obfuscation, this method obfuscates each activity of user u on item i with p_diff(i'|i) ∝ exp(−β d(V^u + i', V^u + i)). Here we also use the Kendall-τ distance for d(·, ·). The distortion budget is controlled by β in both cases.
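As an illustration of the Diff baseline's online step, the sketch below samples an obfuscated item via the exponential mechanism over the Kendall-τ distance; it is our reading of the description above, with a hypothetical `kendall_tau` helper (e.g., the normalized version sketched in Section 5.2.2) passed in as a parameter:

```python
import numpy as np

def diff_online_obfuscate(v_u, i, beta, kendall_tau, seed=0):
    """Exponential mechanism for one incoming activity of a user.

    v_u:  the user's public data vector (one score per item)
    i:    index of the item of the incoming activity
    beta: budget parameter; beta = 0 releases uniformly at random, larger beta
          means less distortion but weaker protection
    """
    rng = np.random.default_rng(seed)
    n_items = len(v_u)
    v_with_i = v_u.copy(); v_with_i[i] += 1           # vector after the true activity
    scores = np.empty(n_items)
    for i_hat in range(n_items):                      # candidate released items
        v_with_hat = v_u.copy(); v_with_hat[i_hat] += 1
        scores[i_hat] = -beta * kendall_tau(v_with_i, v_with_hat)
    p = np.exp(scores - scores.max())                 # numerically stable softmax
    p /= p.sum()
    return rng.choice(n_items, p=p)
```

Interpreting "V^u + i" as incrementing the score of item i in the user's vector is our assumption; any consistent update rule would fit the same mechanism.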
TABLE 2
Runtime performance for online data publishing

Fig. 13. Runtime and privacy performance for Activity Rec with SVM

(about 92 check-ins/sec on average) in 2016.
[5] I. A. Junglas, N. A. Johnson, and C. Spitzmüller, "Personality traits and concern for privacy: an empirical study in the context of location-based services," European Journal of Information Systems, vol. 17, no. 4, pp. 387–402, 2008.
[6] P. Cremonesi, Y. Koren, and R. Turrin, "Performance of recommender algorithms on top-n recommendation tasks," in Proc. of RecSys'10. ACM, 2010, pp. 39–46.
[7] N. Li, R. Jin, and Z.-H. Zhou, "Top rank optimization in linear time," in Advances in Neural Information Processing Systems, 2014, pp. 1502–1510.
[8] M. G. Kendall, Rank Correlation Methods, 1948.
[9] L. Sweeney, "k-anonymity: A model for protecting privacy," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10, no. 05, pp. 557–570, 2002.
[10] L. Sankar, S. R. Rajagopalan, and H. V. Poor, "Utility-privacy tradeoffs in databases: An information-theoretic approach," IEEE Transactions on Information Forensics and Security, vol. 8, no. 6, pp. 838–852, 2013.
[11] A. Machanavajjhala, D. Kifer, J. Gehrke, and M. Venkitasubramaniam, "l-diversity: Privacy beyond k-anonymity," ACM Transactions on Knowledge Discovery from Data, vol. 1, no. 1, p. 3, 2007.
[12] C. Dwork, "Differential privacy," in Automata, Languages and Programming. Springer, 2006, pp. 1–12.
[13] F. du Pin Calmon and N. Fawaz, "Privacy against statistical inference," in Proc. of Allerton'12. IEEE, 2012, pp. 1401–1408.
[14] A. Zhang, S. Bhamidipati, N. Fawaz, and B. Kveton, "PriView: Media consumption and recommendation meet privacy against inference attacks," IEEE Web, vol. 2, 2014.
[15] S. Salamatian, A. Zhang, F. du Pin Calmon, S. Bhamidipati, N. Fawaz, B. Kveton, P. Oliveira, and N. Taft, "Managing your private and public data: Bringing down inference attacks against your privacy," IEEE Journal of Selected Topics in Signal Processing, vol. 9, no. 7, pp. 1240–1255, 2015.
[16] W. Chen, T.-Y. Liu, Y. Lan, Z.-M. Ma, and H. Li, "Ranking measures and loss functions in learning to rank," in Proc. of NIPS, 2009, pp. 315–323.
[17] M. B. Eisen, P. T. Spellman, P. O. Brown, and D. Botstein, "Cluster analysis and display of genome-wide expression patterns," PNAS, vol. 95, no. 25, pp. 14863–14868, 1998.
[18] R. Baeza-Yates, B. Ribeiro-Neto et al., Modern Information Retrieval. ACM Press New York, 1999, vol. 463.
[19] K. Järvelin and J. Kekäläinen, "Cumulated gain-based evaluation of IR techniques," ACM Transactions on Information Systems (TOIS), vol. 20, no. 4, pp. 422–446, 2002.
[20] B. Efron and R. J. Tibshirani, An Introduction to the Bootstrap. CRC Press, 1994.
[21] M. Grant and S. Boyd, "Graph implementations for nonsmooth convex programs," in Recent Advances in Learning and Control. Springer, 2008, pp. 95–110.
[22] G. S. Manku, S. Rajagopalan, and B. G. Lindsay, "Approximate medians and other quantiles in one pass and with limited memory," in ACM SIGMOD Record, vol. 27, no. 2. ACM, 1998, pp. 426–435.
[23] D. Yang, D. Zhang, Z. Yu, and Z. Wang, "A sentiment-enhanced personalized location recommendation system," in Proc. of HT'13. ACM, 2013, pp. 119–128.
[24] D. Yang, D. Zhang, V. W. Zheng, and Z. Yu, "Modeling user activity preference by leveraging user spatial temporal characteristics in LBSNs," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 45, no. 1, pp. 129–142, 2015.
[25] Z. Yu, H. Xu, Z. Yang, and B. Guo, "Personalized travel package with multi-point-of-interest recommendation based on crowdsourced user footprints," IEEE Transactions on Human-Machine Systems, vol. 46, no. 1, pp. 151–158, 2016.
[26] D. Yang, D. Zhang, L. Chen, and B. Qu, "NationTelescope: Monitoring and visualizing large-scale collective behavior in LBSNs," Journal of Network and Computer Applications, vol. 55, pp. 170–180, 2015.
[27] D. Yang, D. Zhang, and B. Qu, "Participatory cultural mapping based on collective behavior data in location-based social networks," ACM Transactions on Intelligent Systems and Technology (TIST), vol. 7, no. 3, p. 30, 2016.
[28] Z. Cheng, J. Caverlee, K. Lee, and D. Z. Sui, "Exploring millions of footprints in location sharing services," in Proc. of ICWSM'11, 2011, pp. 81–88.
[29] X. Zhao, L. Li, and G. Xue, "Checking in without worries: Location privacy in location based social networks," in Proc. of INFOCOM'13. IEEE, 2013, pp. 3003–3011.
[30] C. X. Ling, J. Huang, and H. Zhang, "AUC: a better measure than accuracy in comparing learning algorithms," in Advances in Artificial Intelligence. Springer, 2003, pp. 329–341.
[31] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme, "BPR: Bayesian personalized ranking from implicit feedback," in Proc. of UAI'09. AUAI Press, 2009, pp. 452–461.
[32] D. Yang, D. Zhang, Z. Yu, and Z. Yu, "Fine-grained preference-aware location search leveraging crowdsourced digital footprints from LBSNs," in Proc. of UbiComp'13. ACM, 2013, pp. 479–488.
[33] S. Agrawal and J. R. Haritsa, "A framework for high-accuracy privacy-preserving mining," in Proc. of ICDE'05. IEEE, 2005, pp. 193–204.
[34] C. Dwork, F. McSherry, K. Nissim, and A. Smith, "Calibrating noise to sensitivity in private data analysis," in Theory of Cryptography Conference. Springer, 2006, pp. 265–284.
[35] F. McSherry and K. Talwar, "Mechanism design via differential privacy," in Proc. of FOCS'07. IEEE, 2007, pp. 94–103.
[36] R. Chen, N. Mohammed, B. C. Fung, B. C. Desai, and L. Xiong, "Publishing set-valued data via differential privacy," PVLDB, vol. 4, no. 11, pp. 1087–1098, 2011.
[37] L. Wang, D. Yang, X. Han, T. Wang, D. Zhang, and X. Ma, "Location privacy-preserving task allocation for mobile crowdsensing with differential geo-obfuscation," in Proc. of WWW'17. ACM, 2017, pp. 627–636.
[38] C. Dwork, "Differential privacy: A survey of results," in Proc. of TAMC. Springer, 2008, pp. 1–19.
[39] Z. Huang and S. Kannan, "The exponential mechanism for social welfare: Private, truthful, and nearly optimal," in Proc. of FOCS'12, 2012.
[40] Y. Shen and H. Jin, "Privacy-preserving personalized recommendation: An instance-based approach via differential privacy," in Proc. of ICDM. IEEE, 2014, pp. 540–549.

Dingqi Yang is a senior researcher in the Department of Computer Science, University of Fribourg, Switzerland. He received his Ph.D. in Computer Science from Pierre and Marie Curie University (Paris VI) and Institut Mines-TELECOM/TELECOM SudParis in 2015, where he won both the Doctorate Award and the Institut Mines-TELECOM Press Mention. His research interests lie in big social media data analytics, ubiquitous computing and smart city applications.

Bingqing Qu is a post-doc researcher in the Department of Computer Science, University of Fribourg, Switzerland. She received her Ph.D. in Computer Science from the University of Rennes 1 in 2016. Her research interests include historical document analysis, multimedia content analysis, social media data mining and computer vision.

Philippe Cudre-Mauroux is a Full Professor and the director of the eXascale Infolab at the University of Fribourg in Switzerland. He received his Ph.D. from the Swiss Federal Institute of Technology EPFL, where he won both the Doctorate Award and the EPFL Press Mention. Before joining the University of Fribourg he worked on information management infrastructures for IBM Watson Research, Microsoft Research Asia, and MIT. His research interests are in next-generation, Big Data management infrastructures for non-relational data. Webpage: http://exascale.info/phil