
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TKDE.2014.2320733, IEEE Transactions on Knowledge and Data Engineering.

Discovery of Ranking Fraud for Mobile Apps


Hengshu Zhu, Hui Xiong, Senior Member, IEEE, Yong Ge, and Enhong Chen, Senior Member, IEEE

H. Zhu and E. Chen are with the School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230026, China. Email: [email protected]; [email protected]
H. Xiong is with the Management Science and Information Systems Department, Rutgers Business School, Rutgers University, Newark, NJ 07102 USA. Email: [email protected]
Y. Ge is with the Computer Science Department, UNC Charlotte, Charlotte, NC 28223 USA. Email: [email protected]
This is a substantially extended and revised version of [33], which appears in the Proceedings of the 22nd ACM Conference on Information and Knowledge Management (CIKM 2013).

Abstract—Ranking fraud in the mobile App market refers to fraudulent or deceptive activities whose purpose is to bump Apps up the popularity list. Indeed, it has become more and more frequent for App developers to use shady means, such as inflating their Apps' sales or posting phony App ratings, to commit ranking fraud. While the importance of preventing ranking fraud has been widely recognized, there is limited understanding and research in this area. To this end, in this paper, we provide a holistic view of ranking fraud and propose a ranking fraud detection system for mobile Apps. Specifically, we first propose to accurately locate ranking fraud by mining the active periods, namely leading sessions, of mobile Apps. Such leading sessions can be leveraged for detecting local anomalies, instead of global anomalies, in App rankings. Furthermore, we investigate three types of evidences, i.e., ranking based evidences, rating based evidences and review based evidences, by modeling Apps' ranking, rating and review behaviors through statistical hypothesis tests. In addition, we propose an optimization based aggregation method to integrate all the evidences for fraud detection. Finally, we evaluate the proposed system with real-world App data collected from the iOS App Store over a long time period. In the experiments, we validate the effectiveness of the proposed system, and show the scalability of the detection algorithm as well as some regularity of ranking fraud activities.

Index Terms—Mobile Apps, Ranking Fraud Detection, Evidence Aggregation, Historical Ranking Records, Rating and Review.

1 INTRODUCTION

The number of mobile Apps has grown at a breathtaking rate over the past few years. For example, as of the end of April 2013, there were more than 1.6 million Apps at Apple's App store and Google Play. To stimulate the development of mobile Apps, many App stores launched daily App leaderboards, which demonstrate the chart rankings of the most popular Apps. Indeed, the App leaderboard is one of the most important ways of promoting mobile Apps. A higher rank on the leaderboard usually leads to a huge number of downloads and millions of dollars in revenue. Therefore, App developers tend to explore various ways, such as advertising campaigns, to promote their Apps in order to have their Apps ranked as high as possible in such App leaderboards.

However, as a recent trend, instead of relying on traditional marketing solutions, shady App developers resort to fraudulent means to deliberately boost their Apps and eventually manipulate the chart rankings on an App store. This is usually implemented by using so-called "bot farms" or "human water armies" to inflate the App downloads, ratings and reviews in a very short time. For example, an article from VentureBeat [4] reported that, when an App was promoted with the help of ranking manipulation, it could be propelled from number 1,800 to the top 25 in Apple's top free leaderboard, and 50,000-100,000 new users could be acquired within a couple of days. In fact, such ranking fraud raises great concerns in the mobile App industry. For example, Apple has warned of cracking down on App developers who commit ranking fraud [3] in Apple's App store.

In the literature, while there is some related work, such as web ranking spam detection [22], [25], [30], online review spam detection [19], [27], [28], and mobile App recommendation [24], [29], [31], [32], the problem of detecting ranking fraud for mobile Apps is still under-explored. To fill this crucial void, in this paper, we propose to develop a ranking fraud detection system for mobile Apps. Along this line, we identify several important challenges. First, ranking fraud does not always happen in the whole life cycle of an App, so we need to detect the time when fraud happens. This challenge can be regarded as detecting a local anomaly instead of a global anomaly of mobile Apps. Second, due to the huge number of mobile Apps, it is difficult to manually label ranking fraud for each App, so it is important to have a scalable way to automatically detect ranking fraud without using any benchmark information. Finally, due to the dynamic nature of chart rankings, it is not easy to identify and confirm the evidences linked to ranking fraud, which motivates us to discover some implicit fraud patterns of mobile Apps as evidences.

Indeed, our careful observation reveals that mobile Apps are not always ranked high in the leaderboard, but only in some leading events, which form different leading sessions. Note that we will introduce both leading events and leading sessions in detail later. In other words, ranking fraud usually happens in these leading sessions. Therefore, detecting ranking fraud of mobile Apps is actually to detect ranking fraud within the leading sessions of mobile Apps. Specifically, we first propose a simple yet effective algorithm to identify the leading sessions of each App based on its historical ranking records. Then, with the analysis of Apps' ranking behaviors, we find that fraudulent Apps often have different ranking patterns in each leading session compared with normal Apps. Thus, we characterize some fraud evidences from Apps' historical ranking records, and develop three functions to extract such ranking based fraud evidences. Nonetheless, the ranking based evidences can be affected by App developers' reputation and some legitimate marketing campaigns, such as "limited-time discount". As a result, it is not sufficient to only use ranking based evidences. Therefore, we further propose two types of fraud evidences based on Apps' rating and review history, which reflect some anomaly patterns in Apps' historical rating and review records. In addition, we develop an unsupervised evidence-aggregation method to integrate these three types of evidences for evaluating the credibility of leading sessions of mobile Apps. Figure 1 shows the framework of our ranking fraud detection system for mobile Apps.

It is worth noting that all the evidences are extracted by modeling Apps' ranking, rating and review behaviors through statistical hypothesis tests. The proposed framework is scalable and can be extended with other domain-generated evidences for ranking fraud detection. Finally, we evaluate the proposed system with real-world App data collected from Apple's App store over a long time period, i.e., more than two years. Experimental results show the effectiveness of the proposed system, the scalability of the detection algorithm, as well as some regularity of ranking fraud activities.

Overview. The remainder of this paper is organized as follows. In Section 2, we introduce some preliminaries and show how to mine leading sessions for mobile Apps. Section 3 presents how to extract ranking, rating and review based evidences and combine them for ranking fraud detection. In Section 4 we make some further discussion about the proposed approach. In Section 5, we report the experimental results on two long-term real-world data sets. Section 6 provides a brief review of related works. Finally, in Section 7, we conclude the paper and propose some future research directions.


[Figure 1: work flow from INPUT (mobile Apps' historical records) through MINING LEADING SESSIONS to the RANKING, RATING and REVIEW BASED EVIDENCES, and then EVIDENCE AGGREGATION as the OUTPUT.]
Fig. 1. The framework of our ranking fraud detection system for mobile Apps.

2 IDENTIFYING LEADING SESSIONS FOR MOBILE APPS

In this section, we first introduce some preliminaries, and then show how to mine leading sessions for mobile Apps from their historical ranking records.

2.1 Preliminaries

The App leaderboard demonstrates the top K popular Apps with respect to different categories, such as "Top Free Apps" and "Top Paid Apps". Moreover, the leaderboard is usually updated periodically (e.g., daily). Therefore, each mobile App a has many historical ranking records, which can be denoted as a time series R_a = {r_1^a, ..., r_i^a, ..., r_n^a}, where r_i^a ∈ {1, ..., K, +∞} is the ranking of a at time stamp t_i; +∞ means a is not ranked in the top K list; and n denotes the number of all ranking records. Note that the smaller the value of r_i^a, the higher the ranking position of the App.

By analyzing the historical ranking records of mobile Apps, we observe that Apps are not always ranked high in the leaderboard, but only in some leading events. For example, Figure 2 (a) shows an example of the leading events of a mobile App. Formally, we define a leading event as follows.

Definition 1 (Leading Event): Given a ranking threshold K* ∈ [1, K], a leading event e of App a contains a time range T_e = [t_start^e, t_end^e] and the corresponding rankings of a, which satisfy r_start^a ≤ K* < r_(start-1)^a and r_end^a ≤ K* < r_(end+1)^a. Moreover, ∀t_k ∈ (t_start^e, t_end^e), we have r_k^a ≤ K*.

Note that we apply a ranking threshold K* which is usually smaller than K here, because K may be very big (e.g., more than 1000), and the ranking records beyond K* (e.g., 300) are not very useful for detecting ranking manipulations.

Furthermore, we also find that some Apps have several adjacent leading events which are close to each other and form a leading session. For example, Figure 2 (b) shows an example of adjacent leading events of a given mobile App, which form two leading sessions. Particularly, a leading event which does not have other nearby neighbors can also be treated as a special leading session. The formal definition of a leading session is as follows.

Definition 2 (Leading Session): A leading session s of App a contains a time range T_s = [t_start^s, t_end^s] and n adjacent leading events {e_1, ..., e_n}, which satisfy t_start^s = t_start^(e_1), t_end^s = t_end^(e_n), and there is no other leading session s* that makes T_s ⊆ T_s*. Meanwhile, ∀i ∈ [1, n), we have (t_start^(e_(i+1)) − t_end^(e_i)) < ϕ, where ϕ is a predefined time threshold for merging leading events.

Intuitively, the leading sessions of a mobile App represent its periods of popularity, so ranking manipulation will only take place in these leading sessions.


Fig. 2. (a) Example of leading events (Event 1, Event 2); (b) Example of leading sessions (Session 1, Session 2) of mobile Apps.

Fig. 3. An example of the different ranking phases of a leading event.

Therefore, the problem of detecting ranking fraud is to detect fraudulent leading sessions. Along this line, the first task is how to mine the leading sessions of a mobile App from its historical ranking records.

2.2 Mining Leading Sessions

There are two main steps for mining leading sessions. First, we need to discover leading events from the App's historical ranking records. Second, we need to merge adjacent leading events to construct leading sessions. Specifically, Algorithm 1 demonstrates the pseudo code of mining leading sessions for a given App a.

Algorithm 1 Mining Leading Sessions
Input 1: a's historical ranking records R_a;
Input 2: the ranking threshold K*;
Input 3: the merging threshold ϕ;
Output: the set of a's leading sessions S_a;
Initialization: S_a = ∅;
 1: E_s = ∅; e = ∅; s = ∅; t_start^e = 0;
 2: for each i ∈ [1, |R_a|] do
 3:   if r_i^a ≤ K* and t_start^e == 0 then
 4:     t_start^e = t_i;
 5:   else if r_i^a > K* and t_start^e ≠ 0 then
 6:     // found one event
 7:     t_end^e = t_(i-1); e = <t_start^e, t_end^e>;
 8:     if E_s == ∅ then
 9:       E_s ∪= e; t_start^s = t_start^e; t_end^s = t_end^e;
10:      else if (t_start^e − t_end^s) < ϕ then
11:        E_s ∪= e; t_end^s = t_end^e;
12:      else
13:        // found one session
14:        s = <t_start^s, t_end^s, E_s>;
15:        S_a ∪= s; s = ∅; // start a new session
16:        E_s = {e}; t_start^s = t_start^e; t_end^s = t_end^e;
17:      t_start^e = 0; e = ∅; // ready for a new leading event
18: return S_a

In Algorithm 1, we denote each leading event e and session s as tuples <t_start^e, t_end^e> and <t_start^s, t_end^s, E_s> respectively, where E_s is the set of leading events in session s. Specifically, we first extract each individual leading event e for the given App a (i.e., Steps 2 to 7), starting from the beginning time. For each extracted leading event e, we check the time span between e and the current leading session s to decide whether they belong to the same leading session, based on Definition 2. Particularly, if (t_start^e − t_end^s) < ϕ, e is merged into the current leading session; otherwise, the current session is complete and e starts a new leading session (i.e., Steps 8 to 17). Thus, this algorithm can identify leading events and sessions by scanning a's historical ranking records only once.
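To make the procedure concrete, the following is a minimal, runnable Python sketch of Algorithm 1. It assumes the ranking records arrive as (timestamp, rank) pairs sorted by time, with out-of-chart days encoded as a rank larger than K*; all function and variable names are illustrative, not part of the original paper.

```python
from typing import List, Tuple

def mine_leading_sessions(ranks: List[Tuple[int, int]],
                          k_star: int, phi: int) -> List[dict]:
    """One-pass scan over (timestamp, rank) records, mirroring Algorithm 1:
    detect leading events, then merge events whose gap is below phi."""
    sessions, events = [], []
    e_start = None           # start time of the current leading event
    s_start = s_end = None   # time range of the current leading session
    prev_t = None

    def close_session():
        if events:
            sessions.append({"start": s_start, "end": s_end,
                             "events": list(events)})

    for t, r in ranks:
        if r <= k_star and e_start is None:
            e_start = t                       # a leading event begins
        elif r > k_star and e_start is not None:
            event = (e_start, prev_t)         # the event just ended
            if not events:                    # first event of a session
                events, s_start, s_end = [event], e_start, prev_t
            elif e_start - s_end < phi:       # small gap: same session
                events.append(event)
                s_end = prev_t
            else:                             # large gap: start a new session
                close_session()
                events, s_start, s_end = [event], e_start, prev_t
            e_start = None
        prev_t = t
    close_session()                           # flush the last session
    return sessions

# Toy usage with K* = 3 and phi = 3 days: the two short events
# (days 2-3 and day 5) are merged into a single leading session.
ranks = [(1, 9), (2, 2), (3, 1), (4, 9), (5, 3), (6, 9), (7, 9)]
print(mine_leading_sessions(ranks, k_star=3, phi=3))
```

Like the pseudo code, this sketch touches each ranking record exactly once, so it runs in linear time in the length of R_a.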


3 EXTRACTING EVIDENCES FOR RANKING FRAUD DETECTION

In this section, we study how to extract and combine fraud evidences for ranking fraud detection.

3.1 Ranking based Evidences

According to the definitions introduced in Section 2, a leading session is composed of several leading events. Therefore, we should first analyze the basic characteristics of leading events for extracting fraud evidences.

By analyzing the Apps' historical ranking records, we observe that Apps' ranking behaviors in a leading event always satisfy a specific ranking pattern, which consists of three different ranking phases, namely, the rising phase, the maintaining phase and the recession phase. Specifically, in each leading event, an App's ranking first increases to a peak position in the leaderboard (i.e., the rising phase), then keeps this peak position for a period (i.e., the maintaining phase), and finally decreases till the end of the event (i.e., the recession phase). Figure 3 shows an example of the different ranking phases of a leading event. Indeed, such a ranking pattern provides an important understanding of leading events. In the following, we formally define the three ranking phases of a leading event.

Definition 3 (Ranking Phases of a Leading Event): Given a leading event e of App a with time range [t_start^e, t_end^e], let the highest ranking position of a be r_peak^a, which belongs to a ranking range ∆R. The rising phase of e is a time range [t_a^e, t_b^e], where t_a^e = t_start^e, r_b^a ∈ ∆R, and ∀t_i ∈ [t_a^e, t_b^e), r_i^a ∉ ∆R. The maintaining phase of e is a time range [t_b^e, t_c^e], where r_c^a ∈ ∆R and ∀t_i ∈ (t_c^e, t_end^e], r_i^a ∉ ∆R. The recession phase is a time range [t_c^e, t_d^e], where t_d^e = t_end^e.

Note that, in Definition 3, ∆R is a ranking range that decides the beginning time and the end time of the maintaining phase; t_b^e and t_c^e are the first and last times when the App is ranked into ∆R. This is because an App, even with ranking manipulation, cannot always maintain the same peak position (e.g., rank 1) in the leaderboard, but only a ranking range (e.g., top 25).

If a leading session s of App a contains ranking fraud, a's ranking behaviors in these three ranking phases of the leading events in s should be different from those in a normal leading session. Actually, we find that each App with ranking manipulation always has an expected ranking target (e.g., top 25 in the leaderboard for one week), and the hired marketing firms also charge money according to such a ranking expectation (e.g., $1000/day in the top 25). Therefore, for both App developers and marketing firms, the earlier the ranking expectation is met, the more money can be earned. Moreover, after reaching and maintaining the expected ranking for the required period, the manipulation will be stopped and the ranking of the malicious App will decrease dramatically. As a result, the suspicious leading events may contain very short rising and recession phases. Meanwhile, the cost of ranking manipulation with high ranking expectations is quite expensive due to the unclear ranking principles of App stores and the fierce competition between App developers. Therefore, the leading events of fraudulent Apps often have a very short maintaining phase with high ranking positions.

Fig. 4. Two real-world examples of leading events ((a) Example 1; (b) Example 2).

Figure 4 (a) shows an example of ranking records from one of the reported suspicious Apps [5]. We can see that this App has several impulsive leading events with high ranking positions. In contrast, the ranking behaviors of a normal App's leading event may be completely different. For example, Figure 4 (b) shows an example of ranking records from a popular App, "Angry Birds: Space", which contains a leading event with a long time range (i.e., more than one year), especially for the recession phase. In fact, once a normal App is ranked high in the leaderboard, it often owns lots of honest fans and may attract more and more users to download it. Therefore, this App will be ranked high in the leaderboard for a long time. Based on the above discussion, we propose some ranking based signatures of leading sessions to construct fraud evidences for ranking fraud detection.

EVIDENCE 1. As shown in Figure 3, we use two shape parameters θ1 and θ2 to quantify the ranking patterns of the rising phase and the recession phase of App a's leading event e, which can be computed by

\theta_1^e = \arctan\Big(\frac{K^* - r_b^a}{t_b^e - t_a^e}\Big), \quad \theta_2^e = \arctan\Big(\frac{K^* - r_c^a}{t_d^e - t_c^e}\Big), \quad (1)

where K* is the ranking threshold in Definition 1. Intuitively, a large θ1 may indicate that the App has been bumped to a high rank within a short time, and a large θ2 may indicate that the App has dropped from a high rank to the bottom within a short time. Therefore, a leading session which has more leading events with large θ1 and θ2 values has a higher probability of containing ranking fraud. Here, we define a fraud signature θ_s for a leading session as follows:

\theta_s = \frac{1}{|E_s|} \sum_{e \in s} (\theta_1^e + \theta_2^e), \quad (2)

where |E_s| is the number of leading events in session s. Intuitively, if a leading session s has a significantly higher θ_s compared with other leading sessions of Apps in the leaderboard, it has a high probability of containing ranking fraud. To capture this, we propose to apply a statistical hypothesis test to compute the significance of θ_s for each leading session. Specifically, we define two statistical hypotheses as follows and compute the p-value of each leading session.

◃ HYPOTHESIS 0: The signature θ_s of leading session s is not useful for detecting ranking fraud.
◃ HYPOTHESIS 1: The signature θ_s of leading session s is significantly greater than expectation.

Here, we propose to use the popular Gaussian approximation to compute the p-value with the above hypotheses. Specifically, we assume θ_s follows the Gaussian distribution, θ_s ∼ N(µ_θ, σ_θ), where µ_θ and σ_θ can be learnt by the classic maximum-likelihood estimation (MLE) method from the observations of θ_s in all Apps' historical leading sessions. Then, we can calculate the p-value by

P\big(N(\mu_\theta, \sigma_\theta) \ge \theta_s\big) = 1 - \frac{1}{2}\Big[1 + \mathrm{erf}\Big(\frac{\theta_s - \mu_\theta}{\sigma_\theta \sqrt{2}}\Big)\Big], \quad (3)

where erf(x) is the Gaussian error function:

\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\, dt. \quad (4)

Intuitively, a leading session with a smaller p-value P has more chance to reject HYPOTHESIS 0 and accept HYPOTHESIS 1. This means it has more chance of committing ranking fraud. Thus, we define the evidence as

\Psi_1(s) = 1 - P\big(N(\mu_\theta, \sigma_\theta) \ge \theta_s\big). \quad (5)

EVIDENCE 2. As discussed above, Apps with ranking fraud often have a short maintaining phase with high ranking positions in each leading event. Thus, if we denote the maintaining phase of a leading event e as ∆t_m^e = (t_c^e − t_b^e + 1), and the average rank in this maintaining phase as r_m^e, we can define a fraud signature χ_s for each leading session as follows:

\chi_s = \frac{1}{|E_s|} \sum_{e \in s} \frac{K^* - r_m^e}{\Delta t_m^e}, \quad (6)

where K* is the ranking threshold in Definition 1. If a leading session contains a significantly higher χ_s compared with other leading sessions of Apps in the leaderboard, it has a high chance of containing ranking fraud. To capture such signatures, we define two statistical hypotheses as follows to compute the significance of χ_s for each leading session.

◃ HYPOTHESIS 0: The signature χ_s of leading session s is not useful for detecting ranking fraud.
◃ HYPOTHESIS 1: The signature χ_s of leading session s is significantly higher than expectation.

Here, we also propose to use the Gaussian approximation to calculate the p-value with the above hypotheses. Specifically, we assume χ_s follows the Gaussian distribution, χ_s ∼ N(µ_χ, σ_χ), where µ_χ and σ_χ can be learnt by the MLE method from the observations of χ_s in all Apps' historical leading sessions. Then, we can calculate the evidence by

\Psi_2(s) = 1 - P\big(N(\mu_\chi, \sigma_\chi) \ge \chi_s\big). \quad (7)
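To illustrate how these ranking based evidences might be computed, the sketch below derives θ_s (Eqs. (1)-(2)), χ_s (Eq. (6)) and the Gaussian evidences of Eqs. (3)-(5) and (7) using only the Python standard library. The event dictionary fields, the K* value and the use of population statistics for the MLE are illustrative assumptions, not details fixed by the paper.

```python
import math
from statistics import NormalDist, mean, pstdev

K_STAR = 300  # ranking threshold K* from Definition 1 (assumed setting)

def theta_signature(events):
    """theta_s, Eq. (2): average of the shape angles of Eq. (1). Each event
    is a dict with the phase boundaries t_a..t_d of Definition 3 and the
    ranks r_b, r_c at the start/end of its maintaining phase."""
    total = 0.0
    for e in events:
        theta1 = math.atan((K_STAR - e["r_b"]) / (e["t_b"] - e["t_a"]))
        theta2 = math.atan((K_STAR - e["r_c"]) / (e["t_d"] - e["t_c"]))
        total += theta1 + theta2
    return total / len(events)

def chi_signature(events):
    """chi_s, Eq. (6): peak height over maintaining-phase length, where
    r_m is the average rank within the maintaining phase."""
    return mean((K_STAR - e["r_m"]) / (e["t_c"] - e["t_b"] + 1)
                for e in events)

def gaussian_evidence(value, history):
    """Psi = 1 - P(N(mu, sigma) >= value), i.e. the Gaussian CDF at value,
    covering Eqs. (3)-(5) and (7). mu and sigma are MLE (population)
    estimates over the signature's historical leading-session values."""
    return NormalDist(mean(history), pstdev(history)).cdf(value)
```

The same helper serves most of the later Gaussian based evidences, since each is one minus a Gaussian tail probability; the exception is Ψ5, whose suspicious direction is reversed (Eq. (13)).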


EVIDENCE 3. The number of leading events in a leading session, i.e., |E_s|, is also a strong signature of ranking fraud. For a normal App, the recession phase indicates the fading of popularity. Therefore, after the end of a leading event, another leading event is unlikely to appear within a short time unless the App updates its version or carries out some sales promotion. Therefore, if a leading session contains many more leading events compared with other leading sessions of Apps in the leaderboard, it has a high probability of containing ranking fraud. To capture this, we define two statistical hypotheses to compute the significance of |E_s| for each leading session as follows.

◃ HYPOTHESIS 0: The signature |E_s| of leading session s is not useful for detecting ranking fraud.
◃ HYPOTHESIS 1: The signature |E_s| of leading session s is significantly larger than expectation.

Since |E_s| always takes discrete values, we propose to leverage the Poisson approximation to calculate the p-value with the above hypotheses. Specifically, we assume |E_s| follows the Poisson distribution, |E_s| ∼ P(λ_s), where the parameter λ_s can be learnt by the MLE method from the observations of |E_s| in all Apps' historical leading sessions. Then, we can calculate the p-value as follows:

P\big(\mathcal{P}(\lambda_s) \ge |E_s|\big) = 1 - \sum_{i=0}^{|E_s|} \frac{(\lambda_s)^i}{i!} e^{-\lambda_s}. \quad (8)

Therefore, we can compute the evidence by

\Psi_3(s) = 1 - P\big(\mathcal{P}(\lambda_s) \ge |E_s|\big). \quad (9)

Intuitively, the values of the above three evidences Ψ1(s), Ψ2(s) and Ψ3(s) are all within the range [0, 1]. Meanwhile, the higher the evidence value of a leading session, the higher the probability that this session contains ranking fraud activities.

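Analogously, here is a short sketch of the Poisson based evidence of Eqs. (8)-(9). Following the paper's summation convention, the evidence reduces to the partial sum up to |E_s|, and λ_s is estimated as the sample mean of the historical counts; the toy numbers are hypothetical.

```python
import math
from statistics import mean

def poisson_evidence(num_events, history):
    """Psi_3, Eqs. (8)-(9). With the paper's convention the p-value is
    1 - sum_{i=0}^{|E_s|} lam^i e^{-lam} / i!, so the evidence equals the
    partial sum itself; lam is the MLE (sample mean) of |E_s| over all
    Apps' historical leading sessions."""
    lam = mean(history)
    return sum(math.exp(-lam) * lam ** i / math.factorial(i)
               for i in range(num_events + 1))

# Hypothetical counts of leading events per historical session: a session
# with 6 leading events is far in the tail, so its evidence is close to 1.
print(poisson_evidence(6, history=[1, 2, 1, 3, 2, 1]))
```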

3.2 Rating based Evidences

The ranking based evidences are useful for ranking fraud detection. However, sometimes it is not sufficient to only use ranking based evidences. For example, some Apps created by famous developers, such as Gameloft, may have some leading events with large values of θ1 due to the developers' credibility and the "word-of-mouth" advertising effect. Moreover, some legal marketing services, such as "limited-time discount", may also result in significant ranking based evidences. To solve this issue, we also study how to extract fraud evidences from Apps' historical rating records.

Fig. 5. Two real-world examples of the distribution of Apps' daily average ratings ((a) Example 1; (b) Example 2).

Specifically, after an App has been published, it can be rated by any user who has downloaded it. Indeed, the user rating is one of the most important features of App advertisement. An App which has a higher rating may attract more users to download it, and can also be ranked higher in the leaderboard. Thus, rating manipulation is also an important perspective of ranking fraud. Intuitively, if an App has ranking fraud in a leading session s, the ratings during the time period of s may have anomaly patterns compared with its historical ratings, which can be used for constructing rating based evidences. For example, Figures 5 (a) and (b) show the distributions of the daily average rating of a popular App, "WhatsApp", and of a suspicious App discovered by our approach, respectively. We can observe that a normal App always receives a similar average rating each day, while a fraudulent App may receive relatively higher average ratings in some time periods (e.g., leading sessions) than at other times. Thus, we define two rating fraud evidences based on user rating behaviors as follows.

EVIDENCE 4. For a normal App, the average rating in a specific leading session should be consistent with the average value of all historical ratings. In contrast, an App with rating manipulation might have surprisingly high ratings in the fraudulent leading sessions with respect to its historical ratings. Here, we define a fraud signature ∆R_s for each leading session as follows:

\Delta R_s = \frac{R_s - R_a}{R_a}, \quad (s \in a) \quad (10)

where R_s is the average rating in leading session s, and R_a is the average historical rating of App a. Therefore, if a leading session has a significantly higher value of ∆R_s compared with other leading sessions of Apps in the leaderboard, it has a high probability of containing ranking fraud. To capture this, we define statistical hypotheses to compute the significance of ∆R_s for each leading session as follows.

◃ HYPOTHESIS 0: The signature ∆R_s of leading session s is not useful for detecting ranking fraud.
◃ HYPOTHESIS 1: The signature ∆R_s of leading session s is significantly higher than expectation.

Here, we use the Gaussian approximation to calculate the p-value with the above hypotheses. Specifically, we assume ∆R_s follows the Gaussian distribution, ∆R_s ∼ N(µ_R, σ_R), where µ_R and σ_R can be learnt by the MLE method from the observations of ∆R_s in all Apps' historical leading sessions. Then, we can compute the evidence by

\Psi_4(s) = 1 - P\big(N(\mu_R, \sigma_R) \ge \Delta R_s\big). \quad (11)

EVIDENCE 5. In the App rating records, each rating can be categorized into one of |L| discrete rating levels, e.g., 1 to 5, which represent the user preferences for an App. The rating distribution with respect to rating level l_i in a normal App a's leading session s, p(l_i|R_{s,a}), should be consistent with the distribution in a's historical rating records, p(l_i|R_a), and vice versa. Specifically, we can compute the distribution by p(l_i|R_{s,a}) = N_{l_i}^s / N_{(.)}^s, where N_{l_i}^s is the number of ratings in s at level l_i, and N_{(.)}^s is the total number of ratings in s. Meanwhile, we can compute p(l_i|R_a) in a similar way. Then, we use the cosine similarity between p(l_i|R_{s,a}) and p(l_i|R_a) to estimate the difference as follows:

D(s) = \frac{\sum_{i=1}^{|L|} p(l_i|R_{s,a}) \times p(l_i|R_a)}{\sqrt{\sum_{i=1}^{|L|} p(l_i|R_{s,a})^2} \times \sqrt{\sum_{i=1}^{|L|} p(l_i|R_a)^2}}. \quad (12)

Therefore, if a leading session has a significantly lower value of D(s) compared with other leading sessions of Apps in the leaderboard, it has a high probability of containing ranking fraud. To capture this, we define statistical hypotheses to compute the significance of D(s) for each leading session as follows.

◃ HYPOTHESIS 0: The signature D(s) of leading session s is not useful for detecting ranking fraud.
◃ HYPOTHESIS 1: The signature D(s) of leading session s is significantly lower than expectation.

Here, we use the Gaussian approximation to compute the p-value with the above hypotheses. Specifically, we assume D(s) follows the Gaussian distribution, D(s) ∼ N(µ_D, σ_D), where µ_D and σ_D can be learnt by the MLE method from the observations of D(s) in all Apps' historical leading sessions. Then, we can compute the evidence by

\Psi_5(s) = 1 - P\big(N(\mu_D, \sigma_D) \le D(s)\big). \quad (13)

The values of the two evidences Ψ4(s) and Ψ5(s) are in the range [0, 1]. Meanwhile, the higher the evidence value of a leading session, the more chance this session has of containing ranking fraud activities.
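Both rating signatures are simple to compute from raw rating lists; the sketch below implements ∆R_s (Eq. (10)) and the rating-level cosine similarity D(s) (Eq. (12)). The toy ratings are hypothetical.

```python
from statistics import mean

def rating_deviation(session_ratings, history_ratings):
    """Delta R_s, Eq. (10): relative lift of the session's average rating
    over the App's historical average rating."""
    r_s, r_a = mean(session_ratings), mean(history_ratings)
    return (r_s - r_a) / r_a

def rating_level_cosine(session_ratings, history_ratings,
                        levels=(1, 2, 3, 4, 5)):
    """D(s), Eq. (12): cosine similarity between the session's and the App's
    historical rating-level distributions p(l_i | .). A LOW value is the
    suspicious direction here, matching Eq. (13)."""
    def dist(ratings):
        n = len(ratings)
        return [sum(r == l for r in ratings) / n for l in levels]
    p_s, p_a = dist(session_ratings), dist(history_ratings)
    dot = sum(x * y for x, y in zip(p_s, p_a))
    norm = (sum(x * x for x in p_s) ** 0.5) * (sum(y * y for y in p_a) ** 0.5)
    return dot / norm

# A fraudulent-looking session: mostly 5-star ratings against a mixed history.
session = [5, 5, 5, 5, 4]
history = [3, 4, 2, 5, 3, 4, 1, 3]
print(rating_deviation(session, history))     # large positive lift
print(rating_level_cosine(session, history))  # low similarity
```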


3.3 Review based Evidences

Besides ratings, most App stores also allow users to write textual comments as App reviews. Such reviews can reflect the personal perceptions and usage experiences of existing users of particular mobile Apps. Indeed, review manipulation is one of the most important perspectives of App ranking fraud. Specifically, before downloading or purchasing a new mobile App, users often first read its historical reviews to ease their decision making, and a mobile App that contains more positive reviews may attract more users to download it. Therefore, imposters often post fake reviews in the leading sessions of a specific App in order to inflate the App downloads, and thus propel the App's ranking position in the leaderboard. Although some previous works on review spam detection have been reported in recent years [14], [19], [21], the problem of detecting the local anomaly of reviews in leading sessions and capturing them as evidences for ranking fraud detection is still under-explored. To this end, here we propose two fraud evidences based on Apps' review behaviors in leading sessions for detecting ranking fraud.

EVIDENCE 6. Indeed, most review manipulations are implemented by bot farms due to the high cost of human resources. Therefore, review spammers often post multiple duplicate or near-duplicate reviews on the same App to inflate downloads [19], [21]. In contrast, a normal App always has diversified reviews, since users have different personal perceptions and usage experiences. Based on the above observations, here we define a fraud signature Sim(s), which denotes the average mutual similarity between the reviews within leading session s. Specifically, this fraud signature can be computed by the following steps.

First, for each review c in leading session s, we remove all stop words (e.g., "of", "the") and normalize verbs and adjectives (e.g., "plays → play", "better → good").

Second, we build a normalized word vector w⃗_c = dim[n] for each review c, where n indicates the number of all unique normalized words in all reviews of s. To be specific, here we have dim[i] = freq_{i,c} / Σ_i freq_{i,c} (1 ≤ i ≤ n), where freq_{i,c} is the frequency of the i-th word in c.

Finally, we can calculate the similarity between two reviews c_i and c_j by the cosine similarity Cos(w⃗_{c_i}, w⃗_{c_j}). Thus, the fraud signature Sim(s) can be computed by

Sim(s) = \frac{2 \times \sum_{1 \le i < j \le N_s} Cos(\vec{w}_{c_i}, \vec{w}_{c_j})}{N_s \times (N_s - 1)}, \quad (14)

where N_s is the number of reviews during leading session s. Intuitively, a higher value of Sim(s) indicates more duplicate/near-duplicate reviews in s. Thus, if a leading session has a significantly higher value of Sim(s) compared with other leading sessions of Apps in the leaderboard, it has a high probability of containing ranking fraud. To capture this, we define statistical hypotheses to compute the significance of Sim(s) for each leading session as follows.

◃ HYPOTHESIS 0: The signature Sim(s) of leading session s is not useful for detecting ranking fraud.
◃ HYPOTHESIS 1: The signature Sim(s) of leading session s is significantly higher than expectation.

Here, we use the Gaussian approximation to compute the p-value with the above hypotheses. Specifically, we assume Sim(s) follows the Gaussian distribution, Sim(s) ∼ N(µ_Sim, σ_Sim), where µ_Sim and σ_Sim can be learnt by the MLE method from the observations of Sim(s) in all Apps' historical leading sessions. Then, we can compute the evidence by

\Psi_6(s) = 1 - P\big(N(\mu_{Sim}, \sigma_{Sim}) \ge Sim(s)\big). \quad (15)

EVIDENCE 7. From real-world observations, we find that each review c is always associated with a specific latent topic z. For example, some reviews may be related to the latent topic "worth to play" while some may be related to the latent topic "very boring". Meanwhile, since different users have different personal preferences for mobile Apps, each App a may have different topic distributions in its historical review records. Intuitively, the topic distribution of reviews in a normal leading session s of App a, i.e., p(z|s), should be consistent with the topic distribution in all historical review records of a, i.e., p(z|a). This is because the review topics are based on the users' personal usage experiences and not on the popularity of mobile Apps. In contrast, if the reviews of s have been manipulated, the two topic distributions will be markedly different. For example, the leading session may contain more positive topics, such as "worth to play" and "popular".

In this paper we propose to leverage topic modeling to extract the latent topics of reviews. Specifically, we adopt the widely used Latent Dirichlet Allocation (LDA) model [9] for learning latent semantic topics. To be more specific, the historical reviews of a mobile App a, i.e., C_a, are assumed to be generated as follows. First, before generating C_a, K prior conditional distributions of words given latent topics {ϕ_z} are generated from a prior Dirichlet distribution β. Second, a prior latent topic distribution θ_a is generated from a prior Dirichlet distribution α for each mobile App a. Then, to generate the j-th word in C_a, denoted as w_{a,j}, the model first generates a latent topic z from θ_a and then generates w_{a,j} from ϕ_z. The training process of the LDA model is to learn proper latent variables θ = {P(z|C_a)} and ϕ = {P(w|z)} that maximize the posterior distribution of the review observations, i.e., P(C_a|α, β, θ, ϕ). In this paper, we use a Markov chain Monte Carlo method named Gibbs sampling [12] for training the LDA model. If we denote the reviews in leading session s of a as C_{s,a}, we can use the KL-divergence to estimate the difference of topic distributions between C_a and C_{s,a}:

D_{KL}(s\|a) = \sum_k P(z_k|C_{s,a}) \ln \frac{P(z_k|C_{s,a})}{P(z_k|C_a)}, \quad (16)

where P(z_k|C_a) and P(z_k|C_{s,a}) ∝ P(z_k) ∏_{w∈C_{s,a}} P(w|z_k) can be obtained through the LDA training process. A higher value of D_KL(s||a) indicates a higher difference of topic distributions between C_a and C_{s,a}. Therefore, if a leading session has a significantly higher value of D_KL(s||a) compared with other leading sessions of Apps in the leaderboard, it has a high probability of containing ranking fraud. To capture this, we define statistical hypotheses to compute the significance of D_KL(s||a) for each leading session as follows.

◃ HYPOTHESIS 0: The signature D_KL(s||a) of leading session s is not useful for detecting ranking fraud.
◃ HYPOTHESIS 1: The signature D_KL(s||a) of leading session s is significantly higher than expectation.

Here, we also use the Gaussian approximation to compute the p-value with the above hypotheses. Specifically, we assume D_KL(s||a) follows the Gaussian distribution, D_KL(s||a) ∼ N(µ_DL, σ_DL), where µ_DL and σ_DL can be learnt by the MLE method from the observations of D_KL(s||a) in all Apps' historical leading sessions. Then, we can compute the evidence by

\Psi_7(s) = 1 - P\big(N(\mu_{DL}, \sigma_{DL}) \ge D_{KL}(s\|a)\big). \quad (17)

The values of the two evidences Ψ6(s) and Ψ7(s) are in the range [0, 1]. Meanwhile, the higher the evidence value of a leading session, the more chance this session has of containing ranking fraud activities.

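To make the two review based signatures concrete, the following sketch computes Sim(s) (Eq. (14)) over bag-of-words vectors and D_KL(s||a) (Eq. (16)) over given topic distributions. For brevity the text processing only lowercases and tokenizes (the paper additionally removes stop words and normalizes verbs and adjectives), and the LDA training itself is assumed to have been done elsewhere; all names are illustrative.

```python
import math
import re
from itertools import combinations

def _tokens(text):
    return re.findall(r"[a-z']+", text.lower())

def average_pairwise_similarity(reviews):
    """Sim(s), Eq. (14): mean cosine similarity over all pairs of
    normalized word-frequency vectors built from the session's reviews."""
    vocab = sorted({w for r in reviews for w in _tokens(r)})
    def vec(review):
        words = _tokens(review)
        freq = [words.count(w) for w in vocab]
        total = sum(freq) or 1
        return [f / total for f in freq]
    def cos(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        nu = math.sqrt(sum(x * x for x in u))
        nv = math.sqrt(sum(y * y for y in v))
        return dot / (nu * nv) if nu and nv else 0.0
    vecs = [vec(r) for r in reviews]
    n = len(reviews)
    return 2 * sum(cos(u, v) for u, v in combinations(vecs, 2)) / (n * (n - 1))

def topic_kl_divergence(p_session, p_history, eps=1e-12):
    """D_KL(s||a), Eq. (16): divergence of the session's topic distribution
    P(z_k|C_{s,a}) from the App's historical distribution P(z_k|C_a); both
    are assumed to come from a trained LDA model. eps guards log(0)."""
    return sum(ps * math.log((ps + eps) / (ph + eps))
               for ps, ph in zip(p_session, p_history) if ps > 0)

# Duplicate-heavy reviews push Sim(s) toward 1, while a session whose
# reviews are dominated by one topic diverges sharply from a balanced history.
print(average_pairwise_similarity(
    ["great game, worth to play", "great game worth to play!", "boring game"]))
print(topic_kl_divergence([0.8, 0.1, 0.1], [0.3, 0.4, 0.3]))
```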

3.4 Evidence Aggregation

After extracting the three types of fraud evidences, the next challenge is how to combine them for ranking fraud detection. Indeed, there are many ranking and evidence aggregation methods in the literature, such as permutation based models [17], [18], score based models [11], [26] and Dempster-Shafer rules [10], [23]. However, some of these methods focus on learning a global ranking for all candidates, which is not proper for detecting ranking fraud for new Apps. Other methods are based on supervised learning techniques, which depend on labeled training data and are hard to exploit. Instead, we propose an unsupervised approach based on fraud similarity to combine these evidences.

Specifically, we define the final evidence score Ψ*(s) as a linear combination of all the existing evidences, as in Equation (18). Note that here we propose to use the linear combination because it has been proven to be effective and is widely used in relevant domains, such as ranking aggregation [16], [20]:

\Psi^*(s) = \sum_{i=1}^{N_\Psi} w_i \times \Psi_i(s), \quad s.t.\; \sum_{i=1}^{N_\Psi} w_i = 1, \quad (18)

where N_Ψ = 7 is the number of evidences, and the weight w_i ∈ [0, 1] is the aggregation parameter of evidence Ψ_i(s). Thus, the problem of evidence aggregation becomes how to learn the proper parameters {w_i} from the training leading sessions.

We first propose an intuitive assumption as Principle 1 for our evidence aggregation approach. Specifically, we assume that effective evidences should have similar evidence scores for each leading session, while poor evidences will generate different scores from the others. In other words, evidences that tend to be consistent with the plurality of evidences will be given higher weights, and evidences which tend to disagree will be given smaller weights. To this end, for each evidence score Ψ_i(s), we can measure its consistence using the variance-like measure

\sigma_i(s) = \big(\Psi_i(s) - \bar{\Psi}(s)\big)^2, \quad (19)

where Ψ̄(s) is the average evidence score of leading session s obtained from all N_Ψ evidences. If σ_i(s) is small, the corresponding Ψ_i(s) should be given a bigger weight, and vice versa. Therefore, given an App set A = {a_i} with their leading sessions {s_j}, we can define the evidence aggregation problem as an optimization problem that minimizes the weighted variances of the evidences over all leading sessions; that is,

\arg\min_{w} \sum_{a \in A} \sum_{s \in a} \sum_{i=1}^{N_\Psi} w_i \cdot \sigma_i(s), \quad (20)

s.t.\; \sum_{i=1}^{N_\Psi} w_i = 1; \quad \forall w_i \ge 0. \quad (21)

In this paper, we exploit the gradient based approach with exponentiated updating [15], [16] to solve this problem. To be specific, we first assign w_i = 1/N_Ψ as the initial value; then, for each s, we can compute the gradient by

\nabla_i = \frac{\partial\, w_i \cdot \sigma_i(s)}{\partial w_i} = \sigma_i(s). \quad (22)

Thus, we can update the weight w_i by

w_i = \frac{w_i^* \times \exp(-\lambda \nabla_i)}{\sum_{j=1}^{N_\Psi} w_j^* \times \exp(-\lambda \nabla_j)}, \quad (23)

where w_i^* is the last updated value of weight w_i, and λ is the learning rate, which is empirically set to λ = 10^{-2} in our experiments.

Finally, we can exploit Equation (18) to estimate the final evidence score of each leading session. Moreover, given a leading session s with a predefined threshold τ, we can determine that s contains ranking fraud if Ψ*(s) > τ.

However, sometimes only using evidence scores for evidence aggregation is not appropriate, because different evidences may have different score ranges for evaluating leading sessions. For example, some evidences may always generate higher scores for leading sessions than the average evidence score, even though they can detect fraudulent leading sessions and rank them in accurate positions.

Therefore, here we propose another assumption as Principle 2 for our evidence aggregation approach. Specifically, we assume that effective evidences should rank leading sessions from a similar conditional distribution, while poor evidences will lead to a more uniformly random ranking distribution [16]. To this end, given a set of leading sessions, we first rank them by each evidence score and obtain N_Ψ ranked lists. Let us denote π_i(s) as the ranking of session s returned by Ψ_i(s); then we can calculate the average ranking for leading session s by

\bar{\pi}(s) = \frac{1}{N_\Psi} \sum_{i=1}^{N_\Psi} \pi_i(s). \quad (24)

Then, for each evidence score Ψ_i(s), we can measure its consistence using the variance-like measure

\sigma_i^*(s) = \big(\pi_i(s) - \bar{\pi}(s)\big)^2. \quad (25)

If σ_i^*(s) is small, the corresponding Ψ_i(s) should be given a bigger weight, and vice versa. Then we can replace σ_i(s) by σ_i^*(s) in Equation (20) and exploit the similar gradient based approach introduced above for learning the weights of the evidences.

4 DISCUSSION

Here, we provide some discussion about the proposed ranking fraud detection system for mobile Apps.

First, the download information is an important signature for detecting ranking fraud, since ranking manipulation uses so-called "bot farms" or "human water armies" to inflate the App downloads and ratings in a very short time. However, the instant download information of each mobile App is often not available for analysis. In fact, Apple and Google do not provide accurate download information for any App. Furthermore, the App developers themselves are also reluctant to release their download information, for various reasons. Therefore, in this paper, we mainly focus on extracting evidences from Apps' historical ranking, rating and review records for ranking fraud detection. However, our approach is scalable for integrating other evidences if available, such as evidences based on the download information and App developers' reputation.

Second, the proposed approach can detect ranking fraud that happened in Apps' historical leading sessions. However, sometimes we need to detect such ranking fraud from Apps' current ranking observations. Actually, given the current ranking r_now^a of an App a, we can detect ranking fraud for it in two different cases. First, if r_now^a > K*, where K* is the ranking threshold introduced in Definition 1, we believe a is not involved in ranking fraud, since it is not in a leading event. Second, if r_now^a < K*, which means a is in a new leading event e, we treat this as a special case with t_end^e = t_now^e and θ2 = 0. Therefore, such real-time ranking fraud can also be detected by the proposed approach.

Finally, after detecting ranking fraud for each leading session of a mobile App, the remaining problem is how to estimate the credibility of this App. Indeed, our approach discovers the local anomaly instead of the global anomaly of mobile Apps. Thus, we should take such local characteristics into consideration when estimating the credibility of Apps. To be specific, we define an App fraud score F(a) for each App a according to how many of its leading sessions contain ranking fraud:

F(a) = \sum_{s \in a} [[\Psi^*(s) > \tau]] \times \Psi^*(s) \times \Delta t_s, \quad (26)

where s ∈ a denotes that s is a leading session of App a, and Ψ*(s) is the final evidence score of leading session s, which can be calculated by Equation (18). In particular, we define a signal function [[x]] (i.e., [[x]] = 1 if x = True, and 0 otherwise) and a fraud threshold τ to decide the top k fraudulent leading sessions. Moreover, ∆t_s = (t_end^s − t_start^s + 1) is the time range of s, which indicates the duration of ranking fraud. Intuitively, an App that contains more leading sessions with high fraud evidence scores and long time durations will have a higher App fraud score.

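Two compact sketches illustrate the computations above: the exponentiated-gradient weight update of Eqs. (20)-(23) and the App fraud score of Eq. (26). The iteration scheme and the toy values are assumptions for illustration.

```python
import math

def learn_weights(evidence_rows, lr=1e-2, n_iter=10):
    """Exponentiated-gradient updates for Eqs. (20)-(23) under Principle 1.
    evidence_rows: one row per training leading session with the N_Psi
    evidence values Psi_i(s). The per-session gradient is the variance-like
    measure sigma_i(s) of Eq. (19); using the rank based sigma_i*(s) of
    Eq. (25) instead yields the Principle 2 variant."""
    n = len(evidence_rows[0])
    w = [1.0 / n] * n                     # uniform start, w_i = 1/N_Psi
    for _ in range(n_iter):
        for row in evidence_rows:
            avg = sum(row) / n
            grad = [(psi - avg) ** 2 for psi in row]               # Eq. (22)
            w = [wi * math.exp(-lr * g) for wi, g in zip(w, grad)]
            total = sum(w)
            w = [wi / total for wi in w]                           # Eq. (23)
    return w

def final_score(psis, w):
    """Psi*(s), Eq. (18): weighted linear combination of the evidences."""
    return sum(wi * psi for wi, psi in zip(w, psis))

def app_fraud_score(sessions, tau):
    """F(a), Eq. (26): sum the final scores of sessions flagged as
    fraudulent (Psi* > tau), weighted by duration (t_end - t_start + 1)."""
    return sum(psi * (t_end - t_start + 1)
               for psi, t_start, t_end in sessions if psi > tau)

# Hypothetical usage: three evidences over four training sessions, then an
# App with two suspicious sessions and one clean session.
w = learn_weights([[0.9, 0.8, 0.2], [0.7, 0.6, 0.1],
                   [0.4, 0.5, 0.9], [0.3, 0.2, 0.8]])
print(w)
print(app_fraud_score([(0.9, 10, 16), (0.8, 40, 42), (0.3, 80, 99)], tau=0.7))
```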

Fig. 6. The distribution of the number of Apps w.r.t. different rankings ((a) Top Free 300 data set; (b) Top Paid 300 data set).

Fig. 7. The distribution of the number of Apps w.r.t. different numbers of ratings ((a) Top Free 300 data set; (b) Top Paid 300 data set).

Fig. 8. The distribution of the number of Apps w.r.t. different numbers of leading events ((a) Top Free 300 data set; (b) Top Paid 300 data set).

Fig. 9. The distribution of the number of Apps w.r.t. different numbers of leading sessions ((a) Top Free 300 data set; (b) Top Paid 300 data set).

5 EXPERIMENTAL RESULTS

In this section, we evaluate the performance of ranking fraud detection using real-world App data.

5.1 The Experimental Data

The experimental data sets were collected from the "Top Free 300" and "Top Paid 300" leaderboards of Apple's App Store (U.S.) from February 2, 2010 to September 17, 2012. The data sets contain the daily chart rankings (collected at 11:00 PM (PST) each day) of the top 300 free Apps and the top 300 paid Apps, respectively. Moreover, each data set also contains the user rating and review information. Table 1 shows the detailed characteristics of our data sets.

TABLE 1
Statistics of the experimental data.

                      Top Free 300    Top Paid 300
  App Num.            9,784           5,261
  Ranking Num.        285,900         285,900
  Avg. Ranking Num.   29.22           54.34
  Rating Num.         14,912,459      4,561,943
  Avg. Rating Num.    1,524.17        867.12

Figures 6 (a) and 6 (b) show the distributions of the number of Apps with respect to different rankings in these data sets. In the figures, we can see that the number of Apps with low rankings is larger than that of Apps with high rankings. Moreover, the competition between free Apps is greater than that between paid Apps, especially at high rankings (e.g., top 25). Figures 7 (a) and 7 (b) show the distribution of the number of Apps with respect to different numbers of ratings in these data sets. In the figures, we can see that the distribution of App ratings is not even, which indicates that only a small percentage of Apps are very popular.

5.2 Mining Leading Sessions

Here, we demonstrate the results of mining leading sessions in both data sets. Specifically, in Algorithm 1, we set the ranking threshold K* = 300 and the merging threshold ϕ = 7. This means that two adjacent leading events are merged into the same leading session if they occur within one week of each other. Figure 8 and Figure 9 show the distributions of the number of Apps with respect to different numbers of contained leading events and leading sessions in both data sets. In these figures, we can see that only a few Apps have many leading events and leading sessions. The average numbers of leading events and leading sessions are 2.69 and 1.57 for free Apps, and 4.20 and 1.86 for paid Apps. Moreover, Figures 10 (a) and 10 (b) show the distribution of the number of leading sessions with respect to different numbers of contained leading events in both data sets. In these figures, we can find that only a few leading sessions contain many leading events. This also validates the evidence Ψ3. Indeed, the average number of leading events in each leading session is 1.70 for free Apps and 2.26 for paid Apps.

5.3 Human Judgement based Evaluation

To the best of our knowledge, there is no existing benchmark to decide which leading sessions or Apps really contain ranking fraud. Thus, we develop four intuitive baselines and invite five human evaluators to validate the effectiveness of our approach, EA-RFD (Evidence Aggregation based Ranking Fraud Detection). Particularly, we denote our approach with score based aggregation (i.e., Principle 1) as EA-RFD-1, and our approach with rank based aggregation (i.e., Principle 2) as EA-RFD-2, respectively.


(a) Top Free 300 data set (b) Top Paid 300 data set
Fig. 10. The distribution of the number of leading ses-
sions w.r.t different number of leading events.

5.3.1 Baselines

The first baseline, Ranking-RFD, stands for Ranking evidence based Ranking Fraud Detection, which estimates the ranking fraud for each leading session by using only the ranking based evidences (i.e., Ψ1 to Ψ3). These three evidences are integrated by our aggregation approach.

The second baseline, Rating-RFD, stands for Rating evidence based Ranking Fraud Detection, which estimates the ranking fraud for each leading session by using only the rating based evidences (i.e., Ψ4 and Ψ5). These two evidences are integrated by our aggregation approach.

The third baseline, Review-RFD, stands for Review evidence based Ranking Fraud Detection, which estimates the ranking fraud for each leading session by using only the review based evidences (i.e., Ψ6 and Ψ7). These two evidences are integrated by our aggregation approach.

Particularly, we only use the rank based aggregation approach (i.e., Principle 2) for integrating evidences in the above baselines, because these baselines are mainly used for evaluating the effectiveness of the different kinds of evidences, and our preliminary experiments validated that baselines with Principle 2 always outperform baselines with Principle 1.

The last baseline, E-RFD, stands for Evidence based Ranking Fraud Detection, which estimates the ranking fraud for each leading session by the ranking, rating and review based evidences without evidence aggregation. Specifically, it ranks leading sessions by Equation 18, where each wi is set to 1/7 equally. This baseline is used for evaluating the effectiveness of our ranking aggregation method, as sketched below.
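To make the E-RFD baseline concrete, the following is a minimal sketch of the unweighted combination. The evidence values and the function name are illustrative assumptions; the paper only specifies that the seven evidence scores of Equation 18 are combined with equal weights wi = 1/7.

# A sketch of the E-RFD baseline: seven per-session evidence scores
# (Psi_1..Psi_7) are combined with equal weights w_i = 1/7, as in
# Equation 18. The numeric values below are hypothetical; in the full
# system each Psi_i comes from a statistical hypothesis test.

def e_rfd_score(evidences, weights=None):
    """Combine per-session evidence scores into one fraud score."""
    if weights is None:
        weights = [1.0 / len(evidences)] * len(evidences)  # w_i = 1/7
    return sum(w * psi for w, psi in zip(weights, evidences))

# Example: one leading session with hypothetical evidence values.
session_evidences = [0.9, 0.7, 0.8, 0.4, 0.5, 0.3, 0.6]  # Psi_1..Psi_7
print(e_rfd_score(session_evidences))  # higher score => more suspicious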
Note that, according to Definition 3, we need to define some ranking ranges before extracting the ranking based evidences for EA-RFD-1, EA-RFD-2, Ranking-RFD and E-RFD. In our experiments, we segment the rankings into 5 different ranges, i.e., [1, 10], [11, 25], [26, 50], [51, 100] and [101, 300], which are commonly used in App leaderboards. Furthermore, we use the LDA model to extract review topics as introduced in Section 3.3. Particularly, we first normalize each review with the Stop-Words Remover [6] and the Porter Stemmer [7]. Then, the number of latent topics Kz is set to 20 according to the perplexity based estimation approach [8], [31], and the two parameters α and β for training the LDA model are set to 50/K and 0.1 according to [13].
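The paper does not name an LDA implementation; the following sketch shows one way to reproduce the stated configuration (Kz = 20, α = 50/K, β = 0.1) with scikit-learn. The review texts are hypothetical and assumed to be already stop-word filtered and stemmed.

# A sketch of the review-topic setup, assuming scikit-learn's LDA.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

reviews = [
    "great app love the new level design",
    "crashes every time after the update",
    "best puzzle game smooth and fun",
    "ads everywhere terrible experience",
]  # hypothetical, already normalized

K = 20  # K_z, chosen via perplexity based estimation [8], [31]
X = CountVectorizer().fit_transform(reviews)
lda = LatentDirichletAllocation(
    n_components=K,
    doc_topic_prior=50.0 / K,  # alpha = 50/K, following [13]
    topic_word_prior=0.1,      # beta = 0.1, following [13]
    random_state=0,
)
lda.fit(X)
# In practice, K is picked by comparing perplexity on held-out reviews.
print(lda.perplexity(X))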


Fig. 11. The screenshots of our fraud evaluation platform.

5.3.2 The Experimental Setup

To study the performance of ranking fraud detection by each approach, we set up the evaluation as follows.

First, for each approach, we selected 50 top ranked leading sessions (i.e., the most suspicious sessions), 50 middle ranked leading sessions (i.e., the most uncertain sessions), and 50 bottom ranked leading sessions (i.e., the most normal sessions) from each data set. Then, we merged all the selected sessions into a pool which consists of 587 unique sessions from 281 unique Apps in the "Top Free 300" data set, and 541 unique sessions from 213 unique Apps in the "Top Paid 300" data set.

Second, we invited five human evaluators who are familiar with Apple's App store and mobile Apps to manually label the selected leading sessions with score 2 (i.e., Fraud), 1 (i.e., Not Sure) or 0 (i.e., Non-fraud). Specifically, for each selected leading session, each evaluator gave a proper score by comprehensively considering the profile information of the App (e.g., descriptions, screenshots), the trend of rankings during this session, the App leaderboard information during this session, the trend of ratings during this session, and the reviews during this session. Moreover, they could also download and try the corresponding Apps to obtain user experiences. Particularly, to facilitate their evaluation, we developed a Ranking Fraud Evaluation Platform, which ensures that the evaluators can easily browse all the information. Also, the platform presents leading sessions in random order, which guarantees that there is no relationship between the order of the leading sessions and their fraud scores. Figure 11 shows the screenshot of the platform: the left panel shows the main menu, the right upper panel shows the reviews for the given session, and the right lower panel shows the ranking related information for the given session. After human evaluation, each leading session s is assigned a fraud score f(s) ∈ [0, 10]. As a result, all five evaluators agreed on 86 fraud sessions and 113 non-fraud sessions in the Top Free 300 data set. Note that 11 of the labeled fraud sessions are from the externally reported suspicious Apps [4], [5], which validates the effectiveness of our human judgement. Similarly, all five evaluators agreed on 94 fraud sessions and 119 non-fraud sessions in the Top Paid 300 data set.
Moreover, we computed the Cohen's kappa coefficient [1] between each pair of evaluators to estimate the inter-evaluator agreement. The values of the Cohen's kappa coefficient are between 0.66 and 0.72 in the user evaluation, which indicates substantial agreement [19]. A sketch of this computation follows.
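The sketch below computes Cohen's kappa for one pair of evaluators over the same sessions; the label vectors are hypothetical, with 2 = Fraud, 1 = Not Sure and 0 = Non-fraud as in the setup above.

# A sketch of the inter-evaluator agreement check via Cohen's kappa [1].
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    n = len(labels_a)
    # Observed agreement: fraction of sessions labeled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: expected overlap given each evaluator's label frequencies.
    ca, cb = Counter(labels_a), Counter(labels_b)
    p_e = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

evaluator_1 = [2, 1, 0, 2, 0, 2, 1, 0]  # hypothetical labels
evaluator_2 = [2, 1, 0, 1, 0, 2, 1, 1]
print(cohens_kappa(evaluator_1, evaluator_2))  # ~0.64 here; 0.61-0.80 is read as substantial agreement [19]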
Finally, we further ranked the leading sessions by each approach with respect to their fraud scores and obtained six ranked lists of leading sessions. In particular, if we treat the commonly agreed fraud sessions (i.e., 89 sessions in the Top Free 300 data set and 94 sessions in the Top Paid 300 data set) as the ground truth, we can evaluate each approach with three widely used metrics, namely Precision@K, Recall@K and F@K [2]. Also, we can exploit the metric Normalized Discounted Cumulative Gain (NDCG) for determining the ranking performance of each approach. Specifically, the discounted cumulative gain at a cut-off rank K is calculated by DCG@K = Σ_{i=1}^{K} (2^{f(s_i)} − 1) / log2(1 + i), where f(s_i) is the human labeled fraud score. NDCG@K is DCG@K normalized by IDCG@K, which is the DCG@K value of the ideal ranking list of the returned results, i.e., NDCG@K = DCG@K / IDCG@K. NDCG@K indicates how well the rank order of the sessions returned by an approach matches the ideal order within a cut-off rank K; the larger the NDCG@K value, the better the performance of ranking fraud detection. These metrics can be computed as in the sketch below.
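The following sketch implements the metrics exactly as defined above, using a hypothetical toy ranking; the session ids and scores are illustrative.

# A sketch of Precision@K, Recall@K, F@K and NDCG@K for a ranked list.
import math

def precision_recall_f_at_k(ranked, ground_truth, k):
    # ranked: session ids ordered by predicted fraud score (most suspicious first);
    # ground_truth: the set of commonly agreed fraud sessions.
    hits = sum(1 for s in ranked[:k] if s in ground_truth)
    p, r = hits / k, hits / len(ground_truth)
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f

def ndcg_at_k(ranked, fraud_score, k):
    # fraud_score maps a session id to its human labeled score f(s) in [0, 10].
    # enumerate() is 0-based, so log2(i + 2) equals log2(1 + rank).
    dcg = sum((2 ** fraud_score[s] - 1) / math.log2(i + 2)
              for i, s in enumerate(ranked[:k]))
    ideal = sorted(fraud_score.values(), reverse=True)[:k]
    idcg = sum((2 ** f - 1) / math.log2(i + 2) for i, f in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

scores = {"s1": 10, "s2": 7, "s3": 0, "s4": 3}  # hypothetical labels
ranked = ["s2", "s1", "s4", "s3"]
print(precision_recall_f_at_k(ranked, {"s1", "s2"}, k=2))
print(ndcg_at_k(ranked, scores, k=2))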
5.3.3 Overall Performances

In this subsection, we present the overall performance of each ranking fraud detection approach with respect to the different evaluation metrics, i.e., Precision@K, Recall@K, F@K and NDCG@K. Particularly, here we set the maximum K to be 200, and all experiments were conducted on a PC with a 2.8GHz quad-core CPU and 4GB of main memory.

Fig. 12. The overall performance of each detection approach in the Top Free 300 data set: (a) Precision@K; (b) Recall@K; (c) F@K; (d) NDCG@K.

Fig. 13. The overall performance of each detection approach in the Top Paid 300 data set: (a) Precision@K; (b) Recall@K; (c) F@K; (d) NDCG@K.

Figures 12 and 13 show the evaluation performance of each detection approach on the two data sets. From these figures we can observe that the evaluation results on the two data sets are consistent. Indeed, by analyzing the evaluation results, we can obtain several insightful observations. First, we find that our approach, i.e., EA-RFD-2/EA-RFD-1, consistently outperforms the other baselines, and the improvements are more significant for smaller K (e.g., K < 100). This result clearly validates the effectiveness of our evidence aggregation based framework for detecting ranking fraud. Second, EA-RFD-2 outperforms EA-RFD-1 slightly in terms of all evaluation metrics, which indicates that rank based aggregation (i.e., Principle 2) is more effective than score based aggregation (i.e., Principle 1) for integrating fraud evidences. Third, our approach consistently outperforms E-RFD, which validates the effectiveness of evidence aggregation for detecting ranking fraud. Fourth, E-RFD has better detection performance than Ranking-RFD, Rating-RFD and Review-RFD. This indicates that leveraging three kinds of evidences is more effective than using only one type of evidence, even without evidence aggregation. Finally, by comparing Ranking-RFD, Rating-RFD and Review-RFD, we can observe that the ranking based evidences are more effective than the rating and review based evidences, because rating and review manipulations are only supplementary to ranking manipulation. Particularly, we observe that Review-RFD may not lead to good performance in terms of all evaluation metrics on the two data sets. A possible reason behind this phenomenon is that review manipulation (i.e., fake positive reviews) does not directly affect the chart ranking of Apps, but may increase the possibility of inflating App downloads and ratings. Therefore, review manipulation does not necessarily result in ranking fraud, due to the unknown ranking principles of the App Store. However, the proposed review based evidences can still be helpful as a supplement for ranking fraud detection. Indeed, in our preliminary experiments, we found that the review based evidences could always improve the detection performance when used together with the other evidences, which clearly validates their effectiveness.


To further validate the experimental results, we also conducted a series of paired t-tests at the 0.95 confidence level, which show that the improvements of our approach, i.e., EA-RFD-2/EA-RFD-1, over the other baselines are statistically significant on all evaluation metrics with different K, as sketched below.
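The sketch below shows the significance check as a paired t-test over the metric values of two approaches measured at the same cut-offs K. The numbers are hypothetical; scipy's ttest_rel implements the paired test.

# A sketch of the paired t-test between two approaches at the same K values.
from scipy import stats

ea_rfd_2 = [0.82, 0.80, 0.78, 0.76, 0.75]  # e.g., Precision@K for K = 10..50 (hypothetical)
e_rfd = [0.74, 0.73, 0.71, 0.70, 0.69]

t_stat, p_value = stats.ttest_rel(ea_rfd_2, e_rfd)
print(t_stat, p_value < 0.05)  # p < 0.05 => significant at the 0.95 confidence level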
5.4 Case Study: Evaluating App Credibility

As introduced in Section 4, our approach can be used for evaluating the credibility of Apps by Equation 26. Here, we study the performance of evaluating App credibility based on the prior knowledge from existing reports. Specifically, as reported by IBTimes [5], there are eight free Apps which might be involved in ranking fraud. In this paper, we use the seven of them that appear in our data set (Tiny Pets, Social Girl, Fluff Friends, Crime City, VIP Poker, Sweet Shop, Top Girl) for evaluation. Indeed, we try to study whether each approach can find these suspicious Apps with high rankings, since a good ranking fraud detection system should have the capability of capturing such suspicious Apps. Particularly, instead of setting a fixed fraud threshold τ in Equation 26, we treat the top 10% ranked leading sessions as suspicious sessions to compute the credibility of each App (a sketch of this computation follows at the end of this subsection).

Figure 14 shows the top percentage position of each App in the ranked list returned by each approach. We can see that our approach, i.e., EA-RFD-2 and EA-RFD-1, can rank those suspicious Apps into higher positions than the other baseline methods. Similarly to the results in Section 5.3.3, leveraging only a single kind of evidence for fraud detection cannot obtain good performance, i.e., it cannot find such suspicious Apps in high positions.

Fig. 14. Case study of reported suspicious mobile Apps.

Figure 15 shows the ranking records of the above Apps (limited by space, we only show four of them). In this figure, we find that all these Apps have clear ranking based fraud evidences. For example, some Apps have very short leading sessions with high rankings (i.e., Evidences 1 and 2), and some Apps have leading sessions with many leading events (i.e., Evidence 3). These observations clearly validate the effectiveness of our approach.

Fig. 15. The demonstration of the ranking records of four reported suspicious Apps: (a) Fluff Friends; (b) VIP Poker; (c) Tiny Pets; (d) Crime City.
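Equation 26 itself is not reproduced here; the sketch below only follows the text of this case study, treating a leading session as suspicious if it falls in the top 10% of the ranked list and letting an App's credibility drop with its share of suspicious sessions. The exact credibility form is an assumption.

# A sketch of the App credibility check used in this case study.

def suspicious_sessions(ranked_sessions, top_fraction=0.10):
    # Top 10% of the ranked list is treated as suspicious, per the text.
    cut = max(1, int(len(ranked_sessions) * top_fraction))
    return set(ranked_sessions[:cut])

def app_credibility(app_sessions, suspicious):
    # Assumed form: fraction of an App's leading sessions that look normal.
    flagged = sum(1 for s in app_sessions if s in suspicious)
    return 1.0 - flagged / len(app_sessions)

ranked = ["s7", "s3", "s9", "s1", "s2", "s5", "s8", "s4", "s6", "s0"]  # hypothetical
sus = suspicious_sessions(ranked)          # {"s7"} for these 10 sessions
print(app_credibility(["s7", "s4"], sus))  # 0.5 => half of this App's sessions look suspicious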
5.5 Efficiency and Robustness of our Approach

The computational cost of our approach mainly comes from the task of extracting the three kinds of fraud evidences for the given leading sessions. Indeed, the main processes of this task can be carried out offline in advance. For example, the LDA model can be trained offline, and the fraud signatures of the existing leading sessions can also be mined in advance and stored in the server. In this case, the process of extracting evidences for each leading session is very fast (less than 100 milliseconds on average in our experiments).

Meanwhile, a learning process is required for evidence aggregation. After learning the aggregation model on a historical data set, each new test App can reuse this model for detecting ranking fraud. However, it is still not clear how much learning data is required. To study this problem and validate the robustness of our approach, we first rank all leading sessions with the weight parameters learnt from the entire data set. Then we also rank all leading sessions with the weight parameters learnt from different segmentations of the entire data set (i.e., 10%, ..., 100%). Finally, we measure the root mean squared error (RMSE) between the rankings of the leading sessions in the different results, as sketched below. Figure 16 shows the results of the robustness test on the two data sets. We find that the aggregation model does not need a large amount of learning data; thus, the robustness of our approach is reasonable.

Fig. 16. The robustness test of our aggregation model with two principles: (a) Top Free 300 data set; (b) Top Paid 300 data set.
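The following sketch compares the session ranking produced with weights learned on a data fraction against the ranking from the full data, via the RMSE of rank positions; reading "RMSE of the ranking" as an error over rank positions is our assumption.

# A sketch of the robustness test between two rankings of the same sessions.
import math

def rank_positions(ranked_sessions):
    return {s: i for i, s in enumerate(ranked_sessions)}

def ranking_rmse(ranked_a, ranked_b):
    pa, pb = rank_positions(ranked_a), rank_positions(ranked_b)
    return math.sqrt(sum((pa[s] - pb[s]) ** 2 for s in pa) / len(pa))

full = ["s1", "s2", "s3", "s4", "s5"]  # ranking from weights learned on 100% of the data
part = ["s2", "s1", "s3", "s5", "s4"]  # ranking from weights learned on, e.g., a 10% sample
print(ranking_rmse(full, part))        # small RMSE => robust aggregation model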
6 RELATED WORK

Generally speaking, the related works of this study can be grouped into three categories.

The first category is about Web ranking spam detection.


Specifically, Web ranking spam refers to any deliberate actions which bring to selected Web pages an unjustifiable favorable relevance or importance [30]. For example, Ntoulas et al. [22] have studied various aspects of content-based spam on the Web and presented a number of heuristic methods for detecting content based spam. Zhou et al. [30] have studied the problem of unsupervised Web ranking spam detection; specifically, they proposed efficient online link spam and term spam detection methods using spamicity. Recently, Spirin and Han [25] have reported a survey on Web spam detection, which comprehensively introduces the principles and algorithms in the literature. Indeed, the work on Web ranking spam detection is mainly based on the analysis of the ranking principles of search engines, such as PageRank and query term frequency. This is different from ranking fraud detection for mobile Apps.

The second category focuses on detecting online review spam. For example, Lim et al. [19] have identified several representative behaviors of review spammers and modeled these behaviors to detect the spammers. Wu et al. [27] have studied the problem of detecting hybrid shilling attacks on rating data; the proposed approach is based on semi-supervised learning and can be used for trustworthy product recommendation. Xie et al. [28] have studied the problem of singleton review spam detection; specifically, they solved this problem by detecting the co-anomaly patterns in multiple review based time series. Although some of the above approaches can be used for anomaly detection from historical rating and review records, they are not able to extract fraud evidences for a given time period (i.e., a leading session).
Finally, the third category includes the studies on mobile App recommendation. For example, Yan and Chen [29] developed a mobile App recommender system, named AppJoy, which is based on users' App usage records to build a preference matrix instead of using explicit user ratings. Also, to solve the sparsity problem of App usage records, Shi and Ali [24] studied several recommendation models and proposed a content based collaborative filtering model, named Eigenapp, for recommending Apps in their Web site Getjar. In addition, some researchers studied the problem of exploiting enriched contextual information for mobile App recommendation. For example, Zhu et al. [32] proposed a uniform framework for personalized context-aware recommendation, which can integrate both context independency and dependency assumptions. However, to the best of our knowledge, none of the previous works has studied the problem of ranking fraud detection for mobile Apps.

7 CONCLUDING REMARKS

In this paper, we developed a ranking fraud detection system for mobile Apps. Specifically, we first showed that ranking fraud happens in leading sessions and provided a method for mining the leading sessions of each App from its historical ranking records. Then, we identified ranking based evidences, rating based evidences and review based evidences for detecting ranking fraud. Moreover, we proposed an optimization based aggregation method to integrate all the evidences for evaluating the credibility of leading sessions from mobile Apps. A unique perspective of this approach is that all the evidences can be modeled by statistical hypothesis tests; thus, it is easy to extend it with other evidences from domain knowledge to detect ranking fraud. Finally, we validated the proposed system with extensive experiments on real-world App data collected from Apple's App store. The experimental results showed the effectiveness of the proposed approach.

In the future, we plan to study more effective fraud evidences and analyze the latent relationships among ratings, reviews and rankings. Moreover, we will extend our ranking fraud detection approach with other mobile App related services, such as mobile App recommendation, for enhancing user experience.

Acknowledgement. This work was supported in part by grants from the National Science Foundation for Distinguished Young Scholars of China (Grant No. 61325010), the Natural Science Foundation of China (NSFC, Grant No. 71329201), the National High Technology Research and Development Program of China (Grant No. SS2014AA012303), Science and Technology Development of Anhui Province (Grant No. 1301022064), and the International Science and Technology Cooperation Plan of Anhui Province (Grant No. 1303063008). This work was also partially supported by grants from the National Science Foundation (NSF) via grant numbers CCF-1018151 and IIS-1256016.

REFERENCES

[1] http://en.wikipedia.org/wiki/Cohen's_kappa.
[2] http://en.wikipedia.org/wiki/Information_retrieval.
[3] https://developer.apple.com/news/index.php?id=02062012a.
[4] http://venturebeat.com/2012/07/03/apples-crackdown-on-app-ranking-manipulation/.
[5] http://www.ibtimes.com/apple-threatens-crackdown-biggest-app-store-ranking-fraud-406764.
[6] http://www.lextek.com/manuals/onix/index.html.
[7] http://www.ling.gu.se/~lager/mogul/porter-stemmer.
[8] L. Azzopardi, M. Girolami, and K. van Rijsbergen. Investigating the relationship between language model perplexity and IR precision-recall measures. In Proceedings of the 26th International Conference on Research and Development in Information Retrieval (SIGIR '03), pages 369-370, 2003.
[9] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, pages 993-1022, 2003.
[10] Y. Ge, H. Xiong, C. Liu, and Z.-H. Zhou. A taxi driving fraud detection system. In Proceedings of the 2011 IEEE 11th International Conference on Data Mining (ICDM '11), pages 181-190, 2011.
[11] D. F. Gleich and L.-H. Lim. Rank aggregation via nuclear norm minimization. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '11), pages 60-68, 2011.
[12] T. L. Griffiths and M. Steyvers. Finding scientific topics. In Proceedings of the National Academy of Sciences of the USA, pages 5228-5235, 2004.
[13] G. Heinrich. Parameter estimation for text analysis. Technical report, University of Leipzig, 2008.
[14] N. Jindal and B. Liu. Opinion spam and analysis. In Proceedings of the 2008 International Conference on Web Search and Data Mining (WSDM '08), pages 219-230, 2008.
[15] J. Kivinen and M. K. Warmuth. Additive versus exponentiated gradient updates for linear prediction. In Proceedings of the Twenty-Seventh Annual ACM Symposium on Theory of Computing (STOC '95), pages 209-218, 1995.
[16] A. Klementiev, D. Roth, and K. Small. An unsupervised learning algorithm for rank aggregation. In Proceedings of the 18th European Conference on Machine Learning (ECML '07), pages 616-623, 2007.


[17] A. Klementiev, D. Roth, and K. Small. Unsupervised rank aggregation with distance-based models. In Proceedings of the 25th International Conference on Machine Learning (ICML '08), pages 472-479, 2008.
[18] A. Klementiev, D. Roth, K. Small, and I. Titov. Unsupervised rank aggregation with domain-specific expertise. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI '09), pages 1101-1106, 2009.
[19] E.-P. Lim, V.-A. Nguyen, N. Jindal, B. Liu, and H. W. Lauw. Detecting product review spammers using rating behaviors. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM '10), pages 939-948, 2010.
[20] Y.-T. Liu, T.-Y. Liu, T. Qin, Z.-M. Ma, and H. Li. Supervised rank aggregation. In Proceedings of the 16th International Conference on World Wide Web (WWW '07), pages 481-490, 2007.
[21] A. Mukherjee, A. Kumar, B. Liu, J. Wang, M. Hsu, M. Castellanos, and R. Ghosh. Spotting opinion spammers using behavioral footprints. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '13), 2013.
[22] A. Ntoulas, M. Najork, M. Manasse, and D. Fetterly. Detecting spam web pages through content analysis. In Proceedings of the 15th International Conference on World Wide Web (WWW '06), pages 83-92, 2006.
[23] G. Shafer. A Mathematical Theory of Evidence. Princeton University Press, 1976.
[24] K. Shi and K. Ali. GetJar mobile application recommendations with very sparse datasets. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '12), pages 204-212, 2012.
[25] N. Spirin and J. Han. Survey on web spam detection: principles and algorithms. SIGKDD Explorations Newsletter, 13(2):50-64, May 2012.
[26] M. N. Volkovs and R. S. Zemel. A flexible generative model for preference aggregation. In Proceedings of the 21st International Conference on World Wide Web (WWW '12), pages 479-488, 2012.
[27] Z. Wu, J. Wu, J. Cao, and D. Tao. HySAD: a semi-supervised hybrid shilling attack detector for trustworthy product recommendation. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '12), pages 985-993, 2012.
[28] S. Xie, G. Wang, S. Lin, and P. S. Yu. Review spam detection via temporal pattern discovery. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '12), pages 823-831, 2012.
[29] B. Yan and G. Chen. AppJoy: personalized mobile application discovery. In Proceedings of the 9th International Conference on Mobile Systems, Applications, and Services (MobiSys '11), pages 113-126, 2011.
[30] B. Zhou, J. Pei, and Z. Tang. A spamicity approach to web spam detection. In Proceedings of the 2008 SIAM International Conference on Data Mining (SDM '08), pages 277-288, 2008.
[31] H. Zhu, H. Cao, E. Chen, H. Xiong, and J. Tian. Exploiting enriched contextual information for mobile app classification. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM '12), pages 1617-1621, 2012.
[32] H. Zhu, E. Chen, K. Yu, H. Cao, H. Xiong, and J. Tian. Mining personal context-aware preferences for mobile users. In Proceedings of the 2012 IEEE 12th International Conference on Data Mining (ICDM '12), pages 1212-1217, 2012.
[33] H. Zhu, H. Xiong, Y. Ge, and E. Chen. Ranking fraud detection for mobile apps: a holistic view. In Proceedings of the 22nd ACM International Conference on Information and Knowledge Management (CIKM '13), 2013.

Hengshu Zhu is currently a Ph.D. student in the School of Computer Science and Technology at the University of Science and Technology of China (USTC), China. He was supported by the China Scholarship Council (CSC) as a visiting research student at Rutgers, the State University of New Jersey, USA, for more than one year. He received his B.E. degree in Computer Science from USTC, China, in 2009. His main research interests include mobile data mining, recommender systems, and social networks. During his Ph.D. study, he received the KSEM-2011 and WAIM-2013 Best Student Paper Awards. He has published a number of papers in refereed journals and conference proceedings, such as IEEE TMC, ACM TIST, WWW Journal, KAIS, ACM CIKM, and IEEE ICDM. He has also served as a reviewer for numerous journals, such as IEEE TSMC-B, KAIS, and WWW Journal.

Hui Xiong is currently an Associate Professor and Vice Chair of the Management Science and Information Systems Department, and the Director of the Rutgers Center for Information Assurance at Rutgers, the State University of New Jersey, where he received a two-year early promotion/tenure (2009), the Rutgers University Board of Trustees Research Fellowship for Scholarly Excellence (2009), and the ICDM-2011 Best Research Paper Award (2011). He received the B.E. degree from the University of Science and Technology of China (USTC), China, the M.S. degree from the National University of Singapore (NUS), Singapore, and the Ph.D. degree from the University of Minnesota (UMN), USA. His general area of research is data and knowledge engineering, with a focus on developing effective and efficient data analysis techniques for emerging data intensive applications. He has published prolifically in refereed journals and conference proceedings (3 books, 40+ journal papers, and 60+ conference papers). He is a co-Editor-in-Chief of the Encyclopedia of GIS, and an Associate Editor of IEEE Transactions on Knowledge and Data Engineering (TKDE) and the Knowledge and Information Systems (KAIS) journal. He has served regularly on the organization and program committees of numerous conferences, including as a Program Co-Chair of the Industrial and Government Track of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining and a Program Co-Chair of the 2013 IEEE International Conference on Data Mining (ICDM-2013). He is a senior member of the ACM and IEEE.

Yong Ge received his Ph.D. in Information Technology from Rutgers, the State University of New Jersey in 2013, the M.S. degree in Signal and Information Processing from the University of Science and Technology of China (USTC) in 2008, and the B.E. degree in Information Engineering from Xi'an Jiaotong University in 2005. He is currently an Assistant Professor at the University of North Carolina at Charlotte. His research interests include data mining and business analytics. He received the ICDM-2011 Best Research Paper Award, the Excellence in Academic Research award (one per school) at Rutgers Business School in 2013, and the Dissertation Fellowship at Rutgers University in 2012. He has published prolifically in refereed journals and conference proceedings, such as IEEE TKDE, ACM TOIS, ACM TKDD, ACM TIST, ACM SIGKDD, SIAM SDM, IEEE ICDM, and ACM RecSys. He has served as a Program Committee member for ACM SIGKDD 2013, the International Conference on Web-Age Information Management 2013, and IEEE ICDM 2013. He has also served as a reviewer for numerous journals, including IEEE TKDE, ACM TIST, KAIS, Information Sciences, and TSMC-B.

Enhong Chen is currently a Professor and Vice Dean of the School of Computer Science and Vice Director of the National Engineering Laboratory for Speech and Language Information Processing at the University of Science and Technology of China (USTC), and a winner of the National Science Fund for Distinguished Young Scholars of China. He received the B.S. degree from Anhui University, the Master's degree from Hefei University of Technology, and the Ph.D. degree in computer science from USTC. His research interests include data mining and machine learning, social network analysis, and recommender systems. He has published many papers in refereed journals and conference proceedings, including TKDE, TMC, KDD, ICDM, NIPS, and CIKM. He has served on the program committees of numerous conferences, including KDD, ICDM, and SDM. He received the Best Application Paper Award at KDD-2008 and the Best Research Paper Award at ICDM-2011. He is a senior member of the IEEE.
