Smartphone_App_Usage_Analysis_Datasets_Methods_and_Applications_2_-1
Li, Tong
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
2022
Li, T, Xia, T, Wang, H, Tu, Z, Tarkoma, S, Han, Z & Hui, P 2022, 'Smartphone App Usage Analysis: Datasets, Methods, and Applications', IEEE Communications Surveys and Tutorials, vol. 24, no. 2, pp. 937-966. https://doi.org/10.1109/COMST.2022.3163176
http://hdl.handle.net/10138/354371
10.1109/COMST.2022.3163176
cc_by
publishedVersion
Abstract: As smartphones have become indispensable personal devices, the number of smartphone users has increased dramatically over the last decade. These personal devices, which are supported by a variety of smartphone apps, allow people to access Internet services in a convenient and ubiquitous manner. App developers and service providers can collect fine-grained app usage traces, revealing connections between users, apps, and smartphones. We present a comprehensive review of the most recent research on smartphone app usage analysis in this survey. Our survey summarizes advanced technologies and key patterns in smartphone app usage behaviors, all of which have significant implications for all relevant stakeholders, including academia and industry. We begin by describing four data collection methods: surveys, monitoring apps, network operators, and app stores, as well as nine publicly available app usage datasets. We then systematically summarize the related studies of app usage analysis in three domains: app domain, user domain, and smartphone domain. We make a detailed taxonomy of the problem studied, the datasets used, the methods used, and the significant results obtained in each domain. Finally, we discuss future directions in this exciting field by highlighting research challenges.
Index Terms: Smartphone device, mobile app, app usage, behavior analysis, data mining.
Manuscript received March 10, 2021; revised September 15, 2021 and January 27, 2022; accepted March 18, 2022. Date of publication March 31, 2022; date of current version May 24, 2022. This work was supported in part by the National Key Research and Development Program of China under Grant 2020YFA0711403; in part by the National Natural Science Foundation of China under Grant 61972223, Grant 61971267, Grant 62171260, Grant U1936217, Grant U20B2060, and Grant U21B2036; in part by the International Postdoctoral Exchange Fellowship Program (Talent-Introduction Program) under Grant YJ20210274; in part by the Academy of Finland under Project 319669, Project 319670, Project 325570, Project 326305, Project 325774, and Project 335934; and in part by NSF under Grant CNS-2107216 and Grant CNS-2128368. (Corresponding author: Pan Hui.)
Tong Li is with the Department of Computer Science, University of Helsinki, 00100 Helsinki, Finland, also with the System and Media Laboratory, Hong Kong University of Science and Technology, Hong Kong, and also with the Department of Electronic Engineering, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100190, China (e-mail: [email protected]).
Tong Xia, Huandong Wang, and Zhen Tu are with the Department of Electronic Engineering, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China (e-mail: [email protected]).
Sasu Tarkoma is with the Department of Computer Science, University of Helsinki, 00100 Helsinki, Finland (e-mail: [email protected]).
Zhu Han is with the Department of Electrical and Computer Engineering, University of Houston, Houston, TX 77004 USA, and also with the Department of Computer Science and Engineering, Kyung Hee University, Seoul 446-701, South Korea (e-mail: [email protected]).
Pan Hui is with the Computational Media and Arts Trust, Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511458, China, also with the Division of Emerging Interdisciplinary Area, Hong Kong University of Science and Technology, Hong Kong, and also with the Department of Computer Science, University of Helsinki, 00100 Helsinki, Finland (e-mail: [email protected]).
Digital Object Identifier 10.1109/COMST.2022.3163176
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
I. INTRODUCTION
People can now use their smartphone apps to access a variety of Internet services, including instant messaging (e.g., WhatsApp, WeChat), online socializing (e.g., Twitter, Weibo), electronic commerce (e.g., Amazon, Taobao), and online payment (e.g., PayPal, Alipay). These services have become an important part of the infrastructure of the modern information society, making smartphone apps a necessity in daily life [1]–[3]. According to a report from Statista [4], the number of apps available in Google Play, the official app store of Android, has increased exponentially from 16,000 in December 2009 to 2,893,806 in July 2021. The app market is expected to generate 935.2 billion US dollars in business value by 2023 [5]. Such a vast and vital app market has attracted developers and service providers to investigate app usage behavior to better develop and deliver mobile apps.
Understanding app usage behaviors has significant implications for all relevant stakeholders, including smartphone manufacturers, network operators, market intermediaries, app developers, and end consumers [6]. To improve device performance and extend usage time, smartphone manufacturers can optimize the scheduling of various smartphone resources, such as CPU, memory, and battery power, based on the usage patterns of specific apps [7], [8]. Based on app traffic patterns, network operators can dynamically optimize traffic offloading schemes and improve network services [9], [10]. Furthermore, network operators and market intermediaries can provide personalized services, such as accurate recommendations and targeted advertisements, by profiling mobile users’ preferences and interests from their app usage behaviors. By doing so, operators and intermediaries can improve the quality of experience (QoE) while increasing profits [11], [12]. App developers can better understand customer satisfaction and market trends by analyzing app usage and profiling app popularity, which may provide excellent guidance for upgrading existing apps and designing new apps [13].
In recent years, extensive research efforts have been invested in understanding user behaviors using data-driven methods based on mobile app usage data. As shown in Fig. 1, we counted the number of papers published in the field of app usage data analysis. We can see that this research area began
938 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 24, NO. 2, SECOND QUARTER 2022
in 2010, then grew slowly until 2012. Most notably, after 2012, there was a rapid increase in the number of publications. Overall, the volume of publications in this field is expanding, indicating a burgeoning research field. Thus, we aim to conduct a comprehensive literature survey in response to significant recent progress in mobile app usage analysis.
Fig. 1. Volume of publications in the field of app usage data analysis. The statistical values are up to Aug. 2021. The predicted value for 2021 is based on previous years’ values.
The app usage data are collected through surveys [14], monitoring apps [8], network operators [15], and app stores [16]. These data form a cross-domain and multi-view data ecosystem that includes various app usage behaviors (e.g., downloading, installing, launching, and uninstalling apps), contextual information about app usage (e.g., time, location, traffic, and energy consumption), and information about apps (e.g., app description, app rating, and user reviews). As a result, the app usage data reflect the characteristics of apps, users, and smartphones.
Existing studies for app usage data analysis fall into three domains: app domain, user domain, and smartphone domain, as shown in Fig. 2. From the app perspective, app domain research aims to reveal app features from app usage data. The research topics in this area include app ecosystem profiling, app usage pattern discovery, and app usage prediction and recommendation. App ecosystem profiling looks into the inherent characteristics of mobile apps and app markets, such as app categorization and popularity modeling. The goal of app usage pattern discovery is to find regularities in mobile app usage behaviors. Predicting which apps will be released and inferring users’ preferences for undiscovered new apps are the goals of app usage prediction and recommendation.
Fig. 2. Structure of existing studies for app usage analysis.
The goal of user domain studies is to discover the link between user characteristics and app usage behaviors. User profiling and user identification are two research topics in this domain. User profiling focuses on group-level user attributes like gender, occupation, income, and personality. User identification, focusing on the individual level, aims to identify a user based on his or her app usage behaviors.
The smartphone domain research focuses on smartphone characteristics, with two main areas of investigation: app energy drain and app traffic patterns. These two areas aim to improve smartphone performance by analyzing app energy and traffic consumption patterns.
In this survey, we aim to answer the following questions. How should one collect mobile app data? What are the benefits and limitations of the various data collection methods? What are the critical features of app usage datasets that have been used in existing studies? What are the main research topics in mobile app usage analysis? What are the most common data-driven methods used, the significant findings, and the main results in existing studies? What are the challenges and promising future directions in mobile app usage analysis?
To answer these questions and draw conclusions, we systematically review the recent research. Fig. 3 shows our survey paper’s structure. In Section II, we describe and compare data collection methods. We also present a set of public datasets and discuss privacy and ethical issues. Section III summarizes existing studies in the app domain. Section IV reviews research efforts in the user domain. Section V reviews existing studies in the smartphone domain. We compare app usage datasets, methods, and findings across all research topics. In Section VI, we discuss the challenges and future research in mobile app usage analysis. Section VII concludes the survey.
The main contributions of this paper can be summarized as follows.
LI et al.: SMARTPHONE APP USAGE ANALYSIS: DATASETS, METHODS, AND APPLICATIONS 939
• A comprehensive survey of data collection methods used to collect app usage data is presented, including surveys, monitoring apps, network operators, and app stores. We discuss data characteristics and compare the benefits and limitations of various data collection methods. We also gather and introduce prominent app usage datasets to the research community for further study.
• A comprehensive review of current research in the field of smartphone app usage data analysis is presented, including primary research topics in the app, user, and smartphone domains. For each research topic, we make a thorough comparison in terms of the problem studied, the datasets used, the methods applied, and the significant results achieved.
• Advanced technologies and key findings gleaned from app usage behaviors are thoroughly summarized, with significant implications for all relevant stakeholders in the research community and industries.
• The challenges and future research opportunities concerning mobile app usage analysis are identified. We give a thorough discussion of the challenges in data and methods. We also go over future research topics, such as app evolution, context-aware app usage modeling, deep reasoning of app usage behaviors, linking physical activities and app usage, and location-based services and urban computing.
There are only a few surveys on related topics. Cao and Lin [17] surveyed studies that looked at app usage prediction and recommendations. Zhao et al. [11] summarized studies on user profiling based on app usage behaviors. Guo et al. [13] conducted a review of the literature using app usage data gathered through crowdsourced methods. Existing surveys are limited either to a single data source [13] or to a single small topic [11], [17]. Table I summarizes the contributions and limitations of existing survey articles on app usage analysis. In comparison, our survey examines the state-of-the-art studies in all app, user, and smartphone domains, as well as the connections between them. We provide a systematic review of existing datasets, commonly used analytical methods, and key findings and results.
TABLE I: SUMMARY OF EXISTING SURVEY ARTICLES ON APP USAGE ANALYSIS
II. DATA COLLECTION AND OVERVIEW
Real-world app usage data are the foundation of smartphone app usage analysis. In this section, we introduce and compare different data collection methods and present a set of public datasets. We also discuss privacy and ethical concerns.
A. Data Collection Methods
1) Surveys: Surveys are a simple and important method to collect information about app usage. Researchers generally create a questionnaire based on their goals and then collect data from the participants’ responses. In practice, researchers should pay close attention to questionnaire design, particularly question wording, to reduce respondent bias. The American Association for Public Opinion Research offers some tips on how to conduct a high-quality survey (see https://www.aapor.org/Standards-Ethics/Best-Practices.aspx).
There are several methods for conducting a survey, including telephone, mail, and the Internet. In-app surveys, which post questionnaires and collect user feedback directly inside mobile apps, have been popular in recent years and are often used to obtain app usage data. WhatsApp and Skype, for example, ask users to rate the quality of their audio calls at the end of the call. YouTube uses questionnaires to gather information about its users’ preferences. In-app surveys require less maintenance than other methods such as telephone and face-to-face surveys [18]. In-app surveys can capture a lot of data because they can reach a large number of people with little effort. However, respondents to in-app surveys represent only one group, namely app users, which may limit the generalizability and representativeness of the data obtained. In other words, in-app surveys are subject to a mode effect, whereby different survey modes result in different data being collected.
Surveys can collect only the coarse-grained app usage behaviors of users, such as which app store they used and
TABLE II: COMPARISON OF DIFFERENT DATA COLLECTION METHODS
as network traffic, CPU usage, memory usage, battery status. However, because of the security concerns of iOS, iPhones cannot support third-party app stores. Thus, this data collection method mostly covers Android users.
App stores can provide various types of app metadata, such as app name, app category, app description, app rating, and user reviews. App metadata provides valuable information about smartphone apps and allows the collection of user feedback about them. App descriptions, for example, are frequently used to extract app functions for app categorization. For user profiling, app categories provide semantic meanings to explain usage activities. App ratings and user reviews are used to model app popularity and recommend apps based on their popularity and usage experience. App metadata can be crawled from app stores, such as Google Play and the Apple App Store, or provided directly by app stores.
Table II compares the four data collection methods mentioned above. In terms of data collection scale, all four approaches are capable of large-scale measurements involving tens of thousands of users, thanks to advancements in communication and network technologies. However, surveys and monitoring apps are limited by response rates or app popularity, which makes it difficult to reach millions of users. By recruiting volunteers, both surveys and monitoring apps can collect small-scale datasets. It is also possible to conduct controlled studies by carefully selecting participants, which is impracticable for network operators and app stores. Monitoring apps outperform the other methods in the range of app usage behaviors they can collect. It is difficult for surveys to collect fine-grained usage traces because they are limited by questionnaires. Because users are directly responsible for their responses, they may be hesitant to share sensitive information but willing to provide socially desirable responses, resulting in biases in the data collected. Although other methods collect data automatically and avoid biases, network operators and app stores are still limited to specific types of behaviors, such as network access and app management. Monitoring apps, which are installed on smartphones and based on event-triggered collection, can be used to collect all types of app usage behaviors by selecting different trigger events. Different data collection methods can provide additional side information about app usage behaviors, such as user profiles, sensor data, location, and app metadata.
Each collection method has its own set of benefits and drawbacks. In practice, we must select data collection methods with care and precision to answer the research question. For example, surveys, monitoring apps, and app stores are better choices than network operators for investigating country and culture differences in mobile app usage behaviors because it is difficult to conduct worldwide data collection from network operators. App stores and monitoring apps are also better if we want to track a user’s usage behaviors over time because they can link the same user based on user accounts even if the user changes smartphones. Different collection methods can be combined at times. For example, operator data can be used to discover app usage patterns across millions of users, and controlled studies on small-scale datasets collected from monitoring apps or surveys can then verify the discovered patterns.
B. Public Datasets
In this section, we present nine datasets that are publicly available to the research community for further research.
We introduce the collection method, collection period, and collection items of these datasets in detail.
1) Worldwide Survey Dataset: This dataset was collected through a survey in 2012, gathering mobile app usage information of 10,208 people from more than 15 countries [19]. The countries include the United States, Canada, Mexico, the United Kingdom, Australia, France, Germany, Italy, Spain, Brazil, Russia, India, Japan, China, and South Korea. The collected items for mobile app usage behaviors include the most frequently used app stores and app categories, the reasons that lead users to find apps, and the reasons that lead users to download and abandon apps. It is worth noting that this dataset only contains usage data for app categories instead of individual apps. The dataset includes participants’ demographics, including age, gender, nationality, marital status, country of residence, first language, ethnicity, education level, occupation, and income. This dataset also collects participants’ personality traits by using the Big-Five personality measurement. This dataset and the corresponding questionnaire are public, available at ‘http://www0.cs.ucl.ac.uk/staff/S.Lim/app_user_survey/’.
2) Mobile Data Challenge (MDC) Dataset: The Lausanne Data Collection Campaign (LDCC) developed a specific monitoring app based on Nokia platforms to implement smartphone data collection. From October 2009 to March 2011, they collected a longitudinal smartphone dataset from nearly 200 volunteers. The volunteers are primarily dispersed throughout the Lake Geneva region of Switzerland. Large amounts of smartphone data, such as app usage behaviors (launch, foreground, close, and view), location (GPS, WLAN), motion (accelerometer), and proximity (Bluetooth), are all recorded in the dataset. The dataset was released to the research community in the Mobile Data Challenge (MDC) [39] held by Nokia, which is available at ‘https://www.idiap.ch/dataset/mdc’.
3) LiveLab Dataset: This dataset was collected through a monitoring app called LiveLab [40]. LiveLab is an iOS app used to gather smartphone usage data from 24 iPhone users over the course of a year, from February 2010 to February 2011. The dataset specifically tracks two types of app usage behaviors: app launches and changes to the foreground app. The dataset also includes contextual information about user behaviors gathered from iPhone sensors such as the accelerometer, GPS, battery, Bluetooth, and WiFi status. It is important to note that this data was collected from skewed samples; all 24 volunteers are Rice University students. This dataset was released to the public and can be accessed via ‘http://yecl.org/livelab/traces.html’.
4) Carat Dataset: Carat [8], a monitoring app, was used to collect this dataset. Carat applies an event-triggered collection scheme, gathering a data sample every time the battery level changes by 1%. Each data sample contains a list of apps used, a user-specific identifier, and a timestamp. Carat also gathers sensor contextual data, such as battery level, battery status, and mobile country code. Carat is available on both Google Play and the Apple App Store, which the developers hope will increase the number of people who participate in data collection. In this way, they collect data from all over the world. Carat has collected data from over 500,000 mobile users in over 100 countries so far (see http://carat.cs.helsinki.fi/). The dataset owners made a long-term app usage dataset available to the research community. The top 1,000 users ranked by total time spent using Carat from 2014 to 2018 are included in the public long-term app usage dataset. The user with the longest duration in the public dataset has 18,146,042 time-series records spanning 4.65 years, and even the user with the shortest duration has more than two years of records. The public dataset is available at ‘https://www.cs.helsinki.fi/group/carat/data-sharing/’.
5) TalkingData: The dataset was collected using the TalkingData Software Development Kit (SDK), which is integrated into monitoring apps. This dataset contains the complete app usage traces for over 70 thousand users from May 01, 2016, to May 07, 2016. Each app usage log includes an anonymous user ID, timestamp, app ID, and location information (longitude and latitude). This dataset contains demographic information about the users involved, such as age and gender. Researchers should be aware of the mode effect and biases in the analysis when using this public dataset because the users involved are mostly from mainland China. This dataset was released in a Kaggle challenge, TalkingData Mobile User Demographics, and can be accessed via ‘https://www.kaggle.com/c/talkingdata-mobile-user-demographics/’.
6) PhoneStudy Dataset: The PhoneStudy smartphone research app for Android was used to collect this data [41]. Between September 2014 and January 2018, 743 volunteers were recruited through forums, social media, blackboards, flyers, and direct recruitment. The activities of users on their smartphones were logged in the form of time-stamped logs of events. Those events included calls, app starts/installations, screen de/activations, contact entries, texting, global positioning system (GPS) locations, etc. The dataset is publicly available at ‘https://osf.io/kqjhr/’.
7) Context-Aware App Usage Dataset: This dataset was collected by a primary Internet Service Provider (ISP) in China [42]. The data was collected over the course of one week in April 2016, and it covered the entire metropolitan area of Shanghai, one of the world’s largest cities. The SAMPLES [35] tool was used to identify app usage records from network metadata, and the traffic generated by the 2,000 most popular apps was successfully recognized. Each app usage record contains an anonymized user ID, timestamp, base station ID, app ID, and traffic volume. The top 1,000 active users in the app usage dataset are released to the research community. The public dataset is available at ‘http://fi.ee.tsinghua.edu.cn/appusage/’.
8) Wandoujia Dataset: Wandoujia is a leading app store in China [43]. The dataset was collected from over 17 million users and recorded their app management activities, such as downloading, updating, and uninstalling apps, from May 1 to September 30, 2014. Wandoujia also gathered daily network activities for each app, such as traffic volume and Wi-Fi and cellular network connection time. A portion of this dataset has been made available to the public. The public dataset is available at ‘http://www.liuxuanzhe.com/appdata/’.
9) iOS Apps Dataset: This dataset [44] was extracted from must also take into account the ethical decisions that developed
the Apple Store and covers the metadata of removed iOS apps systems will make.
for a period of 1.5 years, from January 1, 2019, to April 30,
2020. There are 1,129,615 app records in total, correspond-
ing to 1,033,488 unique mobile apps. The dataset contains III. A PP D OMAIN R ESEARCH
app objective data such as app name and release date, app App features such as functionality and popularity have a
subjective data such as app ratings and reviews, and app pop- significant impact on mobile app usage behavior. Many efforts
ularity data such as app ranking. The information is open to the have been made in recent years to examine mobile app usage
public and can be found on ‘https://github.com/LuckyFQ/iOS- data from the standpoint of apps by analyzing app features.
Removed-Apps-Dataset’. App ecosystem profiling, app usage pattern discovery, and app
In Table III, we summarize the above seven public datasets usage prediction and recommendation are the three main top-
in terms of collection method, collection area, collection date, ics in app domain research, according to our careful review
number of apps and users, collected items, and availability of existing studies. In this section, we will summarize exist-
status. ing app domain studies on the three topics and compare their
datasets, methods, and key findings.
C. Privacy and Ethical Considerations
In 2018, the General Data Protection Regulation (GDPR) A. App Ecosystem Profiling
became enforceable in the European Union. GDPR gives Millions of mobile apps form a symbiotic mobile app
European citizens greater control over their data and raises ecosystem [50]–[52]. App ecosystem profiling aims to inves-
awareness of privacy issues and the value of their personal tigate the inherent characteristics of mobile apps, focusing
data [45]. GDPR, as a strong regulation for collecting and on two main sub-problems: app categorization and popularity
processing personal data, imposes a number of requirements. modeling.
The following are the most relevant and important data col- 1) App Categorization: An app category groups apps that
lection principles. First, personal data can only be collected if perform similar functions. WhatsApp and Skype, for example,
the user has given consent for a specific purpose. Namely, user are both classified as communication apps on Google Play.
consent is only used for a specific purpose. Second, the data App categories are currently determined by app developers
minimization principle limits data collection to the minimum when they release apps in app stores [54], [55]. However,
required to achieve the app’s purpose. Third, individuals have deciding on a category for an app can be difficult at times [56].
the right to withdraw their consent and erase their personal WeChat, for example, can be used for social networking as
data. The GDPR has had a significant impact on how mobile well as messaging. As a result, both the communication and
app usage data is collected. According to a recent study [46], social app categories appear to be appropriate. Moreover, man-
the number of permission items used by apps decreased signifi- ually selecting app categories opens the door to deliberate
cantly after the GDPR was implemented. This finding suggests gaming [57], [58]. App developers may desire to avoid fierce
that monitoring apps are cautious about the data they col- competition by selecting less appropriate categories to improve
lect to follow to the GDPR’s data minimization principle. their app’s ranking. For the reasons stated above, many mobile
Furthermore, the data collection section is also responsible apps are miscategorized in app stores. Hence, it is expected
for consent management. Many app vendors and app stores to have an effective and automated method for categorizing
have mechanisms in place to respond to user requests, such mobile apps.
as erasing data and withdrawing consent [47]. Several studies looked into the possibility of automatically
Smartphone app usage data, as a type of sensitive personal data, must be handled with caution when it comes to ethics. When collecting, processing, and analyzing app usage data, researchers must adhere to the principle of Data for Good [48]. First, app usage data should be used for the greater good of humanity and society. We should use data to improve people’s lives, such as by improving interpersonal relationships, providing convenient and ubiquitous services, and improving user experience. However, some dangerous and terrible things have occurred in recent years. Cambridge Analytica, for example, abused the data of millions of Facebook users [49]. To prevent such things from happening again, apart from government regulations, we still need data scientists to hold themselves to a high standard. Second, we should make good use of app usage data. We should adhere to the principle of end-user transparency by outlining the items collected, the purpose of data collection, and the potential privacy risk. Collection and analysis should be done in a way that protects people’s privacy. We

categorizing apps based on their descriptions. They generally modeled app categorization as a problem of short text topic modeling, which they solved with Natural Language Processing (NLP) tools. Al-Subaihin et al. [59] used Term Frequency-Inverse Document Frequency (TF-IDF), while Ochiai et al. [60] used a word embedding model to extract app features from app descriptions. Fuad and Al-Yahya [61] further used Latent Dirichlet Allocation (LDA) to analyze the app descriptions of 13,282 apps in Google Play. They discovered some new categories, including occasions, pandemics, and languages, that were not present in the original app store classification. Surian et al. [62], on the other hand, concentrated on the app market’s miscategorization issue. To refine the category of apps, they developed a von Mises-Fisher distribution-based probabilistic topic model. After running experiments on 48,663 game apps crawled from Google Play, they discovered that 0.35% to 1.10% of game apps are miscategorized.
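To make the TF-IDF step concrete, the following is a minimal sketch of how app descriptions can be turned into weighted term vectors and compared. It is not the pipeline of [59]; the three descriptions, the whitespace tokenizer, and the function names `tfidf_vectors` and `cosine` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(descriptions):
    """Return one {term: tf-idf weight} dict per app description."""
    tokenized = [d.lower().split() for d in descriptions]
    n_docs = len(tokenized)
    # Document frequency: in how many descriptions does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vec = {}
        for term, count in counts.items():
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            vec[term] = tf * idf
        vectors.append(vec)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical app descriptions (not taken from any app store).
descriptions = [
    "send messages and photos to friends",
    "chat with friends and share photos",
    "track your bank account and payments",
]
vectors = tfidf_vectors(descriptions)
sim_social = cosine(vectors[0], vectors[1])   # two communication-style apps
sim_finance = cosine(vectors[0], vectors[2])  # communication vs. finance
```

Terms that occur in every description (here, "and") receive zero weight, so the first two descriptions are similar only through the informative terms they share. A categorization method would feed such vectors into a clustering or topic model.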
944 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 24, NO. 2, SECOND QUARTER 2022
TABLE III
PUBLIC APP USAGE DATASETS
TABLE IV
LITERATURE REVIEW IN APP ECOSYSTEM PROFILING
TABLE V
TYPES OF CONTEXTUAL FACTORS

to work together to complete a single task causes such correlations [94], [95]. Rahmati et al. [96] first demonstrated that app usage depends on the immediately preceding app and found that such a dependency remains relatively constant for one to three months. By analyzing a dataset spanning seven years, Fan et al. [97] looked into how usage context changes over time. Huang et al. [34] and Liu et al. [98] identified frequent co-occurrence app sets. They discovered that e-commerce and online payment apps, such as Taobao and Alipay, are frequently used together to complete the task of online shopping. Tseng and Hsu [99] found that usage context dependency has sequential characteristics. For example, using the Camera app first and then the Album app is twice as likely as using the Album app first and then the Camera app.

App usage is also influenced by the social context. Ferreira et al. [100] classified the social context into four types: alone, with friends, with strangers, and others. They observed a significant correlation between app usage and social context with a p-value of 0.002. Shema and Acuna [101] used app usage sequences to predict social context, achieving an accuracy of 70% with a random forest model. Kloumann et al. [102] and Taylor et al. [103] discovered that social contexts have varying degrees of influence on mobile app usage. The context of friendship has a greater impact when compared to family members.

2) Temporal Pattern Discovery: Unlike contextual patterns, which focus on static analysis, temporal patterns look into the dynamics of app usage. The diurnal pattern is a basic temporal pattern of app usage behavior that has been discovered by numerous studies [104]–[106]. A typical diurnal
LI et al.: SMARTPHONE APP USAGE ANALYSIS: DATASETS, METHODS, AND APPLICATIONS 947
TABLE VI
LITERATURE REVIEW IN APP USAGE PATTERN DISCOVERY

the day of one week of usage behavior. Jiang et al. [124] added time as a new feature and used the most similar app usage case from history to generate prediction results. Wang et al. [125] used time as a condition in their probabilistic graphical models. Zhao et al. [126] developed a deep learning-based model for predicting app usage based on time context. As illustrated in Fig. 9, they concatenated the time feature with historical used apps and fed the time feature into the last layer of the neural network to introduce time context. It is a state-of-the-art deep learning-based app usage prediction model, with a Recall@5 of 84.47%.
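For readers unfamiliar with the metric, the sketch below shows one common way to compute Recall@k for next-app prediction. It assumes that each test case has exactly one ground-truth next app and that the model returns a ranked candidate list; the exact evaluation protocol of [126] is not described here, and the app names and the function name `recall_at_k` are hypothetical.

```python
def recall_at_k(ranked_predictions, actual_next_apps, k=5):
    """Fraction of test cases whose true next app is among the top-k predictions."""
    hits = 0
    for ranked, actual in zip(ranked_predictions, actual_next_apps):
        if actual in ranked[:k]:
            hits += 1
    return hits / len(actual_next_apps)


# Hypothetical ranked outputs of a predictor for four test cases.
ranked_predictions = [
    ["chat", "mail", "maps"],
    ["camera", "album", "chat"],
    ["maps", "chat", "mail"],
    ["music", "video", "game"],
]
# Hypothetical apps that were actually used next in each case.
actual_next_apps = ["mail", "album", "mail", "news"]

recall_at_2 = recall_at_k(ranked_predictions, actual_next_apps, k=2)
recall_at_3 = recall_at_k(ranked_predictions, actual_next_apps, k=3)
```

Under this definition, a Recall@5 of 84.47% means that the app used next appeared among the five highest-ranked candidates in 84.47% of test cases.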
Fig. 9. A deep learning model for time context-aware app usage prediction [126].

Location is also an important contextual factor to consider when predicting app usage. To achieve location-aware app prediction, Parate et al. [127] split app usage sequences into a variable-length Markov chain according to location features, while Wang et al. [128] used the location factor as a condition in Bayesian networks. Furthermore, deep learning techniques are used in location-aware prediction. Xia et al. [129], for example, developed a recurrent neural network-based model to predict the next used app and visited location at the same time. Chen et al. [130] proposed CAP, a graph embedding-based model for learning node embeddings from app-location, app-time, and app-category subgraphs. Yu et al. [131] used a graph neural network to learn the node embeddings on an app-location-time graph. They then used the learned node embeddings to predict app usage. Zhou et al. [132] further proposed a heterogeneous graph-based model that learns
TABLE VII
LITERATURE REVIEW IN APP USAGE PREDICTION AND RECOMMENDATION

context-aware app recommendations by taking location and time as context information and adding additional dimensions to represent them. Deep learning models, such as deep neural networks and graph neural networks, have become increasingly popular in recent years for mobile app recommendations.
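The idea of adding context dimensions can be sketched as a sparse user × app × location × time tensor. The example below only illustrates the data layout and a count-based lookup within one context slice; the surveyed methods typically factorize such a tensor to generalize to unseen cells. All users, apps, context values, and the function name `recommend` are hypothetical.

```python
from collections import Counter

# Hypothetical usage log: (user, app, location, time_slot).
usage_log = [
    ("u1", "maps", "commute", "morning"),
    ("u1", "news", "commute", "morning"),
    ("u1", "maps", "commute", "morning"),
    ("u1", "video", "home", "evening"),
    ("u2", "video", "home", "evening"),
    ("u2", "mail", "office", "morning"),
]

# Sparse 4-way tensor: each key is a cell index, each value a usage count.
# Only non-zero cells are stored.
tensor = Counter(usage_log)


def recommend(tensor, user, location, time_slot, top_n=2):
    """Rank apps by the user's usage count inside one context slice."""
    scores = Counter()
    for (u, app, loc, slot), count in tensor.items():
        if u == user and loc == location and slot == time_slot:
            scores[app] += count
    return [app for app, _ in scores.most_common(top_n)]


commute_apps = recommend(tensor, "u1", "commute", "morning")
home_apps = recommend(tensor, "u1", "home", "evening")
```

The same user receives different rankings in different contexts, which is the behavior that the extra tensor dimensions are meant to capture.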
TABLE VIII
USER ATTRIBUTES THAT CAN BE PROFILED FROM MOBILE APP USAGE DATA
TABLE X
LITERATURE REVIEW IN DESCRIPTIVE PROFILING
In terms of personality trait prediction, Xu et al. [178] developed a prototype, Personality Test, which automatically evaluates users’ personality traits based on apps they installed. Personality Test tracks the use of seven app categories: social apps, games, music and video apps, shopping apps, photography apps, finance apps, and personalization apps. With a random forest classifier, the Personality Test can obtain an average prediction precision of 42.72%. Peltonen et al. [179] went on to collect data on app usage across all categories. They showed that category-level aggregated app usage can predict Big Five personality traits with a prediction fit of 86%-96%. In addition to app usage data, Chittaranjan et al. [180] considered call logs, SMS logs, and Bluetooth scan logs. They achieved an average F1-score of 0.57 using an SVM classifier with a radial basis function (RBF) kernel. Kambham et al. [181] and Gao et al. [182] characterized personality trait prediction as a regression problem by placing participants on a Big Five Personality Inventory continuum. They used app usage and smartphone sensory data and achieved RMSEs ranging from 12.7% to 22.2% for different traits.

Ochiai et al. [183] looked at app usage patterns to see if they might forecast users’ psychological states, such as stress levels. They considered the order in which the apps were used and created an app usage graph for each user to depict the contextual relationship between apps. Based on the Graph Isomorphism Network (GIN), they classified app usage graphs to predict users’ stress levels and achieved an accuracy of 54.5%.

3) Discussion: We summarized recent studies on user profiling based on app usage behaviors in this section. In Tables X and XI, we review prominent literature on descriptive profiling and predictive profiling. The tables include research topics,
TABLE XI
LITERATURE REVIEW IN PREDICTIVE PROFILING

dataset information, methods, and findings and results. We can see that the dataset scale ranges from 28 users to over 100,000 users. App usage data is generally collected through monitoring apps and network operators, while user attributes are collected through online surveys. Descriptive statistics are frequently used in descriptive profiling to reveal the relationship between app usage and user characteristics. The statistical significance of the discovered relationships is demonstrated using correlation test metrics such as Pearson correlation, Wilcoxon-Mann-Whitney test, and Tukey-Kramer test. In predictive profiling, SVM, LR, MLP, and random forest are commonly used classification methods. A few studies recently looked into advanced machine learning techniques, such as transformer, attention, and embedding, indicating a hot direction for predictive user profiling.

B. User Identification

Although data collection agencies have anonymized user IDs to protect users’ privacy, mining or sharing app usage datasets still poses a significant privacy risk. Many studies have demonstrated that users can be identified or re-identified from anonymized datasets based on their app usage behaviors. In 2010, Falaki et al. [185] were the first to demonstrate the diversity of smartphone and app usage among individuals, paving the way for user identification research. Welke et al. [186] studied app adoption of 46,726 participants and discovered that using 500 of the world’s most popular apps could distinguish unique app-signatures for 99.67% of users. Tu et al. [187] analyzed a larger dataset of 1.37 million Chinese users and found that using only four apps can uniquely identify 88% of users. Sekara et al. [188] further compared the size of app-signatures of users in different countries. They discovered that the identification rate is heavily influenced by the country of users. When the size of app-signatures is set to 5, the average identification rate varies dramatically between countries, ranging from 41.2% in Finland to 66.5% in the United States. These observations present severe privacy concerns as most individuals are identifiable and trackable, even in large-scale anonymized datasets.

Some researchers use such identifiable characteristics for active authentication to secure users’ smartphones. The basic idea is to use a user’s app usage patterns to verify his or her identity [189]. For each user, Ashibani and Mahmoud [190] chose the eight most-used apps and used the timestamp of app usage as a feature to continuously identify users. Their authentication model verifies users with an average F1-score of 96.5% after conducting experiments on ten users. They then improved the F1-score to 98% by including traffic patterns while accessing apps as features in their model [191]. For more reliable authentication, Bassu et al. [192] proposed combining contextual information, such as location and time,
TABLE XII
LITERATURE REVIEW IN USER IDENTIFICATION
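The notion of an app-signature used in the user identification studies can be illustrated with a small sketch. It adopts one simple reading, namely that a user is identifiable at signature size k if some set of k of their apps is held by no other user. The cited studies may sample app subsets differently, and the toy data and the function name `identification_rate` are assumptions.

```python
from itertools import combinations


def identification_rate(installed_apps, signature_size):
    """Fraction of users owning at least one size-k app set that no one else has."""
    users = list(installed_apps)
    identified = 0
    for user in users:
        others = [installed_apps[o] for o in users if o != user]
        for subset in combinations(sorted(installed_apps[user]), signature_size):
            candidate = set(subset)
            # The subset is a unique signature if no other user holds all of it.
            if not any(candidate <= other for other in others):
                identified += 1
                break
    return identified / len(users)


# Hypothetical anonymized records: user ID -> set of used apps.
installed_apps = {
    "u1": {"chat", "maps"},
    "u2": {"chat", "bank"},
    "u3": {"maps", "bank"},
}

rate_k1 = identification_rate(installed_apps, 1)
rate_k2 = identification_rate(installed_apps, 2)
```

In this toy population no single app singles out a user, but every pair of apps does, which mirrors the observation that a handful of apps can be enough to re-identify most users even after IDs are anonymized.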
consume significantly more energy on some clients than others are referred to as bug apps. That may be caused by the different usage habits of users. For example, some users prefer to use Kindle with networks to synchronize notes and bookmarks, while others prefer not to do so to save money on mobile data. The energy consumption will differ significantly depending on whether the network connection is turned on or off. In this way, Kindle, Facebook, and YouTube are typical bug apps. Based on these findings, they then created actionable diagnosis trees to recommend practical actions such as turning on/off WiFi/GPS, avoiding using certain apps under certain conditions, and upgrading/keeping app or operating system versions.

Chen et al. [197] created eStar, a free Android app that collects energy drain data, and used it to gather an energy trace dataset from 1,520 Galaxy S3 and S4 devices covering 800 different apps in the wild. They discovered that background apps and services consume 16.1% of total energy during the screen-off period. This finding implies that energy consumption can be reduced by improving the app scheduling algorithm and killing background apps when the screen is off. However, if we disable all apps without considering the differences in background activities, the user experience may be adversely affected. In [198], Chen et al. looked into this issue in depth. By analyzing the same dataset, they discovered 76 no-sleep apps for which the CPU was never suspended. They then designed a metric, Background-Foreground Correlation (BFC), to assess the utility of background app activities. A simple yet effective screen-off energy optimizer based on BFC was developed to learn and suppress useless background activities of apps automatically. Li et al. [199] pointed out that the unnecessary workload of apps is a major root cause of energy issues. Many apps, they discovered, perform computations that do not provide users with discernible benefits, resulting in unnecessary workload and energy consumption.

The above studies look at the energy drain pattern of a single app. However, because most smartphones have multi-core CPUs, parallel execution results in different app combinations having different power consumption patterns. The energy consumption of one app’s threads will affect the energy consumption of other apps’ threads. By studying different apps as a group, Rex et al. [200] attempted to reschedule app threads and correlate their influence on energy consumption. Because the number of total possible combinations of apps is enormous, they explored the frequent app usage sets based on users’ usage habits and then optimized scheduling sequences for frequent app usage sets.

To help developers create ‘greener’ apps that use less energy, some researchers looked at energy drain patterns of specific app events or functions. Li et al. [210] developed a prototype tool called vLens to collect energy data at the nanosecond and millisecond levels. As a result, vLens can calculate the energy consumption of smartphone apps at the source line level and track high-energy events like thread switching and garbage collection. In [201], they then used vLens to collect the energy drain patterns of 405 apps. They discovered that, on average, apps waste 61% of their energy in idle states, and the network is the most energy-consuming component, confirming the findings in [197]. Most importantly, they discovered that called system APIs account for the majority of an app’s energy consumption, implying that app developers should exercise caution when using them.

B. App Traffic Patterns

Smartphones, which are supported by a variety of network-based apps, allow users to stay connected to the ubiquitous Internet [211]. According to a Cisco report [212], smartphone traffic accounted for 73% of all traffic in 2018 and will rise to 89% by 2023. Because of the high volume and growing importance of mobile traffic, researchers and network operators are interested in understanding how mobile apps use network resources.

Xu et al. [32] conducted anonymous network measurements from a tier-1 cellular carrier in the United States, covering over 600 thousand users, and used HTTP signatures to identify traffic from various apps. In terms of traffic sources, they discovered that mobile apps could be divided into two groups: local apps and national apps. The majority of the traffic of local apps comes from a single region, whereas the traffic of national apps comes from diverse regions. This finding suggests that content optimization in access networks could be significant, with local app content being placed on servers closer to end-users. Jiang et al. [202] looked into the relationship between app popularity and network performance. They discovered that over 60% of popular apps keep network delay under 350 ms, highlighting the importance of app service network performance. Falaki et al. [213] and Li et al. [203] looked at a small-scale and a large-scale network dataset, respectively, and found that smartphone operating systems have an impact on app traffic consumption. When compared to Android and Windows Phone, iOS consumes more traffic. Jin et al. [204] also discovered that different data collection purposes resulted in different app network traffic usage. As a result, they created MobiPurpose, a system that could track app network requests and classify data collection purposes based on app traffic patterns. Walelgne et al. [205] and Okic et al. [206] showed that different app categories have different traffic patterns. In terms of traffic volume, entertainment and social media apps are the most traffic-intensive, while education and weather apps are the least traffic-intensive [205]. Moreover, in terms of temporal patterns, music and shopping apps see the most traffic in the morning, while e-mail and game apps see the most traffic during lunchtime [206]. Some studies leveraged app traffic patterns to improve network performance. Ouyang and Yan [207] developed AppWiR, a crowdsourcing-based system that gathers app usage data and derives relationships between app usage, network traffic, and network resources. AppWiR can predict network resources occupied by apps with a mean absolute percentage error of 12.54% to 13.39%. Zeng et al. [10] analyzed the temporal traffic consumption patterns of various app categories. They then used a linear regression model to forecast the amount of traffic consumed by the most popular app categories over a given period of base stations and devised an edge caching strategy to cache the content for the most popular app services. Aceto et al. [208] defined network
TABLE XIII
LITERATURE REVIEW IN SMARTPHONE DOMAIN RESEARCH

traffic at various levels, such as packet and message levels, and proposed using the Markov model to predict app network traffic at different levels. Yin et al. [209] incorporated mobile app usage patterns into data pricing strategies, creating a pricing scheme that is updated based on user satisfaction and operational conditions.

C. Discussion

This section summarized relevant smartphone domain studies from two principal fields: app energy drain and app traffic patterns. Table XIII lists the most important literature in terms of research topics, dataset information, methods, and key findings. We can see that most datasets for energy drain analysis are gathered from monitoring apps to support battery state collection. Excessive energy consumption is primarily caused by hogs, background apps, and unnecessary workload. These findings suggest that app developers should carefully select system APIs to reduce unnecessary power consumption and that operating systems should improve app scheduling algorithms. Some studies have attempted to optimize app scheduling; however, their optimizers are still unable to support personalized requirements, such as adjusting optimization strategies for different user habits. In the field of app traffic patterns, the datasets are collected from network operators and monitoring apps. Network operator datasets typically cover millions of users, which is significantly more than those of monitoring apps. Clustering algorithms, such as K-means and DBSCAN, are used to discover app traffic patterns. In the meantime, classification and regression methods, such as SVM, decision tree, Markov model, and random forest, are used to forecast app traffic consumption
to optimize network resource allocation and data pricing strategies.

VI. CHALLENGES AND FUTURE RESEARCH

Smartphone app usage analysis is a promising direction. A significant number of studies have been done in recent years and obtained a lot of achievements. However, there are still several active challenges that have not been well addressed. This section will discuss the challenges first and then look forward to the future research of smartphone app usage analysis.

A. Challenge in Data

1) Data Collection: The collection of app usage data has become much more difficult after the implementation of GDPR. A major reason is that GDPR principles and data collection for big-data analysis are not always compatible. For example, a large amount of data is preferred in data collection to reduce deviation and bias in analysis results. However, this goes against the data minimization principle. New hypotheses are frequently introduced after data collection in data analyses. However, such analyses are constrained by the purpose of the users’ initial consent. One possible way to overcome such contradictions is to collect anonymized data because anonymized data is excluded from GDPR [214]. However, the GDPR has a strict definition of anonymized data: the data subject is not or is no longer identifiable. In other words, it must be impossible to obtain personal information from anonymized data. As a result, anonymized app usage data can only be used in the app and smartphone domain research rather than user domain research.

We have discovered that GDPR has some unintended consequences. Collecting data from network operators, for example, has become extremely difficult due to the difficulty in obtaining user consent. Researchers are hesitant to publish datasets for fear of penalties, especially when European citizens are involved. Furthermore, because data collection in the European Union is difficult, there is a risk that researchers will prefer to use data from places with fewer or no regulations. A few trends have emerged. Recent studies from the last three years, i.e., 2018, 2019, and 2020, are primarily based on data from China and the United States. Only a few studies are based on data from Europe.

2) Mode Effect and Data Bias: Existing app usage datasets vary greatly in terms of collection methods, scales, collected items, and duration. These differences could lead to biases in their studies and make research more difficult to replicate. Different collection methods will result in a mode effect, which will introduce biases into the collected data. App stores and monitoring apps, for example, can only collect data from their users. Furthermore, monitoring apps can impact users’ normal app usage behavior due to privacy concerns, resulting in data collection biases. Approximately 1% of participants changed their normal usage behavior during data collection [215]. As a result, it is critical to address participants’ privacy concerns and develop more unobtrusive monitoring apps. The population difference across studies also leads to data bias. It is hard to organize a smartphone app usage dataset wholly and accurately reflecting real-world user behaviors. Most existing studies are based on biased populations. The users involved in [34], [43], [69], [78], [169], [187] are all Chinese. The participants in [39], [101], [216] all live in the Lake Geneva region in Switzerland. Only university students are considered in [40], [217], [218]. To alleviate the biases, improving the diversity of participants and sampling technology are useful methods to increase the robustness of results.

The replicability of research will be hampered by data bias. Church et al. [2], for example, attempted to duplicate previous work. However, their findings differed significantly from previous research, revealing replicability issues and biases across datasets. To solve this problem, we need to put a lot more effort into constructing open datasets. Researchers can use unified public datasets to compare their new algorithms and discoveries to state-of-the-art ones, especially in method-oriented topics like app usage prediction, app recommendation, and predictive user profiling. The standard platform can help to reduce the number of unrepresentative repeated experiments and boost academic loyalty. Thankfully, we have taken the first step toward creating benchmark datasets. As we mentioned in Section II, some researchers have made their app usage datasets public. We believe their public datasets will be strong candidates for benchmark datasets in terms of data quality and impact.

B. Challenge in Methods

1) Heterogeneous Data Fusion: Other types of data, such as smartphone sensor data and app metadata, are also important and helpful for analyzing smartphone app usage. Although some approaches for fusing heterogeneous data in app usage analysis have been proposed, current methods can be improved by increasing their effectiveness and generalizability. For example, there are two widely used methods in existing studies for fusing context data into app usage records. The first one is to use the tensor structure by adding additional dimensions to represent context information. The second one is to take context features as conditions that are input in parallel into a probabilistic model or Bayesian networks. However, when the scale of the context grows large, these two methods will suffer from the curse of dimensionality and high computational cost. Most existing data fusion models are task-specific, making it difficult to generalize results across different studies. App features, for example, are commonly used in the tasks of app categorization and app recommendations, which are determined based on app descriptions and app usage behaviors. However, the model is difficult to transfer directly due to different dataset settings [2]. The question of how to better fuse heterogeneous data is still an active challenge. Embedding technology could be a promising method. Embedding technology aims to map high-dimensional data into low-dimensional vectors in latent embedding space while maintaining similarity across data points. A few studies [167], [174] have tried to employ embedding to model app usage sequences. Heterogeneous data
fusion can also benefit from embedding technology. For example, to describe the relationship between different types of data, we can create a heterogeneous graph. We can learn a low-dimensional data representation that embeds node features and relations together using graph embedding technologies like Metapath2Vec [219] and HAN [220], which improves the effectiveness of data fusion. We can learn universal data representations in this way, such as app embeddings that capture global patterns of app usage, context information, and app metadata. The learned embeddings can be reused in multiple models.

2) Privacy Preserving Analysis: Privacy is always an important consideration when accessing, using, and sharing mobile app usage data. As we discussed in Section IV-A, users’ attributes can be inferred from their app usage data. Thus, privacy-preserving technologies must be used when dealing with such sensitive data. Anonymization is a common technique. Böhmer et al. [87], for example, used hash functions to replace all participants’ personal identifiers with unrecognizable symbols. This method can delink multiple sets of users, reducing the risk of individual identities being leaked along with other public auxiliary data. However, simply anonymizing user IDs is insufficient to ensure privacy. Based on anonymized app usage records, previous studies have shown that most people can be re-identified and tracked [187], [188]. In the privacy-preserving analysis, the problem of how to create a well-established anonymization mechanism remains unresolved.

Most existing studies rely on centralized data processing, which is bound to raise privacy concerns. Collecting app usage data and centralized processing has become more difficult as a result of recently enacted strict privacy protection regulations. The European GDPR, for example, requires data controllers to disclose any data collection and to declare the lawful basis and purpose for data processing. In addition, after iOS 9, the iOS system removed most data collection APIs to address user privacy concerns. As a result, converting to a decentralized analysis framework is critical to protect user privacy and facilitate the research community’s long-term development. Federated Learning could be a promising analysis framework [221]–[223]. Federated Learning allows mobile devices to learn a shared model collaboratively while keeping all data on the local device. Users are not required to upload their usage data to cloud servers, and their sensitive information will be well protected [224]. Researchers can also take proactive steps to build privacy-preserving mechanisms by making the analysis framework transparent and giving users complete control over their data. These measures will help to protect user privacy to some extent and alleviate privacy concerns.

C. Future Research

1) App Evolution: Globalization vs. Localization: Since the release of the first iPhone in 2007, the app ecosystem has evolved significantly over the last two decades. Starting with a few system-embedded apps, the app ecosystem has grown to include over 3 billion apps that cater to a wide range of user needs. Many apps have appeared in recent decades, attracting millions of users before eventually disappearing. Exploring the app ecosystem’s evolutionary processes and extracting general rules and impact factors behind app usage is critical for all relevant stakeholders, including smartphone manufacturers, service providers, and app developers, to provide valuable guidance.

Globalization has a significant impact on the app ecosystem’s development. Globalization has accelerated interaction and integration among people from various countries and cultural backgrounds in recent decades. App usage also reflects such interaction and integration. Apps are easier to distribute around the globe and attract a large number of users. For instance, Pokémon Go, a popular smartphone game, was first released in the United States on July 6, 2016, and quickly spread around the world. The study of how popular apps spread is important for developing business strategies, predicting app popularity, and profiling the app ecosystem.

Localization, on the other hand, is also a trend in the app ecosystem. Many apps provide access to local information and are useful to both tourists and residents. For example, residents of Shenzhen, one of China’s largest cities, can use a parking app called Yitingche. KYOTO Trip+ (see footnote 4) is an official app that provides information to visitors and residents of Kyoto, a major city in Japan. According to our careful investigation, a few previous studies have worked on local apps. Ochiai et al. [60], for example, created a framework for identifying local apps in app stores and making app recommendations to residents. However, existing studies pay less attention to local apps than to global apps. There could be two reasons for this. First, local apps, tied to a specific location, usually have fewer users and less international impact than global apps. Second, conducting studies on local apps is more difficult due to a lack of data. Nonetheless, we would like to emphasize that, in contrast to global apps that support a broad range of interests, local apps focus on local life services and deserve more research attention.

4 https://play.google.com/store/apps/details?id=jp.kyoto.pref.visitkyoto&hl

2) Context-Aware App Usage Modeling: Context-aware app usage modeling is a critical but difficult problem to solve. Breakthroughs in mobile app usage pattern discovery, app usage prediction, and app recommendations will be made if this problem is solved. App usage modeling can also help with the creation of synthetic datasets for benchmarking systems. However, because app usage varies depending on the context, modeling such complex and dynamic behaviors is difficult. Thankfully, a few existing studies have taken the first steps. Yu et al. [131] depicted the co-occurrence of apps, locations, and time to model app usage behaviors. Zhao et al. [126] considered user characteristics. However, existing research ignores either the sequential features of app usage or the location context. Thus, modeling the sequential features of app usage in various contexts, such as location, time, and motion, is still a work in progress. The spatiotemporal graph neural network is one possible solution, in which we can use graph structure to model the co-occurrence of app usage and context factors, and a recurrent neural network (RNN) model to capture sequential patterns.
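As a far simpler stand-in for the spatiotemporal graph neural network suggested above, the following sketch combines the same two ingredients, context and order, in a count-based first-order Markov model whose transitions are conditioned on a (location, time slot) pair. It is an illustrative baseline under assumed toy data, not a method proposed in the surveyed literature; the session data and the function names `fit_transitions` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict


def fit_transitions(sessions):
    """Count next-app transitions separately for each context."""
    transitions = defaultdict(Counter)
    for context, apps in sessions:
        # Pair each app with the app used immediately after it.
        for prev_app, next_app in zip(apps, apps[1:]):
            transitions[(context, prev_app)][next_app] += 1
    return transitions


def predict_next(transitions, context, prev_app):
    """Return the most frequent follower of prev_app in this context, or None."""
    counts = transitions.get((context, prev_app))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical sessions: ((location, time_slot), ordered list of used apps).
sessions = [
    (("home", "evening"), ["camera", "album", "chat"]),
    (("home", "evening"), ["camera", "album"]),
    (("office", "morning"), ["camera", "mail"]),
]
transitions = fit_transitions(sessions)

at_home = predict_next(transitions, ("home", "evening"), "camera")
at_office = predict_next(transitions, ("office", "morning"), "camera")
unseen = predict_next(transitions, ("gym", "noon"), "camera")
```

The same preceding app leads to different predictions in different contexts. The sketch also exposes the limitation discussed earlier: the number of (context, app) cells grows quickly with the number of context factors, and unseen contexts yield no prediction, which is what learned embeddings and graph models aim to address.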
3) Deep Reasoning App Usage Behaviors: Existing research is limited to correlation analysis. Even though correlation analysis can reveal co-occurrence, it lacks interpretability in some cases. When we find a user who uses travel apps frequently, for example, we do not know if he/she is traveling or just planning to travel. Thus, it is difficult to provide direct guidance to service providers. Furthermore, correlation analysis may fall into Simpson’s Paradox, in which a trend seen in aggregated data reverses within subgroups, resulting in incorrect analysis results. Therefore, causal analysis is required for deep reasoning about app usage behaviors. Fortunately, causal inference has progressed significantly in recent years. A number of promising methods, such as individual treatment effects estimation [225] and shared causal model [226], have emerged. For reasoning analysis of app usage behaviors, these advanced methods provide robust technology and algorithm support.

Sophisticated deep learning methods are now used to investigate app usage behaviors to improve system performance. These black-box models, however, are unable to interpret the discovered features and how they relate to the significant results obtained. Several deep learning explainers [227] have recently been developed by advanced machine learning studies in an attempt to open the black-box models. Incorporating these deep learning explainer models into app usage analysis will be a promising step toward improving the interpretability of profiling results and providing reliable guidance to relevant stakeholders.

4) Linking User Activities and App Usage: Smartphones, which are supported by a diverse set of mobile apps, allow people to do things like order food, shop, manage their finances, and socialize more conveniently. The resulting spatiotemporal app usage traces can be used to infer users’ physical world activities and can support user profiling and authentication. A strong link between physical activities and mobile app usage has been demonstrated in a few studies. Li et al. [104] used app usage data to identify seven user activities. However, they focused solely on app usage, ignoring the impact of time and locations. Different physical world activities may be reflected by the same app usage behavior at different times and in different places. As a result, it is necessary to introduce spatial and temporal factors to infer users’ physical activities better. This is difficult due to the high complexity of spatiotemporal features. The probabilistic graphical model, which is scalable to model multiple factors and can apply different distribution functions to capture various activity patterns, is one possible solution to this problem.

5) Location-Based Service and Urban Computing: There is a strong link between app usage behaviors and locations, according to numerous existing studies. However, the majority of them were only interested in using locations as contextual features to improve app usage prediction and recommendations. By leveraging app-location relationships, app usage data can be used in location-based services, such as urban computing, in addition to app-oriented tasks. As we discussed in Section II, the app usage dataset gathered from network operators can cover the majority of mobile users in a given area, such as a city or state. The datasets usually include location data derived from associated base stations. Such high-coverage and fine-grained datasets provide rich user behavioral data, such as app usage and user mobility [228], for conducting urban computing studies. There have been a few studies that have focused on this promising area. Xia and Li [89], for example, used users’ online behavior, such as app usage, as well as offline behavior, such as human mobility, to uncover urban dynamics and identify urban zone functions.

VII. CONCLUSION

The research efforts on smartphone app usage analysis are surveyed and summarized in this paper. We first introduced and compared various data sources, such as surveys, monitoring apps, network operators, and app stores. For the research community, we presented a set of public datasets and discussed privacy and ethical issues. We then surveyed the related studies in the app, user, and smartphone domains. We built a detailed taxonomy for each research domain based on the problem investigated, the characteristics of the datasets used, the methods used, and the key results obtained. Finally, we discussed two current research challenges and identified five future research directions for this active research topic.

ACKNOWLEDGMENT

The authors would like to thank Prof. Yong Li for all of the support and valuable discussions.

REFERENCES

[1] N. Oliver et al., “Mobile phone data for informing public health actions across the COVID-19 pandemic life cycle,” Sci. Adv., vol. 6, no. 23, 2020, Art. no. eabc0764.
[2] K. Church, D. Ferreira, N. Banovic, and K. Lyons, “Understanding the challenges of mobile phone usage data,” in Proc. 17th Int. Conf. Human–Comput. Interact. Mobile Devices Services, 2015, pp. 504–514.
[3] T. Li, Y. Li, T. Xia, and P. Hui, “Finding spatiotemporal patterns of mobile application usage,” IEEE Trans. Netw. Sci. Eng., early access, Nov. 30, 2021, doi: 10.1109/TNSE.2021.3131194.
[4] Statista. “Number of available applications in the google play store from December 2009 to July 2021.” 2021. [Online]. Available: https://www.statista.com/statistics/266210/number-of-available-applications-in-the-google-play-store/
[5] Statista. “Worldwide mobile app revenues in 2014 to 2023.” 2021. [Online]. Available: https://www.statista.com/statistics/269025/worldwide-mobile-app-revenue-forecasts
[6] J. Liu, Y. Fu, J. Ming, Y. Ren, L. Sun, and H. Xiong, “Effective and real-time in-app activity analysis in encrypted Internet traffic streams,” in Proc. 23rd ACM SIGKDD Int. Conf. Knowl. Disc. Data Min., 2017, pp. 335–344.
[7] Y. Chen, X. Jin, J. Sun, R. Zhang, and Y. Zhang, “POWERFUL: Mobile app fingerprinting via power analysis,” in Proc. IEEE INFOCOM Conf. Comput. Commun., 2017, pp. 1–9.
[8] A. J. Oliner, A. P. Iyer, I. Stoica, E. Lagerspetz, and S. Tarkoma, “Carat: Collaborative energy diagnosis for mobile devices,” in Proc. 11th ACM Conf. Embedded Netw. Sensor Syst., 2013, p. 10.
[9] F. Xu, Y. Li, H. Wang, P. Zhang, and D. Jin, “Understanding mobile traffic patterns of large scale cellular towers in urban environment,” IEEE/ACM Trans. Netw., vol. 25, no. 2, pp. 1147–1161, Apr. 2017.
[10] M. Zeng et al., “Temporal–spatial mobile application usage understanding and popularity prediction for edge caching,” IEEE Wireless Commun., vol. 25, no. 3, pp. 36–42, Jun. 2018.
[11] S. Zhao et al., “User profiling from their use of smartphone applications: A survey,” Pervasive Mobile Comput., vol. 59, Oct. 2019, Art. no. 101052.
[12] A. De Masi and K. Wac, “You’re using this app for what? A MQOL living lab study,” in Proc. ACM Int. Joint Conf. Int. Symp. Pervasive Ubiquitous Comput. Wearable Comput., 2018, pp. 612–617.
[13] B. Guo, Y. Ouyang, T. Guo, L. Cao, and Z. Yu, “Enhancing mobile app user understanding and marketing with heterogeneous crowdsourced data: A review,” IEEE Access, vol. 7, pp. 68557–68571, 2019.
[14] W. Pan, N. Aharony, and A. Pentland, “Composite social network for predicting mobile apps installation,” in Proc. 25th AAAI Conf. Artif. Intell., 2011, pp. 821–827.
[15] D. Li, W. Li, X. Wang, C.-T. Nguyen, and S. Lu, “App trajectory recognition over encrypted Internet traffic based on deep neural network,” Comput. Netw., vol. 179, Oct. 2020, Art. no. 107372.
[16] X. Lu et al., “Mining usage data from large-scale Android users: Challenges and opportunities,” in Proc. IEEE/ACM Int. Conf. Mobile Softw. Eng. Syst. (MOBILESoft), 2016, pp. 301–302.
[17] H. Cao and M. Lin, “Mining smartphone data for app usage prediction and recommendations: A survey,” Pervasive Mobile Comput., vol. 37, pp. 1–22, Jun. 2017.
[18] F. Kreuter, G.-C. Haas, F. Keusch, S. Bähr, and M. Trappmann, “Collecting survey and smartphone sensor data with an app: Opportunities and challenges around privacy and informed consent,” Soc. Sci. Comput. Rev., vol. 38, no. 5, pp. 533–549, 2020.
[19] S. L. Lim, P. J. Bentley, N. Kanakam, F. Ishikawa, and S. Honiden, “Investigating country differences in mobile app user behavior and challenges for software engineering,” IEEE Trans. Softw. Eng., vol. 41, no. 1, pp. 40–64, Jan. 2015.
[20] G. R. Jesse, “Smartphone and app usage among college students: Using smartphones effectively for social and educational needs,” in Proc. EDSIG Conf., 2015, p. 3424.
[21] A. Hiniker, S. N. Patel, T. Kohno, and J. A. Kientz, “Why would you do that? Predicting the uses and gratifications behind smartphone-usage behaviors,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput., 2016, pp. 634–645.
[22] R. Wang et al., “StudentLife: Assessing mental health, academic performance and behavioral trends of college students using smartphones,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput., 2014, pp. 3–14.
[23] M. L. Gordon, L. Gatys, C. Guestrin, J. P. Bigham, A. Trister, and K. Patel, “App usage predicts cognitive ability in older adults,” in Proc. CHI Conf. Human Factors Comput. Syst., 2019, pp. 1–12.
[24] M. Chen, S. Mao, and Y. Liu, “Big data: A survey,” Mobile Netw. Appl., vol. 19, no. 2, pp. 171–209, 2014.
[25] I. Andone, K. Błaszkiewicz, M. Eibes, B. Trendafilov, C. Montag, and A. Markowetz, “Menthal: A framework for mobile data collection and analysis,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput. Adjunct, 2016, pp. 624–629.
[26] B. Finley and T. Soikkeli, “Mobile device type substitution,” Proc. ACM Interact. Mobile Wearable Ubiquitous Technol., vol. 2, no. 1, pp. 1–20, 2018.
[27] X. Su, D. Zhang, W. Li, and X. Wang, “AndroGenerator: An automated and configurable Android app network traffic generation system,” Security Commun. Netw., vol. 8, no. 18, pp. 4273–4288, 2015.
[28] A. Zuniga et al., “Tortoise or hare? Quantifying the effects of performance on mobile app retention,” in Proc. World Wide Web Conf., 2019, pp. 2517–2528.
[29] M. Krichen, “Anomalies detection through smartphone sensors: A review,” IEEE Sensors J., vol. 21, no. 6, pp. 7207–7217, Mar. 2021.
[30] Q. Xu et al., “Automatic generation of mobile app signatures from traffic observations,” in Proc. IEEE Conf. Comput. Commun. (INFOCOM), 2015, pp. 1481–1489.
[31] J. Yang, Y. Qiao, X. Zhang, H. He, F. Liu, and G. Cheng, “Characterizing user behavior in mobile Internet,” IEEE Trans. Emerg. Topics Comput., vol. 3, no. 1, pp. 95–106, Mar. 2015.
[32] Q. Xu, J. Erman, A. Gerber, Z. M. Mao, J. Pang, and S. Venkataraman, “Identifying diverse usage behaviors of smartphone apps,” in Proc. ACM SIGCOMM Conf. Internet Meas. Conf., 2011, pp. 329–344.
[33] S. Zhao et al., “Discovering different kinds of smartphone users through their application usage behaviors,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput., 2016, pp. 498–509.
[34] J. Huang, F. Xu, Y. Lin, and Y. Li, “On the understanding of interdependency of mobile app usage,” in Proc. IEEE 14th Int. Conf. Mobile Ad Hoc Sensor Syst. (MASS), 2017, pp. 471–475.
[35] H. Yao, G. Ranjan, A. Tongaonkar, Y. Liao, and Z. M. Mao, “SAMPLES: Self adaptive mining of persistent lexical snippets for classifying mobile application traffic,” in Proc. 21st Annu. Int. Conf. Mobile Comput. Netw., 2015, pp. 439–451.
[36] X. Wang, S. Chen, and J. Su, “Real network traffic collection and deep learning for mobile app identification,” Wireless Commun. Mobile Comput., vol. 2020, pp. 1–14, Feb. 2020.
[37] V. F. Taylor, R. Spolaor, M. Conti, and I. Martinovic, “Robust smartphone app identification via encrypted network traffic analysis,” IEEE Trans. Inf. Forensics Security, vol. 13, no. 1, pp. 63–78, Jan. 2018.
[38] C. Xiang, Q. Chen, M. Xue, and H. Zhu, “APPCLASSIFIER: Automated app inference on encrypted traffic via meta data analysis,” in Proc. IEEE Global Commun. Conf. (GLOBECOM), 2018, pp. 1–7.
[39] J. K. Laurila et al., “The mobile data challenge: Big data for mobile computing research,” in Proc. Pervasive Computing, 2012, pp. 1–8.
[40] C. Shepard, A. Rahmati, C. Tossell, L. Zhong, and P. Kortum, “LiveLab: Measuring wireless networks and smartphone users in the field,” ACM SIGMETRICS Perform. Eval. Rev., vol. 38, no. 3, pp. 15–20, 2011.
[41] C. Stachl et al., “Predicting personality from patterns of behavior collected with smartphones,” Proc. Nat. Acad. Sci. USA, vol. 117, no. 30, pp. 17680–17687, 2020.
[42] H. Wang, F. Xu, Y. Li, P. Zhang, and D. Jin, “Understanding mobile traffic patterns of large scale cellular towers in urban environment,” in Proc. Internet Meas. Conf., 2015, pp. 225–238.
[43] X. Liu et al., “Understanding diverse usage patterns from large-scale appstore-service profiles,” IEEE Trans. Softw. Eng., vol. 44, no. 4, pp. 384–411, Apr. 2018.
[44] F. Lin, H. Wang, L. Wang, and X. Liu, “A longitudinal study of removed apps in iOS app store,” in Proc. Web Conf., 2021, pp. 1435–1446.
[45] S. Greengard, “Weighing the impact of GDPR,” Commun. ACM, vol. 61, no. 11, pp. 16–18, Oct. 2018.
[46] N. Momen, M. Hatamian, and L. Fritsch, “Did app privacy improve after the GDPR?” IEEE Security Privacy, vol. 17, no. 6, pp. 10–20, Nov./Dec. 2019.
[47] J. L. Kröger, J. Lindemann, and D. Herrmann, “How do app vendors respond to subject access requests? A longitudinal privacy study on iOS and Android apps,” in Proc. 15th Int. Conf. Availability Rel. Security, 2020, pp. 1–10.
[48] J. M. Wing, “Data for good,” in Proc. 24th ACM SIGKDD Int. Conf. Knowl. Disc. Data Min., 2018, p. 4.
[49] J. Isaak and M. J. Hanna, “User data privacy: Facebook, cambridge analytica, and privacy protection,” Computer, vol. 51, no. 8, pp. 56–59, 2018.
[50] T. Petsas, A. Papadogiannakis, M. Polychronakis, E. P. Markatos, and T. Karagiannis, “Rise of the planet of the apps: A systematic study of the mobile app ecosystem,” in Proc. Conf. Internet Meas. Conf., 2013, pp. 277–290.
[51] H. Mendelson and K. Moon, “Modeling success and engagement for the app economy,” in Proc. World Wide Web Conf., 2018, pp. 569–578.
[52] H. Wang, H. Li, and Y. Guo, “Understanding the evolution of mobile app ecosystems: A longitudinal measurement study of google play,” in Proc. World Wide Web Conf., 2019, pp. 1988–1999.
[53] T. Petsas, A. Papadogiannakis, M. Polychronakis, E. P. Markatos, and T. Karagiannis, “Measurement, modeling, and analysis of the mobile app ecosystem,” ACM Trans. Model. Perform. Eval. Comput. Syst., vol. 2, no. 2, pp. 1–33, 2017.
[54] C. Jesdabodi and W. Maalej, “Understanding usage states on mobile devices,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput., 2015, pp. 1221–1225.
[55] W. Sun, S. Wu, and Z. Xue, “Clustering mobile apps based on design and manufacturing genre,” in Proc. IEEE 6th Int. Conf. Comput. Commun. (ICCC), 2020, pp. 1956–1960.
[56] H. Zhu, H. Cao, E. Chen, H. Xiong, and J. Tian, “Exploiting enriched contextual information for mobile app classification,” in Proc. 21st ACM Int. Conf. Inf. Knowl. Manag., 2012, pp. 1617–1621.
[57] S. Xue, L. Zhang, A. Li, X.-Y. Li, C. Ruan, and W. Huang, “AppDNA: App behavior profiling via graph-based deep learning,” in Proc. IEEE INFOCOM Conf. Comput. Commun., 2018, pp. 1475–1483.
[58] M. Shamsujjoha, J. Grundy, L. Li, H. Khalajzadeh, and Q. Lu, “Checking app behavior against app descriptions: What if there are no app descriptions?” in Proc. IEEE/ACM 29th Int. Conf. Program Comprehension (ICPC), May 2021, pp. 422–432.
[59] A. A. Al-Subaihin et al., “Clustering mobile apps based on mined textual features,” in Proc. 10th ACM/IEEE Int. Symp. Empirical Softw. Eng. Meas., 2016, pp. 1–10.
[60] K. Ochiai, F. Putri, and Y. Fukazawa, “Local app classification using deep neural network based on mobile app market data,” in Proc. IEEE Int. Conf. Pervasive Comput. Commun. (PerCom), 2019, pp. 186–191.
[61] A. Fuad and M. Al-Yahya, “Analysis and classification of mobile apps using topic modeling: A case study on google play arabic apps,” Complexity, vol. 2021, Feb. 2021, Art. no. 6677413.
[62] D. Surian, S. Seneviratne, A. Seneviratne, and S. Chawla, “App miscategorization detection: A case study on google play,” IEEE Trans. Knowl. Data Eng., vol. 29, no. 8, pp. 1591–1604, Aug. 2017.
[63] H. Zhu, E. Chen, H. Xiong, H. Cao, and J. Tian, “Mobile app classification with enriched contextual information,” IEEE Trans. Mobile Comput., vol. 13, no. 7, pp. 1550–1563, Jul. 2014.
[64] A. Berger, S. A. D. Pietra, and V. J. D. Pietra, “A maximum entropy approach to natural language processing,” Comput. Linguist., vol. 22, no. 1, pp. 39–71, 1996.
[65] V. Radosavljevic et al., “Smartphone app categorization for interest targeting in advertising marketplace,” in Proc. 25th Int. Conf. Companion World Wide Web, 2016, pp. 93–94.
[66] Y. He, C. Wang, G. Xu, W. Lian, H. Xian, and W. Wang, “Privacy-preserving categorization of mobile applications based on large-scale usage data,” Inf. Sci., vol. 514, pp. 557–570, Apr. 2020.
[67] E. J. Keogh and C. A. Ratanamahatana, “Exact indexing of dynamic time warping,” Knowl. Inf. Syst., vol. 7, no. 3, pp. 358–386, 2005.
[68] A. Morrison, X. Xiong, M. Higgs, M. Bell, and M. Chalmers, “A large-scale study of iphone app launch behaviour,” in Proc. CHI Conf. Human Factors Comput. Syst., 2018, pp. 1–13.
[69] H. Li et al., “Characterizing smartphone usage patterns from millions of Android users,” in Proc. Internet Meas. Conf., 2015, pp. 459–472.
[70] H. Wang et al., “Beyond google play: A large-scale comparative study of Chinese Android app markets,” in Proc. Internet Meas. Conf., 2018, pp. 293–307.
[71] S. Sigg, E. Lagerspetz, E. Peltonen, P. Nurmi, and S. Tarkoma, “Exploiting usage to predict instantaneous app popularity: Trend filters and retention rates,” ACM Trans. Web, vol. 13, no. 2, pp. 1–25, 2019.
[72] Y. Tian, K. Zhou, and D. Pelleg, “What and how long: Prediction of mobile app engagement,” ACM Trans. Inf. Syst., vol. 40, no. 1, pp. 1–38, 2022.
[73] T. Li, M. Zhang, H. Cao, Y. Li, S. Tarkoma, and P. Hui, “What apps did you use? Understanding the long-term evolution of mobile app usage,” in Proc. Web Conf., 2020, pp. 66–76.
[74] T. Li, Y. Fan, Y. Li, S. Tarkoma, and P. Hui, “Understanding the long-term evolution of mobile app usage,” IEEE Trans. Mobile Comput., early access, Jul. 21, 2021, doi: 10.1109/TMC.2021.3098664.
[75] S. Shen, X. Lu, Z. Hu, and X. Liu, “Towards release strategy optimization for apps in google play,” in Proc. 9th Asia–Pac. Symp. Internetware, 2017, pp. 1–10.
[76] B. Fu, J. Lin, L. Li, C. Faloutsos, J. Hong, and N. Sadeh, “Why people hate your app: Making sense of user feedback in a mobile app store,” in Proc. 19th ACM SIGKDD Int. Conf. Knowl. Disc. Data Min., 2013, pp. 1276–1284.
[77] R. Potharaju, M. Rahman, and B. Carbunar, “A longitudinal study of google play,” IEEE Trans. Comput. Soc. Syst., vol. 4, no. 3, pp. 135–149, Sep. 2017.
[78] X. Lu et al., “PRADA: Prioritizing Android devices for apps by mining large-scale usage data,” in Proc. IEEE/ACM 38th Int. Conf. Softw. Eng. (ICSE), 2016, pp. 3–13.
[79] Y. Wang, N. J. Yuan, Y. Sun, C. Qin, and X. Xie, “App download forecasting: An evolutionary hierarchical competition approach,” in Proc. IJCAI, 2017, pp. 2978–2984.
[80] H. Zhu, C. Liu, Y. Ge, H. Xiong, and E. Chen, “Popularity modeling for mobile apps: A sequential approach,” IEEE Trans. Cybern., vol. 45, no. 7, pp. 1303–1314, Jul. 2015.
[81] Y. Ouyang, B. Guo, T. Guo, L. Cao, and Z. Yu, “Modeling and forecasting the popularity evolution of mobile apps: A multivariate hawkes process approach,” Proc. ACM Interact. Mobile Wearable Ubiquitous Technol., vol. 2, no. 4, pp. 1–23, 2018.
[82] Y. Zhang, J. Liu, B. Guo, Z. Wang, Y. Liang, and Z. Yu, “App popularity prediction by incorporating time-varying hierarchical interactions,” IEEE Trans. Mobile Comput., vol. 21, no. 5, pp. 1566–1579, May 2022.
[83] H. Li et al., “Voting with their feet: Inferring user preferences from app management activities,” in Proc. 25th Int. Conf. World Wide Web, 2016, pp. 1351–1362.
[84] P. Papapetrou and G. Roussos, “Social context discovery from temporal app use patterns,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput. Adjunct Publication, 2014, pp. 397–402.
[85] A. Mehrotra et al., “Understanding the role of places and activities on mobile phone interaction and usage patterns,” Proc. ACM Interact. Mobile Wearable Ubiquitous Technol., vol. 1, no. 3, pp. 1–22, 2017.
[86] T. M. T. Do, J. Blom, and D. Gatica-Perez, “Smartphone usage in the wild: A large-scale analysis of applications and context,” in Proc. 13th Int. Conf. Multimodal Interfaces, 2011, pp. 353–360.
[87] M. Böhmer, B. Hecht, J. Schöning, A. Krüger, and G. Bauer, “Falling asleep with angry birds, facebook and kindle: A large scale study on mobile application usage,” in Proc. 13th Int. Conf. Human Comput. Interact. Mobile Devices Services, 2011, pp. 47–56.
[88] E. Graells-Garrido, D. Caro, O. Miranda, R. Schifanella, and O. F. Peredo, “The WWW (and an H) of mobile application usage in the city: The what, where, when, and how,” in Proc. Web Conf., 2018, pp. 1221–1229.
[89] T. Xia and Y. Li, “Revealing urban dynamics by learning online and offline behaviours together,” Proc. ACM Interact. Mobile Wearable Ubiquitous Technol., vol. 3, no. 1, pp. 1–25, 2019.
[90] Y. Ren, T. Xia, Y. Li, and X. Chen, “Predicting socio-economic levels of urban regions via offline and online indicators,” PLoS ONE, vol. 14, no. 7, 2019, Art. no. e0219058.
[91] S. Van Canneyt, M. Bron, A. Haines, and M. Lalmas, “Describing patterns and disruptions in large scale mobile app usage data,” in Proc. 26th Int. Conf. World Wide Web Companion, 2017, pp. 1579–1584.
[92] T. Li, M. Zhang, Y. Li, E. Lagerspetz, S. Tarkoma, and P. Hui, “The impact of COVID-19 on smartphone usage,” IEEE Internet Things J., vol. 8, no. 23, pp. 16723–16733, Dec. 2021.
[93] V. Srinivasan, S. Moghaddam, A. Mukherji, K. K. Rachuri, C. Xu, and E. M. Tapia, “MobileMiner: Mining your frequent patterns on your phone,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput., 2014, pp. 389–400.
[94] Y. Tian, K. Zhou, M. Lalmas, and D. Pelleg, “Identifying tasks from mobile app usage patterns,” in Proc. 43rd Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2020, pp. 2357–2366.
[95] B. Friedrichs, L. D. Turner, and S. M. Allen, “Discovering types of smartphone usage sessions from user-app interactions,” in Proc. IEEE Int. Conf. Pervasive Comput. Commun. Workshops (PerCom Workshops), 2021, pp. 459–464.
[96] A. Rahmati, C. Shepard, C. Tossell, L. Zhong, and P. Kortum, “Practical context awareness: Measuring and utilizing the context dependency of mobile usage,” IEEE Trans. Mobile Comput., vol. 14, no. 9, pp. 1932–1946, Sep. 2015.
[97] Y. Fan et al., “Understanding the long-term dynamics of mobile app usage context via graph embedding,” IEEE Trans. Knowl. Data Eng., early access, Sep. 3, 2021, doi: 10.1109/TKDE.2021.3110141.
[98] Y. Liu, K. Yu, X. Wu, Y. Shi, and Y. Tan, “Association rules mining analysis of app usage based on mobile traffic flow data,” in Proc. IEEE 3rd Int. Conf. Big Data Anal. (ICBDA), 2018, pp. 55–60.
[99] W.-R. Tseng and K.-W. Hsu, “Smartphone app usage log mining,” Int. J. Comput. Elect. Eng., vol. 6, no. 2, pp. 151–156, 2014.
[100] D. Ferreira, J. Goncalves, V. Kostakos, L. Barkhuus, and A. K. Dey, “Contextual experience sampling of mobile application micro-usage,” in Proc. 16th Int. Conf. Human–Comput. Interact. Mobile Devices Services, 2014, pp. 91–100.
[101] A. Shema and D. E. Acuna, “Show me your app usage and I will tell who your close friends are: Predicting user’s context from simple cellphone activity,” in Proc. CHI Conf. Extended Abstracts Human Factors Comput. Syst., 2017, pp. 2929–2935.
[102] I. M. Kloumann, L. A. Adamic, J. M. Kleinberg, and S. Wu, “The lifecycles of apps in a social ecosystem,” in Proc. 24th Int. Conf. World Wide Web, 2015, pp. 581–591.
[103] D. G. Taylor, T. A. Voelker, and I. Pentina, Mobile Application Adoption by Young Adults: A Social Network Perspective, WCOB Faculty, Jack Welch College Bus., Fairfield CT, USA, 2011.
[104] T. Li, Y. Li, M. A. Hoque, T. Xia, S. Tarkoma, and P. Hui, “To what extent we repeat ourselves? Discovering daily activity patterns across mobile app usage,” IEEE Trans. Mobile Comput., vol. 21, no. 4, pp. 1492–1507, Apr. 2022.
[105] M. Z. Shafiq, L. Ji, A. X. Liu, J. Pang, and J. Wang, “Characterizing geospatial dynamics of application usage in a 3G cellular data network,” in Proc. IEEE INFOCOM, 2012, pp. 1341–1349.
[106] N. Ghahramani and C. Brakewood, “Trends in mobile transit information utilization: An exploratory analysis of transit app in New York City,” J. Public Transp., vol. 19, no. 3, p. 9, 2016.
[107] V. Kostakos, D. Ferreira, J. Goncalves, and S. Hosio, “Modelling smartphone usage: A Markov state transition model,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput., 2016, pp. 486–497.
[108] T.-M.-T. Do and D. Gatica-Perez, “By their apps you shall understand them: Mining large-scale patterns of mobile phone usage,” in Proc. 9th Int. Conf. Mobile Ubiquitous Multimedia, 2010, pp. 1–10.
[109] F. A. Silva, A. C. S. A. Domingues, and T. R. B. Silva, “Discovering mobile application usage patterns from a large-scale dataset,” ACM Trans. Knowl. Disc. Data, vol. 12, no. 5, pp. 1–36, 2018.
[110] J. Lee and D.-H. Shin, “Targeting potential active users for mobile app install advertising: An exploratory study,” Int. J. Human–Comput. Interact., vol. 32, no. 11, pp. 827–834, 2016.
[111] S. L. Jones, D. Ferreira, S. Hosio, J. Goncalves, and V. Kostakos, “Revisitation analysis of smartphone app use,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput., 2015, pp. 1197–1208.
[112] L. A. Leiva, M. Böhmer, S. Gehring, and A. Krüger, “Back to the app: The costs of mobile application interruptions,” in Proc. 14th Int. Conf. Human–Comput. Interact. Mobile Devices Services, 2012, pp. 291–294.
[113] J. P. Rula, B. Jun, and F. E. Bustamante, “Mobile AD (D) estimating mobile app session times for better ads,” in Proc. 16th Int. Workshop Mobile Comput. Syst. Appl., 2015, pp. 123–128.
[114] H. Cao, Z. Chen, F. Xu, Y. Li, and V. Kostakos, “Revisitation in urban space vs. online: A comparison across pois, websites, and smartphone apps,” Proc. ACM Interact. Mobile Wearable Ubiquitous Technol., vol. 2, no. 4, pp. 1–24, 2018.
[115] Y. Tian, K. Zhou, M. Lalmas, Y. Liu, and D. Pelleg, “Cohort modeling based app category usage prediction,” in Proc. 28th ACM Conf. User Model. Adapt. Pers., 2020, pp. 248–256.
[116] Z. Tu, Y. Li, P. Hui, L. Su, and D. Jin, “Personalized mobile app prediction by learning user’s interest from social media,” IEEE Trans. Mobile Comput., vol. 19, no. 11, pp. 2670–2683, Nov. 2020.
[117] D. Yu, Y. Li, F. Xu, P. Zhang, and V. Kostakos, “Smartphone app usage prediction using points of interest,” Proc. ACM Interact. Mobile Wearable Ubiquitous Technol., vol. 1, no. 4, pp. 1–21, 2018.
[118] C. Shin, J.-H. Hong, and A. K. Dey, “Understanding and prediction of mobile application usage for smart phones,” in Proc. ACM Conf. Ubiquitous Comput., 2012, pp. 173–182.
[119] N. Natarajan, D. Shin, and I. S. Dhillon, “Which app will you use next? Collaborative filtering with interactional context,” in Proc. 7th ACM Conf. Recommender Syst., 2013, pp. 201–208.
[120] X. Zou, W. Zhang, S. Li, and G. Pan, “Prophet: What app you wish to use next,” in Proc. ACM Conf. Pervasive Ubiquitous Comput. Adjunct Publication, 2013, pp. 167–170.
[121] R. Baeza-Yates, D. Jiang, F. Silvestri, and B. Harrison, “Predicting the next app that you are going to use,” in Proc. 8th ACM Int. Conf. Web Search Data Min., 2015, pp. 285–294.
[122] S. Xu, W. Li, X. Zhang, S. Gao, T. Zhan, and S. Lu, “Predicting and recommending the next smartphone apps based on recurrent neural network,” CCF Trans. Pervasive Comput. Interact., vol. 2, no. 4, pp. 314–328, 2020.
[123] Y. Lee, S. Cho, and J. Choi, “App usage prediction for dual display device via two-phase sequence modeling,” Pervasive Mobile Comput., vol. 58, Aug. 2019, Art. no. 101025.
[124] Y. Jiang, X. Du, and T. Jin, “Using combined network information to predict mobile application usage,” Physica A Stat. Mech. Appl., vol. 515, pp. 430–439, Feb. 2019.
[125] H. Wang, Y. Li, M. Du, Z. Li, and D. Jin, “App2Vec: Context-aware application usage prediction,” ACM Trans. Knowl. Disc. Data, vol. 15, no. 6, pp. 1–21, 2021.
[126] S. Zhao et al., “AppUsage2Vec: Modeling smartphone app usage for prediction,” in Proc. IEEE 35th Int. Conf. Data Eng. (ICDE), 2019, pp. 1322–1333.
[127] A. Parate, M. Böhmer, D. Chu, D. Ganesan, and B. M. Marlin, “Practical prediction and prefetch for faster access to applications on mobile phones,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput., 2013, pp. 275–284.
[128] H. Wang et al., “Modeling spatio-temporal app usage for a large user population,” Proc. ACM Interact. Mobile Wearable Ubiquitous Technol., vol. 3, no. 1, pp. 1–23, 2019.
[129] T. Xia et al., “DeepApp: Predicting personalized smartphone app usage via context-aware multi-task learning,” ACM Trans. Intell. Syst. Technol., vol. 11, no. 6, pp. 1–12, 2020.
[130] X. Chen, Y. Wang, J. He, S. Pan, Y. Li, and P. Zhang, “CAP: Context-aware app usage prediction with heterogeneous graph embedding,” Proc. ACM Interact. Mobile Wearable Ubiquitous Technol., vol. 3, no. 1, pp. 1–25, 2019.
[131] Y. Yu, T. Xia, H. Wang, J. Feng, and Y. Li, “Semantic-aware spatio-temporal app usage representation via graph convolutional network,” Proc. ACM Interact. Mobile Wearable Ubiquitous Technol., vol. 4, no. 3, pp. 1–24, 2020.
[132] Y. Zhou, S. Li, and Y. Liu, “Graph-based method for app usage prediction with attributed heterogeneous network embedding,” Future Internet, vol. 12, no. 3, p. 58, 2020.
[133] X.-X. Zhao, Y. Qiao, Z. Si, J. Yang, and A. Lindgren, “Prediction of user app usage behavior from geo-spatial data,” in Proc. 3rd Int. ACM SIGMOD Workshop Manag. Min. Enriched Geo-Spatial Data, 2016, pp. 1–6.
[134] M. De Nadai, A. Cardoso, A. Lima, B. Lepri, and N. Oliver, “Strategies and limitations in app usage and human mobility,” Sci. Rep., vol. 9, no. 1, pp. 1–9, 2019.
[135] T. M. T. Do and D. Gatica-Perez, “Where and what: Using smartphones to predict next locations and applications in daily life,” Pervasive Mobile Comput., vol. 12, pp. 79–91, Jun. 2014.
[136] Y. Xu et al., “Preference, context and communities: A multi-faceted approach to predicting smartphone app usage patterns,” in Proc. Int. Symp. Wearable Comput., 2013, pp. 69–76.
[137] D. Han, J. Li, L. Yang, and Z. Zeng, “A recommender system to address the cold start problem for app usage prediction,” Int. J. Mach. Learn. Cybern., vol. 10, no. 9, pp. 2257–2268, 2019.
[138] R. Jisha, J. Amrita, A. R. Vijay, and G. Indhu, “Mobile app recommendation system using machine learning classification,” in Proc. IEEE 4th Int. Conf. Comput. Methodol. Commun. (ICCMC), 2020, pp. 940–943.
[139] B. Yan and G. Chen, “Appjoy: Personalized mobile application discovery,” in Proc. 9th Int. Conf. Mobile Syst. Appl. Services, 2011, pp. 113–126.
[140] K. Shi and K. Ali, “Getjar mobile application recommendations with very sparse datasets,” in Proc. 18th ACM SIGKDD Int. Conf. Knowl. Disc. Data Min., 2012, pp. 204–212.
[141] T. Liang et al., “Mobile app recommendation via heterogeneous graph neural network in edge computing,” Appl. Soft Comput., vol. 103, May 2021, Art. no. 107162.
[142] Y. Ouyang, B. Guo, X. Tang, X. He, J. Xiong, and Z. Yu, “Mobile app cross-domain recommendation with multi-graph neural network,” ACM Trans. Knowl. Disc. Data, vol. 15, no. 4, pp. 1–21, 2021.
[143] Y. Xu, Y. Zhu, Y. Shen, and J. Yu, “Leveraging app usage contexts for app recommendation: A neural approach,” World Wide Web, vol. 22, no. 6, pp. 2721–2745, 2019.
[144] M. Böhmer, L. Ganev, and A. Krüger, “AppFunnel: A framework for usage-centric evaluation of recommender systems that suggest mobile applications,” in Proc. Int. Conf. Intell. User Interfaces, 2013, pp. 267–276.
[145] Z. Tu, B. Duan, Z. Wang, and X. Xu, “Bidirectional sensing of user preferences and application changes for dynamic mobile app recommendations,” Neural Comput. Appl., vol. 33, no. 16, pp. 9791–9803, 2021.
[146] C. Liu, J. Cao, and S. Feng, “Leveraging kernel-incorporated matrix factorization for app recommendation,” ACM Trans. Knowl. Disc. Data, vol. 13, no. 3, pp. 1–27, 2019.
[147] T. Liang et al., “A broad learning approach for context-aware mobile application recommendation,” in Proc. IEEE Int. Conf. Data Min. (ICDM), 2017, pp. 955–960.
[148] J. Lin, K. Sugiyama, M.-Y. Kan, and T.-S. Chua, “New and improved: Modeling versions to improve app recommendation,” in Proc. 37th Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2014, pp. 647–656.
[149] D. Cao et al., “Version-sensitive mobile app recommendation,” Inf. Sci., vol. 381, pp. 161–175, Mar. 2017.
[150] S. Gao, L. Liu, Y. Liu, H. Liu, Y. Wang, and P. Liu, “App recommendation based on both quality and security,” J. Softw. Evol. Process, vol. 33, no. 3, 2021, Art. no. e2325.
[151] T. Rocha, E. Souto, and K. El-Khatib, “Functionality-based mobile application recommendation system with security and privacy awareness,” Comput. Security, vol. 97, Oct. 2020, Art. no. 101972.
[152] H. Wang, Z. Tu, Y. Fu, Z. Wang, and X. Xu, “Time-aware user profiling from personal service ecosystem,” Neural Comput. Appl., vol. 33, no. 8, pp. 3597–3619, 2021.
[153] H. Kwon, S. Lee, and D. Jeong, “User profiling via application usage pattern on digital devices for digital forensics,” Exp. Syst. Appl., vol. 168, Apr. 2021, Art. no. 114488.
[154] Y. Liang, Z. Cai, J. Yu, Q. Han, and Y. Li, “Deep learning based inference of private information using embedded sensors in smart devices,” IEEE Netw., vol. 32, no. 4, pp. 8–14, Jul./Aug. 2018.
[155] R. Razavi, “Personality segmentation of users through mining their mobile usage patterns,” Int. J. Human–Comput. Stud., vol. 143, Nov. 2020, Art. no. 102470.
[156] I. Andone, K. Błaszkiewicz, M. Eibes, B. Trendafilov, C. Montag, and A. Markowetz, “How age and gender affect smartphone usage,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput. Adjunct, 2016, pp. 9–12.
[157] S. Zhao et al., “Investigating smartphone user differences in their application usage behaviors: An empirical study,” CCF Trans. Pervasive Comput. Interact., vol. 1, no. 2, pp. 140–161, 2019.
[158] Z. Tu et al., “Demographics of mobile app usage: Long-term analysis of mobile app usage,” CCF Trans. Pervasive Comput. Interact., vol. 3, pp. 235–252, Apr. 2021.
[159] E. Peltonen et al., “The hidden image of mobile apps: Geographic, demographic, and cultural factors in mobile usage,” in Proc. 20th Int. Conf. Human–Comput. Interact. Mobile Devices Services, 2018, pp. 1–12.
[160] E. Guzman, L. Oliveira, Y. Steiner, L. C. Wagner, and M. Glinz, “User feedback in the app store: A cross-cultural study,” in Proc. IEEE/ACM 40th Int. Conf. Softw. Eng. Softw. Eng. Soc. (ICSE-SEIS), 2018, pp. 13–22.
[161] F. Huseynov, “Understanding usage behavior of different mobile application categories based on personality traits,” Interact. Comput., vol. 32, no. 1, pp. 66–80, 2020.
[162] F. Beierle et al., “Frequency and duration of daily smartphone usage in relation to personality traits,” Digit. Psychol., vol. 1, no. 1, pp. 20–28, 2020.
[163] Y. Gao, A. Li, T. Zhu, X. Liu, and X. Liu, “How smartphone usage correlates with social anxiety and loneliness,” PeerJ, vol. 4, Jul. 2016, Art. no. e2197.
[164] K. Katevas, I. Arapakis, and M. Pielot, “Typical phone use habits: Intense use does not predict negative well-being,” in Proc. 20th Int. Conf. Human–Comput. Interact. Mobile Devices Services, 2018, pp. 1–13.
[165] Z. Sarsenbayeva et al., “Does smartphone use drive our emotions or vice versa? A causal analysis,” in Proc. CHI Conf. Human Factors Comput. Syst., 2020, pp. 1–15.
[166] S. Zhao et al., “Who are the smartphone users? Identifying user groups with apps usage behaviors,” Mobile Comput. Commun., vol. 21, no. 2, pp. 31–34, 2017.
[167] Y. Lee, I. Park, S. Cho, and J. Choi, “Smartphone user segmentation based on app usage sequence with neural networks,” Telemat. Informat., vol. 35, no. 2, pp. 329–339, 2018.
[168] R. M. Frey, R. Xu, and A. Ilic, “Mobile app adoption in different life stages: An empirical analysis,” Pervasive Mobile Comput., vol. 40, pp. 512–527, Sep. 2017.
[169] X. Chen, Z. Zhu, M. Chen, and Y. Li, “Large-scale mobile fitness app usage analysis for smart health,” IEEE Commun. Mag., vol. 56, no. 4, pp. 46–52, Apr. 2018.
[170] S. Seneviratne, A. Seneviratne, P. Mohapatra, and A. Mahanti, “Your installed apps reveal your gender and more!” ACM SIGMOBILE Mobile Comput. Commun. Rev., vol. 18, no. 3, pp. 55–61, 2015.
[171] S. Seneviratne, A. Seneviratne, P. Mohapatra, and A. Mahanti, “Predicting user traits from a snapshot of apps installed on a smartphone,” ACM SIGMOBILE Mobile Comput. Commun. Rev., vol. 18, no. 2, pp. 1–8, 2014.
[172] E. Malmi and I. Weber, “You are what apps you use: Demographic prediction based on user’s apps,” in Proc. 10th Int. AAAI Conf. Web Soc. Media, 2016, pp. 635–638.
[173] S. Zhao, G. Pan, J. Tao, Z. Luo, S. Li, and Z. Wu, “Understanding smartphone users from installed app lists using Boolean matrix factorization,” IEEE Trans. Cybern., vol. 52, no. 1, pp. 384–397, Jan. 2022.
[174] S. Zhao, F. Xu, Z. Luo, S. Li, and G. Pan, “Demographic attributes prediction through app usage behaviors on smartphones,” in Proc. ACM Int. Joint Conf. Int. Symp. Pervasive Ubiquitous Comput. Wearable Comput., 2018, pp. 870–877.
[175] S. Bian et al., “A novel macro-micro fusion network for user representation learning on mobile apps,” in Proc. Web Conf., 2021, pp. 3199–3209.
[176] S. Zhao et al., “Gender profiling from a single snapshot of apps installed on a smartphone: An empirical study,” IEEE Trans. Ind. Informat., vol. 16, no. 2, pp. 1330–1342, Feb. 2020.
[177] Z. Yu, E. Xu, H. Du, B. Guo, and L. Yao, “Inferring user profile attributes from multidimensional mobile phone sensory data,” IEEE Internet Things J., vol. 6, no. 3, pp. 5152–5162, Jun. 2019.
[178] R. Xu, R. M. Frey, E. Fleisch, and A. Ilic, “Understanding the impact of personality traits on mobile app adoption–insights from a large-scale field study,” Comput. Human Behav., vol. 62, pp. 244–256, Sep. 2016.
[179] E. Peltonen, P. Sharmila, K. O. Asare, A. Visuri, E. Lagerspetz, and D. Ferreira, “When phones get personal: Predicting big five personality traits from application usage,” Pervasive Mobile Comput., vol. 69, Nov. 2020, Art. no. 101269.
[180] G. Chittaranjan, J. Blom, and D. Gatica-Perez, “Mining large-scale smartphone data for personality studies,” Pers. Ubiquitous Comput., vol. 17, no. 3, pp. 433–450, 2013.
[181] N. K. Kambham, K. G. Stanley, and S. Bell, “Predicting personality traits using smartphone sensor data and app usage data,” in Proc. IEEE 9th Annu. Inf. Technol. Electron. Mobile Commun. Conf. (IEMCON), 2018, pp. 125–132.
[182] S. Gao, W. Li, L. J. Song, X. Zhang, M. Lin, and S. Lu, “PersonalitySensing: A multi-view multi-task learning approach for personality detection based on smartphone usage,” in Proc. 28th ACM Int. Conf. Multimedia, 2020, pp. 2862–2870.
[183] K. Ochiai, N. Yamamoto, T. Hamatani, Y. Fukazawa, and T. Yamaguchi, “Exploiting graph convolutional networks for representation learning of mobile app usage,” in Proc. IEEE Int. Conf. Big Data (Big Data), 2019, pp. 5379–5383.
[184] S. Zhao et al., “Mining user attributes using large-scale app lists of smartphones,” IEEE Syst. J., vol. 11, no. 1, pp. 315–323, Mar. 2017.
[185] H. Falaki, R. Mahajan, S. Kandula, D. Lymberopoulos, R. Govindan, and D. Estrin, “Diversity in smartphone usage,” in Proc. 8th Int. Conf. Mobile Syst. Appl. Services, 2010, pp. 179–194.
[186] P. Welke, I. Andone, K. Blaszkiewicz, and A. Markowetz, “Differentiating smartphone users by app usage,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput., 2016, pp. 519–523.
[187] Z. Tu et al., “Your apps give you away: Distinguishing mobile users by their app usage fingerprints,” Proc. ACM Interact. Mobile Wearable Ubiquitous Technol., vol. 2, no. 3, pp. 1–23, 2018.
[188] V. Sekara, L. Alessandretti, E. Mones, and H. Jonsson, “Temporal and cultural limits of privacy in smartphone app usage,” Sci. Rep., vol. 11, no. 1, pp. 1–9, 2021.
[189] Y. Ashibani and Q. H. Mahmoud, “A multi-feature user authentication model based on mobile app interactions,” IEEE Access, vol. 8, pp. 96322–96339, 2020.
[190] Y. Ashibani and Q. H. Mahmoud, “A behavior profiling model for user authentication in IoT networks based on app usage patterns,” in Proc. IEEE 44th Annu. Conf. Ind. Electron. Soc. (IECON), 2018, pp. 2841–2846.
[191] Y. Ashibani and Q. H. Mahmoud, “Design and evaluation of a user authentication model for IoT networks based on app event patterns,” Clust. Comput., vol. 24, no. 2, pp. 837–850, 2021.
[192] D. Bassu, M. Cochinwala, and A. Jain, “A new mobile biometric based upon usage context,” in Proc. IEEE Int. Conf. Technol. Homeland Security (HST), 2013, pp. 441–446.
[193] L. Fridman, S. Weber, R. Greenstadt, and M. Kam, “Active authentication on mobile devices via stylometry, application usage, Web browsing, and GPS location,” IEEE Syst. J., vol. 11, no. 2, pp. 513–521, Jun. 2017.
[194] U. Mahbub, J. Komulainen, D. Ferreira, and R. Chellappa, “Continuous authentication of smartphones based on application usage,” IEEE Trans. Biometr. Behav. Identity Sci., vol. 1, no. 3, pp. 165–180, Jul. 2019.
[195] R. Pereira et al., “GreenHub: A large-scale collaborative dataset to battery consumption analysis of Android devices,” Empirical Softw. Eng., vol. 26, no. 3, pp. 1–55, 2021.
[196] S. Song, F. Wedyan, and Y. Jararweh, “Empirical evaluation of energy consumption for mobile applications,” in Proc. IEEE 12th Int. Conf. Inf. Commun. Syst. (ICICS), 2021, pp. 352–357.
[197] X. Chen, N. Ding, A. Jindal, Y. C. Hu, M. Gupta, and R. Vannithamby, “Smartphone energy drain in the wild: Analysis and implications,” ACM SIGMETRICS Perform. Eval. Rev., vol. 43, no. 1, pp. 151–164, 2015.
[198] X. Chen, A. Jindal, N. Ding, Y. C. Hu, M. Gupta, and R. Vannithamby, “Smartphone background activities in the wild: Origin, energy drain, and optimization,” in Proc. ACM 21st Annu. Int. Conf. Mobile Comput. Netw., 2015, pp. 40–52.
[199] X. Li, Y. Yang, Y. Liu, J. P. Gallagher, and K. Wu, “Detecting and diagnosing energy issues for mobile applications,” in Proc. 29th ACM SIGSOFT Int. Symp. Softw. Test. Anal., 2020, pp. 115–127.
[200] H. Z. Q. Rex, J. C. Chuen, and A. Herkersdorf, “Apps-usage driven energy management for multicore mobile computing systems,” in Proc. IEEE Int. Symp. Integr. Circuits (ISIC), 2014, pp. 472–475.
[201] D. Li, S. Hao, J. Gui, and W. G. Halfond, “An empirical study of the energy consumption of Android applications,” in Proc. IEEE Int. Conf. Softw. Maintenance Evol., 2014, pp. 121–130.
[202] Z. Jiang, R. Kuang, J. Gong, H. Yin, Y. Lyu, and X. Zhang, “What makes a great mobile app? A quantitative study using a new mobile crawler,” in Proc. IEEE Symp. Service Orient. Syst. Eng. (SOSE), 2018, pp. 222–227.
[203] Y. Li, J. Yang, and N. Ansari, “Cellular smartphone traffic and user behavior analysis,” in Proc. IEEE Int. Conf. Commun. (ICC), 2014, pp. 1326–1331.
[204] H. Jin et al., “Why are they collecting my data? Inferring the purposes of network traffic in mobile apps,” Proc. ACM Interact. Mobile Wearable Ubiquitous Technol., vol. 2, no. 4, pp. 1–27, 2018.
[205] E. A. Walelgne, A. S. Asrese, J. Manner, V. Bajpai, and J. Ott, “Clustering and predicting the data usage patterns of geographically diverse mobile users,” Comput. Netw., vol. 187, Mar. 2021, Art. no. 107737.
[206] A. Okic, A. E. Redondi, I. Galimberti, F. Foglia, and L. Venturini, “Analyzing different mobile applications in time and space: A city-wide scenario,” in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), 2019, pp. 1–6.
[207] Y. Ouyang and T. Yan, “Profiling wireless resource usage for mobile apps via crowdsourcing-based network analytics,” IEEE Internet Things J., vol. 2, no. 5, pp. 391–398, Oct. 2015.
[208] G. Aceto, G. Bovenzi, D. Ciuonzo, A. Montieri, V. Persico, and A. Pescapé, “Characterization and prediction of mobile-app traffic using Markov modeling,” IEEE Trans. Netw. Service Manag., vol. 18, no. 1, pp. 907–925, Mar. 2021.
[209] J. Yin et al., “Mobile app usage patterns aware smart data pricing,” IEEE J. Sel. Areas Commun., vol. 38, no. 4, pp. 645–654, Apr. 2020.
[210] D. Li, S. Hao, W. G. Halfond, and R. Govindan, “Calculating source line level energy information for Android applications,” in Proc. ACM Int. Symp. Softw. Test Anal., 2013, pp. 78–89.
[211] S. Rezaei, B. Kroencke, and X. Liu, “Large-scale mobile app identification using deep learning,” IEEE Access, vol. 8, pp. 348–362, 2019.
[212] Cisco. “Cisco Annual Internet Report (2018–2023) White Paper.” 2020. [Online]. Available: https://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/white-paper-c11-741490.html
[213] H. Falaki, D. Lymberopoulos, R. Mahajan, S. Kandula, and D. Estrin, “A first look at traffic on smartphones,” in Proc. 10th ACM SIGCOMM Conf. Internet Meas., 2010, pp. 281–287.
[214] D. Peloquin, M. DiMaio, B. Bierer, and M. Barnes, “Disruptive and avoidable: GDPR challenges to secondary research uses of data,” Eur. J. Human Genet., vol. 28, pp. 697–705, Mar. 2020.
[215] M. de Reuver, H. Bouwman, N. Heerschap, and H. Verkasalo, “Smartphone measurement: Do people use mobile applications as they say they do?” in Proc. ICMB, 2012, p. 2.
[216] N. Kiukkonen, J. Blom, O. Dousse, D. Gatica-Perez, and J. Laurila, “Towards rich mobile phone datasets: Lausanne data collection campaign,” in Proc. ICPS, vol. 68, 2010, pp. 1–7.
[217] A. Rahmati, C. Tossell, C. Shepard, P. Kortum, and L. Zhong, “Exploring iPhone usage: The influence of socioeconomic differences on smartphone adoption, usage and usability,” in Proc. 14th Int. Conf. Human–Comput. Interact. Mobile Devices Services (MobileHCI), 2012, pp. 11–20.
[218] T. Yan, D. Chu, D. Ganesan, A. Kansal, and J. Liu, “Fast app launching for mobile devices using predictive user context,” in Proc. 10th Int. Conf. Mobile Syst. Appl. Services, 2012, pp. 113–126.
[219] Y. Dong, N. V. Chawla, and A. Swami, “MetaPath2Vec: Scalable representation learning for heterogeneous networks,” in Proc. 23rd ACM SIGKDD Int. Conf. Knowl. Disc. Data Min., 2017, pp. 135–144.
[220] X. Wang et al., “Heterogeneous graph attention network,” in Proc. World Wide Web Conf., 2019, pp. 2022–2032.
[221] V. Mothukuri, R. M. Parizi, S. Pouriyeh, Y. Huang, A. Dehghantanha, and G. Srivastava, “A survey on security and privacy of federated learning,” Future Gener. Comput. Syst., vol. 115, pp. 619–640, Feb. 2021.
[222] W. Y. B. Lim et al., “Federated learning in mobile edge networks: A comprehensive survey,” IEEE Commun. Surveys Tuts., vol. 22, no. 3, pp. 2031–2063, 3rd Quart., 2020.
[223] D. Xu et al., “Edge intelligence: Empowering intelligence to the edge of network,” Proc. IEEE, vol. 109, no. 11, pp. 1778–1837, Nov. 2021.
[224] D. C. Nguyen, M. Ding, P. N. Pathirana, A. Seneviratne, J. Li, and H. Vincent Poor, “Federated learning for Internet of Things: A comprehensive survey,” IEEE Commun. Surveys Tuts., vol. 23, no. 3, pp. 1622–1658, 3rd Quart., 2021.
[225] L. Yao, S. Li, Y. Li, M. Huai, J. Gao, and A. Zhang, “Representation learning for treatment effect estimation from observational data,” in Proc. Adv. Neural Inf. Process. Syst., 2018, pp. 2633–2643.
[226] B. Huang, K. Zhang, P. Xie, M. Gong, E. P. Xing, and C. Glymour, “Specific and shared causal relation modeling and mechanism-based clustering,” in Proc. Adv. Neural Inf. Process. Syst., 2019, pp. 13510–13521.
[227] G. Jaume et al., “Quantifying explainers of graph neural networks in computational pathology,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2021, pp. 8106–8116.
[228] M. Zhang, T. Li, Y. Li, and P. Hui, “Multi-view joint graph representation learning for urban region embedding,” in Proc. 29th Int. Conf. Int. Joint Conf. Artif. Intell., 2021, pp. 4431–4437.

Tong Li (Member, IEEE) received the B.S. and M.S. degrees in communication engineering from Hunan University, China, in 2014 and 2017. He is currently pursuing the Ph.D. degree with the Hong Kong University of Science and Technology and University of Helsinki. His research interests include data mining and machine learning, especially with applications to mobile big data and urban computing.

Tong Xia received the B.S. degree in electrical engineering from the School of Electrical Information, Wuhan University, Wuhan, China, in 2017, and the M.S. degree in electronic engineering from Tsinghua University, Beijing, China, in 2020. She is currently pursuing the Ph.D. degree in machine learning with the Department of Computer Science, University of Cambridge, Cambridge, U.K. Her research interests include user behavior modeling, ubiquitous computing, and artificial intelligence for healthcare.

Huandong Wang (Member, IEEE) received the first B.S. degree in electronic engineering, the second B.S. degree in mathematical sciences, and the Ph.D. degree in electronic engineering from Tsinghua University, Beijing, China, in 2014, 2015, and 2019, respectively, where he is currently a Postdoctoral Research Fellow with the Department of Electronic Engineering. His research interests include urban computing, mobile big data mining, and machine learning.

Zhen Tu received the first B.S. degree in electronics and information engineering and the second B.S. degree in economics from Wuhan University, Wuhan, China, in 2016. She is currently pursuing the master’s degree with the Electronic Engineering Department, Tsinghua University, Beijing, China. Her research interests include mobile big data mining, user behavior modeling, data privacy, and security.
Sasu Tarkoma (Senior Member, IEEE) received the M.Sc. and Ph.D. degrees in computer science from the Department of Computer Science, University of Helsinki, where he is currently the Dean of the Faculty of Science and a Professor of Computer Science. He is also affiliated with the Helsinki Institute for Information Technology HIIT. He has authored four textbooks and has published over 240 scientific articles. He has ten granted U.S. patents. His research has received a number of best paper awards and mentions, such as at IEEE PerCom, ACM CCR, and ACM OSR.

Zhu Han (Fellow, IEEE) received the B.S. degree in electronic engineering from Tsinghua University in 1997, and the M.S. and Ph.D. degrees in electrical and computer engineering from the University of Maryland, College Park, in 1999 and 2003, respectively.
From 2000 to 2002, he was an R&D Engineer with JDSU, Germantown, MD, USA. From 2003 to 2006, he was a Research Associate with the University of Maryland. From 2006 to 2008, he was an Assistant Professor with Boise State University, Idaho. He is currently a John and Rebecca Moores Professor with the Electrical and Computer Engineering Department as well as with the Computer Science Department, University of Houston, Houston, TX, USA. His research interests include wireless resource allocation and management, wireless communications and networking, game theory, big data analysis, security, and smart grid. He received the NSF Career Award in 2010, the Fred W. Ellersick Prize of the IEEE Communications Society in 2011, the EURASIP Best Paper Award for the Journal on Advances in Signal Processing in 2015, the IEEE Leonard G. Abraham Prize in the Field of Communications Systems (Best Paper Award in IEEE JSAC) in 2016, and several best paper awards in IEEE conferences. He was an IEEE Communications Society Distinguished Lecturer from 2015 to 2018, and has been an AAAS Fellow and an ACM Distinguished Member since 2019. He has been a 1% highly cited researcher since 2017 according to Web of Science. He is also the winner of the 2021 IEEE Kiyo Tomiyasu Award, for outstanding early to mid-career contributions to technologies holding the promise of innovative applications, with the following citation: “for contributions to game theory and distributed management of autonomous communication networks.”

Pan Hui (Fellow, IEEE) received the bachelor’s and M.Phil. degrees from the University of Hong Kong and the Ph.D. degree from the Computer Laboratory, University of Cambridge. He is a Professor of Computational Media and Arts with the Hong Kong University of Science and Technology and the Nokia Chair Professor of Data Science with the University of Helsinki. He was a Distinguished Scientist for Telekom Innovation Laboratories Germany and an Adjunct Professor of Social Computing and Networking with Aalto University. His industrial profile also includes his research with Intel Research Cambridge and Thomson Research Paris. He has published more than 400 research papers with over 22,500 citations. He has 30 granted and filed European and U.S. patents in the areas of augmented reality, data science, and mobile computing. He has been serving on the organizing and technical program committee of numerous top international conferences, including ACM SIGCOMM, MobiCom, MobiSys, IEEE Infocom, ICNP, IJCAI, AAAI, ICWSM, and WWW. He served as an Associate Editor for IEEE TRANSACTIONS ON MOBILE COMPUTING and IEEE TRANSACTIONS ON CLOUD COMPUTING. He is an International Fellow of the Royal Academy of Engineering, an ACM Distinguished Scientist, and a member of the Academia Europaea.