Smartphone App Usage Analysis: Datasets, Methods, and Applications

Li, Tong
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
2022

Li, T, Xia, T, Wang, H, Tu, Z, Tarkoma, S, Han, Z & Hui, P 2022, 'Smartphone App Usage Analysis: Datasets, Methods, and Applications', IEEE Communications Surveys and Tutorials, vol. 24, no. 2, pp. 937-966. https://doi.org/10.1109/COMST.2022.3163176

http://hdl.handle.net/10138/354371
10.1109/COMST.2022.3163176

cc_by
publishedVersion

Downloaded from Helda, University of Helsinki institutional repository.

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail.
Please cite the original version.
IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 24, NO. 2, SECOND QUARTER 2022 937

Smartphone App Usage Analysis: Datasets, Methods, and Applications
Tong Li, Member, IEEE, Tong Xia, Huandong Wang, Member, IEEE, Zhen Tu, Sasu Tarkoma, Senior Member, IEEE, Zhu Han, Fellow, IEEE, and Pan Hui, Fellow, IEEE

Abstract—As smartphones have become indispensable personal devices, the number of smartphone users has increased dramatically over the last decade. These personal devices, which are supported by a variety of smartphone apps, allow people to access Internet services in a convenient and ubiquitous manner. App developers and service providers can collect fine-grained app usage traces, revealing connections between users, apps, and smartphones. We present a comprehensive review of the most recent research on smartphone app usage analysis in this survey. Our survey summarizes advanced technologies and key patterns in smartphone app usage behaviors, all of which have significant implications for all relevant stakeholders, including academia and industry. We begin by describing four data collection methods: surveys, monitoring apps, network operators, and app stores, as well as nine publicly available app usage datasets. We then systematically summarize the related studies of app usage analysis in three domains: app domain, user domain, and smartphone domain. We make a detailed taxonomy of the problem studied, the datasets used, the methods used, and the significant results obtained in each domain. Finally, we discuss future directions in this exciting field by highlighting research challenges.

Index Terms—Smartphone device, mobile app, app usage, behavior analysis, data mining.

Manuscript received March 10, 2021; revised September 15, 2021 and January 27, 2022; accepted March 18, 2022. Date of publication March 31, 2022; date of current version May 24, 2022. This work was supported in part by the National Key Research and Development Program of China under Grant 2020YFA0711403; in part by the National Natural Science Foundation of China under Grant 61972223, Grant 61971267, Grant 62171260, Grant U1936217, Grant U20B2060, and Grant U21B2036; in part by the International Postdoctoral Exchange Fellowship Program (Talent-Introduction Program) under Grant YJ20210274; in part by the Academy of Finland under Project 319669, Project 319670, Project 325570, Project 326305, Project 325774, and Project 335934; and in part by NSF under Grant CNS-2107216 and Grant CNS-2128368. (Corresponding author: Pan Hui.)
Tong Li is with the Department of Computer Science, University of Helsinki, 00100 Helsinki, Finland, also with the System and Media Laboratory, Hong Kong University of Science and Technology, Hong Kong, and also with the Department of Electronic Engineering, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100190, China (e-mail: [email protected]).
Tong Xia, Huandong Wang, and Zhen Tu are with the Department of Electronic Engineering, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China (e-mail: [email protected]).
Sasu Tarkoma is with the Department of Computer Science, University of Helsinki, 00100 Helsinki, Finland (e-mail: [email protected]).
Zhu Han is with the Department of Electrical and Computer Engineering, University of Houston, Houston, TX 77004 USA, and also with the Department of Computer Science and Engineering, Kyung Hee University, Seoul 446-701, South Korea (e-mail: [email protected]).
Pan Hui is with the Computational Media and Arts Trust, Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511458, China, also with the Division of Emerging Interdisciplinary Area, Hong Kong University of Science and Technology, Hong Kong, and also with the Department of Computer Science, University of Helsinki, 00100 Helsinki, Finland (e-mail: [email protected]).
Digital Object Identifier 10.1109/COMST.2022.3163176
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

I. INTRODUCTION
People can now use their smartphone apps to access a variety of Internet services, including instant messaging (e.g., WhatsApp, WeChat), online socializing (e.g., Twitter, Weibo), electronic commerce (e.g., Amazon, Taobao), and online payment (e.g., PayPal, Alipay). These services have become an important part of the infrastructure of the modern information society, making smartphone apps a necessity in daily life [1]–[3]. According to a report from Statista [4], the number of apps available in Google Play, the official app store of Android, has increased exponentially from 16,000 in December 2009 to 2,893,806 in July 2021. The app market is expected to generate 935.2 billion US dollars in business value by 2023 [5]. Such a vast and vital app market has attracted developers and service providers to investigate app usage behavior to better develop and deliver mobile apps.
Understanding app usage behaviors has significant implications for all relevant stakeholders, including smartphone manufacturers, network operators, market intermediaries, app developers, and end consumers [6]. To improve device performance and extend usage time, smartphone manufacturers can optimize the scheduling of various smartphone resources, such as CPU, memory, and battery power, based on the usage patterns of specific apps [7], [8]. Based on app traffic patterns, network operators can dynamically optimize traffic offloading schemes and improve network services [9], [10]. Furthermore, network operators and market intermediaries can provide personalized services, such as accurate recommendations and targeted advertisements, by profiling mobile users’ preferences and interests from their app usage behaviors. By doing so, operators and intermediaries can improve the quality of experience (QoE) while increasing profits [11], [12]. App developers can better understand customer satisfaction and market trends by analyzing app usage and profiling app popularity, which may provide excellent guidance for upgrading existing apps and designing new apps [13].
In recent years, extensive research efforts have been invested in understanding user behaviors using data-driven methods based on mobile app usage data. As shown in Fig. 1, we counted the number of papers published in the field of app usage data analysis. We can see that this research area began
in 2010, then grew slowly until 2012. Most notably, after 2012, there was a rapid increase in the number of publications. Overall, the volume of publications in this field is expanding, indicating a burgeoning research field. Thus, we aim to conduct a comprehensive literature survey in response to significant recent progress in mobile app usage analysis.

Fig. 1. Volume of publications in the field of app usage data analysis. The statistical values are up to Aug. 2021. The predicted value for 2021 is based on previous years’ values.

The app usage data are collected through surveys [14], monitoring apps [8], network operators [15], and app stores [16]. These data form a cross-domain and multi-view data ecosystem that includes app usage behaviors (e.g., downloading, installing, launching, and uninstalling apps), contextual information about app usage (e.g., time, location, traffic, and energy consumption), and information about apps (e.g., app description, app rating, and user reviews). As a result, the app usage data reflect the characteristics of apps, users, and smartphones.
Existing studies for app usage data analysis fall into three domains: app domain, user domain, and smartphone domain, as shown in Fig. 2. From the app perspective, app domain research aims to reveal app features from app usage data. The research topics in this area include app ecosystem profiling, app usage pattern discovery, and app usage prediction and recommendation. App ecosystem profiling looks into the inherent characteristics of mobile apps and app markets, such as app categorization and popularity modeling. The goal of app usage pattern discovery is to find regularities in mobile app usage behaviors. Predicting which apps will be released and inferring users’ preferences for undiscovered new apps are the goals of app usage prediction and recommendation.

Fig. 2. Structure of existing studies for app usage analysis.

The goal of user domain studies is to discover the link between user characteristics and app usage behaviors. User profiling and user identification are two research topics in this domain. User profiling focuses on group-level user attributes like gender, occupation, income, and personality. User identification, focusing on the individual level, aims to identify a user based on his or her app usage behaviors.
The smartphone domain research focuses on smartphone characteristics, with two main areas of investigation: app energy drain and app traffic patterns. These two areas aim to improve smartphone performance by analyzing app energy and traffic consumption patterns.
In this survey, we aim to answer the following questions. How should one collect mobile app data? What are the benefits and limitations of the various data collection methods? What are the critical features of app usage datasets that have been used in existing studies? What are the main research topics in mobile app usage analysis? What are the most common data-driven methods used, the significant findings, and the main results in existing studies? What are the challenges and promising future directions in mobile app usage analysis?
To answer these questions and draw conclusions, we systematically review the recent research. Fig. 3 shows our survey paper’s structure. In Section II, we describe and compare data collection methods. We also present a set of public datasets and discuss privacy and ethical issues. Section III summarizes existing studies in the app domain. Section IV reviews research efforts in the user domain. Section V reviews existing studies in the smartphone domain. We compare app usage datasets, methods, and findings across all research topics. In Section VI, we discuss the challenges and future research in mobile app usage analysis. Section VII concludes the survey.

Fig. 3. Structure of the survey article.

The main contributions of this paper can be summarized as follows.
• A comprehensive survey of data collection methods used to collect app usage data is presented, including surveys, monitoring apps, network operators, and app stores. We discuss data characteristics and compare the benefits and limitations of various data collection methods. We also gather and introduce prominent app usage datasets to the research community for further study.
• A comprehensive review of current research in the field of smartphone app usage data analysis is presented, including primary research topics in the app, user, and smartphone domains. For each research topic, we make a thorough comparison in terms of the problem studied, the datasets used, the methods applied, and the significant results achieved.
• Advanced technologies and key findings gleaned from app usage behaviors are thoroughly summarized, with significant implications for all relevant stakeholders in the research community and industries.
• The challenges and future research opportunities concerning mobile app usage analysis are identified. We give a thorough discussion of the challenges in data and methods. We also go over future research topics, such as app evolution, context-aware app usage modeling, deep reasoning of app usage behaviors, linking physical activities and app usage, and location-based services and urban computing.
There are only a few surveys on related topics. Cao and Lin [17] surveyed studies that looked at app usage prediction and recommendations. Zhao et al. [11] summarized studies on user profiling based on app usage behaviors. Guo et al. [13] conducted a review of the literature using app usage data gathered through crowdsourced methods. Existing surveys are either limited to a single data source [13] or a single small topic [11], [17]. Table I summarizes the contributions and limitations of existing survey articles on app usage analysis. In comparison, our survey examines the state-of-the-art studies in all app, user, and smartphone domains, as well as the connections between them. We provide a systematic review of existing datasets, commonly used analytical methods, and key findings and results.

TABLE I
SUMMARY OF EXISTING SURVEY ARTICLES ON APP USAGE ANALYSIS

II. DATA COLLECTION AND OVERVIEW
Real-world app usage data are the foundation of smartphone app usage analysis. We will introduce and compare different data collection methods in this section, and we will present a set of public datasets. We will also discuss privacy and ethical concerns.

A. Data Collection Methods
1) Surveys: Surveys are a simple and important method to collect information about app usage. Researchers generally create a questionnaire based on their goals and then collect data from the participants’ responses. In practice, researchers should pay close attention to questionnaire design, particularly question wording, to reduce respondent bias. The American Association for Public Opinion Research offers some tips on how to conduct a high-quality survey (https://www.aapor.org/Standards-Ethics/Best-Practices.aspx).
There are several methods for conducting a survey, including telephone, mail, and the Internet. In-app surveys, which post questionnaires and collect user feedback directly inside mobile apps, have been popular in recent years and are often used to obtain app usage data. WhatsApp and Skype, for example, ask users to rate the quality of their audio calls at the end of the call. YouTube uses questionnaires to gather information about its users’ preferences. In-app surveys require less maintenance than other methods such as telephone and face-to-face surveys [18]. In-app surveys can capture a lot of data because they can reach a large number of people with little effort. However, in in-app surveys, respondents represent only one group, namely app users, which may limit the generalizability and representativeness of the data obtained. In other words, in-app surveys are subject to a mode effect, whereby different survey modes result in different data being collected.
Surveys can collect only the coarse-grained app usage behaviors of users, such as which app store they used and
the number of apps they downloaded per month [19], [20]. However, fine-grained app usage traces, such as when and what apps are launched, are difficult to collect with surveys. Participants may be hesitant to share sensitive information and may also give socially acceptable responses to the questions posed. As a result, there are biases in the app usage data gathered through surveys. Researchers should be aware of this and take proactive steps to mitigate biases, such as data sampling.
2) Monitoring Apps: One common data collection method is to use monitoring apps installed on participants’ smartphones to record fine-grained app usage behaviors automatically. Researchers can use this data collection method on a small scale by recruiting volunteers [21] and on a large scale by publishing monitoring apps in app stores [8]. Recruiting volunteers can focus on a particular group of users, e.g., students [22] and older adults [23]. In addition, recruiting volunteers allows researchers to control data quality in advance, for example by selecting participants based on their backgrounds and properties. Publishing in app stores can improve data quality by filtering out noisy data and reducing bias by taking advantage of a large number of active users [24]. Notably, as a result of globalization, it is now easier to collect app usage data from multiple countries by publishing monitoring apps in international app stores such as Google Play and Apple Store. The results of such an analysis will be more general and representative.
Monitoring apps generally use an event-triggered collection scheme, which means they collect app usage data when an event occurs. The event can be user actions [25] (e.g., turning on the screen, launching apps, or typing), messages received [26] (e.g., notifications, e-mails), network requests [27], and hardware status [28] (e.g., CPU usage, battery levels). Researchers can collect a variety of user behaviors and control the granularity of data collection by properly selecting trigger events. Smartphones also have a variety of sensors [29], such as an accelerometer, a gyroscope, and a GPS. Monitoring apps can gather rich sensor and device data, such as CPU usage, movement status, GPS location, and battery status. This sensor data can be used to analyze app usage by providing sensor contextual information. It is worth noting that monitoring apps can obtain almost all kinds of smartphone usage behaviors as long as the user’s consent is obtained.
3) Network Operators: Most apps nowadays rely on the Internet to provide their services. As a result, network operators can gather network data and deduce app usage traces from it. Network data can be collected and extracted from multiple network interfaces, such as from the serving gateway (SGW), the mobility management entity (MME), the access & mobility management function (AMF), or the session management function (SMF), as shown in Fig. 4. Deep packet inspection and deep flow inspection are typically used to infer app usage information from traffic flow records collected from the SGi and N6 network interfaces [30].

Fig. 4. Data-collecting points in networks. MME: Mobility Management Entity; SGW: Serving Gateway; PGW: Packet Data Network Gateway; SGi: Serving Gateway interface; AMF: Access & Mobility Management Function; SMF: Session Management Function; UPF: User Plane Function.

This data collection method is typically used by network operators and in large-scale measurements. The data collected generally cover most mobile users in an entire city [31] or a country [32]. Due to the large volume of network traffic data, network operators have to use sampling strategies, collecting network traffic records at regular intervals such as every hour [33] or every several minutes [34]. The datasets collected by network operators typically include location data for app usage records. The location is approximated by the GPS location of the associated base station.
An important step in this data collection method is to infer app usage activity from network traffic flows. One popular method is to use the HTTP header of network packets [35], [36]. Yao et al. [35], for example, created SAMPLES, a systematic tool that inspects the HTTP header and uses the destination domain and user-agent as the app identifier. SAMPLES can automatically generate the conjunctive rules that identify over 90% of apps with an average accuracy of 99%. However, due to security concerns, most apps now use the HTTPS protocol to transmit packets. Some research has focused on identifying app usage traces from encrypted data traffic [36]–[38]. Taylor et al. [37], for example, developed AppScanner, a system that used packets’ side information, such as packet size, direction, and delay, to identify the app label. AppScanner was able to correctly identify the top 110 most popular apps with a 96% accuracy rate. Most importantly, when collecting data from network operators, researchers should be aware of the mode effect. This data collection method cannot be used to track app usage that does not generate network traffic.
4) App Stores: App stores have access to a variety of app usage data. The app store, which is an app management tool, records users’ app management behaviors, such as downloading, updating, installing, and uninstalling apps. User preferences and app popularity are implicitly reflected in these management activities. Due to the user account mechanism, app stores can track the same user’s behaviors across multiple devices, which is difficult to do with surveys and network measurements. Additionally, some app stores, such as Wandoujia (https://www.wandoujia.com/), a free Android store in China, support the monitoring of smartphone sensors. As a result, app stores can still provide sensor contextual information about app usage, such
as network traffic, CPU usage, memory usage, and battery status. However, because of the security concerns of iOS, iPhones cannot support third-party app stores. Thus, this data collection method is mostly concentrated on Android users.
App stores can provide various types of app metadata, such as app name, app category, app description, app rating, and user reviews. App metadata provides valuable information about smartphone apps and allows the collection of user feedback about them. App descriptions, for example, are frequently used to extract app functions for app categorization. For user profiling, app categories provide semantic meanings to explain usage activities. App ratings and user reviews are used to model app popularity and recommend apps based on their popularity and usage experience. App metadata can be crawled from app stores, such as Google Play and Apple Store, or provided directly by app stores.
Table II compares the four data collection methods mentioned above. In terms of data collection scale, all four approaches are capable of large-scale measurements involving tens of thousands of users, thanks to advancements in communication and network technologies. However, surveys and monitoring apps are limited by response rates or app popularity, which makes it difficult to collect data from millions of users. By recruiting volunteers, both surveys and monitoring apps can collect small-scale datasets. It is also possible to conduct control studies by carefully selecting participants, which is impracticable for network operators and app stores. Monitoring apps outperform the other methods in terms of collectible app usage behavior. It is difficult for surveys to collect fine-grained usage traces because they are limited by questionnaires. Because users are directly responsible for their responses, they may be hesitant to share sensitive information but willing to provide socially desirable responses, resulting in biases in the data collected. Although other methods collect data automatically and avoid biases, network operators and app stores are still limited to specific types of behaviors, such as network access and app management. Monitoring apps, which are installed on smartphones and based on event-triggered collection, can be used to collect all types of app usage behaviors by selecting different trigger events. Different data collection methods can provide additional side information about app usage behaviors, such as user profiles, sensor data, location, and app metadata.

TABLE II
COMPARISON OF DIFFERENT DATA COLLECTION METHODS

Every coin has two sides. Different collection methods have their own set of benefits and drawbacks. In practice, we must select data collection methods with care and precision to answer the research question. For example, surveys, monitoring apps, and app stores are better choices than network operators for investigating country and culture differences in mobile app usage behaviors because it is difficult to conduct worldwide data collection from network operators. App stores and monitoring apps are also better if we want to track a user’s usage behaviors over time because they can link the same user based on user accounts even if the user changes smartphones. Different collection methods can be combined at times. For example, operator data can be used to discover app usage patterns across millions of users, and control studies on small-scale datasets collected from monitoring apps or surveys can then verify the discovered patterns.

B. Public Datasets
In this section, we present nine datasets that are publicly available to the research community for further research.
We introduce the collection method, collection period, and collection items of these datasets in detail.
1) Worldwide Survey Dataset: This dataset was collected through a survey in 2012, gathering mobile app usage information of 10,208 people from more than 15 countries [19]. The countries include the United States, Canada, Mexico, the United Kingdom, Australia, France, Germany, Italy, Spain, Brazil, Russia, India, Japan, China, and South Korea. The collected items for mobile app usage behaviors include the most frequently used app stores and app categories, the reasons that lead users to find apps, and the reasons that lead users to download and abandon apps. It is worth noting that this dataset only contains usage data for app categories instead of individual apps. The dataset includes participants’ demographics, including age, gender, nationality, marital status, country of residence, first language, ethnicity, education level, occupation, and income. This dataset also collects participants’ personality traits by using the Big-Five personality measurement. This dataset and corresponding questionnaire are public, available at ‘http://www0.cs.ucl.ac.uk/staff/S.Lim/app_user_survey/’.
2) Mobile Data Challenge (MDC) Dataset: The Lausanne Data Collection Campaign (LDCC) developed a specific monitoring app based on Nokia platforms to implement smartphone data collection. From October 2009 to March 2011, they collected a longitudinal smartphone dataset from nearly 200 volunteers. The volunteers are primarily dispersed throughout the Lake Geneva region of Switzerland. Large amounts of smartphone data, such as app usage behaviors (launch, foreground, close, and view), location (GPS, WLAN), motion (accelerometer), and proximity (Bluetooth), are all recorded in the dataset. The dataset was released to the research community in the Mobile Data Challenge (MDC) [39] held by Nokia, and is available at ‘https://www.idiap.ch/dataset/mdc’.
3) LiveLab Dataset: This dataset was collected through a monitoring app called LiveLab [40]. LiveLab is an iOS app used to gather smartphone usage data from 24 iPhone users over the course of a year, from February 2010 to February 2011. The dataset specifically tracks two types of app usage behaviors: app launches and changes to the foreground app. The dataset also includes contextual information about user behaviors gathered from iPhone sensors such as the accelerometer, GPS, battery, Bluetooth, and WiFi status. It is important to note that this data was collected from skewed samples; all 24 volunteers are Rice University students. This dataset was released to the public and can be accessed via ‘http://yecl.org/livelab/traces.html’.
4) Carat Dataset: Carat [8], a monitoring app, was used to collect this dataset. Carat applies an event-triggered collection scheme, gathering a data sample every time the battery level changes by 1%. Each data sample contains a list of apps used, a user-specific identifier, and a timestamp. Carat also gathers sensor contextual data, such as battery level, battery status, and mobile country code. Carat is available on both Google Play and Apple Store, which the developers hope will increase the number of people who participate in data collection. In this way, they collect data from all over the world. Carat has collected data from over 500,000 mobile users in over 100 countries so far (http://carat.cs.helsinki.fi/). The dataset owners made a long-term app usage dataset available to the research community. The top 1,000 users ranked by total time spent using Carat from 2014 to 2018 are included in the public long-term app usage dataset. The user with the longest duration in the public dataset has 18,146,042 time-series records spanning 4.65 years, and even the user with the shortest duration has more than two years of records. The public dataset is available at ‘https://www.cs.helsinki.fi/group/carat/data-sharing/’.
5) TalkingData: The dataset is collected using the TalkingData Software Development Kit (SDK), which is integrated into monitoring apps. This dataset contains the complete app usage traces for over 70 thousand users from May 01, 2016, to May 07, 2016. Each app usage log includes an anonymous user ID, timestamp, app ID, and location information (longitude and latitude). This dataset contains demographic information about the users involved, such as age and gender. Researchers should be aware of the mode effect and biases in the analysis when using this public dataset because the users involved are mostly from mainland China. This dataset was released in a Kaggle challenge, TalkingData Mobile User Demographics, and can be accessed via ‘https://www.kaggle.com/c/talkingdata-mobile-user-demographics/’.
6) PhoneStudy Dataset: The PhoneStudy smartphone research app for Android was used to collect this data [41]. Between September 2014 and January 2018, 743 volunteers were recruited through forums, social media, blackboards, flyers, and direct recruitment. The activities of users on their smartphones were logged in the form of time-stamped logs of events. Those events included calls, app starts/installations, screen de/activations, contact entries, texting, global positioning system (GPS) locations, etc. The dataset is publicly available at ‘https://osf.io/kqjhr/’.
7) Context-Aware App Usage Dataset: This dataset was collected by a primary Internet Service Provider (ISP) in China [42]. The data was collected over the course of one week in April 2016, and it covered the entire metropolitan area of Shanghai, one of the world’s largest cities. The SAMPLES [35] tool was used to identify app usage records from network metadata, and the traffic generated by the 2,000 most popular apps was successfully recognized. Each app usage record contains an anonymized user ID, timestamp, base station ID, app ID, and traffic volume. The top 1,000 active users in the app usage dataset are released to the research community. The public dataset is available at ‘http://fi.ee.tsinghua.edu.cn/appusage/’.
8) Wandoujia Dataset: Wandoujia is a leading app store in China [43]. The dataset was collected from over 17 million users and recorded their app management activities, such as downloading, updating, and uninstalling apps, from May 1 to September 30, 2014. Wandoujia also gathered daily network activities for each app, such as traffic volume and Wi-Fi and cellular network connection time. A portion of this dataset has been made available to the public. The public dataset is available at ‘http://www.liuxuanzhe.com/appdata/’.

9) iOS Apps Dataset: This dataset [44] was extracted from the Apple App Store and covers the metadata of removed iOS apps for a period of 1.5 years, from January 1, 2019, to April 30, 2020. There are 1,129,615 app records in total, corresponding to 1,033,488 unique mobile apps. The dataset contains app objective data such as app name and release date, app subjective data such as app ratings and reviews, and app popularity data such as app ranking. The information is open to the public and can be found at ‘https://github.com/LuckyFQ/iOS-Removed-Apps-Dataset’.

In Table III, we summarize the above seven public datasets in terms of collection method, collection area, collection date, number of apps and users, collected items, and availability status.

C. Privacy and Ethical Considerations

In 2018, the General Data Protection Regulation (GDPR) became enforceable in the European Union. GDPR gives European citizens greater control over their data and raises awareness of privacy issues and the value of their personal data [45]. GDPR, as a strong regulation for collecting and processing personal data, imposes a number of requirements. The following are the most relevant and important data collection principles. First, personal data can only be collected if the user has given consent for a specific purpose; that is, the consent covers only that purpose. Second, the data minimization principle limits data collection to the minimum required to achieve the app’s purpose. Third, individuals have the right to withdraw their consent and erase their personal data. The GDPR has had a significant impact on how mobile app usage data is collected. According to a recent study [46], the number of permission items used by apps decreased significantly after the GDPR was implemented. This finding suggests that monitoring apps are cautious about the data they collect in order to comply with the GDPR’s data minimization principle. Furthermore, data collectors are also responsible for consent management. Many app vendors and app stores have mechanisms in place to respond to user requests, such as erasing data and withdrawing consent [47].

Smartphone app usage data, as a type of sensitive personal data, must be handled with caution when it comes to ethics. When collecting, processing, and analyzing app usage data, researchers must adhere to the principle of Data for Good [48]. First, app usage data should be used for the greater good of humanity and society. We should use data to improve people’s lives, such as by improving interpersonal relationships, providing convenient and ubiquitous services, and improving user experience. However, serious abuses have occurred in recent years. Cambridge Analytica, for example, abused the data of millions of Facebook users [49]. To prevent such things from happening again, apart from government regulations, we still need data scientists to hold themselves to a high standard. Second, we should make good use of app usage data. We should adhere to the principle of end-user transparency by outlining the items collected, the purpose of data collection, and the potential privacy risk. Collection and analysis should be done in a way that protects people’s privacy. We must also take into account the ethical decisions that developed systems will make.

III. APP DOMAIN RESEARCH

App features such as functionality and popularity have a significant impact on mobile app usage behavior. Many efforts have been made in recent years to examine mobile app usage data from the standpoint of apps by analyzing app features. App ecosystem profiling, app usage pattern discovery, and app usage prediction and recommendation are the three main topics in app domain research, according to our careful review of existing studies. In this section, we will summarize existing app domain studies on the three topics and compare their datasets, methods, and key findings.

A. App Ecosystem Profiling

Millions of mobile apps form a symbiotic mobile app ecosystem [50]–[52]. App ecosystem profiling aims to investigate the inherent characteristics of mobile apps, focusing on two main sub-problems: app categorization and popularity modeling.

1) App Categorization: An app category groups apps that perform similar functions. WhatsApp and Skype, for example, are both classified as communication apps on Google Play. App categories are currently determined by app developers when they release apps in app stores [54], [55]. However, deciding on a category for an app can be difficult at times [56]. WeChat, for example, can be used for social networking as well as messaging. As a result, both the communication and social app categories appear to be appropriate. Moreover, manually selecting app categories opens the door to deliberate gaming [57], [58]. App developers may try to avoid fierce competition by selecting less appropriate categories to improve their app’s ranking. For the reasons stated above, many mobile apps are miscategorized in app stores. Hence, an effective and automated method for categorizing mobile apps is needed.

Several studies looked into the possibility of automatically categorizing apps based on their descriptions. They generally modeled app categorization as a problem of short text topic modeling, which they solved with Natural Language Processing (NLP) tools. Al-Subaihin et al. [59] used Term Frequency-Inverse Document Frequency (TF-IDF), while Ochiai et al. [60] used a word embedding model to extract app features from app descriptions. Fuad and Al-Yahya [61] further used Latent Dirichlet Allocation (LDA) to analyze the app descriptions of 13,282 apps in Google Play. They discovered some new categories, including occasions, pandemics, and languages, that were not present in the original app store classification. Surian et al. [62], on the other hand, concentrated on the app market’s miscategorization issue. To refine the category of apps, they developed a von Mises-Fisher distribution-based probabilistic topic model. They discovered that 0.35 to 1.10 percent of game apps are miscategorized after running experiments on 48,663 game apps crawled from Google Play.
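As a minimal illustration of description-based app categorization, the sketch below builds TF-IDF vectors from app descriptions and assigns a new description to the category with the most similar centroid. It is not the method of any cited study: the nearest-centroid classifier, the smoothed IDF formula, the function names, and the four toy descriptions are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep purely alphabetic whitespace-separated tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def fit_idf(token_lists):
    """Smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(token_lists)
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector; terms unseen during fitting are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    total = sum(counts.values()) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def fit_centroids(labeled_descriptions):
    """Average the TF-IDF vectors of each category into one centroid."""
    token_lists = [tokenize(text) for text, _ in labeled_descriptions]
    idf = fit_idf(token_lists)
    sums, sizes = defaultdict(Counter), Counter()
    for tokens, (_, label) in zip(token_lists, labeled_descriptions):
        for term, weight in tfidf(tokens, idf).items():
            sums[label][term] += weight
        sizes[label] += 1
    centroids = {lab: {t: w / sizes[lab] for t, w in vec.items()} for lab, vec in sums.items()}
    return idf, centroids


def categorize(description, idf, centroids):
    """Return the category whose centroid is most similar to the description."""
    vec = tfidf(tokenize(description), idf)
    return max(centroids, key=lambda lab: cosine(vec, centroids[lab]))


# Hypothetical toy corpus of (description, category) pairs.
TRAIN = [
    ("send messages and make video calls with friends", "communication"),
    ("free voice calls and group chat messages", "communication"),
    ("race cars and unlock new tracks in this arcade game", "game"),
    ("solve puzzle levels and beat the high score game", "game"),
]

idf, centroids = fit_centroids(TRAIN)
predicted = categorize("chat with friends using voice messages", idf, centroids)
print(predicted)
```

A miscategorization check in the spirit of the studies above could compare the predicted category with the developer-declared one and flag disagreements for manual review.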

TABLE III
PUBLIC APP USAGE DATASETS

Untrusted developers may manipulate app descriptions, affecting categorization accuracy. As a result, some researchers used customer-generated app usage data to improve app categorization, assuming that different types of apps have different usage patterns. For example, Zhu et al. [63] extracted contextual features of mobile apps from usage records, such as time, location, and battery level. They combined all of the features into a Maximum Entropy model [64] for app categorization. Radosavljevic et al. [65] took advantage of users’ app installation behavior to categorize apps. He et al. [66] explored sequential characteristics of app usage. They used time series to formalize users’ app-launching behavior and extracted app features using dynamic time warping [67] across time series. After conducting experiments on 3,086 apps, they concluded that app usage traces are a good source for app categorization.

2) Popularity Modeling: App popularity, as measured by chart rankings, user ratings, and downloads, reflects the user experience of mobile apps. App popularity modeling is critical for developers and app market intermediaries to understand the factors that influence app adoptions and then make appropriate decisions about which apps to develop, release, and remove [44], [68].

Modeling app popularity by depicting popularity distributions is one of the most common approaches used in existing research. Many studies have demonstrated that app popularity has a power-law distribution [69]–[72]. Petsas et al. [53] further discovered that paid and free apps have different popularity distributions. Free app popularity follows a power-law distribution with truncated edges, as shown in Fig. 5, whereas paid app popularity follows a clear power-law distribution. Such a discovery provides valuable prior knowledge for downstream applications. For example, the recommendation system for free apps does not need to consider the long tail effect. Li et al. [73], [74] studied the long-term evolution of app popularity from 2012 to 2017, spanning six years. They discovered that the popularity of apps exhibits a typical Pareto effect. Shen et al. [75] discovered that, in addition to the Pareto effect, app popularity exhibits a Matthew effect when considering time-varying data. In other words, high-ranked apps tend to receive higher ratings and have more consistency in top app lists.

Fig. 5. The downloads of free apps follow a power-law distribution with truncated edges, while paid apps follow a clear power-law distribution [53].

Some researchers investigated the factors that influence app popularity. Fu et al. [76] used LDA to extract key topics from over 13 million user reviews of 171,493 apps on Google Play. They identified the top five user concerns as attractiveness, stability, accuracy, compatibility, and connectivity. Potharaju et al. [77] investigated the relationship between the price and popularity of paid apps and discovered that the price of paid apps does not affect app popularity. Lu et al. [78] noticed that different mobile device models have an impact

on app adoption. Lower-end users, for example, prefer Opera Mini, whereas higher-end users prefer Chrome. Shen et al. [75] discovered that app release strategy has an impact on app popularity. A timely subsequent release has a higher chance of reversing a descending rating trend than a late release.

Some studies have made efforts towards app popularity prediction. Zhu et al. [80] used chart rankings and user ratings to define app popularity states and proposed a popularity-based hidden Markov model (PHMM) to predict the popularity states of mobile apps. Wang et al. [79] proposed an evolutionary hierarchical competition model (EHCM) to predict app downloads by using a three-level hierarchy structure to model app adoption. As shown in Fig. 6, the top level in the app adoption model is users; the middle level is categories of user demands; and the bottom level is apps. Each app serves several categories of user demands and competes with other apps by providing engaging functions. In terms of model design, EHCM has higher interpretability than PHMM. Ouyang et al. [81] further proposed a Multivariate Hawkes Process-based prediction model (MHP). The MHP jointly considered exogenous stimuli, e.g., updates, reviews, and ratings, and endogenous excitations, e.g., historical popularity and app ages. MHP outperforms PHMM and EHCM, according to extensive experiments. Zhang et al. [82] leveraged deep learning to predict app popularity and proposed DeePoP, an RNN-based prediction model. DeePoP considered time-varying app interactions and modeled their impact on app popularity. DeePoP outperforms MHP and EHCM, effectively lowering the Root Mean Square Error (RMSE) to 0.088. Li et al. [83], on the other hand, argued that app ratings do not accurately reflect app quality or user preferences. As a result, they proposed modeling user preferences and predicting app popularity using uninstallation-downloading sequential activities.

Fig. 6. A three-level hierarchy structure to model app adoption [79].

3) Discussion: Existing research primarily examined the app ecosystem in terms of app categories and popularity. Table IV summarizes prominent literature by dataset size, data type, methods, results, and findings. As can be seen in Table IV, the study scale ranges from 600 to over 1 million apps. The majority of the datasets used were obtained from app stores. App descriptions are commonly used for app categorization. To extract app features from app descriptions, NLP tools such as TF-IDF, LDA, and probabilistic topic models are used. However, because untrustworthy developers may manipulate app descriptions, there is a trend to use app usage data, such as installation and usage behaviors, to categorize apps more reliably.

In popularity modeling, app downloads, rankings, and ratings are commonly used to indicate app popularity. Descriptive statistics were used to depict the distribution of app popularity and analyze factors influencing app popularity. Time-series methods, like the Markov model, wavelet transform, Hawkes process, and recurrent neural network, are used to predict app popularity. In addition to positive activities such as downloading and installing apps, some studies tried to use negative behaviors such as uninstalling apps to model app popularity.

B. App Usage Pattern Discovery

App usage pattern discovery aims to find regularities in usage behaviors to learn typical usage habits and improve the quality of user experience. Existing studies focused primarily on two sub-fields: contextual pattern discovery and temporal pattern discovery.

1) Contextual Pattern Discovery: The goal of contextual pattern discovery is to investigate the connection between app usage and contextual factors [84]. As shown in Table V, there are three different types of context: sensor context, usage context, and social context. Sensor context, collected from smartphone sensors, includes location, time, battery levels, movement status of users, WiFi or cell connectivity, etc. Usage context refers to the prior and posterior used apps. Social context refers to the social environments in which an app is used, such as with friends, family members, strangers, coworkers, and so on.

Locations, as a sensor context, have a significant impact on app usage. Mehrotra et al. [85] discovered that student users are more attentive to app notifications at college, in libraries, on the streets, and in residential areas. However, users at religious institutions are less receptive to app notifications. Do et al. [86] and Böhmer et al. [87] found that while waiting for and during trips, users prefer to use Web and multimedia apps. Graells-Garrido et al. [88] further discovered that street types have an impact on app usage. Message apps, for example, generate more traffic on main streets, whereas dating apps are more popular on pedestrian streets. Xia and Li [89] and Ren et al. [90] demonstrated how the function of a location influences app usage in that location.

Time is also an important sensor contextual factor that influences app usage. Böhmer et al. [87] discovered that news apps are most popular in the morning, while game apps are most popular at night. Van Canneyt et al. [91] and Li et al. [92] noted that some specific dates and events, such as New Year’s Day, the UEFA European Championship, and COVID-19, disrupt users’ regular app usage patterns. Several studies investigated the relationship between app usage and other sensor context factors [86], [93], like Bluetooth data, battery levels, and movement status. For example, Do et al. [86] studied how Bluetooth density affects app usage and discovered that communication apps are more likely to be used with high Bluetooth density.

App usage is also strongly linked to the usage context, i.e., prior and posterior used apps. The fact that several apps need

TABLE IV
LITERATURE REVIEW IN APP ECOSYSTEM PROFILING

TABLE V
TYPES OF CONTEXTUAL FACTORS

to work together to complete a single task causes such correlations [94], [95]. Rahmati et al. [96] first demonstrated the app usage dependency of one-nearest prior used apps and found that such a dependency remains relatively constant for one to three months. By analyzing a dataset spanning seven years, Fan et al. [97] looked into how usage context changes over time. Huang et al. [34] and Liu et al. [98] identified frequent co-occurrence app sets. They discovered that e-commerce and online payment apps, such as Taobao and Alipay, are frequently used together to complete the task of online shopping. Tseng and Hsu [99] found that usage context dependency has sequential characteristics. For example, using the Camera app first and then the Album app is twice as likely as using the Album app first and then the Camera app.

App usage is also influenced by the social context. Ferreira et al. [100] classified the social context into four types: alone, with friends, with strangers, and others. They observed a significant correlation between app usage and social context with a p-value of 0.002. Shema and Acuna [101] used app usage sequences to predict social context, achieving an accuracy of 70% with a random forest model. Kloumann et al. [102] and Taylor et al. [103] discovered that social contexts have varying degrees of influence on mobile app usage. The context of friendship has a greater impact than that of family members.

2) Temporal Pattern Discovery: Unlike contextual patterns, which focus on static analysis, temporal patterns look into the dynamics of app usage. The diurnal pattern is a basic temporal pattern of app usage behavior that has been discovered by numerous studies [104]–[106]. A typical diurnal

pattern of app usage is shown in Fig. 7. The intensity of app usage increases during the day and decreases during the night. However, in different scenarios, such a diurnal pattern may change. The diurnal pattern presents differently at various granularities [105], such as bytes, packets, flows, and users. For example, in terms of the number of bytes and packets, app usage is still highly active at night from 20:00 to 24:00, but this is not observed in terms of flows and users. Also, different app categories may have distinct diurnal patterns. In contrast to other app categories, the diurnal usage pattern of transportation apps has more than two peaks on weekends [106]. Meanwhile, some researchers attempted to model the temporal patterns of app usage over the course of a single day. Kostakos et al. [107] used a time-sensitive Markov model, while Do and Gatica-Perez [108] represented app usage traces in one day as a bag-of-apps to infer the underlying structure of daily app usage.

Fig. 7. A typical diurnal pattern of app usage [104].

The temporal traces of mobile app usage, on the other hand, can be represented as a series of app sessions. An app session, as shown in Fig. 8, is a period during which a user is actively engaged with the app. Some studies looked into the temporal patterns of app sessions. Silva et al. [109] conducted descriptive statistics on app session time and discovered that map and media apps have longer app sessions. As users’ activities for app advertisement are dependent on app session time [110], Rula et al. [113] proposed using decision trees to predict app session time for better advertising. Jones et al. [111] and Cao et al. [114] analyzed users’ app re-visitation patterns across sessions and discovered three distinct patterns: checkers, waiters, and responsives. Users who exhibit brief revisit patterns of fast re-visitation are referred to as checkers (less than one hour). Users with longer revisit patterns, uniformly distributed between short-medium re-visitations (between 1 min and 4 hrs) and long re-visitations, are referred to as waiters (from 2 hrs to 3 days). Users who exhibit both brief and long revisit patterns are referred to as responsives. Leiva et al. [112] looked into the disrupted patterns of app sessions and discovered that app sessions disrupted by phone calls resulted in task completion delays of up to four times.

Fig. 8. Diagram of app usage sessions [113]. For each app usage, only one app is in the foreground and the usage will last for a period.

3) Discussion: Contextual pattern discovery and temporal pattern discovery are two sub-topics of existing research on app usage pattern discovery. Table VI summarizes prominent literature in terms of dataset information, methods, and patterns discovered. We can see that datasets vary greatly in size and duration. The majority of datasets are collected from monitoring apps and network operators because pattern discovery requires fine-grained usage behaviors, such as launching apps and switching apps. The studies on contextual pattern discovery focus on static patterns to discover relationships between app usage and context factors. Statistical methods such as Pearson correlation, analysis of variance, hypothesis testing, and posterior probability are commonly used to investigate correlations. These studies pave the way for context-aware app prediction and recommendation. Temporal pattern discovery, on the other hand, focuses on dynamic patterns to investigate how app usage changes over time. In this subtopic, descriptive statistics and time series analysis are frequently used methods. It is worth noting that some studies looked into temporal patterns and used them to predict and recommend app usage. In the following subsection, we will go over these studies in detail.

C. App Usage Prediction and Recommendation

1) App Usage Prediction: App usage prediction aims to forecast the apps that will be launched so that smartphones can load app-related resources in advance to cut down the searching and loading time [115]–[117].

App usage prediction is solved by examining usage rules based on historical user behavior. Predicting the most frequently used (MFU) and most recently used (MRU) apps [118], as the basic benchmark method, takes advantage of the most basic regularity of user behavior. Furthermore, probabilistic models are used to investigate complex regularity [119]–[121]. Natarajan et al. [119] modeled the app usage sequence as a Markov chain and utilized the first-order transition probability for prediction. Zou et al. [120] considered high-order app transitions to improve the prediction accuracy by leveraging a Bayesian network model. Baeza-Yates et al. [121] used all apps used in the recent time window for prediction. They achieved an accuracy of 85.7% by using a parallel tree augmented Naive Bayesian network. Thanks to recent advances in deep learning, Xu et al. [122] and Lee et al. [123] proposed using long short-term memory (LSTM) to capture the temporal regularity for app usage prediction.

App usage is also influenced by a variety of sensor contexts, as shown in Section III-B. As a result, many existing studies used various kinds of sensor contextual information, such as time, location, and movement status, to improve prediction performance instead of solely relying on users’ historical app usage traces.

Time is widely used for app usage prediction as a fundamental sensor contextual feature, indicating the hour of one day or

TABLE VI
LITERATURE REVIEW IN APP USAGE PATTERN DISCOVERY

the day of one week of usage behavior. Jiang et al. [124] added time as a new feature and used the most similar app usage case from history to generate prediction results. Wang et al. [125] used time as a condition in their probabilistic graphical models. Zhao et al. [126] developed a deep learning-based model for predicting app usage based on time context. As illustrated in Fig. 9, they concatenated the time feature with historically used apps and fed the time feature into the last layer of the neural network to introduce time context. It is a state-of-the-art deep learning-based app usage prediction model, with a Recall@5 of 84.47%.

Fig. 9. A deep learning model for time context-aware app usage prediction [126].

Location is also an important contextual factor to consider when predicting app usage. To achieve location-aware app prediction, Parate et al. [127] split app usage sequences into a variable-length Markov chain according to location features, while Wang et al. [128] used the location factor as a condition in Bayesian networks. Furthermore, deep learning techniques are used in location-aware prediction. Xia et al. [129], for example, developed a recurrent neural network-based model to predict the next used app and visited location at the same time. Chen et al. [130] proposed CAP, a graph embedding-based model for learning node embeddings from app-location, app-time, and app-category subgraphs. Yu et al. [131] used a graph neural network to learn the node embeddings on an app-location-time graph. They then used the learned node embeddings to predict app usage. Zhou et al. [132] further proposed a heterogeneous graph-based model that learns

embeddings for apps, locations, and time, to achieve an end-to-end prediction. The above frameworks based on graph embedding open the door to modeling correlations between app, location, and time through spatiotemporal app usage graphs.

Movement status, network mode, and battery level are also used for app usage prediction. In [133] and [134], movement status, including mobility entropy and travel patterns, is extracted from users’ trajectories. The above information and app usage sequences are jointly utilized for training a classifier for app usage prediction. Do and Gatica-Perez [135] considered network mode and battery level when building the probabilistic model for app usage prediction. Xu et al. [136] enriched the query vector using screen state and network mode to recall the most likely used apps as prediction results.

2) App Recommendation: App recommendation aims to infer users’ preferences towards unobserved new apps and then recommend favorable ones [137], [138]. Depending on whether contextual information is used, app recommendations can be classified into non-contextualized recommendations and context-aware recommendations.

Non-contextualized app recommendations are made solely based on the user’s previous app usage behavior. Yan and Chen [139] presented AppJoy, a framework for tracking how users install and use apps. The AppJoy system uses an item-based collaborative filtering (CF) model to predict usage scores and make recommendations. To deal with sparse datasets, Shi and Ali [140] proposed a PCA-based model for app recommendation, which uses PCA to extract features from the user-item matrix before performing item-based CF. Instead of collaborative filtering, Liang et al. [141] and Ouyang et al. [142] used graph neural network models to depict the similarity of apps and make recommendations. Specifically, Liang et al. [141] considered only app installation behavior, while Ouyang et al. [142] took both installation and search behaviors into account.

Many studies extended to a context-aware case in which apps are recommended based on contextual information such as time, location, movement status, and app metadata. One of the most widely used methods for utilizing contextual information is context-aware collaborative filtering (CAF), which extends conventional collaborative filtering by multiplexing user identities through context features. Böhmer et al. [144] compared different recommendation systems and demonstrated that context-aware collaborative filtering outperformed non-contextualized collaborative filtering. Tu et al. [145] took time as a contextual factor and considered changes in user preferences over time. They proposed a personal interest evolution network to model user dynamic preference and make recommendations. Xu et al. [143] leveraged usage context, i.e., prior and posterior used apps, to infer users’ preference to recommend apps. The proposed model, as shown in Fig. 10, contains two paths to combine user features and app usage context. They started by merging the user and app embeddings. The merged embedding was then used as input to predict context apps (on the right path) and to predict user preference (on the left path). Pan et al. [14] used social contextual features for app recommendations by creating four different social networks for users, including a call log network, a Bluetooth proximity network, a friendship network, and an affiliation network. They then proposed a graph model to predict the probability of installing a new app.

Fig. 10. A deep learning model for usage context-aware app recommendations [143].

App metadata like category, description, and permission are frequently used in context-aware app recommendations. Liu et al. [146] leveraged app category information for app recommendation based on a kernelized non-negative matrix factorization (KNMF) model. Liang et al. [147] proposed characterizing app features based on three factors: categories, permissions, and descriptions. They then used a tensor-based framework to integrate the multi-view features of apps to achieve app recommendations. Lin et al. [148] and Cao et al. [149] looked into version-sensitive app recommendation by considering the version update of apps. Rocha et al. [151] and Gao et al. [150] proposed using app permissions to improve the security of recommendations.

3) Discussion: We reviewed existing studies on app usage prediction and recommendation in this subsection. Table VII summarizes prominent literature in terms of dataset information, methods, and system performance.

In the field of app usage prediction, sensor context-based cases are the mainstream because sensor context features, such as time and location, can provide a significant performance boost. Most of the datasets are collected from monitoring apps and network operators because these two collection methods can support sensor context data. There is a clear progression from basic statistical learning models (e.g., nearest neighbor, random forest, Markov model) to probabilistic models (e.g., Bayesian networks, probabilistic graphical models), and finally to deep learning models (e.g., deep neural networks, graph embedding, graph neural networks). This is because deep learning models can explore contextual information in depth and extract coupling relationships between users, apps, time, and locations.

In the field of app recommendations, contextual information, such as location, time, and app metadata, is frequently used. To support supplementary data, the datasets used are primarily collected from monitoring apps and app stores. Regarding the methods applied, collaborative filtering (CF) is the most basic algorithm framework. Tensor factorization is a popular and effective way to address

TABLE VII
LITERATURE REVIEW IN APP USAGE PREDICTION AND RECOMMENDATION

context-aware app recommendations by taking location and time as context information and adding additional dimensions to represent them. Deep learning models, such as deep neural networks and graph neural networks, have become increasingly popular in recent years for mobile app recommendations.
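To make the item-based collaborative filtering framework reviewed in Section III-C more concrete, the following minimal sketch ranks unused apps for a user by their cosine similarity to the apps the user already uses, weighted by usage scores. It is a generic illustration and not the exact algorithm of AppJoy [139] or of any other cited system; the scoring rule, the function names, and the toy usage scores are assumptions.

```python
import math
from collections import defaultdict


def item_vectors(usage):
    """Invert {user: {app: score}} into {app: {user: score}}."""
    items = defaultdict(dict)
    for user, scores in usage.items():
        for app, score in scores.items():
            items[app][user] = score
    return items


def cosine(u, v):
    """Cosine similarity between two sparse app vectors keyed by user."""
    dot = sum(s * v[k] for k, s in u.items() if k in v)
    norm = math.sqrt(sum(s * s for s in u.values())) * math.sqrt(sum(s * s for s in v.values()))
    return dot / norm if norm else 0.0


def recommend(usage, user, top_n=1):
    """Rank apps the user has not used by similarity-weighted usage scores."""
    items = item_vectors(usage)
    seen = usage[user]
    scores = {}
    for candidate, cand_vec in items.items():
        if candidate in seen:
            continue
        # Sum the similarity to every app the user already uses,
        # weighted by how heavily the user uses that app.
        score = sum(cosine(cand_vec, items[app]) * s for app, s in seen.items())
        if score > 0:
            scores[candidate] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical usage scores in [0, 1]; higher means heavier use.
USAGE = {
    "u1": {"chat": 0.9, "maps": 0.2, "video_call": 0.8},
    "u2": {"chat": 0.8, "video_call": 0.9},
    "u3": {"maps": 0.9, "transit": 0.8},
    "u4": {"chat": 0.7},
}

print(recommend(USAGE, "u4"))
```

A context-aware variant could, under the same assumptions, key the usage scores by (user, context) pairs such as (user, hour of day), which mirrors the idea of multiplexing user identities through context features.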
IV. USER DOMAIN RESEARCH

Personal characteristics, such as interests, age, gender, and occupation, have a significant impact on mobile app usage. Many studies have analyzed mobile app usage data from the user's perspective to reveal links between user characteristics and app usage behavior. This section summarizes user domain research on two primary topics: user profiling and user identification.

A. User Profiling

User profiling aims to infer personal attributes from user-generated data, which is essential for personalized services such as personalized search, recommendations, and advertisements [152]–[154]. Demographic characteristics, personality traits, psychological status, personal interests, and life status are five categories of user features profiled from mobile app usage data, as shown in Table VIII. Demographic characteristics refer to a person's inherent properties, such as gender, age, income level, nationality, and occupation. Personality traits reflect people's characteristic patterns of thoughts, feelings, and social adjustments. The Big-Five model is the most widely used personality trait measurement system [41], [155], which includes five broad traits: extraversion, agreeableness, conscientiousness, neuroticism, and openness. Psychological status is a mental state that influences people's behavior and decisions, such as stress, well-being, and emotion. Personal interests are the objects or events that users prefer to concentrate on. Individuals' life status denotes their preferred way of life, which includes things like life events and life stages.

TABLE VIII
USER ATTRIBUTES THAT CAN BE PROFILED FROM MOBILE APP USAGE DATA

The relationship between user attributes and app usage is depicted in Fig. 11. On the one hand, user attributes shape how people use mobile apps. On the other hand, app usage behavior also reflects individual attributes. Based on the direction of the relationship, existing studies can be separated into two groups: descriptive profiling and predictive profiling. In descriptive profiling, researchers focus on how user attributes affect app usage behavior. In predictive profiling, by contrast, researchers aim to use app usage data to predict users' profile labels.

Fig. 11. The relationship between user attributes and app usage.

1) Descriptive Profiling: References [19], [23], [156]–[160] investigated how demographic characteristics affect app usage behavior. Andone et al. [156] looked into the effects of age and gender. They analyzed the app usage patterns of 30 thousand users for 30 days. According to the study, females spent more time on communication and social apps, while males spent more time playing games. Teenagers aged 12 to 17 spent the most time on communication, social media, and gaming apps, averaging over 40 minutes per day. By contrast, participants over the age of 30 spent less than ten minutes using these apps. Gordon et al. [23] attributed older adults' use of fewer apps to cognitive decline. As a result, they recommended that developers consider cognitive function to better support older adults. Zhao et al. [157] looked at how app usage differed by income level. They discovered that users with a higher income use apps in the categories of shopping, finance, travel, and business more frequently, whereas users with a low income use game and video apps more frequently. Tu et al. [158] expanded the analysis to include education level and discovered that Ph.D. users use more communication apps but fewer sports apps than users at other education levels.

References [19], [159], [160] investigated the differences in mobile app usage by country. Lim et al. [19] gathered data on app adoption in an online survey of 10,208 participants from over 15 countries. They discovered significant differences between countries. Users in the United States, for example, are more likely to download medical apps. Peltonen et al. [159] suggested that cross-cultural differences could explain differences in mobile app usage behavior across countries. Using app category usage as features and applying a hierarchical clustering algorithm, they discovered three main clusters of countries with distinct usage patterns. As shown in Fig. 12, the countries within the same cluster have similar cultural backgrounds, and the three main clusters are European, English-speaking, and mixed. Guzman et al. [160] also demonstrated that cross-cultural differences exist in mobile app reviews.

The impact of personality traits on mobile app usage was investigated in [161], [162]. They used the Big-Five framework, which consists of five bipolar factors (extraversion, agreeableness, neuroticism, conscientiousness, and openness), to represent users' personality traits. Table IX explains the five factors using example adjectives that describe each trait. The impact of personality traits on app category usage was highlighted by Huseynov [161]. Extraverts use
more photography and video editing apps. Agreeableness is negatively associated with the use of health and lifestyle mobile apps, while conscientious individuals usually avoid e-commerce-related apps. Beierle et al. [162] looked at the link between personality traits and app session characteristics. They discovered that extraversion and neuroticism are linked to more app sessions, whereas conscientiousness is linked to a shorter average session time.

Fig. 12. Countries colored by cluster labels in terms of app category usage [159].

TABLE IX
THE BIG-FIVE PERSONALITY FRAMEWORK AND EXAMPLES OF ADJECTIVES DESCRIBING EACH TRAIT

Some researchers looked into the psychological status of users [163]–[165]. Gao et al. [163] analyzed the links between app use and social anxiety and loneliness. Using the Wilcoxon-Mann-Whitney test to examine differences in user behaviors, they discovered that people with social anxiety or loneliness receive fewer incoming calls and use health apps more frequently. People with higher levels of social anxiety receive fewer messages and use camera apps less often. Katevas et al. [164] focused on users' well-being. Based on data from 340 participants over the course of four weeks, they offered evidence that intense app usage alone does not indicate negative well-being. However, nightly usage behavior shows a strong link to a reduced sense of well-being. According to [165], user emotions and app usage behaviors have bidirectional causality. On the one hand, app usage influences emotions: entertainment apps, like YouTube and music apps, evoke positive emotions. On the other hand, emotions drive app usage: when users are experiencing happy emotions, they are more active in using social apps.

Personal interests are the subject of [33], [166], [167]. Zhao et al. [33], [166] conducted a large-scale study in which they assessed 105,762 users' app usage traces for one month. They identified 382 unique groups of users based on their temporal patterns of app usage using K-means clustering. Each user group is assigned a meaningful label, such as night communicators, evening learners, or financial users. Lee et al. [167], on the other hand, undertook a small-scale investigation. They used Seq2Seq to extract app usage sequence embeddings from 180 volunteers' app usage sequences. They found eleven user groups using the K-means clustering method on embeddings: conversationalists, utilitarians, social stars, photographers, music fans, news and magazine readers, video streamers, gaming buffs, power users, and beginners. Both the large-scale and the small-scale analyses highlighted the diversity of personal interests in app usage.

Some researchers have indicated links between app usage and life status [168], [169]. Frey et al. [168] were the first to recognize that a user's current life stage has a significant impact on their app adoption. For example, when a person's life stage changes from without children to with children, the use of travel and entertainment apps drops dramatically. Chen et al. [169] used fitness app data to examine people's workout activities and discover their exercise styles. They found that people in the central business district (CBD) walk or jog more than those in rural areas and that people with longer mobility spans exercise less.

2) Predictive Profiling: Given users' app usage data, predictive profiling aims to predict their attribute labels. References [170]–[174], [176], [177], [184] estimated users' demographic characteristics, such as gender, age, and income, based on their app usage behaviors. Seneviratne et al. [170] gathered app adoption data and gender labels from 200 users and found that app installation lists alone could estimate users' gender with 75% accuracy. They then expanded their research to include other characteristics such as religion, spoken languages, and nationality [171], where precision reached over 90%. Malmi and Weber [172] further used a larger dataset of 3,760 users to confirm Seneviratne's findings and considered new demographics, such as wealth and race. According to their study, gender is the most predictable attribute, with an accuracy of 82.3%, while wealth is the most difficult to forecast, with an accuracy of 60.3%. Instead of using the list of users' apps directly as features, Zhao et al. [173] constructed a Boolean user-app matrix to represent user app adoptions and used Boolean matrix factorization (BMF) to extract hidden representations of users. Hidden representations yielded a higher gender prediction accuracy (77.2%) than app lists (75.2%). Zhao et al. [174] also looked into the sequential characteristics of app usage and learned user embeddings from app usage sequences. With the help of users' embeddings, they improved gender prediction accuracy to 82.49%. Bian et al. [175] considered user-app interactions as well as user-item interactions on apps to learn user embeddings and infer users' occupations with a precision of 70%. To improve prediction accuracy, some studies use additional information such as app descriptions and sensory data. Zhao et al. [176] used the LDA model to extract topic features from app descriptions as side information. They then used both topic features and app lists to predict users' gender labels. Yu et al. [177] used sensory data such as screen and battery status to predict gender with a precision of 91.70% and to estimate age with an RMSE of 4.3696.
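
As a rough illustration of the predictive profiling pipeline described above, the following sketch builds a Boolean user-app matrix from installation lists and trains a plain logistic regression on it. It corresponds to the simple baseline of using app lists directly as features, not to the Boolean matrix factorization of [173]. The app names, labels, and function names are made-up placeholders, and NumPy is assumed.

```python
import numpy as np


def build_user_app_matrix(install_lists, app_vocab):
    """Return a Boolean matrix M where M[i, j] is True if user i has installed app j."""
    index = {app: j for j, app in enumerate(app_vocab)}
    matrix = np.zeros((len(install_lists), len(app_vocab)), dtype=bool)
    for i, apps in enumerate(install_lists):
        for app in apps:
            if app in index:  # ignore apps outside the vocabulary
                matrix[i, index[app]] = True
    return matrix


def train_logistic_regression(features, labels, lr=0.5, n_steps=500):
    """Fit the weights and bias of a binary logistic regression by batch gradient descent."""
    x = features.astype(float)
    y = np.asarray(labels, dtype=float)
    weights = np.zeros(x.shape[1])
    bias = 0.0
    for _ in range(n_steps):
        probs = 1.0 / (1.0 + np.exp(-(x @ weights + bias)))
        error = probs - y  # gradient of the log-loss with respect to the logits
        weights -= lr * (x.T @ error) / len(y)
        bias -= lr * error.mean()
    return weights, bias


def predict(features, weights, bias):
    """Predict label 1 where the logit is positive (probability above 0.5), else 0."""
    return (features.astype(float) @ weights + bias > 0).astype(int)


# Toy, made-up data: app names and binary attribute labels are placeholders.
app_vocab = ["game", "shop", "chat", "news"]
install_lists = [["game", "chat"], ["game"], ["shop", "chat"], ["shop", "news"]]
labels = [0, 0, 1, 1]

matrix = build_user_app_matrix(install_lists, app_vocab)
weights, bias = train_logistic_regression(matrix, labels)
predictions = predict(matrix, weights, bias)
```

On real data the matrix would be large and sparse, and the model would be evaluated on held-out users. The training-set predictions here only show that the pieces fit together.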
TABLE X
LITERATURE REVIEW IN DESCRIPTIVE PROFILING

In terms of personality trait prediction, Xu et al. [178] developed a prototype, Personality Test, which automatically evaluates users' personality traits based on the apps they installed. Personality Test tracks the use of seven app categories: social apps, games, music and video apps, shopping apps, photography apps, finance apps, and personalization apps. With a random forest classifier, Personality Test obtains an average prediction precision of 42.72%. Peltonen et al. [179] went on to collect data on app usage across all categories. They showed that category-level aggregated app usage can predict Big Five personality traits with a prediction fit of 86%–96%. In addition to app usage data, Chittaranjan et al. [180] considered call logs, SMS logs, and Bluetooth scan logs. They achieved an average F1-score of 0.57 using an SVM classifier with a radial basis function (RBF) kernel. Kambham et al. [181] and Gao et al. [182] characterized personality trait prediction as a regression problem by placing participants on a Big Five Personality Inventory continuum. They used app usage and smartphone sensory data and achieved RMSEs ranging from 12.7% to 22.2% for different traits.

Ochiai et al. [183] looked at whether app usage patterns can forecast users' psychological states, such as stress levels. They considered the order in which the apps were used and created an app usage graph for each user to depict the contextual relationship between apps. Based on the Graph Isomorphism Network (GIN), they classified app usage graphs to predict users' stress levels and achieved an accuracy of 54.5%.

3) Discussion: We summarized recent studies on user profiling based on app usage behaviors in this section. In Tables X and XI, we review prominent literature on descriptive profiling and predictive profiling. The tables include research topics,
dataset information, methods, and findings. We can see that the dataset scale ranges from 28 users to over 100,000 users. App usage data is generally collected through monitoring apps and network operators, while user attributes are collected through online surveys. Descriptive statistics are frequently used in descriptive profiling to reveal the relationship between app usage and user characteristics. The statistical significance of the discovered relationships is demonstrated using tests such as Pearson correlation, the Wilcoxon-Mann-Whitney test, and the Tukey-Kramer test. In predictive profiling, SVM, LR, MLP, and random forest are commonly used classification methods. A few recent studies looked into advanced machine learning techniques, such as transformers, attention, and embeddings, indicating a hot direction for predictive user profiling.

TABLE XI
LITERATURE REVIEW IN PREDICTIVE PROFILING

B. User Identification

Although data collection agencies have anonymized user IDs to protect users' privacy, mining or sharing app usage datasets still poses a significant privacy risk. Many studies have demonstrated that users can be identified or re-identified from anonymized datasets based on their app usage behaviors. In 2010, Falaki et al. [185] were the first to demonstrate the diversity of smartphone and app usage among individuals, paving the way for user identification research. Welke et al. [186] studied the app adoption of 46,726 participants and discovered that using 500 of the world's most popular apps could distinguish unique app-signatures for 99.67% of users. Tu et al. [187] analyzed a larger dataset of 1.37 million Chinese users and found that using only four apps can uniquely identify 88% of users. Sekara et al. [188] further compared the size of app-signatures of users in different countries. They discovered that the identification rate is heavily influenced by the country of users. When the size of app-signatures is set to 5, the average identification rate varies dramatically between countries, ranging from 41.2% in Finland to 66.5% in the United States. These observations raise severe privacy concerns, as most individuals are identifiable and trackable even in large-scale anonymized datasets.

Some researchers use such identifiable characteristics for active authentication to secure users' smartphones. The basic idea is to use a user's app usage patterns to verify his or her identity [189]. For each user, Ashibani and Mahmoud [190] chose the eight most-used apps and used the timestamp of app usage as a feature to continuously identify users. In experiments on ten users, their authentication model verified users with an average F1-score of 96.5%. They then improved the F1-score to 98% by including traffic patterns while accessing apps as features in their model [191]. For more reliable authentication, Bassu et al. [192] proposed combining contextual information, such as location and time,
with users' uploading, downloading, and updating behaviors. Fridman et al. [193] also looked at various types of biometric behaviors, including text entered via soft keyboards, apps used, websites visited, and locations visited. Each distinct biometric behavior is fed into a classifier, as shown in Fig. 13. A data fusion center then fuses the local binary decisions from each classifier to produce a single global binary decision. In experiments on 200 users, the authentication system achieved an equal error rate (EER) of 5%. Notably, unexpected events, such as newly installed apps, can have a significant impact on the performance of authentication systems. Mahbub et al. [194] investigated the impact of unknown apps on the verification task. In their case, unknown apps are apps that appear in the test set but not in the training set. They discovered that the appearance of unknown apps increased the number of false negatives.

Fig. 13. User authentication by using text entered via soft keyboards, apps used, websites visited, locations visited [193].

User identification can be regarded as individual-level user profiling, i.e., predicting a user's specific identity label instead of an attribute label. Table XII summarizes prominent literature related to user identification. We can observe that users' identifiable characteristics based on their app usage behaviors are demonstrated in both small-scale [185] and large-scale [186]–[188] datasets. Several studies have also looked into using such identifiable characteristics for user authentication. By utilizing simple classifiers, like RF, LR, and SVM, they achieved good performance with an F1-score of 98% or an equal error rate of 3%. However, existing frameworks continue to rely on a centralized learning structure, in which all users' data is uploaded to a server, putting user privacy at risk during data uploading and computing processes.

TABLE XII
LITERATURE REVIEW IN USER IDENTIFICATION

V. SMARTPHONE DOMAIN RESEARCH

Many researchers analyzed app usage data to improve system performance and solve problems in the smartphone domain. This section summarizes smartphone domain research from two perspectives: app energy drain and app traffic patterns.

A. App Energy Drain

The smartphone is a limited-energy device whose battery life is a critical performance and user experience metric [195]. Both the research community and the industry are working on ways to extend battery life. Increasing battery capacity is one simple solution. However, due to hardware limitations, we also need to improve the energy efficiency of smartphone apps in parallel. Some researchers have studied the energy drain of app usage in order to improve energy efficiency.

Song et al. [196] empirically measured energy consumption for ten mobile apps. They discovered that average energy consumption in the studied apps varies significantly, ranging from 0.25 Joule/second to 1.25 Joule/second. Their research emphasizes the importance of optimizing app design to reduce energy consumption. To gain a thorough understanding of app energy consumption patterns, Oliner et al. [8] developed Carat, an app for both iOS and Android platforms, which detects and diagnoses energy anomalies in apps. On stock devices, Carat runs as a user-level app that collects information such as battery level, running app names, memory status, and device model. Oliner et al. discovered 10,110 hogs and 233,258 buggy apps by analyzing energy drain data from over 500,000 devices. An app is a hog if it consumes a lot of energy to run. Pandora Radio, Skype, and Live Wallpapers are typical hogs. Apps that
consume significantly more energy on some clients than others are referred to as buggy apps. That may be caused by the different usage habits of users. For example, some users prefer to use Kindle with a network connection to synchronize notes and bookmarks, while others prefer not to do so to save money on mobile data. The energy consumption will differ significantly depending on whether the network connection is turned on or off. In this sense, Kindle, Facebook, and YouTube are typical buggy apps. Based on these findings, they then created actionable diagnosis trees to recommend practical actions such as turning on/off WiFi/GPS, avoiding using certain apps under certain conditions, and upgrading/keeping app or operating system versions.

Chen et al. [197] created eStar, a free Android app that collects energy drain data, and used it to gather an energy trace dataset from 1,520 Galaxy S3 and S4 devices covering 800 different apps in the wild. They discovered that background apps and services consume 16.1% of total energy during the screen-off period. This finding implies that energy consumption can be reduced by improving the app scheduling algorithm and killing background apps when the screen is off. However, if we disable all apps without considering the differences in background activities, the user experience may be adversely affected. In [198], Chen et al. looked into this issue in depth. By analyzing the same dataset, they discovered 76 no-sleep apps for which the CPU was never suspended. They then designed a metric, Background-Foreground Correlation (BFC), to assess the utility of background app activities. A simple yet effective screen-off energy optimizer based on BFC was developed to learn and suppress useless background activities of apps automatically. Li et al. [199] pointed out that the unnecessary workload of apps is a major root cause of energy issues. Many apps, they discovered, perform computations that do not provide users with discernible benefits, resulting in unnecessary workload and energy consumption.

The above studies look at the energy drain pattern of a single app. However, because most smartphones have multi-core CPUs, parallel execution results in different app combinations having different power consumption patterns. The energy consumption of one app's threads will affect the energy consumption of other apps' threads. By studying different apps as a group, Rex et al. [200] attempted to reschedule app threads and correlate their influence on energy consumption. Because the number of total possible combinations of apps is enormous, they explored the frequent app usage sets based on users' usage habits and then optimized scheduling sequences for frequent app usage sets.

To help developers create 'greener' apps that use less energy, some researchers looked at energy drain patterns of specific app events or functions. Li et al. [210] developed a prototype tool called vLens to collect energy data at the nanosecond and millisecond levels. As a result, vLens can calculate the energy consumption of smartphone apps at the source line level and track high-energy events like thread switching and garbage collection. In [201], they then used vLens to collect the energy drain patterns of 405 apps. They discovered that, on average, apps waste 61% of their energy in idle states, and the network is the most energy-consuming component, confirming the findings in [197]. Most importantly, they discovered that system API calls account for the majority of an app's energy consumption, implying that app developers should exercise caution when using them.

B. App Traffic Patterns

Smartphones, which are supported by a variety of network-based apps, allow users to stay connected to the ubiquitous Internet [211]. According to a Cisco report [212], smartphone traffic accounted for 73% of all traffic in 2018 and will rise to 89% by 2023. Because of the high volume and growing importance of mobile traffic, researchers and network operators are interested in understanding how mobile apps use network resources.

Xu et al. [32] conducted anonymous network measurements from a tier-1 cellular carrier in the United States, covering over 600 thousand users, and used HTTP signatures to identify traffic from various apps. In terms of traffic sources, they discovered that mobile apps could be divided into two groups: local apps and national apps. The majority of the traffic of local apps comes from a single region, whereas the traffic of national apps comes from diverse regions. This finding suggests that content optimization in access networks could be significant, with local app content being placed on servers closer to end-users. Jiang et al. [202] looked into the relationship between app popularity and network performance. They discovered that over 60% of popular apps optimize network delay to under 350 ms, highlighting the importance of app service network performance. Falaki et al. [213] and Li et al. [203] looked at a small-scale and a large-scale network dataset, respectively, and found that smartphone operating systems have an impact on app traffic consumption. When compared to Android and Windows Phone, iOS consumes more traffic. Jin et al. [204] also discovered that different data collection purposes resulted in different app network traffic usage. As a result, they created MobiPurpose, a system that could track app network requests and classify data collection purposes based on app traffic patterns. Walelgne et al. [205] and Okic et al. [206] showed that different app categories have different traffic patterns. In terms of traffic volume, entertainment and social media apps are the most traffic-intensive, while education and weather apps are the least traffic-intensive [205]. Moreover, in terms of temporal patterns, music and shopping apps see the most traffic in the morning, while e-mail and game apps see the most traffic during lunchtime [206]. Some studies leveraged app traffic patterns to improve network performance. Ouyang and Yan [207] developed AppWiR, a crowdsourcing-based system that gathers app usage data and derives relationships between app usage, network traffic, and network resources. AppWiR can predict network resources occupied by apps with a mean absolute percentage error of 12.54% to 13.39%. Zeng et al. [10] analyzed the temporal traffic consumption patterns of various app categories. They then used a linear regression model to forecast the amount of traffic consumed by the most popular app categories at base stations over a given period and devised an edge caching strategy to cache the content for the most popular app services. Aceto et al. [208] defined network
traffic at various levels, such as packet and message levels, and proposed using the Markov model to predict app network traffic at different levels. Yin et al. [209] incorporated mobile app usage patterns into data pricing strategies, creating a pricing scheme that is updated based on user satisfaction and operational conditions.

TABLE XIII
LITERATURE REVIEW IN SMARTPHONE DOMAIN RESEARCH

C. Discussion

This section summarized relevant smartphone domain studies from two principal fields: app energy drain and app traffic patterns. Table XIII lists the most important literature in terms of research topics, dataset information, methods, and key findings. We can see that most datasets for energy drain analysis are gathered from monitoring apps to support battery state collection. Excessive energy consumption is primarily caused by hogs, background apps, and unnecessary workload. These findings suggest that app developers should carefully select system APIs to reduce unnecessary power consumption and that operating systems should improve app scheduling algorithms. Some studies have attempted to optimize app scheduling; however, their optimizers are still unable to support personalized requirements, such as adjusting optimization strategies for different user habits. In the field of app traffic patterns, the datasets are collected from network operators and monitoring apps. Network operator datasets typically cover millions of users, significantly more than monitoring app datasets. Clustering algorithms, such as K-means and DBSCAN, are used to discover app traffic patterns. In the meantime, classification and regression methods, such as SVM, decision tree, Markov model, and random forest, are used to forecast app traffic consumption
to optimize network resource allocation and data pricing strategies.

VI. CHALLENGES AND FUTURE RESEARCH

Smartphone app usage analysis is a promising direction. A significant number of studies have been done in recent years and have achieved a great deal. However, there are still several active challenges that have not been well addressed. This section first discusses the challenges and then looks ahead to future research on smartphone app usage analysis.

A. Challenge in Data

1) Data Collection: The collection of app usage data has become much more difficult after the implementation of GDPR. A major reason is that GDPR principles and data collection for big-data analysis are not always compatible. For example, a large amount of data is preferred in data collection to reduce deviation and bias in analysis results. However, this goes against the data minimization principle. New hypotheses are frequently introduced after data collection in data analyses. However, such analyses are constrained by the purpose of the users' initial consent. One possible way to overcome such contradictions is to collect anonymized data because anonymized data is excluded from GDPR [214]. However, the GDPR has a strict definition of anonymized data, which means that the data subject is not or is no longer identifiable. In other words, it must be impossible to obtain personal information from anonymized data. As a result, anonymized app usage data can only be used in app and smartphone domain research rather than user domain research.

We have discovered that GDPR has some unintended consequences. Collecting data from network operators, for example, has become extremely difficult due to the difficulty in obtaining user consent. Researchers are hesitant to publish datasets for fear of penalties, especially when European citizens are involved. Furthermore, because data collection in the European Union is difficult, there is a risk that researchers will prefer to use data from places with fewer or no regulations. A few trends have emerged. Recent studies from the last three years, i.e., 2018, 2019, and 2020, are primarily based on data from China and the United States. Only a few studies are based on data from Europe.

2) Mode Effect and Data Bias: Existing app usage datasets vary greatly in terms of collection methods, scales, collected items, and duration. These differences could lead to biases in their studies and make research more difficult to replicate. Different collection methods will result in a mode effect, which will introduce biases into the collected data. App stores and monitoring apps, for example, can only collect data from their users. Furthermore, monitoring apps can impact users' normal app usage behavior due to privacy concerns, resulting in data collection biases. Approximately 1%

leads to data bias. It is hard to organize a smartphone app usage dataset that wholly and accurately reflects real-world user behaviors. Most existing studies are based on biased populations. The users involved in [34], [43], [69], [78], [169], [187] are all Chinese. The participants in [39], [101], [216] all live in the Lake Geneva region in Switzerland. Only university students are considered in [40], [217], [218]. To alleviate the biases, improving the diversity of participants and sampling technology are useful methods to increase the robustness of results.

The replicability of research will be hampered by data bias. Church et al. [2], for example, attempted to duplicate previous work. However, their findings differed significantly from previous research, revealing replicability issues and biases across datasets. To solve this problem, we need to put a lot more effort into constructing open datasets. Researchers can use unified public datasets to compare their new algorithms and discoveries to state-of-the-art ones, especially in method-oriented topics like app usage prediction, app recommendation, and predictive user profiling. The standard platform can help to reduce the number of unrepresentative repeated experiments and boost academic loyalty. Thankfully, we have taken the first step toward creating benchmark datasets. As we mentioned in Section II, some researchers have made their app usage datasets public. We believe their public datasets will be strong candidates for benchmark datasets in terms of data quality and impact.

B. Challenge in Methods

1) Heterogeneous Data Fusion: Other types of data, such as smartphone sensor data and app metadata, are also important and helpful for analyzing smartphone app usage. Although some approaches for fusing heterogeneous data in app usage analysis have been proposed, current methods can be improved by increasing their effectiveness and generalizability. For example, there are two widely used methods in existing studies for fusing context data into app usage records. The first one is to use the tensor structure by adding additional dimensions to represent context information. The second one is to take context features as conditions that are input in parallel into a probabilistic model or Bayesian network. However, when the scale of the context grows large, these two methods will suffer from the curse of dimensionality and high computational cost. Most existing data fusion models are task-specific, making it difficult to generalize results across different studies. App features, for example, are commonly used in the tasks of app categorization and app recommendations, which are determined based on app descriptions and app usage behaviors. However, the model is difficult to transfer directly due to different dataset settings [2]. The question of how to better fuse heterogeneous data is still an active challenge. Embedding technology could be a promising method. Embedding technology aims
of participants changed their normal usage behavior during to map high-dimensional data into low-dimensional vectors
data collection [215]. As a result, it is critical to address in latent embedding space while maintaining similarity across
participants’ privacy concerns and develop more unobtrusive data points. A few studies [167], [174] have tried to employ
monitoring apps. The population difference across studies also embedding to model app usage sequences. Heterogeneous data
fusion can also benefit from embedding technology. For example, to describe the relationship between different types of data, we can create a heterogeneous graph. We can learn a low-dimensional data representation that embeds node features and relations together using graph embedding technologies like Metapath2Vec [219] and HAN [220], which improves the effectiveness of data fusion. In this way, we can learn universal data representations, such as app embeddings that capture global patterns of app usage, context information, and app metadata. The learned embeddings can be reused in multiple models.

2) Privacy-Preserving Analysis: Privacy is always an important consideration when accessing, using, and sharing mobile app usage data. As we discussed in Section IV-A, users’ attributes can be inferred from their app usage data. Thus, privacy-preserving technologies must be used when dealing with such sensitive data. Anonymization is a common technique. Böhmer et al. [87], for example, used hash functions to replace all participants’ personal identifiers with unrecognizable symbols. This method can delink multiple sets of users, reducing the risk of individual identities being leaked along with other public auxiliary data. However, simply anonymizing user IDs is insufficient to ensure privacy. Previous studies have shown that most people can be re-identified and tracked from anonymized app usage records [187], [188]. How to design a robust anonymization mechanism for privacy-preserving analysis remains an open problem.

Most existing studies rely on centralized data processing, which is bound to raise privacy concerns. Collecting app usage data and processing it centrally has become more difficult as a result of recently enacted strict privacy protection regulations. The European GDPR, for example, requires data controllers to disclose any data collection and to declare the lawful basis and purpose for data processing. In addition, after iOS 9, the iOS system removed most data collection APIs to address user privacy concerns. As a result, converting to a decentralized analysis framework is critical to protect user privacy and facilitate the research community’s long-term development. Federated Learning could be a promising analysis framework [221]–[223]. Federated Learning allows mobile devices to learn a shared model collaboratively while keeping all data on the local device. Users are not required to upload their usage data to cloud servers, and their sensitive information will be well protected [224]. Researchers can also take proactive steps to build privacy-preserving mechanisms by making the analysis framework transparent and giving users complete control over their data. These measures will help to protect user privacy to some extent and alleviate privacy concerns.

C. Future Research

1) App Evolution: Globalization vs. Localization: Since the release of the first iPhone in 2007, the app ecosystem has evolved significantly over the past fifteen years. Starting with a few system-embedded apps, the app ecosystem has grown to include over 3 million apps that cater to a wide range of user needs. Many apps have appeared over this period, attracting millions of users before eventually disappearing. Exploring the app ecosystem’s evolutionary processes and extracting general rules and impact factors behind app usage can provide valuable guidance to all relevant stakeholders, including smartphone manufacturers, service providers, and app developers.

Globalization has a significant impact on the app ecosystem’s development. Globalization has accelerated interaction and integration among people from various countries and cultural backgrounds in recent decades. App usage also reflects such interaction and integration. Apps are easier to distribute around the globe and attract a large number of users. For instance, Pokémon Go, a popular smartphone game, was first released in the United States on July 6, 2016, and quickly spread around the world. The study of how popular apps spread is important for developing business strategies, predicting app popularity, and profiling the app ecosystem. Localization, on the other hand, is also a trend in the app ecosystem. Many apps provide access to local information and are useful to both tourists and residents. For example, residents of Shenzhen, one of China’s largest cities, can use a parking app called Yitingche. KYOTO Trip+⁴ is an official app that provides information to visitors and residents of Kyoto, a major city in Japan. According to our investigation, a few previous studies have worked on local apps. Ochiai et al. [60], for example, created a framework for identifying local apps in app stores and making app recommendations to residents. However, existing studies pay less attention to local apps than to global apps. There could be two reasons for this. First, local apps, tied to a specific location, usually have fewer users and less international impact than global apps. Second, conducting studies on local apps is more difficult due to a lack of data. Nonetheless, we would like to emphasize that, in contrast to global apps that support a broad range of interests, local apps focus on local life services and deserve more research attention.

⁴ https://play.google.com/store/apps/details?id=jp.kyoto.pref.visitkyoto&hl

2) Context-Aware App Usage Modeling: Context-aware app usage modeling is a critical but difficult problem to solve. Solving it would enable breakthroughs in mobile app usage pattern discovery, app usage prediction, and app recommendation. App usage modeling can also help with the creation of synthetic datasets for benchmarking systems. However, because app usage varies depending on the context, modeling such complex and dynamic behaviors is difficult. Thankfully, a few existing studies have taken the first steps. Yu et al. [131] depicted the co-occurrence of apps, locations, and time to model app usage behaviors. Zhao et al. [126] considered user characteristics. However, existing research ignores either the sequential features of app usage or the location context. Thus, modeling the sequential features of app usage in various contexts, such as location, time, and motion, is still a work in progress. The spatiotemporal graph neural network is one possible solution, in which we can use a graph structure to model the co-occurrence of app usage and context factors, and a recurrent neural network (RNN) model to capture sequential patterns.
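As a rough illustration of the two ingredients discussed for context-aware app usage modeling (a co-occurrence structure linking apps to context factors, and a model of sequential patterns), the following Python sketch builds both from a toy log. It is deliberately much simpler than the spatiotemporal graph neural network suggested in the text: a plain weighted edge table stands in for the graph, and first-order transition counts stand in for the RNN. The log records, the function names, and the additive scoring rule are all assumptions made for illustration, not a method from the surveyed studies.

```python
from collections import Counter, defaultdict

# Hypothetical usage log: (app, location, hour-of-day) in chronological order.
LOG = [
    ("maps", "street", 8),
    ("music", "street", 8),
    ("email", "office", 9),
    ("chat", "office", 9),
    ("email", "office", 10),
    ("chat", "office", 10),
    ("maps", "street", 18),
    ("music", "street", 18),
]


def build_cooccurrence(log):
    """Count weighted edges between context nodes (location, hour) and apps."""
    edges = Counter()
    for app, location, hour in log:
        edges[(("loc", location), app)] += 1
        edges[(("hour", hour), app)] += 1
    return edges


def build_transitions(log):
    """Count first-order transitions: previous app -> next app."""
    transitions = defaultdict(Counter)
    for (prev_app, _, _), (next_app, _, _) in zip(log, log[1:]):
        transitions[prev_app][next_app] += 1
    return transitions


def predict_next(prev_app, location, hour, edges, transitions):
    """Pick the app with the highest context-plus-sequence count."""
    apps = {app for (_, app) in edges}
    scores = {}
    for app in apps:
        context = edges[(("loc", location), app)] + edges[(("hour", hour), app)]
        sequence = transitions[prev_app][app]
        scores[app] = context + sequence  # assumed additive scoring rule
    # Sort by score (descending), then by name, so ties break deterministically.
    return sorted(scores, key=lambda a: (-scores[a], a))[0]


if __name__ == "__main__":
    edges = build_cooccurrence(LOG)
    transitions = build_transitions(LOG)
    print(predict_next("email", "office", 9, edges, transitions))
```

With this toy log, the call in the `__main__` block selects "chat": both "email" and "chat" co-occur with the office and the 9 o'clock hour, but only "chat" has been observed directly after "email". A learned model would replace both count tables with trained representations, which is where the graph and recurrent components named in the text would come in.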
3) Deep Reasoning App Usage Behaviors: Existing research is limited to correlation analysis. Even though correlation analysis can reveal co-occurrence, it lacks interpretability in some cases. When we find a user who uses travel apps frequently, for example, we do not know whether he/she is traveling or just planning to travel. Thus, it is difficult to provide direct guidance to service providers. Furthermore, correlation analysis may fall into Simpson’s Paradox, resulting in incorrect analysis results. Therefore, causal analysis is needed to reason more deeply about app usage behaviors. Fortunately, causal inference has progressed significantly in recent years. A number of promising methods, such as individual treatment effects estimation [225] and the shared causal model [226], have emerged. For reasoning analysis of app usage behaviors, these advanced methods provide robust technical and algorithmic support.

Sophisticated deep learning methods are now used to investigate app usage behaviors to improve system performance. These black-box models, however, cannot explain the discovered features or how they relate to the results obtained. Several deep learning explainers [227] have recently been developed in an attempt to open the black-box models. Incorporating these deep learning explainer models into app usage analysis will be a promising step toward improving the interpretability of profiling results and providing reliable guidance to relevant stakeholders.

4) Linking User Activities and App Usage: Smartphones, which are supported by a diverse set of mobile apps, allow people to do things like order food, shop, manage their finances, and socialize more conveniently. The resulting spatiotemporal app usage traces can be used to infer users’ physical world activities and can support user profiling and authentication. A strong link between physical activities and mobile app usage has been demonstrated in a few studies. Li et al. [104] used app usage data to identify seven user activities. However, they focused solely on app usage, ignoring the impact of time and location. Different physical world activities may be reflected by the same app usage behavior at different times and in different places. As a result, it is necessary to introduce spatial and temporal factors to better infer users’ physical activities. This is difficult due to the high complexity of spatiotemporal features. The probabilistic graphical model, which can scale to multiple factors and can apply different distribution functions to capture various activity patterns, is one possible solution to this problem.

5) Location-Based Service and Urban Computing: There is a strong link between app usage behaviors and locations, according to numerous existing studies. However, the majority of them were only interested in using locations as contextual features to improve app usage prediction and recommendations. By leveraging app-location relationships, app usage data can be used in location-based services, such as urban computing, in addition to app-oriented tasks. As we discussed in Section II, the app usage dataset gathered from network operators can cover the majority of mobile users in a given area, such as a city or state. The datasets usually include location data derived from associated base stations. Such high-coverage and fine-grained datasets provide rich user behavioral data, such as app usage and user mobility [228], for conducting urban computing studies. There have been a few studies that have focused on this promising area. Xia and Li [89], for example, used users’ online behavior, such as app usage, as well as offline behavior, such as human mobility, to uncover urban dynamics and identify urban zone functions.

VII. CONCLUSION

The research efforts on smartphone app usage analysis are surveyed and summarized in this paper. We first introduced and compared various data sources, such as surveys, monitoring apps, network operators, and app stores. For the research community, we presented a set of public datasets and discussed privacy and ethical issues. The related studies in the app, user, and smartphone domains were surveyed, respectively. We made a detailed taxonomy for each research domain based on the problem investigated, the characteristics of the datasets used, the methods used, and the key results obtained. Finally, we discussed two current research challenges and identified five future research directions for this active topic.

ACKNOWLEDGMENT

The authors would like to thank Prof. Yong Li for all of the support and valuable discussions.

REFERENCES

[1] N. Oliver et al., “Mobile phone data for informing public health actions across the COVID-19 pandemic life cycle,” Sci. Adv., vol. 6, no. 23, 2020, Art. no. eabc0764.
[2] K. Church, D. Ferreira, N. Banovic, and K. Lyons, “Understanding the challenges of mobile phone usage data,” in Proc. 17th Int. Conf. Human–Comput. Interact. Mobile Devices Services, 2015, pp. 504–514.
[3] T. Li, Y. Li, T. Xia, and P. Hui, “Finding spatiotemporal patterns of mobile application usage,” IEEE Trans. Netw. Sci. Eng., early access, Nov. 30, 2021, doi: 10.1109/TNSE.2021.3131194.
[4] Statista. “Number of available applications in the google play store from December 2009 to July 2021.” 2021. [Online]. Available: https://www.statista.com/statistics/266210/number-of-available-applications-in-the-google-play-store/
[5] Statista. “Worldwide mobile app revenues in 2014 to 2023.” 2021. [Online]. Available: https://www.statista.com/statistics/269025/worldwide-mobile-app-revenue-forecasts
[6] J. Liu, Y. Fu, J. Ming, Y. Ren, L. Sun, and H. Xiong, “Effective and real-time in-app activity analysis in encrypted Internet traffic streams,” in Proc. 23rd ACM SIGKDD Int. Conf. Knowl. Disc. Data Min., 2017, pp. 335–344.
[7] Y. Chen, X. Jin, J. Sun, R. Zhang, and Y. Zhang, “POWERFUL: Mobile app fingerprinting via power analysis,” in Proc. IEEE INFOCOM Conf. Comput. Commun., 2017, pp. 1–9.
[8] A. J. Oliner, A. P. Iyer, I. Stoica, E. Lagerspetz, and S. Tarkoma, “Carat: Collaborative energy diagnosis for mobile devices,” in Proc. 11th ACM Conf. Embedded Netw. Sensor Syst., 2013, p. 10.
[9] F. Xu, Y. Li, H. Wang, P. Zhang, and D. Jin, “Understanding mobile traffic patterns of large scale cellular towers in urban environment,” IEEE/ACM Trans. Netw., vol. 25, no. 2, pp. 1147–1161, Apr. 2017.
[10] M. Zeng et al., “Temporal–spatial mobile application usage understanding and popularity prediction for edge caching,” IEEE Wireless Commun., vol. 25, no. 3, pp. 36–42, Jun. 2018.
[11] S. Zhao et al., “User profiling from their use of smartphone applications: A survey,” Pervasive Mobile Comput., vol. 59, Oct. 2019, Art. no. 101052.
[12] A. De Masi and K. Wac, “You’re using this app for what? A MQOL living lab study,” in Proc. ACM Int. Joint Conf. Int. Symp. Pervasive Ubiquitous Comput. Wearable Comput., 2018, pp. 612–617.
[13] B. Guo, Y. Ouyang, T. Guo, L. Cao, and Z. Yu, “Enhancing mobile app user understanding and marketing with heterogeneous crowdsourced data: A review,” IEEE Access, vol. 7, pp. 68557–68571, 2019.
[14] W. Pan, N. Aharony, and A. Pentland, “Composite social network for predicting mobile apps installation,” in Proc. 25th AAAI Conf. Artif. Intell., 2011, pp. 821–827.
[15] D. Li, W. Li, X. Wang, C.-T. Nguyen, and S. Lu, “App trajectory recognition over encrypted Internet traffic based on deep neural network,” Comput. Netw., vol. 179, Oct. 2020, Art. no. 107372.
[16] X. Lu et al., “Mining usage data from large-scale Android users: Challenges and opportunities,” in Proc. IEEE/ACM Int. Conf. Mobile Softw. Eng. Syst. (MOBILESoft), 2016, pp. 301–302.
[17] H. Cao and M. Lin, “Mining smartphone data for app usage prediction and recommendations: A survey,” Pervasive Mobile Comput., vol. 37, pp. 1–22, Jun. 2017.
[18] F. Kreuter, G.-C. Haas, F. Keusch, S. Bähr, and M. Trappmann, “Collecting survey and smartphone sensor data with an app: Opportunities and challenges around privacy and informed consent,” Soc. Sci. Comput. Rev., vol. 38, no. 5, pp. 533–549, 2020.
[19] S. L. Lim, P. J. Bentley, N. Kanakam, F. Ishikawa, and S. Honiden, “Investigating country differences in mobile app user behavior and challenges for software engineering,” IEEE Trans. Softw. Eng., vol. 41, no. 1, pp. 40–64, Jan. 2015.
[20] G. R. Jesse, “Smartphone and app usage among college students: Using smartphones effectively for social and educational needs,” in Proc. EDSIG Conf., 2015, p. 3424.
[21] A. Hiniker, S. N. Patel, T. Kohno, and J. A. Kientz, “Why would you do that? Predicting the uses and gratifications behind smartphone-usage behaviors,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput., 2016, pp. 634–645.
[22] R. Wang et al., “StudentLife: Assessing mental health, academic performance and behavioral trends of college students using smartphones,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput., 2014, pp. 3–14.
[23] M. L. Gordon, L. Gatys, C. Guestrin, J. P. Bigham, A. Trister, and K. Patel, “App usage predicts cognitive ability in older adults,” in Proc. CHI Conf. Human Factors Comput. Syst., 2019, pp. 1–12.
[24] M. Chen, S. Mao, and Y. Liu, “Big data: A survey,” Mobile Netw. Appl., vol. 19, no. 2, pp. 171–209, 2014.
[25] I. Andone, K. Błaszkiewicz, M. Eibes, B. Trendafilov, C. Montag, and A. Markowetz, “Menthal: A framework for mobile data collection and analysis,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput. Adjunct, 2016, pp. 624–629.
[26] B. Finley and T. Soikkeli, “Mobile device type substitution,” Proc. ACM Interact. Mobile Wearable Ubiquitous Technol., vol. 2, no. 1, pp. 1–20, 2018.
[27] X. Su, D. Zhang, W. Li, and X. Wang, “AndroGenerator: An automated and configurable Android app network traffic generation system,” Security Commun. Netw., vol. 8, no. 18, pp. 4273–4288, 2015.
[28] A. Zuniga et al., “Tortoise or hare? Quantifying the effects of performance on mobile app retention,” in Proc. World Wide Web Conf., 2019, pp. 2517–2528.
[29] M. Krichen, “Anomalies detection through smartphone sensors: A review,” IEEE Sensors J., vol. 21, no. 6, pp. 7207–7217, Mar. 2021.
[30] Q. Xu et al., “Automatic generation of mobile app signatures from traffic observations,” in Proc. IEEE Conf. Comput. Commun. (INFOCOM), 2015, pp. 1481–1489.
[31] J. Yang, Y. Qiao, X. Zhang, H. He, F. Liu, and G. Cheng, “Characterizing user behavior in mobile Internet,” IEEE Trans. Emerg. Topics Comput., vol. 3, no. 1, pp. 95–106, Mar. 2015.
[32] Q. Xu, J. Erman, A. Gerber, Z. M. Mao, J. Pang, and S. Venkataraman, “Identifying diverse usage behaviors of smartphone apps,” in Proc. ACM SIGCOMM Conf. Internet Meas. Conf., 2011, pp. 329–344.
[33] S. Zhao et al., “Discovering different kinds of smartphone users through their application usage behaviors,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput., 2016, pp. 498–509.
[34] J. Huang, F. Xu, Y. Lin, and Y. Li, “On the understanding of interdependency of mobile app usage,” in Proc. IEEE 14th Int. Conf. Mobile Ad Hoc Sensor Syst. (MASS), 2017, pp. 471–475.
[35] H. Yao, G. Ranjan, A. Tongaonkar, Y. Liao, and Z. M. Mao, “SAMPLES: Self adaptive mining of persistent lexical snippets for classifying mobile application traffic,” in Proc. 21st Annu. Int. Conf. Mobile Comput. Netw., 2015, pp. 439–451.
[36] X. Wang, S. Chen, and J. Su, “Real network traffic collection and deep learning for mobile app identification,” Wireless Commun. Mobile Comput., vol. 2020, pp. 1–14, Feb. 2020.
[37] V. F. Taylor, R. Spolaor, M. Conti, and I. Martinovic, “Robust smartphone app identification via encrypted network traffic analysis,” IEEE Trans. Inf. Forensics Security, vol. 13, no. 1, pp. 63–78, Jan. 2018.
[38] C. Xiang, Q. Chen, M. Xue, and H. Zhu, “APPCLASSIFIER: Automated app inference on encrypted traffic via meta data analysis,” in Proc. IEEE Global Commun. Conf. (GLOBECOM), 2018, pp. 1–7.
[39] J. K. Laurila et al., “The mobile data challenge: Big data for mobile computing research,” in Proc. Pervasive Computing, 2012, pp. 1–8.
[40] C. Shepard, A. Rahmati, C. Tossell, L. Zhong, and P. Kortum, “LiveLab: Measuring wireless networks and smartphone users in the field,” ACM SIGMETRICS Perform. Eval. Rev., vol. 38, no. 3, pp. 15–20, 2011.
[41] C. Stachl et al., “Predicting personality from patterns of behavior collected with smartphones,” Proc. Nat. Acad. Sci. USA, vol. 117, no. 30, pp. 17680–17687, 2020.
[42] H. Wang, F. Xu, Y. Li, P. Zhang, and D. Jin, “Understanding mobile traffic patterns of large scale cellular towers in urban environment,” in Proc. Internet Meas. Conf., 2015, pp. 225–238.
[43] X. Liu et al., “Understanding diverse usage patterns from large-scale appstore-service profiles,” IEEE Trans. Softw. Eng., vol. 44, no. 4, pp. 384–411, Apr. 2018.
[44] F. Lin, H. Wang, L. Wang, and X. Liu, “A longitudinal study of removed apps in iOS app store,” in Proc. Web Conf., 2021, pp. 1435–1446.
[45] S. Greengard, “Weighing the impact of GDPR,” Commun. ACM, vol. 61, no. 11, pp. 16–18, Oct. 2018.
[46] N. Momen, M. Hatamian, and L. Fritsch, “Did app privacy improve after the GDPR?” IEEE Security Privacy, vol. 17, no. 6, pp. 10–20, Nov./Dec. 2019.
[47] J. L. Kröger, J. Lindemann, and D. Herrmann, “How do app vendors respond to subject access requests? A longitudinal privacy study on iOS and Android apps,” in Proc. 15th Int. Conf. Availability Rel. Security, 2020, pp. 1–10.
[48] J. M. Wing, “Data for good,” in Proc. 24th ACM SIGKDD Int. Conf. Knowl. Disc. Data Min., 2018, p. 4.
[49] J. Isaak and M. J. Hanna, “User data privacy: Facebook, cambridge analytica, and privacy protection,” Computer, vol. 51, no. 8, pp. 56–59, 2018.
[50] T. Petsas, A. Papadogiannakis, M. Polychronakis, E. P. Markatos, and T. Karagiannis, “Rise of the planet of the apps: A systematic study of the mobile app ecosystem,” in Proc. Conf. Internet Meas. Conf., 2013, pp. 277–290.
[51] H. Mendelson and K. Moon, “Modeling success and engagement for the app economy,” in Proc. World Wide Web Conf., 2018, pp. 569–578.
[52] H. Wang, H. Li, and Y. Guo, “Understanding the evolution of mobile app ecosystems: A longitudinal measurement study of google play,” in Proc. World Wide Web Conf., 2019, pp. 1988–1999.
[53] T. Petsas, A. Papadogiannakis, M. Polychronakis, E. P. Markatos, and T. Karagiannis, “Measurement, modeling, and analysis of the mobile app ecosystem,” ACM Trans. Model. Perform. Eval. Comput. Syst., vol. 2, no. 2, pp. 1–33, 2017.
[54] C. Jesdabodi and W. Maalej, “Understanding usage states on mobile devices,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput., 2015, pp. 1221–1225.
[55] W. Sun, S. Wu, and Z. Xue, “Clustering mobile apps based on design and manufacturing genre,” in Proc. IEEE 6th Int. Conf. Comput. Commun. (ICCC), 2020, pp. 1956–1960.
[56] H. Zhu, H. Cao, E. Chen, H. Xiong, and J. Tian, “Exploiting enriched contextual information for mobile app classification,” in Proc. 21st ACM Int. Conf. Inf. Knowl. Manag., 2012, pp. 1617–1621.
[57] S. Xue, L. Zhang, A. Li, X.-Y. Li, C. Ruan, and W. Huang, “AppDNA: App behavior profiling via graph-based deep learning,” in Proc. IEEE INFOCOM Conf. Comput. Commun., 2018, pp. 1475–1483.
[58] M. Shamsujjoha, J. Grundy, L. Li, H. Khalajzadeh, and Q. Lu, “Checking app behavior against app descriptions: What if there are no app descriptions?” in Proc. IEEE/ACM 29th Int. Conf. Program Comprehension (ICPC), May 2021, pp. 422–432.
[59] A. A. Al-Subaihin et al., “Clustering mobile apps based on mined textual features,” in Proc. 10th ACM/IEEE Int. Symp. Empirical Softw. Eng. Meas., 2016, pp. 1–10.
[60] K. Ochiai, F. Putri, and Y. Fukazawa, “Local app classification using deep neural network based on mobile app market data,” in Proc. IEEE Int. Conf. Pervasive Comput. Commun. PerCom, 2019, pp. 186–191.
[61] A. Fuad and M. Al-Yahya, “Analysis and classification of mobile apps using topic modeling: A case study on google play arabic apps,” Complexity, vol. 2021, Feb. 2021, Art. no. 6677413.
[62] D. Surian, S. Seneviratne, A. Seneviratne, and S. Chawla, “App miscategorization detection: A case study on google play,” IEEE Trans. Knowl. Data Eng., vol. 29, no. 8, pp. 1591–1604, Aug. 2017.
[63] H. Zhu, E. Chen, H. Xiong, H. Cao, and J. Tian, “Mobile app classification with enriched contextual information,” IEEE Trans. Mobile Comput., vol. 13, no. 7, pp. 1550–1563, Jul. 2014.
[64] A. Berger, S. A. D. Pietra, and V. J. D. Pietra, “A maximum entropy approach to natural language processing,” Comput. Linguist., vol. 22, no. 1, pp. 39–71, 1996.
[65] V. Radosavljevic et al., “Smartphone app categorization for interest targeting in advertising marketplace,” in Proc. 25th Int. Conf. Companion World Wide Web, 2016, pp. 93–94.
[66] Y. He, C. Wang, G. Xu, W. Lian, H. Xian, and W. Wang, “Privacy-preserving categorization of mobile applications based on large-scale usage data,” Inf. Sci., vol. 514, pp. 557–570, Apr. 2020.
[67] E. J. Keogh and C. A. Ratanamahatana, “Exact indexing of dynamic time warping,” Knowl. Inf. Syst., vol. 7, no. 3, pp. 358–386, 2005.
[68] A. Morrison, X. Xiong, M. Higgs, M. Bell, and M. Chalmers, “A large-scale study of iphone app launch behaviour,” in Proc. CHI Conf. Human Factors Comput. Syst., 2018, pp. 1–13.
[69] H. Li et al., “Characterizing smartphone usage patterns from millions of Android users,” in Proc. Internet Meas. Conf., 2015, pp. 459–472.
[70] H. Wang et al., “Beyond google play: A large-scale comparative study of Chinese Android app markets,” in Proc. Internet Meas. Conf., 2018, pp. 293–307.
[71] S. Sigg, E. Lagerspetz, E. Peltonen, P. Nurmi, and S. Tarkoma, “Exploiting usage to predict instantaneous app popularity: Trend filters and retention rates,” ACM Trans. Web, vol. 13, no. 2, pp. 1–25, 2019.
[72] Y. Tian, K. Zhou, and D. Pelleg, “What and how long: Prediction of mobile app engagement,” ACM Trans. Inf. Syst., vol. 40, no. 1, pp. 1–38, 2022.
[73] T. Li, M. Zhang, H. Cao, Y. Li, S. Tarkoma, and P. Hui, “What apps did you use? Understanding the long-term evolution of mobile app usage,” in Proc. Web Conf., 2020, pp. 66–76.
[74] T. Li, Y. Fan, Y. Li, S. Tarkoma, and P. Hui, “Understanding the long-term evolution of mobile app usage,” IEEE Trans. Mobile Comput., early access, Jul. 21, 2021, doi: 10.1109/TMC.2021.3098664.
[75] S. Shen, X. Lu, Z. Hu, and X. Liu, “Towards release strategy optimization for apps in google play,” in Proc. 9th Asia–Pac. Symp. Internetware, 2017, pp. 1–10.
[76] B. Fu, J. Lin, L. Li, C. Faloutsos, J. Hong, and N. Sadeh, “Why people hate your app: Making sense of user feedback in a mobile app store,” in Proc. 19th ACM SIGKDD Int. Conf. Knowl. Disc. Data Min., 2013, pp. 1276–1284.
[77] R. Potharaju, M. Rahman, and B. Carbunar, “A longitudinal study of google play,” IEEE Trans. Comput. Soc. Syst., vol. 4, no. 3, pp. 135–149, Sep. 2017.
[78] X. Lu et al., “PRADA: Prioritizing Android devices for apps by mining large-scale usage data,” in Proc. IEEE/ACM 38th Int. Conf. Softw. Eng. (ICSE), 2016, pp. 3–13.
[79] Y. Wang, N. J. Yuan, Y. Sun, C. Qin, and X. Xie, “App download forecasting: An evolutionary hierarchical competition approach,” in Proc. IJCAI, 2017, pp. 2978–2984.
[80] H. Zhu, C. Liu, Y. Ge, H. Xiong, and E. Chen, “Popularity modeling for mobile apps: A sequential approach,” IEEE Trans. Cybern., vol. 45, no. 7, pp. 1303–1314, Jul. 2015.
[81] Y. Ouyang, B. Guo, T. Guo, L. Cao, and Z. Yu, “Modeling and forecasting the popularity evolution of mobile apps: A multivariate hawkes process approach,” Proc. ACM Interact. Mobile Wearable Ubiquitous Technol., vol. 2, no. 4, pp. 1–23, 2018.
[82] Y. Zhang, J. Liu, B. Guo, Z. Wang, Y. Liang, and Z. Yu, “App popularity prediction by incorporating time-varying hierarchical interactions,” IEEE Trans. Mobile Comput., vol. 21, no. 5, pp. 1566–1579, May 2022.
[83] H. Li et al., “Voting with their feet: Inferring user preferences from app management activities,” in Proc. 25th Int. Conf. World Wide Web, 2016, pp. 1351–1362.
[84] P. Papapetrou and G. Roussos, “Social context discovery from temporal app use patterns,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput. Adjunct Publication, 2014, pp. 397–402.
[87] M. Böhmer, B. Hecht, J. Schöning, A. Krüger, and G. Bauer, “Falling asleep with angry birds, facebook and kindle: A large scale study on mobile application usage,” in Proc. 13th Int. Conf. Human Comput. Interact. Mobile Devices Services, 2011, pp. 47–56.
[88] E. Graells-Garrido, D. Caro, O. Miranda, R. Schifanella, and O. F. Peredo, “The WWW (and an H) of mobile application usage in the city: The what, where, when, and how,” in Proc. Web Conf., 2018, pp. 1221–1229.
[89] T. Xia and Y. Li, “Revealing urban dynamics by learning online and offline behaviours together,” Proc. ACM Interact. Mobile Wearable Ubiquitous Technol., vol. 3, no. 1, pp. 1–25, 2019.
[90] Y. Ren, T. Xia, Y. Li, and X. Chen, “Predicting socio-economic levels of urban regions via offline and online indicators,” PLoS ONE, vol. 14, no. 7, 2019, Art. no. e0219058.
[91] S. Van Canneyt, M. Bron, A. Haines, and M. Lalmas, “Describing patterns and disruptions in large scale mobile app usage data,” in Proc. 26th Int. Conf. World Wide Web Companion, 2017, pp. 1579–1584.
[92] T. Li, M. Zhang, Y. Li, E. Lagerspetz, S. Tarkoma, and P. Hui, “The impact of COVID-19 on smartphone usage,” IEEE Internet Things J., vol. 8, no. 23, pp. 16723–16733, Dec. 2021.
[93] V. Srinivasan, S. Moghaddam, A. Mukherji, K. K. Rachuri, C. Xu, and E. M. Tapia, “MobileMiner: Mining your frequent patterns on your phone,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput., 2014, pp. 389–400.
[94] Y. Tian, K. Zhou, M. Lalmas, and D. Pelleg, “Identifying tasks from mobile app usage patterns,” in Proc. 43rd Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2020, pp. 2357–2366.
[95] B. Friedrichs, L. D. Turner, and S. M. Allen, “Discovering types of smartphone usage sessions from user-app interactions,” in Proc. IEEE Int. Conf. Pervasive Comput. Commun. Workshops (PerCom Workshops), 2021, pp. 459–464.
[96] A. Rahmati, C. Shepard, C. Tossell, L. Zhong, and P. Kortum, “Practical context awareness: Measuring and utilizing the context dependency of mobile usage,” IEEE Trans. Mobile Comput., vol. 14, no. 9, pp. 1932–1946, Sep. 2015.
[97] Y. Fan et al., “Understanding the long-term dynamics of mobile app usage context via graph embedding,” IEEE Trans. Knowl. Data Eng., early access, Sep. 3, 2021, doi: 10.1109/TKDE.2021.3110141.
[98] Y. Liu, K. Yu, X. Wu, Y. Shi, and Y. Tan, “Association rules mining analysis of app usage based on mobile traffic flow data,” in Proc. IEEE 3rd Int. Conf. Big Data Anal. (ICBDA), 2018, pp. 55–60.
[99] W.-R. Tseng and K.-W. Hsu, “Smartphone app usage log mining,” Int. J. Comput. Elect. Eng., vol. 6, no. 2, pp. 151–156, 2014.
[100] D. Ferreira, J. Goncalves, V. Kostakos, L. Barkhuus, and A. K. Dey, “Contextual experience sampling of mobile application micro-usage,” in Proc. 16th Int. Conf. Human–Comput. Interact. Mobile Devices Services, 2014, pp. 91–100.
[101] A. Shema and D. E. Acuna, “Show me your app usage and I will tell who your close friends are: Predicting user’s context from simple cellphone activity,” in Proc. CHI Conf. Extended Abstracts Human Factors Comput. Syst., 2017, pp. 2929–2935.
[102] I. M. Kloumann, L. A. Adamic, J. M. Kleinberg, and S. Wu, “The lifecycles of apps in a social ecosystem,” in Proc. 24th Int. Conf. World Wide Web, 2015, pp. 581–591.
[103] D. G. Taylor, T. A. Voelker, and I. Pentina, Mobile Application Adoption by Young Adults: A Social Network Perspective, WCOB Faculty, Jack Welch College Bus., Fairfield CT, USA, 2011.
[104] T. Li, Y. Li, M. A. Hoque, T. Xia, S. Tarkoma, and P. Hui, “To what extent we repeat ourselves? Discovering daily activity patterns across mobile app usage,” IEEE Trans. Mobile Comput., vol. 21, no. 4, pp. 1492–1507, Apr. 2022.
[105] M. Z. Shafiq, L. Ji, A. X. Liu, J. Pang, and J. Wang, “Characterizing geospatial dynamics of application usage in a 3G cellular data network,” in Proc. IEEE INFOCOM, 2012, pp. 1341–1349.
[106] N. Ghahramani and C. Brakewood, “Trends in mobile transit information utilization: An exploratory analysis of transit app in New York City,” J. Public Transp., vol. 19, no. 3, p. 9, 2016.
[107] V. Kostakos, D. Ferreira, J. Goncalves, and S. Hosio, “Modelling smartphone usage: A Markov state transition model,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput., 2016, pp. 486–497.
[85] A. Mehrotra et al., “Understanding the role of places and activities [108] T.-M.-T. Do and D. Gatica-Perez, “By their apps you shall understand
on mobile phone interaction and usage patterns,” Proc. ACM Interact. them: Mining large-scale patterns of mobile phone usage,” in Proc. 9th
Mobile Wearable Ubiquitous Technol., vol. 1, no. 3, pp. 1–22, 2017. Int. Conf. Mobile Ubiquitous Multimedia, 2010, pp. 1–10.
[86] T. M. T. Do, J. Blom, and D. Gatica-Perez, “Smartphone usage in the [109] F. A. Silva, A. C. S. A. Domingues, and T. R. B. Silva, “Discovering
wild: A large-scale analysis of applications and context,” in Proc. 13th mobile application usage patterns from a large-scale dataset,” ACM
Int. Conf. Multimodal Interfaces, 2011, pp. 353–360. Trans. Knowl. Disc. Data, vol. 12, no. 5, pp. 1–36, 2018.
[110] J. Lee and D.-H. Shin, “Targeting potential active users for mobile app install advertising: An exploratory study,” Int. J. Human–Comput. Interact., vol. 32, no. 11, pp. 827–834, 2016.
[111] S. L. Jones, D. Ferreira, S. Hosio, J. Goncalves, and V. Kostakos, “Revisitation analysis of smartphone app use,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput., 2015, pp. 1197–1208.
[112] L. A. Leiva, M. Böhmer, S. Gehring, and A. Krüger, “Back to the app: The costs of mobile application interruptions,” in Proc. 14th Int. Conf. Human–Comput. Interact. Mobile Devices Services, 2012, pp. 291–294.
[113] J. P. Rula, B. Jun, and F. E. Bustamante, “Mobile AD (D) estimating mobile app session times for better ads,” in Proc. 16th Int. Workshop Mobile Comput. Syst. Appl., 2015, pp. 123–128.
[114] H. Cao, Z. Chen, F. Xu, Y. Li, and V. Kostakos, “Revisitation in urban space vs. online: A comparison across pois, websites, and smartphone apps,” Proc. ACM Interact. Mobile Wearable Ubiquitous Technol., vol. 2, no. 4, pp. 1–24, 2018.
[115] Y. Tian, K. Zhou, M. Lalmas, Y. Liu, and D. Pelleg, “Cohort modeling based app category usage prediction,” in Proc. 28th ACM Conf. User Model. Adapt. Pers., 2020, pp. 248–256.
[116] Z. Tu, Y. Li, P. Hui, L. Su, and D. Jin, “Personalized mobile app prediction by learning user’s interest from social media,” IEEE Trans. Mobile Comput., vol. 19, no. 11, pp. 2670–2683, Nov. 2020.
[117] D. Yu, Y. Li, F. Xu, P. Zhang, and V. Kostakos, “Smartphone app usage prediction using points of interest,” Proc. ACM Interact. Mobile Wearable Ubiquitous Technol., vol. 1, no. 4, pp. 1–21, 2018.
[118] C. Shin, J.-H. Hong, and A. K. Dey, “Understanding and prediction of mobile application usage for smart phones,” in Proc. ACM Conf. Ubiquitous Comput., 2012, pp. 173–182.
[119] N. Natarajan, D. Shin, and I. S. Dhillon, “Which app will you use next? Collaborative filtering with interactional context,” in Proc. 7th ACM Conf. Recommender Syst., 2013, pp. 201–208.
[120] X. Zou, W. Zhang, S. Li, and G. Pan, “Prophet: What app you wish to use next,” in Proc. ACM Conf. Pervasive Ubiquitous Comput. Adjunct Publication, 2013, pp. 167–170.
[121] R. Baeza-Yates, D. Jiang, F. Silvestri, and B. Harrison, “Predicting the next app that you are going to use,” in Proc. 8th ACM Int. Conf. Web Search Data Min., 2015, pp. 285–294.
[122] S. Xu, W. Li, X. Zhang, S. Gao, T. Zhan, and S. Lu, “Predicting and recommending the next smartphone apps based on recurrent neural network,” CCF Trans. Pervasive Comput. Interact., vol. 2, no. 4, pp. 314–328, 2020.
[123] Y. Lee, S. Cho, and J. Choi, “App usage prediction for dual display device via two-phase sequence modeling,” Pervasive Mobile Comput., vol. 58, Aug. 2019, Art. no. 101025.
[124] Y. Jiang, X. Du, and T. Jin, “Using combined network information to predict mobile application usage,” Physica A Stat. Mech. Appl., vol. 515, pp. 430–439, Feb. 2019.
[125] H. Wang, Y. Li, M. Du, Z. Li, and D. Jin, “App2Vec: Context-aware application usage prediction,” ACM Trans. Knowl. Disc. Data, vol. 15, no. 6, pp. 1–21, 2021.
[126] S. Zhao et al., “AppUsage2Vec: Modeling smartphone app usage for prediction,” in Proc. IEEE 35th Int. Conf. Data Eng. (ICDE), 2019, pp. 1322–1333.
[127] A. Parate, M. Böhmer, D. Chu, D. Ganesan, and B. M. Marlin, “Practical prediction and prefetch for faster access to applications on mobile phones,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput., 2013, pp. 275–284.
[128] H. Wang et al., “Modeling spatio-temporal app usage for a large user population,” Proc. ACM Interact. Mobile Wearable Ubiquitous Technol., vol. 3, no. 1, pp. 1–23, 2019.
[129] T. Xia et al., “DeepApp: Predicting personalized smartphone app usage via context-aware multi-task learning,” ACM Trans. Intell. Syst. Technol., vol. 11, no. 6, pp. 1–12, 2020.
[130] X. Chen, Y. Wang, J. He, S. Pan, Y. Li, and P. Zhang, “CAP: Context-aware app usage prediction with heterogeneous graph embedding,” Proc. ACM Interact. Mobile Wearable Ubiquitous Technol., vol. 3, no. 1, pp. 1–25, 2019.
[131] Y. Yu, T. Xia, H. Wang, J. Feng, and Y. Li, “Semantic-aware spatio-temporal app usage representation via graph convolutional network,” Proc. ACM Interact. Mobile Wearable Ubiquitous Technol., vol. 4, no. 3, pp. 1–24, 2020.
[132] Y. Zhou, S. Li, and Y. Liu, “Graph-based method for app usage prediction with attributed heterogeneous network embedding,” Future Internet, vol. 12, no. 3, p. 58, 2020.
[133] X.-X. Zhao, Y. Qiao, Z. Si, J. Yang, and A. Lindgren, “Prediction of user app usage behavior from geo-spatial data,” in Proc. 3rd Int. ACM SIGMOD Workshop Manag. Min. Enriched Geo-Spatial Data, 2016, pp. 1–6.
[134] M. De Nadai, A. Cardoso, A. Lima, B. Lepri, and N. Oliver, “Strategies and limitations in app usage and human mobility,” Sci. Rep., vol. 9, no. 1, pp. 1–9, 2019.
[135] T. M. T. Do and D. Gatica-Perez, “Where and what: Using smartphones to predict next locations and applications in daily life,” Pervasive Mobile Comput., vol. 12, pp. 79–91, Jun. 2014.
[136] Y. Xu et al., “Preference, context and communities: A multi-faceted approach to predicting smartphone app usage patterns,” in Proc. Int. Symp. Wearable Comput., 2013, pp. 69–76.
[137] D. Han, J. Li, L. Yang, and Z. Zeng, “A recommender system to address the cold start problem for app usage prediction,” Int. J. Mach. Learn. Cybern., vol. 10, no. 9, pp. 2257–2268, 2019.
[138] R. Jisha, J. Amrita, A. R. Vijay, and G. Indhu, “Mobile app recommendation system using machine learning classification,” in Proc. IEEE 4th Int. Conf. Comput. Methodol. Commun. (ICCMC), 2020, pp. 940–943.
[139] B. Yan and G. Chen, “Appjoy: Personalized mobile application discovery,” in Proc. 9th Int. Conf. Mobile Syst. Appl. Services, 2011, pp. 113–126.
[140] K. Shi and K. Ali, “Getjar mobile application recommendations with very sparse datasets,” in Proc. 18th ACM SIGKDD Int. Conf. Knowl. Disc. Data Min., 2012, pp. 204–212.
[141] T. Liang et al., “Mobile app recommendation via heterogeneous graph neural network in edge computing,” Appl. Soft Comput., vol. 103, May 2021, Art. no. 107162.
[142] Y. Ouyang, B. Guo, X. Tang, X. He, J. Xiong, and Z. Yu, “Mobile app cross-domain recommendation with multi-graph neural network,” ACM Trans. Knowl. Disc. Data, vol. 15, no. 4, pp. 1–21, 2021.
[143] Y. Xu, Y. Zhu, Y. Shen, and J. Yu, “Leveraging app usage contexts for app recommendation: A neural approach,” World Wide Web, vol. 22, no. 6, pp. 2721–2745, 2019.
[144] M. Böhmer, L. Ganev, and A. Krüger, “AppFunnel: A framework for usage-centric evaluation of recommender systems that suggest mobile applications,” in Proc. Int. Conf. Intell. User Interfaces, 2013, pp. 267–276.
[145] Z. Tu, B. Duan, Z. Wang, and X. Xu, “Bidirectional sensing of user preferences and application changes for dynamic mobile app recommendations,” Neural Comput. Appl., vol. 33, no. 16, pp. 9791–9803, 2021.
[146] C. Liu, J. Cao, and S. Feng, “Leveraging kernel-incorporated matrix factorization for app recommendation,” ACM Trans. Knowl. Disc. Data, vol. 13, no. 3, pp. 1–27, 2019.
[147] T. Liang et al., “A broad learning approach for context-aware mobile application recommendation,” in Proc. IEEE Int. Conf. Data Min. (ICDM), 2017, pp. 955–960.
[148] J. Lin, K. Sugiyama, M.-Y. Kan, and T.-S. Chua, “New and improved: Modeling versions to improve app recommendation,” in Proc. 37th Int. ACM SIGIR Conf. Res Develop. Inf. Retrieval, 2014, pp. 647–656.
[149] D. Cao et al., “Version-sensitive mobile app recommendation,” Inf. Sci., vol. 381, pp. 161–175, Mar. 2017.
[150] S. Gao, L. Liu, Y. Liu, H. Liu, Y. Wang, and P. Liu, “App recommendation based on both quality and security,” J. Softw. Evol. Process, vol. 33, no. 3, 2021, Art. no. e2325.
[151] T. Rocha, E. Souto, and K. El-Khatib, “Functionality-based mobile application recommendation system with security and privacy awareness,” Comput. Security, vol. 97, Oct. 2020, Art. no. 101972.
[152] H. Wang, Z. Tu, Y. Fu, Z. Wang, and X. Xu, “Time-aware user profiling from personal service ecosystem,” Neural Comput. Appl., vol. 33, no. 8, pp. 3597–3619, 2021.
[153] H. Kwon, S. Lee, and D. Jeong, “User profiling via application usage pattern on digital devices for digital forensics,” Exp. Syst. Appl., vol. 168, Apr. 2021, Art. no. 114488.
[154] Y. Liang, Z. Cai, J. Yu, Q. Han, and Y. Li, “Deep learning based inference of private information using embedded sensors in smart devices,” IEEE Netw., vol. 32, no. 4, pp. 8–14, Jul./Aug. 2018.
[155] R. Razavi, “Personality segmentation of users through mining their mobile usage patterns,” Int. J. Human–Comput. Stud., vol. 143, Nov. 2020, Art. no. 102470.
[156] I. Andone, K. Błaszkiewicz, M. Eibes, B. Trendafilov, C. Montag, and A. Markowetz, “How age and gender affect smartphone usage,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput. Adjunct, 2016, pp. 9–12.
[157] S. Zhao et al., “Investigating smartphone user differences in their application usage behaviors: An empirical study,” CCF Trans. Pervasive Comput. Interact., vol. 1, no. 2, pp. 140–161, 2019.
[158] Z. Tu et al., “Demographics of mobile app usage: Long-term analysis of mobile app usage,” CCF Trans. Pervasive Comput. Interact., vol. 3, pp. 235–252, Apr. 2021.
[159] E. Peltonen et al., “The hidden image of mobile apps: Geographic, demographic, and cultural factors in mobile usage,” in Proc. 20th Int. Conf. Human–Comput. Interact. Mobile Devices Services, 2018, pp. 1–12.
[160] E. Guzman, L. Oliveira, Y. Steiner, L. C. Wagner, and M. Glinz, “User feedback in the app store: A cross-cultural study,” in Proc. IEEE/ACM 40th Int. Conf. Softw. Eng. Softw. Eng. Soc. (ICSE-SEIS), 2018, pp. 13–22.
[161] F. Huseynov, “Understanding usage behavior of different mobile application categories based on personality traits,” Interact. Comput., vol. 32, no. 1, pp. 66–80, 2020.
[162] F. Beierle et al., “Frequency and duration of daily smartphone usage in relation to personality traits,” Digit. Psychol., vol. 1, no. 1, pp. 20–28, 2020.
[163] Y. Gao, A. Li, T. Zhu, X. Liu, and X. Liu, “How smartphone usage correlates with social anxiety and loneliness,” PeerJ, vol. 4, Jul. 2016, Art. no. e2197.
[164] K. Katevas, I. Arapakis, and M. Pielot, “Typical phone use habits: Intense use does not predict negative well-being,” in Proc. 20th Int. Conf. Human–Comput. Interact. Mobile Devices Services, 2018, pp. 1–13.
[165] Z. Sarsenbayeva et al., “Does smartphone use drive our emotions or vice versa? A causal analysis,” in Proc. CHI Conf. Human Factors Comput. Syst., 2020, pp. 1–15.
[166] S. Zhao et al., “Who are the smartphone users? Identifying user groups with apps usage behaviors,” Mobile Comput. Commun., vol. 21, no. 2, pp. 31–34, 2017.
[167] Y. Lee, I. Park, S. Cho, and J. Choi, “Smartphone user segmentation based on app usage sequence with neural networks,” Telemat. Informat., vol. 35, no. 2, pp. 329–339, 2018.
[168] R. M. Frey, R. Xu, and A. Ilic, “Mobile app adoption in different life stages: An empirical analysis,” Pervasive Mobile Comput., vol. 40, pp. 512–527, Sep. 2017.
[169] X. Chen, Z. Zhu, M. Chen, and Y. Li, “Large-scale mobile fitness app usage analysis for smart health,” IEEE Commun. Mag., vol. 56, no. 4, pp. 46–52, Apr. 2018.
[170] S. Seneviratne, A. Seneviratne, P. Mohapatra, and A. Mahanti, “Your installed apps reveal your gender and more!” ACM SIGMOBILE Mobile Comput. Commun. Rev., vol. 18, no. 3, pp. 55–61, 2015.
[171] S. Seneviratne, A. Seneviratne, P. Mohapatra, and A. Mahanti, “Predicting user traits from a snapshot of apps installed on a smartphone,” ACM SIGMOBILE Mobile Comput. Commun. Rev., vol. 18, no. 2, pp. 1–8, 2014.
[172] E. Malmi and I. Weber, “You are what apps you use: Demographic prediction based on user’s apps,” in Proc. 10th Int. AAAI Conf. Web Soc. Media, 2016, pp. 635–638.
[173] S. Zhao, G. Pan, J. Tao, Z. Luo, S. Li, and Z. Wu, “Understanding smartphone users from installed app lists using Boolean matrix factorization,” IEEE Trans. Cybern., vol. 52, no. 1, pp. 384–397, Jan. 2022.
[174] S. Zhao, F. Xu, Z. Luo, S. Li, and G. Pan, “Demographic attributes prediction through app usage behaviors on smartphones,” in Proc. ACM Int. Joint Conf. Int. Symp. Pervasive Ubiquitous Comput. Wearable Comput., 2018, pp. 870–877.
[175] S. Bian et al., “A novel macro-micro fusion network for user representation learning on mobile apps,” in Proc. Web Conf., 2021, pp. 3199–3209.
[176] S. Zhao et al., “Gender profiling from a single snapshot of apps installed on a smartphone: An empirical study,” IEEE Trans. Ind. Informat., vol. 16, no. 2, pp. 1330–1342, Feb. 2020.
[177] Z. Yu, E. Xu, H. Du, B. Guo, and L. Yao, “Inferring user profile attributes from multidimensional mobile phone sensory data,” IEEE Internet Things J., vol. 6, no. 3, pp. 5152–5162, Jun. 2019.
[178] R. Xu, R. M. Frey, E. Fleisch, and A. Ilic, “Understanding the impact of personality traits on mobile app adoption–insights from a large-scale field study,” Comput. Human Behav., vol. 62, pp. 244–256, Sep. 2016.
[179] E. Peltonen, P. Sharmila, K. O. Asare, A. Visuri, E. Lagerspetz, and D. Ferreira, “When phones get personal: Predicting big five personality traits from application usage,” Pervasive Mobile Comput., vol. 69, Nov. 2020, Art. no. 101269.
[180] G. Chittaranjan, J. Blom, and D. Gatica-Perez, “Mining large-scale smartphone data for personality studies,” Pers. Ubiquitous Comput., vol. 17, no. 3, pp. 433–450, 2013.
[181] N. K. Kambham, K. G. Stanley, and S. Bell, “Predicting personality traits using smartphone sensor data and app usage data,” in Proc. IEEE 9th Annu. Inf. Technol. Electron. Mobile Commun. Conf. (IEMCON), 2018, pp. 125–132.
[182] S. Gao, W. Li, L. J. Song, X. Zhang, M. Lin, and S. Lu, “PersonalitySensing: A multi-view multi-task learning approach for personality detection based on smartphone usage,” in Proc. 28th ACM Int. Conf. Multimedia, 2020, pp. 2862–2870.
[183] K. Ochiai, N. Yamamoto, T. Hamatani, Y. Fukazawa, and T. Yamaguchi, “Exploiting graph convolutional networks for representation learning of mobile app usage,” in Proc. IEEE Int. Conf. Big Data (Big Data), 2019, pp. 5379–5383.
[184] S. Zhao et al., “Mining user attributes using large-scale app lists of smartphones,” IEEE Syst. J., vol. 11, no. 1, pp. 315–323, Mar. 2017.
[185] H. Falaki, R. Mahajan, S. Kandula, D. Lymberopoulos, R. Govindan, and D. Estrin, “Diversity in smartphone usage,” in Proc. 8th Int. Conf. Mobile Syst. Appl. Services, 2010, pp. 179–194.
[186] P. Welke, I. Andone, K. Blaszkiewicz, and A. Markowetz, “Differentiating smartphone users by app usage,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput., 2016, pp. 519–523.
[187] Z. Tu et al., “Your apps give you away: Distinguishing mobile users by their app usage fingerprints,” Proc. ACM Interact. Mobile Wearable Ubiquitous Technol., vol. 2, no. 3, pp. 1–23, 2018.
[188] V. Sekara, L. Alessandretti, E. Mones, and H. Jonsson, “Temporal and cultural limits of privacy in smartphone app usage,” Sci. Rep., vol. 11, no. 1, pp. 1–9, 2021.
[189] Y. Ashibani and Q. H. Mahmoud, “A multi-feature user authentication model based on mobile app interactions,” IEEE Access, vol. 8, pp. 96322–96339, 2020.
[190] Y. Ashibani and Q. H. Mahmoud, “A behavior profiling model for user authentication in iot networks based on app usage patterns,” in Proc. IEEE 44th Annu. Conf. Ind. Electron. Soc. (IECON), 2018, pp. 2841–2846.
[191] Y. Ashibani and Q. H. Mahmoud, “Design and evaluation of a user authentication model for iot networks based on app event patterns,” Clust. Comput., vol. 24, no. 2, pp. 837–850, 2021.
[192] D. Bassu, M. Cochinwala, and A. Jain, “A new mobile biometric based upon usage context,” in Proc. IEEE Int. Conf. Technol. Homeland Security (HST), 2013, pp. 441–446.
[193] L. Fridman, S. Weber, R. Greenstadt, and M. Kam, “Active authentication on mobile devices via stylometry, application usage, Web browsing, and GPS location,” IEEE Syst. J., vol. 11, no. 2, pp. 513–521, Jun. 2017.
[194] U. Mahbub, J. Komulainen, D. Ferreira, and R. Chellappa, “Continuous authentication of smartphones based on application usage,” IEEE Trans. Biometr. Behav. Identity Sci., vol. 1, no. 3, pp. 165–180, Jul. 2019.
[195] R. Pereira et al., “GreenHub: A large-scale collaborative dataset to battery consumption analysis of Android devices,” Empirical Softw. Eng., vol. 26, no. 3, pp. 1–55, 2021.
[196] S. Song, F. Wedyan, and Y. Jararweh, “Empirical evaluation of energy consumption for mobile applications,” in Proc. IEEE 12th Int. Conf. Inf. Commun. Syst. (ICICS), 2021, pp. 352–357.
[197] X. Chen, N. Ding, A. Jindal, Y. C. Hu, M. Gupta, and R. Vannithamby, “Smartphone energy drain in the wild: Analysis and implications,” ACM SIGMETRICS Perform. Eval. Rev., vol. 43, no. 1, pp. 151–164, 2015.
[198] X. Chen, A. Jindal, N. Ding, Y. C. Hu, M. Gupta, and R. Vannithamby, “Smartphone background activities in the wild: Origin, energy drain, and optimization,” in Proc. ACM 21st Annu. Int. Conf. Mobile Comput. Netw., 2015, pp. 40–52.
[199] X. Li, Y. Yang, Y. Liu, J. P. Gallagher, and K. Wu, “Detecting and diagnosing energy issues for mobile applications,” in Proc. 29th ACM SIGSOFT Int. Symp. Softw. Test. Anal., 2020, pp. 115–127.
[200] H. Z. Q. Rex, J. C. Chuen, and A. Herkersdorf, “Apps-usage driven energy management for multicore mobile computing systems,” in Proc. IEEE Int. Symp. Integr. Circuits (ISIC), 2014, pp. 472–475.
[201] D. Li, S. Hao, J. Gui, and W. G. Halfond, “An empirical study of the energy consumption of Android applications,” in Proc. IEEE Int. Conf. Softw. Maintenance Evol., 2014, pp. 121–130.
[202] Z. Jiang, R. Kuang, J. Gong, H. Yin, Y. Lyu, and X. Zhang, “What makes a great mobile app? A quantitative study using a new mobile crawler,” in Proc. IEEE Symp. Service Orient. Syst. Eng. (SOSE), 2018, pp. 222–227.
[203] Y. Li, J. Yang, and N. Ansari, “Cellular smartphone traffic and user behavior analysis,” in Proc. IEEE Int. Conf. Commun. (ICC), 2014, pp. 1326–1331.
[204] H. Jin et al., “Why are they collecting my data? Inferring the purposes of network traffic in mobile apps,” Proc. ACM Interact. Mobile Wearable Ubiquitous Technol., vol. 2, no. 4, pp. 1–27, 2018.
[205] E. A. Walelgne, A. S. Asrese, J. Manner, V. Bajpai, and J. Ott, “Clustering and predicting the data usage patterns of geographically diverse mobile users,” Comput. Netw., vol. 187, Mar. 2021, Art. no. 107737.
[206] A. Okic, A. E. Redondi, I. Galimberti, F. Foglia, and L. Venturini, “Analyzing different mobile applications in time and space: A city-wide scenario,” in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), 2019, pp. 1–6.
[207] Y. Ouyang and T. Yan, “Profiling wireless resource usage for mobile apps via crowdsourcing-based network analytics,” IEEE Internet Things J., vol. 2, no. 5, pp. 391–398, Oct. 2015.
[208] G. Aceto, G. Bovenzi, D. Ciuonzo, A. Montieri, V. Persico, and A. Pescapé, “Characterization and prediction of mobile-app traffic using Markov modeling,” IEEE Trans. Netw. Service Manag., vol. 18, no. 1, pp. 907–925, Mar. 2021.
[209] J. Yin et al., “Mobile app usage patterns aware smart data pricing,” IEEE J. Sel. Areas Commun., vol. 38, no. 4, pp. 645–654, Apr. 2020.
[210] D. Li, S. Hao, W. G. Halfond, and R. Govindan, “Calculating source line level energy information for Android applications,” in Proc. ACM Int. Symp. Softw. Test Anal., 2013, pp. 78–89.
[211] S. Rezaei, B. Kroencke, and X. Liu, “Large-scale mobile app identification using deep learning,” IEEE Access, vol. 8, pp. 348–362, 2019.
[212] Cisco. “Cisco Annual Internet Report (2018–2023) White Paper.” 2020. [Online]. Available: https://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/white-paper-c11-741490.html
[213] H. Falaki, D. Lymberopoulos, R. Mahajan, S. Kandula, and D. Estrin, “A first look at traffic on smartphones,” in Proc. 10th ACM SIGCOMM Conf. Internet Meas., 2010, pp. 281–287.
[214] D. Peloquin, M. DiMaio, B. Bierer, and M. Barnes, “Disruptive and avoidable: GDPR challenges to secondary research uses of data,” Eur. J. Human Genet., vol. 28, pp. 697–705, Mar. 2020.
[215] M. de Reuver, H. Bouwman, N. Heerschap, and H. Verkasalo, “Smartphone measurement: Do people use mobile applications as they say they do?” in Proc. ICMB, 2012, p. 2.
[216] N. Kiukkonen, J. Blom, O. Dousse, D. Gatica-Perez, and J. Laurila, “Towards rich mobile phone datasets: Lausanne data collection campaign,” in Proc. ICPS, vol. 68, 2010, pp. 1–7.
[217] A. Rahmati, C. Tossell, C. Shepard, P. Kortum, and L. Zhong, “Exploring iPhone usage: The influence of socioeconomic differences on smartphone adoption, usage and usability,” in Proc. 14th Int. Conf. Human–Comput. Interact. Mobile Devices Services (MobileHCI), 2012, pp. 11–20.
[218] T. Yan, D. Chu, D. Ganesan, A. Kansal, and J. Liu, “Fast app launching for mobile devices using predictive user context,” in Proc. 10th Int. Conf. Mobile Syst. Appl. Services, 2012, pp. 113–126.
[219] Y. Dong, N. V. Chawla, and A. Swami, “MetaPath2Vec: Scalable representation learning for heterogeneous networks,” in Proc. 23rd ACM SIGKDD Int. Conf. Knowl. Disc. Data Min., 2017, pp. 135–144.
[220] X. Wang et al., “Heterogeneous graph attention network,” in Proc. World Wide Web Conf., 2019, pp. 2022–2032.
[221] V. Mothukuri, R. M. Parizi, S. Pouriyeh, Y. Huang, A. Dehghantanha, and G. Srivastava, “A survey on security and privacy of federated learning,” Future Gener. Comput. Syst., vol. 115, pp. 619–640, Feb. 2021.
[222] W. Y. B. Lim et al., “Federated learning in mobile edge networks: A comprehensive survey,” IEEE Commun. Surveys Tuts., vol. 22, no. 3, pp. 2031–2063, 3rd Quart., 2020.
[223] D. Xu et al., “Edge intelligence: Empowering intelligence to the edge of network,” Proc. IEEE, vol. 109, no. 11, pp. 1778–1837, Nov. 2021.
[224] D. C. Nguyen, M. Ding, P. N. Pathirana, A. Seneviratne, J. Li, and H. Vincent Poor, “Federated learning for Internet of Things: A comprehensive survey,” IEEE Commun. Surveys Tuts., vol. 23, no. 3, pp. 1622–1658, 3rd Quart., 2021.
[225] L. Yao, S. Li, Y. Li, M. Huai, J. Gao, and A. Zhang, “Representation learning for treatment effect estimation from observational data,” in Proc. Adv. Neural Inf. Process. Syst., 2018, pp. 2633–2643.
[226] B. Huang, K. Zhang, P. Xie, M. Gong, E. P. Xing, and C. Glymour, “Specific and shared causal relation modeling and mechanism-based clustering,” in Proc. Adv. Neural Inf. Process. Syst., 2019, pp. 13510–13521.
[227] G. Jaume et al., “Quantifying explainers of graph neural networks in computational pathology,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2021, pp. 8106–8116.
[228] M. Zhang, T. Li, Y. Li, and P. Hui, “Multi-view joint graph representation learning for urban region embedding,” in Proc. 29th Int. Conf. Int. Joint Conf. Artif. Intell., 2021, pp. 4431–4437.

Tong Li (Member, IEEE) received the B.S. and M.S. degrees in communication engineering from Hunan University, China, in 2014 and 2017. He is currently pursuing the Ph.D. degree with the Hong Kong University of Science and Technology and University of Helsinki. His research interests include data mining and machine learning, especially with applications to mobile big data and urban computing.

Tong Xia received the B.S. degree in electrical engineering from the School of Electrical Information, Wuhan University, Wuhan, China, in 2017, and the M.S. degree in electronic engineering from Tsinghua University, Beijing, China, in 2020. She is currently pursuing the Ph.D. degree in machine learning with the Department of Computer Science, University of Cambridge, Cambridge, U.K. Her research interests include user behavior modeling, ubiquitous computing, and artificial intelligence for healthcare.

Huandong Wang (Member, IEEE) received the first B.S. degree in electronic engineering, the second B.S. degree in mathematical sciences, and the Ph.D. degree in electronic engineering from Tsinghua University, Beijing, China, in 2014, 2015, and 2019, respectively, where he is currently a Postdoctoral Research Fellow with the Department of Electronic Engineering. His research interests include urban computing, mobile big data mining, and machine learning.

Zhen Tu received the first B.S. degree in electronics and information engineering and the second B.S. degree in economics from Wuhan University, Wuhan, China, in 2016. She is currently pursuing the master’s degree with the Electronic Engineering Department, Tsinghua University, Beijing, China. Her research interests include mobile big data mining, user behavior modeling, data privacy, and security.

Sasu Tarkoma (Senior Member, IEEE) received the M.Sc. and Ph.D. degrees in computer science from the Department of Computer Science, University of Helsinki, where he is currently the Dean of the Faculty of Science and a Professor of Computer Science. He is also affiliated with the Helsinki Institute for Information Technology HIIT. He has authored four textbooks and has published over 240 scientific articles. He has ten granted U.S. patents. His research has received a number of best paper awards and mentions, such as at IEEE PerCom, ACM CCR, and ACM OSR.

Zhu Han (Fellow, IEEE) received the B.S. degree in electronic engineering from Tsinghua University in 1997, and the M.S. and Ph.D. degrees in electrical and computer engineering from the University of Maryland, College Park, in 1999 and 2003, respectively.
From 2000 to 2002, he was an R&D Engineer with JDSU, Germantown, MD, USA. From 2003 to 2006, he was a Research Associate with the University of Maryland. From 2006 to 2008, he was an Assistant Professor with Boise State University, Idaho. He is currently a John and Rebecca Moores Professor with the Electrical and Computer Engineering Department as well as with the Computer Science Department, University of Houston, Houston, TX, USA. His research interests include wireless resource allocation and management, wireless communications and networking, game theory, big data analysis, security, and smart grid. He received the NSF Career Award in 2010, the Fred W. Ellersick Prize of the IEEE Communication Society in 2011, the EURASIP Best Paper Award for the Journal on Advances in Signal Processing in 2015, the IEEE Leonard G. Abraham Prize in the Field of Communications Systems (Best Paper Award in IEEE JSAC) in 2016, and several best paper awards in IEEE conferences. He was an IEEE Communications Society Distinguished Lecturer from 2015 to 2018, and has been an AAAS Fellow and an ACM Distinguished Member since 2019. He has been a 1% highly cited researcher since 2017 according to Web of Science. He is also the Winner of the 2021 IEEE Kiyo Tomiyasu Award, for outstanding early to mid-career contributions to technologies holding the promise of innovative applications, with the following citation: “for contributions to game theory and distributed management of autonomous communication networks.”

Pan Hui (Fellow, IEEE) received the bachelor’s and M.Phil. degrees from the University of Hong Kong and the Ph.D. degree from the Computer Laboratory, University of Cambridge. He is a Professor of Computational Media and Arts with the Hong Kong University of Science and Technology and the Nokia Chair Professor of Data Science with the University of Helsinki. He was a Distinguished Scientist for Telekom Innovation Laboratories Germany and an Adjunct Professor of Social Computing and Networking with Aalto University. His industrial profile also includes his research with Intel Research Cambridge and Thomson Research Paris. He has published more than 400 research papers with over 22,500 citations. He has 30 granted and filed European and U.S. patents in the areas of augmented reality, data science, and mobile computing. He has been serving on the organizing and technical program committee of numerous top international conferences, including ACM SIGCOMM, MobiCom, MobiSys, IEEE Infocom, ICNP, IJCAI, AAAI, ICWSM, and WWW. He served as an Associate Editor for IEEE Transactions on Mobile Computing and IEEE Transactions on Cloud Computing. He is an International Fellow of the Royal Academy of Engineering, an ACM Distinguished Scientist, and a member of the Academia Europaea.
