IEEE Pervasive Computing
www.computer.org/pervasive
GUEST EDITORS’ INTRODUCTION
Pandemic Preparedness With
Pervasive Computing
Oliver Amft and Hassan Ghasemzadeh
OCTOBER–DECEMBER 2023
Theme Articles
Feature Articles
GreenCrowd: Toward a Holistic Algorithmic Crowd
Charging Framework
Theofanis P. Raptis and Luca Bedogni
ISSN: 1536-1268
Reuse Rights and Reprint Permissions: Educational or personal use of this material is permitted without fee, provided such use: 1) is not made
for profit; 2) includes this notice and a full citation to the original work on the first page of the copy; and 3) does not imply IEEE endorsement of any
third-party products or services. Authors and their companies are permitted to post the accepted version of their IEEE-copyrighted material on their
own web servers without permission, provided that the IEEE copyright notice and a full citation to the original work appear on the first screen of
the posted copy. An accepted manuscript is a version that has been revised by the author to incorporate review suggestions, but not the published
version with copyediting, proofreading, and formatting added by IEEE. For more information, please go to:
www.ieee.org/publications_standards/publications/rights/paperversionpolicy.html. Permission to reprint/republish this material for commercial, advertising, or promotional purposes or
for creating new collective works for resale or redistribution must be obtained from IEEE by writing to the IEEE Intellectual Property Rights Office, 445
Hoes Lane, Piscataway, NJ 08854-4141 or [email protected]. Copyright © 2023 IEEE. All rights reserved. Abstracting and Library Use:
Abstracting is permitted with credit to the source. Libraries are permitted to photocopy for private use of patrons, provided the per-copy fee indicated
in the code at the bottom of the first page is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923. Circulation:
IEEE Pervasive Computing (ISSN 1536-1268) is published quarterly by the IEEE Computer Society. IEEE Headquarters, Three Park Ave., 17th Floor, New
York, NY 10016-5997; IEEE Computer Society Publications Office, 10662 Los Vaqueros Circle, PO Box 3014, Los Alamitos, CA 90720-1314, phone +1 714
821 8380; IEEE Computer Society Headquarters, 2001 L St., Ste. 700, Washington, DC 20036. Subscribe to IEEE Pervasive Computing by visiting www.
computer.org/pervasive. Postmaster: Send undelivered copies and address changes to IEEE Pervasive Computing, Membership Processing Dept.,
IEEE Service Center, 445 Hoes Lane, Piscataway, NJ 08854-4141. Periodicals postage paid at New York, NY, and at additional mailing offices. Canadian
GST #125634188. Canada Post Publications Mail Agreement Number 40013885. Return undeliverable Canadian addresses to PO Box 122, Niagara Falls,
ON L2E 6S8. Printed in the USA. Editorial: Unless otherwise stated, bylined articles, as well as product and service descriptions, reflect the author’s
or firm’s opinion. Inclusion in IEEE Pervasive Computing does not necessarily constitute endorsement by the IEEE or the IEEE Computer Society.
All submissions are subject to editing for style, clarity, and length. IEEE prohibits discrimination, harassment and bullying: visit www.ieee.org/web/
aboutus/whatis/policies/p9-26.html.
GUEST EDITORS' INTRODUCTION
During the COVID-19 pandemic, pervasive computing has established new application fields in epidemiology and crisis management. Never before have we seen a similarly rapid uptake of pervasive technology in public health tools, i.e., for infection monitoring and prevention management. Pervasive public health tools provide information at scales ranging from the individual using, e.g., a smartphone or smartwatch, to the population of a region or country. But could pervasive computing help us to stay prepared in the future?

With the global outbreak of SARS-CoV-2 in early 2020, a hasty search for methods to track infection chains and monitor pandemic status began. Pervasive tools, or even technical blueprints of how to limit infections, were not established. The pandemic management expertise at the time came from previous viral infection waves, including Ebola and H1N1. By March 2020, the concept of “digital proximity tracing” or “digital contact tracing” (DCT) took center stage with the first apps, e.g., the Singapore Government’s TraceTogether. DCT could complement the comparably slow, laborious, and error-prone manual tracing of an infected individual’s contact persons by passively monitoring nearby Bluetooth (BT) modules. After one month, TraceTogether already had more than 600k users. During 2021, TraceTogether was used by more than 90% of Singaporeans. Soon, though, challenges in privacy management and tracing accuracy hampered adoption in many countries. For example, Germany’s CoronaWarnApp peaked somewhat above 30% in active users, while analyses showed that DCT indeed cut infection chains (https://www.coronawarn.app/de/science/2022-03-03-science-blog-5). Interestingly, the challenges observed with DCT are all grounded in pervasive and distributed computing. For one, the received signal strength indication of classical BT modules is affected by multipath and absorption, among others. As a consequence, BT-based contact tracing provides limited performance in estimating the distance to BT modules in the 1–2 m range, which is the key range for viral load transfer. Another challenge for DCT has been privacy management. While anonymous identification logging of decentralized tracing systems, e.g., the exposure notification, has eliminated the risk of privacy loss, privacy concerns were certainly reflected in the low DCT app adoption rates seen worldwide.

1536-1268 © 2023 IEEE
Digital Object Identifier 10.1109/MPRV.2023.3329556
Date of current version 30 November 2023.
4 IEEE Pervasive Computing Published by the IEEE Computer Society October-December 2023
Crowd dynamics across relative and absolute locations have been intensively studied during the COVID-19 pandemic to understand pandemic characteristics. However, conventional data describing people’s movement, including airport traffic data, lack spatial granularity and real-time information. Instead, location-based services (LBS) data collected from smartphone apps, e.g., Google Community Mobility Reports (https://www.google.com/covid19/mobility), provide access to movement dynamics. Current LBS data still represent mobility coarsely, e.g., as the radius of gyration, or provide summaries only, e.g., the number of visitors at specific points of interest (POIs). Nevertheless, the data can already increase modeling fidelity for specific locations and for specific conditions, e.g., public services closures, social gatherings at POIs, and holiday periods. Note, however, that LBS data showing increasing traffic in a region may not imply an increase in social contacts. Thus, activity and behavior data will be a central element in future population-scale simulations, e.g., for epidemiological and disaster management.

Furthermore, symptom tracking with pervasive devices has made a fast-forward leap during the pandemic. Before 2019, reports on the connection of unsupervised everyday activity and behavior with disease symptoms were mostly anecdotal. The prospect of identifying regional outbreaks and rapidly applying nonpharmaceutical interventions (e.g., public services closures), well before classical population-scale disease reporting would show alarming trends, spurred several initiatives, including the Scripps Institute DETECT study and the Robert-Koch-Institute “Corona Datenspende.” The latter set a worldwide record with over 500k users, who donated their smartwatch data (https://doi.org/10.1109/MPRV.2020.3021321). The data could deliver markers of influenza-like symptoms, in particular fever, based on the known connection between a simultaneous increase in the resting heart rate (RHR) and body temperature. Yet, context knowledge, e.g., markers of physical activity, including step count, is essential to rule out alternative explanations of an RHR increase. Thus, passively tracking health state in the complex and dynamic context of everyday life remains a research frontier, which can contribute to pandemic preparedness.

So, what is our level of preparedness? Over the past years, multiple initiatives have set out to sort, rank, and organize approaches to deal with pandemic situations. While scaling use cases for clinical workflows and patient pathways has naturally been at the core of preparedness efforts, the need for pervasive tools is now becoming broadly recognized. Most if not all of the typical pandemic preparedness measures are matches for pervasive technology:

1) monitor infection dynamics and resources;
2) prevent, identify, and control local outbreaks;
3) safeguard regional or country-wide healthcare systems;
4) assess the individual’s health state and risk of infection;
5) define and monitor optimal treatment.

Notably, across all preparedness measures, there is the need for interoperable data, pervasive computing tools, and data analysis methods. As the policy and regulatory frameworks evolve, pervasive computing tools become accepted as a trusted component.

While the COVID-19 pandemic fades from the news, the disaster it created must remain a point of reference, in particular considering the risk of other viral infections. In light of the open technical challenges, further pervasive computing research is key to seizing the opportunity created over the last years and eventually saving more lives at global scale.

In this special issue, we capture new technical approaches to the pervasive tool inventory that help in dealing with a pandemic situation, but also investigations that highlight opportunities for further research. Nussbaummueller et al. [A1] present a method to model the BT pathloss dependent on contextual information. The authors analyze distance estimation errors related to smartphone carrying positions and highlight their method’s potential to improve DCT effectiveness. Cherini et al. [A2] expand the DCT concept by introducing an exchange of “deep contacts” between BT nodes. Their investigation could open a research area on contact management, privacy, and pervasive device resources for DCT. Moving on, Anno et al. [A3] investigate crowd dynamics forecasting. At a one-week forecast horizon for 58 POIs, their approach compares favorably to several related crowd prediction methods. Crowd dynamics forecasts could help to estimate infection risk. Vargo et al. [A4] focus on wearable health tracking. The authors report a one-year study using the Oura Ring and detail data of confirmed COVID-19 cases. Finally, Perez et al. [A5] address the pandemic-related app malware surge and propose a malware detection method. Based on performance analysis results, the authors suggest that their malware detector could be used to combat cyberattacks on smartphones.

We believe that the selected articles provide an overview of the breadth of opportunities to maintain pandemic preparedness with pervasive computing. Moreover, the articles collected in this special issue provide detailed insight into ongoing development in key areas of pervasive health tools.

With this special issue, we aim to motivate continued research and discussion of new ideas on pervasive computing for public health tools that spur pandemic preparedness.

OLIVER AMFT is a professor of computer science with the University of Freiburg, 79110, Freiburg, Germany, and an executive board member of Hahn-Schickard, Freiburg, Germany. His research interests include wearables, embedded AI, and biomedical applications of pervasive computing. He is a member of IEEE. He is the corresponding author of this article. Contact him at [email protected].
When fighting a contagious human-to-human transmitted disease, social distancing and contact tracing are effective measures to break infection chains [1]. Infections based on aerosol emission of a virus are traditionally modeled under the assumption that being in close contact with an infectious person over a certain period of time allows considering this contact as potentially contagious, having been too close for too long (TC4TL). During the COVID-19 pandemic, automated contact tracing has been attempted worldwide by deploying mobile apps based on TC4TL. The closeness is estimated by the use of Bluetooth Low Energy (BLE), a widely available, energy-efficient, and privacy-preserving technology. The intuition is that the measured BLE signal pathloss between smartphones allows us to estimate the distance of the phones and their owners. Yet, the approach has a major shortcoming: without enhancements, BLE does not provide sufficiently accurate distance estimates [2], [3].

The distance estimation problem relates to the oversimplified use of the BLE pathloss when measuring the Received Signal Strength Indicator (RSSI) value of BLE, which neglects major influential factors. Our previous study showed that the BLE pathloss spread is wide at a given distance [2]. Thus, it is difficult to derive an accurate distance based on a pathloss measurement only. This article contributes by proposing a method to improve BLE-based distance estimation. First, we summarize the principles of BLE-based distance estimation and the influence of context. Then, indoor/outdoor environment and carry position context classification is presented. Finally, a model is developed that enhances standard pathloss-distance models by an environment factor, which can be experimentally derived for each context class. Results are presented that demonstrate the improvements in distance estimation accuracy when employing machine learning-based context detection.

1536-1268 © 2023 IEEE
Digital Object Identifier 10.1109/MPRV.2023.3323747
Date of publication 2 November 2023; date of current version 30 November 2023.
PANDEMIC PREPAREDNESS WITH PERVASIVE COMPUTING
Pathloss-Based Distance Calculation
The relation between signal attenuation and the distance between a transmitter and a receiver in free space can be formalized. The basic pathloss $PL$ is the relation between transmitted and received power (isotropic antennas, without antenna gains):

$$ PL = \frac{P_{TX}}{P_{RX}} = \left(\frac{4\pi d}{\lambda}\right)^{2} \tag{1} $$

where $P_{TX}$ and $P_{RX}$ are the transmitted and received signal power, respectively, $d$ is the distance between transmitter and receiver, and $\lambda$ is the wavelength. The expression of the ideal pathloss allows us to analytically determine the distance by measuring the received power. In practice, the received power is often measured as the RSSI value, which is a relative value that needs calibration to device configurations. The pathloss between transmitted and received power in dB representation can thus be expressed as

$$ PL_{dB} = P_{TX,dBm} - \left(RSSI_{dBm} + \Delta_{RX,dBm}\right). \tag{2} $$

While it is known that the model parameters $PL_{0,dB}$ ($RSSI_{0,dBm}$) and $\gamma$ vary significantly in real deployments, in conventional implementations they are fixed to values fitted to the overall use case. With varying parameters, a refinement of the model dependent on the context is possible.

Distance Estimation and Context
The distance estimate is sensitive to changes of the signal propagation path, mostly due to body shielding, which depends on the phone carry position and relative orientation, and the environment due to multipath propagation and signal attenuation. Further, the signal path is determined by the antenna orientation [9]. In our previous study, we showed that knowing the situation context, i.e., the environment and carry position, can reduce the RSSI measurement noise and thus the distance estimation error [2]. In real settings, both environment and carry position context are not known but need to be detected, which adds inaccuracies, as we will discuss in the following two sections.
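As an illustration of (1) and (2), the following minimal sketch converts an RSSI reading into a pathloss and inverts the ideal free-space model for the distance. It is not the authors' implementation: the function names, the 2.44 GHz carrier frequency (a value inside the 2.4 GHz band used by BLE), and the default calibration offset are assumptions made for this example, and the logarithm is taken as base 10, as is usual for dB quantities.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def free_space_pathloss_db(distance_m, freq_hz=2.44e9):
    """Pathloss of (1) expressed in dB: 20*log10(4*pi*d/lambda).

    freq_hz is an assumed carrier frequency within the 2.4 GHz band.
    """
    wavelength = SPEED_OF_LIGHT / freq_hz
    return 20.0 * math.log10(4.0 * math.pi * distance_m / wavelength)


def pathloss_from_rssi_db(p_tx_dbm, rssi_dbm, delta_rx_db=0.0):
    """Pathloss of (2): transmit power minus the calibrated RSSI.

    delta_rx_db stands for the receiver calibration term; 0.0 is a placeholder.
    """
    return p_tx_dbm - (rssi_dbm + delta_rx_db)


def distance_from_pathloss(pl_db, freq_hz=2.44e9):
    """Invert (1) for d, valid only under ideal free-space conditions."""
    wavelength = SPEED_OF_LIGHT / freq_hz
    return wavelength / (4.0 * math.pi) * 10.0 ** (pl_db / 20.0)
```

Under these assumptions, `distance_from_pathloss(pathloss_from_rssi_db(p_tx, rssi, delta))` gives the free-space distance estimate; in real deployments, body shielding and multipath make this estimate unreliable, which is the motivation for the context-dependent refinement.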
FIGURE 3. Testbed consisting of two Samsung XCover 4s smartphones with external UWB modules.

devices. UWB distance measurements are used for comparison with BLE.

To train the carry position classifiers, we collected sensor data labeled with the carry position, cf. Figure 2. The study participants carried the smartphone in each of the carry positions for a few minutes. The context dataset contains 160,000 randomly selected entries from these measurements (classes balanced).

In a second experiment, we collected the contact tracing dataset. This dataset contains the sensor data labeled with the carry position class, environment, BLE RSSI values, UWB measures, and the ground truth distance. One study participant remained stationary with the carry position hand-held, while the second participant performed all carry positions at varying distances ranging from 0.5 to 3 m, in indoor and outdoor scenarios. Because the use case is contact tracing, the ranges are limited to distances where an infection is likely (larger distances and distance errors are studied in Etzlinger et al. [2]). For each position, the receiver orientation was changed in quarter turns. Measurements were recorded for 3 min each. The contact tracing dataset consists of 32,640 entries. It is used for the evaluation of the carry position classifiers and the distance estimation study.

quality is observed for the carry positions in pocket and phone call. As Figure 4 shows, many misclassifications occurred between these two classes. Still, with an accuracy of 82.6%, the RF-based classifier is in line with related studies on phone carry position detection, which report an accuracy of 85% with three carry position classes [12] and 74.6% with nine carry position classes [13].

ENHANCED DISTANCE ESTIMATION
We enhance distance estimation by adjusting the BLE RSSI value by an error correction term that depends on the context, i.e., the phone carry position and the indoor/outdoor environment. First, the approach is introduced, then experimental results are presented.

Context Modeling
To include context into distance estimation, we extend the pathloss model of (3) by context correction factors. While conventionally only the pathloss exponent and the variance of the additive noise are adapted to the environment, the proposed context factors allow us to adjust both the reference pathloss $PL_{0,dBm}$ ($RSSI_{0,dBm}$) and the pathloss exponent based on the perceived context. Therefore, (2) and (3) are extended by the context correction factor $\eta_0$ for distance $d_0$ and factor $\eta$ for distance $d$ as follows:

$$ RSSI_{0,dBm} + \Delta_{RX,dBm} = P_{TX,dBm} - \eta_0 \left(20 \log d_0 + C\right) \tag{6} $$

$$ RSSI_{dBm} + \Delta_{RX,dBm} = P_{TX,dBm} - \eta \left(20 \log d + C\right) \tag{7} $$

with $C = 10 \log \left(\frac{4\pi}{\lambda}\right)^{2}$. For simplicity and in accordance with GAEN-based calibration, we fix $d_0 = 1$ m. The pathloss at the receiver at distance $d$ is given as

$$ RSSI_{0,dBm} - RSSI_{dBm} = (\eta - \eta_0)\, C + 20\, \eta \log(d) \tag{8} $$

and the estimated distance $\hat{d}$ can be calculated as

$$ \hat{d} = 10^{\frac{RSSI_{0,dBm} - RSSI_{dBm} - (\eta - \eta_0)\, C}{20\, \eta}}. \tag{9} $$

While $\eta$ in the denominator is a context-aware path loss exponent, $(\eta - \eta_0)\, C$ adapts the calibration measurement $RSSI_{0,dBm}$ to the context, similar to the approaches described in Subhan et al. [15] and Katircioglu et al. [16]. Note that for the special case $\eta = \eta_0 = \gamma/2$, (9) is equivalent to (5). With these context factors, we now adjust the estimation. The context factor $\eta$ is set based on the detected context class. For each context class determined by carry position and environment, the context factor $\eta$ is derived as the value with the least distance estimation root mean square error (RMSE). Factor $\eta_0 = 1$ in our experiments.

99% accuracy (UWB 99%) or BLE with about 86% accuracy (BLE 86%). The ideal case is given when carry position and environment detection are always correct (optimal).

The average distance estimation RMSE of the baseline setting is 1.20 m; a result of similar magnitude is reported in Katircioglu et al. [16]. The error varies depending on the carry position as shown in Table 2. For comparison, the table also shows the variation of the distance errors of the more accurate UWB time-of-flight estimation (UWB module pinned to the smartphone).

TABLE 2. Distance estimation RMSE per carry position: BLE baseline with unknown context and UWB.
         Hand-held   At table   In pocket   Phone call
BLE      1.04 m      0.98 m     1.47 m      1.27 m

TABLE 3. Distance error in terms of mean absolute error (MAE), 75th quantile ($Q_{75}$), and RMSE for different combinations of carry position and environment detection.
Carry/env          MAE      Q75      RMSE     Red.
Optimal/unknown    0.85 m   2.35 m   1.07 m   10.83%
Class/BLE 86%      0.80 m   2.27 m   1.02 m   15.00%
Class/UWB 99%      0.77 m   2.21 m   0.97 m   19.17%
Optimal/optimal    0.75 m   2.09 m   0.93 m   22.50%

The impact of context detection is now assessed by combinations of carry position detection (unknown, class, and optimal) and indoor/outdoor detection (unknown, BLE 86%, UWB 99%, and optimal), as summarized in Table 3. The observed error ranges are consistent with indoor positioning errors in related studies [17], [18], [19]. In terms of RMSE, the baseline distance estimation error of 1.20 m can be reduced to 0.97 m (class/UWB 99%); with class/BLE 86%, the distance estimation error is 1.02 m. The optimal/optimal reference case results in an error of 0.93 m. Figure 5 details the error distribution of carry position detection with BLE 86% versus the baseline and the optimal case.

FIGURE 6. Violin distribution plots of pathloss and distance estimate (with median marker) over the actual distance. Pathloss plots show the actual pathloss distribution and the pathloss fitted with context-depending $\eta$ (green solid line) and fixed $\eta = 2$ (red dashed line); distance estimation plots show the cases of context-depending $\eta$ (green colored distribution) and fixed $\eta = 2$ (red colored distribution), and the ground truth distance (black dashed line).

Figure 6 shows the distributions of pathloss and distance estimation both for the fixed $\eta = 2$ (baseline) and the context-dependent $\eta$ over varying distances. It can be observed that the pathloss spread is high in every environment, yet the outdoor environment shows a more coherent behavior than the indoor environment. The use of specific context factors did not lead to improvements in distance estimation for hand-held and phone call indoors; here, we may still miss relevant context factors. At table and in pocket scenarios benefit from a context-dependent $\eta$: the median distance values match the ground truth distance better (green colored distributions) than the estimates using a fixed $\eta = 2$ (red colored distributions). This effect is also visible for outdoor phone call and in pocket scenarios.

The orientation of the users carrying the devices is a further potential influencing factor. As shown in Table 4, the error varies with the orientation of the receiving device as expected, being smallest under line-of-sight conditions without body shielding at 0° and largest at full body shielding at 180°. Yet, the differences are remarkably small.

TABLE 4. Distance estimation RMSE at four different receiver orientations; with carry position detection and with carry position and environment detection.
               0°       90°      180°     270°
Carry          0.97 m   1.03 m   1.28 m   1.11 m
Carry & env.   0.83 m   0.95 m   1.17 m   0.91 m
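The context-aware estimate of (9) and the selection of a context factor per class by least RMSE can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: the function names, the 2.44 GHz carrier frequency used to evaluate the constant C, the base-10 logarithm, and the plain search over a list of candidate factors are all choices made for this example, with the reference distance fixed to 1 m as in the text.

```python
import math


def context_constant_db(freq_hz=2.44e9):
    """C = 10*log10((4*pi/lambda)^2); freq_hz is an assumed carrier frequency."""
    wavelength = 299_792_458.0 / freq_hz
    return 10.0 * math.log10((4.0 * math.pi / wavelength) ** 2)


def estimate_distance(rssi0_dbm, rssi_dbm, eta, eta0=1.0, c_db=None):
    """Distance estimate of (9) for a reference distance of 1 m."""
    if c_db is None:
        c_db = context_constant_db()
    exponent = (rssi0_dbm - rssi_dbm - (eta - eta0) * c_db) / (20.0 * eta)
    return 10.0 ** exponent


def rmse(estimates, truths):
    """Root mean square error between estimated and true distances."""
    return math.sqrt(
        sum((e - t) ** 2 for e, t in zip(estimates, truths)) / len(truths)
    )


def fit_context_factor(samples, rssi0_dbm, candidates):
    """Return the candidate factor with the least distance RMSE.

    samples: list of (rssi_dbm, true_distance_m) pairs of ONE context class.
    candidates: hypothetical list of factor values to try.
    """
    truths = [d for _, d in samples]

    def score(eta):
        estimates = [estimate_distance(rssi0_dbm, r, eta) for r, _ in samples]
        return rmse(estimates, truths)

    return min(candidates, key=score)
```

In use, `fit_context_factor` would be called once per context class (carry position and environment) on labeled measurements, and the detected class then selects the factor passed to `estimate_distance`. The reduction column of Table 3 follows from the baseline RMSE of 1.20 m, e.g., (1.20 − 0.97)/1.20 ≈ 19.17% for class/UWB 99%.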
BARBARA NUßBAUMMÜLLER is a master student with the Institute of Telecooperation, JKU Linz, 4040, Linz, Austria, in her final year. Her current research interests include wireless network experimentation and machine learning. Contact her at [email protected].

BERNHARD ETZLINGER is a senior researcher with the Institute of Communication Systems and RF Systems, JKU Linz, 4040, Linz, Austria. His research interests include cooperative indoor localization, trustworthy decentralized wireless systems, as well as low-power sensor networks. Etzlinger received his Ph.D. degree in technical science from JKU Linz. Contact him at [email protected].

KARIN ANNA HUMMEL is an associate professor with the Institute of Telecooperation, JKU Linz, 4040, Linz, Austria. Her research interests include mobile networked systems and predictive networking. Hummel received her Ph.D. degree from the Vienna University of Technology. She is the corresponding author of this article. Contact her at [email protected].
During the COVID-19 pandemic, digital contact tracing using mobile devices has
been widely explored, with many proposals from academia and industry highlighting
the benefits and challenges. Most approaches use Bluetooth low energy signals to
learn and trace close contacts among users. However, tracing only these contacts
can mask the risk of virus exposure in scenarios with low detection rates. To
address this issue, we propose encouraging users to exchange information beyond
close contacts, particularly about prior “deep” contacts that may have transmitted
the virus. This presents new opportunities for controlling the spread of the virus, but
also poses challenges that require further investigation. We provide directions for
addressing these challenges based on our recent work developing a technological
solution using this approach.
Digital contact tracing (DCT) has received significant attention during the COVID-19 pandemic because of its potential to surpass traditional contact tracing methods [1]. Unlike manual contact tracing, which relies on individuals to recall their interactions, DCT utilizes mobile phone technology to record encounters among people. Users can easily participate by installing an app on their smartphone and exchanging information with other users, typically via Bluetooth low energy (BLE). The app can then register close contacts, which, in combination with positive COVID-19 test results, can be used to analyze the risk of virus exposure for app users. As a result, DCT has the potential to significantly improve the speed, accuracy, privacy, and scalability of contact tracing efforts in the fight against COVID-19 and other infectious diseases [2].

In contrast to the traditional procedure, which reactively investigates close contacts only after an infected case, DCT proactively collects all close contacts to analyze their risk for each positive case. This helps to stop virus transmission much earlier than traditional contact tracing as it avoids typical delays of interviewing and notifying individuals. However, DCT must collect, store, and manage more data than the traditional contact tracing scheme. So far, most of the effort in designing DCT solutions has focused on preserving user privacy, resulting mainly in two architectural approaches [3]. One approach involves centralized solutions, where users must share their contact data with a trusted central authority that can notify them if any risks are detected. However, this approach risks compromising individuals’ privacy, as the central authority may access sensitive information such as location, personal details, and social activity. The other approach involves distributed solutions that store contact history locally on the user’s device. This approach can help protect app users’ privacy by avoiding the disclosure of sensitive information. However, to assess the risk of virus exposure in a decentralized scheme, data from infected users must be accessed and compared with the contact history stored inside the phone, whereas in centralized systems, data from infected users are always kept in the authority domain. While different solutions have been proposed for centralized and distributed systems to protect privacy, limitations and risks cannot be entirely avoided.
solutions only consider close contacts for risk assessment, similar to the traditional contact tracing process. This means that to determine the potential contagion risk, a positive test has to be associated with each close contact. However, the accuracy of this approach is significantly impacted when asymptomatic cases are present or when individuals are reluctant to be tested, leading to a low detection rate [4]. To address this issue, tracing could consider contacts beyond close ones. These contacts, which we refer to as deep contacts, are not detected as close contacts but can be identified by tracing a chain of previous contacts. A deep contact is established between the two individuals at the ends of a chain of close contacts of length greater than 1. By tracing deep contacts, it may be possible to identify potential infection paths that would not be found by only considering direct contacts [5]. In particular, this enables the analysis of individuals’ risk even when their close contacts are all asymptomatic and may not have been tested. Detection of asymptomatic cases can also improve, since individuals can become more aware of possibly being infected by considering the contagion paths determined by deep contacts, which encourages getting diagnosed even without symptoms. Besides, since DCT impact is highest when the testing delay is low [6], tracing deep contacts enables access to diagnoses even before a close contact is tested, hence speeding up the risk assessment process. To ensure effective risk analysis in DCT, it becomes important to consider both close and deep contacts, especially when the detection rate is low.

Tracing deep contacts is done in traditional contact tracing and recently proposed DCT solutions, although recursively. In practice, close contacts of an infected case are expected to get tested, and if positive, a new tracing iteration is started to identify their close contacts. Even in scenarios where detection rates are high, tracing contagion paths toward all potentially infected persons can require several iterations, each of which introduces delays due to the need to test close contacts at each step. The slower the process, the more likely the spread of the virus. Tracing deep contacts opens new opportunities to analyze contagion risk and challenges in properly exploiting contact data collected by mobile devices.

A straightforward approach for tracing deep contacts would be storing close contacts from all users in a single database to establish contagion paths among individuals who might not have acknowledged any close contact between them. This goes beyond what centralized architectures currently propose, which isolate close contacts from users instead of enabling any connection among them. This aims to protect user privacy, mainly regarding social activity, but also to make the system scalable, as it may otherwise become too complex to store and process. Besides, DCT proposals based on ephemeral identifiers limit the capability of relating close contacts, as it may not be feasible to associate close contacts from the same user. As a result, a different strategy is required to trace deep contacts.

This article introduces and discusses a novel DCT approach to enable close and deep contact tracing using mobile devices. This is achieved by extending the mobile app’s capabilities to share relevant contacts via BLE toward users from which virus transmission is feasible. As a result, each device collects all relevant contacts that, given a positive test, can be used to assess the contagion risk better. In other words, each user shares with others all contacts from which a contagion path can be considered feasible, given the specific factors required for the virus to spread. These contacts include close ones, learned by detecting the presence of other users, and deep ones, learned instead by the information shared by other users during an encounter.

From Early Detection to Prevention
Early notification of contagion risk is among the most relevant opportunities this approach can offer. Users may get risk notifications even before close contacts are tested positive. Indeed, the infection period for COVID-19 typically begins before any symptom is present. This means that eventually, an individual may be warned of potential risks. At the same time, their close contacts are also notified of the risk, which helps to isolate all the contacts in the chain that may be affected. In particular, this could enable being notified as soon as possible, likely before starting the infection period.

Besides speed, our proposal can also offer the capability to monitor the spread of the virus. Like radar, risk notifications could indicate the distance of contacts from the positive cases. This could help to take preventive measures to avoid infection as much as possible. Even if no risk is found, the amount of
FIGURE 1. (a) Encounters between individuals (20 time units), (b) encounter graph up to instant 18, and (c) up to instant 20.
satisfies the temporal constraint. An important parameter of our model is $T_c$, the maximum time between when an individual becomes infected and stops being contagious. This period may depend on their vaccination history, previous infections, and other factors. Based on van Kampen et al. [10], we estimate $T_c$ as 14 days. In a contagion chain, the time between an encounter’s end time and the next one’s start time must be less than $T_c$. This ensures that if an individual is infected at the first timestamp, they are still contagious at the second timestamp, passing the infection to the subsequent individual in the chain. In infectious disease dynamics, the interval between when an individual is infected and becomes infectious is called the latent period. In our model, a lower bound for this interval is incorporated as parameter $T_l$, whose value is estimated as two days [11], [12] and plays an important role in the containment of the infection risk. Finally, in a contagion chain the time between an encounter’s end time and the next one’s start time must be equal to or greater than $T_l$.

In Figure 1(b), we show examples of contagion chains. Assuming $T_c = 5$ and, for simplicity, $T_l = 0$, there exists a chain from g to i (blue) and one from g to j (green). Each encounter is itself a contagion chain of unit length. Besides, notice that there is no possible chain from f to i because the end time of the encounter between f and h is too far away from the start time of any encounter between h and i.

Contagion Risk
We define two classes of contagion chains: confirmed and active chains. Both classes are valid at specific moments in time. A chain can become confirmed with the availability of new diagnoses. A chain can stop being active with the mere passage of time.

A confirmed chain begins with an infected individual and continues with a sequence of hosts that may be infected without knowing it. These hosts certainly could spread the infection. For a chain to be confirmed at instant t, it is enough that there is a positive diagnosis available at t for the individual at the beginning of the chain. The diagnosis must overlap temporally with the start time of the first encounter of the chain.

A chain is active when it can still be extended by incorporating new encounters, thus reaching new individuals. For a chain to be active at instant t, the time elapsed between the ending time of the chain’s last encounter and t must be less than $T_c$.

Thus, we can establish that the last individual of an active and confirmed chain is at risk of contagion. Since the chain is confirmed, there is a particular possibility that the infection has reached the individual. If true, the individual is still infected and contagious because the chain is active. In turn, while the chain remains active, a new encounter would indicate the new participant is at risk of contagion. This way, the contagion risk spreads from a diagnosed individual to other increasingly distant individuals.

We assume the existence of a set of diagnoses containing all available positive diagnoses up to the instant t. Each diagnosis of the form $(i, d_s, d_e)$ indicates the interval, from $d_s$ to $d_e$, in which infected individual i is contagious, and satisfies $d_e - d_s \le T_c$. Recalling the encounters of Figure 1(b): if there exists a diagnosis $(g, 10, 15)$, then h and i are at risk of contagion. Since the encounter between g and k is previous, the green chain does not involve an infection spread. However, at instant 20 [see Figure 1(c)], once the encounter between i and j begins, the chain from g is extended (in dots), putting j at risk.

The model presented so far is not intended to calculate the contagion risk effectively. It is impractical and insecure to depend on the availability of complete and centralized information on the encounters of all the individuals in a community. But the centralization of diagnoses may be reasonable as it is information protected by a health authority. The presented characterization allows us to define precisely what we understand by the risk of contagion and the information necessary to compute it. We will use this model to determine the scope of what we can compute in a more concise and distributed model while considering data volume, privacy, and security concerns.

A MODEL FOR DEEP CONTACTS
By analyzing the notion of contagion risk on an encounter graph, we can discover three key factors that allow us to design a distributed model for assessing the infection spread:

1) The contagion risk for a particular individual at a given time t depends on the contagion chains reaching them.
2) A contagion chain is determined by the individuals on both ends and the time restrictions of the encounters of the chain; hence, data from other individuals of the chain are not necessary.
3) Contagion chains become irrelevant over time depending on whether individuals at the origin have a positive diagnosis.

We can exploit these factors to define a model, where individuals only track information relevant to their risk.
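As a minimal sketch of the definitions above, the following Python snippet checks the temporal constraints of a contagion chain and whether a chain is confirmed and active at a given instant. The `Encounter` and `Diagnosis` types, the function names, the requirement that consecutive encounters share an individual, and the sample values are illustrative assumptions; they are not taken from the article or from Figure 1. Time is expressed in abstract units.

```python
from typing import NamedTuple, Sequence


class Encounter(NamedTuple):
    source: str  # individual who may pass the infection on (hypothetical field name)
    target: str  # individual who may receive it (hypothetical field name)
    start: int
    end: int


class Diagnosis(NamedTuple):
    individual: str
    start: int  # d_s: beginning of the contagious interval
    end: int    # d_e: end of the contagious interval


def is_contagion_chain(chain: Sequence[Encounter], t_c: int, t_l: int) -> bool:
    """Consecutive encounters must be linked and satisfy T_l <= gap < T_c."""
    if not chain:
        return False
    for prev, nxt in zip(chain, chain[1:]):
        gap = nxt.start - prev.end
        if prev.target != nxt.source or not (t_l <= gap < t_c):
            return False
    return True


def is_confirmed(chain: Sequence[Encounter], diagnoses: Sequence[Diagnosis]) -> bool:
    """A diagnosis of the first individual must overlap the first start time."""
    first = chain[0]
    return any(
        d.individual == first.source and d.start <= first.start <= d.end
        for d in diagnoses
    )


def is_active(chain: Sequence[Encounter], t: int, t_c: int) -> bool:
    """Less than T_c has elapsed since the end of the last encounter."""
    return t - chain[-1].end < t_c


def at_risk(chain, diagnoses, t: int, t_c: int, t_l: int) -> bool:
    """The last individual is at risk if the chain is valid, confirmed, and active."""
    return (
        is_contagion_chain(chain, t_c, t_l)
        and is_confirmed(chain, diagnoses)
        and is_active(chain, t, t_c)
    )


# Illustrative data only: a -> b -> c, with a diagnosed as contagious in [9, 13].
chain = [Encounter("a", "b", 10, 12), Encounter("b", "c", 14, 16)]
diagnoses = [Diagnosis("a", 9, 13)]
```

With these sample values and $T_c = 5$, $T_l = 0$, individual c is at risk at instant 18 but no longer at instant 21, when the chain stops being active. The `diagnoses` argument is assumed to hold only the diagnoses available at the instant being evaluated.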
The first step is to move to a more compact and unstructured representation. We replace the notion of encounter with that of contact between i and j, representing a chain of contagion from j to i. We leverage the explicit start timestamp of the first link and the ending timestamp of the last link in the chain. Also, we incorporate the length of the chain as the depth of the contact, distinguishing close from deep contacts, with depth equal to 1 and depth greater than 1, respectively.

An individual only has to be aware of their close and deep contacts to determine the possibility of infection. Thus, the notion of contact allows us to move from a centralized and structured global repository of information like the encounters graph to a local unstructured one for each individual. However, the amount of information required could scale significantly. Each final subchain in a chain of contagion is, by definition, a chain of contagion. Therefore, each participant in a chain reaching i defines a contact of i.

Recall the example of Figure 1(b). For i, the chain beginning at g is represented as the contact $(g, 14, 18, 2)$. The first encounter with h is represented as $(h, 8, 9, 1)$, and the second as $(h, 15, 18, 1)$.

Contact Computation
The main idea of a distributed model is that an individual can compute their contacts by recording the close contact with any other individual and from the contacts that the individual had with third parties. In a scheme where i and j keep an account of their contacts and can communicate them to each other, the management of contacts of i comprises three activities:

1) To determine and register the occurrence of the encounter with j.
2) To determine and transmit relevant contacts to j.
3) To derive own (deep) contacts from the contacts provided by j.

The detection of encounters and the communication aspects are outside the description of this model since they depend on the implementation technology. We will present some of its details later. According to the listed activities, the computation of contacts of i during their encounter with j is carried out as follows.

The encounter registration does not present major challenges once the technical aspects are resolved. After determining the start time $t_s$ and ending time $t_e$ of the encounter, and the identity of the participant j, i registers the close contact $(j, t_s, t_e, 1)$.

Regarding the information that must be transmitted, notice that notions of an active and confirmed chain of contagion extend directly to notions of active and confirmed contact. Thus, active and confirmed contacts of i determine their risk of being infected. During the encounter with j, all the active contacts of i, which satisfy the timing constraint imposed by the latent period, become active contacts of j. Therefore, i must transmit every active and confirmed contact $(f, t'_s, t'_e, n)$ to j as long as $t_s - t'_e \ge T_l$.

However, just transmitting these contacts is insufficient. There is a period between the moment an individual suspects being infected (e.g., due to the appearance of symptoms or having had an encounter with a confirmed case) and the moment the test is performed and the result is obtained. Fortunately, it is reasonable to assume that there is a limit to the time between an individual becoming infected and when the respective positive diagnosis is available. We encode this limit as a parameter of the model, named $T_d$. This parameter helps us to delimit the contacts with individuals who do not have a positive diagnosis but can still receive it, representing a risk if it happens. For the case of COVID-19, we can estimate $T_d$ as 6 days, considering 5 days for the appearance of symptoms plus one day for the test [11], [12]. For instance, while i and j encounter at $t_s$, there may be an ongoing analysis that determines in the future a diagnosis $(h, d_s, d_e)$ for some previous contact $(h, t''_s, t''_e, m)$ of i, with $d_s \le t''_s$ and $d_e \ge t''_s$. If $t_s - t''_s \le T_d$, contact information about h must be transmitted to j even if their diagnosis is unavailable.

With respect to how to derive deep contacts from the received information, when i receives a contact $(k, t'''_s, t'''_e, l)$ from j during the encounter ending at $t_e$, they only need to appropriately update timing and depth components, registering $(k, t'''_s, t_e, l + 1)$ as their contact.

Let us examine the encounter between i and j according to Figure 1(c). Assuming $T_c = 5$, $T_d = 4$, and a positive diagnosis $(g, 10, 15)$, at instant 19, i transmits to j the contacts $(g, 14, 18, 2)$, since it is active and confirmed, and $(h, 15, 18, 1)$, since it is active and $19 - 15 \le T_d$. However, contact $(h, 6, 7, 1)$ is not transmitted because it is inactive. In response, j registers contacts $(g, 14, 20, 3)$ and $(h, 15, 20, 2)$, respectively.

Finally, recall that the determination of relevant contacts, i.e., those contacts that must be transmitted, only depends on the availability of diagnosis and time restrictions. Thus, individuals can manage their contacts over time by registering encounters with
others, getting relevant contacts from them, and deleting irrelevant ones. Moreover, at any instant t, an individual will have received all active and confirmed contacts at t, either having received them as already confirmed or as contacts that obtained their confirmation after the reception. Therefore, the risk of contagion can be assessed using an individual’s own contacts in the same way as can be determined from the complete encounters graph.

Equivalent Contacts
There may exist situations where an individual will receive multiple similar contacts, matching the participant at the origin and the start time. Such situations can originate from multiple paths between a pair of individuals in the encounters graph and lead to considerable data circulation. Moreover, circulating data could grow indefinitely in some cases of cyclic paths.

It should be noted that the circular flow of contacts is not a problem of the model, but an intrinsic characteristic of the circulation of an infection. It is perfectly plausible that an infection passes through an individual and continues a path that, at some point, returns to the same individual. In such cases, if enough time has passed, this poses a real risk to the individual, who may become reinfected.

Fortunately, from the point of view of the contagion risk, similar contacts are equivalent. From the point of view of relevance for its transmission, the most recent contact subsumes any other. Although we have not considered it, a shorter chain implies a greater risk. Considering these aspects, individuals can merge multiple similar contacts into a unique contact integrating the most recent ending time and the shallowest depth.

Obsolete Contacts
Recall that a chain of contagion could be extended endlessly by new encounters as long as it remains active. This represents the possibility of infection continuing as long as those in close or deep contact with an infected individual continue having encounters with third parties. Consequently, active confirmed contacts will continue to be transmitted from individual to individual, increasing confirmed contacts circulating among the community. However, at the same time, we can speculate that the longer a confirmed chain of contagion becomes, the more likely other contacts representing the same path of infection will be confirmed. Thus, it is reasonable to establish the hypothesis that a confirmed contact represents a risk as long as it does not become obsolete, i.e., its depth or the time passed since its start time does not exceed certain limits. Such limits are incorporated as model parameters, whose value should be determined by expert knowledge or by simulating the infection dynamics. During encounters, individuals avoid transmitting obsolete contacts, reducing the circulation of confirmed contacts, but maintaining an appropriate notion of infection risk.

Extensions

Duration and Other Characteristics of Encounters
In general, exposure to a certain minimum amount of a pathogen is necessary to become infected, which translates into being in contact for a certain amount of time with an infected individual. This minimum period can be incorporated as a model parameter $T_m$, commonly established as 15 minutes in the case of COVID-19. Thus, an individual should register contacts from an encounter only if it lasts longer than $T_m$.

Also, the encounters could be characterized more qualitatively, including the physical distance between individuals and whether it occurs indoors or outdoors. Defining a criterion for registering close contacts using such characteristics would recursively limit the circulation of deep contacts.

Negative Test Results
Diagnosis sets could be extended to include negative test results. Such information can be used to prevent transmission of active contacts that certainly do not pose a contagion risk.

More Expressive Risk Functions
When defining the notion of contagion risk, it seems arbitrary that very deep contacts carry the same risk as close contacts. The most apparent improvement direction is moving from a binary notion to a risk scoring, considering the contact depth to weigh the risk.

In the same spirit, the number of occurrences of confirmed contacts could be incorporated. It must be taken into account that, basically, the risk is linearly proportional to the number of confirmed close
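The contact management rules described in the Contact Computation and Equivalent Contacts discussion (selecting the contacts to transmit, deriving deep contacts, and merging equivalent ones) can be sketched in Python as follows. This is a sketch under stated assumptions, not the authors’ implementation: the `Contact` and `Diagnosis` types and all function names are hypothetical, the latent-period constraint is applied to unconfirmed contacts as well as confirmed ones, and $T_l = 0$ is assumed for the worked example because the text does not state its value there.

```python
from typing import Iterable, List, NamedTuple


class Contact(NamedTuple):
    origin: str  # individual at the origin of the chain
    start: int   # start time of the first encounter in the chain
    end: int     # ending time of the last encounter in the chain
    depth: int   # 1 for close contacts, greater than 1 for deep contacts


class Diagnosis(NamedTuple):
    individual: str
    start: int
    end: int


def is_active(c: Contact, now: int, t_c: int) -> bool:
    return now - c.end < t_c


def is_confirmed(c: Contact, diagnoses: Iterable[Diagnosis]) -> bool:
    return any(
        d.individual == c.origin and d.start <= c.start <= d.end for d in diagnoses
    )


def contacts_to_transmit(contacts, diagnoses, t_s, t_c, t_l, t_d) -> List[Contact]:
    """Select the contacts that i sends to j in an encounter starting at t_s."""
    diagnoses = list(diagnoses)
    selected = []
    for c in contacts:
        # Inactive contacts, or contacts violating the latent period, are skipped.
        if not is_active(c, t_s, t_c) or t_s - c.end < t_l:
            continue
        # Confirmed contacts are sent; unconfirmed ones only while a
        # diagnosis could still arrive (t_s - start <= T_d).
        if is_confirmed(c, diagnoses) or t_s - c.start <= t_d:
            selected.append(c)
    return selected


def derive_contacts(received: Iterable[Contact], t_e: int) -> List[Contact]:
    """Update the ending time and increase the depth of each received contact."""
    return [Contact(c.origin, c.start, t_e, c.depth + 1) for c in received]


def merge_equivalent(contacts: Iterable[Contact]) -> List[Contact]:
    """Merge contacts with the same origin and start time, keeping the most
    recent ending time and the shallowest depth."""
    merged = {}
    for c in contacts:
        key = (c.origin, c.start)
        if key in merged:
            old = merged[key]
            c = Contact(c.origin, c.start, max(old.end, c.end), min(old.depth, c.depth))
        merged[key] = c
    return list(merged.values())


# Worked example from the text: T_c = 5, T_d = 4, diagnosis (g, 10, 15).
# T_l = 0 is an assumption made here.
held_by_i = [Contact("g", 14, 18, 2), Contact("h", 15, 18, 1), Contact("h", 6, 7, 1)]
diagnoses = [Diagnosis("g", 10, 15)]
sent = contacts_to_transmit(held_by_i, diagnoses, t_s=19, t_c=5, t_l=0, t_d=4)
registered_by_j = derive_contacts(sent, t_e=20)
```

Under these assumptions, `sent` holds the two contacts named in the worked example and `registered_by_j` holds $(g, 14, 20, 3)$ and $(h, 15, 20, 2)$. A filter for obsolete contacts (limits on depth and on time since the start) could be added as one more predicate in `contacts_to_transmit`; its thresholds are left out because the text does not fix them.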
of the so-called protocol data unit (PDU) segment within the BLE frame. Two modes are available for this: Legacy and Extended. In Legacy, the space available for the payload is up to 31 bytes, and it is transmitted only on the primary advertising channels. Despite the reduced size, this message is visible to any BLE scanner using BLE v4.0 onwards. On the other hand, in the Extended mode, the available space is up to 255 bytes, which is more suitable for deep contact sharing but requires BLE v5.0 onwards.

Exploitation of Technologies
We developed the first application version to publish BLE packets using Android SDK (Level 23). We prioritized compatibility with the largest fleet of devices possible, so the app advertises in BLE’s Legacy mode. Each BLE packet incorporates a 16-byte Universal Unique Identifier (UUID) into its PDU payload, ensuring the uniqueness of each device identification. We leverage half of the UUID to encode the user ID, leaving 23 bytes free in the payload to populate with more data. Advertisement of the user ID was enough since the goal of this first app was the determination of other devices’ presence (i.e., encounter detection).

Later, we prototyped a second version of the mobile app for deep contact sharing, including an advertiser module and a scanner module. These modules are responsible for sending and receiving information about deep contacts. Advertiser and scanner modules run in parallel at a predetermined time interval. A third component is responsible for calculating the risk of contagion using the stored information on user demand.

Contacts to be transmitted are computed once diagnosis information is received from a remote database. Relevant contacts are prioritized and encoded. Then, all available PDU payload space is used by the advertiser to share them with nearby devices, according to their priority order. The PDU payload carries both the user ID (8 bytes) and a sequence of encoded contacts, each of which includes the identity of the contact (8 bytes), the start time (4 bytes), the ending time expressed as duration (2 bytes), the depth (3 bits), the transmission power to infer distance (1 byte), and optional environment characteristics (5 bits). Thus, in Legacy mode, each packet allows only one contact to be sent, making the exchange of information very inefficient. We decided to move to Extended mode to send multiple contacts at once. This allows us to send up to 15 contacts per message every 100 ms. In this mode, we can exchange hundreds of contacts during one-second intervals; hence, the cost in terms of time for data exchange is very low.

When the scanner receives a packet, it determines as a first step if collected data initiates a new encounter or if it belongs to an encounter in progress. This depends on the time elapsed since the last packet was received from the same ID. In the first case, a new close contact is registered by setting the start and ending time at the current instant. In the second case, the ending time of the ongoing contact is updated. Next, the advertised contacts are extracted, evaluated, and consolidated as deep contacts in the local database, updating timing information if necessary.

OPEN CHALLENGES

Privacy
Our contribution aims at extending risk analysis for DCT solutions, but we have not delved into privacy preservation aspects. We acknowledge the concerns associated with using static identifiers for tracing individuals, particularly the potential for third-party tracking and the implications for the privacy of infected individuals. We claim that DCT poses no technical impediments to leveraging ephemeral identifiers like those used in the decentralized privacy-preserving proximity tracing (DP-3T) and widely adopted in the Google–Apple Exposure Notification (GAEN) system. Nevertheless, DCT methodology unveils a tradeoff between accuracy and privacy, contingent upon the frequency of ephemeral ID generation. An excessively high frequency (i.e., an interval of less than a day) may result in different edges in the graph corresponding to the same contact event, thereby compromising detection efficacy. On the other hand, if the interval between identifier changes is too long (i.e., over multiple days), it could weaken privacy measures. Likewise, it could potentially allow malicious actors to track contact patterns over significant periods [20]. To mitigate these risks, techniques like rate-limiting, anomaly detection, and even integrating cryptographic measures to ensure data integrity can be explored. Furthermore, our approach can be adapted to identify encounters rather than individuals, enhancing privacy at the expense of larger contact identifiers. The use of static identifiers is primarily for the purposes of our analysis of deep contact tracing, and privacy-preserving methods should be considered in the real-world implementation of deep contact tracing. The research and development of privacy-preserving enhancements are among the most relevant challenges in deep contact tracing.
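The payload layout described under Exploitation of Technologies can be illustrated with a short Python sketch that packs one contact into 16 bytes and computes how many contacts fit in a Legacy (31-byte) or Extended (255-byte) payload. The field order, the big-endian encoding, the placement of the depth in the upper 3 bits of a shared byte, and the function names are assumptions made for illustration; the article only gives the field sizes, and the prototype itself is an Android app.

```python
import struct
from typing import List, Tuple

USER_ID_SIZE = 8  # bytes reserved for the advertiser's user ID

# Assumed field order: contact ID (8 bytes), start time (4 bytes),
# duration (2 bytes), depth + environment (1 byte), transmission power (1 byte).
CONTACT_FORMAT = ">8sIHBb"
CONTACT_SIZE = struct.calcsize(CONTACT_FORMAT)  # 16 bytes


def contacts_per_packet(payload_size: int) -> int:
    """Number of whole contacts that fit after the user ID."""
    return (payload_size - USER_ID_SIZE) // CONTACT_SIZE


def encode_contact(contact_id: bytes, start: int, duration: int,
                   depth: int, environment: int, tx_power: int) -> bytes:
    if len(contact_id) != 8:
        raise ValueError("contact_id must be exactly 8 bytes")
    if not 0 <= depth < 8 or not 0 <= environment < 32:
        raise ValueError("depth needs 3 bits and environment needs 5 bits")
    # Depth goes in the upper 3 bits, environment flags in the lower 5 bits.
    packed = (depth << 5) | environment
    return struct.pack(CONTACT_FORMAT, contact_id, start, duration, packed, tx_power)


def decode_contact(data: bytes) -> Tuple[bytes, int, int, int, int, int]:
    contact_id, start, duration, packed, tx_power = struct.unpack(CONTACT_FORMAT, data)
    return contact_id, start, duration, packed >> 5, packed & 0x1F, tx_power


def build_payload(user_id: bytes, encoded: List[bytes], payload_size: int) -> bytes:
    """Concatenate the user ID with as many prioritized contacts as fit."""
    return user_id + b"".join(encoded[:contacts_per_packet(payload_size)])
```

With this layout, `contacts_per_packet(31)` gives 1 and `contacts_per_packet(255)` gives 15, which matches the one contact per Legacy packet and up to 15 contacts per Extended message reported in the text.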
attention due to its capability to register close contacts automatically, without any user intervention. In this work, we proposed deep DCT as a means to collect and process close contacts and all contacts from which the virus may have been transmitted. This enables risk analysis even when infected close contacts have not been tested. We presented a contagion chain computation and dissemination model with multiple practical uses, such as risk notifications and monitoring. The number of transmitted contacts per encounter was analyzed by simulating up to depths of three contacts, which seems suitable for real-life implementations. Then, based on experimental experience, we put forward concrete technical approaches to implement deep contact tracing in mobile devices. A series of open challenges comprising privacy, evaluation, storage, and energy motivates this emerging field of study.

REFERENCES
1. T. Jiang et al., “A survey on contact tracing: The latest advancements and challenges,” ACM Trans. Spatial Algorithms Syst., vol. 8, no. 2, pp. 1–35, 2022.
2. E. Hernandez-Orallo et al., “Evaluating how smartphone contact tracing technology can reduce the spread of infectious diseases: The case of COVID-19,” IEEE Access, vol. 8, pp. 99083–99097, 2020.
3. N. Ahmed et al., “A survey of COVID-19 contact tracing apps,” IEEE Access, vol. 8, pp. 134577–134601, 2020.
4. L. Ferretti et al., “Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing,” Science, vol. 368, no. 6491, May 2020, Art. no. eabb6936.
5. E. Hernández-Orallo et al., “A methodology for evaluating digital contact tracing apps based on the COVID-19 experience,” Sci. Rep., vol. 12, 2022, Art. no. 12728.
6. C. Boldrini, A. Passarella, and M. Conti, “Models for digitally contact-traced epidemics,” IEEE Access, vol. 10, pp. 106180–106190, 2022.
7. P. Gupta et al., “Proactive contact tracing,” PLoS Digit. Health, vol. 2, no. 3, 2023, Art. no. e0000199.
8. J. Lelieveld et al., “Model calculations of aerosol transmission and infection risk of COVID-19 in indoor environments,” Int. J. Environ. Res. Public Health, vol. 17, 2020, Art. no. 8114.
9. J. M. Robles-Romero et al., “Behaviour of aerosols and their role in the transmission of SARS-CoV-2; A scoping review,” Rev. Med. Virol., vol. 32, no. 3, 2022, Art. no. e2297.
10. J. van Kampen et al., “Duration and key determinants of infectious virus shedding in hospitalized patients with coronavirus disease-2019 (COVID-19),” Nature Commun., vol. 12, no. 1, 2021, Art. no. 267.
11. S. A. Lauer et al., “The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: Estimation and application,” Ann. Intern. Med., vol. 172, pp. 577–582, 2020.
12. S. Flaxman et al., “Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe,” Nature, vol. 584, pp. 257–261, 2020.
13. G. J. Soldano et al., “COVID-19 mitigation by digital contact tracing and contact prevention (app-based social exposure warnings),” Sci. Rep., vol. 11, no. 1, pp. 1–8, 2021.
14. P. Poletti et al., “Association of age with likelihood of developing symptoms and critical disease among close contacts exposed to patients with confirmed SARS-CoV-2 infection in Italy,” JAMA Netw Open, vol. 4, no. 3, 2021, Art. no. e211085.
15. L. Kahnbach et al., “Quality and adoption of COVID-19 tracing apps and recommendations for development: Systematic interdisciplinary review of European apps,” J. Med Internet Res., vol. 23, no. 6, 2021, Art. no. e27989.
16. P. Madoery et al., “Feature selection for proximity estimation in COVID-19 contact tracing apps based on Bluetooth Low Energy (BLE),” Pervasive Mobile Comput., vol. 77, 2021, Art. no. 101474.
17. L. Reichert et al., “A survey of automatic contact tracing approaches using Bluetooth Low Energy,” ACM Trans. Comput. Healthcare, vol. 2, no. 2, pp. 1–33, 2021.
18. P. C. Ng, P. Spachos, and K. N. Plataniotis, “COVID-19 and your smartphone: BLE-based smart contact tracing,” IEEE Syst. J., vol. 15, no. 4, pp. 5367–5378, Dec. 2021.
19. L. Flueratoru, V. Shubina, D. Niculescu, and E. S. Lohan, “On the high fluctuations of received signal strength measurements with BLE signals for contact tracing and proximity detection,” IEEE Sensors J., vol. 22, no. 6, pp. 5086–5100, Mar. 2022.
20. B. Sowmiya et al., “A survey on security and privacy issues in contact tracing application of Covid-19,” SN Comput. Sci., vol. 2, no. 3, 2021, Art. no. 136.
21. A. Akinbi, M. Forshaw, and V. Blinkhorn, “Contact tracing apps for the COVID-19 pandemic: A systematic literature review of challenges and future directions for neo-liberal societies,” Health Inf. Sci. Syst., vol. 9, no. 1, 2021, Art. no. 18.
22. M. Walrave et al., “Adoption of a contact tracing app for containing COVID-19: A health belief model approach,” JMIR Public Health Surveill., vol. 6, no. 3, 2020, Art. no. e20572.
RENATO CHERINI is currently an associate professor with the Universidad Nacional de Córdoba, Córdoba, 5000, Argentina. His research interests include network analysis and modeling. He received his Ph.D. degree in computer science from the Universidad Nacional de Córdoba. He is a member of IEEE. Contact him at [email protected].

RAMIRO DETKE is currently a graduate student in the engineering doctorate program with the Universidad Nacional de Córdoba, Córdoba, 5000, Argentina. His research interests include access and routing algorithms in contact networks. Contact him at [email protected].

JUAN FRAIRE is currently a researcher and an associate professor with the Universidad Nacional de Córdoba (Argentina) - CONICET, 5000, Argentina, and also Inria, Univ Lyon, INSA Lyon, Villeurbanne, France. He is also a guest professor with Saarland University, Germany. He received his Ph.D. degree in computer science from the Universidad Nacional de Córdoba. Contact him at [email protected].

PABLO G. MADOERY is currently an assistant professor with the Universidad Nacional de Córdoba, Córdoba, 5000, Argentina, and a postdoctoral fellow with Carleton University, Canada. His research interests include routing and transport protocols for satellite networks. He received his Ph.D. degree in computer science from Universidad Nacional de Córdoba. Contact him at [email protected].

JORGE M. FINOCHIETTO is currently a full professor with the Universidad Nacional de Córdoba, Córdoba, 5000, Argentina, and an independent researcher with CONICET, Argentina. His research interests include access and routing protocols for communications networks. He received his Ph.D. degree in electronic and communications engineering from Politecnico di Torino, Italy. He is the corresponding author of this article. Contact him at jorge.fi[email protected].
This article studies crowd dynamics forecast one week in advance to detect
irregular urban events, which plays an important role in infection prevention and
crowd control. Previous approaches have failed to deal with the scarcity of
anomalous events, resulting in a large model bias, and could not quantify the
number of visitors in anomalous crowding. We proposed an unbiased regression
using importance weighting (IW), called CityOutlook, and successfully reduced the
model bias and showed promising results. However, the straightforward weighting
of the scarce data risks leading to the instability of the model due to the increase in
model variance. To address this issue, we propose a nontrivial extension of our prior
work called CityOutlook+ that realizes unbiased and less-variant regression by
performing synthetic minority oversampling based on the importance. We evaluate
CityOutlook+ using real datasets and demonstrate the superiority of our model to
CityOutlook and state-of-the-art approaches.
1536-1268 © 2023 IEEE
Digital Object Identifier 10.1109/MPRV.2023.3312652
Date of publication 26 September 2023; date of current version 30 November 2023.

This study forecasts crowd dynamics one week in advance to detect regular and irregular events and counter anomalous people movements. Crowd dynamics, i.e., the changes in crowd density over time, significantly increase during unusual events and pose a tremendous threat to public safety (e.g., accidents or epidemics due to surging crowds). Early forecasting of crowd dynamics enables us to facilitate strategic planning for infection prevention and crowd control, such as allocating personnel for crowd management and medical resources. However, forecasting becomes challenging when it must cover both the normal dynamics (i.e., daily changes) and abnormal dynamics (i.e., changes under irregular events).

The rise of pervasive and ubiquitous computing, characterized by the seamless incorporation of technology into everyday life, is vividly demonstrated in our use of GPS-based mobility logs. These logs enable real-time analysis of crowd dynamics [1], [2], and simulating the crowd flows using regressive models in an online learning manner is one of the prominent methods [3] for forecasting anomalous crowds; however, these approaches cannot provide long-term predictions (e.g., one week ahead) because the crowd flow starts to change only just before the anomalies. Alternatively, given that people’s behavioral schedules reflect future human mobility patterns, empowering the early forecast with people’s schedule patterns using additional data (e.g., search histories of train transit) has also been explored [4], [5].

However, as yet, there are no methods to successfully forecast in advance the number of visitors in anomalous crowding because the existing methods suffer from the rarity of anomalous events and, consequently, the problem of data imbalance. Figure 1 shows the crowd dynamics over several days and its degree of irregularity, which we call the irregularity score. Most of the data are normal, and the number of anomalous
26 IEEE Pervasive Computing Published by the IEEE Computer Society October-December 2023
PANDEMIC PREPAREDNESS WITH PERVASIVE COMPUTING
records is limited. This leads to significant estimator bias in a regression model of anomalous crowd dynamics, as the model cannot fit the anomalous data, although it represents normal patterns of dynamics well.

FIGURE 1. Crowd dynamics and irregularity scores in Meiji Jingu Shrine.

Our prior work [6] proposed CityOutlook to overcome the challenges and limitations of related work by using density ratio-based importance weighting (IW) [7] for an unbiased estimate of the distribution of anomalous data. However, weighting the anomalous data causes prediction instability due to the model becoming more sensitive to the noise in the anomaly data, leading to overfitting and increased model variance. Solving this problem is inherently challenging due to the tradeoff between bias and variance.

Motivated by this challenge, we propose a nontrivial extension of our previous work, CityOutlook+, which uses synthetic sampling of data to achieve unbiased and less-variant regression. We combine IW with synthetic minority oversampling (SMOTE) [8] to reduce model variance. Although SMOTE and its family [9] aim to suppress overfitting to minority, i.e., scarce, data by synthesizing internally dividing points of existing data, a way of theoretically determining the number of sampled data to reduce the bias has not been established. We address this issue by developing a new resampling algorithm that calculates the number of samples based on the importance.

Our contributions are summarized as follows:

RELATED WORK
Crowd Dynamics Forecast: Mobile device-based location history has facilitated the forecasting of crowd dynamics. Fan et al. [10] and Jiang et al. [3] proposed online learning-based systems using human-mobility logs to predict crowd flow, but they lacked long-term prediction of anomalies. Researchers explored using schedule-based features like transit search applications [4], [5]; however, due to the rarity of anomalous events, these methods cannot provide accurate early forecasts (e.g., one week ahead).

Imbalanced Learning tackles learning patterns from imbalanced data [11] and is extensively researched in various machine learning fields. Resampling [8] and cost-sensitive learning [12] approaches are used for classification problems. A few studies focused on regression problems. Prior studies have proposed sigmoid-like relevance [13] and an extended SMOTE [8] method for regression. However, these methods are unsuitable for forecasting normal and anomalous crowd dynamics due to the limited applicability of the output probability distribution.

Importance Weighting [7], [14] has conventionally been used for penalizing the loss function under covariate shift [15], which is formulated as a shift in the distribution of explanatory variables between source and target data. The importance is used to reweight source data as an unbiased estimate for target data functions [7]. However, dividing the dataset makes it difficult to apply IW to the regression task, and penalizing scarce patterns increases the variance of the estimator, which is problematic.

Our prior work [6] was established to address these issues, and provided an effective measure for the relevance of anomalies by applying IW to the crowd dynamics modeling.
PRELIMINARIES
› We extend our prior work of unbiased regres-
sion6 to realize both unbiased and less-variant Variable Definition and Problem
prediction for early crowd dynamics forecast. Statement
› We propose a novel regression framework with Let t be a time segment on a day, and each day be
importance estimation-based resampling, called divided into T time segments (i.e., t ¼ 1; 2; . . .; T ). In
CityOutlook+, to robustly model both normal addition, l denotes the point of interest (POI), which is
and abnormal crowd dynamics. a certain urban region on which we are focusing.
Definition 1 (Ground-Truth Crowd Density): GPS-based mobility logs are used to address crowd density patterns. The number of mobility records in an area at a specific time is the ground-truth crowd density. The observed crowd density at a POI $l$ on the date $d$ and time segment $t$ is denoted by $y_{d,t}^{(l)}$.

Definition 2 (Scheduled Crowd Density Set): The study defines the scheduled crowd density for future crowd dynamics and uses transit search logs to obtain the scheduled crowd density patterns. The transit search history consists of query records with scheduled date $d$, time $t$, searching date $d'$, and destination POI $l$. $s_{d,t|d-i}^{(l)}$ denotes the number of logs for scheduled date $d$ and time $t$, searched $i$ days before the date $d$. Then, we define the scheduled crowd density set $\mathcal{S}_{d,t}^{(l)}$ as $\mathcal{S}_{d,t}^{(l)} = \{ s_{d,t|d-i}^{(l)} \mid i = p_d, p_d + 1, \ldots, p_d + p_w \}$, where $p_d$ is the earliest day before the scheduled date, and $p_w$ denotes the range of days.

Definition 3 (Crowd Dynamics Irregularity Score): We define irregular crowd dynamics as abnormally high crowd density. Hence, the irregularity score should be large for anomalous crowds to reflect the high degree of congestion and close to zero for no crowds. The crowd dynamics irregularity score $\nu_{d,t}^{(l)}$ represents the deviation of ground-truth crowd density $y_{d,t}^{(l)}$ from normal dynamics $\bar{y}_{d,t}^{(l)}$ as $\nu_{d,t}^{(l)} = (y_{d,t}^{(l)} - \bar{y}_{d,t}^{(l)}) / \bar{y}_{d,t}^{(l)}$.

Definition 4 (Early Crowd Dynamics Forecast): One week before events, using scheduled crowd density $\mathcal{S}_{d,t}^{(l)}$, where $p_d = 7$, and normal crowd dynamics $\bar{y}_{d,t}^{(l)}$, we forecast normal and abnormal crowd dynamics at a POI $l$ on date $d$ and time $t$.

Supervised-CityProphet: Baseline Approach
Drawing on the previous work,5 we design a predictive model of the irregularity score $\hat{\nu}$. In this model, the crowd anomaly is forecasted by associating the mobility logs with the schedule patterns given by transit search logs.

We build a regression function $\hat{\nu}_{d,t}^{(l)} = f(\phi_{d,t}^{(l)}; \theta)$, where $\theta$ is the learning parameter. $\phi_{d,t}^{(l)} \in \mathbb{R}^{p_w}$ is the schedule deviation score calculated from the scheduled crowd density set $\mathcal{S}_{d,t}^{(l)}$ and the normal scheduled crowd density $\bar{s}_{d,t}^{(l)}$. This is expressed as follows:

$$\phi_{d,t}^{(l)} = \left\{ \phi_{d,t-j|d-i}^{(l)} \;\middle|\; \phi_{d,t-j|d-i}^{(l)} = \frac{s_{d,t-j|d-i}^{(l)} - \bar{s}_{d,t}^{(l)}}{\bar{s}_{d,t}^{(l)}} \right\} \tag{1}$$

where $s_{d,t-j|d-i}^{(l)} \in \mathcal{S}_{d,t-j}^{(l)}$, and $j = -1, 0, 1$. Based on the defined terms, we formulate the irregularity prediction model $f(\phi_{d,t}^{(l)}; \theta)$ using an autoregressive model16 as follows:

$$\hat{\nu}_{d,t}^{(l)} = f(\phi_{d,t}^{(l)}; \theta) = [1, \phi_{d,t}^{(l)\top}]\,\theta = \sum_{i=p_d}^{p_d+p_w-1} \sum_{j=-1}^{1} \theta_{i,j}\, \phi_{d,t-j|d-i}^{(l)} + \theta_c \tag{2}$$

where $\theta \in \mathbb{R}^{3p_w+1}$, whose elements are denoted by $\theta_{i,j}$ and $\theta_c$. To simplify the notation for readability, we omit $l$, $d$, and $t$ from the description.

The learning parameters are inferred by minimizing the ordinary least squares (OLS) loss, $L(\nu, f(\phi; \theta))$, as follows:

$$\min_{\theta \in \Theta} L(\nu, f(\phi; \theta)) = \min_{\theta \in \Theta} \left[ \frac{1}{N} \sum_{n} \left( \nu_n - f(\phi_n; \theta) \right)^2 \right] \tag{3}$$

where $\Theta$ is the parameter space, and $N$ and $n$ denote the number of data and its index, respectively. Bilinear Poisson regression2 is used to estimate the normal crowd dynamics $\bar{y}_{d,t}^{(l)}$ and $\bar{s}_{d,t}^{(l)}$ by predicting them from contextual factors such as holidays, weekdays/weekends, and weather. This model is described in detail in the experiments section.

However, as discussed in our prior work,6 minimizing OLS loss fails to robustly capture anomalous crowd dynamics patterns. This is because minimizing OLS loss in (3) can be regarded as learning normal crowd dynamics, and there is still a significant bias in the regression model on the abnormal patterns. For this reason, we consider defining the criteria for the relevance of data and penalizing the OLS loss of each training data.

CITYOUTLOOK+: PROPOSED METHOD
In this section, we present our proposed method CityOutlook+, which is an extension of CityOutlook leveraging the concept of synthetic minority oversampling (SMOTE). First, we review the research challenges of the baseline approach and illustrate the framework overview of the proposed method. Then, we describe data preprocessing and the IW-based unbiased regression proposed in our prior work.6 Finally, we extend our prior work for realizing the unbiased and less-variant regression, i.e., CityOutlook+.

Research Challenges and Basic Concept
We focus on IW as a powerful indicator to penalize the loss. However, it is nontrivial to employ the IW
technique for two reasons: 1) Dividing the set of input data into normal and abnormal data is challenging because the abnormality of the input depends in a complicated way on contextual factors such as weekday-or-not and schedules for 1-week-ahead or 10-days-ahead. This issue makes it difficult to apply the IW to crowd dynamics forecasting. 2) Straightforward weighting of abnormal data results in model instability owing to the large variance of the estimator. Weighting the loss of the abnormal data makes the model more sensitive to the inherent data noise. This means that the estimator overfits the data noise and increases the variance error on the prediction. Therefore, forecasting a crowd anomaly becomes much more difficult even if the importance is established.

To address the aforementioned issues, we focus on the heterogeneous properties of mobility logs and data augmentation by synthetic minority oversampling.8 The basic concept of CityOutlook+ is 1) to design a data anomaly annotation strategy with the heterogeneous property of mobility data, and 2) to build an importance-based resampling approach to augment data and mitigate the learning instability.

The overall framework of CityOutlook+ is illustrated in Figure 2. It uses the dataset of schedule deviation score and crowd dynamics irregularity score based on the mobility logs and transit search histories, as discussed in the preliminary section. To build an unbiased regression model, we first review the data preprocessing [see Figure 2(a)], describe a heterogeneous anomaly-aware annotation scheme for

Data Preprocessing
We preprocess the crowd density $y_{d,t}^{(l)}$ and scheduled crowd density set $\mathcal{S}_{d,t}^{(l)}$ following the previous work5 to obtain $(\nu, \phi)$, where $\nu$ is the irregularity score of crowd dynamics as defined in Definition 3, and $\phi$ is the schedule deviation score based on (1). To estimate the normal crowd dynamics $\bar{y}_{d,t}^{(l)}$ and $\bar{s}_{d,t}^{(l)}$, we use bilinear Poisson regression.2 This method assumes normal crowd dynamics can be modeled from external contextual factors and a time factor. For these factors, we used holiday-or-not, weekday-or-weekend, and weather information. Based on one-hot encoding, the holiday-or-not and weekday-or-weekend features are 2-D vectors. Weather information is a 4-D vector: sunny, cloudy, rainy, and the others. We use the tensor product to compose these features into one input vector and regress the normal density by forming a bilinear representation with a time factor.

Importance Setup by Heterogeneous Anomaly-Aware Annotation Scheme
As discussed in the preliminaries section, minimizing OLS loss suffers from a significant bias on abnormal patterns. To address this issue, we define the relevance of data and penalize the OLS loss. As proposed in our prior work,6 we used the density-ratio-based importance for defining the relevance. With the importance $w(\phi) = \frac{p(s=1|\phi)}{p(s=0|\phi)}$, the importance-weighted least squares loss can be minimized as follows:

$$\min_{\theta \in \Theta} \left[ \frac{1}{N} \sum_{n} \frac{p(s_n = 1 | \phi_n)}{p(s_n = 0 | \phi_n)} \left( \nu_n - f(\phi_n; \theta) \right)^2 \right]. \tag{4}$$
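The quantities defined so far can be illustrated with a minimal Python sketch: the irregularity score of Definition 3, the schedule deviation score of (1), the linear predictor of (2), and the importance-weighted loss of (4). The function names, the toy numbers, and the weights are hypothetical and chosen only for illustration; in particular, the weights here are fixed by hand rather than estimated from densities.

```python
def irregularity_score(y, y_bar):
    """Definition 3: relative deviation of observed density from normal density."""
    return (y - y_bar) / y_bar


def schedule_deviation(scheduled, s_bar):
    """Eq. (1): relative deviation of each scheduled count from the normal count."""
    return [(s - s_bar) / s_bar for s in scheduled]


def predict(phi, theta, theta_c):
    """Eq. (2): linear model over the schedule deviation score plus an intercept."""
    return sum(t * p for t, p in zip(theta, phi)) + theta_c


def weighted_ls_loss(samples, weights, theta, theta_c):
    """Eq. (4): mean of importance-weighted squared residuals."""
    total = 0.0
    for (phi, nu), w in zip(samples, weights):
        total += w * (nu - predict(phi, theta, theta_c)) ** 2
    return total / len(samples)


# Hypothetical toy values, not taken from the article's data.
nu = irregularity_score(y=300.0, y_bar=100.0)
phi = schedule_deviation([30.0, 45.0, 60.0], s_bar=15.0)
samples = [(phi, nu), ([0.0, 0.0, 0.0], 0.0)]
theta, theta_c = [0.5, 0.25, 0.0], 0.0

plain_loss = weighted_ls_loss(samples, [1.0, 1.0], theta, theta_c)
weighted_loss = weighted_ls_loss(samples, [4.0, 1.0], theta, theta_c)
```

With unit weights the loss reduces to the OLS loss of (3); raising the weight of the first (anomalous) sample makes its residual dominate the objective, which is the effect the importance is meant to have.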
crowd dynamics learning. This scheme refers to the crowd dynamics irregularity score $\nu$, which is defined based on the number of mobility logs, and explicitly defines the anomaly labels for the input $\phi$. We spuriously separate the input dataset by using the upper bound of the normality $\nu_{\mathrm{thre}}$. The normal input dataset $D_{\mathrm{no}}$ and anomalous input dataset $D_{\mathrm{ano}}$ are defined as follows:

$$D_{\mathrm{no}} = \{ \phi \mid (\nu, \phi),\ \nu < \nu_{\mathrm{thre}} \} \tag{5}$$

$$D_{\mathrm{ano}} = \{ \phi \mid (\nu, \phi),\ \nu_{\mathrm{thre}} \le \nu \}. \tag{6}$$

We estimate the densities $p(s = 0, \phi)$ and $p(s = 1, \phi)$, respectively, in a nonparametric manner by using kernel density estimation17 with a Gaussian kernel as follows:

$$p(s = 0, \phi) = \frac{1}{|D_{\mathrm{no}}|} \sum_{\phi_i \in D_{\mathrm{no}}} \frac{1}{(2\pi h^2)^{D/2}} \exp\left\{ -\frac{\|\phi - \phi_i\|^2}{2h^2} \right\} \tag{7}$$

$$p(s = 1, \phi) = \frac{1}{|D_{\mathrm{ano}}|} \sum_{\phi_j \in D_{\mathrm{ano}}} \frac{1}{(2\pi h^2)^{D/2}} \exp\left\{ -\frac{\|\phi - \phi_j\|^2}{2h^2} \right\} \tag{8}$$

where $h$ denotes the Gaussian kernel width, and $D = 3p_w + 2$. In practice, we use relative importance14 to prevent learning instability caused by importance explosion and allow the model to learn both normal and abnormal patterns. This is defined as follows:

$$\tilde{w}(\phi) = \frac{p(s = 1 | \phi)}{\beta\, p(s = 1 | \phi) + (1 - \beta)\, p(s = 0 | \phi)} = \frac{p(s = 1, \phi)}{\beta\, p(s = 1, \phi) + (1 - \beta)\, p(s = 0, \phi)} \tag{9}$$

where $\beta \in [0, 1]$ is a hyperparameter.

Importance-Based Minority Sampling
IW provides an efficient and quantitative learning norm for the data imbalance issue; however, the scarcity of anomalous patterns due to the limited sample size of abnormal data makes it difficult to robustly learn the anomaly. This is because the small sample size of anomalous data results in a large variance of the estimator, especially on the weighted loss function.

We overcome this issue by leveraging another perspective of a data augmentation strategy, called synthetic minority oversampling.8 In this strategy, the anomalous data are synthetically oversampled by generating a dividing point, which divides the data and its neighbors internally. Intuitively, the model learns noise that enhances its representative power through synthetic oversampling-based data augmentation.

However, the sampling criteria for the regression approach have not been discussed in previous studies because they have mainly focused on the resampling of classification tasks, whose oversampling criteria can be easily obtained based on the number of samples belonging to minority classes.

Algorithm 1. Importance-based Minority Resampling
Input: D, k
 1: // D - dataset of (φ, ν)
 2: // k - number of nearest neighbors
 3: newD ← {}, D_no ← {}, D_ano ← {}
 4: Divide D into D_no and D_ano.
 5: Estimate p(s = 0, φ) by (7).
 6: Estimate p(s = 1, φ) by (8).
 7: for all (φ_n, ν_n) ∈ D do
 8:     newCases ← {}
 9:     case ← (φ_n, ν_n)
10:     // estimate the importance
11:     w̃_n ← w̃(φ_n) by (9).
12:     if w̃_n ≥ 2 then
13:         nns ← KNN(k, case, D \ {case}) // k-nearest neighbors
14:         for i ← 1 to ⌊w̃_n⌋ do
15:             // importance-based synthetic minority oversampling
16:             (φ_nns, ν_nns) ← randomly choose one of the nns
17:             for all j ∈ indices of φ_nns do
18:                 diff ← φ_nns[j] − φ_n[j]
19:                 φ_new[j] ← φ_n[j] + RANDOM(0, 1) × diff
20:             end for
21:             d1 ← DIST(φ_new, φ_n) // Euclidean distance
22:             d2 ← DIST(φ_new, φ_nns) // Euclidean distance
23:             ν_new ← (d2 × ν_n + d1 × ν_nns) / (d1 + d2)
24:             new ← (φ_new, ν_new)
25:             newCases ← newCases ∪ {new}
26:         end for
27:     else
28:         // importance weighting for least squares loss.
29:         new ← (√w̃_n φ_n, √w̃_n ν_n).
30:         newCases ← newCases ∪ {new}
31:     end if
32:     newD ← newD ∪ {newCases}
33: end for
Output: newD - resampled dataset

Therefore, we extend the algorithm of synthetic minority oversampling for our regression task with the sampling criteria based on the importance. We present a new sampling algorithm for the regression task in Algorithm 1. We build on the principle that learning a training sample with an importance of 10 is the same as learning ten identical samples. To implement this principle in the algorithm, the importance is estimated for each training sample (line 11), and the training sample is oversampled as many times as its importance (lines 12–26).
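The density estimates of (7) and (8), the relative importance of (9), and the resampling loop of Algorithm 1 can be sketched in plain Python as follows. This is an illustrative reading of the pseudocode under stated assumptions, not the authors' implementation: the function names, the toy dataset, and the kernel width `h = 1.0` are hypothetical, the comparison in line 12 is taken to be "greater than or equal to", and both the normal and anomalous sets are assumed to be non-empty.

```python
import math
import random


def kde(x, points, h):
    """Gaussian kernel density estimate at x, as in Eqs. (7)-(8)."""
    norm = (2.0 * math.pi * h * h) ** (len(x) / 2.0)
    total = 0.0
    for p in points:
        sq = sum((a - b) ** 2 for a, b in zip(x, p))
        total += math.exp(-sq / (2.0 * h * h)) / norm
    return total / len(points)


def relative_importance(x, normal, anomalous, h, beta):
    """Eq. (9); for beta > 0 the value cannot exceed 1 / beta (up to rounding)."""
    p1 = kde(x, anomalous, h)
    p0 = kde(x, normal, h)
    return p1 / (beta * p1 + (1.0 - beta) * p0)


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def resample(data, k, nu_thre, h, beta, rng):
    """Sketch of Algorithm 1; data is a list of (phi, nu) pairs."""
    normal = [phi for phi, nu in data if nu < nu_thre]        # Eq. (5)
    anomalous = [phi for phi, nu in data if nu >= nu_thre]    # Eq. (6)
    new_data = []
    for idx, (phi, nu) in enumerate(data):
        w = relative_importance(phi, normal, anomalous, h, beta)
        if w >= 2.0:
            # k nearest neighbours among all other samples (line 13)
            others = [c for j, c in enumerate(data) if j != idx]
            nns = sorted(others, key=lambda c: dist(c[0], phi))[:k]
            for _ in range(int(math.floor(w))):
                phi_nn, nu_nn = rng.choice(nns)
                # a fresh random factor per feature, as in lines 17-20
                phi_new = [a + rng.random() * (b - a)
                           for a, b in zip(phi, phi_nn)]
                d1, d2 = dist(phi_new, phi), dist(phi_new, phi_nn)
                if d1 + d2 == 0.0:
                    nu_new = nu  # neighbour coincides with the sample
                else:
                    nu_new = (d2 * nu + d1 * nu_nn) / (d1 + d2)
                new_data.append((phi_new, nu_new))
        else:
            # importance weighting via sqrt scaling (line 29)
            r = math.sqrt(w)
            new_data.append(([r * a for a in phi], r * nu))
    return new_data


# Hypothetical toy data: six normal samples near the origin, three anomalies.
data = [([0.0, 0.0], 0.1), ([0.2, -0.1], 0.0), ([-0.1, 0.1], 0.2),
        ([0.1, 0.2], -0.1), ([0.0, -0.2], 0.1), ([-0.2, 0.0], 0.0),
        ([5.0, 5.0], 8.0), ([5.5, 4.5], 9.0), ([4.5, 5.2], 7.0)]
resampled = resample(data, k=2, nu_thre=6.0, h=1.0, beta=0.1,
                     rng=random.Random(0))
```

In this toy setting the anomalies receive an importance close to the upper bound 1/β and are each replaced by several synthetic neighbors, while the normal samples are only rescaled. As in the pseudocode, the original anomalous sample itself is not copied into the output; only its synthetic variants are.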
Parameter Learning and Dynamics Forecast
In the learning process of the proposed model, we minimize the least squares loss based on the resampled dataset. The learned parameter $\hat{\theta}$ can be obtained by solving the following optimization problem:

$$\hat{\theta} = \arg\min_{\theta \in \Theta} \left[ \frac{1}{N'} \sum_{n=1}^{N'} L(\nu_n, f(\phi_n; \theta)) + \gamma \|\theta\|_2^2 \right] \tag{10}$$

where $L$ is the least squares loss, $N'$ is the size of the oversampled dataset newD, and $\gamma \|\theta\|_2^2$ is the L2 regularization term with hyperparameter $\gamma$.

In the forecasting process, the crowd density $\hat{y}_{d,t}^{(l)}$ is recovered from the inferred irregularity score $\hat{\nu}_{d,t}^{(l)}$ as $\hat{y}_{d,t}^{(l)} = (1 + \hat{\nu}_{d,t}^{(l)})\, \bar{y}_{d,t}^{(l)}$, where $\bar{y}_{d,t}^{(l)}$ is the normal dynamics.

EXPERIMENTS
Dataset
We evaluated the models based on two real datasets: the GPS-based mobility logs and transit search logs. The mobility logs were collected via a disaster alert mobile applicationa from Yahoo! JAPAN by masking user IDs with dummies. Each record was completely anonymized, and characterized by timestamp, latitude, and longitude. We aggregated the mobility logs in the POIs at each time segment, and counted their number as crowd dynamics. Hence, we did not use any dataset including personally identifiable information for analyzing the data and building the model.

For the scheduled crowd dynamics, we also utilized transit search history data, which were searched by passengers of train, bus, or taxi. These logs are gathered by the transit search engine,b also released by Yahoo! JAPAN. Each record contains an anonymized user ID, searching timestamp, scheduled timestamp, and destination. The destination mainly denotes train stations, but it is sometimes set to places where events occur. Therefore, we used such records as the transit search logs. We summed the number of search records per station and time segment; therefore, we did not use any personal information for model learning.

Over six months (from October 1, 2019, to March 31, 2020), we used 58 POIs and their corresponding stations, including the Greater Tokyo Area, stadiums, shrines, and fireworks venues. Each POI is 600 × 600 m², and we counted the mobility logs at each time segment as crowd dynamics.

a [Online]. Available: htt_p://emg.yahoo.co.jp/
b [Online]. Available: htt_ps://transit.yahoo.co.jp/

Experimental Setups
$T$ is set to 24 (i.e., one time segment denotes a 1-h period). Following previous research,2 the start of a day was 3:00 AM, which had the least active population, and the end was 3:00 AM the next day (i.e., 27:00 in 24-h notation). We used the scheduled crowd dynamics observed one week in advance; thus, $p_d = 7$. We also set $p_w = 7$ to consider people's schedule patterns specified two weeks before the day of the event. For the regularization term, we set $\gamma = 0.01$. For the hyperparameter settings of the evaluation, we set $\beta = 0.1$, $\nu_{\mathrm{thre}} = 6.0$, $h = 5.0$.

We adopted a mean absolute error (MAE)-based metric to evaluate our model. The robustness of ordinary MAE to outliers prevents proper evaluation of the forecasting performance during anomalous crowding. To address this, we used the MAE conditioned by the irregularity score-based threshold $\nu$ to measure the performance on normal and anomalous crowding, respectively. We defined normal sample (NS)-MAE, which calculates the MAE with samples whose irregularity score is less than $\nu$, and anomalous sample (AS)-MAE, which calculates the MAE with samples whose irregularity score is more than $\nu$. If NS-MAE is evaluated with a small threshold $\nu$, the performance is evaluated only for daily normality. Furthermore, if the AS-MAE is evaluated with a large threshold $\nu$, performance is assessed on the exceptional anomalous crowding.

Comparative Models
CityProphet,4 Supervised-CityProphet (SCP),5 bilinear Poisson regression2 (BPReg), and CityOutlook6 served as comparison models. BPReg is dedicated to forecasting normal dynamics, as stated in the model setting section. CityProphet inputs context information and scheduled crowd dynamics and proposes two models: schedule-based population (SP) and descriptor-based population (DP). Supervised-CityProphet is introduced in the preliminary section. CityOutlook's parameters were optimized via importance-weighted least squares loss; it is the proposed CityOutlook+ without importance-based synthetic minority oversampling.

Experimental Results
Overall Performance Comparison
Table 1 shows the overall evaluation in irregularity score forecast (Score) and crowd density forecast (Density).
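The forecasting step and the conditioned metrics described above can be sketched in a few lines of Python. The function names and the four toy samples are hypothetical, and the sketch assumes that samples are split into normal and anomalous groups by their ground-truth irregularity score; samples whose score equals the threshold fall into neither group, following the "less than" and "more than" wording of the definitions.

```python
def recover_density(nu_hat, y_bar):
    """Crowd density from a predicted irregularity score: (1 + nu_hat) * y_bar."""
    return (1.0 + nu_hat) * y_bar


def conditioned_mae(truth, pred, scores, threshold, anomalous):
    """MAE over samples whose score is above (AS) or below (NS) the threshold."""
    pairs = [(t, p) for t, p, s in zip(truth, pred, scores)
             if (s > threshold if anomalous else s < threshold)]
    if not pairs:
        return None  # no sample satisfies the condition
    return sum(abs(t - p) for t, p in pairs) / len(pairs)


# Hypothetical toy values, not taken from the article's data.
y_bar = [100.0, 100.0, 100.0, 100.0]      # normal dynamics
nu_true = [0.0, 0.5, 12.0, 20.0]          # ground-truth irregularity scores
nu_pred = [0.5, 0.0, 10.0, 16.0]          # predicted irregularity scores
y_true = [recover_density(n, b) for n, b in zip(nu_true, y_bar)]
y_pred = [recover_density(n, b) for n, b in zip(nu_pred, y_bar)]

threshold = 10.0
ns_score = conditioned_mae(nu_true, nu_pred, nu_true, threshold, anomalous=False)
ns_density = conditioned_mae(y_true, y_pred, nu_true, threshold, anomalous=False)
as_score = conditioned_mae(nu_true, nu_pred, nu_true, threshold, anomalous=True)
as_density = conditioned_mae(y_true, y_pred, nu_true, threshold, anomalous=True)
```

Because the anomalous group is usually small, AS-MAE isolates the errors that an ordinary MAE over all samples would average away.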
TABLE 1. Performance comparison for forecasting one week in advance on 58 POIs across different thresholds.

NS-MAE
Model              ν = 10.0            ν = 15.0            ν = 20.0
                   Score    Density    Score    Density    Score    Density
CityProphet4       3.684    210.135    3.684    210.488    3.694    210.586
SCP (Baseline)5    0.566     91.951    0.570     92.004    0.575     92.060
CityOutlook6       0.796     93.881    0.801     93.921    0.805     93.968
CityOutlook+       0.772     98.854    0.778     98.901    0.783     98.948

AS-MAE
Model              ν = 10.0            ν = 15.0            ν = 20.0
                   Score    Density    Score    Density    Score    Density
CityProphet4       25.749   234.710    29.530   165.390    33.751   129.372
SCP (Baseline)5    15.695   109.380    23.871   144.344    34.063   140.510
CityOutlook6       14.482   102.163    23.132   138.420    32.698   132.301
CityOutlook+       15.057   102.122    23.199   137.983    32.073   126.718
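The relative improvements reported for Table 1 can be reproduced from its AS-MAE columns. The short Python check below computes the percentage reduction in error of CityOutlook+ relative to SCP and to CityOutlook; the helper name `improvement` is hypothetical, and the results agree with the figures quoted in the text to within rounding.

```python
def improvement(baseline, proposed):
    """Percentage reduction in error relative to the baseline."""
    return 100.0 * (baseline - proposed) / baseline


# AS-MAE values from Table 1, ordered by threshold (10.0, 15.0, 20.0).
scp_score = [15.695, 23.871, 34.063]
plus_score = [15.057, 23.199, 32.073]
scp_density = [109.380, 144.344, 140.510]
plus_density = [102.122, 137.983, 126.718]
cityoutlook_density_20 = 132.301

score_gain = [improvement(b, p) for b, p in zip(scp_score, plus_score)]
density_gain = [improvement(b, p) for b, p in zip(scp_density, plus_density)]
gain_over_cityoutlook = improvement(cityoutlook_density_20, plus_density[2])

for name, gains in (("score", score_gain), ("density", density_gain)):
    print(name, ", ".join(f"{g:.2f}%" for g in gains))
print(f"vs CityOutlook (density, threshold 20.0): {gain_over_cityoutlook:.2f}%")
```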
CityOutlook+ outperforms CityProphet and SCP by up to 9.82% and provides the same level of accurate forecasting in normal dynamics as SCP. In AS-MAE, it shows a 4.06%, 2.81%, and 5.84% improvement over SCP for $\nu = 10.0$, 15.0, and 20.0, respectively, in the irregularity score forecast. The proposed method improves the crowd density forecast by 6.64%, 4.41%, and 9.82% for $\nu = 10.0$, 15.0, and 20.0, relative to SCP. CityOutlook+ outperforms CityOutlook by 4.2% in crowd density with $\nu = 20.0$, and the performance improvement on AS-MAE becomes larger with higher anomaly thresholds. We conclude that the synthetic sampling approach is superior to weighting by importance alone, as in CityOutlook, providing more forecasting robustness on anomalous patterns.

On the contrary, existing methods fail to forecast normal and abnormal dynamics simultaneously. SCP has performance drawbacks in AS-MAE, while BPReg cannot accurately forecast anomalous crowds, and CityProphet exhibits instability in both forecasts.

produced a quantitative forecast on both crowd dynamics and irregularity scores and detected the occurrence of events, whereas CityProphet was unstable, and SCP failed to capture congestion.
Results are shown in Figure 3(a). We also visualized the forecasting on November 8, 2019, where no events were held, and confirmed that the proposed method provided an accurate forecast for normal crowd dynamics similar to comparative models.

DISCUSSION
Uncertainty of Early Forecasting: Our method involves uncertainty in the early forecast due to using the following external information: transit search logs and weather information. As shown in prior work,6 the earlier a search is made relative to the event date, the lower the search volume and the weaker its power as an indicator of congestion. Therefore, earlier prediction (e.g., two weeks in advance) might result in lower prediction performance. We also used weather information to model the normal crowd density, but the weather forecast for the event day one week in advance may not be accurate. In a practical scenario, it is advisable to calculate multiple forecasts for possible weather patterns and plan countermeasures comprehensively.

CONCLUSION
In this study, we proposed CityOutlook+ for early crowd dynamics forecast one week in advance. Compared with the recent advances in crowding forecasting systems, the proposed method provides an effective learning strategy for anomalies, addressing the problem of data imbalance and scarcity of anomalies by the importance-based minority resampling. The experimental results on massive real datasets demonstrate the superiority of our model over the existing methods. Our predictive methodologies will enhance the accuracy of real-world crowd congestion forecasting, contribute significantly to improving crowd security and infection control measures, and stimulate further research within the community utilizing GPS-loggable pervasive devices.

REFERENCES
1. T. Xia and Y. Li, “Revealing urban dynamics by learning online and offline behaviours together,” in Proc. ACM Interact. Mobile Wearable Ubiquitous Technol., 2019, pp. 1–25.
2. M. Shimosaka et al., “Forecasting urban dynamics with mobility logs by bilinear Poisson regression,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput., 2015, pp. 535–546.
3. R. Jiang et al., “DeepUrbanEvent: A system for predicting citywide crowd dynamics at big events,” in Proc. 25th ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2019, pp. 2114–2122.
4. T. Konishi et al., “CityProphet: City-scale irregularity prediction using transit app logs,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput., 2016, pp. 752–757.
5. S. Anno et al., “Supervised-CityProphet: Towards accurate anomalous crowd prediction,” in Proc. 28th Int. Conf. Adv. Geographic Inf. Syst., 2020, pp. 175–178.
6. S. Anno et al., “CityOutlook: Early crowd dynamics forecast towards irregular events detection with synthetically unbiased regression,” in Proc. 29th Int. Conf. Adv. Geographic Inf. Syst., 2021, pp. 207–210.
7. H. Shimodaira, “Improving predictive inference under covariate shift by weighting the log-likelihood function,” J. Stat. Plan. Inference, vol. 90, no. 2, pp. 227–244, 2000.
8. N. V. Chawla et al., “SMOTE: Synthetic minority over-sampling technique,” J. Artif. Intell. Res., vol. 16, pp. 321–357, 2002.
9. P. Branco et al., “SMOGN: A pre-processing approach for imbalanced regression,” in Proc. 1st Int. Workshop Learn. Imbalanced Domains, Theory Appl., 2017, pp. 36–50.
10. Z. Fan et al., “CityMomentum: An online approach for crowd behavior prediction at a citywide level,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput., 2015, pp. 559–569.
11. H. He and Y. Ma, Imbalanced Learning: Foundations, Algorithms, and Applications. Hoboken, NJ, USA: Wiley-IEEE Press, 2013.
12. C. Elkan, “The foundations of cost-sensitive learning,” in Proc. Int. Joint Conf. Artif. Intell., 2001, pp. 973–978.
13. L. Torgo and R. Ribeiro, “Utility-based regression,” in Proc. Knowl. Discov. Databases, 11th Eur. Conf. Princ. Pract. Knowl. Discov. Databases, 2007, pp. 597–604.
14. M. Yamada et al., “Relative density-ratio estimation for robust distribution comparison,” in Proc. Adv. Neural Inf. Process. Syst., 2011, pp. 594–602.
15. Y.-L. Yu and C. Szepesvári, “Analysis of kernel mean matching under covariate shift,” in Proc. Int. Conf. Mach. Learn., 2012, pp. 1147–1154.
16. H. Akaike, “Fitting autoregressive models for prediction,” Ann. Inst. Stat. Math., vol. 21, pp. 243–247, 1969.
17. Y.-C. Chen, “A tutorial on kernel density estimation and recent advances,” Biostatistics Epidemiol., vol. 1, no. 1, pp. 161–187, 2017.

SOTO ANNO is currently working toward the Ph.D. degree at the Tokyo Institute of Technology, Tokyo, 152-8550, Japan. His research focuses on urban computing. He received his M.E. degree from the Tokyo Institute of Technology. He is the corresponding author of this article. Contact him at [email protected].
KOTA TSUBOUCHI is a data scientist and a senior researcher with Yahoo JAPAN Research, Tokyo, 102-8282, Japan. His research interest focuses on data analysis including human activity logs, such as location information, search logs, shopping history, and sensor data. He received his Ph.D. degree from the University of Tokyo, Japan. Contact him at [email protected].

MASAMICHI SHIMOSAKA is with Tokyo Institute of Technology, Tokyo, 152-8550, Japan, as an associate professor since July 2015. He received his Ph.D. degree from the University of Tokyo in 2006. He is a member of ACM and IEEE. Contact him at [email protected].

In the future, it is possible that universities could use sensing devices to help
manage students and classroom disruptions during pandemics. Wearable sensing
devices have shown the capability to identify oncoming illnesses. In particular, the
Oura Ring, a smart-ring with sleep-tracking features, can detect cases of COVID-19
before users are aware of it. While these devices show promise, they are only
effective for pandemic preparedness and response if users pay attention to the
outputs of the devices and consistently use them. In this article, we discuss the
results of a year-long in-the-wild study with 35 participants at a university who wore
Oura Rings. After an orientation, the participants wore their rings with no
restrictions or minimum wearing requirements. By retroactively looking at how
participants used the rings for monitoring their health, we identify strategies and
potential problems with employing wearables for health monitoring in universities.

COVID-19 has been a challenging event for higher education institutions worldwide. Universities were forced to cancel classes, alter instruction methods, and restrict student movement.1 This not only stressed university finances,2 but also led to higher levels of stress in students due to fears of receiving less effective instruction,3 changes in instruction methods,4 and changes in how emotions are regulated.1 Given the societal, psychological, and financial costs associated with campus closures and remote learning, it is vital that universities work to mitigate the impact of future pandemics. A possible solution is to have students use wearable physiological sensing devices to reduce the need for campus closures.

In COVID-19's case, wearables can be used to identify a positive case before the user is aware of infection.5,6 The Oura Ring,a a sleep and health tracking device, is effective at identifying infection in the early stages,7 and can even be used to avoid work shortages in a military setting.8 The Oura Ring can effectively monitor the health of the wearer by using sensors to measure sleep and sleep stages, heart rate, heart rate variability, and body temperature,7 thus providing adequate data for machine learning models to classify positive cases. It would seem that a straightforward solution to pandemic preparedness for universities would be to implement a device like the Oura Ring to manage campus interactions.

a [Online]. Available: htt_ps://ouraring.com

1536-1268 © 2023 IEEE
Digital Object Identifier 10.1109/MPRV.2023.3322460
Date of publication 30 October 2023; date of current version 30 November 2023.

The reality is not so straightforward. While the technical capability of such a system might be feasible, the political and practical realities are difficult. First, it is hard or even impossible for universities in most countries to demand participation from their student bodies. Participation needs to be voluntary, which requires the students to both accept the reasons for wearing the ring and use it correctly. Student attachment to the device would likely be connected with the perceived usefulness of the device.9 In addition, there is a need to carefully understand how the device might alter student behavior, since there is the possibility that interacting with a wearable like the Oura Ring and its interface can impact individuals differently.10 There are potential privacy and security risks to any wearable regime that would need to be
managed.11 It also creates the issue of the offloading of the burden of pandemic management to nongovernmental organizations and corporations, thereby possibly entrenching existing problems and creating inequalities across society.12

Another confounding factor is the reality that a pandemic is also a social phenomenon. In Japan, where this study takes place, university students have been seen as having initially high but waning adherence and recognition of pandemic measures by following rules such as social distancing, mask wearing, and hand washing.13 This may be because of social costs that could have been incurred at the beginning of the pandemic. For example, students at a Japanese university, where an early outbreak of COVID-19 occurred, suffered backlash from the public and media, regardless of their connection to the outbreak network.14

The above highlights the challenge of implementing pervasive systems in a university. The technical capability to implement such systems is present, but the technology proves elusive to real-world implementation due to cultural and institutional bottlenecks. In order to move toward complete or even partial implementations, it is vital to understand how students interact with the devices on their own and understand how they perceive the data to which they are exposed in order to build comprehensive policies. We consider three possible paths toward implementing wearables for pandemic prevention:

› The Absolutist Approach: All members of a campus are required to use a selected wearable and there is a centralized system to monitor for symptoms.
› A Comprehensive System for Willing Students: Students who are willing to submit data for health monitoring are allowed to do so and those who are unwilling are allowed to opt out. A centralized system provides participating students with monitoring.
› Using Wearables to Improve General Health: Willing students are given wearables to monitor their own health with no centralized system.

In this article, we present the results from a cohort of 35 Japanese university students who voluntarily wore the Oura Ring for approximately one year (Fall 2021 to Winter 2022) in order to give context to understanding the challenges and opportunities for using a wearable health tracking device for the purpose of

usefulness of the device and willingness to engage in a COVID-19 tracking system. The results show that there is a significant portion of students who both use the ring enthusiastically and endorse the use of its data for pandemic mitigation. At the same time, there are also students who do not use the ring consistently and those who effectively stop using it, thus rendering it ineffective for pandemic prevention. Based on these results from a self-selected population, we discuss possible paths that most universities could take toward employing wearable sensing devices within their campuses.

OURA RING STUDY
The Oura Ring project at Osaka Metropolitan University focuses on the process of learning, mastering, and transference of knowledge within human groups with the help of artificial intelligence. For this project, we sought out an unobtrusive device that would be able to help group researchers track readiness for learning. We chose the Oura Ring, a commercially available sleep-tracking ring, which can track sleep stages, heart rate and heart-rate variability (HRV), and body temperature, and which has an accelerometer.10 The dedicated cloud and accompanying API make it easy to gather data from users. Another major benefit of the Oura Ring, and reason for its selection, is that it is unobtrusive. There is no distracting screen as found on a smart watch, and the battery can last up to five days. All studies within the project have been approved by the Ethics Committee of the University.

Since the beginning of the usage of the device in the summer of 2021, approximately 100 students at Osaka Metropolitan University have volunteered to wear the device. Participants are recruited through presentations in classes at the School of Engineering and include both bachelor and master students. Participants are not directly paid for wearing the ring, but can receive remuneration for taking part in short-term experiments and for answering surveys. Enrollment in the study happens on a continuing basis.

When participants first receive the Oura Ring, they go through an orientation covering what the ring measures and how data can be accessed from the application, web, and API. Participants are also presented with on-going research, which introduces the ability of the Oura Ring to track sleep15 and predict COVID-19.16 After orientation, participants are allowed to use the ring as much or as little as they see fit, and
pandemic preparedness. In particular, we discuss the are also allowed to drop out of the program with no
results of two surveys, which looked at perceived penalty.
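The orientation described here mentions that ring data can be reached through an API. As a minimal sketch of what such access might look like, the following builds (but does not send) a request for daily readiness summaries and extracts the days that have a summary. The base URL, route, query parameters, and response shape are assumptions based on Oura's public v2 cloud API; the study does not state which API version or client it used, and `build_readiness_request`, `days_with_readiness`, and the token are hypothetical names.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: base URL and route of Oura's public v2 cloud API.
OURA_V2_BASE = "https://api.ouraring.com/v2/usercollection"


def build_readiness_request(token: str, start_date: str, end_date: str) -> Request:
    """Build (without sending) a GET request for daily readiness summaries.

    Dates are ISO strings (YYYY-MM-DD). `token` stands for a participant's
    personal access token and is hypothetical here.
    """
    query = urlencode({"start_date": start_date, "end_date": end_date})
    url = f"{OURA_V2_BASE}/daily_readiness?{query}"
    return Request(url, headers={"Authorization": f"Bearer {token}"})


def days_with_readiness(payload: dict) -> set:
    """Collect the calendar days that have a readiness summary.

    Assumes a response body shaped like {"data": [{"day": "...", ...}, ...]}.
    """
    return {item["day"] for item in payload.get("data", [])}
```

Keeping request construction separate from sending makes it easy to check what would be requested for a pseudonymous participant before any data leave the cloud service.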
The participants allow the investigators to access their ring data and publish their anonymized data by signing an informed consent form. The guarantees in this consent form are based upon Japanese data privacy law and assure the secure management and privacy of personal data. Participants are given pseudonymous IDs, and all collected data are then anonymized once again before analysis. Researchers who may have grading authority over a student do not know the participant’s ID. Management of IDs is conducted by researchers without grading authority. Participants are further guaranteed that their data cannot be used to harm them in any way. For instance, if a student came to the campus with a fever detected by the Oura Ring (a violation of campus policy), this information could not be used as proof of an infraction. After one year, participants must reaffirm their approval of the data collection and their participation in the project by signing a new consent form.

The setup of the Oura Ring project creates several interesting opportunities and limitations. The participants are organized in such a way that we can obtain data from an in-the-wild setting. Participants have low levels of interference from the investigators and their behavior can be viewed as natural and representative. However, the students are a self-selected population who may be more interested in wearable technologies, more capable of understanding the data, and therefore more willing to share it. In addition, there is also the possible spectre of the Hawthorne Effect, whereby the authority of research faculty having access to physiological data might alter behavior. In the context of trying to understand how the participants used the Oura Ring for COVID-19, these are important factors to consider.

Interacting With Oura Ring Data
There are several ways to access Oura Ring data. On the user side, a user has to open the Oura application on their smartphone. After doing this, the data will sync from the ring and the interpreted results will be accessible to the user. Figure 1 shows an example of two types of application pages over three days. The top three images show summary screens, where the Oura interpolated scores for readiness and sleep are presented. Also shown are brief descriptions of what the user should do with the information. On the bottom, we see three images for the corresponding days of the raw data that are available to the user. More detail can be gained by clicking each feature.

FIGURE 1. An example of readiness screens available to an Oura Ring user on the Oura application.

Figure 1 also demonstrates the changes the Oura Ring can pick up. The three days presented are 2–4 August 2021. The subject received a vaccination injection on the 2nd and had a corresponding fever over the next two days. We can see on 2 August that readiness was “Good” and that both resting heart rate and HRV were within a healthy range according to the Oura Ring algorithm (43 BPM and 51 ms, respectively). In addition, body temperature was down 0.6 °C compared to the subject’s baseline temperature. This temperature is highly correlated with good quality sleep. A much lower readiness score is visible on the days following the vaccination. Even on 4 August, when sleep has recovered, we see a readiness score of 39. This is due to the increase in resting heart rate (60 BPM) and decreased HRV (28 ms), as well as the increase in body temperature of 2.2 °C. Although other updates are possible with the ring throughout the day, especially those related to activity, movement, and naps, the emphasis of the application is on the readiness of the subject after waking up in the morning.

On the other hand, Figure 2 shows an example of the data visualization that can be harvested from the Oura API. In this case, the subject received a positive COVID-19 result on 2 August 2022. From left to right, we can see the trend in heart rate, where there is a spike on the 2nd and 3rd before an elevated plateau from the 5th onward, a decrease in HRV from the 2nd onward with an accompanying plateau, and a spike in temperature followed by a return to normalcy. On the far right, we can see the sleep stage data for the subject. In summary, the visualization shows the acquisition of COVID-19 as well as its lingering side effects.

FIGURE 2. An example of a confirmed COVID-19 case with data obtained from the Oura cloud.

In both cases, the summary data show what we would expect from a wearable like the Oura Ring. The ring is able to give data related to the health of the wearer and may serve to confirm or contradict how the wearer feels about their health. This might be useful in a case where an individual suffers from psychosomatic illnesses, but the question remains: Is it not the case that the Oura Ring confirms what the user already knows, namely that they are sick? While classifiers may work at detecting COVID-19 before illness strikes, this requires the capability and willingness to collect and analyze the data. At this point, the feedback to the user and most organizations would be almost retrospective. It would still be useful to have such data, but it would be unlikely to prevent asymptomatic but infectious individuals from coming to their campus. Based on this, in the next section we look at the self-reported and usage data of
35 Oura Ring participants in order to better understand how the wearables could be used for pandemic preparedness at the university.

SURVEY RESULTS
Once participants reach one year of ring usage, they are asked to come to the research office and either reaffirm their agreement to share their data and continue to wear the ring or drop out of the program. At this time, they are requested to take a short survey about their usage of the Oura Ring and their impressions of the ring’s usefulness. For the time and effort required to visit the research office in person, participants were rewarded with a 3000 JPY Amazon Gift Card (approximately 20 USD). In addition to this General Survey, a separate survey about the ring and COVID-19 was given to the participants through the web. The participants had to first agree to receive further surveys. In order to reduce any discomfort regarding the subject matter, taking the COVID Survey was completely optional. In total, 35 participants answered the general survey and 17 answered the COVID survey.

FIGURE 3. Usage patterns of the 35 participants. A blue dot along the time axis means either a daily activity or daily readiness summary was available for that day and user.

General Survey Results
The general survey coincided with a collection of data from the participants. As stated before, this cohort was allowed to use the ring completely freely. This means that at no time was wearing the ring required. As shown in Figure 3, usage was not consistent among most users. Out of the 35 participants, two did not use the ring and dropped out of the study. In addition, another eight participants had issues with syncing data and updating firmware, leaving 25 participants with valid data for a majority of their year. This
indicates a significant user error rate even within a self-selected pool of participants.

In addition, we looked at the relationship between ring wearing and the number of reported new COVID-19 positive cases for Osaka Prefecture (footnote b), as seen in Figure 4. In this figure, we define usage as daily activity or daily readiness being recorded for the calendar day. If a participant does not have the data recorded, it means the ring was not worn or the battery was out of charge. Figure 4 starts at 1 January 2021, a time period in which all participants had possessed the ring long enough to receive valid data, and continues to the end of 2022. Each data point on the user usage side is the aggregate number of users who had available data for that day. Overall, we see a clear downward trend in usage as the year progresses. A correlation test was not significant (R² = 0.0136), showing no relationship between the number of local COVID-19 cases and ring usage.

FIGURE 4. Cohort Oura Ring usage (as defined by the availability of daily activity or daily readiness values) and the number of reported positive COVID-19 cases in Osaka.

b [Online]. Available: https://www3.nhk.or.jp/news/special/coronavirus/data-widget/

The survey and individual data do confirm, however, that there is a core of participants who firmly believe that the ring is beneficial. For instance, 13 of the participants claimed the ring had beneficially changed their daily habits, while a further nine indicated that the ring had probably changed their habits. Ten claimed no changes to their habits. A Kruskal–Wallis test was performed on the three groups to see whether the belief in changing habits aligned with higher usage of the ring. Using the synced data from the entire calendar year for 2022, we found there were no significant differences between the groups, meaning that participants who professed they changed their habits did not wear the ring more than other participants. A separate analysis looking at the sleep of the participants found that reported sleep changes may not improve sleep outcomes.17

COVID Survey Results
In order to better understand how the participants acted on their health data during the pandemic, a second survey was sent out approximately two months after the initial survey. This survey was given on an online platform to the 33 participants who agreed to receive follow-up questions, and the participants were not required to give their identity due to possible reluctance to answer questions about a taboo subject like declaring COVID-19 infections. The COVID survey explored whether participants made health decisions based on the data from the Oura Ring and whether they would be willing to submit their Oura Ring data to a theoretical health monitoring service run by the university. In total, 17 of the eligible 33 participants completed the survey. Out of these 17 users, 5 confirmed they were infected with COVID-19 while wearing the ring.

The first questions dealt with whether participants used the Oura Ring to measure their health, followed by a question asking if they had made a specific health
decision based on the Oura Ring data. As Figure 5 shows, about half of the respondents (N = 9) claim to use the ring at least “Often” for monitoring their health, while the rest use the ring “Sometimes.” When asked if they had made a health decision based on the Oura Ring, 10 of the respondents claimed they had. The respondents who answered “Sometimes” and “Often” to the health monitoring question were split between “Yes” and “No,” with one answer of “Unknown.” Participants who answered “Always” and “Frequently” all claimed health decisions. It should be noted that this survey relies on self-reported awareness of making health decisions based upon data, which may not be accurate.17

When looking at the health decisions claimed based on the Oura data, the most popular response was that of “changing sleeping habits” with eight respondents. Two respondents claimed they “changed exercise habits” and one claimed they “changed their diet.” Finally, two respondents claimed they made the acute decision of “staying home” based on the Oura Ring data. Clearly, the responses show a trend toward macro health decisions, which promote long-term health, rather than making immediate decisions based on the data.

The next questions dealt with the willingness of the participants to submit their data to the university for health monitoring if a pandemic situation became serious. Surprisingly, 16 out of the 17 respondents declared that they would be willing to submit their data, with only one user answering they would not. This shows an almost universal willingness to accept the potential efficacy of the ring as a tool during a pandemic, but it does not match the wearing habits seen in the responses to the surveys and in Figure 3.

DISCUSSION
The survey and usage results show a diversity of attitudes toward wearing the Oura Ring. The pool of students showed mixed results, with almost a third of the participants either dropping out of the study or not using the wearable correctly. There is also a visible
decrease in ring usage as time passes, indicating that the ring’s novelty and usefulness wane over time.

FIGURE 5. Relationship between amount of use of Oura for health monitoring and declaration of making a health-based decision based on Oura.

The nature of this study as being in-the-wild allows us to see how students perceive their usage of the device and how they actually wore it. From the perspective of physiological sensing data having a strong impact on users, the results are disappointing. Fewer than half of the participants use the device consistently. This includes some users who believe the device aids them in their health choices. This result seriously limits the device’s usefulness for pandemic mitigation.

However, a core group of students seems to believe the device can be used to make beneficial health decisions. There is also a strong willingness among some respondents toward submitting data for the prevention of COVID-19. This indicates that with some organization, it might be possible for a university to use the wearable for pandemic mitigation and also shows that using such a device may empower some students in handling their health.

One important consideration is that the downward trend of ring usage by the study participants is possibly a result of the design of the study, and one that could be mitigated by an institution’s interventions. Participants in this study were not guided in how they used the ring or directly given instructions or prompts beyond the orientation. While factors such as enjoyment and ease of use are important for encouraging usage,9 there is also a need to explicitly design for the encouragement of positive contexts with device usage.18 For example, building a vicarious experience, whereby students see other students successfully using the ring, would likely increase usage.18

Limitations
In this article, we analyze the usage of the Oura Ring by university students in order to understand how universities might be able to employ wearable health tracking devices for pandemic preparedness. The scope of this work is limited to examining unguided usage by students in an in-the-wild context. That is, the usage analyzed does not come from a specific initiative meant to prepare for or mitigate a pandemic, but rather from a context in which participants simply had access to a device and the information it can provide. Therefore, there are some important limitations to consider.

First, the participants are self-selected and have a high level of technical literacy. It cannot be assumed they will accurately represent a general student population. In addition, this self-selected population may want to know more about their health and what can be quantified than most other users. These are also participants who are likely to want to join experiments and are willing to have their data monitored. Finally, participants may believe that they change their habits based on the device data, but may fail to do so in reality.17

Within the context of these limitations, the analysis of this dataset allows us to explore several important research questions for considering the possibility of implementing wearables in universities for the purpose of pandemic preparedness. First, we can explore how technically literate students interact with the device and the data it produces. To a certain extent, this represents what would happen with the most ideal cohort of students, thus illuminating the issues that would likely be experienced in a general population. Second, the analysis shows how a cohort uses the device and information in a self-guided manner, thus informing policy designs when considering device implementation at a university.

WEARABLE HEALTH TRACKING DEVICES AT UNIVERSITIES
Based on the results of our in-the-wild user study, we discuss the potential approaches that universities can take toward implementing wearable technologies for pandemic preparedness and mitigation. Even with a self-selected cohort, the usage of the wearable device showed considerable attrition. This would indicate that any university that gave wearables to its students with the hope that organic usage would be prevalent would almost certainly see the scheme fail. Thus, we discuss the results in light of the possible paths outlined in the introduction.
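The attrition referred to here rests on the usage measure defined for Figure 4: a participant counts as a user on a given day if a daily activity or daily readiness summary exists for that day. As a rough sketch of how that measure and its relationship to case counts could be computed, the following aggregates per-day user counts and computes a coefficient of determination. It assumes the reported R² is the squared Pearson correlation between the two daily series, which the article does not state; the function names and the toy data in the usage note are hypothetical.

```python
from datetime import date


def daily_usage_counts(records, days):
    """Count, for each day, how many participants have data for that day.

    `records` maps a pseudonymous participant ID to the set of dates on which
    a daily activity or daily readiness summary was recorded.
    """
    return [sum(1 for worn in records.values() if day in worn) for day in days]


def r_squared(xs, ys):
    """Squared Pearson correlation between two equally long series.

    Returns 0.0 when either series is constant, since no linear
    relationship can be estimated in that case.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("series must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)
```

For example, with `records = {"p1": {date(2022, 1, 1), date(2022, 1, 2)}, "p2": {date(2022, 1, 1)}}` and the first three days of January 2022 as `days`, `daily_usage_counts` yields one count per day that can be paired with that day's reported case number and passed to `r_squared`.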
In the Absolutist Approach, we imagine that all students are required to use a wearable and submit its data. As shown in the military context,8 it is possible to track sickness within a group in real time. With enough organizational skill and the ability to process the data, it should be feasible to minimize campus disruptions. In order for this approach to work, the implementation would likely have to be a continuous concern regardless of whether a pandemic was under way at the moment. It does seem that, based on the results of the COVID survey, there would be some willing users. However, forcing all students to partake would ensure that all students would share in the same privacy risks and benefits from the system.

However, the attrition rates seen in the user study would likely occur in the general student population. Even if a university were to convince students of mandatory usage, the university would take on the risk of failure. Predicting infections before symptoms manifest is possible, but it is not infallible and can sometimes lead to false positives or false negatives.7 If stakeholders expect it to be completely accurate, the limitations of this Absolutist Approach may cause a loss of trust. That is, the invasive nature of the Absolutist Approach can only be justified by perfect or near-perfect results. Even then, this approach may not be acceptable in many societies.

The next approach would be an opt-in/opt-out approach. In this system, students who submit to using the wearable and allowing the data to be processed would be allowed campus access if a pandemic were to occur. This approach shares many of the drawbacks of the first approach in regard to efficacy. In addition, it could encourage a splintering within campus culture or force unwilling groups to submit to monitoring via social pressure.

The final approach would be to use the wearable as an educational tool for students to wear in order to better understand their own health and to make better choices for their well-being overall. This may also allow for a more natural progression toward services that provide prediction for personal use as the technology advances. As the surveys showed, there are going to be enthusiastic users of the technology who can use it to benefit their health.

The argument here is that pandemic preparedness at a university should focus on the overall health of students in the community, which could impact the actual progression of an infection within a student.19 This includes encouraging beneficial changes to sleep habits as well as taking appropriate breaks when readiness is low. A university that sponsors the use of wearables and informs its users how to understand the data may create a campus that is better prepared to handle pandemics. By encouraging the positive contexts of the device usage, higher device usage could be achieved.18 We argue this path is what universities should take in the immediate future.

CONCLUSION
It seems unlikely that universities will implement wearable devices for pandemic prevention. The endeavor is expensive and difficult to set up, and even a probably compliant population sees a sizable number of participants drop out of using the wearable. There are also a number of political and societal risks that may accompany any rule-based implementation. However, there is a great opportunity to start implementing dedicated health-tracking wearables for university students to encourage better understanding of general health.

Based on the results from our study, there is a population that can gain a greater understanding of their health through access to the data. If there is an effort to increase education and give meaning to the data, the potential to increase this population exists. At the current moment, it is infeasible to see a situation in which universities could prevent campus shutdowns via a regime of wearables and health tracking. But it is a viable possibility that giving access to and education about a health-tracking application could lead to a student population that is better prepared to handle the physical and emotional problems that accompany a pandemic.

In this article, we discussed the viability of using wearable health tracking devices for pandemic preparedness in a university setting by looking at the lessons learned from a year-long cohort of students wearing the Oura Ring. The results show that there is promise for improving student health outcomes through using the device. In the future, an important step will be to implement design and organizational interventions, which can help to increase and maintain student usage of the device.

ACKNOWLEDGMENTS
This work was supported in part by the Japan Society for the Promotion of Science (JSPS) Grant-in-Aid for Scientific Research (B) under Grant 20H04213 and the Grand challenge of the Initiative for Life Design Innovation (iLDi).

Approval of all ethical and experimental procedures and protocols was granted by the ethics committee of the Graduate School of Engineering, Osaka Prefecture University, and the ethics committee of the Graduate School of Engineering, Osaka Metropolitan University.
More than 6 billion smartphones available worldwide can enable governments and
public health organizations to develop apps to manage global pandemics. However,
hackers can take advantage of this opportunity to target the public in nefarious
ways through malware disguised as pandemics-related apps. A recent analysis
conducted during the COVID-19 pandemic showed that several variants of COVID-19
related malware were installed by the public from nontrusted sources. We propose
the use of app permissions and an extra feature (the total number of permissions) to
develop a static detector using machine learning (ML) models to enable the fast
detection of pandemics-related Android malware at installation time. Using a
dataset of more than 2000 COVID-19 related apps and by evaluating ML models
created using decision trees and Naive Bayes, our results show that pandemics-
related malware apps can be detected with an accuracy above 90% using decision
tree models with app permissions and the proposed feature.
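The feature construction summarized in this abstract can be sketched as follows: each app is encoded as one binary indicator per known permission, with the total number of requested permissions appended as the extra feature. This is only an illustration under stated assumptions. The function name, the three-permission vocabulary, and the choice to count permissions outside the vocabulary toward the total are hypothetical and are not taken from the article.

```python
def permission_features(app_permissions, vocabulary):
    """Encode an app's requested permissions as a feature vector.

    The vector holds one 0/1 indicator per permission in `vocabulary`
    (in vocabulary order), followed by the total number of distinct
    permissions the app requests. Assumption: permissions that are not
    in the vocabulary still count toward the total.
    """
    requested = set(app_permissions)
    indicators = [1 if perm in requested else 0 for perm in vocabulary]
    return indicators + [len(requested)]


# Hypothetical three-permission vocabulary for illustration only.
VOCABULARY = [
    "android.permission.INTERNET",
    "android.permission.READ_SMS",
    "android.permission.CAMERA",
]

example_app = [
    "android.permission.INTERNET",
    "android.permission.READ_SMS",
    "android.permission.SEND_SMS",
]
example_vector = permission_features(example_app, VOCABULARY)
```

Vectors of this form could then be passed to any decision tree or Naive Bayes implementation. Because they only require the permissions declared by the app, they can be computed at installation time without running the app.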
The advent of global, real-time telecommunications along with the growth of mobile cellular technology in the last 25 years has helped to develop new alternatives to prepare for emergent diseases and their epidemics (and possibly pandemics). Various types of wearable/portable sensors that can be connected via Bluetooth to a smartphone provide an alternative to inform, diagnose, track, treat, and manage epidemics and global pandemics.1 With the emergence of the COVID-19 pandemic, more than 2000 COVID-themed mobile apps (not including malware) were developed for different purposes around the world as of December 2020.2

The COVID-19 pandemic has been exploited by cybercriminals using different threats, attacks, and channels including distributed denial-of-service (DDoS) attacks, malicious domains and websites, malware, ransomware, spam emails, malicious social media messaging, business email compromise, mobile apps, and browsing apps,3 impacting healthcare systems, financial services, government and media outlets, and the public. Cybercrime increased dramatically during the COVID-19 pandemic, with an estimated impact of more than 6 trillion US dollars worldwide in 2021.4 This major increase in cybercrime activity during 2021 was due to the massive online activity caused by worldwide lockdowns and restrictions in movement to mitigate the COVID-19 pandemic,4 and was performed not only by solitary hackers and hacking groups, but also by major state-sponsored cybercriminals.

The availability of more than 6 billion smartphones during the COVID-19 pandemic1 and their use in future pandemics (and other public health emergencies) make them an attractive target for hackers to release malware disguised as pandemics-related apps through nontrusted channels (e.g., via social media, SMS/MMS, websites) fueled by disinformation. Moreover, pandemic-related malware installed during emergencies could be used to enable DDoS attacks on other systems. Thus, to prepare for future pandemics, we must develop systems to help mitigate the effects of emergent diseases and protect the global cyberinfrastructure during the containment and mitigation of pandemics.

1536-1268 © 2023 IEEE. Digital Object Identifier 10.1109/MPRV.2023.3321218. Date of publication 19 October 2023; date of current version 30 November 2023.
In this study, we seek to answer the following permissions. In their work, Wang et al. did not pro-
research question: can pandemic-related malware be pose specific approaches to detect malware for
detected using a static detector based on Android pandemic-related apps. Similarly, Sun et al.11 ana-
permissions and machine learning methods. lyzed the security and privacy of 34 COVID-19 con-
tact tracing apps with the goal of recommending
Research Contributions of This Work security practices in the development of COVID-19-
We summarize our research contributions as follows: themed apps, and the development of a tool called
COVIDGUARDIAN based on static analysis and data
› We review mobile and smartphone use cases flow analysis to find vulnerabilities of trusted
and how cybercriminals can exploit mobile apps COVID-19-themed apps. Recently Manzil and Naik12
during epidemics and pandemics. used app permissions with machine learning and
› We propose and evaluate the use of Android app achieved an accuracy of 81% and 83%, respectively,
permissions combined with machine learning with a dataset of 100 app samples using random for-
(ML) to enable static detectors for the fast est and decision trees. It is worth pointing out that
detection of pandemics-related mobile malware the work of Manzil and Naik focused only on
at the edge. COVID-19-themed apps, and their work is the most
› We propose the use of the total number of app similar one to ours but with the following differen-
permissions as an extra feature in the permis- ces: 1) we used a bigger dataset of COVID-19 related
sion-based static detector (in addition to the malware to train our models; 2) we proposed and
app permissions themselves) as an approach to explain why the use of the total number of permis-
increase the accuracy of the detection. sions as an extra feature (in addition to the app per-
missions) is helpful in detecting pandemics-related
We organize the rest of the article as follows. In malware; and 3) our approach, while targeting spe-
the following section, we review related works. Later cific type of apps (pandemics-related malware),
we present use cases of mobile phones and smart- does not incur significant overhead for detection
phones apps in epidemics/pandemics. Then, we when compared with other approaches (e.g., the
review mobile malware during pandemics with a approach proposed by Ficco9 that requires more
focus on the COVID-19 pandemic. Next, we propose the use of app permissions and ML as a fast approach to detect malware in Android smartphones. Finally, we make some concluding remarks and some final recommendations on protection for smartphone users in future pandemics.
Related Work
Although the use of general Android permissions to detect malware has been proposed in the past using permissions with machine learning,5 permissions with Application Programming Interface (API) calls/graphs,6 comparison of permission patterns,7 intents and permissions,8 multiple detectors and observation windows,9 and other techniques combining permissions with other static and dynamic approaches,10 these works were developed before the onset of the COVID-19 pandemic, when there was no knowledge of how global pandemic-themed apps (both benign and malicious) were implemented.
During the COVID-19 pandemic, Wang et al. researched and collected a dataset of COVID-19-themed apps and types (e.g., malware and not malware) with more than 2000 unique apps collected from trusted and nontrusted sources, including both benign and malicious apps,2 and analyzed their
features and an ensemble of ML models to detect malware, or the use of permissions with Intents and API calls13) that are more resource intensive for the Android operating system (OS).
PANDEMICS AND CELLPHONES/SMARTPHONES AND THEIR LIMITATIONS
Smartphone Apps’ Use Cases During Pandemics/Epidemics
Recently, the COVID-19 pandemic has highlighted the use of smartphones as tools to manage public health emergencies. However, past epidemics and pandemics had already leveraged the use of smartphones and data generated by mobile cellular communications (Table 1). For example, in 2003, a Hong Kong mobile operator launched a location-based service (LBS) via short messaging service (SMS) and wireless application protocol (WAP) to notify subscribers when a nearby building was contaminated with the severe acute respiratory syndrome (SARS) during the 2003 SARS outbreak in Asia.14 Radio Frequency IDentification (RFID) was used during this SARS outbreak in Singapore for contact tracing inside hospitals,14 allowing health officials to identify ten
TABLE 1. Mobile and smartphone’s use cases during epidemics and pandemics in healthcare-related applications.
times faster who an infected person had contact with than using other methods. A similar approach was used during COVID-19 in different parts of the world using Bluetooth Low Energy (BLE).6
Using only anonymized mobile phone data from cellular operators, Bengtsson et al.16 created a model to survey and track the spread of cholera in the 2010 Haiti epidemic. Their research showed that mobile operators’ data can help to track and contain the spread of infectious diseases and serve as a surveillance mechanism for wide areas. In 2017, Priye et al.17 reported on the rapid detection of Zika, Chikungunya, and Dengue viruses using a portable device called the “LAMP Box,” a smartphone’s camera, and an app to detect and analyze samples of human specimens (e.g., blood, urine, and saliva). In a similar way, Yu et al. developed a smartphone app to detect the presence of Malaria parasites (P. falciparum) in digital photos captured using a smartphone’s camera placed on a microscope’s eyepiece lens when a user places a slide with human blood specimens to be examined under the microscope.18 Natesan et al.19 developed a similar approach for the detection of Ebola and Marburg viruses.
On adherence to AntiRetroviral Therapy (ART) for Human Immunodeficiency Virus (HIV) management, Horvath et al.20 studied in 2012 the use of mobile phone SMS text messaging. Based on two randomized controlled trials (RCTs) in Kenya, they found that weekly mobile phone text messaging improved HIV viral load suppression by reminding patients to take their medications, thus helping
them to adhere to their therapy. For long-term treatment, Devi et al.21 found in a literature review (covering the period 2005 to 2015) on long-term care/management of HIV/AIDS and tuberculosis that mobile phones were successfully used for long-term care and management of these diseases in developing countries. They reported that 73.3% of their reviewed papers (66 papers) reported positive effects on HIV/tuberculosis management using mobile phones. Finally, during the COVID-19 pandemic, other use cases of smartphone (and tablet) apps in public health settings included telemedicine/patient communication, health education, and apps implementing digital health passports (DHPs).22
Limitations of Smartphones and Their Applications During Pandemics
While there have been great advances in the use of smartphones for epidemics/pandemics, there are also limitations to the successful use of smartphone apps during epidemics/pandemics in aspects such as interoperability, effectiveness, politics, design choices and marketing, and security and privacy.
From the interoperability perspective, applications developed during pandemics with a healthcare (or fitness) focus use a particular architecture (in hardware or software) that forbids users (or makes it almost impossible for them) to switch components (e.g., wearables for monitoring) or health providers, or to move healthcare data collected through them. While some limitations may be related to laws, others are related to the lack of standardization and to business models that make it difficult to achieve interoperability among systems.1
Many apps developed during pandemics are not evaluated for their effectiveness before or after deployment. For example, Devi et al.,21 in their research about long-term treatment with mobile apps for HIV and tuberculosis, found that many research studies lack statistical evaluations of app effectiveness and rather used casual/anecdotal observations. The lack of evaluation is exacerbated by the need for rapid development of many mobile apps that are created as a public health response aimed at an emergent disease (e.g., the COVID-19 case), thus impacting an app’s efficacy, reliability, and privacy/security.
Additionally, the implementation of certain types of smartphone apps for pandemics may be subject to politics. For example, during the COVID-19 pandemic, vaccination passports and their smartphone implementations (through DHPs) were subject to policy decisions that varied between U.S. states. DHPs were implemented in the state of New York when COVID-19 vaccination became widely available.22 However, in Florida any kind of vaccination passport was forbidden by an executive order from Governor Ron DeSantis in April 2021.23
From the perspective of the design and marketing of apps, the approach used to develop, implement, promote, and give choices to the public about pandemics-related apps may affect their installation. In this context, in a recent U.S. survey with 1963 respondents that studied why somebody would install a contact tracing app for COVID-19 (by exploring the design space of contact tracing apps), Li et al.24 found that the developers’ choices on app design and users’ individual differences (e.g., users’ job/work, income, demographics, use of public transportation, technology readiness) have a significant impact on whether a person will install a contact tracing app, more so than other factors such as the app’s security and privacy. They recommend highlighting the public health benefit as leverage to promote contact tracing apps and paying attention to apps’ design and marketing strategies among essential/health workers, because of their higher vulnerability to contracting an emergent infectious disease such as COVID-19, and among people living in rural areas, because of their lower preference for installing contact tracing apps developed by large private companies.
Finally, short software development cycles used to develop and launch applications/systems during pandemics can result in data leaks (affecting the privacy and security of users) and make software systems developed during pandemics vulnerable to cyberattacks. Results from a study done during the COVID-19 pandemic in 2021 showed that 78% of the companies surveyed believed their technical debt increased during 2021, with most of the technical debt believed to arise from the development of new products.25 In the same survey, 86% of respondents mentioned that launching new digital products/services justified the technical debt incurred.
MOBILE MALWARE DURING PANDEMICS
Mobile apps developed before COVID-19 for epidemics/pandemics were mostly applications developed by well-known organizations as part of health campaigns or prototype systems. However, the worldwide availability of smartphones and other wearables at the start of the COVID-19 pandemic, and their increasing use as the pandemic progressed,1 made smartphones an attractive target of malware, which grew quickly during the COVID-19 pandemic.
More than 2 million installations of mobile malware packages were performed worldwide during the fourth quarter of 2020, which almost doubled the number of malware package installations during the third quarter of the same year (around 1.1 million in the third quarter of 2020).26 These numbers began to decrease during 2021, reaching around 900,000 installations by the second quarter of 2021 (Figure 1). Hackers also exploited users through malware camouflaged as legitimate COVID-19-themed apps. There were at least 370 unique COVID-19-themed mobile malware apps developed worldwide as of mid-November 2020 targeting the Android operating system, with most apps released after March 2020.2
Hackers targeted smartphones during the COVID-19 pandemic not only because of their ubiquitous use, but also because of the lack of cybersecurity hygiene of smartphone users around the world. Misinformation and mobile malware distribution methods (different from the use of app stores), and vulnerabilities such as SMS phishing (by which SMS messages are used to distribute malware) and Zero-Click attacks (which need no input from users and instead exploit vulnerabilities in apps already installed) were used by hackers to launch attacks on smartphone users during COVID-19.27
Other distribution mechanisms for mobile malware during COVID-19 included messages sent via social media apps (e.g., WhatsApp, Instagram, and others), and camouflaged malware distributed via app stores for both Android and iOS devices (i.e., Google Play Store and Apple App Store), even though app stores blocked more than 1 million attempts to circumvent security measures to publish mobile apps.27
According to Kaspersky data, most of the new worldwide mobile malware in 2021 was in the form of AdWare (42.42% of the total), RiskTool (by which malware conceals files, runs apps silently, or terminates active processes, 35.27% of the total), and trojans (programs that claim to perform some function while doing something else, 8.86% of the total).28 For COVID-19-themed malware, Wang et al.2 reported that trojans (56%) and spyware (29%) made up most of the COVID-19-themed malware in Android as of November 2020. Ransomware made up about 7% of mobile malware in their study.
DETECTING PANDEMICS-RELATED MALWARE USING APP PERMISSIONS AND MACHINE LEARNING
In this section, we describe the use of Android app permissions and ML to detect COVID-19-themed malware. We describe our dataset and the ML models trained, evaluate the models’ performance, and discuss our results.
FIGURE 1. Mobile malicious installation packages detected from Q4 2015 to Q2 2022 based on data from Kaspersky Lab’s
Securelist.27
FIGURE 2. Permissions used by a COVID-19-themed app and its row in our spreadsheet.
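The row layout that Figure 2 depicts can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' extraction code: the function name encode_app, the column names APP_NAME and TOTAL_PERMISSIONS, the five-entry VOCABULARY, and the sample app are assumptions made for the example (the article's dataset has 203 permission columns, and only the MALWARE column name appears in the article).

```python
from typing import Dict, Iterable, Sequence, Union

# Hypothetical, shortened permission vocabulary (an assumption for this sketch;
# the dataset described in the article has 203 permission columns).
VOCABULARY = [
    "android.permission.INTERNET",
    "android.permission.ACCESS_FINE_LOCATION",
    "android.permission.CAMERA",
    "android.permission.READ_SMS",
    "android.permission.BLUETOOTH",
]

Row = Dict[str, Union[str, int]]


def encode_app(name: str, permissions: Iterable[str],
               vocabulary: Sequence[str], is_malware: bool) -> Row:
    """Build one spreadsheet-style row: a 0/1 flag per permission plus a label."""
    used = set(permissions)
    row: Row = {"APP_NAME": name}
    for perm in vocabulary:
        row[perm] = 1 if perm in used else 0
    # Extra feature: how many of the known permissions the app requests.
    row["TOTAL_PERMISSIONS"] = sum(1 for perm in vocabulary if perm in used)
    # Class label: 1 for malware, 0 for nonmalware.
    row["MALWARE"] = 1 if is_malware else 0
    return row


if __name__ == "__main__":
    # Hypothetical app and permission list, used only to show the row shape.
    example = encode_app(
        "example_benign_app",
        ["android.permission.INTERNET", "android.permission.CAMERA"],
        VOCABULARY,
        is_malware=False,
    )
    print(example)
```

Keeping the permission count as its own column means a classifier can use it directly, without re-summing the binary columns at prediction time.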
Permissions Dataset
We obtained our dataset by extracting Android app permissions from the COVID-19-themed apps collection curated by Wang et al.2 Their collection (made publicly available by its curators) has 2500 unique Android Package Kits (APKs), with 370 unique APKs belonging to malicious apps collected by mid-November 2020. We used AndroGuard to extract each app’s permissions from its corresponding APK’s manifest.
Due to errors generated by AndroGuard when extracting app permission data from some of the APKs, we ended with the permissions of 2016 unique apps (80% of the original dataset), with 277 labeled as COVID-19-themed malware samples (75% of the original malware samples) and 1739 labeled as nonmalware COVID-19-themed apps (81% of the original nonmalware samples). We extracted 203 unique permissions from all apps in our dataset. We created a spreadsheet with each row storing the permissions used by an app. Each permission column holds a “1” or “0” depending on whether the app uses that permission. We also used two more columns specifying the app’s name and its type/class (malware or not). For example, if an app is a malware app and uses all permissions, then in the row for that app we store in all columns a value of “1.” Otherwise, if a second app did not use any permission and it is not malware, then in the row for that second app we store a value of “0” in all columns. Figure 2 illustrates how we stored the permissions in the spreadsheet for the Stato COVID-19 Italia Android app. This app uses four permissions, thus a value of “1” appears in each of the corresponding permission columns. As this app is not malware, a “0” appears in the MALWARE column in Figure 2. The resulting spreadsheet has 2016 rows (one row per app) and 205 columns (203 columns for permissions plus two more for the app name and the class/type). This spreadsheet is the dataset we used to train the models. When using the models, the static detector extracts the permissions used by the app in the manifest at installation time, counts the number of permissions used by the app, and then executes a trained ML model. This process does not cause additional overhead for the OS because the Android security model extracts the permissions from the manifest at installation time.
We created the chart shown in Figure 3 using our dataset. This chart shows that around 30% of nonmalware apps used one permission, while most of the malware apps used four or more app permissions. The sample mean of the number of permissions used by malware apps was $u_0 = 7.09 \pm 3.6$ permissions per app, and the sample mean of the number of permissions used by nonmalware apps was $u_1 = 1.8 \pm 1.5$ permissions per app.
Assuming a normal distribution on the total number of permissions per app for both malware and nonmalware apps, we conducted a t-test with $H_0: u_0 - u_1 = 0$ (null hypothesis: both app classes use the same
FIGURE 3. Distribution of total number of app permissions per app class (malware/nonmalware) for COVID-19-themed Android
apps.
number of permissions), $H_1: u_0 - u_1 > 0$ (alternative hypothesis: malware apps use more permissions than nonmalware apps), and $\alpha = 0.1$ (statistical significance level). The result was that $H_0$ (null hypothesis) was rejected ($H_0$ was even rejected at a statistical significance value of $\alpha = 0.01$). This result suggests that the total number of permissions used per app can be added as an extra feature to detect COVID-19-themed malware.
Model Creation
The problem of malware detection can be modeled as a machine learning classification problem with two target classes (malware/nonmalware). To create the input data for the models, we added the total number of permissions used by an app as an extra feature (in addition to each app’s permissions).
We used Weka 3.8.6 to train the machine learning classification models. We trained the first set of three models using the spreadsheet dataset obtained in the previous section (2016 instances, 203 app permissions, and the extra column/feature for the total number of permissions), and we trained a second set of three ML classification models using an augmented dataset with 3955 instances and the same features (203 app permissions and the extra feature for the total number of app permissions). We ignored the name of the app when training the models.
We created the second (augmented) dataset using the Synthetic Minority Oversampling TEchnique (SMOTE)29 to balance the number of malware instances of the original dataset. SMOTE29 is a machine learning technique to generate synthetic data when it is desired to build classification models for two classes, and one class has significantly fewer samples compared to the other class (malware samples/instances versus nonmalware samples/instances in our case). SMOTE generates synthetic instances for the minority class that are plausible (i.e., better than duplicating the instances in the minority class).
The SMOTE permissions dataset had 3955 instances (1739 for the nonmalware class and 2216 for the malware class). While this second dataset had more instances for the malware class (1739 nonmalware instances versus 2216 malware instances), the dataset is more balanced than the original spreadsheet dataset (1739 nonmalware instances versus 277 malware instances).
We created three classification models with both datasets (six models in total) using the OneRule, J48, and Naive Bayes algorithms. The OneRule and J48 algorithms build decision tree classification models, while Naive Bayes builds a probabilistic model for classification. OneRule creates a simple classification tree based only on the single attribute/feature with the smallest total error. We selected these models because once the models are trained, they can be executed very quickly on a smartphone.
PERFORMANCE EVALUATION OF PROPOSED MODELS
We used a tenfold cross validation on each algorithm with a 66% split for each fold (i.e., we used 66% of the
instances on each fold for model training and the other 34% to evaluate the performance of the models on each fold). We trained and evaluated the models using Weka 3.8.6 on a Windows-based Asus laptop equipped with an AMD Ryzen 7 processor running at 2.3 GHz and 16 GB of RAM. For each dataset, algorithm, and class, we computed the following performance metrics:
› True positive rate (TP Rate): The probability that an instance of a given class is correctly classified as that class (also known as recall).
› False positive rate (FP Rate): The probability that an instance that does not belong to a given class is incorrectly classified as that class. In our case this means that an app that is malware is classified as nonmalware and vice versa.
› Precision: The proportion of instances assigned to a class that actually belong to that class.
› F-Measure: A measure of a model’s accuracy, calculated as the harmonic mean of the precision and recall. Values close to 1 mean better scores.
The performance evaluation presented in Table 2 shows the results of using a tenfold cross validation, which is an accepted methodology to evaluate ML models. In this table, we present the average values for each measure after training and evaluating each model ten different times with random folds for training and testing the models.
Discussion of Results
When creating our models using OneRule, we found that the total number of permissions was selected as the attribute/feature to build the OneRule models, meaning that this attribute alone is the best one to potentially detect a COVID-19-themed app as malware or not. We expected this result from our analysis of the sample means of the total number of permissions for malware/nonmalware apps (as Figure 3 shows) and the statistical analysis we performed on the sample means. We observed that, in general, all algorithms performed relatively well, but models based on OneRule yielded the worst results when compared with the J48 and Naive Bayes models.
We obtained the best overall result when using SMOTE with the J48 decision tree algorithm. This model (identified as J48+SMOTE in Table 2) had the best classification results among all models for all the evaluation metrics, especially those associated with the malware class. Our results show that a static malware detector specifically targeting pandemic-themed apps can be implemented directly in the Android OS because the OS detects the permissions during the APK installation. This detector can be used during pandemic times, specifically when users attempt to install apps from nontrusted sources, which was frequently the case for COVID-19-themed malware. Although we did not implement our proposed methods on an actual smartphone to measure the time or the power consumption to execute a detection (we will conduct these measurements in the future), we argue that the computational and power/energy costs to detect pandemics-related malware based on permissions on a smartphone do not produce significant overhead because 1) Android extracts permissions from app manifests when an app is installed (the OS can run a malware detector based on permissions at installation time), and 2) the ML models we tested (OneRule, J48, and Naive Bayes) execute in constant
time (O(1)) because no extra work is done to extract the features, and either a constant number of if statements with basic Boolean expressions is needed (to implement OneRule and J48) or a constant number of floating-point multiplications is needed (Naive Bayes), approximately 408 single-precision floating-point multiplications (2 classes × 204 features) at installation time.
CONCLUSION
We have reviewed the use of smartphones and their use cases during pandemics. We also reviewed pandemic-related malware trends during the COVID-19 pandemic. We evaluated the use of permissions and machine learning methods to detect COVID-19-themed malware, and we found that a static malware detector can be developed in Android to detect pandemics-related malware with an accuracy of more than 90% using a combination of SMOTE, app permissions, total number of permissions, and decision trees. Moreover, from our review of COVID-19-related malware, we recommend the following countermeasures to minimize the impact of cyberattacks on smartphone users in future pandemics and global crises:
› Implement a static malware detector as part of the mobile OS, delivered as a software update during pandemics, that can detect and alert about possible malware being installed from a nontraditional source (e.g., apps downloaded via SMS links or message links in social networks that camouflage malware as pandemic-related apps) or a nontrusted source.
› Increase the training and awareness of cybersecurity and cyberhygiene specifically focused on cybersecurity for smartphones during a pandemic. This could be achieved by cybersecurity education before a pandemic, and public announcements about mobile malware risks during a pandemic to diminish spear phishing attacks and avoid the installation of pandemics-related malware.
› Recommend that any kind of mobile app installed during a pandemic be installed from a trusted source (e.g., Google Play Store, Apple App Store). This makes mobile malware harder to distribute and install, especially during pandemics and other global crises.
ACKNOWLEDGMENTS
We thank the anonymous reviewers for their valuable comments, which helped improve the paper’s content, quality, and organization. This work was supported by the U.S. National Science Foundation under Grant 1950416 and Grant 2308741.
REFERENCES
1. A. J. Perez and S. Zeadally, “Recent advances in wearable sensing technologies,” Sensors, vol. 21, no. 20, 2021, Art. no. 6828.
2. L. Wang et al., “Beyond the virus: A first look at coronavirus-themed Android malware,” Empirical Softw. Eng., vol. 26, no. 4, pp. 1–38, 2021.
3. N. A. Khan, S. N. Brohi, and N. Zaman, “Ten deadly cyber security threats amid COVID-19 pandemic,” to be published, doi: 10.36227/techrxiv.12278792.v1.
4. TechXplore, “Global cost of cybercrime topped $6 trillion in 2021: Defence firm,” 2022. [Online]. Available: https://techxplore.com/news/2022-05-global-cybercrime-topped-trillion-defence.html
5. W. Z. Zarni Aung, “Permission-based Android malware detection,” Int. J. Sci. Technol. Res., vol. 2, no. 3, pp. 228–234, 2013.
6. N. Peiravian and X. Zhu, “Machine learning for Android malware detection using permission and API calls,” in Proc. IEEE 25th Int. Conf. Tools Artif. Intell., 2013, pp. 300–305.
7. C. Wang, Q. Xu, X. Lin, and S. Liu, “Research on data mining of permissions mode for Android malware detection,” Cluster Comput., vol. 22, pp. 13337–13350, 2019.
8. K. Khariwal, J. Singh, and A. Arora, “IPDroid: Android malware detection using intents and permissions,” in Proc. 4th World Conf. Smart Trends Syst., Secur. Sustainability, 2020, pp. 197–202.
9. M. Ficco, “Malware analysis by combining multiple detectors and observation windows,” IEEE Trans. Comput., vol. 71, no. 6, pp. 1276–1290, Jun. 2022.
10. T. S. R. Pimenta, F. Ceschin, and A. Gregio, “Androidgyny: Reviewing clustering techniques for Android malware family classification,” Digit. Threats, to be published, doi: 10.1145/3587471.
11. R. Sun, W. Wang, M. Xue, G. Tyson, S. Camtepe, and D. C. Ranasinghe, “An empirical assessment of global COVID-19 contact tracing applications,” in Proc. IEEE/ACM 43rd Int. Conf. Softw. Eng., 2021, pp. 1085–1097.
12. H. H. R. Manzil and M. S. Naik, “COVID-themed Android malware analysis and detection framework based on permissions,” in Proc. Int. Conf. Advance. Technol., 2022, pp. 1–5.
13. F. Idrees, M. Rajarajan, M. Conti, T. M. Chen, and Y. Rahulamathavan, “PIndroid: A novel Android malware detection system using ensemble learning methods,” Comput. Secur., vol. 68, pp. 36–46, 2017.
14. G. Eysenbach, “SARS and population health technology,” J. Med. Internet Res., vol. 5, no. 2, 2003, Art. no. e882.
15. N. Ahmed et al., “A survey of COVID-19 contact tracing apps,” IEEE Access, vol. 8, pp. 134577–134601, 2020.
16. L. Bengtsson et al., “Using mobile phone data to predict the spatial spread of cholera,” Sci. Rep., vol. 5, no. 1, pp. 1–5, 2015.
17. A. Priye, S. W. Bird, Y. K. Light, C. S. Ball, O. A. Negrete, and R. J. Meagher, “A smartphone-based diagnostic platform for rapid detection of Zika, Chikungunya, and Dengue viruses,” Sci. Rep., vol. 7, no. 1, pp. 1–11, 2017.
18. H. Yu et al., “Malaria screener: A smartphone application for automated Malaria screening,” BMC Infect. Dis., vol. 20, no. 1, pp. 1–8, 2020.
19. M. Natesan et al., “A smartphone-based rapid telemonitoring system for Ebola and Marburg disease surveillance,” ACS Sensors, vol. 4, no. 1, pp. 61–68, 2018.
20. T. Horvath, H. Azman, G. E. Kennedy, and G. W. Rutherford, “Mobile phone text messaging for promoting adherence to antiretroviral therapy in patients with HIV infection,” Cochrane Database Systematic Rev., vol. 2012, no. 3, 2012, Art. no. CD009756.
21. B. R. Devi et al., “mHealth: An updated systematic review with a focus on HIV/AIDS and tuberculosis long term management using mobile phones,” Comput. Methods Programs Biomed., vol. 122, no. 2, pp. 257–265, 2015.
22. L. O. Gostin, I. G. Cohen, and J. Shaw, “Digital health passes in the age of COVID-19: Are ‘vaccine passports’ lawful and ethical?,” JAMA, vol. 325, no. 19, pp. 1933–1934, 2021.
23. State of Florida, “Office of the Governor. Executive Order Number 21-81 (prohibiting COVID-19 vaccine passports),” 2021. [Online]. Available: https://www.flgov.com/wp-content/uploads/2021/04/EO-21-81.pdf
24. T. Li et al., “What makes people install a COVID-19 contact-tracing app? Understanding the influence of app design and individual difference on contact-tracing app adoption intention,” Pervasive Mobile Comput., vol. 75, 2021, Art. no. 101439.
25. B. Doerrfeld, “A pandemic side effect: Rampant technical debt,” 2022. [Online]. Available: https://devops.com/a-pandemic-side-effect-rampant-technical-debt/
26. Statista, “Number of detected malicious installation packages on mobile devices worldwide from 4th quarter 2015 to 2nd quarter 2021,” 2021. [Online]. Available: https://www.statista.com/statistics/653680/volume-of-detected-mobile-malware-packages/
27. Check Point Blog, “The mobile malware landscape in 2022 – of spyware, zero-click attacks, smishing and store security,” 2022. [Online]. Available: https://blog.checkpoint.com/2022/09/15/the-mobile-malware-landscape-in-2022-of-spyware-zero-click-attacks-smishing-and-store-security/
28. Statista, “Distribution of new mobile malware worldwide in 2021, by type,” 2021. [Online]. Available: https://www.statista.com/statistics/653688/distribution-of-mobile-malware-type/
29. N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” J. Artif. Intell. Res., vol. 16, pp. 321–357, 2002.
ALFREDO J. PEREZ is an associate professor with the University of Nebraska, Omaha, NE, 68182, USA. His research interests include mobile/ubiquitous computing and sensing, privacy and cybersecurity, and CS education. He received his Ph.D. degree from the University of South Florida, Tampa, FL, USA. He is an IEEE Senior Member and a member of the National Academy of Inventors. He is the corresponding author of this article. Contact him at [email protected].
SHERALI ZEADALLY is a university research professor with the University of Kentucky, Lexington, KY, 40506, USA, and a University of Kentucky Alumni Association endowed professor. His research interests include cybersecurity, privacy, and the Internet of Things. He received his doctoral degree in computer science from the University of Buckingham, England. He is a fellow of the British Computer Society and the Institution of Engineering and Technology. Contact him at [email protected].
DAVID KINGSLEY TAN is currently working toward the M.S. degree with the Department of Computer Science, Georgia Institute of Technology, Atlanta, GA, 30332, USA, and he is a software engineer with the Space Dynamics Laboratory. Contact him at [email protected].
1536-1268 © 2023 IEEE
Digital Object Identifier 10.1109/MPRV.2023.3291830
Date of current version 30 November 2023.
Thanks to the advances in communication technologies, the costs of connecting to our loved ones over distance have become negligible. The provision of bandwidth, access to connectivity, and integration of cameras into almost any mobile device allow us to perceive visually what is happening with our communication counterparts. The emergence of telepresence robots even provides us with a remote presence in distant locations. Virtual and augmented environments blend virtual information worlds and physical settings into each other. All these developments together allow better communication and collaboration over distance. During the COVID-19 pandemic, an accompanying cultural and organizational transformation has also been triggered, allowing us to embrace and leverage these technological developments and thus enabling a new way of working. Employees fulfilling their duties from their homes has become the new normal. The traditional office with assigned seating is getting replaced by coworking spaces used for specific occasions, rather than being used on a daily basis. Business travel is losing importance due to in-person gatherings changing to online meetings. In a nutshell, the constraints of space determining presence at a specific location are mitigated by communication technologies that are bringing people together virtually over distance.
Complementing this opportunity to break with the space constraint, could we also make better societal (and maybe individual) decisions via breaking boundaries of time? First we might ask, “why should we?”
The psychologist Daniel Gilbert has argued that humans being good at predicting and escaping immediate dangers, like every mammal, has helped us survive as long as we have. Facing an immediate threat running down the street triggers us to run the other way. Our brains have been configured to treat the future as if it were the present. We can take action to save for retirement or floss our teeth for a better future outcome. We do not respond to long-term threats with nearly as much diligence as we do to clear and present dangers. The COVID-19 pandemic has illustrated the challenge of forming consensus on measures to curtail the spread of the virus when the effect could not be observed immediately. Another example is climate change, where costly, immediate societal reduction of demand and sustainable provision could shift global temperature change, but only decades later. Overcoming short-term thinking, forming better decisions, and empathizing with future generations should be worthwhile efforts.
HUMANS, TRAPPED IN THE PRESENT
The so-called marshmallow experiment has substantiated this human trait of near-term focus: Mischel and his team1 created a delayed gratification experiment in 1972, where young participants would sit in front of a bowl of marshmallows. They could pick one marshmallow immediately; however, their reward would be doubled if they could control their desires until a bell rang 15 minutes later. The study’s claim that subjects’ ability to practice more self-control would predict higher SAT scores, lower body mass, and less consumption of drugs has been disputed. Nevertheless, the study does provide evidence for the general human trait of favoring immediate reward over long-term consequences.
Self-continuity describes the ability to project oneself back into the past and forward into the future, despite the potential impacts of time and the environment. Future benefits are often perceived as further
IOT NEWS
away, favoring present decisions.2 McLure3 found that the brain is divided into an emotional part and a
logical part. The logical brain reveals future consequences, and the emotional brain prioritizes the immediate
benefit of the current action. Can simulating future scenarios increase the saliency of selfish motivations,
such as reputational concern, to promote prosocial or sustainable behavior, ultimately helping humanity to
resolve pressing and complex long-term problems? Can we use computer technology to break the barrier
of time?
TIME-TRAVELING IN VIRTUAL ENVIRONMENTS
According to recent research, mixed reality could become one ingredient in breaking the barrier of time.
Fender and Holz4 recently presented the concept of an “Asynchronous Reality,” a method used to avoid
disturbing users deeply immersed in virtual environments. Instead of letting bystanders invade the users’
virtual world, the bystander would be recorded (for example, when delivering an object). Later, the
immersed user can see these recordings in VR as if the events initiated by the bystander are happening
only then. Other researchers have described systems where people who have passed away could “survive”
in virtual worlds. Kuyda5 trained a conversational AI with texts written by a friend who died in a car crash
to be able to chat with the friend, thus creating the illusion of communicating with a person from the
past. A similar concept has been presented by Artstein et al.,6 who built a system where people can
interactively communicate with a Holocaust survivor. Artstein et al. received positive feedback from users
about “time-offset interactions.” Such interactions help people to better empathize with others, but
potentially also with another “self.” The so-called “Proteus effect”7 describes how a user’s behavior in a
virtual environment can be modified by changing the characteristics of an avatar. For example, a more
attractive avatar led to study participants being more friendly to strangers than when the participants were
provided with less attractive avatars. Another study by Peck et al.8 has shown that users’ racial biases can
be decreased by representing them with avatars of different skin colors. Finally, Choi9 has discussed how
imagining future episodes (in the context of prosocial behavior) can influence present decisions. Thus, it is
fair to hypothesize that immersing users in episodic simulations to interact with, or step into the shoes of,
future selves and others may be a viable approach to overcome short-term thinking, empathize with future
generations, and ultimately make better decisions in the present.
POTENTIAL APPLICATIONS
VR time traveling could help us overcome the natural limitations of short-term dominance of the human
brain, which may enable us to better address problems influenced by behaviors on a long time scale. The
concept is not limited to future issues like climate change or health issues; immersion in the past to
inform future decision making is also possible. In particular, we envision the following applications:
The past and future self: We frequently struggle with self-perception. Revisiting past situations as
simulations could help us better understand decisions we have made in the past. Some decisions may be
regretted later, and time-traveling could help us realize that these decisions might have been rational and
reasonable at the time they were made. At the same time, we could experience a future to see how our
behaviors might influence us in the long run. For example, visualizing the future adverse effects of smoking,
limited exercising, or other health-related behaviors might impact present-day behavior.
The past and future others: For most people, it seems bizarre that the population of a democracy could
actively choose to switch to a misanthropic authoritarian society, as happened with the German
National-Socialistic state just a century ago. One can hardly imagine being “on the wrong side” of history,
and families have suffered after realizing that their parents or grandparents once were obeying racist
killing machines. Traveling back and immersing ourselves with the people involved, both victims and
perpetrators, could help us better comprehend how authoritarian systems evolve, potentially helping to
prevent similar evolution in the present (recent events demonstrate that modern societies are not immune
against authoritarian tendencies). Further, we could travel into the future to follow in the footsteps of our
simulated grandchildren and their descendants to see how decisions that we make today may affect their
well-being. How would we feel about our choices (e.g., taking the airplane for a weekend trip, adopting a
daily meat-based diet) when we see our own family members suffering on an unlivable future planet?
TOWARDS ANYTIME COMPUTING
The development of artifacts that make past and future events more tangible may help us not only better
understand ourselves, family members, and community.10 Our logical brain could be further supported by emotional
© 2023 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more
information, see https://creativecommons.org/licenses/by/4.0/
Digital Object Identifier 10.1109/MPRV.2023.3308014
Date of publication 6 September 2023; date of current version 30 November 2023.
Mobile portable devices such as smartphones, tablets, and wearables play a crucial role in the field of
pervasive computing and in the integration of technology into every aspect of our daily lives, so that it is
everywhere and always available. Mobile portable devices are essential in this respect because they allow
individuals to access and utilize technology from anywhere and at any time. This increased availability of
technology has revolutionized the way we live and work, providing us with new opportunities and
possibilities.
The energy reserves of mobile portable devices are a particularly essential resource in modern cultures;
fast battery depletion is an issue that billions of smartphone and wearable device users worldwide face on a
daily basis. Unfortunately, the residual energy supplies of these devices are limited and dependent on their
battery power, which directly affects their usability. Individual users in a social crowd are constrained by
restricted resource capacity, and it is difficult to provide energy directly to their devices to keep them
working for an extended period of time. If shared, the spare capacity or resources of other users in the
crowd may be employed to satisfy varying demand. By using energy from sources external to the wireless
network, such as shared chargers, or from sources within the network, such as other mobile devices via
wireless power transfer (WPT), it is now possible to extend the lifetime of such networks. Energy sharing
techniques, over either a wired or a wireless medium, have made this possible. Although there have been
some recent, isolated works on this type of energy sharing, to the best of our knowledge, none of the
related works has presented a holistic framework that takes into account all the technological and
applicative requirements and constraints. Peer-to-peer (P2P) crowd charging has recently emerged as an
alternative energy replenishment option. Specifically, although innovation and research initiatives have
targeted improving the technological properties of WPT or profiling the individual user aspects for
optimized usage, there has never been a holistic, network-wide charging optimization framework which
focuses on battery aging mitigation, considers both the wired and the wireless energy medium, takes into account
the opportunistic nature of human contact networks, the cooperation opportunities among the network
nodes, as well as privacy and incentivization aspects.1 To this end, in this article we introduce GreenCrowd:
A holistic algorithmic crowd charging framework, which (unlike recent, mainstream opportunistic charging
visions which focus on the optimization of individual devices/users for solely stationary wired charging)
introduces ubiquitous intelligence and network-wide user profiling for both stationary wired charging and
cooperative P2P-WPT, with a goal of rendering the crowd charging process more circular and sustainable.
GreenCrowd aims at advancing the state-of-the-art compared to the traditional i) crowd sensing systems,
ii) wireless powered networks, and iii) battery aging mitigation techniques, providing the following novelties:
prior behavior, an ability better represented by the architecture suggested by GreenCrowd.9
5) Existing methodologies presuppose that nodes would trade energy with other nodes whenever possible,
independent of the social situation. A few strategies also unreasonably assume that every node is open to
exchanging energy with every other node. The duration of the energy exchange in such approaches is
likewise considered not constrained by the time between meetings of the associated users.10 However, in
practice, and according to GreenCrowd design, each node may only interact with a portion of the crowd
while moving, and choices are also influenced by the crowd’s social dynamics.
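Item 5) above observes that an energy exchange cannot outlast the contact between the two users involved. A minimal sketch of that bound follows, assuming a constant transfer rate and a fixed fractional WPT loss. Both are simplifications, and the function name, parameters, and numeric values are hypothetical and not taken from the article.

```python
def transferable_energy_mah(requested_mah, contact_s, rate_ma, loss_fraction):
    """Charge (mAh) a requester can receive during a single contact.

    Assumptions (illustrative only): the provider transfers at a constant
    current `rate_ma` for the whole contact, and a fixed fraction
    `loss_fraction` of the transferred charge is lost in the wireless link.
    """
    sent_mah = rate_ma * contact_s / 3600.0           # charge pushed during the contact
    received_mah = sent_mah * (1.0 - loss_fraction)   # charge left after transfer loss
    # The requester never receives more than was asked for.
    return min(requested_mah, received_mah)


if __name__ == "__main__":
    # 15-minute contact, 1 A transfer rate, 20% loss: the contact duration,
    # not the request, limits the exchange (roughly 200 mAh out of 500 requested).
    print(transferable_energy_mah(500, 15 * 60, 1000, 0.2))
```

Under these assumptions a short contact caps the exchange regardless of how much energy the provider is willing to share, which is why contact duration has to enter the sharing decision.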
To highlight a potential scenario in which GreenCrowd can be applied, we present a possible interaction
between two GreenCrowd users. Alice goes to a stadium to watch a match, but her phone is low on
battery. She wants to record and send photos and videos to her friends, but the battery is too low to
perform such actions, since she also needs to save some for the way back home, and there are no power
plugs available. Alice then uses GreenCrowd and finds users who are at the stadium with her and have
more battery energy than they need, hence they are glad to sell some of it. Bob replies to the request
made by Alice, and GreenCrowd provides guidance on how to meet, so that Bob can transfer part of his
energy to Alice, who can now use her phone again while watching the match. We also want to note that
similar scenarios can be found, for instance, at train stations while waiting for a train, on a bench in a
public park, or in an airport lobby if no power plugs are available.
Objectives and Methodologies
The proposed GreenCrowd design aims at:
› Basing the framework on innovative system modeling by realizing scientific breakthroughs in a set of
complementary use case paradigms (e.g., opportunistic, participatory, ad hoc) and energy sharing settings
(e.g., P2P, centralized, mixed) to pave the way toward the full exploitation of crowd charging capabilities
for fully resilient and sustainable use cases.
› Exploiting the technological breakthroughs to provide the framework with a groundbreaking algorithmic
toolkit, which will include dedicated operational components (namely, reward, online social information,
battery aging mitigation, privacy-by-design in the digital domain, decision components).
› Facilitating an efficient exploration of standardized crowd charging’s full potential toward applied energy
circularity by validating the GreenCrowd framework in real-world settings via emulated use cases in the
lab and/or human-generated datasets.
In order to achieve the intended sustainable goals, the GreenCrowd development strategy utilizes a novel
interplay of rigorous principles that emerge from a combination of ICT methodologies. Table 1 provides a
summary of GreenCrowd’s primary research questions and proposed methods to address them.
The technological implementation of crowd charging calls for a wide range of theoretical and practical
instruments as it reflects the originality of GreenCrowd. However, the specifics of electromagnetism
and electronics must be modeled. The network-wide
4) A “privacy component,” which ensures the obedience of the framework to the user privacy requirements
and constraints and will apply the privacy-by-design in the digital domain approach.
5) A “decision component,” which ultimately makes real-time decisions on how to share the energy
supplies among the users in the crowd, also taking into account the WPT-incurred energy loss.
These components can be exploited based on innovative collaboration models, facilitated by
circular-related standards (such as the Qi standard) further enhanced by distributed computing principles,
which are considered as a critical modern asset toward decentralization and circularity.
The developed framework incorporates a reward component for crowd charging, which will truly
empower users with different opportunities to participate in energy sharing tasks. Specifically, the dynamic
reward component takes into account a series of parameters, some of which can be input by the user
requesting a service, and some of which are computed by the component. These include, from the
requesting user perspective: 1) the amount of charge required, 2) the maximum amount of time which can
be spent in the same location, 3) the device wear, and 4) the reward the user is willing to give to the
service provider. From the component point of view, the respective reward can be enriched by the number
of users available in the area, and the number of tasks previously requested in the same area. Practically,
the reward is given in the form of GreenCrowd tokens, which can be redeemed when requesting a charge
from other users, or exchanged for a monetary prize.
Users can request a specific service from their own application and input the required parameters,
depending on their specific needs. No parameter is mandatory, so users can simply configure whatever
matters to them and let the component decide the rest. While the user parameters are updated, a constant
feedback mechanism with the server will also dynamically compute the expected time of delivery for the
service requested, according to the component parameters. For instance, if there are several tasks already
requested with a higher pay than the one offered by the requesting user, the component advises the user
to either increase the payment amount or extend the deadline, in order to meet the service provider
expectations. It is also important to note that the component’s scope can be easily generalized to other
crowd-* systems.
Although innovation and research initiatives have targeted improving the technological properties of
batteries or profiling the individual user aspects for optimized usage, to the best of our knowledge, there
has never been a network-wide charging optimization framework like GreenCrowd, which focuses on battery
aging mitigation, also considers the wireless energy medium, and takes into account the opportunistic
nature of human contact networks, the cooperation opportunities among the network nodes, as well as
emerging energy sharing socio-technical breakthroughs such as wireless social crowd charging.13
The current literature includes works solely in 1) the domain of battery aging mitigation for individual
devices and without any wireless power capability, and 2) the domain of wireless crowd charging (usually
targeting network energy balance and WPT-incurred energy loss management) but without any battery
aging mitigation mechanisms. The GreenCrowd approach in this respect combines the two concepts by
assigning a dedicated component to battery aging mitigation and incorporating the energy loss management
in the decision component, so as to explore the tradeoffs between charging efficiency and battery longevity.
Users’ social connections and interests can have an impact on both their mobility and the flow of
energy between them. The current techniques of crowd charging do not take into account the dual
issue of energy exchange brought on by the user’s inescapable mobility and the impact of sociality on
the latter. The energy balance attained by these works is slowed down by computation with imperfect
information and is hampered by energy loss for the crowd. Currently, only coarse-grained social information
is used. In this regard, the GreenCrowd method, for the first time, provides a component with fine-grained
social data that can produce outcomes which are expected to be more accurate in this respect, borrowing
features from social psychology. In the physical world, reciprocal interactions, which have been
demonstrated to be a powerful altruism-inspiring factor,14 can influence users to choose friends over
strangers when giving resources, especially if the cost of sharing is high. Similar trends may be seen for social
reciprocity in cyberspace, where online communities share many structural traits with real-world
face-to-face networks.15 An increase in an online social user’s reciprocity value has been shown empirically
to boost the reciprocity responses from her nearby online social graph.16 By utilizing the data accessible on
online social networks, those social features can assist us in adding a social component to the energy
sharing process.
ASSESSMENT OF GREENCROWD’S POTENTIAL
To assess the potential of GreenCrowd in a real scenario, we leverage the dataset presented in
Sapiezynski et al.17 that reports human contacts between hundreds of users. The dataset is built with
traces from more than 700 students over a 4-week period, and the data have been collected with
smartphones, which record their RSSI with respect to other nearby smartphones. It is thus a well-built
dataset to explore the social interactions among users and the time in which they are in contact through a
whole day, and it allows us to pose the following key questions:
› What is the percentage of users in a given area which can leverage the services offered by the
GreenCrowd platform?
› How much charge could they get depending on their mobility patterns?
› How many contacts take place between (online social) friends?
To answer these questions, we analyze the data studied in Sapiezynski et al.,17 which spans over 28
days, assuming a real-world deployment of GreenCrowd. Specifically, we consider users moving freely on
a campus, who may access the GreenCrowd platform when in range of other users for at least a
minimum time of contact. In our study, we conservatively set this minimum time to 15 min. In other words,
users who are not in contact for at least 15 consecutive minutes are not willing to exchange energy, due to
too much overhead in accessing the platform, finding the other device, and performing the wireless charge.
Figure 4 reports our findings related to the first research question. We analyzed for each user how
many charging opportunities she may have in the 28 days in which the data were recorded. We consider a
minimum contact of at least 15 min with another device at different distances. For our study, we
considered three different Bluetooth RSSI levels: -73 dBm, which corresponds to roughly 1.5 m and
reflects users already being close to each other; -80 dBm, which corresponds to roughly 3.5 m and
reflects users who may be in the same room; and -89 dBm, which corresponds to roughly 10 m and
reflects users who may be in adjacent rooms.
From Figure 4, we can see that a large portion of users have plenty of opportunities for recharging,
even with devices which are already standing close to them. We also note that typically any kind of device
needs to be charged a maximum of 1-2 times per day, hence a value of 28 means that the user has on
average one charging opportunity per day. Allowing queries to users farther away increases this value
beyond several opportunities per day, confirming the availability of users who may offer additional
charging when needed.
The second analysis we performed is pictured in Figure 5, where it is possible to see the amount of
charge users can get leveraging the GreenCrowd platform. To perform this analysis, we considered a
conservative maximum rate of charge of 1 Ah, although faster and more efficient wireless charging
technologies do exist. Again, we test this for different proximity with other users, and the results confirm
that a vast majority of users can recharge a considerable amount of charge, with some of them also able to
recharge up to 5 Ah, which corresponds to some of the smartphones with the largest batteries available at
the time of writing.
Next, we obtain some insight about the useful online social information with respect to the pattern
of the contacts of pairs of “online friends.” Given that in this case we consider very limited Facebook status
data exchanges (which can be performed over long distance in minimal time instead), we take into
account all the contacts of the users, regardless of their RSSI levels or time of contact. The trend of the
increase of contacts over time is displayed visually in Figure 6. The results demonstrate that, after taking
into account the entire number of contacts, out of 2,418,901 contacts, a significant portion of 25.68%
took place between online friends. This finding of a significant amount of contacts that took place between
“online friends” strengthens the motivation behind GreenCrowd’s approach toward fusing the crowd
charging process with the abundant available online social information for better fine-tuning the energy
sharing functions.
As a final note, we also want to state that these results were obtained without considering any
voluntary movement from users. In other words, we have derived this number considering the usual mobility of
FIGURE 6. Number of contacts for each category of users over time (the blue plot’s values equal to the sum
of the rest of the plots’ values).
CONCLUSION
By leveraging social, reward, privacy, battery, and decision-making functional components, the
GreenCrowd framework can be customized to suit various use cases. This flexibility creates new
opportunities for reducing the dependence on nonrenewable energy sources and encourages a shift toward
more sustainable and equitable energy distribution approaches. Moreover, our proposed framework has
the potential to reduce e-waste by extending the life of existing energy storage systems and extending
the lifetime of single-use batteries. This shift will contribute to reducing the carbon footprint and
enhancing environmental sustainability, eventually promoting a more sustainable and circular approach
to resource consumption. The increased engagement of the crowd has the potential to lead to a
more informed and involved community that is better equipped to address the challenges of energy
sustainability, which might in turn lead to more effective data collection, analysis, and dissemination,
further supporting the sustainability of the broader pervasive computing sector.
ACKNOWLEDGMENTS
This work was supported in part by the European Union under the Italian National Recovery and Resilience
Plan of NextGenerationEU, partnership on “Telecommunications of the Future” (PE00000001, program
“RESTART”). Open Access funding provided by ‘Consiglio Nazionale delle Ricerche-CARI-CARE-ITALY’
within the CRUI CARE Agreement.
REFERENCES
1. C. H. Liu, Z. Dai, Y. Zhao, J. Crowcroft, D. Wu, and K. K. Leung, “Distributed and energy-efficient mobile
crowdsensing with charging stations by deep reinforcement learning,” IEEE Trans. Mobile Comput.,
vol. 20, no. 1, pp. 130–146, Jan. 2021.
2. A. Dhungana, T. Arodz, and E. Bulut, “Exploiting peer-to-peer wireless energy sharing for mobile charging
relief,” Ad Hoc Netw., vol. 91, 2019, Art. no. 101882.
3. E. Shamsa et al., “Ubar: User- and battery-aware resource management for smartphones,” ACM Trans.
Embedded Comput. Syst., vol. 20, no. 3, pp. 1–25, Mar. 2021.
4. Z. Wang, J. Hu, J. Zhao, D. Yang, H. Chen, and Q. Wang, “Pay on-demand: Dynamic incentive and task
selection for location-dependent mobile crowdsensing systems,” in Proc. IEEE 38th Int. Conf. Distrib.
Comput. Syst., 2018, pp. 611–621.
5. J. Hu, K. Yang, K. Wang, and K. Zhang, “A Blockchain-based reward mechanism for mobile crowdsensing,”
IEEE Trans. Comput. Social Syst., vol. 7, no. 1, pp. 178–191, Feb. 2020.
6. E. Hernández-Orallo, C. Borrego, P. Manzoni, J. M. Marquez-Barja, J. C. Cano, and C. T. Calafate,
“Optimising data diffusion while reducing local resources consumption in opportunistic mobile
crowdsensing,” Pervasive Mobile Comput., vol. 67, 2020, Art. no. 101201.
7. J. Ni, K. Zhang, X. Lin, Q. Xia, and X. S. Shen, “Privacy-preserving mobile crowdsensing for located-based
applications,” in Proc. IEEE Int. Conf. Commun., 2017, pp. 1–6.
8. K. Yan, G. Luo, X. Zheng, L. Tian, and A. M. V. V. Sai, “A comprehensive location-privacy-awareness task
selection mechanism in mobile crowd-sensing,” IEEE Access, vol. 7, pp. 77541–77554, 2019.
9. F. Montori and L. Bedogni, “A privacy preserving framework for rewarding users in opportunistic mobile
crowdsensing,” in PerCom Workshops, 2020, pp. 1–6.
10. E. Bulut and A. Dhungana, “Social-aware energy balancing in mobile opportunistic networks,” in Proc.
IEEE 16th Int. Conf. Distrib. Comput. Sensor Syst., 2020, pp. 362–367.
11. E. Bulut, S. Hernandez, A. Dhungana, and B. K. Szymanski, “Is crowdcharging possible?,” in Proc.
IEEE 27th Int. Conf. Comput. Commun. Netw., 2018, pp. 1–9.
12. L.-R. Dung, C.-E. Chen, and H.-F. Yuan, “A robust, intelligent CC-CV fast charger for aging lithium
batteries,” in Proc. IEEE 25th Int. Symp. Ind. Electron., 2016, pp. 268–273.
13. Q. Zhang, F. Li, and Y. Wang, “Mobile crowd wireless charging toward rechargeable sensors for Internet of
Things,” IEEE Internet Things J., vol. 5, no. 6, pp. 5337–5347, Dec. 2018.
14. O. Curry, S. G. B. Roberts, and R. I. M. Dunbar, “Altruism in social networks: Evidence for a ‘kinship
premium’,” Brit. J. Psychol., vol. 104, no. 2, pp. 283–295, 2013.
15. R. Dunbar, V. Arnaboldi, M. Conti, and A. Passarella, “The structure of online social networks mirrors
those in the offline world,” Social Netw., vol. 43, pp. 39–47, 2015.
16. J. Surma, “Social exchange in online social networks. The reciprocity phenomenon on Facebook,” Comput.
Commun., vol. 73, pp. 342–346, 2016.
17. P. Sapiezynski, A. Stopczynski, D. D. Lassen, and S. Lehmann, “Interaction data from the Copenhagen
networks study,” Sci. Data, vol. 6, no. 1, p. 315, 2019.
18. T. Ojha, T. P. Raptis, M. Conti, and A. Passarella, “Wireless crowd charging with battery aging
mitigation,” in Proc. IEEE Int. Conf. Smart Comput., 2022, pp. 142–149.
THEOFANIS P. RAPTIS is a senior researcher with the National Research Council of Italy, 56124, Pisa, Italy.
His current research interests include industrial networks, wirelessly powered networks, and IoT testbeds
and platforms. Raptis received his Ph.D. degree from the University of Patras, Greece. He is the
corresponding author of this article. Contact him at [email protected].
LUCA BEDOGNI is an associate professor with the University of Modena and Reggio Emilia, 41125, Modena,
Italy. His current research interests include the study of privacy aware context aware system, in the
domains of IoT, crowdsensing and mobile applications. Bedogni received his Ph.D. degree from the
University of Bologna, Italy. He is a member of IEEE and ACM. Contact him at [email protected].
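The charging-opportunity count used in the GreenCrowd assessment above can be illustrated with a short sketch. It assumes contact records of the form (timestamp in seconds, user, peer, RSSI in dBm) and a fixed scan period of 300 s. The record layout, the scan period, the function name, and the toy data are assumptions made for illustration; they are not the authors' code or the dataset's actual schema. A contact qualifies when the same peer is seen at or above the RSSI threshold in consecutive scans spanning at least 15 min.

```python
from collections import defaultdict

MIN_CONTACT_S = 15 * 60  # minimum contact time stated in the article


def count_opportunities(samples, user, rssi_floor_dbm,
                        scan_period_s=300, min_contact_s=MIN_CONTACT_S):
    """Count contacts of `user` lasting at least `min_contact_s`.

    `samples` is an iterable of (t_seconds, user_a, user_b, rssi_dbm).
    `scan_period_s` is an assumed sampling interval: two sightings of the
    same peer further apart than this are treated as separate contacts.
    """
    times_by_peer = defaultdict(list)
    for t, a, b, rssi in samples:
        if rssi < rssi_floor_dbm:
            continue  # too far away for the chosen proximity level
        if a == user:
            times_by_peer[b].append(t)
        elif b == user:
            times_by_peer[a].append(t)

    opportunities = 0
    for times in times_by_peer.values():
        times.sort()
        start = prev = times[0]
        for t in times[1:]:
            if t - prev > scan_period_s:        # gap: the previous contact ended
                if prev - start >= min_contact_s:
                    opportunities += 1
                start = t
            prev = t
        if prev - start >= min_contact_s:       # close the last contact
            opportunities += 1
    return opportunities


# Toy data (hypothetical): user 1 meets user 2 closely and user 3 at room range.
DEMO = [
    (0, 1, 2, -70), (300, 1, 2, -72), (600, 1, 2, -71), (900, 1, 2, -73),
    (1200, 1, 2, -85),
    (3000, 1, 3, -80), (3300, 1, 3, -79), (3600, 1, 3, -78), (3900, 1, 3, -80),
]

if __name__ == "__main__":
    for floor in (-73, -80, -89):
        print(floor, count_opportunities(DEMO, 1, floor))
```

Relaxing the threshold from -73 dBm to -80 dBm admits the room-range peer, which mirrors the article's observation that querying users farther away raises the number of opportunities.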
This article proposes a unified threat landscape for participatory crowd sensing
(P-CS) systems. Specifically, it focuses on attacks from organized malicious actors
that may use the knowledge of P-CS platform’s operations and exploit algorithmic
weaknesses in AI-based methods of event trust, user reputation, decision-making,
or recommendation models deployed to preserve information integrity in P-CS. We
emphasize intent-driven malicious behaviors by advanced adversaries and how
attacks are crafted to achieve the intended attack impacts. Three directions of the threat
model are introduced, namely attack goals, types, and strategies. We expand on
how various strategies are linked with different attack types and goals,
underscoring their formal definition, relevance, and impact on the P-CS platform.
1536-1268 © 2023 IEEE
Digital Object Identifier 10.1109/MPRV.2023.3296271
Date of publication 28 July 2023; date of current version 30 November 2023.
With the growing penetration of smart hand-held devices and smartphone apps, various forms of crowd
sensing (CS) applications have emerged. In CS applications, human users are involved in providing reports
or sensed data that improve civic well-being via pervasive smart services. The goal of the CS application is
to identify the correct event based on the reports/data and disburse incentives to those users helping
in event identification. The incentive disbursement is critical in keeping the churn under control in such
commercial applications.
TYPES OF CS PLATFORMS
The CS paradigm is classified into two subdomains, Opportunistic and Participatory, as described below.
Opportunistic or Passive CS (O-CS): In O-CS, users agree to the usage of their personal devices as a
sensor. The O-CS app submits data automatically “without” explicit human involvement. In this scenario,
the report is an analog signal and thus similar to sensor networks. Therefore, many research works
involving the O-CS setting borrow methods from statistics (e.g., maximum likelihood estimates) and
statistical machine learning (e.g., expectation maximization algorithms) for computing a truthful aggregate
value of a sensed quantity for situational event inference. Each participant’s report is compared with the
output of truth discovery to assign and update users’ long-term reputation score.
Participatory CS (P-CS): The P-CS subdomain, in contrast, requires explicit human involvement, where
some users (called “reporters”) manually contribute observations in the form of reports, or any piece of
information that is not an analog signal. In such a scenario, the approaches used in O-CS for finding the
truthfulness of events or assigning user reputation do not always apply. Participatory sensing is analogous
to social media (where the users offer voluntary posts on public groups and pages); hence many works use
the broader term of social sensing. Nonetheless, the following differences exist with pure social media: 1) a
dedicated crowd reporting app (e.g., Google’s Waze App,a Yelp) is used instead of a social media app, and
2) one usually cannot share/forward others’ reports but can only provide feedback or a reaction. Thus,
lessons learnt from P-CS vulnerabilities can partially help systematize social sensing vulnerabilities as well.
a [Online]. Available: www.waze.com
INFORMATION INTEGRITY CHALLENGES IN P-CS
While the incentives attached to the contribution of reports encourage participation, they also motivate
rogue reports from selfish users. Furthermore, orchestrated false reports may cause incorrect events to be
published in P-CS, thus having civilian and economic impacts, which motivates organized malicious adversaries. Nonetheless, one critical challenge in CS applications is event trustworthiness or truthfulness. Furthermore, determining which participants are honest or dishonest via a reputation scoring model is another typical challenge. In the literature, artificial intelligence enabled computational trust and reputation models have been proposed to solve both challenges. However, these models have weaknesses in the design principles and P-CS operation design loopholes, which keep the door open for organized malicious intent to harm the P-CS platform’s integrity.
WHY A FORMAL P-CS THREAT LANDSCAPE?
In the O-CS domain, the threat model is similar to those in cyber-physical systems and sensor networks, and does not require much leap of faith. Hence, we do not discuss O-CS in this work. However, our analysis of existing security literature in the P-CS domain revealed a lack of unified discussion on strong and elaborate threat models specific to P-CS. Those threats arise from the complex cyber-physical-human couplings, and design weaknesses in trust, reputation, and decision-making models in P-CS. Thus, an important motivation of this article is to consolidate various possible targeted threats, specifically relevant to P-CS. We aim to provide a guide for future designers wishing to build secure and robust-by-design P-CS platforms.
SCOPE OF THREAT MODEL
There exists a lot of research in securing the P-CS domain that deals with traditional well-known attacks common to any networked system, such as Sybil attacks, privacy attacks, unauthorized access, etc. Our goal is to add and formalize a targeted threat model of information integrity specific to P-CS. Therefore, we do not discuss commonly reported threats that do not directly relate to algorithmic weaknesses of trust, reputation scoring, decision models, or procedural loopholes in P-CS operations. Additionally, our threat model focuses on attacks that originate from organized malicious intent rather than individual selfish intent.
ARTICLE CONTRIBUTIONS
Our novel contributions are as follows:
› We propose a threat landscape spanning three main directions: 1) attack goals, 2) attack types, and 3) attack strategies. To achieve an attack goal, the attacker may need one or more attack types that depend on the stage of P-CS information integrity being targeted. Furthermore, depending on the intended impact and the adversary’s level of prior knowledge, the attack types can be launched using one or more attack strategies that belong to a certain attack type to attain an attack goal.
› To specify the attacker’s intent, we propose five possible attack goals: 1) induce false events, 2) suppress true events, 3) alter event types, 4) poison user reputation model, and 5) steal the event publishing model.
› We propose three attack types: 1) sensory manipulation targeting weakness in the reporting stage, 2) feedback weaponizing attack strategies targeting weakness in the rating feedback stage, and 3) belief manipulation attacks targeting weakness in the decision-making phase.
› For each attack type, we propose multiple attack strategies, their relevance, and impact.
› We highlight how our proposed strategies are linked to different goals and what types of attack strategies require more research.
UNDERSTANDING THE INFORMATION INTEGRITY PIPELINE IN P-CS
This section describes the typical design stages of a P-CS platform and gives examples of CS apps, such as Waze and Yelp, to illustrate how the design features are seen in real-life apps. This will enable readers to relate to the threat landscape that has a more generic treatment.
Stages of P-CS Operation
As explained below, a typical P-CS platform consists of three operational stages: reporting or contributions, decision-making, and feedback monitoring.
Reporting: This stage involves voluntary contributions from the crowd indicating a particular event or a response to a task.
Decision-Making: The reports are collected by a P-CS server and a recommendation is made based on an event decision-making model running on the P-CS server that decides how to process various reports into a published recommendation or event.
Feedback Monitoring: The published events can be rated based on the perceived usefulness (e.g., yes/no) by other users, called raters, with respect to that event.
Real-Life Example of P-CS: Figure 1 illustrates an abstraction of a P-CS application for vehicular event
CS, such as Google Waze app, where the reporters submit location-tagged “reports” by clicking one out of the following events: road closure, jam, accident, weather hazard, police presence, gas station pricing, etc. The P-CS server decides whether and how long to publish this event on the Waze app; there is also an option for consumers to rate the perceived usefulness of the events published. A similar abstraction exists in social sensing apps, such as Yelp,b where reports are submitted in the form of a review on a business. Each report is visible separately on the app and can be rated by other users. For example, Yelp allows three feedbacks to each post/comment while Waze allows two feedbacks. The reports and feedbacks are combined to form an opinion on the business, and Yelp sorts them to recommend a business.
b [Online]. Available: yelp.com
Different User Roles
The users in a CS paradigm can be classified into various roles, such as reporters, raters, and passive consumers. From the perspective of an event or entity which needs reporting, the users that contribute information on that event are reporters with respect to that event (or entity). A subset of the remaining user base, known as raters, can give feedback on the usefulness of the published event. The user base which neither reports nor rates a given event is a passive consumer with respect to that event.
Across different events, however, a user of a P-CS app can act as a reporter, rater, or passive consumer based on their roles with respect to that event. It is assumed that the system does not allow the same user to rate its own report. If the attacker recruits a user or hacks apps to work for his attack goals, then a malicious user can perform all three roles with respect to an event in P-CS.
Unified View of Information Integrity Pipeline
Regardless of the actual application, the architecture of assuring information integrity has the following overarching abstraction.
Upon launching a new P-CS, the initial stages are known as the cold start phase, where the user reputations are not known. Usually, in the cold start phase, the events are published by the decision-making model based on the contextual correlations among reports (e.g., event type, time, location, threshold number of reports) in an area.12
In many practical systems as well as novel research,2 the P-CS implements a mechanism known as feedback monitoring that asks the crowd to rate or give feedback on their perception of how truthful an event is. The data acquired as part of the feedback monitoring are used to verify, in retrospect, the event’s truthfulness or trustworthiness. The event’s veracity is indicative of the honesty levels (reputation) of those users who submitted the reports corresponding to this event. Intuitively, if the event truthfulness is high, the users who reported that event get their reputation increased, and vice versa.
Once a reliable user reputation base is established, the P-CS enters the steady-state phase of operation. In this phase, the decision-making model takes into account three major factors: 1) prior reputation of users submitting a report, 2) contextual probability of that event occurring, and 3) contextual correlations and quantities deciding whether or not to publish an event. For in-depth discussions on this unified view, refer to Restuccia et al.5 and Bhattacharjee et al.2
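The unified pipeline described above can be illustrated with a minimal sketch. It is not the decision model of Waze, Yelp, or any cited work: the `Event` class, the function names, the report-count threshold, the learning rate, and the rule that multiplies summed reputation by a contextual prior are all assumptions chosen only to make the three phases concrete.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    reporters: list                               # ids of users who reported the event
    ratings: list = field(default_factory=list)   # True = rated useful, False = not useful


def publish_cold_start(event, min_reports=3):
    """Cold start: publish once enough distinct users report the same context.
    The threshold value is a hypothetical parameter."""
    return len(set(event.reporters)) >= min_reports


def truthfulness(event):
    """Fraction of positive ratings; 0.5 (undecided) when nobody has rated."""
    if not event.ratings:
        return 0.5
    return sum(event.ratings) / len(event.ratings)


def update_reputation(reputation, event, rate=0.2):
    """Move each reporter's reputation toward the event's truthfulness score."""
    score = truthfulness(event)
    for user in set(event.reporters):
        old = reputation.get(user, 0.5)           # unknown users start at 0.5
        reputation[user] = old + rate * (score - old)
    return reputation


def publish_steady_state(event, reputation, prior, threshold=1.0):
    """Steady state: summed reporter reputation, scaled by the contextual prior
    likelihood of the event, must reach a threshold (an assumed rule)."""
    weight = sum(reputation.get(u, 0.5) for u in set(event.reporters))
    return weight * prior >= threshold


event = Event(reporters=["u1", "u2", "u3"], ratings=[True, True, True, False])
reputation = update_reputation({}, event)
print(publish_cold_start(event), truthfulness(event))
print(publish_steady_state(event, reputation, prior=0.8))
```

Even in this toy form, every input to the publishing decision (report counts, ratings, learnt reputation, and the contextual prior) is something a coordinated group of users can influence, which is the property the threat landscape builds on.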
Note that in the steady-state phase, the P-CS still keeps the feedback/rating mechanism since new users join and old users may become inactive.
Categories of Attacker Intent
The following types of intent can undermine the information integrity of P-CS platforms.
the adversaries induce an “incorrect action” to exacerbate the consequences of the event that did happen. For example, let there be a congestion in a certain part of the city, but the malicious reporters falsely report low gas prices in the same area. By combining strategies, such as false category flagging and concurrent cross feedback strategies (discussed later), the CS server can be triggered into having sufficient confidence to publish the event of low gas prices. This may cause many passive consumers of CS to reach this area of the city, worsening the civilian impact of the congestion already present.
adversary can now use its budget more efficiently, a wider civilian and economic impact of previous attack goals can be achieved.
To achieve the above objectives, different categories of attack types can be developed as discussed next. The attack types depend on the stage of the P-CS operating cycle at which the attacker wants to realize its goal. Each attack type can have multiple attack strategies classified under it. Note that the attack goals are complex and the attacker may require a combination of strategies to achieve them, as illustrated in Figure 2.
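As a reading aid, the relationship between attack types, the operating stage each one targets, and the strategies the article files under each type can be written as a small lookup structure. The type, stage, and strategy names are taken from the article; the dictionary layout and the helper function are illustrative assumptions, and the goal-to-strategy mapping of Figure 2 is deliberately not reproduced here.

```python
# Attack types, targeted stages, and strategy names as listed in the article.
# The data layout itself is a hypothetical convenience, not part of the article.
THREAT_LANDSCAPE = {
    "sensory manipulation": {
        "stage": "reporting",
        "strategies": [
            "gray box probe",
            "black box probe",
            "false category flagging",
        ],
    },
    "feedback weaponizing": {
        "stage": "feedback monitoring",
        "strategies": [
            "targeted ballot stuffing",
            "targeted bad mouthing",
            "targeted obfuscation stuffing",
            "orchestrated sequential toggle feedback",
            "concurrent cross feedback",
        ],
    },
    "belief manipulation": {
        "stage": "decision-making",
        "strategies": [
            "reputation stuffing",
            "bias stuffing",
            "exclusion stuffing",
        ],
    },
}


def strategies_for_stage(stage):
    """Return every strategy whose attack type targets the given stage."""
    return [
        strategy
        for attack_type in THREAT_LANDSCAPE.values()
        if attack_type["stage"] == stage
        for strategy in attack_type["strategies"]
    ]


print(strategies_for_stage("feedback monitoring"))
```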
knowledge the attacker has. The level of knowledge is: 1) complete (white box), 2) partial (gray box), and 3) no knowledge (black box). The strategies also depend on the attack goal, and there is a goal to strategy mapping as described in Figure 2.
Sensory Manipulation
These attacks exploit weaknesses in the event reporting phase of P-CS operations. The adversary compromises (recruits) a set of malicious reporters who submit fake reports strategically. We propose three targeted attack strategies to launch sensory manipulation: 1) Gray Box Probe, 2) Black Box Probe, and 3) False Category Flagging.
Feedback Weaponizing Manipulation
These attacks exploit weaknesses in the feedback monitoring phase that collects evidence to quantify the truthfulness of events contributed by the reporters. Formally, these attacks involve the submission of dishonest feedback on different events by a rating user recruited/compromised by the adversary. The feedback weaponizing includes specific attack strategies such as 1) targeted ballot stuffing, 2) targeted bad mouthing, 3) targeted obfuscation stuffing, 4) orchestrated sequential toggle feedback, and 5) concurrent cross feedback.
Belief Manipulation
These attacks exploit algorithmic biases that originate from the use of “prior event likelihoods” and “prior user reputation” that act as weights in most truth discovery and decision-making schemes after the cold start training phase. Formally, belief manipulation attack types involve strategies that exploit the dependence on learnt beliefs and in turn utilize such beliefs to craft attacks that nudge P-CS into taking wrong decisions. By analogy, they are similar to evasion in machine learning, where a sample input in the test phase is incorrectly classified by a model. The belief manipulation includes specific attack strategies: 1) reputation stuffing, 2) bias stuffing, and 3) exclusion stuffing.
SPECIFIC ATTACK STRATEGIES IN P-CS
In this section, we put forward various possible attack strategies under each category of attack type and discuss how they achieve various attack goals given the malicious intent.
Sensory Manipulation Strategies
The false reports can be intelligently submitted in the cold start phase by the following approaches:
Grey Box Ghosting
Many research solutions use “context” similarity among reports5,12 to compute the event trust or infer the correct event. Some methods known as “truth discovery” incorporate correlation, maximum likelihood estimate, and expectation maximization (first proposed by Wang et al.17) from the received reports to find the correct event. Regardless of the techniques, the common assumption is that the majority of the participant reporters are honest except some unreliable participants with isolated selfish objectives; therefore, this method works. While the above assumption may sound reasonable in theory, a common practical feature in P-CS is that “the honest participants need not report anything in the absence of an event.” Hence, high correlation and similarity among false reports from an adversary is implicitly guaranteed regardless of the method used to compute such similarity or truth discovery, making this attack relevant.
The attacker submits a number of fake reports when there is no event, and ensures that all fake reports agree on the event type and fall in the same spatial or temporal context.
Hence, methods based on correlation, similarity, truth discovery, and voting cannot prevent such collusive sensory manipulation attacks in P-CS. Such methods can only help find the correct event type, if an event did occur. The above exploit is a grey box strategy since it requires some knowledge of the design philosophy that context similarity or correlation in reports is used to quantify truthfulness of events.
Black Box Probing
The gray box ghosting is simple in itself, but has one flaw in the sense that the adversary does not know how many fake reports are sufficient to actually trigger a fake event to be published. The adversary needs to steal the above information of the event publishing model, to make its sensory manipulation attacks (like gray box ghosting) very effective.
To achieve the above, the attacker launches a black box probe strategy: During the reporting phase, the adversary recruits (or deploys) a set of participant users and blends itself in the user population. This malicious reporter base tries different candidate numbers of false reports and false event categories, and monitors which combinations were successful in inducing a false event and which ones failed to induce a false event.
Note that, since P-CS is an open paradigm, an event’s presence or absence on the mobile app is visible to all the users. The absence of the fake event on the app proves that the input attack combination was invalid. Thus, the adversary can learn an input–output relationship between the candidate attack inputs and the boundary between successful and failed false events triggered in the app. This allows the adversary to learn the lowest quantity of false reports needed to induce a false event. By preventing local overprovisioning of its total attack budget, the adversary can improve its network wide spatial attack coverage or save the remaining budget for other attack types (e.g., feedback weaponizing).
The strategy works as follows: When or where there is low participation in the rating process, the adversary focuses his budget in those contexts, to ensure a high proportion of fake positive ratings given to false events, even with a seemingly low attack budget. Therefore, a false event ends up with a high event truthfulness score and incorrectly appears to be true to the P-CS.
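The effect of rating participation on a small attack budget can be checked with a short worked example. This is a sketch under stated assumptions: the truthfulness score is taken to be the plain fraction of positive ratings, which is a simplification rather than the scoring rule of any particular platform, and the function name and all rating counts are made up for illustration.

```python
def truthfulness_score(honest_positive, honest_negative, fake_positive=0):
    """Assumed score: share of positive ratings among all ratings received."""
    positive = honest_positive + fake_positive
    total = positive + honest_negative
    if total == 0:
        raise ValueError("no ratings received")
    return positive / total


FAKE_BUDGET = 6  # the same number of fake positive ratings in both contexts

# Busy context: 50 honest raters, most of whom reject the false event.
busy = truthfulness_score(honest_positive=2, honest_negative=48,
                          fake_positive=FAKE_BUDGET)

# Quiet context: only 2 honest raters, both of whom reject the false event.
quiet = truthfulness_score(honest_positive=0, honest_negative=2,
                           fake_positive=FAKE_BUDGET)

print(round(busy, 2), quiet)  # 0.14 0.75
```

With an identical budget, the false event stays clearly untrusted where many honest users rate, but looks true where few do, which suggests why monitoring participation levels per context matters when event scores are interpreted.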
mouthing attacks, it suffers from the same vulnerabilities as low participation in ratings and lack of incentives attached to the ratings.
The strategy works as follows: In contexts with low participation in the rating process, the adversary puts its budget in those contexts, to ensure a high proportion of fake negative ratings to the true events, even with a seemingly low attack budget. Therefore, true events end up with a low event truthfulness score and incorrectly appear as false events to the P-CS.
Consequently, the P-CS platform withdraws these published events, resulting in the suppression of true events. Then, the user reputation system will penalize those honest users reporting this event (since truthfulness of events is key to improving reputations). After repeating this attack multiple times, the honest participants end up with lower reputation scores, with the following impacts: Honest participants with low reputation will not get a high weight during the test phase decision-making and will also get progressively lower or no incentives, thereby discouraging them and new users from participating truthfully. Thus, a P-CS will be left with a user base that consists of participants largely controlled by the adversary.
Orchestrated Sequential Toggle Feedback
This kind of attack strategy is relevant if the adversary has a long-term objective of poisoning the user reputation learning process. The attack happens as follows: Orchestrated bad mouthing and ballot stuffing are launched in an alternating manner on different events over time. First, targeted bad mouthing will slowly discourage the honest user base from participating. Then, via targeted ballot stuffing, the user base will be simultaneously replaced with compromised participants having artificially boosted reputations.
The impact will be a P-CS system with a seemingly highly trusted base that is controlled by a motivated adversary and faces little competition from honest users. This will destroy the credibility of the P-CS provider. The impact of a sequential toggle feedback attack is remarkably different compared to just ballot stuffing, bad mouthing, or obfuscation stuffing. It will create a completely poisoned reputation base, where the malicious or dishonest users have higher reputation compared to the honest users.
Concurrent Cross Feedback
This attack is relevant only when each user report is separately visible to the rater population (e.g., social media plug-ins, Yelp, Yik Yak) and each report indicates an event type. The goal is to cause the P-CS server to make an error in judging the correct event type using the feedback apparatus.
Using its recruited user base, the adversary concurrently gives positive feedback to the reports with incorrect event category (from malicious reporters), and negative feedback to the reports with correct event category (from honest reporters) for the same event.
The impact of this strategy is that it enhances the chance of the CS server making an error in judging the correct event type, inducing a misguided response.
Belief Manipulation Strategies
Three aspects are typically used to take decisions in a typical P-CS in the steady-state phase: 1) prior reputation of the reporters,5 2) historical contextual likelihood of the event,12 and 3) quantity of unique reports indicating an event. Typically, a weighted approach is taken that is some variation of weighted reputation aggregation13 or decision tree formulation19 to decide whether or not to publish an event in the steady-state phase. Below, we summarize the types of attacks that are possible under this category.
Reputation Stuffing
Since decision trees or weighted reputation aggregation methods give higher importance to the more reputed participants in the event publishing models, it would make sense for an adversary to recruit/compromise highly or most reputed users.
The adversary recruits/compromises a fraction of highly reputed participants and asks them to report a fake event in the same context; thus, the decision-making module believes in the event.
The event accuracy will drop while the event inaccuracy will rise with the increase of the fraction of most highly reputed participants recruited for false reporting.
Bias Stuffing
A high importance is given to the prior likelihood of an event (given a context) from the cold start phase, in most event publishing models.
The adversary exploits the bias toward high prior likelihood of events in decision-making and decision classification models used in the steady-state phase. Basically, it spoofs a false event report from its recruited malicious user base strategically in “contexts,” where
13. Y. Li et al., “Conflicts to harmony: A framework for resolving conflicts in heterogeneous data by truth discovery,” IEEE Trans. Knowl. Data Eng., vol. 28, no. 8, pp. 1986–1999, Aug. 2016.
14. C. Miao, Q. Li, H. Xiao, W. Jiang, M. Huai, and L. Su, “Towards data poisoning attacks in crowd sensing systems,” in Proc. 8th ACM Int. Symp. Mobile Ad Hoc Netw. Comput., 2018, pp. 111–120.
15. 2014. Accessed: Jul. 2023. [Online]. Available: https://www.androidauthority.com/residents-put-fake-reports-waze-divert-traffic-574744/
16. D. Yue Zhang, J. Badilla, Y. Zhang, and D. Wang, “Towards reliable missing truth discovery in online social media sensing applications,” in Proc. IEEE/ACM Int. Conf. Adv. Social Netw. Anal. Mining, 2018, pp. 143–150.
17. D. Wang, L. Kaplan, H. Le, and T. Abdelzaher, “On truth discovery in social sensing: A maximum likelihood estimation approach,” in Proc. IEEE/ACM Intl. Conf. Inf. Process. Sensor Netw., 2012, pp. 233–244.
18. Y. Li et al., “A survey on truth discovery,” ACM SIGKDD Explorations Newslett., vol. 17, no. 2, pp. 1–16, 2016.
19. Y. Du, V. Issarny, and F. Sailhan, “User-centric context inference for mobile crowdsensing,” in Proc. ACM IoT-Des. Implementation, 2019, pp. 261–266.
SHAMEEK BHATTACHARJEE is an assistant professor with Western Michigan University, Kalamazoo, MI, 49008, USA. His research interests include theory of anomaly detection, artificial intelligence based security, and data science for cyber security. Bhattacharjee received his Ph.D. degree in computer engineering from the University of Central Florida. He is an IEEE and ACM professional member. He is the corresponding author of this article. Contact him at [email protected].
SAJAL K. DAS is a curators’ distinguished professor of computer science and Daniel St. Clair Endowed Chair with the Missouri University of Science and Technology, Rolla, MO, 65409, USA. His research interests include cyber-physical systems, cybersecurity, machine learning, pervasive and mobile computing, wireless sensor networks, and mobile crowdsensing. Das received his Ph.D. degree in computer science from the University of Central Florida, Orlando, FL, USA. He is a fellow of IEEE. Contact him at [email protected].