A Framework for Personal Data Protection in the IoT

Abstract—IoT personal devices are experiencing explosive growth as many different products become available on the market, such as smart watches, contact lenses, fitness bands and microchips under the skin, among others. These devices and their related applications process the collected sensor data and use them to provide services to their users; in addition, most of them require data from other applications in order to enhance their service. However, data sharing increases the risk for privacy protection, since the aggregation of multiple data may favor the prediction of unrevealed private information. This paper presents a general framework for managing the issue of privacy protection from unwanted disclosure of personal data. The framework integrates two approaches to privacy protection: the use of personal data managers to control and manage user data sharing, and the use of techniques for inference prevention. The contribution of the framework is to exploit the advantages of the two mentioned lines of research in a user-centric approach that empowers users with higher control of their data and makes them aware of their data-sharing decisions.

Keywords— user data sharing; personal data management; privacy; inference attacks

I. INTRODUCTION

The popularity of wearable devices, smart home appliances, fitness devices and health care monitoring systems has increased the collection, exchange and sharing of personal user data among applications. For instance, monitoring body movement is necessary for a patient's rehabilitation. Friedman et al. [1] developed a device to monitor wrist and hand movements by capturing the angular distance traveled by wrist and finger joints, which is useful for stroke rehabilitation. Body movement monitoring is also a key factor for applications relying heavily on human-computer interaction, such as gesture-based games.

Physiological data shared for healthcare studies can also be used to infer addictions such as smoking or drinking [2]. For instance, data from wrist-worn sensors have been used to classify smoking and eating gestures [3]. Moreover, many popular fitness-tracking wristbands, such as Fitbit Flex and Jawbone UP, provide services for sharing data on social systems, besides undoubtedly providing tracking parameters, thus making users involved in competitions with other users. As reported by Yan et al. [4], some of these platforms even adopt an opt-out model for the social features; e.g., step counts are shared by default unless the user switches the feature off. In this respect, researchers show a high risk of information leakage, given the ease of inferring a user behavior, such as a walking path, from the pedometer outputs.

In [5], the authors give insight into how activity trackers can be used in the field of education to monitor students by parents and/or school/college authorities (e.g., to deduce whether the student is in class or not). Moreover, they show how other information, such as the surrounding noise level and location, can be used in combination with the step count of the student to produce further inferences.

In recent years, privacy concerns have given rise to new proposals for developing platforms that support users in controlling personal data storage and access permissions for third-party applications and devices [6, 7, 8]. These platforms typically require users to specify what they want to hide from or share with a third-party application.

However, access control does not solve the problem of possible inferences, since these can come from the very third parties that are granted authorized access by the user. Several studies deal with the issue of personal data disclosure through chains of reasoning that lead to discovering protected data using only intentionally disclosed information [9].

In this paper, we present a general framework for personal data protection. Starting from the idea of developing an Adaptive Inference Discovery Service (AID-S) [10], we define a general framework that integrates personal data management functionalities with the inference discovery functionality and takes into account individuals' perception of privacy. In this paper, we provide the general architecture, the description of each building block, the interaction flow and examples of integration with different personal data management platforms.

The contribution of the framework is to exploit the advantages of the two mentioned lines of research in a user-centric approach that empowers users with higher control of their data and makes them aware of their data-sharing decisions.
and personal data managed by PDM and AID-S do not refer to a common vocabulary.
• Finally, values in privacy preference settings have to be converted to value formats and scales that fit the inference risk computation algorithm. Reasonably, they are numeric values that will be used as privacy thresholds for the inference risk computation.

B. Inference Risk Estimation

The objective of this task is to compute the risk that a third party may infer private data given the available public data released to it.

An ideal representation of inference probabilities based on dependencies among user data, independently of shared and non-shared data, is the Inference Matrix I:

I = [Ii,j], with Ii,j = P(ai | Cj)

where:
• Ii,j is the matrix containing all the probabilities of inferring an attribute ai given the dependencies of a set of other attributes Cj;
• ai ∈ D (1 ≤ i ≤ |D|), where D is the user Data Set;
• Cj ∈ P(D) (1 ≤ j ≤ |P(D)|), where P(D) is the power set of D;
• P(ai | Cj) ∈ [0, 1].

The computed probabilities depend on the probabilistic model or learning algorithm that is deployed (e.g., RST, KNN, Bayes filter, HMM, etc.). It should be noted that the method is open to any probabilistic model that can be found in the literature. For example, the study in [12] uses RST, Naïve Bayes and KNN to study inference in social networks (e.g., Facebook). In these terms, the matrix can ideally be used to represent all the possible inferences of personal data based on user data correlation.

Based on this matrix, the inference risk for a given user can be computed by considering her/his specific privacy preferences, which are used as thresholds ti for ai.

The Inference Matrix is an abstract representation, but several algorithms are available to compute inference measures for limited subsets of Ii,j.

A working example is studied in [4], which computes the probability of inferring the user's typical paths (e.g., going to the coffee shop, the grocery, outdoors, etc.) only by exploiting the steps per minute computed from a fitness IoT pedometer. In that study it has been reported that, as the threshold value ε (denoting the Euclidean distance between the steps-tracked sequence and the path query sequence) varies, the user path could be inferred with at least approximately 50% accuracy, thus P(user behavior | pedometer data) ≈ 0.5. A related work on time series from Erdogdu et al. [16] computes the risks associated with general time series data using stochastic approaches.

C. Recommendation Strategies

As described above, another main function of AID-S is its capability to recommend optimal solutions to ensure the privacy of the user. In this framework we propose two main strategies.

1) Recommending the optimal privacy setting

An optimal privacy setting is defined with respect to the set of data required by the third party in the Policy Statement. It is the set of personal data that represents the optimal balance between minimizing the risk of inferring personal data and maximizing the number of data items to be shared with the third party. Maximizing the number of shared data items aims at maximizing the utility of the service provided by the third party. Of course, recommending the optimal privacy setting is possible for all the personal data in the subset of Ii,j which is taken into account in a specific AID-S implementation.

From the user's point of view, AID-S can recommend which personal data item should not be shared, since it heavily increases the inference risk of another personal data item or a set of them. Conversely, it can also recommend which data can be shared, since they are not heavily correlated with any inference risk.

Depending on the specific AID-S implementation, the recommendation could be provided directly to the user, through the PDM's dialog manager, or it could be preceded by an attempt at automatic negotiation with the third party, aimed at balancing the privacy of the user data and the utility of the third-party service.

This recommendation strategy is efficient since data processing techniques (e.g., aggregation, transformation and obfuscation) will not be used: the whole process for privacy protection is based on the third party's Policy Statement and on the user's privacy preferences.

For example, if the user does not want her/his location to be inferred by the third party, the privacy preference for this data item is set to Pmax. Suppose a specific third party (e.g., a fitness tracker) asks the PDM for the accelerometer data: AID-S is able to check all the correlations among data through Ii,j and concludes that the accelerometer data can be shared, since the probability of inferring the user location does not reach the privacy threshold for location (equal to Pmax in the example) for any combination with the accelerometer data.

2) Recommending data transformation

The second strategy for recommendation concerns computation on shared data. Data transformation (also known as data obfuscation or data perturbation) is a technique used to conceal private information in order to satisfy the user's privacy preferences. Figure 3 shows this concept, further explained in [10].
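As a minimal sketch of the matrix-based risk check described above (the data-item names, probabilities and thresholds are illustrative choices, not values from the paper), an AID-S implementation restricted to a small subset of Ii,j could decide on a release request as follows:

```python
# Sketch: deciding whether a requested data set can be shared, given
# inference probabilities P(ai | Cj) over subsets of user data.
# Item names, probabilities and thresholds are illustrative only.

from itertools import combinations

# Inference matrix restricted to a small subset of P(D):
# maps (private item ai, frozenset of released items Cj) -> P(ai | Cj).
INFERENCE = {
    ("location", frozenset({"accelerometer"})): 0.3,
    ("location", frozenset({"step_count"})): 0.5,
    ("location", frozenset({"accelerometer", "step_count"})): 0.55,
    ("smoking", frozenset({"wrist_gesture"})): 0.8,
}

# User privacy preferences used as thresholds ti for each ai.
THRESHOLDS = {"location": 0.6, "smoking": 0.5}

def inference_risk(private_item, released):
    """Worst-case probability of inferring `private_item` from any
    combination of the released items covered by the matrix subset."""
    risk = 0.0
    for r in range(1, len(released) + 1):
        for combo in combinations(released, r):
            risk = max(risk, INFERENCE.get((private_item, frozenset(combo)), 0.0))
    return risk

def shareable(requested):
    """True if releasing `requested` keeps every inference risk
    strictly below the user's threshold for that private item."""
    return all(inference_risk(a, requested) < t for a, t in THRESHOLDS.items())

# Mirrors the paper's example: a fitness tracker asks for accelerometer
# data; P(location | accelerometer) = 0.3 stays below the threshold.
print(shareable({"accelerometer"}))   # True
print(shareable({"wrist_gesture"}))   # False: smoking risk 0.8 >= 0.5
```

A real implementation would replace the hand-written table with probabilities estimated by whichever model (RST, KNN, Bayes filter, HMM, etc.) the deployment adopts.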
[Figure 3: a diagram showing a third party Pi attempting the inferences x (health insurance), y (travel) and z from user data dn, with crossed arrows indicating the inferences blocked by the transformation.]

Figure 3. The effect of the transformation T(e), which makes the inference of x, y and z no longer feasible.
The example in the figure illustrates the effect of applying a transformation T to data item e (using the transformation techniques sketched in Section II). Given that (a, b, c, d, e) are correlated with x and (e, f, g) are correlated with y, by transforming e the two sets of correlations are broken, preventing the inference of x, y and z.

For AID-S, the transformation has to balance two requirements: the privacy of the user data and the utility of the third-party service. This critical trade-off is the key for both parties to conclude an agreement. In the example above, one perturbation is able to reduce the risk of inference of three data items.

This concept is well defined in a study from Cai et al. [12], where the authors propose a method (i.e., the collective method) to protect the user from possible inferences by third parties in social networks. The algorithm works as follows:

1) if PDAs ∩ UDAs = ∅,
2) then remove PDAs;
3) else, remove PDAs − core; and
4) perturb core.

In their model, an attribute (or, in our study, a user data item ai) can be classified according to two criteria: PDA and UDA. The former means Privacy-Dependent, i.e., the user data is set private or is estimated as private; the latter means Utility-Dependent, i.e., the user data is requested by the third party to provide its service. The first step of the algorithm checks whether there are user data items that belong to both PDAs and UDAs. If none, then the PDAs will be removed and only the UDAs will be released to the third party. If there are, then these user data items (termed the core) must be perturbed before being released to the third party.

To perform the perturbation, they use a classical method of substituting the core with a more abstract or generalized annotation. This perturbation technique has several levels of hierarchy and is called the Generic Attribute Hierarchy (GAH). For example, instead of releasing the specific user attribute in the category "Favorite music", it can be perturbed as "slow music". Different approaches can be found in the literature to manage this task [11, 12].

VI. THE GENERAL WORKFLOW

This section thoroughly describes the interaction among the entities involved in this general framework. The general workflow is shown in Figure 2. It is assumed that a standard way of communication exists between the third parties and the PDMs in order for the interaction to be feasible.

The framework identifies two general phases for system operation, namely Initialization and Real-time Monitoring. The Initialization phase takes place when a new third party's request has been received. Subsequently, Real-time Monitoring takes place and tracks the dynamic changes made by the entities.

A. Initialization

The initialization phase starts after the PDM receives a Policy Statement from the third party. As shown in the first block, the PDM checks the personal data items that are needed by the third party, as declared in the Policy Statement, and sends a request for an inference check to AID-S.

Since the inference risk computation is based on the user privacy preferences, the PDM sends AID-S the user profile managed by the User Privacy Setting in order for AID-S to operate the next step. AID-S threshold setting, as explained in Section V-A, uses a mathematical computation to estimate privacy preferences and thresholds given the user profiles provided by the PDM.

Data can now be exchanged freely, and this concludes the initialization workflow.
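The initialization steps above can be sketched as a message flow. All class and method names (PolicyStatement, check_inference, set_thresholds) and the preference-to-threshold mapping are hypothetical illustrations, not an API defined by the framework:

```python
# Sketch of the initialization phase: the PDM receives a third party's
# Policy Statement, forwards the user's privacy profile to AID-S, and
# AID-S derives the thresholds used for the inference check.
# All names and the threshold mapping are illustrative only.

from dataclasses import dataclass, field

@dataclass
class PolicyStatement:
    third_party: str
    requested_items: set  # personal data items the service declares it needs

@dataclass
class AIDS:
    thresholds: dict = field(default_factory=dict)

    def set_thresholds(self, user_profile):
        # Section V-A: convert qualitative privacy preferences into
        # numeric thresholds ti for the risk computation (toy mapping).
        scale = {"low": 0.9, "medium": 0.6, "high": 0.3}
        self.thresholds = {item: scale[p] for item, p in user_profile.items()}

    def check_inference(self, requested):
        # Placeholder for the real risk computation of Section V-B:
        # here we simply flag items the user marked as highly sensitive.
        return {i for i in requested if self.thresholds.get(i, 1.0) <= 0.3}

@dataclass
class PDM:
    user_profile: dict  # data item -> qualitative privacy preference

    def initialize(self, statement, aids):
        # 1) check the items declared in the Policy Statement,
        # 2) send AID-S the user profile so it can set thresholds,
        # 3) request the inference check before any data is exchanged.
        aids.set_thresholds(self.user_profile)
        blocked = aids.check_inference(statement.requested_items)
        return statement.requested_items - blocked

pdm = PDM(user_profile={"location": "high", "accelerometer": "low"})
aids = AIDS()
granted = pdm.initialize(
    PolicyStatement("fitness_app", {"location", "accelerometer"}), aids)
print(granted)  # {'accelerometer'}
```

In the actual framework, the decision step would of course call the inference risk estimation of Section V-B rather than the placeholder check used here.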
B. Real-time Monitoring
REFERENCES

[2] E. Ertin, N. Stohs, S. Kumar, A. Raij, M. al'Absi, S. Shah, "AutoSense: Unobtrusively Wearable Sensor Suite for Inferring the Onset, Causality, and Consequences of Stress in the Field", In Proceedings of the 9th ACM Conference on Embedded Networked Sensor Systems, pp. 274-287, 2011.
[3] A. Nahapetian, "Side-channel attacks on mobile and wearable systems", 13th IEEE Annual Consumer Communications and Networking Conference, CCNC 2016, pp. 243-247, January 2016.
[4] T. Yan, Y. Lu, N. Zhang, "Privacy Disclosure from Wearable Devices", Proceedings of the 2015 Workshop on Privacy-Aware Mobile Computing - PAMCO '15, pp. 13-18, 2015.
[5] Z. Huo, M. Xiaofeng, Z. Rui, "Feel free to check-in: Privacy alert against hidden location inference attacks in GeoSNs", International Conference on Database Systems for Advanced Applications, Springer Berlin Heidelberg, 2013.
[6] A. Chaudhry, J. Crowcroft, H. Howard, A. Madhavapeddy, R. Mortier, H. Haddadi, D. McAuley, "Personal data: thinking inside the box", In Proceedings of The Fifth Decennial Aarhus Conference on Critical Alternatives, Aarhus University Press, pp. 29-32, 2015.
[7] G. Zyskind, O. Nathan, "Decentralizing privacy: Using blockchain to protect personal data", In Security and Privacy Workshops (SPW), IEEE, pp. 180-184, May 2015.
[8] M. Vescovi, C. Moiso, M. Pasolli, L. Cordin, F. Antonelli, "Building an eco-system of trusted services via user control and transparency on personal data", In IFIP International Conference on Trust Management, Springer International Publishing, pp. 240-250, May 2015.
[9] C. Farkas, A. G. Stoica, "Correlated Data Inference", Data and Applications Security XVII, Springer US, 2004, pp. 119-132.
[10] I. Torre, G. Adorni, F. Koceva, O. Sanchez, "Preventing Disclosure of Personal Data in IoT Networks", In Proceedings of the 12th International Conference on Signal Image Technology & Internet Based Systems, Naples, Italy, 28 November - 1 December 2016.
[11] S. H. Ahmadinejad, P. W. Fong, R. Safavi-Naini, "Privacy and Utility of Inference Control Mechanisms for Social Computing Applications", In Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, ACM, pp. 829-840, 2016.
[12] Z. Cai, Z. He, X. Guan, Y. Li, "Collective Data-Sanitization for Preventing Sensitive Information Inference Attacks in Social Networks", IEEE Transactions on Dependable and Secure Computing, 2016.
[13] Y. Sun, L. Yin, L. Liu, S. Xin, "Toward inference attacks for k-anonymity", Personal and Ubiquitous Computing, 18(8), pp. 1871-1880, 2014.
[14] S. Chakraborty, C. Shen, K. R. Raghavan, Y. Shoukry, M. Millar, M. Srivastava, "ipShield: A Framework For Enforcing Context-Aware Privacy", In 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI 14), pp. 143-156, 2014.
[15] E. Szczekocka, J. Gromada, A. Filipowska, P. Jankowiak, P. Kaluzny, A. Brun, J. M. Portugal, J. Staiano, "Managing Personal Information: A Telco Perspective".
[16] M. A. Erdogdu, N. Fawaz, "Privacy-utility trade-off under continual observation", IEEE International Symposium on Information Theory - Proceedings, pp. 1801-1805, June 2015.