Crowd-Sourced Data Publishing
ISSN No:-2456-2165
Smita C Thomas
Computer Science and Engineering
Mount Zion College of Engineering, Pathanamthitta
Abstract:- This paper addresses the statistical publishing of real-time data with strong privacy protection. An individual's data contains both sensitive and non-sensitive attributes, and it needs protection when it is published. Health data, for example, includes the nature of a disease, the patient's health condition, medicine prescriptions, and the patient's name and personal details. These sensitive data may be leaked when the data are published. To apply protection techniques before data are transmitted to the public, this paper introduces distributors and agents. It proposes a privacy-preserving framework called Privacy-Preserving Distributed Agent (P2DA). Several techniques are included to prevent information loss: level-based diversity, multi-set generalization, and K-anonymity.

Keywords:- Differential Privacy, Level-Based, Multi-Set Generalization, K-Anonymity, Data Publishing.

I. INTRODUCTION

Various applications have caused explosive growth in the volume and variety of the data they generate. Agencies such as governments and other organizations publish data for research and data-mining purposes. These data are stored in tables of rows and columns. Crowd-sourced data collected from millions of users can be used to discover valuable information. However, data publishing is risky, because published records can be linked back to individuals. In a digitized world, protecting the privacy of individuals is a challenge. Several privacy-preservation techniques have been introduced to protect against various privacy threats.

Existing privacy-preservation approaches for data publishing include ε-differential privacy and w-event privacy. w-event privacy protects event sequences over successive timestamps, whereas ε-differential privacy provides one-time statistical data publishing. Traditional systems rely on a trusted server for data collection and publishing. Such a trusted server may be hacked and become insecure, and users' identities and sensitive information are then exposed.

This paper focuses on crowd-sourced real-time data publishing. It proposes distributed agent-based privacy preservation for data publishing through an untrusted server. Multiple agents are introduced between the users and the untrusted server. A user randomly selects one agent as its representative and transmits data to it using a broadcast mechanism. Along with w-event privacy and ε-differential privacy, several other privacy-preservation techniques are introduced: level-based diversity, multi-set generalization, and K-anonymity.

II. RELATED WORK

Several techniques have been proposed to protect privacy in statistical data publishing. Hong et al. [3] proposed K-anonymity for publishing user query logs. Xu et al. [1] realized one-time statistical data release for differentially private data publishing. Cormode et al. [4] proposed differentially private spatial decomposition techniques that partition the space. The disclosure problem in social network data publishing has been addressed using randomized perturbation methods. Ganti et al. [5] proposed privacy-preserving publishing of time-series data for statistics. Fan and Xiong [7] proposed an adaptive approach to real-time aggregate monitoring with differential privacy.

Chen et al. [6] proposed a participant-density-aware privacy-preserving aggregate statistics scheme that makes use of a multi-pseudonym mechanism. Wang et al. [2] proposed real-time spatio-temporal crowd-sourced data publishing with differential privacy.

III. PROPOSED SYSTEM

Data are generated in rapidly growing volumes by various applications and collected by many data collectors. A prime example is a healthcare organization. It can use statistical analysis to extract valuable information about its patients for research purposes, and its doctors are able to retrieve patients' medical records. When different organizations carry out research on such data, control over individual records is hard to enforce, and the data may be misused.

The proposed system consists of two components: the agent and the distributor. The distributor registers each agent and determines the agent's priority at the time of registration. Each agent thus has its own priority, assigned at registration, and the distributor handles access requests from registered agents. The agent sits between the user and the distributor. The agent, which has a direct connection with the untrusted server and the requested
Anonymization techniques such as generalization, bucketization, and slicing. The first step is attribute partitioning, and the next is removing explicit identifiers. Generalization replaces quasi-identifier values with less specific values. k-anonymity protects against identity disclosure, but generalization causes a considerable amount of information loss.

A further level of privacy, called level-based diversity (l-diversity), was introduced to address the homogeneity attack. Under l-diversity, each group of records must contain at least l well-represented values of the sensitive attribute. Records are allocated according to how often their sensitive-attribute values occur. Similar records are then grouped, and each group is analysed. After the diversity check, the sets of correlated attributes are integrated. Bucketization is compatible with level-based diversity. However, an adversary can still gain information about a sensitive attribute if the global distribution of that attribute is known.

In slicing, attribute partitioning places highly correlated attributes in the same column, and column generalization is used to protect against membership disclosure. Tuple partitioning uses two data structures: a queue of buckets and a set of sliced buckets.

Initially, the queue holds a single bucket containing all the tuples, and the set of sliced buckets is empty. In each iteration, the algorithm removes a bucket from the queue and splits it into two buckets. If the resulting sliced table satisfies l-diversity, the two new buckets are appended to the end of the queue. Otherwise, the bucket is not split and is added to the set of sliced buckets instead. When the queue is empty, the sliced table is computed from the set of sliced buckets.

REFERENCES

[1] J. Xu, Z. Zhang, X. Xiao, Y. Yang, G. Yu, and M. Winslett, "Differentially private histogram publication," The VLDB Journal, vol. 22, no. 6, pp. 797–822, 2013.
[2] Q. Wang, Y. Zhang, X. Lu, Z. Wang, Z. Qin, and K. Ren, "RescueDP: Real-time spatio-temporal crowd-sourced data publishing with differential privacy," in Proc. of IEEE INFOCOM, 2016, pp. 1–9.
[3] Y. Hong, X. He, J. Vaidya, N. Adam, and V. Atluri, "Effective anonymization of query logs," in Proc. of ACM CIKM, 2009, pp. 1465–1468.
[4] G. Cormode, C. Procopiuc, D. Srivastava, E. Shen, and T. Yu, "Differentially private spatial decompositions," in Proc. of IEEE ICDE, 2012, pp. 20–31.
[5] R. K. Ganti, N. Pham, Y. E. Tsai, and T. F. Abdelzaher, "PoolView: Stream privacy for grassroots participatory sensing," in Proc. of ACM SenSys, 2008, pp. 281–294.
[6] J. Chen, H. Ma, D. S. Wei, and D. Zhao, "Participant-density-aware privacy-preserving aggregate statistics for mobile crowd-sensing," in Proc. of IEEE ICPADS, 2015, pp. 140–147.
[7] L. Fan and L. Xiong, "An adaptive approach to real-time aggregate monitoring with differential privacy," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 9, pp. 2094–2106, 2014.
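The bucket-queue tuple partitioning used in slicing, described above, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The following are all hypothetical choices:

- the record layout (one dict per tuple);
- the function names (`partition_tuples`, `split_bucket`, `is_l_diverse`, `table_is_l_diverse`);
- the split rule (a median split on the non-sensitive attribute with the most distinct values in the bucket);
- the use of distinct l-diversity, checked bucket by bucket, as the table-level l-diversity test.

Attribute partitioning and column generalization are not shown.

```python
from collections import deque
from typing import Dict, List, Tuple

Record = Dict[str, object]  # hypothetical layout: one tuple as attribute name -> value
Bucket = List[Record]


def is_l_diverse(bucket: Bucket, sensitive: str, l: int) -> bool:
    """Distinct l-diversity: at least l different sensitive values in the bucket."""
    return len({r[sensitive] for r in bucket}) >= l


def table_is_l_diverse(buckets: List[Bucket], sensitive: str, l: int) -> bool:
    """Assumed table-level test: the sliced table is l-diverse when every bucket is."""
    return all(is_l_diverse(b, sensitive, l) for b in buckets)


def split_bucket(bucket: Bucket, sensitive: str) -> Tuple[Bucket, Bucket]:
    """Hypothetical split rule: sort on the non-sensitive attribute with the most
    distinct values in this bucket, then cut the bucket in half."""
    attrs = [a for a in bucket[0] if a != sensitive]
    if attrs:
        key = max(attrs, key=lambda a: len({r[a] for r in bucket}))
        bucket = sorted(bucket, key=lambda r: r[key])
    mid = len(bucket) // 2
    return bucket[:mid], bucket[mid:]


def partition_tuples(table: Bucket, sensitive: str, l: int) -> List[Bucket]:
    """Bucket-queue tuple partitioning; returns the set of sliced buckets."""
    queue = deque([list(table)])  # initially a single bucket holding every tuple
    sliced: List[Bucket] = []     # the set of sliced buckets starts empty
    while queue:                  # the sliced table is final once the queue is empty
        bucket = queue.popleft()
        if len(bucket) >= 2:
            b1, b2 = split_bucket(bucket, sensitive)
            candidate = sliced + list(queue) + [b1, b2]
            if table_is_l_diverse(candidate, sensitive, l):
                queue.extend([b1, b2])  # split accepted: both halves go to the end of the queue
                continue
        sliced.append(bucket)  # split rejected (or impossible): keep the bucket whole
    return sliced


if __name__ == "__main__":
    patients = [
        {"age": 25, "zip": "13053", "disease": "flu"},
        {"age": 28, "zip": "13068", "disease": "cancer"},
        {"age": 31, "zip": "14853", "disease": "flu"},
        {"age": 35, "zip": "14850", "disease": "hepatitis"},
        {"age": 42, "zip": "13053", "disease": "cancer"},
        {"age": 47, "zip": "14853", "disease": "hepatitis"},
        {"age": 52, "zip": "13068", "disease": "flu"},
        {"age": 58, "zip": "14850", "disease": "cancer"},
    ]
    for b in partition_tuples(patients, "disease", l=2):
        print([r["age"] for r in b], sorted({r["disease"] for r in b}))
```

With l = 2, the eight sample records end up in four buckets of two records, each holding two distinct diseases. With l = 3, splitting stops earlier and two buckets of four records remain.

Re-checking the whole candidate table on every split follows the description above literally, at quadratic cost. Because the existing buckets have already passed the check, a practical version could test only the two new halves.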