Volume 5, Issue 3, March – 2020 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Crowd-Sourced Data Publishing Using Untrusted Server
Laya Chacko1, Bibin Varghese2, Smita C Thomas3
Computer Science and Engineering
Mount Zion College of Engineering, Pathanamthitta

Abstract:- This paper outlines the statistical publishing of real-time data with strong privacy protection. A person's data includes both sensitive and non-sensitive attributes, and it needs protection when it is published. Health data, for example, includes the nature of a disease, health condition, medicine prescriptions, the patient's name and other details; these are sensitive data that may be leaked during publishing. To protect such data when it is transmitted to the public, this paper introduces distributors and agents and proposes a privacy-preserving framework called privacy preserving distributed agent (P2DA). Several techniques are included to prevent information loss: level-based, multi-set generalization and K-anonymity.

Keywords:- Differential Privacy, Level-Based, Multi-Set Generalization, K-Anonymity, Data Publishing.

I. INTRODUCTION

The volume and variety of data generated by various applications are growing explosively. Agencies such as governments and other organisations publish data for research and data-mining purposes. These data are stored in tables as rows and columns. Crowd-sourced data collected from millions of users is used to discover valuable information. However, data publishing is risky, because published records can be linked back to individuals. In a digitalised world, protecting the privacy of individuals is a challenge, and several privacy preservation techniques have been introduced to protect against various privacy threats.

The existing privacy preservation techniques for data publishing are ɛ-differential privacy and ꞷ-event privacy. ꞷ-event privacy protects any event sequence occurring within ꞷ successive time stamps, whereas ɛ-differential privacy provides one-time statistical data publishing. Traditional systems rely on a trusted server for data collection and publishing. Such a server may be hacked and become insecure, and the identities and sensitive information of individuals are then exposed.

This paper focuses on crowd-sourced real-time data publishing and proposes distributed agent-based privacy preservation for data publishing using an untrusted server. Multiple agents are introduced between the users and the untrusted server. A user can randomly select one agent as its representative and transmit the data to it using a broadcast mechanism. Along with ꞷ-event privacy and ɛ-differential privacy, several other privacy preservation techniques are introduced: level-based, multi-set generalization and K-anonymity methods.

II. RELATED WORK

Several techniques have been proposed to protect statistical data publishing. Hong et al. [3] proposed K-anonymity for publishing users' query logs. Xu et al. [1] realized one-time statistical data release for differentially private data publishing. Cormode et al. [4] proposed differentially private spatial decomposition techniques to partition the space. The disclosure problem in social network data publishing has been addressed using randomized perturbation methods. Ganti et al. [5] proposed privacy-preserving publishing of time-series data for statistics. Fan and Xiong [7] proposed an adaptive approach to real-time aggregate monitoring with differential privacy.

Chen et al. [6] proposed a participant-density-aware privacy-preserving aggregate statistics scheme that makes use of a multi-pseudonym mechanism. Wang et al. [2] proposed real-time spatio-temporal crowd-sourced data publishing with differential privacy.

III. PROPOSED SYSTEM

The rapidly growing data from various applications are collected by various data collectors. The best example is a healthcare organisation, which can use statistical analysis to extract valuable information about its patients for research purposes, while doctors are able to retrieve patients' medical records. When different organisations carry out research on these data, control over individual data is hard to enforce, and the data may be misused.

The proposed system consists of two components: the agent and the distributor. The distributor registers each agent and determines the agent's priority at the time of registration, so every agent has its own priority, and the distributor accepts requests from registered agents. The agent sits between the user and the distributor and has a direct connection with the untrusted server.
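The registration step and the user's choice of a representative can be sketched in Python. This is a minimal illustration under stated assumptions, not the P2DA implementation: the class `Distributor`, its methods `register` and `accepts`, and the helper `choose_agent` are hypothetical names, priorities are modelled as plain integers, and refusing requests from unregistered agents is our reading of the text.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Distributor:
    """Hypothetical distributor that keeps a registry of agents and their priority levels."""
    levels: dict = field(default_factory=dict)  # agent_id -> priority level

    def register(self, agent_id: str, level: int) -> None:
        # The priority of an agent is fixed when the agent registers.
        self.levels[agent_id] = level

    def accepts(self, agent_id: str) -> bool:
        # Assumption: requests from agents that never registered are refused.
        return agent_id in self.levels


def choose_agent(agent_ids, rng: random.Random) -> str:
    """A user picks one agent uniformly at random as its representative."""
    return rng.choice(sorted(agent_ids))


distributor = Distributor()
distributor.register("agent-a", 1)
distributor.register("agent-b", 3)
representative = choose_agent(distributor.levels, random.Random(7))
```

Passing an explicit `random.Random` instance keeps the selection reproducible in tests while remaining random in deployment.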

IJISRT20MAR146 www.ijisrt.com 799


The agent then specifies the database it requests. Next, the sensitive and non-sensitive attributes are tabulated. The related rule set is applied based on the level that the distributor assigned to each agent; the rule sets are defined by the degree to which data is hidden from each type of agent. Finally, the transformed dataset is relayed using a routing mechanism until it reaches the agent that requested it.
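A minimal sketch of level-based rule sets, assuming each priority level maps to a fixed set of attributes that are masked with `*` before the dataset is relayed. The mapping `RULE_SETS`, the function `apply_rule_set`, the sample records and the convention that a higher level hides fewer attributes are all hypothetical; the paper does not specify the actual rules.

```python
# Hypothetical rule sets: the attributes hidden from an agent at each priority level.
# Assumption: a higher level means higher priority, so fewer attributes are hidden.
RULE_SETS = {
    1: {"name", "age", "disease", "prescription"},
    2: {"name", "prescription"},
    3: {"name"},
}


def apply_rule_set(records, level):
    """Return a transformed copy of `records` with the attributes hidden at `level` masked."""
    hidden = RULE_SETS[level]
    return [
        {attr: ("*" if attr in hidden else value) for attr, value in row.items()}
        for row in records
    ]


records = [
    {"name": "Asha", "age": 34, "disease": "flu", "prescription": "oseltamivir"},
    {"name": "Ravi", "age": 51, "disease": "diabetes", "prescription": "metformin"},
]
for_level_1 = apply_rule_set(records, 1)
for_level_3 = apply_rule_set(records, 3)
```

The source table is never modified; each requesting agent receives its own transformed copy. Masking whole values is only one possible rule, and a level could equally map to generalisation rules.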

Data are stored in a database table in which each row corresponds to one individual. The attributes in this table are classified into sensitive attributes, attributes that easily identify an individual (explicit identifiers), and attributes whose values taken together identify an individual (quasi-identifiers). Different agencies and organisations release such data for various data-mining purposes.
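The three-way attribute classification can be made concrete as below. The attribute names, the sets `EXPLICIT_IDENTIFIERS`, `QUASI_IDENTIFIERS` and `SENSITIVE_ATTRIBUTES`, and the helpers `attribute_kind` and `drop_explicit_identifiers` are illustrative assumptions; dropping explicit identifiers mirrors the removal step that precedes anonymization.

```python
# Hypothetical attribute classification for a small health table.
EXPLICIT_IDENTIFIERS = {"name", "phone"}
QUASI_IDENTIFIERS = {"age", "sex", "zipcode"}
SENSITIVE_ATTRIBUTES = {"disease", "prescription"}


def attribute_kind(attr: str) -> str:
    """Return the class an attribute belongs to; anything unlisted is non-sensitive."""
    if attr in EXPLICIT_IDENTIFIERS:
        return "explicit identifier"
    if attr in QUASI_IDENTIFIERS:
        return "quasi-identifier"
    if attr in SENSITIVE_ATTRIBUTES:
        return "sensitive"
    return "non-sensitive"


def drop_explicit_identifiers(rows):
    """Remove the attributes that identify an individual on their own."""
    return [
        {a: v for a, v in row.items() if attribute_kind(a) != "explicit identifier"}
        for row in rows
    ]


rows = [{"name": "Asha", "phone": "555-0101", "age": 34, "sex": "F",
         "zipcode": "68601", "disease": "flu", "prescription": "oseltamivir"}]
deidentified = drop_explicit_identifiers(rows)
```

Removing names alone is not enough: the remaining quasi-identifiers (age, sex, zip code) can still single out a person, which is what generalisation and k-anonymity address.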

Anonymization techniques include generalisation, bucketization and slicing. The first step is attribute partitioning, and the next is removing the explicit identifiers. Generalisation replaces quasi-identifier values with less specific values. k-anonymity protects against identity disclosure, but generalisation causes a considerable amount of information loss. To address the homogeneity attack, a further level of privacy called level-based diversity (l-diversity) was introduced: each group must contain at least "l" well-represented values of the sensitive attribute. Records are allotted based on how often each sensitive attribute value occurs, similar records are grouped and analysed, and after the diversity is checked, the sets of correlated attributes are integrated. Bucketization is compatible with level-based diversity, but an adversary can still gain information about a sensitive attribute as long as information about its global distribution is available.
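The sketch below illustrates generalisation, k-anonymity and l-diversity on a toy table. It assumes hypothetical generalisation rules (ages coarsened to ten-year ranges, zip codes truncated to three digits) and uses distinct l-diversity, the simplest reading of "well-represented", in which every equivalence class must contain at least l distinct sensitive values. The parameter `ell` stands for l; all function names are illustrative.

```python
from collections import defaultdict


def generalize(row):
    # Hypothetical rules: age -> ten-year range, zip code -> three-digit prefix.
    low = (row["age"] // 10) * 10
    return {
        "age": f"{low}-{low + 9}",
        "zipcode": row["zipcode"][:3] + "**",
        "disease": row["disease"],  # the sensitive attribute is published unchanged
    }


def equivalence_classes(rows, quasi_identifiers):
    # Group records that share the same quasi-identifier values.
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in quasi_identifiers)].append(row)
    return list(groups.values())


def is_k_anonymous(rows, quasi_identifiers, k):
    return all(len(g) >= k for g in equivalence_classes(rows, quasi_identifiers))


def is_distinct_l_diverse(rows, quasi_identifiers, sensitive, ell):
    return all(
        len({row[sensitive] for row in g}) >= ell
        for g in equivalence_classes(rows, quasi_identifiers)
    )


raw = [
    {"age": 23, "zipcode": "68601", "disease": "flu"},
    {"age": 27, "zipcode": "68605", "disease": "cancer"},
    {"age": 41, "zipcode": "68710", "disease": "flu"},
    {"age": 45, "zipcode": "68712", "disease": "flu"},
]
QI = ("age", "zipcode")
published = [generalize(row) for row in raw]
```

After generalisation the table is 2-anonymous, but it is not 2-diverse: both records in the class ("40-49", "687**") have the disease "flu", so anyone known to be in that class is revealed to have flu. This is the homogeneity attack that l-diversity is meant to catch.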

In slicing, column generalisation is used to protect against membership disclosure, and highly correlated attributes are placed in the same column after attribute partitioning. Two types of data structures are used for tuple partitioning:
• Buckets queue
• Set of sliced buckets

At first, the queue contains only one bucket, which holds all the tuples, and the set of sliced buckets is empty. In each iteration of the algorithm, a bucket is removed from the queue and split into two buckets. If the sliced table satisfies l-diversity, the two buckets are placed at the end of the queue; otherwise, the bucket is not split and is placed in the set of sliced buckets. The sliced table is computed when the queue is empty.
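The bucket-queue procedure can be sketched as follows. Two simplifications are assumptions, not part of the paper: a bucket is split at the median of a single attribute, and the l-diversity test is distinct l-diversity of the sensitive attribute in every bucket rather than a full check on the sliced table. `slice_columns` adds the column step of slicing as it is usually formulated: within each bucket, the values of each column group are permuted independently, so the bucket is the only link between sub-tables. All function names and the sample table are hypothetical.

```python
import random
from collections import deque


def diverse(bucket, sensitive, ell):
    # Simplified test (assumption): distinct l-diversity of the sensitive attribute.
    return len({row[sensitive] for row in bucket}) >= ell


def split(bucket, attr):
    # Hypothetical split rule: sort on one attribute and cut at the median position.
    ordered = sorted(bucket, key=lambda row: row[attr])
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]


def partition_tuples(table, split_attr, sensitive, ell):
    queue = deque([list(table)])  # at first, one bucket holds every tuple
    sliced = []                   # the set of sliced buckets starts empty
    while queue:
        bucket = queue.popleft()
        if len(bucket) < 2:       # a single tuple cannot be split further
            sliced.append(bucket)
            continue
        b1, b2 = split(bucket, split_attr)
        candidate = list(queue) + [b1, b2] + sliced
        if all(diverse(b, sensitive, ell) for b in candidate):
            queue.extend([b1, b2])  # split kept: both halves go to the end of the queue
        else:
            sliced.append(bucket)   # split rejected: keep the bucket whole
    return sliced                   # the queue is empty, so the partition is final


def slice_columns(buckets, column_groups, rng):
    # Within each bucket, shuffle each column group independently.
    result = []
    for bucket in buckets:
        sub_tables = []
        for cols in column_groups:
            values = [tuple(row[c] for c in cols) for row in bucket]
            rng.shuffle(values)
            sub_tables.append(values)
        result.append(sub_tables)
    return result


table = [
    {"age": 22, "zipcode": "47906", "disease": "flu"},
    {"age": 25, "zipcode": "47907", "disease": "cancer"},
    {"age": 33, "zipcode": "47905", "disease": "flu"},
    {"age": 38, "zipcode": "47902", "disease": "gastritis"},
    {"age": 51, "zipcode": "47906", "disease": "cancer"},
    {"age": 56, "zipcode": "47907", "disease": "flu"},
    {"age": 60, "zipcode": "47905", "disease": "gastritis"},
    {"age": 64, "zipcode": "47902", "disease": "flu"},
]
buckets = partition_tuples(table, "age", "disease", 2)
sliced_table = slice_columns(buckets, [("age", "zipcode"), ("disease",)], random.Random(0))
```

On this eight-row table with l = 2, the procedure ends with four buckets of two tuples, each holding two different diseases; splitting any of them further would create a bucket with a single disease value, so those splits are rejected.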

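Slicing can handle high-dimensional data by splitting it into columns, which reduces data dimensionality. Each column can be viewed as a sub-table of lower dimensionality, and these sub-tables are linked by the buckets in slicing.

IV. CONCLUSION

The proposed software can be deployed in any firm and can hold any database. It focuses on security and privacy preservation, applying different privacy preservation rule sets based on the priority level. Compared with the existing system, it introduces multiple agents between the user and the server, i.e., an agent-based privacy framework for avoiding privacy leakage. Implementing the level-based privacy method, the multi-set method and K-anonymity reduces information loss.

REFERENCES

[1]. J. Xu, Z. Zhang, X. Xiao, Y. Yang, G. Yu, and M. Winslett, "Differentially private histogram publication," The VLDB Journal, vol. 22, no. 6, pp. 797–822, 2013.
[2]. Q. Wang, Y. Zhang, X. Lu, Z. Wang, Z. Qin, and K. Ren, "RescueDP: Real-time spatio-temporal crowd-sourced data publishing with differential privacy," in Proc. of IEEE INFOCOM, 2016, pp. 1–9.
[3]. Y. Hong, X. He, J. Vaidya, N. Adam, and V. Atluri, "Effective anonymization of query logs," in Proc. of ACM CIKM, 2009, pp. 1465–1468.
[4]. G. Cormode, C. Procopiuc, D. Srivastava, E. Shen, and T. Yu, "Differentially private spatial decompositions," in Proc. of IEEE ICDE, 2012, pp. 20–31.
[5]. R. K. Ganti, N. Pham, Y.-E. Tsai, and T. F. Abdelzaher, "PoolView: Stream privacy for grassroots participatory sensing," in Proc. of ACM SenSys, 2008, pp. 281–294.
[6]. J. Chen, H. Ma, D. S. Wei, and D. Zhao, "Participant-density-aware privacy-preserving aggregate statistics for mobile crowd-sensing," in Proc. of IEEE ICPADS, 2015, pp. 140–147.
[7]. L. Fan and L. Xiong, "An adaptive approach to real-time aggregate monitoring with differential privacy," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 9, pp. 2094–2106, 2014.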
