Integrated Intelligent Research (IIR) International Journal of Computing Algorithm

Volume: 01 Issue: 01 June 2012 Page No.19-22


ISSN: 2278-2397

Protecting Privacy When Disclosing Information: K Anonymity and its Enforcement through Suppression

V. Khanaa1, K. P. Thooyamani2
1 Dean, Centre for Information, Bharath University, Chennai
2 Vice Chancellor, Bharath University, Chennai
Email: [email protected], [email protected]

Abstract - Anonymization means removing personal identifiers, or converting them into a form that humans cannot read, in order to protect private or personal information. Data anonymization can be performed in different ways; this paper uses the k-anonymization approach. Suppose a person A owns a k-anonymous database and needs to determine whether the database is still k-anonymous after a tuple is inserted by another person B. For some applications (for example, student records), the database needs to be confidential, so access to it is strictly controlled. The confidentiality of the database managed by the owner is violated once others have access to its contents. Thus, the problem is to check whether the database with the tuple inserted is still k-anonymous, without letting the owner A learn the content of the tuple and without letting others (B) learn the content of the database. In this paper, we propose a protocol that solves this problem for suppression-based k-anonymous and confidential databases.

I. INTRODUCTION

For every application the database is a valuable asset, so its security is important. Different security-control methods have been identified, and each has different criteria. For example, FERPA provides privacy protections for education records when held by federally funded educational institutions [1]. FERPA defines an education record as those records, files, documents, and other materials that contain information directly related to a student and are maintained by an educational agency or institution, or by a person acting for such an agency or institution. Students who are at least 18 years of age or attending postsecondary institutions, or otherwise their parents, generally have a right to gain access to their education records within 45 days of a written request; to seek to amend any information therein considered to be in error; to control how information in such records is disclosed to other institutions (in general, such disclosures must be authorized by the student or parent, with some exceptions); and to complain to the US Department of Education if these rights appear to have been violated. There is also the problem of securing statistical databases against disclosure of confidential information. The various security-control methods are classified into four groups: conceptual, query restriction, data perturbation, and output perturbation. Criteria for evaluating the performance of the various security-control methods are identified. A detailed comparative analysis of the most promising methods for protecting dynamic-online statistical databases is also presented. To date no single security-control method prevents both exact and partial disclosures. Privacy is a major concern. The problem of statistical disclosure control, that is, revealing accurate statistics about a population while preserving the privacy of individuals, has a venerable history. Still, there is a difference between confidentiality and privacy. Confidentiality refers to limiting information access and disclosure to authorized users and preventing access by or disclosure to unauthorized users. Privacy refers to limiting access to individuals' personal information [5].

Is confidentiality still required once data have been anonymized? Yes, because anonymous data have business value for the party owning the database, and unauthorized disclosure of anonymous data may damage the party owning the data. Many techniques have been developed to protect privacy; here we use k-anonymization [4]. In k-anonymity, attributes are suppressed or generalized until each row is identical to at least k-1 other rows. At this point the database is said to be k-anonymous. K-anonymity prevents definite database linkages. The modification of the anonymous database DB can be naively performed as follows: the party who manages the database, or the server, simply checks whether the updated database DB is still anonymous. Under this approach, the entire tuple t has to be revealed to the party managing the database server, thus violating the privacy of the patient. Another possibility would be to make the entire database available to the patient, so that the individual can verify whether the insertion of the data violates their own privacy. This approach, however, requires making the entire database available to the patient, thus violating data confidentiality. There is a protocol solving this problem on suppression-based k-anonymous and confidential databases.

The protocol depends on well-known cryptographic assumptions. The huge number of databases recording a large variety of information about individuals makes it possible to
find information about specific individuals by simply correlating all the available databases. Confidentiality is achieved by enforcing an access policy, or possibly by using some cryptographic tools.

Privacy relates to which data can be safely disclosed without leaking sensitive information regarding the legitimate owner. Confidentiality is still required once data have been anonymized if the anonymous data have a business value for the party owning them, or if the unauthorized disclosure of such anonymous data may damage the party owning the data or other parties. So the problem is: can the database owner assure the privacy of the database without knowing the data to be inserted? It is important to assure that the database maintains the privacy of the individuals and also of those who maintain the database. It is therefore necessary to check that data entered in the database do not violate privacy, and to perform such verification without seeing the sensitive information of individuals.

II. KEY CHALLENGES

The protocol has some limitations: if the database is not anonymous with respect to a tuple that has been inserted, the insertion cannot be performed. As a result, one of the protocols is extremely inefficient; more efficient protocols exist. The first line of research is based on algorithms for database anonymization. The database is protected by data reduction, data perturbation, or generating synthetic data. However, the main concept of k-anonymity is to maintain the confidentiality of the contents. The problem of protecting data privacy has been divided into two groups, depending on whether data are continuously released and anonymized or released in a different fashion and anonymized. The second line of research is related to Secure Multiparty Computation protocols, a subfield of cryptography. The third line of research is related to private information retrieval, which can be seen as an application of secure multiparty computation techniques to the area of data management. This allows a user to retrieve a data item (or tuple) from a database without revealing which tuple is being retrieved. Here, the main focus is to find efficient techniques to express queries over a database without letting the database know the actual queries [2]. Still, the problem of privately updating a database has not been resolved, because these techniques deal only with data retrieval. These approaches do not address the problem of k-anonymity, since their goal is to encrypt data that are handed to external entities. Thus, their main goal is to protect the confidentiality of the data from the external entities that manage the data. In those approaches the data are fully available to the clients, which is not the case under our approach. In data anonymization, insertion cannot be performed if the database is not properly anonymized. The problem is private updates to k-anonymous databases. The suppression-based protocol deals with the problem of updating such databases. Figure 1 shows the anonymous database system. We assume that the information of a single student is stored in a tuple and that the database is kept confidential at the server. The database is treated as a database of educational records, and only the institution has the right to access it. Since the database is anonymous, the data provider's privacy is protected from users. Since the database contains privacy-sensitive data, the main aim is to protect the privacy of the students' data. This can be achieved by anonymization. If the database is anonymous, it is not possible to infer a student's identity from the database. Suppose a new student has to be entered; this means the database has to be updated in order to insert a tuple. The modification of the anonymous database can be done as follows: the party who manages the database checks whether the updated database is still anonymous after inserting the tuple. Under this approach, the entire tuple t has to be revealed to the party managing the database server, thus violating the privacy of the student. Another possibility would be to make the entire database available to the student, so that the student can verify whether the insertion of his or her data violates his or her own privacy. To solve these problems, several questions need to be addressed. The first problem is: without revealing the contents of the tuple t to be inserted and of the database DB, how to preserve data integrity by establishing the anonymity of DB ∪ {t}? The second problem is: once such anonymity is established, how to perform this update? The third problem is: what can be done if database anonymity is not preserved? Finally, the fourth problem is: what is the initial content of the database, when no data about users has been inserted yet? In this paper, we propose a protocol solving the first problem, which is the central problem addressed by our paper.

III. DEVELOPMENT OF LINGUISTIC RESOURCE

The protocol relies on the fact that the anonymity of the database is not affected if the tuple being inserted is already in the database. Then, the problem of integrity while inserting a tuple into the database is equivalent to privately checking the tuple being inserted against the tuples already in the database. The protocol is aimed at suppression-based
anonymous databases, and it allows the owner of the database to properly anonymize the tuple t without gaining any useful knowledge of its contents and without having to send newly generated data to its owner. To achieve this goal, the parties secure their messages by encrypting them. To assure a higher level of anonymity to the party inserting a tuple, we require that the communication between that party and the database occurs through an anonymous connection, as provided by the Crowds protocol [3]. The Crowds protocol hides each user's communications by routing them randomly within a group of similar users. In order to perform the privacy-preserving verification of the database anonymity upon insertion, the parties use a commutative and homomorphic encryption scheme.

[Figure: Prototype Architecture]

In the above figure, the data entered by the data provider are stored in the crypto module, which performs cryptographic operations on all tuples exchanged between the user and the private updater, using the suppression-based method. The loader module reads anonymized tuples from the k-anonymous database, and the checker module checks whether the tuple from the user matches the tuple read from the database. If that tuple does not match the user's tuple, the loader reads another tuple from the k-anonymous database. This functionality is provided by the Private Checker. Communication between the user and the database is carried out through the anonymizer, and all tuples are encrypted.

IV. SUPPRESSION BASED PROTOCOL

The suppression-based protocol relies on well-known cryptographic techniques. We consider a table T = {t1, …, tn} over the attribute set A. In the suppression-based method, we mask the values of some special attributes with *, the value deployed by the user for anonymization. The main idea behind this protocol is to form subsets of indistinguishable tuples by masking the values of some well-chosen attributes.

TABLE 1 Original Dataset

Birth date   Sex      Zip code
21/1/79      male     53715
10/1/79      female   55410
21/2/83      male     02274
19/4/82      male     02237

TABLE 2 Suppressed Data with k=2

Birth date   Sex      Zip code
*/1/79       person   5****
*/1/79       person   5****
*/*/8*       male     022**
*/*/8*       male     022**

Table 1 contains the original database (table T), which has three attributes: Birth date, Sex, and Zip code. Table 2 shows a suppression-based k-anonymization with k=2. Here k=2 means that at least k (=2) tuples should be indistinguishable after masking values. Suppressing attributes for every tuple of T is referred to as the anonymization problem, and the goal is to find the anonymization that minimizes the number of masked values.

A. Cryptography Primitive

The Diffie-Hellman key exchange algorithm allows two users to establish a shared secret key over an insecure communication channel without any prior shared knowledge. Here, Diffie-Hellman is used to agree on a shared secret key for exchanging data between the two parties. AES (Advanced Encryption Standard) is used as the symmetric encryption algorithm. Two properties are required of the encryption scheme E, commutativity and product homomorphism, together with indistinguishability. A commutative, product-homomorphic encryption scheme ensures that the order in which encryptions are performed is irrelevant (commutativity) and allows arithmetic operations to be performed consistently over encrypted data (homomorphic property). Given a finite set K of keys and a finite domain D, a commutative, product-homomorphic encryption scheme E is a polynomial-time computable function E: K × D → D satisfying the following properties:

1. Commutativity: for all key pairs $k_1, k_2 \in K$ and every value $d \in D$, the following equality holds: $E_{k_1}(E_{k_2}(d)) = E_{k_2}(E_{k_1}(d))$

2. Product-homomorphism: for every $k \in K$ and every pair of values $d_1, d_2 \in D$, the following equality holds: $E_k(d_1) \cdot E_k(d_2) = E_k(d_1 \cdot d_2)$

3. Indistinguishability: it is infeasible to distinguish an encryption from a randomly chosen value in the same domain and having the same length.

The advantages are high privacy of data even after an update,
and an approach that can be used is based on techniques for anonymous user authentication and credential verification.

B. Algorithm

Suppose Alice controls the database and Bob is the data provider. The protocol then works as follows. In step 1, Alice sends Bob an encrypted version of a tuple containing only the non-suppressed attributes. In step 2, Bob encrypts the information received from Alice and sends it back to her, along with an encrypted version of each value in his tuple. In the final step, Alice examines whether the non-suppressed attributes of her tuple are equal to the corresponding attributes of the tuple sent by Bob. If so, the tuple is inserted into the database.
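
The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It instantiates E as modular exponentiation, E_k(d) = d^k mod p, which is commutative and product-homomorphic. The prime p = 2^127 - 1, the SHA-256 encoding of attribute values, whole-attribute suppression with "*", and all function names are assumptions made for the example. The parameters are toy-sized, and the sketch makes no claim of indistinguishability or security.

```python
import hashlib
import math
import secrets

# Assumption: toy modulus (the Mersenne prime 2**127 - 1), not a vetted group.
P = 2**127 - 1


def keygen() -> int:
    """Pick a secret exponent coprime to P - 1, so that E_k is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def encode(attr: str, value: str) -> int:
    """Map an attribute/value pair into the domain D = {1, ..., P - 1}."""
    digest = hashlib.sha256(f"{attr}={value}".encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def E(k: int, d: int) -> int:
    """E_k(d) = d^k mod P: commutative and product-homomorphic."""
    return pow(d, k, P)


def alice_step1(ka, anon_tuple):
    """Step 1: Alice encrypts only the non-suppressed attributes."""
    return {a: E(ka, encode(a, v)) for a, v in anon_tuple.items() if v != "*"}


def bob_step2(kb, from_alice, bob_tuple):
    """Step 2: Bob re-encrypts Alice's values and encrypts his own values."""
    double = {a: E(kb, c) for a, c in from_alice.items()}
    own = {a: E(kb, encode(a, v)) for a, v in bob_tuple.items()}
    return double, own


def alice_step3(ka, double, bob_own):
    """Step 3: Alice adds her layer to Bob's values and compares.

    By commutativity, E_ka(E_kb(x)) equals E_kb(E_ka(y)) when x == y.
    """
    return all(E(ka, bob_own[a]) == c for a, c in double.items())


def private_match(anon_tuple, bob_tuple):
    """Run the three steps locally with fresh keys for both parties."""
    ka, kb = keygen(), keygen()
    msg1 = alice_step1(ka, anon_tuple)
    double, own = bob_step2(kb, msg1, bob_tuple)
    return alice_step3(ka, double, own)
```

For instance, with the anonymized tuple `{"birth": "*", "sex": "male", "zip": "*"}`, `private_match` accepts a tuple from Bob whose `sex` value is `"male"` and rejects one whose value is `"female"`, while neither party sees the other's plaintext values in the exchanged messages. In this sketch both parties run in one process; in a deployment each would hold only its own key.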

V. RELATED WORK

In this paper, we have proposed a secure protocol for privately checking whether a k-anonymous database retains its anonymity once a new tuple is inserted into it. Since the proposed protocol ensures that the updated database remains k-anonymous, the database is updated properly. The data provider's privacy cannot be violated when a user updates a table. If updating a record in the database would violate k-anonymity, then that update or insertion is rejected. If the insertion of a record satisfies k-anonymity, then the record is inserted into the table, with the sensitive attributes suppressed by * to maintain k-anonymity in the database. Maintaining k-anonymity in the table in this way makes it difficult for an unauthorized user to identify a record. The important issues for future work are as follows:

• Improve the efficiency of the protocols in terms of the number of messages exchanged, their sizes, and the algorithms used for encryption and decryption.
• Extend the private update techniques for database systems to support notions of anonymity other than k-anonymity.
• Handle the case of malicious parties by introducing an untrusted third party, and implement a real-world anonymous database system.
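
To make the k-anonymity condition that the protocol preserves concrete, the following non-private reference check may help. It is a sketch only: the function names are hypothetical, rows are assumed to be already suppressed as in Table 2, and the comparison is done in the clear, whereas the protocol performs the corresponding equality test over encrypted values.

```python
from collections import Counter

# Rows of Table 2 (suppressed data, k = 2): (birth date, sex, zip code).
TABLE_2 = [
    ("*/1/79", "person", "5****"),
    ("*/1/79", "person", "5****"),
    ("*/*/8*", "male", "022**"),
    ("*/*/8*", "male", "022**"),
]


def is_k_anonymous(rows, k):
    """True if every row is identical to at least k - 1 other rows."""
    counts = Counter(rows)
    return all(n >= k for n in counts.values())


def can_insert(rows, anonymized_row, k):
    """Accept a row only if it equals a row already present.

    Inserting a duplicate of an existing row cannot lower any group size,
    so a k-anonymous table stays k-anonymous after the insertion.
    """
    return is_k_anonymous(rows, k) and anonymized_row in rows
```

With the rows of Table 2, `is_k_anonymous(TABLE_2, 2)` holds, and a new tuple that anonymizes to `("*/*/8*", "male", "022**")` is accepted because it matches an existing group.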

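REFERENCE
[1] U.S. Department of Education, General Family Educational Rights and Privacy Act (FERPA).
[2] B.C.M. Fung, K. Wang, A.W.C. Fu, and J. Pei, "Anonymity for Continuous Data Publishing," Proc. Extending Database Technology Conference (EDBT), 2008.
[3] M.K. Reiter and A. Rubin, "Crowds: Anonymity with Web Transactions," ACM Transactions on Information and System Security (TISSEC), 1(1), 1998, pp. 66-92.
[4] P. Samarati, "Protecting Respondent's Privacy in Micro Data Release," IEEE Transactions on Knowledge and Data Engineering, vol. 13, no. 6, pp. 1010-1027, Nov./Dec. 2001.
[5] University of Miami Leonard M. Miller School of Medicine, Information Technology.
[6] Dr. Durgesh Kumar Mishra, Neha Koria, Nikhil Kapoor, Ravish Bahety, "A Secure Multi-Party Computation Protocol for Malicious Computation Prevention for Preserving Privacy during Data Mining," vol. 3, no. 1, 2009.
[7] C. Blake and C. Merz, "UCI Repository of Machine Learning Databases"; E. Bertino and R. Sandhu, "Database Security: Concepts, Approaches and Challenges," IEEE Trans. Dependable and Secure Computing, vol. 2, no. 1, pp. 2-19, Jan.-Mar. 2005.
[8] www.wikipedia.com/wikifiles/.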
