Privacy Preservation by Anonymization Method Accomplishing Concept of Hierarchical Clustering and DES: A Propose Study
Abstract— Data mining has been studied extensively and applied in numerous fields, including the Internet of Things (IoT) and business growth. However, data mining approaches also raise serious challenges because they increase the disclosure of sensitive information and the violation of privacy. Privacy-Preserving Data Mining (PPDM), an essential branch of data mining and an active topic in privacy preservation, has gained particular attention in recent years. This paper describes the privacy concerns that arise from data mining, particularly in national security applications. We discuss privacy-preserving data mining by an anonymization method in which hierarchical clustering is used to partition the given data and the DES algorithm is used to encrypt the data, in order to protect sensitive data from attackers.

Keywords—Privacy Preservation, PPDM, Hierarchical Clustering, Anonymization, Data Encryption Standard.

I. INTRODUCTION

There has been much interest recently in applying data mining to counter-terrorism applications. For example, data mining can be used to detect unusual patterns, terrorist activities, and fraudulent behavior. While these applications of data mining can benefit people and save lives, there is also a negative side: they can endanger individuals' privacy. This is because data mining tools are available on the Web, and even naive users can apply these tools to extract information from data stored in various databases and files, and consequently violate the privacy of individuals. As stressed in [1], carrying out effective data mining and extracting the information needed for counter-terrorism and national security requires gathering all kinds of information about individuals. However, this information could be a threat to individuals' privacy and civil liberties. Privacy is receiving more attention partly because of counter-terrorism and national security. National security is now widely discussed in the media, mainly because people are realizing that, to handle terrorism, the government may need to collect information about individuals. This is causing major concern among civil liberties unions. The aim is to carry out data mining and yet maintain privacy. This topic is known as privacy-preserving data mining.

In this paper we use an anonymization technique along with hierarchical clustering and the DES algorithm.

A. Anonymization-

To protect the identity of individuals when sensitive information is released, data holders often encrypt or remove explicit identifiers, such as names and unique security numbers. However, unencrypted data provides no guarantee of anonymity. To preserve privacy, the k-anonymity model was proposed by Sweeney [2]; it achieves k-anonymity by means of generalization and suppression [2]. Under k-anonymity, it is difficult for an intruder to determine the identity of the individuals in a data set containing personal information. In each release of data, every combination of values of the quasi-identifiers must be indistinctly matched to at least k-1 other respondents. For example, a person's age might be generalized to a range such as youth, middle age, or adult without specifying the exact value, so as to reduce the risk of identification [2]. Suppression reduces the exactness of the data by withholding some information altogether, which reduces the risk of exact information being detected.

B. Hierarchical Clustering algorithm –

Hierarchical clustering builds a hierarchical decomposition of a given set of data objects. It can be either agglomerative or divisive, depending on how the hierarchical decomposition is formed. The agglomerative technique (bottom-up approach) starts with each object forming a separate group. It successively merges the objects or groups that are close to one another, until all the groups are merged into one or until a termination condition holds [3]. The divisive technique (top-down approach) starts with all the objects in the same cluster. In each successive iteration, a cluster is split into smaller clusters, until eventually each object is in its own cluster or until a termination condition holds. Hierarchical methods suffer from the fact that once a step (merge or split) is done, it can never be undone.
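
The agglomerative (bottom-up) variant of hierarchical clustering can be illustrated with a minimal sketch. This is not the implementation used in this study: the single-linkage distance, the one-dimensional age values, and the names `single_linkage` and `agglomerative` are all assumptions chosen for illustration. The termination condition here is simply a target number of clusters.

```python
from itertools import combinations


def single_linkage(a, b):
    """Distance between two clusters: the gap between their closest members."""
    return min(abs(x - y) for x in a for y in b)


def agglomerative(values, target_clusters):
    """Bottom-up clustering of 1-D values (illustrative sketch only).

    Every value starts in its own cluster. The two closest clusters are
    merged repeatedly until `target_clusters` remain, which plays the role
    of the termination condition. A merge is never revisited or undone.
    """
    clusters = [[v] for v in values]
    merge_history = []
    while len(clusters) > target_clusters:
        # Pick the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: single_linkage(clusters[p[0]], clusters[p[1]]),
        )
        merge_history.append((clusters[i], clusters[j]))
        merged = sorted(clusters[i] + clusters[j])
        # Replace the two chosen clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(clusters), merge_history


# Hypothetical ages, used only to show the behaviour.
ages = [21, 23, 24, 45, 47, 70]
clusters, history = agglomerative(ages, target_clusters=3)
print(clusters)  # prints [[21, 23, 24], [45, 47], [70]]
```

Because each iteration commits to the locally closest pair, an early merge that later turns out to be poor stays in the result, which is the irreversibility described above.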
International Conference on Current Trends in Computer, Electrical, Electronics and Communication (ICCTCEEC-2017)
FLOW DIAGRAM-
Start -> Input dataset -> Anonymization algorithm -> End

Procedure –
Step 1 - Consider the dataset for input.
Step 2 - Apply the anonymization technique to that dataset.
Step 3 - Use the hierarchical clustering technique to partition the dataset into clusters.
Step 4 - Use the DES encryption technique to suppress the data values.
Step 5 - Obtain the final result as the union of the lhs and rhs values formed by the anonymization technique.

TABLE I
Accuracy in base results        84.00  83.50  84.00  84.00  83.00
Accuracy in proposed results    96.00  98.00  98.00  97.50  97.00

TABLE II
Error rate in base results      16.00  16.50  16.00  16.00  16.20
Error rate in proposed results   4.00   2.00   2.00   2.50   3.00

Table II describes the error rate of the base method results and the proposed method results.

The graph above shows the error rates of the base and proposed method results. Because the error rate of the proposed method is lower than that of the base method, we conclude that the proposed method is better for preserving privacy.
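
Steps 1 and 2 of the procedure rest on k-anonymity through generalization and suppression, which can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the records, the field names (`age`, `zip`, `disease`), the 20-year age bands, and the helper names are all hypothetical. Suppression is modeled here by dropping records whose group is still smaller than k; the DES-based suppression of Step 4 is not shown.

```python
from collections import Counter


def generalize_age(age, width=20):
    """Replace an exact age with a band such as '20-39' (band width is illustrative)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


def anonymize(records, quasi_identifiers, k):
    """Generalize ages, then suppress records whose group is still smaller than k."""
    generalized = [{**r, "age": generalize_age(r["age"])} for r in records]
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in generalized)
    return [
        r for r in generalized
        if groups[tuple(r[q] for q in quasi_identifiers)] >= k
    ]


# Hypothetical input dataset (Step 1); 'age' and 'zip' act as quasi-identifiers.
records = [
    {"age": 23, "zip": "411", "disease": "flu"},
    {"age": 27, "zip": "411", "disease": "asthma"},
    {"age": 35, "zip": "411", "disease": "flu"},
    {"age": 44, "zip": "412", "disease": "diabetes"},
    {"age": 58, "zip": "412", "disease": "flu"},
    {"age": 71, "zip": "413", "disease": "asthma"},
]
qi = ["age", "zip"]

print(is_k_anonymous(records, qi, 2))  # prints False: every exact age is unique
released = anonymize(records, qi, 2)   # Step 2: generalize, then suppress
print(len(released), is_k_anonymous(released, qi, 2))  # prints 5 True
```

In this toy data the single record in the 60-79 band cannot be grouped with any other, so it is withheld. This shows the trade-off between privacy and the exactness of the released data.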
Conclusion

Privacy preservation for data analysis is a challenging research problem because of increasingly large volumes of data, and it therefore requires in-depth research. Each privacy-preserving technique has its own importance. Data encryption and anonymization are widely adopted ways to combat privacy breaches. However, encryption is not suitable for data that are processed and shared, and anonymizing huge data sets and managing anonymized data sets are still challenges for traditional anonymization approaches. Privacy-preserving data mining has emerged to meet two critical needs: data analysis aimed at delivering better services, and ensuring the privacy rights of the data owners. Substantial efforts have been made to address these needs.

The results of our proposed work show that, by applying hierarchical clustering and encrypting the data using the DES method, we can achieve greater preservation of privacy.

References

[1] Bhavani Thuraisingham, “Privacy-Preserving Data Mining: Developments and Directions”, Idea Group Publishing, Journal of Database Management, 16(1), 75-87, Jan-March 2005.
[2] Pingshui Wang, “Survey on Privacy Preserving Data Mining”, International Journal of Digital Content Technology and its Applications, Volume 4, Number 9, December 2010.
[3] J. W. Han and M. Kamber, “Data Mining: Concepts and Techniques”, 2nd Edition, China Machine Press, Beijing, 2006.
[4] Kiran Israni, Shalu Chopra, “Survey on Anonymization Technique for Privacy Preserving Data Mining (PPDM)”, International Journal of Innovative Research in Computer and Communication Engineering, Vol. 4, Issue 11, November 2016, ISSN (Online): 2320-9801.
[5] Asmaa H. Rashid and Prof. Dr. Abd-Fatth Hegazy, “Protect Privacy of Medical Informatics using K-Anonymization Model”, IEEE Xplore.
[6] Yan Zhao, Ming Du, Jiajin Le, Yongcheng Luo, “A Survey on Privacy Preserving Approaches in Data Publishing”, First International Workshop on Database Technology and Applications, 2009.
[7] Samarati P, “Protecting respondent’s privacy in Microdata release”, IEEE Transactions on Knowledge and Data Engineering, 13:1010–1027.
[8] Sweeney L, “k-anonymity: A model for protecting Privacy”, International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10(5):557–570.
[9] Tiancheng Li, Ninghui Li, “Towards Optimal k-anonymization”, Data & Knowledge Engineering, Elsevier, 2008.
[10] Nissim Matatov, Lior Rokach, Oded Maimon, “Privacy-preserving data mining: A feature set partitioning approach”, Information Sciences 180 (2010) 2696–2720.
[11] Benjamin C. M. Fung, Ke Wang, Lingyu Wang, Patrick C. K. Hung, “Privacy-preserving data publishing for cluster analysis”, Data & Knowledge Engineering 68 (2009) 552–575.
[12] Dan Zhu, Xiao-Bai Li, Shuning Wu, “Identity disclosure protection: A data reconstruction approach for privacy-preserving data mining”, Decision Support Systems 48 (2009) 133–140.