
International Conference on Current Trends in Computer, Electrical, Electronics and Communication (ICCTCEEC-2017)

Privacy Preservation by Anonymization Method Accomplishing Concept of Hierarchical Clustering and DES: A Propose Study
Jeetendra Mittal Dr. Akash Saxena (Prof.)
Department of Computer Science Department of Computer Science
Compucom Institute of Technology & Management Compucom Institute of Technology & Management
Jaipur, India Jaipur, India
jeetendramittal1983@gmail.com akash27saxena@gmail.com

Abstract: Data mining has been studied extensively and applied in numerous fields, including the Internet of Things (IoT) and business growth. However, data mining approaches also raise serious challenges, because they increase the disclosure of sensitive information and the violation of privacy. Privacy-Preserving Data Mining (PPDM), an essential branch of data mining and an active topic in privacy preservation, has gained particular attention in recent years. This paper describes the privacy concerns that arise from data mining, particularly in national security applications. We discuss privacy-preserving data mining by an anonymization method in which hierarchical clustering is used to partition the given data and the DES algorithm is used to encrypt the data, in order to protect sensitive data from attackers.

Keywords: Privacy Preservation, PPDM, Hierarchical Clustering, Anonymization, Data Encryption Standard.

I. INTRODUCTION

There has been much recent interest in applying data mining to counter-terrorism applications. For example, data mining can be used to detect unusual patterns, terrorist activities and fraudulent behavior. While these applications of data mining can benefit people and save lives, there is also a negative side to this technology: it could be a threat to individuals' privacy. This is because data mining tools are available on the Web, and even naive users can apply these tools to extract information from data stored in various databases and files, and consequently violate the privacy of individuals. As stressed in [1], to carry out efficient data mining and extract the information needed for counter-terrorism and national security, all kinds of information about individuals must be gathered. However, this information could be a threat to individuals' privacy and civil liberties. Privacy is receiving more attention partly because of counter-terrorism and national security. National security is now widely discussed in the media, mainly because people are realizing that, to handle terrorism, the government may need to collect information about individuals. This is causing major concern among civil liberties unions. The aim is to carry out data mining and yet maintain privacy. This topic is known as privacy-preserving data mining.

In this paper we use an anonymization technique along with hierarchical clustering and the DES algorithm.

A. Anonymization

To protect the identity of individuals when sensitive information is released, data holders often encrypt or remove explicit identifiers, such as names and unique security numbers. However, unencrypted data provides no assurance of anonymity. To preserve privacy, the k-anonymity model was proposed by Sweeney [2]; it achieves k-anonymity by means of generalization and suppression [2]. Under k-anonymity, it is difficult for an adversary to determine the identity of the individuals in a data set containing personal information. In each release of the data, every combination of quasi-identifier values must match at least k records, so that each respondent is indistinguishable from at least k-1 others. For example, a person's age might be generalized to a range such as youth, middle age or adult without giving the exact value, so as to reduce the risk of identification [2]. Suppression reduces the precision of the released data by withholding some information altogether, which reduces the risk of revealing exact information.

B. Hierarchical Clustering Algorithm

A hierarchical method builds a hierarchical decomposition of the given set of data objects. It can be either agglomerative or divisive, depending on how the hierarchical decomposition is formed. The agglomerative (bottom-up) approach starts with each object forming a separate group. It successively merges the objects or groups that are close to one another, until all of the groups are merged into a single one or until a termination condition holds [3]. The divisive (top-down) approach starts with all of the objects in the same cluster. In each successive iteration, a cluster is split into smaller clusters, until eventually each object is in its own cluster or until a termination condition holds. Hierarchical methods suffer from the fact that once a step (merge or split) is done, it can never be undone.
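The generalization and suppression steps described under Anonymization can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy records, the age-band boundaries in `generalize_age`, and the helper names `anonymize` and `is_k_anonymous` are all assumptions made for illustration, with age as the only quasi-identifier.

```python
from collections import Counter

def generalize_age(age):
    """Map an exact age to a coarse band (band boundaries are hypothetical)."""
    if age < 25:
        return "youth"
    if age < 45:
        return "adult"
    return "middle age"

def anonymize(records, k):
    """Generalize the quasi-identifier, then suppress groups smaller than k."""
    generalized = [(generalize_age(age), value) for age, value in records]
    group_sizes = Counter(band for band, _ in generalized)
    return [(band, value) for band, value in generalized if group_sizes[band] >= k]

def is_k_anonymous(rows, k):
    """True if every quasi-identifier value is shared by at least k rows."""
    return all(n >= k for n in Counter(band for band, _ in rows).values())

# Toy (age, sensitive value) records, invented for this example.
records = [(23, "flu"), (24, "asthma"), (30, "flu"), (41, "diabetes"), (52, "flu")]
released = anonymize(records, k=2)
print(released)
print(is_k_anonymous(released, k=2))
```

Here the single record in the oldest band is suppressed, because releasing it would leave a group of size one; the remaining records are released with the band in place of the exact age.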

978-1-5386-3243-7/17/$31.00 ©2017 IEEE.


C. DES (Data Encryption Standard)

The Data Encryption Standard (DES) algorithm has been a very popular secret-key encryption algorithm and is used in many commercial and financial applications. Although it was introduced in 1976, it has held up well against most forms of cryptanalysis. In addition, DES is a block cipher, which means that it takes a fixed-length block of the message (64 bits), encrypts it, and returns a ciphertext block of the same size. DES was the first encryption standard recommended by NIST (National Institute of Standards and Technology).

II. LITERATURE REVIEW

There has been a surge in recent research activity in the area of PPDM, and several papers have been published on diverse aspects of PPDM. In this section, we discuss some of the previous work closely related to the study presented in this paper and provide references to recent manuscripts that cover the state of the art in the area.

There are various PPDM techniques, such as anonymization, perturbation, randomization, condensation and cryptography. In this paper we review anonymization techniques of PPDM such as k-anonymity using generalization and suppression, p-sensitive k-anonymity, (α, k)-anonymity, l-diversity and m-invariance [4].

Sometimes data must be published publicly in its original form. Even though it is not encrypted or perturbed, some precaution in the form of anonymization should be applied before the data is released. This is a kind of generalization of some attributes that protects against identity disclosure. Anonymization can be achieved through techniques including generalization, suppression, data removal, permutation, swapping and so on [5].

The k-anonymity method is regarded as the classical anonymization method, and most studies are based on it. The others are based on its improved variants such as l-diversity, t-closeness, k^m-anonymization, (α, k)-anonymity, p-sensitive k-anonymity and (k, e)-anonymity, which are described in [6]. The authors of [6] provide a detailed survey of anonymization methods and also point out pitfalls in k-anonymity. Previous work by Samarati and Sweeney [7,8] shows that removing personally identifying information from data is insufficient for data security; rather, it is better to use the k-anonymity method when publishing data. The quasi-identifier (QI), which is a combination of person-specific attributes, is considered here for the process of anonymization. One of the common methods of achieving k-anonymity is to generalize identifiers (for example, date of birth can be generalized to month of birth).

The work in [9] proposes a novel, more flexible generalization scheme. The experimental results of that study indicate that the approach produces k-anonymizations with less generalization than previous approaches. The authors conclude that a bottom-up approach to k-anonymization is preferable for a small number of quasi-identifying attributes.

A k-anonymity-based method is used in [10] to search for an optimal feature set partitioning and in [11] for cluster analysis. The work in [12] proposes a data reconstruction approach to achieve k-anonymity protection in predictive data mining. In this method, the potentially identifying attributes are first masked using aggregation for numeric data and swapping for nominal data. A genetic algorithm technique is then applied to the masked data to find a good subset of it.

III. PROPOSED METHOD

The proposed algorithm is an attempt to present a new approach for encrypting and decrypting complex data based on parallel programming, so that the new method can make use of multi-core processors to achieve higher speed with a better degree of protection.

ALGORITHM

1. Partition the given dataset using hierarchical clustering.

Given:
    A set X of objects {X1, ..., Xn}
    A distance function dist(C1, C2)
for i = 1 to n
    Ci = {Xi}
end for
C = {C1, ..., Cn}
l = n + 1
while C.size > 1 do
    (Cmin1, Cmin2) = the pair with minimum dist(Ci, Cj) over all Ci, Cj in C
    Remove Cmin1 and Cmin2 from C
    Add {Cmin1, Cmin2} to C
    l = l + 1
end while
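The agglomerative loop of the ALGORITHM listing can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: the objects are one-dimensional numbers, `single_linkage` is an assumed choice for dist (the paper does not say which linkage is used), and the `target` parameter is a hypothetical addition that stops merging early so that the result is a partition instead of one all-inclusive cluster.

```python
from itertools import combinations

def single_linkage(c1, c2):
    """Distance between two clusters: the closest pair of members (assumed linkage)."""
    return min(abs(a - b) for a in c1 for b in c2)

def agglomerative(objects, dist=single_linkage, target=1):
    """Merge the two closest clusters until only `target` clusters remain."""
    clusters = [(x,) for x in objects]        # Ci = {Xi}
    while len(clusters) > target:             # "while C.size > 1" when target == 1
        # (Cmin1, Cmin2): the pair of clusters with minimum distance
        c_min1, c_min2 = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        clusters.remove(c_min1)               # remove Cmin1 and Cmin2 from C
        clusters.remove(c_min2)
        clusters.append(c_min1 + c_min2)      # add the merged cluster to C
    return clusters

ages = [21, 23, 24, 40, 42, 70]               # toy data, invented for this example
partition = agglomerative(ages, target=3)
print(partition)
```

With `target=1` the function follows the listing literally and returns a single cluster containing every object; a larger `target` plays the role of the termination condition mentioned in the description of hierarchical clustering.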


FLOW DIAGRAM

Start
  -> Input dataset
  -> Anonymization algorithm
  -> Hierarchical clustering for partitioning dataset
  -> DES encryption on data
  -> Final result
  -> End

Procedure

Step 1 - Take the dataset as input.
Step 2 - Apply the anonymization technique to that dataset.
Step 3 - Use the hierarchical clustering technique to partition the dataset into clusters.
Step 4 - Use the DES encryption technique to suppress the data values.
Step 5 - Obtain the final result as the union of the lhs and rhs values formed by the anonymization technique.

IV. RESULT ANALYSIS

TABLE I

No. of records                   100     200     300     400     500
Accuracy in base results         84.00   83.50   84.00   84.00   83.00
Accuracy in proposed results     96.00   98.00   98.00   97.50   97.00

Table I describes the accuracy of the base method results and the proposed method results.

[Graph: accuracy of base and proposed method results]

The graph presents the accuracy of the base and proposed method results, from which we conclude that the proposed method is better for preserving privacy.

TABLE II

No. of records                   100     200     300     400     500
Error rate in base results       16.00   16.50   16.00   16.00   16.20
Error rate in proposed results   4.00    2.00    2.00    2.50    3.00

Table II describes the error rate of the base method results and the proposed method results.

[Graph: error rate of base and proposed method results]

The graph presents the error rate of the base and proposed method results, from which we conclude that the proposed method is better for preserving privacy, as its error rate is lower than that of the base method.
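As a small worked check of the comparison, the figures in Table I and Table II can be averaged over the five dataset sizes. The snippet below only re-uses the numbers printed in the tables; the `mean` helper and the `summary` dictionary are illustrative names, and averaging across dataset sizes is an assumption about how one might summarize the results, not a step taken in the paper.

```python
# Values copied from TABLE I and TABLE II (100, 200, 300, 400, 500 records).
accuracy_base = [84.00, 83.50, 84.00, 84.00, 83.00]
accuracy_proposed = [96.00, 98.00, 98.00, 97.50, 97.00]
error_base = [16.00, 16.50, 16.00, 16.00, 16.20]
error_proposed = [4.00, 2.00, 2.00, 2.50, 3.00]

def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

summary = {
    "mean accuracy, base": mean(accuracy_base),
    "mean accuracy, proposed": mean(accuracy_proposed),
    "mean error rate, base": mean(error_base),
    "mean error rate, proposed": mean(error_proposed),
}
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

On these figures the mean accuracy is 83.70 for the base method and 97.30 for the proposed method, and the mean error rate is 16.14 and 2.70 respectively.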


Conclusion

Privacy preservation for data analysis is a challenging research problem because of the increasingly large volumes of data sets, and it therefore requires in-depth research. Each privacy-preserving technique has its own importance. Data encryption and anonymization are widely adopted ways to combat privacy breaches. However, encryption is not suitable for data that are processed and shared. Anonymizing huge data sets and handling anonymized data sets are still challenges for classic anonymization processes. Privacy-preserving data mining has emerged to meet two critical needs: analyzing data in order to deliver better services, and ensuring the privacy rights of the data owners. Substantial efforts have been made to address these needs.

The results of our proposed work show that, by applying hierarchical clustering and encrypting the data using the DES method, we can achieve better preservation of privacy.

References

[1] Bhavani Thuraisingham, "Privacy-Preserving Data Mining: Developments and Directions", Idea Group Publishing, Journal of Database Management, 16(1), 75-87, Jan-March 2005.
[2] Pingshui Wang, "Survey on Privacy Preserving Data Mining", International Journal of Digital Content Technology and its Applications, Volume 4, Number 9, December 2010.
[3] J. W. Han and M. Kamber, "Data Mining: Concepts and Techniques", 2nd Edition, China Machine Press, Beijing, 2006.
[4] Kiran Israni, Shalu Chopra, "Survey on Anonymization Technique for Privacy Preserving Data Mining (PPDM)", International Journal of Innovative Research in Computer and Communication Engineering, Vol. 4, Issue 11, November 2016, ISSN (Online): 2320-9801.
[5] Asmaa H. Rashid and Prof. Dr. Abd-Fatth Hegazy, "Protect Privacy of Medical Informatics using K-Anonymization Model", IEEE Xplore.
[6] Yan Zhao, Ming Du, Jiajin Le, Yongcheng Luo, "A Survey on Privacy Preserving Approaches in Data Publishing", First International Workshop on Database Technology and Applications, 2009.
[7] Samarati P, "Protecting respondent's privacy in Microdata release", IEEE Transactions on Knowledge and Data Engineering, 13:1010-1027.
[8] Sweeney L, "k-anonymity: A model for protecting Privacy", International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10(5):557-570.
[9] Tiancheng Li, Ninghui Li, "Towards Optimal k-anonymization", Data & Knowledge Engineering, 2008, Elsevier.
[10] Nissim Matatov, Lior Rokach, Oded Maimon, "Privacy-preserving data mining: A feature set partitioning approach", Information Sciences 180 (2010) 2696-2720.
[11] Benjamin C. M. Fung, Ke Wang, Lingyu Wang, Patrick C. K. Hung, "Privacy-preserving data publishing for cluster analysis", Data & Knowledge Engineering 68 (2009) 552-575.
[12] Dan Zhu, Xiao-Bai Li, Shuning Wu, "Identity disclosure protection: A data reconstruction approach for privacy-preserving data mining", Decision Support Systems 48 (2009) 133-140.
