
Privacy Preserving Data Mining Using LBG And ELBG

PRIVACY PRESERVING DATA MINING USING LBG AND ELBG


1 D. ARUNA KUMARI, 2 B. POOJA Y, 3 A. SHALINI B, 4 VINAY

1,3 Department of Electronics and Computer Engineering, Associate Professors, CSI Life Member, K.L. University, Vaddeswaram, Guntur
2,3,4 III/IV B.Tech ECM, K.L. University, Vaddeswaram, Guntur

Abstract: Many privacy preserving data mining algorithms attempt to protect what database owners consider sensitive. Because data mining can predict highly sensitive information, this paper presents one technique for privacy preserving data mining: a reconstruction-based technique for numerical data. The performance of the proposed technique is evaluated using accuracy and distortion parameters.

Keywords: Privacy preserving, reconstruction, quantization.

I. INTRODUCTION

Huge volumes of detailed personal data are regularly collected and analyzed by applications. Such data include shopping habits, criminal records, medical history, and credit records, among others. Analyzing such data opens new threats to privacy. Privacy preserving data mining (PPDM) is an important area of data mining that aims to protect secret information from unsolicited or unsanctioned disclosure. Data mining techniques analyze data and predict useful information. The concept of privacy preserving data mining is primarily concerned with protecting secret data against unsolicited access. It is important because the threat to privacy is becoming real: data mining techniques are able to predict highly sensitive knowledge from huge volumes of data [1].

II. CLASSIFICATION OF PRIVACY PRESERVING DATA MINING

There are many approaches to privacy preserving data mining. Privacy preserving data mining techniques can be classified along the following dimensions [11].

1. Data distribution
2. Data modification
3. Data mining algorithm
4. Data or rule hiding
5. Privacy preservation

The first dimension refers to data distribution: data can be distributed vertically or horizontally over the systems.
The second dimension refers to modifying the original data into another form, so that sensitive data cannot be identified. There are several methods for data modification, such as randomization, swapping, sampling, anonymity, and blocking.
The third dimension is the data mining algorithm: the privacy of individuals should be preserved when mining is performed on the data.
The fourth dimension refers to hiding: part of the data, or the result of data mining, can be hidden.
The fifth dimension is the most important issue, i.e., providing privacy during data mining. This paper mainly concentrates on the fifth dimension.

III. PROPOSED APPROACH

Vector quantization works by rounding off: each data point is approximated by the nearest vector in a codebook. The approach has four steps:

1. Original data
2. Constructing the codebook
3. Encoding the original data with the codebook
4. Decoding

Vector quantization (VQ) is generally used for data compression. In the past, designing a vector quantizer was considered a hard problem because of the need for multi-dimensional integration. Linde, Buzo, and Gray (LBG) introduced an algorithm for vector quantizer design based on a training sequence. A VQ designed with this algorithm is referred to as an LBG-VQ.

Figure 3.1: Block diagram of Vector Quantizer
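The encoding and decoding steps of a vector quantizer can be illustrated with a minimal sketch. It assumes the codebook is already available and that closeness is measured by Euclidean distance; the function names `encode` and `decode` and the toy values are hypothetical and do not come from the paper.

```python
from math import dist  # Euclidean distance, Python 3.8+

def encode(data, codebook):
    """Map each data vector to the index of its nearest code vector."""
    return [min(range(len(codebook)), key=lambda j: dist(x, codebook[j]))
            for x in data]

def decode(indices, codebook):
    """Replace each index with the code vector it points to."""
    return [codebook[j] for j in indices]

# Hypothetical toy values, for illustration only.
codebook = [(1.0, 1.0), (5.0, 5.0)]
original = [(0.9, 1.2), (5.1, 4.8), (1.1, 0.7)]

indices = encode(original, codebook)
sanitized = decode(indices, codebook)
print(indices)    # [0, 1, 0]
print(sanitized)  # [(1.0, 1.0), (5.0, 5.0), (1.0, 1.0)]
```

Only the indices and the codebook are needed to rebuild the approximated records, which is why the decoded values are rounded versions of the originals rather than exact copies.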

Proceedings of 6th IACEECE-2013, 29th September 2013, Chennai, India. ISBN: 978-93-82702-31-3


CodeBook Generation Using LBG


To generate the codebook, a set of training data (LSF) is needed, which is obtained by collecting a set of LSF vectors. There is a well-known algorithm, the LBG algorithm [Linde, Buzo and Gray, 1980] [12], for clustering a set of L training vectors into a set of M codebook vectors. The algorithm is formally implemented by the following recursive procedure:

1. Design a 1-vector codebook; this is the centroid of the entire set of training vectors (hence, no iteration is required here).

2. Double the size of the codebook by splitting each current codeword according to the splitting rule.

3. Nearest-Neighbor Search: for each training vector, find the codeword in the current codebook that is closest (in terms of similarity measurement), and assign that vector to the corresponding cell (associated with the closest codeword).

4. Centroid Update: update the codeword in each cell using the centroid of the training vectors assigned to that cell.

5. Iteration 1: repeat steps 3 and 4 until the average distance falls below a preset threshold.

6. Iteration 2: repeat steps 2, 3 and 4 until a codebook of size M is designed.

Figure 3.11: Representation of the original data set and the sanitized data set using LBG

Section 4 explains the basics of vector quantization. Here the codebook plays a very important role, and the accuracy of the data mining result depends on the effective design of the codebook, because quantization is performed with the help of the codebook indices.
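The recursive LBG procedure listed above can be sketched in a few lines of plain Python. This is a sketch under stated assumptions, not the authors' implementation: the splitting rule is not reproduced in the text, so the commonly used multiplicative perturbation y(1 + eps) and y(1 - eps) is assumed; Euclidean distance is assumed as the similarity measure; and the inner loop stops when the average distance no longer improves by more than `tol`, a practical stand-in for the preset threshold of step 5. The names `lbg`, `eps`, and `tol` are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(c) / n for c in zip(*vectors))

def nearest(x, codebook):
    """Index of the codeword closest to x."""
    return min(range(len(codebook)), key=lambda j: dist(x, codebook[j]))

def lbg(training, m, eps=0.01, tol=1e-6):
    """Train a codebook of m codewords (m assumed to be a power of two)."""
    codebook = [centroid(training)]                  # step 1
    while len(codebook) < m:                         # step 6
        # Step 2: split each codeword into two perturbed copies (assumed rule).
        codebook = [tuple(c * (1 + s * eps) for c in y)
                    for y in codebook for s in (+1, -1)]
        prev = float("inf")
        while True:                                  # step 5
            # Step 3: assign every training vector to its nearest codeword.
            cells = [[] for _ in codebook]
            for x in training:
                cells[nearest(x, codebook)].append(x)
            # Step 4: move each codeword to the centroid of its cell;
            # an empty cell keeps its previous codeword.
            codebook = [centroid(cell) if cell else y
                        for cell, y in zip(cells, codebook)]
            avg = sum(dist(x, codebook[nearest(x, codebook)])
                      for x in training) / len(training)
            if prev - avg <= tol:                    # assumed stopping rule
                break
            prev = avg
    return codebook

# Hypothetical toy training set with two obvious groups.
training = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2),
            (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]
codebook = lbg(training, 2)
print(sorted(tuple(round(c, 6) for c in y) for y in codebook))
# [(1.0, 1.0), (5.0, 5.0)]
```

A multiplicative split cannot separate a codeword whose components are all zero, so an additive perturbation may be preferable for data centred on the origin.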

This work uses the LBG algorithm to design the codebook. Once the codebook is constructed, a new data set is formed by approximating each data point with the nearest code vector in the codebook. The new data set therefore contains approximated values rather than the exact values of the original data set.
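How far the approximated data set departs from the original can be quantified with a distortion measure. The paper does not define the measure it uses, so the sketch below assumes mean squared error between matching records; the function name `mean_squared_distortion` and the toy values are hypothetical.

```python
def mean_squared_distortion(original, sanitized):
    """Average squared Euclidean distance between matching records."""
    total = sum(sum((a - b) ** 2 for a, b in zip(x, y))
                for x, y in zip(original, sanitized))
    return total / len(original)

# Hypothetical toy values: each original record and its nearest code vector.
original = [(0.9, 1.2), (5.1, 4.8), (1.1, 0.7)]
sanitized = [(1.0, 1.0), (5.0, 5.0), (1.0, 1.0)]

d = mean_squared_distortion(original, sanitized)
print(round(d, 4))  # 0.0667
```

A distortion of zero would mean the sanitized data equals the original, so a useful codebook trades a small nonzero distortion for the privacy gained by hiding exact values.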

IV. EXPERIMENTAL RESULTS

Table 1: Original Data


Figure 3.12: Clustering on the original data set and the sanitized data set using LBG

CONCLUSION AND FUTURE SCOPE

This work presents a different approach to privacy preserving data mining based on vector quantization. It shows analytically and experimentally that privacy preserving data mining is possible, to some extent, using the vector quantization approach. Performance is evaluated using two important parameters: distortion and F-measure (quality of data mining results).

As future work, a new and more effective quantization method can be used in place of the LBG approach used here. The K nearest neighbor approach is one approach that may give better results.

REFERENCES

1. Agrawal, R. & Srikant, R. (2000). Privacy Preserving Data Mining. In Proc. of ACM SIGMOD Conference on Management of Data (SIGMOD'00), Dallas, TX.
2. Alexandre Evfimievski, Tyrone Grandison, Privacy Preserving Data Mining. IBM Almaden Research Center, 650 Harry Road, San Jose, California 95120, USA.
3. Aggarwal Charu C., Yu Philip S., Privacy Preserving Data Mining: Models and Algorithms, New York, Springer, 2008.
4. Oliveira S.R.M., Zaiane Osmar R., A Privacy-Preserving Clustering Approach Toward Secure and Effective Data Analysis for Business Collaboration, In Proceedings of the International Workshop on Privacy and Security Aspects of Data Mining in conjunction with ICDM 2004, Brighton, UK, November 2004.
5. UCI Repository of machine learning databases, University of California, Irvine. http://archive.ics.uci.edu/ml/
6. Wikipedia. Data mining. http://en.wikipedia.org/wiki/Datamining
7. Binit Kumar Sinha, Privacy preserving clustering in data mining.
8. C. W. Tsai, C. Y. Lee, M. C. Chiang, and C. S. Yang, A Fast VQ Codebook Generation Algorithm via Pattern Reduction, Pattern Recognition Letters, vol. 30, pp. 653-660, 2009.
9. K. Somasundaram, S. Vimala, A Novel Codebook Initialization Technique for Generalized Lloyd Algorithm using Cluster Density, International Journal on Computer Science and Engineering, Vol. 2, No. 5, pp. 1807-1809, 2010.
10. K. Somasundaram, S. Vimala, Codebook Generation for Vector Quantization with Edge Features, CiiT International Journal of Digital Image Processing, Vol. 2, No. 7, pp. 194-198, 2010.
11. Vassilios S. Verykios, Elisa Bertino, Igor Nai Fovino, State-of-the-art in Privacy Preserving Data Mining, SIGMOD Record, Vol. 33, No. 1, March 2004.
12. M. Madhavi Latha, M. Satya Sai Ram and P. Siddaiah, Multi Switched Split Vector Quantizer, International Journal of Computer, Information and Systems Science and Engineering, Vol. 2, No. 1.
13. M. Madhavi Latha, M. Satya Sai Ram and P. Siddaiah, Multi Switched Split Vector Quantization, Proceedings of World Academy of Science, Engineering and Technology, Volume 27, Feb 2008, ISSN 1307-6884.
14. Aggarwal Charu C., Yu Philip S., Privacy Preserving Data Mining: Models and Algorithms, New York, Springer, 2008.
15. Atallah, M., Elmagarmid, A., Ibrahim, M., Bertino, E., Verykios, V.: Disclosure limitation of sensitive rules, Workshop on Knowledge and Data Engineering Exchange, 1999.
