
JOURNAL OF COMPUTING, VOLUME 3, ISSUE 4, APRIL 2011, ISSN 2151-9617

HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/
WWW.JOURNALOFCOMPUTING.ORG 12

An Effective Framework to Preserve the Data Privacy By Innovative Rotation Technique

P. Kamakshi, Dr. A. Vinaya Babu

Abstract -- Due to rapid development in hardware, software and networking technology, there has been tremendous growth in the amount of data collected, stored and shared between different organizations. Data collected from heterogeneous sources such as medical, financial, library, telephone and shopping records can be stored in a central repository called a data warehouse. The primary challenge is how to utilize such data for competitive business advantage. The data mining process analyzes such data from different perspectives and summarizes it into useful information that can be used to increase revenue, reduce cost and recommend better decisions for the growth of an organization. Data mining tools find correlations or patterns in large relational databases and analyze the data from many different dimensions or angles. Data mining is seen as an increasingly important tool by modern business to transform data into business intelligence, giving an informational advantage in various domains such as marketing, weather forecasting, fraud detection and scientific research. A significant point to be considered during the data mining process is that the data collected from heterogeneous sources also contains sensitive information. The patterns extracted by a data mining operation may reveal this sensitive information. While data mining is a technology with a large number of advantages, the main threat to be addressed is privacy. The main anxiety of people is that their confidential information may be disclosed without their knowledge and misused behind the scenes. Hence data mining activities are forced to take actions to protect the privacy of individuals. In this paper we propose an architecture which utilizes the significant features of the perturbation and rotation techniques. We analyze the problems of the perturbation technique and propose a method that offers better protection of sensitive information.

Index Terms -- Data Mining, Data perturbation, Privacy, Rotation, Sensitive data. 


__ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ __ _

P. Kamakshi, Associate Professor, Department of Information Technology, Kakatiya Institute of Technology & Science, Warangal, A.P., India-506015
Dr. A. Vinaya Babu, Professor, Department of Computer Science and Engineering, Jawaharlal Nehru Technological University, Hyderabad, A.P., India.

1 INTRODUCTION

Data mining operations are extensively used in various applications such as business, academia, communication, bioinformatics and medicine. Data mining results not only give the valuable information hidden in databases, but sometimes also reveal private information about individuals. Data mining techniques are capable of deriving highly sensitive knowledge from uncategorized data, knowledge that even the database owners do not know. Personal information collected during the original transaction between an individual and an organization can be used for other purposes. The threat to privacy becomes more harmful when the data owners are not aware of the secondary uses of their data. Individuals fear that they have no control over the sensitive information which is stored and distributed on different digital storage media. The difficult and complex problem is how to protect the valuable information against secondary usage and also help organizations use the extracted knowledge to support decision making and analysis.

Deleting unique personal identifiers such as name or social security number from a data set may appear to offer sufficient privacy, but removing the uniquely identifying attributes does not always help. Re-identification attacks can link different public data sets to re-identify the original values. The complexity of the data mining process is that it extracts or evaluates individual data which is considered private by linking different attributes. Hence it is difficult to strike a balance between the right to privacy and knowledge discovery, which is a critical factor in the decision process of various activities. The real hurdle is not data mining itself, but the way in which data mining results are obtained.

Privacy preserving data mining (PPDM) is an emerging technique in data mining where privacy and data mining can coexist. It gives summarized results without any loss of privacy through the data mining process. The main consideration in privacy preserving data mining is the sensitive nature of raw data. The data miner, while mining for aggregate statistical information about the data, should not be able to access the data in its original form with all the sensitive information. In this paper we propose a framework which integrates the additive perturbation technique and the rotation technique. This novel architecture gives better performance compared to the additive perturbation technique alone.

2 PREVIOUS WORK

2.1 Privacy Preserving Data Mining Methods

For the past few years, several approaches have been proposed in the context of privacy preserving data mining. These techniques can be classified based on the protection methods used, such as data modification methods, cryptographic methods and query auditing methods. Fig. 1 shows the classification of different privacy preserving data mining methods.

Privacy Technique
    Data Modification
        Noise Addition
        Signals transform
        Swapping
        Aggregation
        Suppression
    Cryptographic
        Secured Multiparty Computation
        Horizontal Partition
        Vertical Partition
    Query auditing
        online
        offline

Fig 1 Classification of privacy preserving methods

Data modification techniques modify the data before releasing it to the users. The data is modified in such a way that privacy is preserved in the released data set.

Cryptographic methods encrypt the data with encryption schemes while still allowing the data mining tasks. These methods use a certain set of protocols such as secured multiparty computation (SMC). SMC techniques are not supposed to disclose to a participating party any new information other than the final result of the computation. SMC techniques are applied to distributed data sets. Cryptographic methods bring in the overhead of encryption and decryption and are less efficient for larger data sets and where data utility is of concern.

Query auditing methods preserve privacy by modifying or restricting the results of a query. In these methods too many denials of queries lead to less utility of the data set, while fewer denials increase the utility but sacrifice privacy.

Noise addition methods add a random number (noise) to numerical attributes. This random number is generally drawn from a normal distribution with zero mean and a small standard deviation. Noise addition is especially convenient for applications where the data owners need to export or publish privacy-sensitive data. A data perturbation procedure can be simply described as follows. Before the data owner publishes the data, they randomly change the data in a certain way to disguise the sensitive information while preserving the particular data property that is critical for building the data models. Noise addition to categorical values is not straightforward.

Data swapping interchanges attribute values among different records. Similar attribute values are interchanged with higher probability. All original values are kept within the data set and only their positions are swapped.

Aggregation refers to grouping. In these methods a few records are grouped and replaced by a group representative. For example, in the case of an income attribute, individual income values can be grouped into high, medium and low income. Aggregation replaces k records of a data set by a representative record. The value of an attribute in such a representative record is generally derived by taking the average of all values of that attribute in the records that are replaced. Because k original records are replaced by one representative record, aggregation results in some information loss. The information loss can be minimized by clustering the original records into mutually exclusive groups of k records prior to aggregation.

Suppression refers to replacing an attribute value in one or more records by a missing value. In the suppression technique sensitive data values are deleted or suppressed prior to the release of microdata. Suppression is used to protect an individual's privacy from intruders' attempts to accurately predict a suppressed value. An intruder can take various approaches to predict a sensitive value.

Signal transform methods use the Fourier transformation and the wavelet transformation to modify the data. These methods are fast, with better time complexity than their predecessors. Although some metric-based global perturbation can keep very good data mining utility while preserving a certain degree of privacy, data statistics are not taken into consideration by these techniques. It is necessary to keep statistical properties so that one set of data can be used for statistical analysis in addition to data mining analysis. So another modification technique, wavelet-based data perturbation, is used to maintain statistical properties.

The rest of this paper is organized as follows: the next subsection provides a motivation for our work by presenting the well-known randomized data perturbation technique. To further enhance the quality of the output of the perturbation technique we use the rotation technique. In this paper we propose a framework which integrates the features of both the perturbation and the rotation techniques.

2.2 Overview of additive perturbation Technique

Perturbation techniques preserve the privacy of individual data by altering the original data with noise drawn from some known distribution. The users are provided access only to the modified values instead of the original values. The larger the data set, the smaller the difference between analyses performed on the original and on the perturbed data sets. Perturbation techniques are mainly used where there is a need to provide the data to a third party for data mining, to retrieve related data sets and to extract hidden patterns. In the randomization approach the privacy of the data is obtained by perturbing it with randomization algorithms and submitting the randomized version, thus hiding the data and aiming to protect it against reconstruction. In this scheme, a random number is added to the value of a sensitive attribute. For example, if $x_i$ is the value of a sensitive attribute, then $x_i + r$ will appear in the database, where $r$ is a random value drawn from some distribution. This method is known as additive data perturbation. The most commonly used distributions are the uniform distribution over an interval $[-\alpha, \alpha]$ and the Gaussian distribution with mean equal to zero and standard deviation $\sigma$. The algorithm is chosen so that aggregate properties of the data can be recovered with sufficient precision while individual entries are significantly distorted. The server has a complete and precise database with information from its clients, and it has to make a version of this database public for others to work with. One important example is census data; the government of a country collects private data for research and economic planning. However, it is assumed that private records of any given person should neither be released nor be recoverable from what is released. In particular, a company should not be able to match up records in the publicly released database with the corresponding records in the company's own database of its customers. The method of randomization can be described as follows. Consider a set of data records denoted by $X = \{x_1, \ldots, x_N\}$. For each record $x_i \in X$, we add a noise component which is drawn from the probability distribution $f_Y(y)$. These noise components are drawn independently and are denoted $y_1, \ldots, y_N$. Thus, the new set of distorted records is $x_1 + y_1, \ldots, x_N + y_N$. We denote this new set of records by $z_1, \ldots, z_N$. In general, it is assumed that the variance of the added noise is large enough that the original record values cannot be easily guessed from the distorted data. Thus, if $X$ is the random variable denoting the data distribution of the original record, $Y$ is the random variable describing the noise distribution, and $Z$ is the random variable denoting the final record, we have:

$$Z = X + Y$$

$$X = Z - Y$$

By subtracting Y from the approximated distribution of Z, it is possible to approximate the original probability distribution of X. At the end of the process, we only have a distribution describing the behavior of X. One key advantage of the randomization method is that it is relatively simple and does not require knowledge of the distribution of other records in the data. Our experiment was performed on a numerical database by applying Gaussian perturbation to all the attributes in a given database. The same technique can be applied to selected

attributes, which the database administrator considers more sensitive. Fig. 2 below shows a sample table where the values of the attribute income are shown in their original and perturbed form.

S.No  Age  Experience  Income  Income (perturbed)
1     25   15          54      56
2     30   14          33      31
3     33   13          32      30
4     29   12          33      31
5     43   20          55      53
6     33   18          42      40
7     44   14          34      36
8     54   12          33      35
9     50   11          32      34
10    51   10          30      33

Fig. 2 The table above represents the sample database table of an employee where the data value under attribute income is perturbed

2.3 Data rotation technique

Most data mining applications work with large data sets that contain private information that must be protected. The necessity to protect sensitive information has led to the development of many privacy-preserving data mining techniques. Many of these techniques use randomized data distortion by adding noise to the sensitive data. However, careless noise addition may introduce misleading results. Hence, to protect the statistical properties of the sensitive data and to meet privacy requirements, the data transformation technique called Rotation-Based Transformation is used. This method distorts only confidential numerical attributes and preserves the statistical properties of the data. In the rotation data transformation method an angle theta is used to represent the noise. The transformation is applied to the selected confidential attributes by the rotation angle theta. The most difficult transformation is rotation. The transformation is performed by observing the rotation of a point about the coordinate axes. The transformation matrix shown in Fig. 3 indicates the rotation of a point in a 2D discrete space by an angle theta. The rotation angle theta is measured clockwise and this transformation affects the values of the X and Y coordinates. In this example we present a 2D geometry in 2D discrete space. Still, this technique scales to a larger number of dimensions.

$$R(\theta) = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$$

Fig 3. Transformation matrix for Rotation

The set of operations Fi(OP) takes only the value rotate, which identifies a common rotation angle between the attributes Ki and Kj. Unlike the perturbation technique discussed above, the rotation technique can be applied more than once to some confidential attributes. When a rotation transformation is applied, it affects the values of two coordinates. In a 2D discrete space, the X and Y coordinates are affected. In a 3D discrete space or higher, two variables are affected and the others remain without any alteration. One or more rotation transformations guarantee that all the confidential attributes are distorted in order to preserve privacy. This technique assures privacy for multidimensional perturbation because it perturbs multiple columns in one transformation.

S.No  Location  Age  Profession  Income
1     Toronto   29   Student     20000
2     America   38   lawyer      30000
3     Iran      34   Professor   45000
4     America   43   doctor      50000

Fig. 4 A sample database before applying rotation operation

S.No  Location  Age  Profession  Income
1     Toronto   30   Student     22000
2     America   37   lawyer      33000
3     Iran      36   Professor   42000
4     America   45   doctor      48000

Fig. 5 Modified database after Rotation technique applied on the attribute age and income

3 PROPOSED ARCHITECTURE

With the development of technology and networking, organizations collect and store huge amounts of data. Such a huge volume of data is considered a very important asset for an organization. Most companies use data mining tools to extract unknown patterns from such data. The data owners use such interesting patterns for analysis and decision making. In today's competitive world, companies have realized that the growth of an organization is not possible by acting alone, but only by sharing information and collaborating with other companies. However, companies restrict themselves because of the limitations on sharing private data. The additive perturbation technique protects the privacy of the sensitive information by adding a small amount of noise to the original data and submitting the perturbed data to the outside world for analysis.

The additive perturbation technique faces some challenges in preserving data privacy. One of the challenging issues to address is that under certain conditions it is relatively easy to violate the privacy protection offered by random perturbation based techniques. Randomization schemes might not be secure, as attackers may apply a random matrix-based spectral filtering technique to retrieve the original data from the perturbed data. Hence the main objective is to upgrade the additive perturbation technique.

The proposed framework performs very efficiently in an environment where a number of parties want to share their databases for their mutual benefit and growth, but restrict themselves from sharing their private data due to various privacy policies.

The proposed framework is divided into three components: the client, the data miner and the data provider. At any time the roles of data provider and customer can be interchanged. This novel hybrid approach is an amalgamation of the perturbation approach and the rotation technique. The architecture follows four steps. In the first step the client submits the query to the data miner. The data miner interacts with other parties that have their own databases and collaborate with each other but do not trust each other.
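The rotation of Section 2.3 can be illustrated with a minimal Python sketch. It assumes the clockwise transformation matrix of Fig. 3 applied to a pair of numeric attributes. The function names `rotate_pair` and `rotate_attributes` and the sample values are hypothetical and chosen only for illustration; no attempt is made to reproduce the values of Fig. 5. Since a rotation preserves the distance between any two records, distance-based statistics are the same before and after the transformation.

```python
import math


def rotate_pair(x, y, theta_degrees):
    """Rotate the point (x, y) clockwise by theta_degrees about the origin."""
    theta = math.radians(theta_degrees)
    c, s = math.cos(theta), math.sin(theta)
    # Clockwise rotation matrix: [[cos, sin], [-sin, cos]]
    return (c * x + s * y, -s * x + c * y)


def rotate_attributes(records, theta_degrees):
    """Apply the same rotation angle to every (attribute_i, attribute_j) pair."""
    return [rotate_pair(x, y, theta_degrees) for x, y in records]


# Hypothetical values for two confidential numeric attributes.
original = [(29.0, 20.0), (38.0, 30.0), (34.0, 45.0), (43.0, 50.0)]
rotated = rotate_attributes(original, 2.0)

for before, after in zip(original, rotated):
    print(before, "->", tuple(round(v, 3) for v in after))

# The distance between two records is unchanged by the rotation.
d_before = math.dist(original[0], original[1])
d_after = math.dist(rotated[0], rotated[1])
print(round(d_before, 6), round(d_after, 6))
```

Applying `rotate_attributes` a second time with another angle corresponds to the repeated application of rotation that the text mentions.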
[Figure: block diagram with the labels Client submits the query, Data owner, Original database, Data miner, Perturbed database, Rotation operation, Results]

Fig.6 Framework for preserving the privacy of sensitive data by performing rotation transformation on perturbed data.

In the second step the data owner identifies the sensitive attributes and perturbs the data related to those attributes using additive perturbation. In the third step the perturbed information is received by the data miner. The data miner again selects the attributes which it considers sensitive, and the rotation operation is performed on the data of the selected attributes. The result obtained in this manner has better privacy compared to the privacy obtained by the perturbation technique alone.

4 EXPERIMENTAL RESULTS

Order_id  Custid  Orderdate   Freight  Unit price
10248     Vinet   1996-07-04  12.06    17
10249     TOMSP   1996-07-05  57.43    65.32
10250     Hanar   1996-07-06  47       38
10251     Victe   1996-07-07  51.51    67.32
10252     suprd   1996-07-08  54.23    18
10253     Hanar   1996-07-08  22.55    65.32
10254     Chops   1996-07-09  113.27   15
10255     Ricsu   1996-07-10  11.36    65.32
10256     Welli   1996-07-11  82.83    40
10257     hilaa   1996-07-12  174.32   65.32

Fig.7 A sample database of a shipping company

The table above represents the sample database of a shipping company. We have considered only a few attributes out of 20 attributes. The Freight and Unit price attributes are considered sensitive attributes for computation purposes.

Order_id  Custid  Orderdate   Freight  Unit price
10248     Vinet   1996-07-04  11.06    17
10249     TOMSP   1996-07-05  58.43    66.32
10250     Hanar   1996-07-06  46       39
10251     Victe   1996-07-07  52.51    66.32
10252     suprd   1996-07-08  53.23    17
10253     Hanar   1996-07-08  23.55    66.32
10254     Chops   1996-07-09  112.27   14
10255     Ricsu   1996-07-10  12.36    66.32
10256     Welli   1996-07-11  81.83    39
10257     hilaa   1996-07-12  175.32   66.32

Fig 8 Modified database after performing additive perturbation on Freight and Unit price attributes.

The table in Fig. 8 shows the sensitive attribute values after adding a small amount of noise to the values of the sensitive attributes, i.e. Freight and Unit price. The table in Fig. 9 shows the values of the sensitive attributes after applying a rotation with an angle of 2 degrees.

Order_id  Custid  Orderdate   Freight   Unit price
10248     Vinet   1996-07-04  19.30302  12.39657
10249     TOMSP   1996-07-05  64.91925  56.78988
10250     Hanar   1996-07-06  41.28963  48.01015
10251     Victe   1996-07-07  65.68665  50.62129
10252     suprd   1996-07-08  19.30302  55.02925
10253     Hanar   1996-07-08  19.61949  65.7913
10254     Chops   1996-07-09  113.0231  16.30484
10255     Ricsu   1996-07-10  9.4981    64.91925
10256     Welli   1996-07-11  82.21134  41.28963
10257     hilaa   1996-07-12  175.1083  65.68665

Fig.9 The database table after executing rotation with an angle of 2 degrees on Freight and Unit price attribute.
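The two processing steps of the framework, additive perturbation by the data owner followed by rotation by the data miner, can be sketched in Python as below. This is only an illustration under stated assumptions: the Freight and Unit price values are the ten rows of Fig. 7, the noise is Gaussian with zero mean and an assumed standard deviation of 1.0, the rotation is the clockwise transformation of Fig. 3 with an angle of 2 degrees, and the function names are hypothetical. Because the noise is random, the output is not expected to reproduce the values of Fig. 8 or Fig. 9.

```python
import math
import random


def additive_perturbation(values, sigma, rng):
    """Step 2 (data owner): add zero-mean Gaussian noise to each value."""
    return [v + rng.gauss(0.0, sigma) for v in values]


def rotate_clockwise(xs, ys, theta_degrees):
    """Step 3 (data miner): rotate each (x, y) pair clockwise by theta."""
    theta = math.radians(theta_degrees)
    c, s = math.cos(theta), math.sin(theta)
    new_xs = [c * x + s * y for x, y in zip(xs, ys)]
    new_ys = [-s * x + c * y for x, y in zip(xs, ys)]
    return new_xs, new_ys


# Sensitive attributes of the ten sample rows in Fig. 7.
freight = [12.06, 57.43, 47.0, 51.51, 54.23, 22.55, 113.27, 11.36, 82.83, 174.32]
unit_price = [17.0, 65.32, 38.0, 67.32, 18.0, 65.32, 15.0, 65.32, 40.0, 65.32]

rng = random.Random(0)  # fixed seed so that runs are repeatable
sigma = 1.0             # assumed noise level, not taken from the paper

freight_p = additive_perturbation(freight, sigma, rng)
price_p = additive_perturbation(unit_price, sigma, rng)
freight_r, price_r = rotate_clockwise(freight_p, price_p, 2.0)

print("Freight, Unit price: original, perturbed, rotated")
for row in zip(freight, unit_price, freight_p, price_p, freight_r, price_r):
    print("  ".join(f"{v:9.3f}" for v in row))
```

A party that receives only the rotated columns sees neither the original nor the merely perturbed values, which is the additional protection the framework aims for.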

In this experimental setup we have considered Freight and Unit price as sensitive attributes. The results in Fig. 10 and Fig. 11 show the original values, the perturbed values and the values rotated by 2 degrees for the Freight and unit_price attributes.

Fig 10. The graph shows only a partial representation of the values out of 800 records. It shows the difference between the original values, the values after additive perturbation and the values after the rotation technique for the Freight attribute.

Fig 11. The graph shows the initial values, the values after additive perturbation and the values after rotation with an angle of 2 degrees for the unit_price attribute.

5 CONCLUSIONS

The accumulation of massive data sets and the rapid development of the Internet have expanded the opportunities for organizations to collect, store and share data and use it for analysis. The ever increasing ability to identify and collect large amounts of data, analyze it using data mining and make decisions based on the results offers prospective benefits to organizations. Popular data mining tools are used to extract novel features from the collected data and can be used in various domains, offering enormous opportunities for statistical analysis, advancement and understanding of social and health problems, and benefits to society. But the explosion of digitized databases containing financial and health care records with sensitive information leads to fear for the privacy of personal data once the data mining results are revealed. Hence the challenge is how to release the maximal amount of information without disclosing individually identifiable information. Privacy preserving data mining offers a number of techniques to perform data mining tasks in a privacy-preserving way. As confidential information can be extracted from the perturbed data by using various techniques, there is an increased need to discover and distribute databases without compromising the privacy of the individual's data. In this paper we proposed an architecture which integrates the perturbation technique and the rotation technique to protect the sensitive data. Privacy of sensitive records is achieved as the original data is replaced by other values in the results. Privacy is further enhanced because the data rotation technique provides confidentiality protection by modifying a fraction of the records in the database, applying noise in the form of an angle to the sensitive attributes.
