On
Insider Collusion Attack on Privacy-Preserving Kernel-Based Data
Mining Systems
Bachelor of Engineering
in
Computer Science & Engineering
Submitted by
Sneha Shinde
Vaishali Singan
Shruti Ghotkar
Vaishali Jikar
Guided by
Prof. T. M. Belle
2016-17
I hereby declare that the report titled "Insider Collusion Attack on Privacy-Preserving Kernel-Based Data Mining Systems" submitted herein has been carried out by me in the Department of Computer Science & Engineering of Shri Shankarprasad Agnihotri College of Engineering, Wardha. The work is original and has not been submitted earlier as a whole or in part for the award of any degree / diploma at this or any other Institution / University.
I also hereby assign to Shri Shankarprasad Agnihotri College of Engineering, Wardha all rights under copyright that may exist in and to the above work and any revised or expanded derivative works based on the work as mentioned. Other work copied from references, manuals etc. is disclaimed.
Sneha Shinde
Vaishali Singan
Shruti Ghotkar
Vaishali Jikar
Date :
ACKNOWLEDGEMENT
I have immense pleasure in expressing my gratitude towards my seminar guide, Prof. T. M. Belle, for her valuable guidance and inspiration. She continuously supervised my work with the utmost care.
I would also like to thank Prof. Bawane for always being supportive, encouraging and extremely helpful.
I also express my sincere thanks to Prof. D. B. Dandekar, Head of Department, CSE, SSPACE, Wardha.
INDEX
Abstract
1. Introduction
3. Methodology
3.3 Algorithm
4. Tools/Platforms
5. Design/Implementation
6. Future Scope
7. Snapshots
8. References
Figure Index
ABSTRACT
In this paper, we consider a new insider threat for the privacy-preserving work of distributed kernel-based data mining (DKBDM), such as the distributed support vector machine. Among several known data breaching problems, those associated with insider attacks have been rising significantly, making this one of the fastest growing types of security breaches. Once considered a negligible concern, insider attacks have risen to be one of the top three central data violations. Insider-related research involving the distribution of kernel-based data mining is limited, resulting in substantial vulnerabilities in designing protection against collaborative organizations. Prior works often fall short by addressing a multifactorial model that is more limited in scope and implementation than addressing insiders within an organization colluding with outsiders. A faulty system allows collusion to go unnoticed when an insider shares data with an outsider, who can then recover the original data from message transmissions (intermediary kernel values) among organizations. This attack requires only accessibility to a few data entries within the organizations rather than requiring the encrypted administrative privileges typically found in the distribution of data mining scenarios. To the best of our knowledge, we are the first to explore this new insider threat in DKBDM. We also analytically demonstrate the minimum amount of insider data necessary to launch the insider attack. Finally, we follow up by introducing several proposed privacy-preserving schemes to counter the described attack. Data leakage happens every day when confidential business information such as customer or patient data, company secrets and budget information is leaked. When this information is leaked, companies are at serious risk. Most often, data is leaked from the agents' side, so a company has to be very careful when distributing such data to its agents.
Keywords : Data hiding, data leakage, fake object, insider attack, kernel, Privacy preserving data mining.
Chapter 1
INTRODUCTION
Data-breaching problems related to insider attacks are one of the fastest growing attack types. According to the 2015 Verizon Data Breach Investigations Report, attacks from insider misuse have risen significantly, from 8% in 2013 to 20.6% in 2015. This near-triple rate of increase is astonishing when one considers that this rise has taken place over a span of only two years. As a result of this rapid increase, insider attacks are now among the top three types of data breaches.
Insider attacks arise not from system security errors but from staff inside the company's enterprise data security circles. A company can deploy solutions to protect its perimeter yet still find it difficult to prevent an insider attack. Many data mining applications store huge amounts of personal information; therefore, extensive research has primarily focused on dealing with potential privacy breaches. One prime area of research in preserving privacy is the Support Vector Machine (SVM). SVM is a very popular data mining methodology used mainly with the kernel trick to map data into a higher dimensional feature space and achieve better mining precision.
With privacy protection in mind, Vaidya et al. provided a state-of-the-art privacy-preserving distributed SVM scheme to securely merge kernels. Their proposal encoded and hid the kernel values in a noisy mixture during transmission such that the original data cannot be recovered even if these distributed organizations colluded. To the best of our knowledge, no prior work has considered a robust pragmatic model in which "insiders within organizations" collude with outsiders. Such a pragmatic model considers the insider as the key player in sharing data with an attacker, who can then recover the original data from the intermediary kernel values of the SVM model. This attack is more realistic because the attacker needs only to obtain a few data entries rather than the entire database from an organization to successfully recover the rest of the private data.
Fig. 1.1 The Attacking Scenario: An insider within the hospital helps the outside attackers (the SVM server) to launch the attack.
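To make the term "intermediary kernel values" concrete, the following is a minimal sketch of how a kernel matrix is built from a set of records. The records, the choice of a linear and an RBF kernel, the `gamma` value and all function names are assumptions made for illustration; they are not taken from the system described above.

```python
import math


def linear_kernel(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an illustrative choice."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def kernel_matrix(records, kernel):
    """Return K with K[i][j] = kernel(records[i], records[j])."""
    return [[kernel(xi, xj) for xj in records] for xi in records]


# Made-up records standing in for private data held by an organization.
records = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]

K_linear = kernel_matrix(records, linear_kernel)
K_rbf = kernel_matrix(records, rbf_kernel)
```

Each entry of the matrix depends on exactly two records, and the matrix is symmetric. This is the property the attack relies on: a kernel value built from one known record and one unknown record carries information about the unknown one.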
Chapter 2
Description: A novel method is proposed for preserving privacy in mining big databases. The goal is to mine the data while preserving data privacy and confidentiality.
Implementation :
Description: The detection and conception of strategies to solve malicious insider threats.
Implementation :
Fig. 2.2 Opportunities for prevention, detection and response for an insider attack
2.3 Title: Data Security in Big Data
Author: Er. Jyoti Arora and Misha Gupta
With the size of data increasing by 40% globally every year, data mining and its consequences have both become important factors of research. Tech giants such as Google and Facebook look for patterns in data to enhance the user experience. Data mining for user experience enhancement has also had negative outcomes: people have reported leakage of their private information and content. Privacy control, along with the advancement of data mining models, has therefore become an important factor of research. In our study of the topic, we decided to contribute to this research because the issue is very important and touches millions of lives. After researching the topic and studying various factors, we found that there is a need for an Internet model that prevents the exposure of users' private data and sensitive information at all user levels, i.e. data provider, data collector, data miner and decision maker. On the basis of this problem, we decided on the following objectives. The data mining model designed will provide extensions for preventing exposure at every user level. The extensions provided will cover data such as:
Phone numbers
Personal shopping details
Address
Family details
The above are the privacy fields. We will design a system that restricts access to this sensitive information, with an enable feature for all four types of users. While the user's end is providing personal information or browsing content, we set a field of prevention. This ensures that information in the private fields is not shared anywhere except with the user and the service provider. Privacy levels vary from person to person, and this should be reflected in the research, so in our model every user can set his own privacy level. He can prevent any of his information from being shared or dragged, or he can allow all of his information to be shared. This is implemented for all four user types to provide extra robustness to the security. This Internet model will be used on a local university connection and in friend zones to survey the success rate of this approach on the Internet. With the help of live information, we will improve our system and set the future scope of the research.
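The per-user privacy levels described above could be sketched roughly as follows. This is only an illustration: the field names, the user-level names, the `sharing_choices` structure and the default of hiding every sensitive field unless the user allows it are all assumptions, not details given in the surveyed work.

```python
# Hypothetical names for the four privacy fields listed above.
SENSITIVE_FIELDS = {"phone_number", "shopping_details", "address", "family_details"}

# Hypothetical names for the four user levels listed above.
USER_LEVELS = {"data_provider", "data_collector", "data_miner", "decision_maker"}


def visible_fields(record, sharing_choices, user_level):
    """Return the part of `record` that `user_level` may see.

    `sharing_choices` maps a user level to the set of sensitive fields
    the record owner has chosen to share with that level. Sensitive
    fields that are not listed stay hidden (assumed default).
    """
    if user_level not in USER_LEVELS:
        raise ValueError(f"unknown user level: {user_level}")
    allowed = sharing_choices.get(user_level, set())
    return {
        name: value
        for name, value in record.items()
        if name not in SENSITIVE_FIELDS or name in allowed
    }


# Made-up record and privacy choices for one user.
record = {
    "name": "A. User",
    "phone_number": "000-0000",
    "address": "Example Street 1",
    "favorite_genre": "jazz",
}
sharing_choices = {"data_collector": {"address"}}

collector_view = visible_fields(record, sharing_choices, "data_collector")
miner_view = visible_fields(record, sharing_choices, "data_miner")
```

Here the data collector sees the address because the user allowed it, while the data miner sees neither the address nor the phone number.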
IMPLEMENTATION:
In this step, the malicious outsider collects as many kernel values as possible from the victim system. The server is able to collect all the private kernel matrices directly by following procedure 2 of the system. There are many more ways to achieve the goal of collecting kernel values; we will not address them all in this paper. Another example is illustrated by Navia-Vázquez's distributed semiparametric support vector machine (DSSVM) [18], in which kernels are used as basic units to reduce the amount of information required for communication. In [18], every client needs to send the pair $\{R_m, r_m\}$ to the other clients for distributed learning, where $R_m$ and $r_m$ are two intermediate matrices consisting of kernel values to reduce the communication load. Note that $r_m = K_m^T W_m y_m$, where $K_m$ is the kernel matrix of client $m$, $W_m$ is the client's part of the entire weighting matrix $W$, and $y_m$ is the client's part of the whole y-value matrix $y$. The two parameters $W$ and $y$ are public to all clients. Thus, the client receiving $\{R_m, r_m\}$ could invert $r_m$ to obtain the kernel matrix $K_m$ based on the two public parameters $W$ and $y$. If one of the clients is an attacker, he can obtain all the values of the kernel matrix of the previous client.
In this step, an insider colludes (shares his own data) with one of the outsiders, and the outsider then searches his collection of kernel values for those composed from the insider's data. The idea of the outsider's search stems from the fact that kernel values are composed on the basis of the insider's data and the other (non-insider) data, as shown in the equation. To search efficiently, the following principles are considered:
Principle 1: Because there is a symmetrical property in the kernel matrix, consider only vertical and horizontal kernel lines.
Principle 2: Merge the kernel lines for the same axis of the index, because they all represent the same index.
Principle 3: Remove the kernel lines representing the indices of the other insiders' data.
In step 2, the outsider learned which kernel values were composed from which insider's data. In this step, the outsider recovers the remaining private data inside these kernels, each of which is composed of one insider's data and one private data. For example, in Fig. 8 (b), all the elements $K_{i3}$, $i \neq 3$, which are $K_{13}$, $K_{23}$, and $K_{43}$ on the orange kernel line, are composed of one insider data (insider-data A, whose index is 3) and one unknown private data (the data whose index is $i$). The question is: how can we retrieve the unknown data from the kernel value $K_{i3}$, $i \neq 3$? This is the main focus of attack step 3. We will continue to use the same example of Vaidya's PPSVM system [4], [5] used above to help introduce our idea. Suppose that in steps 1 and 2, the malicious outsider has successfully collected $n$ insiders' data, $S_j$, where $j = 1, \dots, n$, and has also collected the kernel values $K_{ij}$, each composed of an insider's data $S_j$ and a non-insider's data $D_i$. Assume that in total there are $p$ non-insiders' data values $D_i$, where $i = 1, \dots, p$. $S_j$ and $D_i$ are all vectors with $m$ elements, and $S_j(k)$ is the $k$-th element of $S_j$, where $k = 1, \dots, m$. This is also true for $D_i(k)$. The goal of attack step 3 is to deduce all the non-insiders' data, $D_1, \dots, D_p$. First, the outsider uses the following method to deduce one non-insider's data value, $D_u$, and then proceeds to deduce all the other unknown non-insiders' data values, one by one, in the same way.
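The recovery method itself is not reproduced in this section, so the following is only an illustrative sketch under strong assumptions: the kernel is taken to be linear, so that $K_{uj} = S_j \cdot D_u$, and the outsider is assumed to hold exactly $m$ linearly independent insider vectors. In that case the $m$ kernel values form a linear system in the $m$ unknown elements of $D_u$. The function names and the data are made up for the example.

```python
from fractions import Fraction


def solve_linear_system(A, b):
    """Solve A x = b for a square matrix A by Gauss-Jordan elimination.

    Exact fractions are used so that the small demo has no rounding error.
    """
    n = len(A)
    # Augmented matrix [A | b].
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero pivot.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("insider vectors are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [v / pivot_value for v in M[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def recover_record(insiders, kernel_values):
    """Recover D_u from K_uj = S_j . D_u (linear kernel assumed)."""
    return solve_linear_system(insiders, kernel_values)


# Made-up example with m = 3 elements per record and n = 3 insiders.
insiders = [[1, 0, 2], [0, 1, 1], [1, 1, 0]]   # S_1, S_2, S_3
secret = [4, -2, 7]                            # D_u, unknown to the outsider
kernel_values = [sum(s * d for s, d in zip(row, secret)) for row in insiders]

recovered = recover_record(insiders, kernel_values)
```

Repeating the same solve for every non-insider index $u$ would recover $D_1, \dots, D_p$ one by one in this simplified setting. Other kernels would need a different recovery step.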
Algorithm 1: Kernel-and-Insider-Data-Linking Attack
Require: m × m kernel matrix KM, total m data records x1, ..., xm, and total n insiders' data s1, ..., sn
Let IndexCand = [], Index = []
for k = 1 ... n do
    Compute K1 and K2, where K1 is the kernel value of (sk, sp), p ≠ k, 1 ≤ p ≤ n, and K2 is the kernel value of (sk, sq), q ≠ k, q ≠ p, 1 ≤ q ≤ n
    Let KC1 = [], KC2 = [], l1 = 0, l2 = 0
    for i = 1 ... m do    // Search for values equal to K1 and K2 in KM
        for j = 1 ... m do
            if KM(i, j) = K1 then
                l1 = l1 + 1, KC1(l1) = (i, j)
            else if KM(i, j) = K2 then
                l2 = l2 + 1, KC2(l2) = (i, j)
            end if
        end for
    end for
    for u = 1 ... l1 do    // Apply Principles 1 & 2 to kernel lines
        for v = 1 ... l2 do
            if KC1(u)[1] ≠ KC2(v)[1] & KC1(u)[2] = KC2(v)[2] then
                if no element of the array IndexCand(k) = KC1(u)[2] then
                    Insert the element KC1(u)[2] into the array IndexCand(k)
                end if
            end if
        end for
    end for
end for
for k = 1 ... n do    // Apply Principle 3 to kernel lines
    if #element of IndexCand(k) = 1 then
        Index(k) = the element of IndexCand(k)
    end if
end for
for k = 1 ... n do
    if #element of IndexCand(k) > 1 then
        Delete all elements of IndexCand(k) that have been assigned to the other Index
        Index(k) = a randomly chosen element from the remaining elements of IndexCand(k)
    end if
end for
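Algorithm 1 can be read as the following Python sketch. It is one possible interpretation, not the authors' implementation. The assumptions are: positions matching K1 are compared against positions matching K2 (same column, different row); p and q are simply the first two insiders other than k; values are matched within a small tolerance `tol`; indices are 0-based; and the demo uses a linear kernel on made-up records.

```python
import random


def linear_kernel(x, y):
    """Inner product, used here only as an example kernel."""
    return sum(a * b for a, b in zip(x, y))


def link_insiders(KM, insiders, kernel=linear_kernel, tol=1e-9, rng=random):
    """Guess, for each insider record, its row/column index in KM."""
    m, n = len(KM), len(insiders)
    if n < 3:
        raise ValueError("this sketch needs at least three insider records")
    index_cand = []
    for k in range(n):
        # Two other insiders p and q (p != k, q != k, q != p).
        p, q = [t for t in range(n) if t != k][:2]
        K1 = kernel(insiders[k], insiders[p])
        K2 = kernel(insiders[k], insiders[q])
        KC1, KC2 = [], []
        # Search KM for entries equal to K1 or K2.
        for i in range(m):
            for j in range(m):
                if abs(KM[i][j] - K1) <= tol:
                    KC1.append((i, j))
                elif abs(KM[i][j] - K2) <= tol:
                    KC2.append((i, j))
        # Principles 1 and 2: a shared column with different rows marks
        # the kernel line of insider k.
        cands = []
        for i1, j1 in KC1:
            for i2, j2 in KC2:
                if i1 != i2 and j1 == j2 and j1 not in cands:
                    cands.append(j1)
        index_cand.append(cands)
    # Principle 3: fix unambiguous candidates first.
    index = [None] * n
    for k in range(n):
        if len(index_cand[k]) == 1:
            index[k] = index_cand[k][0]
    # Then resolve ambiguous ones among indices not yet assigned.
    for k in range(n):
        if len(index_cand[k]) > 1:
            taken = {v for v in index if v is not None}
            remaining = [c for c in index_cand[k] if c not in taken]
            if remaining:
                index[k] = rng.choice(remaining)
    return index


# Made-up records; the insiders are records 3, 0 and 4.
records = [[1, 2], [3, 5], [7, 1], [2, 11], [13, 3]]
KM = [[linear_kernel(a, b) for b in records] for a in records]
insiders = [records[3], records[0], records[4]]

found = link_insiders(KM, insiders)
```

In this demo every kernel value in `KM` is distinct, so each insider has exactly one candidate index and the random choice in the last loop is never used. With repeated kernel values the result can be ambiguous, which is what the final loop of the algorithm is meant to handle.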
Chapter 4
Tools/Platforms
This project is written in HTML. Although HTML can run on different platforms, we run it on a single platform, the server that we created. In the future we may add a programming language such as PHP.
Chapter 7
Snapshots
7.1 Step 1: Start the Visual Studio software on the system.
Chapter 8
References
[2] C. Bhatt and R. Sharma (GHRAET, India), "Data leakage detection," (IJCSIT) International Journal of Computer Science and Information Technologies, vol. 5 (2), pp. 2556-2558, 2014.
[3] P. S. Wang, S.-W. Chen, C.-H. Kuo, C.-M. Tu, and F. Lai, "An intelligent dietary planning mobile system with privacy-preserving mechanism," in Proc. IEEE Int. Conf. Consum. Electron. (ICCE), Jun. 2015, pp. 336-337.
[5] S. Hartley, Over 20 Million Attempts to Hack Into Health Database. Auckland, New Zealand: The New Zealand Herald, 2014.
[7] J. Vaidya, H. Yu, and X. Jiang, "Privacy-preserving SVM classification," Knowl. Inf. Syst., vol. 14, no. 2, pp. 161-178, Feb. 2008.
[11] M. Gönen and E. Alpaydın, "Multiple kernel learning algorithms," J. Mach. Learn. Res., vol. 12, pp. 2211-2268, Jul. 2011.
[13] K.-P. Lin and M.-S. Chen, "Privacy-preserving outsourcing support vector machines with random transformation," in Proc. ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2010, pp. 363-372.
[14] S. R. M. Oliveira and O. R. Zaïane, "Privacy preserving clustering by data transformation," J. Inf. Data Manage., vol. 1, no. 1, p. 37, 2010.