
Intrusion Detection System using PCA and Fuzzy PCA Techniques

Amal Hadri, LASTID Laboratory, Faculty of Science, Ibn Tofail University, Kenitra, Morocco (amal.hadri2009@gmail.com)
Khalid Chougdali, GREST Research Group, National School of Applied Sciences (ENSA), Kenitra, Morocco (chougdali@gmail.com)
Rajae Touahni, LASTID Laboratory, Faculty of Science, Ibn Tofail University, Kenitra, Morocco (rtouahni@hotmail.com)

Abstract— One of the most common problems in the field of network intrusion detection systems is the tremendous number of redundant and irrelevant features used to build an intrusion detection system. In order to overcome this problem, we have used and compared two dimensionality reduction methods, namely PCA and Fuzzy PCA, which allow us to keep just the most relevant information from the network traffic data. Then, we have applied the K nearest neighbour algorithm in order to classify the test samples of connections into a normal or attack category. The experiments were conducted using the KDDcup99 dataset. The results obtained reveal that the Fuzzy PCA method outperforms PCA in detecting U2R and DoS (Denial of Service) attacks.

Keywords-- Dimension reduction; PCA; Fuzzy PCA; Network Security; Intrusion Detection System (IDS)

I. INTRODUCTION

Nowadays, there are several existing mechanisms and computer techniques to improve the robustness and security of computer networks. Among these techniques, we find the IDS (Intrusion Detection System), which can be used to detect abnormal or suspicious activities, as well as targeted attacks. In other words, these mechanisms can detect any attempt to violate the security policy. Generally, intrusion detection systems are classified as either misuse-based (signature-based) or anomaly-based. The misuse-based approach identifies abnormal or suspicious behavior by comparing it to known attack signatures. For this purpose, a dataset of attack signatures is required; this approach provides good detection of well-known attacks. However, it cannot detect new or unfamiliar intrusions. On the other hand, the anomaly-based approach, introduced by Anderson [1] and Denning [2], attempts to determine the "normal" model of behavior and generates an alarm if the variation between a given observation and the normal behavior surpasses a defined threshold.

Our purpose is to make anomaly-based IDS more efficient by using techniques that reduce the high dimensional data obtained from network traffic before applying any anomaly-based algorithm.

To address the problem of high dimensionality, a common approach is to identify the most relevant features associated with the connection records without unduly compromising the quality of the classification. The most commonly used method, which has proven to be efficient in many application areas [3], [4][5][6], is Principal Component Analysis (PCA), which defines the "eigenvectors" (or principal components, PCs) of the covariance matrix of the connection records distribution [7]. These eigenvectors can be considered as a set of features which capture the variation between the connection records. Each connection can then be represented by the eigenvectors corresponding to the largest eigenvalues, which account for most of the variance within the set of connection records.

Unfortunately, PCA, like any other multivariate statistical method, is sensitive to outliers, missing data, and poor linear correlation between variables due to poorly distributed variables. As a result, data transformations have a large impact upon PCA [8].

To alleviate the drawbacks of PCA, one of the most illuminating methods is FPCA (Fuzzy Principal Component Analysis) [9], [10]. The main goal of this technique is to fuzzify the input data to reduce the influence of outliers, by using the Fuzzy C-Means algorithm, and then to reformulate PCA into FPCA.

This paper is organized as follows: the second section concisely describes the two dimensionality reduction techniques, PCA and FPCA. The third section presents the approach of our system. The experimental methodology and results are presented and discussed in the fourth section, and finally the fifth section summarizes the main points of the paper.

978-1-5090-6227-0/16/$31.00 ©2016 IEEE

II. PCA AND FUZZY PCA

This section presents the theoretical concepts of the two techniques, PCA and FPCA.

A. PCA

Principal Component Analysis (PCA) [11] is a common statistical technique for data analysis and preprocessing that has been widely applied in many fields of research. PCA transforms the data into a reduced form while keeping most of the original variance present in the initial data. In mathematical terms, PCA transforms n correlated variables into d (d << n) uncorrelated variables called the principal components (PCs) [5][12].

Let us consider a data set of M connection vectors v1, v2, v3, …, vM, where each connection vector is represented by n features. The following steps are used to calculate the PCs:

Step 1: Calculate the average μ of this data set:

    μ = (1/M) Σ_{i=1}^{M} v_i                                    (1)

Step 2: The deviation from the average is defined as:

    θ_i = v_i − μ                                                (2)

Step 3: The sample covariance matrix of the data set is defined as:

    C_{n×n} = (1/M) Σ_{i=1}^{M} θ_i θ_i^T = (1/M) A A^T          (3)

where A = [θ_1, θ_2, θ_3, …, θ_M].

Step 4: Let U_k be the k-th eigenvector of C, λ_k the associated eigenvalue, and U_{n×d} = [U_1 U_2 … U_d] the matrix of these eigenvectors. Then we have:

    C U_k = λ_k U_k                                              (4)

Step 5: Order the eigenvalues in decreasing order and select the eigenvectors (also called principal components PC_i) having the largest eigenvalues. The number of principal components depends on the inertia ratio:

    τ = (Σ_{i=1}^{d} λ_i) / (Σ_{i=1}^{n} λ_i)                    (5)

This ratio defines the rate of information kept from the whole raw input data by the corresponding d eigenvalues.

Step 6: Let t be a new sample column vector; t is projected onto the new subspace spanned by the PC_i according to the following rule:

    y_i = U_i^T t                                                (6)
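As a concrete illustration, Steps 1 through 6 can be sketched with NumPy's eigendecomposition. This is a minimal sketch, not the implementation used in the paper; centering the test sample by the training mean in the projection step is common practice that Eq. (6) leaves implicit.

```python
import numpy as np

def pca_fit(X, d):
    """Steps 1-5: mean, deviations, covariance, eigendecomposition, selection."""
    M = X.shape[0]
    mu = X.mean(axis=0)                # Step 1: average of the data set, Eq. (1)
    Theta = X - mu                     # Step 2: deviations from the average, Eq. (2)
    C = (Theta.T @ Theta) / M          # Step 3: sample covariance matrix, Eq. (3)
    lam, U = np.linalg.eigh(C)         # Step 4: C U_k = lambda_k U_k, Eq. (4)
    order = np.argsort(lam)[::-1]      # Step 5: sort eigenvalues in decreasing order
    lam, U = lam[order], U[:, order]
    tau = lam[:d].sum() / lam.sum()    # inertia ratio of Eq. (5)
    return mu, U[:, :d], tau

def pca_project(t, mu, U):
    """Step 6: project a sample onto the retained components, Eq. (6)."""
    return U.T @ (t - mu)              # centering by mu is assumed here
```

Choosing d so that τ stays close to 1 preserves most of the variance of the raw connection vectors while discarding the redundant coordinates.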
B. FUZZY PCA

The main idea behind this method is to fuzzify the input data to obtain a fuzzy membership for each sample, and then to reformulate PCA into FPCA [9][13][14].

Suppose a data set of M connection vectors v1, v2, v3, …, vM, where each connection vector is represented by n features.

1) The first step is to fuzzify the data set and get memberships and centroids, by applying the FCM algorithm to the input data set to obtain the membership matrix R and the centroids V.
2) Then we calculate the fuzzy covariance matrix:

    C_fpca = (1/M) Σ_{i=1}^{M} V_i V_i^T                         (7)

3) Set the number of eigenvectors and calculate the eigenvectors U and eigenvalues λ with Eq. (4).
4) Project the data onto the eigenvectors U with Eq. (6).
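The fuzzification step (item 1) and a fuzzy covariance in the spirit of Eq. (7) can be sketched as follows. This is a minimal illustration under assumptions the paper leaves open: a standard Fuzzy C-Means update with fuzzifier m = 2, and a fuzzy covariance that weights each centered sample by its largest cluster membership; the authors' exact definition of V_i in Eq. (7) may differ.

```python
import numpy as np

def fcm(X, c, m=2.0, iters=50, seed=0):
    """Standard Fuzzy C-Means: returns membership matrix R (M x c, rows sum
    to one) and centroids V (c x n), by alternating the usual updates."""
    rng = np.random.default_rng(seed)
    R = rng.random((X.shape[0], c))
    R /= R.sum(axis=1, keepdims=True)
    for _ in range(iters):
        W = R ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]          # centroid update
        D = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
        inv = D ** (-2.0 / (m - 1.0))                   # membership update
        R = inv / inv.sum(axis=1, keepdims=True)
    return R, V

def fuzzy_covariance(X, R, m=2.0):
    """Assumed reading of Eq. (7): covariance of centered samples, each
    weighted by its largest fuzzy membership raised to the fuzzifier m."""
    w = R.max(axis=1) ** m
    Theta = X - X.mean(axis=0)
    return (Theta.T * w) @ Theta / w.sum()
```

The eigenvectors of this fuzzy covariance matrix then replace those of Eq. (3) in Steps 4 to 6, which is what items 3) and 4) above describe.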
III. PROPOSED APPROACH OF OUR SYSTEM

The principal aim of this work is to build an efficient IDS and to improve its performance. The following steps are used in our approach:

Step 1: Dataset

The dataset used in our experiments is KDDcup99 [15], which has been widely used in many different research fields [16], and especially in the intrusion detection field, to build predictive models capable of differentiating between legitimate (normal) and illegitimate (intrusion or attack) connections in a computer network. The training dataset contains about 5,000,000 connection records, and the 10% training dataset consists of 494,021 records, which include 97,278 normal connections (i.e. 19.69%). Each connection record contains 41 different attributes that describe the features of the corresponding connection, and each record is labeled either as an attack, with one specific attack type, or as normal.

Each attack type falls into four main categories:

- DOS: denial-of-service, e.g. syn flood;
- R2L: unauthorized access from a remote machine, e.g. guessing a password;
- U2R: unauthorized access to local superuser (root) privileges, e.g. various "buffer overflow" attacks;
- Probing: surveillance and other probing, e.g. port scanning.

The test dataset consists of 311,029 connections, and it includes some particular attack types that do not exist in the training dataset. The datasets contain a total of 24 training attack types, with an additional 14 types in the test data only.

In this paper we work with the 10% dataset.

Step 2: Data preprocessing

The main goal of this step is to obtain a standard format for the attributes before applying any dimensionality reduction. For that, we have converted the discrete attribute values of the dataset into continuous values following the idea used in [3]. Suppose we have m possible values for a discrete attribute i. To each discrete attribute we assign m coordinates, one coordinate for every possible value of the attribute. The coordinate corresponding to the observed attribute value takes the value 1 and the remaining coordinates take the value 0. As an illustration, consider the protocol type attribute, which can take one of the discrete values tcp, udp or icmp. Following the idea presented above, there will be 3 coordinates for this attribute. As a result, supposing that a connection record has tcp (resp. udp or icmp) as its protocol type, the corresponding coordinates will be (1,0,0) (resp. (0,1,0) or (0,0,1)). With this conversion, each connection record in the datasets is represented by 128 coordinates (3 different values for the protocol_type attribute, 11 different values for the flag attribute, 70 possible values for the service attribute, and 0 or 1 for the remaining 6 discrete attributes) in place of 41 attributes.
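The conversion described above can be sketched as a small helper. This is an illustration of the encoding, not the authors' code; only the three protocol_type values named in the text are shown.

```python
def one_hot(value, categories):
    """Expand one discrete attribute with m possible values into m
    coordinates: 1 for the observed value, 0 for the remaining ones."""
    return [1.0 if value == cat else 0.0 for cat in categories]

# protocol_type takes one of three discrete values, so it becomes 3 coordinates
PROTOCOLS = ["tcp", "udp", "icmp"]
print(one_hot("udp", PROTOCOLS))    # a udp connection maps to (0, 1, 0)
```

Applying the same expansion to flag (11 values) and service (70 values), and keeping the remaining attributes as they are, yields the 128-coordinate representation described above.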

Step 3: Dimensionality features reduction

In this stage we use the two dimensionality reduction techniques, PCA and FPCA, to reduce the high dimensionality of the data (for both the training and testing data) while keeping the maximum variance present in the original dataset.

Step 4: Classification

Finally, the KNN [17] classifier is used to check whether the sample test network connections are normal or abnormal.

IV. EXPERIMENTAL METHODOLOGY AND RESULTS

This section presents the different experiments and the results obtained when we implemented the dimensionality reduction techniques PCA and FPCA presented above.

As a training sample, we have used 1900 normal connections, 900 DOS, 900 Probing, 900 R2L, and 52 U2R connections randomly selected from the 10% training dataset (KDDcup99). As a testing sample, we have randomly selected from the test dataset 900 normal connections, 900 DOS, 900 Probing, 900 R2L, and 52 U2R.

Denote the intrusions successfully classified as TP (true positives), the normal connections correctly predicted as TN (true negatives), the normal connections wrongly classified as FP (false positives), and the intrusions wrongly classified as FN (false negatives).

To evaluate the performance of the system, we use four measures: detection rate (also called recall) DR, false positive rate FPR, Precision, and F-measure. To get more realistic results, we have calculated the average of these measures using 10-fold cross validation:

    DR = TP / (TP + FN) × 100                                    (8)

    FPR = FP / (FP + TN) × 100                                   (9)

    Precision = TP / (TP + FP) × 100                             (10)

    F-measure = 2×TP / (2×TP + FP + FN) × 100                    (11)
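Eqs. (8) to (11) translate directly into code; a small helper for the four measures, with the counts TP, TN, FP, FN as defined above:

```python
def ids_metrics(tp, tn, fp, fn):
    """Detection rate, false positive rate, precision and F-measure,
    expressed as percentages as in Eqs. (8)-(11)."""
    dr = 100.0 * tp / (tp + fn)                       # Eq. (8): detection rate (recall)
    fpr = 100.0 * fp / (fp + tn)                      # Eq. (9): false positive rate
    precision = 100.0 * tp / (tp + fp)                # Eq. (10)
    f_measure = 100.0 * 2 * tp / (2 * tp + fp + fn)   # Eq. (11)
    return dr, fpr, precision, f_measure

# e.g. 80 intrusions caught, 20 missed, 10 false alarms among 100 normals
dr, fpr, precision, f1 = ids_metrics(tp=80, tn=90, fp=10, fn=20)
```

Averaging these four values over the 10 folds of the cross validation gives the figures reported below.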
A powerful IDS should have a high DR, Precision and F-measure and a low FPR. To begin, we conducted two experiments in order to determine the best parameters, i.e. those which give the maximum detection rate (and likewise the maximum F-measure and Precision). In the first one, we fixed the number of principal components at two and varied the number of nearest neighbors widely. As shown in Fig.1, k = 2 nearest neighbors gives the optimal results, with the maximum detection rate, Precision and F-measure and the minimum FPR.

Fig.1: Precision/F-measure/DR/FPR vs. number of nearest neighbors (k)

In our second experiment, we fixed the number of nearest neighbors at two and varied the number of principal components. The main objective of this experiment is to find the best number of principal components (PCs), which can considerably enhance the detection rate (DR). As shown in Fig.2, the first and second principal components give the optimal results.

In our third experiment, we evaluated the efficiency of Fuzzy PCA in the intrusion detection field. For that, we fixed the number of principal components and the number of nearest neighbors at two and sought the degree of membership M which gives the best results. As illustrated in Fig.3, M = 9 gives the best results.

In accordance with the two experiments mentioned above, we fixed the number of nearest neighbors and the number of PCs at their optimal values in order to compute the detection rate for every type of attack.

In the next experiments, we compare the two methods, PCA and Fuzzy PCA. As shown in Fig.4 and Fig.5, FPCA outperforms PCA on the first and second principal components in detecting attacks. However, PCA gives a lower FPR than FPCA.
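The classification stage (Step 4) with the optimal parameters found above (two principal components, k = 2) can be sketched as a plain nearest-neighbour vote. This is a self-contained illustration assuming Euclidean distance and majority voting, with 0 = normal and 1 = attack as example labels, not the authors' implementation:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=2):
    """Label each test sample by a majority vote among its k nearest
    training samples (Euclidean distance in the reduced PC space)."""
    preds = []
    for t in X_test:
        dist = np.linalg.norm(X_train - t, axis=1)   # distance to every training sample
        nearest = y_train[np.argsort(dist)[:k]]      # labels of the k closest
        labels, counts = np.unique(nearest, return_counts=True)
        preds.append(labels[np.argmax(counts)])      # majority vote
    return np.array(preds)

# toy projected connections: two normal (label 0) and two attack (label 1)
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
y_train = np.array([0, 0, 1, 1])
pred = knn_predict(X_train, y_train, np.array([[0.2, 0.1], [9.8, 10.2]]), k=2)
```

In the full pipeline the inputs to `knn_predict` would be the training and test connections projected onto the two retained principal components of PCA or FPCA.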

Fig.2: Precision/F-measure/DR/FPR vs. number of principal components (PC)


Fig.3: Precision/F-measure/DR/FPR vs. degree M

To get more realistic results, we compared the detection rates of every type of attack for PCA and FPCA, as shown in Table II. The detection rates of FPCA for the DOS and U2R attacks are globally the best compared to those of PCA.

TABLE II. Attacks Detection Rate (DR, %) of PCA and FPCA

Method   DOS       U2R       R2L      Probing
PCA      69.5556   7.6923    4.5556   93.1111
FPCA     73.1111   13.4615   4.1111   91.3333

Finally, we retain that on the KDDcup99 dataset FPCA outperforms PCA when we vary the number of principal components from one to ten. Unfortunately, in comparison with PCA, the FPCA method has a worse false alarm rate, even if it has a better detection rate.
Fig.4: Precision/F-measure/DR/FPR vs. PCs of PCA and FPCA methods

Fig.5: Precision/F-measure/DR/FPR vs. K nearest Neighbors of PCA and FPCA methods


V. CONCLUSION

The principal purpose of the approach presented in this paper is to reduce the huge number of input features of the connection records used for intrusion detection, keeping only the relevant and meaningful information, by applying the PCA and FPCA algorithms presented above, in order to increase the detection rate and minimize the false alarms and thus build a robust IDS. The results obtained reveal that Fuzzy PCA outperforms PCA in detecting U2R and DoS (Denial of Service) attacks. In our future work we will implement new algorithms to enhance the performance of the IDS using more recent datasets.

VI. REFERENCES

[1] J. P. Anderson, "Computer Security Threat Monitoring and Surveillance," James P. Anderson Co., 1980.
[2] D. E. Denning, "An Intrusion-Detection Model," IEEE Trans. Softw. Eng., vol. SE-13, no. 2, pp. 222–232, 1987.
[3] Y. Bouzida and N. Cuppens-Boulahia, "Efficient Intrusion Detection Using Principal Component Analysis," in 3ème Conférence sur la Sécurité et Architectures Réseaux (SAR), La Londe, France, June 2004.
[4] S. Joseph, "Feature Reduction using Principal Component Analysis for Effective Anomaly-Based Intrusion Detection on NSL-KDD," vol. 2, no. 6, pp. 1790–1799, 2010.
[5] M. Kirby and L. Sirovich, "Application of the Karhunen-Loève Procedure for the Characterization of Human Faces," IEEE Trans. Pattern Anal. Mach. Intell., vol. 12, no. 1, pp. 103–108, 1990.
[6] L. Mechtri, F. D. Tolba, and N. Ghoualmi, "Intrusion Detection Using Principal Component Analysis," in Engineering Systems Management and Its Applications (ICESMA), 2010, pp. 1–6.
[7] I. T. Jolliffe, Principal Component Analysis, 2nd ed. Springer-Verlag, 2002.
[8] M. Hubert, P. J. Rousseeuw, and S. Verboven, "A fast method for robust principal components with applications to chemometrics," Chemometrics and Intelligent Laboratory Systems, vol. 60, pp. 101–111, 2002.
[9] W. Xiaohong and Z. Jianjiang, "Fuzzy Principal Component Analysis," vol. 24, no. 6, 2007.
[10] H. F. Pop, "Principal Components Analysis Based on a Fuzzy Sets Approach," 2016.
[11] M. Ringnér, "What is principal component analysis?," Nature Biotechnology, vol. 26, no. 3, pp. 303–304, 2008.
[12] J. Shlens, "A Tutorial on Principal Component Analysis," 2014.
[13] S. Z. Xu, "Gait Recognition using Fuzzy Principal Component Analysis," in 2nd International Conference on e-Business and Information System Security (EBISS), 2010, pp. 10–13.
[14] C. Sârbu and H. F. Pop, "Principal component analysis versus fuzzy principal component analysis. A case study: the quality of Danube water (1985–1996)," Talanta, vol. 65, pp. 1215–1220, 2005.
[15] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A Detailed Analysis of the KDD CUP 99 Data Set," in Proc. IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA), 2009, pp. 1–6.
[16] "KDDcup99." [Online]. Available: http://kdd.ics.uci.edu/databases/kddcup99/task.html
[17] T. M. Cover and P. E. Hart, "Nearest Neighbor Pattern Classification," IEEE Trans. Inf. Theory, vol. IT-13, no. 1, pp. 21–27, 1967.
