
GRD Journals | Global Research and Development Journal for Engineering | International Conference on Innovations in Engineering and Technology (ICIET) - 2016 | July 2016

e-ISSN: 2455-5703

An Efficient Extreme Learning Machine based Intrusion Detection System

1W. Sylvia Lilly Jebarani 2K. Janaki 3R. Anupriya
1AP- Senior Grade 2,3UG Scholar
1,2,3Department of Electronics and Communication Engineering
1,2,3Mepco Schlenk Engineering college, Sivakasi, India
Abstract
This paper presents an intrusion detection technique based on the online sequential extreme learning machine (OS-ELM). The
KDDCUP99 dataset is used for performance evaluation. Three feature selection techniques (filtered subset evaluation, CFS
subset evaluation and consistency subset evaluation) are used to eliminate redundant features. Two network traffic profiling
techniques are also used: alpha profiling reduces time complexity, and beta profiling removes redundant connection records and
hence reduces the size of the dataset.
Keywords- Network traffic profiling, OS-ELM
__________________________________________________________________________________________________

I. INTRODUCTION
In recent years, as network technologies have advanced, networks have come to face many threats. One of them is intrusion,
which affects networks by consuming bandwidth and other resources. Detecting intrusions is therefore an urgent need. Detection
can be done by analyzing a network traffic dataset, but a large dataset is difficult to process, so network traffic behavior
can be used for intrusion detection.
The proposed technique addresses three issues: the large size of the dataset, low accuracy and time complexity. OS-ELM
processes the network traffic dataset to detect intrusions. It is fast and accurate in classification.
Previous intrusion detection techniques use support vector machines for classification. These are unable to classify new
types of connection records for which they have not been trained. The proposed technique uses an extreme learning machine for
classification. It is trained on a training dataset, after which it learns by itself and classifies new types of connection
records.
The standard KDDCUP99 dataset is used for performance evaluation of the proposed technique. It has about 5 million
connection records.
Three feature selection techniques are used to remove redundant features, which reduce the accuracy of the classifier.
The techniques are filtered subset evaluation, CFS subset evaluation and consistency subset evaluation. Selecting appropriate
features improves the accuracy of classification.
The 10-fold cross-validation technique is used to divide the dataset into training and testing datasets. The dataset is
divided into 10 sets and 10 iterations are done. In each iteration, one set is used for testing and the other 9 for training.
Alpha profiling is a network traffic profiling technique that groups connection records which have the same protocol type
and service under a single alpha feature. This reduces the time complexity of the classifier. Other advantages of alpha
profiling are increased scalability, load balancing and the handling of unknown profiles.
Beta profiling is another network traffic profiling technique, which reduces the size of the dataset. Similar connection
records are grouped together using a clustering algorithm, namely DBSCAN, and the centres of these clusters are combined and
used as the dataset.
The online sequential extreme learning machine classifier is used for classification. It is fast and accurate compared to
previously used classifiers. This classifier detects intrusions in the network.

II. DATASET DESCRIPTION


25,000 connection records were chosen from the KDDCup99 dataset. The dataset consists of 41 features and one class label. The
class label indicates whether the record is normal or anomalous. The features are as shown in Table 1.

All rights reserved by www.grdjournals.com

(GRDJE / CONFERENCE / ICIET - 2016 / 048)

Table 1: List of features in KDDCup99 Dataset

III. METHODOLOGY
Fig. 1 shows the methodology of the proposed system. The experiment is carried out using MATLAB (version R2014a) and the Weka
data mining tool.
The blocks involved are explained below:
A. Dataset Pre-processing
The KDDCup99 dataset contains both categorical and continuous features. Since the classifier cannot process categorical
features, they must be converted so that the dataset contains only continuous features.


Fig. 1: Proposed Intrusion Detection System

B. Feature Selection
Space and time complexity can be reduced by feature selection. It is carried out using the Weka tool. After analysis with
different evaluation techniques, it was found that filtered subset evaluation, consistency subset evaluation and CFS subset
evaluation are the three feature selection techniques that provide an optimal subset of features. Reducing the number of
features by removing redundant ones also increases accuracy.
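The idea behind CFS subset evaluation can be illustrated with a short sketch. It scores a feature subset with the CFS merit k*r_cf / sqrt(k + k(k-1)*r_ff), where r_cf is the mean feature-class correlation and r_ff is the mean feature-feature correlation. This is only an illustration: it assumes plain Pearson correlation as the correlation measure, which is not necessarily what Weka uses internally, and the function names and toy values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def cfs_merit(features, labels):
    """CFS merit of a feature subset: high class correlation, low inter-feature correlation."""
    k = len(features)
    r_cf = sum(abs(pearson(f, labels)) for f in features) / k
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    r_ff = (sum(abs(pearson(features[i], features[j])) for i, j in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

# Toy data (made up): f1 tracks the class label, f3 is unrelated to it.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
f1 = [0.1, 0.2, 0.1, 0.3, 0.9, 0.8, 0.7, 0.9]
f3 = [1, 0, 1, 0, 1, 0, 1, 0]

merit_single = cfs_merit([f1], labels)
merit_with_duplicate = cfs_merit([f1, list(f1)], labels)  # exact copy of f1: no gain
merit_with_noise = cfs_merit([f1, f3], labels)            # irrelevant feature: lower merit
```

In this toy case a duplicated feature leaves the merit unchanged and an irrelevant feature lowers it, which is why a subset search guided by this score tends to discard redundant features.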
C. Cross Validation
The 10-fold cross-validation technique is used in this system. The whole dataset is divided randomly into 10 parts. In each
iteration, 9 parts are used for training and the remaining part is used for testing, so that every part is tested once.
Maximum error estimation is done using 10-fold cross-validation.
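The splitting scheme described above can be sketched as follows. This is a minimal illustration, not the authors' MATLAB/Weka procedure; the function name and the fixed random seed are assumptions made for reproducibility.

```python
import random

def k_fold_indices(n_records, k=10, seed=0):
    """Yield (train, test) index lists; each record is used for testing exactly once."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)       # random division of the dataset
    folds = [indices[i::k] for i in range(k)]  # k parts of (nearly) equal size
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 25,000 records as in the dataset description: 22,500 for training, 2,500 for testing per fold.
splits = list(k_fold_indices(25000, k=10))
```

The error of the classifier would then be measured on each of the 10 test parts in turn.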
D. Alpha Profiling
Profiles are created based on the protocol and service features of the connection records. This process is called alpha
profiling. A combination of a protocol and a service is termed an alpha feature. Connection records are separated according
to their alpha feature, and the resulting groups are called alpha profiles. The main advantages of this process are:
Increased scalability and load balancing
Reduced number of comparisons
Efficient handling of unknown profiles
Reduced protocol-service imbalances
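The grouping step of alpha profiling can be sketched in a few lines. The field names protocol_type and service follow the KDDCup99 feature names; the function name and the sample records are illustrative assumptions, not the authors' MATLAB code.

```python
from collections import defaultdict

def alpha_profiles(records):
    """Group connection records by their alpha feature (protocol type, service)."""
    profiles = defaultdict(list)
    for record in records:
        alpha_feature = (record["protocol_type"], record["service"])
        profiles[alpha_feature].append(record)
    return dict(profiles)

# Made-up sample records carrying only a few of the 41 features.
records = [
    {"protocol_type": "tcp", "service": "http", "src_bytes": 181},
    {"protocol_type": "tcp", "service": "http", "src_bytes": 239},
    {"protocol_type": "udp", "service": "domain_u", "src_bytes": 45},
    {"protocol_type": "tcp", "service": "smtp", "src_bytes": 1022},
    {"protocol_type": "icmp", "service": "ecr_i", "src_bytes": 1032},
]
profiles = alpha_profiles(records)
```

A new connection then only needs to be compared against the records of its own alpha profile, which is where the reduction in the number of comparisons comes from.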


E. Beta Profiling
An IDS needs more detection time to process a large dataset. Beta profiling is introduced to address this problem. It is
also called the sample reduction process, and it reduces both the memory requirement and the time complexity. The clustering
algorithm DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is used to implement this process. It groups
similar connections and removes redundant records, so that only quality samples remain.
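A compact sketch of this sample reduction is given below. It implements a basic DBSCAN with two parameters, a distance threshold (eps) and a minimum number of connections (min_pts), and replaces every cluster by its centre. Two points are assumptions, since the paper does not specify them: Euclidean distance is used, and records labelled as noise are simply dropped. The function names and sample points are hypothetical.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = -1                   # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster          # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)   # j is a core point: keep expanding
    return labels

def beta_profile(points, eps, min_pts):
    """Reduce the points to one centre per DBSCAN cluster (assumption: noise is dropped)."""
    clusters = {}
    for point, label in zip(points, dbscan(points, eps, min_pts)):
        if label != -1:
            clusters.setdefault(label, []).append(point)
    return [tuple(sum(c) / len(members) for c in zip(*members))
            for members in clusters.values()]

# Made-up 2-D points: two dense groups and one isolated record.
points = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 0)]
centres = beta_profile(points, eps=0.5, min_pts=3)
```

Here seven records are reduced to two centres. Whether isolated records should be kept rather than dropped is a design choice that matters for rare attack types and would need to be checked against the authors' implementation.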
F. OS-ELM
In the proposed methodology, the OS-ELM classifier is used for intrusion detection. It overcomes the slow learning of other
classifiers. It can solve a variety of classification problems and can process a large dataset in very little time.
G. Result Aggregation
The results obtained from every process are analysed and compared with the performance of other existing classifiers.
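The following sketch shows why sequential learning of this kind is fast: the hidden layer is random and fixed, and only the output weights are updated, one record at a time, by a recursive least-squares step. It is a simplified variant rather than the authors' implementation: instead of the usual batch initialization phase it starts from a ridge-regularized matrix P = I/ridge, it uses a chunk size of one and a single output, and the class name, parameters and toy data are all hypothetical.

```python
import math
import random

class OSELM:
    """Single-output OS-ELM sketch with sigmoid hidden nodes and per-sample updates."""

    def __init__(self, n_inputs, n_hidden, ridge=1e-3, seed=0):
        rng = random.Random(seed)
        # Random input weights and biases, fixed after construction.
        self.w = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.beta = [0.0] * n_hidden  # output weights, learned sequentially
        # Assumption: P starts as I / ridge instead of a batch initialization phase.
        self.p = [[1.0 / ridge if i == j else 0.0 for j in range(n_hidden)]
                  for i in range(n_hidden)]

    def _hidden(self, x):
        return [1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(row, x)) + bias)))
                for row, bias in zip(self.w, self.b)]

    def predict(self, x):
        return sum(bi * hi for bi, hi in zip(self.beta, self._hidden(x)))

    def partial_fit(self, x, target):
        """Recursive least-squares update of P and beta for one sample."""
        h = self._hidden(x)
        ph = [sum(pij * hj for pij, hj in zip(row, h)) for row in self.p]  # P h
        denom = 1.0 + sum(hi * phi for hi, phi in zip(h, ph))              # 1 + h^T P h
        error = target - sum(bi * hi for bi, hi in zip(self.beta, h))
        for i in range(len(h)):
            for j in range(len(h)):
                self.p[i][j] -= ph[i] * ph[j] / denom   # P <- P - (P h)(P h)^T / denom
            self.beta[i] += ph[i] / denom * error       # beta <- beta + P_new h * error

# Toy two-class data (made up): class 0 near (0.2, 0.2), class 1 near (0.8, 0.8).
rng = random.Random(1)
train = ([([rng.gauss(0.2, 0.05), rng.gauss(0.2, 0.05)], 0.0) for _ in range(100)]
         + [([rng.gauss(0.8, 0.05), rng.gauss(0.8, 0.05)], 1.0) for _ in range(100)])
rng.shuffle(train)

model = OSELM(n_inputs=2, n_hidden=10)
for x, y in train:
    model.partial_fit(x, y)  # records arrive one by one, no retraining from scratch

accuracy = sum((model.predict(x) > 0.5) == (y == 1.0) for x, y in train) / len(train)
```

Each update costs a fixed amount of work that depends only on the number of hidden nodes, so new connection records can be absorbed without revisiting earlier data.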

IV. RESULTS
The results of each experiment are presented and analysed in this section:
A. Pre-processing
In this process, the nominal values were first converted into numerical values and then normalized. This was performed using
MATLAB.
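The two pre-processing steps can be sketched as below. The paper does not state which encoding or normalization was used, so integer coding of nominal values and min-max scaling to [0, 1] are assumptions, as are the function names and sample values.

```python
def encode_nominal(column):
    """Map each distinct nominal value to an integer code, in order of first appearance."""
    codes = {}
    return [codes.setdefault(value, len(codes)) for value in column]

def min_max_normalize(column):
    """Scale a numeric column linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]

# Made-up sample columns.
protocol = ["tcp", "udp", "tcp", "icmp"]
src_bytes = [0, 50, 100, 200]

encoded = encode_nominal(protocol)                 # [0, 1, 0, 2]
normalized_protocol = min_max_normalize(encoded)   # [0.0, 0.5, 0.0, 1.0]
normalized_bytes = min_max_normalize(src_bytes)    # [0.0, 0.25, 0.5, 1.0]
```

Normalization keeps features with large ranges, such as byte counts, from dominating features with small ranges.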
B. Feature Selection
The output of pre-processing is fed into the Weka data mining tool for feature selection. The three techniques mentioned above
were used. The result of this process is shown in Table 2.

Table 2: List of Selected features

C. Alpha Profiling
Alpha profiling was done using MATLAB code. The output file consisted of 17 alpha profiles. The list of alpha profiles is
shown below:


D. Beta Profiling
Using MATLAB, the normal and anomalous connection records were separated. Then the parameters of the DBSCAN algorithm
(distance threshold and minimum number of connections) were set and the beta profiles were created. After beta profiling,
the size of the dataset was reduced by 10%.

V. PERFORMANCE ANALYSIS
The results obtained were fed into the Support Vector Machine (SVM) and Sequential Minimal Optimization (SMO) classifiers.
The accuracy values obtained are shown in Table 3.

Table 3: Performance comparison using SVM and SMO classifiers

From the analysis of the tabulated results, the inferences are summarized below:
1) After the feature selection process, the dimensions are reduced by 52.38% relative to the original dataset. Thus, time
consumption is reduced.
2) From Table 3 it is found that accuracy remains the same after alpha profiling, while the number of comparisons is
reduced.
3) Accuracy is increased after the beta profiling process. It reduces the size of the dataset, thus reducing time and space
complexity.

