
Computer Communications 34 (2011) 1328–1341

Contents lists available at ScienceDirect

Computer Communications
journal homepage: www.elsevier.com/locate/comcom

Distributed denial of service attack detection using an ensemble of neural classifiers


P. Arun Raj Kumar, S. Selvakumar
CDBR-SSE Project Laboratory, Department of Computer Science and Engineering, National Institute of Technology, Tiruchirappalli 620015, Tamil Nadu, India

Abstract
The vulnerabilities in the communication (TCP/IP) protocol stack and the availability of more sophisticated attack tools attract more and more network hackers to attack the network intentionally or unintentionally, leading to Distributed Denial of Service (DDoS) attacks. DDoS attacks can be detected using existing machine learning techniques such as neural classifiers. These classifiers lack generalization capability, which results in lower performance and leads to high false positives. This paper evaluates the performance of a comprehensive set of machine learning algorithms for selecting the base classifier, using the publicly available KDD Cup dataset. Based on the outcome of the experiments, Resilient Back Propagation (RBP) was chosen as the base classifier for our research. The improvement in performance of the RBP classifier is the focus of this paper. Our proposed classification algorithm, RBPBoost, is achieved by combining an ensemble of classifier outputs and a Neyman Pearson cost minimization strategy for the final classification decision. Publicly available datasets such as KDD Cup, DARPA 1999, DARPA 2000, and CONFICKER were used for the simulation experiments. RBPBoost was trained and tested with DARPA, CONFICKER, and our own lab datasets. Detection accuracy and cost per sample were the two metrics evaluated to analyze the performance of the RBPBoost classification algorithm. From the simulation results, it is evident that the RBPBoost algorithm achieves high detection accuracy (99.4%) with fewer false alarms and outperforms the existing ensemble algorithms, with a maximum gain of 6.6% and a minimum gain of 0.8%.
© 2011 Elsevier B.V. All rights reserved.

Article history: Received 2 March 2010. Received in revised form 7 December 2010. Accepted 24 January 2011. Available online 16 February 2011.
Keywords: DDoS; Collaborative environment; Ensemble of neural networks; Machine learning

1. Introduction

DDoS is a coordinated attack on the availability of services of a single or multiple victim systems through many compromised secondary victims. One such massive attack [20] happened at the yahoo.com website on February 7, 2000. Similar attacks were reported in other commercial organizations [20] such as cnn.com, e-bay, Amazon, etc. Yet another attack, the Slammer worm [46], through its virulent propagation, resulted in the shutdown of a range of critical systems, including a safety monitoring system in Ohio, thousands of automatic teller machines run by the Bank of America, and Internet share trading in South Korea. Even after a decade since its first attack in 1998, the Internet is still vulnerable to DDoS attacks. A survey by Arbor Networks [9] on November 11, 2008, revealed that the scale of DDoS attacks has been growing gradually since 2001. In 2008, the largest recorded DDoS attack against a single target reached 40 gigabits per second, as against 24 gigabits reported in the year 2007. Hence, DDoS attacks are the most significant security threat that ISPs face. The typical collaborative applications [14,49,62] in India include space research, military applications, higher learning in universities and satellite campuses, and State and Central government
Corresponding author. Tel.: +91 431 2503239; fax: +91 431 2500133.
E-mail addresses: park@nitt.edu (P.A. Raj Kumar), ssk@nitt.edu (S. Selvakumar). 0140-3664/$ - see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.comcom.2011.01.012

sponsored projects, e-governance, e-healthcare systems, etc. These applications/projects have inherently geographically distributed centers carrying out specific tasks/research, networked together to achieve a common goal. Any attack on the data or the process in any one of the centers disrupts the objective. Hence, these critical infrastructures and services need protection. Revenue loss, network performance degradation, and service unavailability at critical times are some of the factors that motivated us to provide protection for these collaborative applications. DDoS attacks such as IRC flood, HTTP flood, SYN flood, UDP flood, and buffer overflow have been posing a serious threat [20] to such resource centers. A defensive strategy that understands the semantics and flow of service messages is required for detecting attacks. When a service is flooded with a volume of messages that exceeds its processing capacity, the excess must be discarded. A packet based discard strategy which distinguishes legitimate messages (flash crowd) from flood traffic is used to circumvent the impact on legitimate service requesters. The patterns of network traffic do not have a regular structure, and hence statistical pattern recognition approaches are needed. As the Internet is administered in a distributed manner, existing defense and response systems suffer from a lack of cooperation between networks. Due to massive flooded traffic, the available resources are not sufficient to mitigate an attack. On detection of an attack, a node exchanges its list of infected nodes with the other collaborative nodes for prevention of further


attacks in other nodes as well. In this paper, a generic architecture for the automatic detection of DDoS attacks and a response appliance that is aware of the semantics and flow of service messages is proposed. Our proposed system differs from [18,42] in the alert generation strategy using an ensemble of classifiers, but is similar in communicating the attack signatures to the peers in the collaborative environment. Our proposal is to make intelligent message discard decisions based on neural networks, resulting in fewer false alarms. The contributions of this paper include the following:
• Generic architecture of a DDoS attack detection and response system for a collaborative environment.
• Implementation of the RBPBoost algorithm for the classification of network traffic.
• Classification error cost minimization by Neyman Pearson Structural Risk Minimization.
• A classification accuracy of 97.2% when training and testing on the Conficker dataset, 98.5% when training and testing on the lab dataset, and 99.4% when training and testing on the 1999 and 2000 DARPA intrusion datasets.
The rest of the paper is organized as follows: DDoS attack characteristics are given in Section 2.1. Existing feature extraction methods are given in Section 2.2. Machine learning methods for DDoS attack detection are given in Section 2.3. Soft computing techniques for intrusion detection are given in Section 2.4. Existing traceback mechanisms are explained in Section 2.5. Existing ensemble of classifier methods are given in Section 2.6. The proposed DDoS attack detection and response system is elucidated in Section 3. Simulation experiment details and the results are given in Section 4. Section 5 concludes the paper.

2. Related work

2.1. DDoS attack
DDoS attacks are broadly classified into bandwidth depletion and resource depletion attacks [58]. In a bandwidth depletion attack, attackers flood the victim with large traffic that prevents the legitimate traffic and amplify the attack by sending messages to a broadcast IP address. In a resource depletion attack, attackers attempt to tie up the critical resources (memory and processor), making the victim unable to process the service. A structural approach for DDoS attack classification is proposed in [27]. The detailed analysis of DDoS attacks and available attack tools [10] shows that DDoS attacks have the following characteristics:
• Source and destination IP addresses and port numbers of the packets are spoofed and randomly generated.
• Window size, sequence number, and packet length are fixed during the attack.
• Flags in the TCP and UDP protocols are manipulated.
• Roundtrip time is measured from the server response.
• The routing table of a host or gateway is changed.
• DNS transaction IDs (reply packets) are flooded.
• HTTP requests are flooded through port 80.
The real challenge lies in distinguishing flooding attacks from abrupt changes in legitimate traffic. For example, Flashget [30] on a host (specific IP address) creates a large number of connections using multiple source ports to a single destination port of a server for downloading a file. This behavior is similar to that of a SYN flood. However, the difference between a SYN flood and Flashget is that the SYN flood will not complete the TCP 3-way handshakes with the target victim within the specified time period, leading to a lot of SYN errors. Further, many P2P file sharing applications use a single source port to connect to a lot of destination IP addresses for sharing files. This looks similar to a host scan activity. A flash crowd refers to a large number of clients each making a small number of requests, whereas a DDoS attack is generated by a small number of clients each generating a huge request traffic rate. Thus, features such as the number of connections from the same host to a specific destination within a specified time window and the number of connections having SYN errors within a specified time window can be used to differentiate flooding attacks from abrupt changes in legitimate activity. Attacks such as the quiet attack [6], the low rate TCP Denial of Service (DoS) attack [4], and Reduction of Quality (RoQ) attacks [5,67] are short lived TCP flows. Less than 2% of Internet traffic consists of short lived flows, while one third of the flows of Internet traffic are long lived flows. Shrew [6] and RoQ attacks send high rate UDP traffic periodically. As a feature such as the number of UDP echo packets to a specified port is used in our paper, shrew and RoQ attacks can be detected. In the quiet attack, because of the aggregated attack traffic rate, the link capacity is under attack. Our proposed work has not considered quiet attacks, but a quiet attack can be detected by monitoring the periodicity of the burst rate in the flow.

2.2. Real time feature extraction
Features are statistical characteristics derived from the collected dataset. Selection of the real time feature set plays a vital role in online traffic classification. Using more features leads to better accuracy, but computing many features in real time causes more overhead and is time consuming. In [47], 248 features are given and 1 feature is used to describe the class (normal or attack). Computation of all the 248 features [13] took approximately two days on a dedicated System Area Network. Out of the 248 features, some, such as the maximum inter-packet arrival time, cannot be calculated until the entire flow is completed. Moreover, features based on Fast Fourier Transform values need better signal processing methods to reduce the computation time. So, a small number of appropriate statistical features is to be selected for better pattern classification. Feature extraction [36] is classified into two stages: (i) feature construction and (ii) feature selection. Constructing the features is either integrated into the modeling process or into the preprocessing stage, which includes standardization, normalization, etc. Feature selection is divided into filter methods and wrapper methods [50]. In filter methods, selection is based on distance and information measures in the feature space. In wrapper methods, selection is based on classifier accuracy. Three statistical features are used in [25]. Nine features are used in [35]. Flow based feature selection has been shown to block legitimate traffic in [35]. Flow based selection gives a summary of metadata. By blocking the IP address and port, flow based selection does not permit the legitimate requests. Hence, instead of a flow based solution, a packet based solution has been used in this paper. A packet based solution minimizes the blocking of legal traffic, as it blocks only the particular traffic based on the outcome of the analysis of the sequence number, window size, and packet length. Features are selected by classifying the IP flow into micro-flows and macro-flows [26]. A decision tree based Machine Learning (ML) algorithm combined with real time features has been proposed as a good candidate for online traffic classification [69]. But finding the smallest decision tree that is consistent with a set of training examples is NP-hard.

2.3. Machine learning methods
Machine learning is mostly focused on finding relationships in data and analyzing the process for extracting such relations.


Machine learning paradigms are classified as Supervised Learning (SL), Unsupervised Learning (UL), and Reinforcement Learning (RL). In SL, the algorithm attempts to learn some function given the input vector and the actual output. In UL, the algorithm attempts to learn only from the given input vectors by identifying relationships among the data. In RL, the algorithm learns with a single bit of information which indicates to the neuron whether the output is good or bad. Though many evolutionary algorithms exist, neural network algorithms provide a promising alternative for classifying DDoS attack patterns based on statistical features [45]. Because of their generalization capability, neural networks are able to work with imprecise and incomplete data. Further, these machine learning techniques can also recognize patterns not presented during the training phase. Several ML algorithms [25,35,54,59,63,64,66] have been proposed for DDoS attack detection. Most of the ML algorithms applied to DDoS attack detection have not considered minimizing the cost of the errors. These errors lead to more false alarms. The cost associated with a false alarm is more expensive than that of a misdetection [23]. Hence, the objective in this paper is to minimize these errors and improve the false alarm rate. In this paper, the existing machine learning algorithms, viz., RBP [45], Support Vector Machine [48], K-Nearest Neighbor [32], Decision Tree (C4.5) [45], and K-Means Clustering [28], have been simulated (Section 4), and the RBP algorithm is found to perform better. So, the RBP algorithm is chosen as the base classifier in this paper, for further improvement in performance.

2.4. Soft computing methods
Recent network intrusion detection methods are based on soft computing techniques such as neural networks, genetic algorithms, fuzzy logic, and hybrid approaches.
An Enhanced Swarm Intelligence Clustering (ESIC) method to choose the centers of the radial basis functions (RBF) in an RBF neural network (RBFNN) is proposed in [29]. In [41], a genetic clustering algorithm for intrusion detection is proposed, and from the results, the optimal number of clusters and high performance rates are obtained. In [60], a method of incremental mining is proposed so that fuzzy association rules can be implemented in a real-time network IDS. Creation of fuzzy sets from the input network packet data, membership functions for fuzzy variables, and the application of a genetic algorithm to identify the best rules are proposed in [24]. In [40], the Hidden Markov Model (HMM) is improved as a fuzzy HMM (FHMM), where fuzzy similarity measures replace probabilistic measures. The main drawback of soft computing methods is their lack of interpretability [45]. Hybrid methods have been proposed to overcome the drawbacks of single methods. Of late, research on intrusion detection has also moved towards using wavelets. Wavelet analysis is used for characterizing self-similar behavior over a wide range of time scales. A wavelet neural network is based on wavelet transform theory and the artificial neural network. In [37], intrusions are detected using wavelet neural networks, and the experimental results show that the proposed method is feasible and effective when tested with the KDD Cup dataset. Selection of features from packet headers, comparison of two rule sets, one mined online and the other mined from training data, and rendering a decision every two seconds on large-scale DoS attack types are some of the tasks carried out in [37].

2.5. Existing traceback mechanisms
Once the target system is attacked, detection mechanisms are necessary to detect an attack with a low false positive rate and a high accuracy rate. To foil DDoS attacks, countermeasures such as detection mechanisms and traceback mechanisms can be deployed. Among these mechanisms, the response system needs to

identify the location of the source in order to prevent further attacks. The security expert spends time in tracing the real source address, and at times this demands more time as the perpetrator spoofs the source IP address. Traceback mechanisms [1–3,7,8,15,21,22,34,51,65,68,70,72] have been proposed to trace the real source of the attackers, to stop the attack at the point nearest to its source in order to reduce the waste of network resources, and to find the identity of the attackers in order to take other legal actions against them. A detailed comparison of the existing traceback mechanisms with respect to their working principle, advantages, and drawbacks is given in [10].

2.6. Ensemble of classifiers motivation
A single classifier makes errors on different training samples. So, by creating an ensemble of classifiers and combining their outputs, the total error can be reduced and the detection accuracy can be increased. There are two main components in all ensemble systems [53], viz., a strategy to build an ensemble that is as diverse as possible, and the combination of the outputs of the classifiers for accurate classification decisions. The decision boundaries of each classifier have to be uniquely different from the others. To achieve this diversity, an ensemble of classifiers can be constructed by manipulating the training data, manipulating the feature sets, or injecting randomness. For construction of the ensemble by manipulating the training data, the entire dataset is divided into subsets and each classifier is trained with a different subset. For construction of the ensemble by manipulating the input feature set, the feature set is divided into smaller feature subsets and each classifier is trained with the same dataset. Another method to construct the ensemble is by randomly initializing parameters such as weights and training with different parameter values at different times. As the number of features selected for training in this paper is small, ensemble construction by feature set is not suitable [50].
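The first construction strategy above, manipulating the training data, can be sketched as a bootstrap-style resampling step. This is an illustrative sketch, not the paper's implementation; the function name, dataset, and subset sizes are assumptions.

```python
# Sketch of ensemble construction by manipulating the training data:
# each classifier is trained on a random draw (with replacement) from
# the dataset. The data and subset sizes are illustrative.
import random

def make_training_subsets(dataset, n_classifiers, subset_size, seed=7):
    """Return one bootstrap-style training subset per classifier."""
    rng = random.Random(seed)
    return [[rng.choice(dataset) for _ in range(subset_size)]
            for _ in range(n_classifiers)]

data = list(range(20))                      # stand-in for traffic records
subsets = make_training_subsets(data, n_classifiers=3, subset_size=10)
print(len(subsets), [len(s) for s in subsets])   # -> 3 [10, 10, 10]
```

Because each subset differs, the classifiers trained on them make errors on different samples, which is the diversity the ensemble needs.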
The advantage of constructing an ensemble by manipulating training data is that the generated hypothesis performs fairly well even when there are only small changes in the traffic data. So, ensemble construction by manipulating training data was chosen, as it would correctly detect the deviations, if any, however small they may be. Classifier combination is divided into two categories:
• Classifier selection, where each classifier is trained to become an expert in some local area of the total feature space.
• Classifier fusion, where all classifiers are trained over the same feature space.
Classifier outputs can be combined by methods such as Majority Voting [53], Weighted Majority Voting (WMV) [53], etc. In this paper, popular ensemble methods such as Bagging [16], Boosting [56], and AdaBoost [31] are compared with our proposal, RBPBoost. Our algorithm differs from the existing algorithms in two ways, viz., in achieving diversity of the classifiers and in combining the classifier outputs through WMV and the Weighted Product Rule (WPR) [53].

3. Proposed system
Eight Institutes/Universities (sites) situated in different geographical locations are working collaboratively on a Smart and Secure Environment (SSE) project, as shown in Fig. 1. Our Institute (Site 4) is one among the sites connected through a 2 Mbps Multi Protocol Label Switching (MPLS) Virtual Private Network (VPN) cloud. Each site maintains web, mail, DNS, and proxy servers. A Unified Network Threat Management System (UNTMS) has been conceived by us, designed, and deployed in each site to monitor the real time traffic and to filter the malicious traffic. UNTMS

Fig. 1. SSE environment.

consists of many modules, such as the Distributed Denial of Service (DDoS) attack detection module, Domain Name Server (DNS) cache poisoning detection module, Address Resolution Protocol (ARP) cache poisoning detection module, Host Intrusion Detection System (HIDS) module, Network Intrusion Detection System (NIDS) module, and Anonymous Communication module, conceptualized by our subgroup of researchers. Our Distributed Denial of Service (DDoS) attack detection module analyzes the network traffic and classifies whether it is normal or malicious. Malicious traffic is identified and attack signatures are generated. The attack signature patterns are communicated to the peers (UNTMS) in the collaborative environment to enable the filters in UNTMS to drop the packets from the suspect(s), leading to the protection of critical services. Further, the proposed system is scalable, because for every inclusion of an additional collaborating network, a deployment of UNTMS would suffice. Instead of building one centralized Intrusion Detection and Response System among the eight sites, co-operating autonomous agents can actively defend and maintain the integrity and trustworthiness of the system. The proposed architecture is shown in Fig. 2. The proposed system consists of the following four stages:
• Data Collection (A).
• Preprocessing (B).
• Classification (C).
• Response (D).

by a set of features. Each instance is expressed in the vector space model.

3.2. Preprocessing
Preprocessing refers to the process of extracting information about packet connections from the data and constructing new statistical features. The preprocessing steps are explained as follows:
• Let x be the input vector of dimension n, such that x = [x1, x2, x3, . . ., xn]. The variables xi of the input vector are the original features.
• Let x′ be a vector of transformed features of dimension n′.
The statistical real time features of the packet for extraction are shown in Table 1. These features are used to find statistical properties such as standard deviation and variance. They quantify the behavioral characteristics of a connection in terms of the number and type of various data items with respect to time. Hence, these features are called statistical real time features. Seven features are used as the gradients of the vector to classify the network pattern. Normalization is a process of ensuring that each attribute value in a database structure is suitable for further querying and free from certain undesirable characteristics. Hence, each variable is normalized into the range [−1, 1] to eliminate the effect of scale differences. These values are used as inputs for the machine learning algorithms. Our objective is to find out the dissimilarities, if any, between the patterns. This dissimilarity can be detected by a distance measure known as the Euclidean distance [57]. The Euclidean distance is defined as the sum of squares between the two clusters added up over all the variables. The Euclidean distance is

3.1. Data collection
A receiver process running in promiscuous mode captures all incoming packets and stores them in a data storage server. The data is stored as a set of traffic flows, with each instance being described


3.3. Proposed classification algorithm (RBPBoost)
A block schematic of RBPBoost is shown in Fig. 3. The dataset of a particular class is split into subsets. Each subset is trained with an ensemble of classifiers and the results are combined by WMV [53]. T_K is the total number of classifiers, chosen using cross-validation. Cross-validation is a popular method of manipulating training data to subdivide the training data into k disjoint subsets and to reconstruct training sets by leaving out some of the subsets. The results of each classification system are further combined by WPR [53]. The efficiency of classification of the classifier is significant in the decision making process. Hence, it is measured by a parameter, the Q-statistic, using (3). For an effective decision, the Q-statistic should be zero. The training time depends on the number of times the classifier needs training, which in turn depends on the mean square error between iterations reaching the global minimum. The training is sped up by removing the overlapping data and retaining only the training samples adjacent to the decision boundary.

Q_{i,j} = (ad − sc) / (ad + sc)    (3)
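As an illustrative sketch, the pairwise Q-statistic of Eq. (3) can be computed from two classifiers' per-sample hit/miss patterns. The sample data below is made up; only the formula follows the paper.

```python
# Sketch of the pairwise Q-statistic in Eq. (3), computed from two
# classifiers' correctness patterns on the same samples.
def q_statistic(correct_i, correct_j):
    """correct_i / correct_j: booleans per sample, True when that
    classifier got the sample right."""
    pairs = list(zip(correct_i, correct_j))
    a = sum(ci and cj for ci, cj in pairs)                  # both correct
    d = sum((not ci) and (not cj) for ci, cj in pairs)      # both wrong
    s = sum(ci and (not cj) for ci, cj in pairs)            # only i correct
    c = sum((not ci) and cj for ci, cj in pairs)            # only j correct
    return (a * d - s * c) / (a * d + s * c)

# Classifiers that err on disjoint samples give Q = -1 here (a*d is 0);
# identical classifiers give Q = 1; independent ones give Q near 0.
ci = [True, True, False, True]
cj = [True, False, True, True]
print(q_statistic(ci, cj))   # -> -1.0
print(q_statistic(ci, ci))   # -> 1.0
```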

Fig. 2. Architecture for DDoS attack detection and response system.

The Q-statistic between two classifiers (i and j) is calculated using (3). a is the number of samples that are correctly classified by both i and j. s is the number of samples that are correctly classified by i and incorrectly classified by j. c is the number of samples that are incorrectly classified by i and correctly classified by j. d is the number of samples that are incorrectly classified by both i and j. A value of Q that is zero or positive is preferable, since it signifies the best classification by both the classifiers. A negative value of Q indicates misclassification, which in turn will distort the decision making process, and hence it is not preferable.

3.3.1. Training
An ensemble of classifiers is trained for each individual data subset and the results are combined. A new classifier is added at each iteration. In our algorithm, as given in Fig. 4, two classes (normal and DDoS attack traffic) are considered. The inputs to the algorithm are as follows:
• Training data comprised of n instances with correct output labels.
• The Resilient Back Propagation (RBP) algorithm as the supervised base classifier.
• The number of classifiers (T_K).
From the experiments conducted, RBP emerged as the best choice for deployment as the learning algorithm of the base classifier, due to its higher detection accuracy. The samples from each class are split into data subsets. Samples from a data subset are taken randomly with replacement and given as input to the base classifier (RBP). The error e_t of the classifier is computed using (4). The error e_t of classifier h_t is weighted by the distribution, such that e_t is the sum of the distribution weights of the instances misclassified by h_t. If the obtained error is more than the false alarm threshold, the generated hypothesis is dropped and the algorithm restarts from Step 2.a, as shown in Fig. 4. Otherwise, the classifier is added to the ensemble. Each classifier is assigned a weight as the logarithm of the reciprocal of its normalized error. The normalized error is calculated using (5).
A classifier with a small error has a higher voting weight. The results of the classifiers are combined through WMV. The results of WMV are further combined through WPR. WPR has been analyzed theoretically in [39] and shown to be more effective in combining strong classifiers, being less sensitive to the compounding effects of confidence errors upon combination, which justifies the usage of WPR in our proposed RBPBoost.
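As a non-authoritative sketch of this combination pipeline, the fragment below assigns each classifier the log of the reciprocal of its normalized error, combines labels by Weighted Majority Voting, and fuses class probabilities by the Weighted Product Rule. The error values, probabilities, and function names are illustrative assumptions, not the paper's implementation.

```python
# Sketch: voting weights from normalized error, then WMV, then WPR.
import math

def voting_weight(err):
    # weight = log(1 / normalized error), with normalized error = e / (1 - e)
    return math.log((1.0 - err) / err)

def weighted_majority_vote(votes, weights):
    """votes: class labels (0/1), one per classifier."""
    tally = {0: 0.0, 1: 0.0}
    for v, w in zip(votes, weights):
        tally[v] += w
    return max(tally, key=tally.get)

def weighted_product_rule(probs, weights):
    """probs: per-subsystem class-probability vectors [p0, p1].
    Weighted products of probabilities, computed as weighted sums of logs."""
    best, best_score = None, None
    for c in range(2):
        score = sum(w * math.log(p[c] + 1e-12) for p, w in zip(probs, weights))
        if best_score is None or score > best_score:
            best, best_score = c, score
    return best

errors = [0.05, 0.20, 0.40]                  # illustrative training errors
weights = [voting_weight(e) for e in errors]
print(weighted_majority_vote([1, 0, 0], weights))            # -> 1
print(weighted_product_rule([[0.6, 0.4], [0.2, 0.8]], [1.0, 2.0]))  # -> 1
```

Note how the low-error classifier's single vote for class 1 outweighs the other two, which is exactly the behavior the weighting scheme is meant to produce.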

Table 1
List of features.
1. Number of UDP echo packets to a specified port
2. Number of connections to the same host during a specified time window
3. Number of ICMP echo reply packets from the same source
4. Number of connections that have SYN errors using the same service during a specified time window
5. Number of connections having the same window size, sequence number, and packet length
6. Number of packets with the URG flag set in the TCP header
7. Type of service
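Two of the windowed counts in Table 1 can be sketched directly from a packet trace. The packet-record layout, field names, and window length below are illustrative assumptions, not the paper's implementation.

```python
# Illustrative computation of two Table 1 features over a time window:
# connections to the same host, and connections with SYN errors.
from collections import namedtuple

Pkt = namedtuple("Pkt", "ts src dst syn_error")

def window_features(packets, dst, t_now, window=2.0):
    recent = [p for p in packets
              if p.dst == dst and t_now - window <= p.ts <= t_now]
    n_conn = len(recent)                          # Table 1, feature 2
    n_syn_err = sum(p.syn_error for p in recent)  # Table 1, feature 4
    return n_conn, n_syn_err

pkts = [Pkt(0.5, "a", "victim", True), Pkt(1.2, "b", "victim", True),
        Pkt(1.8, "c", "other", False), Pkt(1.9, "d", "victim", False)]
print(window_features(pkts, "victim", 2.0))   # -> (3, 2)
```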

calculated as follows. The attributes are scaled to the range [−1, 1] using (1), where i(t) denotes the value of the feature, min(i) denotes the minimum value, and max(i) denotes the maximum value.

i_norm(t) = 2 (i(t) − min(i)) / (max(i) − min(i)) − 1    (1)

The data thus available for the classifier are real numbers between −1 and 1. Further, the dissimilarity measure d(x, y) between two vectors x and y is calculated using (2).

d(x, y) = sqrt( Σ_{i=1}^{N} ((x_i − y_i) / σ_i)^2 )    (2)

where σ_i is the standard deviation of x_i.
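The scaling of Eq. (1) and the deviation-weighted distance of Eq. (2) can be sketched as follows; the sample values are illustrative.

```python
# Sketch of Eqs. (1) and (2): min-max scaling into [-1, 1], and the
# standard-deviation-weighted Euclidean distance between two vectors.
import math

def normalize(value, lo, hi):
    # Eq. (1): i_norm(t) = 2 * (i(t) - min(i)) / (max(i) - min(i)) - 1
    return 2.0 * (value - lo) / (hi - lo) - 1.0

def scaled_euclidean(x, y, sigma):
    # Eq. (2): d(x, y) = sqrt( sum_i ((x_i - y_i) / sigma_i)^2 )
    return math.sqrt(sum(((xi - yi) / si) ** 2
                         for xi, yi, si in zip(x, y, sigma)))

print(normalize(5.0, 0.0, 10.0))    # -> 0.0 (midpoint maps to 0)
print(normalize(0.0, 0.0, 10.0))    # -> -1.0 (minimum maps to -1)
print(scaled_euclidean([1.0, 2.0], [4.0, 6.0], [1.0, 2.0]))  # sqrt(13)
```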


Fig. 3. Block schematic diagram of RBPBoost.

3.3.2. Testing
When a test instance is given as an input to the RBPBoost algorithm, the votes for the instance corresponding to each of the k data subsets for a particular class are computed using the WMV scheme on all T_k classifiers. These are further combined through WPR to obtain an aggregate vote for this instance. T_kc is the number of classifiers corresponding to the kth data subset and cth class. As the number of classes is 2, c is equal to 2. The number of classifiers through which a test instance is run is given in (6). The output decision (attack or normal) of the classifiers is not final. The final decision is made using the Neyman Pearson approach, as described in Section 3.3.3.

Σ_{k=1}^{K} Σ_{c=1}^{L} T_{kc}    (6)

Fig. 4. RBPBoost classification algorithm.

3.3.3. Neyman Pearson approach cost minimization
Neyman Pearson (NeP) Structural Risk Minimization (SRM) [19] is an extension to NeP theory where prior knowledge of the data distribution is not known. The NeP hypothesis is useful in situations where different types of error have different consequences. Such applications include fraud detection, spam filtering, machine monitoring, target recognition, and disease diagnosis [19]. In most applications, class probabilities differ between training and testing data, but NeP does not assume prior knowledge of class probabilities. It automatically balances model complexity and training error. It is used to increase the detection accuracy rate with a known maximum false alarm threshold α, where α is in the range [0, 1]. The objective of using the Neyman Pearson approach in this paper is to reduce the classification error costs. The cost minimization in NeP has the following steps:
• Calculation of the total number of samples belonging to each class.
• Calculation of the misclassified samples belonging to each class.
• Calculation of the optimum threshold and display of the final classification decision (Attack/No Attack).

3.3.3.1. Calculation of total number of samples for each class. The total number of samples from class j is calculated using (7).

n_j = Σ_{i=1}^{n} I{Y_i = j}    (7)

Let Z^n = {(X_i, Y_i)}, where i = 1, . . ., n, be a collection of n independent and identically distributed samples of Z = (X, Y). A learning algorithm is a mapping function h_n : Z^n -> H(X, Y), where H(X, Y) is the set of all classifiers. h_n is a rule for selecting a classifier based on the training sample. I denotes the indicator function. n_j is the total number of samples for class j.

3.3.3.2. Calculation of total number of misclassified samples for each class. To find the efficiency of a classifier, the total number of misclassified instances is calculated using (8).

R_j(h) = (1 / n_j) Σ_{i: Y_i = j} I{h(X_i) ≠ j}    (8)

R_j(h) denotes the false positives corresponding to j = 0, j = 1. n_j is the total number of samples for class j. I is the indicator function, which outputs either 0 (if the sample is correctly classified as class j) or 1 (if the sample is not classified as class j). The number of misclassified samples is summed and divided by n_j, resulting in the false alarm rate. Given a class of classifiers H, H_0 = {h ∈ H : R_0(h) ≤ α}.

3.3.3.3. Finding optimum threshold. A Receiver Operating Characteristic (ROC) curve is plotted with the false positive rate (x-axis) against the detection accuracy (y-axis). The line y = x shows random guessing (half


Table 2
Comparison of existing ensemble classification algorithms with the proposed algorithm.

Feature | Bagging | Boosting | AdaBoost | RBPBoost
Prior knowledge of data distribution | Not required | Not required | Required | Not required
Method used to combine classifiers | Majority Voting | Three-way Majority Vote | Weighted Majority Voting | Weighted Majority Voting and Weighted Product Rule
Number of classifiers | Given as input | Three | Given as input | Number of classes
Data subset for training operation | Draws randomly some fraction from the dataset | Creates informative data subset | Draws from the updated data distribution | Draws randomly some fraction from the data subset
Classification error (cost) minimization | No | No | No | Yes (Neyman Pearson approach)
Drawbacks | Works for small subset | Limited to binary classification problems; sensitive to noise and outliers | Prior knowledge of data distribution needed before generating hypothesis; frequent retraining needed | Large amount of network traffic data is required
Advantages | Simple to implement with good performance | Most informative dataset provided to each classifier | Capable of handling multiclass and regression problems | No indication of overfitting

of the samples are misclassified). If more than 50% of the samples are misclassified, the performance of the Intrusion Detection System (IDS) will be poor; so, the default false positive decision threshold is chosen as 0.5. From the list of classifiers producing false positives, the classifier with the fewest false positives is taken as the optimum. The optimum threshold is calculated using (9):

h* = arg min { R_j(h) : h in H_0 }    (9)
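The selection rule of Eq. (9) can be sketched as follows (illustrative Python only; the threshold classifiers and the sample values are invented, and alpha is the false-alarm bound defining H_0):

```python
def error_rate(classifier, samples, labels, cls):
    """R_cls(h): fraction of class-`cls` samples that h misclassifies, Eq. (8)."""
    indices = [i for i, y in enumerate(labels) if y == cls]
    return sum(classifier(samples[i]) != cls for i in indices) / len(indices)

def neyman_pearson_select(classifiers, samples, labels, alpha):
    """Eq. (9): restrict to H_0 = {h : R_0(h) <= alpha}, the classifiers whose
    false positive rate stays within the bound, then pick the one with the
    lowest error on the attack class."""
    feasible = [h for h in classifiers if error_rate(h, samples, labels, 0) <= alpha]
    return min(feasible, key=lambda h: error_rate(h, samples, labels, 1))

# Hypothetical 1-D example: three threshold classifiers (0 = normal, 1 = attack).
classifiers = [lambda x, t=t: int(x > t) for t in (0.2, 0.5, 0.8)]
samples = [0.1, 0.3, 0.6, 0.9]
labels = [0, 0, 1, 1]
best = neyman_pearson_select(classifiers, samples, labels, alpha=0.25)
```

Here the classifier at threshold 0.2 is excluded because its false positive rate exceeds alpha, and among the remaining ones the classifier with the fewest misclassified attack samples is chosen.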

The classification decision for a new instance is based on the new optimum threshold h*. h* is optimal with respect to classes of randomized tests/classifiers. Thus, the classification accuracy is improved by combining the outputs of different classifiers.

3.4. Response system

The detection system deployed in each site maintains a hash table and updates the IP address and port number (attack signature) of suspicious blacklisted nodes. When a site receives an attack signature, it checks whether the signature exists in its hash table. If present, the system has already been alerted. If not, the attack signature is added to the infected list. The updated attack signature is sent to all collaborating nodes, to prevent any damage that may be caused to the available services.

3.5. Comparison of RBPBoost with the existing algorithms

A comparison of the RBPBoost classification algorithm with the existing ensemble algorithms is shown in Table 2. Prior knowledge of the data distribution is required only for the AdaBoost algorithm. Its hypotheses are generated by training a weak classifier using instances drawn from an iteratively updated distribution of the training data; the distribution update ensures that instances misclassified by the previous classifier are included in the training data of the next classifier. Majority voting is used to combine the ensemble of classifier outputs for Bagging, AdaBoost, and RBPBoost. Boosting creates three weak classifiers: if the first two classifiers agree on the same class, that class is the final classification decision; if they disagree, the class chosen by the third classifier is the final decision. The three classifiers are thus combined through a three-way majority vote. Our RBPBoost network is trained using the data source of each class, consisting of n data subsets. The traffic data from each subset are given as input to the neural classifier. Also, the ensemble in our approach is constructed from the training data.
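The per-site response system of Section 3.4 can be sketched as follows (a minimal Python sketch; the class and method names are ours, and a real deployment would exchange signatures over the network rather than via direct method calls):

```python
class ResponseSystem:
    """One site's response system: a hash table of attack signatures
    (IP address, port); new signatures are forwarded to all
    collaborating sites."""

    def __init__(self):
        self.blacklist = set()   # hash table of suspicious (ip, port) signatures
        self.peers = []          # collaborating detection systems

    def receive_signature(self, ip, port):
        signature = (ip, port)
        if signature in self.blacklist:      # already alerted: stop propagation
            return False
        self.blacklist.add(signature)        # add to the infected list
        for peer in self.peers:              # alert all collaborating nodes
            peer.receive_signature(ip, port)
        return True

# Hypothetical two-site deployment; the membership check prevents
# the mutual peering from looping forever.
site1, site2 = ResponseSystem(), ResponseSystem()
site1.peers.append(site2)
site2.peers.append(site1)
site1.receive_signature("10.0.0.5", 80)
```

The hash-table lookup is what guarantees that a signature circulating among collaborating sites is processed only once per site.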
During the simulation experiments, it was found that the detection accuracy was low when only 40% of the samples in the KDD Cup dataset were used for training; it increased to 90% when 70% of the samples were used for training. For the DARPA datasets, the detection accuracy was low when less than 50% of the samples were used for training and increased to 90% when 85% of the samples were given as input for training. So, a large traffic dataset is needed for training and validation in order to achieve high detection accuracy with our proposed algorithm, as shown in Table 2. As a large training dataset is used, overfitting is avoided, which is an advantage of the RBPBoost algorithm. In [71], proactive tests were conducted to identify and isolate malicious traffic after successful TCP connection establishment.

Majority Voting, Weighted Majority Voting, and Borda Count [53] are three methods to combine the class labels of different classifiers. As weights are assigned to instances in our RBPBoost algorithm, weighted majority voting is suitable: the votes are added up across all classifiers, and the class with the most votes is chosen as the ensemble decision. Different datasets lead to diverse individual neural network classifiers committing uncorrelated errors in the ensemble. Though an individual classifier may not take the correct decision, voting removes the wrong decisions made by individual classifiers and achieves high accuracy. For example, in Fig. 3, each data subset of a particular class has k classifiers, all of which are assigned weights during training. When a new instance is given as input, the k classifiers of data_subset1 each output a class name, attack or normal. The weights corresponding to the votes for the attack class are summed, and similarly the weights corresponding to the votes for the normal class are summed; the class with the larger weight is chosen by WMV. The classifier outputs of the other data subsets of each class are combined in the same way.
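The WMV and WPR steps described above can be sketched in Python (illustrative only; the votes and weights are invented, and class labels are encoded as 0 = normal, 1 = attack):

```python
def weighted_majority_vote(votes, weights):
    """WMV within one data subset: sum the weights of the classifiers
    voting for each class and return the normalized support per class."""
    support = [sum(w for v, w in zip(votes, weights) if v == c) for c in (0, 1)]
    total = sum(support)
    return [s / total for s in support]

def weighted_product_rule(subset_supports):
    """WPR across data subsets: multiply the per-subset class supports
    and return the class with the largest product."""
    product = [1.0, 1.0]
    for s0, s1 in subset_supports:
        product[0] *= s0
        product[1] *= s1
    return 0 if product[0] >= product[1] else 1

# Hypothetical example: two data subsets with three weighted classifiers each.
s1 = weighted_majority_vote([1, 1, 0], [0.5, 0.3, 0.2])   # supports ~ [0.2, 0.8]
s2 = weighted_majority_vote([1, 0, 0], [0.4, 0.4, 0.2])   # supports ~ [0.6, 0.4]
decision = weighted_product_rule([s1, s2])                 # 1 = attack
```

Multiplying the supports rather than summing them is what lets the WPR break the ties that WMV alone can produce when the summed class weights coincide.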
There is a possibility that the classifier weights for different classes are the same when taking a decision during WMV. Hence, WPR is used to combine the WMV outputs of the different data subsets of each class, to increase the detection accuracy.

4. Simulation results

Simulations and the analysis of the experimental data were performed using the MATLAB Neural Network Toolbox. Existing supervised and unsupervised algorithms were considered. A supervised algorithm attempts to learn some function from a given input vector and the actual output. In unsupervised learning, the algorithm attempts to learn from the input vector alone, by identifying relationships among the data; unsupervised learning is distinguished from supervised learning in that the inputs to an unsupervised algorithm are unlabeled samples or instances.

Though it is intuitive that a supervised algorithm would perform better than an unsupervised one, we wanted a strong scientific basis for the selection of the base classifier's learning algorithm. Accordingly, simulation experiments were conducted on RBP, SVM, K-Nearest Neighbor, Decision Tree, and K-Means Clustering. The KDD Cup (1999) dataset [38] and the DARPA traffic data, though old, have been used to test our algorithm against the existing ones in the literature. Further, we generated DDoS attacks in our lab and tested our RBPBoost algorithm.

4.1. Experiment 1

The algorithm with higher detection accuracy and fewer false alarms is to be chosen as the learning algorithm for the base classifier. Supervised algorithms such as Multi Layer Perceptron-Resilient Back Propagation (MLP-RBP), Support Vector Machines (SVM), K-Nearest Neighbor, and Decision Trees were trained. The KDD Cup dataset used consisted of 4,898,430 connection records. After removing duplicate records, the remaining traffic consisted of 812,813 normal connection records and 247,267 DoS connection records. Among these, 300,000 records comprising both normal and DoS attack traffic were used for training the above listed algorithms. For the simulation experiments conducted, detection accuracy was calculated using (10).

Accuracy = (TP + TN) / (TP + FP + TN + FN)    (10)
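As a quick check, Eq. (10) can be computed directly from the four confusion-matrix counts (the counts below are invented for illustration):

```python
def detection_accuracy(tp, fp, tn, fn):
    """Eq. (10): accuracy = (TP + TN) / (TP + FP + TN + FN)."""
    return (tp + tn) / (tp + fp + tn + fn)

# Hypothetical confusion-matrix counts for a batch of 1000 samples:
# 480 correctly flagged attacks, 490 correctly passed normal records,
# 15 false positives, and 15 missed attacks.
acc = detection_accuracy(tp=480, fp=15, tn=490, fn=15)   # 970 / 1000
```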

True Positive (TP) = number of samples correctly predicted as attack class. False Positive (FP) = number of samples incorrectly predicted as attack class. True Negative (TN) = number of samples correctly predicted as normal class. False Negative (FN) = number of samples incorrectly predicted as normal class.

A three layer MLP neural network was simulated. The number of neurons in the input layer was 41 (features), and the number of neurons in the output layer (sigmoid function) was 2. The optimal number of hidden neurons can be determined using the following rules of thumb from the literature:

a. The number of hidden neurons should be two thirds of the size of the input layer plus the size of the output layer.
b. The number of hidden neurons should be between the size of the input layer and the size of the output layer.
c. The number of hidden neurons should be less than twice the size of the input layer.

In our model, the size of the input layer is 7 and the size of the output layer is 2. According to Rule a, the number of hidden neurons is 7 (approx.); according to Rule b, it is between 2 and 7; and according to Rule c, it is less than 14. From the simulation experiments, we found that the number of hidden neurons was 20, so it does not satisfy Rules a, b, or c. Existing techniques such as the Bayesian Ying Yang method [64], a hybrid optimization algorithm [52], a dynamic node creation algorithm [12], a statistical procedure [55], etc., have been used for determining the optimal number of hidden layer neurons. In the Bayesian Ying Yang system, the optimal number of hidden neurons is determined using a probabilistic approach with minimized generalization error; but our algorithm does not use Bayesian or probabilistic analysis for classification, so the Ying Yang method is not considered. In the hybrid optimization algorithm, the decreasing relationship between the sample approximation error and the number of hidden

units is proven mathematically. In the dynamic node creation algorithm, a critical value is chosen arbitrarily and, when the training error is less than the critical value, a new node is created in the hidden layer; this is done iteratively. In our algorithm, there is no prior critical value selection, and the capability of a specific architecture is evaluated only after training; hence the dynamic node creation method is not chosen. The statistical procedure is executed in two phases, a top-down phase and a bottom-up phase. In the bottom-up phase, the parameters of the neural models, such as the learning rate, are estimated while increasing the number of hidden neurons; in the top-down phase, a selection among the neural models is performed using the statistical Fisher test. Our technique is similar to the bottom-up phase of the statistical procedure in that the parameters of the neural models are estimated, but instead of the statistical Fisher test, the optimal number of hidden neurons is determined using the Resilient Back Propagation technique with minimum generalization error. The number of hidden neurons that resulted in the minimum generalization error was chosen as the optimal value in our algorithm. From the experimental studies, it was observed that setting too many neurons in the hidden layer resulted in overfitting and too few neurons resulted in underfitting. The learning rate parameter (Lr) determines how fast the system should adapt to new instances. From the experiments conducted and the results tabulated in [11], it was evident that lower Lr values produced a lower false positive rate and lower detection accuracy, while higher Lr values produced higher false positive rates and stable detection accuracy. It was observed that the model with 20 neurons in the hidden layer and learning rate = 0.2 produces the highest detection accuracy with fewer false alarms. C4.5 finds the normalized information gain for each feature.
The objective is to minimize the number of nodes that produce misclassifications. From the simulation experiments, the detection accuracy achieved for the best Decision Tree classifier was 95.3%. SVM performs a mapping from the input space to a higher dimensional feature space through the use of a kernel function. The SVM model was trained on the training dataset; in our experiments, the kernel function used for SVM was the Radial Basis Function [33], which corresponds to a feed forward neural network. The numbers of clusters used during the simulations were 8, 16, 32, and 64. From the simulation, it was observed from [11] that the model with 20 neurons in the hidden layer, 16 clusters, and learning rate = 0.1 performs best in terms of detection accuracy. K-Nearest Neighbor classifies a sample based on the majority vote of its nearest neighbors, which were identified by representing the samples in a multidimensional feature space. As this is a binary classification problem, the K values were varied over the odd numbers 11, 21, 31, and 41. Simulation experiments were conducted, and from the results tabulated in [11] it is evident that the highest detection accuracy was obtained when k = 31. K-Means Clustering partitions N samples into k clusters with the nearest mean. Simulations were carried out by varying the number of clusters over 2, 4, 8, 16, 32, and 128, and the results were tabulated in [11]. The simulated model that minimized the total squared error distance between each sample and its cluster center had 32 clusters. Fig. 5 (in color) shows the output of the simulation in terms of confusion matrices. From the figure, it is evident that during the training phase the detection accuracy was 96.7% with 3.3% false positives, and during the validation phase the detection accuracy was 96.2% with 3.8% false positives. Further, during the testing phase, the detection accuracy was 98.1% with 1.9% false positives.
It is inferred that the trained network is able to generalize well and to detect new traffic patterns. In the overall confusion matrix, the detection accuracy was 96.9% with 3.1% false positives. The outputs


Fig. 7. Confusion matrix for the K-Nearest Neighbor.
Fig. 5. Confusion matrix for the Multilayer Perceptron model.

Fig. 8. Confusion matrix for the Decision Tree.

Fig. 6. Confusion matrix for the Support Vector Machines.

of the network are almost perfect (100%), with the high numbers of correct responses in the green squares and the low numbers of incorrect responses in the red squares; the lower right blue squares illustrate the overall accuracy. Similarly, the outputs for SVM, K-Nearest Neighbor, Decision Trees, and K-Means Clustering in terms of confusion matrices are shown in Figs. 6-9. Fig. 10 shows the colored lines in each axis representing the ROC curves. The ROC curve is a plot of the true positive rate (sensitivity) versus the false positive rate (1 - specificity) as the threshold is varied. A perfect test would show points in the upper-left corner, with 100% sensitivity and 100% specificity. From Fig. 10, it is observed that the network performs almost perfectly. From the simulation results shown in Table 3, it is evident that the Multilayer Perceptron model outperforms the other algorithms with higher detection accuracy.

4.2. Experiment 2

From Experiment 1, RBP was chosen as the base classifier for constructing an ensemble. Our proposed classification algorithm with cost minimization strategy is compared with the existing ensemble methods [16,56,31] in terms of cost per sample on the 1999 & 2000

Table 3
Simulation results of the supervised and unsupervised algorithms.

Classifier | True Positive rate | False Positive rate
Multi Layer Perceptron (MLP)-Resilient Back Propagation | 96.9 | 3.1
Decision Tree (C4.5) | 95.8 | 4.2
Support Vector Machine (SVM) | 89.7 | 10.3
K-Nearest Neighbor (K-NN) | 93.5 | 6.5
K-Means Clustering | 93.0 | 7.0

Table 4
Simulation results of the classification algorithms for the 1999 & 2000 DARPA datasets.

Algorithm | True Positive rate | False Positive rate | Cost per sample
Bagging | 92.8 | 3.6 | 0.288
Boosting | 97.3 | 3.9 | 0.262
AdaBoost | 98.6 | 4.0 | 0.254
RBPBoost | 99.4 | 3.7 | 0.228

Fig. 9. Confusion matrix for the K-Means Clustering.

fields were extracted from the connection records using tcptrace [61] and given to the classifier, which produces the output as either attack or normal. The system was designed so that it is capable of learning the normal behavior through a number of iterations, in order to detect the attack and survive during an attack. The 2000 DARPA intrusion detection scenario specific dataset [44] was used in our simulation during the test phase. It included a DDoS attack run by an attacker using more sophisticated attack tools. It contained the LLDOS1.0 and LLDOS2.0.2 datasets, whose attack times were 6 s and 8 s, respectively; the background traffic was the same as in the 1999 datasets. It was observed during testing that the training error was less than the testing error; hence, the variance was at a minimum. Further, it was evident that the bias was at a minimum. The simulation results, tabulated in Table 4, show that our RBPBoost algorithm is efficient enough for accurate detection of DDoS attacks. To facilitate performance comparison among the different ensemble methods, a cost function based on the number of misclassified samples was used. It was calculated using (11), where k is the parameter for the cost difference between a false alarm and a miss; k was set to 6 for the experiments. The lower the cost, the better the performance of the detection system.

Cost = (1 - True Positive rate) + k * (False Positive rate)    (11)
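Eq. (11) is simple enough to verify against Table 4 directly (a small Python helper; the TPR/FPR values below are the Table 4 entries for RBPBoost and Bagging, expressed as fractions):

```python
def cost_per_sample(tpr, fpr, k=6):
    """Eq. (11): cost = (1 - TPR) + k * FPR, with k = 6 in the experiments."""
    return (1 - tpr) + k * fpr

# Reproducing Table 4: RBPBoost (TPR = 99.4%, FPR = 3.7%)
# and Bagging (TPR = 92.8%, FPR = 3.6%).
rbpboost_cost = cost_per_sample(0.994, 0.037)   # about 0.228
bagging_cost = cost_per_sample(0.928, 0.036)    # about 0.288
```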

Fig. 11 compares the results of our RBPBoost classification algorithm with the Bagging, Boosting, and AdaBoost ensemble methods. It can be seen that, in our RBPBoost approach, the cost per
Fig. 10. ROC curve for Multilayer Perceptron model.


DARPA datasets, the Conficker dataset, and our own lab dataset. Three experiments were conducted with these three datasets, and the results are explained in Sections 4.2.1, 4.2.2 and 4.2.3.

4.2.1. Experiment 2.1

The 1999 DARPA intrusion detection dataset [43] was used only for training the RBPBoost algorithm. The training data included list files containing fields such as start time, duration, protocol type, source and destination IP addresses, source and destination ports, and a label identifying the class (attack or normal). These fields were used initially for training. Later, during testing time, these


Fig. 11. Cost per sample for existing ensemble classiers and RBPBoost.

Table 5
Simulation results of the classification algorithms for the Conficker dataset.

Algorithm | True Positive rate | False Positive rate | Cost per sample
Bagging | 90.6 | 4.0 | 0.334
Boosting | 93.4 | 3.2 | 0.258
AdaBoost | 96.8 | 3.8 | 0.260
RBPBoost | 97.2 | 3.6 | 0.244

sample is less compared to the existing ensemble methods. Further, it is evident from Fig. 11 that the cost is minimized. Neyman-Pearson (NeP) classification minimizes the miss rate while ensuring that the false alarm rate is less than a specified threshold. From Table 4, it is seen that the True Positive Rate (TPR) of RBPBoost is the highest, with a low False Positive Rate (FPR). Thus, from the experimental values, NeP is proven to be well suited for binary classification problems. The cost per instance achieved using our algorithm (0.228) is on par with that of dLearning + dCMS [23] (0.224). The difference could be due to the selection of different subsets of the available data for training and testing and to the cost minimization strategy. Moreover, in [23], the intrusion detection algorithm is proposed for multiclass problems whose class probabilities are known in advance, whereas our proposed intrusion detection algorithm is for binary classification problems where the class probabilities are not known a priori.

From Table 4, it is also seen that the Bagging algorithm has a lower false positive rate than RBPBoost; the increase in sensitivity results in slightly more false positives. Weights are not assigned to classifiers in the Bagging algorithm, so even if the performance of a classifier is poor, its vote counts equally in majority voting. Hence, there is a possibility for more samples to be classified as normal traffic than attack traffic. In RBPBoost, weights are initialized for the classifiers and weightage is given to well performing classifiers; hence, the detection accuracy is higher than that of the Bagging algorithm. As the objective of our proposed algorithm is to increase the detection accuracy and reduce false alarms, the error difference is negligible. Also, the cost metric of our proposed RBPBoost algorithm is low compared to the Bagging algorithm.

4.2.2. Experiment 2.2

The Conficker [17] dataset contains data from the UCSD Network Telescope for three days between November 2008 and January 2009. The first day (21 November 2008) covers the onset of the Conficker A infection; on the second day, 21 December 2008, only Conficker A was active; and during the third and final day, both Conficker A and B were active. The dataset contains 68 compressed pcap files, each containing one hour of traces. The total size of the dataset for all three days is 69 GB. The pcap files contain only packet headers; the payload has been removed. Out of the 68 compressed pcap files, 20 compressed pcap files (only attack traffic)

from three days, together with normal traffic from DARPA, were used for training and testing. Fig. 12 (in color) shows the ROC plot of the RBPBoost algorithm, with the FPR on the horizontal axis and the TPR on the vertical axis. In the ROC plot, each blue marker (bubble) represents one operating point; when moving over the plot, the gray cursor marker (light shaded bubble) follows the closest operating point. When error minimization is attempted on one class, the error on the other inevitably increases at some point; the optimal solution is found without class overlap. The simulation results are tabulated in Table 5. To date, no results have been published with the Conficker dataset, so we are unable to compare them with our simulation results. From the results, it is evident that our proposed algorithm outperforms the existing algorithms on the Conficker dataset in terms of TPR. However, it is also evident from Table 5 that the FPR of Boosting is lower. The Boosting algorithm provides the most informative training dataset for each classifier: the second classifier is trained in such a way that 50% of the samples are correctly classified and 50% are misclassified, and instances misclassified by the first two classifiers are given as input to the third classifier. Though the actual output of a sample may be an attack, the decision is made by the third classifier if the first two classifiers disagree on their votes. So, some attack samples were misclassified, and there is less chance of classifying a normal sample as an attack. Even though the performance of the Boosting algorithm in terms of false positives is improved, its detection accuracy is less than that of the RBPBoost approach. As our objective is to increase the detection accuracy and reduce false alarms, the cost metric based on accuracy and false alarms is low compared to the Boosting algorithm.

4.2.3. Experiment 2.3

The inspection of network traffic reveals confidential and personal information, highly sensitive data, users' network access patterns, etc. Such confidential and personal information was anonymized, and the header contents were removed, before the traces were made publicly available. So, exact real attack traffic data are not publicly available, which led us to create our own datasets. Normal and attack traffic data were created in our Smart and Secure Environment (SSE) Project Laboratory. Attacks such as SYN flood, UDP flood (DNS cache poisoning), ICMP flood, and HTTP flood were generated in the SSE testbed. In order to evaluate the performance of the RBPBoost algorithm, the SSE testbed shown in Fig. 1 was used; the environment used for generating the network traffic was similar to PlanetLab. The SSE environment is described in Section 3. The target machine was in Site 1. Client machines in all sites were used to generate both normal and attack traffic; in a single client machine, many virtual clients were created with spoofed and random IP addresses to generate both attack and normal traffic. HTTP traffic was generated using HTTP TrafficGen, an HTTP load generation tool. All client machines in the Intranet are connected through a backbone bandwidth of 1 Gbps and to the different sites through a 2 Mbps MPLS VPN cloud. As the target (victim) server was in Site 1, traces were collected in Site 1 in tcpdump format. For example, TCP port 80 traffic was generated by the clients in all sites, including the target server's site. The traffic was collected during

Fig. 12. ROC curve for the RBPBoost algorithm (Conficker dataset).

Table 6
Simulation results of the classification algorithms for the SSE dataset.

Algorithm | True Positive rate | False Positive rate | Cost per sample
Bagging | 93.7 | 3.9 | 0.297
Boosting | 96.9 | 3.7 | 0.253
AdaBoost | 97.4 | 3.6 | 0.242
RBPBoost | 98.5 | 2.9 | 0.189

working hours. There were two types of traffic generated in the network, viz., legitimate and attack traffic. DDoS attack traffic with varying packet lengths was created using tools such as Netwag and the TFN2K program, and the DDoS attack was launched on the web server in Site 1. Packets were captured from the network using the Wireshark tool, which is based on the Libpcap library; in capture mode, a filter was used to monitor traffic for WWW, TCP SYN, UDP, and IP. Tcpdump was used to extract the features from the captured packets in 1 s time windows. The data obtained from feature extraction were used for training the RBPBoost algorithm. The simulation results are tabulated in Table 6. The training samples were input to the RBPBoost algorithm a few at a time, to make it learn incrementally during training. Our trained model was able to detect variations of attack traffic patterns during testing, so the proposed framework is suitable for the detection of new attacks. From the simulation results, it is evident that our proposed RBPBoost algorithm outperforms the existing algorithms in terms of TPR; moreover, it is also evident from Table 6 that the FPR of RBPBoost is the lowest. Fig. 13 depicts the cost per sample for the existing ensemble algorithms and our RBPBoost algorithm; it is seen that the cost per sample of the RBPBoost algorithm is less than that of the other existing ensemble algorithms.

4.3. Detection of new attacks

In our simulation experiments, the dataset was split into a training dataset and a testing dataset. The testing dataset served as completely unseen samples to the trained ensemble and contained some attack types not included in the training dataset, which poses a challenge for testing the ability of the RBPBoost algorithm to detect new attack types. The performance of the ensemble of neural classifiers was evaluated on the testing dataset.

From the simulation experiments, it was evident that the trained ensembles were able to detect new attacks. Our RBPBoost algorithm was trained with the DARPA 1999 dataset, which contains attacks such as back, land, mailbomb, neptune, nmap, pod, smurf, ICMP flood, syslog, and teardrop. From the DARPA 1999 dataset, attacks such as smurf and ICMP

Fig. 14. SELFGENS: SELF GENerating ENSemble.

flood were not considered for training. The DARPA 2000 dataset contains attacks such as the Trojan DDoS worm, which is not present in the DARPA 1999 dataset and was not considered during training. Hence, these attack traffic (unseen samples) were given as input during testing, in order to test the capability of our RBPBoost algorithm. The trained ensemble was tested with DARPA 2000, the Conficker dataset (unseen samples), and the lab dataset. The simulation results confirm that the detection rate was high and that new attacks were detected. During learning, the instances that have not been properly learnt by the entire ensemble are assigned more weight. Hence, during the next iteration, the instances that were assigned more weight or misclassified in the previous iteration are collected as a new dataset. A new instance of data is learnt incrementally by the trained ensemble, as shown in Step 2 of the RBPBoost algorithm. The Learning Ensemble (LE) learns the new data without forgetting the previously acquired knowledge, so new data are learnt without discarding the existing classifier and retraining a new one. Our SELF GENerating ENSemble (SELFGENS) detection process for new and old attacks in real time is shown in Fig. 14 and works as follows. The ensemble of classifiers trained with n attacks is called the Trained Ensemble (TE). For real-time deployment, the TE model is replicated as the Learning Ensemble (LE) and the Stable Ensemble (SE). If the traffic contains one of the n trained attacks as input, it is detected by TE and passed through SE, with all such attacks being detected. If the network traffic contains j new attacks other than the trained attacks and the normal traffic, then the traffic other than the trained ones is passed through LE, while the trained ones take the TE-SE path, leading to the detection of the trained attacks. LE learns from the new attack traffic and updates SE, which in turn updates TE, thus completing the learning of the j new attacks.
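The TE/LE/SE flow described above can be sketched as follows (illustrative Python only: attack knowledge is modeled as a set of labels rather than actual ensemble weights, and the class and attack names are ours):

```python
class SelfGens:
    """Sketch of the SELFGENS flow of Fig. 14.
    TE = Trained Ensemble, SE = Stable Ensemble, LE = Learning Ensemble."""

    def __init__(self, trained_attacks):
        self.te = set(trained_attacks)   # TE: attacks known from training
        self.se = set(trained_attacks)   # SE: deployed replica of TE
        self.le = set()                  # LE: newly learnt attacks

    def process(self, attack):
        if attack in self.te:            # trained attack: TE -> SE path
            return "detected"
        self.le.add(attack)              # new attack: learnt by LE ...
        self.se |= self.le               # ... LE updates SE ...
        self.te |= self.se               # ... and SE updates TE
        return "learned"

# A hypothetical new attack is learned once, then detected without retraining.
ids = SelfGens({"smurf", "neptune"})
first, second = ids.process("trojan_ddos"), ids.process("trojan_ddos")
```

The point of the sketch is the update direction: LE feeds SE, SE feeds TE, so the ensemble extends itself in place instead of being retrained from scratch.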
Hence, the ensemble, once trained, does not require retraining in real time; instead, it learns on its own each time using the traffic other than the trained one. It is evident from the experiments that the RBPBoost algorithm classifies intrusions with high detection accuracy even when prior knowledge of the data distribution is not available and the number of instances in each class varies significantly. Thus, the proposed framework is suitable for real-time detection of new attacks.

5. Conclusion

Critical services are often badly affected by DDoS attacks, in spite of the conventional deployment of network attack prevention mechanisms such as firewalls and Intrusion Detection Systems. Some intrusion detection systems detect only attacks with known signatures, and predicting future attacks is impossible. Hence, the system must be trained and tested in such a way that it learns by observing the aberrant patterns in the network traffic and classifies the incoming traffic as attack or normal. The training time depends on the number of times the classifier needs training, which in turn depends on the mean square error


Fig. 13. Cost per sample for existing ensemble classiers and RBPBoost.

[21] Dawn Song, Adrian Perrig, Advanced and authenticated marking schemes for IP traceback, IEEE INFOCOM, Apr. 2001, pp. 878-886.
[22] D. Dean, M. Franklin, A. Stubblefield, An algebraic approach to IP traceback, Network and Distributed System Security Symposium, Feb. 2001, pp. 3-12.
[23] Devi Parikh, Tsuhan Chen, Data fusion and cost minimization for intrusion detection, IEEE Transactions on Information Forensics and Security 3 (3) (2008) 381-389.
[24] Y. Dhanalakshmi, I. Ramesh Babu, Intrusion detection using data mining along fuzzy logic and genetic algorithms, International Journal of Computer Science and Security 8 (2) (2008) 27-32.
[25] Dimitris Gavrilis, Evangelos Dermatas, Real time detection of distributed denial-of-service attacks using RBF networks and statistical features, Computer Networks 44 (5) (2005) 235-245.
[26] Dongqi Wang, Guiran Chang, Xiaoshuo Feng, Rui Guo, Research on the detection of distributed denial of service attacks based on the characteristics of IP flow, NPC 2008, LNCS 5245, 2008, pp. 86-93.
[27] C. Douligeris, A. Mitrokotsa, DDoS attacks and defense mechanisms: classification and state-of-the-art, Computer Networks 44 (5) (2004) 643-666.
[28] R.O. Duda, P.E. Hart, Pattern Classification and Scene Analysis, Wiley, New York, 1973.
[29] Y. Feng, Z.-F. Wu, J. Zhong, C.-X. Ye, K.-G. Wu, An enhanced swarm intelligence clustering-based RBF neural network detection classifier, in: Fourth International Conference on Intelligent Computing, Springer, Shanghai, China, 2008, pp. 526-533.
[30] Flashget. Available from: <http://www.flashget.com>.
[31] Y. Freund, R.E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences 55 (1) (1997) 119-139.
[32] G. Guo, H. Wang, D. Bell, Y. Bi, K. Greer, Using kNN model for automatic text categorization, Soft Computing - A Fusion of Foundations, Methodologies and Applications 10 (5) (2006) 423-430.
[33] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice Hall, Upper Saddle River, NJ, 1994.
[34] J. Henry, C. Lee, et al., ICMP traceback with cumulative path, an efficient solution for IP traceback, International Conference on Information and Communications Security, Springer Lecture Notes in Computer Science, Sept. 2003, pp. 124-135.
[35] Hoai-Vu Nguyen, Yongsun Choi, Proactive detection of DDoS attacks using kNN classifier in an anti-DDoS framework, International Journal of Computer Systems Science and Engineering (2008) 247-252.
[36] Isabelle Guyon, Steve Gunn, Masoud Nikravesh, Lotfi A. Zadeh, Feature Extraction: Foundations and Applications, Physica-Verlag, Springer, 2006, ch. 1.
[37] Jianjing Sun, Han Yang, Jingwen Tian, Fan Wu, Intrusion detection method based on wavelet neural network, IEEE Second International Workshop on Knowledge Discovery and Data Mining, 2009, pp. 851-854.
[38] KDD data set, 1999. Available from: <http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html>.
[39] J. Kittler, M. Hatef, R. Duin, J. Matas, On combining classifiers, IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (3) (1998) 226-239.
[40] Y. Li, Y. Ge, X. Jing, Z. Bo, A new intrusion detection method based on fuzzy HMM, in: Third IEEE Conference on Industrial Electronics and Applications, Singapore, 2008.
[41] C.-C. Lin, M.-S. Wang, Genetic-clustering algorithm for intrusion detection system, International Journal of Information and Computer Security 2 (2) (2008) 218-234.
[42] Michael E. Locasto, Janak J. Parekh, Sal Stolfo, Angelos D. Keromytis, Tal Malkin, Vishal Misra, Collaborative distributed intrusion detection, Columbia University Computer Science Department Technical Report CUCS-012-04, March 2004.
[43] MIT Lincoln Lab, 1999 DARPA intrusion detection scenario specific dataset. Available from: <http://www.ll.mit.edu/mission/communications/ist/corpora/ideval/data/1999data.html>.
[44] MIT Lincoln Lab, 2000 DARPA intrusion detection scenario specific dataset. Available from: <http://www.ll.mit.edu/mission/communications/ist/corpora/ideval/data/2000data.html>.
[45] T. Mitchell, Machine Learning, McGraw-Hill Education (ISE Editions), 1997.
[46] D. Moore et al., Inside the Slammer worm, IEEE Security & Privacy 1 (4) (2003) 33-39.
[47] A.W. Moore, D. Zuev, Discriminators for use in flow-based classification, Intel Research Tech. Rep., 2005.
[48] S. Mukkamala, G. Janoski, A. Sung, Intrusion detection using neural networks and support vector machines, in: Proc. of the Int. Joint Conf. on Neural Networks (IJCNN 2002), Honolulu, vol. 2, 2002, pp. 1702-1707.
[49] R.C. Nickerson, A taxonomy of collaborative applications, in: Proceedings of the Association for Information Systems (AIS), Indianapolis, IN, 1997.
[50] J.S. Park, K.M. Shazzad, D.S. Kim, Toward modeling lightweight intrusion detection through correlation-based hybrid feature selection, Information Security and Cryptology, First SKLOIS Conference, CISC 2005, Beijing, China, 2005, pp. 279-289.
[51] K. Park, H. Lee, On the effectiveness of probabilistic packet marking for IP traceback under denial of service attack, IEEE INFOCOM, Apr. 2001, pp. 338-347.
[52] K. Peng, S.G. Shuzhi, C. Wen, An algorithm to determine neural network hidden layer size and weight coefficients, in: 15th IEEE International Symposium on Intelligent Control, Rio Patras, Greece, 2000, pp. 261-266.

between iterations reaching global minimum. The training is speeded up by removing the overlapping data and retaining only training samples adjacent to the decision boundary. Also, as the number of input vector is less, the training time is less. Hence, it is evident that RBPBoost algorithm will be suitable for real time environment. In this paper, a generic architecture for automated DDoS attack detection and response system for collaborative environment using machine learning is proposed. The main objective in this paper is to minimize the cost of classication errors of the intrusion detection. RBPBoost algorithm proposed in this paper demonstrates the use of Neyman Pearson approach to minimize the cost of classication errors. From the simulation results, it is found that the RBPBoost classication algorithm with Neyman Pearson hypothesis results in high detection accuracy of 99.4%. The results obtained are on par with the best result reported so far using dLearning + dCMS. Further, it is observed from the results that RBPBoost algorithm outperforms the existing algorithms with maximum gain of 6.6% and minimum gain of 0.8%. Acknowledgements The authors are grateful for the sponsorship of this research work provided by the Government of India, New Delhi, under the Collaborative Directed Basic Research (CDBR) Project. References

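The training speedup attributed above to discarding overlapping data and keeping only samples adjacent to the decision boundary can be illustrated with a minimal sketch. The paper does not spell out its selection rule, so the approach below (keep a point if its nearest opposite-class point lies within a chosen radius), the radius, and the toy 1-D feature values are all illustrative assumptions, not the authors' actual procedure.

```python
# Sketch: retain only training samples near the class boundary, i.e. points
# whose nearest opposite-class point is within `radius`. The rule, radius,
# and toy 1-D data are assumptions for illustration only.

def boundary_samples(X, y, radius):
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # distance to the closest point of the opposite class
        d = min(abs(xi - xj) for xj, yj in zip(X, y) if yj != yi)
        if d <= radius:
            keep.append(i)
    return keep

X = [0.0, 0.1, 0.2, 0.45, 0.55, 0.8, 0.9, 1.0]   # 1-D features
y = [0,   0,   0,   0,    1,    1,   1,   1]      # class labels
idx = boundary_samples(X, y, radius=0.2)
print(idx)  # → [3, 4]: only the two points straddling the boundary remain
```

Training the base classifiers on the retained subset alone is what shrinks the number of input vectors and, with it, the training time.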

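The Neyman-Pearson final decision described in the conclusion can also be sketched. This is not the paper's exact RBPBoost procedure: the function name, the false-alarm ceiling `alpha`, and the toy validation scores (which stand in for the weighted ensemble outputs) are illustrative assumptions. The Neyman-Pearson criterion picks the decision threshold that minimizes missed attacks subject to a bound on the false-positive rate.

```python
# Sketch of a Neyman-Pearson style decision over ensemble scores: among
# thresholds whose validation false-positive rate is <= alpha, choose the
# one with the fewest missed attacks. Names and data are illustrative.

def np_threshold(scores, labels, alpha):
    normals = [s for s, y in zip(scores, labels) if y == 0]
    attacks = [s for s, y in zip(scores, labels) if y == 1]
    best_t, best_misses = None, float("inf")
    for t in sorted(set(scores)):
        fpr = sum(s >= t for s in normals) / len(normals)
        misses = sum(s < t for s in attacks)
        if fpr <= alpha and misses < best_misses:
            best_t, best_misses = t, misses
    return best_t  # None if no threshold satisfies the constraint

# Toy validation scores: higher means "more attack-like".
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0,   0,   0,    0,   1,   1,   1,   1]
t = np_threshold(scores, labels, alpha=0.05)
print(t)                              # → 0.6
print([int(s >= t) for s in scores])  # → [0, 0, 0, 0, 1, 1, 1, 1]
```

In deployment, each incoming flow's combined ensemble score would be compared against the threshold learned on validation data, so the false-alarm rate stays under the chosen ceiling while detection is maximized.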