Академический Документы
Профессиональный Документы
Культура Документы
Assistant Professor, Dept. of. ITSC, Addis Ababa Institute of Technology, AAU, Addis, Ethiopia.
Assistant Professor, Dept. of. ITSC, Addis Ababa Institute of Technology, AAU, Addis, Ethiopia.
3
HOD of ITSC, Addis Ababa Institute of Technology, AAU, Addis, Ethiopia.
Abstract: Enterprise network information system is not only the platform for information sharing and
information exchanging, but also the platform for enterprise production automation system and enterprise
management system working together. As a result, the security defense of enterprise network information
system does not only include information system network security and data security, but also include the
security of network business running on information system network, which is the confidentiality, integrity,
continuity and real-time of network business. Network security technology has become crucial in protecting
government and industry computing infrastructure. Modern intrusion detection applications face complex
requirements they need to be reliable, extensible, easy to manage, and have low maintenance cost. In recent
years, data mining-based intrusion detection systems (IDSs) have demonstrated high accuracy, good
generalization to novel types of intrusion, and robust behavior in a changing environment. Still, significant
challenges exist in the design and implementation of production quality IDSs. Incrementing components such as
data transformations, model deployment, and cooperative distributed detection remain a labor intensive and
complex engineering endeavor. This paper describes DAID, a database-centric architecture that leverages data
mining within the Relational RDBMS to address these challenges. DAID also offers numerous advantages in
terms of scheduling capabilities, alert infrastructure, data analysis tools, security, scalability, and reliability.
DAID is illustrated with an Intrusion Detection Center application prototype that leverages existing functionality
in Relational Database 10g. Intrusion detection system work at many levels in the network fabric and are taking
the concept of security to a whole new sphere by incorporating intelligence as a tool to protect networks against
un-authorized intrusions and newer forms of attack. We have described formal model for the construction of
network security situation measurement based on d-s evidence theory, frequent mode, and sequence model
extracted from the data on network security situation based on the knowledge found method and convert the
pattern on the related rules of the network security situation, and automatic generation of network security
situation.
Keywords: Intrusion Detection System (IDS), Data Mining, Network Security, DAID, Network Security
Situation (NSS), RDBMS.
I. Introduction
An intrusion detection system (IDS) is a device or software application that monitors network and/or
system activities for malicious activities or policy violations and produces reports to a management station. The
purpose of IDS is to detect and prevent electronic threat to computer systems. The extensive use of the
computers and availability of the Internet increase the impact of problem in size. In todays world everyone is
connected over networks and many services are provided over the internet. Intrusion detection is an area
growing in relevance as more and more sensitive data are stored and processed in networked systems. An
intrusion detection system (IDS) monitors networked devices and looks for anomalous or malicious behavior in
the patterns of activity in the audit stream. A comprehensive IDS requires a significant amount of human
expertise and time for development. Data mining-based IDSs require less expert knowledge yet provide good
performance. These systems are also capable of generalizing to new and unknown attacks. Data mining-based
intrusion detection systems can be classified according to their detection strategy. There are two main strategies
misuse detection and anomaly detection. Misuse detection attempts to match observed activity to known
intrusion patterns. This is typically a classification problem. Anomaly detection attempts to identify behavior
that does not conform to normal behavior. This approach has a better chance of detecting novel attacks. IDSs
can also be distinguished on the basis of the audit data source (e.g., network-based, host-based). Successful
detection of different types of attacks typically requires a variety of audit data sources.
www.ijres.org
20 | Page
The Practical Data Mining Model for Efficient IDS through Relational Databases
II. Basic approaches for intrusion detection system
Approaches for Intrusion detection systems can be broadly classified as: Signature based, Classification
based and Anomaly based.
2.1 Signature based (misuse detection) approach
Most of the commercial IDSs are misuse detection systems which are designed to detect only known
attacks. This approach uses a database of known attack signatures which is developed by experts and intrusion
analyst. The traffic over the network or sequence of processes within the computer is compared to the entries in
this database. If there is a match with database entries, the IDS system generates an alert message. Even though
such a system does not generate false positives alerts, these systems cannot identify new and novel attacks (Hu
Zhengbing et al., 2008; Ding et al., 2009).
There are two advantages of misuse detection approach:
It is very effective for detecting the attacks without generating an overwhelming number of false alarms and
it can quickly and reliably diagnose the use of a specific attack tool. On the other hand, the disadvantages of
misuse detection approach are: It can only detect those attacks that have been described in the database and the
database must be constantly updated with signatures of new attacks.
2.2 Classification-based intrusion detection approach
This approach uses normal and abnormal data sets of user behavior, and uses data mining techniques to
train the IDS system. This creates more accurate classification models for IDS as compared to signature-based
approaches and thus they are more powerful in detecting known attacks and their variants. Disadvantage of
classification-based intrusion detection approach: it is still not capable of detecting unknown attacks. This
approach uses normal and abnormal data sets of user behavior, and uses data mining techniques to train the IDS
system. This creates more accurate classification models for IDS as compared to signature-based approaches
and thus they are more powerful in detecting known attacks and their variants. Disadvantage of classificationbased intrusion detection approach: it is still not capable of detecting unknown attacks.
2.3 Anomaly intrusion detection approach
The basic assumption of anomaly detection approach is that attacks are different from normal activity and
thus they can be detected by IDS systems that identify these differences. Thus this approach begins with
definition of desired form or behavior of the system and then distinguishes between that desired behavior and
undesired or anomalous behavior. The main problem is, defining the boundary between acceptable and
anomalous behavior. So, the anomaly detector approach must be able to distinguish between the anomaly and
normal. There are 2 types of anomaly detectors: 1.Static anomaly detectors: It is based on the assumptions that
there is a portion of the system being monitored that should remain constant and 2. Dynamic anomaly detectors:
To characterize normal and acceptable behavior a base profile is created by a dynamic anomaly intrusion
system. Building the sufficiently accurate base profile is the main difficulty with the dynamic anomaly detection
system. The advantage of anomaly intrusion detection approach is: It is possible to detect unknown attacks. The
disadvantage of anomaly intrusion detection approach is: Produces a large number of false alarms due to the
unpredictable behaviors of users and networks. Therefore, large and accurate training data set is the major
requirement of anomaly detection approaches to define the normal behavior patterns. Building IDS is a complex
task of knowledge engineering that requires an elaborate infrastructure. An effective contemporary productionquality IDS need an array of diverse components and features, including:
Centralized view of the data
Data transformation capabilities
Analytic and data mining methods
Flexible detector deployment, including scheduling that enables periodic model creation and distribution
Real-time detection and alert infrastructure
Reporting capabilities
Distributed processing
High system availability
Scalability with system load
Recent proposals have highlighted the need for an architecture and framework specification for IDSs. While
these proposals provide a good foundation, they are somewhat general in nature and focus either on a
methodology specification that alleviates the knowledge engineering effort or on component interaction
specification (e.g., XML metadata). The details of the infrastructure required to support these feature-rich
www.ijres.org
21 | Page
The Practical Data Mining Model for Efficient IDS through Relational Databases
complex frameworks are not provided, instead such details are proposed to be handled by the system engineers
responsible for the implementation. This paper demonstrates that the Relational Database, with its capabilities
for supporting mission critical applications, distributed processing, and integration of analytics, can be an
appropriate platform for an IDS implementation. Given the data-centric nature of the intrusion detection
process, leveraging existing RDBMS infrastructure can be both efficient and effective. The current paper
presents DAID (Database-centric Architecture for Intrusion Detection) for the Relational Database. This
RDBMS-centric framework can be used to build, manage, deploy, score, and analyze data mining-based
intrusion detection models. The described approach is adopted in the Intrusion Detection Center (IDC) prototype
an application implemented using the capabilities of Relational Database. The paper is organized as follows.
Section 2 basic approaches to IDS. In section 3 describes proposed c architecture and the individual
components are described and illustrated with references to the IDC prototype and functionality available in the
Relational Database. Section 4 presents the conclusions and directions for future work.
www.ijres.org
22 | Page
The Practical Data Mining Model for Efficient IDS through Relational Databases
www.ijres.org
23 | Page
The Practical Data Mining Model for Efficient IDS through Relational Databases
1. Add pattern to possible attack signature
3. Stop
Known_attack_detection_algorithm:
Input: Network traffic feature, attack signature database
Output: Traffic classification (Norma/Attack)
Steps:
1. For each signature in known signature set
a. If(Traffic feature matches with signature)
i. Forward corresponding connection to intrusion prevention module
ii. Mark corresponding entry in feature data warehouse for attack
b. Else
i. Forward network traffic feature to possible attack signature detector
Possible_attack_detection_algorithm:
Input: Network traffic feature, possible attack signature database
Output: Traffic classification (Norma/Attack)
Steps:
1. For each signature in possible signature set
a. If(Traffic feature matches with signature)
i. Forward corresponding connection to honeypot module to detect intrusion
2. If (Result from honeypot is positive)
a. Remove corresponding signature entry from possible attack signature database
b. Add removed signature to known attack signature database
Else
c. Remove corresponding signature entry from possible attack signature database
3. Mark corresponding network traffic feature entry in feature data warehouse for attack.
Fig. 2. Model graph for SVMs
www.ijres.org
24 | Page
The Practical Data Mining Model for Efficient IDS through Relational Databases
A number of data mining techniques have been found useful in the context of misuse and anomaly
detection. Among the most popular techniques are association rules, clustering, support vector machines (SVM),
and decision trees. The Oracle 10g version of the Relational database offers robust and effective
implementations of data mining techniques that are fully integrated with core database functionality. The
incorporation of data mining eliminates the necessity of data export outside the database thus enhancing data
security. The model representation is native to the database and no special treatment is required to ensure
interoperability. In order to programmatically operationalize the model generation process, data mining
capabilities can be accessed via APIs (e.g., JDM standard Java API [9], PL/SQL data mining API ). Specialized
GUIs as entry points can be also easily developed, building upon the available API infrastructure (e.g.,
Relational Data Miner). Such GUI tools enable ad hoc data exploration and initial model investigation. The IDC
prototype leverages the dbms_data_mining PL/SQL package to instrument the model build functionality. We
train linear SVM anomaly and misuse detection models. SVM models have been shown to perform very well in
intrusion detection tasks. The intrusion detection dataset includes examples of normal behavior and four highlevel groups of attacks - probing, denial of service (dos), unauthorized access to local super user/root (u2r), and
unauthorized access from a remote machine (r2l). These four groups summarize 22 subclasses of attacks. The
test dataset includes 37 subclasses of attacks under the same four generic categories. We use the test data to
simulate the performance of the IDC prototype when operating in detection mode. For the misuse detection
problem, the SVM model was used to classify the network activity as normal or as belonging to one of the four
types of attack. The misuse classification results are summarized in Table 1. The overall accuracy of the system
was 92.1%. Applying the cost matrix from the KDD99 competition, the model had a misclassification cost of
0.248. These results are competitive with KDD99 published results . The poorer performance on the two rare
classes (u2r and especially r2l) is a common problem for this dataset and is attributed to differences in data
distribution between training and test data. For the anomaly detection problem, a one-class SVM model is used
to identify the network activity as normal or anomalous. Since anomaly detection does not rely on instances of
previous attacks, the one-class model was built on the subset of normal cases in the DARPA dataset. On the test
dataset, the model had excellent discrimination with an ROC area of 0.989. Sliding the probability decision
threshold allows trading-off the rate of true positives and false alarms for example, in this model a true
positive rate of 96% corresponded to a false alarm rate of 5%.
Table 2. Confusion Matrix on DARPA Intrusion Detection Dataset (KDD99 Competition)
actual\pred
Normal
Probe
Dos
U2r
R21
normal
59332
602
7393
178
14683
probe
1048
3251
88
1
41
dos
45
212
222288
8
7
u2r
57
62
75
33
31
r2l
111
39
9
8
1427
Relational implementation of SVM is highly scalable [16]. Figure 2 depicts the build scalability of a linear
SVM misuse model with increasing number of records. The datasets of smaller size represent random samples
of the original intrusion detection data. Tests were run on a machine with the following hardware and software
specifications: single 3GHz i86 processor, 2GB RAM memory, and Red Hat enterprise Linux OS 3.0.
www.ijres.org
25 | Page
The Practical Data Mining Model for Efficient IDS through Relational Databases
The fast training times allow for frequent model rebuilds. In the IDC prototype, we schedule periodic model
updates as new data is accumulated. A model rebuild is also triggered when the performance accuracy falls
below a predefined level.
3.5. Model distribution
In DAID, model distribution is greatly simplified since models are not only stored in the database but are
also executed in it as well. Models are periodically updated by scheduling automatic builds. The newly
generated models are then automatically deployed to multiple database instances. Noel et al. [19] point out a
recent trend towards distributed intrusion detection systems where a single IDS monitors a number of network
nodes and the monitored information is transferred to a centralized site. Examples of distributed architectures
can be found in [26, 24, 23, 3]. IDSs implemented using a framework based on the Relational RDBMS can
transparently leverage Relational grid computing infrastructure Real Application Clusters (RAC). Grids make
possible pooling of available servers, storage, and networks into a flexible on-demand computing resource
capable of achieving scalability and high availability [14, 4]. RAC allows a single Relational database to be
accessed by concurrent database instances running across a group of independent servers (nodes). The RAC
nodes share a single view of the distributed cache memory for the entire database. IDS built on top of RAC can
successfully leverage server load balancing (distribution of workload across nodes) and client load balancing
(distribution of new connections among nodes). Transparent application and connection failover mechanisms
are also available, thus ensuring uninterruptible system uptime. Figure 3 illustrates the architecture of an IDS
running on an Relational RAC system. Audit data are collected from Local Area Networks (LANs). The audit
data stream is monitored in one of the available database instances running on the Relational RAC nodes. In
addition to localized monitoring, the shared storage system allows pooling together of information from
different sources and enables detection of system-wide attack patterns. A grid-enabled RDBMS-based IDS
needs to be seamlessly integrated with a scheduling infrastructure. Such an infrastructure enables scheduling,
management, and monitoring of model build and deployment jobs. Although the IDC prototype has not yet been
deployed on a RAC system, we use Relational Scheduler a scheduling system that meets the above
requirements for management of the model build and deployment jobs.
3.6. Detection
Intrusion detection can be performed either real time or offline. An IDS typically handles large volumes of
streaming audit data. Real-time detection and alarm generation capabilities are critical for the instrumentation of
an effective system. ADAM [1] and MAIDS [2] are examples of data mining IDSs addressing issues in realtime detection. In the context of DAID, an effective real-time detection mechanism can be implemented by
leveraging the parallelism and scalability of the Relational database. This removes the need for a system
developer to design and implement such infrastructure. Figure 1 shows detection as a separate entity. However,
inside the database, detection can be tightly integrated, through SQL, within the ETL process itself (indicated by
the dashed box in Figure 1). The following example demonstrates how this integration is achieved in the IDC
prototype. We make use of Relational Oracle10g Release 2 PREDICTION SQL operator. The audit data are
classified as attack or not by the misuse detection SVM model. The model scoring is part of a database INSERT
statement:
INSERT INTO attack_predictions (id, prediction) VALUES(10001,
PREDICTION( misuse_model USING 'tcp' AS protocol_type,'ftp' AS service, ... 'SF' AS flag, 27 AS
duration));
In addition to real-time detection, it is useful to perform offline scoring of stored audit data. This provides
an assessment of model performance, characterizes the type and volume of malicious activity, and assists in the
discovery of unusual patterns. Having detection cast as an SQL operator allows powerful database features to be
leveraged. In our prototype, we create a functional index on the probability of a case being an attack. The
following SQL code snippet shows the index creation statement:
CREATE INDEX attack_prob_idx ON audit_data
(PREDICTION_PROBABILITY( anomaly_model, 0 USING *));
www.ijres.org
26 | Page
The Practical Data Mining Model for Efficient IDS through Relational Databases
www.ijres.org
27 | Page
The Practical Data Mining Model for Efficient IDS through Relational Databases
www.ijres.org
28 | Page
The Practical Data Mining Model for Efficient IDS through Relational Databases
www.ijres.org
29 | Page
The Practical Data Mining Model for Efficient IDS through Relational Databases
As illustrated above, Relational Database 10g Release 2 already incorporates these key functionality elements.
By leveraging the existing technology stack, a full-fledged IDS can be developed in a reasonably short timeframe and at low development cost. The similarities between real-time intrusion detection and real-time fraud
detection (e.g., credit-card fraud, e-commerce fraud) suggest that DAID could be also applicable to these types
of applications. This will be investigated in a future work. We also plan to extend the IDC prototype to take
advantage of RAC and sensor data preprocessing at the ETL stage.
Reference
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]
[18]
[19]
[20]
[21]
[22]
[23]
[24]
[25]
Mukkamala, S., Janoski, G., and Sung, A., Intrusion Detection Using Neural Networks and Support Vector Machines, In Proc. of
IEEE International Joint Conference on Neural Networks, IEEE Press, 2002, pp. 1702-1707.
Hu, W., Liao, Y., and Vemuri, V. R., Robust Support Vector Machines for Anomaly Detection in Computer Security, In Proc.
International Conference on Machine Learning and Applications, 2003, pp. 168-174.
Barbar, D., Couto, J., Jajodia, S., Popyack, L., and Wu,N., ADAM: A Testbed for Exploring the Use of Data Mining in
Intrusion Detection, ACM SIGMOD Record, 30(4), 2001, pp. 15-24.
DataDirect
Technologies,
Using
Oracle
Real
Application
Clusters
(RAC),
http://www.datadirect.com/techzone/odbc/docs/odbc_oracle_rac.pdf, 2004.
DBMS_DATA_MINING PL/SQL package, Oracle10g PL/SQL packages and types reference, Ch. 23, Oracle Corporation, 2003.
Eskin, E., Arnold, A., Prerau, M., Portnoy, L., and Stolfo, S. J., A Geometric Framework for Unsupervised Anomaly Detection:
Detecting Intrusions in Unlabeled Data, In D. Barbar and S. Jajodia (eds.), Applications of Data Mining in Computer Security,
Kluwer Academic Publishers, Boston, MA, 2002, pp. 78-99.
Zhang, Y., Lee, W., and Huang, Y.-A., Intrusion Detection Techniques for Mobile Wireless Networks, Wireless Networks, 9(5),
2003, pp. 545-556.
Youssif Al-Nashif, Aarthi Arun Kumar, Salim Hariri,Guangzhi Qu, Yi Luo and Ferenc Szidarovsky (2008) Multi-Level intrusion
detection system. IEEE, Intl. Conf. on Automonic Computing. pp:131-140.
Sangeetha S, Vaidehi V, Srinivasan N, Rajkumar KV, Pradeep S, Ragavan N, Sri Sai Lokesh C,Subadeepak I and Prashanth V
(2008) Implementation of application layer intrusion detection system using protocol analysis. IEEE-Intl Conf. on Signal
processing, Commun. & Networking .pp:279-284.
Ya-Li Ding, Lei Li and Hong-Qi Luo (2009) A novel signature searching for intrusion detection system using data mining. IEEE
8th Intl. Conf. on Machine Learning & Cybernetics. ISBN: 978-1-4244-3703-0.pp:122-126.
Lu Huijuan, Chen Jianguo and d Wei Wei (2008) Two stratum Bayesian network based anomaly detection model for intrusion
detection system. IEEE, Intl. Symp. on Electronic Commerce & Security.pp:482-487.
Su MY, Chang KC, Wei HF and Lin CY (2008) A real-time network intrusion detection system based on incremental mining
approach. IEEE.pp: 76- 81.
Joong-Hee Leet, Jong-Hyouk Leet, Seon-Gyoung Sohn, Jong-Ho Ryu, and Tai-Myoung Chungt (2008) Effective value of
decision tree with KDD 99 intrusion detection datasets for intrusion detection system.IEEE, ISBN: 978-89-5519-136-3.
G. Silo wash, D. Cappelli, A. Moore, R. Trzeciak, T. Shim all, and L. Flynn, Common sense guide to mitigating insider threats,
4th edition, Software Engineering Institute, Carnegie Mellon University, Tech.Rep. CMU/SEI-2012-TR-012, 2012.
M. Salem, S. Hers kop, and S. Stolfo, A survey of insider attack detection research, Insider Attack and Cyber Security, pp. 69
90, 2008.
http://www.academia.edu/4052668/Computer_and_Network_Security_Threats
http://www.clico.pl/services/Principles_Network_Security_Design.pdf
http://www.interhack.net/pubs/network-security.pdf
http://www.potaroo.net/t4/pdf/security.pdf
http://web.mit.edu/~bdaya/www/Network%20Security.pdf
http://web.mit.edu/~bdaya/www/Network%20Security.pdf
C. Thomas and N. Balakrishnan, Issues and Challenges in Intrusion Detection with Skewed Network Traffic, 2013.
M. O. Pervaiz, M. Cardei, and J. Wu, Routing security in ad hoc wireless networks, in Network Security, Springer, 2010, pp.
117142.
M. Hall, E. Frank, G. Holmes, B. Pfahringer,P. Reutemann, and I. H. Witten, The WEKA data mining software: an update,
ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 1018, 2009.
James Hoagland, The Teredo Protocol Tunneling Past Network Security and Other Security Implications - Google Search,
Symantec Advanced Threat Research, United States, 2006.
www.ijres.org
30 | Page