
INTEGRATING FUZZY LOGIC WITH
DATA MINING METHODS FOR
INTRUSION DETECTION
By
Jianxiong Luo
A Thesis
Submitted to the Faculty of
Mississippi State University
in Partial Fulfillment of the Requirements
for the Degree of Master of Science
in Computer Science
in the Department of Computer Science
Mississippi State, Mississippi
August 1999
INTEGRATING FUZZY LOGIC WITH
DATA MINING METHODS FOR
INTRUSION DETECTION
By
Jianxiong Luo
Approved:
Susan M. Bridges
Associate Professor of Computer Science (Advisor) and
Graduate Coordinator of the Department of Computer Science

Julia E. Hodges
Professor of Computer Science (Committee Member)

Rayford B. Vaughn, Jr.
Associate Professor of Computer Science (Committee Member)

A. Wayne Bennett
Dean of the College of Engineering
Name: Jianxiong Luo
Date of Degree: August 6, 1999
Institution: Mississippi State University
Major Field: Computer Science
Major Professor: Dr. Susan Bridges
Title of Study: INTEGRATING FUZZY LOGIC WITH DATA MINING METHODS
FOR INTRUSION DETECTION
Pages in Study: 69
Candidate for Degree of Master of Science
This report explores integrating fuzzy logic with two data mining methods
(association rules and frequency episodes) for intrusion detection. Data mining methods
are capable of extracting patterns automatically from a large amount of data. The
integration with fuzzy logic can produce more abstract and flexible patterns for intrusion
detection, since many quantitative features are involved in intrusion detection and
security itself is fuzzy. In this report, Chapter I introduces the concept of intrusion
detection and the practicality of applying fuzzy logic to intrusion detection. In Chapter II,
two types of intrusion detection systems, host-based systems and network-based systems,
are briefly reviewed. Some important artificial intelligence techniques that have been
applied to intrusion detection are also reviewed here, including data mining methods for
anomaly detection. Chapter III summarizes a set of desired characteristics for the
Intelligent Intrusion Detection Model (IIDM) being developed at Mississippi State
University. A preliminary architecture which we have developed for integrating machine
learning methods with other intrusion detection methods is also described. Chapter IV
discusses basic fuzzy logic theory, traditional algorithms for mining association rules,
and an original algorithm for mining frequency episodes. In Chapter V, the algorithms we
have extended for mining fuzzy association rules and fuzzy frequency episodes are
described. We add a normalization step to the procedure for mining fuzzy association
rules in order to prevent one data instance from contributing more than others. We also
modify the procedure for mining frequency episodes to learn fuzzy frequency episodes.
Chapter VI describes a set of experiments applying fuzzy association rules and fuzzy
episode rules to off-line anomaly detection and real-time intrusion detection. We use
fuzzy association rules and fuzzy frequency episodes to extract patterns for temporal
statistical measurements at a higher level than the data level. We define a modified
similarity evaluation function which is continuous and monotonic for the application of
fuzzy association rules and fuzzy frequency episodes in anomaly detection. We also
present a new real-time intrusion detection method using fuzzy episode rules. The
experimental results show the utility of fuzzy association rules and fuzzy frequency
episodes in intrusion detection. The conclusions are included in Chapter VII.
DEDICATION
I would like to dedicate this research to my family and my wife.
ACKNOWLEDGMENTS
I am deeply grateful to Dr. Susan Bridges for devoting so much time to directing
this entire research project and for guiding my graduate study and research work during
the last two years. In this research, she often gave me distinctive insights and guided
me back to the right path when I went astray. She always encouraged me to learn new things
and to think independently. I also thank her very much for her concern when I was
hospitalized. I owe heartfelt thanks to Dr. Julia Hodges, who introduced me to the
DIAL research group and taught me much in my research and graduate study. Her classes
are always interesting and full of humor. I also thank her very much for her
encouragement when I faced difficulties, as well as for her valuable trust. I am also deeply
indebted to Dr. Ray Vaughn for his direction in this project. He spent much time guiding
my project, offered many suggestions, and provided me with much useful information. I
also thank our system administrator, Mr. Gerhard Lehnerer. Without his time and effort,
my experiments would not have gone smoothly.
TABLE OF CONTENTS

DEDICATION
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES

CHAPTER

I. INTRODUCTION

II. LITERATURE REVIEW ON INTRUSION DETECTION
    2.1 Intrusion Detection Systems
        2.1.1 Host-Based Intrusion Detection
        2.1.2 Network-Based Intrusion Detection
    2.2 Artificial Intelligence and Intrusion Detection Methods
        2.2.1 Artificial Intelligence and Misuse Detection
            2.2.1.1 Rule-Based Expert System
            2.2.1.2 State Transition Analysis
            2.2.1.3 Genetic Algorithms
        2.2.2 Artificial Intelligence and Anomaly Detection
            2.2.2.1 Inductive Sequential Patterns
            2.2.2.2 Artificial Neural Networks
            2.2.2.3 Data Mining Methods
        2.2.3 Summary of AI and Intrusion Detection

III. AN INTELLIGENT INTRUSION DETECTION MODEL
    3.1 Expected Characteristics of IIDM
    3.2 Preliminary Architecture

IV. REVIEW OF FUZZY LOGIC AND DATA MINING
    4.1 Fuzzy Logic
    4.2 Data Mining Methods
        4.2.1 Association Rules
        4.2.2 Frequency Episodes

V. INTEGRATION OF FUZZY LOGIC WITH DATA MINING
    5.1 Fuzzy Association Rules
        5.1.1 Related Work
        5.1.2 Fuzzy Association Rules
    5.2 Fuzzy Frequency Episodes

VI. EXPERIMENTS AND RESULTS
    6.1 Anomaly Detection
        6.1.1 Experiment Set 1
        6.1.2 Experiment Set 2
    6.2 Real-time Intrusion Detection
        6.2.1 Experiment 3
        6.2.2 Experiment 4
        6.2.3 Experiment 5

VII. CONCLUSION

REFERENCES
LIST OF TABLES

2.1 Summary of AI Techniques and Intrusion Detection
6.1 Specification of Training and Test Data Sets
6.2 Effects of the minconfidence Threshold on the False Positive Error Rate (FPER) and the False Negative Error Rate (FNER)
LIST OF FIGURES

3.1 Architecture of an Intelligent Intrusion Detection Model
4.1 Singleton Representation of a Fuzzy Set
4.2 Standard Function Representation of Fuzzy Sets
4.3 Calculation Method of Fuzzy Set Values
4.4 Agrawal and Srikant's Apriori Candidate Generation Algorithm (1994)
4.5 Flow Chart Depiction of Agrawal and Srikant's Algorithm Apriori (1994)
4.6 Candidate Generation Algorithm Based on Work of Mannila and Toivonen (1996)
5.1 Example of Sharp Boundary Problem
5.2 Candidate Generation Algorithm for Fuzzy Association Rules
6.1 Specification of Temporal Statistical Measurements Used in the Experiments
6.2 Comparison of Similarities Between Different Training and Test Data Sets for Fuzzy Association Rules
6.3 Comparison of Similarities Between 3 Hour Training Data Set and Test Data Sets for Fuzzy Episode Rules
6.4 Comparison of Similarities Between Different Training and Test Data Sets for Fuzzy Association Rules
6.5 Comparison of Similarities Between 3 Hour Training Data Set and Test Data Sets for Fuzzy Episode Rules
6.6 Comparison of Similarities Between Training Data Set and Different Test Data Sets for Fuzzy Association Rules
6.7 Comparison of Similarities Between Training Data Set and Different Test Data Sets for Fuzzy Episode Rules
6.8 Anomaly Percentages of Different Test Data Sets in Real-time Intrusion Detection
6.9 Distribution of the Feature PN with Time from Test Data Sets T1 (Representing Normal Behavior) and T4 (Representing Simulated mscan Intrusions)
6.10 Comparison of False Positive Error Rates of Fuzzy Episode Rules and Non-Fuzzy Episode Rules
CHAPTER I
INTRODUCTION
In recent years, computer security has become increasingly important and an
international priority. This is due to the wide use of computers, the emergence of
electronic commerce, and the rapid growth of computer networks. For example, a
Trojan horse on a computer host can perform illegal operations or even cause damage
because it masks itself as a valid program (Gasser 1988). In a computer network
running TCP/IP, IP spoofing helps intruders gain access to a remote host by guessing
its IP sequence numbers and then masquerading as a legitimate user. Since intrusions take
advantage of vulnerabilities in computer systems, intrusion detection methods are usually
developed to enforce the security policy of computer hosts and computer networks.
In a modern computer system, intrusion detection has become an essential and
critical component. One of the reasons is that it is not technically feasible to build a
system without any vulnerabilities (Denning 1986). As a matter of fact, it is also very
difficult to test the security capabilities of a system since it is almost impossible to
incorporate all intrusion patterns. In addition, future attackers may use completely
unknown patterns which are unexpected and difficult to detect. On the other hand,
intrusions originating from authorized system users who choose to abuse their access
rights will not cause an alarm without the use of intrusion detection methods (Denning
1986).
There are two types of intrusion detection: misuse detection and anomaly
detection (Sundaram 1996). Misuse detection can be applied to the attacks that generally
follow some fixed patterns. For example, three consecutive login failures are likely to be
one of the important characteristics of password guessing. Misuse detection is usually
designed to examine intrusion patterns that have been recognized and reported by
experts. However, intruders do not always follow publicly known patterns to break into a
computer system. They will often try to mask their illegal behavior to deceive the
detection system. Anomaly detection methods are designed to counter this kind of
challenge. Unlike misuse detection that is based on attack patterns, anomaly detection
tries to find patterns of normal behavior, with the assumption that an intrusion will
usually include some deviation from this normal behavior. Observation of this deviation
will then result in an intrusion alarm.
Artificial intelligence (AI) techniques have played an important role in both
misuse detection and anomaly detection. AI techniques can be used for data reduction
and classification tasks (Frank 1994). For example, many intrusion detection systems
have been developed as rule-based expert systems. An example is SRI's Intrusion
Detection Expert System (IDES) (Lunt and Jagannathan 1988). The rules for detection
can be constructed based on the knowledge of system vulnerabilities or known attack
patterns. On the other hand, AI techniques also have the capability of learning inductive
rules. For example, sequential patterns can be learned by a system such as the Time-
based Inductive Machine (TIM) for intrusion detection (Teng, Chen, and Lu 1990).
Neural networks can be used to predict future intrusions after training (Debar, Becker,
and Siboni 1992). Data mining methods, such as association rules and frequency
episodes, have been also proposed to mine normal patterns from audit data (Lee, Stolfo,
and Mok 1998).
However, if a rule is directly dependent on audit data, there is very little
flexibility in this one-to-one (rule-to-audit record) representation (Ilgun and Kemmerer
1995). For example, an intrusion with a very small deviation from the patterns
represented in the rules may not be matched and recognized. To improve the flexibility of
an intrusion detection system, this thesis describes a method for integrating fuzzy logic
with data mining methods for intrusion detection.
There are two main reasons to introduce fuzzy logic for intrusion detection. First,
many quantitative features are involved in intrusion detection. SRI's Next-generation
Intrusion Detection Expert System (NIDES) categorizes security-related statistical
measurements into four types: ordinal, categorical, binary categorical, and linear
categorical (Lunt 1993). Ordinal measurements and linear categorical measurements are
quantitative features which can potentially be viewed as fuzzy variables. For instance, the
CPU usage time and the connection duration are two examples of ordinal measurements.
An example of a linear categorical measurement is the number of different TCP/UDP
services initiated by the same source host. The second reason to introduce fuzzy logic for
intrusion detection is that security itself includes fuzziness. Given a quantitative
measurement, an interval can be used to denote the range of normal values. Then
any value falling outside the interval will be considered anomalous to the same degree
regardless of its distance from the interval. The same applies to values inside
the interval, i.e., all will be viewed as normal to the same degree. Unfortunately, this causes
an abrupt separation between normality and anomaly. For example, a value inside the
border is assumed normal while another value outside the border is assumed abnormal
even though there is only a very small difference between these two values. The
introduction of fuzziness to these quantitative features will help to smooth the abrupt
separation. The hypothesis of this research is that fuzzy logic is capable of producing
more general rules which will increase the flexibility of intrusion detection systems.
This thesis will investigate the integration of fuzzy logic with association rules
and frequency episodes with the purpose of improving the performance of an intrusion
detection system.
CHAPTER II
LITERATURE REVIEW ON INTRUSION DETECTION
Intrusions were first categorized by J. P. Anderson (Lunt 1993). They can be
largely classified into three types: external intrusions, internal intrusions, and misfeasors.
An external intrusion tries to break into a computer system without appropriate access
rights. An internal intrusion originates from a valid user inside a computer system. A
masquerader is an internal intruder who logs into the system by using other users'
accounts. A clandestine user is also an internal intruder, one who deceives the system and performs
illegal operations. A misfeasor usually abuses his or her authority in the use of a
computer system.
Accordingly, intrusion detection can be defined as detecting outside intruders
who are using a computer system without authorization and inside intruders who have
legitimate access to the system but are abusing their privileges (Mukherjee, Heberlein,
and Levitt 1994). Intrusion detection systems are usually built to identify such
unauthorized behavior by outside or inside intruders and to enforce the security of
computer systems.
2.1 Intrusion Detection Systems
Generally speaking, there are two types of intrusion detection systems: host-based
intrusion detection systems and network-based intrusion detection systems.
2.1.1 Host-Based Intrusion Detection
A generic intrusion detection model proposed by Denning (1986) works as a rule-
based pattern matching system which includes the following six components:
1. Subjects: A subject is the initiator of an action being performed on the host,
e.g., a user or the host itself.
2. Objects: An object is the receptor of an action, e.g., a system file or a
system device.
3. Audit records: An audit record is used to represent an action initiated by the
subject and that occurred on the object. Some quantitative measurements on
the action are also included in the audit record, e.g., CPU usage time or I/O
activity.
4. Profiles: A profile is the signature or description of normal activity of a
subject or a group of subjects concerning an object or a group of objects, e.g.,
a profile on the CPU usage of a user session or a profile on the CPU usage of
a program. Several statistical models can also be included to calculate these
quantitative measurements in these profiles. Examples include the mean and
standard deviation model, the Markov process model, and the time series model.
5. Anomaly records: An anomaly record is used to record an anomalous event
that has been detected.
6. Activity rules: An activity rule describes what action will be taken under some
conditions. For example, when a new audit record is created, the
corresponding profile will be updated automatically.
So, intrusion detection tasks can be conducted by checking the similarity between
the current audit record and the corresponding profiles. If the current audit record
deviates from the normal patterns enough, it will be considered an anomaly. This process
occurs in real time.
Denning's intrusion detection model is the basis of SRI's IDES (Lunt and
Jagannathan 1988). SRI's IDES has two components: the statistical anomaly detector and
the expert system (Mukherjee, Heberlein, and Levitt 1994). Based on Denning's model,
the first component detects anomalies by applying statistical methods, i.e., the
normal patterns are constructed by statistical analysis and anomalous intrusions
are detected on the assumption that there will always be some differences between normal
patterns and intrusions. The expert system component of SRI's IDES is constructed as a
rule-based system and is used to detect the intrusions whose patterns are already known.
2.1.2 Network-Based Intrusion Detection
With the proliferation of computer networks, more and more individual hosts are
connected into LANs of small scale or WANs of large scale. However, the hosts, as well
as the networks, are exposed to intrusions due to the vulnerabilities of network devices
and network protocols. For example, a bastion host is a host which exposes itself to the
Internet since its address is publicly known (Chapman and Zwicky 1995). The TCP/IP
protocol can also be exploited by network intrusions such as IP spoofing, port scanning,
and so on. So, network-based intrusion detection has become increasingly important and
is designed to protect a computer network as well as all of its hosts. Packet filtering, for
example, can decide what kind of data will be accepted or rejected for transfer through a
computer network based on routing information found in packet headers (Chapman and
Zwicky 1995). The installation of a network-based intrusion detection system can also
decrease the burden of the intrusion detection task on every individual host.
To detect network-based intrusions, a network security monitor (NSM) has been
proposed by Heberlein et al. (1990), which has a hierarchical architecture composed of
the following five layers (from lowest to highest):
1. Packet catcher: It will monitor network traffic, catch every packet, and send it
to the next layer.
2. Parser: It will analyze every incoming packet, summarize the security-related
information into a four dimensional vector of <source address, destination
address, service, connection ID>, and pass it to the next layer.
3. Matrix generator: A corresponding four-dimensional matrix is maintained.
Since the connection ID is unique, every connection will be represented by
one cell in the matrix. A cell usually stores two measurements: the number of
packets and the total data bytes transferred in one connection.
4. Matrix analyzer: Since the matrix actually represents the network traffic, the
matrix analyzer will compare it with the normal patterns by use of a
masking method. Anomaly intrusions will be detected because they will not
be masked by normal patterns.
5. Matrix archiver: It will store the matrix at intervals, e.g., every fifteen
minutes. These matrices can then be used to construct normal patterns of
network traffic.
NSM detects network anomalies by monitoring network traffic. For misuse
detection, LANL's (Los Alamos National Laboratory) NADIR (Network Anomaly
Detection and Intrusion Reporter) is built as a rule-based expert system through audit
analysis and consultation with security experts (Mukherjee, Heberlein, and Levitt 1994).
2.2 Artificial Intelligence and Intrusion Detection Methods
There are two types of intrusion detection methods: misuse detection and anomaly
detection. Misuse detection is based on the knowledge of system vulnerabilities and the
known attack patterns, while anomaly detection assumes that an intrusion will always
reflect some deviations from normal patterns. Many artificial intelligence techniques
have been applied to both misuse detection and anomaly detection.
2.2.1 Artificial Intelligence and Misuse Detection
Since misuse detection is used to identify the intrusions whose patterns are
known, pattern matching is a direct and efficient way to implement it.
2.2.1.1 Rule-Based Expert System
A known intrusion pattern can be easily represented by rules such as production
rules in the form of if-then-else. A rule-based expert system will also facilitate the
process of pattern matching. This is the reason many intrusion detection systems are
developed as rule-based expert systems or include rule-based inference components, such
as SRI's IDES and LANL's NADIR. The efficiency of pattern matching is one of
the most remarkable advantages for a rule-based expert system. When it is used in misuse
detection, activation of more rules raises the level of suspicion.
Ilgun and Kemmerer (1995), however, have pointed out that one obvious
disadvantage of a rule-based expert system is its direct dependency on audit data: an
intrusion scenario that differs only slightly from a known one will produce different audit data,
which will prevent the rule-based expert system from recognizing the intrusion.
2.2.1.2 State Transition Analysis
The State Transition Analysis Tool (STAT) proposed by Ilgun and Kemmerer
(1995) is another form of rule-based detection method. STAT first extracts a high-level
representation of the audit trail, which is called signature action, from raw audit data. An
intrusion pattern is represented by a sequence of state transitions from the initial state to
the final state. Each state represents the system's current situation, and the transition
between two states is activated by a signature action. A STAT rule has three parts: a
state description field, a signature action field, and a rule dependence field (Ilgun and
Kemmerer 1995).
The main advantage of STAT is that it can represent an intrusion pattern at a
higher level than the audit data level, as well as in a sequential way, i.e., as a series of
state transitions. However, the construction of a state transition diagram is not as
straightforward as the construction of a rule-based expert system.
2.2.1.3 Genetic Algorithms
Genetic Algorithm for Simplified Security Audit Trails Analysis (GASSATA)
proposed by Me (1998) introduces genetic algorithms, a sub-symbolic AI technique, for
misuse intrusion detection. GASSATA will construct a two-dimensional matrix. One axis
of the matrix specifies different attacks already known. The other axis represents
different kinds of events derived from audit trails, i.e., the features of these attacks. So,
this matrix actually represents the patterns of intrusions. A cell in the matrix, e.g., [Ei,
Aj], reflects the number of events Ei that will occur in an attack Aj. Given an audit record
being monitored which includes information about the number of occurrences of every
event, GASSATA will apply genetic algorithms to find the potential attacks appearing in
this audit record. Experiments with genetic algorithms have shown good results after
evolving only 10 epochs (Me 1998). However, this method's assumption that attacks
depend only on events will restrict its generality.
2.2.2 Artificial Intelligence and Anomaly Detection
Statistical analysis has been widely used in anomaly detection (Denning 1986;
Lunt and Jagannathan 1988). On the other hand, many AI techniques also can be applied
to anomaly detection.
2.2.2.1 Inductive Sequential Patterns
The Time-based Inductive Machine (TIM) has been proposed by Teng, Chen, and
Lu (1990) to learn sequential patterns automatically from audit data for real-time
anomaly detection. The format of the sequential rules inferred from audit trails by TIM
can be illustrated with the following example: A → B → (C = 90%; D = 10%). This rule is
interpreted to mean that if event A is directly followed by event B, then the next event
will be C or D with the probabilities of 90% and 10%, respectively. Then any event
sequence that does not match the normal sequential patterns inductively learned by TIM
will be marked as an anomaly. For example, given the normal pattern A → B → C →
(D = 100%), the sequence A → B → C → E will be flagged as an anomaly because a
different event E (instead of D) has occurred while the conditions of the normal pattern
have been matched.
The main advantage of introducing an inductive learning mechanism to anomaly
detection is that sequential patterns can be learned automatically and updated adaptively
since new audit data can be used to train the system to find new normal patterns.
2.2.2.2 Artificial Neural Networks
Artificial neural networks have been suggested for use in conjunction with expert
systems to detect anomalies (Debar, Becker, and Siboni 1992). The backpropagation
algorithm is used to learn time series. For example, after appropriate training, the
backpropagation network will be able to predict the next command given a sequence of
user commands. Then if the command observed in the audit record is different from that
predicted by the neural network, an alarm will be raised for a potential anomaly.
An outstanding advantage of artificial neural networks is that they are highly
tolerant of noisy data. Even an incomplete or inaccurate audit record will not prevent a
neural network from detecting intrusions (Debar, Becker, and Siboni 1992).
2.2.2.3 Data Mining Methods
Like TIM, data mining methods can be also used to extract normal patterns from
training data automatically and adaptively. Two data mining methods, association rules
and serial frequency episodes, have been proposed for audit data gathering, feature
selection, and off-line analysis for anomaly detection (Lee, Stolfo, and Mok 1998). An
association rule specifies the correlation among different features. A serial frequency
episode represents a sequential pattern repeatedly occurring in the event sequence. An
advantage here is that both the patterns among different features and the patterns among
sequential events can be exploited.
2.2.3 Summary of AI and Intrusion Detection
Table 2.1 summarizes different AI techniques that have been used for intrusion
detection.
Table 2.1
Summary of AI Techniques and Intrusion Detection

Misuse Intrusion Detection
    Rule-Based Expert Systems
        Pros: Efficiency of pattern matching; ease of construction
        Cons: Dependency on audit data
    State Transition Analysis
        Pros: General rules at a higher level; sequential rules
        Cons: Effort of construction
    Genetic Algorithms
        Pros: Efficiency of pattern matching
        Cons: Less general

Anomaly Intrusion Detection
    Inductive Sequential Patterns
        Pros: Automatic and adaptive learning
        Cons: A large amount of training time
    Artificial Neural Networks
        Pros: Tolerance of noisy data
        Cons: A large amount of training time
    Data Mining Methods
        Pros: Automatic and adaptive learning; more powerful rules
        Cons: A large amount of training time
CHAPTER III
AN INTELLIGENT INTRUSION DETECTION MODEL
A research group at Mississippi State University is investigating the development
of an intelligent intrusion detection model (IIDM) that applies artificial intelligence
techniques and data mining methods.
3.1 Expected Characteristics of IIDM
The expected characteristics of IIDM are specified below.
(1) Efficient: One of the most important characteristics for an intrusion detection
system is its efficiency. An efficient intrusion detection system is able to correctly predict
an attack as well as correctly recognize a normal operation. Two quantitative
measurements are generally used to evaluate the efficiency of an intrusion detection
system: a false positive rate and a false negative rate (Crosbie and Spafford 1995). The
false positive rate is the error rate when an intrusion detection system wrongly predicts
normal behavior as an abnormal attack. Similarly, the false negative rate is the error rate
when an intrusion detection system marks an intrusion as a legal operation. A high false
positive rate will seriously affect the performance of the system being detected. A high
false negative rate will leave the system vulnerable to intrusions. So, both the false
positive rate and the false negative rate should be minimized in IIDM.
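As a simple illustration of these two rates (my own formalization of the prose definitions above; the variable names and counts are invented, not from the thesis), they can be computed as fractions of each class:

def error_rates(num_normal, false_alarms, num_intrusions, missed_intrusions):
    """False positive and false negative rates as fractions of the respective classes."""
    false_positive_rate = false_alarms / num_normal           # normal behavior flagged as attacks
    false_negative_rate = missed_intrusions / num_intrusions  # intrusions accepted as legal operations
    return false_positive_rate, false_negative_rate

print(error_rates(num_normal=1000, false_alarms=20, num_intrusions=50, missed_intrusions=5))
# (0.02, 0.1)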
(2) Intelligent: An intrusion detection system should be sufficiently intelligent to
avoid being deceived by intrusions. For example, if an intrusion detection system is
designed to recognize three consecutive login failures as a potential attack, an intruder
can avoid this kind of routine check by always doing one or two consecutive login trials
instead of three. Another example is subversion (Crosbie and Spafford 1995). Intruders
may take some actions over a period of time. Each of these actions looks legal and safe if
taken separately, but the sequence of these actions will compose a malicious intrusion. It
is clear that an intelligent intrusion detection system should have enough flexibility to
generalize patterns, even over a period of time. The integration of fuzzy logic with data
mining methods is used to increase the flexibility of IIDM.
(3) Adaptive: IIDM will incorporate a machine learning component that works as
a background unit and learns normal patterns automatically from system audit data or
network traffic data. The learning algorithms integrate fuzzy logic with association rules
and serial frequency episodes and are implemented in a reusable way, i.e., they can be
used to mine normal patterns from different sets of training data. Furthermore, the
learning process is also an iterative and incremental procedure. New training data can be
used to mine new normal patterns and the old patterns can be updated by combining these
new patterns. This iterative and incremental learning process will make the intrusion
detection system more adaptive.
(4) Modular: Due to the complexity of intrusion detection, it is usually not
sufficient for an intrusion detection system to use only misuse detection methods or
anomaly detection methods. Accordingly, IIDM will include both of them in its core
component. In detail, the detection methods will be implemented as a set of intrusion
detection modules. An intrusion detection module may address only one or even a dozen
types of intrusions. Several intrusion detection modules may also cooperate to detect an
intrusion in a loosely coupled way since these detection modules are relatively
independent. Different modules may use different methods. For instance, one module can
be implemented as a rule-based expert system and another module can be constructed as
a neural network classifier. On the whole, this modular structure will ease future system
expansion and upgrades since a module can be easily added, modified, or removed.
(5) Distributed: With the rapid growth of computer networks and distributed
systems, network-based intrusion detection is necessary. Accordingly, IIDM will be
network-oriented. The intrusion detection sentries will collect and preprocess real-time
system audit data or network traffic data, as well as communicate with the
communication module in the core component. So, through the communication module,
the collected data can be passed to the decision-making module and intrusion detection
modules for further analysis, and the evaluation results can be fed back to the sentries.
(6) Real-time: In IIDM, the intrusion detection modules, the decision-making
module, the communication module, and the intrusion detection sentries will work
together to conduct real-time detection, while the machine learning component will work
off-line.
3.2 Preliminary Architecture
A preliminary architecture for IIDM is shown in Figure 3.1.
[Figure 3.1: Architecture of an Intelligent Intrusion Detection Model. The diagram shows intrusion detection sentries, each resident at a host or network device, collecting network traffic or audit data and communicating with the core component. The core component contains a communication module, a decision-making module, and a set of intrusion detection modules for both misuse detection and anomaly detection. A machine learning component (mining fuzzy association rules and fuzzy frequency episodes) runs as a background unit, with input from experts and the administrator.]
The functionality of each unit is briefly explained below:
(1) Machine Learning Component: With the purpose of learning rules that are
more abstract and less dependent directly on audit data, fuzzy logic will be
integrated with association rules and frequency episodes. The machine
learning component will automatically learn fuzzy association rules and fuzzy
frequency episodes from system audit data or network traffic data for anomaly
detection.
(2) Anomaly Intrusion Detection Module: This component will evaluate the
deviation from normal patterns for an observed audit trail.
(3) Misuse Intrusion Detection Module: Based on the knowledge of system
vulnerabilities and expert advice, a misuse intrusion detection module can be
built to detect known attacks.
(4) Decision-Making Module: It has two roles. Given an observed audit trail, it
will decide which intrusion detection modules (misuse or anomaly) will be
activated. On the other hand, it will also integrate the evaluation results from
different detection modules and generate an overall evaluation on the
suspiciousness of the observed audit trail.
(5) Communication Module: It is the bridge between the decision-making module
and the intrusion detection sentries. The observed audit trail that has been
preprocessed by detection sentries can be sent to the decision-making module
for intrusion evaluation; the feedback can be also returned to the detection
sentries.
(6) Intrusion Detection Sentry: This component will collect real-time system audit
data or network traffic data and perform some preprocessing tasks. A sentry is resident
at each host or at a host's network interface device.
CHAPTER IV
REVIEW OF FUZZY LOGIC AND DATA MINING METHODS
Based on fuzzy set theory, fuzzy logic provides a powerful way to categorize a
concept in an abstract way by introducing vagueness. On the other hand, data mining
methods are capable of extracting patterns automatically from a large amount of data.
The integration of fuzzy logic with data mining methods will help to create more abstract
patterns at a higher level than at the data level. Decreasing the dependency on data will
be helpful for patterns used in intrusion detection.
This chapter reviews the literature on fuzzy logic and on two data mining methods:
association rules and frequency episodes.
4.1 Fuzzy Logic
Traditionally, a standard set like S = {a, b, c, d, e} represents the fact that every
member totally belongs to the set S. However, there are many concepts that have to be
expressed with some vagueness. For instance, "tall" is fuzzy in the statement "John's
height is tall" since there is no clear boundary between "tall" and "not tall" (Stefik 1995;
Hodges, Bridges, and Yie 1996).
Fuzzy set theory established by Lotfi Zadeh is the basis of fuzzy logic (Stefik
1995). A fuzzy set is a set whose members belong to it with a degree between 0 and 1. For
example, S = {(a 0), (b 0.3), (c 1), (d 0.5), (e 0)} is a fuzzy set in which a, b, c, d, and e
have membership degrees in the set S of 0, 0.3, 1, 0.5, and 0, respectively. So, it is
absolutely true that a and e do not belong to S and that c does belong to S, but b and d are
only partial members of the fuzzy set S.
A fuzzy variable (also called a linguistic variable) can be used to represent these
concepts associated with some vagueness. A fuzzy variable will then take a fuzzy set as a
value, which is usually denoted by a fuzzy adjective. For example, height is a fuzzy
variable and tall is one of its fuzzy adjectives, which can be represented by a fuzzy set
(Stefik 1995; Hodges, Bridges, and Yie 1996).
A standard fuzzy logic system, FuzzyCLIPS, provides several methods to
represent a fuzzy set. These include singleton representation, standard function
representation, and linguistic expression representation (Orchard 1995).
In the singleton representation, a fuzzy set consists of a sequence of points, each
of which is associated with a membership degree. Given a fuzzy set {(x_1 μ_1), (x_2 μ_2),
..., (x_n μ_n)} where for all i, 1 ≤ i < n, x_i ≤ x_{i+1}, each pair of consecutive points is
linked by a straight line (Orchard 1995). Accordingly, the above example of the fuzzy set
S = {(a 0), (b 0.3), (c 1), (d 0.5), (e 0)} will look like Figure 4.1.
FuzzyCLIPS also provides three standard functions S, PI, and Z to represent fuzzy
sets. Their graphical shapes and formal definitions are shown in Figure 4.2 (Orchard
1995).
[Figure 4.1 Singleton Representation of a Fuzzy Set: the points (a 0), (b 0.3), (c 1), (d 0.5), (e 0) plotted on a membership scale from 0 to 1 and connected by straight lines.]
Moreover, FuzzyCLIPS provides many linguistic hedges to modify fuzzy
concepts. These include not, very, somewhat, more-or-less, extremely, and so on
(Orchard 1995). The following is an example from Orchard (1995) of how different
levels of temperature can be represented using fuzzy sets in FuzzyCLIPS.
( deftemplate temperature
0 100 C
( ( cold ( z 10 26 ) )
( hot ( s 37 60 ) )
( warm not [ hot or cold ] ) ) )
Figure 4.2 Standard Function Representation of Fuzzy Sets

    S(x; a, c) = 0,                          if x ≤ a
               = 2((x − a)/(c − a))²,        if a < x ≤ (a + c)/2
               = 1 − 2((x − c)/(c − a))²,    if (a + c)/2 < x ≤ c
               = 1,                          if x > c

    Z(x; a, c) = 1 − S(x; a, c)

    PI(x; d, b) = S(x; b − d, b),            if x ≤ b
                = Z(x; b, b + d),            if x > b
Here, temperature is a fuzzy variable and cold, hot, and warm are fuzzy
adjectives. The minimum value for temperature is 0 and the maximum value is 100.
C is the unit of these values.
Therefore, given a fuzzy variable and all its fuzzy set definitions, the values of the
different fuzzy sets at a specific point can be easily calculated. For example, suppose the
fuzzy variable weight has its fuzzy set definitions as follows:
( deftemplate weight
0 15 pounds
( ( light (0 1) (5 1) (6 0.5) (8 0) )
( average (5 0) (8 1) (11 0) )
( heavy (8 0) (10 0.5) (11 1) (15 1) ) ) )
Then, as shown in Figure 4.3, given a specific value v, e.g., v = 7 pounds, its
membership degrees in the fuzzy sets light, average, and heavy, will be 0.25, 0.667, and
0, respectively.
[Figure 4.3 Calculation Method of Fuzzy Set Values: the three fuzzy sets of weight plotted over the range 0 to 15 pounds, with a vertical line at v = 7 intersecting the membership functions at 0.250 (light) and 0.667 (average).]
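The following Python sketch (not from the thesis; the function name is my own) reproduces this calculation by linearly interpolating between the singleton points of each fuzzy set:

def membership(points, v):
    """Piecewise-linear membership degree; points is a sorted list of (x, degree) pairs."""
    if v <= points[0][0]:
        return points[0][1]
    if v >= points[-1][0]:
        return points[-1][1]
    for (x1, m1), (x2, m2) in zip(points, points[1:]):
        if x1 <= v <= x2:
            return m1 + (v - x1) * (m2 - m1) / (x2 - x1)

weight = {
    "light":   [(0, 1), (5, 1), (6, 0.5), (8, 0)],
    "average": [(5, 0), (8, 1), (11, 0)],
    "heavy":   [(8, 0), (10, 0.5), (11, 1), (15, 1)],
}
for name, pts in weight.items():
    print(name, round(membership(pts, 7), 3))   # light 0.25, average 0.667, heavy 0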
4.2 Data Mining Methods
Data mining methods have the ability to find new patterns from a large amount of
data automatically. Two data mining methods, association rules and frequency episodes,
have been proposed to mine audit data to find normal patterns for anomaly intrusion
detection (Lee, Stolfo, and Mok 1998).
4.2.1 Association Rules
Association rules originate from retail data analysis in business. A piece of sales
data, also called basket data, usually records information about a transaction, such as
transaction date and transaction items (Agrawal and Srikant 1994). Association rules can
be used to find the correlation among different items in a transaction. For example, when
a customer buys item A, item B will also be purchased by the customer with the
probability of 90%. So, item B is associated with item A.
Agrawal and Srikant (1994) have presented some fast algorithms to mine
association rules, including the algorithm Apriori. In Agrawal and Srikant's algorithm
Apriori (1994), suppose D = {T_1, T_2, ..., T_n} is the transaction database with n
transactions in total and I = {i_1, i_2, ..., i_m} is the set of all the items, where each
i_j (1 ≤ j ≤ m) represents one kind of item. Each transaction T_l (1 ≤ l ≤ n) in D records the
items purchased, i.e., T_l ⊆ I. Define an itemset as a non-empty subset of I. An
association rule has the form X → Y, c, s, where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅, i.e., X and
Y are disjoint itemsets. Here s represents the support of this association rule and c
represents its confidence. Assuming the number of transactions that
contain both the itemset X and the itemset Y is n', then

    s = support(X ∪ Y) = n' / n    and    c = support(X ∪ Y) / support(X).

Briefly speaking, support(X) can be viewed as the occurrence
frequency of the itemset X in the whole transaction database D, while c means that when
X is satisfied, there is a certainty of c that Y is also true.
According to Agrawal and Srikant (1994), given the two thresholds minconfidence
(representing minimum confidence) and minsupport (representing minimum support), a
mining algorithm will find all association rules X → Y, c, s such that c ≥ minconfidence
and s ≥ minsupport. Define any itemset X as a large itemset if support(X) ≥ minsupport.
Usually, a mining algorithm involves two steps:

(1) Find all the large itemsets of different lengths.

(2) Construct association rules from every large itemset. This is actually a direct
mapping process. Given a large itemset L, for any non-empty subset L' of L, an
association rule can be constructed as L' → (L − L'), c', s', where
s' = support(L) ≥ minsupport and c' = support(L) / support(L') ≥ minconfidence.

Since the construction of association rules from large itemsets is a straightforward
process, algorithm Apriori focuses on how to find large itemsets.
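As a small illustration of these definitions (the transactions below are invented, not taken from the thesis), support and confidence can be computed directly in Python:

transactions = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]

def support(itemset):
    """Fraction of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

X, Y = {"A"}, {"B"}
s = support(X | Y)               # support of the rule X -> Y
c = support(X | Y) / support(X)  # confidence of the rule X -> Y
print(s, c)                      # 0.6 and 0.75: {A, B} occurs in 3 of 5 transactions, A in 4 of 5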
We use X^k = {item_1, item_2, ..., item_k} ⊆ I to represent an itemset of length k (1 ≤ k ≤
m), i.e., an itemset that contains k items. Agrawal and Srikant's algorithm Apriori (1994)
also defines L_k as the set of all the k-length large itemsets and C_k as a set of k-length
itemsets that are candidates for large itemsets, so C_k ⊇ L_k. Algorithm Apriori is based
on the following observation: any non-empty subset of a large itemset must be a large
itemset too. Suppose X^k is a large itemset and X^l ⊆ X^k (1 ≤ l < k). Since
support(X^l) ≥ support(X^k) ≥ minsupport, X^l must be a large itemset, too. So, C_k can
be directly constructed from L_{k-1} (k ≥ 2) as shown in the following algorithm.
    C_k = ∅;
    select {X^{k-1}.item_1, X^{k-1}.item_2, ..., X^{k-1}.item_{k-1}, Y^{k-1}.item_{k-1}} into C_k
    from L_{k-1}
    where (X^{k-1} ∈ L_{k-1}) and (Y^{k-1} ∈ L_{k-1})
        and (∀ j, 1 ≤ j ≤ k−2: X^{k-1}.item_j = Y^{k-1}.item_j)
        and (X^{k-1}.item_{k-1} < Y^{k-1}.item_{k-1});
    forall itemsets Z^k ∈ C_k do begin
        if (there exists a sub-itemset Z^{k-1} ⊂ Z^k such that Z^{k-1} ∉ L_{k-1})
            then C_k = C_k − {Z^k};
    end
    return C_k;

In this algorithm, it is also clear that given an itemset of length k, e.g.,
Z^k = {Z^k.item_1, Z^k.item_2, ..., Z^k.item_k}, there are k sub-itemsets of length k−1,
namely Z^k − {Z^k.item_l} for all l, 1 ≤ l ≤ k.

Figure 4.4 Agrawal and Srikant's Apriori Candidate Generation Algorithm (1994)
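A minimal Python sketch of this candidate generation step (itemsets are represented as sorted tuples; the function name and the example L_2 are mine, not the thesis's):

from itertools import combinations

def apriori_gen(L_prev, k):
    """Construct C_k from the set of (k-1)-length large itemsets (sorted tuples)."""
    L_prev = set(L_prev)
    candidates = set()
    # join step: combine two large itemsets that agree on their first k-2 items
    for x in L_prev:
        for y in L_prev:
            if x[:-1] == y[:-1] and x[-1] < y[-1]:
                candidates.add(x + (y[-1],))
    # prune step: every (k-1)-item subset of a candidate must itself be large
    return {c for c in candidates
            if all(sub in L_prev for sub in combinations(c, k - 1))}

L2 = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]
print(sorted(apriori_gen(L2, 3)))   # [('A', 'B', 'C')]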
On the other hand, according to Agrawal and Srikant (1994), given
C_k = {X^k_1, X^k_2, ..., X^k_{n_k}}, L_k can be constructed by simply scanning the transaction
database to calculate the support of every itemset in C_k, since
L_k = {X^k_j | (1 ≤ j ≤ n_k) ∧ (X^k_j ∈ C_k) ∧ (support(X^k_j) ≥ minsupport)}. Moreover, C_1 can
be directly initialized as {{i_1}, {i_2}, ..., {i_m}}. Details of algorithm Apriori are shown in
Figure 4.5.

Because one pass over the transaction database D is sufficient to construct L_k from
C_k, if the maximum length of large itemsets is K (K < m), the cost of this algorithm will
be K passes over D, or O(K·n).
The k-th pass constructs L_k from C_k as follows:

    forall transactions T ∈ D do begin
        forall itemsets X^k ∈ C_k do begin
            if (X^k ⊆ T) then X^k.count++;
        end
    end
    L_k = ∅;
    forall itemsets X^k ∈ C_k do begin
        if (X^k.count / n ≥ minsupport) then L_k = L_k + {X^k};
    end
    return L_k;

The flow chart begins with k = 1 and C_1 = {{i_1}, {i_2}, ..., {i_m}}. If L_k = ∅ after a pass,
the algorithm constructs the association rules from the large itemsets found so far and
terminates successfully; otherwise k is incremented, C_k is constructed from L_{k-1}, and the
next pass begins.

Figure 4.5: Flow Chart Depiction of Agrawal and Srikant's Algorithm Apriori (1994)
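Putting the pieces together, a compact sketch of this level-wise loop (using the apriori_gen function sketched above; the function name and transactions are illustrative, not the thesis's code):

def apriori(transactions, minsupport):
    """Return every large itemset (as a sorted tuple) together with its support."""
    n = len(transactions)
    C = [(i,) for i in sorted({i for t in transactions for i in t})]   # C_1 = {{i1}, ..., {im}}
    large, k = {}, 1
    while C:
        counts = {c: sum(set(c) <= t for t in transactions) for c in C}
        L = {c for c, cnt in counts.items() if cnt / n >= minsupport}
        large.update({c: counts[c] / n for c in L})
        k += 1
        C = apriori_gen(L, k)            # construct C_k from L_{k-1}
    return large

print(apriori([{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}], 0.6))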
4.2.2 Frequency Episodes
Mannila and Toivonen (1996) have proposed an algorithm to discover simple
serial frequency episodes from event sequences based on minimal occurrences. In
Mannila and Toivonen's method (1996), suppose S = {E_1, E_2, ..., E_n} is an event
sequence with n events in total and A = {a_1, a_2, ..., a_m} is the set of all the event
attributes. Each event E = {E.a_1, E.a_2, ..., E.a_m} in S consists of m values, one for each
event attribute. E is also associated with a timestamp denoted by E.T. Then a simple
serial episode P = (e_1, e_2, ..., e_k) represents a sequential occurrence of k event variables,
where each e_i (1 ≤ i ≤ k) is an event variable and for all i and j (1 ≤ i < j ≤ k), e_i.T < e_j.T.
Usually, k is much smaller than n, so 1 ≤ k << n. We use e^q to represent an event
variable consisting of q event attributes, i.e., e^q = {attr_1 = v_1, attr_2 = v_2, ..., attr_q = v_q},
where {e^q.attr_1, e^q.attr_2, ..., e^q.attr_q} ⊆ A and 1 ≤ q ≤ m. In addition, each v_i (1 ≤ i ≤ q) is
a value from the domain of attribute attr_i. So, e^q is said to have an occurrence in an
event E if for all i (1 ≤ i ≤ q), E.(e^q.attr_i) = v_i.

According to Mannila and Toivonen (1996), given a time interval [t, t'], an
episode P = (e_1, e_2, ..., e_k) is said to occur at interval [t, t'] if t ≤ e_1.T and e_k.T ≤ t'. Define
an occurrence of P at interval [t, t'] as minimal if there does not exist
another occurrence of P at a subinterval [u, u'] ⊂ [t, t']. Given a
threshold window (representing a timestamp bound), the frequency of P = (e_1, e_2, ..., e_k)
is defined as frequency(P) = |{[t, t'] | (t' − t ≤ window) and the occurrence of P at interval
[t, t'] is minimal}|. Briefly speaking, the frequency of P in the event
sequence S is the total number of its minimal occurrences in intervals smaller than
window. So, given another threshold minfrequency (representing minimum frequency), an
episode P = (e_1, e_2, ..., e_k) is called frequent if frequency(P) / (n − k + 1) ≥ minfrequency. Since in our
domain k << n (k is usually much smaller than n), frequency(P) / (n − k + 1) ≈ frequency(P) / n
holds. Therefore, in our implementation, an episode is considered frequent if
frequency(P) / n ≥ minfrequency.
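A possible Python sketch of counting these minimal occurrences for a simple serial episode (this is my own illustration, not Mannila and Toivonen's algorithm; events are assumed to be (timestamp, event) pairs sorted by timestamp, and each episode element is a predicate on an event):

def minimal_occurrences(events, episode, window):
    """Minimal occurrence intervals [t, t'] of a serial episode, with t' - t <= window."""
    k = len(episode)
    result, last_start = [], None
    for j, (t_end, ev) in enumerate(events):
        if not episode[-1](ev):
            continue
        # latest possible start of an occurrence ending at this event:
        # match e_{k-1} .. e_1 backwards, greedily taking the latest matching events
        i, pos, start = k - 2, j - 1, t_end
        while i >= 0 and pos >= 0:
            if episode[i](events[pos][1]):
                start = events[pos][0]
                i -= 1
            pos -= 1
        if i >= 0:
            continue                       # no complete occurrence ends at this event
        if last_start is not None and start <= last_start:
            continue                       # would contain an earlier minimal occurrence
        last_start = start
        if t_end - start <= window:        # count only occurrences within the window bound
            result.append((start, t_end))
    return result

For instance, with events = [(1, "a"), (2, "b"), (3, "a"), (4, "b")] and episode = [lambda e: e == "a", lambda e: e == "b"], the call minimal_occurrences(events, episode, 2) returns [(1, 2), (3, 4)].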
In Mannila and Toivonen's algorithm (1996), a frequency episode is similar to a
large itemset of an association rule. If an episode P = (e_1, e_2, ..., e_k) is frequent, any
non-empty sub-episode P' ⊆ P must be frequent, too, because frequency(P') ≥
frequency(P) ≥ minfrequency. So, the frequency episodes of size k can be directly
constructed from the set of frequency episodes of size k−1. One major difference between
an itemset and a simple serial episode is that the events in an episode are ordered. For
example, episode {E_1, E_2} is obviously different from episode {E_2, E_1}. Define L_k as
the set of all the k-size frequency episodes and C_k as the set of all the k-size episodes that
are candidates for frequency episodes. The following algorithm shows the
construction of C_k from L_{k-1} (k ≥ 2).
    C_k = ∅;
    select {P.e_1, P.e_2, ..., P.e_{k-1}, Q.e_{k-1}} into C_k
    from L_{k-1}
    where (P = (e_1, e_2, ..., e_{k-1}) ∈ L_{k-1}) and (Q = (e_1, e_2, ..., e_{k-1}) ∈ L_{k-1})
        and (∀ j, 2 ≤ j ≤ k−1: P.e_j = Q.e_{j-1});
    forall episodes R^k = (e_1, e_2, ..., e_k) ∈ C_k do begin
        if (there exists a sub-episode R^{k-1} = (e'_1, e'_2, ..., e'_{k-1}) ⊂ R^k such that R^{k-1} ∉ L_{k-1})
            then C_k = C_k − {R^k};
    end
    return C_k;

Figure 4.6 Candidate Generation Algorithm Based on Work of Mannila and Toivonen (1996)
For example, suppose L_2 = {(E_1, E_1), (E_1, E_2), (E_2, E_1)}. Then C_3 will be
{(E_1, E_1, E_1), (E_1, E_1, E_2), (E_1, E_2, E_1), (E_2, E_1, E_1)}. The episode (E_2, E_1, E_2) is deleted
from C_3 because one of its sub-episodes, (E_2, E_2), is not in L_2.
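A short Python sketch of this construction (serial episodes are represented as tuples; the code is illustrative and reproduces the example above):

def episode_candidates(L_prev, k):
    """Generate k-length candidate serial episodes C_k from the (k-1)-length frequent ones."""
    prev = set(L_prev)
    # join step: overlap the last k-2 events of P with the first k-2 events of Q
    joined = {p + (q[-1],) for p in prev for q in prev if p[1:] == q[:-1]}
    # prune step: every (k-1)-length ordered sub-episode must itself be frequent
    return {c for c in joined
            if all(c[:i] + c[i + 1:] in prev for i in range(k))}

L2 = [("E1", "E1"), ("E1", "E2"), ("E2", "E1")]
print(sorted(episode_candidates(L2, 3)))
# [('E1', 'E1', 'E1'), ('E1', 'E1', 'E2'), ('E1', 'E2', 'E1'), ('E2', 'E1', 'E1')]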
According to Mannila and Toivonen (1996), the construction of L_k for frequency
episode mining is similar to algorithm Apriori except for the difference between
calculating episode frequencies and calculating itemset supports. Like association rules,
episode rules can also be directly established from frequency episodes. Given a frequency
episode P = (e_1, e_2, ..., e_k), there are k−1 non-empty ordered sub-episodes
P_i = (e_1, e_2, ..., e_i) ⊂ P, where 1 ≤ i ≤ k−1. Then, given another threshold minconfidence
(representing minimum confidence), a simple serial episode rule can be constructed as
P_i → Q_i, c, s, w, where P_i = (e_1, e_2, ..., e_i) ⊂ P, Q_i = (e_{i+1}, e_{i+2}, ..., e_k) = P − P_i,
s = frequency(P) ≥ minfrequency, c = frequency(P) / frequency(P_i) ≥ minconfidence, and
w = window. Here, according to Lee, Stolfo, and Mok (1998), both P_i and Q_i are specified
within the same timestamp bound, i.e., the same threshold w.

The last episode rule P_{k-1} → Q_{k-1}, c, s, w is of most interest since it can be used
to predict the k-th event given the previous k−1 events. So, this kind of episode rule is
used in our experiments for intrusion detection.
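As an illustration of how such last-event rules could be assembled from mined frequencies (the function below and its inputs are my own sketch, not the thesis's implementation):

def last_event_rules(freq, minfrequency, minconfidence, window):
    """Build episode rules P_{k-1} -> e_k, c, s, w from mined episode frequencies.

    freq maps each frequent serial episode (a tuple of events) to frequency(P) / n.
    """
    rules = []
    for episode, s in freq.items():
        if len(episode) < 2 or s < minfrequency:
            continue
        prefix = episode[:-1]
        if prefix not in freq:
            continue
        c = s / freq[prefix]         # confidence = frequency(P) / frequency(P_{k-1})
        if c >= minconfidence:
            rules.append((prefix, episode[-1], c, s, window))
    return rules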
CHAPTER V
INTEGRATION OF FUZZY LOGIC WITH DATA MINING
Although association rules and frequency episodes can be mined from audit data
for anomaly intrusion detection, the mined rules or episodes are at the data level. This
immediate dependency on data may limit the flexibility of intrusion detection. So, the
machine learning component in IIDM will be designed to extract more abstract patterns
at a higher level by integrating fuzzy logic with association rules and frequency episodes.
5.1 Fuzzy Association Rules
Many quantitative features are involved in intrusion detection. As explained in
Chapter I, SRI's NIDES classifies the statistical measures into four types: ordinal
measures, categorical measures, binary categorical measures, and linear categorical
measures (Lunt 1993). Both ordinal measures and linear categorical measures are
quantitative. SRI's Event Monitoring Enabling Response to Anomalous Live
Disturbances (EMERALD) also divides the network traffic statistical measures into four
classes: categorical measures, continuous measures, intensity measures, and event
distribution measures (Porras and Valdes 1998). Continuous measures, e.g., the
connection duration, and intensity measures, e.g., the number of packets during a unit of
time such as 60 seconds, are also quantitative. Our hypothesis is that it is possible to
make data mining methods more flexible in processing quantitative data for intrusion
detection by integrating fuzzy logic with both association rules and frequency episodes.
5.1.1 Related Work
Association rules were originally used to describe binary data, such as the
occurrence of an item in a transaction, instead of quantitative data, such as the number of
items in a transaction. Accordingly, association rules are divided into two types: boolean
association rules and quantitative association rules (Srikant and Agrawal 1996). To find
quantitative association rules, Srikant and Agrawal (1996) have proposed to partition
quantitative attributes into different intervals.
Unfortunately, a sharp boundary problem results from using interval partitions
(Kuok, Fu, and Wong 1998). For example, suppose [1, 5] and [6, 10] are two intervals
created on a quantitative attribute. If the minimum support threshold is set at 30%, the
interval [6, 10] will not gain enough support regardless of the large support near its left
boundary, as shown in Figure 5.1. That is to say, although the value 5 has a large support
and lies near the interval [6, 10], it makes no contribution to the support count of [6, 10].
[Figure 5.1 Example of Sharp Boundary Problem: a bar chart of support (0% to 20%) for the values 1 through 10 of a quantitative attribute, with the high-support values clustered just to the left of the boundary between the intervals [1, 5] and [6, 10].]
In intrusion detection, the sharp separation of intervals may raise additional
problems. For example, suppose the interval [1, 5] is mined as a normal pattern for the
quantitative attribute. The values 6 and 10 will both be considered abnormal regardless of
the difference in their deviations from the normal pattern. Likewise, a normal behavior
with a small variance may fall outside the interval representing a normal pattern and be
considered an anomaly. Similarly, an intrusion with a small variance may fall inside the
interval and be undetected.
To address the sharp boundary problem, Kuok, Fu, and Wong (1998) have
proposed to mine fuzzy association rules by using fuzzy sets to categorize a quantitative
attribute. In the above example, the two intervals would be replaced by two fuzzy sets;
suppose the value 5 has a membership degree of 0.9 in the first set and 0.3 in the second
set. Then it will contribute 0.9 to the support of the first fuzzy set and 0.3 to the second
one. However, this means that the value 5 becomes more important than other values since
the sum of its contributions to the different fuzzy sets is greater than 1. In our
method we address this shortcoming of Kuok, Fu, and Wong's approach by introducing
an additional normalization process.
5.1.2 Fuzzy Association Rules
According to Kuok, Fu, and Wong's method (1998), suppose we are given the
complete item set I = {i_1, i_2, ..., i_m} where each i_j (1 ≤ j ≤ m) denotes a categorical or
quantitative (fuzzy) attribute. We introduce f(i_j) to represent the maximum number of
categories (if i_j is categorical) or the maximum number of fuzzy sets (if i_j is fuzzy), and
m_{i_j}(l, v) to represent the membership degree of v in the l-th category or fuzzy set of i_j.
If i_j is categorical, m_{i_j}(l, v) = 0 or m_{i_j}(l, v) = 1. If i_j is fuzzy, 0 ≤ m_{i_j}(l, v) ≤ 1. Srikant
and Agrawal (1996) introduce the idea of mapping the categories (or fuzzy sets) of an
attribute to a set of consecutive integers. Then an itemset X^k (1 ≤ k ≤ m) can be
expressed as X^k = {item_1 = c_1, item_2 = c_2, ..., item_k = c_k} where
{X^k.item_1, X^k.item_2, ..., X^k.item_k} ⊆ I and for all j (1 ≤ j ≤ k), 1 ≤ c_j ≤ f(item_j).

So, given a transaction T = {T.i_1, T.i_2, ..., T.i_m}, T.i_j (1 ≤ j ≤ m) represents a value
of the j-th attribute and can be mapped to {(l, m_{i_j}(l, T.i_j)) | for all l, 1 ≤ l ≤ f(i_j)}.
However, when using Kuok, Fu, and Wong's algorithm, if i_j is fuzzy, Σ_{l=1..f(i_j)} m_{i_j}(l, T.i_j)
does not always equal 1. We have developed a normalization process as follows:

    m'_{i_j}(l, T.i_j) = m_{i_j}(l, T.i_j) / Σ_{l=1..f(i_j)} m_{i_j}(l, T.i_j),    if i_j is fuzzy;
    m'_{i_j}(l, T.i_j) = m_{i_j}(l, T.i_j),                                        if i_j is categorical.

Then, for an itemset X^k = {item_1 = c_1, item_2 = c_2, ..., item_k = c_k} where 1 ≤ k ≤ m,
its support contributed by T will be:

    Π_{j=1..k} m'_{X^k.item_j}(c_j, T.(X^k.item_j)).
Here we use the product to calculate an itemset's support because, given a transaction
T = {T.i_1, T.i_2, ..., T.i_m} and any attribute set {item_1, item_2, ..., item_k} (1 ≤ k ≤ m),

    Σ_{c_1=1..f(item_1)} ... Σ_{c_k=1..f(item_k)} Π_{j=1..k} m'_{item_j}(c_j, T.item_j) = 1

will hold. That is to say, for any item or any combination of items, the total support
contributed by a transaction will always be 1.
Accordingly, the algorithm for constructing C_k from L_{k-1} (k ≥ 2) will look like:

    C_k = ∅;
    select {X^{k-1}.item_1 = X^{k-1}.c_1, ..., X^{k-1}.item_{k-1} = X^{k-1}.c_{k-1},
            Y^{k-1}.item_{k-1} = Y^{k-1}.c_{k-1}} into C_k
    from L_{k-1}
    where (X^{k-1} ∈ L_{k-1}) and (Y^{k-1} ∈ L_{k-1})
        and (∀ j, 1 ≤ j ≤ k−2: X^{k-1}.item_j = Y^{k-1}.item_j and X^{k-1}.c_j = Y^{k-1}.c_j)
        and (X^{k-1}.item_{k-1} < Y^{k-1}.item_{k-1});
    forall itemsets Z^k ∈ C_k do begin
        if (there exists a sub-itemset Z^{k-1} ⊂ Z^k such that Z^{k-1} ∉ L_{k-1})
            then C_k = C_k − {Z^k};
    end
    return C_k;

The rest of the algorithm for fuzzy association rules is similar to the algorithm Apriori for
Boolean association rules (Agrawal and Srikant 1994).

Figure 5.2 Candidate Generation Algorithm for Fuzzy Association Rules
In this algorithm, normalization is introduced to ensure that every transaction is
counted only once for an item or any combination of items, whether categorical or
fuzzy. For example, suppose I = {level, age} where level is a categorical attribute with
the domain of {freshman, sophomore, junior, senior, graduate} and age is a quantitative
attribute with three fuzzy sets {young, medium, old}. A transaction T = {graduate, 25}
will be mapped to {{(graduate, 1)}, {(young, 0.2), (medium, 0.9), (old, 0.1)}}. Without
normalization, it would increase the support of itemset {level = graduate, age = young}
by 0.2, the support of itemset {level = graduate, age = medium} by 0.9, and the support
of itemset {level = graduate, age = old} by 0.1. That is to say, this transaction will be
counted 0.2+0.9+0.1=1.2 times for the item age. However, it is unreasonable for one
transaction to contribute more than others. In contrast, the normalization process will
further transform the transaction T into {{(graduate, 1)}, {(young, 0.167), (medium,
0.75), (old, 0.083)}}, for a total contribution of 1.0 for the item age.
5.2 Fuzzy Frequency Episodes
In this section, we propose an idea of integrating fuzzy logic with frequency
episodes. The need to develop fuzzy frequency episodes comes from the involvement of
quantitative attributes in an event. That is to say, given the set of event attributes
$A = \{a_1, a_2, \ldots, a_m\}$, each attribute $a_j$ ($1 \le j \le m$) may be categorical or quantitative
(fuzzy). Suppose $f(a_j)$ represents the maximum number of categories (if $a_j$ is
categorical) or the maximum number of fuzzy sets (if $a_j$ is fuzzy), and $m_{a_j}(l, v)$
represents the membership degree of $v$ in the $l$th category or fuzzy set of $a_j$. If $a_j$ is
categorical, $m_{a_j}(l, v) = 0$ or $m_{a_j}(l, v) = 1$. If $a_j$ is fuzzy, $0 \le m_{a_j}(l, v) \le 1$. Similarly,
for an event attribute, its categories or fuzzy sets can be mapped to consecutive integers.
Then an event variable $e_k$ can be expressed as $e_k = \{attr_1 = c_1, attr_2 = c_2, \ldots, attr_k = c_k\}$,
where $\{e_k.attr_1, e_k.attr_2, \ldots, e_k.attr_k\} \subseteq A$ and, for all $j$ ($1 \le j \le k$), $1 \le c_j \le f(attr_j)$. We
define two event variables $e_p = \{attr_1 = c_1, attr_2 = c_2, \ldots, attr_p = c_p\}$ and
$e_q = \{attr'_1 = c'_1, attr'_2 = c'_2, \ldots, attr'_q = c'_q\}$ as homogeneous if
$\{e_p.attr_1, \ldots, e_p.attr_p\} = \{e_q.attr'_1, \ldots, e_q.attr'_q\}$, which also indicates
that $p = q$. It is obvious that an event variable is homogeneous to itself.
So, given an event $E = \{E.a_1, E.a_2, \ldots, E.a_m\}$, $E.a_j$ ($1 \le j \le m$) represents a value
of the $j$th attribute and can be mapped to $\{(l, m_{a_j}(l, E.a_j)) \mid 1 \le l \le f(a_j)\}$.
However, if $a_j$ is fuzzy, $\sum_{l=1}^{f(a_j)} m_{a_j}(l, E.a_j)$ does not always equal 1. A normalization
process is used as follows:

$$m'_{a_j}(l, E.a_j) = \begin{cases} \dfrac{m_{a_j}(l, E.a_j)}{\sum_{l'=1}^{f(a_j)} m_{a_j}(l', E.a_j)} & \text{if } a_j \text{ is fuzzy;} \\[6pt] m_{a_j}(l, E.a_j) & \text{if } a_j \text{ is categorical.} \end{cases}$$

Then, for an event variable $e_k = \{attr_1 = c_1, attr_2 = c_2, \ldots, attr_k = c_k\}$, where $1 \le k \le m$,
its occurrence in $E$ is no longer counted as either 0 or 1. Instead, it is defined as:

$$occurrence(e_k, E) = \prod_{j=1}^{k} m'_{e_k.attr_j}\bigl(c_j,\; E.(e_k.attr_j)\bigr).$$

And the minimal occurrence of an episode is the product of the occurrences of its event
variables.
That is to say, an event $E$ may support several event variable occurrences due to
the introduction of fuzzy sets. However, a side effect may arise. For example, consider
the event sequence $\{E_1, E_2, E_3\}$ within the window threshold, where $A$, $B$, $C$, and $D$ are event
variables in which $A$ and $B$ are homogeneous but $A \ne B$, and $C$ and $D$ are homogeneous
but $C \ne D$. Suppose $occurrence(A, E_1) = 0.8$, $occurrence(B, E_1) = 0.2$,
$occurrence(A, E_2) = 0.1$, $occurrence(B, E_2) = 0.9$, $occurrence(C, E_3) = 0.9$, and
$occurrence(D, E_3) = 0.1$. Then the minimal occurrence of the episode $\{A, C\}$ will become
$0.1 \times 0.9 = 0.09$, because $\{E_2.A, E_3.C\}$ is the minimal occurrence, replacing $\{E_1.A, E_3.C\}$,
which would contribute $0.8 \times 0.9 = 0.72$. So, a small occurrence of an event variable may
change the minimal occurrence of an episode in the event sequence.
To address this problem, we introduce another user-specified threshold,
minoccurrence, to represent the minimum occurrence required for an event variable. So,
given an event variable $e_k$, if $occurrence(e_k, E) < minoccurrence$, it will be claimed not
to occur in $E$. In detail, the following normalization process will be further conducted:

$$occurrence(e_k, E) = \begin{cases} 0 & \text{if } occurrence(e_k, E) < minoccurrence; \\[6pt] \dfrac{occurrence(e_k, E)}{\sum_{e_q} occurrence(e_q, E)} & \text{if } occurrence(e_k, E) \ge minoccurrence. \end{cases}$$

Here every $e_q$ is homogeneous to $e_k$ and satisfies $occurrence(e_q, E) \ge minoccurrence$. For
instance, if minoccurrence = 0.2, $E_1$ will contribute 0.8 to $A$ and 0.2 to $B$, $E_2$ will
contribute 1 to $B$, and $E_3$ will contribute 1 to $C$. As a matter of fact, if minoccurrence is
set above 0.5, then for any event only one event variable will be claimed to occur in it and
its occurrence will be normalized to 1. In this case, it will be the same as categorizing every
quantitative attribute by intervals.
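The following minimal Python sketch (our own illustration, not code from the thesis) shows how the occurrence of an event variable in a single event could be computed, including the minoccurrence thresholding and the renormalization over homogeneous event variables; mapping the abstract variables A and B of the example onto fuzzy sets of PN is purely illustrative.

```python
def raw_occurrence(event_variable, event_memberships):
    """Product of normalized membership degrees over the event variable's attributes.
    event_variable: dict attribute -> fuzzy-set label.
    event_memberships: dict attribute -> {label: normalized degree}."""
    occ = 1.0
    for attribute, label in event_variable.items():
        occ *= event_memberships[attribute].get(label, 0.0)
    return occ

def thresholded_occurrences(homogeneous_variables, event_memberships, minoccurrence):
    """Apply the minoccurrence threshold, then renormalize over the surviving
    homogeneous event variables so that their occurrences sum to 1."""
    raw = {name: raw_occurrence(var, event_memberships)
           for name, var in homogeneous_variables.items()}
    surviving = {name: occ for name, occ in raw.items() if occ >= minoccurrence}
    total = sum(surviving.values())
    return {name: (surviving[name] / total if name in surviving else 0.0)
            for name in raw}

# Mirrors the event E2 of the example: occurrence 0.1 for A and 0.9 for B.
event_memberships = {"PN": {"LOW": 0.1, "MEDIUM": 0.9, "HIGH": 0.0}}
variables = {"A": {"PN": "LOW"}, "B": {"PN": "MEDIUM"}}
print(thresholded_occurrences(variables, event_memberships, minoccurrence=0.2))
# -> {'A': 0.0, 'B': 1.0}
```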
Other than the difference in calculating the frequency (or minimal occurrence) of
an episode, the rest of this algorithm is similar to Mannila and Toivonen's algorithm
(1996) for mining frequency episodes.
CHAPTER VI
EXPERIMENTS AND RESULTS
Association rules and frequency episodes have been proposed for feature
selection, as well as audit data gathering (Lee, Stolfo, and Mok 1998). On the other hand,
the association rules and episode rules mined from training data that represents normal
behavior can also be used directly for anomaly detection (Lee, Stolfo, and Mok 1998).
However, these rules are usually at the data level. For example, for the quantitative
feature of connection duration, a rule may contain such a component as "connection
duration = 5 seconds" or "5 seconds ≤ connection duration ≤ 10 seconds". With the
integration of fuzzy logic, more general rules can be produced at a higher and more
abstract level.
Another advantage resulting from the integration of fuzzy logic is that fuzzy
association rules and fuzzy frequency episodes can be applied to temporal statistical
measurements which are quantitative and security-related. Statistical analysis has been
widely used to construct normal patterns for anomaly detection in systems such as SRI's
IDES and NIDES. Some statistical measurements have also been shown to
improve the accuracy of intrusion detection (Lee and Stolfo 1998). However, these
statistical features are usually incorporated as additional measurements manually. By
using fuzzy association rules and fuzzy frequency episodes, normal patterns for these
statistical features can be automatically created and used for anomaly detection.
6.1 Anomaly Detection
6.1.1 Experiment Set 1
The first set of experiments was designed to investigate the applicability of fuzzy
association rules and fuzzy frequency episodes for anomaly detection. According to (Lee,
Stolfo and Mok 1998), since a large amount of actual intrusion data is usually very hard
to collect, some normal data with different behavior than that used for training can be
treated as anomalous. One of the servers in the Department of Computer Science at
Mississippi State University has been monitored and its real-time network traffic data has
been collected by tcpdump. Data preprocessing is conducted by use of sanitize
(downloaded from http://ita.ee.lbl.gov/html/software.html on 1 March 1999). Porras and
Valdes (1998) and Lee and Stolfo (1998) suggest several quantitative features of network
traffic that they feel can be used for intrusion detection. Based on their suggestions, a
program has been written to extract the following four temporal statistical measurements
from the network traffic data:
SN: the number of SYN flags appearing in TCP packet headers during last 2 seconds;
FN: the number of FIN flags appearing in TCP packet headers during last 2 seconds;
RN: the number of RST flags appearing in TCP packet headers during last 2 seconds;
PN: the number of different destination ports during last 2 seconds.
Here statistical computation is done for overlapping 2 second time periods as shown in
Figure 6.1.
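As an illustration of how such measurements could be computed (a sketch of our own, not the extraction program actually used in the thesis; the packet-record format is assumed), the four counts can be obtained from a 2-second window over parsed TCP packet headers, and sliding the window end point forward produces the overlapping periods of Figure 6.1:

```python
from dataclasses import dataclass

@dataclass
class Packet:
    timestamp: float      # seconds
    syn: bool
    fin: bool
    rst: bool
    dst_port: int

def window_features(packets, t_end, window=2.0):
    """Compute SN, FN, RN, and PN for the window (t_end - window, t_end]."""
    recent = [p for p in packets if t_end - window < p.timestamp <= t_end]
    return {
        "SN": sum(p.syn for p in recent),
        "FN": sum(p.fin for p in recent),
        "RN": sum(p.rst for p in recent),
        "PN": len({p.dst_port for p in recent}),
    }

packets = [Packet(0.4, True, False, False, 80), Packet(1.1, False, True, False, 25),
           Packet(1.9, True, False, False, 80)]
print(window_features(packets, t_end=2.0))   # {'SN': 2, 'FN': 1, 'RN': 0, 'PN': 2}
```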
Each of the above four quantitative features is viewed as a fuzzy variable and is
divided into three fuzzy sets: LOW, MEDIUM, and HIGH. Membership function
definitions have been developed for fuzzy variables representing each of the features of
the network being monitored. The fuzzy association rule algorithm has been applied to
mine the correlation among the first three features, and the fuzzy frequency episode
algorithm has been applied to mine sequential patterns for the last feature.
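The thesis does not reproduce the membership function definitions themselves; as a hedged sketch, trapezoidal LOW/MEDIUM/HIGH functions such as the following could serve the purpose, with the breakpoints below chosen purely for illustration:

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear in between."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

# Illustrative fuzzy sets for PN (distinct destination ports in the last 2 seconds).
def pn_memberships(pn):
    return {
        "LOW": trapezoid(pn, -1, 0, 5, 15),
        "MEDIUM": trapezoid(pn, 5, 15, 30, 50),
        "HIGH": trapezoid(pn, 30, 50, 1e9, 1e9 + 1),
    }

print(pn_memberships(10))   # partial membership in both LOW and MEDIUM
```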
The network traffic data was partitioned into different segments according to the
time slots when the sets were collected (i.e., morning, afternoon, evening, and night)
since different time slots likely exhibit different behavior. In this experiment, traffic data
in the afternoon was used as training data. Anomaly detection was then conducted on
traffic data from afternoon, evening, and night. A detailed specification of the training
and test data sets is given in Table 6.1.
[Figure 6.1 (chart not reproduced): a timestamp axis from 0.0 to 6.0 seconds with successive overlapping 2-second windows (first 2 seconds, second 2 seconds, third 2 seconds, ...).]
Figure 6.1 Specification of Temporal Statistical Measurements Used in the Experiments
Table 6.1
Specification of Training and Test Data Sets

Training data sets:
    1 hour of training data     13:00-14:00, Tuesday, 23 March 1999
    2 hours of training data    13:00-15:00, Tuesday, 23 March 1999
    3 hours of training data    13:00-16:00, Tuesday, 23 March 1999
    6 hours of training data    13:00-16:00, Friday, 19 March 1999 &
                                13:00-16:00, Tuesday, 23 March 1999

Test data sets:
    T1    13:00-14:00, Wednesday, 24 March 1999
    T2    14:00-15:00, Wednesday, 24 March 1999
    T3    15:00-16:00, Wednesday, 24 March 1999
    T4    18:00-19:00, Tuesday, 23 March 1999
    T5    19:00-20:00, Tuesday, 23 March 1999
    T6    20:00-21:00, Tuesday, 23 March 1999
    T7    0:00-1:00, Wednesday, 24 March 1999
    T8    1:00-2:00, Wednesday, 24 March 1999
    T9    2:00-3:00, Wednesday, 24 March 1999
Normal patterns (represented by fuzzy association rules and fuzzy episode rules)
are first established by mining the training data. An example of a fuzzy association rule
mined from the training data is: { SN = LOW, FN = LOW } -> { RN = LOW }, 0.924,
0.49. This means the pattern { SN = LOW, FN = LOW, RN = LOW } occurred in 49% of
the training cases. In addition, when { SN = LOW, FN = LOW } occurs, there is a
92.4% probability that { RN = LOW } will also occur. An example of a fuzzy episode
rule is: { PN = LOW, PN = MEDIUM } -> { PN = MEDIUM }, 0.854, 0.108, 10 seconds.
This means that with a window threshold of 10 seconds, the frequency of the serial
episode { PN = LOW, PN = MEDIUM, PN = MEDIUM } is 10.8% and when { PN =
LOW, PN = MEDIUM } occurs, { PN = MEDIUM } will follow with an 85.4%
probability.
Then for each test case, new patterns were mined using the same algorithms and
the same parameters. These new patterns were then compared to the normal patterns
created from the training data. If they are similar enough, no intrusion is detected;
otherwise, an anomaly will be alarmed.
The similarity function proposed in (Lee, Stolfo, and Mok 1998) and (Lee, Stolfo,
and Mok 1999) used a user-defined threshold, e.g., 5%. Given two rules with the same
LHS and RHS, if both their confidences and their supports are within 5% of each other,
these two rules are considered similar. This approach exhibits Kuok, Fu, and Wong's
(1998) sharp boundary problem. For example, given a rule R which represents a normal
pattern and two test rules R' and R'', if both R' and R'' fall inside the threshold, there will
be no measurement of the difference between the similarity of R and R' and the similarity
of R and R''. Likewise, when both R' and R'' fall outside the threshold, there is no
measure of their dissimilarities with R.
Instead, we introduce a new similarity evaluation function which is continuous
and monotonic. Given a normal association rule

R1: X -> Y, c, s,

and a new association rule

R2: X' -> Y', c', s',

where X, Y, X', and Y' are itemsets, define

$$similarity(R_1, R_2) = \begin{cases} 0 & \text{if } (X \ne X') \lor (Y \ne Y'); \\[6pt] \max\!\left(0,\; 1 - \max\!\left(\dfrac{|c - c'|}{c},\; \dfrac{|s - s'|}{s}\right)\right) & \text{if } (X = X') \land (Y = Y'). \end{cases}$$

Given two rule sets S1 (of normal patterns) and S2 (of new patterns), define

$$s = \sum_{R_1 \in S_1} \sum_{R_2 \in S_2} similarity(R_1, R_2).$$

Then, like the definition in (Lee, Stolfo, and Mok 1998), we define

$$similarity(S_1, S_2) = \frac{s}{|S_1|} \times \frac{s}{|S_2|},$$

where |S1| and |S2| are the total number of rules in S1 and S2, respectively. Here s/|S1| is
actually the percentage of normal patterns covered by the new patterns, and s/|S2| is the
percentage of new patterns covered by the normal patterns. The similarity evaluation for
fuzzy episode rules is almost the same as for fuzzy association rules, except that there is
one more parameter w (the window length) for an episode rule. It is required that the
window thresholds be identical when two episode rules are evaluated for their similarity.
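A minimal Python sketch of this similarity evaluation (our own illustration; the rule representation is assumed) might look as follows:

```python
def rule_similarity(rule1, rule2):
    """rule: (lhs, rhs, confidence, support), lhs/rhs given as frozensets."""
    lhs1, rhs1, c1, s1 = rule1
    lhs2, rhs2, c2, s2 = rule2
    if lhs1 != lhs2 or rhs1 != rhs2:
        return 0.0
    return max(0.0, 1.0 - max(abs(c1 - c2) / c1, abs(s1 - s2) / s1))

def rule_set_similarity(normal_rules, new_rules):
    """similarity(S1, S2) = (s/|S1|) * (s/|S2|), s = summed pairwise similarity."""
    s = sum(rule_similarity(r1, r2) for r1 in normal_rules for r2 in new_rules)
    return (s / len(normal_rules)) * (s / len(new_rules))

normal = [(frozenset({("SN", "LOW"), ("FN", "LOW")}), frozenset({("RN", "LOW")}), 0.924, 0.49)]
new = [(frozenset({("SN", "LOW"), ("FN", "LOW")}), frozenset({("RN", "LOW")}), 0.90, 0.45)]
print(rule_set_similarity(normal, new))   # close to 1 since the rules nearly agree
```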
The purpose of the first experiment in this set was to determine the amount of
training data (duration) needed to demonstrate differences in behavior for different time
periods. In this experiment, training sets of different duration (all from the same time
period, i.e., afternoon) were used to mine fuzzy association rules (see Table 6.1 for a
more detailed description of the data). The similarity between each set of rules derived from
training data of different duration and the rules mined from test data for different time periods was then computed.
The results from this experiment are shown in Figure 6.2. These results show that the
fuzzy association rules derived from test data for the same time of the day as the training
data (afternoon) were very similar to the rules derived from the training data. Rules
derived from evening data were less similar and rules derived from late night data were
the least similar. This confirms the hypothesis that fuzzy association rules are able to
distinguish different behavior. This experiment also demonstrates that there is no
difference in the similarity measures when the duration of training data is increased from
3 hours to 6 hours.
The purpose of the second experiment in this set was to further demonstrate the
capability of fuzzy association rules for anomaly detection. In this experiment, 3 hours of
traffic data (afternoon) was selected as the training data based on results from the first
experiment. Nine test data sets from three different time periods, i.e., afternoon, evening,
and late night were used (see Table 6.1 for a more detailed description of the data). The
similarity between the fuzzy association rules derived from the training data and the rules mined from each test
data set was computed. The results from this experiment are shown in Figure 6.3. The results show that
the fuzzy association rules derived from the test data sets for the same time period as the
training data (afternoon) were more similar to the rules derived from training data than
any other test data set from different time periods, i.e., evening, and late night. Rules
derived from any evening test data set were more similar than rules derived from any late
night test data set. This further confirms the capability of fuzzy association rules in
distinguishing different behavior.
The next two experiments in this set were similar to the first two experiments,
except that fuzzy episode rules were mined for anomaly detection instead of fuzzy
association rules. The results are consistent with those from the first two experiments and
are shown in Figure 6.4 and Figure 6.5.
Some observations can be made from these experimental results: (1) in both the
fuzzy association rule training process and the fuzzy frequency episode training process,
there are no significant changes in similarity when using 3 hours of training data as
opposed to 6 hours of training data. Therefore, 3 hours of training data was used for the
remaining experiments; (2) similar results were obtained from both fuzzy association
rules and fuzzy frequency episodes. The test cases of T1, T2, and T3 are most similar to
normal patterns since all of them, as well as training data, are network traffic in the
afternoon. T7, T8, and T9 are most different from normal patterns; this is to be expected
since this data represents network traffic in the middle of the night when the usage of the
network is lightest.
Figure 6.2: Comparison of Similarities Between Different Training and Test Data Sets
for Fuzzy Association Rules (minconfidence=0.6; minsupport=0.1)
Training Data Sets: 1 hour of training data, 2 hours of training data, 3 hours of
training data, and 6 hours of training data (all from the afternoon)
Test Data Sets: T1 (afternoon), T4 (evening), and T7 (late night)
Figure 6.3: Comparison of Similarities Between 3 Hour Training Data Set and Different
Test Data Sets for Fuzzy Association Rules (minconfidence=0.6;
minsupport=0.1)
Training Data Set: 3 hours of training data (afternoon)
Test Data Sets: T1, T2, T3 (afternoon), T4, T5, T6 (evening), and
T7, T8, T9 (late night)
[Figure 6.2 (chart not reproduced): similarity (y-axis, 0 to 1) versus hours of training data (x-axis, 1 to 6), with one curve each for T1, T4, and T7.]
[Figure 6.3 (chart not reproduced): similarity of each test data set to the 3-hour training data: T1 = 0.773, T2 = 0.795, T3 = 0.775, T4 = 0.564, T5 = 0.513, T6 = 0.288, T7 = 0.100, T8 = 0.116, T9 = 0.0804.]
Figure 6.4: Comparison of Similarities Between Different Training and Test Data Sets
for Fuzzy Episode Rules (minconfidence=0.6; minsupport=0.1; minoccurrence=0.3; window=10s)
Training Data Sets: 1 hour training data, 2 hours of training data, 3 hours of
training data, and 6 hours of training data (all from the afternoon)
Test Data Sets: T1 (afternoon), T4 (evening), and T7 (late night)
[Figure 6.4 (chart not reproduced): similarity (y-axis, 0 to 1) versus hours of training data (x-axis, 1 to 6), with one curve each for T1, T4, and T7.]
[Figure 6.5 (chart not reproduced): similarity of each test data set to the 3-hour training data: T1 = 0.681, T2 = 0.883, T3 = 0.89, T4 = 0.207, T5 = 0.171, T6 = 0.0645, T7 = 7.37E-05, T8 = 8.91E-05, T9 = 1.67E-05.]
Figure 6.5: Comparison of Similarities Between 3 Hour Training Data Set and Different
Test Data Sets for Fuzzy Episode Rules (minconfidence=0.6;
minsupport=0.1; minoccurrence=0.3; window=10s)
Training Data Set: 3 hours of training data (afternoon)
Test Data Sets: T1, T2, T3 (afternoon), T4, T5, T6 (evening), and
T7, T8, T9 (late night)
The results have also shown that, given the same training data set and test data
set, their similarity as measured by mining fuzzy association rules is different from their
similarity as measured by mining fuzzy episode rules. This is not unexpected, since fuzzy
association rules and fuzzy episode rules use different features, which may have different
effects on anomaly detection. That is also one of the reasons why our Intelligent Intrusion
Detection Model (IIDM) incorporates different detection modules, each of which may
examine different aspects of the same data.
6.1.2 Experiment Set 2
The second set of experiments was designed to further test the capability of fuzzy
association rules and fuzzy frequency episodes for anomaly detection by using simulated
intrusion data. Three network traffic data sets in tcpdump format were downloaded from
http://iris.cs.uml.edu:8080 and used for the second set of experiments. These data sets
were collected by the Institute for Visualization and Perception Research at University of
Massachusetts Lowell with the purpose of providing an evaluation method for different
data mining techniques or some combinations of these techniques (The Institute for
Visualization and Perception Research 1998). Among these data sets, baseline represents
normal patterns, network1 includes simulated IP spoofing intrusions in which an intruder
tries to access a remote host by guessing its IP sequence numbers, and network3 includes
simulated port scanning intrusions in which an intruder attempts to collect information
about hosts or applications running on the network.
A program was first written to extract information about the same four temporal
statistical measurements used in the previous set of experiments directly from the raw
data. The data set baseline was segmented into two parts. The first part was used as
training data and the second part was used as test data. Network1 and network3 were used
as the other two test data sets.
The purpose of the first experiment in this set was to test the capability of fuzzy
association rules for distinguishing simulated intrusions from normal behavior. The
purpose of the second experiment in this set was to test the capability of fuzzy episode
rules for distinguishing simulated intrusions from normal behavior. The results shown in
Figures 6.6 and 6.7 provide additional evidence that anomalies can be detected by use of
fuzzy association rules and fuzzy episode rules.
Figure 6.6: Comparison of Similarities Between Training Data Set and Different
Test Data Sets for Fuzzy Association Rules (minconfidence=0.6;
minsupport=0.1)
Training Data Set: baseline (first half; representing normal behavior)
Test Data Sets: baseline (second half; representing normal behavior),
network1 (including simulated IP spoofing intrusions), and
network3 (including simulated port scanning intrusions)
[Figure 6.6 (chart not reproduced): similarity to the training data: Baseline = 0.744, Network1 = 0.309, Network3 = 0.315.]
6.2 Real-time Intrusion Detection
Although (fuzzy) association rules and (fuzzy) frequency episodes can be directly
used for anomaly detection, it is generally believed that association rules and frequency
episodes cannot be used to detect anomalies at the record (e.g., a connection record or a
packet header record) level (Lee, Stolfo, and Mok 1998). That is to say, they cannot be
used for real-time detection directly. Instead, they are usually used to select features that
will be significant for real-time detection from a large amount of data.
[Figure 6.7 (chart not reproduced): similarity to the training data: Baseline = 0.885, Network1 = 0, Network3 = 0.000155.]
Figure 6.7: Comparison of Similarities Between Training Data Set and Different
Test Data Sets for Fuzzy Episode Rules (minconfidence=0.6;
minsupport=0.1; minoccurrence=0.3; window=10s)
Training Data Set: baseline (first half; representing normal behavior)
Test Data Sets: baseline (second half; representing normal behavior),
network1 (including simulated IP spoofing intrusions), and
network3 (including simulated port scanning intrusions)

In our experiments, we are investigating the possibility of applying fuzzy episode
rules for near real-time intrusion detection. In the Time-based Inductive Machine (TIM)
proposed by Teng, Chen, and Lu (1990), a sequential pattern with 100% certainty can be
used to detect anomalies. For example, given a normal pattern like A -- B -- C -->
D (100%), the sequence A -- B -- C -- E will be marked as an anomaly since it is
believed that A -- B -- C is always followed by D with no uncertainty. Similarly, we
introduce the idea of using fuzzy episode rules with high confidence (e.g., 0.8) for
anomaly detection.
Suppose we are given the event sequence $S = \{E_1, E_2, \ldots, E_{n-1}\}$, the current event
$E_n$ following $S$, and a fuzzy episode rule $R: e_1, \ldots, e_k \rightarrow e_{k+1},\ c,\ s,\ w$, where $k \ge 1$ and
every $e_i$ ($1 \le i \le k+1$) is an event variable. For the episode $\{e_1, \ldots, e_k\}$, if its minimum
occurrence in $S$ is $x$ ($x > 0$), it can then be predicted, with confidence $c$, that
$\{e_1, \ldots, e_k, e_{k+1}\}$ will also have a minimal occurrence in $S + \{E_n\}$, with the constraint that
$e_{k+1}$ occurs in the event $E_n$. And if the minimal occurrence of the episode
$\{e_1, \ldots, e_k, e_{k+1}\}$ in the sequence $S + \{E_n\}$ is $y$, then $y/x \ge c$ should also hold. In this case, event
$E_n$ is said to match the episode rule $R$. On the other hand, like TIM, if the episode
$\{e_1, \ldots, e_k\}$ has no minimum occurrence in $S$, or if $x = 0$, the episode rule $R$ is said to be
mismatched and other methods are needed to determine the normality of event $E_n$. Our
experiments show that a large window threshold (e.g., 15 seconds $\le w \le$ 30 seconds) will
decrease the probability of mismatches.
Therefore, given the set of episode rules which are mined from training data and
represent normal patterns, if the current event $E_n$ does not match any episode rule, it
will be marked as an anomaly with some degree of belief. Because the confidence of an
episode rule is usually less than 1, it is obvious that we cannot detect an anomaly with
100% confidence. So, this is an approximate detection. However, it can provide some
helpful indications when an anomaly occurs. And it can also be used in cooperation with
other detection methods, such as misuse detection methods.
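A highly simplified Python sketch of this detection loop (our own illustration, not the implementation used in the experiments) is given below. It reduces each event to a single dominant fuzzy-set label and treats an episode as occurring when its labels appear in order inside the window, so the occurrence-ratio test y/x >= c from above is not reproduced; the anomaly-percentage bookkeeping follows the definition given in Experiment 3 below.

```python
def matches_rule(history, new_event, rule, window):
    """history: list of (timestamp, label) for past events; new_event: (timestamp, label);
    rule: (antecedent_labels, consequent_label, confidence)."""
    t_now, new_label = new_event
    antecedent, consequent, _conf = rule
    recent = [label for t, label in history if 0 <= t_now - t <= window]
    i = 0
    for label in recent:                 # subsequence test for the antecedent episode
        if i < len(antecedent) and label == antecedent[i]:
            i += 1
    if i < len(antecedent):
        return None                      # mismatch: antecedent did not occur in the window
    return new_label == consequent       # matched only if the consequent is the new event

def anomaly_percentage(events, rules, window):
    """An event is marked anomalous if there is no episode rule it matches."""
    anomalies, history = 0, []
    for event in events:
        if not any(matches_rule(history, event, r, window) is True for r in rules):
            anomalies += 1
        history.append(event)
    return 100.0 * anomalies / len(events)

rules = [(["LOW", "MEDIUM"], "MEDIUM", 0.854)]
events = [(0, "LOW"), (1, "MEDIUM"), (2, "MEDIUM"), (3, "HIGH")]
print(anomaly_percentage(events, rules, window=10))   # -> 75.0 (only the third event matches)
```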
6.2.1 Experiment 3
This experiment was designed to demonstrate the applicability of fuzzy episode
rules in real-time intrusion detection. In this experiment, intrusions of the probing type
were simulated by use of mscan in the same network at the Computer Science
Department of Mississippi State University. Mscan is a software tool which can be used
to scan multiple systems. Fuzzy episode rules are mined from 3 hours of training data
(used in Experiment 1) for the feature of PN (the number of different destination ports
during last 2 seconds). Since the simulated intrusions usually take 1 to 1.5 minutes, test
data sets are established by collecting network traffic data for 3 minutes, with the goal of
covering the entire duration of every simulated intrusion. Six test data sets were collected
during 13:00 -- 14:00, Tuesday, 30 March, 1999 and 13:00 -- 14:00, Wednesday, 31
March, 1999. Among the six test data sets, T1, T2, and T3 represent normal data sets,
and T4, T5, and T6 represent intrusion data sets. The anomaly percentage of every test
data set is calculated as follows. Suppose we are given a sequence of n events for testing.
An event will be marked as an anomaly if, in the set of episode rules representing normal
behavior, there is no episode rule it matches. If the total number of anomaly events is m,
anomaly percentage = (m / n) * 100%.

Figure 6.8 shows our experimental results.
[Figure 6.8 (chart not reproduced): anomaly percentage of each test data set: T1 = 8.99%, T2 = 9.55%, T3 = 7.30%, T4 = 25.60%, T5 = 33.71%, T6 = 32.39%.]
Figure 6.8 Anomaly Percentages of Different Test Data Sets in Real-time Intrusion
Detection
Rule Set: fuzzy episode rules (minconfidence=0.8; minsupport=0.1; minoccurrence=0.3; window=15s)
Training Data Set: 3 hour training data (representing normal behavior)
Test Data Sets: T1, T2, T3 (representing normal behavior), and
T4, T5, T6 (including simulated mscan intrusions)

The results reveal clear differences in anomaly percentage between normal data
and intrusion data. Since there is no simulated intrusion in T1, T2, and T3, the anomaly
percentages for this data actually represent false positive error rates. Further analysis of
the results from T4, T5, and T6 shows that all have false positive error rates below 10%,
i.e., 4.44%, 6.67%, and 8.89%. However, the false negative error rates are relatively high
(about 40%). There are several reasons. First, only one feature, i.e., PN, is taken into
account. Second, the simulated intrusions are not evenly distributed over time with
respect to this feature.
[Figure 6.9 (chart not reproduced): PN (the number of distinct destination ports during last 2 seconds), on a scale of 0 to 140, plotted against time in seconds (49 to 129) for test data sets T1 and T4.]
Figure 6.9 shows the distribution of feature PN with time from the test data sets
T1 and T4. Here the time duration is 90 seconds. In test data set T4, the 49th second is the first
second and the 138th second is the last second of the simulated intrusion. It is obvious that
the simulated intrusion in T4 has not shown much deviation from normal behavior in the
last 20 seconds, which may contribute much to the high false negative error rate.
However, it has also been found that in every intrusion test case (T4, T5, or T6), an
anomaly is alarmed in the first second the simulated intrusion occurs, although this is
only an approximate real-time detection.
Figure 6.9 Distribution of the Feature PN with Time from Test Data Sets T1
(Representing Normal Behavior) and T4 (Representing Simulated mscan
Intrusions)
6.2.2 Experiment 4
This experiment was conducted in order to compare the intrusion detection
performance, especially the false positive error rate, between fuzzy episode rules and
non-fuzzy episode rules (by use of intervals). The same training data set and six test data
sets as in Experiment 3 were used. Both fuzzy episode rules and non-fuzzy episode rules
were mined from training data for the feature of PN (the number of different destination
ports during last 2 seconds), which was divided into three fuzzy sets (for fuzzy episode
rules) or three intervals (for non-fuzzy episode rules): LOW, MEDIUM, and HIGH.
Figure 6.10 shows a comparison of the false positive error rates on test data sets between
fuzzy episode rules and non-fuzzy episode rules.
The experimental results demonstrate that the false positive error rates from fuzzy
episode rules are less than for non-fuzzy episode rules. That is to say, the error rate of
predicting a normal behavior as an intrusion is much lower for fuzzy episode rules than
non-fuzzy episode rules.
6.2.3 Experiment 5
The goal of this experiment was to determine the effect of the minconfidence
threshold on the false positive error rate and the false negative error rate.
Figure 6.10 Comparison of False Positive Error Rates of Fuzzy Episode Rules and
Non-Fuzzy Episode Rules
Rule Sets: fuzzy episode rules (minconfidence=0.8; minsupport=0.1; minoccurrence=0.3; window=15s)
non-fuzzy episode rules (minconfidence=0.8; minsupport=0.1; window=15s)
Training Data Set: 3 hour training data (representing normal behavior)
Test Data Set: T1, T2, T3 (representing normal behavior), and
T4, T5, T6 (including simulated mscan intrusions)
[Figure 6.10 (chart not reproduced): false positive error rates on the test data sets:
                 T1       T2       T3       T4       T5       T6
    Fuzzy        8.99%    9.55%    7.30%    4.44%    6.67%    8.89%
    Non-Fuzzy    17.98%   12.92%   15.25%   11.11%   17.78%   11.11%  ]
Table 6.2
Effects of the minconfidence Threshold on the False Positive Error Rate (FPER)
and the False Negative Error Rate (FNER)

Minconfidence                                              0.80      0.85      0.90
Number of Episode Rules Learned from 3 Hour
Training Data Set (minsupport=0.1;
minoccurrence=0.3; window=15s)                             19        11        5
T1    FPER                                                 8.99%     30.34%    45.51%
T2    FPER                                                 9.55%     33.15%    50.56%
T3    FPER                                                 7.30%     39.16%    57.23%
T4    FPER                                                 4.44%     21.59%    36.36%
      FNER                                                 45.56%    13.33%    7.78%
T5    FPER                                                 6.67%     56.82%    70.45%
      FNER                                                 40.00%    0%        0%
T6    FPER                                                 8.89%     26.14%    36.36%
      FNER                                                 45.56%    18.89%    12.22%
From Table 6.2, it can be seen that a higher minconfidence value will result in
higher false positive error rates and lower false negative error rates. Our experiments
have also shown that a higher minconfidence value will cause many more mismatches.
The main reason here is that a higher minconfidence value will reduce the number of
episode rules learned from training data, as shown in Table 6.2. If the number of rules is too
low, these rules will not be able to cover patterns representing all normal behavior. This
will cause the false positive error rate to increase dramatically.
Our strategy here is to minimize the false positive error rate first. The false
negative error rate is expected to be minimized by introducing more features and/or by
using this method in conjunction with other intrusion detection methods. Using this
strategy, the minconfidence threshold suggested by our experiments is 0.8.
CHAPTER VII
CONCLUSION
Intrusion detection is an important but complex task for a computer system. Many
AI techniques have been widely used in intrusion detection systems. A research group at
Mississippi State University is investigating the development of an intelligent intrusion
detection system. This thesis has explored the practicality of integrating fuzzy logic with
data mining methods for intrusion detection.
Data mining methods are capable of extracting patterns automatically and
adaptively from a large amount of data. Association rules and frequency episodes have
been used to mine training data to establish normal patterns for anomaly detection.
However, these patterns are usually at the data level, with the result that normal behavior
with a small variance may not match a pattern and will be considered anomalous. In
addition, an actual intrusion with a small deviation may match the normal patterns and
thus not be detected. We have demonstrated that the integration of fuzzy logic with
association rules and frequency episodes generates more abstract and flexible patterns for
anomaly detection.
There are two main reasons for introducing fuzzy logic for intrusion detection.
First, many quantitative features are involved in intrusion detection. Fuzzy set theory
provides a reasonable and efficient way to categorize these quantitative features
in order to establish high-level patterns. Second, security itself is fuzzy. For quantitative
features, there is no straight separation between normal operations and anomalies. So,
fuzzy association rules can be mined to find the abstract correlation among different
security-related features, both categorical and quantitative. Similarly, fuzzy episode rules
can be also mined to create the high-level sequential patterns representing normal
behavior.
We have extended previous work (Lee, Stolfo, and Mok 1998) in the areas of
fuzzy association rules and fuzzy frequency episodes. We add a normalization step to the
procedure for mining fuzzy association rules by Kuok, Fu, and Wong (1998) in order to
prevent one data instance from contributing more than others. We modify the procedure
of Mannila and Toivonen (1996) for mining frequency episodes to learn fuzzy frequency
episodes. We use fuzzy association rules and fuzzy frequency episodes to extract patterns
for temporal statistical measurements at a higher level than the data level. We have
developed a similarity evaluation function which is continuous and monotonic for the
application of fuzzy association rules and fuzzy frequency episodes in anomaly detection.
We also present a real-time intrusion detection method by using fuzzy episode rules. In
addition, our experimental results have shown the utility of fuzzy association rules and
fuzzy episode rules in intrusion detection.
We have also developed an architecture for integrating machine learning methods
with other intrusion detection methods. By using data mining algorithms to mine fuzzy
association rules and fuzzy frequency episodes, a machine learning component is
implemented and incorporated in our intelligent intrusion detection system. This
component will work as a background unit and learn normal patterns for anomaly
detection. This learning process is both automatic and incremental. This means that new
patterns can be learned from new training data and used to update old patterns adaptively.
REFERENCES
Agrawal, R., and R. Srikant. 1994. Fast algorithms for mining association rules. In
Proceedings of the 20th international conference on very large databases held in
Santiago, Chile, September 12-15, 1994, 487-99. San Francisco, CA: Morgan
Kaufmann. (Downloaded from http://www.almaden.ibm.com/cs/people/ragrawal/
papers/vldb94_rj.ps on 19 February 1999.)
Chapman, D., and E. Zwicky. 1995. Building internet firewalls. Sebastopol, CA: O'Reilly
& Associates, Inc.
Crosbie, M., and G. Spafford. 1995. Active defense of a computer system using
autonomous agents. Purdue University. Department of Computer Science.
Technical Report 95-008.
Debar, H., M. Becker, and D. Siboni. 1992. A neural network component for an intrusion
detection system. In Proceedings of 1992 IEEE computer society symposium on
research in security and privacy held in Oakland, California, May 4-6, 1992, by
IEEE Computer Society, 240-50. Los Alamitos, CA: IEEE Computer Society Press.
Denning, D. 1986. An intrusion-detection model. In Proceedings of 1986 IEEE computer
society symposium on research in security and privacy held in Oakland, California,
April 7-9, 1986, by IEEE Computer Society, 118-31. Los Alamitos, CA: IEEE
Computer Society Press.
Frank, J. 1994. Artificial intelligence and intrusion detection: Current and future
directions. In Proceedings of the 17th national computer security conference held in
October, 1994. (Downloaded from http://seclab.cs.ucdavis.edu/papers.html on 2
February 1998.)
Gasser, M. 1988. Building a secure computer system. New York, NY: Van Nostrand
Reinhold Company Inc.
Heberlein, L., G. Dias, K. Levitt, B. Mukherjee, J. Wood, and D. Wolber. 1990. A
network security monitor. In Proceedings of 1990 IEEE computer society
symposium on research in security and privacy held in Oakland, California, May 7-
9, 1990, by IEEE Computer Society, 296-304. Los Alamitos, CA: IEEE Computer
Society Press.
Hodges, J., S. Bridges, and S. Yie. 1996. Preliminary results in the use of fuzzy logic for
a radiological waste characterization expert system. Mississippi State University.
Department of Computer Science. Technical Report 960626.
Ilgun, K., and A. Kemmerer. 1995. State transition analysis: A rule-based intrusion
detection approach. IEEE Transactions on Software Engineering 21(3): 181-99.
Kuok, C., A. Fu, and M. Wong. 1998. Mining fuzzy association rules in databases.
SIGMOD Record 27(1): 41-6. (Downloaded from http://www.acm.org/sigs/sigmod/
record/issues/9803 on 1 March 1999.)
Lee, W., S. Stolfo, and K. Mok. 1998. Mining audit data to build intrusion detection
models. In Proceedings of the fourth international conference on knowledge
discovery and data mining held in New York, New York, August 27-31, 1998, edited
by Rakesh Agrawal, and Paul Stolorz, 66-72. New York, NY: AAAI Press.
Lee, W., and S. Stolfo. 1998. Data mining approaches for intrusion detection. In
Proceedings of the 7th USENIX security symposium, 1998. (Downloaded from
http://www.cs.columbia.edu/~sal/recent-papers.html on 10 March 1999.)
Lee, W., S. Stolfo, and K. Mok. 1999. A data mining framework for building intrusion
detection models. (Downloaded from http://www.cs.columbia.edu/~sal/
recent-paper.html on 10 March 1999.)
Lunt, T. 1993. Detecting intruders in computer systems. In Proceedings of 1993
conference on auditing and computer technology. (Downloaded from http://www2.
csl.sri.com/nides/index5.html on 3 February 1999.)
Lunt, T., and R. Jagannathan. 1988. A prototype real-time intrusion-detection expert
system. In Proceedings of 1988 IEEE computer society symposium on research in
security and privacy held in Oakland, California, April 18-21, 1988, by IEEE
Computer Society, 59-66. Los Alamitos, CA: IEEE Computer Society Press.
Mannila, H., and H. Toivonen. 1996. Discovering generalized episodes using minimal
occurrences. In Proceedings of the second international conference on knowledge
discovery and data mining held in Portland, Oregon, August, 1996, by AAAI Press,
146-51. (Downloaded from http://www.cs.Helsinki.FI/research/fdk/
datamining/pubs on 19 February 1999.)
Me, L. 1998. GASSATA, a genetic algorithm as an alternative tool for security audit trail
analysis. In Proceedings of the first international workshop on the recent advances
in intrusion detection held in Louvain-la-Neuve, Belgium, September 14-16, 1998.
(Downloaded from http://www.zurich-ibm.com/~dac/Prog_RAID98/
Table_of_content.html on 2 February 1999.)
Mukherjee, B., L. Heberlein, and K. Levitt. 1994. Network intrusion detection. IEEE
Network, May/June, 26-41.
Orchard, R. 1995. FuzzyCLIPS version 6.04 user's guide. Knowledge System
Laboratory, National Research Council Canada.
Porras, P., and A. Valdes. 1998. Live traffic analysis of TCP/IP gateways. In Proceedings
of the 1998 ISOC symposium on network and distributed systems security held in
March, 1998. (Downloaded from http://www2.csl.sri.com/emerald/downloads.html
on 1 March 1999.)
Srikant, R., and R. Agrawal. 1996. Mining quantitative association rules in large
relational tables. In Proceedings of ACM SIGMOD international conference on
management of data held in June 4-6, 1996, by ACM Press, 1-12. (Downloaded
from http://www.almaden.ibm.com/cs/people/ragrawal/papers/sigmod96.ps on 19
February 1999.)
Stefik, M. 1995. Introduction to knowledge systems. San Francisco, CA: Morgan
Kaufmann Publishers, Inc.
Sundaram, A. 1996. An introduction to intrusion detection. (Downloaded from http://
www.cs.purdue.edu/coast/archive/data/categ24.html on 10 March 1999.)
Teng, H., K. Chen, and S. Lu. 1990. Adaptive real-time anomaly detection using
inductively generated sequential patterns. In Proceedings of 1990 IEEE computer
society symposium on research in security and privacy held in Oakland, California,
May 7-9, 1990, by IEEE Computer Society, 278-84. Los Alamitos, CA: IEEE
Computer Society Press.
The Institute for Visualization and Perception Research, University of Massachusetts
Lowell. 1998. Information Exploration Shootout. http://iris.cs.uml.edu:8080
(Accessed 1 March 1999).