The standard S, Z, and PI membership functions are defined as follows:

S(u; a, c) =
    0                              if u ≤ a;
    2((u − a) / (c − a))^2         if a < u ≤ (a + c)/2;
    1 − 2((u − c) / (c − a))^2     if (a + c)/2 < u ≤ c;
    1                              if u > c.

Z(u; a, c) = 1 − S(u; a, c).

PI(u; d, b) =
    S(u; b − d, b)                 if u ≤ b;
    Z(u; b, b + d)                 if u > b.
Here, temperature is a fuzzy variable and cold, hot, and warm are fuzzy
adjectives. The minimum value for temperature is 0 and the maximum value is 100,
measured in °C.
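As a quick illustration, the three membership function shapes can be sketched in Python. This is a minimal sketch of the standard Zadeh-style S, Z, and PI functions; the function and parameter names are ours, not those of any particular fuzzy toolkit.

```python
def s_function(u, a, c):
    """S-shaped membership function: 0 at a, rising smoothly to 1 at c."""
    mid = (a + c) / 2.0
    if u <= a:
        return 0.0
    if u <= mid:
        return 2.0 * ((u - a) / (c - a)) ** 2
    if u <= c:
        return 1.0 - 2.0 * ((u - c) / (c - a)) ** 2
    return 1.0

def z_function(u, a, c):
    """Mirror image of the S-function: 1 at a, falling to 0 at c."""
    return 1.0 - s_function(u, a, c)

def pi_function(u, d, b):
    """Bell shape centered at b, with width d on each side."""
    return s_function(u, b - d, b) if u <= b else z_function(u, b, b + d)
```

For example, halfway between the endpoints the S-function evaluates to 0.5, and the PI-function evaluates to 1 at its center.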
Therefore, given a fuzzy variable and all its fuzzy set definitions, the values of the
different fuzzy sets at a specific point can be easily calculated. For example, suppose the
fuzzy variable weight has its fuzzy set definitions as follows:
( deftemplate weight
0 15 pounds
( ( light (0 1) (5 1) (6 0.5) (8 0) )
( average (5 0) (8 1) (11 0) )
( heavy (8 0) (10 0.5) (11 1) (15 1) ) ) )
Then, as shown in Figure 4.3, given a specific value v, e.g., v = 7 pounds, its
membership degrees in the fuzzy sets light, average, and heavy, will be 0.25, 0.667, and
0, respectively.
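The membership degrees in such piecewise-linear fuzzy sets can be computed by linear interpolation between the listed breakpoints. A minimal Python sketch (the helper name and the list-of-pairs representation are ours):

```python
def membership(points, v):
    """Membership degree of v in a fuzzy set given as (value, degree) breakpoints."""
    if v <= points[0][0]:
        return points[0][1]
    if v >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= v <= x1:
            # linear interpolation between neighboring breakpoints
            return y0 + (v - x0) * (y1 - y0) / (x1 - x0)

# breakpoints from the weight deftemplate above
light   = [(0, 1.0), (5, 1.0), (6, 0.5), (8, 0.0)]
average = [(5, 0.0), (8, 1.0), (11, 0.0)]
heavy   = [(8, 0.0), (10, 0.5), (11, 1.0), (15, 1.0)]

degrees = [membership(fs, 7) for fs in (light, average, heavy)]
```

Evaluating at v = 7 pounds reproduces the degrees 0.25, 0.667, and 0 quoted in the text.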
Figure 4.3 Calculation Method of Fuzzy Set Values
(The figure plots the fuzzy sets light, average, and heavy over the range 0 to 15 pounds and marks the membership degrees 0.250 and 0.667 at v = 7.)
4.2 Data Mining Methods
Data mining methods have the ability to find new patterns from a large amount of
data automatically. Two data mining methods, association rules and frequency episodes,
have been proposed to mine audit data to find normal patterns for anomaly intrusion
detection (Lee, Stolfo, and Mok 1998).
4.2.1 Association Rules
Association rules originate from retail data analysis in business. A piece of sales
data, also called basket data, usually records information about a transaction, such as
transaction date and transaction items (Agrawal and Srikant 1994). Association rules can
be used to find the correlation among different items in a transaction. For example, when
a customer buys item A, item B will also be purchased by the customer with the
probability of 90%. So, item B is associated with item A.
Agrawal and Srikant (1994) have presented some fast algorithms to mine
association rules, including algorithm Apriori. In Agrawal and Srikant's algorithm
Apriori (1994), suppose D = {T_1, T_2, ..., T_n} is the transaction database with n
transactions in total and I = {i_1, i_2, ..., i_m} is the set of all the items, where each
i_j (1 ≤ j ≤ m) represents one kind of item. Then each transaction T_l (1 ≤ l ≤ n) in D records the
items purchased, i.e., T_l ⊆ I. Define an itemset as a non-empty subset of I. An
association rule will look like: X ⇒ Y, c, s, where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅, i.e., X and
Y are disjoint itemsets. Here s represents the support of this association rule and c
represents the confidence of this association rule. Assume the number of transactions that
contain both the itemset X and the itemset Y is n'; then s = support(X ∪ Y) = n'/n and
c = support(X ∪ Y) / support(X). Briefly speaking, support(X) can be viewed as the occurrence
frequency of the itemset X in the whole transaction database D, while c means that when
X is satisfied, there will be a certainty of c that Y is also true.
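These two definitions can be computed directly for a toy transaction database. A minimal Python sketch (the data and helper names are ours, chosen only for illustration):

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_stats(X, Y, transactions):
    """Support and confidence of the association rule X => Y."""
    s = support(X | Y, transactions)
    c = s / support(X, transactions)
    return s, c

D = [{"A", "B"}, {"A", "B", "C"}, {"A", "C"}, {"B"}]
s, c = rule_stats({"A"}, {"B"}, D)
```

Here {A, B} appears in 2 of the 4 transactions, so s = 0.5, and since {A} appears in 3 of 4, c = 0.5 / 0.75 = 2/3.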
According to Agrawal and Srikant (1994), given two thresholds of minconfidence
(representing minimum confidence) and minsupport (representing minimum support), a
mining algorithm will find all such association rules as X ⇒ Y, c, s where c ≥
minconfidence and s ≥ minsupport. Define any itemset X as a large itemset if support(X) ≥
minsupport. Usually, a mining algorithm involves two steps:
(1) Find all the large itemsets of different lengths.
(2) Construct association rules from every large itemset. This is actually a direct
mapping process. Given a large itemset L, for any non-empty subset of L, e.g.,
L', an association rule can be constructed as L' ⇒ (L − L'), c', s', where
s' = support(L) ≥ minsupport and c' = support(L) / support(L') ≥ minconfidence.
Since the construction of association rules from large itemsets is a straightforward
process, algorithm Apriori focuses on how to find large itemsets.
We use X^k = {item_1, item_2, ..., item_k} ⊆ I to represent an itemset of length k (1 ≤ k ≤
m), i.e., an itemset that contains k items. Agrawal and Srikant's algorithm Apriori (1994)
also defines L_k as the set of all the k-length large itemsets and C_k as a set of k-length
itemsets that are candidates for large itemsets. So, L_k ⊆ C_k. Algorithm Apriori is based
on the following observation: any non-empty subset of a large itemset must be a large
itemset too. Suppose X^k is a large itemset and X^l ⊂ X^k (1 ≤ l < k). Since
support(X^l) ≥ support(X^k) ≥ minsupport, X^l must be a large itemset, too. So, C_k can
be directly constructed from L_{k-1} (k ≥ 2) as shown in the following algorithm.
C_k = ∅;
select {X.item_1, X.item_2, ..., X.item_{k-1}, Y.item_{k-1}} into C_k
    from L_{k-1}
    where (X ∈ L_{k-1}) ∧ (Y ∈ L_{k-1})
        ∧ (∀j, 1 ≤ j ≤ k-2, X.item_j = Y.item_j)
        ∧ (X.item_{k-1} < Y.item_{k-1});
forall itemsets Z ∈ C_k do begin
    if (there exists a (k-1)-length sub-itemset Z' ⊂ Z such that Z' ∉ L_{k-1})
        then C_k = C_k − {Z};
end
return C_k;

In this algorithm, it is also clear that given an itemset of length k, e.g.,
Z = {Z.item_1, Z.item_2, ..., Z.item_k}, there will be k sub-itemsets of length k-1,
namely Z − {Z.item_l}, for all l, 1 ≤ l ≤ k.
Figure 4.4 Agrawal and Srikant's Apriori Candidate Generation Algorithm (1994)
On the other hand, according to Agrawal and Srikant (1994), given
C_k = {X^k_1, X^k_2, ..., X^k_{n_k}}, L_k can be constructed by simply scanning the transaction
database to calculate the support of every itemset in C_k, since
L_k = {X^k_j | (1 ≤ j ≤ n_k) ∧ (X^k_j ∈ C_k) ∧ (support(X^k_j) ≥ minsupport)}. Moreover, C_1 can
be directly initialized as {{i_1}, {i_2}, ..., {i_m}}. Details of algorithm Apriori are shown in
Figure 4.5.
Because one pass of the transaction database D will be able to construct L_k from
C_k, if the maximum length of large itemsets is K (K < m), the cost of this algorithm will
be K passes of D, or O(K*n).
Construct L_k:
forall transactions T ∈ D do begin
    forall itemsets X_k ∈ C_k do begin
        if (X_k ⊆ T) then X_k.count++;
    end
end
L_k = ∅;
forall itemsets X_k ∈ C_k do begin
    if (X_k.count / n ≥ minsupport)
        then L_k = L_k ∪ {X_k};
end
return L_k;

Figure 4.5: Flow Chart Depiction of Agrawal and Srikant's Algorithm Apriori (1994)
(The flow chart starts with k = 1 and C_1 = {{i_1}, {i_2}, ..., {i_m}}, then repeatedly
constructs L_k from C_k; if L_k = ∅, the association rules are constructed and the
algorithm succeeds; otherwise k = k + 1 and C_k is constructed from L_{k-1}.)
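The overall procedure can be sketched compactly in Python. This is an illustrative sketch of the join, prune, and counting steps, not Agrawal and Srikant's optimized implementation; the function and variable names are ours.

```python
from itertools import combinations

def apriori(transactions, minsupport):
    """Return all large itemsets of a list of set-valued transactions."""
    n = len(transactions)

    def sup(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = sorted({i for t in transactions for i in t})
    Lk = [frozenset([i]) for i in items if sup(frozenset([i])) >= minsupport]
    large = list(Lk)
    k = 2
    while Lk:
        prev = set(Lk)
        # join step: unions of two (k-1)-itemsets that form a k-itemset
        Ck = {a | b for a in Lk for b in Lk if len(a | b) == k}
        # prune step: every (k-1)-subset of a candidate must itself be large
        Ck = {c for c in Ck
              if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        # one pass over D computes the supports and yields L_k
        Lk = [c for c in Ck if sup(c) >= minsupport]
        large += Lk
        k += 1
    return large

D = [{"A", "B"}, {"A", "B", "C"}, {"A", "C"}, {"B"}]
result = {tuple(sorted(s)) for s in apriori(D, 0.5)}
```

On this toy database with minsupport = 0.5, the large itemsets are {A}, {B}, {C}, {A, B}, and {A, C}; {B, C} is pruned by support, so {A, B, C} is never even generated as a candidate.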
4.2.2 Frequency Episodes
Mannila and Toivonen (1996) have proposed an algorithm to discover simple
serial frequency episodes from event sequences based on minimal occurrences. In
Mannila and Toivonen's method (1996), suppose S = {E_1, E_2, ..., E_n} is an event
sequence with n events in total and A = {a_1, a_2, ..., a_m} is the set of all the event
attributes. Each event E = {E.a_1, E.a_2, ..., E.a_m} in S consists of m values for all the
event attributes. E is also associated with a timestamp denoted by E.T. Then a simple
serial episode P = (e_1, e_2, ..., e_k) represents a sequential occurrence of k event variables,
where each e_i (1 ≤ i ≤ k) is an event variable and for all i and j (1 ≤ i < j ≤ k),
e_i.T < e_j.T. Usually, k is much smaller than n, so 1 ≤ k << n. We use e^q to represent an event
variable consisting of q event attributes, i.e., e^q = {attr_1 = v_1, attr_2 = v_2, ..., attr_q = v_q},
where {e^q.attr_1, e^q.attr_2, ..., e^q.attr_q} ⊆ A and 1 ≤ q ≤ m. In addition, each v_i (1 ≤ i ≤ q) is
a value from the domain of attribute attr_i. So, e^q is said to have an occurrence in an
event E if for all i (1 ≤ i ≤ q), E.(e^q.attr_i) = v_i.
According to Mannila and Toivonen (1996), given a time interval [t, t'], an
episode P = (e_1, e_2, ..., e_k) is said to occur at interval [t, t'] if t ≤ e_1.T and e_k.T ≤ t'. Define
an occurrence of P at interval [t, t'] as minimal if there does not exist
another occurrence of P at any subinterval [u, u'] ⊂ [t, t']. Given a
threshold of window (representing timestamp bounds), the frequency of P
is defined as frequency(P) = |{[t, t'] | (t' − t ≤ window) and the occurrence of P at interval
[t, t'] is minimal}|. Briefly speaking, the frequency of P in the event
sequence S is the total number of its minimal occurrences in any interval smaller than
window. So, given another threshold minfrequency (representing minimum frequency), an
episode P is called frequent if frequency(P) / (n − k + 1) ≥ minfrequency. Since in our
domain k << n (k is usually much smaller than n), frequency(P) / (n − k + 1) ≈ frequency(P) / n will
hold. Therefore, in our implementation, an episode will be considered frequent if
frequency(P) / n ≥ minfrequency.
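Counting minimal occurrences can be sketched as follows. For simplicity this sketch treats each event variable as a plain event type rather than an attribute-value set; the greedy backward matching finds, for each possible end event, the latest-starting occurrence, and a containment filter then keeps only the minimal intervals. The representation and helper names are ours.

```python
def minimal_occurrences(events, episode, window):
    """Minimal occurrences [t, t'] of a simple serial episode of event types.

    events: list of (timestamp, event_type) pairs, sorted by timestamp.
    """
    candidates = []
    for j, (t_end, etype) in enumerate(events):
        if etype != episode[-1]:
            continue
        # match the earlier episode elements as late as possible, scanning backwards
        i, t_start, ok = j - 1, t_end, True
        for need in reversed(episode[:-1]):
            while i >= 0 and events[i][1] != need:
                i -= 1
            if i < 0:
                ok = False
                break
            t_start = events[i][0]
            i -= 1
        if ok and t_end - t_start <= window:
            candidates.append((t_start, t_end))
    # an occurrence is minimal if no other occurrence fits in a subinterval
    return sorted(iv for iv in set(candidates)
                  if not any(o != iv and iv[0] <= o[0] and o[1] <= iv[1]
                             for o in candidates))

seq = [(1, "A"), (2, "B"), (3, "A"), (5, "B")]
occs = minimal_occurrences(seq, ("A", "B"), window=10)
```

For this toy sequence the episode (A, B) has two minimal occurrences, at intervals [1, 2] and [3, 5]; the wider interval [1, 5] is not minimal because [3, 5] is contained in it.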
In Mannila and Toivonen's algorithm (1996), a frequency episode is similar to a
large itemset of an association rule. If an episode P = (e_1, e_2, ..., e_k) is frequent, any non-empty
sub-episode P' ⊂ P must be frequent, too, because frequency(P') ≥
frequency(P) ≥ minfrequency. So, the frequency episodes of size k can be directly
constructed from the set of frequency episodes of size k-1. One major difference between
an itemset and a simple serial episode is that the events in an episode are ordered. For
example, episode {E_1, E_2} is obviously different from episode {E_2, E_1}. Define L_k as
the set of all the k-size frequency episodes and C_k as the set of all the k-size episodes that
are also candidates for frequency episodes. The following algorithm shows the
construction of C_k from L_{k-1} (k ≥ 2).
C_k = ∅;
select {P.e_1, P.e_2, ..., P.e_{k-1}, Q.e_{k-1}} into C_k
    from L_{k-1}
    where (P = (e_1, e_2, ..., e_{k-1}) ∈ L_{k-1})
        ∧ (Q = (e_1, e_2, ..., e_{k-1}) ∈ L_{k-1})
        ∧ (∀j, 2 ≤ j ≤ k-1, P.e_j = Q.e_{j-1});
forall episodes R = (e_1, e_2, ..., e_k) ∈ C_k do begin
    if (there exists a sub-episode R' = (e'_1, e'_2, ..., e'_{k-1}) ⊂ R such that R' ∉ L_{k-1})
        then C_k = C_k − {R};
end
return C_k;
For example, suppose L_2 = {(E_1, E_1), (E_1, E_2), (E_2, E_1)}; then C_3 will be
{(E_1, E_1, E_1), (E_1, E_1, E_2), (E_1, E_2, E_1), (E_2, E_1, E_1)}. The episode (E_2, E_1, E_2) is deleted
from C_3 because one of its sub-episodes, (E_2, E_2), is not in L_2.
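The join and prune steps can be sketched in Python; running the sketch on the L_2 of the example reproduces the four surviving candidates. Episode elements are shown here as plain labels, and the helper name is ours.

```python
def episode_candidates(L_prev):
    """Size-k candidate serial episodes from the size-(k-1) frequent episodes.

    L_prev: set of tuples, all of length k-1.
    """
    # join step: P and Q overlap on a window shifted by one element
    joined = {P + (Q[-1],) for P in L_prev for Q in L_prev if P[1:] == Q[:-1]}
    # prune step: every ordered sub-episode of size k-1 must be frequent
    return {R for R in joined
            if all(R[:i] + R[i + 1:] in L_prev for i in range(len(R)))}

L2 = {("E1", "E1"), ("E1", "E2"), ("E2", "E1")}
C3 = episode_candidates(L2)
```

The join alone would also produce (E2, E1, E2); the prune step removes it because its sub-episode (E2, E2) is not in L_2.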
According to Mannila and Toivonen (1996), the construction of L_k for frequency
episode mining will be similar to algorithm Apriori except for the difference between
calculating episode frequencies and calculating itemset supports. Like association rules,
episode rules can also be directly established from frequency episodes. Given a frequency
episode P = (e_1, e_2, ..., e_k), there will be k-1 non-empty ordered sub-episodes
P_i = (e_1, e_2, ..., e_i) ⊂ P where 1 ≤ i ≤ k-1. Then given another threshold minconfidence
(representing minimum confidence), a simple serial episode rule can be constructed as
P_i ⇒ Q_i, c, s, w, where P_i = (e_1, e_2, ..., e_i) ⊂ P, Q_i = (e_{i+1}, e_{i+2}, ..., e_k) = P − P_i,
Figure 4.6 Candidate Generation Algorithm Based on Work of Mannila and Toivonen
(1996)
s = frequency(P) ≥ minfrequency, c = frequency(P) / frequency(P_i) ≥ minconfidence,
and w = window. Here, according to (Lee, Stolfo, and Mok 1998), both P_i and Q_i are specified in the same
timestamp bound, i.e., the same threshold w.
The last episode rule P_{k-1} ⇒ Q_{k-1}, c, s, w is of most interest since it can be used
to predict the k-th event given the previous k-1 events. So, this kind of episode rule is
used in our experiments for intrusion detection.
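Constructing this last rule from a frequent episode and its (k-1)-prefix can be sketched as follows; the frequency table and the numbers in it are made up purely for illustration.

```python
def last_episode_rule(freq, episode, minconfidence, window):
    """Build the rule P_{k-1} => Q_{k-1} from a frequent serial episode.

    freq: dict mapping episode tuples to their (normalized) frequencies.
    """
    P, Q = episode[:-1], episode[-1:]
    c = freq[episode] / freq[P]          # rule confidence
    if c < minconfidence:
        return None
    return (P, Q, c, freq[episode], window)

freq = {("A", "B"): 0.2, ("A", "B", "B"): 0.1}
rule = last_episode_rule(freq, ("A", "B", "B"), minconfidence=0.4, window=10)
```

With these made-up frequencies the rule (A, B) => (B) has confidence 0.1 / 0.2 = 0.5, so it passes the 0.4 threshold.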
CHAPTER V
INTEGRATION OF FUZZY LOGIC WITH DATA MINING
Although association rules and frequency episodes can be mined from audit data
for anomaly intrusion detection, the mined rules or episodes are at the data level. This
immediate dependency on data may limit the flexibility of intrusion detection. So, the
machine learning component in IIDM will be designed to extract more abstract patterns
at a higher level by integrating fuzzy logic with association rules and frequency episodes.
5.1 Fuzzy Association Rules
Many quantitative features are involved in intrusion detection. As explained in
Chapter I, SRI's NIDES classifies the statistical measures into four types: ordinal
measures, categorical measures, binary categorical measures, and linear categorical
measures (Lunt 1993). Both ordinal measures and linear categorical measures are
quantitative. SRI's Event Monitoring Enabling Response to Anomalous Live
Disturbances (EMERALD) also divides the network traffic statistical measures into four
classes: categorical measures, continuous measures, intensity measures, and event
distribution measures (Porras and Valdes 1998). Continuous measures, e.g., the
connection duration, and intensity measures, e.g., the number of packets during a unit of
time such as 60 seconds, are also quantitative. Our hypothesis is that it is possible to
make data mining methods more flexible in processing quantitative data for intrusion
detection by integrating fuzzy logic with both association rules and frequency episodes.
5.1.1 Related Work
Association rules were originally used to describe binary data, such as the
occurrence of an item in a transaction, instead of quantitative data, such as the number of
items in a transaction. Accordingly, association rules are divided into two types: boolean
association rules and quantitative association rules (Srikant and Agrawal 1996). To find
quantitative association rules, Srikant and Agrawal (1996) have proposed to partition
quantitative attributes into different intervals.
Unfortunately, a sharp boundary problem results from using interval partitions
(Kuok, Fu, and Wong 1998). For example, suppose [1, 5] and [6, 10] are two intervals
created on a quantitative attribute. If the minimum support threshold is set at 30%, the
interval [6, 10] will not gain enough support regardless of the large support near its left
boundary, as shown in Figure 5.1. That is to say, although the value 5 has a large support
and lies near the interval [6, 10], it will not make any contribution to counting the
support of [6, 10].
Figure 5.1 Example of Sharp Boundary Problem
(The figure plots the support, ranging from 0% to 20%, of each value 1 through 10 of the quantitative attribute.)
In intrusion detection, the sharp separation of intervals may raise additional
problems. For example, suppose the interval [1, 5] is mined as a normal pattern for the
quantitative attribute. The values 6 and 10 will both be considered abnormal regardless of
the difference in their deviations from the normal pattern. Likewise, a normal behavior
with a small variance may fall outside the interval representing a normal pattern and be
considered an anomaly. Similarly, an intrusion with a small variance may fall inside the
interval and be undetected.
To address the sharp boundary problem, Kuok, Fu, and Wong (1998) have
proposed to mine fuzzy association rules by using fuzzy sets to categorize a quantitative
attribute. In the above example, the two intervals would be replaced by two fuzzy sets.
Suppose the value 5 has a membership degree of 0.9 in the first set and 0.3 in the second
set. Then it will contribute 0.9 to the support of the first fuzzy set and 0.3 to the second
one. However, this means that the value 5 will be more important than other values, since
the sum of its contributions to different fuzzy sets has become greater than 1. In our
method we address this shortcoming of Kuok, Fu, and Wong's approach by introducing
an additional normalization process.
5.1.2 Fuzzy Association Rules
According to Kuok, Fu, and Wong's method (1998), suppose we are given the
complete item set I = {i_1, i_2, ..., i_m} where each i_j (1 ≤ j ≤ m) denotes a categorical or
quantitative (fuzzy) attribute. We introduce f(i_j) to represent the maximum number of
categories (if i_j is categorical) or the maximum number of fuzzy sets (if i_j is fuzzy), and
m_{i_j}(l, v) to represent the membership degree of v in the l-th category or fuzzy set of i_j.
If i_j is categorical, m_{i_j}(l, v) = 0 or m_{i_j}(l, v) = 1. If i_j is fuzzy, 0 ≤ m_{i_j}(l, v) ≤ 1. Srikant
and Agrawal (1996) introduce the idea of mapping the categories (or fuzzy sets) of an
attribute to a set of consecutive integers. Then an itemset X^k (1 ≤ k ≤ m) can be
expressed as X^k = {item_1 = c_1, item_2 = c_2, ..., item_k = c_k} where
{X^k.item_1, X^k.item_2, ..., X^k.item_k} ⊆ I and for all j (1 ≤ j ≤ k), 1 ≤ c_j ≤ f(i_j).
So, given a transaction T = {T.i_1, T.i_2, ..., T.i_m}, T.i_j (1 ≤ j ≤ m) represents a value
of the j-th attribute and can be mapped to {(l, m_{i_j}(l, T.i_j)) | for all l, 1 ≤ l ≤ f(i_j)}.
However, when using Kuok, Fu, and Wong's algorithm, if i_j is fuzzy, the
membership degrees are further normalized so that they sum to 1:

m'_{i_j}(l, T.i_j) = m_{i_j}(l, T.i_j) / Σ_{l'=1}^{f(i_j)} m_{i_j}(l', T.i_j).

Then, for an itemset X^k = {item_1 = c_1, item_2 = c_2, ..., item_k = c_k} where 1 ≤ k ≤ m,
its support contributed by T will be:

Π_{j=1}^{k} μ_j(c_j, T.(X^k.item_j)),

where μ_j = m'_{X^k.item_j} if X^k.item_j is fuzzy, and μ_j = m_{X^k.item_j} if X^k.item_j is categorical.
Here we use the product to calculate an itemset's support because, given a transaction
T = {T.i_1, T.i_2, ..., T.i_m} and any attribute set {item_1, item_2, ..., item_k} (1 ≤ k ≤ m),
Π_{j=1}^{k} (Σ_{c_j=1}^{f(item_j)} m'_{item_j}(c_j, T.item_j)) = 1 will hold. That is to say, for any item or
any combination of items, the support from a transaction will always be 1.
Accordingly, the algorithm for constructing C_k from L_{k-1} (k ≥ 2) will look like:

C_k = ∅;
select {X.item_1 = X.c_1, X.item_2 = X.c_2, ..., X.item_{k-1} = X.c_{k-1}, Y.item_{k-1} = Y.c_{k-1}} into C_k
    from L_{k-1}
    where (X ∈ L_{k-1}) ∧ (Y ∈ L_{k-1})
        ∧ (∀j, 1 ≤ j ≤ k-2, (X.item_j = Y.item_j) ∧ (X.c_j = Y.c_j))
        ∧ (X.item_{k-1} < Y.item_{k-1});
forall itemsets Z ∈ C_k do begin
    if (there exists a (k-1)-length sub-itemset Z' ⊂ Z such that Z' ∉ L_{k-1})
        then C_k = C_k − {Z};
end
return C_k;

The rest of the algorithm for fuzzy association rules is similar to the algorithm Apriori for
Boolean association rules (Agrawal and Srikant 1994).
Figure 5.2 Candidate Generation Algorithm for Fuzzy Association Rules
In this algorithm, normalization is introduced to ensure that every transaction is
counted only once for an item or any combination of items, whether categorical or
fuzzy. For example, suppose I = {level, age} where level is a categorical attribute with
the domain of {freshman, sophomore, junior, senior, graduate} and age is a quantitative
attribute with three fuzzy sets {young, medium, old}. A transaction T = {graduate, 25}
will be mapped to {{(graduate, 1)}, {(young, 0.2), (medium, 0.9), (old, 0.1)}}. Without
normalization, it would increase the support of itemset {level = graduate, age = young}
by 0.2, the support of itemset {level = graduate, age = medium} by 0.9, and the support
of itemset {level = graduate, age = old} by 0.1. That is to say, this transaction will be
counted 0.2+0.9+0.1=1.2 times for the item age. However, it is unreasonable for one
transaction to contribute more than others. In contrast, the normalization process will
further transform the transaction T into {{(graduate, 1)}, {(young, 0.167), (medium,
0.75), (old, 0.083)}}, for a total contribution of 1.0 for the item age.
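The normalization step can be sketched in Python, using the attribute names and membership degrees from the example above (the helper name and dict representation are ours):

```python
def normalize(memberships):
    """Scale one attribute's membership degrees so they sum to 1."""
    total = sum(memberships.values())
    return {k: v / total for k, v in memberships.items()}

level = {"graduate": 1.0}                                  # categorical: already 0/1
age = normalize({"young": 0.2, "medium": 0.9, "old": 0.1})

# support contributed by this transaction to {level = graduate, age = medium}
contribution = level["graduate"] * age["medium"]
```

After normalization the age memberships become 0.167, 0.75, and 0.083, so the transaction's total contribution across the three age fuzzy sets is exactly 1.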
5.2 Fuzzy Frequency Episodes
In this section, we propose an idea of integrating fuzzy logic with frequency
episodes. The need to develop fuzzy frequency episodes comes from the involvement of
quantitative attributes in an event. That is to say, given the set of event attributes
A = {a_1, a_2, ..., a_m}, each attribute a_j (1 ≤ j ≤ m) may be categorical or quantitative
(fuzzy). Suppose f(a_j) represents the maximum number of categories (if a_j is
categorical) or the maximum number of fuzzy sets (if a_j is fuzzy), and m_{a_j}(l, v)
represents the membership degree of v in the l-th category or fuzzy set of a_j. If a_j is
categorical, m_{a_j}(l, v) = 0 or m_{a_j}(l, v) = 1. If a_j is fuzzy, 0 ≤ m_{a_j}(l, v) ≤ 1. Similarly,
for an event attribute, its categories or fuzzy sets can be mapped to consecutive integers.
Then an event variable e^k can be expressed as e^k = {attr_1 = c_1, attr_2 = c_2, ..., attr_k = c_k}
where {e^k.attr_1, e^k.attr_2, ..., e^k.attr_k} ⊆ A and for all j (1 ≤ j ≤ k), 1 ≤ c_j ≤ f(a_j). We
define two event variables e^p = {attr_1 = c_1, attr_2 = c_2, ..., attr_p = c_p} and
e^q = {attr'_1 = c'_1, attr'_2 = c'_2, ..., attr'_q = c'_q} as homogeneous if
{e^p.attr_1, e^p.attr_2, ..., e^p.attr_p} = {e^q.attr'_1, e^q.attr'_2, ..., e^q.attr'_q}, which also indicates
that p = q. It is obvious that an event variable is homogeneous to itself.
So, given an event E = {E.a_1, E.a_2, ..., E.a_m}, E.a_j (1 ≤ j ≤ m) represents a value
of the j-th attribute and can be mapped to {(l, m_{a_j}(l, E.a_j)) | for all l, 1 ≤ l ≤ f(a_j)}.
However, if a_j is fuzzy, the membership degrees are normalized as:

m'_{a_j}(l, E.a_j) = m_{a_j}(l, E.a_j) / Σ_{l'=1}^{f(a_j)} m_{a_j}(l', E.a_j).

Then, for an event variable e^k = {attr_1 = c_1, attr_2 = c_2, ..., attr_k = c_k} where 1 ≤ k ≤
m, its occurrence in E is no longer counted as either 0 or 1. Instead, it is defined as:

occurrence(e^k, E) = Π_{j=1}^{k} μ_j(c_j, E.(e^k.attr_j)),

where μ_j = m'_{a_j} if a_j is fuzzy, and μ_j = m_{a_j} if a_j is categorical.
And the minimal occurrence of an episode is the product of the occurrences of its event
variables.
That is to say, an event E may support several event variable occurrences due to
the introduction of fuzzy sets. However, a side effect may arise. For example, consider
the event sequence {E_1, E_2, E_3} within the window threshold. A, B, C, and D are event
variables in which A and B are homogeneous but A ≠ B, and C and D are homogeneous
but C ≠ D. Suppose occurrence(A, E_1) = 0.8, occurrence(B, E_1) = 0.2,
occurrence(A, E_2) = 0.1, occurrence(B, E_2) = 0.9, occurrence(C, E_3) = 0.9, and
occurrence(D, E_3) = 0.1. Then the minimal occurrence of episode {A, C} will become
0.09, because {E_2.A, E_3.C} is minimal and replaces {E_1.A, E_3.C}, which would contribute
0.72. So, a small occurrence of an event variable may change the minimal occurrence of
an episode in the event sequence.
To address this problem, we introduce another user-specified threshold
minoccurrence to represent the minimum occurrence required for an event variable. So,
given an event variable e^k, if occurrence(e^k, E) < minoccurrence, it will be claimed not
to occur in E. In detail, the following normalization process will be further conducted:

occurrence'(e^k, E) = 0                       if occurrence(e^k, E) < minoccurrence;
occurrence'(e^k, E) = occurrence(e^k, E) / Σ_{e^q} occurrence(e^q, E)
                                              if occurrence(e^k, E) ≥ minoccurrence.

Here every e^q is homogeneous to e^k and occurrence(e^q, E) ≥ minoccurrence. For
instance, if minoccurrence = 0.2, E_1 will contribute 0.8 to A and 0.2 to B, E_2 will
contribute 1 to B, and E_3 will contribute 1 to C. As a matter of fact, if minoccurrence is
set above 0.5, for any event, only one event variable will be claimed to occur in it and its
occurrence will be normalized to 1. In this case, it will be the same as categorizing every
quantitative attribute by intervals.
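The thresholding-plus-renormalization step can be sketched in Python, reproducing the example's contributions (the dict representation of a homogeneous group of event variables is ours):

```python
def fuzzy_occurrence(raw, minoccurrence):
    """Threshold and renormalize the occurrences of homogeneous event variables."""
    kept = {k: v for k, v in raw.items() if v >= minoccurrence}
    total = sum(kept.values())
    return {k: (v / total if k in kept else 0.0) for k, v in raw.items()}

E1 = fuzzy_occurrence({"A": 0.8, "B": 0.2}, 0.2)
E2 = fuzzy_occurrence({"A": 0.1, "B": 0.9}, 0.2)
E3 = fuzzy_occurrence({"C": 0.9, "D": 0.1}, 0.2)
```

With minoccurrence = 0.2, E_1 contributes 0.8 to A and 0.2 to B, E_2 contributes 1 to B (A is dropped and B is renormalized), and E_3 contributes 1 to C.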
Other than the difference in calculating the frequency (or minimal occurrence) of
an episode, the rest of this algorithm is similar to Mannila and Toivonen's algorithm
(1996) for mining frequency episodes.
CHAPTER VI
EXPERIMENTS AND RESULTS
Association rules and frequency episodes have been proposed for feature
selection, as well as audit data gathering (Lee, Stolfo, and Mok 1998). On the other hand,
the association rules and episode rules mined from training data that represents normal
behavior can be also directly used for anomaly detection (Lee, Stolfo, and Mok 1998).
However, these rules are usually at the data level. For example, for the quantitative
feature of connection duration, a rule may contain such a component as connection
duration = 5 seconds or 5 seconds ≤ connection duration ≤ 10 seconds. With the
integration of fuzzy logic, more general rules can be produced at a higher and more
abstract level.
Another advantage resulting from the integration of fuzzy logic is that fuzzy
association rules and fuzzy frequency episodes can be applied to temporal statistical
measurements which are quantitative and security-related. Statistical analysis has been
widely used to construct normal patterns for anomaly detection in systems such as SRI's
IDES and NIDES. Some statistical measurements have also been proven to be able to
improve the accuracy of intrusion detection (Lee and Stolfo 1998). However, these
statistical features are usually incorporated as additional measurements manually. By
using fuzzy association rules and fuzzy frequency episodes, normal patterns for these
statistical features can be automatically created and used for anomaly detection.
6.1 Anomaly Detection
6.1.1 Experiment Set 1
The first set of experiments was designed to investigate the applicability of fuzzy
association rules and fuzzy frequency episodes for anomaly detection. According to (Lee,
Stolfo, and Mok 1998), since a large amount of actual intrusion data is usually very hard
to collect, some normal data with different behavior than that used for training can be
treated as anomalous. One of the servers in the Department of Computer Science at
Mississippi State University has been monitored and its real-time network traffic data has
been collected by tcpdump. Data preprocessing is conducted by use of sanitize
(downloaded from http://ita.ee.lbl.gov/html/software.html on 1 March 1999). Porras and
Valdes (1998) and Lee and Stolfo (1998) suggest several quantitative features of network
traffic that they feel can be used for intrusion detection. Based on their suggestions, a
program has been written to extract the following four temporal statistical measurements
from the network traffic data:
SN - the number of SYN flags appearing in TCP packet headers during the last 2 seconds;
FN - the number of FIN flags appearing in TCP packet headers during the last 2 seconds;
RN - the number of RST flags appearing in TCP packet headers during the last 2 seconds;
PN - the number of different destination ports during the last 2 seconds.
Here statistical computation is done for overlapping 2 second time periods as shown in
Figure 6.1.
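Extracting the four measurements from timestamped packet records can be sketched as follows. The packet representation and the 1-second step between the overlapping 2-second windows are our assumptions; the text only fixes the 2-second window width.

```python
def traffic_measurements(packets, t_start, t_end, width=2.0, step=1.0):
    """SN, FN, RN, PN over overlapping time windows of TCP packet records.

    packets: list of (timestamp, tcp_flag, destination_port) tuples.
    """
    rows = []
    t = t_start
    while t + width <= t_end:
        win = [p for p in packets if t <= p[0] < t + width]
        rows.append((t,
                     sum(p[1] == "SYN" for p in win),   # SN
                     sum(p[1] == "FIN" for p in win),   # FN
                     sum(p[1] == "RST" for p in win),   # RN
                     len({p[2] for p in win})))         # PN
        t += step
    return rows

pkts = [(0.5, "SYN", 80), (1.5, "SYN", 443), (2.5, "FIN", 80)]
rows = traffic_measurements(pkts, 0.0, 4.0)
```

For these three toy packets, the window [0, 2) yields SN = 2 over two distinct ports, while the later windows pick up the FIN packet instead.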
Each of the above four quantitative features is viewed as a fuzzy variable and is
divided into three fuzzy sets: LOW, MEDIUM, and HIGH. Membership function
definitions have been developed for fuzzy variables representing each of the features of
the network being monitored. The fuzzy association rule algorithm has been applied to
mine the correlation among the first three features, and the fuzzy frequency episode
algorithm has been applied to mine sequential patterns for the last feature.
The network traffic data was partitioned into different segments according to the
time slots when the sets were collected (i.e., morning, afternoon, evening, and night)
since different time slots likely exhibit different behavior. In this experiment, traffic data
in the afternoon was used as training data. Anomaly detection was then conducted on
traffic data from afternoon, evening, and night. A detailed specification of the training
and test data sets is given in Table 6.1.
Figure 6.1 Specification of Temporal Statistical Measurements Used in the Experiments
(The figure shows a timestamp axis from 0.0 to 6.0 seconds with overlapping 2-second windows: the first 2 seconds, the second 2 seconds, the third 2 seconds, and so on.)
Table 6.1
Specification of Training and Test Data Sets
Data Sets | Time Slots When Data Sets Are Collected
Training:
  1 hour of training data: 13:00-14:00, Tuesday, 23 March 1999
  2 hours of training data: 13:00-15:00, Tuesday, 23 March 1999
  3 hours of training data: 13:00-16:00, Tuesday, 23 March 1999
  6 hours of training data: 13:00-16:00, Friday, 19 March 1999 & 13:00-16:00, Tuesday, 23 March 1999
Test:
  T1: 13:00-14:00, Wednesday, 24 March 1999
  T2: 14:00-15:00, Wednesday, 24 March 1999
  T3: 15:00-16:00, Wednesday, 24 March 1999
  T4: 18:00-19:00, Tuesday, 23 March 1999
  T5: 19:00-20:00, Tuesday, 23 March 1999
  T6: 20:00-21:00, Tuesday, 23 March 1999
  T7: 0:00-1:00, Wednesday, 24 March 1999
  T8: 1:00-2:00, Wednesday, 24 March 1999
  T9: 2:00-3:00, Wednesday, 24 March 1999
Normal patterns (represented by fuzzy association rules and fuzzy episode rules)
are first established by mining the training data. An example of a fuzzy association rule
mined from the training data is: { SN = LOW, FN = LOW } ⇒ { RN = LOW }, 0.924,
0.49. This means the pattern { SN = LOW, FN = LOW, RN = LOW } occurred in 49% of
the training cases. In addition, when { SN = LOW, FN = LOW } occurs, there will be a
92.4% probability that { RN = LOW } will also occur. An example of a fuzzy episode
rule is: { PN = LOW, PN = MEDIUM } ⇒ { PN = MEDIUM }, 0.854, 0.108, 10 seconds.
This means that with a window threshold of 10 seconds, the frequency of the serial
episode { PN = LOW, PN = MEDIUM, PN = MEDIUM } is 10.8% and when { PN =
LOW, PN = MEDIUM } occurs, { PN = MEDIUM } will follow with an 85.4%
probability.
Then for each test case, new patterns were mined using the same algorithms and
the same parameters. These new patterns were then compared to the normal patterns
created from the training data. If they are similar enough, no intrusion is detected;
otherwise, an anomaly will be alarmed.
The similarity function proposed in (Lee, Stolfo, and Mok 1998) and (Lee, Stolfo,
and Mok 1999) used a user-defined threshold, e.g., 5%. Given two rules with the same
LHS and RHS, if both their confidences and their supports are within 5% of each other,
these two rules are considered similar. This approach exhibits Kuok, Fu, and Wong's
(1998) sharp boundary problem. For example, given a rule R which represents a normal
pattern and two test rules R' and R'', if both R' and R'' fall inside the threshold, there will
be no measurement of the difference between the similarity of R and R' and the similarity
of R and R''. Likewise, when both R' and R'' fall outside the threshold, there is no
measure of their dissimilarities with R.
Instead, we introduce a new similarity evaluation function which is continuous
and monotonic. Given a normal association rule:
R1: X ⇒ Y, c, s,
and a new association rule:
R2: X' ⇒ Y', c', s',
where X, Y, X', and Y' are itemsets, define

similarity(R1, R2) = 0                                            if (X ≠ X') ∨ (Y ≠ Y');
similarity(R1, R2) = max(0, 1 − max(|c' − c| / c, |s' − s| / s))  if (X = X') ∧ (Y = Y').

Given two rule sets S1 (of normal patterns) and S2 (of new patterns), define

s = Σ_{R1 ∈ S1} Σ_{R2 ∈ S2} similarity(R1, R2).

Then, like the definition in (Lee, Stolfo, and Mok 1998), we define

similarity(S1, S2) = (s / |S1|) * (s / |S2|),

where |S1| and |S2| are the total number of rules in S1 and S2, respectively. Here s / |S1| is
actually the percentage of normal patterns covered by the new patterns, and s / |S2| is the
percentage of new patterns covered by the normal patterns. The similarity evaluation for
fuzzy episode rules is almost the same as for fuzzy association rules, except that there is
one more parameter w (of window length) for an episode rule. It is required that the
window thresholds be identical when two episode rules are evaluated for their similarity.
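The similarity evaluation can be sketched in Python; the tuple representation of a rule and the example rule sets are ours, chosen only for illustration.

```python
def rule_similarity(r1, r2):
    """Continuous similarity of two rules, each (lhs, rhs, confidence, support)."""
    (X, Y, c, s), (X2, Y2, c2, s2) = r1, r2
    if X != X2 or Y != Y2:
        return 0.0
    return max(0.0, 1.0 - max(abs(c2 - c) / c, abs(s2 - s) / s))

def ruleset_similarity(S1, S2):
    """Similarity of a normal rule set S1 and a new rule set S2."""
    total = sum(rule_similarity(r1, r2) for r1 in S1 for r2 in S2)
    return (total / len(S1)) * (total / len(S2))

normal = [(frozenset(["SN=LOW"]), frozenset(["RN=LOW"]), 0.9, 0.5)]
new    = [(frozenset(["SN=LOW"]), frozenset(["RN=LOW"]), 0.8, 0.5)]
sim = ruleset_similarity(normal, new)
```

Unlike a hard 5% threshold, a small drift in confidence here lowers the similarity gradually instead of flipping it from "similar" to "dissimilar".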
The purpose of the first experiment in this set was to determine the amount of
training data (duration) needed to demonstrate differences in behavior for different time
periods. In this experiment, training sets of different duration (all from the same time
period, i.e., afternoon) were used to mine fuzzy association rules (see Table 6.1 for a
more detailed description of the data). The similarity of each set of rules derived from
training data of different duration was compared to test data for different time periods.
The results from this experiment are shown in Figure 6.2. These results show that the
fuzzy association rules derived from test data for the same time of the day as the training
data (afternoon) were very similar to the rules derived from the training data. Rules
derived from evening data were less similar and rules derived from late night data were
the least similar. This confirms the hypothesis that fuzzy association rules are able to
distinguish different behavior. This experiment also demonstrates that there is no
difference in the similarity measures when the duration of training data is increased from
3 hours to 6 hours.
The purpose of the second experiment in this set was to further demonstrate the
capability of fuzzy association rules for anomaly detection. In this experiment, 3 hours of
traffic data (afternoon) was selected as the training data based on results from the first
experiment. Nine test data sets from three different time periods, i.e., afternoon, evening,
and late night were used (see Table 6.1 for a more detailed description of the data). The
similarity of fuzzy association rules derived from training data was compared to each test
data set. The results from this experiment are shown in Figure 6.3. The results show that
the fuzzy association rules derived from the test data sets for the same time period as the
training data (afternoon) were more similar to the rules derived from training data than
any other test data set from different time periods, i.e., evening, and late night. Rules
derived from any evening test data set were more similar than rules derived from any late
night test data set. This further confirms the capability of fuzzy association rules in
distinguishing different behavior.
The next two experiments in this set were similar to the first two experiments,
except that fuzzy episode rules were mined for anomaly detection instead of fuzzy
association rules. The results are consistent with those from the first two experiments and
are shown in Figure 6.4 and Figure 6.5.
Some observations can be made from these experimental results: (1) in both the
fuzzy association rule training process and the fuzzy frequency episode training process,
there are no significant changes in similarity when using 3 hours of training data as
opposed to 6 hours of training data. Therefore, 3 hours of training data was used for the
remaining experiments; (2) similar results were obtained from both fuzzy association
rules and fuzzy frequency episodes. The test cases of T1, T2, and T3 are most similar to
normal patterns since all of them, as well as training data, are network traffic in the
afternoon. T7, T8, and T9 are most different from normal patterns; this is to be expected
since this data represents network traffic in the middle of the night when the usage of the
network is lightest.
Figure 6.2: Comparison of Similarities Between Different Training and Test Data Sets
for Fuzzy Association Rules (minconfidence=0.6; minsupport=0.1)
Training Data Sets: 1 hour of training data, 2 hours of training data, 3 hours of
training data, and 6 hours of training data (all from the afternoon)
Test Data Sets: T1 (afternoon), T4 (evening), and T7 (late night)
[Chart: similarity (0 to 1) versus hours of training data (1 to 6) for T1, T4, and T7]

Figure 6.3: Comparison of Similarities Between 3 Hour Training Data Set and Different
Test Data Sets for Fuzzy Association Rules (minconfidence=0.6;
minsupport=0.1)
Training Data Set: 3 hours of training data (afternoon)
Test Data Sets: T1, T2, T3 (afternoon), T4, T5, T6 (evening), and
T7, T8, T9 (late night)

Test Data Set:  T1     T2     T3     T4     T5     T6     T7     T8     T9
Similarity:     0.773  0.795  0.775  0.564  0.513  0.288  0.100  0.116  0.0804
Figure 6.4: Comparison of Similarities Between Different Training and Test Data Sets
for Fuzzy Episode Rules (minconfidence=0.6; minsupport=0.1; minoccurrence=0.3; window=10s)
Training Data Sets: 1 hour of training data, 2 hours of training data, 3 hours of
training data, and 6 hours of training data (all from the afternoon)
Test Data Sets: T1 (afternoon), T4 (evening), and T7 (late night)
[Chart: similarity (0 to 1) versus hours of training data (1 to 6) for T1, T4, and T7]

Figure 6.5: Comparison of Similarities Between 3 Hour Training Data Set and Different
Test Data Sets for Fuzzy Episode Rules (minconfidence=0.6;
minsupport=0.1; minoccurrence=0.3; window=10s)
Training Data Set: 3 hours of training data (afternoon)
Test Data Sets: T1, T2, T3 (afternoon), T4, T5, T6 (evening), and
T7, T8, T9 (late night)

Test Data Set:  T1     T2     T3    T4     T5     T6      T7        T8        T9
Similarity:     0.681  0.883  0.89  0.207  0.171  0.0645  7.37E-05  8.91E-05  1.67E-05
The results have also shown that, given the same training data set and test data
set, their similarity as measured by mining fuzzy association rules is different from their
similarity as measured by mining fuzzy episode rules. This is not unexpected, since fuzzy
association rules and fuzzy episode rules use different features, which may have different
effects on anomaly detection. That is also one of the reasons why our Intelligent Intrusion
Detection Model (IIDM) incorporates different detection modules, each of which may
examine different aspects of the same data.
6.1.2 Experiment Set 2
The second set of experiments was designed to further test the capability of fuzzy
association rules and fuzzy frequency episodes for anomaly detection by using simulated
intrusion data. Three network traffic data sets in tcpdump format were downloaded from
http://iris.cs.uml.edu:8080 and used for the second set of experiments. These data sets
were collected by the Institute for Visualization and Perception Research at University of
Massachusetts Lowell with the purpose of providing an evaluation method for different
data mining techniques or some combinations of these techniques (The Institute for
Visualization and Perception Research 1998). Among these data sets, baseline represents
normal patterns, network1 includes simulated IP spoofing intrusions in which an intruder
tries to access a remote host by guessing its IP sequence numbers, and network3 includes
simulated port scanning intrusions in which an intruder attempts to collect information
about hosts or applications running on the network.
A program was first written to extract information about the same four temporal
statistical measurements used in the previous set of experiments directly from the raw
data. The data set baseline was segmented into two parts. The first part was used as
training data and the second part was used as test data. Network1 and network3 were used
as the other two test data sets.
The purpose of the first experiment in this set was to test the capability of fuzzy
association rules for distinguishing simulated intrusions from normal behavior. The
purpose of the second experiment in this set was to test the capability of fuzzy episode
rules for distinguishing simulated intrusions from normal behavior. The results shown in
Figures 6.6 and 6.7 provide additional evidence that anomalies can be detected by use of
fuzzy association rules and fuzzy episode rules.
Figure 6.6: Comparison of Similarities Between Training Data Set and Different
Test Data Sets for Fuzzy Association Rules (minconfidence=0.6;
minsupport=0.1)
Training Data Set: baseline (first half; representing normal behavior)
Test Data Sets: baseline (second half; representing normal behavior),
network1 (including simulated IP spoofing intrusions), and
network3 (including simulated port scanning intrusions)
Test Data Set:  Baseline  Network1  Network3
Similarity:     0.744     0.309     0.315
6.2 Real-time Intrusion Detection
Although (fuzzy) association rules and (fuzzy) frequency episodes can be directly
used for anomaly detection, it is generally believed that association rules and frequency
episodes cannot be used to detect anomalies at the record (e.g., a connection record or a
packet header record) level (Lee, Stolfo, and Mok 1998). That is to say, they cannot be
used for real-time detection directly. Instead, they are usually used to select features that
will be significant for real-time detection from a large amount of data.
Figure 6.7: Comparison of Similarities Between Training Data Set and Different
Test Data Sets for Fuzzy Episode Rules (minconfidence=0.6;
minsupport=0.1; minoccurrence=0.3; window=10s)
Training Data Set: baseline (first half; representing normal behavior)
Test Data Sets: baseline (second half; representing normal behavior),
network1 (including simulated IP spoofing intrusions), and
network3 (including simulated port scanning intrusions)

Test Data Set:  Baseline  Network1  Network3
Similarity:     0.885     0         0.000155

In our experiments, we are investigating the possibility of applying fuzzy episode
rules for near real-time intrusion detection. In the Time-based Inductive Machine (TIM)
proposed by Teng, Chen, and Lu (1990), a sequential pattern with 100% certainty can be
used to detect anomalies. For example, given a normal pattern like A -- B -- C
(D=100%), the sequence A -- B -- C -- E will be marked as an anomaly since it is
believed that A -- B -- C is always followed by D with no uncertainty. Similarly, we
introduce the idea of using fuzzy episode rules with high confidence (e.g., 0.8) for
anomaly detection.
Suppose we are given the event sequence S = {E_1, E_2, ..., E_(n-1)}, the current event
E_n following S, and a fuzzy episode rule R: e_1, ..., e_(k-1) → e_k, c, s, w, where k > 1 and
every e_i (1 ≤ i ≤ k) is an event variable. For the episode {e_1, ..., e_(k-1)}, if its minimal
occurrence count in S is x (x > 0), it then can be predicted, with the confidence of c, that
{e_1, ..., e_(k-1), e_k} will also have a minimal occurrence in S + {E_n}, with the constraint
that e_k occurs in the event E_n. And suppose the minimal occurrence count of the episode
{e_1, ..., e_(k-1), e_k} in the sequence S + {E_n} is y; then y / x ≥ c should also hold. In this
case, event E_n is said to match the episode rule R. On the other hand, like TIM, if the
episode {e_1, ..., e_(k-1)} has no minimal occurrence in S, i.e., if x = 0, the episode rule R is
said to be mismatched and other methods are needed to detect the normality of event E_n.
Our experiments show that a large window threshold (e.g., 15 seconds ≤ w ≤ 30 seconds)
will decrease the probability of mismatches.
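Assuming the minimal-occurrence counts x and y are supplied by the episode-mining machinery, the matching test above reduces to a small predicate (a sketch; the function name is ours):

```python
def matches_rule(x: int, y: int, c: float) -> bool:
    """Does the current event E_n match episode rule R (confidence c)?

    x: minimal-occurrence count of the antecedent episode {e_1, ..., e_(k-1)} in S;
    y: minimal-occurrence count of the completed episode in S + {E_n}.
    A ValueError marks the mismatched case (x == 0), where the rule carries
    no evidence and other methods must judge E_n.
    """
    if x == 0:
        raise ValueError("rule mismatched: antecedent episode never occurs in S")
    return y / x >= c
```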
Therefore, given the set of episode rules which are mined from training data and
represent normal patterns, if the current event E_n does not match any episode rule, it
will be marked as an anomaly with some degree of belief. Because the confidence of an
episode rule is usually less than 1, it is obvious that we cannot detect an anomaly with
100% confidence. So, this is an approximate detection. However, it can provide some
helpful indications when an anomaly occurs. It can also be used in cooperation with
other detection methods, such as misuse detection methods.
6.2.1 Experiment 3
This experiment was designed to demonstrate the applicability of fuzzy episode
rules in real-time intrusion detection. In this experiment, intrusions of the probing type
were simulated by use of mscan in the same network at the Computer Science
Department of Mississippi State University. Mscan is a software tool which can be used
to scan multiple systems. Fuzzy episode rules were mined from 3 hours of training data
(used in Experiment 1) for the feature PN (the number of different destination ports
during the last 2 seconds). Since the simulated intrusions usually take 1 to 1.5 minutes,
test data sets were established by collecting network traffic data for 3 minutes, with the
goal of covering the entire duration of every simulated intrusion. Six test data sets were collected
during 13:00 -- 14:00, Tuesday, 30 March, 1999 and 13:00 -- 14:00, Wednesday, 31
March, 1999. Among the six test data sets, T1, T2, and T3 represent normal data sets,
and T4, T5, and T6 represent intrusion data sets. The anomaly percentage of every test
data set is calculated as follows. Suppose we are given a sequence of n events for testing.
An event will be marked as an anomaly if, in the set of episode rules representing normal
behavior, there is no episode rule it matches. If the total number of anomaly events is m,
anomaly percentage = (m / n) * 100%.
Figure 6.8 shows our experimental results.
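A minimal sketch of this calculation, assuming a per-event boolean flag recording whether the event matched at least one normal episode rule:

```python
def anomaly_percentage(event_matched: list) -> float:
    # event_matched[i] is True when event i matched at least one episode
    # rule representing normal behavior; unmatched events are anomalies.
    n = len(event_matched)
    m = sum(1 for matched in event_matched if not matched)
    return 100.0 * m / n
```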
The results reveal clear differences in anomaly percentage between normal data
and intrusion data. Since there is no simulated intrusion in T1, T2, and T3, the anomaly
percentages for these data actually represent false positive error rates. Further analysis of
the results from T4, T5, and T6 shows that all have false positive error rates below 10%,
i.e., 4.44%, 6.67%, and 8.89%. However, the false negative error rates are relatively high
(about 40%). There are several reasons. First, only one feature, i.e., PN, is taken into
account. Second, the simulated intrusions are not evenly distributed over time with
respect to this feature.

Figure 6.8 Anomaly Percentages of Different Test Data Sets in Real-time Intrusion
Detection
Rule Set: fuzzy episode rules (minconfidence=0.8; minsupport=0.1; minoccurrence=0.3; window=15s)
Training Data Set: 3 hour training data (representing normal behavior)
Test Data Sets: T1, T2, T3 (representing normal behavior), and
T4, T5, T6 (including simulated mscan intrusions)

Test Data Set:       T1     T2     T3     T4      T5      T6
Anomaly Percentage:  8.99%  9.55%  7.30%  25.60%  33.71%  32.39%
[Chart: PN (number of distinct destination ports during last 2 seconds) versus time
(49 to 129 seconds) for test data sets T1 and T4]
Figure 6.9 shows the distribution of feature PN with time from the test data sets
T1 and T4. Here the time duration is 90 seconds. In test data set T4, the 49th second is
the first second and the 138th second is the last second of the simulated intrusion. It is obvious that
the simulated intrusion in T4 has not shown much deviation from normal behavior in the
last 20 seconds, which may contribute much to the high false negative error rate.
However, it has also been found that in every intrusion test case (T4, T5, or T6), an
anomaly alarm is raised in the first second the simulated intrusion occurs, although this is
only an approximate real-time detection.
Figure 6.9 Distribution of the Feature PN with Time from Test Data Sets T1
(Representing Normal Behavior) and T4 (Representing Simulated mscan
Intrusions)
6.2.2 Experiment 4
This experiment was conducted in order to compare the intrusion detection
performance, especially the false positive error rate, between fuzzy episode rules and
non-fuzzy episode rules (by use of intervals). The same training data set and six test data
sets as in Experiment 3 were used. Both fuzzy episode rules and non-fuzzy episode rules
were mined from training data for the feature PN (the number of different destination
ports during the last 2 seconds), which was divided into three fuzzy sets (for fuzzy episode
rules) or three intervals (for non-fuzzy episode rules): LOW, MEDIUM, and HIGH.
Figure 6.10 shows a comparison of the false positive error rates on test data sets between
fuzzy episode rules and non-fuzzy episode rules.
The experimental results demonstrate that the false positive error rates from fuzzy
episode rules are lower than those from non-fuzzy episode rules. That is to say, the error rate of
predicting a normal behavior as an intrusion is much lower for fuzzy episode rules than
non-fuzzy episode rules.
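The difference can be illustrated with a sketch of the two membership styles for PN; only the HIGH set is shown, and the breakpoints 8, 10, and 12 are hypothetical, not the thesis's actual set definitions:

```python
def crisp_high(pn: float, threshold: float = 10.0) -> float:
    # Interval (non-fuzzy) membership: jumps from 0 to 1 at the boundary.
    return 1.0 if pn >= threshold else 0.0

def fuzzy_high(pn: float, lo: float = 8.0, hi: float = 12.0) -> float:
    # Shoulder membership: rises linearly across [lo, hi], so values
    # near the boundary get a partial degree instead of a hard flip.
    if pn <= lo:
        return 0.0
    if pn >= hi:
        return 1.0
    return (pn - lo) / (hi - lo)
```

Near the boundary, a small variance in PN changes the fuzzy degree only slightly, while the crisp interval flips the category outright, which is consistent with the lower false positive error rates observed for the fuzzy rules.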
6.2.3 Experiment 5
The goal of this experiment was to determine the effect of the minconfidence
threshold on the false positive error rate and the false negative error rate.
Figure 6.10 Comparison of False Positive Error Rates of Fuzzy Episode Rules and
Non-Fuzzy Episode Rules
Rule Sets: fuzzy episode rules (minconfidence=0.8; minsupport=0.1; minoccurrence=0.3; window=15s)
non-fuzzy episode rules (minconfidence=0.8; minsupport=0.1; window=15s)
Training Data Set: 3 hour training data (representing normal behavior)
Test Data Set: T1, T2, T3 (representing normal behavior), and
T4, T5, T6 (including simulated mscan intrusions)
Test Data Set:  T1      T2      T3      T4      T5      T6
Fuzzy:          8.99%   9.55%   7.30%   4.44%   6.67%   8.89%
Non-Fuzzy:      17.98%  12.92%  15.25%  11.11%  17.78%  11.11%
62
Table 6.2
Effects of the minconfidence Threshold on the False Positive Error Rate (FPER)
and the False Negative Error Rate (FNER)
Minconfidence:                              0.80     0.85     0.90
Number of Episode Rules Learned from 3
Hour Training Data Set (minsupport=0.1;
minoccurrence=0.3; window=15s):             19       11       5
T1 FPER:                                    8.99%    30.34%   45.51%
T2 FPER:                                    9.55%    33.15%   50.56%
T3 FPER:                                    7.30%    39.16%   57.23%
T4 FPER:                                    4.44%    21.59%   36.36%
T4 FNER:                                    45.56%   13.33%   7.78%
T5 FPER:                                    6.67%    56.82%   70.45%
T5 FNER:                                    40.00%   0%       0%
T6 FPER:                                    8.89%    26.14%   36.36%
T6 FNER:                                    45.56%   18.89%   12.22%
From Table 6.2, it can be seen that a higher minconfidence value will result in
higher false positive error rates and lower false negative error rates. Our experiments
have also shown that a higher minconfidence value will cause many more mismatches.
The main reason here is that a higher minconfidence value will reduce the number of
episode rules learned from training data, as shown in Table 6.2. If the rule number is too
low, these rules will not be able to cover patterns representing all normal behavior. This
will cause the false positive error rate to increase dramatically.
Our strategy here is to minimize the false positive error rate first. The false
negative error rate is expected to be minimized by introducing more features and/or by
using this method in conjunction with other intrusion detection methods. Using this
strategy, the minconfidence threshold suggested by our experiments is 0.8.
CHAPTER VII
CONCLUSION
Intrusion detection is an important but complex task for a computer system. Many
AI techniques have been widely used in intrusion detection systems. A research group at
Mississippi State University is investigating the development of an intelligent intrusion
detection system. This thesis has explored the practicality of integrating fuzzy logic with
data mining methods for intrusion detection.
Data mining methods are capable of extracting patterns automatically and
adaptively from a large amount of data. Association rules and frequency episodes have
been used to mine training data to establish normal patterns for anomaly detection.
However, these patterns are usually at the data level, with the result that normal behavior
with a small variance may not match a pattern and will be considered anomalous. In
addition, an actual intrusion with a small deviation may match the normal patterns and
thus not be detected. We have demonstrated that the integration of fuzzy logic with
association rules and frequency episodes generates more abstract and flexible patterns for
anomaly detection.
There are two main reasons for introducing fuzzy logic for intrusion detection.
First, many quantitative features are involved in intrusion detection. Fuzzy set theory
provides a reasonable and efficient way to categorize these quantitative features
in order to establish high-level patterns. Second, security itself is fuzzy. For quantitative
features, there is no sharp separation between normal operations and anomalies. So,
fuzzy association rules can be mined to find the abstract correlation among different
security-related features, both categorical and quantitative. Similarly, fuzzy episode rules
can be also mined to create the high-level sequential patterns representing normal
behavior.
We have extended previous work (Lee, Stolfo, and Mok 1998) in the areas of
fuzzy association rules and fuzzy frequency episodes. We add a normalization step to the
procedure for mining fuzzy association rules by Kuok, Fu, and Wong (1998) in order to
prevent one data instance from contributing more than others. We modify the procedure
of Mannila and Toivonen (1996) for mining frequency episodes to learn fuzzy frequency
episodes. We use fuzzy association rules and fuzzy frequency episodes to extract patterns
for temporal statistical measurements at a higher level than the data level. We have
developed a similarity evaluation function which is continuous and monotonic for the
application of fuzzy association rules and fuzzy frequency episodes in anomaly detection.
We also present a real-time intrusion detection method by using fuzzy episode rules. In
addition, our experimental results have shown the utility of fuzzy association rules and
fuzzy episode rules in intrusion detection.
We have also developed an architecture for integrating machine learning methods
with other intrusion detection methods. By using data mining algorithms to mine fuzzy
association rules and fuzzy frequency episodes, a machine learning component is
implemented and incorporated in our intelligent intrusion detection system. This
component will work as a background unit and learn normal patterns for anomaly
detection. This learning process is both automatic and incremental. This means that new
patterns can be learned from new training data and used to update old patterns adaptively.
REFERENCES
Agrawal, R., and R. Srikant. 1994. Fast algorithms for mining association rules. In
Proceedings of the 20th international conference on very large databases held in
Santiago, Chile, September 12-15, 1994, 487-99. San Francisco, CA: Morgan
Kaufmann. (Downloaded from http://www.almaden.ibm.com/cs/people/ragrawal/
papers/vldb94_rj.ps on 19 February 1999.)
Chapman, D., and E. Zwicky. 1995. Building internet firewalls. Sebastopol, CA: O'Reilly
& Associates, Inc.
Crosbie, M., and G. Spafford. 1995. Active defense of a computer system using
autonomous agents. Purdue University. Department of Computer Science.
Technical Report 95-008.
Debar, H., M. Becker, and D. Siboni. 1992. A neural network component for an intrusion
detection system. In Proceedings of 1992 IEEE computer society symposium on
research in security and privacy held in Oakland, California, May 4-6, 1992, by
IEEE Computer Society, 240-50. Los Alamitos, CA: IEEE Computer Society Press.
Denning, D. 1986. An intrusion-detection model. In Proceedings of 1986 IEEE computer
society symposium on research in security and privacy held in Oakland, California,
April 7-9, 1986, by IEEE Computer Society, 118-31. Los Alamitos, CA: IEEE
Computer Society Press.
Frank, J. 1994. Artificial intelligence and intrusion detection: Current and future
directions. In Proceedings of the 17th national computer security conference held in
October, 1994. (Downloaded from http://seclab.cs.ucdavis.edu/papers.html on 2
February 1998.)
Gasser, M. 1988. Building a secure computer system. New York, NY: Van Nostrand
Reinhold Company Inc.
Heberlein, L., G. Dias, K. Levitt, B. Mukherjee, J. Wood, and D. Wolber. 1990. A
network security monitor. In Proceedings of 1990 IEEE computer society
symposium on research in security and privacy held in Oakland, California, May 7-
9, 1990, by IEEE Computer Society, 296-304. Los Alamitos, CA: IEEE Computer
Society Press.
Hodges, J., S. Bridges, and S. Yie. 1996. Preliminary results in the use of fuzzy logic for
a radiological waste characterization expert system. Mississippi State University.
Department of Computer Science. Technical Report 960626.
Ilgun, K., and A. Kemmerer. 1995. State transition analysis: A rule-based intrusion
detection approach. IEEE Transactions on Software Engineering 21(3): 181-99.
Kuok, C., A. Fu, and M. Wong. 1998. Mining fuzzy association rules in databases.
SIGMOD Record 27(1): 41-6. (Downloaded from http://www.acm.org/sigs/sigmod/
record/issues/9803 on 1 March 1999.)
Lee, W., S. Stolfo, and K. Mok. 1998. Mining audit data to build intrusion detection
models. In Proceedings of the fourth international conference on knowledge
discovery and data mining held in New York, New York, August 27-31, 1998, edited
by Rakesh Agrawal, and Paul Stolorz, 66-72. New York, NY: AAAI Press.
Lee, W., and S. Stolfo. 1998. Data mining approaches for intrusion detection. In
Proceedings of the 7th USENIX security symposium, 1998. (Downloaded from
http://www.cs.columbia.edu/~sal/recent-papers.html on 10 March 1999.)
Lee, W., S. Stolfo, and K. Mok. 1999. A data mining framework for building intrusion
detection models. (Downloaded from http://www.cs.columbia.edu/~sal/
recent-paper.html on 10 March 1999.)
Lunt, T. 1993. Detecting intruders in computer systems. In Proceedings of 1993
conference on auditing and computer technology. (Downloaded from http://www2.
csl.sri.com/nides/index5.html on 3 February 1999.)
Lunt, T., and R. Jagannathan. 1988. A prototype real-time intrusion-detection expert
system. In Proceedings of 1988 IEEE computer society symposium on research in
security and privacy held in Oakland, California, April 18-21, 1988, by IEEE
Computer Society, 59-66. Los Alamitos, CA: IEEE Computer Society Press.
Mannila, H., and H. Toivonen. 1996. Discovering generalized episodes using minimal
occurrences. In Proceedings of the second international conference on knowledge
discovery and data mining held in Portland, Oregon, August, 1996, by AAAI Press,
146-51. (Downloaded from http://www.cs.Helsinki.FI/research/fdk/
datamining/pubs on 19 February 1999.)
Me, L. 1998. GASSATA, a genetic algorithm as an alternative tool for security audit trail
analysis. In Proceedings of the first international workshop on the recent advances
in intrusion detection held in Louvain-la-Neuve, Belgium, September 14-16, 1998.
(Downloaded from http://www.zurich-ibm.com/~dac/Prog_RAID98/
Table_of_content.html on 2 February 1999.)
Mukherjee, B., L. Heberlein, and K. Levitt. 1994. Network intrusion detection. IEEE
Network, May/June, 26-41.
Orchard, R. 1995. FuzzyCLIPS version 6.04 user's guide. Knowledge System
Laboratory, National Research Council Canada.
Porras, P., and A. Valdes. 1998. Live traffic analysis of TCP/IP gateways. In Proceedings
of the 1998 ISOC symposium on network and distributed systems security held in
March, 1998. (Downloaded from http://www2.csl.sri.com/emerald/downloads.html
on 1 March 1999.)
Srikant, R., and R. Agrawal. 1996. Mining quantitative association rules in large
relational tables. In Proceedings of ACM SIGMOD international conference on
management of data held in June 4-6, 1996, by ACM Press, 1-12. (Downloaded
from http://www.almaden.ibm.com/cs/people/ragrawal/papers/sigmod96.ps on 19
February 1999.)
Stefik, M. 1995. Introduction to knowledge systems. San Francisco, CA: Morgan
Kaufmann Publishers, Inc.
Sundaram, A. 1996. An introduction to intrusion detection. (Downloaded from http://
www.cs.purdue.edu/coast/archive/data/categ24.html on 10 March 1999.)
Teng, H., K. Chen, and S. Lu. 1990. Adaptive real-time anomaly detection using
inductively generated sequential patterns. In Proceedings of 1990 IEEE computer
society symposium on research in security and privacy held in Oakland, California,
May 7-9, 1990, by IEEE Computer Society, 278-84. Los Alamitos, CA: IEEE
Computer Society Press.
The Institute for Visualization and Perception Research, University of Massachusetts
Lowell. 1998. Information Exploration Shootout. http://iris.cs.uml.edu:8080
(Accessed 1 March 1999).