
16 (IJCNS) International Journal of Computer and Network Security,

Vol. 2, No. 8, August 2010

An Adoptive Algorithm for Mining Time-Interval Sequential Patterns
Hao-En Chueh¹ and Yo-Hsien Lin²
¹ Department of Information Management, Yuanpei University,
No. 306, Yuanpei Street, Hsinchu 30015, Taiwan, R.O.C.
hechueh@mail.ypu.edu.tw
² Department of Information Management, Yuanpei University,
No. 306, Yuanpei Street, Hsinchu 30015, Taiwan, R.O.C.
yohsien@mail.ypu.edu.tw
Abstract: This paper presents an adoptive algorithm for mining time-interval sequential patterns. A time-interval sequential pattern is a sequential pattern that carries the time-intervals between successive itemsets. Most existing algorithms find these time-intervals with predefined, non-overlapping time partitions. However, no single predefined set of non-overlapping time partitions suits every pair of successive itemsets. This paper therefore first applies clustering analysis to generate suitable time-intervals for frequently occurring pairs of successive itemsets. The generated time-intervals are then used to extend typical sequential pattern mining algorithms so that they discover time-interval sequential patterns. Finally, the paper presents an operator that obtains the time-interval of a subsequence of a time-interval sequential pattern.

Keywords: Adoptive algorithm, Time-Interval, Sequential Pattern, Clustering Analysis

1. Introduction
Data mining is usually defined as the process of discovering hidden, useful, previously unknown information in large databases. Common data mining techniques include classification, clustering analysis, association rule mining and sequential pattern mining. Sequential pattern mining was introduced by Agrawal and Srikant (1995). It is the task of finding frequently occurring patterns, ordered in time or in some other way, in a given sequence database [1]. It is widely used in retail business to support marketing decisions [3, 5, 7]. An example of a sequential pattern is “A customer who bought a digital camera will buy a battery and a memory card later”.

Many algorithms have been proposed for mining sequential patterns [1, 4, 6, 9]. However, most of them consider only the order of the itemsets and ignore the time-intervals between them. In business, a sequential pattern that includes the time-intervals between successive itemsets is more valuable than one without any time information. An example of a sequential pattern with time-intervals between successive itemsets is “A customer who bought a digital camera will return to buy a battery and a memory card within one week”. Such time-intervals give a retail business more useful information for selling the appropriate products to its customers at the right time. Recent studies have therefore proposed algorithms for discovering sequential patterns with time-intervals between successive itemsets. This kind of pattern is called a time-interval sequential pattern [2].

To discover time-interval sequential patterns, many studies adopt predefined, non-overlapping time partitions. They assume that the time-interval between successive itemsets of a frequent sequential pattern fits into one of these partitions. However, no single predefined set of non-overlapping time partitions suits every pair of successive itemsets. It is therefore more reasonable to generate suitable time partitions for each pair of successive itemsets directly from the real sequence dataset. Accordingly, this paper presents an adoptive algorithm that discovers time-interval sequential patterns without predefined time partitions. The algorithm uses clustering analysis to obtain suitable time-intervals automatically for frequently occurring pairs of successive itemsets. It then uses these time-intervals to extend typical sequential pattern mining algorithms so that they discover time-interval sequential patterns.

The rest of this paper is organized as follows. Section 2 reviews research related to time-interval sequential patterns. Section 3 presents the proposed time-interval sequential pattern mining algorithm. Section 4 gives an example, and Section 5 concludes the paper.

2. Time-Interval Sequential Patterns
Sequential pattern mining is the task of discovering frequently occurring ordered patterns in a given sequence database. A sequence is an ordered list of itemsets. Let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of items. A sequence is written $S = \langle s_1, s_2, \ldots, s_k \rangle$, where each $s_i \subseteq I$ is called an itemset. The length of a sequence is the number of itemsets it contains, and a sequence containing $k$ itemsets is called a $k$-sequence. The support of a sequence $S$, denoted $\mathrm{supp}(S)$, is the fraction of all records in the sequence database that contain $S$. If
$\mathrm{supp}(S)$ is greater than or equal to a predefined threshold, called the minimal support, then $S$ is a frequent sequence and is called a sequential pattern.

Many algorithms have been proposed to discover sequential patterns [1, 4, 6, 9]. Most of them consider only the frequently occurring order of the itemsets and ignore the time-intervals between them. Yet the time-intervals between successive itemsets offer useful information that helps a business sell the appropriate products to its customers at the right time. Because of this value, many algorithms have been proposed for mining various sequential patterns with time-intervals between successive itemsets [2, 8, 10, 11].

Srikant et al. [10] use three predefined restrictions to find sequential patterns related to time-intervals: the maximum interval (max-interval), the minimum interval (min-interval) and the time window size (window-size). A discovered sequential pattern has the form $((A, B), (C, D))$, where $(A, B)$ and $(C, D)$ are two subsequences of $((A, B), (C, D))$. The max-interval and min-interval give, respectively, the maximal and the minimal time-interval within a subsequence. The window-size gives the time-interval between subsequences. Suppose max-interval is set to 10 hours, min-interval to 3 hours and window-size to 24 hours. Then the time-interval between $A$ and $B$ lies in $[3, 10]$, the time-interval between $C$ and $D$ also lies in $[3, 10]$, and the time-interval between $(A, B)$ and $(C, D)$ lies in $[0, 24]$.

Mannila et al. [8] use a predefined window width (win) to find frequent episodes in sequences of events. A discovered episode has the form $(A, B, C)$. If win is set to 3 days, the episode $(A, B, C)$ means that, within 3 days, $A$ occurs first, $B$ follows and $C$ happens last.

Wu et al. [11] also use a window $d$ to find sequential patterns of the form $(A, B, C)$ in which the time-interval between adjacent events is within $d$. If $d$ is set to 5 hours, the pattern $(A, B, C)$ means that $A$ occurs first, $B$ follows and $C$ happens last. Both the time-interval between $A$ and $B$ and the time-interval between $B$ and $C$ are within 5 hours.

Chen et al. [2] use a predefined set of non-overlapping time partitions to discover potential time-interval sequential patterns. A discovered pattern has the form $(A, I_0, B, I_2, C)$, where $I_0$ and $I_2$ belong to the set of non-overlapping time partitions. Suppose $I_0$ denotes a time-interval $t$ with $0 \le t \le 1$ day and $I_2$ denotes a time-interval $t$ with $3 < t \le 7$ days. Then the pattern $(A, I_0, B, I_2, C)$ means that $A$, $B$ and $C$ happen in this order. The time-interval between $A$ and $B$ is at most 1 day, and the time-interval between $B$ and $C$ is more than 3 days and at most 7 days.

These approaches can discover sequential patterns with time-intervals between successive itemsets by using one or more predefined time partitions. However, they cannot find patterns whose time-intervals between successive itemsets fall outside the predefined time ranges. To solve this problem, this work presents an adoptive algorithm for mining time-interval sequential patterns without any predefined time partitions. The main idea is to generate suitable time-intervals directly from the real sequence dataset. The algorithm first applies clustering analysis to generate suitable time-intervals automatically for frequently occurring pairs of successive itemsets. It then uses these time-intervals to extend typical algorithms so that they discover sequential patterns with time-intervals between successive itemsets. The next section describes the proposed algorithm in detail.

3. Adoptive Time-Interval Sequential Patterns Mining Algorithm
This section introduces the proposed algorithm for mining time-interval sequential patterns. First, some notation is defined.
$I = \{i_1, i_2, \ldots, i_m\}$: the set of items.
$S_i = \langle s_1, s_2, \ldots, s_n \rangle$: a sequence, where each $s_k \subseteq I$.
$D = \{S_1, S_2, \ldots, S_k\}$: the sequence dataset.
$\mathrm{supp}(S_i)$: the support of the sequence $S_i$.
min-supp: the minimal support threshold.
$CS_k$: the candidate set of frequent $k$-sequences.
$FS_k$: the set of frequent $k$-sequences.
$CTIS_k$: the candidate set of frequent time-interval $k$-sequences.
$FTIS_k$: the set of frequent time-interval $k$-sequences.

3.1 The proposed algorithm
Step 1: Produce $FS_1$, the set of frequent 1-sequences. Each item $s_i \in I$ is a candidate frequent 1-sequence. A candidate 1-sequence whose support is greater than or equal to min-supp is a frequent 1-sequence. $FS_1$ denotes the set of all frequent 1-sequences.

Step 2: Produce $CS_2$, the candidate set of frequent 2-sequences. Any two distinct frequent 1-sequences $s_1, s_2 \in FS_1$ ($s_1 \ne s_2$) generate two candidate 2-sequences in $CS_2$: $\langle s_1, s_2 \rangle$ and $\langle s_2, s_1 \rangle$.
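Steps 1 and 2 can be sketched in Python together with the support definition of Section 2 and the threshold test that Step 3 applies to the candidates. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that a record is a list of itemsets (Python sets) in time order, and it ignores time stamps because these steps concern order only. The function names and the toy database are hypothetical.

```python
from itertools import permutations


def contains(record, pattern):
    """True if `pattern` occurs in `record` in order.

    Both are lists of itemsets (Python sets). Each pattern itemset must be a
    subset of a different record itemset, at increasing positions; greedy
    left-to-right matching is sufficient for this test.
    """
    pos = 0
    for itemset in record:
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1
    return pos == len(pattern)


def support(db, pattern):
    """supp(S): fraction of records in `db` that contain `pattern`."""
    return sum(contains(record, pattern) for record in db) / len(db)


def mine_fs1_fs2(db, min_supp):
    """Hypothetical helper covering Steps 1-3 (order only, no time stamps)."""
    items = sorted({item for record in db for itemset in record for item in itemset})
    # Step 1: every item is a candidate 1-sequence; keep the frequent ones.
    fs1 = [i for i in items if support(db, [{i}]) >= min_supp]
    # Step 2: each pair of distinct frequent items yields <a, b> and <b, a>.
    cs2 = list(permutations(fs1, 2))
    # Step 3: keep the candidates whose support reaches min_supp.
    fs2 = [(a, b) for a, b in cs2 if support(db, [{a}, {b}]) >= min_supp]
    return fs1, cs2, fs2


# Toy database (hypothetical): four records, itemsets in time order.
db = [
    [{"a"}, {"b", "c"}, {"d"}],
    [{"a"}, {"c"}],
    [{"b"}, {"a"}, {"c"}],
    [{"d"}],
]
fs1, cs2, fs2 = mine_fs1_fs2(db, min_supp=0.5)
print(fs1)       # ['a', 'b', 'c', 'd']
print(len(cs2))  # 12
print(fs2)       # [('a', 'c')]
```

Every pair of distinct frequent items yields two ordered candidates. Four frequent items therefore produce 12 candidates, and only $\langle a, c \rangle$ reaches the 0.5 threshold in this toy data.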
Step 3: Produce $FS_2$, the set of frequent 2-sequences. A candidate 2-sequence whose support is greater than or equal to min-supp is a frequent 2-sequence. $FS_2$ is the set of all frequent 2-sequences.

Step 4: Find the frequent time-intervals of each 2-sequence in $FS_2$. For each frequent 2-sequence $\langle s_p, s_q \rangle$ in $FS_2$, list in increasing order all the time-intervals between $s_p$ and $s_q$ that appear in $D$. The following clustering-based steps, Step 4(a), Step 4(b) and Step 4(c), then yield the frequent time-intervals.

Step 4(a): Let $T(1, z) = [t_1, t_2, \ldots, t_z]$ be the increasingly ordered list of the time-intervals of $\langle s_p, s_q \rangle$, and let $T\langle s_p, s_q \rangle = \{T(1, z)\}$ be the set of time-intervals of $\langle s_p, s_q \rangle$. First, find the maximal difference between two adjacent time-intervals in $T\langle s_p, s_q \rangle$, and divide the list into two subsets at that point. If the difference between $t_i$ and $t_{i+1}$ is maximal, $T(1, z)$ is divided into $T(1, i) = [t_1, \ldots, t_i]$ and $T(i+1, z) = [t_{i+1}, \ldots, t_z]$.

Step 4(b): For each of the two subsets, calculate the support of $\langle s_p, s_q \rangle$, counting only occurrences whose time-interval falls within that subset. If the support of $\langle s_p, s_q \rangle$ with time-intervals in $T(1, i)$ is greater than or equal to min-supp, then $T(1, i)$ is a frequent time-interval of $\langle s_p, s_q \rangle$ and is retained; otherwise $T(1, i)$ is deleted. Likewise, if the support of $\langle s_p, s_q \rangle$ with time-intervals in $T(i+1, z)$ is greater than or equal to min-supp, then $T(i+1, z)$ is also a frequent time-interval and is retained; otherwise it is deleted. The retained subsets then replace the original set of time-intervals $T\langle s_p, s_q \rangle$. If neither subset is retained, the original set of time-intervals is called non-dividable. An original set in which all differences between adjacent time-intervals are equal is also called non-dividable.

Step 4(c): Repeat Step 4(a) and Step 4(b) until every subset of time-intervals in $T\langle s_p, s_q \rangle$ is non-dividable.

Step 5: Produce $FTIS_2$, the set of frequent time-interval 2-sequences. Each 2-sequence in $FS_2$ is extended by all its frequent time-intervals to generate $FTIS_2$. If $T\langle s_p, s_q \rangle = \{T^1, T^2, \ldots, T^R\}$ is the set of frequent time-intervals of $\langle s_p, s_q \rangle$, then each $\langle s_p, T^i, s_q \rangle$, $i = 1, \ldots, R$, is a frequent time-interval 2-sequence.

Step 6: Produce $CTIS_k$, $k \ge 3$, the candidate set of frequent time-interval $k$-sequences. Take any two frequent time-interval $(k-1)$-sequences $S_1, S_2 \in FTIS_{k-1}$, where
$S_1 = \langle s_{1,1}, T_{1,1}, s_{1,2}, \ldots, s_{1,k-2}, T_{1,k-2}, s_{1,k-1} \rangle$ and
$S_2 = \langle s_{2,1}, T_{2,1}, s_{2,2}, \ldots, s_{2,k-2}, T_{2,k-2}, s_{2,k-1} \rangle$.
If $s_{1,2} = s_{2,1}, s_{1,3} = s_{2,2}, \ldots, s_{1,k-1} = s_{2,k-2}$ and $T_{1,2} = T_{2,1}, T_{1,3} = T_{2,2}, \ldots, T_{1,k-2} = T_{2,k-3}$, they generate the candidate time-interval $k$-sequence
$S_{12} = \langle s_{1,1}, T_{1,1}, s_{1,2}, \ldots, s_{1,k-2}, T_{1,k-2}, s_{1,k-1}, T_{2,k-2}, s_{2,k-1} \rangle$.
For $k = 3$ the list of interval conditions is empty, so only $s_{1,2} = s_{2,1}$ is required.

Step 7: Produce $FTIS_k$, $k \ge 3$, the set of frequent time-interval $k$-sequences. A candidate time-interval $k$-sequence whose support is greater than or equal to min-supp is a frequent time-interval $k$-sequence. $FTIS_k$ is the set of all such sequences.

Step 8: Repeat Step 6 and Step 7 until no further $CTIS_k$ can be generated.

3.2 Time-intervals of a subsequence
This subsection introduces an operator that obtains the time-interval of a subsequence of a frequent time-interval sequential pattern. Let $S = \langle s_1, T_{1,2}, s_2, \ldots, s_{k-1}, T_{k-1,k}, s_k \rangle$ be a frequent time-interval $k$-sequence, and let $S' = \langle s_i, T_{i,i+1}, s_{i+1}, \ldots, s_{i+j-1}, T_{i+j-1,i+j}, s_{i+j} \rangle$ be a subsequence of $S$. Suppose $T_{i,i+1} = [a_i, b_i]$, $T_{i+1,i+2} = [a_{i+1}, b_{i+1}]$, $\ldots$, $T_{i+j-1,i+j} = [a_{i+j-1}, b_{i+j-1}]$. Then the time-interval between $s_i$ and $s_{i+j}$ is
$[a_i + a_{i+1} + \cdots + a_{i+j-1},\ b_i + b_{i+1} + \cdots + b_{i+j-1}]$.

The next section applies the above steps to a simple example.

4. Example
This section uses the sample sequence database in Table 1 to discover time-interval sequential patterns. In Table 1, Id is the record number of a sequence. Each sequence is represented as $\langle (s_1, t_1), (s_2, t_2), \ldots, (s_n, t_n) \rangle$, where $s_i$ is an itemset and $t_i$ is the time stamp at which $s_i$ occurs. Here, min-supp is set to 0.3.

Table 1: A sample sequence database
Id   Sequence
01   $\langle (s_5, 8), (s_4, 15), (s_6, 20) \rangle$
02   $\langle (s_1, 2), (s_3, 7), (s_2, 11), (s_6, 18) \rangle$
03   $\langle (s_2, 3), (s_1, 4), (s_3, 7), (s_6, 16), (s_7, 19) \rangle$
04   $\langle (s_1, 2), (s_2, 8), (s_6, 10), (s_7, 15) \rangle$
05   $\langle (s_5, 4), (s_6, 16), (s_1, 20), (s_3, 24) \rangle$
06   $\langle (s_7, 7), (s_1, 13), (s_5, 18), (s_2, 25), (s_6, 28) \rangle$
07   $\langle (s_5, 4), (s_1, 8), (s_3, 12), (s_6, 16), (s_7, 20) \rangle$
08   $\langle (s_1, 3), (s_5, 6), (s_2, 9), (s_4, 18), (s_6, 21) \rangle$
09   $\langle (s_2, 5), (s_1, 10), (s_3, 15), (s_6, 20), (s_7, 25) \rangle$
10   $\langle (s_6, 2), (s_7, 8), (s_5, 12), (s_2, 17) \rangle$

First, the supports of all itemsets are calculated to produce $FS_1$. Table 2 shows these supports.
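The interval-splitting procedure of Step 4 can be tried directly on Table 1. The Python sketch below is one possible reading of Steps 4(a) to 4(c), not the authors' code, and it rests on three assumptions the text leaves open. First, the support of $\langle s_p, s_q \rangle$ restricted to a subset is the fraction of records that contain $s_p$ followed by $s_q$ at a gap between the subset's smallest and largest value. Second, repeated gap values are merged before splitting. Third, a tie for the largest difference is broken at its first occurrence. All function names are hypothetical. The checks at the end reproduce the item supports of Table 2 and several rows of Table 5.

```python
# Table 1 as (item, time stamp) pairs; each itemset in the example is a single item.
TABLE_1 = [
    [("s5", 8), ("s4", 15), ("s6", 20)],
    [("s1", 2), ("s3", 7), ("s2", 11), ("s6", 18)],
    [("s2", 3), ("s1", 4), ("s3", 7), ("s6", 16), ("s7", 19)],
    [("s1", 2), ("s2", 8), ("s6", 10), ("s7", 15)],
    [("s5", 4), ("s6", 16), ("s1", 20), ("s3", 24)],
    [("s7", 7), ("s1", 13), ("s5", 18), ("s2", 25), ("s6", 28)],
    [("s5", 4), ("s1", 8), ("s3", 12), ("s6", 16), ("s7", 20)],
    [("s1", 3), ("s5", 6), ("s2", 9), ("s4", 18), ("s6", 21)],
    [("s2", 5), ("s1", 10), ("s3", 15), ("s6", 20), ("s7", 25)],
    [("s6", 2), ("s7", 8), ("s5", 12), ("s2", 17)],
]


def item_support(db, item):
    """Fraction of records containing `item` (the values of Table 2)."""
    return sum(any(a == item for a, _ in record) for record in db) / len(db)


def gaps(record, p, q):
    """All time gaps t(q) - t(p) for occurrences of p followed later by q."""
    return [tq - tp
            for i, (a, tp) in enumerate(record) if a == p
            for b, tq in record[i + 1:] if b == q]


def pair_support(db, p, q, lo, hi):
    """Assumed support of <p, q> counting only gaps inside [lo, hi]."""
    hits = sum(any(lo <= g <= hi for g in gaps(record, p, q)) for record in db)
    return hits / len(db)


def frequent_intervals(db, p, q, min_supp):
    """Steps 4(a)-4(c), one reading: split the sorted distinct gaps of <p, q>
    at the largest adjacent difference while at least one part stays frequent."""
    values = sorted({g for record in db for g in gaps(record, p, q)})
    result = []
    pending = [values] if values else []
    while pending:
        group = pending.pop()
        diffs = [b - a for a, b in zip(group, group[1:])]
        if len(set(diffs)) <= 1:
            # A single value, or all adjacent differences equal: non-dividable.
            result.append((group[0], group[-1]))
            continue
        cut = diffs.index(max(diffs)) + 1  # first largest difference (tie rule assumed)
        kept = [part for part in (group[:cut], group[cut:])
                if pair_support(db, p, q, part[0], part[-1]) >= min_supp]
        if kept:
            pending.extend(kept)  # Step 4(c): keep refining the retained subsets
        else:
            result.append((group[0], group[-1]))  # nothing retained: non-dividable
    return sorted(result)


print([item_support(TABLE_1, f"s{k}") for k in range(1, 8)])
# [0.8, 0.7, 0.5, 0.2, 0.6, 1.0, 0.6]
print(frequent_intervals(TABLE_1, "s2", "s6", 0.3))  # [(2, 7), (12, 15)]
print(frequent_intervals(TABLE_1, "s5", "s6", 0.3))  # [(10, 12)]
```

For $\langle s_2, s_6 \rangle$ the sorted gaps are 2, 3, 7, 12, 13, 15. The largest jump (7 to 12) splits them into two groups, each with support 0.3. Neither group can be split further without dropping below min-supp, which gives $[2, 7]$ and $[12, 15]$.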
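The join of Step 6 and the subsequence operator of Section 3.2 are both small enough to sketch directly. The Python illustration below is a sketch under assumptions, not the authors' code. A time-interval sequence is represented as a flat tuple that alternates items and (lower, upper) interval bounds, so $\langle s_1, T_{1,2}, s_2 \rangle$ becomes `("s1", (lo, hi), "s2")`. The names `join` and `subsequence_interval` are hypothetical. The usage joins two 3-sequences built from the intervals [3, 5], [4, 11] and [3, 6], then chains all three intervals.

```python
def join(s1, s2):
    """Step 6 (sketch): join two time-interval (k-1)-sequences.

    Sequences are flat tuples (item, interval, item, ..., item). The join
    requires s1 without its first item and interval to equal s2 without its
    last interval and item. The candidate is s1 extended by s2's last interval
    and item. Returns None when the two sequences do not join.
    """
    if s1[2:] != s2[:-2]:
        return None
    return s1 + s2[-2:]


def subsequence_interval(pattern, i, j):
    """Section 3.2 (sketch): interval between the i-th and j-th items
    (0-based, i < j), summing the lower and the upper bounds in between."""
    bounds = pattern[2 * i + 1:2 * j:2]  # intervals sit at the odd positions
    return (sum(lo for lo, _ in bounds), sum(hi for _, hi in bounds))


a = ("s1", (3, 5), "s3", (4, 11), "s6")
b = ("s3", (4, 11), "s6", (3, 6), "s7")
four = join(a, b)
print(four)                              # ('s1', (3, 5), 's3', (4, 11), 's6', (3, 6), 's7')
print(subsequence_interval(four, 0, 3))  # (10, 22)
print(subsequence_interval(four, 1, 3))  # (7, 17)
```

Comparing whole tuple slices checks the item conditions and the interval conditions of Step 6 in one test. For $k = 3$ the slices reduce to a single item each.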
This gives $FS_1 = \{s_1, s_2, s_3, s_5, s_6, s_7\}$.

Table 2: Supports of itemsets
Itemset   Support
$s_1$     0.8
$s_2$     0.7
$s_3$     0.5
$s_4$     0.2
$s_5$     0.6
$s_6$     1.0
$s_7$     0.6

Next, $CS_2$ is generated by joining $FS_1 \times FS_1$, and the supports of the sequences in $CS_2$ are calculated (Table 3). This gives
$FS_2 = \{\langle s_1, s_2 \rangle, \langle s_1, s_3 \rangle, \langle s_1, s_6 \rangle, \langle s_1, s_7 \rangle, \langle s_2, s_6 \rangle, \langle s_2, s_7 \rangle, \langle s_3, s_6 \rangle, \langle s_3, s_7 \rangle, \langle s_5, s_2 \rangle, \langle s_5, s_6 \rangle, \langle s_6, s_7 \rangle\}$.

Table 3: Supports of sequences in $CS_2$
$CS_2$                       Support   $CS_2$                       Support
$\langle s_1, s_2 \rangle$   0.4       $\langle s_5, s_1 \rangle$   0.2
$\langle s_1, s_3 \rangle$   0.5       $\langle s_5, s_2 \rangle$   0.3
$\langle s_1, s_5 \rangle$   0.2       $\langle s_5, s_3 \rangle$   0.2
$\langle s_1, s_6 \rangle$   0.7       $\langle s_5, s_6 \rangle$   0.5
$\langle s_1, s_7 \rangle$   0.4       $\langle s_5, s_7 \rangle$   0.1
$\langle s_2, s_1 \rangle$   0.2       $\langle s_6, s_1 \rangle$   0.1
$\langle s_2, s_3 \rangle$   0.2       $\langle s_6, s_2 \rangle$   0.1
$\langle s_2, s_5 \rangle$   0.0       $\langle s_6, s_3 \rangle$   0.1
$\langle s_2, s_6 \rangle$   0.6       $\langle s_6, s_5 \rangle$   0.1
$\langle s_2, s_7 \rangle$   0.3       $\langle s_6, s_7 \rangle$   0.5
$\langle s_3, s_1 \rangle$   0.0       $\langle s_7, s_1 \rangle$   0.1
$\langle s_3, s_2 \rangle$   0.1       $\langle s_7, s_2 \rangle$   0.2
$\langle s_3, s_5 \rangle$   0.0       $\langle s_7, s_3 \rangle$   0.0
$\langle s_3, s_6 \rangle$   0.4       $\langle s_7, s_5 \rangle$   0.2
$\langle s_3, s_7 \rangle$   0.3       $\langle s_7, s_6 \rangle$   0.1

For each frequent 2-sequence in $FS_2$, all its time-intervals are recorded and listed in increasing order (Table 4).

Table 4: Time-intervals of the sequences in $FS_2$
$FS_2$                       Time-intervals
$\langle s_1, s_2 \rangle$   $T\langle s_1, s_2 \rangle = \{6, 9, 12\}$
$\langle s_1, s_3 \rangle$   $T\langle s_1, s_3 \rangle = \{3, 4, 5\}$
$\langle s_1, s_6 \rangle$   $T\langle s_1, s_6 \rangle = \{8, 10, 12, 16, 18\}$
$\langle s_1, s_7 \rangle$   $T\langle s_1, s_7 \rangle = \{12, 13, 15\}$
$\langle s_2, s_6 \rangle$   $T\langle s_2, s_6 \rangle = \{2, 3, 7, 12, 13, 15\}$
$\langle s_2, s_7 \rangle$   $T\langle s_2, s_7 \rangle = \{7, 16, 20\}$
$\langle s_3, s_6 \rangle$   $T\langle s_3, s_6 \rangle = \{4, 5, 9, 11\}$
$\langle s_3, s_7 \rangle$   $T\langle s_3, s_7 \rangle = \{8, 10, 12\}$
$\langle s_5, s_2 \rangle$   $T\langle s_5, s_2 \rangle = \{3, 5, 7\}$
$\langle s_5, s_6 \rangle$   $T\langle s_5, s_6 \rangle = \{10, 12, 15\}$
$\langle s_6, s_7 \rangle$   $T\langle s_6, s_7 \rangle = \{3, 4, 5, 6\}$

Applying Step 4 of Section 3 yields the suitable time-intervals of each sequence in $FS_2$, shown in Table 5.

Table 5: Suitable time-intervals of the sequences in $FS_2$
$FS_2$                       Time-intervals
$\langle s_1, s_2 \rangle$   $T\langle s_1, s_2 \rangle = \{T^1_{1,2} = [6, 12]\}$
$\langle s_1, s_3 \rangle$   $T\langle s_1, s_3 \rangle = \{T^1_{1,3} = [3, 5]\}$
$\langle s_1, s_6 \rangle$   $T\langle s_1, s_6 \rangle = \{T^1_{1,6} = [8, 12],\ T^2_{1,6} = [16, 18]\}$
$\langle s_1, s_7 \rangle$   $T\langle s_1, s_7 \rangle = \{T^1_{1,7} = [12, 15]\}$
$\langle s_2, s_6 \rangle$   $T\langle s_2, s_6 \rangle = \{T^1_{2,6} = [2, 7],\ T^2_{2,6} = [12, 15]\}$
$\langle s_2, s_7 \rangle$   $T\langle s_2, s_7 \rangle = \{T^1_{2,7} = [16, 20]\}$
$\langle s_3, s_6 \rangle$   $T\langle s_3, s_6 \rangle = \{T^1_{3,6} = [4, 11]\}$
$\langle s_3, s_7 \rangle$   $T\langle s_3, s_7 \rangle = \{T^1_{3,7} = [8, 12]\}$
$\langle s_5, s_2 \rangle$   $T\langle s_5, s_2 \rangle = \{T^1_{5,2} = [3, 7]\}$
$\langle s_5, s_6 \rangle$   $T\langle s_5, s_6 \rangle = \{T^1_{5,6} = [10, 12]\}$
$\langle s_6, s_7 \rangle$   $T\langle s_6, s_7 \rangle = \{T^1_{6,7} = [3, 6]\}$

Next, each 2-sequence in $FS_2$ is extended by all its suitable time-intervals to form
$FTIS_2 = \{\langle s_1, T^1_{1,2}, s_2 \rangle, \langle s_1, T^1_{1,3}, s_3 \rangle, \langle s_1, T^1_{1,6}, s_6 \rangle, \langle s_1, T^2_{1,6}, s_6 \rangle, \langle s_1, T^1_{1,7}, s_7 \rangle, \langle s_2, T^1_{2,6}, s_6 \rangle, \langle s_2, T^2_{2,6}, s_6 \rangle, \langle s_2, T^1_{2,7}, s_7 \rangle, \langle s_3, T^1_{3,6}, s_6 \rangle, \langle s_3, T^1_{3,7}, s_7 \rangle, \langle s_5, T^1_{5,2}, s_2 \rangle, \langle s_5, T^1_{5,6}, s_6 \rangle, \langle s_6, T^1_{6,7}, s_7 \rangle\}$.

$CTIS_3$, the candidate set of frequent time-interval 3-sequences, is generated by joining $FTIS_2 \times FTIS_2$. The supports of the sequences in $CTIS_3$ are calculated and shown in Table 6. A candidate time-interval 3-sequence whose support is greater than or equal to min-supp is a frequent time-interval 3-sequence. The set of all frequent time-interval 3-sequences is therefore
$FTIS_3 = \{\langle s_1, T^1_{1,2}, s_2, T^1_{2,6}, s_6 \rangle, \langle s_1, T^1_{1,3}, s_3, T^1_{3,6}, s_6 \rangle, \langle s_1, T^1_{1,3}, s_3, T^1_{3,7}, s_7 \rangle, \langle s_1, T^1_{1,6}, s_6, T^1_{6,7}, s_7 \rangle, \langle s_3, T^1_{3,6}, s_6, T^1_{6,7}, s_7 \rangle\}$.

Table 6: Supports of sequences in $CTIS_3$
$CTIS_3$                                              Support
$\langle s_1, T^1_{1,2}, s_2, T^1_{2,6}, s_6 \rangle$   0.3
$\langle s_1, T^1_{1,2}, s_2, T^2_{2,6}, s_6 \rangle$   0.1
$\langle s_1, T^1_{1,2}, s_2, T^1_{2,7}, s_7 \rangle$   0.1
$\langle s_1, T^1_{1,3}, s_3, T^1_{3,6}, s_6 \rangle$   0.4
$\langle s_1, T^1_{1,3}, s_3, T^1_{3,7}, s_7 \rangle$   0.3
$\langle s_1, T^1_{1,6}, s_6, T^1_{6,7}, s_7 \rangle$   0.4
$\langle s_2, T^1_{2,6}, s_6, T^1_{6,7}, s_7 \rangle$   0.1
$\langle s_2, T^2_{2,6}, s_6, T^1_{6,7}, s_7 \rangle$   0.2
$\langle s_3, T^1_{3,6}, s_6, T^1_{6,7}, s_7 \rangle$   0.3
$\langle s_5, T^1_{5,2}, s_2, T^1_{2,7}, s_7 \rangle$   0.0
$\langle s_5, T^1_{5,6}, s_6, T^1_{6,7}, s_7 \rangle$   0.1

The candidate set of frequent time-interval 4-sequences, $CTIS_4$, is generated by joining $FTIS_3 \times FTIS_3$. Only one sequence is generated here: $\langle s_1, T^1_{1,3}, s_3, T^1_{3,6}, s_6, T^1_{6,7}, s_7 \rangle$. Its support is 0.3, so it is also a frequent time-interval 4-sequence, and
$FTIS_4 = \{\langle s_1, T^1_{1,3}, s_3, T^1_{3,6}, s_6, T^1_{6,7}, s_7 \rangle\}$.
No $CTIS_5$ can be generated, so the algorithm stops here. The time-interval of any subsequence of $\langle s_1, T^1_{1,3}, s_3, T^1_{3,6}, s_6, T^1_{6,7}, s_7 \rangle$ can be obtained with the operator of Subsection 3.2. For instance, with the intervals of Table 5, the time-interval between $s_1$ and $s_7$ is $[3 + 4 + 3,\ 5 + 11 + 6] = [10, 22]$.

The example shows that the suitable time-intervals differ from one pair of successive itemsets to another and can overlap. It is therefore more reasonable to generate suitable time-intervals for every pair of successive itemsets directly from the real sequence data when mining time-interval sequential patterns.

5. Conclusion
This paper presents an adoptive algorithm for mining time-interval sequential patterns. A sequential pattern that includes the time-intervals between successive itemsets is more valuable than a traditional sequential pattern without any time information. Most existing algorithms reveal the time-intervals between itemsets with predefined, non-overlapping time partitions, but this approach may not suit every pair of successive itemsets. To solve this problem, the proposed algorithm uses clustering analysis to generate suitable time-intervals automatically for frequently occurring pairs of successive itemsets. It then uses the generated time-intervals to extend typical algorithms so that they discover time-interval sequential patterns without any predefined time partitions. The paper also introduces an operator that computes the time-interval of a subsequence of a frequent time-interval sequential pattern. The example shows that the time-intervals between successive itemsets differ considerably and can overlap. It is therefore more reasonable to generate suitable time-intervals directly from the real sequence data when mining time-interval sequential patterns.

References

[1] R. Agrawal, R. Srikant, “Mining sequential patterns,” In Proceedings of the International Conference on Data Engineering, pp. 3-14, 1995.
[2] Y. L. Chen, M. C. Chiang, M. T. Ko, “Discovering time-interval sequential patterns in sequence databases,” Expert Systems with Applications, 25(3), pp. 343-354, 2003.
[3] M. S. Chen, J. Han, P. S. Yu, “Data mining: An overview from a database perspective,” IEEE Transactions on Knowledge and Data Engineering, 8(6), pp. 866-883, 1996.
[4] M. S. Chen, J. S. Park, P. S. Yu, “Efficient data mining for path traversal patterns,” IEEE Transactions on Knowledge and Data Engineering, 10(2), pp. 209-221, 1998.
[5] M. H. Dunham, Data Mining: Introductory and Advanced Topics, Pearson Education Inc., 2003.
[6] J. Han, G. Dong, Y. Yin, “Efficient mining of partial periodic patterns in time series database,” In Proceedings of the 1999 International Conference on Data Engineering, pp. 106-115, 1999.
[7] J. Han, M. Kamber, Data Mining: Concepts and Techniques, Academic Press, 2001.
[8] H. Mannila, H. Toivonen, A. Inkeri Verkamo, “Discovery of frequent episodes in event sequences,” Data Mining and Knowledge Discovery, 1(3), pp. 259-289, 1997.
[9] J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal, M.-C. Hsu, “PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth,” In Proceedings of the 2001 International Conference on Data Engineering, pp. 215-224, 2001.
[10] R. Srikant, R. Agrawal, “Mining sequential patterns: Generalizations and performance improvements,” In Proceedings of the 5th International Conference on Extending Database Technology, pp. 3-17, 1996.
[11] P. H. Wu, W. C. Peng, M. S. Chen, “Mining sequential alarm patterns in a telecommunication database,” In Proceedings of the Workshop on Databases in Telecommunications (VLDB 2001), pp. 37-51, 2001.

Authors Profile

Hao-En Chueh received the Ph.D. in Computer Science and Information Engineering from Tamkang University, Taiwan, in 2007. He is an Assistant Professor of Information Management at Yuanpei University, Hsinchu, Taiwan. His research interests include data mining, fuzzy set theory, probability theory, statistics, and database systems and their applications.

Yo-Hsien Lin received the Ph.D. in Information Management from the National YunLin University of Science and Technology, Taiwan, in 2008. He is an Assistant Professor of Information Management at Yuanpei University, Hsinchu, Taiwan. His research interests include bio-inspired systems, neural networks, evolutionary computation, evolvable hardware, intelligent system chips, biocomputing, pattern recognition, and medical information management.
