
Learning DFA: Evolution versus Evidence Driven State Merging

Simon M. Lucas and T. Jeff Reynolds


Department of Computer Science
University of Essex
Colchester, Essex CO4 3SQ
{sml,reynt}@essex.ac.uk
Abstract
Learning Deterministic Finite Automata (DFA) is a hard task that has been much studied within machine learning and evolutionary computation research. This paper presents a new method for evolving DFAs, where only the transition matrix is evolved, and the state labels are chosen to optimize the fit between final states and training set labels. This new procedure reduces the size and, in particular, the complexity of the search space. We present results on the Tomita languages, and also on a set of random DFA induction problems of varying target size and training set density. The Tomita set results show that we can learn the languages with far fewer fitness evaluations than previous evolutionary methods. On the random DFA task we compare our method with the Evidence Driven State Merging (EDSM) algorithm, which is one of the most powerful known DFA learning algorithms. We show that our method outperforms EDSM when the target DFA is small (fewer than 32 states) and the training set is sparse.
1 Introduction
Learning deterministic finite automata (DFA) from samples of labelled data is an interesting problem that has been extensively studied. It has been shown to be a hard task by a number of criteria [14, 8], and is a good benchmark for evaluating machine learning algorithms. More discussion can be found in [5, 13]. An attractive property for comparing learning algorithms is that tasks can readily be generated which vary in the complexity of the target that has to be learnt, as well as the amount of data which is supplied to the learning process. This paper compares an evolutionary method with the Evidence Driven State Merging (EDSM) algorithm, due to Price and appearing in Lang, Pearlmutter and Price [9]. This was shown to be a very successful algorithm for learning DFA in the Abbadingo One competition [9], and has since been refined by Cicchello and Kremer [3].
There have been many attempts to learn DFA (or their associated regular languages) using evolutionary approaches [4, 11], by training recurrent neural networks using appropriate variations of the back-propagation training algorithm [6, 19], and also by evolving recurrent neural networks [1]. Note that learning some DFA that is consistent with the training data is a trivial problem: one could simply construct the prefix tree acceptor. Hence it is usual to add a further constraint, that the DFA should generalise to unseen test data, or that the challenge is to find the smallest DFA that is consistent with the training set; these two goals are generally mutually compatible.
2 Deterministic Finite Automata
A Deterministic Finite Automaton (DFA) is conventionally defined as a 5-tuple $(Q, \Sigma, \delta, q_0, F)$, where $Q$ is the set of states, $\Sigma$ is the set of input symbols, $\delta: Q \times \Sigma \rightarrow Q$ is the state transition function, $q_0$ is the start state and $F$ (where $F \subseteq Q$) is the set of accepting states. This form of DFA is also known as a complete DFA, since there is a transition from every state to some other state for every input. For convenience of representation, and also to generalize the problem to multi-class problems, we adopt a modified representation, described next.
2.1 Representation and Search Space
We enumerate the states $Q$ as the set of integers $0 \ldots (n-1)$, where there are $n$ states, and we always use state zero as the start state. Similarly, for the set of inputs we use the set of integers in the range $0 \ldots (|\Sigma|-1)$. The transition function is implemented as a matrix of size $(n \times |\Sigma|)$ indexed on current state and current input, where each element of the matrix is in the range $0 \ldots (n-1)$.

We label each state with its output class, where each label is in the range $0 \ldots (c-1)$ and $c$ is the number of string classes. The vector $o$ represents these labels, where $o_i$ is the output label for state $i$. For conventional DFA usage of accepting or rejecting strings the number of possible outputs is two, with some arbitrary convention chosen, such as zero for an accepting state and one for a rejecting state. This definition, however, can be used for $c$-way classification problems, and is not restricted to merely accepting or rejecting strings.
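As a concrete illustration, the representation might be coded as follows. This is a minimal sketch in Java (the language of our implementation); the class and member names here are illustrative, not taken from our actual code.

    // Sketch of the DFA representation: transition matrix plus output labels.
    class DFA {
        int[][] delta;  // transition matrix: delta[state][symbol] -> next state
        int[] o;        // output label o[i] in 0..(c-1) for state i

        DFA(int n, int numSymbols) {
            delta = new int[n][numSymbols];
            o = new int[n];
        }

        // Run the DFA from state 0 and return the final state reached.
        int finalState(int[] string) {
            int q = 0;
            for (int symbol : string) q = delta[q][symbol];
            return q;
        }

        // Classify a string by the label of its final state.
        int classify(int[] string) {
            return o[finalState(string)];
        }
    }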
Using this representation, the only possible aspects of the system to evolve are the number of states, the state label vector, and the state transition matrix. For each run of the evolutionary algorithm (EA, described in the next section) we can either fix the number of states by using prior knowledge of the target DFA size, or else we can re-run the EA with an increasing number of states until optimal performance is achieved on the training set, or we exceed a time-bound. In all the experiments reported below we fixed the number of states using prior knowledge of the problem.
When we run the EA with the number of states fixed at $n$, we can partition the search space into two parts: the transition matrix and the output vector. The size of the search space, $S$, is given in Equation 1 below.

$$S = n^{n|\Sigma|} \cdot c^n \quad (1)$$

Note that the true search space is smaller than this, at least by a factor of $(n-1)!$, due to the existence of isomorphisms.

For all the experiments reported in this paper we deal with binary input strings ($|\Sigma| = 2$) and a two-class problem, where each string is accepted or rejected ($c = 2$):

$$S = n^{2n} \cdot 2^n \quad (2)$$

For instance, with $n = 10$ this already gives $10^{20} \cdot 2^{10} \approx 10^{23}$ candidate DFAs. The $2^n$ term comes from the number of ways of labelling the states. We eliminate this from the search using our smart labelling scheme, described next.
2.2 Smart Tuning the Output Labels
If we evolve both the transition matrix and the output vector, there will certainly exist some epistasis, or dependency, between these two parts. This means that to improve a particular solution, the EA may need to make adjustments both to the transition matrix and to the set of output labels. To overcome this, we devised an algorithm for optimally selecting the output labels given the transition matrix and the training set; we refer to this as Smart Tuning the Output Labels.

The procedure is simple. Let $h[i][c]$ be an array denoting the number of times the DFA finished in state $i$ for pattern class $c$. Given a set of training strings $T$, and having initialized the elements of $h$ to zero:

- for each string $t$ in $T$, increment $h[f(t)][c]$, where $f(t)$ is the final state reached on completion of reading string $t$, and $c$ is the class of $t$.

For each state we then choose the output label:

$$o_i = \arg\max_c h[i][c] \quad (3)$$

This is a very efficient procedure which can be directly incorporated into the fitness function, since the number of correctly classified strings is simply the sum over all states of the argmax terms from Equation 3 above.
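The way the smart labelling folds into fitness evaluation might look like the following sketch, using the DFA class above. Again the names are our own; the logic follows Equation 3 directly.

    // Optimally label the states for a fixed transition matrix, and
    // return the number of correctly classified training strings.
    int smartLabelFitness(DFA dfa, int[][] strings, int[] classes, int c) {
        int n = dfa.delta.length;
        int[][] h = new int[n][c];      // h[i][k]: strings of class k ending in state i
        for (int t = 0; t < strings.length; t++)
            h[dfa.finalState(strings[t])][classes[t]]++;
        int correct = 0;
        for (int i = 0; i < n; i++) {
            int best = 0;               // argmax over classes (Equation 3)
            for (int k = 1; k < c; k++)
                if (h[i][k] > h[i][best]) best = k;
            dfa.o[i] = best;
            correct += h[i][best];      // the majority count contributes to fitness
        }
        return correct;                 // divide by strings.length for fitness in [0,1]
    }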
The search space is now:

$$S = n^{2n} \quad (4)$$

Hence, we reduce the search space by a factor of $2^n$. We found that this simple trick allowed the EA to find the optimal solution in significantly fewer evaluations than when we evolved both the transition matrix and the output vector (i.e. the state labels). This is to be expected, since for each transition matrix under consideration we directly determine the optimal output vector.
3 Evolutionary Algorithm (EA)
We used a multi-start random hill-climber. This simplifies the design of the EA, as it obviates the need to experiment with population size, selection methods, and, more significantly, the need to define a meaningful crossover operator that avoids the problem of competing conventions [16]. Random hill-climbers, also known as a (1+1) Evolution Strategy [2], are the simplest form of evolutionary algorithm, but often perform competitively with more complex EAs [12].

The random hill climber takes a random solution, then each time around the loop produces a mutated version which it evaluates. If the mutated version has fitness better than or equal to that of the current solution, it is accepted; otherwise it is rejected. Using our DFA representation no mutated copies are made. Instead, the current solution is modified in place, and the change is reverted if it causes a decrease in performance. The algorithm also notes whether any improvement took place, and keeps a count of the number of steps taken without an improvement in fitness. If the number of evaluations since restarting exceeds a parameter we call noImprovementLimit, then the current solution is recorded and the hill-climber is restarted. This procedure is run until a perfect score on the training set is achieved, or until the number of allowed fitness evaluations maxEvals is reached. The best solution from all restarts is then returned as the final result.
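The whole loop might look like the following sketch. The parameter names maxEvals and noImprovementLimit follow the text above; randomDFA and copy are assumed helpers (random transition-matrix initialization and a deep copy, respectively), not functions from our actual implementation.

    // Multi-start random hill climber with in-place mutation and revert.
    DFA hillClimb(int n, int numSymbols, int[][] strings, int[] classes,
                  int c, int maxEvals, int noImprovementLimit) {
        java.util.Random rnd = new java.util.Random();
        DFA dfa = randomDFA(n, numSymbols, rnd);    // assumed helper
        DFA bestOverall = copy(dfa);                // assumed helper
        int bestOverallFit = smartLabelFitness(dfa, strings, classes, c);
        int fit = bestOverallFit;
        int sinceImprovement = 0;
        for (int evals = 1; evals < maxEvals && bestOverallFit < strings.length; evals++) {
            int s = rnd.nextInt(n), a = rnd.nextInt(numSymbols);
            int old = dfa.delta[s][a];
            dfa.delta[s][a] = rnd.nextInt(n);       // mutate one entry in place
            int f = smartLabelFitness(dfa, strings, classes, c);
            if (f >= fit) {                         // accept improvements and ties
                if (f > fit) sinceImprovement = 0; else sinceImprovement++;
                fit = f;
                if (fit > bestOverallFit) { bestOverallFit = fit; bestOverall = copy(dfa); }
            } else {
                dfa.delta[s][a] = old;              // revert the change
                sinceImprovement++;
            }
            if (sinceImprovement > noImprovementLimit) {  // record done above; restart
                dfa = randomDFA(n, numSymbols, rnd);
                fit = smartLabelFitness(dfa, strings, classes, c);
                sinceImprovement = 0;
            }
        }
        smartLabelFitness(bestOverall, strings, classes, c);  // restore labels for the returned matrix
        return bestOverall;
    }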
3.1 Fitness Function
We use a simple measure of fitness: the proportion of strings in the training set that are classified correctly. Hence, a DFA that classifies every string incorrectly will have a fitness of zero, and one that classifies every string correctly will have a fitness of one. On balanced datasets (where the number of strings in each class is approximately equal) we expect randomly constructed DFAs to have a mean fitness of 0.5.
3.2 Fitness Evaluation Efficiency
Note that one of the benefits of evolving DFAs directly in this way, as compared with evolving a neural network to act as a DFA, is that the time taken for each fitness evaluation is much reduced. Significantly, the cost of fitness evaluation is independent of the number of states in the DFA, and depends only linearly on the sum of the lengths of the sequences in the training set.
4 Evidence Driven State Merging Algorithm
The Evidence Driven State Merging (EDSM) algorithm emerged from the Abbadingo One competition [9], held in 1997. The intention of this competition was to stimulate both theoreticians and experimentalists to try out their ideas on significant DFA learning problems. Among its strengths were that a number of large unseen problems were made available for solution, and the possibility of test-set tuning was avoided by providing only 1 bit of information to an entrant algorithm, i.e. success or failure (at 99% classification accuracy).
A DFA can be simply inferred by constructing a prefix tree equivalent to the training data and progressively merging states [18]. For the Abbadingo One competition the nominal size of each target DFA is known. If the initial prefix tree is larger than the target size, then the operation of merging states is presumed to lead towards the target DFA. In general many valid merges will be possible at any step. A valid merge preserves the consistency of the DFA with the training data and normally produces a new DFA which is a generalization of the previous one. The new DFA is said to be correct if it is consistent with the target machine. However, without exhaustive training data, merges amount to guesses about the behaviour of the target machine on unseen data. These guesses can turn out to be wrong. In particular, it is important to guess correctly in the early stages, or recovery is impossible.
A plausible strategy is to search through the space of possible merge choices. This would find all DFAs accessible from the initial prefix tree. In practice the space is too large for a complete search, though a beam search can be successful [7] in some cases. A beam search is still expensive, however, and failed to solve the more difficult problems in the Abbadingo competition. A more successful approach was to focus on minimizing the risk of making wrong guesses by weighing heuristic evidence and choosing the merges which are most likely to lead to the target machine. The EDSM algorithm scores a possible merge by counting the number of tests that the merge passes while being validated. If the tests are assumed independent and have an equal probability of success, then the merge which passes the most tests has the highest probability of being correct.
A further improvement is obtained by considering all possible pairs of states as candidates for merging. In general this requires the recursive consideration of the children of the candidate pair. Each potential merge generates a partition of the DFA states into equivalence classes, and a total score can be computed for the multiple merge implied. When the best score is found, all states can be merged in one step. In practice EDSM can be speeded up by eliminating states from consideration for merging if they are more than a certain depth from the initial state [9]. This does not detract significantly from its performance.
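To make the scoring concrete, here is a rough sketch of the scoring recursion over the augmented prefix tree acceptor, written from the description above rather than from any published code; it assumes the children being compared are still tree-shaped, as they are before any merges. Here label[i] holds 0 or 1 for labelled states and -1 for unlabelled ones, child[i][a] is -1 where no transition exists, and a negative return value signals an invalid merge.

    // Count the label agreements implied by merging states p and q,
    // recursing into the children that must also merge; -1 on conflict.
    int score(int p, int q, int[][] child, int[] label) {
        int s = 0;
        if (label[p] >= 0 && label[q] >= 0) {
            if (label[p] != label[q]) return -1;   // conflict: merge is invalid
            s = 1;                                 // one passed test
        }
        for (int a = 0; a < 2; a++) {              // binary alphabet
            if (child[p][a] >= 0 && child[q][a] >= 0) {
                int sub = score(child[p][a], child[q][a], child, label);
                if (sub < 0) return -1;            // conflict propagates up
                s += sub;
            }
        }
        return s;
    }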
Potential merges may tie for first place. In our implementation we search through the possible merges $(p, q)$ in upper-triangular matrix raster scan order $[(0,1), (0,2), \ldots, (1,2), (1,3), \ldots]$ and select the first of any tying merges that we come across. We tested our implementation of EDSM by verifying that it obtained similar results to those reported in [9].
Note that the EDSM algorithm has been further refined by Cicchello and Kremer [3]. They analyzed the EDSM heuristic and found it to be very good in the later stages of convergence to the correct target DFA, but less good at the initial choices of merge. In fact they found that a 27% improvement could be made over EDSM by searching through the choices made in the first 5 merges. We would expect to improve the EDSM results reported below by adding search to the algorithm, but at the expense of speed.
5 Experimental Setup
Each method under test (EDSM and EA) is seen as a black box that takes as input a set of labelled strings, and produces as output a DFA: its best guess at the underlying DFA.

This is the true interface for EDSM, but for the EA there is another input: the maximum allowed size of DFA. In this sense, the algorithms are not truly comparable, since the EA requires more guidance than EDSM. This can be seen as both an advantage and a disadvantage. In cases where we know the size of the target, being able to inform the learner of this gives it an advantage. In general, though, this is usually more of a hindrance, and the learner that figures out the size of the target DFA for itself would normally be preferred. Note that all evolutionary methods suffer from this drawback, and either choose a fixed size that is expected to be sufficient or, in the case of Genetic Programming methods, usually adopt some method of controlling the size distribution of the population; see [15] for recent ideas on this.
We also compared our system with the Genetic Programming method of Luke et al. [11]. In this approach each DFA within a population is represented by a genome which contains an unbounded number of genes. Each gene represents a state of the DFA. Further attributes of each gene, called a chemical template, are used to represent gene regulation, which controls state transitions when the DFA is exercised.
It should be noted that we are evolving a fixed-size structure (though we could easily modify this by allowing insertion and deletion of states), so the comparison with variable-size methods such as [11] could be considered unfair. More work would be needed to properly analyze this, though Luke et al. do not give the size distribution of their initial population.
To enable direct comparison of results with [11] we used the average of the per-class accuracies as the measure of test-set accuracy. This measure is preferred when the languages are unbalanced, e.g. when there are more strings in the language than not in the language, which is the case for the Tomita languages.
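Concretely, in our own notation (the symbols below are ours, not from [11]), with $c$ classes, $\mathrm{correct}_k$ correctly classified test strings of class $k$, and $\mathrm{total}_k$ test strings of class $k$:

$$\text{accuracy} = \frac{1}{c} \sum_{k=0}^{c-1} \frac{\mathrm{correct}_k}{\mathrm{total}_k}$$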
6 Results
6.1 Tomita Languages
A common benchmark for learning DFAs is the Tomita suite of 7 target DFAs first defined in [17]. We used the training set specified in [11]. We note that the description of Tomita-3 and the regular expression given in [11] disagree; we chose to use the description, and re-classified two of the training strings as a consequence. The results reported in that paper are among the best known for an evolution-based system on the Tomita languages, so we have included them for comparison with our system. We first studied the number of fitness evaluations needed to find a DFA consistent with the training set, comparing a standard random hill-climbing approach (Plain) with our optimal state labelling method (Smart). For the standard random hill climber, we fixed the number of states to be 10. We report results from two versions of our Smart method. Smart shows results where we also fixed the maximum number of states to be 10, whereas for nSmart we set the number of states to be exactly the number of states in the minimal DFA consistent with the training set. Table 1 summarizes these results, showing the average number of fitness evaluations needed by each system, together with the Genetic Programming (GP) system of Luke et al. [11]. Note that in all cases the simple random hill-climber requires far fewer fitness evaluations than the GP method, and that in all cases apart from language 1, the Smart version requires far fewer than the Plain one.
For these experiments we set maxEvals to 100,000 and noImprovementLimit to $2n^2$, the latter based on the size of the neighbourhood.
Interestingly, when we fix the number of states for each problem (nSmart) the average number of evaluations may significantly increase, as observed on problems 3, 4, 5 and 7. This may seem counter-intuitive, since on the face of it we've reduced the size of the search space. However, searching for the minimal DFA is a harder problem than searching for some larger consistent DFA that may have some slack in it.
Tomita No.   n   Plain   Smart   nSmart      GP
1            2     107      25       15      30
2            3     186      37       40    1010
3            5    1809     237      833   12450
4            4    1453     177      654    7870
5            4    1059     195      734   13670
6            3     734      93       82    2580
7            5    1243     188     1377   11320

Table 1. Number of states in the minimal DFA for each Tomita language, and average number of fitness evaluations required by each system to learn the training set.
Table 2 summarizes the generalization accuracy of our EAs, EDSM and GP on this learning task. We ran the EAs with 20 different random seeds for each target. We ran EDSM only once for each language, because our EDSM is deterministic for a given training set.

The performance over the 7 Tomita targets indicates that the EA has the best performance when we tell it the number of target states (nSmart), but if we omit this information and just choose some arbitrary small value for n, then the test set accuracy is much poorer. However, we really need a bigger test, so our next section will deal with larger, randomly generated DFAs.
In every single run nSmart found a minimal DFA consistent with the training data. Therefore, cases where the average test set accuracy of nSmart is less than 100% indicate that the problem is under-specified, and that there exist many distinct minimal DFAs that are consistent with the training set. In these cases getting the correct DFA is down to chance.

To demonstrate this point with a specific example, consider two examples of three-state DFAs learned by the EA for language 2 (Figures 1 and 2; start state outlined in bold). The DFA in Figure 1 scores 100% accuracy on the test set, while the one shown in Figure 2 scores only 83%.
Tomita No.   Smart   nSmart   EDSM     GP
1             81.8      100   52.4   88.4
2             88.8     95.5   91.8   84.0
3             71.8     90.8   86.1   66.3
4             61.1      100    100   65.3
5             65.9      100    100   68.7
6             61.9      100    100   95.9
7             62.6     82.9   71.9   67.7

Table 2. Average test set accuracy on the Tomita languages for various methods.
[Figure 1. An evolved minimal DFA with perfect generalization for language 2.]
This latter DFA is more frequently produced by the EA. This implies that the search landscape induced by our operators and the training set just happens to make the EA more likely to find the incorrect DFA. From a general DFA-learning perspective, however, both are minimal consistent DFAs and should really be considered as equally good solutions, given the training set.
6.2 Run-times
Table 3 shows the average elapsed time to learn the Tomita training sets. All timings are based on Java implementations running on a 2.4GHz Pentium. Note that the smart hill climber significantly outperforms the plain hill-climber. This is partially explained by the smaller number of fitness evaluations required (see Table 1), and partially by the reduced book-keeping, since in the smart hill-climber we replace the copy/mutate operation with an in-place mutation.
[Figure 2. An evolved minimal DFA with 83% test set score for language 2.]
Algorithm   t (ms)
EDSM            37
Plain           33
Smart          1.6

Table 3. Average elapsed time in milliseconds to learn the Tomita languages.
6.3 Random Target DFAs
We followed the Abbadingo One [9] style of DFA and dataset generation. To generate a random DFA of nominal size n, we do the following (see the code sketch after this paragraph):

1. Generate a random degree-2 digraph with 5n/4 nodes (= states).
2. Choose an initial state randomly.
3. Find all states reachable from the initial state.
4. Label all states with a toss of a fair coin.

DFAs generated this way have a size centred near $n$ and a depth centred near $2\log_2 n - 2$. We follow Abbadingo in generating DFAs until we find one that has a depth of exactly $2\log_2 n - 2$. This does not really suit our EA, which would rather have a target with an exactly known number of states; however, we retain comparability. For each target DFA we generate a training set by drawing, without replacement, from the set of $16n^2 - 1$ possible input strings of lengths $0 \ldots 2\log_2 n + 3$ inclusive. The number of training examples is varied from a density of 0.01 to 0.20. Density is defined as the proportion of the total number of possible input strings. The test set is the set of remaining input strings. EDSM is run once, where each merge is the best indicated by the heuristic, i.e. no search is performed.
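A sketch of the generation recipe, written from the published description rather than our actual code; reachablePart and depth are assumed helpers that trim to the states reachable from the chosen start (renumbering that start as state 0, as our representation requires) and compute the breadth-first depth.

    // Generate a random target DFA of nominal size n, Abbadingo style.
    DFA randomTarget(int nominal, java.util.Random rnd) {
        int targetDepth = (int) (2 * (Math.log(nominal) / Math.log(2))) - 2;
        while (true) {                                  // retry until the depth matches
            int m = 5 * nominal / 4;
            DFA dfa = new DFA(m, 2);
            for (int i = 0; i < m; i++) {
                dfa.o[i] = rnd.nextInt(2);              // fair-coin state label
                for (int a = 0; a < 2; a++)
                    dfa.delta[i][a] = rnd.nextInt(m);   // random degree-2 digraph
            }
            int start = rnd.nextInt(m);                 // random initial state
            DFA trimmed = reachablePart(dfa, start);    // assumed helper
            if (depth(trimmed) == targetDepth)          // assumed helper
                return trimmed;
        }
    }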
For the EA we set maxEvals to 1,000,000 and noImprovementLimit to 10,000. The size of DFA that each EA is set to generate is 5n/4, where n is the nominal size. This number is a compromise: it means that the EA is set to run with more than enough states to solve the problem, though of course it may choose not to use some of the states.

For each nominal size n rising from 4 to 16 in powers of 2, and each density, we generated 100 different random DFA targets along with 100 different random training sets. We drop the repetitions to 10 for nominal size 32, simply because of the long time each experiment takes. Each of the figures below shows graphs of the average performance of both EDSM and the EA as they vary against training set density. Error bars show the standard errors calculated from the data taken. Several points emerge from the results:
- Both algorithms perform very poorly on extremely sparse data, i.e. their performance on unseen test data is close to the random expectation of 0.5. A possible exception to this is that the EA is better on extremely sparse data for the smallest target DFAs (nominal size 4 states).

- For targets in the nominal range ($n = 4 \ldots 16$), and in a middle range of sparsity, the EA performs significantly better. This better performance is lost as we scale to larger targets ($n = 32$), where EDSM is better.

- As more training data is made available, both algorithms approach 100% performance on the test set, i.e. the target DFA is successfully learnt.
Averages and error bars do not tell the whole story. It is also of interest how often each algorithm can be said to actually find the target successfully. The Abbadingo One competition chose to use a performance of better than 0.99 on an unseen test set as an indication that the target has been found. Table 4 shows how often the two algorithms succeed by this success measure. We pick a density of 0.08 because it represents a point of contrast between the two algorithms. Note that in the case of n = 8 the EA is twice as likely to be successful as EDSM. This superiority diminishes for n = 16, and is reversed for n = 32.
The algorithms fail for opposite reasons. When the EA fails, it is because it has failed to learn the training data. On the other hand, EDSM always returns a DFA that is consistent with the training data, but often fails to generalize. When it does fail, the DFA it constructs is usually significantly larger than the target DFA.
[Figure 3. EDSM versus smart hill climber (target n=4); test-set fitness versus training-set density.]
[Figure 4. EDSM versus smart hill climber (target n=8); test-set fitness versus training-set density.]
[Figure 5. EDSM versus smart hill climber (target n=16); test-set fitness versus training-set density.]
Target Size   Smart   EDSM   nRuns
4                27     24     100
8                35     18     100
16               41     34     100
32                3      7      10

Table 4. Number of successful runs on sparse data (density 0.08) with respect to nominal target size.
[Figure 6. EDSM versus smart hill climber (target n=32); test-set fitness versus training-set density.]
7 Discussion
We have taken a very simple approach to evolving DFAs. The method not only appears to outperform other evolutionary approaches (e.g. [11, 1]), but also outperforms the powerful heuristic method EDSM on a certain class of problem. It would be interesting to compare with SAGE [7], but we've not yet had a chance to do this.
7.1 Can we be smarter?
So far we have observed that applying a smart state labelling scheme makes the job of evolving a DFA much simpler than attempting to simultaneously evolve both the transition matrix and the state labels. It is therefore natural to look for other ways in which we may improve the performance of the random hill-climber. Some investigation shows that there is indeed such a method, but we have not yet implemented or evaluated it. The method is based on the observation that we can assign credit to the transitions of the DFA based on the input strings they are involved in processing. Suppose we keep a count, for each transition, of the number of correctly recognized strings it is involved in, and also a count of the number of incorrectly recognized strings it is involved in. Since we know the total number of strings in the training set, this allows us to calculate a measure of fitness for each transition as the proportion of total strings that the transition is involved in labelling correctly.
Modifying a transition can only affect the score for the strings that it is involved in classifying. If these are all incorrectly classified strings, then the modification will either maintain or increase the overall score. In particular, some transitions are unreachable, and are not involved with classifying any strings; it would seem futile to spend time modifying these. Note, however, that this is a measure that must be recalculated for each fitness evaluation, since a previously unused transition can become highly used as a result of modifying a different transition in the matrix.

Hence, a modified sampling procedure should improve the performance of the hill climber by avoiding the waste of time involved in making and evaluating futile modifications.
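Since this idea is, as noted, not yet implemented, the following is only a sketch of what the counting pass might look like, using the DFA class from earlier; the names and the caller-supplied count arrays are our own.

    // Credit each transition with the correctly and incorrectly
    // classified training strings that pass through it.
    void creditTransitions(DFA dfa, int[][] strings, int[] classes,
                           int[][] good, int[][] bad) {
        for (int t = 0; t < strings.length; t++) {
            boolean correct = dfa.classify(strings[t]) == classes[t];
            int q = 0;                  // replay the string, crediting each transition used
            for (int symbol : strings[t]) {
                if (correct) good[q][symbol]++; else bad[q][symbol]++;
                q = dfa.delta[q][symbol];
            }
        }
        // A biased mutation sampler could then favour transitions with
        // high bad counts and skip transitions no training string uses.
    }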
7.2 Trick or General Principle?
The results demonstrate a significant improvement using the smart labelling method compared to evolving the entire DFA. A question that naturally arises is whether we should see this as a neat trick that improves performance on the problem of DFA induction, or as a more general principle that can be applied elsewhere.

One immediate possibility would be to apply the same principle to evolving Finite State Transducers (FSTs). Lucas [10] obtained FSTs by evolving both the state transition matrix and the output matrix. It would be interesting to investigate evolving only the transition matrix for the FST, leaving the entries of the output matrix to be free variables whose values are chosen to optimize the transduction score on the training set. While this is a little more complex than the procedure for optimally assigning state labels for the DFA, and depends on the fitness function used, it should still be possible to formulate an efficient method for doing this.
8 Conclusions
In this paper we presented a new scheme for evolving DFAs. The method is simple: use a multi-start random hill climber to optimize the transition matrix of the DFA, and use a smart state labelling scheme to optimally choose the state labels, given the transition matrix and the training set.

We evaluated the scheme on two types of data: the Tomita languages, and randomly constructed target DFAs with randomly constructed training samples of varying density. On the Tomita languages we find that our system learns a small DFA consistent with the training set typically in many fewer fitness evaluations than previous evolutionary methods. We argue that whether or not the generalization is better than that of other methods is a dubious question to ask. Faced with a number of distinct minimal DFAs that are consistent with the training set, picking one with perfect generalization is more of a lottery than a scientific process.

The average time taken to learn a Tomita language with our method is 1.6ms, which compares very favourably with other methods. On the (Abbadingo style) random DFAs, we find that our evolutionary method outperforms the well-known heuristic method EDSM when the target DFAs are small and the training sample is sparse. For larger machines with 32 states, our evolutionary method fails and EDSM then clearly outperforms it. We are currently investigating ways of making our evolutionary approach perform better on these larger problems.
Acknowledgements
The authors would like to thank the members of the Natural and Evolutionary Computation group at the University of Essex, UK, for helpful comments and discussion.
References
[1] P. J. Angeline, G. M. Saunders, and J. P. Pollack. An evolutionary algorithm that constructs recurrent neural networks. IEEE Transactions on Neural Networks, 5(1):54–65, January 1994.
[2] H.-G. Beyer. Toward a theory of evolution strategies: The (μ, λ)-theory. Evolutionary Computation, 2(4):381–407, 1994.
[3] O. Cicchello and S. C. Kremer. Beyond EDSM. Lecture Notes in Computer Science, 2484:37–48, 2002.
[4] P. Dupont. Regular grammatical inference from positive and negative samples by genetic search: The GIG method. In R. C. Carrasco and J. Oncina, editors, Grammatical Inference and Applications (ICGI-94), pages 236–245. Springer, Berlin, Heidelberg, 1994.
[5] P. Dupont, L. Miclet, and E. Vidal. What is the search space of the regular inference? In R. C. Carrasco and J. Oncina, editors, Grammatical Inference and Applications (ICGI-94), pages 25–37. Springer, Berlin, Heidelberg, 1994.
[6] C. Giles, G. Sun, H. Chen, Y. Lee, and D. Chen. Higher order recurrent neural networks and grammatical inference. In D. Touretzky, editor, Advances in Neural Information Processing Systems 2. Morgan Kaufmann, San Mateo, CA, 1990.
[7] H. Juillé and J. B. Pollack. A sampling-based heuristic for tree search applied to grammar induction. In Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-98), Madison, Wisconsin, USA, 1998. AAAI Press.
[8] M. Kearns and L. G. Valiant. Cryptographic limitations on learning Boolean formulae and finite automata. In Proceedings of the ACM Symposium on Theory of Computing (STOC-89), pages 433–444, 1989.
[9] K. J. Lang, B. A. Pearlmutter, and R. A. Price. Results of the Abbadingo One DFA learning competition and a new evidence-driven state merging algorithm. Lecture Notes in Computer Science, 1433:1–12, 1998.
[10] S. M. Lucas. Evolving finite state transducers: Some initial explorations. In Proceedings of the 6th European Conference on Genetic Programming, pages 130–141, 2003.
[11] S. Luke, S. Hamahashi, and H. Kitano. Genetic Programming. In W. Banzhaf et al., editors, GECCO-99: Proceedings of the Genetic and Evolutionary Computation Conference. Morgan Kaufmann, 1999.
[12] M. Mitchell, J. Holland, and S. Forrest. When will a genetic algorithm outperform hill climbing? In J. Cowan, G. Tesauro, and J. Alspector, editors, Advances in Neural Information Processing Systems 6, pages 51–58. Morgan Kaufmann, San Mateo, CA, 1994.
[13] A. L. Oliveira and J. P. M. Silva. Efficient search techniques for the inference of minimum size finite automata. In String Processing and Information Retrieval, pages 81–89, 1998.
[14] L. Pitt and M. Warmuth. The minimum consistent DFA problem cannot be approximated within any polynomial. Journal of the ACM, 40(1), 1993.
[15] R. Poli. A simple but theoretically-motivated method to control bloat in genetic programming. In Proceedings of the 6th European Conference on Genetic Programming, pages 204–217, 2003.
[16] J. D. Schaffer, D. Whitley, and L. J. Eshelman. Combinations of genetic algorithms and neural networks: a survey of the state of the art. In D. Whitley and J. D. Schaffer, editors, COGANN-92, International Workshop on Combinations of Genetic Algorithms and Neural Networks, pages 1–37. IEEE Computer Society, 1992.
[17] M. Tomita. Dynamic construction of finite automata from examples using hill climbing. In Proceedings of the 4th Annual Cognitive Science Conference, USA, pages 105–108, 1982.
[18] B. A. Trakhtenbrot and Y. M. Barzdin. Finite Automata. North-Holland, Amsterdam, 1973.
[19] R. Watrous and G. Kuhn. Induction of finite-state automata using second-order recurrent networks. In J. Moody, S. Hanson, and R. Lippmann, editors, Advances in Neural Information Processing Systems 4, pages 309–316. Morgan Kaufmann, San Mateo, CA, 1992.