A regular expression can also be represented graphically. The regular expression that recognizes unsigned integer and real constants looks like this:

(di (di)*) | (di (di)* . di (di)* (E di di | ε))

The edge label di (for digit) represents the set of characters 0 to 9, so each edge labelled di replaces 10 edges labelled 0 to 9 with the same input and output nodes. In the accompanying graph, the so-called transition diagram, the edge labels indicate what is to be read:
As an example, we take the real constant 123.45E12
In doing so, we read the input character by character from left to right. From the node 0, upon reading the digit "1", one can reach the node 1 as well as the node 2. On subsequently reading the digit "2", there are two possible transitions: from the node 1 again to the node 1 and analogously from the node 2 again to the node 2. The same transitions take place on reading the character "3".
Now the character "." is read, and in the transition diagram, there exists no transition from node 1 under this character. So there is no path over the input into a final state via the node 1, and therefore this path is rejected. There is, however, a transition from node 2 to node 3 under the character ".". This path is continued in an analogous way.
Eventually, there exists a path from the start node 0 to an end node, here the node 7, for which the sequence of edge labels corresponds to the input word. The language accepted by the transition diagram is the set of words for which there exists such a path from the start node to an end node. In our example, a real constant was recognized correctly. This path was marked in the drawing above.
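The walk through the transition diagram can be sketched in code by tracking the set of all nodes that are reachable after each character. The edge list below is an assumption read off the regular expression and the walkthrough (nodes 0 to 7, end nodes 1 and 7), since the drawing itself is not reproduced here; the names `EDGES`, `FINAL`, `eps_closure` and `accepts` are hypothetical.

```python
DIGITS = set("0123456789")

# Edges as (source, label, target). "di" stands for any digit, None for an
# epsilon edge. Assumed layout, reconstructed from the regular expression.
EDGES = [
    (0, "di", 1), (1, "di", 1),                # integer constant: di (di)*
    (0, "di", 2), (2, "di", 2), (2, ".", 3),   # digits before the point
    (3, "di", 4), (4, "di", 4),                # digits after the point
    (4, "E", 5), (5, "di", 6), (6, "di", 7),   # exponent: E di di
    (4, None, 7),                              # or no exponent at all
]
FINAL = {1, 7}


def matches(label, ch):
    """True if an edge with this label may be taken on character ch."""
    return ch in DIGITS if label == "di" else label == ch


def eps_closure(nodes):
    """All nodes reachable from `nodes` using epsilon edges only."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        n = stack.pop()
        for src, label, dst in EDGES:
            if src == n and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(word):
    """Follow every possible path at once; accept if one ends in an end node."""
    current = eps_closure({0})
    for ch in word:
        step = {dst for src, label, dst in EDGES
                if src in current and label is not None and matches(label, ch)}
        current = eps_closure(step)
    return bool(current & FINAL)
```

With this edge list, `accepts("123.45E12")` and `accepts("123")` are true, while `accepts("123.")` is false because node 3 is not an end node.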
Nondeterministic Finite Automata (NFA)
A mechanism equivalent to transition diagrams is the nondeterministic finite automaton (NFA for short). An NFA consists of: a variable that can only take on a finite number of different states; a read head with which the input tape is read from left to right; a transition relation that forms the control of the automaton; an initial state and one or several final states.
An example: NFA that recognizes integer and real constants
Alphabet Σ = {0, 1, ..., 9, ., E}
Set of states Q = {z0, z1, ..., z7}
Initial state q0 = z0
Final states F = {z1, z7}
Transition relation δ ⊆ Q × (Σ ∪ {ε}) × Q:
Input 123.45E12
The NFA starts in the state z0 and reads the first character "1" from the input tape. It then enters either the state z1 or the state z2. In the state z1, it can only read the subsequent digits; for the point ".", there is no transition in the table. In the state z2, it can also read the subsequent digits, and for the character ".", there exists a transition to the state z3. So under the character ".", it enters the state z3, and so on. In the end, it enters the state z7, which is a final state, and accepts the input as a real constant. We note that the NFA and the transition diagram have the same behaviour. Let us now look more carefully at the correspondence between NFA and transition diagram:
From now on we will mostly use transition diagrams instead of NFAs to simplify matters. An important relation between finite automata and regular expressions is given in the following theorem:
Theorem: For every regular expression r, there is a nondeterministic finite automaton that accepts the regular set described by r.
The proof of this theorem is constructive and is described by the algorithm RE -> NFA (regular expression to NFA):
Start with
Decompose the regular expression on the edges and refine the transition diagram until all edges are labelled by characters from Σ or by ε.
Algorithm: RE -> NFA
Input: Regular expression r over Σ
Output: Transition diagram of an NFA
Method: Start:
Apply the following rules to the current transition diagram until all the edges are labelled by characters from Σ or by ε. The nodes on the left sides of the rules are identified with nodes in the current transition diagram. All newly occurring nodes on the right side of a rule correspond to newly created nodes and thus to new states.
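The refinement idea can be sketched as a small worklist procedure. The rule figures are not reproduced here, so the three rules used below (parallel edges for alternation, one new node for concatenation, one new node with a loop and two ε-edges for the star) are an assumption and may differ in detail from the original drawings. The tuple encoding of regular expressions and the names `re_to_nfa` and `accepts` are hypothetical.

```python
from itertools import count

# Regular expressions as nested tuples (an encoding chosen for this sketch):
#   ("sym", c)   ("eps",)   ("cat", r1, r2)   ("alt", r1, r2)   ("star", r)


def re_to_nfa(regex):
    """Refine the single edge 0 --regex--> 1 until all labels are symbols or epsilon."""
    fresh = count(2)             # supplies numbers for newly created nodes
    pending = [(0, regex, 1)]    # edges still labelled by a compound expression
    edges = set()                # finished edges (source, symbol or None, target)
    while pending:
        q, r, p = pending.pop()
        kind = r[0]
        if kind == "sym":
            edges.add((q, r[1], p))
        elif kind == "eps":
            edges.add((q, None, p))
        elif kind == "alt":      # r1 | r2: two parallel edges
            pending += [(q, r[1], p), (q, r[2], p)]
        elif kind == "cat":      # r1 r2: route through one new node
            n = next(fresh)
            pending += [(q, r[1], n), (n, r[2], p)]
        elif kind == "star":     # r*: epsilon in, loop on a new node, epsilon out
            n = next(fresh)
            edges |= {(q, None, n), (n, None, p)}
            pending.append((n, r[1], n))
        else:
            raise ValueError(f"unknown node kind: {kind}")
    return edges, 0, {1}


def accepts(nfa, word):
    """Simulate the NFA by tracking the set of reachable nodes."""
    edges, start, final = nfa

    def closure(nodes):
        seen, stack = set(nodes), list(nodes)
        while stack:
            n = stack.pop()
            for src, a, dst in edges:
                if src == n and a is None and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({start})
    for ch in word:
        current = closure({dst for src, a, dst in edges
                           if src in current and a == ch})
    return bool(current & final)


# (a|b)* a b b
A, B = ("sym", "a"), ("sym", "b")
EXAMPLE = ("cat", ("star", ("alt", A, B)), ("cat", A, ("cat", B, B)))
```

For `EXAMPLE`, the resulting automaton accepts "abb" and "babb" and rejects "ab".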
(Animation of the algorithm: RE, NFA, NFA without ε-transitions, DFA, minimal DFA.)
Deterministic Finite Automaton (DFA)
In contrast to the NFA, the deterministic finite automaton (DFA) has no ε-transitions and, for every pair (q, a) with q ∈ Q and a ∈ Σ, at most one successor state. Thus, for every word w ∈ Σ*, there is at most one w-path in the transition diagram of the DFA M. If w is in the language of M, then this path leads from the initial state to a final state without M having to employ the 'guesswork' of a nondeterministic automaton. Therefore, DFAs are preferred for practical use.
Theorem: If a language L is accepted by an NFA, then there is a DFA that accepts L.
The proof is constructive and uses the so-called subset construction. If there are two w-paths, the states they lead to form a common state in the DFA.
This construction is described in the algorithm NFA -> DFA. It uses two terms: the DFA associated with an NFA, and successor states.
Algorithm: NFA -> DFA
Input: NFA M = (Σ, Q, δ, q0, F)
Output: DFA M'' = (Σ, Q'', δ'', q0'', F'')
Method:
Step 1: Removal of the edges labelled ε. Apply the following rules to the current transition diagram until all edges labelled ε are removed from the diagram. The edges on the left sides of the rules are identified with edges in the current transition diagram. All newly occurring edges on the right side of a rule correspond to newly created edges and thus to new transitions. After all rules that are applicable to an edge labelled ε have been applied, this edge is removed.
Let the NFA without edges labelled ε formed by these rules be: M' = (Σ, Q', δ', q0', F').
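One common way to obtain an equivalent NFA without ε-edges is sketched below. It does not replay the edge-rewriting rules one by one (their figures are not reproduced here); instead it computes, for every state, the set of states reachable over ε-edges and copies their outgoing symbol edges. This is a substituted technique with the same result in terms of the accepted language, and the function name `remove_epsilon` and the triple encoding of δ are assumptions. The state set is left unchanged.

```python
def remove_epsilon(states, delta, final):
    """Return (delta', final') of an NFA without epsilon edges.

    delta is a set of (q, a, p) triples; a is None for an epsilon edge.
    """
    def closure(q):
        # States reachable from q using epsilon edges only (including q).
        seen, stack = {q}, [q]
        while stack:
            s = stack.pop()
            for src, a, dst in delta:
                if src == s and a is None and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    new_delta, new_final = set(), set()
    for q in states:
        reach = closure(q)
        if reach & final:
            new_final.add(q)            # q reaches a final state without input
        for src, a, dst in delta:
            if src in reach and a is not None:
                new_delta.add((q, a, dst))
    return new_delta, new_final
```

For example, with the edges 0 --ε--> 1, 1 --a--> 1, 1 --ε--> 2, 2 --b--> 2 and final state 2 (the language a*b*), the result has the edges (0, a, 1), (0, b, 2), (1, a, 1), (1, b, 2), (2, b, 2), and all three states become final.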
Step 2: NFA M' without ε-transitions -> DFA M''. States in M'' are sets of states of M'. The starting state of M'' is ε-SS(q0'). The additional states generated for Q'' are marked as soon as their successor states and transitions under all symbols of Σ have been generated. The marking of generated states is recorded in the partial function marked: P(Q) -> bool.

    q0'' := ε-SS(q0'); Q'' := {q0''}; marked(q0'') := false; δ'' := ∅;
    while (there exists S ∈ Q'' and marked(S) = false) do
        marked(S) := true;
        foreach a ∈ Σ do
            T := {p ∈ Q' | (q, a, p) ∈ δ' and q ∈ S};
            if T ∉ Q'' then Q'' := Q'' ∪ {T}; marked(T) := false fi;
            δ'' := δ'' ∪ {(S, a) ↦ T};
        od;
    od;
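The loop of Step 2 translates fairly directly into code. The sketch below assumes an NFA that is already free of ε-edges, so the starting state is simply the set containing the initial state. The rule that a subset is final when it contains a final state of M' is the standard choice and is an assumption here, since the pseudocode does not spell out F''. The name `subset_construction` and the data encodings are hypothetical.

```python
def subset_construction(alphabet, delta, q0, final):
    """Build a DFA from an epsilon-free NFA.

    delta is a set of (q, a, p) triples. DFA states are frozensets of NFA
    states; the DFA transition function is a dict (S, a) -> T.
    """
    start = frozenset({q0})
    dfa_states = {start}
    unmarked = [start]                   # generated states not yet processed
    dfa_delta = {}
    while unmarked:
        S = unmarked.pop()               # taking S off the list marks it
        for a in alphabet:
            T = frozenset(p for q, b, p in delta if q in S and b == a)
            if T not in dfa_states:
                dfa_states.add(T)
                unmarked.append(T)
            dfa_delta[(S, a)] = T
    # Assumed rule for F'': a subset is final if it contains a final NFA state.
    dfa_final = {S for S in dfa_states if S & final}
    return dfa_states, dfa_delta, start, dfa_final
```

As in the pseudocode, the empty set can appear as a DFA state; it then acts as an error state from which no final state is reachable.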
Minimization of DFA
The deterministic finite automata generated from regular expressions in the two steps RE -> NFA and NFA -> DFA are in general not the smallest possible automata accepting the source language. There may be states with the same "acceptance behaviour". Two states p and q have the same acceptance behaviour if, for every input word, the automaton reaches a final state either from both p and q or from neither.
In the example
state {1,2,3} and state {1,3,4} have the same acceptance behaviour: from both states, the final state {1,3,4} is reached for every input.
Now we will present a procedure that constructs for a given DFA the DFA for the same language with a minimal number of states.
Algorithm: MinDFA
Input: DFA M = (Σ, Q, δ, q0, F)
Output: DFA M' = (Σ, Q, δ, q0, F), where M' is minimal
Method:
Step 1: Remove unreachable states. All states that cannot be reached from the initial state via any sequence of transitions of the DFA M are removed.
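Step 1 amounts to a graph search from the initial state. A minimal sketch, assuming the transition function is stored as a dictionary from (state, symbol) to state and may be partial; the name `remove_unreachable` is hypothetical.

```python
def remove_unreachable(alphabet, delta, q0, final):
    """Keep only the states reachable from q0.

    delta is a dict (state, symbol) -> state. Returns the reachable states,
    the restricted transition dict and the remaining final states.
    """
    reachable, stack = {q0}, [q0]
    while stack:
        q = stack.pop()
        for a in alphabet:
            p = delta.get((q, a))        # None if no transition is defined
            if p is not None and p not in reachable:
                reachable.add(p)
                stack.append(p)
    new_delta = {(q, a): p for (q, a), p in delta.items() if q in reachable}
    return reachable, new_delta, final & reachable
```

A transition that starts in a reachable state always ends in a reachable state, so filtering on the source state alone is sufficient.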
Step 2: Set up a marking table.
1. Set up a table of all pairs of states {q, q'} with q ≠ q' of M.
2. Mark all pairs {q, q'} with q ∈ F and q' ∉ F (or vice versa).
3. For each unmarked pair {q, q'} and each a ∈ Σ, test whether {δ(q, a), δ(q', a)} is already marked. If so, mark {q, q'} as well.
4. Repeat the last step until it produces no more changes in the table.
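The marking table can be sketched as a set of marked pairs that grows until a full pass changes nothing. The sketch assumes a total transition function stored as a dictionary from (state, symbol) to state; the name `distinguishable_pairs` is hypothetical.

```python
from itertools import combinations


def distinguishable_pairs(states, alphabet, delta, final):
    """Return the set of marked pairs, each pair as a frozenset {q, q'}."""
    states = list(states)                # fixed order for repeated passes
    # Item 2: mark pairs with exactly one final state.
    marked = {frozenset((q, r)) for q, r in combinations(states, 2)
              if (q in final) != (r in final)}
    changed = True
    while changed:                       # item 4: repeat until stable
        changed = False
        for q, r in combinations(states, 2):
            pair = frozenset((q, r))
            if pair in marked:
                continue
            for a in alphabet:           # item 3: look at the successor pair
                succ = frozenset((delta[(q, a)], delta[(r, a)]))
                if succ in marked:
                    marked.add(pair)
                    changed = True
                    break
    return marked
```

Pairs that remain unmarked at the end have the same acceptance behaviour. If both successors coincide, `succ` has only one element and can never be in the table, so such a symbol does not cause a marking.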
Step 3: Unite states. Unite all unmarked pairs of states {q, q'} according to the following rules:
(1) if (p, a, q') ∈ δ with p ∈ Q, a ∈ Σ, then δ := δ ∪ {(p, a, q)}
(2) if (q', a, p) ∈ δ with p ∈ Q, a ∈ Σ, then δ := δ ∪ {(q, a, p)}
(3) remove q' from Q
(4) remove (p, a, q') and (q', a, p) from δ for all p ∈ Q, a ∈ Σ
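Uniting states can be sketched by choosing one representative per group of equivalent states and redirecting every transition to representatives. This replaces the in-place edge rewriting of rules (1) to (4) with a representative map, which gives the same automaton up to the choice of which state of each group survives. The name `unite_states` and the input format (a list of unmarked pairs) are assumptions.

```python
def unite_states(states, delta, q0, final, unmarked_pairs):
    """Merge each unmarked pair (q, q2) by folding q2 into q.

    delta is a dict (state, symbol) -> state.
    """
    rep = {s: s for s in states}         # representative of each state

    def find(s):
        while rep[s] != s:
            s = rep[s]
        return s

    for q, q2 in unmarked_pairs:
        a, b = find(q), find(q2)
        if a != b:
            rep[b] = a                   # q2's group is absorbed by q's group

    new_states = {find(s) for s in states}
    # Redirect sources and targets; equivalent states agree on their
    # successors up to equivalence, so colliding keys carry the same value.
    new_delta = {(find(s), a): find(t) for (s, a), t in delta.items()}
    return new_states, new_delta, find(q0), {find(s) for s in final}
```

For the DFA with transitions 0 --a--> 1, 1 --a--> 2, 2 --a--> 1, final states 1 and 2, and the unmarked pair (1, 2), the result has the states 0 and 1, the transitions 0 --a--> 1 and 1 --a--> 1, and the single final state 1.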