Академический Документы
Профессиональный Документы
Культура Документы
Used in software for verifying all kinds of systems with a nite number of states, such as communication protocols
Automata
Used in software for scanning text, to nd certain patterns Used in Lexical analyzers of compilers (to turn program text into tokens, e.g. identiers, keywords, brackets, punctuation) Part of Turing machines and abacus machines
. p.26/39 . p.30/39
tra
sh
ip
ns fer
re de
pa
Customer
em
Bank
cancel
Communication protocol
Start pay ship c a b
redeem
ship
transfer ship
Store
redeem
transfer
transfer
Customer
Bank
. p.32/39
Finite automata
We shall study nite automata rst, because they can be seen as a rst step towards Turing machines and abacus machines.
1
. p.29/39 . p.33/39
Product automaton
Start
a
p c
b
p s c p p s r
c
p
d
p s
e
p
f
p
s
g
p
c s p r p
c
s
p,c s r r
p,c p,c
p,c
p,c
p,c
s
r r s p,c p,c p,c p,c t t
s
4
c
p p,c
p,c
. p.37/39
b
p s
c
p
redeem
ship
transfer ship
Store
redeem
transfer
pay, cancel 2
p,c
p,c
3
cancel Start Start 1 redeem 3
s
t t
s
transfer
Customer
Bank
. p.35/39
4
p,c
p,c
. p.38/39
redeem
Store
redeem
pay, cancel
transfer
Yes! If Customer has indicated to pay, but sent a cancellation message to the Bank, we are in state (b,2). If Store ships then, we make a transition into (c,2), and the Store will never receive a money transfer! So store should never ship before redeeming.
. p.35/39 . p.39/39
Customer
Bank
- (StoreState , BankState )
whenever Store has a transition action - StoreState and Bank has a StoreState transition BankState action BankState .
. p.36/39
2
. p.8/37
Start 0
q0 0 q2
1 1 0 1 1
q1 0 q3
. p.10/37
. p.14/37
The transition graph we used before is an informal presentation of the transition function if (q, a) = q . Deterministic means that for every state q and input symbol a, there is a unique (i.e. exactly one) following state, (q, a). Later, we shall also see non-deterministic nite automata (NFAs), where (q, a) can have any number of following states. FAs are also called nite state machines.
. p.11/37
This is the transition table for the transition graph given above.
. p.15/37
3
. p.12/37
. p.21/37
q0
q1
q2
This DFA accepts 1010, but not 1110 It accepts those strings that have an even number of 0s and an even number of 1s. Therefore, we call this DFA parity checker.
. p.18/37
. p.22/37
We dene the extended transition function . It takes a state q and an input string w to the resulting state. The denition proceeds by induction over the length of w. Induction basis (w has length 0): in this case, w is the empty string, i.e. the string of length 0, for which we write . We dene (q, ) = q.
This is the transition table for the same automaton. Let us call this machine A. What language does this automaton accept?
. p.19/37
. p.23/37
q0
q1
q2
This DFA accepts 0011, but not 1110 for instance. Try some other strings.
4
. p.20/37
To nd out what language L(A) is accepted by A you need to work out what strings A will accept. Try and describe L(A) in English.
. p.24/37
Exercises
Give DFAs accepting the following languages over the alphabet {0, 1}. (Note that you can choose between giving a transition table, a transition graph, or a formal presentation of Q, , q0 , , and F .) 1. The set of all strings ending in 00. 2. The set of all strings with two consecutive 0s (not necessarily at the end). 3. The set of strings with 011 as a substring.
. p.25/37
Non-deterministic FA (NFA)
An NFA is like a DFA, except that it can be in several states at once. This can be seen as the ability to guess something about the input. Useful for searching texts.
. p.29/37
Exercise
For the alphabet {a, b, c}, give a DFA accepting all strings that have abc as a substring.
NFA: example
An NFA accepting all strings that end in 01:
0,1 Start q0 0 q1 1 q2
. p.26/37
. p.30/37
Exercises
(More advanced; do not worry if you need tutors help to solve this.) Give DFAs accepting the following languages over the alphabet {0, 1}. 1. The set of all strings such that each block of ve consecutive symbols contains at least two 0s. 2. The set of all strings whose tenth symbol from the right is a 1. 3. The set of strings such that the number of 0 is divisible by ve, and the number of 1s is divisible by three.
. p.27/37
Suppose the input string is 100101. The NFA starts in state q0 , as indicated by the token.
. p.31/37
Exercise
Consider the DFA with the following transition table: 0 1 A A B B B A (1) Informally describe the language accepted by this DFA; (2) prove by induction on the length of an input string that your description is correct. (Dont worry if you need tutors help for (2).)
. p.28/37
The remaining input string is 100101. The NFA reads the rst symbol, 1. It remains in state q0 .
5
. p.32/37
The remaining input string is 00101. The NFA reads the next symbol, 0. The resulting possible states are q0 or q1 .
The remaining input string is 1. The NFA reads the next symbol, 1. The possible states are q0 and q2 . Because q2 is nal, the NFA accepts the word, 100101.
. p.37/37
. p.33/37
The remaining input string is 0101. The NFA reads the next symbol, 0. The resulting possible states are still q0 or q1 .
. p.34/37
. p.1/39
The remaining input string is 101. The NFA reads the next symbol, 1. The resulting possible states are q0 and q2 . (Because q2 is a nal states, this means that the word so far, 1001, would be accepted.)
. p.35/37
q2
The remaining input string is 01. The NFA reads the next symbol, 0. There is no transition for 0 from q2 , so the token on q2 dies. The resulting possible states are q0 or q1 .
. p.36/37
01 is accepted. Strings of length 3 that are accepted: 101, 011, 010, 001. Strings not accepted: 111, 110, 100, 000 Clearly 01 is going to be a substring of any string accepted. Is anything else required?
. p.3/39
Non-deterministic FA (NFA)
An NFA is like a DFA, except that it can be in several states at once. This can be seen as the ability to guess something about the input. Useful for searching texts.
. p.4/39
. p.8/39
NFA: example
An NFA accepting all strings that end in 01:
0,1 Start q0 0 q1 1 q2
. p.5/39
. p.9/39
Last time we saw how to use the NFA. We saw that when in state q0 given input 0 the resulting states are q0 or q1 :
0,1 Start q0 0 q1 1 q2
It accepts the language of all strings of 0 and 1 that contain the substring 01. To prove that this is the language accepted by the machine we would need to make an inductive argument based on the length of the string w.
. p.6/39
We also saw that when in state q2 any input with cause the token to die and that the same occurs when we have input 0 to state q1 .
. p.10/39
a nite set of input symbols, a transition function : Q P (Q), a start state q0 Q, and a set F Q of nal or accepting states.
. p.11/39
. p.12/39
. p.16/39
. p.13/39
. p.17/39
Before reading any symbols, the set of possible states is (q0 , ) = {q0 }.
. p.14/39
. p.18/39
8
. p.15/39 . p.19/39
We have (q0 , 100101) = {q0 , q2 }. Because {q0 , q2 } F = {q0 , q2 } {q2 } = {q2 } = , the NFA accepts.
. p.20/39
. p.24/39
Formal denition of
Denition. The extended transition function : Q P (Q) of an NFA is dened inductively as follows: Induction basis (length 0): (q, ) = {q} Induction step (from length l to length l + 1): (q, va) =
q (q,v)
. p.21/39
where. . .
(q , a).
. p.25/39
. p.22/39
. p.26/39
Exercise
Give NFA to accept the following languages. 1. The set of strings over an alphabet {0, 1, . . . , 9} such that the nal digit has appeared before. 2. The set of strings over an alphabet {0, 1, . . . , 9} such that the nal digit has not appeared before. 3. The set of strings of 0s and 1s such that there are two 0s separated by a number of positions that is a multiple of 4.
. p.23/39
N (q , a)
That is, D (S, a) is the set of all states of N that are reachable from some state q via a.
. p.27/39
DFA 0 1 {q0 } {q0 , q1 } {q0 } {q1 } {q2 } {q2 } {q0 , q1 } {q0 , q1 } {q0 , q2 } {q0 , q2 } {q0 , q1 } {q0 } {q1 , q2 } {q2 } {q0 , q1 , q2 } {q0 , q1 } {q0 , q2 }
(1)
. p.28/39
{q 2}
= N (q0 , va)
. p.29/39
Finally, we use Equation (1), which we just proved, to prove that the languages of D and N are equal: w L(D) D ({q0 }, w) FD N (q0 , w) FD w L(N ) (by defn. of L(D)) (by Equation (1)) (by defn. of L(N )).
. p.30/39
. p.34/39
10
. p.31/39
. p.35/39
Warning
Let N be an NFA, and let D be the DFA that arises from the powerset construction. As we have seen, we have QD = P (QN ). So, if QN has size k, then the size of QD is 2k . This exponential growth of the number of states makes the powerset construction unusable in practice. It can be shown that removing unreachable states does not prevent this exponential growth.
. p.36/39 . p.1/29
Exercise
Convert the following NFA to a DFA: 0 1 p {p, q} {p} q {r} {r} r {s} {} s {s} {s}.
. p.37/39
. p.2/29
Exercise
Convert the following NFA to a DFA: 0 1 p {q, s} {q} q {r} {q, r} r {s} {p} s {} {p}.
Formal denition of
Denition. The extended transition function : Q P (Q) of an NFA is dened inductively as follows: Induction basis (length 0): (q, ) = {q} Induction step (from length l to length l + 1): (q, va) =
q (q,v)
. p.38/39 . p.3/29
(q , a).
Exercise
Convert the following NFA to a DFA: 0 1 p {p, q} {p} q {r, s} {t} r {p, r} {t} s {} {} t {} {} Describe informally the language accepted by this NFA accept? (Dont worry if you need tutors help for this.)
11
. p.4/29
. p.39/39
Resulting DFA:
0 1 {q }
0
0 1
{q ,q }
0 1
{} 0 1
{q }
1
where. . .
We cannot reach the states and {q1 } from the start state so we can remove that part of the machine. You should be able to see that this resulting machine accepts the same language and how it is forced to work in a slightly more subtle way.
. p.5/29 . p.9/29
Summary of proof
Firstly we assume that for every string w we have D ({q0 }, w) = N (q0 , w). We use this to show that the languages accepted by N and D are the same that is, L(N ) = L(D). We then need to show that (1) is true and we do this by induction on the length of the input string.
. p.7/29 . p.11/29
(1)
N (q , a)
That is, D (S, a) is the set of all states of N that are reachable from some state q via a.
12
. p.12/29
Concatenation
The concatenation L L (or just LL ) of languages L and L is dened to be the set of strings ww where w L and w L .
Regular expressions
For example, if L = {001, 10, 111} and L = {, 001}, then LL = {001, 10, 111, 001001, 10001, 111001}.
. p.13/29
. p.17/29
Self-concatenation
For a language L, we write Ln for L L L n times That is, Ln is the language that consists of strings w1 w2 . . . wn , where each wi is in L. For example, if L = {, 001}, then L3 = {, 001, 001001, 001001001}. Note that L1 = L. The language L0 is dened to be {}.
. p.18/29
. p.14/29
First example
The regular expression 01 + 10 denotes the language consisting of all strings that are either a single 0 followed by any number of 1s, or a single 1 followed by any number of 0s.
Closure (Kleene-star)
The closure (or star or Kleene closure) L of a language L is dened to be L =
n0
Ln
That is, L is the language that consists of strings w1 w2 . . . wk , where k is any non-negative integer and each wi is in L. E.g. if L = {0, 11}, then L consists of all strings such that the 1s come in pairs, e.g. 011, 11110, and , but not 01011 or 101.
. p.15/29
. p.19/29
Operations on languages
Before describing the regular-expression notation, we need to dene the operations on languages that the operators of regular expressions represent.
13
. p.16/29
Exercise
For any alphabet , which are the subsets S of such that the set S is nite?
. p.25/29
Example
Suppose you want to search some messy text le for the street parts of addresses, e.g. Milsom Street or Wells Road. Let [A Z] stand for A + B + + Z. Let [a z] stand for a + b + + z. You may want to use a regular expression like [AZ][az] (Street+St.+Road+Rd.+Lane) Expressions like this are accepted e.g. by the UNIX command grep, the EMACS text editor, and various other tools.
. p.22/29
Exercises
Write regular expressions for the following languages: 1. The set of strings over alphabet {a, b, c} with at least one a and at least one b. 2. The set of strings of 0s and 1s whose tenth symbol from the right end is 1. 3. The set of strings of 0s and 1s with at most on pair of consecutive 1s.
. p.23/29
. p.27/29
Exercises
Write regular expressions for the following languages: 1. The set of all strings of 0s and 1s such that every pair of adjacent 0s appears before any pair of adjacent 1s. 2. The set of strings of 0s and 1s whose number of 0s is divisible by ve.
14
. p.24/29 . p.28/29
. p.29/29
15
. p.4/30
1. The set of strings over alphabet {a, b, c} with at least one a and at least one b. 2. The set of strings of 0s and 1s whose tenth symbol from the right end is 1. 3. The set of strings of 0s and 1s with at most on pair of consecutive 1s.
. p.1/30
. p.5/30
Operations on languages
For languages L and L The Concatenation L L (or just LL ) is dened to be the set of strings ww where w L and w L . We write Ln for L L L (n times). Ln is the language that consists of strings w1 w2 . . . wn , where each wi is in L. The closure (or star or Kleene closure) L is dened to be L = n0 Ln . L is the language that consists of strings w1 w2 . . . wk , where k Z, k 0 and each wi is in L.
16
. p.6/30
. p.2/30
. p.7/30
. p.11/30
start
NE
accept
. p.12/30
NE for E = a
start a accept
. p.9/30
. p.13/30
NE for E =
start accept
17
. p.10/30 . p.14/30
NE+E
start E start E+E' accept E' NE accept E
accept E+E'
startE'
N E'
. p.15/30
. p.19/30
NEE
start E =start EE' NE accept E
Exercises 6.1
Convert each of the following regular expressions to an -NFA: 1. 01 2. (0 + 1)01 3. 00(0 + 1)
startE'
N E'
. p.16/30
NE
. p.20/30
start E*
start E
NE
accept E
accept E*
To show this, it sufces to show that for every -NFA there is a DFA that accepts the same language. To that end, we shall use a modied powerset construction.
. p.17/30
. p.21/30
NE for E =
start accept
There is no way to get from the start state to the accepting state.
18
. p.18/30 . p.22/30
Exercise 6.2
Consider the following -NFA. a b c p {p} {q} {r} q {p} {q} {r} r {q} {r} {p} 1. Compute the -closure of each state. 2. Give all strings of length three or less accepted by this automaton. 3. Convert the automaton to a DFA.
. p.23/30 . p.27/30
Exercise 6.3
Repeat the previous exercise for the following -NFA. a b c p {q, r} {q} {r} q {p} {r} {p, q} r
The nal states of D are those sets of the form cl(S) that contain a nal state of N : FD = {cl(S) | S FN = }
. p.24/30 . p.28/30
Exercise 6.4
In an earlier exercise, we converted the regular expressions 01 , (0 + 1)01, and 00(0 + 1) into NFAs. Convert each of those -NFAs into a DFA.
That is, D (S, a) is the set of all states of N that are reachable from some state q S via a, followed by any number of -transitions.
. p.25/30
. p.29/30
Formal languages
Next time we will start on formal languages. A formal language (or simpy language) is a set L of strings over some nite alphabet . That is, a subset L . Finite automata and regular expressions describe certain formal languages but many important languages are not regular e.g. programming languages.
19
. p.26/30
To describe formal languages we will use formal grammars which we will introduce next lecture.
. p.30/30
Formal languages
. p.2/30
. p.4/30
20
. p.5/30
Example
The grammar G with non-terminal symbols N = {S, B}, terminal symbols = {a, b, c}, and productions S abc S aSBc cB Bc bB bb Following a common practice, we use capital letters for non-terminal symbols and small letters for terminal symbols.
Motivation
It turns out that many important languages, e.g. programming languages, cannot be described by nite automata/regular expressions. To describe such languages, we shall introduce context-free grammars (CFGs).
. p.7/30
. p.11/30
Example
The grammar G with non-terminal symbols N = {S}, terminal symbols = {a, b}, and productions S aSb S ab. Following a common practice, we use capital letters for non-terminal symbols and small letters for terminal symbols.
. p.12/30
Denition. The language of a formal grammar G = (N, , P, S), denoted as L(G), is dened as all those strings over that can be generated by starting with the start symbol S and then applying the production rules in P until no more nonterminal symbols are present.
. p.8/30
21
Parse trees
Every string w in a context-free language has a parse tree. The root of a parse tree is the start symbol S. The leaves of a parse tree are the terminal symbols that make up the string w. The branches of the parse tree describe how the productions are applied.
Exercise
Consider the grammar S A1B A 0A | B 0B | 1B | . Give parse trees for the strings 1. 00101 2. 1001 3. 00011
. p.19/30
C0 C1 . . . C9
C C0 C C1 . . . C C9
Compact notation
More compact notation for productions: E I | C | E + E | E E | (E) C 0 | C0 | . . . | 9 | C9 I x|y
Exercise
For the CFG G dened by the productions S aS | Sb | a | b, prove by induction on the size of the parse tree that the no string in language L(G) has ba as a substring.
. p.16/30
. p.20/30
Exercises
Give a context-free grammar for the language of all palindromes over the alphabet {a, b, c}. (A palindrome is a word that is equal to the reversed version of itself, e.g. abba, bab.)
22
. p.17/30 . p.21/30
Parsing
Finding a parse tree for a string is called parsing. Software tools that do this are called parsers. A compiler must parse every program before producing the executable code. There are tools called parser generators that turn CFGs into parsers. One of them is the famous YACC (Yet Another Compiler Compiler). Parsing is a science unto itself, and described in detail in courses about compilers.
. p.22/30 . p.26/30
Regular languages
Ambiguity
A context-free grammar with more than one parse tree for some expression is called ambiguous. Ambiguity is dangerous, because it can affect the meaning of expressions; e.g. in the expression b a + b, it matters whether has precedence over + or vice versa. To address this problem, parser generators (like YACC) allow the language designer to specify the operator precedence.
. p.23/30
Regular grammars
Next, we shall see certain CFGs called regular grammars and show that they dene the same languages as nite automata and regular expressions. These are the regular languages and are type 3 on the Chomsky hierarchy. They form a subset of the context free languages, which are type 2, dened by the context free grammars we have been talking about.
Exercise
Consider the grammar S aS | aSbS | . Show that this grammar is ambiguous.
. p.27/30
. p.24/30
. p.28/30
Exercise
Consider the grammar S A1B A 0A | B 0B | 1B | . Show that this grammar is unambiguous.
Example
The grammar G with N = {A, B, C, D}, = {0, 1}, S = A, and the productions below is regular. A 0C | 1B | B 0D | 1A C 0A | 1D D 0B | 1C
23
. p.25/30 . p.29/30
Non-example
The grammar below is not regular because of the b on the right of S. S aSb S .
Regular grammars
Next, we shall see certain CFGs called regular grammars and show that they dene the same languages as nite automata and regular expressions.
These are the regular languages and are type 3 on the Chomsky hierarchy. They form a subset of the context free languages, which are type 2, dened by the context free grammars we have been talking about.
. p.1/27
. p.5/27
a nite set of terminal symbols not in N ; a nite set P of production rules of the form uv where u and v are strings in ( N ) and u contains at least one non-terminal symbol; a start symbol S in N .
. p.2/27
. p.6/27
Example
The grammar G with N = {S, A}, = {0, 1}, S the start state and the productions below is regular. S 0S S 1B B
24
What language does this grammar describe? Can you nd a DFA and a regular expression that dene the same language?
. p.7/27
Non-example
The grammar below is not regular because of the b on the right of S. S aSb S . This is a context free grammar that describes the language {an bn |n 1}. You cannot write a regular grammar that describes this language. Why not?
8.1 Exercise
Turn some of the NFAs in this lecture course into regular grammars.
. p.8/27
. p.12/27
Parity Checker
1
. p.13/27
A
1 0 0 1 0
Next, we replace by = and | by +, to make the grammar look like an equation system: X = aY + bZ +
0
Y = cX + dZ Z = eZ + f Y
C
1
The trick is now to get a solution for the start symbol X that does not rely on Y and Z.
. p.14/27
Parity checker
The grammar G with N = {A, B, C, D}, = {0, 1}, S = A, and the productions below is regular. A 0C | 1B | B 0D | 1A C 0A | 1D D 0B | 1C
We shall reduce the three equations above to two equations as follows: 1. Find a solution for Z in terms of only X and Y .
25
. p.11/27
2. Replace that solution for Z in equations (1) and (2). (That yields equations for X and Y that dont rely on Z anymore.)
. p.15/27
(1 ) (2 ).
that doesnt rely on Z anymore? The idea is Z can produce an e and become Z again for a number of times, but nally Z must become f Y . Formally, the solution of Equation (3) is Z = e f Y.
. p.16/27
Next, we repeat our game of eliminating non-terminals and nd a solution for Y : Y = (de f ) cX.
. p.20/27
. p.17/27
. p.21/27
8.2 Exercise
Consider the NFA given by the transition table below. 0 1 X {X} {X, Y } Y {Y } {Y } 1. Give a regular grammar for it. 2. Calculate a regular expression that describes the language.
. p.18/27 . p.22/27
(1) (2).
(1 ) (2 ).
8.3 Exercise
Repeat the exercise for the NFA below. 0 1 X {X} {Y } Y {Y } {Z} Z {Z} {X} Also, describe the accepted language in English.
Now instead of three equations over X, Y, Z, we have only two equations over X, Y : X = (a + be f )Y + Y = cX + de f Y
(1 ) (2 ).
26
. p.19/27 . p.23/27
. p.24/27
A non-regular language
Proposition. The language L = {an bn | n 1} (for which we gave a CFG earlier) is not regular.
Proof. By contradiction: Assume that L is regular. Then there is a DFA A that accepts L. Let n be the number of states of A. We read in the string an . We make n state transitions so must visit n + 1 states hence we must visit one state twice, lets call that state q.
A non-regular language
. p.25/27
That means there are two substrings of an , say am and ak with m < k such that q = (q0 , am ) = (q0 , ak ). Now, the machine accepts am bm so the machine reads in am and is in state q and starts from q, reads in bm and then accepts the string. If we input ak bm the machine reads in ak and is in state q. We know that from state q with input bm the machine ends up in an accepting state so the machine accepts ak bm which implies ak bm L, k < m. Contradiction! The machine cant remember whether the input was am or ak .
. p.26/27
27
28
. p.1/31
29
To dene each type we need only restrict the type of production rules permitted.
. p.2/31
Pushdown automata
Informally a pushdown automata (PDA) is an -NFA with one additional capability. We introduce a stack on which we can store a string of symbols. This means a pushdown automata can remember an arbitary amount of information. However we can only access the information on the stack in a last-in, rst-out way.
. p.7/31
The stack
The PDA works like an FA does except for the stack. The PDA can write symbols on to the top of the stack, pushing down all the other symbols on the stack. The PDA can read and remove a symbol from the top of the stack and the symbols below move back up. Writing a symbol is often referred to as pushing; reading and removing a symbol is referred to as popping.
. p.8/31
Example
The grammar G with non-terminal symbols N = {S}, terminal symbols = {a, b}, and productions S aSb S ab. This grammar describes the language L = {an bn |n 1}. We cannot build an FA that accepts this language. What type of machines accept context free languages?
. p.5/31
Pushdown automata
The PDA is non deterministic so the transition function produces a set of pairs. Each pair is a new state q and a string of stack symbols .
30
. p.6/31
If a = the PDA changes state without reading an input symbol. If s = the PDA changes state without reading a stack symbol.
. p.10/31
Exercise 9.1
q
, > $
0, > 0
1, > 1
, > 0, 0 >
, $ >
1, 1>
1. Write out the transition table for the above PDA. 2. What is the stack used to store?
. p.11/31
. p.15/31
Exercise 9.2
, >x$
q0
, > $
q1
, x >
q3
, $ >
q2
The language accepted by the machine is L = {an bn |n 1}. How does it remember the number of as?
1. What strings of length 1, 2, 3 or 4 does the machine accept? 2. Let i stand for if and e stand for else. What does the PDA do?
. p.16/31
. p.12/31
{(q1 , $)}
{(q2 , )} {(q3 , )}
. p.13/31
. p.17/31
31
. p.14/31
Example
The grammar G with non-terminal symbols N = {S, B}, terminal symbols = {a, b, c}, and productions S abc S aSBc cB Bc bB bb This grammar describes the language L = {an bn cn |n 1}. We cannot build a PDA to accept this language (see Hopcroft/Motwani/Ullman).
. p.19/31
32