
CS138, Wim van Dam, UCSB

Automata and
Formal Languages

CS138, Winter 2006
Wim van Dam
Room 5109, Engr. I
vandam@cs.ucsb.edu
http://www.cs.ucsb.edu/~vandam/
Formalities
New homework has been announced and is due
Monday January 30, 11:30 in CS 138 homework box.

Questions?


Transitions of (N)FA
For deterministic Finite Automata, each input has a unique
path of Q-states that the automaton goes through.
For nondeterministic Finite Automata, each input has a set
of paths of Q-states that the automaton could go through.
For a FA, the function $\delta: Q \times \Sigma \to Q$ is defined for all $x \in \Sigma$.
For a NFA, the function $\delta: Q \times \Sigma_\varepsilon \to \mathcal{P}(Q)$ might have $\delta(q,x) = \emptyset$.
Hence:
[Figure: two example automata over {0,1}, one labeled "Deterministic" and one labeled "Nondeterministic"]

FA = NFA
Theorem 1.39: For every language L that is accepted by a
nondeterministic finite automaton, there is a (deterministic)
finite automaton that accepts L as well.
FA and NFA are equivalent computational models.
Proof idea: When keeping track of a nondeterministic
computation of an NFA N we use many fingers to point
at the subset $\subseteq Q$ of states of N that can be reached on a
given input string.
We can simulate this computation with a deterministic
automaton M with state space $\mathcal{P}(Q)$.

FA=NFA Proof
More formal proof of Theorem 1.39: Let A be the language
recognized by the NFA $N = (Q, \Sigma, \delta, q_0, F)$. Define the
deterministic finite automaton $M = (Q', \Sigma, \delta', q_0', F')$ by
1. $Q' = \mathcal{P}(Q)$
2. $\delta'(R,a) = \{ q \in Q \mid q \in \delta(r,a) \text{ for an } r \in R \}$
3. $q_0' = \{q_0\}$
4. $F' = \{ R \in Q' \mid R \text{ contains an accept state of } N \}$
This almost works, except for the $\varepsilon$-arrows: Define
- $E(R) = \{ q \mid q \text{ reachable from } R \text{ using zero or more } \varepsilon \text{ steps} \}$
- $\delta'(R,a) = \{ q \in Q \mid q \in E(\delta(r,a)) \text{ for an } r \in R \}$
- $q_0' = E(\{q_0\})$
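
The construction above can be sketched in Python. This is a minimal illustration, not the course's reference implementation: the dictionary encoding of $\delta$, the use of the empty string as the label of $\varepsilon$-arrows, and all function names are assumptions of this sketch. Only the subsets reachable from $E(\{q_0\})$ are built, rather than all of $\mathcal{P}(Q)$.

```python
EPS = ""  # assumed label for epsilon-arrows in this sketch

def eps_closure(states, delta):
    """E(R): all states reachable from R using zero or more epsilon steps."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), set()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return frozenset(seen)

def nfa_to_dfa(alphabet, delta, q0, accept):
    """Subset construction restricted to the reachable subsets."""
    start = eps_closure({q0}, delta)
    dfa_delta, todo, seen = {}, [start], {start}
    while todo:
        R = todo.pop()
        for a in alphabet:
            moved = set()
            for r in R:
                moved |= delta.get((r, a), set())
            S = eps_closure(moved, delta)  # delta'(R, a)
            dfa_delta[(R, a)] = S
            if S not in seen:
                seen.add(S)
                todo.append(S)
    # F': subsets that contain an accept state of N
    dfa_accept = {R for R in seen if R & set(accept)}
    return seen, dfa_delta, start, dfa_accept

def dfa_accepts(dfa_delta, start, dfa_accept, word):
    """Run the deterministic automaton on word."""
    R = start
    for a in word:
        R = dfa_delta[(R, a)]
    return R in dfa_accept

# Hypothetical example NFA for 0*1*: A loops on 0, an epsilon-arrow leads to B, B loops on 1.
example_delta = {("A", "0"): {"A"}, ("A", EPS): {"B"}, ("B", "1"): {"B"}}
example_dfa = nfa_to_dfa("01", example_delta, "A", {"B"})
```

The empty subset shows up as an ordinary (non-accepting) state of M, which is how the deterministic automaton represents "all fingers have been lifted".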

FA=NFA Proof (cont.)
It is easy to see that the previously described deterministic
finite automaton M accepts the same language as N.
See Example 1.41 for the construction in action.

Because FA are a subset of NFA, we have proven:
Corollary 1.40: A language is regular if and only if it is
accepted by a nondeterministic finite automaton.

Closure under Regular Operations
With NFA it is much simpler to prove the various closure
properties of the regular languages.
We will show the closure of regular languages under the
union, concatenation and star * operation by construction.
Theorem 1.45: RLs are closed under the union operation.
Given NFA $N_1$ and $N_2$ that accept $L_1$ and $L_2$, make an
NFA N (using $N_1$ and $N_2$) that accepts $L_1 \cup L_2$.
Theorem 1.47: RLs are closed under concatenation.
Make an NFA N that accepts $L_1 \circ L_2$.
Theorem 1.49: RLs are closed under the star * operation.
Make an NFA N that accepts $(L_1)^*$.

Union Closure
Construction of Theorem 1.45: Given two NFAs $N_1$ and $N_2$,
put them in parallel to recognize the language $L(N_1) \cup L(N_2)$:
[Figure: $N_1$ and $N_2$ placed in parallel]



Concatenation Closure
Theorem 1.47: Given two NFAs $N_1$ and $N_2$, put them
in sequence to recognize the concatenation $L(N_1) \circ L(N_2)$:
[Figure: $N_1$ followed by $N_2$]



Star Operation Closure
Construction of Theorem 1.49: Given an NFA $N_1$, make a
loop to recognize the language $L(N_1)^*$:
[Figure: $N_1$ with a loop]
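
Since the slide figures for the three closure constructions did not survive, here is a hedged Python sketch of the usual $\varepsilon$-arrow versions (new start state for union, accept-to-start arrows for concatenation, new accepting start state plus back-arrows for star). The dictionary representation and every function name are assumptions of this sketch. It also assumes that the operands have disjoint state names, which the shared counter guarantees as long as no NFA object is used twice in one expression.

```python
from itertools import count

EPS = ""          # assumed label for epsilon-arrows
_fresh = count()  # source of new, globally unique state names

def symbol(a):
    """NFA for the one-symbol language {a}."""
    s, t = next(_fresh), next(_fresh)
    return {"delta": {(s, a): {t}}, "start": s, "accept": {t}}

def _copy_delta(*nfas):
    """Merge the transition tables of the operands into a fresh dictionary."""
    delta = {}
    for n in nfas:
        for key, targets in n["delta"].items():
            delta.setdefault(key, set()).update(targets)
    return delta

def _arrow(delta, q, label, r):
    delta.setdefault((q, label), set()).add(r)

def union(n1, n2):
    """In parallel: a new start state with epsilon-arrows to both old start states."""
    delta, s = _copy_delta(n1, n2), next(_fresh)
    _arrow(delta, s, EPS, n1["start"])
    _arrow(delta, s, EPS, n2["start"])
    return {"delta": delta, "start": s, "accept": n1["accept"] | n2["accept"]}

def concat(n1, n2):
    """In sequence: epsilon-arrows from the accept states of n1 to the start of n2."""
    delta = _copy_delta(n1, n2)
    for f in n1["accept"]:
        _arrow(delta, f, EPS, n2["start"])
    return {"delta": delta, "start": n1["start"], "accept": set(n2["accept"])}

def star(n1):
    """Loop: accept states jump back to the old start; a new accepting start admits the empty string."""
    delta, s = _copy_delta(n1), next(_fresh)
    _arrow(delta, s, EPS, n1["start"])
    for f in n1["accept"]:
        _arrow(delta, f, EPS, n1["start"])
    return {"delta": delta, "start": s, "accept": n1["accept"] | {s}}

def accepts(nfa, word):
    """Simulate the NFA by tracking the set of currently reachable states."""
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for r in nfa["delta"].get((q, EPS), set()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen
    current = closure({nfa["start"]})
    for a in word:
        step = set()
        for q in current:
            step |= nfa["delta"].get((q, a), set())
        current = closure(step)
    return bool(current & nfa["accept"])

# Hypothetical example: an NFA for (0(1 ∪ 0))*
example = star(concat(symbol("0"), union(symbol("1"), symbol("0"))))
```

The new start state in `star` is what makes the empty string accepted without also making the old start state accepting, which could otherwise add unwanted strings.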



Question Time
What about complements?

Regular Expressions (Def. 1.52)
Given an alphabet $\Sigma$, R is a regular expression if:

1. $R = a$, with $a \in \Sigma$
2. $R = \varepsilon$
3. $R = \emptyset$
4. $R = (R_1 \cup R_2)$, with $R_1$ and $R_2$ regular expressions
5. $R = (R_1 \circ R_2)$, with $R_1$ and $R_2$ regular expressions
6. $R = (R_1^*)$, with $R_1$ a regular expression

Reading Regular Expressions
Assume for the moment that $\Sigma = \{a,b,c\}$.
- We allow ourselves to write $\Sigma$ instead of $((a \cup b) \cup c)$.
So, $((\Sigma^*) \circ b)$ stands for the set of strings ending with a b.
- $R^+$ is a shorthand for $RR^*$.
- Just as with multiplication, you can drop the
concatenation symbol $\circ$: $a \circ b$ equals $ab$
- Just as with arithmetic you can drop some parentheses.
We have the precedence order: star, concatenation, union.
Hence: $aa^*$ equals $a(a^*)$, which does not equal $(aa)^*$.
Also, $01 \cup 10$ equals $(01) \cup (10)$ and does not equal $0(1 \cup 1)0$.
And $01^*$ equals $0(1^*)$ and not $(01)^*$.
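
The precedence rules can be checked with Python's `re` module, which happens to use the same order (star, then concatenation, then alternation, written `|` instead of a union sign). The helper name `in_language` is an assumption of this sketch.

```python
import re

def in_language(pattern: str, word: str) -> bool:
    """True if the whole word matches the pattern."""
    return re.fullmatch(pattern, word) is not None

# aa* is read as a(a*), not as (aa)*: the string aaa separates the two.
print(in_language("aa*", "aaa"), in_language("(aa)*", "aaa"))

# 01|10 is read as (01)|(10), not as 0(1|1)0: the string 10 separates the two.
print(in_language("01|10", "10"), in_language("0(1|1)0", "10"))

# 01* is read as 0(1*), not as (01)*: the string 011 separates the two.
print(in_language("01*", "011"), in_language("(01)*", "011"))
```

Each line prints `True False`, because the chosen string lies in the first language and not in the second.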

Last Monday
We proved that each Nondeterministic Finite Automaton
can be transformed into a deterministic one: NFA=FA.
We proved that NFA recognized languages are closed
under union, concatenation and star operation *.
Hence the set of Regular Languages is closed under
these regular operations. [Reader, pp. 29-38]

Another way of expressing simple languages is done by
Regular Expressions (Def. 1.52) like $a\Sigma^*(b \cup c)$ for strings
that have to start with an a and end with a b or a c.

Languages and RE
A RE R describes a language L(R) in the obvious way:
1. If $R = a$, then $L(R) = \{a\}$
2. If $R = \varepsilon$, then $L(R) = \{\varepsilon\}$
3. If $R = \emptyset$, then $L(R) = \{\}$
4. If $R = (R_1 \cup R_2)$, then $L(R) = L(R_1) \cup L(R_2)$
5. If $R = (R_1 \circ R_2)$, then $L(R) = L(R_1) \circ L(R_2)$
6. If $R = (R_1^*)$, then $L(R) = (L(R_1))^*$
Note that formally there is a difference between the
expression R and the language L(R) that it describes:
$(0^*1^*)^*$ and $(0 \cup 1)^*$ are different expressions,
but they describe the same language.
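
The six cases translate directly into a recursive function. The sketch below computes only the finite slice of $L(R)$ with strings of length at most n, since $L(R)$ itself may be infinite. The tuple encoding of expressions and the function name `language` are assumptions of this sketch, and the two example expressions at the end are chosen for illustration.

```python
def language(r, n):
    """All strings of L(r) with length at most n, following the six cases."""
    kind = r[0]
    if kind == "sym":      # case 1: R = a
        return {r[1]} if n >= 1 else set()
    if kind == "eps":      # case 2: R = epsilon
        return {""}
    if kind == "empty":    # case 3: R = empty set
        return set()
    if kind == "union":    # case 4
        return language(r[1], n) | language(r[2], n)
    if kind == "concat":   # case 5
        left, right = language(r[1], n), language(r[2], n)
        return {u + v for u in left for v in right if len(u + v) <= n}
    if kind == "star":     # case 6: keep appending nonempty pieces until nothing new fits
        base = language(r[1], n) - {""}
        result, frontier = {""}, {""}
        while frontier:
            frontier = {u + v for u in frontier for v in base if len(u + v) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown expression: {r!r}")

zero, one = ("sym", "0"), ("sym", "1")
all_strings = ("star", ("union", zero, one))                          # (0 ∪ 1)*
blocks = ("star", ("concat", ("star", zero), ("star", one)))          # (0*1*)*
same_up_to_3 = language(all_strings, 3) == language(blocks, 3)
```

Comparing finite slices cannot prove that two expressions describe the same language, but a mismatch at some n would prove that they do not.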
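
Some Examples
Bit string with at least two 1s: $\{0,1\}^* \, 1 \, \{0,1\}^* \, 1 \, \{0,1\}^*$
Bit string with at most two 1s: $0^* \cup 0^*10^* \cup 0^*10^*10^*$
Alternatively: $0^*(1 \cup \varepsilon)0^*(1 \cup \varepsilon)0^*$
$a\emptyset = \emptyset$
$a\varepsilon = a$ (note the difference between $\emptyset$ and $\varepsilon$)
$(\Sigma\Sigma)^*$ (strings of even length)
$(\Sigma\Sigma)^* \cup (\Sigma\Sigma\Sigma)^*$ (strings of even length or multiple of 3)
$(\Sigma\Sigma \cup \Sigma\Sigma\Sigma)^*$ (strings of length 0 or of length greater than 1)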
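
These examples can be sanity-checked by brute force over all short bit strings. The sketch assumes $\Sigma = \{0,1\}$ and transcribes the expressions into Python `re` syntax, where `|` stands for union and the empty branch in `(1|)` stands for $\varepsilon$. The constant and function names are illustrative.

```python
import re
from itertools import product

AT_LEAST_TWO_ONES = r"[01]*1[01]*1[01]*"
AT_MOST_TWO_ONES = r"0*|0*10*|0*10*10*"
AT_MOST_TWO_ONES_ALT = r"0*(1|)0*(1|)0*"
EVEN_LENGTH = r"([01][01])*"
EVEN_OR_MULTIPLE_OF_3 = r"([01][01])*|([01][01][01])*"
ZERO_OR_AT_LEAST_TWO = r"([01][01]|[01][01][01])*"

def matches(pattern, word):
    """True if the whole word matches the pattern."""
    return re.fullmatch(pattern, word) is not None

def all_words(max_len):
    """Every bit string of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            yield "".join(letters)

def agrees(pattern, predicate, max_len=8):
    """Compare the expression with a direct description on all short strings."""
    return all(matches(pattern, w) == predicate(w) for w in all_words(max_len))

checks = [
    agrees(AT_LEAST_TWO_ONES, lambda w: w.count("1") >= 2),
    agrees(AT_MOST_TWO_ONES, lambda w: w.count("1") <= 2),
    agrees(AT_MOST_TWO_ONES_ALT, lambda w: w.count("1") <= 2),
    agrees(EVEN_LENGTH, lambda w: len(w) % 2 == 0),
    agrees(EVEN_OR_MULTIPLE_OF_3, lambda w: len(w) % 2 == 0 or len(w) % 3 == 0),
    agrees(ZERO_OR_AT_LEAST_TWO, lambda w: len(w) != 1),
]
```

Every entry of `checks` should be `True`. The test only covers strings up to length 8, so it is evidence rather than proof.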



Applications of REs
Regular expressions are commonly used when analyzing or
editing text strings. Two common examples:
The grep command in UNIX and LINUX
Use man grep to see how it works
String processing in Perl
See Wikipedia's Perl regular expression examples

There is a good reason why we use regular expressions
for this kind of pattern matching that we want to do fast.
Thm 1.54: RL = RE
As the names suggest, the following result holds:
Theorem 1.54: A language is regular if and only if
some regular expression describes it.

Lemma 1.55: If a language is described by a regular
expression, then it is regular. This is relatively easy,
using the closure properties of RLs that we proved.

Lemma 1.60: If a language is regular, then it can be
described by a regular expression. This is harder to
prove and requires the definition of Generalized
Nondeterministic Finite Automata (GNFA).

Proof of Lemma 1.55
Given a regular expression R, construct (by structural
induction on R) an NFA N such that L(R) = L(N):
1. If $R = a$ with $L(R) = \{a\}$, then [Figure: NFA with a single arrow labeled a]
2. If $R = \varepsilon$ with $L(R) = \{\varepsilon\}$, then [Figure: NFA]
3. If $R = \emptyset$ with $L(R) = \{\}$, then [Figure: NFA]
4. If $R = (R_1 \cup R_2)$, then $L(R) = L(R_1) \cup L(R_2)$
5. If $R = (R_1 \circ R_2)$, then $L(R) = L(R_1) \circ L(R_2)$
6. If $R = (R_1^*)$, then $L(R) = (L(R_1))^*$

Proof Lemma 1.60
If a language is regular, then it is described by a RE.
Proof Idea: Use generalized nondeterministic finite
automata where the labels on the transition arrows are
allowed to be REs (elements of P).
For an $R \in P$ this GNFA recognizes L(R):
[Figure: $q_S$ to $q_A$ with a single arrow labeled R]
We have to prove that for each regular language L,
the corresponding FA M can be transformed into
a GNFA with only two states like the one above.
We do this by removing the internal states of a GNFA
one-by-one until we are left with a GNFA that has only
one start state, one accept state and no loops.

Example GNFA
[Figure: example GNFA with start state $q_S$ and accept state $q_A$; the arrows carry regular-expression labels, among them $01^*$, $0$ and $\emptyset$]

Generalized NFA
Def. 1.64: A generalized nondeterministic finite automaton
(GNFA) is defined by $M = (Q, \Sigma, \delta, q_{start}, q_{accept})$ with
- $Q$ finite set of states
- $\Sigma$ the input alphabet
- $q_{start}$ the start state
- $q_{accept}$ the accept state
- $\delta: (Q \setminus \{q_{accept}\}) \times (Q \setminus \{q_{start}\}) \to P$ the transition function

(P is the set of regular expressions over $\Sigma$)

Characteristics of GNFAs
$\delta: (Q \setminus \{q_{accept}\}) \times (Q \setminus \{q_{start}\}) \to P$
- The interior $Q \setminus \{q_{accept}, q_{start}\}$ is fully connected by $\delta$
- From $q_{start}$ we have only outgoing transitions
- To $q_{accept}$ we have only ingoing transitions
- Impossible $q_i \to q_j$ transitions are $\delta(q_i, q_j) = \emptyset$
Observation: This GNFA:
[Figure: $q_S$ to $q_A$ with a single arrow labeled $R \in P$]
recognizes the language L(R)

Proof Idea of Lemma 1.60
Proof idea (given a DFA M):

Construct an equivalent GNFA M' with $k \geq 2$ states

Reduce one-by-one the internal states until k=2

This GNFA will be of the form
[Figure: $q_S$ to $q_A$ with a single arrow labeled R]

This regular expression R
will be such that L(R) = L(M)

DFA M → Equivalent GNFA M'
Let M have k states $Q = \{q_1, \dots, q_k\}$
- Add two states $q_{accept}$ and $q_{start}$
- Connect $q_{start}$ to earlier $q_1$: $q_S \xrightarrow{\varepsilon} q_1$
- Complete missing transitions by $q_i \xrightarrow{\emptyset} q_j$
- Connect old accepting states to $q_{accept}$: $q_j \xrightarrow{\varepsilon} q_A$
- Join multiple transitions: $q_i \xrightarrow{0} q_j$ and $q_i \xrightarrow{1} q_j$ becomes $q_i \xrightarrow{0 \cup 1} q_j$
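
Remove Internal state of GNFA
If the GNFA M has more than 2 states, rip
internal $q_{rip}$ to get equivalent GNFA M' by:
- Removing state $q_{rip}$: $Q' = Q \setminus \{q_{rip}\}$
- Changing the transition function $\delta$ by

$\delta'(q_i, q_j) = \delta(q_i, q_j) \cup \big( \delta(q_i, q_{rip}) \, (\delta(q_{rip}, q_{rip}))^* \, \delta(q_{rip}, q_j) \big)$

for every $q_i \in Q' \setminus \{q_{accept}\}$ and $q_j \in Q' \setminus \{q_{start}\}$
[Figure: arrows $q_i \xrightarrow{R_4} q_j$, $q_i \xrightarrow{R_1} q_{rip}$, a loop $R_2$ at $q_{rip}$ and $q_{rip} \xrightarrow{R_3} q_j$ are replaced by the single arrow $q_i \to q_j$ labeled $R_4 \cup (R_1 R_2^* R_3)$]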

Recap Proof Lemma 1.60
1. Let M be DFA with k states.
2. Create equivalent GNFA M' with k+2 states
3. Reduce in k steps M' to M'' with 2 states
4. The resulting GNFA describes a RE R
5. The language L(M) equals L(R)
Ingredients Theorem 1.54 RL=RE:
Lemma 1.55: Let R be a regular expression,
then there exists an NFA M such that L(R) = L(M).
Lemma 1.60: The language L(M) of a DFA M is equal
to the language L(M') of a GNFA M', which equals that of a
two-state GNFA M'': $q_{start} \xrightarrow{R} q_{accept}$, such that L(R) = L(M).
Hence: RE $\subseteq$ NFA = DFA $\subseteq$ GNFA $\subseteq$ RE
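
The recap steps can be sketched as a short program. This is an illustration under several assumptions of this sketch: labels are stored as Python `re` pattern strings, `None` stands for $\emptyset$, the empty string stands for $\varepsilon$, and the names `q_start` and `q_accept` must not clash with states of M. Missing dictionary entries play the role of the $\emptyset$-labeled arrows, so the "complete missing transitions" step is implicit.

```python
import re

def union(r, s):
    """R ∪ S on pattern strings, with None as the empty set."""
    if r is None:
        return s
    if s is None:
        return r
    return f"(?:{r}|{s})"

def concat(r, s):
    """R ∘ S; anything concatenated with the empty set is the empty set."""
    if r is None or s is None:
        return None
    return r + s

def star(r):
    """R*; both the empty set and epsilon give epsilon."""
    if r is None or r == "":
        return ""
    return f"(?:{r})*"

def dfa_to_regex(states, alphabet, delta, start, accept):
    S, A = "q_start", "q_accept"  # assumed fresh names
    R = {(S, start): ""}          # connect q_start to the old start state
    for q in accept:
        R[(q, A)] = ""            # connect old accepting states to q_accept
    for q in states:
        for a in alphabet:        # join multiple transitions into one union label
            target = delta[(q, a)]
            R[(q, target)] = union(R.get((q, target)), re.escape(a))
    remaining = list(states)
    for rip in states:            # remove the internal states one by one
        remaining.remove(rip)
        loop = star(R.get((rip, rip)))
        for qi in [S] + remaining:
            for qj in remaining + [A]:
                via = concat(concat(R.get((qi, rip)), loop), R.get((rip, qj)))
                new = union(R.get((qi, qj)), via)
                if new is not None:
                    R[(qi, qj)] = new
    return R.get((S, A))          # label of the single remaining arrow

# Hypothetical example DFA: bit strings with an even number of 1s.
even_ones_delta = {("e", "0"): "e", ("e", "1"): "o", ("o", "0"): "o", ("o", "1"): "e"}
even_ones_regex = dfa_to_regex(["e", "o"], "01", even_ones_delta, "e", {"e"})
```

The resulting expression is correct but far from minimal. Its size can grow quickly with the number of states, which is one reason the order in which states are ripped matters in practice.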
Usefulness of RE
The fact that regular expressions can be recognized
(matched) by deterministic FA is very useful as this is fast.

Consider a RE like ((011)**00 ()*)* that you want
to search for in a very long binary file
There seem to be too many options to do this efficiently:
(where in the file? plus the nondeterministic operations $\cup$, $*$)
Solution: Create a deterministic FA that accepts the
corresponding language L and use it on the file?
Actually, create a FA that accepts $\Sigma^* L \Sigma^*$ and use it.
Complexity ~ length(file) + $2^{\text{length(regular expression)}}$.
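
To illustrate why the search itself is fast once the automaton exists, here is a minimal sketch of a single left-to-right pass with a DFA for $\Sigma^* L \Sigma^*$. The pattern language $L = \{00\}$ and the hand-built transition table are chosen for this sketch and do not come from the slides.

```python
# Hand-built DFA for Sigma* 00 Sigma* over {0,1} (illustrative choice L = {00}).
# The state counts the consecutive 0s just read, capped at 2.
# State 2 is accepting and absorbing.
DELTA = {
    (0, "0"): 1, (0, "1"): 0,
    (1, "0"): 2, (1, "1"): 0,
    (2, "0"): 2, (2, "1"): 2,
}
START, ACCEPT = 0, {2}

def first_match_end(bits):
    """Return the position just past the first match, or None if there is none."""
    state = START
    for position, symbol in enumerate(bits, start=1):
        state = DELTA[(state, symbol)]  # one table lookup per input symbol
        if state in ACCEPT:
            return position
    return None
```

Each input symbol costs one dictionary lookup, so the running time is linear in the length of the input, independent of how complicated the expression was. Because `bits` can be any iterable of symbols, the input does not have to be held in memory at once.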


Formalities
Next Friday: Midterm on Automata: The Methods and the
Madness and Regular Languages [pp. 1-56, Reader]

Note that this week's material will be part of the Midterm,
although you will not have had it as homework.

Questions?

1.4: Nonregular Languages
What languages can not be recognized by finite automata?
How to prove that a language is nonregular?

Example: $L = \{ 0^n 1^n \mid n \in \mathbb{N} \}$
Because DFA = NFA = GNFA, it is sufficient to prove
that the language can not be accepted by a DFA.
Playing around with DFA convinces you that the
finiteness of DFA is problematic for all $n \in \mathbb{N}$.
The problem occurs between the $0^n$ and the $1^n$.

Informal observation: the memory of a FA is limited
by the number of states |Q|.

Repeating DFA Paths
[Figure: computation path from $q_1$ to $q_k$ with a loop through $q_j$]
Consider an accepting DFA M with size |Q|.
On a string of length p, p+1 states get visited.
For $p \geq |Q|$, there must be a j such that the deterministic
computational path looks like: $q_1, \dots, q_j, \dots, q_j, \dots, q_k$.

Repeating DFA Paths
[Figure: computation path from $q_1$ to $q_k$ with a loop through $q_j$]
The action of the DFA in $q_j$ is always the same.
If we repeat (or ignore) the $q_j, \dots, q_j$ part, the new
path will again be an accepting path.
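
The repeated-state argument can be made concrete. The sketch below runs a DFA on a word, finds the first state that is visited twice, and splits the word into x, y, z around that loop. The dictionary encoding of $\delta$, the function names and the example DFA (even number of 1s) are assumptions of this sketch.

```python
def run_path(delta, start, word):
    """The sequence of states visited while reading word (length len(word) + 1)."""
    path = [start]
    for a in word:
        path.append(delta[(path[-1], a)])
    return path

def loop_split(delta, start, word):
    """Split word = xyz so that y takes the DFA from a state back to that same state."""
    path = run_path(delta, start, word)
    first_seen = {}
    for k, q in enumerate(path):
        if q in first_seen:
            j = first_seen[q]  # path[j] == path[k]
            return word[:j], word[j:k], word[k:]
        first_seen[q] = k
    return None  # no state repeats: the word is shorter than |Q|

def accepts(delta, start, accept, word):
    return run_path(delta, start, word)[-1] in accept

# Hypothetical example DFA: bit strings with an even number of 1s.
DELTA = {("e", "0"): "e", ("e", "1"): "o", ("o", "0"): "o", ("o", "1"): "e"}
x, y, z = loop_split(DELTA, "e", "0110")
```

Because the split happens at the first repeat, the prefix xy is never longer than |Q|. Repeating or dropping y leaves the state reached before z unchanged, so acceptance does not change either.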

Line of Reasoning
If we want to prove that a language L is nonregular,
we can use the following proof by contradiction technique:
Assume that L is regular.
Hence, there is a DFA M that recognizes L.
For strings of length $\geq |Q|$ the DFA M has to repeat a state.
Show that M will accept strings outside L.
Conclude that the assumption was wrong.
Note that we use the simple DFA, not the
more elaborate (but equivalent) NFA or GNFA.

Thm 1.70: Pumping Lemma
For every regular language L, there is a finite
pumping length p, such that for any string $s \in L$
and $|s| \geq p$, we can write $s = xyz$ with:

1) $x y^i z \in L$ for every $i \in \{0,1,2,\dots\}$
2) $|y| \geq 1$
3) $|xy| \leq p$

Note that: (1) implies that $xz \in L$, (2) says that y can
not be the empty string $\varepsilon$, (3) is not always used.
This is a lemma about regular languages.

Formal Proof of Pumping Lemma
Let $M = (Q, \Sigma, \delta, q_1, F)$ with $Q = \{q_1, \dots, q_p\}$.
Let $s = s_1 \cdots s_n \in L(M)$ with $|s| = n \geq p$.
The computational path of M on s is the sequence $r_1 \cdots r_{n+1} \in Q^{n+1}$
with $r_1 = q_1$, $r_{n+1} \in F$ and $r_{t+1} = \delta(r_t, s_t)$ for $1 \leq t \leq n$.
Because $n+1 \geq p+1$, there have to be two states $r_j$ and $r_k$
such that $r_j = q_i = r_k$ (with $1 \leq j < k \leq p+1$).
Let $x = s_1 \cdots s_{j-1}$, $y = s_j \cdots s_{k-1}$, and $z = s_k \cdots s_n$.
The string x takes M from $q_1 = r_1$ to $r_j$, the string y takes M
from $r_j$ to $r_j$, and the string z takes M from $r_j$ to $r_{n+1} \in F$.
As a result: $x y^i z$ takes M from $q_1$ to $r_{n+1} \in F$ (for all $i \geq 0$).
Moreover $|y| = k - j \geq 1$ and $|xy| = k - 1 \leq p$.

Pumping $0^n 1^n$ (Ex. 1.73)
Assume that $B = \{ 0^n 1^n \mid n \geq 0 \}$ is regular.
Let p be the pumping length, and $s = 0^p 1^p \in B$.
Pumping Lemma: $s = xyz = 0^p 1^p$ with $x y^i z \in B$ for all $i \geq 0$.
Three options for y:
1) $y = 0^k$, hence $xyyz = 0^{p+k} 1^p \notin B$
2) $y = 1^k$, hence $xyyz = 0^p 1^{k+p} \notin B$
3) $y = 0^k 1^l$, hence $xyyz = 0^p 1^l 0^k 1^p \notin B$

Conclusion: The pumping lemma does not hold,
hence the language B is not regular.
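
The case analysis can be double-checked by brute force for small p. The sketch below tries every split $s = xyz$ with $|y| \geq 1$ (condition 3 is deliberately not imposed, matching the three options above) and reports whether any split survives pumping for a few values of i. The function names and the bound `max_i` are assumptions of this sketch.

```python
def in_B(w):
    """Membership in B = { 0^n 1^n | n >= 0 }."""
    n = len(w) // 2
    return w == "0" * n + "1" * n

def has_pumpable_split(s, in_lang, max_i=3):
    """True if some split s = xyz with |y| >= 1 keeps x y^i z in the language for i = 0..max_i."""
    for j in range(len(s)):
        for k in range(j + 1, len(s) + 1):
            x, y, z = s[:j], s[j:k], s[k:]
            if all(in_lang(x + y * i + z) for i in range(max_i + 1)):
                return True
    return False

# For every tested p, no split of 0^p 1^p can be pumped inside B.
no_split_works = all(
    not has_pumpable_split("0" * p + "1" * p, in_B) for p in range(1, 7)
)
```

A split that fails for one tested i really does violate condition 1, so a `False` result from `has_pumpable_split` is conclusive for that string. A `True` result would only mean that no violation was found up to `max_i`.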

$F = \{ ww \mid w \in \{0,1\}^* \}$ (Ex. 1.75)
Let p be the pumping length, and take $s = 0^p 1 0^p 1$.
Let $s = xyz = 0^p 1 0^p 1$ with condition 3) $|xy| \leq p$.
Only one option: $y = 0^k$, with $xyyz = 0^{p+k} 1 0^p 1 \notin F$.

Without 3) this would have been a pain.

Intersecting Regular Languages
Example 1.74: Let C = { w | # of 0s in w = # of 1s in w }.
Problem: If $xyz \in C$ with $y \in C$, then $x y^i z \in C$.
Subversive Idea: If C is regular and F is regular,
then the intersection $C \cap F$ has to be regular as well.
Proof by Contradiction: Assume that C is regular.
Take the regular language $F = \{ 0^n 1^m \mid n,m \in \mathbb{N} \}$ such that the
intersection $C \cap F = \{ 0^n 1^n \mid n \in \mathbb{N} \}$ has to be regular as well.
But we know that $C \cap F$ is not regular.
Conclusion: C is not regular.

Pumping Down $E = \{ 0^i 1^j \mid i \geq j \}$
Problem: pumping up $s = 0^p 1^p$ with $y = 0^k$ gives
$xyyz = 0^{p+k} 1^p$, $xy^3z = 0^{p+2k} 1^p$, which are all in E
(hence do not give contradictions).
Solution: pump down to $xz = 0^{p-k} 1^p$.
Overall for $s = xyz = 0^p 1^p$ (with $|xy| \leq p$):
we have $y = 0^k$ with $k > 0$, hence $xz = 0^{p-k} 1^p \notin E$.

Contradiction: E is not regular.
End of Regular Languages
