Automata 4

CS138, Wim van Dam, UCSB
Automata and
Formal Languages

CS138, Winter 2006
Wim van Dam
Room 5109, Engr. I
vandam@cs.ucsb.edu
http://www.cs.ucsb.edu/~vandam/
Formalities
New homework has been announced and is due
Monday January 30, 11:30 in CS 138 homework box.

Questions?

Transitions of (N)FA
For deterministic Finite Automata, each input has a unique
path of Q-states that the automaton goes through.
For nondeterministic Finite Automata, each input has a set
of paths of Q-states that the automaton could go through.
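For a FA, the function $\delta: Q \times \Sigma \to Q$ is defined for all $x \in \Sigma$.
For a NFA, the function $\delta: Q \times \Sigma_\varepsilon \to \mathcal{P}(Q)$ might have $\delta(q,x) = \emptyset$.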
Hence:
[Figure: example state diagrams over {0,1}, one deterministic and one nondeterministic]
FA = NFA
Theorem 1.39: For every language L that is accepted by a
nondeterministic finite automaton, there is a (deterministic)
finite automaton that accepts L as well.
FA and NFA are equivalent computational models.
Proof idea: When keeping track of a nondeterministic
computation of an NFA N we use many fingers to point
at the subset $R \subseteq Q$ of states of N that can be reached on a
given input string.
We can simulate this computation with a deterministic
automaton M with state space $\mathcal{P}(Q)$.
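The "many fingers" view can be sketched in Python. This is a minimal illustration, not part of the original notes: it assumes an NFA without ε-arrows whose transition function is stored as a dict from (state, symbol) pairs to sets of states, with a missing entry meaning ∅. The example automaton and the name `nfa_accepts` are hypothetical.

```python
from typing import Dict, Hashable, Set, Tuple

State = Hashable
# delta[(q, a)] is the set of states reachable from q on symbol a;
# a missing key plays the role of delta(q, a) = empty set.
NFADelta = Dict[Tuple[State, str], Set[State]]


def nfa_accepts(delta: NFADelta, start: State, accept: Set[State], w: str) -> bool:
    """Simulate the NFA by keeping one 'finger' on every state reachable so far."""
    fingers: Set[State] = {start}
    for a in w:
        # Move every finger along every a-arrow at once.
        fingers = {q for r in fingers for q in delta.get((r, a), set())}
    return bool(fingers & accept)


# Hypothetical NFA over {0,1}: accepts strings whose second-to-last symbol is 1.
delta: NFADelta = {
    ("s", "0"): {"s"}, ("s", "1"): {"s", "t"},
    ("t", "0"): {"u"}, ("t", "1"): {"u"},
}
print(nfa_accepts(delta, "s", {"u"}, "0110"))  # True
print(nfa_accepts(delta, "s", {"u"}, "0101"))  # False
```

At every step the set `fingers` is a single subset of Q, which is exactly one state of the deterministic automaton M.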
FA=NFA Proof
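More formal proof of Theorem 1.39: Let A be the language
recognized by the NFA $N = (Q, \Sigma, \delta, q_0, F)$. Define the
deterministic finite automaton $M = (Q', \Sigma, \delta', q_0', F')$ by
1. $Q' = \mathcal{P}(Q)$
2. $\delta'(R,a) = \{ q \in Q \mid q \in \delta(r,a) \text{ for some } r \in R \}$
3. $q_0' = \{q_0\}$
4. $F' = \{ R \in Q' \mid R \text{ contains an accept state of } N \}$
This almost works, except for the ε-arrows. Define
- $E(R) = \{ q \mid q \text{ is reachable from } R \text{ using zero or more } \varepsilon\text{-arrows} \}$
- $\delta'(R,a) = \{ q \in Q \mid q \in E(\delta(r,a)) \text{ for some } r \in R \}$
- $q_0' = E(\{q_0\})$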
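A compact Python sketch of this construction, including the ε-closure E(R). Assumptions that are not in the notes: ε-arrows are stored under the empty-string label `""`, and only the subsets reachable from the start state E({q0}) are built, a common refinement that accepts the same language as the full power-set automaton. The names `eps_closure`, `subset_construction` and `dfa_accepts` are illustrative.

```python
from collections import deque

EPS = ""  # assumption: the empty string labels the ε-arrows


def eps_closure(delta, states):
    """E(R): every state reachable from R using zero or more ε-arrows."""
    closure, stack = set(states), list(states)
    while stack:
        r = stack.pop()
        for q in delta.get((r, EPS), set()):
            if q not in closure:
                closure.add(q)
                stack.append(q)
    return frozenset(closure)


def subset_construction(delta, start, accept, alphabet):
    """Reachable part of the DFA M whose states are subsets R of Q."""
    q0 = eps_closure(delta, {start})            # start state E({q0})
    dfa_delta, seen, todo = {}, {q0}, deque([q0])
    while todo:
        R = todo.popleft()
        for a in alphabet:
            # E of the union of all delta(r, a) equals the union of the E(delta(r, a))
            step = {q for r in R for q in delta.get((r, a), set())}
            T = eps_closure(delta, step)
            dfa_delta[(R, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    dfa_accept = {R for R in seen if R & accept}  # R contains an accept state of N
    return dfa_delta, q0, dfa_accept


def dfa_accepts(dfa_delta, q0, dfa_accept, w):
    R = q0
    for a in w:
        R = dfa_delta[(R, a)]
    return R in dfa_accept


# Hypothetical NFA for 0* ∪ 1*: ε-arrows from s to a 0-loop state and a 1-loop state.
nfa = {("s", EPS): {"a", "b"}, ("a", "0"): {"a"}, ("b", "1"): {"b"}}
M = subset_construction(nfa, "s", {"a", "b"}, "01")
print([dfa_accepts(*M, w) for w in ["", "000", "11", "01"]])  # [True, True, True, False]
```

The empty subset shows up as an ordinary DFA state: once the NFA has no fingers left, M stays in it and rejects.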
FA=NFA Proof (cont.)
It is easy to see that the previously described deterministic
finite automaton M accepts the same language as N.
See Example 1.41 for the construction in action.

Because every FA is also an NFA, we have proven:
Corollary 1.40: A language is regular if and only if it is
accepted by a nondeterministic finite automaton.
Closure under Regular Operations
With NFA it is much simpler to prove the various closure
properties of the regular languages.
We will show the closure of regular languages under the
union, concatenation and star * operation by construction.
Theorem 1.45: RLs are closed under the union operation.
Given NFAs $N_1$ and $N_2$ that accept $L_1$ and $L_2$, make an
NFA N (using $N_1$ and $N_2$) that accepts $L_1 \cup L_2$.
Theorem 1.47: RLs are closed under concatenation.
Make an NFA N that accepts $L_1 \circ L_2$.
Theorem 1.49: RLs are closed under the star * operation.
Make an NFA N that accepts $(L_1)^*$.
Union Closure
Construction of Theorem 1.45: Given two NFAs $N_1$ and $N_2$,
put them in parallel to recognize the language $L(N_1) \cup L(N_2)$:
[Figure: $N_1$ and $N_2$ placed in parallel]
Concatenation Closure
Theorem 1.47: Given two NFAs $N_1$ and $N_2$, put them
in sequence to recognize the concatenation $L(N_1) \circ L(N_2)$:
[Figure: $N_1$ followed by $N_2$]
Star Operation Closure
Construction of Theorem 1.49: Given an NFA $N_1$, make a
loop to recognize the language $L(N_1)^*$:
[Figure: $N_1$ with a loop back to its start]
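The three constructions can be sketched in Python on a small NFA representation. This is an illustration under assumptions that are not in the notes: states are renamed with a tag so that $N_1$ and $N_2$ stay disjoint, `""` labels ε-arrows, and the concrete wiring (a new start state with ε-arrows for union, ε-arrows from the accept states of $N_1$ to the start of $N_2$ for concatenation, and a new accepting start state plus ε-arrows back to the old start for star) follows the standard textbook constructions. All helper names are hypothetical.

```python
from dataclasses import dataclass

EPS = ""  # assumption: the empty string labels ε-arrows


@dataclass
class NFA:
    delta: dict   # (state, symbol) -> set of states
    start: object
    accept: set


def tag(n, t):
    """Rename every state q of n to (t, q), so that different NFAs become disjoint."""
    delta = {((t, q), a): {(t, r) for r in rs} for (q, a), rs in n.delta.items()}
    return NFA(delta, (t, n.start), {(t, q) for q in n.accept})


def merged(*ns):
    """A fresh copy of the (disjoint) transition tables of the given NFAs."""
    return {k: set(v) for n in ns for k, v in n.delta.items()}


def add(delta, q, a, r):
    delta.setdefault((q, a), set()).add(r)


def union(n1, n2):       # Theorem 1.45: N1 and N2 in parallel
    n1, n2 = tag(n1, 1), tag(n2, 2)
    delta = merged(n1, n2)
    add(delta, "new", EPS, n1.start)
    add(delta, "new", EPS, n2.start)
    return NFA(delta, "new", n1.accept | n2.accept)


def concat(n1, n2):      # Theorem 1.47: N1 followed by N2
    n1, n2 = tag(n1, 1), tag(n2, 2)
    delta = merged(n1, n2)
    for f in n1.accept:
        add(delta, f, EPS, n2.start)
    return NFA(delta, n1.start, set(n2.accept))


def star(n1):            # Theorem 1.49: loop back to the start of N1
    n1 = tag(n1, 1)
    delta = merged(n1)
    add(delta, "new", EPS, n1.start)     # new start state, also accepting (for ε)
    for f in n1.accept:
        add(delta, f, EPS, n1.start)
    return NFA(delta, "new", n1.accept | {"new"})


def accepts(n, w):
    def close(states):                   # ε-closure
        seen, todo = set(states), list(states)
        while todo:
            for r in n.delta.get((todo.pop(), EPS), ()):
                if r not in seen:
                    seen.add(r)
                    todo.append(r)
        return seen
    cur = close({n.start})
    for a in w:
        cur = close({r for q in cur for r in n.delta.get((q, a), ())})
    return bool(cur & n.accept)


def symbol(a):           # NFA for the one-symbol language {a}
    return NFA({(0, a): {1}}, 0, {1})


zero, one = symbol("0"), symbol("1")
print(accepts(union(zero, one), "1"))                  # True
print(accepts(concat(zero, one), "01"))                # True
print([accepts(star(concat(zero, one)), w) for w in ["", "0101", "010"]])  # [True, True, False]
```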
Question Time
What about complements?
Regular Expressions (Def. 1.52)
Given an alphabet $\Sigma$, R is a regular expression if:

1. $R = a$, with $a \in \Sigma$
2. $R = \varepsilon$
3. $R = \emptyset$
4. $R = (R_1 \cup R_2)$, with $R_1$ and $R_2$ regular expressions
5. $R = (R_1 \circ R_2)$, with $R_1$ and $R_2$ regular expressions
6. $R = (R_1^*)$, with $R_1$ a regular expression
Reading Regular Expressions
Assume for the moment that $\Sigma = \{a,b,c\}$.
- We allow ourselves to write $\Sigma$ instead of $((a \cup b) \cup c)$.
So, $((\Sigma^*) \circ b)$ stands for the set of strings ending with a b.
- $R^+$ is a shorthand for $RR^*$.
- Just as with multiplication, you can drop the
concatenation symbol ∘: $a \circ b$ equals $ab$.
- Just as with arithmetic you can drop some parentheses.
We have the precedence order: star, concatenation, union.
Hence: $aa^*$ equals $a(a^*)$, which does not equal $(aa)^*$.
Also, $01 \cup 10$ equals $(01) \cup (10)$ and does not equal $0(1 \cup 1)0$.
And $01^*$ equals $0(1^*)$ and not $(01)^*$.
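These precedence rules can be checked mechanically. The sketch below uses Python's `re` module as a stand-in (an assumption, not part of the notes): it writes union as `|`, uses plain juxtaposition for concatenation and `*` for star, with the same precedence (star, then concatenation, then union). It only compares two expressions on all short strings, so it can exhibit a difference but cannot prove equality; the helper name `same_on_short_strings` is hypothetical.

```python
import re
from itertools import product


def same_on_short_strings(r1, r2, alphabet, max_len=6):
    """Do r1 and r2 match exactly the same strings of length <= max_len?"""
    words = ("".join(p) for n in range(max_len + 1) for p in product(alphabet, repeat=n))
    return all(bool(re.fullmatch(r1, w)) == bool(re.fullmatch(r2, w)) for w in words)


print(same_on_short_strings("aa*", "a(a*)", "abc"))       # True
print(same_on_short_strings("aa*", "(aa)*", "abc"))       # False: "a" matches only aa*
print(same_on_short_strings("01|10", "(01)|(10)", "01"))  # True
print(same_on_short_strings("01|10", "0(1|1)0", "01"))    # False
print(same_on_short_strings("01*", "0(1*)", "01"))        # True
print(same_on_short_strings("01*", "(01)*", "01"))        # False: "" matches only (01)*
```

`re.fullmatch` is used rather than `re.search` because a regular expression describes whole strings, not substrings.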
Last Monday
We proved that each Nondeterministic Finite Automaton
can be transformed into a deterministic one: NFA=FA
We proved that NFA-recognized languages are closed
under union, concatenation and the star operation *.
Hence the set of Regular Languages is closed under
these regular operations. [Reader, pp. 29-38]

Another way of describing simple languages is by
Regular Expressions (Def. 1.52), like $a\Sigma^*(b \cup c)$ for strings
that have to start with an a and end with a b or a c.
Languages and RE
A RE R describes a language L(R) in the obvious way:
1. If $R = a$, then $L(R) = \{a\}$
2. If $R = \varepsilon$, then $L(R) = \{\varepsilon\}$
3. If $R = \emptyset$, then $L(R) = \emptyset$
4. If $R = (R_1 \cup R_2)$, then $L(R) = L(R_1) \cup L(R_2)$
5. If $R = (R_1 \circ R_2)$, then $L(R) = L(R_1) \circ L(R_2)$
6. If $R = (R_1^*)$, then $L(R) = (L(R_1))^*$
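The six clauses translate directly into a recursive function. Since L(R) is usually infinite, this minimal sketch (an assumption, not from the notes) only computes the strings of L(R) of length at most n, and represents a regular expression as a nested tuple such as `("star", ("sym", "0"))`; the tuple tags and the name `lang` are hypothetical.

```python
def lang(R, n):
    """The set of strings in L(R) of length <= n, following clauses 1-6."""
    kind = R[0]
    if kind == "sym":                      # 1. L(a) = {a}
        return {R[1]} if n >= 1 else set()
    if kind == "eps":                      # 2. L(ε) = {ε}
        return {""}
    if kind == "empty":                    # 3. L(∅) = ∅
        return set()
    if kind == "union":                    # 4. L(R1) ∪ L(R2)
        return lang(R[1], n) | lang(R[2], n)
    if kind == "concat":                   # 5. L(R1) ∘ L(R2)
        A, B = lang(R[1], n), lang(R[2], n)
        return {u + v for u in A for v in B if len(u) + len(v) <= n}
    if kind == "star":                     # 6. L(R1)*: concatenate any number of words
        A = lang(R[1], n)
        result, frontier = {""}, {""}
        while frontier:
            frontier = {u + v for u in frontier for v in A
                        if v and len(u) + len(v) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node {kind!r}")


zero, one = ("sym", "0"), ("sym", "1")
print(sorted(lang(("concat", zero, ("star", one)), 3)))      # ['0', '01', '011']
any_bits = ("star", ("union", zero, one))                     # (0 ∪ 1)*
blocks = ("star", ("concat", ("star", zero), ("star", one)))  # (0*1*)*
print(lang(any_bits, 5) == lang(blocks, 5))                   # True
```

The last line illustrates that two different expressions can describe the same language (here checked only up to length 5).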
Note that formally there is a difference between the
expression R and the language L(R) that it describes:
$(0^*1^*)^*$ and $(0 \cup 1)^*$ are different expressions,
but they describe the same language.
Some Examples
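Bit strings with at least two 1s: $\{0,1\}^* 1 \{0,1\}^* 1 \{0,1\}^*$
Bit strings with at most two 1s: $0^* \cup 0^*10^* \cup 0^*10^*10^*$
Alternatively: $0^*(1 \cup \varepsilon)0^*(1 \cup \varepsilon)0^*$
$a\emptyset = \emptyset$
$a\varepsilon = a$ (note the difference between $\emptyset$ and $\varepsilon$)
$(\Sigma\Sigma)^*$ (strings of even length)
$(\Sigma\Sigma)^* \cup (\Sigma\Sigma\Sigma)^*$ (strings whose length is even or a multiple of 3)
$\varepsilon \cup (\Sigma\Sigma)\Sigma^*$ (strings of length 0 or of length greater than 1)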
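The bit-string examples above can be sanity-checked by brute force. This sketch assumes Σ = {0,1} and Python's `re` syntax, writing ∪ as `|`, ε as an empty alternative and Σ as the group `(0|1)`; it compares each expression with its English description on every bit string of length at most 8. Such a finite check can catch a wrong expression, but it is not a proof. The names `examples` and `agrees` are hypothetical.

```python
import re
from itertools import product

S = "(0|1)"  # Σ = {0,1} as a Python-regex group (assumption)

# (expression in Python-regex syntax, its description as a predicate on w)
examples = [
    (f"{S}*1{S}*1{S}*", lambda w: w.count("1") >= 2),
    ("0*|0*10*|0*10*10*", lambda w: w.count("1") <= 2),
    ("0*(1|)0*(1|)0*", lambda w: w.count("1") <= 2),
    (f"({S}{S})*", lambda w: len(w) % 2 == 0),
    (f"({S}{S})*|({S}{S}{S})*", lambda w: len(w) % 2 == 0 or len(w) % 3 == 0),
    (f"|({S}{S}){S}*", lambda w: len(w) != 1),
]

words = ["".join(p) for n in range(9) for p in product("01", repeat=n)]


def agrees(regex, describes):
    """Does the expression match exactly the described strings of length <= 8?"""
    return all((re.fullmatch(regex, w) is not None) == describes(w) for w in words)


print(all(agrees(regex, describes) for regex, describes in examples))  # True
```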

Applications of REs
Regular expressions are commonly used when analyzing or
editing text strings. Two common examples:
The grep command in UNIX and LINUX
Use man grep to see how it works
String processing in PERL
See Wikipedia's Perl regular expression examples

There is a good reason why we use regular expressions
for this kind of pattern matching, which we want to do fast.
Thm 1.54: RL = RE
As the names suggest, the following result holds:
Theorem 1.54: A language is regular if and only if
some regular expression describes it.

Lemma 1.55: If a language is described by a regular
expression, then it is regular. This is relatively easy,
using the closure properties of RLs that we proved.

Lemma 1.60: If a language is regular, then it can be
described by a regular expression. This is harder to
prove and requires the definition of Generalized
Nondeterministic Finite Automata (GNFA).
Proof of Lemma 1.55
Given a regular expression R, construct (by structural
induction on R) an NFA N such that L(R) = L(N):
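1. If $R = a$ with $L(R) = \{a\}$, then
[Figure: NFA for $a$, a single arrow labelled a from the start state to an accept state]
2. If $R = \varepsilon$ with $L(R) = \{\varepsilon\}$, then
[Figure: NFA for $\varepsilon$]
3. If $R = \emptyset$ with $L(R) = \emptyset$, then
[Figure: NFA for $\emptyset$]
4. If $R = (R_1 \cup R_2)$, then $L(R) = L(R_1) \cup L(R_2)$
5. If $R = (R_1 \circ R_2)$, then $L(R) = L(R_1) \circ L(R_2)$
6. If $R = (R_1^*)$, then $L(R) = (L(R_1))^*$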
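The structural induction can be written as one recursive function. This is a sketch under assumptions that are not in the notes: a regular expression is a nested tuple such as `("union", ("sym", "0"), ("sym", "1"))`, states are fresh integers, `""` labels ε-arrows, the NFAs for ε and ∅ are the usual one-state automata, and cases 4 to 6 use the usual union, concatenation and star wiring from the closure proofs (Theorems 1.45, 1.47, 1.49). The names `build` and `re_accepts` are hypothetical.

```python
import itertools

EPS = ""  # assumption: the empty string labels ε-arrows


def build(R, delta, fresh):
    """Add arrows for an NFA recognizing L(R) to delta; return (start, accept set)."""
    def arrow(q, a, r):
        delta.setdefault((q, a), set()).add(r)

    kind = R[0]
    if kind == "sym":                    # 1. start --a--> accept
        s, f = next(fresh), next(fresh)
        arrow(s, R[1], f)
        return s, {f}
    if kind == "eps":                    # 2. one state, both start and accept
        s = next(fresh)
        return s, {s}
    if kind == "empty":                  # 3. one start state, no accept states
        return next(fresh), set()
    if kind == "union":                  # 4. new start with ε-arrows to both parts
        s1, f1 = build(R[1], delta, fresh)
        s2, f2 = build(R[2], delta, fresh)
        s = next(fresh)
        arrow(s, EPS, s1)
        arrow(s, EPS, s2)
        return s, f1 | f2
    if kind == "concat":                 # 5. ε-arrows from accepts of R1 to start of R2
        s1, f1 = build(R[1], delta, fresh)
        s2, f2 = build(R[2], delta, fresh)
        for f in f1:
            arrow(f, EPS, s2)
        return s1, f2
    if kind == "star":                   # 6. new accepting start, ε-arrows back to s1
        s1, f1 = build(R[1], delta, fresh)
        s = next(fresh)
        arrow(s, EPS, s1)
        for f in f1:
            arrow(f, EPS, s1)
        return s, f1 | {s}
    raise ValueError(f"unknown node {kind!r}")


def re_accepts(R, w):
    delta = {}
    start, accept = build(R, delta, itertools.count())

    def close(states):                   # ε-closure
        seen, todo = set(states), list(states)
        while todo:
            for r in delta.get((todo.pop(), EPS), ()):
                if r not in seen:
                    seen.add(r)
                    todo.append(r)
        return seen

    cur = close({start})
    for a in w:
        cur = close({r for q in cur for r in delta.get((q, a), ())})
    return bool(cur & accept)


ends_in_1 = ("concat", ("star", ("union", ("sym", "0"), ("sym", "1"))), ("sym", "1"))
print(re_accepts(ends_in_1, "0101"), re_accepts(ends_in_1, "0110"))  # True False
```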
Proof Lemma 1.60
If a language is regular, then it is described by a RE.
Proof Idea: Use generalized nondeterministic finite
automata where the labels on the transition arrows are
allowed to be REs (elements of $\mathcal{R}$).
For an $R \in \mathcal{R}$ this GNFA recognizes L(R):
[Figure: $q_{\text{start}} \xrightarrow{R} q_{\text{accept}}$]
We have to prove that for each regular language L,
the corresponding FA M can be transformed into
a GNFA with only two states like the one above.
We do this by removing the internal states of a GNFA
one-by-one until we are left with a GNFA that has only
one start state, one accept state and no loops.
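Example GNFA
[Figure: example GNFA with start state $q_S$ and accept state $q_A$; the arrows carry regular expressions such as $01^*$ and $\emptyset$]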
Generalized NFA
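Def. 1.64: A generalized nondeterministic finite automaton
(GNFA) is defined by $M = (Q, \Sigma, \delta, q_{\text{start}}, q_{\text{accept}})$ with
Q a finite set of states
$\Sigma$ the input alphabet
$q_{\text{start}}$ the start state
$q_{\text{accept}}$ the accept state
$\delta: (Q \setminus \{q_{\text{accept}}\}) \times (Q \setminus \{q_{\text{start}}\}) \to \mathcal{R}$ the transition function
($\mathcal{R}$ is the set of regular expressions over $\Sigma$)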
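Characteristics of GNFAs
$\delta: (Q \setminus \{q_{\text{accept}}\}) \times (Q \setminus \{q_{\text{start}}\}) \to \mathcal{R}$
The interior $Q \setminus \{q_{\text{accept}}, q_{\text{start}}\}$ is fully connected by $\delta$
From $q_{\text{start}}$ we have only outgoing transitions
To $q_{\text{accept}}$ we have only incoming transitions
Impossible $q_i \to q_j$ transitions are $\delta(q_i,q_j) = \emptyset$
Observation: This GNFA, $q_S \xrightarrow{R \in \mathcal{R}} q_A$,
recognizes the language L(R)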
Proof Idea of Lemma 1.60
Proof idea (given a DFA M):
- Construct an equivalent GNFA M' with k > 2 states
- Reduce one-by-one the internal states until k = 2
- This GNFA M' will be of the form $q_S \xrightarrow{R} q_A$
- This regular expression R will be such that L(R) = L(M)
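DFA M → Equivalent GNFA M'
Let M have k states $Q = \{q_1, \dots, q_k\}$
- Add two states $q_{\text{accept}}$ and $q_{\text{start}}$
- Connect $q_{\text{start}}$ to earlier $q_1$: $q_{\text{start}} \xrightarrow{\varepsilon} q_1$
- Complete missing transitions by $q_i \xrightarrow{\emptyset} q_j$
- Connect old accepting states to $q_{\text{accept}}$: $q_j \xrightarrow{\varepsilon} q_{\text{accept}}$
- Join multiple transitions: $q_i \xrightarrow{0} q_j$ and $q_i \xrightarrow{1} q_j$
becomes $q_i \xrightarrow{0 \cup 1} q_j$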
Remove Internal state of GNFA
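If the GNFA M has more than 2 states, rip an
internal $q_{\text{rip}}$ to get an equivalent GNFA M' by:
- Removing state $q_{\text{rip}}$: $Q' = Q \setminus \{q_{\text{rip}}\}$
- Changing the transition function $\delta$ by
$$\delta'(q_i,q_j) = \delta(q_i,q_j) \cup \big(\delta(q_i,q_{\text{rip}})\,(\delta(q_{\text{rip}},q_{\text{rip}}))^*\,\delta(q_{\text{rip}},q_j)\big)$$
for every $q_i \in Q' \setminus \{q_{\text{accept}}\}$ and $q_j \in Q' \setminus \{q_{\text{start}}\}$
[Figure: the arrows $q_i \xrightarrow{R_1} q_{\text{rip}}$, a loop $R_2$ on $q_{\text{rip}}$, $q_{\text{rip}} \xrightarrow{R_3} q_j$ and $q_i \xrightarrow{R_4} q_j$ are replaced by the single arrow $q_i \xrightarrow{R_4 \cup (R_1 R_2^* R_3)} q_j$]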
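The whole procedure (build the GNFA, then rip out the internal states one at a time with the rule above) fits in a short Python sketch. Assumptions that are not in the notes: regular expressions are Python-regex strings so that the result can be tested with `re.fullmatch`, `None` stands for ∅ and `""` for ε, the two new states are named `"q_start"` and `"q_accept"` (hypothetical names assumed not to clash with the DFA's states), and the DFA is given as a complete dict `delta[(q, a)]`.

```python
import re
from itertools import product

# Regular expressions as Python-regex strings; None = ∅ and "" = ε (assumption).


def union(r, s):
    if r is None:
        return s
    if s is None:
        return r
    return f"({r}|{s})"


def concat(r, s):
    if r is None or s is None:           # R∅ = ∅R = ∅
        return None
    if r == "":                          # εR = R
        return s
    if s == "":
        return r
    return r + s


def star(r):
    if r is None or r == "":             # ∅* = ε* = ε
        return ""
    return f"({r})*"


def dfa_to_regex(states, alphabet, delta, start, accept):
    S, A = "q_start", "q_accept"         # hypothetical names for the two new states
    label = {}                           # label[(i, j)] is the RE on arrow i -> j; missing = ∅

    def join(i, j, r):                   # parallel arrows are joined with ∪
        label[(i, j)] = union(label.get((i, j)), r)

    join(S, start, "")                   # q_start --ε--> old start state
    for q in accept:
        join(q, A, "")                   # old accept states --ε--> q_accept
    for q in states:
        for a in alphabet:
            join(q, delta[(q, a)], re.escape(a))

    remaining = [S, *states, A]
    for rip in states:                   # rip out the k internal states one by one
        remaining.remove(rip)
        loop = star(label.get((rip, rip)))
        for i in remaining:
            if i == A:
                continue
            for j in remaining:
                if j == S:
                    continue
                # δ'(qi,qj) = δ(qi,qj) ∪ δ(qi,qrip) δ(qrip,qrip)* δ(qrip,qj)
                via = concat(concat(label.get((i, rip)), loop), label.get((rip, j)))
                join(i, j, via)
    return label.get((S, A))             # None means L(M) = ∅


# Hypothetical DFA over {0,1}: accepts strings with an even number of 1s.
even = {(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 0}
rx = dfa_to_regex([0, 1], "01", even, 0, {0})
print(rx)
words = ["".join(p) for n in range(9) for p in product("01", repeat=n)]
print(all((re.fullmatch(rx, w) is not None) == (w.count("1") % 2 == 0) for w in words))  # True
```

The resulting expression is correct but not minimal; state elimination tends to produce long expressions, and the order in which states are ripped changes their shape.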
Recap Proof Lemma 1.60
1. Let M be a DFA with k states.
2. Create an equivalent GNFA M' with k+2 states.
3. Reduce M' in k steps to a GNFA M'' with 2 states.
4. The resulting GNFA describes a RE R.
5. The language L(M) equals L(R).
Ingredients Theorem 1.54 RL=RE:
Lemma 1.55: Let R be a regular expression,
then there exists an NFA M such that L(R) = L(M).
Lemma 1.60: The language L(M) of a DFA M equals
the language L(M') of a GNFA M', which equals that of a
two-state GNFA M'' = $q_{\text{start}} \xrightarrow{R} q_{\text{accept}}$
such that L(R) = L(M'').
Hence: RE ⊆ NFA = DFA ⊆ GNFA ⊆ RE
Usefulness of RE
The fact that regular expressions can be recognized
(matched) by deterministic FA is very useful as this is fast.

Consider a RE like ((011)**00 ()*)* that you want
to search for in a very long binary file.
There seem to be too many options to do this efficiently
(where in the file? plus the nondeterministic operations ∪ and *).
Solution: Create a deterministic FA that accepts the
corresponding language L and use it on the file?
Actually, create a FA that accepts $\Sigma^* L \Sigma^*$ and use it.
Complexity ~ $\text{length(file)} + 2^{\text{length(regular expression)}}$.
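One way to realize this idea is to run an automaton for $\Sigma^* L$ over the file in a single pass: reaching an accept state at position i means that some substring ending at position i lies in L, which is exactly when the prefix read so far lies in $\Sigma^* L \Sigma^*$. The sketch below is an illustration, not the notes' method: it assumes an NFA for L without ε-arrows and without an accepting start state (so empty matches are not reported), and it builds the subset DFA lazily, caching only the subsets the text actually reaches. The name `match_ends` and the example NFA are hypothetical.

```python
def match_ends(delta, start, accept, text):
    """Positions i such that some substring ending at text[i-1] lies in L(N).

    Runs an automaton for Σ*L: before each symbol a fresh finger is put on the
    start state (the Σ* part). The subset states are cached, so this is a DFA
    for Σ*L that is built lazily, only for the subsets the text actually reaches.
    """
    cache = {}                  # (subset, symbol) -> next subset
    R = frozenset()
    ends = []
    for i, a in enumerate(text):
        if (R, a) not in cache:
            src = R | {start}   # a match may begin at this position
            cache[(R, a)] = frozenset(q for r in src for q in delta.get((r, a), ()))
        R = cache[(R, a)]
        if R & accept:
            ends.append(i + 1)  # some text[j:i+1] is in L
    return ends


# Hypothetical NFA (without ε-arrows) for L = 01*0.
nfa = {(0, "0"): {1}, (1, "1"): {1}, (1, "0"): {2}}
print(match_ends(nfa, 0, {2}, "1101100110"))  # [6, 7, 10]
```

Once the cache is warm, each symbol of the file costs one dictionary lookup, while the number of cached subsets can in the worst case grow exponentially in the size of the NFA, matching the two terms of the complexity estimate.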

Formalities
Next Friday: Midterm on Automata: The Methods and the
Madness and Regular Languages [pp. 1-56, Reader]

Note that this week's material will be part of the Midterm,
although you will not have had it as homework.

Questions?
1.4: Nonregular Languages
What languages can not be recognized by finite automata?
How to prove that a language is nonregular?

Example: $L = \{ 0^n1^n \mid n \in \mathbb{N} \}$
Because DFA = NFA = GNFA, it is sufficient to prove
that the language cannot be accepted by a DFA.
Playing around with DFAs convinces you that the
finiteness of a DFA is problematic for all $n \in \mathbb{N}$.
The problem occurs between the $0^n$ and the $1^n$.

Informal observation: the memory of a FA is limited
by the number of states |Q|.
Repeating DFA Paths
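[Figure: a computational path from $q_1$ to $q_k$ that visits $q_j$ twice]
Consider an accepting DFA M with size |Q|.
On a string of length p, p+1 states get visited.
For $p \ge |Q|$ we have $p+1 > |Q|$, so there must be a j such that the deterministic
computational path looks like: $q_1, \dots, q_j, \dots, q_j, \dots, q_k$.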
Repeating DFA Paths
The action of the DFA in $q_j$ is always the same.
If we repeat (or ignore) the $q_j, \dots, q_j$ part, the new
path will again be an accepting path.
Line of Reasoning
If we want to prove that a language L is nonregular,
we can use the following proof by contradiction technique:
Assume that L is regular.
Hence, there is a DFA M that recognizes L.
For strings of length > |Q| the DFA M has to repeat itself.
Show that M will accept strings outside L.
Conclude that the assumption was wrong.
Note that we use the simple DFA, not the
more elaborate (but equivalent) NFA or GNFA.
Thm 1.70: Pumping Lemma
For every regular language L, there is a finite
pumping length p, such that for any string $s \in L$
with $|s| \ge p$, we can write $s = xyz$ with:

1) $xy^iz \in L$ for every $i \in \{0,1,2,\dots\}$
2) $|y| \ge 1$
3) $|xy| \le p$

Note that: (1) implies that $xz \in L$, (2) says that y can
not be the empty string $\varepsilon$, (3) is not always used.
This is a lemma about regular languages.
Formal Proof of Pumping Lemma
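Let $M = (Q, \Sigma, \delta, q_1, F)$ with $Q = \{q_1, \dots, q_p\}$.
Let $s = s_1 \cdots s_n \in L(M)$ with $|s| = n \ge p$.
The computational path of M on s is the sequence $r_1 \cdots r_{n+1} \in Q^{n+1}$
with $r_1 = q_1$, $r_{n+1} \in F$ and $r_{t+1} = \delta(r_t, s_t)$ for $1 \le t \le n$.
Because $n+1 \ge p+1$, the first $p+1$ states $r_1, \dots, r_{p+1}$ of the path exist, and as
there are only p different states, there have to be two states $r_j$ and $r_k$
such that $r_j = q_i = r_k$ (with $1 \le j < k \le p+1$).
Let $x = s_1 \cdots s_{j-1}$, $y = s_j \cdots s_{k-1}$, and $z = s_k \cdots s_n$.
The string x takes M from $q_1 = r_1$ to $r_j$, the string y takes M
from $r_j$ to $r_k = r_j$, and the string z takes M from $r_j$ to $r_{n+1} \in F$.
As a result: $xy^iz$ takes M from $q_1$ to $r_{n+1} \in F$ (for all $i \ge 0$).
Moreover $|y| = k - j \ge 1$ and $|xy| = k - 1 \le p$, so conditions 2) and 3) hold as well.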
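The proof is constructive, so the split s = xyz can actually be computed from a DFA. A sketch assuming a complete DFA given as a dict `delta[(q, a)]`; the helper names and the example DFA are hypothetical, and `p` is the number of states, as in the proof. Python indices start at 0, so `path[t]` plays the role of $r_{t+1}$.

```python
from itertools import product


def run(delta, start, w):
    """The computational path r_1, ..., r_{n+1} of the DFA on w."""
    path = [start]
    for a in w:
        path.append(delta[(path[-1], a)])
    return path


def pumping_split(delta, start, s, p):
    """Split s = xyz at the first repeated state among r_1, ..., r_{p+1}."""
    path = run(delta, start, s)
    first_index = {}
    for t, q in enumerate(path[: p + 1]):
        if q in first_index:              # r_j = r_k with j < k <= p + 1
            j, k = first_index[q], t
            return s[:j], s[j:k], s[k:]
        first_index[q] = t
    raise ValueError("s must have length at least p")


# Hypothetical 2-state DFA (p = 2): strings with an even number of 1s.
even = {(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 0}
print(pumping_split(even, 0, "0110", 2))  # ('', '0', '110')

# Check the three conditions on every accepted string of length 2..7.
ok = True
for n in range(2, 8):
    for t in product("01", repeat=n):
        s = "".join(t)
        if run(even, 0, s)[-1] == 0:
            x, y, z = pumping_split(even, 0, s, 2)
            ok &= len(y) >= 1 and len(x + y) <= 2
            ok &= all(run(even, 0, x + y * i + z)[-1] == 0 for i in range(5))
print(ok)  # True
```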
Pumping $0^n1^n$ (Ex. 1.73)
Assume that $B = \{0^n1^n \mid n \ge 0\}$ is regular.
Let p be the pumping length, and $s = 0^p1^p \in B$.
Pumping Lemma: $s = xyz = 0^p1^p$ with $xy^iz \in B$ for all $i \ge 0$.
Three options for y:
1) $y = 0^k$, hence $xyyz = 0^{p+k}1^p \notin B$
2) $y = 1^k$, hence $xyyz = 0^p1^{p+k} \notin B$
3) $y = 0^k1^l$, hence $xyyz = 0^p1^l0^k1^p \notin B$

Conclusion: The pumping lemma does not hold,
hence the language B is not regular.
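The case analysis can be double-checked by brute force for small pumping lengths. This is only an illustration (a finite check says nothing about all p): it tries every split s = xyz with |y| ≥ 1, ignoring condition 3), and confirms that xyyz always falls outside B. The names `in_B` and `splits` are hypothetical.

```python
def in_B(w):
    """Is w of the form 0^n 1^n?"""
    n = len(w) // 2
    return w == "0" * n + "1" * n


def splits(s):
    """All ways to write s = xyz with |y| >= 1."""
    for j in range(len(s) + 1):
        for k in range(j + 1, len(s) + 1):
            yield s[:j], s[j:k], s[k:]


for p in range(1, 9):
    s = "0" * p + "1" * p
    assert all(not in_B(x + y + y + z) for x, y, z in splits(s))
print("xyyz leaves B for every split, p = 1..8")
```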
$F = \{ ww \mid w \in \{0,1\}^* \}$ (Ex. 1.75)
Let p be the pumping length, and take $s = 0^p10^p1$.
Let $s = xyz = 0^p10^p1$ with condition 3) $|xy| \le p$.
Only one option: $y = 0^k$, with $xyyz = 0^{p+k}10^p1 \notin F$.

Without 3) this would have been a pain.
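A brute-force illustration of why condition 3) helps, for small p only (not a proof). With |xy| ≤ p every split puts y inside the first block of 0s and xyyz leaves F; without condition 3) the split x = z = ε, y = s pumps harmlessly, since $s = ww$ with $w = 0^p1$ gives $s^i = (w^i)(w^i) \in F$. The name `in_F` is hypothetical.

```python
def in_F(w):
    """Is w of the form ww?"""
    half = len(w) // 2
    return len(w) % 2 == 0 and w[:half] == w[half:]


for p in range(1, 9):
    s = "0" * p + "1" + "0" * p + "1"
    # With 3) |xy| <= p, y lies inside the first block of 0s: y = 0^k.
    for j in range(p + 1):
        for k in range(j + 1, p + 1):
            x, y, z = s[:j], s[j:k], s[k:]
            assert set(y) == {"0"} and not in_F(x + y + y + z)
    # Without 3), x = z = "" and y = s stays in F for every i.
    assert all(in_F(s * i) for i in range(5))
print("checked p = 1..8")
```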
Intersecting Regular Languages
Example 1.74: Let C = { w | # of 0s in w = # of 1s in w }.
Problem: If $xyz \in C$ with $y \in C$, then $xy^iz \in C$.
Subversive Idea: If C is regular and F is regular,
then the intersection $C \cap F$ has to be regular as well.
Proof by Contradiction: Assume that C is regular.
Take the regular language $F = \{ 0^n1^m \mid n,m \in \mathbb{N} \}$ such that the
intersection $C \cap F = \{ 0^n1^n \mid n \in \mathbb{N} \}$ has to be regular as well.
But we know that $C \cap F$ is not regular.
Conclusion: C is not regular.
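The key identity $C \cap F = \{0^n1^n\}$ is easy to confirm on short strings, and the same code shows the problem mentioned above: with $y = 01 \in C$, pumping never leaves C. A finite sanity check on bit strings up to length 10, not a proof; the predicate names are hypothetical, and F is written as the regular expression `0*1*`.

```python
import re
from itertools import product


def in_C(w):                      # as many 0s as 1s
    return w.count("0") == w.count("1")


def in_F(w):                      # F = { 0^n 1^m }, i.e. the regular expression 0*1*
    return re.fullmatch("0*1*", w) is not None


def in_B(w):                      # { 0^n 1^n }
    n = len(w) // 2
    return w == "0" * n + "1" * n


words = ["".join(t) for n in range(11) for t in product("01", repeat=n)]
print(all((in_C(w) and in_F(w)) == in_B(w) for w in words))   # True

# The problem: for s = 0^p 1^p, the split x = 0^(p-1), y = 01, z = 1^(p-1) pumps inside C.
p = 5
x, y, z = "0" * (p - 1), "01", "1" * (p - 1)
print(all(in_C(x + y * i + z) for i in range(6)))              # True
```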
Pumping Down $E = \{ 0^i1^j \mid i \ge j \}$
Problem: pumping up $s = 0^p1^p$ with $y = 0^k$ gives
$xyyz = 0^{p+k}1^p$, $xy^3z = 0^{p+2k}1^p$, which are all in E
(hence do not give contradictions).
Solution: pump down to $xz = 0^{p-k}1^p$.
Overall for $s = xyz = 0^p1^p$ (with $|xy| \le p$):
we have $y = 0^k$ with $k > 0$, hence $xz = 0^{p-k}1^p \notin E$.

Contradiction: E is not regular.
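A small brute-force illustration of pumping down, reading the condition in E as i ≥ j (so that $s = 0^p1^p$ is a member of E). For every split with |xy| ≤ p and |y| ≥ 1, pumping up stays inside E while pumping down leaves it. This checks p = 1..8 only and is not a proof; the name `in_E` is hypothetical.

```python
def in_E(w):
    """Is w of the form 0^i 1^j with i >= j?"""
    i = len(w) - len(w.lstrip("0"))       # number of leading 0s
    rest = w[i:]
    return set(rest) <= {"1"} and i >= len(rest)


for p in range(1, 9):
    s = "0" * p + "1" * p
    for j in range(p + 1):                # all splits with |xy| <= p and |y| >= 1
        for k in range(j + 1, p + 1):
            x, y, z = s[:j], s[j:k], s[k:]
            assert all(in_E(x + y * i + z) for i in range(1, 5))   # pumping up stays in E
            assert not in_E(x + z)                                  # pumping down leaves E
print("pumping down gives the contradiction for p = 1..8")
```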
End of Regular Languages
