Вы находитесь на странице: 1из 13

CLOSURE PROPERTIES OF CONTEXT FREE LANGUAGES

We now consider some closure operations regarding CFLs.

These operations are not only useful in constructing or proving that certain languages
are context free (CF), but also in proving that certain languages are not CF. A given
language L can be shown not to be CF by constructing from L a language that is not CF
using only operations preserving CFLs.

THEOREM
CFLs are closed under union, concatenation and Kleene star.

Proof Idea:
The proof is constructive using a technique we have already exercised several times.
Assume that grammars G1 = (N1, , S1, P1) and G2 = (N2, , S2, P2) generate context free
languages L1 and L2 respectively. Then, after making sure that N1  N2 = , the
following grammars generate the corresponding union, concatenation and star
respectively:
Union: Gu = (Nu, , Su, Pu) with
Nu = N1  N2  {Su} and
Pu = P1  P2  {Su  S1 | S2}

Concatenation: Gc = (Nc, , Sc, Pc) with


Nc = N1  N2  {Sc} and
Pc = P1  P2  {Sc  S1S2}

Star of, say, L1: Gs = (Ns, , Ss, Ps) with


Ns = N1  {Ss} and
Ps = P1  {Ss  S1Ss | }

THEOREM (EXERCISE 2.2)


CFLs are not closed under intersection

Proof:
Consider the following CFLs: A = {ambncn | m, n  0} and B = { anbncm | m, n  0} given by
the CFGs:
A: S  RT B: S  TR
R  aR |  R  aTb | 
T  bTc |  T  cR | 

We observe that: L = A  B = {anbncn | n  0} which is not a CFL (proven later using the
pumping lemma for CFLs). This shows that the class of CFLs is not closed under
intersection.
To see why L = A  B = {anbncn | n  0}, observe that A requires that there be the same
number of c’s and b’s, while B requires the number of a’s and b’s to be equal. A string in
both languages must have equal number of all three symbols and thus must be in L.

If the CFLs were closed under intersection, then we could prove the false statement that
L is CF. We conclude by contradiction that the CFLs are not closed under intersection.

THEOREM
CFLs are not closed under complementation

Proof:
Hint: Use Morgan’s law (AB) c = (Ac)(Bc).

THEOREM: Closure under regular intersection.


The intersection of a CFL and a Regular Language is a CFL.

Proof:
Let L1 be a CFL and L2 be a Regular Language.
Let M1 = (S, , ,  1, q0, $, F1) be a NPDA that accepts L1 and M2 = (P, ,  2, p0, F2) be a
DFA that accepts L2. (We’ll call si the elements of S and pi the elements of P).
We construct a PDA M = (Q, , , , q, $, F) that simulates the parallel action of M1 and
M2. Whenever a symbol is read from the input string, M simultaneously executes the
moves of M1 and M2. To this end we let
Q=SxP
q = (s0, p0)
F = F1 x F2

And define  such that


((sk, pn), x)  ((si, pj), a, b)
if and only if
(sk, x)   1(si, a, b) and  2(pj, a) = pn

In this we also require that if a = , then pj = pn.

In other words, the states of M are labeled with pairs (si, pj), representing the respective
states of M1 and M2 in which M1 and M2 can be after reading a certain input string. The
rest is a straightforward induction to show that

((s0,p0) w, z)  ((sr, pm, , x) with sr  F1 and pm  F2

if and only if

(s0, w, z)  (sr, , x) and *(p0, w) = pm

Therefore, a string is accepted by M if and only if it is accepted by M1 and M2, that is, it
is in the intersection of L1 = L(M1) with L2 = L(M2).
Example (8.7 in Linz)
Show that the language L = {anbn : n  0, n  100} is CF.

Solution:
Let L1 = {a100b100}, which is a regular language (finite). Also, (L1)c is regular. It is easy to see that

L = {anbn : n  0}  (L1)c

and by the previous theorem L is CF.

THEOREM
CFL are closed under reversal, that is, if L is a CFL, then so is LR.

Proof:
Let L = L(G) for some for some CFL G = (V, T, P, S).
Construct GR = (V, T, PR, S), where PR is the “reverse” of each production in P.
That is, if A   is a production in G, then A  R is a production of GR.
Induction on the lengths of derivations in G and GR shows that L(GR) = LR.
Essentially, all the sentential forms of GR are reverses of sentential forms of G, and vice-
versa. (Exercise: complete the formal proof).

THEOREM
Regular languages and CFLs are closed under homomorphisms and inverse
homomorphisms.

Proof: TBD after homomorphisms are discussed.


NON-CONTEXT-FREE LANGUAGES

In this section, we present a technique for proving that certain languages are not context
free.

(Recall that earlier we introduced the pumping lemma for showing that certain languages
are not regular. Here we present a similar pumping lemma for context-free languages).

It states that every context-free language has a special value called the pumping length such
that all longer strings in the language can be "pumped."

This time the meaning of pumped is a bit more complex. It means that the string can be
divided into five parts so that the second and the fourth parts may be repeated together any
number of times and the resulting string still remains in the language.
THE PUMPING LEMMA FOR CONTEX FREE LANGUAGES

THEOREM 2.19
Pumping Lemma for Context Free Languages. If A is a context free language, then there
is a number m (the pumping length) where, if s is any string in A of length at least m, then s
may be divided into five pieces s = uvxyz satisfying the conditions:

1. For each i  0, uvixyiz  A,


2. |vy| > 0, and
3. |vxy|  m.

When s is being divided into uvxyz, condition 2 says that either v or y is not the empty
string. Otherwise the theorem would be trivially true.

Condition 3 states that the pieces v, x, and y together have length at most m. This technical
condition sometimes is useful in proving that certain languages are not context free.

PROOF IDEA
Let A be a CFL and let G be a CFG that generates it. We must show that any sufficiently
long string s in A can be pumped and remain in A. The idea behind this approach is simple.

Let s be a very long string in A. (We make clear later what we mean by "very long.")

Because s is in A, it is derivable from G and so has a parse tree. The parse tree for s must
be very tall because s is very long. That is, the parse tree must contain some long path from
the start variable at the root of the tree to one of the terminal symbols at a leaf.

On this long path, some variable symbol R must repeat because of the pigeonhole principle.

As the following figure shows, this repetition allows us to replace the subtree under the
second occurrence of R with the subtree under the first occurrence of R and still get a legal
parse tree.

Therefore, we may cut s into five pieces uvxyz as the figure indicates, and we may repeat
the second and fourth pieces and obtain a string still in the language. In other words,
uvixyiz is in A for any i  0.

Let's now turn to the details to obtain all three conditions of the pumping lemma. We
also show how to calculate the pumping length m.
u v x y z

v x y

FIGURE 2.16 Surgery on parse tree

7­6
PROOF
Let G be a CFG for CFL A. Let b be the maximum number of symbols in the right-hand side
of a rule. We may assume that b  2. In any parse tree using this grammar we know that a
node can have no more than b children. In other words, at most b leaves are 1 step from the
start variable; at most b2 leaves are at most 2 steps from the start variable; and at most bh
leaves are at most h steps from the start variable. So, if the height of the parse tree is at most
h, the length of the string generated is at most bh.

Let |V| be the number of variables in G. We set m to be b|V|+2. Because b  2, we know that m > b|
V|+1
, so a parse tree for any string in A of length at least m requires height at least |V| + 2.

Suppose that s is a string in A of length at least m. We now show how to pump s.


If s has several parse trees, we choose  to be a parse tree that has the smallest number of
nodes. As |s|  m, we know that  has height at least |V| + 2, so the longest path in  has length
at least |V| + 2.

Let r be a parse tree for s. If s has several parse trees, we choose  to be a parse tree that has
the smallest number of nodes. As |s|  m, we know that  has height at least |V| + 2, so the
longest path in  has length at least |V| + 2.

Suppose that s is a string in A of length at least m. We now show how to pump s. Let r be a
parse tree for s. If s has several parse trees, we choose  to be a parse tree that has the
smallest number of nodes. As |s|  m, we know that  has height at least |V| + 2, so the longest
path in  has length at least |V|+2.

This path must have at least |V| + 1 variables because only the leaf is a terminal. With G having 
only |V| variables, some variable R appears more than once on the path. For convenience 
later, we select R to be a variable that repeats among the lowest |V| + 1 variables on this path.

We divide a into uvxyz according to Figure 2.16. Each occurrence of R has a subtree under it,
generating a part of the string s. The upper occurrence of R has a larger subtree and
generates vxy, whereas the lower occurrence generates just x with a smaller subtree. Both of
these subtrees are generated by the same variable, so we may substitute one for the other and
still obtain a valid parse tree.

Replacing the smaller by the larger repeatedly gives parse trees for the strings uvixyiz at
each i > 1. Replacing the larger by the smaller generates the string uxz. That establishes
condition 1 of the lemma. We now turn to conditions 2 and 3.

To get condition 2 we must be sure that both v and y are not . If they were, the parse tree
obtained by substituting the smaller subtree for the larger would have fewer nodes than does
and would still generate s. This result isn't possible because we had already chosen  to be a

7
parse tree for s with the smallest number of nodes. That is the reason for selecting  in this
way.

In order to get condition 3 we need to be sure that vxy has length at most m. In the parse
tree for s the upper occurrence of R generates vxy. We chose R so that both occurrences fall
within the bottom |V| +1 variables on the path, and we chose the longest path in the parse tree,
so the subtree where R generates vxy is at most |V| +2 high. A tree of this height can generate a
string of length at most b|V|+2 = m. ***

For some tips on using the pumping lemma to prove that languages are not context free,
review the pages where we discuss the related problem of proving nonregularity with the
pumping lemma for regular languages.

8
EXAMPLE 2.20
Use the pumping lemma to show that the language B = {anbncn | n  0} is not context free.

We assume that B is a CFL and obtain a contradiction.

Let m be the pumping length for B that is guaranteed to exist by the pumping lemma.

Select the string s = ambmcm  |s| = m+m+m = 3m  m (Clearly s is a member of B and of


length at least m).

The pumping lemma states that s can be pumped, but we show that it cannot.

In other words, we show that no matter how we divide s into uvxyz, one of the three conditions
of the lemma is violated.

First, condition 2 stipulates that either v or y is nonempty. Then we consider one of two cases,
depending on whether substrings v and y contain more than one type of alphabet symbol.

1. When both v and y contain only one type of alphabet symbol, v does not contain both a's
and b's or both b's and c's, and the same holds for y.

In this case, the string uv2xy2z cannot contain equal numbers of a's, b's, and c's. Therefore,
it cannot be a member of B. That violates condition 1 of the lemma and is thus a
contradiction.

2. When either v or y contain more than one type of symbol uv2xy2z may contain equal
numbers of the three alphabet symbols but won't contain them in the correct order. Hence
it cannot be a member of B and a contradiction occurs.

One of these cases must occur. Because both cases result in a contradiction, a contradiction is
unavoidable. So, the assumption that B is a CFL must be false. Thus, we have proved that B is
not a CFL.

YET ANOTHER PROOF THAT B = {anbncn | n ≥ 0} is not a CFL.

Suppose that B is a CFL and thus B = L(G) for some CFG G.  Let m be the constant 
specified by the pumping lemma (PL).

Then, w = aNbNcN  with N ≥ m/3  is in L(G) and has the representation  w = uvxyz such that  
v  or  y  is not the empty string Note that |w| ≥ m).

Since B is a CFL, uvixyiz L(G) for each  i = 0, 1, 2, ...

9
Since |vxy| ≤ m, vxy can contain at most 2 of the 3 symbols from {a, b, c}.

Since |vy| > 0, v and y together contain at least one symbol.

Consider the string uv2xy2z.  

This string contains additional occurrences of the symbols contained in v and y;

Therefore, this string cannot contain equal number of all 3 symbols and thus is not in L(G).
Contradiction!!!

w not only fails to be an element of B but also fails to be an element of the larger language:

L = {w in {a, b, c}* | w has equal numbers of a’s, b’s and c’s}

EXAMPLE 2.21
Let C = {aibjck| 0 < i < j < k}. We use the pumping lemma to show that C is not a CFL.

This language is similar to language B in Example 2.20, but proving that it is not context free is a
bit more complicated.

Assume that C is a CFL and obtain a contradiction. Let m be the pumping length given by the
pumping lemma.

We use the string s = ambm+1cm+2 that we used earlier, but this time we must "pump down" as
well as "pump up." Let s = uvxyz and again consider the two cases that occurred in Example
2.20.

1. When both v and y contain only one type of alphabet symbol, v does not contain both a's
and b's or both b's and c's, and the same holds for y. Note that the reasoning used previously in
case 1 no longer applies. The reason is that C contains strings with unequal numbers of a's, b's,
and c's as long as the numbers are not decreasing. We must analyze the situation more
carefully to show that s cannot be pumped. Observe that because v and y contain only one
type of alphabet symbol, one of the symbols a, b, or c doesn't appear in v or y. We further
subdivide this case into three subcases according to which symbol does not appear.

i. The a's do not appear. Then we try pumping down to obtain the string uv°xy°z = uxz.
That contains the same number of a's as s does, but it contains fewer b's or fewer c's.
Therefore, it is not a member of C, and a contradiction occurs.

10
ii. The b's do not appear. Then either a's or c's must appear in v or y because both can't be the
empty string. If a's appear, the string uv2xy2z contains more a's than b's, so it is not in C. If c's
appear, the string uv°xy°z contains more b's than c's, so it is not in C. Either way a
contradiction occurs.

iii. The c's do not appear. Then the string uv2xy2z contains more a's or more b's than c's, so
it is not in C, and a contradiction occurs.

2. When either v or y contain more than one type of symbol, uv2xy2 z will not contain the
symbols in the correct order. Hence it cannot be a member of C, and a contradiction occurs.

Thus, we have shown that s cannot be pumped in violation of the pumping lemma and that C is
not context free.

EXAMPLE 2.22
Let D = {ww | w {0,1}*}. Use the pumping lemma to show that D is not a CFL.

Assume that D is a CFL and obtain a contradiction.


Let m be the pumping length given by the pumping lemma,

This time choosing string s is less obvious. One possibility is the string 0m10m1. It is a
member of D and has length greater than m, so it appears to be a good candidate.

But this string can be pumped by dividing it as follows, so it is not adequate for our purposes.

u = 0 m-1
v=0 0m 0m
x=1 1 1
y=0 000 … 000   0   1   0   000 … 0001
z = 0 m-1 1
u v x y z
Let's try another candidate for s.

Intuitively, the string 0m1m0m1m seems to capture more of the "essence" of the language D than
the previous candidate did. In fact, we can show that this string does work, as follows.

We show that the string s = 0P1P0P1P cannot be pumped. This time we use condition 3 of the
pumping lemma to restrict the way that s can be divided. It says that we can pump s by
dividing s = uvxyz, |vxy|  m.

First, we show that the substring vxy must straddle the midpoint of s.

Otherwise, if the substring occurs only in the first half of s, pumping s up to uv2xy2z moves a 1
into the first position of the second half, and so it cannot be of the form ww. Similarly, if vxy

11
occurs in the second half of s, pumping s up to uv2xy2z moves a 0 into the last position of the
first half, and so it cannot be of the form ww.

But if the substring vxy straddles the midpoint of s, when we try to pump s down to uxz it has
the form 0m1i0j1m, where i and j cannot both be m. This string is not of the form ww. Thus, s
cannot be pumped, and D is not a CFL. 

ADITIONAL EXERCISES WITH HINTS!

2. Use the pumping lemma to show that the following languages are not context free.
(a) {0n1n 0n1n | n 0}
(b) {0n#02n#03n | n 0}
(c) {w#x | w is a substring of x, where w, x  {a,b}* }
(d) {x1#x2# ...#xk | k  2, each xi {a,b}*, and for some i  j, xi = xj}

a.  Let A = {0n1n0n1n| n  0}.   Let m be the pumping length given by the pumping lemma.   
We show that s = 0m1m0m1m cannot be pumped.   Let s = uvxyz.
If either v or y contain more than one type of alphabet symbol, uv2xy2z does not contain the 
symbols in the correct order. Hence it cannot be a member of A. If both v and y contain (at most) 
one type of alphabet symbol, uv2xy2z contains runs of 0's and l's of unequal length. Hence it 
cannot be a member of A. Because s cannot be pumped without violating the pumping lemma 
conditions, A is not context free.

b.  Let B = {0n#02n#03n| n  0}.   Let m be the pumping length given by the pumping lemma.  Let
s = 0m#02m#03pm.  We show that s = uvxyz cannot be pumped.
Neither v nor y can contain #, otherwise xv2wy2z contains more than two #s. Therefore, if we divide
s into three segments by #'s: 0m, 02m, and 03m, at least one of the segments is not contained within 
either v or y. Hence, xv2wy2z is not in B because the 1 : 2 : 3  length ratio of the segments is not 
maintained.

c.  Let C = {w#x | w is a substring of x, where w, x  {a,b}*}.   Let m be the pumping length 
given by the pumping lemma. Let s = ambm#ambm. We show that the string s = uvxyz cannot be 
pumped.
Neither v nor y can contain #, otherwise uv°xy°z does not contain # and therefore is not in C. If 
both v and y are nonempty and occur on the left­hand side of the #, string uv2xy2z cannot be in C
because it is longer on the left­hand side of the #. Similarly, if both strings occur on the right­hand 
side of the #, string uv°xy°z cannot be in C, because it is again longer on the left­hand side of the 

12
#. If only one of v and y is nonempty (both cannot be nonempty), treat them as if both occurred 
on the same side of the # as above.

The only remaining case is where both v and y are nonempty and straddle the #. But then v 
consists of b's and y consists of a's because of the third pumping lemma condition |vxy|  m. 
Hence, uv2xy2z contains more b's on the left­hand side of the #, so it cannot be a member of C.

d.  Let D ={x1#x2# ... xk | k  2, each xi  {a,b}*,and for some i  j, xi = xj}. Let m be the 
pumping length given by the pumping lemma. Let s = ambm#ambm. We show that s = uvxyz 
cannot be pumped. Use the same reasoning as in part (c).

PROBLEM 2.20
Let G be a CFG in CNF that contains b variables. Show that, if G generates some string 
using a derivation with at least 2b steps, then L(G) is infinite.

SOLUTION
Assume that G generates a string w using a derivation with at least 2b steps. Let n be the 
length of w.  Consider a parse tree of w. 

The right­hand side of each rule contains at most two variables, so each node of the parse 
tree has at most two children. 

Additionally, the length of w is at least 2b, so the parse tree of w must have height b+1 to 
generate a string of length at least 2b. 

Hence, the tree contains a path with at least b+1 variables, and therefore some variable is 
repeated on that path. 

Using a surgery on trees argument identical to the one used in the proof of the CFL 
pumping lemma, we can now divide w into pieces uvxyz where uvixyiz for all i  0. 
Therefore, L(G) is infinite.

13

Вам также может понравиться