
CS 120: Computation Theory

Context-Free Grammars and Languages

7/21/2017 10:52 AM CS 120 SemesterII-2013 1


So far ...
A language is a set of strings over an alphabet.
Languages serve two purposes in computing:
(a) communicating instructions or
information
(b) defining valid communications
We have defined languages by:
(i) regular expressions
(ii) finite state automata
Both (i) and (ii) give us exactly the same class
of languages.
What about languages outside this
class?

Specifying Non-Regular Languages
We have already seen a number of languages that are not regular. In
particular,
{a^n b^n : n ≥ 0}, the language of matched round brackets
arithmetic expressions
standard programming languages
The languages above are not regular. However, these languages are all
systematic constructions, and can be clearly and explicitly defined.
Consider L = {a^n b^n : n ≥ 0}:
(i) ε ∈ L
(ii) if x ∈ L, then axb ∈ L
(iii) nothing else is in L
This is a clear and concise specification of L. Can we use it to generate
members of L?

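The three-clause specification can be read directly as a generator. The following is a minimal sketch; the function name `members` is illustrative and not part of the course material.

```python
def members(max_n):
    """Return a^n b^n for n = 0..max_n using the inductive definition of L."""
    x = ""                     # clause (i): the empty string is in L
    result = [x]
    for _ in range(max_n):
        x = "a" + x + "b"      # clause (ii): if x is in L, then axb is in L
        result.append(x)
    return result              # clause (iii): nothing else is produced

print(members(3))
```

Each pass through the loop applies clause (ii) once, so the n-th string produced is a^n b^n.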
Limitation of Finite Automata
There are languages, such as {0^n 1^n | n ≥ 0}, that cannot be
described (specified) by NFAs or REs.

Context-Free Grammars (CFGs) provide a more powerful
mechanism for language specification.

CFGs can describe features that have a recursive structure,
making them useful beyond finite automata.



Historical Notes
CFGs were first used to study human languages.

One way of understanding the relationship between
syntactic categories (such as noun, verb, preposition, etc.)
and their respective phrases leads to natural recursion.

This is because noun phrases may occur inside verb
phrases and vice versa.

CFGs can capture important aspects of these relationships.



Applications
CFGs are used as a basis for compiler design and
implementation.

CFGs are used as specification mechanisms for
programming languages.

Designers of compilers use such grammars to implement
compiler components, such as scanners, parsers, and code
generators.

The implementation of a programming language is typically
preceded by a context-free grammar that specifies it.

Context-Free Languages
The languages specified by CFGs are called
Context-Free Languages (CFLs).
CFLs include the Regular Languages and many others.
Notation:
Abbreviate the phrase context-free grammar to CFG

Abbreviate the phrase context-free language to CFL

Write a CFG specification rule as lhs → rhs, where lhs stands
for left-hand side and rhs stands for right-hand side.



Specification Rules
The lhs of a specification rule is also called the variable and
is denoted by a capital letter.

The rhs of a specification rule is also called a specification
pattern and consists of variables and constants.

The variables that occur in a specification pattern are also
called non-terminal symbols; the constants that occur in a
specification pattern are also called terminal symbols.



CFG: Informal
A CFG consists of a collection of specification
rules where one variable is designated as the start symbol or
axiom.
Example: The CFG G1 has the following specification
rules:
A → 0A1
A → B
B → #
For G1, the non-terminal symbols are {A, B} and A is the
axiom. The terminals are {0, 1, #}.



More Terminologies
The specification rules of a CFG are also called
productions or substitution rules
Non-terminals used in the specification rules defining a
CFG may be strings.
Terminals in the specification rules defining a CFG are
constant strings.
Terminals used in CFG specification rules are analogous to the
input alphabet of an automaton.
Example terminals used in CFGs are letters of an alphabet,
numbers, special symbols, and strings of such elements.
Strings used to denote terminals in CFG specification rules are
quoted.
Language Specification
A CFG is used as a language specification mechanism
by generating each string of the language in the
following manner:
i. Write down the start variable; it is the lhs of one of the
specification rules, the top rule, unless specified otherwise.

ii. Find a variable that is written down and a rule whose lhs is
that variable. Replace the written-down variable with the rhs
of that rule.

iii. Repeat step ii until no variables remain in the string thus
generated.

Example 1: String Generation
Using CFG G1, we can generate the string 000#111 as
follows:
A ⇒ 0A1 ⇒ 00A11 ⇒ 000A111 ⇒ 000B111 ⇒ 000#111
Note: The sequence of substitutions used to obtain a string
using a CFG is called a derivation and may be represented
by a tree called a derivation tree or parse tree.
We often specify a grammar by writing down only its
rules. We can identify the variables as the symbols that
appear on the lhs of the rules.
Terminals are the remaining symbols used in the rules.

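The substitution procedure can be replayed mechanically. This is a minimal sketch, assuming single-character symbols; the dictionary encoding of G1 and the helper name `derive` are illustrative choices, not notation from the slides.

```python
# G1 encoded as variable -> list of right-hand sides (an assumed encoding).
G1 = {"A": ["0A1", "B"], "B": ["#"]}

def derive(start, choices, rules):
    """Replace the leftmost variable at each step, using the rule index given."""
    form = start
    steps = [form]
    for choice in choices:
        # position of the leftmost variable in the current string
        pos = next(i for i, ch in enumerate(form) if ch in rules)
        form = form[:pos] + rules[form[pos]][choice] + form[pos + 1:]
        steps.append(form)
    return steps

# A -> 0A1 three times, then A -> B, then B -> #
steps = derive("A", [0, 0, 0, 1, 0], G1)
print(" => ".join(steps))
```

The list of intermediate strings is exactly the derivation written out in Example 1.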


Derivation Tree
The derivation tree of the string 000#111 using CFG
G1 is as follows:

[Figure: derivation tree of 000#111 in G1]


CFL & CFG
All strings of terminals generated in this way constitute the
language specified by the grammar.
We write L(G) for the language generated by the grammar
G. Thus, L(G1) = {0^n # 1^n | n ≥ 0}.
To distinguish non-terminal and terminal strings, we often
enclose non-terminals in angle brackets, ⟨ ⟩, and
terminals in quotes.
If two or more rules have the same lhs, as in the example
A → 0A1 and A → B, we may compact them using the form
lhs → rhs1 | rhs2 | … | rhsn, where | is used with the meaning of
"or".
Example: The rules A → 0A1 and A → B may be written as:
A → 0A1 | B

Example 2:
The CFG G2 specifies a fragment of English



Example 2: CFG G2
The CFG G2 has 10 variables (capitalized and in angle
brackets) and 9 terminals (written in the standard English
alphabet) plus a space character.
Also, the CFG has 18 rules.
Examples of strings that belong to L(G2) are:
a boy sees
the boy sees a flower
a girl with a flower likes the boy


Derivation with CFG G2



Formal Definition of a CFG
A Context-Free Grammar is a 4-tuple (V, Σ, R, S) where:
i. V is a finite set of strings called the variables or non-terminals
ii. Σ is a finite set of strings, disjoint from V, called terminals
iii. R is a finite set of rules (or specification rules) of the form
lhs → rhs where lhs belongs to V and rhs belongs to (V ∪ Σ)*.
iv. S ∈ V is the start variable (grammar axiom)
Example: G1 = ({A, B}, {0, 1, #}, R, A) where R is:
A → 0A1
A → B
B → #


Direct Derivation
If u, v, w ∈ (V ∪ Σ)* (i.e., are strings of variables and terminals) and
A → w ∈ R (i.e., is a rule of the grammar), then we say uAv yields uwv,
written uAv ⇒ uwv.

We may also say that uwv is directly derived from uAv using the
rule A → w.

We write u ⇒* v if u = v or if a sequence u1, u2, …, uk ∈ (V ∪ Σ)*
exists, for k ≥ 0, with u ⇒ u1 ⇒ u2 ⇒ … ⇒ uk ⇒ v.

We may also say that u, u1, u2, …, uk, v is a derivation of v from u.

Language specified by G:
If G = (V, Σ, R, S) is a CFG then the language specified by G (or the
language of G) is L(G) = {w ∈ Σ* | S ⇒* w}.

Example 3: CFG G3
Consider the grammar:
G3 = ({S}, {a, b}, {S → aSb | SS | ε}, S)
L(G3) contains strings such as:
abab
aaabbb
aababb
Note: if one thinks of a, b as (, ) then we can see that
L(G3) is the language of all strings of properly nested
parentheses.
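The claim about nested parentheses can be checked on short strings by enumerating L(G3) through leftmost derivations. This is a sketch under stated assumptions: symbols are single characters, and sentential forms longer than 2·max_len + 1 are pruned, a bound chosen here for G3 specifically (it is loose enough for this grammar, but it is not a general-purpose rule). The names `language_up_to` and `balanced` are illustrative.

```python
from collections import deque

RULES = {"S": ["aSb", "SS", ""]}   # G3: S -> aSb | SS | epsilon

def language_up_to(max_len, rules, start="S"):
    """Terminal strings of length <= max_len derivable from start."""
    limit = 2 * max_len + 1          # pruning bound, assumed adequate for G3
    seen, found = {start}, set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        pos = next((i for i, c in enumerate(form) if c in rules), None)
        if pos is None:              # no variable left: a string of L(G)
            found.add(form)
            continue
        for rhs in rules[form[pos]]:  # expand the leftmost variable
            new = form[:pos] + rhs + form[pos + 1:]
            terminals = sum(c not in rules for c in new)
            if terminals <= max_len and len(new) <= limit and new not in seen:
                seen.add(new)
                queue.append(new)
    return found

def balanced(s):
    """Read a as '(' and b as ')' and check proper nesting."""
    depth = 0
    for c in s:
        depth += 1 if c == "a" else -1
        if depth < 0:
            return False
    return depth == 0

words = language_up_to(4, RULES)
print(sorted(words, key=lambda w: (len(w), w)))
print(all(balanced(w) for w in words))
```

Restricting the search to leftmost derivations loses nothing, since every derivable string also has a leftmost derivation.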


Example 4: CFG G4
Consider the grammar:
G4 = ({E, T, F}, {a, +, *, (, )}, R, E) where R is:
E → E+T | T
T → T*F | F
F → (E) | a

L(G4) is a language of arithmetic expressions.



Example 4: CFG G4
The variables and constants in L(G4) are represented by the
terminal a.
Arithmetic operations in L(G4) are addition, represented by +, and
multiplication, represented by *.
An example derivation of (a+a*a) using G4 is shown in the figure
below:

[Figure: derivation of (a+a*a) in G4]


Designing CFGs
As with the design of automata, the design of CFGs
requires creativity.
CFGs are even trickier to construct than finite automata
because we are more accustomed to programming a
machine than we are to specifying languages with grammars.
Many CFLs are unions of simpler CFLs. Hence the
suggestion is to construct smaller, simpler grammars first
and then to join them into a larger grammar.
The mechanism of grammar combination consists of putting
all their rules together and adding the new rule S → S1 | S2 | …
| Sk, where the variables Si, 1 ≤ i ≤ k, are the start variables of
the individual grammars and S is a new variable.



First Grammar Design
Design a grammar for the language:
{0^n 1^n | n ≥ 0} ∪ {1^n 0^n | n ≥ 0}
Construct the grammar S1 → 0 S1 1 | ε that generates
{0^n 1^n | n ≥ 0}
Construct the grammar S2 → 1 S2 0 | ε that generates
{1^n 0^n | n ≥ 0}
Put them together, adding the rule S → S1 | S2, thus getting:
S → S1 | S2
S1 → 0 S1 1 | ε
S2 → 1 S2 0 | ε

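The combination step is purely mechanical, as the following sketch suggests. It assumes the grammars already use distinct variable names and that the new start symbol is unused; `union_grammar` and the list-of-symbols encoding are illustrative.

```python
def union_grammar(grammars, new_start="S"):
    """Join grammars, each given as (start_variable, rules).

    rules maps a variable to a list of bodies; a body is a list of symbols
    and [] stands for epsilon. Variable names are assumed to be distinct
    across the grammars, and new_start is assumed to be unused.
    """
    combined = {new_start: [[start] for start, _ in grammars]}  # S -> S1 | ... | Sk
    for _, rules in grammars:
        combined.update(rules)                                  # keep all old rules
    return combined

g1 = ("S1", {"S1": [["0", "S1", "1"], []]})   # generates 0^n 1^n
g2 = ("S2", {"S2": [["1", "S2", "0"], []]})   # generates 1^n 0^n
for head, bodies in union_grammar([g1, g2]).items():
    print(head, "->", " | ".join(" ".join(b) or "ε" for b in bodies))
```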


Second Design Technique
Constructing a CFG for a regular language is easy if one can
first construct a DFA for that language.
Conversion procedure:
Make a variable Ri for each state qi of the DFA
Add the rule Ri → aRj to the CFG if δ(qi, a) = qj is a transition in the
DFA
Add the rule Ri → ε if qi is an accept state of the DFA
If q0 is the start state of the DFA, make R0 the start variable of the
CFG.
Verify that the CFG constructed by the conversion of a
DFA into a CFG generates the language that the DFA
recognizes.



Third Design Technique
Certain CFLs contain strings with two related substrings, as
are 0^n and 1^n in {0^n 1^n | n ≥ 0}.

Example of the relationship: to recognize such a language a
machine would need to remember an unbounded amount
of information about one of the substrings.

A CFG that handles this situation uses a rule of the form
R → uRv, which generates strings wherein the portion
containing the u's corresponds to the portion containing the v's.



Fourth Design Technique
In a complex language, strings may contain certain
structures that appear recursively.

Example: in arithmetic expressions, any time the symbol a
appears, an entire parenthesized expression may appear instead.

To achieve this effect one needs to place the variable
generating the structure (E in the case of G4) in the location of
the rule corresponding to where the structure may
recursively appear, as in E → E+T in the case of G4.



Designing CFG
Can we design a CFG for
{0^n 1^n | n ≥ 0} ∪ {1^n 0^n | n ≥ 0}?

Do we know a CFG for {0^n 1^n | n ≥ 0}?

Do we know a CFG for {1^n 0^n | n ≥ 0}?



Designing CFG

CFG for the language L1 = {0^n 1^n | n ≥ 0}
S → 0S1 | ε
CFG for the language L2 = {1^n 0^n | n ≥ 0}
S → 1S0 | ε
CFG for L1 ∪ L2
S → S1 | S2
S1 → 0 S1 1 | ε
S2 → 1 S2 0 | ε

Designing CFG

Can we design a CFG for {0^(2n) 1^(3n) | n ≥ 0}?

Yes, by linking the occurrence of 0s with the
occurrence of 1s.

The desired CFG is:
S → 00S111 | ε



Designing CFG
Can we construct a CFG for the language { w | w is a
palindrome }?
Assume that the alphabet of w is {0, 1}.

Examples of palindromes: 010, 0110, 001100, 01010,
1101011, …

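One grammar that answers the question is S → 0S0 | 1S1 | 0 | 1 | ε, which applies the R → uRv idea with matching end symbols. The sketch below checks membership by undoing one rule per call; the function name `derivable` is illustrative.

```python
def derivable(w):
    """True if w can be derived from S -> 0S0 | 1S1 | 0 | 1 | epsilon."""
    if len(w) <= 1:
        # base rules: S -> epsilon, S -> 0, S -> 1
        return all(c in "01" for c in w)
    # S -> 0S0 or S -> 1S1: the outer symbols must match, the middle comes from S
    return w[0] == w[-1] and w[0] in "01" and derivable(w[1:-1])

for w in ["010", "0110", "001100", "01010", "1101011", "01"]:
    print(w, derivable(w))
```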


Regular Language & CFG
Theorem: Any regular language can be described by a
CFG.

[Figure: the regular languages drawn as a subset of the CFLs]

How to prove? (By construction)



Regular Language & CFG

Proof: Let D be the DFA recognizing the language.
Create a distinct variable Vi for each state qi in D.
Make V0 the start variable of the CFG
(assume that q0 is the start state of D).
Add a rule Vi → aVj if δ(qi, a) = qj
Add a rule Vi → ε if qi is an accept state

Then, we can show that the above CFG generates exactly the
same language as D (how to show?)

Regular Language & CFG (Example)

DFA (start state q0, which is also the accept state):
δ(q0, 0) = q0    δ(q0, 1) = q1
δ(q1, 0) = q0    δ(q1, 1) = q1

CFG G = ({V0, V1}, {0, 1}, R, V0), where R is
V0 → 0V0 | 1V1 | ε
V1 → 1V1 | 0V0

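The construction in the proof can be sketched as a short function. The transition table below is read off from the example grammar (V0 → 0V0 | 1V1 | ε, V1 → 1V1 | 0V0), so it is an inference rather than a copy of the original diagram; `dfa_to_cfg` and the naming of states as q0, q1 are assumptions of this sketch.

```python
def dfa_to_cfg(states, alphabet, delta, start, accepting):
    """Return (start_variable, rules) for the grammar built from a DFA.

    A body is a tuple of symbols; the empty tuple () stands for epsilon.
    State names are assumed to look like q0, q1, ... so that qi maps to Vi.
    """
    var = {q: "V" + q[1:] for q in states}
    rules = {var[q]: [] for q in states}
    for q in states:
        for a in alphabet:
            # delta(qi, a) = qj gives the rule Vi -> a Vj
            rules[var[q]].append((a, var[delta[(q, a)]]))
        if q in accepting:
            rules[var[q]].append(())      # accept state gives Vi -> epsilon
    return var[start], rules

delta = {("q0", "0"): "q0", ("q0", "1"): "q1",
         ("q1", "0"): "q0", ("q1", "1"): "q1"}
start_var, rules = dfa_to_cfg(["q0", "q1"], ["0", "1"], delta, "q0", {"q0"})
print(start_var, rules)
```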
Leftmost Derivation
A derivation which always replaces the leftmost
variable in each step is called a leftmost derivation.
E.g., consider the CFG for properly nested
parentheses ({S}, {(, )}, R, S) with rules R: S → (S) | SS | ε
Then S ⇒ SS ⇒ (S)S ⇒ ( )S ⇒ ( )(S) ⇒ ( )( )
is a leftmost derivation.
But S ⇒ SS ⇒ S(S) ⇒ (S)(S) ⇒ ( )(S) ⇒ ( )( )
is not a leftmost derivation.
However, we note that both derivations correspond to
the same parse tree.



Leftmost & Rightmost Derivations
A derivation which always replaces the rightmost variable
in each step is called a rightmost derivation.

Grammar:
S → A | AB
A → ε | a | Ab | AA
B → b | bc | Bc | bB

Sample derivations:
S ⇒ AB ⇒ AAB ⇒ aAB ⇒ aaB ⇒ aabB ⇒ aabb
S ⇒ AB ⇒ AbB ⇒ Abb ⇒ AAbb ⇒ Aabb ⇒ aabb

These two derivations are special.
The 1st derivation is leftmost: it always picks the leftmost variable.
The 2nd derivation is rightmost: it always picks the rightmost variable.

Parse tree shared by both derivations:
        S
      /   \
     A     B
    / \   / \
   A   A b   B
   |   |     |
   a   a     b



Ambiguity
Sometimes, a string can have two or more leftmost
derivations!
E.g., consider the CFG ({S}, {+, x, a}, R, S) with rules R:
S → S+S | SxS | a
The string a + a x a has two leftmost derivations as
follows:
S ⇒ S + S ⇒ a + S ⇒ a + S x S ⇒ a + a x S
⇒ a + a x a
S ⇒ S x S ⇒ S + S x S ⇒ a + S x S ⇒ a + a x S
⇒ a + a x a

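For this particular grammar the number of parse trees of a string can be counted by trying every operator as the root. The sketch below is specific to S → S+S | SxS | a and writes strings without spaces; `count_trees` is an illustrative name.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_trees(s):
    """Number of parse trees of s under S -> S+S | SxS | a."""
    if s == "a":
        return 1                      # the rule S -> a
    total = 0
    for i, c in enumerate(s):
        if c in "+x":
            # this operator is the root: S -> S c S
            total += count_trees(s[:i]) * count_trees(s[i + 1:])
    return total

print(count_trees("a+axa"))
```

A count greater than one means the string is derived ambiguously, because each parse tree has its own leftmost derivation.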


Two parse trees for a + a x a

      S                      S
    / | \                  / | \
   S  +  S                S  x  S
   |   / | \            / | \   |
   a  S  x  S          S  +  S  a
      |     |          |     |
      a     a          a     a



Parse Trees

Grammar:
S → A | AB
A → ε | a | Ab | AA
B → b | bc | Bc | bB

Other derivation trees for the string w = aabb?

[Figure: three alternative parse trees of aabb]

Infinitely many others are possible.

Grammar Ambiguity
If a string has two or more leftmost (or rightmost)
derivations in a CFG G, we say the string is derived
ambiguously in G.
A grammar is ambiguous if some string is derived
ambiguously.
Note that the two leftmost derivations in the previous
example correspond to different parse trees (see previous
slide).
In fact, each leftmost derivation corresponds to a
unique parse tree.



Ambiguity & Disambiguation
Ambiguity is not a good thing, if avoidable.
How to avoid it?
Either say the same message in a slightly different language that is
not ambiguous,
or change the grammar so it is not ambiguous,
or cope.
Example: The grammar S → SS | if test:S | if test:S else:S | write
is ambiguous. It makes sentences like: if test: if test:write
else:write
But we can write the same message differently:
S → SS | if test:S end | if test:S else:S end | write
This makes sentences like:
if test: if test:write end else: write end
if test: if test: write else: write end end

Ambiguity & Disambiguation..
Ambiguous grammar:
Exp → n
    | Exp + Exp
    | Exp × Exp

What is an equivalent unambiguous grammar?

Exp → Term
    | Term + Exp
Term → n
     | n × Term

Uses operator precedence and a fixed associativity.



Ambiguity & Disambiguation..
What is a general algorithm?
None exists!
There are CFLs that are inherently ambiguous:
every CFG for such a language is ambiguous.

E.g., {a^n b^n c^m d^m | n ≥ 1, m ≥ 1} ∪ {a^n b^m c^m d^n | n ≥ 1, m ≥ 1}.

So, we can't necessarily eliminate ambiguity!



Inherently Ambiguous
Sometimes when we have an ambiguous grammar, we can
find an unambiguous grammar that generates the same
language.
However, some languages can only be generated by
ambiguous grammars.
E.g., {a^n b^n c^m | n, m ≥ 0} ∪ {a^n b^m c^m | n, m ≥ 0}

Such a language is called inherently ambiguous. That is, a
CFL is inherently ambiguous if all grammars for it are
ambiguous.



Parsing
S → S+M | M
M → M*T | T
T → (S) | number

Programming languages are (should be) designed to make
parsing easy, efficient, and unambiguous.

[Figure: parse tree of 3 + 2 * 1, with arrows labelled Derivation and Parsing]
Easy and Efficient Parsing

Easy: we can automate the process of
building a parser from a description of a
grammar.

Efficient: the resulting parser can build a
parse tree quickly (linear time in the length of
the input).



CFG Simplification

Can't always eliminate ambiguity.
But CFG simplification & restriction are still useful
theoretically & pragmatically.
Simpler grammars are easier to understand.
Simpler grammars can lead to faster parsing.
Restricted forms are useful for some parsing algorithms.
Restricted forms can give you more knowledge about
derivations.



CFG Simplification
CFG simplification includes:
Eliminate useless variables.
Eliminate ε-productions: A → ε.
Eliminate unit productions: A → B.
Eliminate redundant productions.
Trade left- & right-recursion.

Killing ε-Productions
ε-productions:
In a given CFG, we call a non-terminal N nullable
if there is a production N → ε, or there is a
derivation that starts at N and leads to ε:
N ⇒* ε
ε-productions are undesirable.
We can replace an ε-production with appropriate
non-ε productions.

Killing ε-Productions
If L is a CFL generated by a CFG having ε-productions, then
there is a different CFG that has no ε-productions and still
generates either the whole language L (if L does not include ε)
or else the language of all the words in L other than ε.
Replacement rule:
1. Delete all ε-productions.
2. Add the following productions:
For every production of the form X → old string,
add new productions of the form X → …, where the right side
accounts for every modification of the old string that can be
formed by deleting all possible subsets of nullable
non-terminals, except that we do not allow X → ε to be formed, even if
all the characters in the old string are nullable.

Example: Consider the CFG
S → a | Xb | aYa
X → Y | ε
Y → b | X

Old nullable production    New production
X → Y                      nothing
X → ε                      nothing
Y → X                      nothing
S → Xb                     S → b
S → aYa                    S → aa

So the new CFG is
S → a | Xb | aa | aYa | b
X → Y
Y → b | X

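The replacement rule can be sketched in two parts: find the nullable variables, then emit every variant of each body with some nullable occurrences dropped. The tuple encoding of bodies and the names `nullable_variables` and `remove_epsilon` are assumptions of this sketch.

```python
from itertools import combinations

def nullable_variables(rules):
    """Variables N with N =>* epsilon. A body is a tuple of symbols; () is epsilon."""
    nullable, changed = set(), True
    while changed:
        changed = False
        for head, bodies in rules.items():
            if head not in nullable and any(all(s in nullable for s in b) for b in bodies):
                nullable.add(head)
                changed = True
    return nullable

def remove_epsilon(rules):
    """Apply the replacement rule: drop epsilon bodies, add the shortened variants."""
    nullable = nullable_variables(rules)
    new_rules = {}
    for head, bodies in rules.items():
        out = []
        for body in bodies:
            spots = [i for i, s in enumerate(body) if s in nullable]
            for k in range(len(spots) + 1):
                for drop in combinations(spots, k):   # every subset of nullable spots
                    new = tuple(s for i, s in enumerate(body) if i not in drop)
                    if new and new not in out:        # never create X -> epsilon
                        out.append(new)
        new_rules[head] = out
    return new_rules

rules = {"S": [("a",), ("X", "b"), ("a", "Y", "a")],
         "X": [("Y",), ()],
         "Y": [("b",), ("X",)]}
for head, bodies in remove_epsilon(rules).items():
    print(head, "->", " | ".join("".join(b) for b in bodies))
```

On the example grammar this yields the same productions as the table, possibly listed in a different order.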
Example: Consider the CFG
S → Xa
X → aX | bX | ε

Old nullable production    New production
S → Xa                     S → a
X → aX                     X → a
X → bX                     X → b

So the new CFG is
S → a | Xa
X → aX | bX | a | b

Example

S → XY
X → Zb
Y → bW
Z → AB
W → Z
A → aA | bA | ε
B → Ba | Bb | ε

Nullable non-terminals are? A, B, Z and W

Example contd.

Old nullable production    New production
X → Zb                     X → b
Y → bW                     Y → b
Z → AB                     Z → A and Z → B
W → Z                      nothing new
A → aA                     A → a
A → bA                     A → b
B → Ba                     B → a
B → Bb                     B → b

So the new CFG is
S → XY
X → Zb | b
Y → bW | b
Z → AB | A | B
W → Z
A → aA | bA | a | b
B → Ba | Bb | a | b

Killing unit-productions

Definition: A production of the form
Nonterminal → one Nonterminal
is called a unit production.
The following theorem allows us to get rid of unit
productions:
Theorem:
If there is a CFG for the language L that has no
ε-productions, then there is also a CFG for L with no
ε-productions and no unit productions.

Killing unit-productions

This is another proof by constructive algorithm.

Algorithm: For every pair of non-terminals A and B, if the
CFG has a unit production A → B, or if there is a chain
A → X1 → X2 → … → B where X1, X2, ... are non-terminals,
create new productions as follows:
If the non-unit productions from B are
B → s1 | s2 | … where s1, s2, ... are strings, we create the
productions A → s1 | s2 | …

Killing unit-productions
Consider the CFG
S → A | bb
A → B | b
B → S | a
The non-unit productions are
S → bb, A → b, B → a
And the unit productions are
S → A
A → B
B → S

Killing unit-productions: Example contd.
Let's list all unit productions and their sequences and create new
productions:
S → A gives S → b
S → A → B gives S → a
A → B gives A → a
A → B → S gives A → bb
B → S gives B → bb
B → S → A gives B → b
Eliminating all unit productions, the new CFG is
S → bb | b | a
A → b | a | bb
B → a | bb | b
This CFG generates a finite language since there are no non-terminals
in any strings produced from S.

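The chain-following algorithm can be sketched as follows, assuming one-letter symbols so that a body is a unit production exactly when it equals a variable name. The function name `remove_unit_productions` and the use of sets for the result are illustrative choices.

```python
def remove_unit_productions(rules):
    """rules maps a variable to a list of bodies (strings of one-letter symbols)."""
    result = {}
    for a in rules:
        # Variables reachable from a through chains of unit productions.
        chain, todo = {a}, [a]
        while todo:
            x = todo.pop()
            for body in rules[x]:
                if body in rules and body not in chain:   # body is a single variable
                    chain.add(body)
                    todo.append(body)
        # a inherits every non-unit production of every variable in its chain.
        result[a] = {body for b in chain for body in rules[b] if body not in rules}
    return result

rules = {"S": ["A", "bb"], "A": ["B", "b"], "B": ["S", "a"]}
for head, bodies in remove_unit_productions(rules).items():
    print(head, "->", " | ".join(sorted(bodies)))
```

Because S, A and B reach each other through unit productions here, all three end up with the same three right-hand sides.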
Useless Symbols
Let G be a CFG. A symbol X ∈ (V ∪ Σ) is useful if there is a derivation
S ⇒* uXv ⇒* w
where u, v ∈ (V ∪ Σ)* and w ∈ Σ*. A symbol that is not useful is
useless.
A terminal is useful if it occurs in a string of the language of G.
A variable is useful if it occurs in a derivation that begins from S and
generates a terminal string.

For a variable to be useful two conditions must be satisfied:
1. The variable must occur in a sentential form of the grammar.
2. There must be a derivation of a terminal string from the variable.
A variable that occurs in a sentential form is said to be reachable from
S.
A two-part procedure is presented to eliminate useless symbols.

Useless Productions
S → aSb
S → ε
S → A
A → aA      (useless production)

Some derivations never terminate...

S ⇒ A ⇒ aA ⇒ aaA ⇒ … ⇒ aa…aA ⇒ …

Another grammar:
S → A
A → aA
A → ε
B → bA      (useless production: B is not reachable from S)

In general:
if S ⇒* xAy ⇒* w, where w contains only terminals (w ∈ L(G)),
then variable A is useful;
otherwise, variable A is useless.

A production A → x is useless
if any of its variables is useless.

Productions:
S → aSb
S → ε
S → A       useless
A → aA      useless   (variable A is useless)
B → C       useless   (variable B is useless)
C → D       useless   (variable C is useless)

Removing Useless Productions
Example grammar:

S → aS | A | C
A → a
B → aa
C → aCb

First: Find all variables that can produce
strings with only terminals.

S → aS | A | C
A → a
B → aa
C → aCb

Round 1: {A, B}
Round 2: {A, B, S}    (because of S → A)

Keep only the variables
that produce terminal strings: {A, B, S}
(the remaining variables are useless).
Remove the useless productions:

Before                After
S → aS | A | C        S → aS | A
A → a                 A → a
B → aa                B → aa
C → aCb

Second: Find all variables
reachable from S.

Use a dependency graph:

S → aS | A
A → a
B → aa

S → A       B (not reachable)

Keep only the variables
reachable from S
(the remaining variables are useless).
Remove the useless productions:

Before                Final grammar
S → aS | A            S → aS | A
A → a                 A → a
B → aa

Set of variables that derive terminal
strings
Input: CFG (V, Σ, P, S)
TERM = { A | there is a rule A → w ∈ P with w ∈ Σ* }
repeat
    PREV = TERM
    for each variable A ∈ V do
        if there is a rule A → w with w ∈ (PREV ∪ Σ)* then
            TERM = TERM ∪ {A}
until PREV = TERM

Example

Consider the following CFG
G: S → AC | BS | B
A → aA | aF
B → CF | b
C → cC | D
D → aD | BD | C
E → aA | BSA
F → bB | b

G:                    New grammar from TERM, GT:
S → AC | BS | B       S → BS | B
A → aA | aF           A → aA | aF
B → CF | b            B → b
C → cC | D            E → aA | BSA
D → aD | BD | C       F → bB | b
E → aA | BSA
F → bB | b

Iteration   TERM               PREV
0           {B, F}             {}
1           {B, F, A, S}       {B, F}
2           {B, F, A, S, E}    {B, F, A, S}
3           {B, F, A, S, E}    {B, F, A, S, E}

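The TERM computation and the restriction to GT can be sketched like this, assuming one-letter symbols; the function names `generating_variables` and `keep_generating` are illustrative.

```python
def generating_variables(rules, terminals):
    """The TERM set: variables that derive some string of terminals."""
    term = {A for A, bodies in rules.items()
            if any(all(s in terminals for s in b) for b in bodies)}
    while True:
        prev = set(term)
        for A, bodies in rules.items():
            # a rule A -> w with every symbol of w in PREV or a terminal
            if any(all(s in prev or s in terminals for s in b) for b in bodies):
                term.add(A)
        if prev == term:
            return term

def keep_generating(rules, terminals):
    """Drop variables outside TERM and every rule that mentions one."""
    term = generating_variables(rules, terminals)
    return {A: [b for b in bodies if all(s in term or s in terminals for s in b)]
            for A, bodies in rules.items() if A in term}

G = {"S": ["AC", "BS", "B"], "A": ["aA", "aF"], "B": ["CF", "b"],
     "C": ["cC", "D"], "D": ["aD", "BD", "C"], "E": ["aA", "BSA"],
     "F": ["bB", "b"]}
print(sorted(generating_variables(G, set("abc"))))
print(keep_generating(G, set("abc")))
```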


Construction of the set of reachable
variables
Input: CFG (V, Σ, P, S)
REACH = {S}
PREV = ∅
repeat
    NEW = REACH − PREV
    PREV = REACH
    for each variable A in NEW do
        for each rule A → w do
            add all variables in w to REACH
until REACH = PREV

GT:
S → BS | B
A → aA | aF
B → b
E → aA | BSA
F → bB | b

Iteration   REACH     PREV      NEW
0           {S}       {}
1           {S, B}    {S}       {S}
2           {S, B}    {S, B}    {B}
3           {S, B}    {S, B}    {}

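The REACH loop translates almost line for line. The sketch below assumes one-letter symbols and that every variable appears as a key of the rule dictionary; `reachable_variables` is an illustrative name.

```python
def reachable_variables(rules, start):
    """The REACH set: variables that occur in some sentential form."""
    reach, prev = {start}, set()
    while reach != prev:
        new = reach - prev            # variables discovered in the last round
        prev = set(reach)
        for A in new:
            for body in rules.get(A, []):
                reach.update(s for s in body if s in rules)
    return reach

GT = {"S": ["BS", "B"], "A": ["aA", "aF"], "B": ["b"],
      "E": ["aA", "BSA"], "F": ["bB", "b"]}
reach = reachable_variables(GT, "S")
final = {A: GT[A] for A in GT if A in reach}
print(sorted(reach), final)
```

Keeping only the reachable variables leaves S → BS | B and B → b.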
Removing All

Step 1: Remove Nullable Variables

Step 2: Remove Unit-Productions

Step 3: Remove Useless Variables

Chomsky Normal Form (CNF)
A CFG is in Chomsky Normal Form if each rule is of the
form
A → BC
A → a
where
a is any terminal
A, B, C are variables
B, C cannot be the start variable
However, S → ε is allowed, where S is the start variable


Examples:

Chomsky Normal Form      Not Chomsky Normal Form
S → AS                   S → AS
S → a                    S → AAS
A → SA                   A → SA
A → b                    A → aa

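The shape of the rules can be checked mechanically. This sketch tests only the rule forms A → BC, A → a and S → ε; it does not enforce the restriction that the start variable stays off right-hand sides, which the left-hand example above does not follow either. The name `is_cnf` and the tuple encoding are assumptions.

```python
def is_cnf(rules, start):
    """Check that every rule is A -> BC, A -> a, or start -> epsilon."""
    variables = set(rules)
    for head, bodies in rules.items():
        for body in bodies:
            if len(body) == 0 and head == start:
                continue                                   # S -> epsilon
            if len(body) == 1 and body[0] not in variables:
                continue                                   # A -> a
            if len(body) == 2 and all(s in variables for s in body):
                continue                                   # A -> BC
            return False
    return True

left = {"S": [("A", "S"), ("a",)], "A": [("S", "A"), ("b",)]}
right = {"S": [("A", "S"), ("A", "A", "S")], "A": [("S", "A"), ("a", "a")]}
print(is_cnf(left, "S"), is_cnf(right, "S"))
```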
Chomsky Normal Form (CNF)
Why should we care about CNF? Well, it is an effective
grammar, in the sense that every variable that is
expanded (being a node in a parse tree) is guaranteed
to generate a letter in the final string.

As such, a word w of length n must be generated by a
parse tree that has O(n) nodes. This is of course not
necessarily true of general grammars, which might have
huge trees generating short strings.



Converting a CFG to CNF
Theorem: Any context-free language can be
generated by a context-free grammar in
Chomsky Normal Form.

Hint: When is a general CFG not in Chomsky Normal
Form?



Proof Idea
The only reasons for a CFG not to be in CNF:
The start variable appears on a right side
It has ε rules, such as A → ε
It has unit rules, such as A → A or B → C
Some rules do not have exactly two variables or
one terminal on the right side

Proof idea: Convert a grammar into CNF by handling
the above cases



Conversion to Chomsky Normal Form
Example:
S → ABa
A → aab
B → Ac

Not in Chomsky Normal Form

Introduce variables for terminals: Ta, Tb, Tc

Before        After
S → ABa       S → ABTa
A → aab       A → TaTaTb
B → Ac        B → ATc
              Ta → a
              Tb → b
              Tc → c

Introduce intermediate variable: V1

Before          After
S → ABTa        S → AV1
A → TaTaTb      V1 → BTa
B → ATc         A → TaTaTb
Ta → a          B → ATc
Tb → b          Ta → a
Tc → c          Tb → b
                Tc → c

Introduce intermediate variable: V2

Before          After
S → AV1         S → AV1
V1 → BTa        V1 → BTa
A → TaTaTb      A → TaV2
B → ATc         V2 → TaTb
Ta → a          B → ATc
Tb → b          Ta → a
Tc → c          Tb → b
                Tc → c

Initial grammar      Final grammar in Chomsky Normal Form
S → ABa              S → AV1
A → aab              V1 → BTa
B → Ac               A → TaV2
                     V2 → TaTb
                     B → ATc
                     Ta → a
                     Tb → b
                     Tc → c

General Conversion Steps [Step 1]

Proof: Let G be the context-free grammar generating
the context-free language. We want to convert G into
CNF.
Step 1: Add a new start variable S0 and the rule S0 →
S, where S is the start variable of G.

This ensures that the start variable of the new grammar
does not appear on a right side.



General Conversion Steps [Step 2]
Step 2: We take care of all ε rules. To remove the
rule A → ε, for each occurrence of A on the right side
of a rule, we add a new rule with that occurrence
deleted.
E.g., R → uAvAw causes us to add the rules: R →
uAvw, R → uvAw, R → uvw
If we have the rule R → A, we add R → ε unless we
had previously removed R → ε
After removing A → ε, the new grammar still generates
the same language as G.



General Conversion Steps [Step 3]
Step 3: We remove the unit rule A → B. To do so,
for each rule B → u (where u is a string of variables
and terminals), we add the rule A → u.
E.g., if we have A → B, B → aC, B → CC, we add:
A → aC, A → CC

After removing A → B, the new grammar still generates
the same language as G.



General Conversion Steps [Step 4]
Step 4: Suppose we have a rule A → u1 u2 … uk,
where k > 2 and each ui is a variable or a terminal. We
replace this rule by
A → u1A1, A1 → u2A2, A2 → u3A3, …,
Ak-2 → uk-1uk

After the change, the string on the right side of any rule
is either of length 1 (a terminal) or length 2 (two
variables, or 1 variable + 1 terminal, or two terminals)



General Conversion Steps [Step 4 ..]

To remove a rule A → u1u2 with some terminals on the
right side, we replace the terminal ui by a new variable
Ui and add the rule Ui → ui.

After the change, the string on the right side of any rule
is exactly a terminal or two variables.

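Both parts of Step 4 can be sketched in one function. It assumes ε rules and unit rules have already been removed and that the generated names V1, V2, … and Ta, Tb, … do not clash with existing variables; these names follow the earlier worked example, and the function name `step4` is illustrative. Unlike a hand conversion, the sketch creates a fresh helper for every long rule and does not share helpers between identical tails.

```python
def step4(rules):
    """Break long bodies into pairs, then replace terminals inside pairs.

    rules maps a variable to a list of bodies; a body is a tuple of symbols.
    """
    variables = set(rules)
    split = {}
    fresh = 0
    for head, bodies in rules.items():
        for body in bodies:
            cur = head
            # A -> u1 u2 ... uk (k > 2) becomes A -> u1 V1, V1 -> u2 V2, ...
            while len(body) > 2:
                fresh += 1
                helper = f"V{fresh}"
                variables.add(helper)
                split.setdefault(cur, []).append((body[0], helper))
                cur, body = helper, body[1:]
            split.setdefault(cur, []).append(body)
    final = {}
    for head, bodies in split.items():
        for body in bodies:
            if len(body) == 2:
                for s in body:
                    if s not in variables:
                        final.setdefault(f"T{s}", [(s,)])   # add T_a -> a once
                body = tuple(s if s in variables else f"T{s}" for s in body)
            final.setdefault(head, []).append(body)
    return final

example = {"S": [("A", "B", "a")], "A": [("a", "a", "b")], "B": [("A", "c")]}
for head, bodies in step4(example).items():
    print(head, "->", " | ".join(" ".join(b) for b in bodies))
```

Applied to the earlier grammar S → ABa, A → aab, B → Ac, the sketch arrives at the same final grammar as the worked conversion.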


Example 2

Let G be the grammar on the left. We get the new
grammar on the right after the first step.

S → ASA | aB          S0 → S
A → B | S             S → ASA | aB
B → b | ε             A → B | S
                      B → b | ε



Example 2 ..

After that, we remove B → ε

Before removing B → ε      After removing B → ε
S0 → S                     S0 → S
S → ASA | aB               S → ASA | aB | a
A → B | S                  A → B | S | ε
B → b | ε                  B → b



Example 2..
After that, we remove A → ε

Before removing A → ε      After removing A → ε
S0 → S                     S0 → S
S → ASA | aB | a           S → ASA | aB | a | SA | AS | S
A → B | S | ε              A → B | S
B → b                      B → b

Example 2..

Then, we remove S → S and S0 → S

After removing S → S              After removing S0 → S
S0 → S                            S0 → ASA | aB | a | SA | AS
S → ASA | aB | a | SA | AS        S → ASA | aB | a | SA | AS
A → B | S                         A → B | S
B → b                             B → b



Example 2..

Then, we remove A → B

Before removing A → B             After removing A → B
S0 → ASA | aB | a | SA | AS       S0 → ASA | aB | a | SA | AS
S → ASA | aB | a | SA | AS        S → ASA | aB | a | SA | AS
A → B | S                         A → b | S
B → b                             B → b

Example 2..
Then, we remove A → S

Before removing A → S             After removing A → S
S0 → ASA | aB | a | SA | AS       S0 → ASA | aB | a | SA | AS
S → ASA | aB | a | SA | AS        S → ASA | aB | a | SA | AS
A → b | S                         A → b | ASA | aB | a | SA | AS
B → b                             B → b

Example 2...
Then, we apply Step 4

Before Step 4:
S0 → ASA | aB | a | SA | AS
S → ASA | aB | a | SA | AS
A → b | ASA | aB | a | SA | AS
B → b

After Step 4 (the grammar is in CNF):
S0 → AA1 | UB | a | SA | AS
S → AA1 | UB | a | SA | AS
A → b | AA1 | UB | a | SA | AS
B → b
A1 → SA
U → a