Context Free Grammar

© Chuen-Liang Chen, NTUCS&IE

CONTEXT-FREE GRAMMARS
Chuen-Liang Chen
Department of Computer Science
and Information Engineering
National Taiwan University
Taipei, TAIWAN

Parsing
function: checking the syntactic validity of the input string;
  producing the structure of the corresponding parse tree
callees: scanner (when a token is needed);
  semantic routine (when a production rule is matched)
theoretical basis: context-free grammar
executor: parser, syntax analyzer
- top-down parsing
  beginning at the start symbol, expanding nonterminals in a depth-first manner (predictive in nature)
  left-most derivation
  pre-order traversal of the parse tree
  e.g. LL(k) [read from Left; Left-most derivation; k lookahead symbols], recursive descent parsing
- bottom-up parsing
  beginning from the terminal string, determining the production used to generate the leaves
  right-most derivation in reverse order
  post-order traversal of the parse tree
  e.g. LR(k) [read from Left; Right-most derivation; k lookahead symbols]

Definitions about context-free grammar (1/2)
context-free grammar -- $G = (V_t, V_n, S, P)$
- $V_t$ -- set of terminal symbols
- $V_n$ -- set of nonterminal symbols
  $a, b, c, \ldots \in V_t$
  $A, B, C, \ldots \in V_n$
  $U, V, W, \ldots \in V = V_t \cup V_n$
  $u, v, w, \ldots \in V_t^*$
  $\alpha, \beta, \gamma, \ldots \in V^*$
- $S$ -- start symbol, goal symbol; $S \in V_n$
- $P$ -- set of production rules of the form $A \to \alpha$
derivation by production rule $A \to \gamma$
- one step derivation: $\alpha A \beta \Rightarrow \alpha \gamma \beta$
- left-most derivation: $u A \beta \Rightarrow_{lm} u \gamma \beta$
- right-most derivation: $\alpha A v \Rightarrow_{rm} \alpha \gamma v$
- one or more steps derivation: $\Rightarrow^+$, $\Rightarrow^+_{lm}$, $\Rightarrow^+_{rm}$
- zero or more steps derivation: $\Rightarrow^*$, $\Rightarrow^*_{lm}$, $\Rightarrow^*_{rm}$

Definitions about context-free grammar (2/2)
set of sentential forms -- $SF(G) = \{ \beta \mid S \Rightarrow^* \beta \}$
- left-most sentential form -- a $\beta$ such that $S \Rightarrow^*_{lm} \beta$
- right-most sentential form -- a $\beta$ such that $S \Rightarrow^*_{rm} \beta$
context-free language -- $L(G) = SF(G) \cap V_t^*$
parse tree, derivation tree --
- graphic representation of derivations
- root -- start symbol
- leaf nodes -- grammar symbols or $\lambda$
- interior nodes -- nonterminals
- offspring of a nonterminal -- the right-hand side of a production
for a given sentential form --
- phrase -- a sequence of symbols derived from a single nonterminal
- simple phrase, prime phrase -- minimal phrase (a phrase that contains no smaller phrase)
- handle -- left-most simple phrase
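
Example of context-free grammar
grammar $G_0$ --
  $E \to \text{Prefix ( E )} \mid \text{V Tail}$
  $\text{Prefix} \to \text{F} \mid \lambda$
  $\text{Tail} \to \text{+ E} \mid \lambda$
left-most derivation --
  $E \Rightarrow_{lm} \text{Prefix ( E )}$
  $\Rightarrow_{lm} \text{F ( E )}$
  $\Rightarrow_{lm} \text{F ( V Tail )}$
  $\Rightarrow_{lm} \text{F ( V + E )}$
  $\Rightarrow_{lm} \text{F ( V + V Tail )}$
  $\Rightarrow_{lm} \text{F ( V + V )}$
right-most derivation --
  $E \Rightarrow_{rm} \text{Prefix ( E )}$
  $\Rightarrow_{rm} \text{Prefix ( V Tail )}$
  $\Rightarrow_{rm} \text{Prefix ( V + E )}$
  $\Rightarrow_{rm} \text{Prefix ( V + V Tail )}$
  $\Rightarrow_{rm} \text{Prefix ( V + V )}$
  $\Rightarrow_{rm} \text{F ( V + V )}$
right-most sentential forms --
  1. E
  2. Prefix ( E )
  3. Prefix ( V Tail )
  4. Prefix ( V + E )
  5. Prefix ( V + V Tail )
  6. Prefix ( V + V )
  7. F ( V + V )
  8. and so on
$L(G_0) \supseteq \{ \text{F ( V + V )} \}$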
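
The left-most derivation of F ( V + V ) can be replayed mechanically. The following is a minimal sketch in C, not part of the original slides. It assumes a single-character encoding of $G_0$ (P stands for Prefix, T for Tail), and the names `g0` and `step_leftmost` are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Single-character encoding of G0 (an assumption of this sketch):
 * nonterminals E, P (= Prefix), T (= Tail); terminals F V ( ) +. */
struct production { char lhs; const char *rhs; };

static const struct production g0[] = {
    { 'E', "P(E)" },  /* 0: E      -> Prefix ( E ) */
    { 'E', "VT"   },  /* 1: E      -> V Tail       */
    { 'P', "F"    },  /* 2: Prefix -> F            */
    { 'P', ""     },  /* 3: Prefix -> lambda       */
    { 'T', "+E"   },  /* 4: Tail   -> + E          */
    { 'T', ""     },  /* 5: Tail   -> lambda       */
};

static int is_nonterminal(char c)
{
    return c == 'E' || c == 'P' || c == 'T';
}

/* Apply production p to the left-most nonterminal of form (one =>lm step).
 * Returns 1 on success; returns 0 and leaves form unchanged if p does not
 * apply or the result would not fit into cap bytes. */
int step_leftmost(char *form, size_t cap, int p)
{
    size_t len = strlen(form), i = 0, rlen;

    assert(p >= 0 && p < (int)(sizeof g0 / sizeof g0[0]));
    while (i < len && !is_nonterminal(form[i]))
        i++;                               /* skip the terminal prefix u */
    if (i == len || form[i] != g0[p].lhs)
        return 0;
    rlen = strlen(g0[p].rhs);
    if (len + rlen > cap)                  /* new length + '\0' must fit */
        return 0;
    /* move the tail (including '\0'), then copy the right-hand side in */
    memmove(form + i + rlen, form + i + 1, len - i);
    memcpy(form + i, g0[p].rhs, rlen);
    return 1;
}
```

Starting from the string "E" and applying productions 0, 2, 1, 4, 1, 5 in that order walks through the left-most sentential forms listed above and ends at "F(V+V)".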

Example of left-most derivation
parse trees of left-most derivations
- blue symbols: left-most sentential forms
[Figure: parse trees grown step by step along the left-most derivation of F ( V + V ), one tree per sentential form.]

Example of top-down parsing
trace of top-down parsing (left-most derivation)
- orange: just derived (predicted); blue: just read (matched); black: derived or read; green: un-processed (parse stack)
[Figure: parse tree of F ( V + V ) built from the root downward, one snapshot per prediction or match.]
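
Top-down parsing of this grammar can be sketched as a recursive descent recognizer with one symbol of lookahead. The code below is an illustrative sketch, not taken from the slides: it assumes every token is a single character (F, V, +, parentheses) with no whitespace, and the function names (`accepts`, `parse_E`, `parse_Prefix`, `parse_Tail`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

static const char *cur;   /* next unread character: the one-symbol lookahead */

static int match(char t)
{
    if (*cur == t) { cur++; return 1; }
    return 0;
}

static int parse_E(void);

static int parse_Prefix(void)
{
    if (*cur == 'F')
        return match('F');          /* Prefix -> F */
    return *cur == '(';             /* Prefix -> lambda, only if '(' follows */
}

static int parse_Tail(void)
{
    if (*cur == '+')
        return match('+') && parse_E();     /* Tail -> + E */
    return *cur == ')' || *cur == '\0';     /* Tail -> lambda */
}

static int parse_E(void)
{
    if (*cur == 'F' || *cur == '(')         /* E -> Prefix ( E ) */
        return parse_Prefix() && match('(') && parse_E() && match(')');
    if (*cur == 'V')                        /* E -> V Tail */
        return match('V') && parse_Tail();
    return 0;                               /* no production is predicted */
}

/* Returns 1 if input is a sentence of the grammar, 0 otherwise. */
int accepts(const char *input)
{
    assert(input != NULL);
    cur = input;
    return parse_E() && *cur == '\0';
}
```

Each function corresponds to one nonterminal, and the order of calls follows the left-most derivation: `accepts("F(V+V)")` returns 1, while `accepts("F(V+V")` returns 0 because the closing parenthesis is never matched.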

Example of right-most derivation (1/2)
parse trees of right-most derivations and the corresponding sentential forms, phrases, simple phrases, and handles
- blue symbols: sentential form
- distinct markings: phrase, simple phrase, handle
[Figure: parse trees for the right-most sentential forms E, Prefix ( E ), Prefix ( V Tail ), Prefix ( V + E ).]

Example of right-most derivation (2/2)
[Figure: parse trees for the right-most sentential forms Prefix ( V + V Tail ), Prefix ( V + V $\lambda$ ), F ( V + V $\lambda$ ).]

Example of bottom-up parsing
trace of bottom-up parsing (inverse order of right-most derivation)
- blue: just read (shifted); orange: just derived (reduced to); pink: not read; green: derived or read (parse stack)
[Figure: parse tree of F ( V + V $\lambda$ ) built from the leaves upward, one snapshot per shift or reduction.]
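
The reductions of a bottom-up parse can be replayed as string rewriting that runs the right-most derivation backwards. The sketch below is illustrative only: it assumes a single-character encoding (P for Prefix, T for Tail), the names `g0` and `reduce_at` are hypothetical, and the caller supplies the position of the handle, which a real LR parser would determine from its parse tables.

```c
#include <assert.h>
#include <string.h>

/* Single-character encoding of the grammar (an assumption of this sketch):
 * nonterminals E, P (= Prefix), T (= Tail); terminals F V ( ) +. */
struct production { char lhs; const char *rhs; };

static const struct production g0[] = {
    { 'E', "P(E)" },  /* 0: E      -> Prefix ( E ) */
    { 'E', "VT"   },  /* 1: E      -> V Tail       */
    { 'P', "F"    },  /* 2: Prefix -> F            */
    { 'P', ""     },  /* 3: Prefix -> lambda       */
    { 'T', "+E"   },  /* 4: Tail   -> + E          */
    { 'T', ""     },  /* 5: Tail   -> lambda       */
};

/* Reduce: replace the right-hand side of production p, found at offset pos
 * in form, by its left-hand side.  Returns 1 on success; returns 0 and leaves
 * form unchanged if the right-hand side is not at pos or the result would
 * not fit into cap bytes. */
int reduce_at(char *form, size_t cap, size_t pos, int p)
{
    size_t len = strlen(form), rlen;

    assert(p >= 0 && p < (int)(sizeof g0 / sizeof g0[0]));
    rlen = strlen(g0[p].rhs);
    if (pos > len || rlen > len - pos)
        return 0;
    if (strncmp(form + pos, g0[p].rhs, rlen) != 0)
        return 0;
    if (len - rlen + 2 > cap)              /* new length + '\0' must fit */
        return 0;
    /* move the tail (including '\0') next to the slot for the nonterminal */
    memmove(form + pos + 1, form + pos + rlen, len - pos - rlen + 1);
    form[pos] = g0[p].lhs;
    return 1;
}
```

Starting from "F(V+V)", the calls `reduce_at(form, cap, 0, 2)`, `(5, 5)`, `(4, 1)`, `(3, 4)`, `(2, 1)`, `(0, 0)` (position, production) produce P(V+V), P(V+VT), P(V+E), P(VT), P(E), and finally E, which are the right-most sentential forms in reverse order.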

Examples -
example 1
- lookahead is unnecessary
example 2
- service | service | ($\lambda$)
- lookahead is required

Ambiguity of grammar
ambiguous grammar: some string has two different parse trees (i.e., two different structures)
example:
  $\text{<exp>} \to \text{<exp>} - \text{<exp>}$
  $\text{<exp>} \to \text{id}$
for an unambiguous grammar, the parse trees of the left-most derivation and the right-most derivation are the same
[Figure: the two parse trees of id - id - id, one grouping it as (id - id) - id, the other as id - (id - id).]
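
How quickly the ambiguity grows can be checked by counting parse trees. The following sketch is not from the slides and the function name `count_parse_trees` is hypothetical. It counts the distinct parse trees of a string of n ids separated by minus signs, using the fact that any of the n - 1 operators can be the one produced at the root.

```c
#include <assert.h>

/* Number of distinct parse trees for "id - id - ... - id" with n ids under
 *   <exp> -> <exp> - <exp> | id
 * The root's '-' can be any of the n-1 operators, leaving k ids in the left
 * subtree and n-k ids in the right subtree. */
unsigned long count_parse_trees(int n)
{
    unsigned long total = 0;
    int k;

    assert(n >= 1);
    if (n == 1)
        return 1;                  /* <exp> -> id */
    for (k = 1; k < n; k++)        /* <exp> -> <exp> - <exp> */
        total += count_parse_trees(k) * count_parse_trees(n - k);
    return total;
}
```

For n = 3 the function returns 2, matching the two trees of id - id - id; for n = 4 it already returns 5. The plain recursion is exponential, so this sketch is only meant for small n.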

First set and Follow set (1/2)
$\text{First}(\alpha) = \{ a \in V_t \mid \alpha \Rightarrow^* a \beta \} \cup ( \text{if } \alpha \Rightarrow^* \lambda \text{ then } \{\lambda\} \text{ else } \emptyset )$
- set of all terminals that can begin a sentential form derived from $\alpha$
- $\text{First}_k(\alpha)$ -- set of k-symbol terminal strings that can begin a sentential form derived from $\alpha$
- QUIZ: for what?
$\text{Follow}(A) = \{ a \in V_t \mid S \Rightarrow^+ \alpha A a \beta \} \cup ( \text{if } S \Rightarrow^+ \alpha A \text{ then } \{\lambda\} \text{ else } \emptyset )$
- set of all terminals that may follow $A$ in some sentential form
- $\text{Follow}_k(A)$ -- set of k-symbol terminal strings that may follow $A$ in some sentential form
- QUIZ: for what?
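
First set and Follow set (2/2)
example 1 --
  $E \to \text{Prefix ( E )}$
  $E \to \text{V Tail}$
  $\text{Prefix} \to \text{F} \mid \lambda$
  $\text{Tail} \to \text{+ E} \mid \lambda$

              E                    Prefix               Tail
  First_set   $\{ V, F, ( \}$      $\{ F, \lambda \}$   $\{ +, \lambda \}$
  Follow_set  $\{ \lambda, ) \}$   $\{ ( \}$            $\{ \lambda, ) \}$

example 2 --
  $S \to a S e \mid B$
  $B \to b B e \mid C$
  $C \to c C e \mid d$

              S                    B                    C
  First_set   $\{ a, b, c, d \}$   $\{ b, c, d \}$      $\{ c, d \}$
  Follow_set  $\{ e, \lambda \}$   $\{ e, \lambda \}$   $\{ e, \lambda \}$

example 3 --
  $S \to A B c$
  $A \to a \mid \lambda$
  $B \to b \mid \lambda$

              S                    A                    B
  First_set   $\{ a, b, c \}$      $\{ a, \lambda \}$   $\{ b, \lambda \}$
  Follow_set  $\{ \lambda \}$      $\{ b, c \}$         $\{ c \}$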

Algorithms for First & Follow sets (1/6)
typedef int symbol;   /* a symbol in the grammar */

/*
 * The symbolic constants used below, NUM_TERMINALS, NUM_NONTERMINALS,
 * and NUM_PRODUCTIONS, are determined by the grammar.
 * MAX_RHS_LENGTH should simply be "big enough."
 */
#define VOCABULARY (NUM_NONTERMINALS + NUM_TERMINALS)

typedef struct gram {
    symbol terminals[NUM_TERMINALS];
    symbol nonterminals[NUM_NONTERMINALS];
    symbol start_symbol;
    int num_productions;
    struct prod {
        symbol lhs;
        int rhs_length;
        symbol rhs[MAX_RHS_LENGTH];
    } productions[NUM_PRODUCTIONS];
    symbol vocabulary[VOCABULARY];
} grammar;

typedef struct prod production;

typedef symbol terminal;
typedef symbol nonterminal;

typedef short boolean;
#define FALSE 0
#define TRUE  1
typedef boolean marked_vocabulary[VOCABULARY];

/*
 * Mark those vocabulary symbols found to derive λ (directly or indirectly).
 * Returns a pointer to a static array, since a C function cannot return an array.
 */
boolean *mark_lambda(const grammar g)
{
    static marked_vocabulary derives_lambda;
    boolean changes;             /* any changes during last iteration? */
    boolean rhs_derives_lambda;  /* does the RHS derive λ? */
    symbol v;                    /* a word in the vocabulary */
    production p;                /* a production in the grammar */
    int i, j;                    /* loop variables */

    for (v = 0; v < VOCABULARY; v++)
        derives_lambda[v] = FALSE;   /* initially, nothing is marked */

    do {
        changes = FALSE;
        for (i = 0; i < g.num_productions; i++) {
            p = g.productions[i];
            if (!derives_lambda[p.lhs]) {
                if (p.rhs_length == 0) {
                    /* derives λ directly */
                    changes = derives_lambda[p.lhs] = TRUE;
                    continue;
                }
                /* does each part of RHS derive λ? */
                rhs_derives_lambda = derives_lambda[p.rhs[0]];
                for (j = 1; j < p.rhs_length; j++)
                    rhs_derives_lambda = rhs_derives_lambda && derives_lambda[p.rhs[j]];
                if (rhs_derives_lambda)
                    changes = derives_lambda[p.lhs] = TRUE;
            }
        }
    } while (changes);
    return derives_lambda;
}

typedef set_of_terminal_or_lambda termset;
termset follow_set[NUM_NONTERMINALS];
termset first_set[VOCABULARY];
boolean *derives_lambda;   /* set to mark_lambda(g), as defined above, before the sets are filled */

/* Pseudocode: ∪, ∈ and - denote set union, membership and difference. */
termset compute_first(string_of_symbols alpha)
{
    int i, k;
    termset result;

    k = length(alpha);
    if (k == 0)
        result = SET_OF( λ );
    else {
        result = first_set[alpha[0]] - SET_OF( λ );
        for (i = 1; i < k && λ ∈ first_set[alpha[i-1]]; i++)
            result = result ∪ ( first_set[alpha[i]] - SET_OF( λ ) );
        if (i == k && λ ∈ first_set[alpha[k - 1]])
            result = result ∪ SET_OF( λ );
    }
    return result;
}

extern grammar g;

/* Pseudocode: ∪ denotes set union and ∅ the empty set. */
void fill_first_set(void)
{
    nonterminal A;
    terminal a;
    production p;
    boolean changes;
    int i, j;

    for (i = 0; i < NUM_NONTERMINALS; i++) {
        A = g.nonterminals[i];
        if (derives_lambda[A])
            first_set[A] = SET_OF( λ );
        else
            first_set[A] = ∅;
    }

    for (i = 0; i < NUM_TERMINALS; i++) {
        a = g.terminals[i];
        first_set[a] = SET_OF( a );
        for (j = 0; j < NUM_NONTERMINALS; j++) {
            A = g.nonterminals[j];
            if (there exists a production A → aβ)
                first_set[A] = first_set[A] ∪ SET_OF( a );
        }
    }

    do {
        changes = FALSE;
        for (i = 0; i < g.num_productions; i++) {
            p = g.productions[i];
            first_set[p.lhs] = first_set[p.lhs] ∪ compute_first(p.rhs);
            if ( first_set changed )
                changes = TRUE;
        }
    } while (changes);
}
QUIZ: termination?
QUIZ: correctness?

/* Pseudocode: ∪, ∈ and - denote set union, membership and difference. */
void fill_follow_set(void)
{
    nonterminal A, B;
    int i;
    boolean changes;

    for (i = 0; i < NUM_NONTERMINALS; i++) {
        A = g.nonterminals[i];
        follow_set[A] = ∅;
    }
    follow_set[g.start_symbol] = SET_OF( λ );

    do {
        changes = FALSE;
        for (each production A → αBβ) {
            /*
             * I.e. for each production and each occurrence of a
             * nonterminal in its right-hand side.
             */
            follow_set[B] = follow_set[B] ∪ ( compute_first(β) - SET_OF( λ ) );
            if ( λ ∈ compute_first(β) )
                follow_set[B] = follow_set[B] ∪ follow_set[A];
            if ( follow_set[B] changed )
                changes = TRUE;
        }
    } while (changes);
}
QUIZ: termination?
QUIZ: correctness?
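
The fixed-point iterations above can be made concrete with bit sets. The following is a self-contained sketch for example 3 (S → A B c, A → a | λ, B → b | λ), not the slides' implementation: the symbol numbering, the `BIT` macro, and the names `first_of` and `fill_sets` are assumptions, and nullability is folded into the First iteration (an empty right-hand side yields λ) instead of being precomputed by `mark_lambda`.

```c
#include <assert.h>

/* Symbol numbering is an assumption of this sketch: terminals first, then
 * the LAMBDA marker, then nonterminals.  Each set is a bit mask. */
enum { T_a, T_b, T_c, LAMBDA, N_S, N_A, N_B, NUM_SYMBOLS };
#define BIT(x)  (1u << (x))
#define MAX_RHS 3

struct production { int lhs, rhs_length, rhs[MAX_RHS]; };

static const struct production prods[] = {
    { N_S, 3, { N_A, N_B, T_c } },   /* S -> A B c  */
    { N_A, 1, { T_a } },             /* A -> a      */
    { N_A, 0, { 0 } },               /* A -> lambda */
    { N_B, 1, { T_b } },             /* B -> b      */
    { N_B, 0, { 0 } },               /* B -> lambda */
};
enum { NUM_PRODUCTIONS = sizeof prods / sizeof prods[0] };

unsigned first_set[NUM_SYMBOLS];
unsigned follow_set[NUM_SYMBOLS];

/* First of the symbol string rhs[0..n-1], using the current first_set. */
static unsigned first_of(const int *rhs, int n)
{
    unsigned result = 0;
    int i;

    assert(n >= 0 && n <= MAX_RHS);
    for (i = 0; i < n; i++) {
        result |= first_set[rhs[i]] & ~BIT(LAMBDA);
        if (!(first_set[rhs[i]] & BIT(LAMBDA)))
            return result;           /* rhs[i] cannot vanish: stop here */
    }
    return result | BIT(LAMBDA);     /* every symbol (or none) derives lambda */
}

void fill_sets(void)
{
    int changes, i, j;

    for (i = 0; i < NUM_SYMBOLS; i++) {
        first_set[i] = i < LAMBDA ? BIT(i) : 0;   /* First(a) = { a } */
        follow_set[i] = 0;
    }
    do {                             /* First sets: iterate to a fixed point */
        changes = 0;
        for (i = 0; i < NUM_PRODUCTIONS; i++) {
            unsigned old = first_set[prods[i].lhs];
            first_set[prods[i].lhs] |= first_of(prods[i].rhs, prods[i].rhs_length);
            changes |= (first_set[prods[i].lhs] != old);
        }
    } while (changes);

    follow_set[N_S] = BIT(LAMBDA);   /* lambda may follow the start symbol */
    do {                             /* Follow sets: iterate to a fixed point */
        changes = 0;
        for (i = 0; i < NUM_PRODUCTIONS; i++)
            for (j = 0; j < prods[i].rhs_length; j++) {
                int X = prods[i].rhs[j];
                unsigned old, f;
                if (X < N_S)
                    continue;        /* only nonterminals have Follow sets */
                f = first_of(prods[i].rhs + j + 1, prods[i].rhs_length - j - 1);
                old = follow_set[X];
                follow_set[X] |= f & ~BIT(LAMBDA);
                if (f & BIT(LAMBDA)) /* the rest of the RHS can vanish */
                    follow_set[X] |= follow_set[prods[i].lhs];
                changes |= (follow_set[X] != old);
            }
    } while (changes);
}
```

Both loops terminate because each pass can only add elements to finitely many finite sets. After `fill_sets()` the masks hold First(S) = { a, b, c }, First(A) = { a, λ }, First(B) = { b, λ }, Follow(S) = { λ }, Follow(A) = { b, c }, and Follow(B) = { c }.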
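
Tracing examples
[Figure: the three example grammars with tracing marks on their right-hand-side symbols, next to the First and Follow sets they produce.]
example 1 --
  $E \to \text{Prefix ( E )}$
  $E \to \text{V Tail}$
  $\text{Prefix} \to \text{F} \mid \lambda$
  $\text{Tail} \to \text{+ E} \mid \lambda$

              E                    Prefix               Tail
  First_set   $\{ V, F, ( \}$      $\{ F, \lambda \}$   $\{ +, \lambda \}$
  Follow_set  $\{ \lambda, ) \}$   $\{ ( \}$            $\{ \lambda, ) \}$

example 2 --
  $S \to a S e \mid B$
  $B \to b B e \mid C$
  $C \to c C e \mid d$

              S                    B                    C
  First_set   $\{ a, b, c, d \}$   $\{ b, c, d \}$      $\{ c, d \}$
  Follow_set  $\{ \lambda, e \}$   $\{ e, \lambda \}$   $\{ e, \lambda \}$

example 3 --
  $S \to A B c$
  $A \to a \mid \lambda$
  $B \to b \mid \lambda$

              S                    A                    B
  First_set   $\{ a, b, c \}$      $\{ a, \lambda \}$   $\{ b, \lambda \}$
  Follow_set  $\{ \lambda \}$      $\{ b, c \}$         $\{ c \}$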

From extended BNF to CFG
$\text{<statement list>} \to \text{<statement>}\ \{\ \text{<statement>}\ \}$
⇓
$\text{<statement list>} \to \text{<statement>}\ \text{<statement tail>}$
$\text{<statement tail>} \to \text{<statement>}\ \text{<statement tail>}$
$\text{<statement tail>} \to \lambda$
QUIZ: how, systematically?
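
The equivalence of the two forms can be seen by writing a recognizer for each. The sketch below is illustrative only and assumes a hypothetical token encoding in which every <statement> is the single character 's'; the function names `list_ebnf`, `list_cfg`, and `statement_tail` are likewise hypothetical. The repetition braces become a loop, while the added <statement tail> nonterminal becomes a recursive function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token encoding: each <statement> is the single character 's'. */

/* EBNF form: <statement list> -> <statement> { <statement> } */
int list_ebnf(const char *in)
{
    assert(in != NULL);
    if (*in != 's')
        return 0;                  /* the mandatory first <statement> */
    in++;
    while (*in == 's')             /* { <statement> }: zero or more repetitions */
        in++;
    return *in == '\0';
}

/* CFG form: <statement tail> -> <statement> <statement tail> | lambda
 * Returns a pointer just past the recognized tail. */
static const char *statement_tail(const char *in)
{
    if (*in == 's')
        return statement_tail(in + 1);
    return in;                     /* lambda production */
}

/* CFG form: <statement list> -> <statement> <statement tail> */
int list_cfg(const char *in)
{
    assert(in != NULL);
    if (*in != 's')
        return 0;
    return *statement_tail(in + 1) == '\0';
}
```

Both functions accept exactly the strings consisting of one or more 's' characters, so `list_ebnf` and `list_cfg` agree on every input.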

Other types of grammars
regular grammar -- $A \to a B$ or $C \to \lambda$
- QUIZ: how?
context-free grammar -- $A \to \alpha$
context-sensitive grammar -- $\alpha A \beta \to \alpha \delta \beta$
type 0 grammar -- $\alpha \to \beta$

regular grammar: too simple, e.g., it cannot specify $\{ [^i\, ]^i \mid i \ge 1 \}$
- QUIZ: how to specify $\{ [^i\, ]^i \mid i \ge 1 \}$ by a context-free grammar?
context-sensitive, type 0: no sufficiently efficient parsers
context-free grammar: a balance between generality and practicality
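
Nested brackets illustrate why the extra power of a context-free grammar matters. As one possible answer to the quiz (an assumption, since the slides leave it open), the grammar S → [ S ] | [ ] generates i opening brackets followed by i closing brackets for every i ≥ 1. The recursive recognizer below is a sketch of that grammar; the names `parse_S` and `is_nested_brackets` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One possible grammar (an assumption of this sketch):  S -> [ S ] | [ ]
 * Returns a pointer just past the recognized S, or NULL on failure. */
static const char *parse_S(const char *in)
{
    if (*in != '[')
        return NULL;
    in++;
    if (*in == '[') {                      /* S -> [ S ] */
        in = parse_S(in);
        if (in == NULL)
            return NULL;
    }
    return *in == ']' ? in + 1 : NULL;     /* closing bracket of either alternative */
}

/* Returns 1 if in consists of i '[' followed by i ']' for some i >= 1. */
int is_nested_brackets(const char *in)
{
    const char *rest;

    assert(in != NULL);
    rest = parse_S(in);
    return rest != NULL && *rest == '\0';
}
```

The recursion depth plays the role of the counter that a finite automaton lacks: `is_nested_brackets("[[]]")` returns 1, while `"[[]"` and `"[][]"` are rejected.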
